Analysis jailhouse repo with PaSta #42

Closed
rsarky opened this issue Mar 17, 2020 · 19 comments

@rsarky
Contributor

rsarky commented Mar 17, 2020

I wanted to analyse the jailhouse repo as it is much smaller than the other repos.
I was getting stuck at some places so needed some help in getting the setup done.
What I have done so far:

  • Ran ./pasta select jailhouse
  • Ran ./pasta sync

The above two commands ran successfully. My next step was to run ./pasta sync -mbox.
The output of this command is as follows:
2020-03-17 18:52:12,674 PaStA           INFO     Cmdline: ./pasta sync -mbox
2020-03-17 18:52:12,675 pypasta.Config  INFO     Active configuration: jailhouse
2020-03-17 18:52:12,680 Repository.Mbox INFO     Loading mailbox subsystem
2020-03-17 18:52:12,681 Repository.Mbox INFO       ↪ loaded invalid mail index: found 0 invalid mails
2020-03-17 18:52:12,682 pypasta.Config  INFO     Renewing upstream commit hash file
2020-03-17 18:52:12,728 pypasta.Config  INFO       ↪ done
2020-03-17 18:52:12,729 Repository.Mbox INFO     Loading mailbox subsystem
2020-03-17 18:52:12,729 Repository.Mbox INFO       ↪ loaded invalid mail index: found 0 invalid mails
2020-03-17 18:52:12,730 tory.Repository INFO     Loading upstream commit cache
2020-03-17 18:52:12,923 tory.Repository INFO       ↪ Loaded 2706 commits from cache file
2020-03-17 18:52:12,926 tory.Repository INFO     Writing 2706 commits to cache file
2020-03-17 18:52:13,326 tory.Repository INFO     Loading mbox commit cache
2020-03-17 18:52:13,326 tory.Repository INFO       ↪ Warning, commit cache file /home/rohit/Projects/PaStA/resources/jailhouse/resources/commit_cache_mbox.pkl not found!
2020-03-17 18:52:13,326 tory.MailThread WARNING  MailThread cache not existing
2020-03-17 18:52:13,326 tory.MailThread INFO     Updating mail thread cache
2020-03-17 18:52:13,326 tory.MailThread INFO     Cache is already up to date
2020-03-17 18:52:13,327 PaStA           INFO     Shutting down

However, I think I haven't set up my mbox properly. Can someone guide me with this?
Thanks!

@rralf
Member

rralf commented Mar 17, 2020

Generally, working on small datasets is an excellent idea for development. The problem is that we lack public inboxes for Jailhouse. I could give you my local mbox, but it's easier for you to choose Linux.

Choose Linux, edit the config in resources/linux/config and simply deactivate all huge lists. Just leave one small list activated. Let's say the alsa mailing list.

Then choose a time window of one month, and reduce the number of commits to roughly the same month. This makes things manageable on a desktop machine.

@rsarky
Contributor Author

rsarky commented Mar 17, 2020

> Choose Linux, edit the config in resources/linux/config and simply deactivate all huge lists. Just leave one small list activated. Let's say the alsa mailing list.

This does help. I was able to run pasta analyse rep on the repository successfully.
However, pasta analyse upstream tries to cache around 83k commits, which hangs my system :(

Also, as an aside:
Since I already had a local clone of the Linux repository on my system, instead of running git submodule update linux I created a symlink called repo and pointed it to that clone.
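
For anyone wanting to reproduce the symlink approach, here is a minimal sketch. The helper name and both paths in the usage note are assumptions for illustration; the thread only says that the link is called repo.

```python
import os
from pathlib import Path


def link_local_clone(local_clone: Path, link_path: Path) -> Path:
    """Point link_path at an existing local clone (hypothetical helper)."""
    if not local_clone.is_dir():
        raise FileNotFoundError(f"no directory at {local_clone}")
    # Refuse to overwrite anything already present, including a dangling link.
    if link_path.exists() or link_path.is_symlink():
        raise FileExistsError(f"{link_path} already exists")
    link_path.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(local_clone.resolve(), link_path, target_is_directory=True)
    return link_path
```

It could be called as `link_local_clone(Path.home() / "src" / "linux", Path("resources/linux/repo"))`, where both locations are placeholders to adjust to the local setup.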

@rralf
Member

rralf commented Mar 17, 2020

Ok, we can fix that.

Try in your config:
UPSTREAM = "v5.5-rc6..origin/master"
[...]
[mbox]
MINDATE = 2020-02-01
MAXDATE = 2020-03-01

and only activate the alsa-devel ML.

Thanks

@rsarky
Contributor Author

rsarky commented Mar 17, 2020

Hmm, in that case I guess I should also set the date filter so that it roughly overlaps the commit range specified in UPSTREAM.

Oh, I notice you have already mentioned this in your comment. Never mind.

@rralf
Member

rralf commented Mar 17, 2020

Yep, exactly, see the UPSTREAM range above. That roughly matches. It should be sufficient for playing around with PaStA.
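
One rough way to check how well a commit range and a mail time window overlap is to look at the committer dates in the range. The sketch below relies only on plain `git log --format=%cI`. The function names are hypothetical, and treating MAXDATE as inclusive is an assumption, not something confirmed for PaStA.

```python
import subprocess
from datetime import date, datetime
from typing import List


def commit_dates(repo: str, rev_range: str) -> List[date]:
    """Committer dates of all commits in rev_range, read via plain git."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%cI", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout
    return [datetime.fromisoformat(line).date()
            for line in out.splitlines() if line]


def share_in_window(dates: List[date], mindate: date, maxdate: date) -> float:
    """Fraction of dates inside [mindate, maxdate] (inclusive, by assumption)."""
    if not dates:
        return 0.0
    inside = sum(1 for d in dates if mindate <= d <= maxdate)
    return inside / len(dates)
```

For the configuration above, this would be `share_in_window(commit_dates("path/to/linux", "v5.5-rc6..origin/master"), date(2020, 2, 1), date(2020, 3, 1))`, with the repository path as a placeholder. A very low share suggests the two settings barely overlap.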

@rsarky
Contributor Author

rsarky commented Mar 17, 2020

Hmm, the given commit range and date filter did give me some output, but I couldn't get any mappings between patches and commits. I need to play around with the above two parameters, I guess.
Closing this thread.

@rsarky rsarky closed this as completed Mar 17, 2020
@rralf
Member

rralf commented Mar 17, 2020

Did it not give you any mappings? That's strange. I used this configuration in a demo today, and we saw at least some (I guess it was 70 or so) mappings. Did you recreate the caches?

$ ./pasta sync -clear all
$ ./pasta sync -mbox -create all

@rralf
Member

rralf commented Mar 17, 2020

Ah, another tip:
$ cd resources
$ git checkout master
$ git submodule update

Maybe you're running on too old a state of the resources.

@rsarky
Contributor Author

rsarky commented Mar 18, 2020

> $ git submodule update

This was taking an immense amount of time for me over a choppy network, so I decided to use a local Linux repo clone instead. That shouldn't be an issue, I guess?

@rsarky
Contributor Author

rsarky commented Mar 18, 2020

> Did it not give you any mappings? That's strange. I used this configuration in a demo today, and we saw at least some (I guess it was 70 or so) mappings. Did you recreate the caches?
>
> $ ./pasta sync -clear all
> $ ./pasta sync -mbox -create all

This didn't help.
My output file basically shows all patch equivalence classes, followed by all the upstream commits.
There are no mappings.
The only reason for this that I can see is that I am using a local clone instead of running git submodule update. I will try with that, I guess.

@rralf
Member

rralf commented Mar 18, 2020

No, that should not be a problem.

So you ran the following, in this order?

  • analyse rep
  • rate
  • analyse upstream
  • rate

@rralf
Member

rralf commented Mar 18, 2020

Submodules are only used to have everything tied together. You can use local checkouts as well.

@rsarky
Contributor Author

rsarky commented Mar 18, 2020

> No, that should not be a problem.
>
> So you ran the following, in this order?
>
>   • analyse rep
>   • rate
>   • analyse upstream
>   • rate

Yup, the same order.
I also increased the mailbox span to 3 months.

@rralf
Member

rralf commented Mar 18, 2020

Okay that's really strange. Please find my config here: http://vmexit.de/~ralf/config

Try deleting all caches (e.g.: rm resources/linux/resources/*pkl; rm resources/linux/mbox-result), copy over the config and try:

$ pasta sync -mbox -create all
$ pasta analyse rep
$ pasta rate
$ pasta analyse upstream
$ pasta rate

Here on my machine, this gives me 84 mappings against upstream with default thresholds in that time window.
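
The command sequence above can also be wrapped in a small script so the steps always run in the same order. This is only a sketch: the step list is copied from this thread, while the wrapper, its name and the stop-on-first-failure behaviour are assumptions and not part of PaStA.

```python
import subprocess
from typing import Callable, List, Sequence

# Step order exactly as listed in this thread.
STEPS: List[List[str]] = [
    ["sync", "-mbox", "-create", "all"],
    ["analyse", "rep"],
    ["rate"],
    ["analyse", "upstream"],
    ["rate"],
]


def run_pipeline(pasta: str = "./pasta",
                 runner: Callable[[Sequence[str]], int] = subprocess.call) -> int:
    """Run the steps in order and return how many completed successfully."""
    done = 0
    for args in STEPS:
        cmd = [pasta, *args]
        print("running:", " ".join(cmd))
        if runner(cmd) != 0:
            # Stop at the first failing step rather than continuing blindly.
            break
        done += 1
    return done
```

Calling `run_pipeline()` from the PaStA checkout runs all five steps. The `runner` parameter exists only so the order can be checked without invoking the real tool.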

@rsarky
Contributor Author

rsarky commented Mar 18, 2020

Will try this right now. Thanks for being so patient

@rsarky
Contributor Author

rsarky commented Mar 18, 2020

Got some mappings now!! ✨

One thing I noticed: when I deleted the mbox-result file and it got recreated, it had significantly fewer lines.

This makes me wonder: when rerunning an analysis on a mailbox, do we append to the mbox-result file instead of rewriting it?

@rralf
Member

rralf commented Mar 18, 2020

Aah, I might know what went wrong:

Initially, mbox-result gets created when 'analyse rep' runs for the first time. This is the basis for all further analyses. Commits are added when 'analyse succ' is started.

So if you change the config but leave the old mbox-result, it may still contain mails that aren't reachable any longer. I would have to look at the code to see what happens in that case.

The best thing is to start with a clean mbox-result after committing changes to the config.
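
A simple heuristic for spotting this situation is to compare modification times: if the config was changed after mbox-result was last written, the result file is probably stale. The sketch below is a hypothetical standalone helper, not PaStA's own check, and the file locations are left to the caller.

```python
from pathlib import Path


def result_is_stale(config: Path, result: Path) -> bool:
    """True if result exists but was last written before config was modified."""
    if not result.exists():
        # Nothing to go stale; the first analysis run creates the file.
        return False
    return result.stat().st_mtime < config.stat().st_mtime
```

For example, `result_is_stale(Path("resources/linux/config"), Path("resources/linux/mbox-result"))` returning True would be a hint to remove mbox-result before rerunning the analysis.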

@rsarky
Contributor Author

rsarky commented Mar 19, 2020

> Aah, I might know what went wrong:
>
> Initially, mbox-result gets created when running the first time 'analyse rep'. This is the basis for all further analyses. Commits are added when 'analyse succ' is started.

You mean analyse upstream, right?

> So if you change the config but leave the old mbox-result, it may still contain mails that aren't reachable any longer. I would have to look at the code to see what happens in that case.
>
> The best thing is to start with a clean mbox-result after committing changes to the config.

We could add a flag to clean mbox-result instead of doing so manually.

@rralf
Member

rralf commented Mar 19, 2020

> > Aah, I might know what went wrong:
> > Initially, mbox-result gets created when running the first time 'analyse rep'. This is the basis for all further analyses. Commits are added when 'analyse succ' is started.
>
> You mean analyse upstream, right?

Yes, sorry, mixed it up.

> > So if you change the config but leave the old mbox-result, it may still contain mails that aren't reachable any longer. I would have to look at the code to see what happens in that case.
> > The best thing is to start with a clean mbox-result after committing changes to the config.
>
> We could add a flag to clean mbox-result instead of doing so manually.

Hmm. I'd rather abort in that case and ask the user for manual intervention. But wait, we actually already do: https://github.com/lfd/PaStA/blob/master/bin/pasta_analyse.py#L190

Did you see that warning during your analysis?
