Compare patch recipients vs. `get_maintainer.pl` recommendation #21

bulwahn · 2019-05-09T04:12:45Z

Motivation:
If we find systematic differences between the recipients that patches are usually sent to and the get_maintainer.pl recommendation, we can improve get_maintainer.pl in this regard.

bulwahn · 2019-07-20T06:19:08Z

On small scale (for getting started):

To start understanding what needs to be done, we can quickly run this investigation on a few hundred patches (selected from one week of a release candidate, for which we have checked, or can simply assume, that the MAINTAINERS file does not change).

Doing this at large scale will be more difficult: we would really like to collect this information for every patch email, and there are millions of patch emails, so we need to tweak and tune the operation.

rsarky · 2020-03-19T19:13:04Z

Since this issue was opened, @rralf has created the LinuxMaintainers class. Can we leverage that instead of get_maintainer.pl?

rsarky · 2020-03-22T21:04:25Z

Ah, after going over the codebase a bit more, I see that the module LinuxMailCharacteristics (particularly the _get_maintainer method) essentially acquires the heuristics required for this.

ShubhamPandey28 · 2020-07-04T08:17:28Z

A classification approach:
If possible, we can classify a patch by its maintainers using a target vector that contains, for each maintainer, the probability that the patch belongs to that maintainer. For example, suppose there are 5 maintainers and the patch is classified as <0.7, 0.8, 0.67, 0.09, 0.9>. If we set the threshold probability to 0.7, the patch could be sent to maintainers 0, 1 and 4.
One drawback of this approach that I can think of right now: if we add a maintainer, we have to either retrain the model on all the data with the new maintainer included in the target vector, or compute a separate model for each maintainer (treating the classifications for the individual maintainers as mutually independent events). The latter also means storing the weights separately for every maintainer, which uses more memory.

Please object if this solution is inappropriate.
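The thresholding step in the example above can be sketched as follows. This is only an illustration of the selection rule, not a trained model: the function name `select_maintainers`, the maintainer labels and the scores are hypothetical, and the scores are assumed to come from some classifier that is not specified in this thread.

```python
from typing import Dict, List


def select_maintainers(scores: Dict[str, float], threshold: float = 0.7) -> List[str]:
    """Return the maintainers whose predicted probability reaches the threshold."""
    return [name for name, prob in scores.items() if prob >= threshold]


# Hypothetical per-maintainer probabilities for one patch (values from the example above).
scores = {
    "maintainer0": 0.7,
    "maintainer1": 0.8,
    "maintainer2": 0.67,
    "maintainer3": 0.09,
    "maintainer4": 0.9,
}

selected = select_maintainers(scores)
print(selected)  # ['maintainer0', 'maintainer1', 'maintainer4']
```

Keying the scores by maintainer name rather than by vector position only simplifies the selection step; it does not remove the retraining question raised above.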

bulwahn · 2020-07-04T16:15:56Z

@ShubhamPandey28 I fear you are providing such a dense description that we are not able to judge if this makes sense.
How are you determining any model? Neither get_maintainer.pl nor the history can be taken as ground truth because both contain mistakes: either developers send patches to the wrong maintainers (or maintainers hand over to someone else), or get_maintainer.pl contains false entries.

How about starting with writing a script to identify significant systematic differences between get_maintainer.pl and email recipients? If you need to use vectors, go ahead, but you would really need to first describe your model.

fun-akhil · 2020-07-06T19:09:23Z

I have observed that in some cases, the authors of a patch mail it to addresses/recipients that are not present in the MAINTAINERS file. In such a scenario, it is almost impossible to suggest the appropriate mailing list by running the get_maintainer.pl script. Is there any possible solution to this problem?

bulwahn · 2020-07-06T20:06:28Z

@quantum109678 Well, you can just add that person to the MAINTAINERS file, right?

The real question is how you can train a "suitable" model on a set of files in a directory and keywords in patches for recipients, and then determine the "minimally invasive" but "maximally effective" change to the MAINTAINERS file.

I would be surprised if some crazy machine learning could solve that, but let us start easy and collect the cases where there is a systematic difference, e.g., in 80% of the patches, between the recipient list of all patches belonging to a MAINTAINERS section and the information in the MAINTAINERS section. That probably already indicates that the entry might need adjustment.
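A minimal sketch of that "start easy" check, assuming the patches have already been grouped by MAINTAINERS section and their recipients extracted. The function name `systematic_differences`, the data layout and the example addresses are hypothetical and not part of PaStA; the 80% ratio is the figure mentioned above.

```python
from collections import Counter
from typing import Dict, List, Set


def systematic_differences(
    patch_recipients: List[Set[str]],
    section_entries: Set[str],
    ratio: float = 0.8,
) -> Dict[str, List[str]]:
    """Compare the actual recipients of one section's patches with its MAINTAINERS entry.

    patch_recipients: one set of recipient addresses per patch of the section.
    section_entries:  addresses listed in the MAINTAINERS section.
    """
    total = len(patch_recipients)
    if total == 0:
        return {"missing_from_entry": [], "rarely_addressed": []}

    # Count in how many patches each address appears as a recipient.
    counts = Counter(addr for recipients in patch_recipients for addr in recipients)

    # Addresses that receive at least `ratio` of the patches but are not listed.
    missing = sorted(
        addr for addr, n in counts.items()
        if addr not in section_entries and n / total >= ratio
    )
    # Listed addresses that are left out of at least `ratio` of the patches.
    rarely = sorted(
        addr for addr in section_entries
        if (total - counts[addr]) / total >= ratio
    )
    return {"missing_from_entry": missing, "rarely_addressed": rarely}


# Hypothetical data: five patches touching one MAINTAINERS section.
entry = {"alice@example.org", "list@example.org"}
patches = [
    {"carol@example.org", "list@example.org"},
    {"carol@example.org", "list@example.org"},
    {"carol@example.org", "list@example.org", "alice@example.org"},
    {"carol@example.org", "list@example.org"},
    {"list@example.org"},
]

report = systematic_differences(patches, entry)
print(report)
```

Here `carol@example.org` receives 4 of 5 patches without being listed, and `alice@example.org` is listed but omitted from 4 of 5 patches, so both are flagged as candidates for a closer look at the entry.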

bulwahn · 2020-07-14T13:32:49Z

I have formulated issue #65 as a nice side investigation to this question.

bulwahn mentioned this issue Jul 20, 2019

Linux Kernel Internship - Patch Trace Analysis with PaStA #29

Closed

bulwahn mentioned this issue Jul 1, 2020

Compute relation between patch series #34

Closed

rralf closed this as completed Jul 2, 2024
