Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix erroneous behaviour in LinuxMaintainers #41

Closed
rralf opened this issue Mar 17, 2020 · 8 comments
Closed

Fix erroneous behaviour in LinuxMaintainers #41

rralf opened this issue Mar 17, 2020 · 8 comments
Assignees
Labels

Comments

@rralf
Copy link
Member

rralf commented Mar 17, 2020

Lukas:

I was using your LinuxMaintainers class in Pasta to find files that
are not part of any maintainer entry other than THE REST. Joe Perches
pointed out that get_maintainers.pl has a check to add slashes when it
is directory. See below. I think we need to add that to the
LinuxMaintainers file as well. This will require to have the list of
all directories in the git repository, though; so some refactoring of
LinuxMaintainers.

@rralf rralf added the bug label Mar 17, 2020
@rralf
Copy link
Member Author

rralf commented Mar 17, 2020

Affected subsystem: ISDN/mISDN section

@lfd lfd deleted a comment from ditzlike Apr 3, 2020
@ditzlike
Copy link
Contributor

ditzlike commented Apr 3, 2020

I can try to solve it

@rralf
Copy link
Member Author

rralf commented Jun 9, 2020

ping :-)

@bulwahn
Copy link
Contributor

bulwahn commented Jun 10, 2020

@rralf
Copy link
Member Author

rralf commented Jun 10, 2020

Uff, okay, I guess I understood the issue. Argh, yikes. Let me just summarise to see if I understood correctly:

As MAINTAINERS puts it:

F:   drivers/net/    all files in and below drivers/net
F:   drivers/net/*   all files in drivers/net, but not below
F:   */net/*         all files in "any top level directory"/net

But appears that this is not the whole truth. Assume the following entry:

TEST
M:  bla <john@doe.org>
F:  arch

arch is in fact a directory, and there's a file called arch/bla.c. get_maintainers will assign that file to the TEST section, while in PaStA we won't.

Oh boy, this is going to be ugly to implement. In particular, this is going to be real fun, if this case is intermixed with wildcards. E.g.:

FUN1
F:    arch/arm*/boot

FUN2
F:    arch/arm/boot

I just tested it: get_maintainers.pl will assign arch/arm/boot/install.sh to FUN2, but not to FUN1. I'm glad that get_maintainers disrespects the wildcard. So this is how we should handle it as well.

@bulwahn could you please verfiy this statement:
For each section, we have to check for each F-entry that does not end on / and that does not contain a wildcard if it is a directory in the corresponding repository. If it is a directory, we must treat it as if there would be a / at the end.

Pia, let me explain the issue in detail in a short call...

@rralf
Copy link
Member Author

rralf commented Jun 10, 2020

And by the way, this kind of issue raises a completely new pattern of issues:

How can we be sure, that PaStA's implementation aligns with get_maintainers.pl for any patch for any point in time?

I mean, we work on historical data, and people used get_maintainers.pl at the time of writing their patch. And I'm pretty sure that there were some changes to get_maintainers.pl that changed the assignment of sections without actual changes to MAINTAINERS.

Our current assumption is: get_maintainers.pl tells the truth, as it is the tool of choice for developers.

I see no real possibility that PaStA's implementation will 100% reflect get_maintainers.pl's output for any patch for any point in time.

But we need PaStA's implementation as we want to do mass bulk analysis, and get_maintainers.pl is horribly slow.

So this is why Pia's compare_getmaintainers.pl gains importance: If we want to use PaStA's LinuxMaintainers.py to argue about behaviour of authors, we need to ensure, that our output is "good enough" to reflect the reality of get_maintainers.pl

@rralf
Copy link
Member Author

rralf commented Jun 10, 2020

Another hypothetical scenario:

In future, we might want to decide if a maintainer was correctly addressed by a patch. We already tried this with Sebastian's work and realised that this question is hard to answer, and strongly depends on the metric how we define "correctly addressed" -- we had a lot of discussions on that topic.

So at the moment, if we want to check if a patch was correctly addressed, we take it's date, try to assign it roughly to the kernel version "at that point in time", and compare recipients vs. maintainers.

But who says that there were no changes of MAINTAINERS or get_maintainers.pl in the meanwhile? Those files evolve as well.

We need to be extremely careful with the choice of metrics.

For example, a question that is pretty easy to answer with high accuracy is: "Was a patch shot completely off target". E.g., a mail patches a soundcard and was sent to KVM. That's easy to answer with high accuracy. The opposite of the question "Was a patch correctly addressed" is hard to answer and we have (probably) bad accuracy due to metrics.

But we can answer "What amount of patches were not completely off target" with high accuracy, if we ask the question "was a patch completely off shot" for all patches.

@rralf
Copy link
Member Author

rralf commented Jun 18, 2020

Might be fixed with 40f11f2

Requires further testing.

@rralf rralf closed this as completed Jul 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants