Fix erroneous behaviour in LinuxMaintainers #41

rralf · 2020-03-17T10:48:36Z

Lukas:

I was using your LinuxMaintainers class in Pasta to find files that
are not part of any maintainer entry other than THE REST. Joe Perches
pointed out that get_maintainers.pl has a check to add slashes when it
is directory. See below. I think we need to add that to the
LinuxMaintainers file as well. This will require to have the list of
all directories in the git repository, though; so some refactoring of
LinuxMaintainers.

rralf · 2020-03-17T11:08:36Z

Affected subsystem: ISDN/mISDN section

ditzlike · 2020-04-03T15:04:27Z

I can try to solve it

rralf · 2020-06-09T19:46:03Z

ping :-)

bulwahn · 2020-06-10T11:43:55Z

Look into the mail thread here:

https://lore.kernel.org/lkml/08d88848280f93c171e4003027644a35740a8e8e.camel@perches.com/

rralf · 2020-06-10T14:28:05Z

Uff, okay, I guess I understood the issue. Argh, yikes. Let me just summarise to see if I understood correctly:

As MAINTAINERS puts it:

F:   drivers/net/    all files in and below drivers/net
F:   drivers/net/*   all files in drivers/net, but not below
F:   */net/*         all files in "any top level directory"/net

But appears that this is not the whole truth. Assume the following entry:

TEST
M:  bla <john@doe.org>
F:  arch

arch is in fact a directory, and there's a file called arch/bla.c. get_maintainers will assign that file to the TEST section, while in PaStA we won't.

Oh boy, this is going to be ugly to implement. In particular, this is going to be real fun, if this case is intermixed with wildcards. E.g.:

FUN1
F:    arch/arm*/boot

FUN2
F:    arch/arm/boot

I just tested it: get_maintainers.pl will assign arch/arm/boot/install.sh to FUN2, but not to FUN1. I'm glad that get_maintainers disrespects the wildcard. So this is how we should handle it as well.

@bulwahn could you please verfiy this statement:
For each section, we have to check for each F-entry that does not end on / and that does not contain a wildcard if it is a directory in the corresponding repository. If it is a directory, we must treat it as if there would be a / at the end.

Pia, let me explain the issue in detail in a short call...

rralf · 2020-06-10T14:45:36Z

And by the way, this kind of issue raises a completely new pattern of issues:

How can we be sure, that PaStA's implementation aligns with get_maintainers.pl for any patch for any point in time?

I mean, we work on historical data, and people used get_maintainers.pl at the time of writing their patch. And I'm pretty sure that there were some changes to get_maintainers.pl that changed the assignment of sections without actual changes to MAINTAINERS.

Our current assumption is: get_maintainers.pl tells the truth, as it is the tool of choice for developers.

I see no real possibility that PaStA's implementation will 100% reflect get_maintainers.pl's output for any patch for any point in time.

But we need PaStA's implementation as we want to do mass bulk analysis, and get_maintainers.pl is horribly slow.

So this is why Pia's compare_getmaintainers.pl gains importance: If we want to use PaStA's LinuxMaintainers.py to argue about behaviour of authors, we need to ensure, that our output is "good enough" to reflect the reality of get_maintainers.pl

rralf · 2020-06-10T14:56:31Z

Another hypothetical scenario:

In future, we might want to decide if a maintainer was correctly addressed by a patch. We already tried this with Sebastian's work and realised that this question is hard to answer, and strongly depends on the metric how we define "correctly addressed" -- we had a lot of discussions on that topic.

So at the moment, if we want to check if a patch was correctly addressed, we take it's date, try to assign it roughly to the kernel version "at that point in time", and compare recipients vs. maintainers.

But who says that there were no changes of MAINTAINERS or get_maintainers.pl in the meanwhile? Those files evolve as well.

We need to be extremely careful with the choice of metrics.

For example, a question that is pretty easy to answer with high accuracy is: "Was a patch shot completely off target". E.g., a mail patches a soundcard and was sent to KVM. That's easy to answer with high accuracy. The opposite of the question "Was a patch correctly addressed" is hard to answer and we have (probably) bad accuracy due to metrics.

But we can answer "What amount of patches were not completely off target" with high accuracy, if we ask the question "was a patch completely off shot" for all patches.

rralf · 2020-06-18T23:22:58Z

Might be fixed with 40f11f2

Requires further testing.

rralf added the bug label Mar 17, 2020

rralf assigned ditzlike Apr 3, 2020

lfd deleted a comment from ditzlike Apr 3, 2020

rralf unassigned ditzlike Apr 3, 2020

rralf assigned ditzlike Apr 3, 2020

rralf closed this as completed Jul 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix erroneous behaviour in LinuxMaintainers #41

Fix erroneous behaviour in LinuxMaintainers #41

rralf commented Mar 17, 2020

rralf commented Mar 17, 2020

ditzlike commented Apr 3, 2020

rralf commented Jun 9, 2020

bulwahn commented Jun 10, 2020

rralf commented Jun 10, 2020

rralf commented Jun 10, 2020

rralf commented Jun 10, 2020

rralf commented Jun 18, 2020

Fix erroneous behaviour in LinuxMaintainers #41

Fix erroneous behaviour in LinuxMaintainers #41

Comments

rralf commented Mar 17, 2020

rralf commented Mar 17, 2020

ditzlike commented Apr 3, 2020

rralf commented Jun 9, 2020

bulwahn commented Jun 10, 2020

rralf commented Jun 10, 2020

rralf commented Jun 10, 2020

rralf commented Jun 10, 2020

rralf commented Jun 18, 2020