Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'Is reporter a developer' as a feature #83

Closed
marco-c opened this issue Jan 20, 2019 · 24 comments
Closed

'Is reporter a developer' as a feature #83

marco-c opened this issue Jan 20, 2019 · 24 comments

Comments

@marco-c
Copy link
Collaborator

marco-c commented Jan 20, 2019

Similar to #30.

This feature is going to be a boolean saying whether the reporter of the bug is a developer. As a first implementation, we can check if there is any commit where they are authors.

@poojan124
Copy link
Contributor

poojan124 commented Jan 23, 2019

I have a couple of questions about this.

  1. This feature will be true if reporter of a bug made a commit for the same bug or any bug?
  2. And to check both are the same author ( for bug and commit) we need to check their email. Or there is other way to do it ?

@marco-c
Copy link
Collaborator Author

marco-c commented Jan 23, 2019

  1. If the reporter of the bug made a commit for any bug. In theory we should only check commits before the bug opening time, but for simplicity we will consider all commits.
  2. It is not completely precise, but yes for now we will use the email.

@poojan124
Copy link
Contributor

poojan124 commented Jan 24, 2019

  • For this, I am thinking to store all mail id from commit data in a set.
  • And for each bug check whether or creator_mail is present in that set or not. (Or is there any better way to do this.)

"In theory we should only check commits before the bug opening time"

  • And to achive this with mail_id we can also store their oldest commit date( i am not sure where this information is ). From that we can check whether reporter have made any commits previously.
  • For example : If some developer with email xxx@yyy.com made 3 commits on a,b,c date then we will store oldest of this three. Lets say b. So when processing bug we can check if he is developer or not by comparing bug creation_time with b.

@marco-c
Copy link
Collaborator Author

marco-c commented Jan 24, 2019

For this, I am thinking to store all mail id from commit data in a set.
And for each bug check whether or creator_mail is present in that set or not. (Or is there any better way to do this.)

Sounds good, it should work. You'll need to make this smarter:
https://github.com/mozilla/bugbug/blob/master/bugbug/bug_features.py#L235

You can check the number of arguments to the function, and if the number of arguments is only 1 you only pass the bug, otherwise, you pass an object containing some data (for now, only the developers).

@marco-c
Copy link
Collaborator Author

marco-c commented Jan 24, 2019

And to achive this with mail_id we can also store their oldest commit date( i am not sure where this information is ). From that we can check whether reporter have made any commits previously.
For example : If some developer with email xxx@yyy.com made 3 commits on a,b,c date then we will store oldest of this three. Lets say b. So when processing bug we can check if he is developer or not by comparing bug creation_time with b.

Good idea!

@poojan124
Copy link
Contributor

poojan124 commented Jan 29, 2019

You can check the number of arguments to the function, and if the number of arguments is only 1 you only pass the bug, otherwise, you pass an object containing some data (for now, only the developers).

  • As you suggested this idea. I have come up with a couple of implementation but don't know which will be better to use.
  1. Using a list of the class which requires more than 1 argument.
class is_developer():
    def __call__(self, bug, extra):
        pass

# in BugExtractor we can do this
if type(f) in [is_developer]:
    res = f(bug, extra)
else:
    res = f(bug)
  • But in this case, we need to maintain list manually.
  1. Using inspect module signature method
  • using signature method we can get the number of parameters function takes.
from inspect import signature
if len(signature(f).parameters)>1:
    res = f(bug, extra)
else:
    res = f(bug)
  • This implementation looks good and we don't have to maintain anything when we add new features.
  • I can't decide which pattern to use? Please comment this so i can work on this.

@marco-c
Copy link
Collaborator Author

marco-c commented Jan 29, 2019

I prefer the more generic solution

@marco-c
Copy link
Collaborator Author

marco-c commented Feb 5, 2019

@poojan124 are you working on this?

@poojan124
Copy link
Contributor

No as of now I am not working on this. Was stuck on how to send extra data to required features.

@marco-c
Copy link
Collaborator Author

marco-c commented Feb 5, 2019

You have an example at

class reporter_experience(object):
, called like this
res = f(bug, reporter_experience=reporter_experience_map[bug['creator']])
.

@poojan124
Copy link
Contributor

poojan124 commented Feb 5, 2019

@marco-c thanks . now since we have added **kwargs to all features i can add is_developer feature.

@poojan124
Copy link
Contributor

poojan124 commented Feb 12, 2019

Sorry for the late reply. I need clarification on a couple of things.
First, we are going to identify bug author made any commits. To do this i need to get list of all authors from commits data.
In commits.json data author is stored in one of the formats of name <email> or name (email) or name email.
Some of the emails are in weird formats also

  1. 'Daniel Veditz dveditz@mozilla.com / Reed Loden reed@mozilla.com', 7275)
  2. mook.moz+hg%gmail.com
  3. Honza Bambas <honzab.moz>
  4. bobbyholley+bmo@gmail.com

Above list is commits['author'] values.

  1. We can split the author with r'[<>() ]' and if it contains '@' I add it to the author list. This is the list of authors which did not match the pattern. https://pastebin.com/raw/0Fn03z23 . For the rest of the commits there is valid user email. Total 6080 unique authors and 376 unique invalid authors.
    Is there a better way to do this?

poojan124 added a commit to poojan124/bugbug that referenced this issue Feb 13, 2019
poojan124 added a commit to poojan124/bugbug that referenced this issue Feb 13, 2019
'Is reporter a developer' as a feature mozilla#83
@poojan124
Copy link
Contributor

@marco-c Please review and comment whether this is a correct implementation of the feature or i need to change?

@marco-c
Copy link
Collaborator Author

marco-c commented Feb 13, 2019

Yes, for now we can use this solution. We can improve on it later by using information from https://hg.mozilla.org/mozilla-central/file/tip/.mailmap, and by trying to match the full name from the commit with the full name from Bugzilla.

Could you regenerate the list, putting the commit hash alongside the invalid emails? I want to take a look at a few of the commits with invalid authors.

@poojan124
Copy link
Contributor

Can you give me a reference from where can I get this commit hash?

@poojan124
Copy link
Contributor

I hope this is ok: https://pastebin.com/raw/mgu3vRzv. Since I don't know where to get those commit hash i have created this file from commits.json.

@marco-c
Copy link
Collaborator Author

marco-c commented Feb 14, 2019

Yeah that is fine too, from the bug ID I can retrieve the commit.

@poojan124
Copy link
Contributor

@marco-c so are we going with this implementation or need to change anything?

@marco-c
Copy link
Collaborator Author

marco-c commented Feb 15, 2019

Yes, it's OK as a first implementation.

@poojan124
Copy link
Contributor

@marco-c Ok then i will make PR.
And i will work on this you mentioned before.

  1. In theory we should only check commits before the bug opening time, but for simplicity we will consider all commits.

And will also look into better way to extract author list from commit data.

poojan124 added a commit to poojan124/bugbug that referenced this issue Feb 15, 2019
'Is reporter a developer' as a feature mozilla#83
@poojan124
Copy link
Contributor

poojan124 commented Feb 15, 2019

We can get author email or name directly using hg template with {author|email} and {author|user}.

# repository.py
>>> template = '{node}\\0{author}\\0{desc}\\0{date}\\0{bug}\\0{backedoutby}\\0{author|user}\\0{author|email}\\0'
>>> revs.append(Commit(
            node=rev[0],
            author=rev[1],
            desc=rev[2],
            date=dt,
            bug=rev[4],
            ever_backedout=(rev[5] != b''),
            author_name=rev[6],
            author_email=rev[7]
        ))

This way of extracting email is lot better than using regex. Which one do you prefer?

UPD: https://pastebin.com/ZJ4i8GR8 This is generated commit.json file using the above template.

@marco-c
Copy link
Collaborator Author

marco-c commented Feb 15, 2019

This would definitely be better, but I'd be curious to see what are author_name and author_email in those cases where the email was missing.

@poojan124
Copy link
Contributor

poojan124 commented Feb 16, 2019

  • Here are some examples and their respective template outputs.

$ hg log --user 'Cosmin Sabou :cosmin_s'

changeset: 403320:46a579031752
user: Cosmin Sabou :cosmin_s
date: Sun Feb 11 12:46:14 2018 -0700
summary: Bug 1421183 - Disabled browser/base/content/test/tabs/browser_reload_deleted_file.js on Linux/OSX for frequent leaked windows. r=gbrown

$ hg log --user 'Cosmin Sabou :cosmin_s' -T  '{author|user}\n{author|email}'

Cosmin
Cosmin Sabou :cosmin_s


$ hg log -r25424

changeset: 25424:a884555d99c5
user: Daniel Veditz dveditz@mozilla.com / Reed Loden reed@mozilla.com
date: Tue Feb 24 10:50:51 2009 -0600
summary: Bug 479336 - "IDN blacklist needs to include box-drawing characters" [r=dveditz]

 hg log -r25424 -T '{author|user}\n{author|email}'

dveditz
dveditz@mozilla.com


$ hg log --user lba_2
changeset:   449958:4a7a9e081b75
parent:      449925:231fdb84b8d7
user:        lenpel <lba_2@yahoo.com>
date:        Mon Nov 26 15:58:33 2018 +0100
summary:     Bug 1506073 - URL in Headers panel is wrapping now, plus 2 tests adapted for netmonitor. r=Honza

changeset:   447879:bbe0fe93abc1
user:        lenpel <lba_2@yahoo.com>
date:        Fri Nov 16 23:02:15 2018 +0100
summary:     Bug 1506073 - URL in Headers panel is wrapping now. r=Honza

changeset:   446025:82925e53eab9
user:        lenpel <lba_2@yahoo.com>
date:        Mon Nov 12 16:25:04 2018 +0100
summary:     Bug 1503554 - Network header panel now has 1 vertical scrollbar for entire area; r=Honza

changeset:   443738:44cacc066596
user:        lenpel <lba_2@yahoo.com>
date:        Thu Oct 25 00:41:41 2018 +0200
summary:     Bug 1500018 - Polish Network panel filter buttons UI; r=Honza

changeset:   443627:a9d64f4b6a3f
user:        lenpel <lba_2@yahoo.com>
date:        Thu Oct 25 00:41:41 2018 +0200
summary:     Bug 1500018 - Polish Network panel filter buttons UI; r=Honza

changeset:   442107:646f2d021c11
user:        <lba_2>
date:        Thu Oct 18 06:54:00 2018 +0300
summary:     Bug 1499042 - "Fix learn more links in the Network panel". r=Honza
  • here look at the last commit. hg log -r442107 -T '{author|user}\n{author|email}' will give following results.

lba_2>
lba_2

  • So from results it seems like some commits are missing email. In that case {author|email} will still work and return the name of the user.
  • One more thing {author|user} will just return user name part of the email and not the actual name of the user. Its just part before @ in an email. So just storing {author|email} is enough.
  • From the results I think we can safely use {author|email} to uniquely identify authors of commits.

@marco-c
Copy link
Collaborator Author

marco-c commented Feb 17, 2019

OK, thanks for the investigation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants