Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide referenced issue diffing tool. #640

Merged
merged 1 commit into from Jan 30, 2017
Merged

Conversation

@jberry-suse
Copy link
Contributor

jberry-suse commented Jan 17, 2017

Compare packages from a project against factory for differences in referenced issues and present changes to allow whitelisting before creating bugzilla entries.

First draft of tool as discussed with @coolo. Note that at the moment I am unable to confirm the bugzilla portion works due to 401 Client Error: Authorization Required which is also referenced in a mailing list thread and never resolved. It seems the api server expects HTTP Authr and does not work properly with the api tool. Perhaps someone knows something?

My understanding of the requirements:

  • manually run script
  • ability to whitelist issue references
  • ignore issues newer than X
  • create bugzilla entries for issues missing from factory

The current design creates a $PROJECT.yml file containing the list of references and an indication of being whitelisted or a bugzilla entry created. The discussed idea was to sync the file to a git repo. This could be automated with git-sync in order to run anytime before and after the script is run.

Example from SUSE:SLE-12-SP2:GA.yml:

'appstream-glib':
  'bsc#973385': '1234'
'booth':
  'bsc#952426': '1234'
  'bsc#968865': 'whitelist'
  'bsc#994999': '1234'

I considered a few structures and settled on this for providing an easy to scan/lookup summary while the history of when entries are added can be kept by git.

When the tool is run it will present a similar format, bugzilla ID/whitelist is replaced by issue URL for easy review, in EDITOR where the user can comment out or delete any issue references to be whitelisted (ie ignored in this context). From that input the 'whitelist' or '$BUGZILLA_ID' are inserted.

Given this script is to be run manually and the number of different references may become quite large I added a --limit option to limit the number of packages with new different entries (ie differences not seen before) so that it can be run repeatedly in batches.

With the cache enabled the script should be able to take advantage of the check for updated packages to avoid requesting packages that have not changed since the last run. Granted the max ttl comes in to effect and may be worth considering increasing it.

Also note the openSUSE.org link project cannot be used as the request for /source/project/package?view=issues seems to crap.

@jberry-suse jberry-suse force-pushed the jberry-suse:issue-diff branch from 058d9a3 to 8219dbf Jan 17, 2017
@coveralls

This comment has been minimized.

Copy link

coveralls commented Jan 17, 2017

Coverage Status

Coverage remained the same at 46.307% when pulling 058d9a3 on jberry-suse:issue-diff into b4ab905 on openSUSE:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jan 17, 2017

Coverage Status

Coverage remained the same at 46.307% when pulling 8219dbf on jberry-suse:issue-diff into b4ab905 on openSUSE:master.

@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Jan 17, 2017

Here is a full run (obviously the 1234 being a placeholder) that provides the baseline. https://gist.github.com/jberry-suse/84de819aa2d13404d673cb9d0aeff764

@coolo

This comment has been minimized.

Copy link
Member

coolo commented Jan 19, 2017

  File "./issue-diff.py", line 95, in main
    if db is None:
UnboundLocalError: local variable 'db' referenced before assignment
@lnussel

This comment has been minimized.

Copy link
Member

lnussel commented Jan 19, 2017

wrt bugzilla try filing a bug against the bugzilla component. maybe special permissions are needed

@lnussel

This comment has been minimized.

Copy link
Member

lnussel commented Jan 19, 2017

note there's also apibugzillastage for testing

@coolo

This comment has been minimized.

Copy link
Member

coolo commented Jan 19, 2017

there are several issues I found while trying the tool for real.

OBS and IBS report bugzilla references with a different label:

@@ -41,6 +43,8 @@ def issues_get(apiurl, project, package, db, newest, oldest):
     oldest = timedelta(days=oldest)
     for issue in root.findall('issue'):
         label = issue.find('label').text
+        if issue.find('tracker').text == 'bnc':
+            label = re.sub(r'^boo#', 'bsc#', label)
         if issue_found(package, label, db):
             continue

Some issues have dates on one instance and not on the other - avoid creating diff for those, OBS team has to sort this out:

-            # Pre API dating should be safe to ignore.
-            continue
+            # just invent something
+            date = "2007-12-12 00:00 GMT+1"

Some issues are within your range on one instance and not on the other - avoid creating a diff for those

-        if delta >= newest and delta <= oldest:
-            issues[label] = issue.find('url').text
+        issues[label] = { 'url': issue.find('url').text,
+                          'age': delta.days }

@@ -106,7 +108,10 @@ def main(args):
         issues_factory = issues_get(apiurl_default, args.factory, package, db, args.newest, args.oldest)
 
         missing_from_factory = set(issues_project.keys()) - set(issues_factory.keys())
-
+        for label in set(missing_from_factory):
+            if issues_project[label]['age'] < args.newest:
+                missing_from_factory.remove(label)

Your package_list function does not work for service packs - and is using the more expensive search route

@@ -65,11 +69,11 @@ def issue_found(package, label, db):
     return not(package not in db or db[package] is None or label not in db[package])
 
 def package_list(apiurl, project):
-    url = osc.core.makeurl(apiurl, ['search', 'package'], "match=[@project='%s']" % project)
+    url = osc.core.makeurl(apiurl, ['source', project], { 'expand': 1 })
     root = ET.parse(osc.core.http_GET(url)).getroot()
 
     packages = []
-    for package in root.findall('package'):
+    for package in root.findall('entry'):
         packages.append(package.get('name'))
 
     return sorted(packages)
@coolo

This comment has been minimized.

Copy link
Member

coolo commented Jan 19, 2017

And after reviewing the examples I think one bug per package is good enough. So what is required additionally is a nice text summary about the reason the bug is opened - and as we know the summary from obs we can include it in the summary.

@coolo

This comment has been minimized.

Copy link
Member

coolo commented Jan 19, 2017

So the workflow would be: per package open an editor with the bug text supposed to be opened, I review it and can edit it - and afterwards I'm asked by the tool how to proceed. Bug report / ignore package / Cancel loop.

If I chose bug report the bug is reported as I saved it, but the issues presented in the original report are all marked in the DB. So if I've seen them once, I will not see them again.

@coolo

This comment has been minimized.

Copy link
Member

coolo commented Jan 19, 2017

adrian is fixing the timestamps of the issues - so we won't need the workaround. It should be noted though that the date of issue is the date when the bug was created. So we need to change the logic a bit - because the age of the bug is not the condition we're interested in

@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Jan 19, 2017

@lnussel Bugzilla will not even authenticate much less get to the stage of filing a bug.

@coolo:

OBS and IBS report bugzilla references with a different label:

Yeah, I remember that now. The original web interface I wrote normalized them, but that would mean loading the list of issue trackers and normalizing locally which seems a bit odd when using the API since the local code has to run the regexes. Unless we only care about those two, but I know for example there were different casings of FATE and other variations. Perhaps adrian can add the normalized version to the API response as it seems a bit lacking without it.

Specifically, the /issue_trackers list already includes label which includes @@@ to be replaced by ID. I assumed label in the issue list was the same thing, but it appears it is just a copy of what was in .changes file. Somewhat misleading, but perhaps just adding a normalized entry where the regex result is fed back into the label pattern.

Some issues have dates on one instance and not on the other - avoid creating diff for those, OBS team has to sort this out:

That's rather awkward. The code I wrote before just parsed the changelog date entry. Thankfully, I see it is being resolved by your comment below.

Some issues are within your range on one instance and not on the other - avoid creating a diff for those

Alright.

Your package_list function does not work for service packs - and is using the more expensive search route

That suggestion should be applied elsewhere in the existing code then as well. Always struck me as weird that a search was performed for such structural information.

And after reviewing the examples I think one bug per package is good enough. So what is required additionally is a nice text summary about the reason the bug is opened - and as we know the summary from obs we can include it in the summary.

I looked to include that, but apparently missed that it is present for suse bugzilla entries.

So the workflow would be: per package open an editor with the bug text supposed to be opened, I review it and can edit it - and afterwards I'm asked by the tool how to proceed. Bug report / ignore package / Cancel loop.

If I chose bug report the bug is reported as I saved it, but the issues presented in the original report are all marked in the DB. So if I've seen them once, I will not see them again.

Alright. Assuming the bug summary is available what sort of summary would you like? For entries that do not have a summary available just drop the : ... portion?

The following issues were referenced in the changelog for $project/$package, but where not found in $factory/package after $min days. Review the issues and submit changes to $factory as necessary.

- boo#1234 owned by @$owner: $issues_summaries['boo#1234']
...

Not sure if bugzilla notifies based on @ or similar syntax to ensure original bug owners are notified specifically.

For the interface, perhaps providing control functions to stop loop and such at the bottom of each file would be better than saving and having to respond to a prompt to continue?

Do you want to be able to edit all the text or simply be presented with the list of issues (and the extra detail like summary for reference) to which you can add remove and have them placed into the template? An optional text field being available for additional notes. If you want the whole text it will make parsing it a bit more work than yaml which is necessary to figure out which bug the user ignored.

adrian is fixing the timestamps of the issues - so we won't need the workaround. It should be noted though that the date of issue is the date when the bug was created. So we need to change the logic a bit - because the age of the bug is not the condition we're interested in

Is the date in changelog or submission date accessible? What sort of change did you have in mind? Short of either the created_at or updated_at fields in api being applicable I imagine parsing the .changes files like I did before will be necessary at which point the only additional piece the API provides is the summary for certain entries.

What is the mapping from package to bugzilla component for filing the bugs?

@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Jan 19, 2017

I checked the /issue_trackers response from OBS and IBS which differ for the bnc tracker.

osc api /issue_trackers
  <issue-tracker>
    <name>bnc</name>
    <kind>bugzilla</kind>
    <description>openSUSE Bugzilla</description>
    <url>https://apibugzilla.novell.com/</url>
    <show-url>https://bugzilla.opensuse.org/show_bug.cgi?id=@@@</show-url>
    <regex>(?:bnc|BNC|bsc|BSC|boo|BOO)\s*[#:]\s*(\d+)</regex>
    <label>boo#@@@</label>
    <enable-fetch>true</enable-fetch>
  </issue-tracker>

vs

osc -A ibs api /issue_trackers
  <issue-tracker>
    <name>bnc</name>
    <kind>bugzilla</kind>
    <description>SUSE Bugzilla</description>
    <url>https://apibugzilla.novell.com/</url>
    <show-url>https://bugzilla.suse.com/show_bug.cgi?id=@@@</show-url>
    <regex>(?:bnc|BNC|bsc|BSC|boo|BOO)\s*[#:]\s*(\d+)</regex>
    <label>bsc#@@@</label>
    <enable-fetch>true</enable-fetch>
  </issue-tracker>

Which is perhaps the issue so I'll just end up having to normalize in the script by applying the name to the label from /issue_trackers for the given tracker.

@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Jan 20, 2017

Short of a solution regarding what dates to use everything else has been incorporated. Quite a bit reworking the workflow to be package based and in a way that seems clean to use. Anytime differences are detected the editor is presented and a continue prompt allows for skipping the package entirely, creating the bugs and then stopping or standard yes and no to full stop and continue after saving. Anytime the differences are handled they are immediately written to the db file to ensure they are not lost after a crash or interrupt of any kind.

I implemented the git-sync portion which will automatically pull down (even initial clone) the repository, used for storing the database of what has been processed, and sync it after changes are made. Locking is not done, but that should be overkill for this. The sync should make it easy for others to run the script by automating setup and ensuring others that use the script always commit their changes. If the script crashes running it again will sync the previous changes on startup.

My test repository is jberry-suse/osc-plugin-factory-issue-db which should provide an updated view of what the tool sees after a full run when compared to the previous gist. I would expect the repo to be given a proper home (and created from scratch). Perhaps openSUSE/osc-plugin-factory-issue-db. If someone with access would create an empty repo and give me write access I can initialize it and update this PR to point at it.

Also note the actual creation of the bugs needs to be worked out. Can someone point me to any existing tools that create bugs (if any) so I can see what they do?

@jberry-suse jberry-suse force-pushed the jberry-suse:issue-diff branch from 8219dbf to 59cd0c2 Jan 20, 2017
@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Jan 20, 2017

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jan 20, 2017

Coverage Status

Coverage remained the same at 46.307% when pulling 59cd0c2 on jberry-suse:issue-diff into 4f2afcf on openSUSE:master.

@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Jan 20, 2017

I also ran into a bug summary with a unicode character. So that works too. :-)

translation-update-upstream:

{
  'url': 'https://bugzilla.suse.com/show_bug.cgi?id=877707',
  'owner': 'sbrabec',
  'age': 856,
  'summary': u'SLED12_LOC : ALL_LANGS\uff1a Untranslated in Network section on right of Gnome toolbar'
}
@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Jan 20, 2017

Another thought, should the tool remove entries from db after they have been merged or keep them there as a history? If kept in git they'll be in the git history obviously, but that isn't as direct as being in the list. If kept forever the file will just continue to grow is size which does not seem ideal, but probably won't be an issue either way. More about preference. Auto-cleaning the file would make it act as a white-list and active missing lookup which seems nice. Thoughts?

@jberry-suse jberry-suse force-pushed the jberry-suse:issue-diff branch from 59cd0c2 to c88650b Jan 20, 2017
@coveralls

This comment has been minimized.

Copy link

coveralls commented Jan 20, 2017

Coverage Status

Coverage remained the same at 46.307% when pulling c88650b on jberry-suse:issue-diff into 4f2afcf on openSUSE:master.

@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Jan 20, 2017

One note when looking at the output the oldest age was dropped off for simplification and did not seem like it was desired, which means there were additional entries not considered before and the cases where a ref fell on either side of the barrier are removed. It may make sense to add the --oldest option back to make the it a moving window especially in combination with the cleanup. It really depends on what the usage and workflow surrounding the tool will be.

@coolo

This comment has been minimized.

Copy link
Member

coolo commented Jan 20, 2017

I wonder why you didn't find the issues in kernel-source this time. They were massive - but then again, I checked SP3

@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Jan 20, 2017

@coolo The script had not yet finished at the time I commented. They appear to be there now.

@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Jan 20, 2017

I figured out how to get around the www-authenticate wall so the bugzilla api package works properly. The remaining question then is how to determine the component, product, etc in which to create the bug. One thought could be to update the existing bug that are being reported on, but that depends on how things are viewed. Obviously, the original issue is fixed, but has not been pushed everywhere. If pushing is considered part of the fix it somewhat makes sense, but the original bug may also have more people involved than are useful in this context. Filing a new bug also allows for batching a set of newly detecting issues together.

If creating a new bug, which is what my understanding is desired, then it seems to make sense to pull the meta information from the bugs for which the new bug is being created unless there is a better mapping from package to bugzilla meta information. Short of some more complex or appropriate method copying the meta information from the first bug in the batch seems like a sensible way forward. If desired looking for the most common set of meta information from the set of bugs could also be done, but is likely overkill.

@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Jan 20, 2017

Issues outside of suse bugzilla will be referenced which will require a new bug either way. If no bsc bugs related to a package exist the script will not have anyway to know where to file the bug. A fallback could be used, but preferably if the markdown syntax works or similar the relevant owners will be notified and can move the bug as needed.

@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Jan 20, 2017

I setup a local bugzilla server for testing and successfully created bugs and what not. I have not attempted to create a bug on b.s.c, but the code that loads bugs to copy meta information successfully authenticates and queries from b.s.c.

Note that due to the way the b.s.c. server is setup authentication must be present in the --bugzilla-apiurl and cannot be handled interactively as the bugzilla package supports. As such the standard form --bugzilla-apiurl https://jberry:somepass@apibugzilla.suse.com is used. I added a number of bugzilla related options for setting default meta information (if not b.s.c. bugs are present to copy meta data - could also fallback to any other bugs for that package and not just the ones in the diff before using defaults). I also added a --bugzilla-cc so the user of the tool can cc themselves on all created bugs if desired. All the bug owners are automatically added if available.

The innerdiff https://gist.github.com/jberry-suse/ebef106a308ec22d082519d42dc3277e.

@jberry-suse jberry-suse force-pushed the jberry-suse:issue-diff branch from c88650b to 0517774 Jan 20, 2017
@coveralls

This comment has been minimized.

Copy link

coveralls commented Jan 20, 2017

Coverage Status

Coverage remained the same at 46.307% when pulling 0517774 on jberry-suse:issue-diff into 4f2afcf on openSUSE:master.

Compare packages from a project against factory for differences in
referenced issues and present changes to allow whitelisting before
creating bugzilla entries. A database is kept of previously handled
issues to avoid repeats and kept in sync via a git repository.
@jberry-suse jberry-suse force-pushed the jberry-suse:issue-diff branch from 0517774 to 92f10b2 Jan 21, 2017
@coveralls

This comment has been minimized.

Copy link

coveralls commented Jan 21, 2017

Coverage Status

Coverage remained the same at 46.307% when pulling 92f10b2 on jberry-suse:issue-diff into 4f2afcf on openSUSE:master.

@lnussel lnussel merged commit 0b7a871 into openSUSE:master Jan 30, 2017
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jberry-suse jberry-suse deleted the jberry-suse:issue-diff branch Jan 31, 2017
@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Feb 3, 2017

@lnussel: did we want to find a better home for the db repository?
@coolo: did you have a chance to try it out after changes?

@coolo

This comment has been minimized.

Copy link
Member

coolo commented Feb 6, 2017

no, on my todo

@coolo

This comment has been minimized.

Copy link
Member

coolo commented Feb 6, 2017

I assume this was expected :)

ERROR: Permission to jberry-suse/osc-plugin-factory-issue-db.git denied to coolo.

@coolo

This comment has been minimized.

Copy link
Member

coolo commented Feb 6, 2017

but you implemented some component guessing - that's fine, but only if the component exists in sle13 too. More important is the assignee - that I don't see set. This should come from the maintainer information in IBS.

And the bugzilla text is suboptimal - i.e. little undermotivated for the receiever :)

@jberry-suse

This comment has been minimized.

Copy link
Contributor Author

jberry-suse commented Feb 7, 2017

Yeah, I had a note, but it was missed, I assume, about changing that to a proper repo before merging. Are we still good with that concept then? or should I just clear out my repo and grant @coolo write access?

@coolo

This comment has been minimized.

Copy link
Member

coolo commented Feb 7, 2017

I don't really want that as openSUSE repo, so put it in your namespace and grant access.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.