New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Github backend to use Github API v4 #582

Merged
merged 8 commits into from Aug 28, 2018

Conversation

Projects
None yet
3 participants
@Zlopez
Copy link
Member

Zlopez commented Aug 21, 2018

Implementation of Github API v4 as requested in #567
This should also solve #581

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Aug 21, 2018

Codecov Report

Merging #582 into master will increase coverage by 0.35%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #582      +/-   ##
==========================================
+ Coverage   89.65%   90.01%   +0.35%     
==========================================
  Files          55       55              
  Lines        2591     2644      +53     
  Branches      332      341       +9     
==========================================
+ Hits         2323     2380      +57     
+ Misses        203      201       -2     
+ Partials       65       63       -2
Impacted Files Coverage Δ
anitya/config.py 100% <ø> (ø) ⬆️
anitya/lib/backends/github.py 100% <100%> (+19.04%) ⬆️
anitya/lib/exceptions.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 64e4bde...97bbef1. Read the comment docs.

@jeremycline
Copy link
Member

jeremycline left a comment

There's a few things that can be refactored here, but overall it's looking like a big improvement 🍰

@@ -4,7 +4,7 @@
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
config.vm.box = "fedora/27-cloud-base"
config.vm.box = "fedora/28-cloud-base"

This comment has been minimized.

@jeremycline

jeremycline Aug 22, 2018

Member

It's generally good practice to break this sort of stuff out into a separate commit. I don't see anyone wanting to revert this or anything, but it's a nice separation of concerns thing. Not a huge deal, don't feel like you need to fix it here.

This comment has been minimized.

@Zlopez

Zlopez Aug 23, 2018

Member

I will move it to separate commit.

req.add_header('User-Agent', user_agent)
req.add_header('From', from_email)
req.add_header('User-Agent', headers.user_agent)
req.add_header('From', headers.from_email)

This comment has been minimized.

@jeremycline

jeremycline Aug 22, 2018

Member

headers is a dictionary, right? I don't think this works.

This comment has been minimized.

@Zlopez

Zlopez Aug 23, 2018

Member

Yes, you are right. This means, that there was not test for this, I will add it.

if url:
(owner, repo) = url.split('/')

if (not owner) or (not repo):

This comment has been minimized.

@jeremycline

jeremycline Aug 22, 2018

Member

It's exactly equivalent, but if not (owner and repo) might read a little cleaner.

Also, I know this is just a refactor of what was there before, but it makes more sense to validate this before it goes in the database, rather than right here. It'd tell the user up-front they've put in incorrect data. Might be a nice refactor for a future PR.

This comment has been minimized.

@Zlopez

Zlopez Aug 23, 2018

Member

Yes, this should be probably validated on frontend in javascript, but you also need to validate it before saving it to db (in case somebody is trying something nasty). This could be different for every backend, so it will be good to have validation method as part of backend and the db method could call it before inserting the data.

What do you think?

This comment has been minimized.

@jeremycline

jeremycline Aug 23, 2018

Member

That sounds reasonable, yes. It can be called during the pre-commit event.

return versions

@classmethod
def prepare_query(cls, owner, repo, after=''):

This comment has been minimized.

@jeremycline

jeremycline Aug 22, 2018

Member

There's no reason for this to be a class method. There's also no reason for the class inheritance structure in the backends generally, so while this matches the general style, it's not a good style. You can either make it a private instance method or just rip this out into a private module-level function.

This comment has been minimized.

@Zlopez

Zlopez Aug 23, 2018

Member

I will change it to private method. It doesn't need to be public.

def prepare_query(cls, owner, repo, after=''):
''' Method for preparing GraphQL query for specified repository
:arg owner: owner of the repository

This comment has been minimized.

@jeremycline

jeremycline Aug 22, 2018

Member

I find the "Google" style of comments more pleasant to write, you might as well.

This comment has been minimized.

@Zlopez

Zlopez Aug 23, 2018

Member

I used the style, that was there before, but I will change it.

This comment has been minimized.

@jeremycline

jeremycline Aug 23, 2018

Member

Yeah, it was the old style and I was slowly converting it all as I touched things

return query

@classmethod
def call_url_post(cls, url, json, insecure=False, use_token=False):

This comment has been minimized.

@jeremycline

jeremycline Aug 22, 2018

Member

I think you can just factor this into the get_versions call. I think it's reasonable for the token to be required, and we should never use insecure=True with GitHub so it boils down to a single call to another function.

The get_headers interface (while matching the existing style) is more complicated than just setting a default set of headers somewhere, then making a copy here and injecting the access token. I recommend doing that.

This comment has been minimized.

@Zlopez

Zlopez Aug 23, 2018

Member

I didn't understand the second paragraph in your comment. You want me to copy the get_headers method to this backend.
The get_headers method is new method in BaseBackend and it only fills basic header fields.

This comment has been minimized.

@jeremycline

jeremycline Aug 23, 2018

Member

Ah, what I'm saying is I don't like the use of inheritance in the backends. It's complicated and adds no value, but I never got a chance to refactor it. Rather than add to it, this could be implemented as:

In constants or BaseBackend's module or something:

REQUEST_HEADERS = <default_headers>

Then for your Github backend you just make a copy of REQUEST_HEADERS and add your auth token:

github_headers = REQUEST_HEADERS.copy()
github_headers['Authentication'] = ...

There's no need for a get_headers method that subclasses override. The whole call_url function has no business being a class method at all. If should just be a module-level function that acts as a wrapper that handles the special FTP case and otherwise is the requests API.

This comment has been minimized.

@Zlopez

Zlopez Aug 23, 2018

Member

I thought that the inheritance is intentional, otherwise BaseBackend doesn't have much sense. I thought it is used as some kind of interface, that should be implemented by every backend.

This comment has been minimized.

@jeremycline

jeremycline Aug 23, 2018

Member

Inheritance is fine to define the interface, but the only interface we're really interested in is for a given project, new versions are returned. Everything else is an implementation detail of the particular backend and doesn't need to be part of the class.

'%s: Server responded with status "%s": "%s"' % (
project.name, resp.status_code, resp.reason))

if 'errors' in json:

This comment has been minimized.

@jeremycline

jeremycline Aug 22, 2018

Member

Consider making a function that accepts the json and spits out the versions (or raises an exception). That'll make testing easier, and make this function a digestible length.

This comment has been minimized.

@Zlopez

Zlopez Aug 23, 2018

Member

Ok, I will do this. It sounds good.

# cursor = json['data']['repository']['refs']['edges']['cursor']

remaining = json['data']['rateLimit']['remaining']
reset_time = json['data']['rateLimit']['resetAt']

This comment has been minimized.

@jeremycline

jeremycline Aug 22, 2018

Member

We should handle the rate limiting gracefully. If we're getting rate limited, it should raise an exception (RateLimitError or similar) that contains the time the rate limit is over so that the caller can handle it by rescheduling for later (or exploding)

This comment has been minimized.

@Zlopez

Zlopez Aug 23, 2018

Member

This looks like elegant solution to handle the rate limit.

@Zlopez Zlopez force-pushed the Zlopez:github_API branch 3 times, most recently from 2bad5d5 to d8cb24b Aug 24, 2018

from anitya.lib.backends import BaseBackend, http_session, REQUEST_HEADERS
from anitya.lib.exceptions import AnityaPluginException, RateLimitException
from anitya.config import config
import logging

This comment has been minimized.

@jeremycline

jeremycline Aug 24, 2018

Member

Historically we've followed PEP 8, which includes rules for import ordering: the first group is stdlib imports, the second group is third-party libs, and the final group is imports from the current package. Each group is separated by a blank line.

@jeremycline
Copy link
Member

jeremycline left a comment

A few small comments, but the only thing I think really needs to change is the decoding of FTP responses.


try:
headers = REQUEST_HEADERS.copy()
token = config.get('GITHUB_ACCESS_TOKEN')

This comment has been minimized.

@jeremycline

jeremycline Aug 24, 2018

Member

Since you set a default for the token, you don't need to use get.

This comment has been minimized.

@Zlopez

Zlopez Aug 27, 2018

Member

You are right

def __init__(self, reset_time):
self._reset_time = reset_time

@property

This comment has been minimized.

@jeremycline

jeremycline Aug 24, 2018

Member

I recommend just calling self._reset_time self.reset_time and not making a property. Properties are handy if you need to do some sort of transformation or you want your internal representation to differ from the public API in some way, but here it's the same.


import unittest

This comment has been minimized.

@jeremycline

jeremycline Aug 24, 2018

Member

Following the PEP 8 rules, unittest should stay up in the top group and six should move to this group

This comment has been minimized.

@Zlopez

Zlopez Aug 27, 2018

Member

Didn't know this

date: ['Thu, 23 Aug 2018 09:39:54 GMT']
keep-alive: ['timeout=5, max=100']
server: [Apache/2.2.22 (Fedora)]
x-clacks-overhead: [GNU Terry Pratchett]

This comment has been minimized.

@jeremycline

This comment has been minimized.

@Zlopez

Zlopez Aug 27, 2018

Member

I was surprised by this value when I fetched the URL.

This comment has been minimized.

@jeremycline

jeremycline Aug 27, 2018

Member

In case you're not familiar with Terry Pratchett's Discworld books, one of his books includes a lot of amusing references to the Free Software movement and this HTTP header is in reference to one of his books.

This comment has been minimized.

@Zlopez

Zlopez Aug 27, 2018

Member

I know his work. I read almost all his books from Discworld, but I was surprised to see the header field, I never saw it before.


_log.debug('Received %s tags for %s' % (total_count, project.name))

# TODO: Save cursor to project and use it to fetch only newer versions

This comment has been minimized.

@jeremycline

jeremycline Aug 24, 2018

Member

I recommend turning these TODOs into tickets and removing them from the code

This comment has been minimized.

@Zlopez

Zlopez Aug 27, 2018

Member

Will do

try:
dir_listing = resp.text
except AttributeError:
dir_listing = resp.decode('utf-8')

This comment has been minimized.

@jeremycline

jeremycline Aug 24, 2018

Member

I just skimmed https://tools.ietf.org/html/rfc2640, but it sounds like there's no guarantees that the response will be UTF-8 encoded, and even if the spec said it MUST be, I'd be willing to bet there are some non-conforming servers out there. If it should only be ASCII/UTF-8 then this is fine, but this should handle an Unicode decoding exception and raise the appropriate Anitya exception explaining that the FTP service is misbehaving.

This comment has been minimized.

@Zlopez

Zlopez Aug 27, 2018

Member

I will add another try-except when trying to decode the binary string and if it's not UTF-8, I will just raise AnityaPluginException.
Maybe there is some way, how to get the encodings of the binary string and then use the appropriate decode method. This will be nice way how to do it.
I will check this.
It will be much easier, if Requests library could work with FTP.

This comment has been minimized.

@jeremycline

jeremycline Aug 27, 2018

Member

There is a requests-ftp extension, although I'm not sure if it works with current requests. It'd be nice to drop the FTP support, but I'm not sure if there's still a number of projects only available over FTP. I think just bailing is a reasonable approach. The specification indicates it should only ever be ASCII or UTF-8, but because FTP is so old I have no doubt there are servers out there not following the latest RFC about that.

This comment has been minimized.

@Zlopez

Zlopez Aug 27, 2018

Member

I looked at it, but it's only a template, not real library. It's only some basic functionality for future work.

url, e.reason))
except UnicodeDecodeError:
raise AnityaPluginException(
'FTP response is not unicode: %s' % url)

This comment has been minimized.

@jeremycline

jeremycline Aug 27, 2018

Member

A slightly more accurate error message would be 'FTP response cannot be decoded with UTF-8' since Unicode defines the code points and UTF-8 is a particular way to map Unicode code points to a binary representation, but this is also fine

This comment has been minimized.

@Zlopez

Zlopez Aug 27, 2018

Member

Sounds better than current error message.

@@ -115,7 +115,6 @@ class RateLimitException(AnityaException):
def __init__(self, reset_time):
self._reset_time = reset_time

@property
def reset_time(self):

This comment has been minimized.

@jeremycline

jeremycline Aug 27, 2018

Member

Ah, what I meant was just delete this method and change the name from self._reset_time to self.reset_time

This comment has been minimized.

@Zlopez

Zlopez Aug 27, 2018

Member

I only want the time to be retrievable by the code which catches it.

Maybe bad habit from other object oriented languages.
I'm used to have getter and setter (when needed) for every attribute the object has.

But in this case it could be probably better to have direct access to reset_time.

This comment has been minimized.

@jeremycline

jeremycline Aug 27, 2018

Member

I wouldn't really call it a bad habit (this is the only way to do things in Java), but it's not "Pythonic". Python uses _ it indicate a variable, method, function, etc is private and might change at any time, but doesn't enforce it. If the interface you'd like to present to the public differs from the one you'd like to manage internally, it's a perfect place for the setter/getter pattern with properties. Otherwise, just exposing the variable itself is fine.

@Zlopez Zlopez force-pushed the Zlopez:github_API branch from 04cef28 to ced8663 Aug 28, 2018

@Zlopez Zlopez force-pushed the Zlopez:github_API branch from ced8663 to 97bbef1 Aug 28, 2018

@Zlopez

This comment has been minimized.

Copy link
Member

Zlopez commented Aug 28, 2018

Update with latest master git rebase master

@jeremycline
Copy link
Member

jeremycline left a comment

🍰

:shipit:

@mergify mergify bot merged commit bcb24f7 into release-monitoring:master Aug 28, 2018

4 checks passed

codecov/patch 100% of diff hit (target 89.65%)
Details
codecov/project 90.01% (+0.35%) compared to 64e4bde
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
mergify/pr Merged

@Zlopez Zlopez deleted the Zlopez:github_API branch Aug 28, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment