New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] [Meta] Design Reconsiderations #670

Closed
sigmavirus24 opened this Issue Jan 21, 2017 · 18 comments

Comments

Projects
None yet
6 participants
@sigmavirus24
Owner

sigmavirus24 commented Jan 21, 2017

So I'm beginning to have some doubts about the current direction of github3.py's design, specifically around:

  • Empty
  • NullObject

It seems far too many people want to deal with something that's not a "NullObject" or want to treat it like None. The way it's currently implemented, it actually does behave like None. You can still do

import github3

gh = github3.login(...)
real_organization = gh.organization('github3py')
fake_organization = gh.organization('non-existent-org')

if real_organization:
    print('github3py exists')

if fake_organization:
    print('non-existent-org exists')

And see what you expect

That said, people seem to prefer None to NullObject and that's fine, but, I'm wondering how appropriate None is. We do return it carefully from methods. In the example above, the request for non-existent-org will return a 404. That is the only time we return NullObject (or None). With that in mind, I think using None there is valid.

That brings us to using None for attribute values parsed from the API. Previously, we were using None for attributes returned as null as well as attributes that didn't exist in the response. Why did we do this? Because I believed (and still do believe) that having an attribute is better than having an AttributeError, especially when on one instance of an object you get a value and on another you might get an AttributeError.

This + a false idea of DRY led me to think that having something like Empty would be the best option for users. I'm now convinced that I was wrong.

Instead, I think we should approach this differently. For some resources there are two different (although intersecting) representations. For example, for users there is a short user listing and a long one. The short listing is returned when you list all users, e.g., gh.all_users(). The long one is returned when you request a specific one, e.g., gh.user('itsmemattchung').

Thus, I'm proposing a new way of dealing with this:

  1. Let's create separate classes (that don't inherit from each other) for resources that have these separate representations. For example (names are up for debate) ShortUser and User.

    The Short<X> classes should understand their relation to X classes so that the Short<X> classes can return an instance of X when a user calls refresh(). This is because currently, users can do the following to get full information on users:

    for user in gh.all_users():
        user = user.refresh()
        # ...

    This will just mean that we introduce something that inherits from the base class for the short classes and overrides the refresh method and looks for a class attribute that defines the longer class.

  2. Remove Empty and its usage.

  3. After we've done a few of these classes, determine the best way to reduce duplication. I suspect what we'll end up doing is something like this:

    class _User(models.GitHubObject):
        def _update_attributes(self, json):
            # ... Set up attributes for ShortUser
    
        # define all methods already on a User
    
    
    class User(_User):
        """ Docstring with examples """
    
        def _update_attributes(self, json):
            super(User, self)._update_attributes(json)
            # finish setting up attributes that are present for /users/:user_name
    
    
    class ShortUser(models.ShortRefresh, _User):
        """ Docstring with examples """
        long_form = User

Is this more convoluted than using Empty? From an implementation stand-point, yes. From a user stand-point it offers the explicit that was missing. It also eliminates the confusion that users seem to be encountering.

Work Items


Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

@sigmavirus24

This comment has been minimized.

Owner

sigmavirus24 commented Jan 28, 2017

I'm going to interpret no response as tacit approval. =P

sigmavirus24 added a commit that referenced this issue Jan 29, 2017

Excise github3.empty.Empty
After a couple weeks of users trying this out, it has revealed itself to
be far too confusing. Users do not enjoy it and it makes them scratch
their heads.

This still needs to also be excised from our testsuite.

Related-to gh-670

sigmavirus24 added a commit that referenced this issue Jan 29, 2017

Remove tests that affirm existence and use of Empty
Now that we've excised the usage of Empty and its presence in
github3.py, let's remove all references to it in our testsuite.

Related-to #670

sigmavirus24 added a commit that referenced this issue Jan 29, 2017

Remove Empty from our documentation
Now that we've removed Empty from our code and tests, remove it from our
documentation.

Related-to gh-670

sigmavirus24 added a commit that referenced this issue Jan 29, 2017

Excise NullObject from github3.py
As part of on going work to improve the overall design and simplicity,
we're removing the NullObject and it's usage. At the present time, it
causes a lot of confusion for users (not as much as Empty, but still up
there) and there are simpler ways of designing the library than this.

Related-to #670
@esacteksab

This comment has been minimized.

Collaborator

esacteksab commented Jan 30, 2017

Apologies for the lack of response.

  1. Life got a little chaotic.
  2. I haven't touched this repo in quite some time so felt my voice shouldn't matter.
  3. I haven't touched python in any real capacity in over two years, so I'm a dead fish.

Here's the only thing I want to add. Don't hide Github's API responses. They annoy me because if you search for a repo that does exist, but you're not logged in, they return a 404 rather than a 401/403. I think this is some security by obscurity and a script kidding will stop with a 404, but a 401/403 says try harder. /tangent /rant

Anyways, I want to know that Github returned that 404, so don't obscure/obfuscate their response.

My $.03 cents.

@sigmavirus24

This comment has been minimized.

Owner

sigmavirus24 commented Jan 31, 2017

Don't hide Github's API responses.

So let's approach this with you being a user. What would you want back to not hide/obscure it? An exception? A different object that's a little like a nullobject? You can still rely on it being falsey, but it has extra data and doesn't pretend to have every method imaginable? I'm not sure what the right thing is here that won't obscure important information and won't confuse users.

@esacteksab

This comment has been minimized.

Collaborator

esacteksab commented Jan 31, 2017

My stupid is gonna shine through, please allow some ignorant pseudo code

Scenario:
I'm not logged-in/auth'd. "my_repo" is a private repo.

repo = gh3.get_repo("my_repo")

I'd expect something like

repo.response() or repo.reponse_text()

And it pukes "404. LOLNOPE" << this being the response from GH's API.

Don't hide that response.

@guyzmo

This comment has been minimized.

guyzmo commented Jan 31, 2017

if I may add $0.02 to @esacteksab to make it $0.05:

it'd be nice to offer the option of raising an error if the call does not succeed and we've passed some sort of raise_errors attribute to github3.py. That way, you might offer a general use case where it returns None if there's no result, but get the full HTTP response (as a NullObject in Exceptions.args or something like that) if you really want to discriminate the error.

@sigmavirus24

This comment has been minimized.

Owner

sigmavirus24 commented Jan 31, 2017

@guyzmo so presently all exceptions contain the response (or if there is no response, the original exception).

Here's my only counter-point to @esacteksab's thoughts: The only time we return None from a method is a 404, and there's nothing in the 404 response that has any value for someone trying to debug the issue.

I'm also firmly of the opinion that libraries like this should choose one design and should not be configurable to change fundamental behaviour.

We can absolutely default to raising 404s as exceptions if people would find that useful, but I still fail to see the value in forcing users to litter their code with

try:
    gh.<something>(...)
except github3.exceptions.NotFound:
    # handle it, or not
@guyzmo

This comment has been minimized.

guyzmo commented Jan 31, 2017

Your library is by far the most comfortable one to use to control a git service (compared to gitlab's ones or bitbucket's ones), so whatever you decide, I'll be happy using it ☺ That said:

[…] I still fail to see the value in forcing users to litter their code […]

it depends on the use case, the real advantage with exceptions is to let them flow and catch them up in the stack:

def do_something_with_issue(…):
        gh = …
        repository = gh.repository(…)
        issue = repository.issue(…)
        use(issue)

def main():
    try:
        do_something_with_issues(…)
        …
    except github3.exceptions.NotFound:
        # nice error
    except github3.exceptions.NotAuthorized:
        # nice error

there you don't need to clutter your code with neither:

if not repository:
    # return or raise?

you could even safely write:

gh.repository(…).issue(…).owner

and get a proper error instead of an AttributeError.

@sigmavirus24

This comment has been minimized.

Owner

sigmavirus24 commented Jan 31, 2017

True. All that said, it's rather easy to flip this switch right now. I'm hacking on splitting up the User object right now. I think I'll tack this onto the list.

This will simplify our internals too. If doing self._json(response, 200) raises an exception, then we don't have to worry about it returning something that isn't json to pass into an object. the code then looks like

url = self._build_url(...)
params = {...}
response = self._request(VERB, url, params=params)
json = self._json(response, SUCCESS_STATUS_CODE)
return Object(json, self)
@guyzmo

This comment has been minimized.

guyzmo commented Jan 31, 2017

Generally speaking, I think the most pythonic strategy with errors is 'fail early, fail loudly' (explicit etc.).

I'd say, though, it'd be nice to have in the exception in information about the github object that was queried and triggered the exception. In my example earlier, it'd be great to print an error with print("Not found happened when pulling {} from github.".format(err.args[0]).

@sigmavirus24

This comment has been minimized.

Owner

sigmavirus24 commented Jan 31, 2017

I'd say, though, it'd be nice to have in the exception in information about the github object that was queried and triggered the exception.

As in the object/object type and method where the exception came from? Yeah no. That will never happen. A few reasons why:

  1. That will mean that objects will have longer lifetimes, including objects that hold onto sessions (which hold onto sockets, etc.) and which will cause no end of headaches for anyone using this library in a long-running application.

  2. To do that, we would have to lift the exception raising pieces up out of where they would live and add a ton more information. That's far too much work to do for every method that could raise a 404.

  3. Introspection about the current type is flimsy and may (read: probably) will break if further updates aren't carefully reviewed. That's far too much pressure for an entirely volunteer team.

@guyzmo

This comment has been minimized.

guyzmo commented Jan 31, 2017

well, of course to ①, ② and ③!

I was just thinking of something like a non-instanciated Repository class, or Issue class, so we know what we could have had but failed to get.

An alternative would be to have in the exception message the name of the thing we tried to get, but failed to. But I guess, we can also get that by looking at the Response object carried by the exception.

@itsmemattchung

This comment has been minimized.

Contributor

itsmemattchung commented Jan 31, 2017

Not much comment, only a side note: Sorry for lagging so hard—a lot of free time chewed up in the last couple weeks.

@sigmavirus24

This comment has been minimized.

Owner

sigmavirus24 commented Jan 31, 2017

I was just thinking of something like a non-instanciated Repository class, or Issue class, so we know what we could have had but failed to get.

Nah. I don't like that at all. I know I'll get bug reports in that case because people will have a repo object but no data.

sigmavirus24 added a commit that referenced this issue Feb 4, 2017

Split up Repository object
So far we have two Repository objects (ShortRepository, Repository). In
looking at API responses, it seems GitHub has removed many Repository
attributes altogether. I'll need to investigate this further.

Related-to: gh-670
@Julian

This comment has been minimized.

Julian commented Mar 9, 2017

I'm quite late to the party, and quite underinformed overall, just a new user :), so take the following with a grain of salt, but:

To me a few things are pretty clear:

As a user, I'd definitely prefer an exception over both a null object and an error object like None.

The null object pattern is fantastic, but I think it's the wrong thing here. This is pretty standard Python seemingly -- an error condition has happened, and as a user I want immediate notification that something has gone wrong. A data-less Repository object (or any other) just delays that inevitable fact, and None, or any other status response, just isn't rich enough.

As a specific example, the current app I'm porting over wants to know what scopes the token that is being used was configured with, with the knowledge that the data we need requires the repo scope. The GitHub API will respond back with an X-OAuth-Scopes header in any 404 response with that info, and my app would love to pull that out to say "you misconfigured your token" instead of simply reporting "there's no such repo". With a None, there's no such option.

Having the option to make github3.py abstract even more of that away from users, like doing the work to pull out the X-OAuth-Scopes header itself and possibly turn it into its own richer exception, seems tempting, but I suspect will involve quite a bit of work and maintenance to keep up with API changes.

As for splitting objects up: that also seems to make sense to me, but I'm skeptical about how much that seems to be being exposed to end-users of the library. I'd have to look closer, but my intuition tells me that the fact that GitHub returns partial objects is something that can be much more hidden from end users -- e.g. you have some _IncompleteResponse object which encapsulates the partial response and also a way to make it reify into a complete one, but for users, that reification is transparent, it happens just if a user tries to use things that were not present in the original partial response.

I think this probably shouldn't require individual implementations for each model object, it'd just require each model to be a bit more transparent about the data it needs to be whole, but I could be wrong.

On the other hand, I've already been bitten by github3.py making network requests at times I did not expect it to, so potentially users should want that implementation detail to be more obvious, but even so, I suspect it could be done in a way that did not require individual implementations for each model.

Would love to discuss any of this further, especially if I'm way off here.

@sigmavirus24

This comment has been minimized.

Owner

sigmavirus24 commented Mar 9, 2017

As a specific example, the current app I'm porting over wants to know what scopes the token that is being used was configured with, with the knowledge that the data we need requires the repo scope. The GitHub API will respond back with an X-OAuth-Scopes header in any 404 response with that info, and my app would love to pull that out to say "you misconfigured your token" instead of simply reporting "there's no such repo". With a None, there's no such option.

So, every exception includes a response attribute. That should allow you to introspect that once we start raising exceptions instead of returning None.

On the other hand, I've already been bitten by github3.py making network requests at times I did not expect it to

That's concerning. We've tried to make that completely transparent sans adding to every method that it does/does not make a network request.

Can you demonstrate the code (in a separate issue) that surprised you?

omgjlk added a commit to omgjlk/github3.py that referenced this issue Dec 16, 2017

Split up Organization objects
This creates 3 new objects, one for iterations (Short), one for direct
GETs (Organization), and one for events (EventOrganization). Most
attributes can now be directly assigned, aside from a few that are only
returned in the API if they are set on the server. Default those to
None.

Many test cassettes needed to be updated to pick up attributes added to
orgs since the cassette was recorded (yay for additive REST APIs...).

Introduce a 'auto_login' method to handle either token auth or
username/password auth. Use it in the places I needed to update the
cassette.

Fix a few POST calls that were sending malformed (according to GitHub
API) json data. Specifically an empty permission is not accepted.

Related-to: sigmavirus24#670

Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

omgjlk added a commit to omgjlk/github3.py that referenced this issue Dec 16, 2017

Split up Organization objects
This creates 3 new objects, one for iterations (Short), one for direct
GETs (Organization), and one for events (EventOrganization). Most
attributes can now be directly assigned, aside from a few that are only
returned in the API if they are set on the server. Default those to
None.

Many test cassettes needed to be updated to pick up attributes added to
orgs since the cassette was recorded (yay for additive REST APIs...).

Introduce a 'auto_login' method to handle either token auth or
username/password auth. Use it in the places I needed to update the
cassette.

Fix a few POST calls that were sending malformed (according to GitHub
API) json data. Specifically an empty permission is not accepted.

Related-to: sigmavirus24#670

Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

omgjlk added a commit to omgjlk/github3.py that referenced this issue Dec 17, 2017

Split up Organization objects
This creates 3 new objects, one for iterations (Short), one for direct
GETs (Organization), and one for events (EventOrganization). Most
attributes can now be directly assigned, aside from a few that are only
returned in the API if they are set on the server. Default those to
None.

Many test cassettes needed to be updated to pick up attributes added to
orgs since the cassette was recorded (yay for additive REST APIs...).

Introduce a 'auto_login' method to handle either token auth or
username/password auth. Use it in the places I needed to update the
cassette.

Fix a few POST calls that were sending malformed (according to GitHub
API) json data. Specifically an empty permission is not accepted.

Related-to: sigmavirus24#670

Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

sigmavirus24 added a commit to omgjlk/github3.py that referenced this issue Dec 18, 2017

Clean up docstrings in github3.orgs
Clean them up as done in sigmavirus24#680 for users.py

Related-to: sigmavirus24#670

Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

sigmavirus24 added a commit to omgjlk/github3.py that referenced this issue Dec 18, 2017

Split up Organization objects
This creates 3 new objects, one for iterations (Short), one for direct
GETs (Organization), and one for events (EventOrganization). Most
attributes can now be directly assigned, aside from a few that are only
returned in the API if they are set on the server. Default those to
None.

Many test cassettes needed to be updated to pick up attributes added to
orgs since the cassette was recorded (yay for additive REST APIs...).

Introduce a 'auto_login' method to handle either token auth or
username/password auth. Use it in the places I needed to update the
cassette.

Fix a few POST calls that were sending malformed (according to GitHub
API) json data. Specifically an empty permission is not accepted.

Related-to: sigmavirus24#670

Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

omgjlk added a commit to omgjlk/github3.py that referenced this issue Dec 19, 2017

Update docstrings for pulls.py
Related-to sigmavirus24#670

Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

omgjlk added a commit to omgjlk/github3.py that referenced this issue Dec 19, 2017

Split up PullRequest objects
This creates 3 classes; one for iterations (Short), one for direct GETs
(PullRequest), and one for events (EventPullRequest). Most attributes
are directly assigned, except where otherwise commented.

A couple cassettes needed to be updated, and the sample json for
pull_request needed to be updated as well. Updating that one required
updating the tests for the now current data.

Related-to sigmavirus24#670

Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

omgjlk added a commit to omgjlk/github3.py that referenced this issue Dec 19, 2017

Split up PullRequest objects
This creates 3 classes; one for iterations (Short), one for direct GETs
(PullRequest), and one for events (EventPullRequest). Most attributes
are directly assigned, except where otherwise commented.

A couple cassettes needed to be updated, and the sample json for
pull_request needed to be updated as well. Updating that one required
updating the tests for the now current data.

Related-to sigmavirus24#670

Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

omgjlk added a commit to omgjlk/github3.py that referenced this issue Dec 20, 2017

Split up Issue objects
This creates 3 classes; one for iterations (Short), one for direct GETs
(Issue), and one for events (EventIssue). Most attributes
are directly assigned, except where otherwise commented.

Some cassettes needed to be updated for the relatively new 'assignees'
attribute that now comes back. A sample json needed to be updated as
well. Cassettes that needed updated that were associated with
authenticated calls got the calls updated to use auto_login as well.

The search tests needed to be updated as well, label name changed and
syntax changed slightly.

Related-to sigmavirus24#670

Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

omgjlk added a commit to omgjlk/github3.py that referenced this issue Dec 21, 2017

Split up Issue objects
This creates 3 classes; one for iterations (Short), one for direct GETs
(Issue), and one for events (EventIssue). Most attributes
are directly assigned, except where otherwise commented.

Some cassettes needed to be updated for the relatively new 'assignees'
attribute that now comes back. A sample json needed to be updated as
well. Cassettes that needed updated that were associated with
authenticated calls got the calls updated to use auto_login as well.

Some cassettes will break between older requests and newer, so tag those
tests accordingly. Allow the tests to work in older requests land for
now.

The search tests needed to be updated as well, label name changed and
syntax changed slightly.

Related-to sigmavirus24#670

Signed-off-by: Jesse Keating <jkeating@j2solutions.net>

sigmavirus24 added a commit that referenced this issue Dec 26, 2017

Split-up Gist-related objects as necessary
The Gist and GistFile objects have both short and regular
representations. Further, we needed a separate representation for Gist
forks that are part of the Gist representation.

Refs gh-670

sigmavirus24 added a commit that referenced this issue Dec 26, 2017

Split-up Gist-related objects as necessary
The Gist and GistFile objects have both short and regular
representations. Further, we needed a separate representation for Gist
forks that are part of the Gist representation.

Refs gh-670

omgjlk added a commit to omgjlk/github3.py that referenced this issue Jan 4, 2018

Clean up Release classes
This started as an effort to break up Release classes for "Short"
versions. However I discovered that there is no difference in data when
iterating over a Release or an Asset vs a direct GET of them. I still
cleaned up the classes to be more in line with other efforts.

Remove custom header as Releases are out of prerelease.

Update the example data sets with real release data from our own
repository.

Account for upload url differences from real data.

Related: sigmavirus24#670

Signed-off-by: Jesse Keating <jkeating@j2solutions.net>
@omgjlk

This comment has been minimized.

Collaborator

omgjlk commented Jan 24, 2018

So @sigmavirus24 , where do we stand on this? I haven't gone digging to see if there is more to revamp, but is there an overall test or something we can use to see if our coverage is right? What else needs to be done to consider this ready, and to consider a release?

@sigmavirus24

This comment has been minimized.

Owner

sigmavirus24 commented Jan 27, 2018

Sorry @omgjlk there's a lot to parse in this thread. I added two more items that I think help summarize what I understand to be left. I'm kind of covering the last item in my WIP Pr.

@omgjlk

This comment has been minimized.

Collaborator

omgjlk commented Jan 27, 2018

Thanks. We should probably add the open PR (I'll be fixing it this weekend) to make session non-optional.

sigmavirus24 added a commit that referenced this issue Mar 13, 2018

Raise exceptions for unexpected status codes
Also raise exceptions on any 400 or greater status code as discussed in
gh-670

As it stands today, it's hard for people to determine whether None is
returned because that's the API's response or because None is what we
return in the event of a 404 or some other "error"-like status code.

This change undoes that ambiguity and allows people to handle missing or
permission-related problems using exceptions.

sigmavirus24 added a commit that referenced this issue Mar 13, 2018

Allow .refresh to return the longer object
This adds logic so that Short* classes can return the fuller
representation. For example:

    >>> import github3
    >>> users = [(u, u.refresh()) for u in github3.all_users(10)]
    >>> for short_user, user in users:
    ...     assert isinstance(short_user, github3.users.ShortUser)
    ...     assert isinstance(user, github3.users.User)

Will run without exceptions.

Related to gh-670

AndreasBackx added a commit to AndreasBackx/github3.py that referenced this issue Apr 8, 2018

Added Collaborator, Contributor and Repository.collaborators(affiliat…
…ion).

See sigmavirus24#730 for some background information. This still fails 1 test because the
cassette Repository_collaborators.json needs to be updated with the
permissions as it somehow does not contain that information. The GitHub API
does say that it should be there, but the recording is from 2014 so perhaps it
wasn't returned back then.

Note that I have currently put the named parameter `affiliation` at the front
of the arguments in `Repository.collaborators`. I did this to be in line with
the other similar iterators. This can be moved to the back, but I assume some
backwards-incompatible changes are fine for 1.0.0?

This uses separate classes for the specific User object uses as I understood
the idea was from sigmavirus24#670.

AndreasBackx added a commit to AndreasBackx/github3.py that referenced this issue Apr 8, 2018

Added Collaborator, Contributor and Repository.collaborators(affiliat…
…ion).

See sigmavirus24#730 for some background information.

Note that I have currently put the named parameter `affiliation` at the front
of the arguments in `Repository.collaborators`. I did this to be in line with
the other similar iterators. This can be moved to the back, but I assume some
backwards-incompatible changes are fine for 1.0.0?

This uses separate classes for the specific User object uses as I understood
the idea was from sigmavirus24#670.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment