Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up. #17

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Clean up. #17

wants to merge 1 commit into from

Conversation

Danilka
Copy link
Contributor

@Danilka Danilka commented Feb 22, 2018

  • Added 500 automatic retry, since Ensembl returns it quite often.
  • Cleaned up tests and updated comparison logic to be more universal. (Had to comment one out, since it is constantly returning 500, even on the official site.)
  • More PEP8 compliance.
  • Bumped the version.
  • Updated examples to be compatible with Python3

More PEP8 compliance.
Bumped the version.
@bunop
Copy link
Contributor

bunop commented Feb 23, 2018

Thank you @Danilka for your contribute. I write this consideration since I've contribuited too to this project before. In my idea, I want the user deal explicitely with exceptions when a 500 status is found. This because EnsEMBL enpoints are very fragile and when things go wrong with a higher rate, a user need to contact the EnsEMBL team and report the problem. When working with EnsEMBL API, I understand that this API endpoint acts as proxy for different REST instances. Sometime one of them need to be restarted so any request that reach such endpoint could fail for some reason. Consider the two test failing now in CI: the json response is not deterministic and the order of data returned changes. This is the reason why we don't call the assertEqual method of unittest but a custom method defined in pyEnsemblRest test module. Here, we do a reference request with curl and then a test request with the library, and test fails since the two response differ! I don't know how to handle this, is difficult to determine if the result is completed or not; I think that interation with the user will suggest to ensembl team to support more those features.
Moreover, could this library be sponsored by the EnsEMBL team? we deal with the rate limiting conditions and with some errors, I think it could be useful to enlarge the community to support this project instead of writing own custom implementation.
Regards.

Copy link
Contributor

@bunop bunop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions. Nice work! thank you!

elif resp.status_code == 500:
# Retrying when we get a 500 error.
# Due to Ensembl's condition on randomly returning 500s on valid requests.
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you test this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll think of a scenario. However, I'm not sure of where to get a 500 predictably.

message = json_message["error"]
except ValueError:
# In this case we didn't even get a JSON back.
message = "Server returned invalid JSON."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. However, is this called? could you check this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try to cover this in the next version of testing.



curl_cmd = """curl 'http://rest.ensembl.org/lookup/id' -H 'Content-type:application/json' """\
"""-H 'Accept:application/json' -X POST -d '{ "ids" : ["ENSG00000157764", "ENSG00000248378" ] }'"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be expressed in a more pep8 way as:

curl_cmd = (
        """curl 'http://rest.ensembl.org/lookup/id' -H 'Content-type:"""
        """application/json' -H 'Accept:application/json' -X POST -d '"""
        """{ "ids" : ["ENSG00000157764", "ENSG00000248378" ] }'"""
        )

You will have less than 80 column and I think it will be more readable (I learn this after writing those test)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with you on this one. I just quickly went through all of them. Feel free to change any on these to be more readable. It was just way too many of them to fix each one properly this time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bunop I believe, as a contributor you should have write access to my repo as well.

@Danilka
Copy link
Contributor Author

Danilka commented Feb 23, 2018

My understanding is that EnsEMBL's Rest client is done via these packages that just looks into their complex database: https://github.com/search?utf8=%E2%9C%93&q=api+language%3Aperl+org%3AEnsembl+&type=Repositories

You might be right about restarting, however I've seen multiple ties where request would return a 500 and then properly respond a second later. That's why I thought that a retry makes sense. Otherwise it will cause the package user to send a bunch of emails and the result could not even be reproducible.

As for sponsorship by the organization. It's a great idea, but I'm not sure if any of us is up for pushing this through. @gawbul correct me if I'm wrong on this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants