Determine new API URL structure for warehouse (starting with new JSON API) #284

Open
ctheune opened this issue Apr 15, 2014 · 35 comments
Labels
APIs/feeds, feature request, needs discussion (a product management/policy issue maintainers and users should discuss)

Comments

@ctheune

ctheune commented Apr 15, 2014

At the PyCon 2014 sprint I started making bandersnatch easier to cache, which means moving away from XML-RPC in general.

I'm leveraging the existing /pypi//json API which already helps, but I'll need two more endpoints:

  • get a list of all packages and their most recent serial
  • get the changelog
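To illustrate why a "package name to latest serial" listing is useful for an incremental mirror, here is a minimal Python sketch. It assumes, hypothetically, that the package-list endpoint returns a mapping of project name to most recent serial and that the mirror stores the serial it last synced for each project; `projects_to_sync` is an illustrative name, not bandersnatch's actual code.

```python
from typing import Dict, List


def projects_to_sync(local: Dict[str, int], remote: Dict[str, int]) -> List[str]:
    """Return the projects whose remote serial is newer than the mirror's copy.

    `remote` is assumed to be the {project: last_serial} mapping served by the
    hypothetical "all packages and their most recent serial" endpoint;
    `local` is what the mirror recorded on its previous run.
    """
    stale = [
        name
        for name, serial in remote.items()
        # A project missing locally (default -1) or with an older serial needs syncing.
        if serial > local.get(name, -1)
    ]
    return sorted(stale)


if __name__ == "__main__":
    local = {"requests": 100, "six": 42}
    remote = {"requests": 105, "six": 42, "attrs": 7}
    print(projects_to_sync(local, remote))  # ['attrs', 'requests']
```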

I implemented the necessary code on a branch for PyPI:
https://bitbucket.org/ctheune/pypi/branch/ctheune-bandersnatch-json

However, I don't want to force this through; I'd rather we decide together what the URLs should look like.

Ideally we can implement this in both Warehouse and PyPI so that bandersnatch supports both of them and doesn't break when you switch the public server (and I might be on vacation ;) ).

@r1chardj0n3s changed the title from "Decide for better json API URL structure" to "Determine new API URL structure for warehouse (starting with new JSON API)" Apr 15, 2014
@dstufft
Member

dstufft commented Apr 15, 2014

So I have some ideas on both a new API for accessing data about PyPI and also some rough ideas for a new mirroring API in general. I'll take a look at what you have so far.

@dstufft
Member

dstufft commented Apr 15, 2014

So if I read your PR correctly, the new URLs would be https://pypi.python.org/json/changes and https://pypi.python.org/json/packages? If so, I'm not really a fan of adding those to Warehouse.

Ideally what I'd like to do is set up a nice hypermedia-based API, probably rooted at /api/. Using something based on https://jsonapi.org/ is one possibility. There are a few options, and I need to dive into them to figure out exactly what needs to be done. Ideally the new API will also replace the existing JSON API, so we can deprecate the old JSON API (but leave it in place until (or if!) it's no longer getting traffic).
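For a sense of what a JSON:API-style response rooted at /api/ could look like, here is a minimal Python sketch. The resource types (`projects`, `releases`), the `/api/projects/<name>` paths, and the `name-version` ids are all hypothetical; they only illustrate the `data` / `attributes` / `relationships` / `links` layout that the JSON:API specification describes.

```python
from typing import Any, Dict, List


def project_document(name: str, versions: List[str], base: str = "/api") -> Dict[str, Any]:
    """Build a JSON:API-style document for one project (hypothetical resource layout)."""
    return {
        "data": {
            "type": "projects",
            "id": name,
            "attributes": {"name": name},
            "relationships": {
                "releases": {
                    # Clients follow this link instead of constructing the URL themselves.
                    "links": {"related": f"{base}/projects/{name}/releases"},
                    # Resource identifiers for the related releases (id scheme is made up).
                    "data": [{"type": "releases", "id": f"{name}-{v}"} for v in versions],
                }
            },
            "links": {"self": f"{base}/projects/{name}"},
        }
    }
```

The point of the hypermedia style is visible in the `links` members: a client that starts at /api/ can reach a project's releases without knowing the URL scheme in advance.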

@r1chardj0n3s

I echo @dstufft on this. My question is whether we should go all the way to /api/v0/ to future-proof ourselves a little too, or would we be happy with /api-v1/ or similar later on?

@dstufft
Member

dstufft commented Apr 15, 2014

So there are two ways to deal with that. One is to version using the content type: the URL is always /api/, but the version is selected based on the content type. GitHub does this, e.g. Accept: application/vnd.github.beta+json or Accept: application/vnd.github.v3+json. The other is to use /api/v0/ and so on. I lean towards using the content type, but we'll need to figure out in general how we want to handle versioning going forward and what the code to handle it looks like.
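Content-type versioning comes down to a small negotiation step on the server. A minimal Python sketch, assuming a hypothetical vendor media type `application/vnd.pypi.v<N>+json` modelled on GitHub's; the names `negotiate_version` and `DEFAULT_VERSION` are illustrative. For brevity it takes the first acceptable entry in header order and ignores `q` values, which a real implementation should honour.

```python
import re
from typing import Optional

# Hypothetical vendor media type, modelled on GitHub's application/vnd.github.v3+json.
_VENDOR_RE = re.compile(r"^application/vnd\.pypi\.(?P<version>v\d+|beta)\+json$")

DEFAULT_VERSION = "v1"  # assumption: what a plain application/json request receives


def negotiate_version(accept: str) -> Optional[str]:
    """Pick an API version from an Accept header; None means 406 Not Acceptable."""
    for part in accept.split(","):
        # Drop parameters such as ";q=0.9" and normalise case.
        media_type = part.split(";", 1)[0].strip().lower()
        match = _VENDOR_RE.match(media_type)
        if match:
            return match.group("version")
        if media_type in ("application/json", "*/*"):
            return DEFAULT_VERSION
    return None
```

The trade-off versus /api/v0/: path versioning is easier to cache and to try in a browser, while header versioning keeps URLs stable but requires responses to send `Vary: Accept` so that caches key on the header.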

@steveklabnik

Just let me know if I can help out at all regarding JSON API stuff.

@nlhkabu added the "requires triaging" label (maintainers need to do initial inspection of issue) Jul 2, 2016
@brainwane added the "feature request" and "needs discussion" (a product management/policy issue maintainers and users should discuss) labels Jan 24, 2018
@brainwane
Contributor

As I understand it, this issue (designing and implementing a new Warehouse API) is a prerequisite for integrating twine into pip and thus dealing with pypa/packaging-problems#76 and pypa/packaging-problems#60, per @dstufft's comment in pypa/twine#127. Is that correct? If so, I'd suggest we add this to one of our upcoming milestones.

@brainwane added this to the "5: Shut Down Legacy PyPI" milestone Jan 30, 2018
@brainwane
Contributor

We talked about this issue in today's bug triage meeting, and folks explained to me that even though this may be necessary for some Twine improvements, this is not a ticket we will address before launch. This is a new feature and is best suited for post-launch; Warehouse needs to be done before we can improve Twine.

@phildini

Hello! Quick call-out that some of us would really like the JSON API to include info like owners/maintainers before the XML-RPC API is shut down. See the ticket linked right above. Cheers, and thanks for all your work!

@brainwane modified the milestones: "5: Shut Down Legacy PyPI", "6. Post Legacy Shutdown" Mar 6, 2018
@brainwane
Contributor

I've marked #2914 as something we should address before shutting down legacy PyPI, but developing the structure for the new API can wait till after we shut down the legacy site.

@brainwane
Contributor

As we develop the new API we should consider #347 as well. And I've added this issue to the list of things we might work on at the PyCon sprints.

@theacodes
Contributor

I would also love an API for managing both my account and my projects. For some examples of where this is useful:

  1. We have an account that owns all of the projects our organization publishes. I want to rotate its password every week.
  2. Likewise, I want to audit all of my organization's projects and verify that no more than n people have admin access to each of them.
  3. I am actually in the process of migrating all of my projects in my personal account to a new account. It would be cool to do that programmatically.
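Once role data is available through an API, use case 2 is mostly client-side logic. A minimal sketch, assuming a hypothetical endpoint that returns each project's collaborators as (username, role) pairs; `over_admin_limit` is an illustrative name, and treating PyPI's "Owner" role as the "admin" role is an assumption.

```python
from typing import Dict, List, Tuple


def over_admin_limit(
    roles: Dict[str, List[Tuple[str, str]]], limit: int, admin_role: str = "Owner"
) -> Dict[str, List[str]]:
    """Return {project: sorted admin usernames} for projects with more than `limit` admins.

    `roles` maps project name -> [(username, role), ...], as a hypothetical
    account/project management API might report it.
    """
    flagged: Dict[str, List[str]] = {}
    for project, members in roles.items():
        admins = sorted(user for user, role in members if role == admin_role)
        if len(admins) > limit:
            flagged[project] = admins
    return flagged
```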

I'm happy to help with the design and discussions around this (my day job is helping design APIs and implement clients for Google Cloud Platform).

@di
Member

di commented Apr 17, 2018

> I am actually in the process of migrating all of my projects in my personal account to a new account. It would be cool to do that programmatically.

We could probably just call this "the ability to add/remove collaborators via API" I think, since actual account migration is probably not something that happens very often.

@dstufft
Member

dstufft commented Apr 17, 2018

I'm hoping to carve out some ideas on this soon, maybe next week? Ideally the output of this ticket is the basic framework/skeleton of the API, and then further tasks can extend the functionality of it.

Defining APIs for PyPI is a tad bit trickier than the general case, because we generally have to design for a decade or more (for instance, XML-RPC got added, but it has not aged or scaled well! From my investigations so far, GraphQL would be a similar mistake). I'm almost certain that something hypermedia-based is the way forward here, but there are a lot of different ways to take that, and we'll also need to ensure we include all of the typical scaling things like pagination and the like.
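Hypermedia and pagination fit together naturally: each page carries a link to the next one, so clients never construct page URLs themselves and the server can change its paging scheme later. A minimal sketch, assuming a hypothetical response shape with a `data` list and an optional `links.next` URL (the layout JSON:API uses for pagination links); `fetch` stands in for whatever HTTP call returns parsed JSON.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iterate_pages(fetch: Callable[[str], Dict[str, Any]], first_url: str) -> Iterator[Any]:
    """Yield every item of a paginated collection by following `links.next`."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        yield from page.get("data", [])
        # The server decides where the next page lives; a missing link ends iteration.
        url = page.get("links", {}).get("next")
```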

@theacodes
Contributor

theacodes commented Apr 17, 2018

> We could probably just call this "the ability to add/remove collaborators via API" I think, since actual account migration is probably not something that happens very often.

Yep, just calling out a specific use case.

> I'm hoping to carve out some ideas on this soon, maybe next week? Ideally the output of this ticket is the basic framework/skeleton of the API, and then further tasks can extend the functionality of it.

Sounds good, happy to review and be around to bounce ideas off of (I'm on IRC during PST working hours as thea).

> Hypermedia-based is the way forward here, but there are a lot of different ways to take that, and we'll also need to ensure we include all of the typical scaling things like pagination and the like.

Agreed - REST/JSON (and to some extent RPC/JSON) has more or less stood the test of time (in tech years, at least). Happy to provide feedback on that sort of stuff as well.

@dstufft
Member

dstufft commented May 18, 2018

@theacodes I guess I'm just not seeing what an IDL actually gets us here? The JSON Hyper-Schema example includes JSON Schema as part of it; the difference is that instead of your client hardcoding URLs and actions all over the place, it can discover them at runtime. You can also ship the schemas as part of your client so that network access isn't required in the common case (unless you introduce a new schema).
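To make the runtime-discovery point concrete: a client holding a JSON Hyper-Schema (fetched at startup or shipped with the client) looks links up by relation name and fills in their URI templates, rather than hardcoding paths. A minimal sketch; the `releases` relation and the `/api/projects/{name}` templates are hypothetical, and `expand` handles only simple `{var}` substitution (RFC 6570 level 1), not full URI Templates.

```python
import re
from typing import Any, Dict, List
from urllib.parse import quote

_VAR_RE = re.compile(r"\{([A-Za-z0-9_]+)\}")


def find_link(links: List[Dict[str, Any]], rel: str) -> Dict[str, Any]:
    """Return the first link description object with the given relation name."""
    for link in links:
        if link.get("rel") == rel:
            return link
    raise KeyError(f"no link with rel={rel!r}")


def expand(href: str, variables: Dict[str, str]) -> str:
    """Expand simple {var} placeholders, percent-encoding everything but unreserved chars."""
    return _VAR_RE.sub(lambda m: quote(str(variables[m.group(1)]), safe=""), href)


# Hypothetical "links" section of a project schema.
PROJECT_LINKS = [
    {"rel": "self", "href": "/api/projects/{name}"},
    {"rel": "releases", "href": "/api/projects/{name}/releases"},
]
```

With this, `expand(find_link(PROJECT_LINKS, "releases")["href"], {"name": "zope.interface"})` yields `/api/projects/zope.interface/releases`; if the server later moves releases elsewhere, only the schema changes, not the client code.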

@dstufft
Member

dstufft commented May 18, 2018

@theacodes If it would be helpful, I'm happy to jump on a call or into IRC to go over the two things in a higher bandwidth setting instead of throwing github comments back and forth. I feel like there are probably some misconceptions on both sides about RPC and Hypermedia, and perhaps a higher bandwidth mechanism would help to work out what those are?

@theacodes
Contributor

I don't want to hold up progress. I would love to see a design doc or proof of concept if/when we have one.

@asmacdo
Contributor

asmacdo commented May 25, 2018

The POC is up in #4078. I'm using this Etherpad to document the design proposal:
https://pad.sfconservancy.org/p/hypermedia_api_design

I did my best to incorporate the ideas discussed here, as well as in person discussions at pycon. I've set aside some time to keep working on this, so all feedback is welcome.

swhmirror pushed a commit to SoftwareHeritage/swh-lister that referenced this issue Jul 31, 2018
Following discussion with the team, the xmlrpc api is not deprecated
today. It will not disappear soon.

Also, as:

- parsing the legacy html api [1] is considered bad practice

- discussions exist about creating equivalents to the
  deprecated/legacy apis [2] [3]

We chose to implement the xmlrpc one.

[1] https://warehouse.readthedocs.io/api-reference/legacy/#simple-project-api

[2] pypi/warehouse#284

[3] pypi/warehouse#4078

Related T422
phildini pushed a commit to phildini/warehouse that referenced this issue May 8, 2019
Adds a new API that covers the usage of the XML-RPC API and the simple API.
pypi#284

This work is intended as a proof of concept for how a hypermedia API
could be implemented, setting up the patterns that can be extended to
cover the rest of the API. The new API introduces pagination to reduce
the load for list views. Serializers are used to increase
maintainability and code reuse. Some filtering is added to meet the use
cases of XML-RPC. Many thanks to @werwty for hacking out an initial
implementation which has been squashed.

Introduces new dependencies:
   apispec==0.37.0 : Used to generate an api spec at /api/
   marshmallow==3.0.0b10 : Used to serialize responses
   PyYAML==3.12 : Dependency of apispec

All new endpoints are added to a new domain, "sandbox".

Note: Locally, all subdomains were treated just like the actual domain,
so I was unable to make the subdomain work as expected. I followed the
pattern that forklift uses, and guessed how it should work.
@brainwane
Contributor

Per discussion in IRC just now -- the author closed #4078 last year, and it's unclear whether this kind of feature would be welcome if someone were to try again at implementing it.

@brainwane
Contributor

Maintainers' opinions are welcome. Also, in my opinion, it would be easier to finalize design and implementation, test and review, and deploy this if we had funding for it.

@brainwane
Contributor

@asmacdo that Etherpad has now dissolved and reset -- did you keep a copy of your design proposal anywhere else?

Reminder to others that work on this could probably use funding.

@asmacdo
Contributor

asmacdo commented Nov 10, 2020

@brainwane unfortunately I don't have a backup, but the PR could still be distilled down into a design proposal.

Key points:

  • Create a resource-based hypermedia API
  • Use Marshmallow to serialize
  • Use apispec to generate an OpenAPI schema

Additional necessary features:

  • Pagination
  • Filtering
  • CDN caching (consider how pagination/filtering will affect)
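One concern behind the CDN caching bullet is that filter and pagination parameters multiply the number of distinct URLs a CDN has to cache, and reordered or unknown parameters fragment the cache further. A common mitigation, sketched here as an assumption rather than anything from the PR, is to serve (or redirect to) only a canonical form of each query: known parameters only, in sorted order. The parameter names in `ALLOWED_PARAMS` are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters the list endpoints actually understand.
ALLOWED_PARAMS = {"page", "per_page", "name", "classifier"}


def canonical_url(url: str) -> str:
    """Normalise a list URL so equivalent requests map to one CDN cache entry."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key in ALLOWED_PARAMS  # drop unknown params such as cache-busters
    ]
    params.sort()  # parameter order must not create distinct cache keys
    # Rebuild without the fragment, which is never sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))
```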

@di
Member

di commented Jun 26, 2022

Bit of a related update here: PEP 691 has been accepted.
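PEP 691 settles part of the versioning question above for the Simple API: it keeps the existing /simple/ URLs and selects the JSON form, and its version, through content negotiation on the Accept header, much like the GitHub approach described earlier in this thread. A minimal standard-library sketch; the helper names are illustrative, and the project name is assumed to be already normalized.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Media type defined by PEP 691 for version 1 of the JSON Simple API.
SIMPLE_JSON_V1 = "application/vnd.pypi.simple.v1+json"


def simple_api_request(project: str, index: str = "https://pypi.org/simple/") -> urllib.request.Request:
    """Build (but do not send) a PEP 691 request for one project's file list."""
    return urllib.request.Request(f"{index}{project}/", headers={"Accept": SIMPLE_JSON_V1})


def file_names(response_body: bytes) -> List[str]:
    """Extract filenames from a PEP 691 JSON project page."""
    page: Dict[str, Any] = json.loads(response_body)
    # PEP 691 responses report their version under meta["api-version"].
    version = page["meta"]["api-version"]
    if version.split(".")[0] != "1":
        raise ValueError(f"unsupported Simple API version {version}")
    return [f["filename"] for f in page["files"]]
```

Sending the request with `urllib.request.urlopen(simple_api_request("attrs"))` and passing the body to `file_names` would list that project's distribution files.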
