PyPI Integration Planning #1957

Closed
ericholscher opened this Issue Jan 26, 2016 · 21 comments

Member

ericholscher commented Jan 26, 2016

PyPI Integration Planning

This ticket will track the planning of our integration with PyPI.
It will act as a design document for now,
and track progress over time.

Large Questions

  • Will our integration live on readthedocs.org, or another domain?
    • <pypi-slug>.pydoc.org, <pypi-slug>.pypi.readthedocs.org?
    • ANS: Use a non-RTD domain, either language-branded (pydoc.org?) or generic (apidocs.org?)
  • Will our integration use the same Rackspace infra?
    • Same build servers & web servers?
    • ANS: Yes
  • Will this use our existing code base, or become a subset built on top of readthedocs-build?
    • ANS: Our code base
  • Will it be configurable with a readthedocs.yml in the PyPI distribution tarball?
    • ANS: Yes, but only small changes (e.g. theme colors, logo). Keep a consistent UI for users' benefit, not project owners' "branding".
  • Do we use a CDN because packages can't change once they are released?
    • ANS: Perhaps, not super material to development
  • How will we handle integration with Warehouse, so that we can trigger builds?

Implementation

Parsing

We currently plan to generate API documentation for all packages uploaded to PyPI.
This will be done in a parse-only way,
instead of attempting to import every package that exists,
which would be folly.
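
As a rough illustration of what "parse-only" means here, the sketch below pulls docstrings out of source text with the standard library `ast` module, without importing or executing anything. It is not the implementation used by this project; the function name `extract_docstrings` and the sample source are made up for the example.

```python
import ast


def extract_docstrings(source):
    """Return {qualified_name: docstring} for a module's source text.

    The source is only parsed, never imported or executed.
    """
    tree = ast.parse(source)
    found = {"<module>": ast.get_docstring(tree)}

    def visit(node, prefix):
        # Only definitions that are direct children are followed, so
        # functions defined inside `if` blocks are missed in this sketch.
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                name = prefix + child.name
                found[name] = ast.get_docstring(child)
                visit(child, name + ".")

    visit(tree, "")
    return found


# Hypothetical package source. The import below would fail if executed,
# but parsing never runs it.
SAMPLE = '''
"""Module docstring."""
import does_not_exist

class Greeter:
    """Says hello."""

    def greet(self, name):
        """Return a greeting."""
        return "Hello, " + name
'''

docs = extract_docstrings(SAMPLE)
for name, doc in docs.items():
    print(name, "->", doc)
```

Because nothing is imported, missing dependencies and import-time side effects in a package cannot break the build, which is the main reason to avoid importing every package on PyPI.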

Currently there are two packages that might work for this with modifications:

  • epydoc's parse only mode
  • pydoctor

Both are currently Python 2 only,
and don't fully support what we need them to do.

We might also end up just building our own implementation that works for our specific use case,
instead of trying to hack something else that exists but doesn't quite do what we want.

Code

We will abstract the Project object, allowing it to specify how it gets versions:

  • PyPI: API
  • VCS: local clone
  • Tarball: None
  • Pull Requests: This will likely be a tarball, but would be neat to include in this work.

This will then simplify the Doc building Tasks, and allow us to have a proper build abstraction. We should also unify the VCS code to use standard build environments (and get logging for commands).

We should build some kind of "immutable" object type that applies to PyPI packages and VCS tags. This will allow us to have special logic (CDN, ignoring future builds) that applies to those types of packages.
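
One possible shape for this abstraction is sketched below. Every class and function name in it (`Version`, `VersionSource`, `PyPISource`, `VCSSource`, `needs_build`) is hypothetical and does not come from the Read the Docs code base; the sources take plain lists where a real implementation would call the PyPI API or inspect a local clone.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One buildable version of a project (hypothetical model)."""
    identifier: str
    immutable: bool  # True for PyPI releases and VCS tags, which never change


class VersionSource(ABC):
    """How a project discovers its versions."""

    @abstractmethod
    def list_versions(self):
        """Return a list of Version objects."""


class PyPISource(VersionSource):
    def __init__(self, releases):
        # Assumption: a real implementation would query PyPI here.
        self.releases = list(releases)

    def list_versions(self):
        return [Version(r, immutable=True) for r in self.releases]


class VCSSource(VersionSource):
    def __init__(self, tags, branches):
        # Assumption: a real implementation would read a local clone here.
        self.tags = list(tags)
        self.branches = list(branches)

    def list_versions(self):
        tags = [Version(t, immutable=True) for t in self.tags]
        branches = [Version(b, immutable=False) for b in self.branches]
        return tags + branches


def needs_build(version, already_built):
    """Skip immutable versions that were already built; rebuild the rest."""
    return not (version.immutable and version.identifier in already_built)


pypi = PyPISource(["1.0.0", "1.1.0"])
vcs = VCSSource(tags=["v1.1.0"], branches=["master"])
built = {"1.0.0", "v1.1.0", "master"}
to_build = [
    v.identifier
    for source in (pypi, vcs)
    for v in source.list_versions()
    if needs_build(v, built)
]
print(to_build)
```

The `immutable` flag is what would let special logic such as CDN caching or ignoring future builds apply uniformly to PyPI releases and VCS tags, while branches keep being rebuilt.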

Planned Implementation Phases

  • Build automated API documentation for a small set of projects we control (DUE: End of Sept.)
  • Slowly add third party projects as test cases (Django, Requests, Pip, etc)
  • Enable opt-in generation of docs for all projects on PyPI
  • (Optional) Turn on auto-building for all projects

Notes

Warehouse currently doesn't have an API, but we can poll this RSS feed, and keep track of updates that way: https://warehouse.python.org/rss/updates.xml
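
A minimal sketch of the polling idea follows, assuming the feed is ordinary RSS 2.0 with `<item>` elements that carry a `<title>` and a `<link>`. The sample XML is invented so the example runs without network access, and `new_releases` is a hypothetical helper; the real feed's fields may differ.

```python
import xml.etree.ElementTree as ET

# Invented sample data standing in for one fetch of the updates feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Hypothetical updates feed</title>
    <item>
      <title>example-package 1.0.0</title>
      <link>https://example.invalid/example-package/1.0.0/</link>
    </item>
    <item>
      <title>other-package 2.3.1</title>
      <link>https://example.invalid/other-package/2.3.1/</link>
    </item>
  </channel>
</rss>
"""


def new_releases(feed_xml, seen):
    """Return (title, link) pairs not seen on earlier polls.

    `seen` is a set of links that is updated in place, so calling this
    again with the same feed content yields nothing new.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title")
        link = item.findtext("link")
        if link and link not in seen:
            seen.add(link)
            fresh.append((title, link))
    return fresh


seen_links = set()
first_poll = new_releases(SAMPLE_FEED, seen_links)
second_poll = new_releases(SAMPLE_FEED, seen_links)
print(first_poll)
print(second_poll)
```

In practice the set of seen links would need to be persisted between polls, and each new entry would trigger a build for that release.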

Checklist

  • Improve AST parsing (rtfd/sphinx-autoapi#78)
  • Abstract RTD backend to allow code from non-VCS endpoints
  • Update RTD deployment to allow support for PyPI packages

Member

ericholscher commented Mar 9, 2016

We should probably work the ability to build GitHub pull requests into this project as well, as it requires breaking up the "Version" concept as well.

Member

ericholscher commented Jun 6, 2016

Another interesting library: https://github.com/PyCQA/pydocstyle

Member

ericholscher commented Aug 19, 2016

This issue is following the AST issue here; we should adopt whatever gets created: davidhalter/jedi#630

Member

ericholscher commented Aug 26, 2016

Another sizable issue with pydocstyle is that it doesn't support variables defined at the module or class level. This is a useful thing that epydoc does. RedBaron seems to do it too in my initial exploration, but it is quite slow.
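
For reference, here is a small sketch of how module-level and class-level variables can be collected with the standard library `ast` module. It is not how pydocstyle, epydoc, or RedBaron work internally; `collect_variables` and the sample source are hypothetical.

```python
import ast


def collect_variables(source):
    """List names assigned at module level or directly in a class body."""
    names = []

    def visit(body, prefix):
        for node in body:
            if isinstance(node, ast.Assign):
                targets = node.targets
            elif isinstance(node, ast.AnnAssign):
                targets = [node.target]
            elif isinstance(node, ast.ClassDef):
                visit(node.body, prefix + node.name + ".")
                continue
            else:
                # Function bodies are skipped, so local variables are ignored.
                continue
            for target in targets:
                # Tuple unpacking and attribute targets are ignored here.
                if isinstance(target, ast.Name):
                    names.append(prefix + target.id)

    visit(ast.parse(source).body, "")
    return names


SAMPLE = """
DEFAULT_TIMEOUT = 30

class Client:
    retries: int = 3
    base_url = "https://example.invalid"

    def connect(self):
        local_only = True
"""

variables = collect_variables(SAMPLE)
print(variables)
```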

Member

ericholscher commented Aug 26, 2016

PyPI APIs:

dstufft: there's no new APIs in Warehouse as of yet-- plan on doing up a new one at some point and deprecating old stuff (particularly xmlrpc...) but been more focused on feature parity

Member

ericholscher commented Oct 10, 2016

Interesting design decisions here: http://www.rubydoc.info/gems/yard/file/docs/Overview.md

Member

ericholscher commented Oct 10, 2016

Had a chat in the ##python-code-quality IRC channel.

10:53 ericholscher: 
Hi -- does anyone have an opinion on what AST tooling might be the best to get docstrings out of python code?
10:53 ericholscher:
Currently I'm using the pydocstyle AST, but I'm hitting issues with a few things (including Django) -- and wondering if there might be another more mature option
10:53 ericholscher:
I read through this GH issue, which touches on it: https://github.com/davidhalter/jedi/issues/630 -- but doesn't seem to have a clean conclusion
10:53 ericholscher:
perhaps I should ask there :)
10:53 sigmavirus:
ericholscher: hm, other than just using the AST in the stdlib?
10:54 ericholscher:
well, there are a bunch of fancy tools that exist, which I assume makes it easier? -- the stdlib AST module isn't exactly....documented :)
10:54 sigmavirus:
ericholscher: true, I don't think there exist tools that make doing what you want specifically easier though
10:54 sigmavirus:
Or at least, I don't know of them :/
10:54 sigmavirus:
And most things build atop the ast module in the stdlib :/
10:55 ericholscher:
I guess another question is, then, why did pydocstyle create its own AST parser, instead of using the stdlib?
10:55 ericholscher:
it seems to use the stdlib tokenizer, but do its own parsing, afaict
10:56 Bram:
ericholscher: redbaron might work for you http://redbaron.readthedocs.io/en/latest/ it's its own ast (cst/fst actually), docstring support isn't built in but it's very easy to query things
10:56 sigmavirus:
I'm not sure which parts you're talking about but I'm certain the person who wrote the code isn't around anymore
10:56 Bram:
downsides: slow && python 2 only
10:57 ericholscher:
heh, that's kind of a non-starter for me, sadly
10:57 sigmavirus:
ericholscher: to be fair, the only thing I ever do is pass in a compiled AST to pydocstyle via flake8-docstrings, so I'm not sure what it did
10:58 ericholscher:
aye
10:58 sigmavirus:
But I don't know that it built its own AST builder from scratch (I don't know of any code-quality tool that does that)
10:58 ericholscher:
https://github.com/PyCQA/pydocstyle/blob/master/src/pydocstyle/parser.py#L309-L337 is a good example
10:59 ericholscher:
I just feel like I'm almost certainly reinventing wheels here (trying to basically rebuild epydoc on top of Sphinx & a parse-only AST module) -- but I guess perhaps it's actually new things that need to be written
11:01 sigmavirus:
So there have been demands from users to keep track of functions that have decorators applied because those affect docstrings, but Idk why they chose that route
11:01 ericholscher:
I think for now I'll just go ahead w/ the pydocstyle parser, and probably end up writing my own once I fully understand the problem
11:01 sigmavirus:
ericholscher: seems fair
11:02 ericholscher:
or publish a blog post saying it can't be done, and wait :D
11:03 ericholscher:
anyway -- thanks for the help, it confirms my suspicions, and allows me to move forward with less anxiety
11:04 sigmavirus:
ericholscher: always happy to help reduce anxiety

Member

ericholscher commented Nov 18, 2016

Initial deploy is now live at https://www.pydoc.io

Contributor

Carreau commented Nov 18, 2016

Thanks @ericholscher for your blog post announcing pydoc.io. Happy to see something like this moving forward, and the effort you are putting behind this.

First, I just want to let you know (informally) that some people in the Bay Area are considering hosting a "Docathon", probably around the end of January. Would you be interested in participating? If so, I can try to contact the organizers and see if you can join. The exact scope is not really set yet, but I'm thinking a project like this could be interesting.

Second, I see that the IPython docs return a 404. Our docs are a bit weird for API generation because of metaclasses and traitlets. Let me know if I can do anything about that.

Third, I would use a subfolder for versions to potentially get aliases, with 5.1.0/5.x/stable redirecting to the same version.

Fourth, there seem to be some similar cross-language efforts in http://devdocs.io/ and DashApp. I completely see the interest in having a pydoc.io; are you aware of the above tools? Is there any consideration of generating file formats for these tools directly?

Fifth, I had other comments, but I have forgotten them now.

Thanks!

Member

ericholscher commented Nov 18, 2016

> Thanks @ericholscher for your blog post announcing pydoc.io. Happy to see something like this moving forward, and the effort you are putting behind this.
>
> First, I just want to let you know (informally) that some people in the Bay Area are considering hosting a "Docathon", probably around the end of January. Would you be interested in participating? If so, I can try to contact the organizers and see if you can join. The exact scope is not really set yet, but I'm thinking a project like this could be interesting.

Interesting. Definitely something I'd be interested in. Perhaps we should follow up on Twitter or email about that.

> Second, I see that the IPython docs return a 404. Our docs are a bit weird for API generation because of metaclasses and traitlets. Let me know if I can do anything about that.

Yeah, this is one of the issues with the AST tooling we're using. It is pretty naive and breaks on a lot of the larger and more popular libraries.

> Third, I would use a subfolder for versions to potentially get aliases, with 5.1.0/5.x/stable redirecting to the same version.

This is an issue I'm quite familiar with. I meant to fix that before this went live, and will definitely fix it soon.

> Fourth, there seem to be some similar cross-language efforts in http://devdocs.io/ and DashApp. I completely see the interest in having a pydoc.io; are you aware of the above tools? Is there any consideration of generating file formats for these tools directly?

Yep, super familiar. There are already a few ways of turning e.g. Sphinx documentation into Dash and other formats. In general, once we get a standard way of introspecting code via parse-only ASTs, we can output multiple different formats from there. We're starting with Sphinx just so that it can plug into the existing Python ecosystem (with intersphinx, etc.) -- but we're definitely happy to extend that functionality into other formats.
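
To make the version alias suggestion concrete, here is a small sketch of how aliases such as `5.x` and `stable` could be resolved to a concrete version. The helper names `build_aliases` and `resolve` are hypothetical, and the code assumes purely numeric dotted versions; real PyPI version strings follow PEP 440 and would need a proper version parser.

```python
def build_aliases(versions):
    """Map aliases such as '5.x' and 'stable' to the newest matching version."""

    def sort_key(version):
        # Assumption: versions look like '5.1.0' (digits and dots only).
        return tuple(int(part) for part in version.split("."))

    aliases = {}
    for version in sorted(versions, key=sort_key):
        major = version.split(".")[0]
        # Versions are visited oldest first, so newer ones overwrite older ones.
        aliases[major + ".x"] = version
        aliases["stable"] = version
    return aliases


def resolve(name, aliases):
    """Return the concrete version for an alias, or the name unchanged."""
    return aliases.get(name, name)


aliases = build_aliases(["5.0.0", "4.2.1", "5.1.0"])
print(resolve("5.x", aliases))
print(resolve("stable", aliases))
print(resolve("4.2.1", aliases))
```

With a version subfolder in the URL, each alias could then redirect to the path of the version it resolves to.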

glyph commented Nov 18, 2016

Sorry for the noise, but: It would be amazing if this supported pydoctor. I understand why sphinx-autoapi is the more popular option but pydoctor has a lot of features we really like and rewriting all of Twisted's epytext to be sphinx is probably not tractable.

Please let us know over at https://twistedmatrix.com/ if there's something we could change, or some metadata we could add to our project, to make this easier on your end.

glyph commented Nov 18, 2016

Or over at https://github.com/twisted/pydoctor if you would like to file tool issues on pydoctor itself to make integration easier.

Member

ericholscher commented Nov 18, 2016

@glyph I've looked into Pydoctor before, in this ticket: twisted/pydoctor#93

Member

ericholscher commented Nov 18, 2016

Granted, that was for using it to just get the docstring data, not actually do rendering of pydoctor-style rST. It sounds like what you want is for us to support pydoctor style docstrings in rST?

glyph commented Nov 18, 2016

Pydoctor's input format is epytext, not rST.

However, pydoctor also does a few things Sphinx doesn't, like correct rendering of zope interface relationships (implements, provides) and I believe a few other decorators. I don't know where the appropriate integration point would be for that stuff.

Contributor

takluyver commented Nov 20, 2016

Is this issue where you want feedback? The blog post asks for feedback, but I can't see any particular indication of where to send it. ;-)

Looking through some of the docs on the beta, I see that _underscore_prefixed names are included. I'm sure you're aware of the Python convention that a leading underscore indicates a function/variable is private. As a module author, I'd like these to be hidden by default, or at least to have some way to control what is publicly documented API. Probably both, in fact - there are various things without a leading underscore that I don't want to be documented. We have an @undoc decorator in IPython for that purpose.

On the flip side, we need a way to include certain variables which are not functions or classes in the documentation. For instance, the possible values of some enums.

Other than that, 👍 to the overall concept of autogenerated API docs for Python packages. Thanks for working on it!
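
A sketch of the filtering described above, done at parse time: names with a leading underscore are hidden, and so is anything carrying an opt-out decorator. The function `public_names`, the `hidden_decorators` parameter, and the sample module are hypothetical, and the `undoc` import is only a placeholder since the source is parsed and never imported.

```python
import ast


def public_names(source, hidden_decorators=("undoc",)):
    """Return top-level functions and classes that should be documented."""
    visible = []
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        if node.name.startswith("_"):
            continue  # leading underscore marks the name as private
        # Only bare-name decorators such as @undoc are recognised here;
        # dotted forms like @pkg.undoc would need extra handling.
        decorators = {d.id for d in node.decorator_list if isinstance(d, ast.Name)}
        if decorators.intersection(hidden_decorators):
            continue
        visible.append(node.name)
    return visible


SAMPLE = """
from hypothetical_pkg import undoc

def fetch(url):
    pass

def _helper():
    pass

@undoc
def internal_but_unprefixed():
    pass

class Session:
    pass
"""

names = public_names(SAMPLE)
print(names)
```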

Contributor

takluyver commented Nov 21, 2016

Oh, and also possibly a way to opt out of autogenerated API docs. For some packages on PyPI, such as pip, the only explicitly public interface is the command line interface, and using it as a Python module is discouraged.

(This would force projects to think more about what is actually a public interface: I often release primarily command-line tools without really deciding whether I also expect people to use the Python API.)

Member

ericholscher commented Nov 22, 2016

@takluyver Thanks for the feedback. We definitely need to do more thinking about the display of private methods, methods with no docstrings, and other edge cases.

We'll probably end up doing some kind of blacklist for modules that don't make sense to document, like pip perhaps. Still need to think that through!

choldgraf referenced this issue in docathon/docathon Nov 29, 2016: Survey existing tools on documentation #4 (Open)

Contributor

agjohnson commented Nov 16, 2017

I'm going to close this issue here as we have a repo for this at rtfd/pydoc.io. I think we should be tracking work and additional changes to the site there.

agjohnson closed this Nov 16, 2017
