Add architecture explanation to application structure overview #2794

Open
brainwane opened this issue Jan 16, 2018 · 8 comments
Labels
developer experience Anything that improves the experience for Warehouse devs documentation

Comments

@brainwane
Contributor

Let's update our application structure overview with a writeup like Zulip's architecture summary or a curated list of links to conference talks, blog posts, etc. that would get us 30% of the way towards a history and application overview like this MediaWiki overview. We'd mention frameworks and components we use, like:

  • Pyramid
  • Alembic
  • Postgres

and engineering approaches we recommend people know about as they learn Warehouse.

Reasoning: Developers who are new to a codebase need to know the design rationale of confusing bits -- why it was made this way, what decisions are embedded in particular choices, whether particular components are the result of a feature request, a quick fix after an outage, an experiment, etc. (This is based on research summarized in Making Software.)

Discussed a bit on the pypa-dev mailing list.

@brainwane
Contributor Author

brainwane commented Feb 12, 2018

@lgh2 just chatted with Ernest and got some notes that I'll be turning into a PR. Here are those notes for reference -- they are very rough because I requested very quick notes, so that's my fault, not hers:

The Warehouse codebase

Warehouse uses the Pyramid web framework, the SQLAlchemy ORM, and Postgres for its database. Warehouse's front end uses Jinja2 templates.

The application runs in two Docker containers: one contains the static files for the website, and the other contains the database and the Python web application code, which runs in a virtual environment. In the development environment, Docker Compose manages running the containers and the connections between them.

The top-level directory of the Warehouse repo contains a number of files. Among them are the license file, contributing.rst and readme. The requirements.txt file is for the Warehouse virtual environment. The Dockerfile creates the Docker containers that Warehouse runs in, and the docker-compose YAML file configures Docker Compose. Test configuration is in setup.cfg. Heroku uses runtime.txt. The Makefile contains commands to spin up Docker Compose and the Docker containers. There are also some files associated with Warehouse's front end.

# add files

Since Warehouse was built on top of a pre-existing database, some of the code in the ORM may not look like code from SQLAlchemy’s documentation in order to make it fit the existing tables. There are some places where joins are done using logic instead of a foreign key.
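To illustrate what "joins done using logic instead of a foreign key" can look like, here is a minimal sketch using only the standard library's sqlite3 module. The table and column names (projects, releases, project_name) are hypothetical and are not taken from Warehouse's schema, and Warehouse itself uses SQLAlchemy on Postgres rather than raw SQL on SQLite. The point is only that the two tables are related by a matching name column, with no foreign key constraint declared.

```python
import sqlite3

# Hypothetical schema for illustration only; not Warehouse's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (name TEXT PRIMARY KEY, summary TEXT);
    -- No FOREIGN KEY clause: the link to projects exists only by convention.
    CREATE TABLE releases (project_name TEXT, version TEXT);
    INSERT INTO projects VALUES ('example-pkg', 'An example project');
    INSERT INTO releases VALUES ('example-pkg', '1.0'), ('example-pkg', '1.1');
    """
)


def releases_for(conn, project_name):
    """Return the versions of a project by joining on the name column."""
    rows = conn.execute(
        """
        SELECT r.version
        FROM projects AS p
        JOIN releases AS r ON r.project_name = p.name
        WHERE p.name = ?
        ORDER BY r.version
        """,
        (project_name,),
    )
    return [version for (version,) in rows]


print(releases_for(conn, "example-pkg"))
```

Because the database does not enforce the relationship, the application code has to keep the names consistent, which is one reason such joins can look unusual next to textbook ORM examples.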

Warehouse also uses Pyramid’s hybrid URL traversal and dispatch. Using factory classes, the resource that a URL refers to is looked up before the view is called, so the view receives it ready to use.
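The idea behind hybrid dispatch plus a factory can be sketched without the framework. The following is a simplified, framework-free illustration, not Pyramid's API and not Warehouse's code: every name in it (Project, ProjectFactory, project_detail, handle, the route pattern) is hypothetical. In Pyramid itself, the corresponding pieces are roughly the factory and traverse arguments to add_route, with the looked-up resource arriving in the view as the request's context.

```python
import re


class Project:
    """A stand-in for a database-backed model (hypothetical)."""

    def __init__(self, name):
        self.name = name


class ProjectFactory:
    """Looks up the resource named in the URL; raises KeyError if absent."""

    def __init__(self, projects):
        self._projects = projects

    def __getitem__(self, name):
        return self._projects[name]


def project_detail(context):
    # The view never parses the URL; it receives the resolved resource.
    return f"Project page for {context.name}"


# Step 1 (URL dispatch): a pattern selects the route and extracts the name.
ROUTE = re.compile(r"^/project/(?P<name>[^/]+)/$")


def handle(path, factory):
    match = ROUTE.match(path)
    if match is None:
        return "404 Not Found"
    # Step 2 (traversal): the factory resolves the name to a resource.
    try:
        context = factory[match.group("name")]
    except KeyError:
        return "404 Not Found"
    # Step 3: the view is called with the resource already populated.
    return project_detail(context)


factory = ProjectFactory({"example-pkg": Project("example-pkg")})
print(handle("/project/example-pkg/", factory))
```

Keeping the lookup in the factory means each view can assume its resource exists, and a missing resource is handled in one place.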

bin/ - high-level scripts for Docker
dev/ - assets for dev env
tests/ - tests
warehouse/ - code in modules
    legacy/ - most of the implementation
    forklift/ - APIs for upload
    accounts/ - user accounts
    admin/ - administrator-specific
    cache/ - Warehouse - more goes out than goes in - cache as much as possible
    classifiers/ - frame classifiers
    cli/ - entry scripts
    i18n/ - internationalization
    locales/ - internationalization
    manage/ - DB
    migrations/ - DB
    packaging/ - models
                - rate limiting to prevent abuse
                - RSS feeds
                - site maps
    utils/
  • sql - some code not in docs because relations already existed

  • some use logic and not foreign key and are joined on names (this may change)

  • factory methods prepopulate url before view requested

Pyramid hybrid URL Traversal and Dispatch:

https://docs.pylonsproject.org/projects/pyramid/en/latest/narr/hybrid.html

Pyramid: https://docs.pylonsproject.org/projects/pyramid/en/latest/index.html
SQLAlchemy: https://docs.sqlalchemy.org/en/latest/
Postgres: https://www.postgresql.org/docs/

Docker: https://docs.docker.com/

Docker Compose: https://docs.docker.com/compose/overview/

@brainwane
Contributor Author

brainwane commented Feb 12, 2018

It would be great if this documentation also explained what files/directories/libraries Warehouse uses to produce its various APIs.

@di
Member

di commented Feb 23, 2018

@brainwane Could you outline what was missing from #2937 that would fully resolve this issue?

@brainwane
Contributor Author

brainwane commented Feb 24, 2018

Thanks for asking @di. I'd like the Warehouse developer documentation to include:

  • PyPI's general load and performance expectations
    • how big the database is
    • our Service Level Agreement-type expectations of the service; "pre-production" on Warehouse has always generally meant "we're not monitoring it, we make no promises on uptime, some things are missing and objects may move around", but "beta" and "production" mean higher reliability promises
    • (nice to have) distinction between app and infrastructure?
  • any other components or related repositories developers should know about, and (if it's not obvious) why/how Warehouse uses them (alembic, celery, readme-renderer, Elasticsearch, pypi-theme)
  • usage assumptions/concepts, such as:
    • we focus on architecture and features for PyPI and Test PyPI, and people who want to run their own package indexes usually use this, bandersnatch, or devpi
    • most people browsing are not logged in; there are probably <10 users who'd access the admin UI, tens of thousands of project owners/maintainers, etc.
    • projects are mutable but releases are not (and relationship to file name/hash reuse)
    • projects, releases, packages, authors, owners, maintainers (De-duplicate "Owner" and "Maintainer" roles #2745, Author vs Maintainer in the metadata vs Owner vs Maintainer in PyPI #2059)
    • Warehouse has never been deployed with its own database and has always shared a database with production. The very first thing Warehouse did was take over ownership of migrations to the database. It was easier to do that than to try to create a mirror database, but part of it was purposeful: because Warehouse affected production from day 1, there was no room to do something sloppy the first time around (e.g. skipping a permissions check) with the idea of coming back to fix it before launch, which you might forget to do. (Explanation from Donald.)
  • any special security concerns Warehouse has above and beyond your usual web app
  • (nice to have) likely future architectural changes (e.g., user groups)
  • (nice to have) pointer to curated list of presentations

@brainwane brainwane added the developer experience Anything that improves the experience for Warehouse devs label Mar 6, 2018
@brainwane brainwane modified the milestones: 4: Launch: redirect pypi.python.org to pypi.org, 6. Post Legacy Shutdown Mar 6, 2018
@brainwane
Contributor Author

In today's Warehouse developers' meeting we decided to pare down our near-future milestones on our development roadmap so they really only contain the essential bugfixes and features we need to launch, replace legacy PyPI, and shut down the old site. So I'm moving this issue into a milestone further in the future.

brainwane added a commit to brainwane/warehouse that referenced this issue Mar 13, 2018
brainwane added a commit to brainwane/warehouse that referenced this issue Mar 14, 2018
brainwane added a commit to brainwane/warehouse that referenced this issue Mar 14, 2018
brainwane added a commit that referenced this issue Mar 14, 2018
* Add usage assumptions about user types

Ref. #2794.

* Improve directory listing
@alanbato
Contributor

alanbato commented Apr 3, 2018

While talking with @brainwane on IRC, I came up with two ideas:

I think a Glossary could help clear up confusion between similar concepts and synonyms found in both the codebase and the docs, e.g. project, distribution, package, version, author, maintainer, etc.

Also, I think it would be valuable to include architecture beyond the codebase: design preferences for tests, how the Docker containers are set up right now, detailed descriptions of what each make command does, and other "development" parts of the workflow, for completeness. That is, things besides the code layout that are also part of the system. :)

We should be careful (I almost made the mistake myself) with mixing contribution guidelines with the system architecture, design choices and codebase information.

@ewdurbin
Member

I'm reconsidering the directory layout specifying what each subdirectory concerns itself with as that is almost guaranteed to change over time and become out of date.

The Glossary might provide enough context on what the module names mean, and a basic primer on Pyramid app/module layout would probably suffice.

@rixx
Contributor

rixx commented Jul 28, 2018

Just my unqualified 2 cents as a first-time user of warehouse: For me, the directory structure and the "assumptions and concepts" block were the most helpful parts of the documentation once I was set up and trying to get my bearings, because it was helpful in figuring out where to start exploring.


5 participants