Create `requires_python` column on `release_files` to efficiently query `requires_python` with files #1448

Carreau · 2016-08-31T19:15:35Z

Assuming there are much less updates of the release and release_file
table than reads then we take the overhead of building this materialized
view only rarely.

Building it concurrently is slow but does not block reads so the
building cost might be invisible.

The more efficient alternative would be :

   - To make this an actual table and update only the affected row but
     seem much more complicated and prone to errors and confusions for
     maintenance as this table should not be manipulated directly. But
     would not be that hard since Posgres 9.5 supports UPSERTs.

     It would be easy to convert back-and from a virtual-table to a real
     table without actually changing Python code.

   - To duplicate the `requires_python` column on both `releases` and
     `release_files` which seem redundant and might lead to
     inconsistencies.

   - Transfer `requires_python` to `release_files`, though it is also
     queried from `releases` for we might still have the overhead of
     join but in the other direction and will require careful checking
     of the legacy-pypi codebase.

Working with @michaelpacer on pypi/legacy#506 we came up with this solution for now, and woudl appreciate any feedback and timing statistics on replicate of production DB.

Carreau · 2016-08-31T22:33:15Z

Pushed a change to try to make the linter happy.

Carreau · 2016-08-31T22:40:59Z

The test on 3.5.2 seem to have failed before starting can one of you restart this specific build of travis ?

Thanks !

di · 2016-08-31T22:44:01Z

@Carreau I restarted it for you.

dstufft · 2016-08-31T22:45:55Z

Why not just add a column to the release_files table and setup a trigger to cascade the values over? Using a Materialized view here seems to be harder than that?

Carreau · 2016-08-31T22:58:53Z

Why not just add a column to the release_files table and setup a trigger to cascade the values over? Using a Materialized view here seems to be harder than that?

I'm not sure how to cascade on the same table, I can give it a try; @michaelpacer and I basically learn SQL for that, and I can see many way to get that wrong:
- recursion ? I don't know if it's possible for triggers ?
How to react to row update, if row update in release_files there is and it update a release_files.requires_python, does the trigger revert back to the corresponding value in release.requires_python ? Globally I'm concern that would would get internal source of truth and that any further development later on warehouse might get confused to know which of the two column should be updated.

I might just misunderstand the features available in SQL, though and there might be way to make sure there is consistency.

Carreau · 2016-09-14T01:01:43Z

Updated with @michaelpacer work, this now adds a column to release_files and update only concern rows on UPDATE or INSERT into releases or release_files.

Carreau · 2016-09-15T03:56:34Z

Travis seem to be stuck on Running setup.py bdist_wheel for psycopg2 ... can you restart this build please ?

I improved the codestyle to pass the linter, and I assume after rebasing I had to update the database revision number (I assume that what's the following was trying to convey:

self = <alembic.script.base.ScriptDirectory object at 0x7fa690fc8dd8>
ancestor = 'Destination %(end)s is not a valid upgrade target from current head(s)'
multiple_heads = "Multiple head revisions are present for given argument 'head'; please specify a specific target revision, '<branchname>@head' to narrow to a specific head, or 'heads' for all heads"

Is there a simpler way to update the db revision number than to re-run docker-compose run web python -m warehouse db revision and move the content from one file to another ?

dstufft · 2016-09-15T03:59:31Z

I think multiple heads means that other changes to the DB has landed since this change and you'll need to either rebase your DB revision so it follows linearly to that one, or you'll need to create a merge db revision. See http://alembic.zzzcomputing.com/en/latest/branches.html

Carreau · 2016-09-15T04:18:29Z

Thanks for restarting the build.

I think multiple heads means that other changes to the DB has landed since this change and you'll need to either rebase your DB revision so it follows linearly to that one, or you'll need to create a merge db revision. See http://alembic.zzzcomputing.com/en/latest/branches.html

Thanks, it appears I still had the revision wrong but because of a missing key... re-attempting.

Carreau · 2016-09-15T04:49:42Z

Yeah ! 🍰 🎊 Test are passing.

Not sure if you prefer commits to be squashed a bit, and/or if extra test are needed for the DB as this will mostly be used by Legacy PyPI, nor exactly what need to be tested.

Carreau · 2016-09-16T23:47:40Z

Why not just add a column to the release_files table and setup a trigger to cascade the values over? Using a Materialized view here seems to be harder than that?

That's now implemented and seem to work.

[edit] fix typo spotted in next comment.

mpacer · 2016-09-17T00:33:47Z

Why not just add a column to the release_files table and setup a trigger to cascade the values over? Using a Materialized view here seems to be harder than that?

I think what @Carreau meant was that it's now implemented. Since it is not not implemented. 😄

Is there anything more that needs to be done on this for it to be included?

Carreau · 2016-09-17T03:17:09Z

I think what @Carreau meant was that it's now implemented. Since it is not not implemented. 😄

Oops. typed to fast. I edited my comment.

Carreau · 2016-09-19T17:24:39Z

I'll appreciate comment or hints on the next step to take on this. In order to tackle the things on PyPI sides as well.

Thanks !

Carreau · 2016-09-26T15:43:12Z

Hello there !

It's Monday 🎊 , usually the day of the week most people hate because it's time to start working again !

I hope you are all fine, and have spent a nice week-end. I personally went to the beach (in Moss landing) and saw otters. I did a wrong move and my broken for is hurting again, so I used that as an excuse to watch Mr Robot on Sunday. I've also read that Postgres 9.6 is going to have BDR, which is cool I guess.

I'm now routinely looking at this piece of code and looking to get any feedback on things to improve (or to fix). Hopefully get things merged soon enough to be able to finish my patch on legacy PyPI.

I'm now going to attempt inserting a Cat Gif here to cheer you up and give you the strength to review.
Sorry if you are a dog person, the dog gif is for next week.

Thanks for all your hard work !

Carreau · 2016-10-03T18:39:06Z

Morning everyone !

Is is monday again ! I know, I know it's often monday. But according to rough calculation only about 14.28% of the time ! I definitively need more samples to be sure.

Also good news it 💧 rained 💧 this morning in California ! Also my ankle did not hurt before it started raining. I'm sad cause I wanted to be like the old grandpa in movies that can predict rain just depending on whether* their rheumatism are hurting. [*or should I say weather, Haha]

I'm also kind of anti-participating to hacktoberfest: instead of submitting PRs I'm trying to get mine merged or closed as I feel too many opened PRs can be scarry to new contributors, and depressing for maintainers. I know the feeling.

Let me tell you a joke to cheer you up and give you the courage to merge this Pull request. You make it hard on me because I have to find one relevant for this Pull request...

A SQL request walk into a bar, where some of it's friends are hanging in the back.
it approaches the tables where they are having multiple row of drinks and ask:
"Can I join you ?"

Well ok, that was lame. But I hope you enjoyed it and got some courage to press the nice green button you can see below. Also if you are afraid to ask from some change or don't have the courage to write a long reply, do not worry I can take a blunt and short answer even negative one. All I want is this thing to move forward. And I know that writing a long nice message that is politically correct can be tough.

Anyway thanks for your hard work, and hopping we can move this forward.

Also I'm still unsure if you are a dog or a cat person, so I'm going to try a dog gif this time. If it does not work I'll likely try squirel, llama, rabbits, raptors or something else next week.

See you before next monday hopefully.

dstufft · 2016-10-04T12:16:48Z

Sorry, this is on my list of things to review, I've just had a really crappy couple of weeks and haven't been able to focus much.

dstufft · 2016-10-04T12:41:03Z

warehouse/migrations/versions/be4cf6b58557_add_requires_python_to_release_files.py

+            BEGIN
+                IF (TG_OP = 'INSERT'
+                        OR
+                    OLD.requires_python IS DISTINCT FROM NEW.requires_python)


I think you can ditch the IF ... THEN statement here and just rely on the trigger to only evaluate when this needs to be evaluated (see below).

Caught that the if statement ended after I tried to rebuild the db. Now it seems to successfully migrate!

dstufft · 2016-10-04T12:43:03Z

warehouse/migrations/versions/be4cf6b58557_add_requires_python_to_release_files.py

+    # release_files with the appropriate requires_python values.
+    op.execute(
+        """ CREATE TRIGGER releases_requires_python
+            AFTER INSERT OR UPDATE ON releases


If you change this to AFTER INSERT OR UPDATE OF requires_python ON releases then this trigger will only fire on either an insert, or when requires_python is updated. This would allow you to remove the logic from the procedure up above.

Oh, nice ! Thanks for the tip !

That is much more elegant; thank you!

Changes should be addressed by 9061cbe.

dstufft · 2016-10-04T12:56:18Z

warehouse/migrations/versions/be4cf6b58557_add_requires_python_to_release_files.py

+    # if someone changes the requires_python value, it is regenerated from
+    # releases.
+    op.execute(
+        """ CREATE TRIGGER release_files_requires_python


I probably wouldn't bother with this. This should be read only from the Python code in Warehouse and so it shouldn't ever got modified anyways. If it does we can always repair the data by executing:

UPDATE release_files SET requires_python = releases.requires_python FROM releases WHERE release_files.name=releases.name AND release_files.version=releases.version

again to fix it.

Does that mean that those commands should be added to the downgrade() function? Or just that there's a clean way to address the issue if it does happen to come up.

Thanks for the feedback either way!

Carreau · 2016-10-04T16:32:12Z

Sorry, this is on my list of things to review, I've just had a really crappy couple of weeks and haven't been able to focus much.

No worries, we all go through these phase. Hope things will go better for you. I was secretly hoping you were not reviewing because you liked the weekly email and were hoping to get more.

Thanks a lot for your time we really appreciate !

…trigger

dstufft

This overall looks good. One thing I missed is that you need to add this to the File module, something like:

class File(db.Model):
    ...

    requires_python = Column(Text)

    @validates("requires_python")
    def validate_requires_python(self, *args, **kwargs):
        raise RuntimeError("Cannot set File.requires_python")

mpacer · 2016-10-05T17:21:33Z

Is the validates decorator that you're suggesting the same as the sqlalchemy.orm.validates described here or is there a place in the code where you define your own validates decorator that I'm not finding?

dstufft · 2016-10-05T17:22:28Z

Yea, it's the SQLAlchemy one, and it will just make it so any attempt in the Warehouse ORM to change the value will bomb out (but the trigger will work fine).

mpacer · 2016-10-05T17:23:51Z

And it should also be included in the File module?

dstufft · 2016-10-05T17:25:12Z

It should only be on the File model. Modifying Release.requires_python is perfectly fine, the trigger will keep File.requires_python up to date, but modifying File.requires_python should trigger a validation error.

mpacer · 2016-10-05T17:26:51Z

Got it, thank you for the pointers! I wouldn't have known to construct it in this way without your help.

mpacer · 2016-10-05T17:32:40Z

I see now! The validation is accomplishing what we were trying to directly overwrite with triggers.

I agree that raising an error is a better than silently overwriting what someone did.

Also, I just realised I never imported it.

mpacer · 2016-10-05T17:43:48Z

@dstufft Does the codecov issue mean that a test needs to be added that hits the RuntimeError and expects a failure?

dstufft · 2016-10-05T17:44:23Z

Yes.

mpacer · 2016-10-05T17:51:27Z

Would that go in the tests/unit/packaging/test_models.py under the TestFile class?

dstufft · 2016-10-05T17:58:43Z

Yup.

willingc · 2016-10-05T18:07:51Z

@dstufft Thanks for helping out the Jupyter folks.

Sorry about the last few weeks. Your Python friends ❤️ all that you do. Take care. Feel free to ping any of us if we can help. You know how much I love docs ;-)

mpacer · 2016-10-05T18:52:05Z

Thank you for merging this!

When can we rely on this being available from test_pypi and pypi so we can make progress on integrating requires_python into pypi-legacy while using more efficient sql requests (in re: pypi/legacy#506)?

dstufft · 2016-10-05T19:13:06Z

Now.

The work in PR pypi#1448 was meant to replicate the information of the `requires_python` of the `release` table to the `release_files` table for efficiency when generating the list of available packages for pip. While the work on pypi#1448 seem sufficient for warehouse itself, it need to consider that legacy-pypi also access the same database, and legacy-pypi violate some constraint. In particular when using `setup.py register` followed by `twine upload`, the file upload insert files into `release_files` after inserting into `releases`. Thus the value in releases is not propagated, leading to an inconsistency and a listing in pip missing information about python-version compatibility. While I doubt there are any packages released between the merge of pypi#1448 and a fix, this update the tables, and bind an already existing trigger to update the information during insertion in `release_files`

The work in PR #1448 was meant to replicate the information of the `requires_python` of the `release` table to the `release_files` table for efficiency when generating the list of available packages for pip. While the work on #1448 seem sufficient for warehouse itself, it need to consider that legacy-pypi also access the same database, and legacy-pypi violate some constraint. In particular when using `setup.py register` followed by `twine upload`, the file upload insert files into `release_files` after inserting into `releases`. Thus the value in releases is not propagated, leading to an inconsistency and a listing in pip missing information about python-version compatibility. While I doubt there are any packages released between the merge of #1448 and a fix, this update the tables, and bind an already existing trigger to update the information during insertion in `release_files`

Carreau force-pushed the mat_view branch from 1f2c9a8 to b6ca6a0 Compare September 14, 2016 00:56

Carreau force-pushed the mat_view branch from c57d4aa to 6f2cf94 Compare September 15, 2016 04:15

Carreau changed the title ~~Create a materialized view to efficiently query requires_python with files~~ Create requires_python column on release_files to efficiently query requires_python with files Sep 15, 2016

Carreau mentioned this pull request Sep 16, 2016

Implement pep 503 for simple repository. pypi/legacy#506

Merged

dstufft requested changes Oct 4, 2016

View reviewed changes

mpacer and others added 6 commits October 4, 2016 10:12

added materialized view and triggers

722da31

add requires python to release_files

5b1c1ca

Pep8-ify code and change db refvision numbers.

6750e61

One more try for revision ID

2f04fc7

dont' test for OR

78888ce

Remove conditional statement, relativize trigger, remove unnecessary …

9061cbe

…trigger

mpacer force-pushed the mat_view branch from 6dea060 to 9061cbe Compare October 4, 2016 17:13

dstufft requested changes Oct 5, 2016

View reviewed changes

Add validation to function

ebd4a3f

dstufft approved these changes Oct 5, 2016

View reviewed changes

Add validates import

d577fb2

Add test for File.requires_python

140d51b

mpacer force-pushed the mat_view branch from 08ceb13 to 140d51b Compare October 5, 2016 18:32

dstufft merged commit 1400ce5 into pypi:master Oct 5, 2016

Carreau deleted the mat_view branch October 5, 2016 18:55

Carreau mentioned this pull request Oct 30, 2016

Practicalities section points to nonexistant information? python3statement/python3statement.github.io#53

Open

Carreau mentioned this pull request Dec 2, 2016

Prevent inconsistency between release and release_files tables. #1513

Merged

Create requires_python column on release_files to efficiently query requires_python with files #1448

Create requires_python column on release_files to efficiently query requires_python with files #1448

Conversation

Carreau commented Aug 31, 2016

Carreau commented Aug 31, 2016

Carreau commented Aug 31, 2016

di commented Aug 31, 2016

dstufft commented Aug 31, 2016

Carreau commented Aug 31, 2016

Carreau commented Sep 14, 2016

Carreau commented Sep 15, 2016

dstufft commented Sep 15, 2016

Carreau commented Sep 15, 2016

Carreau commented Sep 15, 2016

Carreau commented Sep 16, 2016 • edited Loading

mpacer commented Sep 17, 2016

Carreau commented Sep 17, 2016

Carreau commented Sep 19, 2016

Carreau commented Sep 26, 2016

Carreau commented Oct 3, 2016

dstufft commented Oct 4, 2016

dstufft Oct 4, 2016

Choose a reason for hiding this comment

mpacer Oct 4, 2016

Choose a reason for hiding this comment

dstufft Oct 4, 2016

Choose a reason for hiding this comment

Carreau Oct 4, 2016

Choose a reason for hiding this comment

mpacer Oct 4, 2016

Choose a reason for hiding this comment

dstufft Oct 4, 2016

Choose a reason for hiding this comment

mpacer Oct 4, 2016

Choose a reason for hiding this comment

Carreau commented Oct 4, 2016

dstufft left a comment

Choose a reason for hiding this comment

mpacer commented Oct 5, 2016

dstufft commented Oct 5, 2016

mpacer commented Oct 5, 2016

dstufft commented Oct 5, 2016

mpacer commented Oct 5, 2016

mpacer commented Oct 5, 2016

mpacer commented Oct 5, 2016

dstufft commented Oct 5, 2016

mpacer commented Oct 5, 2016

dstufft commented Oct 5, 2016

willingc commented Oct 5, 2016

mpacer commented Oct 5, 2016

dstufft commented Oct 5, 2016

Create `requires_python` column on `release_files` to efficiently query `requires_python` with files #1448

Create `requires_python` column on `release_files` to efficiently query `requires_python` with files #1448

Carreau commented Sep 16, 2016 •

edited

Loading