Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce memory consumption of uploads and reindex task #3470

Merged
merged 5 commits into from
Apr 12, 2018

Conversation

ewdurbin
Copy link
Member

@ewdurbin ewdurbin commented Mar 30, 2018

@ewdurbin ewdurbin requested a review from dstufft March 30, 2018 11:08
Release.created))
.order_by(Release._pypi_ordering.desc())
.all(),
key=lambda r: parse_version(r.version),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be redundant, sorting by Release._pypi_ordering should be the same as sorting by parse_version(r.version).

return sorted(
orm.object_session(self).query(Release)
.filter(Release.project == self)
.options(orm.load_only(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having load_only only a "generic" name like all_releases or latest_release makes me nervous because I feel like it's very easy to trigger N+1 query problems if this gets used in another context without realizing it, since the name doesn't imply anything about the fact it's limiting the data available.

What information do we actually use out of these? I don't remember what the templates do, but it looks like the search indexing really only cares about version numbers, is it the same for the templates? If it is maybe something like:

@property
def all_versions(self):
    return (orm.object_session(self)
               .query(Release.version)
               .filter(Release.project == self)
               .order_by(Release._pypi_ordering.desc())
               .all())

Would be enough to do the trick? We'd probably want a list() or something in there so the return type is just ["1.0", "2.0", 3.0"] in whatever order we expect this in.

Another thing to think about is making these just functions that aren't attached to the Project object, I'm not sure if that ends up being clearer or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forklift/legacy.py also needs to pull in all_releases to re enumerate _pypi_ordering. I guess it'd be OK to do what's proposed here and inline the query for that.

def latest_release(self):
return first(
sorted(
self.all_releases,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this is going to cause things that already use Project.all_releases to re-execute the all_releases query instead of re-using the already fetched values like it did previously. If we're going to end up executing a second query, we might as well handle all of the sorting in SQL. I also ask a similar question, is the only thing we really care about getting the latest version? then maybe this would be better off specified as:

@property
def latest_version(self):
    return (orm.object_session(self)
               .query(Release.version)
               .filter(Release.project = self)
               .order_by(
                   Release.is_prerelease.nullslast(),
                   Release._pypi_ordering.desc())
               .first())

default=release.project.releases[0],
).version
obj["version"] = [r.version for r in release.project.all_releases]
obj["latest_version"] = release.project.latest_release.version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget why I did this, but is the latest version not already the one being passed into this method as release? If it is then this is redundant (and with these changes, causing an additional query to be executed).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously it skipped prereleases, so I kept that logic in place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that raises a bit of a point, should we use the other data from a pre-release here? Is it confusing to have say the summary and readme from a pre-release being shown with the latest non pre-release version?

I think maybe the version selection logic for indexing should use the same nullslast, order by logic that I posted above that will give the first non-prerelease, unless there is only pre-releases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed.

@ewdurbin ewdurbin force-pushed the lightweight_releases branch 2 times, most recently from 5d83acd to 9c6a748 Compare March 30, 2018 21:27
@ewdurbin ewdurbin changed the title move all_releases/latest_release logic into Project reduce memory consumption of uploads and reindex task Mar 30, 2018
@ewdurbin ewdurbin dismissed dstufft’s stale review April 11, 2018 12:03

addressed review.

@ewdurbin ewdurbin requested a review from di April 11, 2018 12:37
@ewdurbin
Copy link
Member Author

I think the biggest drawback to this implementation is the N+1 query situation introduced by calling all_versions against each Project.

Unfortunately I wasn't able to find a sane way to get stable memory consumption doing it all in one JOIN.

@ewdurbin
Copy link
Member Author

Appears this may still not be working against the full database :/

@ewdurbin ewdurbin merged commit 78cd8a9 into master Apr 12, 2018
@ewdurbin ewdurbin deleted the lightweight_releases branch April 12, 2018 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants