Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up the tag index page #7671

Merged
merged 2 commits into from Nov 17, 2020
Merged

Conversation

davidfischer
Copy link
Contributor

Excluding spam projects from the tag index page really slows it down. This involves a project-project self join and then joining against users and the user profile. Each of these tables is a few 100k records. In the worst case, this page can take ~2s to load which can be very bad when the page is crawled repeatedly.

This PR simply removes filtering out spam projects which is the expensive part of this query.

  • PRO: Speeds the page up drastically without getting rid of it. This page is one of the main ways for projects to get indexed by search engines. It also helps link projects to similar projects. In my tests, this sped things up by over an order of magnitude even in cases where the returned result didn't change from the spam filtering.
  • CON: Spam projects might show up in the tag index. This probably wouldn't be noticed by users since spam projects don't have real tags that users might browse. However, it might help spam projects to get picked up by search engines although there are other ways search engines can find spam projects. This is mitigated significantly since we now have automated cleanup of spam projects. In total, there are ~2.7k spam projects that aren't autocleaned.

Excluding spam projects from the tag index page really slows it down.
This involves a project-project self join and then joining against
users and the user profile. Each of these tables is a few 100k records.
@davidfischer davidfischer requested a review from a team November 17, 2020 06:00
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this solution. We are not doing a really good job by marking users/projects as spam anyways, so I'm sure that we are already showing spam projects on this listing currently.

Excluding projects from banned users is a nice to have that sometimes is taking our site down. At this moment, I prefer to show spam projects in that listing than taking the site down because of this.

Besides, if we do a better job by flagging users as spam, their projects will be automatically deleted by a task and they won't appear here anyways.

I may suggest to add a small comment about why we remove this filter, or even keep the line commented out together with the comment.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think this is fine. We need to have better methods of making users as spam, or even denormalizing it onto the Project if we care about query speed.

Agreed with @humitos that it would be good to keep this line commented out explaining why it's been removed.

@davidfischer
Copy link
Contributor Author

I re-added the code with a comment

@davidfischer davidfischer merged commit 6849d74 into master Nov 17, 2020
3 checks passed
@davidfischer davidfischer deleted the davidfischer/optimize-tag-index branch November 17, 2020 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants