Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

accessibleRepositoryCondition makes an awful "where condition" in SQL #10457

Open
2 of 7 tasks
bennyscetbun opened this issue Feb 25, 2020 · 8 comments
Open
2 of 7 tasks
Labels
issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented performance/speed performance issues with slow downs

Comments

@bennyscetbun
Copy link

  • Gitea version (or commit ref): 06cd3e0
  • Operating system: docker image
  • Database (use [x]):
    • PostgreSQL
    • MySQL
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Yes (provide example URL)
    • No
    • Not relevant

Description

Hi we have a user with more than 200k repo and more than 600k action on them.
We cannot connect on the website anymore with this user
We found out that the issue was from a request that make the front end time out.
We ve been searching in the code. We ve found the request

SELECT "id", "user_id", "op_type", "act_user_id", "repo_id", "comment_id", "is_deleted", "ref_name", "is_private", "content", "created_unix" FROM "action"
WHERE
    repo_id IN (SELECT id FROM repository WHERE
        (
            "repository".is_private=false AND
            ("repository".owner_id NOT IN (SELECT id FROM "user" WHERE type=1) OR "repository".owner_id NOT IN (SELECT id FROM "user" WHERE visibility IN (2)))
        ) OR
        "repository".id IN (SELECT repo_id FROM "access" WHERE user_id=1 AND mode>0) OR
        "repository".id IN (SELECT id FROM "repository" WHERE owner_id=1) OR
        "repository".id IN (SELECT "team_repo".repo_id FROM team_repo INNER JOIN team_user ON "team_user".team_id = "team_repo".team_id WHERE "team_user".uid=1)
    ) AND
    user_id=1 AND
    is_deleted=false
ORDER BY "id" DESC LIMIT 20

The main issue here (at least in postgres) is the over use of OR and the use of IN with some sub queries.
To fix it we ve used some subqueries with UNION
For info we let the first query run for more than 20min without a response.
Our new query is less than 5 seconds.
We haven't try with all the different type of databases. Do you think there will be issues with other DB?
Do you see any other better/good solution?
Should we make a pull request?

Thanks
...

Screenshots

I am pretty sure you know what a 504 is :)

@guillep2k
Copy link
Member

Hi, this will hopefully be fixed by #9787, but I couldn't advance with it because there are other PRs it's depending on that are still waiting for approval.

Currently xorm produces invalid code for SQLite regarding unions, unfortunately.

@bennyscetbun
Copy link
Author

We tried to keep the logic of your code as much as we could.
If you want to check the changes that we ve done and maybe inspire yourself.
42School@ee77387

Regarding SQLite, we've ran the test after we wrote the issue. And yes. Unions aren't working with xorm and SQLite :)

@zeripath
Copy link
Contributor

zeripath commented Mar 3, 2020

Some of the INs could probably be coalesced without the use of Union I think. When I wrote the code the idea was readability over efficiency - as ever the way people use Gitea always exceeds expectations.

If you can come up with a way of making sqlite work your fix would be useful - if only as a staging step to spur @guillep2k to finish #9787 😛

@bennyscetbun
Copy link
Author

The real issue here, IMHO, are more the OR than the IN.
We've cut the requests in small pieces and removing the OR was the real win.

@zeripath
Copy link
Contributor

zeripath commented Mar 3, 2020

If you move the some of the ORs into the INs so there is only one or very few INs that will likely be much quicker.

@zeripath
Copy link
Contributor

zeripath commented Mar 3, 2020

For example the

            ("repository".owner_id NOT IN (SELECT id FROM "user" WHERE type=1) OR "repository".owner_id NOT IN (SELECT id FROM "user" WHERE visibility IN (2)))

Could be simplified to:

("repository".owner_id NOT IN (SELECT id FROM "user" WHERE type=1 OR visibility IN (2)))

And so on...

@zeripath
Copy link
Contributor

zeripath commented Mar 3, 2020

The assumption was that the inefficiency was unlikely to bite because who would have 200k repositories...

@stale
Copy link

stale bot commented May 3, 2020

This issue has been automatically marked as stale because it has not had recent activity. I am here to help clear issues left open even if solved or waiting for more insight. This issue will be closed if no further activity occurs during the next 2 weeks. If the issue is still valid just add a comment to keep it alive. Thank you for your contributions.

@stale stale bot added the issue/stale label May 3, 2020
@lunny lunny added performance/speed performance issues with slow downs issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented and removed type/bug labels May 3, 2020
@stale stale bot removed issue/stale labels May 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented performance/speed performance issues with slow downs
Projects
None yet
Development

No branches or pull requests

4 participants