
Fix milestones with too many repos #10186

Closed
wants to merge 4 commits into from

Conversation

@guillep2k
Member

guillep2k commented Feb 7, 2020

This is my wild attempt at fixing #9841 to (hopefully) bring the release of 1.11.0 a little closer.

Just like the problem in #9755, we're attempting queries with too many parameters. This should be solvable by rewriting the query, but I found that very difficult in this case, so I took a different approach instead:

  • For statistics collection (e.g. counting records), break the query down into smaller chunks and accumulate the results (same solution as #10176).
  • For the actual list of milestones (which is paginated), I've used a temporary table (a table that is dropped as soon as the connection is closed/reset).
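A minimal sketch of the chunking idea in Go (the helper name and batch size are hypothetical; countFn stands in for the real database call):

```go
package main

import "fmt"

// countInBatches splits the repo ID list into fixed-size batches, queries
// each batch separately, and accumulates the results, so no single query
// ever exceeds the engine's parameter limit. countFn represents the actual
// per-batch database count.
func countInBatches(repoIDs []int64, batchSize int, countFn func(ids []int64) int64) int64 {
	var total int64
	for i := 0; i < len(repoIDs); i += batchSize {
		end := i + batchSize
		if end > len(repoIDs) {
			end = len(repoIDs)
		}
		total += countFn(repoIDs[i:end])
	}
	return total
}

func main() {
	ids := make([]int64, 2500)
	for i := range ids {
		ids[i] = int64(i + 1)
	}
	// Fake count: pretend each repo contributes exactly one milestone.
	total := countInBatches(ids, 1000, func(batch []int64) int64 {
		return int64(len(batch))
	})
	fmt.Println(total) // prints 2500 (batches of 1000 + 1000 + 500)
}
```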

So the concept is, instead of using the IN (list of ids) operator, I'm using INNER JOIN with a table containing my list of ids. I only attempt this if the list of ids is big enough.

To be honest, I just wanted to test the waters on using temporary tables, which I believe could prove useful in the future. This works just fine in all database engines, so in that regard I think it's a success.

There is room for improvement (for example, the temporary tables could be managed by xorm), but for the time being I'd like to hear your reactions. 😄

Fixes #9841

@codecov-io


codecov-io commented Feb 7, 2020

Codecov Report

❗️ No coverage uploaded for pull request base (master@11995bf).
The diff coverage is 38.46%.


@@            Coverage Diff            @@
##             master   #10186   +/-   ##
=========================================
  Coverage          ?    43.4%           
=========================================
  Files             ?      577           
  Lines             ?    79715           
  Branches          ?        0           
=========================================
  Hits              ?    34603           
  Misses            ?    40837           
  Partials          ?     4275
Impacted Files Coverage Δ
models/models.go 61.71% <ø> (ø)
models/temp_table.go 0% <0%> (ø)
models/issue_milestone.go 65.79% <57.14%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 11995bf...a7c6e1e.

@GiteaBot GiteaBot added the lgtm/need 2 label Feb 7, 2020
@zeripath

Contributor

zeripath commented Feb 8, 2020

I suppose I don't know about the downsides and upsides of temp tables. Do we risk crashing the database by creating too many of these at once? Is there a garbage collection cost?

@guillep2k

Member Author

guillep2k commented Feb 9, 2020

@zeripath temp tables are used internally by the engines to perform sub-queries, so they're a pretty standard resource. They're also normally held only in memory. I've never heard of them causing problems, but I'll look into it. Anyway, how many temp tables can there be at once? Never hundreds or thousands, since we'd be creating only one or two per connection (and they disappear when the connection is released).

The unique numeric id I've implemented rules out any chance of a name collision in case we need to use more than one table in the same request. It's a global counter only because that was easier for me than keeping one per request.
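A minimal sketch of that global counter idea, assuming an atomic increment (the names here are hypothetical, not the PR's actual identifiers):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tempTableSeq is a process-wide counter. Each temporary table gets a name
// derived from the next counter value, so two tables created within the same
// request (or concurrent requests on the same process) can never collide.
var tempTableSeq uint64

func nextTempTableName() string {
	n := atomic.AddUint64(&tempTableSeq, 1)
	return fmt.Sprintf("tmp_gitea_%d", n)
}

func main() {
	fmt.Println(nextTempTableName())
	fmt.Println(nextTempTableName())
}
```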

@guillep2k

Member Author

guillep2k commented Feb 9, 2020

@zeripath I've been digging around and I've found nothing relevant to us.

I've found this article about a potential problem with InnoDB and temporary tables that don't fit in memory buffers (>16MB), but the cases mentioned are pretty extreme (like constantly banging the server with queries on >200,000 repositories). What's more, the problem is not about explicitly created temporary tables but about any kind of temporary table, including those created internally when we use subqueries, so making the temp table explicit would make no difference.

So I'd say in this case it's pretty safe.

@lunny

Member

lunny commented Feb 9, 2020

We should use a sub-query instead of repoIDs, I think.

@guillep2k

Member Author

guillep2k commented Feb 9, 2020

@lunny I agree, but it was too much refactoring for me. Maybe we should push that fix into 1.11.1?
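For context, a hedged sketch of what the sub-query rewrite might look like: let the database derive the accessible repo IDs itself instead of expanding them into a parameter list. The table and column names are assumptions for illustration, not Gitea's actual schema, and real code would use bound parameters rather than string interpolation.

```go
package main

import "fmt"

// milestoneBySubquery builds a query where the repo ID list is produced by a
// sub-query instead of being passed in as thousands of parameters.
func milestoneBySubquery(userID int64) string {
	return fmt.Sprintf(
		"SELECT m.* FROM milestone m WHERE m.repo_id IN "+
			"(SELECT repo_id FROM access WHERE user_id = %d)", userID)
}

func main() {
	fmt.Println(milestoneBySubquery(42))
}
```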

@guillep2k

Member Author

guillep2k commented Feb 10, 2020

I'm closing this one while we wait for a different (subquery) solution. I won't give up on using temporary tables for solving other problems, though, just not for this particular issue. 😄

@guillep2k guillep2k closed this Feb 10, 2020