Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema introspection (SHOW FULL COLUMNS ...) became very slow #961

Open
gregsh opened this issue Sep 19, 2019 · 6 comments
Open

Schema introspection (SHOW FULL COLUMNS ...) became very slow #961

gregsh opened this issue Sep 19, 2019 · 6 comments
Labels
blocked Some other issue is blocking this bug Something isn't working performance Performance improvements

Comments

@gregsh
Copy link

gregsh commented Sep 19, 2019

In gitbase 0.20 schema introspection is fast and full.

MySQL Connector/J JDBC metadata call that gets all columns for all tables at once
metaData.getColumns("gitbase", "", "%", "%")
is converted to calls like the following for each table
SHOW FULL COLUMNS FROM `commit_trees` FROM `gitbase` LIKE '%'"

In 0.23 and 0.24-rc the above queries are very slow (several minutes) and even fail for some tables completely in 0.23 (0.24 seems to fix that).

The above prevents from using gitbase in DB tools like JetBrains DataGrip.

@agarciamontoro
Copy link
Contributor

Thank you for the report, @gregsh! We'll take a look as soon as we can.

@agarciamontoro agarciamontoro added bug Something isn't working performance Performance improvements labels Sep 19, 2019
@erizocosmico erizocosmico self-assigned this Sep 24, 2019
@erizocosmico
Copy link
Contributor

Can you provide a step-by-step example we can reproduce? What operating system are you using?

I thought I reproduced it so I started investigating, but everything seemed to work in tests and when I manually wrote the query. Then I realised the trailing " in your query, which was what made it wait forever expecting the end of the query.
So, after all, I cannot reproduce this problem.

@gregsh
Copy link
Author

gregsh commented Sep 24, 2019

I'm introspecting gitbase schema using soon to be released version of DataGrip and Mysql Connector/J 8.0.15.

Here's the part of log, the ending quote does not belong to the query.

DEBU[31609] executing query query="/* ApplicationName=DataGrip 2019.3 Snapshot */ SHOW FULL COLUMNS FROM `blobs` FROM `gitbase` LIKE '%'" INFO[31609] audit trail action=authorization address="127.0.0.1:58216" connection_id=15 permission=read pid=79 query="/* ApplicationName=DataGrip 2019.3 Snapshot */ SHOW FULL COLUMNS FROM `blobs` FROM `gitbase` LIKE '%'" success=true system=audit user=guest INFO[31609] audit trail action=query address="127.0.0.1:58216" connection_id=15 duration="972.847µs" pid=79 query="/* ApplicationName=DataGrip 2019.3 Snapshot */ SHOW FULL COLUMNS FROM `blobs` FROM `gitbase` LIKE '%'" success=true system=audit user=guest

@erizocosmico
Copy link
Contributor

Still cannot reproduce this.
I just connected using MySQL connector/J 8.0.15 with the following code:

    val connection = DriverManager.getConnection(
      "jdbc:mysql://%s?serverTimezone=UTC".format("localhost:3306"),
      "root",
      ""
    )

    val meta = connection.getMetaData

    val rs = meta.getColumns("gitbase", "", "%", "%")
    while (rs.next()) {
      println(rs.getString(3), rs.getString(4), rs.getString(5))
    }

And it prints

(blobs,repository_id,-1)
(blobs,blob_hash,12)
(blobs,blob_size,4)
(blobs,blob_content,-4)
(commit_blobs,repository_id,-1)
(commit_blobs,commit_hash,12)
(commit_blobs,blob_hash,12)
(commit_files,repository_id,-1)
(commit_files,commit_hash,12)
(commit_files,file_path,-1)
(commit_files,blob_hash,12)
(commit_files,tree_hash,12)
(commit_trees,repository_id,-1)
(commit_trees,commit_hash,12)
(commit_trees,tree_hash,12)
(commits,repository_id,-1)
(commits,commit_hash,12)
(commits,commit_author_name,-1)
(commits,commit_author_email,12)
(commits,commit_author_when,93)
(commits,committer_name,-1)
(commits,committer_email,12)
(commits,committer_when,93)
(commits,commit_message,-1)
(commits,tree_hash,12)
(commits,commit_parents,1111)
(files,repository_id,-1)
(files,file_path,-1)
(files,blob_hash,12)
(files,tree_hash,12)
(files,tree_entry_mode,12)
(files,blob_content,-4)
(files,blob_size,4)
(ref_commits,repository_id,-1)
(ref_commits,commit_hash,12)
(ref_commits,ref_name,-1)
(ref_commits,history_index,4)
(refs,repository_id,-1)
(refs,ref_name,-1)
(refs,commit_hash,12)
(remotes,repository_id,-1)
(remotes,remote_name,-1)
(remotes,remote_push_url,-1)
(remotes,remote_fetch_url,-1)
(remotes,remote_push_refspec,-1)
(remotes,remote_fetch_refspec,-1)
(repositories,repository_id,-1)
(tree_entries,repository_id,-1)
(tree_entries,tree_entry_name,-1)
(tree_entries,blob_hash,12)
(tree_entries,tree_hash,12)
(tree_entries,tree_entry_mode,12)

In 413ms.

Tried both with master version and v0.24.0-rc1 using a docker container.

@gregsh
Copy link
Author

gregsh commented Sep 24, 2019

I have found that it depends on the size of a repository. On a small one (70MB .git) it is fast, but on a large one (9GB .git) it is extremely slow.

@erizocosmico
Copy link
Contributor

I just tried with the linux kernel and pytorch and it still takes milliseconds. Introspection never touches the repositories, so it should not depend on them.

Do you have indexes created in gitbase? Perhaps it's the indexes checksum when starting the server.

@erizocosmico erizocosmico added the blocked Some other issue is blocking this label Sep 27, 2019
@erizocosmico erizocosmico removed their assignment Oct 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Some other issue is blocking this bug Something isn't working performance Performance improvements
Projects
None yet
Development

No branches or pull requests

3 participants