Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance run table query performance with indices #1277

Merged
merged 15 commits into from
Oct 4, 2018

Conversation

geoffroth
Copy link
Contributor

Changes proposed in this pull request:

  • Adds two indices to the experiments.db runs table, one on the "exp_id" column and one on the "guid" column.
    This should enhance performance (especially in large tables) for queries that retrieve data by either column. Indices are DESC based on the assumption that we will more likely be querying higher / more recent exp_ids and guids more often than older data.
  • For clarity, indices definitions are grouped directly below the table definitions they modify.
  • For convenience, and to hopefully avoid errors, indices are created after any necessary migrations have completed.

@WilliamHPNielsen

@jenshnielsen
Copy link
Collaborator

I think this fails the tests because it's trying to add the index to the runs table with out a guid colloum in the upgrade 0 to 1 test. I think making this an upgrade too would solve the problem. @WilliamHPNielsen I don't know if that in this case should go in before the dependencies upgrade

@WilliamHPNielsen
Copy link
Contributor

@jenshnielsen, I agree that this should be a distinct upgrade, and I think it makes sense that this PR gets database version number 2, since it is already done.

@geoffroth should I make this into an upgrade or will you do that?

@WilliamHPNielsen
Copy link
Contributor

Also: great PR, much appreciated!

If anyone has time for benchmarking, it would be great to see if we do get the expected O(log(n)) instead of O(n) on selected queries.

@geoffroth
Copy link
Contributor Author

geoffroth commented Sep 20, 2018

@WilliamHPNielsen Since it's just a few lines, would it make sense for you to perhaps simply incorporate this or similar into the db refactor you're currently working on? (I had already finished most of this before I knew about your work-in-progress). Otherwise, I'd certainly be happy to rewrite as upgrade 2.

@WilliamHPNielsen
Copy link
Contributor

@geoffroth, my refactor is still one or two weeks out in the future, so I would prefer to have this be a separate upgrade that we can put in now. Thanks.

@codecov
Copy link

codecov bot commented Sep 25, 2018

Codecov Report

Merging #1277 into master will increase coverage by 0.02%.
The diff coverage is 84.21%.

@@            Coverage Diff             @@
##           master    #1277      +/-   ##
==========================================
+ Coverage   72.51%   72.54%   +0.02%     
==========================================
  Files          74       74              
  Lines        8438     8456      +18     
==========================================
+ Hits         6119     6134      +15     
- Misses       2319     2322       +3

Copy link
Contributor

@WilliamHPNielsen WilliamHPNielsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Let get this in.

@WilliamHPNielsen WilliamHPNielsen merged commit d7605eb into microsoft:master Oct 4, 2018
giulioungaretti pushed a commit that referenced this pull request Oct 4, 2018
Merge: 1868289 756c89f
Author: William H.P. Nielsen <whpn@mailbox.org>

    Merge pull request #1277 from geoffroth/fix/run_indices


GIT_HASHES: Dict[int, str] = {0: '78d42620fc245a975b5a615ed5e33061baac7846',
1: '056d59627e22fa3ca7aad4c265e9897c343f79cf'}

DB_NAMES: Dict[int, List[str]] = {0: ['']}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nein. Should have been removed.

# Version 0: the original table schema, runs, experiments, layouts,
# dependencies, result-tables
#
# Version 1: a GUID column is added to the runs table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is description of version 2 missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will add that when we need it, i.e. when we have version 3 on the table.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants