New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Adds versions to Asset databases #815

Merged
merged 5 commits into from Nov 12, 2015

Conversation

Projects
None yet
5 participants
@jfkirk
Contributor

jfkirk commented Nov 4, 2015

Adds a 'version' table to the asset database, and adds a version check to the AssetDBWriters and AssetFinders.

The version table of the asset database
expected_version : int
The expected version of the asset database
write_version : bool, optional

This comment has been minimized.

@ssanderson

ssanderson Nov 4, 2015

Member

When would you want this to be true?

This comment has been minimized.

@jfkirk

jfkirk Nov 4, 2015

Contributor

When initializing the database -- the AssetDBWriter will write the version in to the table if it is not present.

@jfkirk jfkirk force-pushed the asset-db-versioning branch from e528f3b to 3e940e3 Nov 4, 2015

# Define a version number for the database generated by these writers
# Increment this version number any time a breaking change is made to the
# schema and readers of the database
ASSET_DB_VERSION = 1

This comment has been minimized.

@StewartDouglas

StewartDouglas Nov 4, 2015

Contributor

Since we're calling the SQLite file assets.db, I suggest we call this variable ASSETS_DB_VERSION. Also, how about starting at 0.

This comment has been minimized.

@jfkirk

jfkirk Nov 4, 2015

Contributor

I went back and forth on that - the file is called assets.db, but everything else that references it is fairly uniformly named 'AssetDBBlah' so I stuck with the singular. I could go either way, though.

This comment has been minimized.

@StewartDouglas

StewartDouglas Nov 4, 2015

Contributor

I have a slight preference for ASSETS_DB_VERSION, but am happy to go either way. If you stick with ASSET you should update the following line: https://github.com/quantopian/zipline/pull/815/files#diff-33cdc2260291626609681cc61297818aR192

sa.Column(
'version',
sa.Integer,
unique=True,

This comment has been minimized.

@richafrank

richafrank Nov 4, 2015

Member

I think we want this to be a primary key, instead of just unique?

This comment has been minimized.

@jfkirk

jfkirk Nov 4, 2015

Contributor

Makes sense - thanks!

"""
# Read the version out of the table
table_version = sa.select((version_table.c.version,)).scalar()

This comment has been minimized.

@richafrank

richafrank Nov 4, 2015

Member

Just to confirm, this table will have either a single row or no rows?

This comment has been minimized.

@richafrank

richafrank Nov 4, 2015

Member

Does table_version mean the version of the table?

This comment has been minimized.

@jfkirk

jfkirk Nov 4, 2015

Contributor

The table will have a single row or no rows. Do you know of any way to enforce that on the table?

It means the version in the table - I'll rename it to 'version_in_table' or something more clear.

This comment has been minimized.

@ssanderson

ssanderson Nov 4, 2015

Member

@jfkirk you can do this with a CHECK constraint and a primary key:

sqlite> create table test (id INTEGER PRIMARY KEY AUTOINCREMENT CHECK (id <= 1), version int);
sqlite> insert into test (version) VALUES (1);
sqlite> insert into test (version) VALUES (2);
Error: CHECK constraint failed: test

This comment has been minimized.

@jfkirk

jfkirk Nov 4, 2015

Contributor

@ssanderson Perfect, thank you!

@jfkirk jfkirk force-pushed the asset-db-versioning branch from 3e940e3 to d6b974b Nov 4, 2015

@jfkirk

This comment has been minimized.

Contributor

jfkirk commented Nov 4, 2015

@ssanderson I've added a primary key ID to the version table and implemented a constraint as you suggested

@richafrank I updated the 'table_version' variable to 'version_from_table' for clarity

@StewartDouglas The initial version is now 0

@jfkirk jfkirk force-pushed the asset-db-versioning branch from d2b6c04 to 8fb73f8 Nov 4, 2015

@jfkirk

This comment has been minimized.

Contributor

jfkirk commented Nov 4, 2015

@ssanderson I split up version checking and version insertion since they are thematically distinct

version_found = check_version_info(self.version_table,
ASSET_DB_VERSION)
if not version_found:
write_version_info(self.version_table, ASSET_DB_VERSION)

This comment has been minimized.

@richafrank

richafrank Nov 4, 2015

Member

Could it be the case that we've got an old assets.db with no embedded version info, we then call init_db, which doesn't create anything because we use checkfirst=True, and then we insert the latest db version? Seems like we should only write the version info on first creation of the all the tables, not even just the version table (since create_all will create any that are missing)?

@jfkirk jfkirk force-pushed the asset-db-versioning branch 2 times, most recently from 4e44fa4 to f2e92cf Nov 9, 2015

True if any tables are present, otherwise False.
"""
conn = engine.connect()
for table_name in table_names:

This comment has been minimized.

@StewartDouglas

StewartDouglas Nov 9, 2015

Contributor

I think lines 357-360 are the equivalent of doing the following:

return engine.dialect.has_table(conn, table_names[0])

This is because if any of the tables have been created, it implies that they all have been. This is because init_db is called within an SQL transaction.

This comment has been minimized.

@jfkirk

jfkirk Nov 10, 2015

Contributor

In the case of a schema change, table_names[0] may not exist anymore but we'd want this method to return True. The only way I could think of to guarantee this is by checking all table names. It will return after the first table_name, anyway, which would mean that in the nominal case it is identical in execution to

return engine.dialect.has_table(conn, table_names[0])

This comment has been minimized.

@StewartDouglas

StewartDouglas Nov 10, 2015

Contributor

Could it be the case that all table names may change, and then this method would return false, even though tables are present? If we don't care about the names of the tables then we could use SQLAlchemy's Inspector to check whether any tables are present:

from sqlalchemy.engine import reflection
...
insp = reflection.Inspector.from_engine(self.eng)
return True if insp.get_table_names() else False

http://docs.sqlalchemy.org/en/rel_0_7/core/schema.html?highlight=inspector#sqlalchemy.engine.reflection.Inspector

This comment has been minimized.

@jfkirk

jfkirk Nov 10, 2015

Contributor

To cause this method to fail to meet its contract, all table names would have to be changed including the version table, which I see as a catastrophic change likely meaning this method would be gone as well. By my logic there, though, we could change this method to only check for the version_info table. The reason I don't do that, though, is I still want this method to return True if the other non-version_info tables are present but no version_info is present because that is our backwards-compatible case.

@jfkirk jfkirk force-pushed the asset-db-versioning branch 2 times, most recently from dcc339b to c93f489 Nov 10, 2015

@@ -38,7 +38,8 @@ None
Maintenance and Refactorings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
None
* Asset databases now contain version information to ensure compatibility
with current Zipline version.

This comment has been minimized.

@jfkirk

jfkirk Nov 11, 2015

Contributor

@llllllllll I added a whatsnew in its own commit - does that follow the PR dance you mentioned?

@jfkirk jfkirk force-pushed the asset-db-versioning branch from c93f489 to 14f5d9b Nov 12, 2015

@jfkirk jfkirk force-pushed the asset-db-versioning branch from 14f5d9b to 7a1abc8 Nov 12, 2015

jfkirk added a commit that referenced this pull request Nov 12, 2015

Merge pull request #815 from quantopian/asset-db-versioning
ENH: Adds versions to Asset databases

@jfkirk jfkirk merged commit f730034 into master Nov 12, 2015

2 checks passed

Scrutinizer 13 updated code elements
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jfkirk jfkirk deleted the asset-db-versioning branch Nov 12, 2015

@coveralls

This comment has been minimized.

coveralls commented Sep 5, 2017

Coverage Status

Changes Unknown when pulling 7a1abc8 on asset-db-versioning into ** on master**.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment