Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite-utils index-foreign-keys fails due to pre-existing index #335

Closed
zaneselvans opened this issue Nov 2, 2021 · 11 comments
Closed

sqlite-utils index-foreign-keys fails due to pre-existing index #335

zaneselvans opened this issue Nov 2, 2021 · 11 comments
Labels
bug Something isn't working

Comments

@zaneselvans
Copy link

While running the command:

sqlite-utils index-foreign-keys $SQLITE_DIR/pudl.sqlite

I got the following error:

Traceback (most recent call last):
  File "/home/zane/miniconda3/envs/pudl-dev/bin/sqlite-utils", line 8, in <module>
    sys.exit(cli())
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/sqlite_utils/cli.py", line 454, in index_foreign_keys
    db.index_foreign_keys()
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/sqlite_utils/db.py", line 902, in index_foreign_keys
    table.create_index([fk.column])
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/sqlite_utils/db.py", line 1563, in create_index
    self.db.execute(sql)
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.9/site-packages/sqlite_utils/db.py", line 421, in execute
    return self.conn.execute(sql)
sqlite3.OperationalError: index idx_generators_eia860_report_date already exists

This DB was created with the foreign key constraint PRAGMA enabled and a bunch of column-level CHECK constraints. Is this an expected behavior? Should one not try to index foreign keys if FK constraints are already being enforced within the DB?

I'm also noticing that the size of the DB after FK indexes have been added went from 483MB to 835MB, which seems like a much bigger jump than when I've done this previously.

Software versions...

  • sqlite-utils 3.17.1
  • sqlite 3.36.0
  • SQLAlchemy 1.4.26 (used to create the DB)
@simonw simonw added the bug Something isn't working label Nov 6, 2021
@simonw
Copy link
Owner

simonw commented Nov 14, 2021

This is strange - the code already checks that an index doesn't exist before attempting to create it:

def index_foreign_keys(self):
"Create indexes for every foreign key column on every table in the database."
for table_name in self.table_names():
table = self[table_name]
existing_indexes = {
i.columns[0] for i in table.indexes if len(i.columns) == 1
}
for fk in table.foreign_keys:
if fk.column not in existing_indexes:
table.create_index([fk.column])

@simonw
Copy link
Owner

simonw commented Nov 14, 2021

The "index idx_generators_eia860_report_date already exists" error suggests that the problem here is actually one of an index name collision.

                 table.create_index([fk.column]) 

This will derive a name for the index automatically from the name of the table and the name of the passed in columns:

if index_name is None:
index_name = "idx_{}_{}".format(
self.name.replace(" ", "_"), "_".join(columns)
)

So perhaps .create_index() should grow an extra option that creates the index even if the name already exists, by finding a new name.

@simonw
Copy link
Owner

simonw commented Nov 14, 2021

What would such an option be called? Some options:

  • table.create_index([fk.column], force=True) - not obvious what force means here
  • table.create_index([fk.column], ignore_existing_name=True) - not obvious what ignore means here
  • table.create_index([fk.column], pick_unique_name=True) - bit verbose

If the user doesn't pass in an explicit name it seems like their intent is "just create me the index, I don't care what name you use" - so actually perhaps the default behaviour here should be to pick a new unique name if that name is already in use.

Then maybe there should be an option to disable that - some options there:

  • table.create_index([fk.column], error_on_existing_index_name=True) - too verbose
  • table.create_index([fk.column], force=False) - not clear what force means

@simonw
Copy link
Owner

simonw commented Nov 14, 2021

I'm tempted to not provide an opt-out option either: if you call table.create_index(...) without specifying an index name I think the tool should create the index for you, quietly picking an index name that works.

But... it feels wasteful to create an index that exactly duplicates an existing index. Would SQLite even let you do that or would it notice and NOT double the amount of disk space used for that index?

@simonw
Copy link
Owner

simonw commented Nov 14, 2021

SQLite will happily create multiple identical indexes on a table, using more disk space each time:

>>> import sqlite_utils
>>> db = sqlite_utils.Database("dupes.db")
>>> db["t"].insert_all({"id": i} for i in range(10000))
<Table t (id)>
# dupes.db is 98304 bytes
>>> db["t"].create_index(["id"])
<Table t (id)>
# dupes.db is 204800 bytes
>>> db["t"].indexes
[Index(seq=0, name='idx_t_id', unique=0, origin='c', partial=0, columns=['id'])]
>>> db["t"].create_index(["id"], index_name="t_idx_t_id_2")
<Table t (id)>
# 311296 bytes
>>> db["t"].create_index(["id"], index_name="t_idx_t_id_3")
<Table t (id)>
# 417792 bytes
>>> db.vacuum()
# Still 417792 bytes

@simonw
Copy link
Owner

simonw commented Nov 14, 2021

Looking at the method signature:

def create_index(
self,
columns: Iterable[Union[str, DescIndex]],
index_name: Optional[str] = None,
unique: bool = False,
if_not_exists: bool = False,
):

if_not_exists just adds a IF NOT EXISTS clause here:

"""
CREATE {unique}INDEX {if_not_exists}[{index_name}]
ON [{table_name}] ({columns});
"""
)
.strip()
.format(
index_name=index_name,
table_name=self.name,
columns=", ".join(columns_sql),
unique="UNIQUE " if unique else "",
if_not_exists="IF NOT EXISTS " if if_not_exists else "",
)

@simonw
Copy link
Owner

simonw commented Nov 14, 2021

I'm leaning towards table.create_index(columns, ignore_existing_name=True) now.

Or resolve_existing_name - or skip_existing_name?

"ignore" sounds like it might not create the index if the name exists, but we want to still create the index but pick a new name.

@simonw
Copy link
Owner

simonw commented Nov 14, 2021

How to figure out if an index name is already in use? PRAGMA index_list(t) requires a table name. This does it:

SELECT name 
FROM sqlite_master 
WHERE type = 'index';

@simonw
Copy link
Owner

simonw commented Nov 14, 2021

I think I'll attempt to create the index and re-try if it fails with that error.

@simonw
Copy link
Owner

simonw commented Nov 14, 2021

create_index(..., find_unique_name=) is good. Default to false. index_foreign_keys can set it to true.

@simonw
Copy link
Owner

simonw commented Nov 14, 2021

OK, this should fix it.

@simonw simonw closed this as completed Nov 14, 2021
simonw added a commit that referenced this issue Nov 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants