Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only set .last_rowid and .last_pk for single update/inserts, not for .insert_all()/.upsert_all() with multiple records #98

Closed
simonw opened this issue Apr 10, 2020 · 7 comments
Labels
bug Something isn't working

Comments

@simonw
Copy link
Owner

simonw commented Apr 10, 2020

No description provided.

@simonw simonw added the bug Something isn't working label Apr 10, 2020
@simonw
Copy link
Owner Author

simonw commented Apr 10, 2020

Investigate this traceback:

Traceback (most recent call last):
  File "fetch_projects.py", line 60, in <module>
    fetch_projects(db, token)
  File "fetch_projects.py", line 41, in fetch_projects
    db["projects"].upsert(project, pk="id")
  File "/Users/simonw/.local/share/virtualenvs/big-local-datasette-2jT6nJCT/lib/python3.7/site-packages/sqlite_utils/db.py", line 1139, in upsert
    conversions=conversions,
  File "/Users/simonw/.local/share/virtualenvs/big-local-datasette-2jT6nJCT/lib/python3.7/site-packages/sqlite_utils/db.py", line 1168, in upsert_all
    upsert=True,
  File "/Users/simonw/.local/share/virtualenvs/big-local-datasette-2jT6nJCT/lib/python3.7/site-packages/sqlite_utils/db.py", line 1107, in insert_all
    row = list(self.rows_where("rowid = ?", [self.last_rowid]))[0]
IndexError: list index out of range

@simonw simonw changed the title I think .upsert_all() with an empty list may throw an error list index out of range relating to self.last_rowid in some upserts Apr 10, 2020
@simonw
Copy link
Owner Author

simonw commented Apr 10, 2020

I need a test that reproduces this.

@simonw
Copy link
Owner Author

simonw commented Apr 13, 2020

I have a hunch that the root of the problem here is that accessing result.lastrowid during my version of an .upsert() doesn't actually make sense:

with self.db.conn:
for query, params in queries_and_params:
try:
result = self.db.conn.execute(query, params)
except OperationalError as e:
if alter and (" column" in e.args[0]):
# Attempt to add any missing columns, then try again
self.add_missing_columns(chunk)
result = self.db.conn.execute(query, params)
else:
raise
self.last_rowid = result.lastrowid

In the bug I'm seeing (which I still haven't reduced to a reproducible test) the debugger shows me this at that point:

(Pdb) query
'UPDATE [files] SET [createdAt] = ?, [ext] = ?, [updatedAt] = ?, [uri] = ?, [uriType] = ? WHERE [project] = ? AND [name] = ?'
(Pdb) params
['2020-03-04T04:04:40.152000+00:00', 'csv', '2020-03-04T04:04:40.152000+00:00', 'https://storage.googleapis.com/bln_prod/...', 'download', 'UHJvamVjdDo4MTgyMjU2Ny01ZjI0LTQxM2ItYWZmNi05NTlmNGY3MjExMjI=', 'loans_to_documentation.csv']
(Pdb) result.lastrowid
100

But here's the weird thing... there's no row in the table with a rowid of 100!

(Pdb) [r['rowid'] for r in self.db.execute_returning_dicts('select rowid, * from files')]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,
74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

So what the heck is going on?

The last SQL statement I executed here was an UPDATE. The lastrowid docs say: https://kite.com/python/docs/sqlite3.Cursor.lastrowid

This read-only attribute provides the rowid of the last modified row. It is only set if you issued a INSERT statement using the execute() method. For operations other than INSERT or when executemany() is called, lastrowid is set to None.

So where did that 100 come from? It should be None!

@simonw
Copy link
Owner Author

simonw commented Apr 13, 2020

Why do I even care about lastrowid here?

I'm trying to ensure that after you insert or upsert a row you can use table.last_pk to start doing things like building additional foreign key relationships.

So maybe it doesn't make sense to make .last_pk available at all for cases where you called .upsert_all() or .insert_all() - it should just be populated for .upsert() and .insert().

The documentation doesn't say it should work for .upsert_all() - it's only documented for the single actions.

self.last_rowid = result.lastrowid
self.last_pk = self.last_rowid
# self.last_rowid will be 0 if a "INSERT OR IGNORE" happened
if (hash_id or pk) and self.last_rowid:
row = list(self.rows_where("rowid = ?", [self.last_rowid]))[0]
if hash_id:
self.last_pk = row[hash_id]
elif isinstance(pk, str):
self.last_pk = row[pk]
else:
self.last_pk = tuple(row[p] for p in pk)
return self

@simonw
Copy link
Owner Author

simonw commented Apr 13, 2020

In mucking around with sqlite3 it looks like result.lastrowid is indeed populated for UPDATE - in this case with the last inserted rowid in the table. This differs from the documented behaviour I linked to above.

In [1]: import sqlite3                                                          

In [2]: c = sqlite3.connect(":memory:")                                         

In [3]: c                                                                       
Out[3]: <sqlite3.Connection at 0x103c4d490>

In [4]: c.execute('create table foo (bar integer);')                            
Out[4]: <sqlite3.Cursor at 0x103f760a0>

In [5]: c.execute('insert into foo (bar) values (1)')                           
Out[5]: <sqlite3.Cursor at 0x103f76650>

In [6]: c.execute('select * from foo').fetchall()                               
Out[6]: [(1,)]

In [7]: c.execute('insert into foo (bar) values (1)')                           
Out[7]: <sqlite3.Cursor at 0x103f766c0>

In [8]: c.execute('select * from foo').fetchall()                               
Out[8]: [(1,), (1,)]

In [9]: c.execute('insert into foo (bar) values (1)').lastrowid                 
Out[9]: 3

In [10]: c.execute('select * from foo').fetchall()                              
Out[10]: [(1,), (1,), (1,)]

In [11]: c.execute('select rowid, bar from foo').fetchall()                     
Out[11]: [(1, 1), (2, 1), (3, 1)]

In [12]: c.execute('insert into foo (bar) values (1)').lastrowid                
Out[12]: 4

In [13]: c.execute('select rowid, bar from foo').fetchall()                     
Out[13]: [(1, 1), (2, 1), (3, 1), (4, 1)]

In [14]: r = c.execute('update foo set bar =2 where rowid = 1')                 

In [15]: r.lastrowid                                                            
Out[15]: 4

In [16]: c.execute('select rowid, bar from foo').fetchall()                     
Out[16]: [(1, 2), (2, 1), (3, 1), (4, 1)]

In [17]: r = c.execute('select rowid, bar from foo')                            

In [18]: r.fetchall()                                                           
Out[18]: [(1, 2), (2, 1), (3, 1), (4, 1)]

In [19]: r.lastrowid                                                            
Out[19]: 4

@simonw
Copy link
Owner Author

simonw commented Apr 13, 2020

Implementation plan: .insert_all() and .upsert_all() should only set .last_rowid and last_pk if they were called with a single item.

@simonw simonw changed the title list index out of range relating to self.last_rowid in some upserts Only set .last_rowid and .last_pk for single update/inserts, not for .insert_all()/.upsert_all() with multiple records Apr 13, 2020
@simonw simonw closed this as completed Apr 13, 2020
simonw added a commit that referenced this issue Apr 13, 2020
Refs #96. Refs #98. Closes #97.
@patricktrainer
Copy link

Hi @simonw - wondering if you might be able to shed some light here. I've seemed to reproduce this issue.
Here's the stacktrace:

...     db["potholes"].insert(pothole, pk='id', alter=True, replace=True)
... 

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/patricktrainer/.pyenv/versions/3.9.0/lib/python3.9/site-packages/sqlite_utils/db.py", line 2481, in insert
    return self.insert_all(
  File "/Users/patricktrainer/.pyenv/versions/3.9.0/lib/python3.9/site-packages/sqlite_utils/db.py", line 2596, in insert_all
    self.insert_chunk(
  File "/Users/patricktrainer/.pyenv/versions/3.9.0/lib/python3.9/site-packages/sqlite_utils/db.py", line 2424, in insert_chunk
    row = list(self.rows_where("rowid = ?", [self.last_rowid]))[0]
IndexError: list index out of range

Interesting enough, I found that omitting the pk param does not throw the error.

Let me know how I can help out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants