Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upsert_all() throws issue when upserting to empty table #73

Closed
psychemedia opened this issue Jan 5, 2020 · 6 comments
Closed

upsert_all() throws issue when upserting to empty table #73

psychemedia opened this issue Jan 5, 2020 · 6 comments

Comments

@psychemedia
Copy link

psychemedia commented Jan 5, 2020

If I try to add a list of dicts to an empty table using upsert_all, I get an error:

import sqlite3
from sqlite_utils import Database
import pandas as pd

conx = sqlite3.connect(':memory')
cx = conx.cursor()
cx.executescript('CREATE TABLE "test" ("Col1" TEXT);')

q="SELECT * FROM test;"
pd.read_sql(q, conx) #shows empty table

db = Database(conx)
db['test'].upsert_all([{'Col1':'a'},{'Col1':'b'}])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-74-8c26d93d7587> in <module>
      1 db = Database(conx)
----> 2 db['test'].upsert_all([{'Col1':'a'},{'Col1':'b'}])

/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in upsert_all(self, records, pk, foreign_keys, column_order, not_null, defaults, batch_size, hash_id, alter, extracts)
   1157             alter=alter,
   1158             extracts=extracts,
-> 1159             upsert=True,
   1160         )
   1161 

/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in insert_all(self, records, pk, foreign_keys, column_order, not_null, defaults, batch_size, hash_id, alter, ignore, replace, extracts, upsert)
   1040                     sql = "INSERT OR IGNORE INTO [{table}]({pks}) VALUES({pk_placeholders});".format(
   1041                         table=self.name,
-> 1042                         pks=", ".join(["[{}]".format(p) for p in pks]),
   1043                         pk_placeholders=", ".join(["?" for p in pks]),
   1044                     )

TypeError: 'NoneType' object is not iterable

A hacky workaround in use is:

try:
  db['test'].upsert_all([{'Col1':'a'},{'Col1':'b'}])
except:
  db['test'].insert_all([{'Col1':'a'},{'Col1':'b'}])
@simonw
Copy link
Owner

simonw commented Jan 5, 2020

I think this is because you forgot to include a pk= argument. I'll change the code to throw a more useful error in this case.

@simonw simonw closed this as completed in 489eda9 Jan 5, 2020
@simonw
Copy link
Owner

simonw commented Jan 5, 2020

@psychemedia
Copy link
Author

I think I actually had several issues in play...

The missing key was one, but I think there is also an issue as per below.

For example, in the following:

def init_testdb(dbname='test.db'):
    
    if os.path.exists(dbname):
        os.remove(dbname)

    conn = sqlite3.connect(dbname)
    db = Database(conn)
    
    return conn, db

conn, db = init_testdb()

c = conn.cursor()
c.executescript('CREATE TABLE "test1" ("Col1" TEXT, "Col2" TEXT, PRIMARY KEY ("Col1"));')
c.executescript('CREATE TABLE "test2" ("Col1" TEXT, "Col2" TEXT, PRIMARY KEY ("Col1"));')

print('Test 1...')
for i in range(3):
    db['test1'].upsert_all([{'Col1':'a', 'Col2':'x'},{'Col1':'b', 'Col2':'x'}], pk=('Col1'))
    db['test2'].upsert_all([{'Col1':'a', 'Col2':'x'},{'Col1':'b', 'Col2':'x'}], pk=('Col1'))

print('Test 2...')
for i in range(3):
    db['test1'].upsert_all([{'Col1':'a', 'Col2':'x'},{'Col1':'b', 'Col2':'x'}], pk=('Col1'))
    db['test2'].upsert_all([{'Col1':'a', 'Col2':'x'},{'Col1':'b', 'Col2':'x'},
                            {'Col1':'c','Col2':'x'}], pk=('Col1'))
print('Done...')

---------------------------------------------------------------------------
Test 1...
Test 2...
 IndexError: list index out of range 
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-763-444132ca189f> in <module>
     22 print('Test 2...')
     23 for i in range(3):
---> 24     db['test1'].upsert_all([{'Col1':'a', 'Col2':'x'},{'Col1':'b', 'Col2':'x'}], pk=('Col1'))
     25     db['test2'].upsert_all([{'Col1':'a', 'Col2':'x'},{'Col1':'b', 'Col2':'x'},
     26                             {'Col1':'c','Col2':'x'}], pk=('Col1'))

/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in upsert_all(self, records, pk, foreign_keys, column_order, not_null, defaults, batch_size, hash_id, alter, extracts)
   1157             alter=alter,
   1158             extracts=extracts,
-> 1159             upsert=True,
   1160         )
   1161 

/usr/local/lib/python3.7/site-packages/sqlite_utils/db.py in insert_all(self, records, pk, foreign_keys, column_order, not_null, defaults, batch_size, hash_id, alter, ignore, replace, extracts, upsert)
   1097                 # self.last_rowid will be 0 if a "INSERT OR IGNORE" happened
   1098                 if (hash_id or pk) and self.last_rowid:
-> 1099                     row = list(self.rows_where("rowid = ?", [self.last_rowid]))[0]
   1100                     if hash_id:
   1101                         self.last_pk = row[hash_id]

IndexError: list index out of range

the first test works but the second fails. Is the length of the list of items being upserted leaking somewhere?

@simonw
Copy link
Owner

simonw commented Jan 10, 2020

Odd.. I'm not able to replicate that error. Here's what I got:
Tony_Bug

@psychemedia
Copy link
Author

psychemedia commented Jan 10, 2020

Hmmm... just tried with installs from pip and the repo (v2.0.0 and v2.0.1) and I get the error each time (start of second run through the second loop).

Could it be sqlite3? I'm on 3.30.1.

UPDATE: just tried it on jupyter.org/try and I get the error there, too.

@psychemedia
Copy link
Author

psychemedia commented Jan 31, 2020

So the conundrum continues.. The simple test case above now runs, but if I upsert a large number of new records (successfully) and then try to upsert a fewer number of new records to a different table, I get the same error.

If I run the same upserts again (which in the first case means there are no new records to add, because they were already added), the second upsert works correctly.

It feels as if the number of items added via an upsert >> the number of items I try to add in an upsert immediately after, I get the error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants