Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with square bracket in CSV column name #86

Closed
foscoj opened this issue Feb 13, 2020 · 7 comments
Closed

Problem with square bracket in CSV column name #86

foscoj opened this issue Feb 13, 2020 · 7 comments
Labels
bug Something isn't working

Comments

@foscoj
Copy link

foscoj commented Feb 13, 2020

testing some data from european power information (entsoe.eu), the title of the csv contains square brackets.
as I am playing with glitch, sqlite-utils are used for creating the db.

Traceback (most recent call last):

File "/app/.local/bin/sqlite-utils", line 8, in

sys.exit(cli())

File "/app/.local/lib/python3.7/site-packages/click/core.py", line 764, in call

return self.main(*args, **kwargs)

File "/app/.local/lib/python3.7/site-packages/click/core.py", line 717, in main

rv = self.invoke(ctx)

File "/app/.local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke

return _process_result(sub_ctx.command.invoke(sub_ctx))

File "/app/.local/lib/python3.7/site-packages/click/core.py", line 956, in invoke

return ctx.invoke(self.callback, **ctx.params)

File "/app/.local/lib/python3.7/site-packages/click/core.py", line 555, in invoke

return callback(*args, **kwargs)

File "/app/.local/lib/python3.7/site-packages/sqlite_utils/cli.py", line 434, in insert

default=default,

File "/app/.local/lib/python3.7/site-packages/sqlite_utils/cli.py", line 384, in insert_upsert_implementation

docs, pk=pk, batch_size=batch_size, alter=alter, **extra_kwargs

File "/app/.local/lib/python3.7/site-packages/sqlite_utils/db.py", line 997, in insert_all

extracts=extracts,

File "/app/.local/lib/python3.7/site-packages/sqlite_utils/db.py", line 618, in create

extracts=extracts,

File "/app/.local/lib/python3.7/site-packages/sqlite_utils/db.py", line 310, in create_table

self.conn.execute(sql)

sqlite3.OperationalError: unrecognized token: "]"

entsoe_2016.csv

renamed to txt for uploading compatibility

entsoe_2016.txt

code is remixed directly from your https://glitch.com/edit/#!/datasette-csvs repo

@simonw simonw changed the title problem with square bracket in csv title Problem with square bracket in CSV column name Feb 16, 2020
@simonw
Copy link
Owner

simonw commented Feb 16, 2020

Thanks for the example file - looks like it can be trimmed down to just these two lines to replicate the bug:

"MTU (CET)","Day-ahead Price [EUR/MWh]"
"01.01.2016 00:00 - 01.01.2016 01:00","23.86"

@simonw
Copy link
Owner

simonw commented Feb 16, 2020

https://stackoverflow.com/a/22694438 looks like the answer:

When using square brackets, it is not possible to have these characters in the identifier.

When using double quotes, you can escape them in the name by doubling them:

CREATE TABLE "hello ""world"""(key INTEGER PRIMARY KEY);

@simonw
Copy link
Owner

simonw commented Feb 16, 2020

There's something weird about this. I created a test database file like so:

sqlite3 /tmp/demo.db <<EOF
    BEGIN TRANSACTION;
    CREATE TABLE "data" (
        "MTU (CET)" TEXT,
        "Day-ahead Price [EUR/MWh]" TEXT
    );
    INSERT INTO "data" VALUES('01.01.2016 00:00 - 01.01.2016 01:00','23.86');
    COMMIT;
EOF

Then confirmed that it works as expected in SQLite:

 $ sqlite3 /tmp/demo.db 
SQLite version 3.24.0 2018-06-04 14:10:15
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE IF NOT EXISTS "data" (
        "MTU (CET)" TEXT,
        "Day-ahead Price [EUR/MWh]" TEXT
    );
sqlite> .headers on
sqlite> select * from data;
MTU (CET)|Day-ahead Price [EUR/MWh]
01.01.2016 00:00 - 01.01.2016 01:00|23.86
sqlite> 

BUT... if I open the same database in Python, something weird happens:

In [1]: import sqlite3                                                                                                         
In [2]: conn = sqlite3.connect("/tmp/demo.db")                                                                                 
In [3]: cursor = conn.cursor()                                                                                                 
In [4]: cursor.execute("select * from data")                                                                                   
Out[4]: <sqlite3.Cursor at 0x10c70a0a0>
In [5]: cursor.fetchall()                                                                                                      
Out[5]: [('01.01.2016 00:00 - 01.01.2016 01:00', '23.86')]
In [6]: cursor.description                                                                                                     
Out[6]: 
(('MTU (CET)', None, None, None, None, None, None),
 ('Day-ahead Price', None, None, None, None, None, None))
In [7]: conn.row_factory = sqlite3.Row                                                                                         
In [8]: cursor = conn.cursor()                                                                                                 
In [9]: cursor.execute("select * from data")                                                                                   
Out[9]: <sqlite3.Cursor at 0x10c7a8490>
In [10]: row = cursor.fetchall()                                                                                                     
In [12]: row                                                                                                                   
Out[12]: <sqlite3.Row at 0x10c3fe670>
In [15]: row.keys()                                                                                                            
Out[15]: ['MTU (CET)', 'Day-ahead Price']

Note that in cursor.description AND in row.keys() above the second column is displayed as 'Day-ahead Price' - when we would expect it to be displayed as Day-ahead Price [EUR/MWh]

So.... it looks like there may be a bug in Python's sqlite3 module where columns with square braces in them have that portion of the name stripped out!

@simonw
Copy link
Owner

simonw commented Feb 16, 2020

I'm not sure what to do about this one.

I can't fix it: this bug in Python's sqlite3 module means that even if I write a database out with column names that include [] I won't be able to read them back again.

So... I could do one of the following:

  • Throw an error if a column name includes those characters. That's my preferred option I think.
  • Automatically replace [ in column names with ( and ] with )
  • Do the automatic replacement but show a user-visible warning when I do it
  • Throw an error, but give the user an option to run with e.g. --fix-column-names which applies that automatic fix.

Since this is likely to be an incredibly rare edge-case I think I'd rather minimize the amount of code that deals with it, so my preferred option is to just throw that error and stop.

@simonw simonw added the bug Something isn't working label Feb 16, 2020
@foscoj
Copy link
Author

foscoj commented Feb 16, 2020

Probably the best option to just throw the error.
Is there any active dev chan where we could post the issue to python sqlite3?

@simonw
Copy link
Owner

simonw commented Feb 16, 2020

I filed a bug in the Python issue tracker here: https://bugs.python.org/issue39652

@simonw
Copy link
Owner

simonw commented Feb 27, 2020

I pushed a branch with my experiment in it, but I'm going to fix this by throwing an error on [ or ] in a column name instead - I won't implement the changes from that branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants