Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite3 bug handling column names that contain square braces #83833

Closed
simonw mannequin opened this issue Feb 16, 2020 · 10 comments
Closed

sqlite3 bug handling column names that contain square braces #83833

simonw mannequin opened this issue Feb 16, 2020 · 10 comments
Assignees
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@simonw
Copy link
Mannequin

simonw mannequin commented Feb 16, 2020

BPO 39652
Nosy @berkerpeksag, @serhiy-storchaka, @simonw, @miss-islington, @eLRuLL
PRs
  • bpo-39652: returning entire sqlite column names #18533
  • bpo-39652: Truncate the column name after '[' only if PARSE_COLNAMES is set. #18942
  • [3.8] bpo-39652: Truncate the column name after '[' only if PARSE_COLNAMES is set. (GH-18942) #19103
  • [3.7] bpo-39652: Truncate the column name after '[' only if PARSE_COLNAMES is set. (GH-18942). #19104
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2020-03-21.14:35:30.873>
    created_at = <Date 2020-02-16.17:10:07.563>
    labels = ['extension-modules', '3.8', 'type-bug', '3.7', '3.9']
    title = 'sqlite3 bug handling column names that contain square braces'
    updated_at = <Date 2020-03-21.14:35:30.872>
    user = 'https://github.com/simonw'

    bugs.python.org fields:

    activity = <Date 2020-03-21.14:35:30.872>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2020-03-21.14:35:30.873>
    closer = 'serhiy.storchaka'
    components = ['Extension Modules']
    creation = <Date 2020-02-16.17:10:07.563>
    creator = 'simonw'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 39652
    keywords = ['patch']
    message_count = 10.0
    messages = ['362081', '362117', '362119', '363959', '363964', '363968', '363976', '364748', '364752', '364753']
    nosy_count = 6.0
    nosy_names = ['ghaering', 'berker.peksag', 'serhiy.storchaka', 'simonw', 'miss-islington', 'elrull']
    pr_nums = ['18533', '18942', '19103', '19104']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue39652'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @simonw
    Copy link
    Mannequin Author

    simonw mannequin commented Feb 16, 2020

    Bit of an obscure bug this one. SQLite allows column names to contain [ and ] characters, even though those are often used as delimiters in SQLite. Here's how to create such a database with bash:

    sqlite3 /tmp/demo.db <<EOF
        BEGIN TRANSACTION;
        CREATE TABLE "data" (
            "MTU (CET)" TEXT,
            "Day-ahead Price [EUR/MWh]" TEXT
        );
        INSERT INTO "data" VALUES('01.01.2016 00:00 - 01.01.2016 01:00','23.86');
        COMMIT;
    EOF
    

    If you query that database from sqlite3 in Python, the [Eur/MWh] piece is removed from that column name:

    In [1]: import sqlite3                                                                                                         
    In [2]: conn = sqlite3.connect("/tmp/demo.db")                                                                                 
    In [3]: cursor = conn.cursor()                                                                                                 
    In [4]: cursor.execute("select * from data")                                                                                   
    Out[4]: <sqlite3.Cursor at 0x10c70a0a0>
    In [5]: cursor.fetchall()                                                                                                      
    Out[5]: [('01.01.2016 00:00 - 01.01.2016 01:00', '23.86')]
    In [6]: cursor.description                                                                                                     
    Out[6]: 
    (('MTU (CET)', None, None, None, None, None, None),
     ('Day-ahead Price', None, None, None, None, None, None))
    In [7]: conn.row_factory = sqlite3.Row                                                                                         
    In [8]: cursor = conn.cursor()                                                                                                 
    In [9]: cursor.execute("select * from data")                                                                                   
    Out[9]: <sqlite3.Cursor at 0x10c7a8490>
    In [10]: row = cursor.fetchall()                                                                                                     
    In [12]: row                                                                                                                   
    Out[12]: <sqlite3.Row at 0x10c3fe670>
    In [15]: row.keys()                                                                                                            
    Out[15]: ['MTU (CET)', 'Day-ahead Price']
    

    As you can see, it is missing from both cursor.description and from row.keys() here.

    But... if you query that database using SQLite directly (with .headers on so you can see the name of the columns) it works as expected:

    $ sqlite3 /tmp/demo.db 
    SQLite version 3.24.0 2018-06-04 14:10:15
    Enter ".help" for usage hints.
    sqlite> .schema
    CREATE TABLE IF NOT EXISTS "data" (
            "MTU (CET)" TEXT,
            "Day-ahead Price [EUR/MWh]" TEXT
        );
    sqlite> .headers on
    sqlite> select * from data;
    MTU (CET)|Day-ahead Price [EUR/MWh]
    01.01.2016 00:00 - 01.01.2016 01:00|23.86
    sqlite> 
    

    It looks to me like this is a bug in Python's SQLite3 module.

    This was first reported here: simonw/sqlite-utils#86

    @simonw simonw mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Feb 16, 2020
    @elrull
    Copy link
    Mannequin

    elrull mannequin commented Feb 17, 2020

    Thanks for reporting this issue, I have a PR here #18533 but that still needs some clarification, because there were some tests that were actually testing that we were stripping everything after the square brackets.

    P.D. this is my first contribution :)

    @simonw
    Copy link
    Mannequin Author

    simonw mannequin commented Feb 17, 2020

    Oh how interesting - yes it looks like this is deliberate behavior introduced in this commit: 0e3f591

    @berkerpeksag
    Copy link
    Member

    ghaering/pysqlite@f3d452f should probably give more context than that huge svnmerge commit :)

    (Removed older Python versions since even if decide to change the behavior from 2006, it may break third-party programs that are relying on it.)

    @berkerpeksag berkerpeksag removed 3.7 (EOL) end of life 3.8 only security fixes labels Mar 11, 2020
    @serhiy-storchaka
    Copy link
    Member

    It was originally introduced in ghaering/pysqlite@780a76d. I think that original column names should be returned when PARSE_COLNAMES is not set.

    Working on a PR.

    @serhiy-storchaka serhiy-storchaka self-assigned this Mar 11, 2020
    @elrull
    Copy link
    Mannequin

    elrull mannequin commented Mar 11, 2020

    hi @serhiy.storchaka is this something that you think could be done by a new contributor? I'd really love to take care of this, I can improve on the PR I was preparing #18533

    @serhiy-storchaka
    Copy link
    Member

    Sorry Raul, but I already have a PR written. It includes minor refactoring, better error handling in the nearby code, improved tests, fixed documentation and comments which incorrectly described the current behavior. It took time to determine what parts should be fixed, and unless you have found all this in your PR it would take much more time to explain to you what details should be checked than check and fix them myself.

    @serhiy-storchaka
    Copy link
    Member

    New changeset b146568 by Serhiy Storchaka in branch 'master':
    bpo-39652: Truncate the column name after '[' only if PARSE_COLNAMES is set. (GH-18942)
    https://github.com/python/cpython/commit/b146568dfcbcd7409c724f8917e4f77433dd56e4

    @miss-islington
    Copy link
    Contributor

    New changeset 687f592 by Miss Islington (bot) in branch '3.8':
    bpo-39652: Truncate the column name after '[' only if PARSE_COLNAMES is set. (GH-18942)
    https://github.com/python/cpython/commit/687f5921a46cf95c2a648d8031f9e99cdcc3e6b7

    @serhiy-storchaka
    Copy link
    Member

    New changeset 39680fb by Serhiy Storchaka in branch '3.7':
    [3.7] bpo-39652: Truncate the column name after '[' only if PARSE_COLNAMES is set. (GH-18942). (GH-19104)
    https://github.com/python/cpython/commit/39680fb7043e555469e08d3c4f49073acca77b20

    @serhiy-storchaka serhiy-storchaka added 3.7 (EOL) end of life 3.8 only security fixes labels Mar 21, 2020
    @serhiy-storchaka serhiy-storchaka added 3.7 (EOL) end of life 3.8 only security fixes labels Mar 21, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants