Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.extract() shouldn't extract null values #186

Open
simonw opened this issue Oct 16, 2020 · 7 comments
Open

.extract() shouldn't extract null values #186

simonw opened this issue Oct 16, 2020 · 7 comments
Labels
bug Something isn't working

Comments

@simonw
Copy link
Owner

simonw commented Oct 16, 2020

This almost works, but it creates a rogue type record with a value of None.

In [1]: import sqlite_utils
In [2]: db = sqlite_utils.Database(memory=True)
In [5]: db["creatures"].insert_all([
    {"id": 1, "name": "Simon", "type": None},
    {"id": 2, "name": "Natalie", "type": None},
    {"id": 3, "name": "Cleo", "type": "dog"}], pk="id")
Out[5]: <Table creatures (id, name, type)>
In [7]: db["creatures"].extract("type")
Out[7]: <Table creatures (id, name, type_id)>
In [8]: list(db["creatures"].rows)
Out[8]: 
[{'id': 1, 'name': 'Simon', 'type_id': None},
 {'id': 2, 'name': 'Natalie', 'type_id': None},
 {'id': 3, 'name': 'Cleo', 'type_id': 2}]
In [9]: db["type"]
Out[9]: <Table type (id, type)>
In [10]: list(db["type"].rows)
Out[10]: [{'id': 1, 'type': None}, {'id': 2, 'type': 'dog'}]
@simonw simonw added the bug Something isn't working label Oct 16, 2020
@simonw
Copy link
Owner Author

simonw commented Oct 16, 2020

How should this work with extractions covering multiple columns?

If there's a single column then it makes sense that a null value would not be extracted into the lookup table, but would instead become stay as null.

For a multiple column extraction, provided at least one of those columns is not null It should map to a record in the lookup table. Only if ALL of the extracted columns are null should the lookup value stay null.

@simonw
Copy link
Owner Author

simonw commented Oct 16, 2020

The alternative solution here would be that a single null value DOES get extracted. To implement this I would need to add some logic that uses is null instead of =.

@simonw
Copy link
Owner Author

simonw commented Oct 16, 2020

Either way I think I'm going to need to add some SQL which uses where a = b or (a is null and b is null).

@simonw
Copy link
Owner Author

simonw commented Oct 16, 2020

Actually I think this should be an option to .extract() which controls if nulls are extracted or left alone. Maybe called extract_nulls=True/False.

@simonw
Copy link
Owner Author

simonw commented Oct 16, 2020

Failing test:

def test_extract_null_values(fresh_db):
    fresh_db["species"].insert({"id": 1, "species": "Wolf"}, pk="id")
    fresh_db["individuals"].insert_all(
        [
            {"id": 10, "name": "Terriana", "species": "Fox"},
            {"id": 11, "name": "Spenidorm", "species": None},
            {"id": 12, "name": "Grantheim", "species": "Wolf"},
            {"id": 13, "name": "Turnutopia", "species": None},
            {"id": 14, "name": "Wargal", "species": "Wolf"},
        ],
        pk="id",
    )
    fresh_db["individuals"].extract("species")
    assert fresh_db["species"].schema == (
        "CREATE TABLE [species] (\n"
        "   [id] INTEGER PRIMARY KEY,\n"
        "   [species] TEXT\n"
        ")"
    )
    assert fresh_db["individuals"].schema == (
        'CREATE TABLE "individuals" (\n'
        "   [id] INTEGER PRIMARY KEY,\n"
        "   [name] TEXT,\n"
        "   [species_id] INTEGER,\n"
        "   FOREIGN KEY(species_id) REFERENCES species(id)\n"
        ")"
    )
    assert list(fresh_db["species"].rows) == [
        {"id": 1, "species": "Wolf"},
        {"id": 2, "species": "Fox"},
    ]
    assert list(fresh_db["individuals"].rows) == [
        {"id": 10, "name": "Terriana", "species_id": 2},
        {"id": 11, "name": "Spenidorm", "species_id": None},
        {"id": 12, "name": "Grantheim", "species_id": 1},
        {"id": 13, "name": "Turnutopia", "species_id": None},
        {"id": 14, "name": "Wargal", "species_id": 1},
    ]

@rayvoelker
Copy link

I think I ran into an issue that's perhaps related with extract()

I have a case where I want to create a lookup table for all the related title data where there are possibly multiple null values in the related columns ....

sql = """\
INSERT INTO "circulation_info" ("item_id", "bib_title", "bib_creator", "bib_format", "bib_pub_year", "checkout_date")
VALUES
(1, "title one", "creator one", "Book", 2018, "2021-08-12 00:01"),
(2, "title two", "creator one", "Book", 2019, "2021-08-12 00:02"),
(3, "title three", NULL, "DVD", 2020, "2021-08-12 00:03"),
(4, "title four", NULL, "DVD", NULL, "2021-08-12 00:04"),
(5, "title five", NULL, "DVD", NULL, "2021-08-12 00:05")
"""

with sqlite3.connect('test_bib.db') as con:
    con.execute(sql)

when I run the extract() method ...

db["circulation_info"].extract(
    [
        "bib_title",
        "bib_creator",
        "bib_format"        
    ],
    table="bib_info", 
    fk_column="bib_info_id"
)

db = sqlite_utils.Database("test_bib.db")

for row in db["circulation_info"].rows:
    print(row)

print("\n---\n")

for row in db["bib_info"].rows:
    print(row)

results in this ..

{'id': 1, 'item_id': 1, 'bib_info_id': 1, 'bib_pub_year': 2018, 'checkout_date': '2021-08-12 00:01'}
{'id': 2, 'item_id': 2, 'bib_info_id': 2, 'bib_pub_year': 2019, 'checkout_date': '2021-08-12 00:02'}
{'id': 3, 'item_id': 3, 'bib_info_id': None, 'bib_pub_year': 2020, 'checkout_date': '2021-08-12 00:03'}
{'id': 4, 'item_id': 4, 'bib_info_id': None, 'bib_pub_year': None, 'checkout_date': '2021-08-12 00:04'}
{'id': 5, 'item_id': 5, 'bib_info_id': None, 'bib_pub_year': None, 'checkout_date': '2021-08-12 00:05'}

---

{'id': 1, 'bib_title': 'title one', 'bib_creator': 'creator one', 'bib_format': 'Book'}
{'id': 2, 'bib_title': 'title two', 'bib_creator': 'creator one', 'bib_format': 'Book'}
{'id': 3, 'bib_title': 'title three', 'bib_creator': None, 'bib_format': 'DVD'}
{'id': 4, 'bib_title': 'title four', 'bib_creator': None, 'bib_format': 'DVD'}
{'id': 5, 'bib_title': 'title five', 'bib_creator': None, 'bib_format': 'DVD'}

Seems like it's correctly generating the row data for those lookups, but it's not correctly updating the foreign key back to the primary table? Looks like it just results in a NULL value in that original table.

Any ideas on why? Thanks again!

@rayvoelker
Copy link

Actually, I forgot to include the bib_pub_year in the extract ...

But also, I tried again with empty string values instead of NULL values and it seems to place the foreign key properly / correctly...

sql = """\
INSERT INTO "circulation_info" ("item_id", "bib_title", "bib_creator", "bib_format", "bib_pub_year", "checkout_date")
VALUES
(1, "title one", "creator one", "Book", 2018, "2021-08-12 00:01"),
(2, "title two", "creator one", "Book", 2019, "2021-08-12 00:02"),
(3, "title three", "", "DVD", 2020, "2021-08-12 00:03"),
(4, "title four", "", "DVD", "", "2021-08-12 00:04"),
(5, "title five", "", "DVD", "", "2021-08-12 00:05")
"""

with sqlite3.connect('test_bib_2.db') as con:
    con.execute(sql)
db["circulation_info"].extract(
    [
        "bib_title",
        "bib_creator",
        "bib_format",
        "bib_pub_year"
    ],
    table="bib_info", 
    fk_column="bib_info_id"
)
{'id': 1, 'item_id': 1, 'bib_info_id': 1, 'bib_pub_year': 2018, 'checkout_date': '2021-08-12 00:01'}
{'id': 2, 'item_id': 2, 'bib_info_id': 2, 'bib_pub_year': 2019, 'checkout_date': '2021-08-12 00:02'}
{'id': 3, 'item_id': 3, 'bib_info_id': 3, 'bib_pub_year': 2020, 'checkout_date': '2021-08-12 00:03'}
{'id': 4, 'item_id': 4, 'bib_info_id': 4, 'bib_pub_year': '', 'checkout_date': '2021-08-12 00:04'}
{'id': 5, 'item_id': 5, 'bib_info_id': 5, 'bib_pub_year': '', 'checkout_date': '2021-08-12 00:05'}

---

{'id': 1, 'bib_title': 'title one', 'bib_creator': 'creator one', 'bib_format': 'Book'}
{'id': 2, 'bib_title': 'title two', 'bib_creator': 'creator one', 'bib_format': 'Book'}
{'id': 3, 'bib_title': 'title three', 'bib_creator': '', 'bib_format': 'DVD'}
{'id': 4, 'bib_title': 'title four', 'bib_creator': '', 'bib_format': 'DVD'}
{'id': 5, 'bib_title': 'title five', 'bib_creator': '', 'bib_format': 'DVD'}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants