Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detect_fts() identifies the wrong table if tables have names that are subsets of each other #434

Closed
ryascott opened this issue May 20, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@ryascott
Copy link

ryascott commented May 20, 2022

Windows 10
Python 3.9.6

When I was running a full text search through the Python library, I noticed that the query was being run on a different full text search table than the one I was trying to search.

I took a look at the following function

def detect_fts(self) -> Optional[str]:

and noticed:

sql LIKE '%VIRTUAL TABLE%USING FTS%content=%{table}%'

My database contains tables with similar names and %{table}% was matching another table that ended differently in its name.
I have included a sample test that shows this occurring:

I search for Marsupials in db["books"] and The Clue of the Broken Blade is returned.

This occurs since the search for Marsupials was "successfully" done against db["booksb"] and rowid 1 is returned. "The Clue of the Broken Blade" has a rowid of 1 in db["books"] and this is what is returned from the search.

def test_fts_search_with_similar_table_names(fresh_db):
    db = Database(memory=True)
    db["books"].insert_all(
        [
            {
                "title": "The Clue of the Broken Blade",
                "author": "Franklin W. Dixon",
            },
            {
                "title": "Habits of Australian Marsupials",
                "author": "Marlee Hawkins",
            },
        ]
    )
    db["booksb"].insert(
        {
            "title": "Habits of Australian Marsupials",
            "author": "Marlee Hawkins",
        }
    )

    db["booksb"].enable_fts(["title", "author"])
    db["books"].enable_fts(["title", "author"])


    query = "Marsupials"

    assert [
            {   "rowid": 1,
                "title": "Habits of Australian Marsupials",
                "author": "Marlee Hawkins",
            },
        ] == list(db["books"].search(query))
@simonw simonw changed the title Full text search via Python library returns results from similar table name in the same database detect_fts() identifies the wrong table if tables have names that are subsets of each other Jun 14, 2022
@simonw simonw added the bug Something isn't working label Jun 14, 2022
@simonw
Copy link
Owner

simonw commented Jun 14, 2022

Definitely a bug - thanks for the detailed write-up!

You're right, the code at fault is here:

def detect_fts(self) -> Optional[str]:
"Detect if table has a corresponding FTS virtual table and return it"
sql = (
textwrap.dedent(
"""
SELECT name FROM sqlite_master
WHERE rootpage = 0
AND (
sql LIKE '%VIRTUAL TABLE%USING FTS%content=%{table}%'
OR (
tbl_name = "{table}"
AND sql LIKE '%VIRTUAL TABLE%USING FTS%'
)
)
"""
)
.strip()
.format(table=self.name)
)

@simonw
Copy link
Owner

simonw commented Jun 14, 2022

A test that demonstrates the problem:

@pytest.mark.parametrize("reverse_order", (True, False))
def test_detect_fts_similar_tables(fresh_db, reverse_order):
    # https://github.com/simonw/sqlite-utils/issues/434
    table1, table2 = ("demo", "demo2")
    if reverse_order:
        table1, table2 = table2, table1

    fresh_db[table1].insert({"title": "Hello"}).enable_fts(
        ["title"], fts_version="FTS4"
    )
    fresh_db[table2].insert({"title": "Hello"}).enable_fts(
        ["title"], fts_version="FTS4"
    )
    assert fresh_db[table1].detect_fts() == "{}_fts".format(table1)
    assert fresh_db[table2].detect_fts() == "{}_fts".format(table2)

The order matters - so this test currently passes in one direction and fails in the other:

>       assert fresh_db[table2].detect_fts() == "{}_fts".format(table2)
E       AssertionError: assert 'demo2_fts' == 'demo_fts'
E         - demo_fts
E         + demo2_fts
E         ?     +

tests/test_introspect.py:53: AssertionError
========================================================================================= short test summary info =========================================================================================
FAILED tests/test_introspect.py::test_detect_fts_similar_tables[True] - AssertionError: assert 'demo2_fts' == 'demo_fts'
=============================================================================== 1 failed, 1 passed, 855 deselected in 1.00s ===============================================================================

@simonw
Copy link
Owner

simonw commented Jun 14, 2022

Since table names can be quoted like this:

CREATE VIRTUAL TABLE "searchable_fts"
    USING FTS4 (text1, text2, [name with . and spaces], content="searchable")

OR like this:

CREATE VIRTUAL TABLE "searchable_fts"
    USING FTS4 (text1, text2, [name with . and spaces], content=[searchable])

This fix looks to be correct to me (copying from the updated test_with_trace() test):

            (
                "SELECT name FROM sqlite_master\n"
                "    WHERE rootpage = 0\n"
                "    AND (\n"
                "        sql LIKE :like\n"
                "        OR sql LIKE :like2\n"
                "        OR (\n"
                "            tbl_name = :table\n"
                "            AND sql LIKE '%VIRTUAL TABLE%USING FTS%'\n"
                "        )\n"
                "    )",
                {
                    "like": "%VIRTUAL TABLE%USING FTS%content=[dogs]%",
                    "like2": '%VIRTUAL TABLE%USING FTS%content="dogs"%',
                    "table": "dogs",
                },
            )

@simonw simonw closed this as completed in b8af3b9 Jun 14, 2022
simonw added a commit that referenced this issue Jun 15, 2022
simonw added a commit that referenced this issue Jun 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants