Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escaping FTS search strings #246

Closed
DeNeutoy opened this issue Mar 15, 2021 · 4 comments
Closed

Escaping FTS search strings #246

DeNeutoy opened this issue Mar 15, 2021 · 4 comments
Labels
enhancement New feature or request python-library

Comments

@DeNeutoy
Copy link
Contributor

Thanks for the excellent library, it's very nice to use!

I've been building some in memory search functionality for a data annotation tool i'm making, and I got tripped up a little bit with escaping the full text search queries. First I tried using db.quote(q), which doesn't work, because sqlite FTS has it's own (separate) query syntax. You can see this happening here also:

http://search-24ways.herokuapp.com/24ways-f8f455f/articles?_search=acces%2A

I got around this by aggressively escaping quotes inside the query string like this:

        quoted = q.replace('"', '""')
        quoted = f'"{quoted}"'
        print(quoted)
        results = db["data"].search(quoted, columns=["id"])
        return [x["id"] for x in results]

This works in the sense it doesn't crash, but it also removes access to the search query syntax. Given the well specified definition, it might be possible for sqlite-utils to provide a db.quote_query(q) which would intelligently escape a query whilst leaving the syntax intact. This would be very nice!

@simonw
Copy link
Owner

simonw commented Mar 15, 2021

This is a smart feature. I have something that does this in Datasette, extracting it out to sqlite-utils makes a lot of sense.

https://github.com/simonw/datasette/blob/8e18c7943181f228ce5ebcea48deb59ce50bee1f/datasette/utils/__init__.py#L818-L829

@polyrand
Copy link

polyrand commented Mar 18, 2021

I have found a similar problem, but I only when using that type of query (with * for doing a prefix search). I'm also building something on top of FTS5/sqlite-utils, and the way I decided to handle it was creating a specific function for prefixes. According to the docs, the query can be done in this 2 ways:

... MATCH '"one two thr" * '
... MATCH 'one + two + thr*'

I thought I could build a query like the first one using this function:

def prefix(query: str):
    return f'"{query}" *'

And then I use the output of that function as the query parameter for the standard .search() method in sqlite-utils.

However, my use case is different because I'm the one "deciding" when to use a prefix search, not the end user. I also haven't done many tests, but maybe you found that useful. One thing I could think of is checking if the query has an * at the end, remove it and build the prefix query using the function above.

This is just for prefix queries, I think having the escaping function is still useful for other use cases.

@simonw simonw closed this as completed in 53fec0d Aug 18, 2021
@simonw
Copy link
Owner

simonw commented Aug 18, 2021

The db.quote_fts(value) method from #247 can now be used for this - documentation here: https://sqlite-utils.datasette.io/en/latest/reference.html#sqlite_utils.db.Database.quote_fts

I'll be adding further improvements relating to this (a table.search(q, quote=True) parameter) in #296.

simonw added a commit that referenced this issue Aug 18, 2021
@simonw
Copy link
Owner

simonw commented Aug 18, 2021

simonw added a commit that referenced this issue Aug 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request python-library
Projects
None yet
Development

No branches or pull requests

3 participants