Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option that passes the results back to the LLM so it can generate a prose answer for you - -p/--prose #7

Open
Tracked by #1
simonw opened this issue Aug 19, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@simonw
Copy link
Owner

simonw commented Aug 19, 2024

No description provided.

@simonw simonw added the enhancement New feature or request label Aug 19, 2024
@simonw simonw mentioned this issue Aug 19, 2024
8 tasks
@simonw
Copy link
Owner Author

simonw commented Aug 19, 2024

Prototype:

diff --git a/sqlite_utils_ask.py b/sqlite_utils_ask.py
index 6e18907..92f2cb4 100644
--- a/sqlite_utils_ask.py
+++ b/sqlite_utils_ask.py
@@ -18,7 +18,8 @@ def register_commands(cli):
     @click.argument("question")
     @click.option("model_id", "-m", "--model", help="LLM model to use")
     @click.option("-v", "--verbose", is_flag=True, help="Verbose output")
-    def ask(path, question, model_id, verbose):
+    @click.option("-p", "--prose", is_flag=True, help="Provide an answer in prose")
+    def ask(path, question, model_id, verbose, prose):
         "Ask a question of your data"
         # Open in read-only mode
         conn = sqlite3.connect("file:{}?mode=ro".format(str(path)), uri=True)
@@ -94,6 +95,15 @@ def register_commands(cli):
             if verbose:
                 click.echo(conversation.responses, err=True)
 
+        if prose:
+            prompt = sql + "\n\n"
+            prompt = "Given these results from that SQL query:\n\n"
+            prompt += json.dumps(results, indent=4, default=repr)
+            prompt += "\n\nAnswer the user's question: " + question
+            for token in model.prompt(prompt):
+                print(token, end="")
+            print()
+
 
 _pattern = r"```sql\n(.*?)\n```"

Demo:

sqlite-utils ask content.db 'what are the most interesting repos?' -p

Output:

{
    "sql": "SELECT\n  repos.name AS name,\n  repos.full_name AS full_name,\n  users.login AS owner,\n  repos.description AS description,\n  repos.stargazers_count,\n  repos.created_at AS created_at,\n  (\n    SELECT\n      SUM(downloads)\n    FROM\n      stats\n    WHERE\n      stats.package = repos.name\n      AND stats.date > DATE('now', '-7 days')\n  ) AS downloads_this_week,\n  (\n    SELECT\n      COUNT(*)\n    FROM\n      plugin_repos\n    WHERE\n      repo = repos.full_name\n  ) AS is_plugin,\n  (\n    SELECT\n      COUNT(*)\n    FROM\n      tool_repos\n    WHERE\n      repo = repos.full_name\n  ) AS is_tool\nFROM\n  repos\nJOIN\n  users ON users.id = repos.owner\nWHERE\n  repos.stargazers_count > (\n    SELECT AVG(stargazers_count) FROM repos\n  )\nORDER BY\n  repos.stargazers_count DESC;",
    "results": [
        {
            "name": "datasette",
            "full_name": "simonw/datasette",
            "owner": "simonw",
            "description": "An open source multi-tool for exploring and publishing data",
            "stargazers_count": 9182,
            "created_at": "2017-10-23T00:39:03Z",
            "downloads_this_week": 4643,
            "is_plugin": 0,
            "is_tool": 0
        },
        {
            "name": "llm",
            "full_name": "simonw/llm",
            "owner": "simonw",
            "description": "Access large language models from the command-line",
            "stargazers_count": 3512,
            "created_at": "2023-04-01T21:16:57Z",
            "downloads_this_week": 1125,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "sqlite-utils",
            "full_name": "simonw/sqlite-utils",
            "owner": "simonw",
            "description": "Python CLI utility and library for manipulating SQLite databases",
            "stargazers_count": 1575,
            "created_at": "2018-07-14T03:21:46Z",
            "downloads_this_week": 21452,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "sqlite-vss",
            "full_name": "asg017/sqlite-vss",
            "owner": "asg017",
            "description": "A SQLite extension for efficient vector search, based on Faiss!",
            "stargazers_count": 1494,
            "created_at": "2023-01-03T23:27:31Z",
            "downloads_this_week": 1203,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "shot-scraper",
            "full_name": "simonw/shot-scraper",
            "owner": "simonw",
            "description": "A command-line utility for taking automated screenshots of websites",
            "stargazers_count": 1490,
            "created_at": "2022-03-08T21:21:02Z",
            "downloads_this_week": 1826,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "csvs-to-sqlite",
            "full_name": "simonw/csvs-to-sqlite",
            "owner": "simonw",
            "description": "Convert CSV files into a SQLite database",
            "stargazers_count": 871,
            "created_at": "2017-11-13T06:38:21Z",
            "downloads_this_week": 77,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "django-sql-dashboard",
            "full_name": "simonw/django-sql-dashboard",
            "owner": "simonw",
            "description": "Django app for building dashboards using raw SQL queries",
            "stargazers_count": 420,
            "created_at": "2021-03-13T03:38:23Z",
            "downloads_this_week": 412,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "sqlite-lines",
            "full_name": "asg017/sqlite-lines",
            "owner": "asg017",
            "description": "A SQLite extension for reading large files line-by-line (NDJSON, logs, txt, etc.)",
            "stargazers_count": 374,
            "created_at": "2022-04-25T19:11:46Z",
            "downloads_this_week": 38,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "github-to-sqlite",
            "full_name": "dogsheep/github-to-sqlite",
            "owner": "dogsheep",
            "description": "Save data from GitHub to a SQLite database",
            "stargazers_count": 361,
            "created_at": "2019-09-08T02:50:28Z",
            "downloads_this_week": 62,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "sqlite-html",
            "full_name": "asg017/sqlite-html",
            "owner": "asg017",
            "description": "A SQLite extension for querying, manipulating, and creating HTML elements.",
            "stargazers_count": 344,
            "created_at": "2021-10-11T21:15:19Z",
            "downloads_this_week": 16,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "db-to-sqlite",
            "full_name": "simonw/db-to-sqlite",
            "owner": "simonw",
            "description": "CLI tool for exporting tables or queries from any SQL database to a SQLite file",
            "stargazers_count": 279,
            "created_at": "2019-01-17T04:16:48Z",
            "downloads_this_week": 279,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "twitter-to-sqlite",
            "full_name": "dogsheep/twitter-to-sqlite",
            "owner": "dogsheep",
            "description": "Save data from Twitter to a SQLite database",
            "stargazers_count": 269,
            "created_at": "2019-09-03T19:30:08Z",
            "downloads_this_week": 17,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "ttok",
            "full_name": "simonw/ttok",
            "owner": "simonw",
            "description": "Count and truncate text based on tokens",
            "stargazers_count": 231,
            "created_at": "2023-05-18T18:22:59Z",
            "downloads_this_week": 13,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "geocode-sqlite",
            "full_name": "eyeseast/geocode-sqlite",
            "owner": "eyeseast",
            "description": "Geocode rows in a SQLite database table",
            "stargazers_count": 223,
            "created_at": "2020-09-06T21:05:39Z",
            "downloads_this_week": 42,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "openai-to-sqlite",
            "full_name": "simonw/openai-to-sqlite",
            "owner": "simonw",
            "description": "Save OpenAI API results to a SQLite database",
            "stargazers_count": 211,
            "created_at": "2023-01-03T01:14:22Z",
            "downloads_this_week": 19,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "csv-diff",
            "full_name": "simonw/csv-diff",
            "owner": "simonw",
            "description": "Python CLI tool and library for diffing CSV and JSON files",
            "stargazers_count": 198,
            "created_at": "2019-03-13T01:11:26Z",
            "downloads_this_week": 8225,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "s3-credentials",
            "full_name": "simonw/s3-credentials",
            "owner": "simonw",
            "description": "A tool for creating credentials for accessing S3 buckets",
            "stargazers_count": 183,
            "created_at": "2021-11-02T20:09:50Z",
            "downloads_this_week": 36,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "sqlite-http",
            "full_name": "asg017/sqlite-http",
            "owner": "asg017",
            "description": "A SQLite extension for making HTTP requests purely in SQL",
            "stargazers_count": 182,
            "created_at": "2021-10-11T21:27:49Z",
            "downloads_this_week": 23,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "git-history",
            "full_name": "simonw/git-history",
            "owner": "simonw",
            "description": "Tools for analyzing Git history using SQLite",
            "stargazers_count": 172,
            "created_at": "2021-11-11T20:07:06Z",
            "downloads_this_week": 147,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "google-drive-to-sqlite",
            "full_name": "simonw/google-drive-to-sqlite",
            "owner": "simonw",
            "description": "Create a SQLite database containing metadata from Google Drive",
            "stargazers_count": 133,
            "created_at": "2022-02-16T02:16:29Z",
            "downloads_this_week": 4,
            "is_plugin": 0,
            "is_tool": 1
        },
        {
            "name": "datasette-extract",
            "full_name": "datasette/datasette-extract",
            "owner": "datasette",
            "description": "Import unstructured data (text and images) into structured tables",
            "stargazers_count": 133,
            "created_at": "2023-08-15T19:57:04Z",
            "downloads_this_week": 41,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "sqlite-regex",
            "full_name": "asg017/sqlite-regex",
            "owner": "asg017",
            "description": "A fast regular expression SQLite extension, written in Rust",
            "stargazers_count": 132,
            "created_at": "2022-11-04T18:31:01Z",
            "downloads_this_week": 209,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "datasette-dashboards",
            "full_name": "rclement/datasette-dashboards",
            "owner": "rclement",
            "description": "Datasette plugin providing data dashboards from metadata",
            "stargazers_count": 130,
            "created_at": "2021-04-23T21:56:48Z",
            "downloads_this_week": 123,
            "is_plugin": 1,
            "is_tool": 0
        },
        {
            "name": "dogsheep-photos",
            "full_name": "dogsheep/dogsheep-photos",
            "owner": "dogsheep",
            "description": "Upload your photos to S3 and import metadata about them into a SQLite database",
            "stargazers_count": 124,
            "created_at": "2020-04-18T19:22:13Z",
            "downloads_this_week": 2,
            "is_plugin": 0,
            "is_tool": 1
        }
    ]
}

Followed by:

To determine the most interesting repositories from the provided data, we can consider a combination of factors such as:

  1. Stargazer Count: A higher number of stars indicates popularity and interest from the community.
  2. Downloads This Week: This metric reflects current usage and relevance.
  3. Recency: More recent creations might indicate active development or relevance to current trends.
  4. Description: The purpose and utility of the repository can also be a factor in determining its interest level.

Based on these factors, here are some of the most interesting repositories:

  1. datasette (simonw/datasette)

    • Stars: 9182
    • Downloads This Week: 4643
    • Description: An open source multi-tool for exploring and publishing data.
    • Created At: 2017-10-23
  2. sqlite-utils (simonw/sqlite-utils)

    • Stars: 1575
    • Downloads This Week: 21452
    • Description: Python CLI utility and library for manipulating SQLite databases.
    • Created At: 2018-07-14
  3. llm (simonw/llm)

    • Stars: 3512
    • Downloads This Week: 1125
    • Description: Access large language models from the command-line.
    • Created At: 2023-04-01
  4. csv-diff (simonw/csv-diff)

    • Stars: 198
    • Downloads This Week: 8225
    • Description: Python CLI tool and library for diffing CSV and JSON files.
    • Created At: 2019-03-13
  5. shot-scraper (simonw/shot-scraper)

    • Stars: 1490
    • Downloads This Week: 1826
    • Description: A command-line utility for taking automated screenshots of websites.
    • Created At: 2022-03-08
  6. datasette-extract (datasette/datasette-extract)

    • Stars: 133
    • Downloads This Week: 41
    • Description: Import unstructured data (text and images) into structured tables.
    • Created At: 2023-08-15
  7. geocode-sqlite (eyeseast/geocode-sqlite)

    • Stars: 223
    • Downloads This Week: 42
    • Description: Geocode rows in a SQLite database table.
    • Created At: 2020-09-06

These repositories stand out due to their functionality, usage trends, and community engagement, making them particularly interesting.

That SQL query in full:

SELECT
  repos.name AS name,
  repos.full_name AS full_name,
  users.login AS owner,
  repos.description AS description,
  repos.stargazers_count,
  repos.created_at AS created_at,
  (
    SELECT
      SUM(downloads)
    FROM
      stats
    WHERE
      stats.package = repos.name
      AND stats.date > DATE('now', '-7 days')
  ) AS downloads_this_week,
  (
    SELECT
      COUNT(*)
    FROM
      plugin_repos
    WHERE
      repo = repos.full_name
  ) AS is_plugin,
  (
    SELECT
      COUNT(*)
    FROM
      tool_repos
    WHERE
      repo = repos.full_name
  ) AS is_tool
FROM
  repos
JOIN
  users ON users.id = repos.owner
WHERE
  repos.stargazers_count > (
    SELECT AVG(stargazers_count) FROM repos
  )
ORDER BY
  repos.stargazers_count DESC;

@simonw
Copy link
Owner Author

simonw commented Aug 19, 2024

Definitely need a sensible limit on how much data we fire back into the model there! That SQL query didn't have a limit clause, could have burned a lot of tokens.

simonw added a commit that referenced this issue Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant