Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial design #1

Closed
simonw opened this issue Mar 24, 2023 · 20 comments
Closed

Initial design #1

simonw opened this issue Mar 24, 2023 · 20 comments
Labels
enhancement New feature or request

Comments

@simonw
Copy link
Owner

simonw commented Mar 24, 2023

Goal is to have a plugin that turns any Datasette instance into a ChatGPT plugin - providing instructions for ChatGPT to execute SQL queries against it.

@simonw simonw added the enhancement New feature or request label Mar 24, 2023
@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

Documentation: https://platform.openai.com/docs/plugins/introduction

Mainly needs to provide a /.well-known/ai-plugin.json that points to a OpenAPI specification.

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

I got ChatGPT to build me this OpenAPI specification:

openapi: 3.0.0
info:
  title: Datasette API
  version: 1.0.0
paths:
  /fixtures.json:
    get:
      summary: Retrieve data from the facetable
      description: This endpoint allows users to execute SQL queries on the facetable and retrieve the results as an array of JSON objects.
      parameters:
        - name: sql
          in: query
          description: The SQL query to be executed on the facetable.
          required: true
          schema:
            type: string
        - name: _shape
          in: query
          description: The shape of the response data. Set to "array" to receive an array of JSON objects.
          required: true
          schema:
            type: string
            enum:
              - array
      responses:
        '200':
          description: Successful operation
          content:
            application/json:
              schema:
                type: array
                items:
                  type: object
        '400':
          description: Bad request
        '500':
          description: Internal server error

From this prompt:

Write an OpenAPI schema explaining the https://latest.datasette.io/fixtures.json?sql=select+*+from+facetable&_shape=array GET API which accepts SQL and returns an array of JSON objects

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

Initial prototype (in plugins/chatgpt.py on Glitch):

from datasette import hookimpl, Response

SCHEMA = """
openapi: 3.0.0
info:
  title: Datasette API
  version: 1.0.0
paths:
  /fixtures.json:
    get:
      summary: Execute SQL against the fixtures database
      description: This endpoint allows users to execute SQLite SQL queries and retrieve the results as an array of JSON objects.
      parameters:
        - name: sql
          in: query
          description: The SQL query to be executed
          required: true
          schema:
            type: string
        - name: _shape
          in: query
          description: The shape of the response data. Must be "array"
          required: true
          schema:
            type: string
            enum:
              - array
      responses:
        '200':
          description: Successful SQL results
          content:
            application/json:
              schema:
                type: array
                items:
                  type: object
        '400':
          description: Bad request
        '500':
          description: Internal server error
"""

AI_PLUGIN = {
    "schema_version": "v1",
    "name_for_model": "sql-fixtures",
    "name_for_human": "SQL against fixtures",
    "description_for_model": "Run SQL queries against the fixtures database.",
    "description_for_human": "Run SQL against Datasette.",
    "api": {
      "type": "openapi",
      "url": "https://cedar-antique-cosmos.glitch.me/openapi-schema.yml",
      "has_user_authentication": False
    },
    "contact_email": "hello@contact.com", 
    "legal_info_url": "hello@legal.com"
}


async def ai_plugin(request):
    return Response.json(AI_PLUGIN)

  
async def openapi_schema(request):
    return Response.text(SCHEMA)


@hookimpl
def register_routes():
    return [
        (r"^/\.well-known/ai-plugin\.json$", ai_plugin),
        (r"^/openapi-schema.yml$", openapi_schema),
    ]

image

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

Fixed it:

image

AI_PLUGIN = {
    "schema_version": "v1",
    "name_for_model": "sql_fixtures",
    "name_for_human": "SQL against fixtures",
    "description_for_model": "Run SQL queries against the fixtures database.",
    "description_for_human": "Run SQL against Datasette.",
    "auth": {
      "type": "user_http",
      "authorization_type": "bearer"
    },
    "api": {
      "type": "openapi",
      "url": "https://cedar-antique-cosmos.glitch.me/openapi-schema.yml",
      "has_user_authentication": False
    },
    "logo_url": "https://avatars.githubusercontent.com/u/126964132?s=400&u=08b2ed680144a4feb421308f09e5f3cc5876211a&v=4",
    "contact_email": "hello@contact.com", 
    "legal_info_url": "hello@legal.com"
}

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

CleanShot 2023-03-23 at 22 10 15@2x

image

I'd like to tell it it doesn't need an auth token.

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

image

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

I can't figure out why I'm getting that UnrecognizedFunctionError. The server logs don't show any hits to my API.

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

Found more documentation: https://platform.openai.com/docs/plugins/getting-started/plugin-manifest - visible on desktop but not I think on mobile.

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

I'm trying to include the schema, but I think I broke it:

from datasette import hookimpl, Response
import textwrap

PROMPT = """
Run SQLite queries against the content SQL database.
Use 'select * from sqlite_master' to see the list of tables.
Use 'select * from pragma_table_info("name_of_table")' to see the columns in a table
NEVER use 'PRAGMA table_info' or any other PRAGMA - always use 'pragma_table_info()' and suchlike instead
"""

def make_openapi_schema(prompt):
    return """
openapi: 3.0.1
info:
  title: Datasette fixtures plugin
  description: A plugin that allows ChatGPT to run SQL queries against a Datasette SQLite database
  version: 'v1'
servers:
  - url: https://cedar-antique-cosmos.glitch.me
paths:
  /content.json:
    get:
      operationId: query
      summary: Execute a SQLite SQL query against the content database
      description: |-
{}
      parameters:
      - name: sql
        in: query
        description: The SQL query to be executed
        required: true
        schema:
          type: string
      - name: _shape
        in: query
        description: The shape of the response data. Must be "array"
        required: true
        schema:
          type: string
          enum:
            - array
      responses:
        '200':
          description: Successful SQL results
          content:
            application/json:
              schema:
                type: array
                items:
                  type: object
        '400':
          description: Bad request
        '500':
          description: Internal server error
""".format(textwrap.indent(prompt, '        '))

def ai_plugin_schema(extra_prompt):
    return {
        "schema_version": "v1",
        "name_for_model": "datasette_content",
        "name_for_human": "SQL against content",
        "description_for_model": PROMPT,
        "description_for_human": "Run SQL against content database in Datasette.",
        "auth": {
            "type": "none"
        },
        "api": {
          "type": "openapi",
          "url": "https://cedar-antique-cosmos.glitch.me/openapi-schema.yml",
          "has_user_authentication": False
        },
        "logo_url": "https://avatars.githubusercontent.com/u/126964132?s=400&u=08b2ed680144a4feb421308f09e5f3cc5876211a&v=4",
        "contact_email": "hello@contact.com", 
        "legal_info_url": "hello@legal.com"
    }


async def ai_plugin():
    return Response.json(ai_plugin_schema(""))

  
async def build_schema(datasette):
    database = [value for key, value in datasette.databases.items() if not key.startswith("_")][0]
    return (await database.execute("select group_concat(sql, ';') from sqlite_master")).single_value()
  
  
async def openapi_schema(datasette):
    return Response.text(make_openapi_schema(await build_schema(datasette)))


@hookimpl
def register_routes():
    return [
        (r"^/\.well-known/ai-plugin\.json$", ai_plugin),
        (r"^/openapi-schema.yml$", openapi_schema),
    ]

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

Tried uninstalling and reinstalling the plugin and got this:

image

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

Oh dear it hallucinated!

image

Despite that query returning the correct results:

image

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

My current working plugin is:

from datasette import hookimpl, Response
import textwrap

PROMPT = """
Run SQLite queries against the content SQL database.
Use 'select * from sqlite_master' to see the list of tables.
Use 'select * from pragma_table_info("name_of_table")' to see the columns in a table
NEVER use 'PRAGMA table_info' or any other PRAGMA - always use 'pragma_table_info()' and suchlike instead
"""

def make_openapi_schema(prompt):
    return """
openapi: 3.0.1
info:
  title: Datasette fixtures plugin
  description: A plugin that allows ChatGPT to run SQL queries against a Datasette SQLite database
  version: 'v1'
servers:
  - url: https://cedar-antique-cosmos.glitch.me
paths:
  /content.json:
    get:
      operationId: query
      summary: Execute a SQLite SQL query against the content database
      description: |-
{}
      parameters:
      - name: sql
        in: query
        description: The SQL query to be executed
        required: true
        schema:
          type: string
      - name: _shape
        in: query
        description: The shape of the response data. Must be "array"
        required: true
        schema:
          type: string
          enum:
            - array
      responses:
        '200':
          description: Successful SQL results
          content:
            application/json:
              schema:
                type: array
                items:
                  type: object
        '400':
          description: Bad request
        '500':
          description: Internal server error
""".format(textwrap.indent(prompt, '        '))

def ai_plugin_schema(extra_prompt):
    return {
        "schema_version": "v1",
        "name_for_model": "datasette_content",
        "name_for_human": "SQL against content",
        "description_for_model": PROMPT + ' ' + extra_prompt,
        "description_for_human": "Run SQL against content database in Datasette.",
        "auth": {
            "type": "none"
        },
        "api": {
          "type": "openapi",
          "url": "https://cedar-antique-cosmos.glitch.me/openapi-schema.yml",
          "has_user_authentication": False
        },
        "logo_url": "https://avatars.githubusercontent.com/u/126964132?s=400&u=08b2ed680144a4feb421308f09e5f3cc5876211a&v=4",
        "contact_email": "hello@contact.com", 
        "legal_info_url": "hello@legal.com"
    }


async def ai_plugin(datasette):
    return Response.json(ai_plugin_schema(await available_tables(datasette)))

  
async def build_schema(datasette):
    database = [value for key, value in datasette.databases.items() if not key.startswith("_")][0]
    return (await database.execute("select group_concat(sql, ';') from sqlite_master")).single_value()
  
async def available_tables(datasette):
    database = [value for key, value in datasette.databases.items() if not key.startswith("_")][0]
    return (await database.execute("select group_concat(name, ', ') from sqlite_master where type = 'table'")).single_value()

  
async def openapi_schema():
    return Response.text(make_openapi_schema("Use SQLite syntax"))


@hookimpl
def register_routes():
    return [
        (r"^/\.well-known/ai-plugin\.json$", ai_plugin),
        (r"^/openapi-schema.yml$", openapi_schema),
    ]

The schema proved too large so I'm sending a list of tables instead.

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

OK, this prototype is working well enough that I'm going to release it as an alpha.

Schema discovery still isn't great - it keeps trying to run those PRAGMA commands. Maybe I should give it an explicit API for that, or perhaps enable PRAGMA table_info since it's so keen to use that.

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

https://platform.openai.com/docs/plugins/getting-started/plugin-manifest limits:

  • 50 character max for name_for_human
  • 50 character max for name_for_model
  • 120 character max for description_for_human
  • 8000 character max just for description_for_model (will decrease over time)

It's not clear to me how important name_for_model is. Since this Datasette plugin could be installed on many instances I don't think it should be the same for each one, and it looks like hyphens are not allowed but underscores are.

So maybe datasette_latest_datasette_io would be a good name, using the server hostname with underscores?

What if that's longer than 50 characters?

>>> len("datasette_global_power_plants_datasettes_com")
44
>>> len("datasette_congress_legislators_datasettes.com")
45

So I'm already getting close with some of my longer named examples.

I'm tempted to go with datasette_as_much_as_possible_hexhash - ending in a 6 character hash of the whole value. That should give me good uniqueness without busting the length limit.

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

Removing this code since I'm not calling it:

async def build_schema(datasette):
    database = [
        value for key, value in datasette.databases.items() if not key.startswith("_")
    ][0]
    return (
        await database.execute("select group_concat(sql, ';') from sqlite_master")
    ).single_value()


async def available_tables(datasette):
    database = [
        value for key, value in datasette.databases.items() if not key.startswith("_")
    ][0]
    return (
        await database.execute(
            "select group_concat(name, ', ') from sqlite_master where type = 'table'"
        )
    ).single_value()

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

I don't see OpenAI checking for updates to my ai-plugin.json file - so I'm going to try a fresh deploy with a new URL, since I also can't see how to uninstall and reinstall a plugin to trigger that refresh.

Tried deploying a new Glitch instance and got this:

image

But I caught it doing a fetch() to an API that failed to return my new plugin:

image

Figured that out:

image

You need to hit "Develop your own plugin" and not "Install an unverified plugin".

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

CleanShot 2023-03-24 at 07 09 22@2x

@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

OK! The latest code deployed fresh does indeed seem to work - and doesn't try to run the PRAGMA table_info things any more so it seems to be obeying the latest prompt.

simonw added a commit that referenced this issue Mar 24, 2023
simonw added a commit that referenced this issue Mar 24, 2023
Also updated setup.py and applied Black
@simonw simonw closed this as completed in ebf8d0b Mar 24, 2023
@simonw
Copy link
Owner Author

simonw commented Mar 24, 2023

Blogged it here: https://simonwillison.net/2023/Mar/24/datasette-chatgpt-plugin/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant