# GPT Action Library: AWS RedShift

## Introduction

This page provides instructions and a guide for developers building a GPT Action for a specific application. Before you proceed, make sure you are familiar with the following information:
- [Introduction to GPT Actions](https://platform.openai.com/docs/actions)
- [Introduction to GPT Actions Library](https://platform.openai.com/docs/actions/actions-library)
- [Example of Building a GPT Action from Scratch](https://platform.openai.com/docs/actions/getting-started)

This solution enables a GPT Action to retrieve data from Redshift and perform data analysis. It uses an AWS Lambda function, so every action is performed from within the AWS ecosystem and network. The middleware (the AWS function) runs the SQL query, waits for its completion, and returns the data as a file. The code is provided for informational purposes only and should be modified to fit your needs.

This solution uses the ability to [retrieve files in Actions](https://platform.openai.com/docs/actions/sending-files) and use them as if you had uploaded them directly to a conversation.

This solution demonstrates a connection to Redshift Serverless. Integration with a provisioned Redshift cluster might differ slightly in how you retrieve the network information and set up the connection, but the overall code and integration will be very similar.

### Value + Example Business Use Cases

**Value**: Users can now leverage ChatGPT's natural language capabilities to connect directly to their Redshift data warehouse.

**Example Use Cases**:
- Data scientists can connect to tables and run data analyses using ChatGPT's Data Analysis
- Citizen data users can ask basic questions about their transactional data
- Users gain more visibility into their data & potential anomalies

## Application Information

### Application Prerequisites

Before you get started, make sure that:
- You have access to a Redshift environment
- You have the rights to deploy an AWS function in the same VPC (Virtual Private Cloud)
- Your AWS CLI is authenticated

## Middleware Information

### Install required libraries
- Install AWS CLI, required for AWS SAM ([docs](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html#getting-started-install-instructions))
- Install AWS SAM CLI ([docs](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html))
- Install Python
- Install yq ([docs](https://github.com/mikefarah/yq?tab=readme-ov-file#install))
- Install jq, which the deployment script uses alongside yq

### Middleware function

You can either deploy this app directly, [https://github.com/openai/redshift-middleware](https://github.com/openai/redshift-middleware), with your Redshift credentials as parameters, or build your own SAM application (or any other middleware), add your function code, and add the psycopg2 binary by following the steps in the appendix.

> This code is meant to be directional - while it should work out of the box, it is designed to be customized to your needs (see examples towards the end of this document).

#### AWS SAM
To ease the deployment of the Lambda function and its dependencies, we use SAM (Serverless Application Model), a framework supported by AWS that deploys through CloudFormation.

### Retrieve VPC information

We need to connect our function to Redshift, so we first need to find the network used by Redshift. You can find this in the Redshift interface of the AWS console, under Amazon Redshift Serverless > Workgroup configuration > `<your_workgroup>` > Data access, or through the CLI:

```
aws redshift-serverless get-workgroup --workgroup-name default-workgroup --query 'workgroup.{address: endpoint.address, port: endpoint.port, SecurityGroupIds: securityGroupIds, SubnetIds: subnetIds}'
```

```
{
    "address": "default-workgroup.014498629922.us-east-1.redshift-serverless.amazonaws.com",
    "port": 5439,
    "SecurityGroupIds": [
        "sg-027f8ddcf8733965c"
    ],
    "SubnetIds": [
        "subnet-0645bb03fff5f1514",
        "subnet-07b9339b8c119c45d",
        "subnet-0355094e218ea2788",
        "subnet-019a4d0836fb4024f",
        "subnet-09f25a9cdc349c779",
        "subnet-0c9ba9219bb590152"
    ]
}
```
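The values in this output are what the deployment parameters need. As a minimal sketch, assuming you have captured the CLI output as a string, the helper below flattens it into simple comma-separated values that are easy to paste into a parameters file. The function name and the returned key names are hypothetical; use whatever keys `env.sample.yaml` actually defines.

```python
import json


def summarize_workgroup(cli_output: str) -> dict:
    """Flatten the JSON printed by `get-workgroup` into plain strings.

    The returned key names are illustrative assumptions, not the names
    used by the middleware's parameters file.
    """
    info = json.loads(cli_output)
    return {
        "host": info["address"],
        "port": str(info["port"]),
        "security_group_ids": ",".join(info["SecurityGroupIds"]),
        "subnet_ids": ",".join(info["SubnetIds"]),
    }


if __name__ == "__main__":
    # Shortened copy of the output shown above.
    sample = """
    {
        "address": "default-workgroup.014498629922.us-east-1.redshift-serverless.amazonaws.com",
        "port": 5439,
        "SecurityGroupIds": ["sg-027f8ddcf8733965c"],
        "SubnetIds": ["subnet-0645bb03fff5f1514", "subnet-07b9339b8c119c45d"]
    }
    """
    for key, value in summarize_workgroup(sample).items():
        print(f"{key}: {value}")
```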


### Set up AWS function

Copy `env.sample.yaml` to `env.yaml` and replace the placeholder values with those obtained above. You will need a Redshift user with access to your DB/schema.

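```
git clone https://github.com/openai/redshift-middleware
cd redshift-middleware
# Copy the sample parameters file, then edit env.yaml with your own values
cp env.sample.yaml env.yaml

# Convert the YAML parameters into space-separated key=value pairs
PARAM_FILE="env.yaml"
PARAMS=$(yq eval -o=json "$PARAM_FILE" | jq -r 'to_entries | map("\(.key)=\(.value|tostring)") | join(" ")')
# $PARAMS is intentionally unquoted so each key=value pair is passed as its own argument
sam deploy --template-file template.yaml --stack-name redshift-middleware --capabilities CAPABILITY_IAM --parameter-overrides $PARAMS
```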
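For readers less familiar with `yq` and `jq`, the pipeline in the deployment script turns each top-level key of `env.yaml` into a `key=value` token and joins the tokens with spaces. A rough Python equivalent is sketched below, assuming the parameters are already loaded into a dict; the parameter names in the usage example are hypothetical placeholders.

```python
import json


def to_parameter_overrides(params: dict) -> str:
    """Build the space-separated key=value string passed to --parameter-overrides."""
    tokens = []
    for key, value in params.items():
        # jq's `tostring` leaves strings untouched and JSON-encodes everything else.
        text = value if isinstance(value, str) else json.dumps(value, separators=(",", ":"))
        tokens.append(f"{key}={text}")
    return " ".join(tokens)


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    print(to_parameter_overrides({"ExampleHost": "example.amazonaws.com", "ExamplePort": 5439}))
```

Because the tokens are separated by spaces, a value that itself contains a space would be split into several arguments by the shell, so keep parameter values free of whitespace.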

After deployment, retrieve the URL of the deployed function. You can then test it with a cURL request:

```
curl -X POST https://10o5fvtsr1.execute-api.us-east-1.amazonaws.com/Prod/sql_statement/ \
-H "Content-Type: application/json" \
-d '{ "sql_statement": "SELECT * FROM customers LIMIT 10", "workgroup_name": "default-workgroup", "database_name": "pap-db" }'
```

The response contains the query result as a base64-encoded file (output truncated):

{"openaiFileResponse": [{"name": "query_result.json", "mime_type": "application/json", "content": "W1sxMDAxLCAiQ3VzdG9tZXJfMTAwMSIsICJjdXN0b21lcjEwMDFAZXhhbXBsZS5jb20iLCAiNTU1LTAxMSIsICIxMjM0IEVsbSBTdCwgQ2l0eV8xLCBTdGF0ZV8xIiwgIlJldHVybmluZyJdLCBbMTAwMiwgIkN1c3RvbWVyXzEwMDIiLCAiY3VzdG9tZXIxMDAyQGV4YW1wbGUuY29tIiwgIjU1NS0wMTIiLCAiMTIzNCBFbG0gU3QsIENpdHlfMiwgU3RhdGVfMiIsICJOZXciXSwgWzEwMDMsICJDdXN0b21lcl8xMDAzIiwgImN1c3RvbWVyMTAwM0BleGFtcGxlLmNvbSIsICI1NTUtMDEzIiwgIjEyMzQgRWxtIFN0LCBDaXR5XzMsIFN0YXRlXzMiLCAiTmV3Il0sIFsxMDA0LCAiQ3VzdG9tZXJfMTAwNCIsICJjdXN0b21lcjEwMDRAZXhhbXBsZS5jb20iLCAiNTU1LTAxNCIsICIxMjM0IEVsbSBTdCwgQ2l0eV80LCBTdGF0ZV80IiwgIk5ldyJdLCBbMTAwNSwgIkN1c3RvbWVyXzEwMDUiLCAiY3VzdG9tZXIxMDA1QGV4YW1wbGUuY29tIiwgIjU1NS0wMTUiLCAiMTIzNCBFbG0gU3QsIENpdHlfNSwgU3RhdGVfMCIsICJSZXR1cm5pbmciXSwgWzEwMDYsICJDdXN0b21lcl8xMDA2IiwgImN1c3RvbWVyMTAwNkBleGFtcGxlLmNvbSIsICI1NTUtMDE2IiwgIjEyMzQgRWxtIFN0LCBDaXR5XzYsIFN0YXRlXzEiLCAiTmV3Il0sIFsxMDA3LCAiQ3VzdG9tZXJfMTAwNyIsICJjdXN0b21lcjEwMDdAZXhhbXBsZ
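The `content` field holds the base64-encoded file body, which in this example is a JSON array of rows. A minimal sketch of decoding such a response in Python follows. It uses a small hand-made payload rather than the truncated output above, and the `decode_file_response` helper is a hypothetical name.

```python
import base64
import json


def decode_file_response(response_text: str) -> dict:
    """Return {file name: parsed JSON content} for each file in the response."""
    payload = json.loads(response_text)
    files = {}
    for item in payload["openaiFileResponse"]:
        raw = base64.b64decode(item["content"])
        files[item["name"]] = json.loads(raw.decode("utf-8"))
    return files


if __name__ == "__main__":
    # Build a tiny response with the same structure as the middleware output.
    rows = [[1001, "Customer_1001"], [1002, "Customer_1002"]]
    response_text = json.dumps({
        "openaiFileResponse": [{
            "name": "query_result.json",
            "mime_type": "application/json",
            "content": base64.b64encode(json.dumps(rows).encode("utf-8")).decode("ascii"),
        }]
    })
    print(decode_file_response(response_text))
```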

## ChatGPT Steps

### Custom GPT Instructions 

Once you've created a Custom GPT, copy the text below in the Instructions panel.

**Context**: You are an expert at writing Redshift SQL queries. A user is going to ask you a question. 

**Instructions**:
1. No matter the user's question, start by running the `executeSqlStatement` operation using this query: "SELECT table_name, column_name FROM INFORMATION_SCHEMA.COLUMNS WHERE table_schema = 'public' ORDER BY table_name, ordinal_position;" 
2. Convert the user's question into a SQL statement that leverages the step above and run the `executeSqlStatement` operation on that SQL statement to confirm the query works.
3. Return the query for the user to see

**Additional Notes**: If the user says "Let's get started", explain they can ask a question they want answered about data that we have access to. If the user has no ideas, suggest that we have transactions data they can query - ask if they want you to query that

### OpenAPI Schema 

Once you've created a Custom GPT, copy the text below in the Actions panel.

This expects a response that matches the file retrieval structure in our doc [here](https://platform.openai.com/docs/actions/sending-files) and passes in a `sql_statement` as a parameter to execute.
> Make sure to replace the server URL with the URL of your deployed function.

```
openapi: 3.1.0
info:
  title: SQL Execution API
  description: API to execute SQL statements and return results as a file.
  version: 1.0.0
servers:
  - url: "{your_function_url}/Prod"
    description: Production server
paths:
  /sql_statement:
    post:
      operationId: executeSqlStatement
      summary: Executes a SQL statement and returns the result as a file.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                sql_statement:
                  type: string
                  description: The SQL statement to execute.
                  example: SELECT * FROM customers LIMIT 10
              required:
                - sql_statement
      responses:
        '200':
          description: The SQL query result as a JSON file.
          content:
            application/json:
              schema:
                type: object
                properties:
                  openaiFileResponse:
                    type: array
                    items:
                      type: object
                      properties:
                        name:
                          type: string
                          description: The name of the file.
                          example: query_result.json
                        mime_type:
                          type: string
                          description: The MIME type of the file.
                          example: application/json
                        content:
                          type: string
                          description: The base64 encoded content of the file.
                          format: byte
                          example: eyJrZXkiOiJ2YWx1ZSJ9
        '500':
          description: Error response
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    description: Error message.
                    example: Database query failed error details
```
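To make this contract concrete, here is a minimal sketch of a handler that produces responses of the shape described by the schema. It is not the code from the repository: the `handle_request` name and the injected `run_query` callable are assumptions standing in for whatever actually executes the SQL against Redshift, and the `statusCode`/`body` return format assumes an API Gateway proxy-style integration.

```python
import base64
import json
from typing import Callable, List


def handle_request(body: str, run_query: Callable[[str], List[list]]) -> dict:
    """Run the SQL from the request body and wrap the rows as a base64 JSON file."""
    try:
        sql = json.loads(body)["sql_statement"]
        rows = run_query(sql)
        # default=str keeps values such as dates or decimals serializable.
        encoded = base64.b64encode(json.dumps(rows, default=str).encode("utf-8"))
        result = {
            "openaiFileResponse": [{
                "name": "query_result.json",
                "mime_type": "application/json",
                "content": encoded.decode("ascii"),
            }]
        }
        return {"statusCode": 200, "body": json.dumps(result)}
    except Exception as exc:  # Sketch only: report any failure as a 500.
        return {"statusCode": 500, "body": json.dumps({"error": str(exc)})}


if __name__ == "__main__":
    # A stand-in for the real Redshift call, returning one hard-coded row.
    fake_query = lambda sql: [[1001, "Customer_1001"]]
    print(handle_request('{"sql_statement": "SELECT * FROM customers LIMIT 10"}', fake_query))
```

Passing the query function in as an argument keeps the response-building logic testable without a database connection.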


## Considerations

- No authentication is set up in this example. We recommend adding authentication to your deployed function so that only authenticated users can execute queries.
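As one possible approach (an assumption, not something the repository implements), the function could require a shared API key in a request header and reject everything else. The header name `x-api-key` and the `is_authorized` helper below are hypothetical; in practice you may prefer the authentication mechanisms offered by your API gateway or an OAuth flow.

```python
import hmac


def is_authorized(headers: dict, expected_key: str) -> bool:
    """Check a shared secret from the request headers using a constant-time comparison."""
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    provided = normalized.get("x-api-key", "")
    # Refuse to authorize when no expected key is configured.
    return bool(expected_key) and hmac.compare_digest(
        provided.encode("utf-8"), expected_key.encode("utf-8")
    )


if __name__ == "__main__":
    print(is_authorized({"X-Api-Key": "example-secret"}, "example-secret"))
    print(is_authorized({}, "example-secret"))
```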

## Appendix

Installing psycopg2 in your own directory before deploying:
```
mkdir -p lambda_layer/python
pip install psycopg2-binary -t lambda_layer/python
cd lambda_layer
zip -r ../lambda_layer.zip .
cd ..
```

*Are there integrations that you’d like us to prioritize? Are there errors in our integrations? File a PR or issue in our GitHub, and we’ll take a look.*
