<a href="https://colab.research.google.com/github/richardwhiteii/erika-cardenas_recipes/edit/main/generative-search/generative_search_openai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Dependencies

In [None]:
!pip install weaviate-client

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting weaviate-client
  Downloading weaviate_client-3.19.2-py3-none-any.whl (99 kB)
Collecting requests<2.29.0,>=2.28.0 (from weaviate-client)
  Downloading requests-2.28.2-py3-none-any.whl (62 kB)
Collecting validators<=0.21.0,>=0.18.2 (from weaviate-client)
  Downloading validators-0.20.0.tar.gz (30 kB)
  Preparing metadata (setup.py) ... done
Collecting authlib>=1.1.0 (from weaviate-client)
  Downloading Authlib-1.2.0-py2.py3-none-any.whl (214 kB)
Building wheels for collected packages: validators
  Bui

## Configuration

In [None]:
import weaviate
import json

client = weaviate.Client(
  url="WEAVIATE-INSTANCE-URL",  # URL of your Weaviate instance
  auth_client_secret=weaviate.AuthApiKey(api_key="AUTH-KEY"), # (Optional) If the Weaviate instance requires authentication
  additional_headers={
    "X-OpenAI-Api-Key": "OPENAI-API-KEY", # Replace with your OpenAI key
  }
)

client.schema.get()  # Get the schema to test connection
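
Hard-coding keys in a notebook makes them easy to leak. As a minimal sketch, assuming the settings live in environment variables named `WEAVIATE_URL`, `WEAVIATE_API_KEY`, and `OPENAI_API_KEY` (illustrative names, not something Weaviate requires), the values can be collected first and passed to `weaviate.Client` afterwards. `load_settings` is a hypothetical helper, not part of the client library.

```python
import os


def load_settings(env=os.environ):
    """Collect connection settings from environment variables.

    The variable names are assumptions made for this sketch.
    """
    required = ("WEAVIATE_URL", "OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "url": env["WEAVIATE_URL"],
        "api_key": env.get("WEAVIATE_API_KEY"),  # optional, None if unset
        "headers": {"X-OpenAI-Api-Key": env["OPENAI_API_KEY"]},
    }
```

The returned values map onto the arguments used above: `url=settings["url"]`, `additional_headers=settings["headers"]`, and, when `settings["api_key"]` is set, `auth_client_secret=weaviate.AuthApiKey(api_key=settings["api_key"])`.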

## Schema

In [None]:
# resetting the schema. CAUTION: THIS WILL DELETE YOUR DATA 
client.schema.delete_all()

schema = {
    "classes": [
        {
            "class": "JeopardyQuestion",
            "description": "List of Jeopardy questions",
            "vectorizer": "text2vec-openai",
            "moduleConfig": {  # specify the model you want to use
                "generative-openai": {
                    "model": "gpt-3.5-turbo",  # Optional - Defaults to `gpt-3.5-turbo`
                }
            },
            # Property names start with a lowercase letter, matching the
            # keys used for the import and the queries.
            "properties": [
                {
                    "name": "category",
                    "dataType": ["text"],
                    "description": "Category of the question",
                },
                {
                    "name": "question",
                    "dataType": ["text"],
                    "description": "The question",
                },
                {
                    "name": "answer",
                    "dataType": ["text"],
                    "description": "The answer",
                },
            ],
        }
    ]
}

client.schema.create(schema)

print("Successfully created the schema.")

Successfully created the schema.


## Import the Data

In [None]:
import requests
url = 'https://raw.githubusercontent.com/weaviate/weaviate-examples/main/jeopardy_small_dataset/jeopardy_tiny.json'
resp = requests.get(url)
data = json.loads(resp.text)

if client.is_ready():
  # Configure a batch process
  with client.batch as batch:
      batch.batch_size = 100
      # Batch import all questions
      for i, d in enumerate(data):
          print(f"importing question: {i+1}")

          properties = {
              "answer": d["Answer"],
              "question": d["Question"],
              "category": d["Category"],
          }

          batch.add_data_object(properties, "JeopardyQuestion")
else:
  print("The Weaviate cluster is not connected.")

importing question: 1
importing question: 2
importing question: 3
importing question: 4
importing question: 5
importing question: 6
importing question: 7
importing question: 8
importing question: 9
importing question: 10
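
The progress lines only show that each object was queued, so it can be worth confirming how many objects were actually stored. A minimal sketch, assuming the v3 Python client used in this notebook, is to run an aggregate query with a meta count. `count_objects` is a hypothetical helper name.

```python
def count_objects(client, class_name):
    """Return the number of stored objects of `class_name`.

    Assumes `client` is a weaviate.Client from the v3 Python client.
    """
    response = client.query.aggregate(class_name).with_meta_count().do()
    # The aggregate response nests the count under data.Aggregate.<class>[0].meta
    return response["data"]["Aggregate"][class_name][0]["meta"]["count"]
```

Calling `count_objects(client, "JeopardyQuestion")` after the import should return the number of imported questions (10 in the run above).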


## Generative Search Queries

### Single Result

A single-result query (`single_prompt`) generates a separate response for each search result. Property names in curly braces, such as `{question}`, are filled in from that result.

The example below turns the Jeopardy question about elephants into a Facebook ad.

In [None]:
generatePrompt = "Turn the following Jeopardy question into a Facebook Ad: {question}"

result = (
  client.query
  .get("JeopardyQuestion", ["question"])
  .with_generate(single_prompt = generatePrompt)
  .with_near_text({
    "concepts": ["Elephants"]
  })
  .with_limit(1)
).do()

print(json.dumps(result, indent=1))

{
 "data": {
  "Get": {
   "JeopardyQuestion": [
    {
     "_additional": {
      "generate": {
       "error": null,
       "singleResult": "Attention animal lovers! Did you know that there is only one living mammal in the order Proboseidea? Discover more fascinating facts about this unique creature on Jeogrady. Click now to learn more! \ud83d\udc18\ud83c\udf0d #Jeogrady #AnimalFacts #Proboseidea #Mammals #Wildlife #NatureLovers"
      }
     },
     "question": "It's the only living mammal in the order Proboseidea"
    }
   ]
  }
 }
}
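
The generated text sits fairly deep in the response, under `_additional.generate.singleResult` of each object. A minimal sketch for pulling it out, based only on the response layout shown above (`single_results` is a hypothetical helper, not part of the client library):

```python
def single_results(response, class_name):
    """Return a list of (properties, generated_text, error) tuples,
    one per object in a Get response."""
    rows = []
    for obj in response["data"]["Get"][class_name]:
        # "generate" may be missing or null, so fall back to an empty dict
        generate = (obj.get("_additional") or {}).get("generate") or {}
        props = {key: value for key, value in obj.items() if key != "_additional"}
        rows.append((props, generate.get("singleResult"), generate.get("error")))
    return rows
```

With the `result` from the query above, `for props, text, error in single_results(result, "JeopardyQuestion"): print(props["question"], "->", text)` prints each question next to its generated ad.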


### Grouped Result

A grouped-result query (`grouped_task`) generates one response from all the search results together.

The example below asks the model to explain why the three retrieved Jeopardy questions fall under the Animals category.

In [None]:
generateTask = "Explain why these Jeopardy questions are under the Animals category."

result = (
  client.query
  .get("JeopardyQuestion", ["question"])
  .with_generate(grouped_task = generateTask)
  .with_near_text({
    "concepts": ["Animals"]
  })
  .with_limit(3)
).do()

print(json.dumps(result, indent=1))

{
 "data": {
  "Get": {
   "JeopardyQuestion": [
    {
     "_additional": {
      "generate": {
       "error": null,
       "groupedResult": "The first two Jeopardy questions are under the Animals category because they both refer to specific animals - the elephant and the gavial. The third question is actually under the Science category, not Animals, because it refers to the classification of a specific type of bird."
      }
     },
     "question": "It's the only living mammal in the order Proboseidea"
    },
    {
     "_additional": {
      "generate": null
     },
     "question": "The gavial looks very much like a crocodile except for this bodily feature"
    },
    {
     "_additional": {
      "generate": null
     },
     "question": "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification"
    }
   ]
  }
 }
}
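
In the output above, the grouped generation is attached to the first object only, and the remaining objects carry `"generate": null`. A minimal sketch for reading it, assuming that layout (`grouped_result` is a hypothetical helper, not part of the client library):

```python
def grouped_result(response, class_name):
    """Return the grouped generation from a Get response, or None if absent."""
    for obj in response["data"]["Get"][class_name]:
        generate = (obj.get("_additional") or {}).get("generate")
        if not generate:
            continue  # objects without a generation carry null here
        if generate.get("error"):
            raise RuntimeError(f"generation failed: {generate['error']}")
        return generate.get("groupedResult")
    return None
```

With the `result` from the query above, `print(grouped_result(result, "JeopardyQuestion"))` prints just the explanation text.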
