# Weaviate Quickstart

[![Google Colab](https://img.shields.io/badge/Google%20-%20Colab%20-%20%23F9AB00?style=for-the-badge&logo=googlecolab)](link-to-google-colab)

Welcome to the **Weaviate Jupyter Notebook Quickstart**! 🚀

In under 30 minutes, you'll set up Weaviate, load data, and try out semantic search and retrieval augmented generation (RAG) with OpenAI models.  
By the end, you will know how to:
1. [Run the Weaviate vector database](#Step-1:-Set-up-Weaviate), either:
    1. in [Weaviate Cloud (WCD)](#1.1-Connect-to-a-Weaviate-Cloud-instance), or
    2. [locally using Docker](#1.2-Run-Weaviate-locally-using-Docker)
2. [Populate the database and generate vector embeddings](#Step-2:-Populate-the-database)
3. [Perform a semantic search and retrieval augmented generation (RAG)](#Step-3:-Queries)

## Prerequisites

To run this Jupyter Notebook, you will need the `weaviate-client` Python library. You can install it with: 

```
pip install -U weaviate-client
```

You will also need an **[OpenAI API key](https://platform.openai.com/api-keys)**. Weaviate uses it to call OpenAI models that generate embeddings from your text and, in Step 3, generate text for RAG.

Finally, decide whether you will run Weaviate **[locally using Docker](#1.2-Run-Weaviate-locally-using-Docker)** or create a free Weaviate Cloud instance and **[connect to it](#1.1-Connect-to-a-Weaviate-Cloud-instance)**.
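Before Step 1, it can help to confirm that the environment variables read by the connection cells are actually set, so you get one clear message instead of a `KeyError`. This is a minimal sketch using only the standard library. The helper `missing_env_vars` and the `"cloud"`/`"local"` labels are hypothetical; the variable names are the ones used in Step 1.

```python
import os
from typing import Mapping

# Environment variables read by the connection cells in Step 1
CLOUD_VARS = ("WCD_URL", "WCD_API_KEY", "OPENAI_API_KEY")
LOCAL_VARS = ("OPENAI_API_KEY",)


def missing_env_vars(deployment: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty (hypothetical helper)."""
    if deployment == "cloud":
        required = CLOUD_VARS
    elif deployment == "local":
        required = LOCAL_VARS
    else:
        raise ValueError(f"Unknown deployment: {deployment!r} (expected 'cloud' or 'local')")
    return [name for name in required if not env.get(name)]


# Check the variables needed for a Weaviate Cloud connection
missing = missing_env_vars("cloud")
if missing:
    print(f"Set these environment variables before Step 1: {', '.join(missing)}")
```

Passing `"local"` instead checks only the OpenAI key, which is all the Docker setup in 1.2 needs.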


## Step 1: Set up Weaviate

### 1.1 Connect to a Weaviate Cloud instance

You can follow this guide to set up a new Weaviate Cloud instance:
* [Create a Weaviate Cloud database](https://weaviate.io/developers/weaviate/quickstart#11-create-a-weaviate-database)

After the cluster is created, copy its [REST Endpoint URL and Admin API key](https://weaviate.io/developers/weaviate/quickstart#13-connect-to-weaviate).  
Store your credentials in environment variables, as the code below does:

```python
import os

import weaviate
from weaviate.classes.init import Auth

# Best practice: store your credentials in environment variables
wcd_url = os.environ["WCD_URL"]
wcd_api_key = os.environ["WCD_API_KEY"]
openai_api_key = os.environ["OPENAI_API_KEY"]

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=wcd_url,                            # Your Weaviate Cloud REST endpoint URL
    auth_credentials=Auth.api_key(wcd_api_key),     # Your Weaviate Cloud Admin API key
    headers={"X-OpenAI-Api-Key": openai_api_key},   # Used by Weaviate to call OpenAI models
)

client.collections.delete_all()  # Warning: deletes ALL collections in this instance so the notebook starts fresh

print(client.is_ready())  # Should print: `True`
```

```
True
```


### 1.2 Run Weaviate locally using Docker

You can start a Weaviate database instance by running the following command: 

```
docker run -p 8080:8080 -p 50051:50051 -e ENABLE_MODULES='text2vec-openai,generative-openai' cr.weaviate.io/semitechnologies/weaviate:1.27.1
```

The `-e` flag sets a single environment variable inside the container, `ENABLE_MODULES`, to `text2vec-openai,generative-openai`. This enables the OpenAI vectorizer and generative integrations.
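To make the role of each flag explicit, the command above can be assembled as an argument list in Python. This is only an illustrative sketch: `weaviate_docker_command` is a hypothetical helper, not part of the Weaviate client, and the image tag defaults to the `1.27.1` used above.

```python
import shlex


def weaviate_docker_command(modules: list[str], version: str = "1.27.1") -> list[str]:
    """Build the `docker run` arguments used in section 1.2 (hypothetical helper)."""
    return [
        "docker", "run",
        "-p", "8080:8080",    # HTTP (REST) port
        "-p", "50051:50051",  # gRPC port, used by the v4 Python client
        # -e sets ONE environment variable, ENABLE_MODULES, to a comma-separated module list
        "-e", f"ENABLE_MODULES={','.join(modules)}",
        f"cr.weaviate.io/semitechnologies/weaviate:{version}",
    ]


cmd = weaviate_docker_command(["text2vec-openai", "generative-openai"])
print(shlex.join(cmd))  # Shell-quoted form of the same command
```

You could pass `cmd` to `subprocess.run(cmd, check=True)` to start the container from Python, although running the command in a terminal is usually simpler.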

```python
# Run this cell INSTEAD of the one in 1.1 if you are running Weaviate locally with Docker
import os

import weaviate

openai_api_key = os.environ["OPENAI_API_KEY"]

client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": openai_api_key}  # Used by Weaviate to call OpenAI models
)

client.collections.delete_all()  # Warning: deletes ALL collections in this instance so the notebook starts fresh

print(client.is_ready())  # Should print: `True`
```

```
True
```


## Step 2: Populate the database
Now, we can populate our database by first defining a collection and then adding data.  
The dataset we will use in this example consists of Jeopardy questions with categories and answers:

```json
[
   {
      "Category":"SCIENCE",
      "Question":"This organ removes excess glucose from the blood & stores it as glycogen",
      "Answer":"Liver"
   },
   {
      "Category":"ANIMALS",
      "Question":"It's the only living mammal in the order Proboseidea",
      "Answer":"Elephant"
   }...
]
```

### 2.1 Define a collection

> 💡 **What is a collection?** A collection is a set of objects that share the same data structure, like a table in relational databases or a collection in NoSQL databases. A collection also includes additional configurations that define how the data objects are stored and indexed.

The following example creates a collection called `Question` with:

* OpenAI [embedding model integration](https://weaviate.io/developers/weaviate/model-providers/openai/embeddings) to create vectors during ingestion & queries.
* OpenAI [generative AI integrations](https://weaviate.io/developers/weaviate/model-providers/openai/generative) for retrieval augmented generation (RAG).

Run this code to create the collection:

```python
from weaviate.classes.config import Configure

questions = client.collections.create(
    name="Question",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),   # Configure the OpenAI embedding integration
    generative_config=Configure.Generative.openai()             # Configure the OpenAI generative AI integration
)

print(f"Collection has been created: {questions.exists()}")
```

```
Collection has been created: True
```


### 2.2 Add objects

We can now add data to our collection. The following example:
* Downloads objects from a JSON file, and
* Adds objects to the target collection `Question` using a batch process.

```python
import json

import requests

resp = requests.get(
    "https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json"
)
resp.raise_for_status()  # Stop early if the download failed
data = json.loads(resp.text)

questions = client.collections.get("Question")

# Dynamic batching adjusts the batch size automatically during the import
with questions.batch.dynamic() as batch:
    for d in data:
        # Map the dataset's capitalized keys to lowercase property names
        uuid = batch.add_object({
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        })
        print(f"Added new object with UUID: {uuid}")

# Objects that fail to import are collected here rather than raising an error
if questions.batch.failed_objects:
    print(f"Failed to import {len(questions.batch.failed_objects)} objects")
```

```
Added new object with UUID: 9d17c854-df62-4364-aeec-6cf2c7c5240e
Added new object with UUID: 6360ae92-f90d-4862-892f-dacbf4c85f82
Added new object with UUID: febab836-d60b-46f5-9395-3f6a053017d8
Added new object with UUID: 02d010c5-37f0-43a3-ae4f-32aa1500c2da
Added new object with UUID: 445ae819-e8b2-477d-bb31-ddca5de1b0d9
Added new object with UUID: 4b2a9984-0ea9-43f1-b9cb-b958b5fa2ef6
Added new object with UUID: f380f090-31a2-497c-abca-ddd906bfb171
Added new object with UUID: fd137a04-57de-4b1f-9f47-cc6aa330f568
Added new object with UUID: 5a52fc69-8e66-4f99-bb40-4019eb2fb205
Added new object with UUID: 6739d9da-758d-4af7-8709-379c482e41b4
```
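The batch loop renames the dataset's capitalized keys (`Answer`, `Question`, `Category`) to the lowercase property names stored in `Question`. Pulling that mapping into a small function lets you check each record before it is sent to Weaviate. This is a sketch; `to_question_properties` is a hypothetical helper, and the sample record is the first entry shown in the dataset excerpt above.

```python
def to_question_properties(record: dict) -> dict:
    """Map one Jeopardy record to the property names used in the `Question` collection."""
    key_map = {"Answer": "answer", "Question": "question", "Category": "category"}
    missing = [key for key in key_map if key not in record]
    if missing:
        # Fail before the batch starts instead of importing an incomplete object
        raise KeyError(f"Record is missing keys: {missing}")
    return {prop: record[key] for key, prop in key_map.items()}


sample = {
    "Category": "SCIENCE",
    "Question": "This organ removes excess glucose from the blood & stores it as glycogen",
    "Answer": "Liver",
}
print(to_question_properties(sample))
```

Inside the batch loop you could then call `batch.add_object(to_question_properties(d))`.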


## Step 3: Queries
Weaviate provides a wide range of query tools to help you find the right data. We will try a few searches here.

### 3.1 Semantic search

Semantic search finds results based on meaning rather than exact keyword matches. In Weaviate this is called `nearText` (the `near_text` method in the Python client).

The following example searches for the two objects whose meaning is most similar to the query `biology`.

```python
import json

questions = client.collections.get("Question")

response = questions.query.near_text(
    query="biology",
    limit=2
)

for obj in response.objects:
    print(json.dumps(obj.properties, indent=2))
```

```
{
  "answer": "DNA",
  "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance",
  "category": "SCIENCE"
}
{
  "answer": "species",
  "question": "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification",
  "category": "SCIENCE"
}
```


**Where did the vectors come from?**  
During import, Weaviate used your OpenAI API key to call an OpenAI embedding model, which generated a vector embedding for each object. At query time, Weaviate converted the query text (`biology`) into a vector the same way. If you would prefer to provide your own vectors, check out [Starter Guide: Bring Your Own Vectors](https://weaviate.io/developers/weaviate/starter-guides/custom-vectors).
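Once objects and the query are vectors, "most similar in meaning" becomes "closest vectors". The sketch below ranks objects by cosine similarity with an exact, brute-force comparison. This is a simplification: Weaviate's default distance metric is cosine, but it uses an approximate vector index (HNSW) instead of comparing the query against every object. The 3-dimensional vectors are made up for illustration and are not real OpenAI embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(query_vec: list[float], objects: dict[str, list[float]], limit: int) -> list[str]:
    """Return the `limit` object names most similar to `query_vec` (exact search)."""
    ranked = sorted(objects, key=lambda name: cosine_similarity(query_vec, objects[name]), reverse=True)
    return ranked[:limit]


# Made-up toy "embeddings"; real embeddings have far more dimensions
toy_vectors = {
    "DNA": [0.9, 0.1, 0.0],
    "species": [0.7, 0.3, 0.1],
    "Elephant": [0.2, 0.9, 0.1],
    "Liver": [0.5, 0.1, 0.8],
}
print(nearest([1.0, 0.2, 0.0], toy_vectors, limit=2))  # prints ['DNA', 'species']
```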

> 💡 **More search types available**: Weaviate is capable of many types of searches. See, for example, our how-to guides on [similarity searches](https://weaviate.io/developers/weaviate/search/similarity), [keyword searches](https://weaviate.io/developers/weaviate/search/bm25), [hybrid searches](https://weaviate.io/developers/weaviate/search/hybrid), and [filtered searches](https://weaviate.io/developers/weaviate/search/filters).

### 3.2 Retrieval augmented generation
Retrieval augmented generation (RAG), also called generative search, combines generative AI models, such as large language models (LLMs), with a database. RAG works by prompting an LLM with both the user's query and data retrieved from the database, so the generated answer can draw on your own, up-to-date data rather than only on what the model learned during training.
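The prompting step of RAG can be sketched in plain Python: take a task, attach the retrieved objects, and send the combined text to the model. This is not Weaviate's internal prompt template. The helper `build_grouped_prompt` and its wording are hypothetical, and in practice Weaviate assembles the prompt and calls the model for you in the `generate` query shown in the next cell. The retrieved objects are shortened versions of the two results from 3.1.

```python
def build_grouped_prompt(task: str, retrieved: list[dict]) -> str:
    """Combine a task with retrieved objects into a single prompt (illustrative format only)."""
    lines = [task, "", "Use only the following facts:"]
    for i, props in enumerate(retrieved, start=1):
        # Flatten each object's properties into one line, e.g. "answer: DNA; category: SCIENCE"
        facts = "; ".join(f"{key}: {value}" for key, value in props.items())
        lines.append(f"{i}. {facts}")
    return "\n".join(lines)


retrieved = [
    {"answer": "DNA", "category": "SCIENCE"},
    {"answer": "species", "category": "SCIENCE"},
]
print(build_grouped_prompt("Write a tweet with emojis about these facts.", retrieved))
```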

The following example combines the same search (for `biology`) with a prompt to generate a tweet:

```python
questions = client.collections.get("Question")

response = questions.generate.near_text(
    query="biology",
    limit=2,
    grouped_task="Write a tweet with emojis about these facts."
)

print(response.generated)  # Inspect the generated text
```

```
🧬 In 1953 Watson & Crick built a model of the molecular structure of DNA, the gene-carrying substance! 🧬🔬

🦢 2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new species! 🦢🌿 #ScienceFacts #DNA #SpeciesClassification
```


## Next steps

Now it's time to decide what to do next with Weaviate!  
Check out the **[Imports in detail](https://weaviate.io/developers/weaviate/tutorials/import)** or the **[Querying in detail](https://weaviate.io/developers/weaviate/tutorials/query)** tutorials to learn more about analyzing your own data with Weaviate.  
When you are finished with this notebook, call `client.close()` to close the connection to Weaviate.

## Troubleshooting

If you encounter any problems, take a look at our **[troubleshooting section](https://weaviate.io/developers/weaviate/quickstart#troubleshooting)**.  
You can also ask questions and leave feedback on our **[user forum](https://forum.weaviate.io/)**.