# Quickstart Tutorial
https://weaviate.io/developers/weaviate/quickstart

> Questions:<ol><li>How to provide custom vectors? See **Option 2: Custom vectors** below!</li><li>How to query metadata (i) for a secific schema and (ii) for the entire database instance?</li><li>What search methods other than `nearText` (see below) search are available?</li></ol>

## Overview
Welcome. Here, you'll get a quick taste of Weaviate in ~20 minutes.<br>
You will:
- Build a vector database and
- Query it with *semantic search*.
> OBJECT VECTORS<br>With Weaviate, you have options to:<ul><li>Have **Weaviate create vectors**, or</li><li>Specify **custom vectors**.</li></ul>This tutorial demonstrates both methods.

#### Source data
We will use a (tiny) dataset of quizzes.

The data comes from a TV quiz show ("Jeopardy!")
||Category|Question|Answer|
|-:|:-|:-|:-|
|0|SCIENCE|This organ removes excess glucose from the blood & stores it as glycogen|Liver|
|1|ANIMALS|It's the only living mammal in the order Proboseidea|Elephant|
|2|ANIMALS|The gavial looks very much like a crocodile except for this bodily feature|the nose or snout|
|3|ANIMALS|Weighing around a ton, the eland is the largest species of this animal in Africa|Antelope|
|4|ANIMALS|Heaviest of all poisonous snakes is this North American rattlesnake|the diamond back rattler|
|5|SCIENCE|2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification|species|
|6|SCIENCE|A metal that is "ductile" can be pulled into this while cold & under pressure|wire|
|7|SCIENCE|In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance|DNA|
|8|SCIENCE|Changes in the tropospheric layer of this are what gives us weather|the atmosphere|
|9|SCIENCE|In 70-degree air, a plane traveling at about 1,130 feet per second breaks it|Sound barrier|
---
## Create an instance
First, create a Weaviate database.
1. Go to the [WCS Console](https://console.weaviate.cloud/), and
    - Click `Sign in with the Weaviate Cloud Services`.
    - If you don't have a WCS account, click on `Register`.
1. Sign in with your WCS username and password.
1. Click `Create cluster`.

<img style="float:center" width="85%" src="images/image_1.png">

<font color="midnightblue">If you prefer another method, please see our [installation options](https://weaviate.io/developers/weaviate/installation) page.</font>

Then:
1. Select the `Free sandbox` tier.
1. Provide a *Cluster name*.
1. Set *Enable Authentication*? to `YES`.

<img style="float:center;" width="85%" src="images/image_2.png">

Click `Create`. This will take ~2 minutes and you'll see a tick ✔️ when finished.

#### Note your cluster details
You will need:
- The Weaviate URL, and
- Authentication details (Weaviate API key).

Click `Details` to see them.

For the Weaviate API key, click on the 🗝️ button.

<img style="float:center" width="40%" src="images/image_3.png">

---

## Install a client library
We suggest using a [Weaviate client](https://weaviate.io/developers/weaviate/client-libraries). To install your preferred client ↓:

> <font color="midnightblue">INSTALL CLIENT LIBRARIES<br>Add `weaviate-client` to your Python environment with `pip:`<br>`pip install weaviate-client`</font>

---

## Connect to Weaviate
From the `Details` tab in WCS, get:
- The Weaviate **API key**, and
- The Weaviate **URL**.

And because we will use the Hugging Face inference API to generate vectors, you need:
- A Hugging Face **inference API key**.

So, instantiate the client as follows:

In [10]:
# load environment variables (custom)
import os
from dotenv import load_dotenv
load_dotenv()
WEAVIATE_CLIENT_URL = os.getenv("WEAVIATE_CLIENT_URL")
WEAVIATE_CLIENT_KEY = os.getenv("WEAVIATE_CLIENT_KEY")
HUGGINGFACE_API_KEY = os.getenv("HUGGINGFACE_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
print("environment variables loaded")
# instantiate weaviate client (vanilla)
import weaviate
import json
client = weaviate.Client(
    url = WEAVIATE_CLIENT_URL, # your Weaviate endpoint
    auth_client_secret=weaviate.AuthApiKey(api_key=WEAVIATE_CLIENT_KEY), # your Weaviate API key
    additional_headers = {
        "X-HuggingFace-Api-Key": HUGGINGFACE_API_KEY # HuggingFace inference API key
    }
)
client

environment variables loaded


<weaviate.client.Client at 0x107109420>

Now you are connected to your Weaviate instance!

---
## Define a class
Next, we define a data collection (a "class" in Weaviate) to store objects in:

In [18]:
# new, unique class name (custom)
import random
import string
def get_random_string(length):
    ID = "".join(random.choice(string.ascii_uppercase) for i in range(length))
    ID += "".join(random.choice(string.ascii_lowercase) for i in range(length))
    ID += "".join(random.choice(string.digits) for i in range(length))
    ID = "".join(random.sample(ID, length))
    ID = ID[:4]
    if ID==".git": # The name cannot end with ".git"
        ID = get_random_string(length)
    return "".join(random.sample(ID, length))
Question = f"Question_{get_random_string(4)}"
print(f"class name:\t{Question}")
# class definition (vanilla)
class_obj = {
    "class": f"{Question}",
    "vectorizer": "text2vec-huggingface",  # If set to "none" you must always provide vectors ...
    "moduleConfig": {                      # ...  yourself. Could be any other "text2vec-*" also.
        "text2vec-huggingface": {
            "model": "sentence-transformers/all-MiniLM-L6-v2",  # Can be any public or private Hugging Face model.
            "options": {
                "waitForModel": True
            }
        }
    }
}
client.schema.create_class(class_obj)
client

class name:	Question_64No


<weaviate.client.Client at 0x107109420>

> What if I want to use a different vectorizer module?<br><br>In this example, we use the `HuggingFace` inference API. But you can use others.<br><br>
> OUR RECOMMENDATION<br>
Vectorizer selection is a big topic - so for now, we suggest sticking to the defaults and focus on learning the basics of Weaviate.<br><br>
If you do want to change the vectorizer, you can - as long as:<br>- The module is available in the Weaviate instance you are using, and<br>- You have an API key (if necessary) for that module.<br><br>
Each of the following modules is available in the free sandbox.<br>-`text2vec-cohere`<br>-`text2vec-huggingface`<br>-`text2vec-openai`<br>-`text2vec-palm`<br><br>Depending on your choice, make sure to pass on the API key for the inference service by setting the header with an appropriate line from below, remembering to replace the placeholder with your actual key:<br>`"X-Cohere-Api-Key":      "YOUR-COHERE-API-KEY",      // For Cohere`<br>`"X-HuggingFace-Api-Key": "YOUR-HUGGINGFACE-API-KEY", // For Hugging Face`<br>`"X-OpenAI-Api-Key":      "YOUR-OPENAI-API-KEY",      // For OpenAI`<br>`"X-Palm-Api-Key":        "YOUR-PALM-API-KEY",        // For PaLM`<br><br>Additionally, we also provide suggested `vectorizer` module configurations.<br><br>**Cohere**:<br>`class_obj = {"class": "Question", "vectorizer": "text2vec-cohere"}`<br><br>**HuggingFace**:<br>`class_obj = {`<br>`  "class": "Question",`<br>`  "vectorizer": "text2vec-huggingface",`<br>`  "moduleConfig": {`<br>`    "text2vec-huggingface": {`<br>`      "model": "sentence-transformers/all-MiniLM-L6-v2", // any HuggingFace model`<br>`      "options": {`<br>`        "waitForModel": true                             // "model not ready" error => try this`<br>`      }`<br>`    }`<br>`  }`<br>`}`<br><br>**OpenAI**:<br>`class_obj = {`<br>`  "class": "Question",`<br>`  "vectorizer": "text2vec-openai",`<br>`  "moduleConfig": {`<br>`    "text2vec-openai": {`<br>`      "model": "ada",`<br>`      "modelVersion": "002",`<br>`      "type": "text"`<br>`    }`<br>`  }`<br>`}`<br><br>**PaLM**:<br>`class_obj = {`<br>`  "class": "Question",`<br>`  "vectorizer": "text2vec-palm",`<br>`  "moduleConfig": {`<br>`    "text2vec-palm": {`<br>`      "projectId": "YOUR-GOOGLE-CLOUD-PROJECT-ID", // Required, e.g. "cloud-large-language-models"`<br>`      "apiEndpoint": "YOUR-API-ENDPOINT",          // Default: "us-central1-aiplatform.googleapis.com"`<br>`      "modelId": "YOUR-GOOGLE-CLOUD-MODEL-ID"      // Default: "textembedding-gecko"`<br>`    }`<br>`  }`<br>`}`

This creates a class `Question`, tells Weaviate which `vectorizer` to use, and sets the `moduleConfig` for the vectorizer.

> IS A `vectorizer` SETTING MANDATORY?<br>- No. You always have the option of providing vector embeddings yourself.<br>- Setting a `vectorizer` gives Weaviate the option of creating vector embeddings for you.<br>- If you do not wish to, you can set this to `none`.

Now you are ready to add objects to Weaviate.

---

## Add objects
We'll add objects to our Weaviate instance using a **batch import** process.

> Why use batch imports?<br>Batch imports provide significantly improved import performance, so you should almost always use batch imports unless you have a good reason not to, such as single object creation.

First, you will use the `vectorizer` to create object vectors.

### *Option 1*: `vectorizer`
The code below imports object data without specifying a vector. This causes Weaviate to use the `vectorizer` defined for the class to create a vector embedding for each object.

In [19]:
# Load data
import requests
url = 'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json'
resp = requests.get(url)
data = json.loads(resp.text)
# configure a batch process
with client.batch(batch_size=100) as batch:
    # batch import all questions
    for i, d in enumerate(data):
        print(f"importing question: {i+1}")
        properties = {"answer": d["Answer"], "question": d["Question"], "category": d["Category"]}
        client.batch.add_data_object(properties, Question)

importing question: 1
importing question: 2
importing question: 3
importing question: 4
importing question: 5
importing question: 6
importing question: 7
importing question: 8
importing question: 9
importing question: 10


The above code:
- Loads objects,
- initializes a batch process, and
- adds objects to the target class (`Question_XXXX`) one by one.

### *Option 2*: Custom `vector`s
Alternatively, you can also provide your own vectors to Weaviate.

Regardless of whether a `vectorizer` is set, if a vector is specified, Weaviate will use it to represent the object.

In [20]:
# load data
fname = "jeopardy_tiny_with_vectors_all-MiniLM-L6-v2.json"  # file with vectors, created by `all-MiniLM-L6-v2`
url = f"https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/{fname}"
resp = requests.get(url)
data = json.loads(resp.text)
# configure a batch process
with client.batch(batch_size=100) as batch:
    # batch import all questions
    for i, d in enumerate(data):
        print(f"importing question: {i+1}")
        properties = {"answer": d["Answer"], "question": d["Question"], "category": d["Category"]}
        custom_vector = d["vector"]
        client.batch.add_data_object(properties, "Question", vector=custom_vector) # add custom vector

importing question: 1
importing question: 2
importing question: 3
importing question: 4
importing question: 5
importing question: 6
importing question: 7
importing question: 8
importing question: 9
importing question: 10


> Custom vectors with a `vectorizer`<br>Note that you can specify a `vectorizer` and still provide a custom vector. In this scenario, make sure that the vector comes from the same model as the one specified in the `vectorizer`.<br>In this tutorial, they come from `sentence-transformers/all-MiniLM-L6-v2` - the same as specified in the vectorizer configuration.

> VECTOR != OBJECT PROPERTY<br>Do *not* specify object vectors as an object property. This will cause Weaviate to treat it as a regular property, rather than as a vector embedding.

---
## Putting it together
The following code puts the above steps together. You can run it yourself to import the data into your Weaviate instance.

*End-to-end code*

> REMEMBER TO REPLACE THE **URL**, **WEAVIATE API KEY**, AND **INFERENCE API KEY**

In [None]:
# client
client = weaviate.Client(
    url=WEAVIATE_CLIENT_URL,                                             # use your endpoint
    auth_client_secret=weaviate.AuthApiKey(api_key=WEAVIATE_CLIENT_KEY), # use your Weaviate instance API key
    additional_headers = {"X-HuggingFace-Api-Key": HUGGINGFACE_API_KEY}  # use inference API key
)
# schema
class_obj = {
    "class": f"{Question}",
    "vectorizer": "text2vec-huggingface", # "none" => provide vectors yourself or use any other "text2vec-*".
    "moduleConfig": {
        "text2vec-huggingface": {
            "model": "sentence-transformers/all-MiniLM-L6-v2", # can be any public or private HuggingFace model
            "options": {
                "waitForModel": True
            }
        }
    }
}
client.schema.create_class(class_obj)
# load data
url = 'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json'
resp = requests.get(url)
data = json.loads(resp.text)
# configure a batch process
with client.batch(batch_size=100) as batch:
    # batch import all Questions
    for i, d in enumerate(data):
        print(f"importing question: {i+1}")
        properties = {"answer": d["Answer"], "question": d["Question"], "category": d["Category"]}
        client.batch.add_data_object(properties, "Question")

Congratulations, you've successfully built a vector database!

## Queries
Now, we can run queries.

### Semantic search
Let's try a similarity search. We'll use `nearText` search to look for quiz objects most similar to `biology`.

In [22]:
nearText = {"concepts": ["biology"]}
response = (
    client
    .query
    .get(Question, ["question", "answer", "category"])
    .with_near_text(nearText)
    .with_limit(2)
    .do()
)
print(json.dumps(response, indent=4))

{
    "data": {
        "Get": {
            "Question_64No": [
                {
                    "answer": "DNA",
                    "category": "SCIENCE",
                    "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
                },
                {
                    "answer": "Liver",
                    "category": "SCIENCE",
                    "question": "This organ removes excess glucose from the blood & stores it as glycogen"
                }
            ]
        }
    }
}


You should see a result like this (these may vary per module/model used):

```
{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "DNA",
                    "category": "SCIENCE",
                    "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
                },
                {
                    "answer": "Liver",
                    "category": "SCIENCE",
                    "question": "This organ removes excess glucose from the blood & stores it as glycogen"
                }
            ]
        }
    }
}
```


The response includes a list of top 2 (due to the `limit` set) objects whose vectors are most similar to the word `biology`.

> WHY IS THIS USEFUL?<br>Notice that even though the word `biology` does not appear anywhere, Weaviate returns biology-related entries.<br><br>This example shows why vector searches are powerful. Vectorized data objects allow for searches based on degrees of similarity, as shown here.

### Semantic search with a filter
You can add a Boolean filter to your example. For example, let's run the same search, but only look in objects that have a "category" value of "ANIMALS".

In [24]:
nearText = {"concepts": ["biology"]}
response = (
    client
    .query
    .get(Question, ["question", "answer", "category"])
    .with_near_text(nearText)
    .with_where({"path": ["category"], "operator": "Equal", "valueText": "ANIMALS"})
    .with_limit(2)
    .do()
)
print(json.dumps(response, indent=4))

{
    "data": {
        "Get": {
            "Question_64No": [
                {
                    "answer": "Elephant",
                    "category": "ANIMALS",
                    "question": "It's the only living mammal in the order Proboseidea"
                },
                {
                    "answer": "the nose or snout",
                    "category": "ANIMALS",
                    "question": "The gavial looks very much like a crocodile except for this bodily feature"
                }
            ]
        }
    }
}


You should see a result like this (these may vary per module/model used):

```
{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "Elephant",
                    "category": "ANIMALS",
                    "question": "It's the only living mammal in the order Proboseidea"
                },
                {
                    "answer": "the nose or snout",
                    "category": "ANIMALS",
                    "question": "The gavial looks very much like a crocodile except for this bodily feature"
                }
            ]
        }
    }
}
```
$\Rightarrow$ This works well! $\checkmark$

The response includes a list of top 2 (due to the `limit` set) objects whose vectors are most similar to the word `biology` - but only from the "ANIMALS" category.

> WHY IS THIS USEFUL?<br>Using a Boolean filter allows you to combine the flexibility of vector search with the precision of `where` filters.

---

## Recap
Well done! You have:
<ul>
    <li>Created your own cloud-based vector database with Weaviate,</li>
    <li>
        Populated it with data objects,
        <ul>
            <li>Using an inference API, or</li>
            <li>Using custom vectors,</li>
        </ul>
    </li>
    <li>Performed text similarity searches.</li>
</ul>
Where next is up to you. We include a few links below - or you can check out the sidebar.

## Troubleshooting & FAQs
We provide answers to some common questions, or potential issues below.

**How to confirm class creation**
> If you are not sure whether the class has been created, you can confirm it by visiting the [`schema` endpoint](https://weaviate.io/developers/weaviate/api/rest/schema) here (replace the URL with your actual endpoint):<br>*https://some-endpoint.weaviate.network/v1/schema*<br><br>You should see:<br>`{`<br>`    "classes": [`<br>`        {`<br>`            "class": "Question",`<br>`            ...  // truncated additional information here`<br>`            "vectorizer": "text2vec-huggingface"`<br>`        }`<br>`    ]`<br>`}`<br>Where the schema should indicate that the `Question` class has been added.<br><br>REST & GRAPHQL IN WEAVIATE<br>Weaviate uses a combination of RESTful and GraphQL APIs. In Weaviate, RESTful API endpoints can be used to add data or obtain information about the Weaviate instance, and the GraphQL interface to retrieve data.

**If you see `Error: Name 'Question' already used as a name for an Object class`**
> You may see this error if you try to create a class that already exists in your instance of Weaviate. In this case, you can delete the class following the below instructions.<br><br>You can delete any unwanted class(es), along with the data that they contain.<br><br>DELETING A CLASS == DELETING ITS OBJECTS<br>Know that **deleting a class will also delete all associated objects**!<br>Do not do this to a production database, or anywhere where you do not wish to delete your data.<br>Run the code below to delete the relevant class and its objects.<br><br>`client.schema.delete_class("YourClassName") # Replace with your class name - e.g. "Question"`

**How to confirm data import**
> To confirm successful data import, navigate to the [`objects` endpoint](https://weaviate.io/developers/weaviate/api/rest/objects) to check that all objects have been imported (replace with your actual endpoint):<br>`https://some-endpoint.weaviate.network/v1/objects`<br>You should see:<br><br>`{`<br>`    "deprecations": null,`<br>`    "objects": [`<br>`        ... // details of each object`<br>`    ],`<br>`    "totalResults": 10 // You should see 10 results here`<br>`}`<br><br>Where you should be able to confirm that you have imported all `10` objects.

**If the `nearText` search is not working**
> To perform text-based (`nearText`) similarity searches, you need to have a vectorizer enabled, and configured in your class.<br>Make sure you configured it as shown in [this section](https://weaviate.io/developers/weaviate/quickstart#define-a-class).<br>If it still doesn't work - please [reach out to us](https://weaviate.io/developers/weaviate/quickstart#more-resources)!

**Will my sandbox be deleted?**
> SANDBOX EXPIRY<br>The sandbox is free, but it will expire after 14 days. After this time, all data in the sandbox will be deleted.<br>If you would like to preserve your sandbox data, you can [retrieve your data](https://weaviate.io/developers/weaviate/manage-data/read-all-objects), or [contact us to upgrade](https://weaviate.io/pricing#register) to a production SaaS instance.

## Next
You can choose your direction from here. For example, you can:
<ul>
    <li>Go through our guided <a href="https://weaviate.io/developers/weaviate/tutorials" target="_blank">Tutorials</a>, like how to
        <ul>
            <li><a href="https://weaviate.io/developers/weaviate/tutorials/schema" target="_blank">build schemas</a>,</li>
            <li><a href="https://weaviate.io/developers/weaviate/tutorials/import" target="_blank">import data</a>,</li>
            <li><a href="https://weaviate.io/developers/weaviate/tutorials/query" target="_blank">query data</a> and more.</li>
        </ul>
    </li>
    <li>Find out how to do specific things like:
        <ul>
            <li><a href="https://weaviate.io/developers/weaviate/search" target="_blank">searches</a></li>
        </ul>
    </li>
    <li>Read about important <a href="https://weaviate.io/developers/weaviate/concepts" target="_blank">concepts/theory about Weaviate</a></li>
    <li>Read our references for:
        <ul>
            <li><a href="https://weaviate.io/developers/weaviate/configuration" target="_blank">Configuration</a></li>
            <li><a href="https://weaviate.io/developers/weaviate/api" target="_blank">API</a></li>
            <li><a href="https://weaviate.io/developers/weaviate/modules" target="_blank">Modules</a></li>
            <li><a href="https://weaviate.io/developers/weaviate/client-libraries" target="_blank">Client libraries</a></li>
        </ul>
    </li>
</ul>

## More Resources
If you can't find the answer to your question here, please look at the:
<ol>
    <li><a href="https://weaviate.io/developers/weaviate/more-resources/faq" target="_blank">Frequently Asked Questions</a>. Or,</li>
    <li><a href="https://github.com/weaviate/weaviate/issues?utf8=%E2%9C%93&q=label%3Abug" target="_blank">Knowledge base of old issues</a>. Or,</li>
    <li>For questions: <a href="https://stackoverflow.com/questions/tagged/weaviate" target="_blank">Stackoverflow</a>. Or,</li>
    <li><a href="https://forum.weaviate.io/" target="_blank">Weaviate community forum</a>. Or,</li>
    <li>We also have a <a href="https://weaviate.io/slack" target="_blank">Slack channel</a>.</li>
</ol>