## Tutorial 1: Academy Weaviate
### [101T Work with: Text Data](https://docs.weaviate.io/academy/py/starter_text_data)


### 👉 Perform searches
- Describe differences between `semantic`, `keyword` and `hybrid` **searches** at a high level.
- Perform a **semantic search** with the `near_text` method.
- Perform a **keyword search**.
- Perform a **hybrid search**.
- Apply **filters** to search results.

## ➡️🧠🤓 Semantic search

With Weaviate, you can perform **semantic searches** <u>to find similar items based on their meaning</u>. This is done by <u>comparing the vector embeddings of the items in the database</u>.
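To make "comparing vector embeddings" concrete, here is a minimal pure-Python sketch that ranks toy items by cosine distance (1 minus cosine similarity) to a query vector. It does not call Weaviate. The three-dimensional vectors are made up for illustration, and the sketch assumes the collection uses the cosine distance metric, which is Weaviate's default. Smaller distances mean closer meaning, which is why the results below are sorted by ascending distance.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity (0 means the vectors point the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (made up; real models produce far more dimensions)
query_vector = [0.9, 0.1, 0.0]  # stands in for the embedding of "dystopian future"
toy_movies = {
    "Dystopian thriller": [0.8, 0.2, 0.1],
    "Romantic comedy": [0.0, 0.3, 0.9],
    "Space opera": [0.5, 0.5, 0.2],
}

# Rank by ascending distance: the smallest distance is the most similar item
ranked = sorted(toy_movies, key=lambda title: cosine_distance(query_vector, toy_movies[title]))
for title in ranked:
    print(f"{title}: {cosine_distance(query_vector, toy_movies[title]):.3f}")
```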

This example finds <u>entries in "Movie" based on their similarity to the query "dystopian future"</u>, and prints out **the title** and **release year** of the **top 5** matches.

```python
import weaviate
import weaviate.classes.query as wq
import os


# Instantiate your client (not shown). e.g.:
# headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}  # Replace with your OpenAI API key
# client = weaviate.connect_to_weaviate_cloud(..., headers=headers) or
# client = weaviate.connect_to_local(..., headers=headers)

# Get the collection
movies = client.collections.use("Movie")

# Perform query
response = movies.query.near_text(
    query="dystopian future", limit=5, return_metadata=wq.MetadataQuery(distance=True)
)

# Inspect the response
for o in response.objects:
    print(
        o.properties["title"], o.properties["release_date"].year
    )  # Print the title and release year (note the release date is a datetime object)
    print(
        f"Distance to query: {o.metadata.distance:.3f}\n"
    )  # Print the distance of the object from the query

client.close()

```

The results are based on the similarity between the vector embeddings of the query and of each object's text. In this case, <u>the **embeddings** are generated by the `vectorizer` **module**</u> configured for the `Movie` collection.

The `limit` parameter sets <u>the maximum number of results to return</u>.

The `return_metadata` parameter <u>takes an instance of the `MetadataQuery` class to set the metadata returned with the search results.</u>
This query requests the **vector distance** between each result and the query.



### Example results

```text
In Time 2011
Distance to query: 0.179

Gattaca 1997
Distance to query: 0.180

I, Robot 2004
Distance to query: 0.182

Mad Max: Fury Road 2015
Distance to query: 0.190

The Maze Runner 2014
Distance to query: 0.193 
```


## ◻️ Explain the code

### ◽Response object
The returned object is an instance of a custom class. Its `objects` attribute is a list of search results, each of which is an instance of another custom class.

Each returned object will:

- Include all **properties** and its **UUID** by default, <u>except properties with the `blob` data type.</u>
- Not include any other information (e.g., references, metadata, or vectors) by default.

## ➡️🧠🤓 Keyword search

Weaviate can also perform `keyword (BM25) searches` <u>to find items based on their keyword similarity</u>, or `hybrid searches` <u>that combine BM25 and semantic/vector searches</u>.


## ◻️ Keyword search

This example <u>finds entries in "Movie" with the **highest keyword search scores** for the term "history"</u>, and prints out the **title** and **release year** of the **top 5** matches.

```python
import weaviate
import weaviate.classes.query as wq
import os


# Instantiate your client (not shown). e.g.:
# headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}  # Replace with your OpenAI API key
# client = weaviate.connect_to_weaviate_cloud(..., headers=headers) or
# client = weaviate.connect_to_local(..., headers=headers)

# Get the collection
movies = client.collections.use("Movie")

# Perform query
response = movies.query.bm25(
    query="history", limit=5, return_metadata=wq.MetadataQuery(score=True)
)

# Inspect the response
for o in response.objects:
    print(
        o.properties["title"], o.properties["release_date"].year
    )  # Print the title and release year (note the release date is a datetime object)
    print(
        f"BM25 score: {o.metadata.score:.3f}\n"
    )  # Print the BM25 score of the object from the query

client.close()

```

### ◽Explain the code

The results are ranked by a keyword search score computed with the `BM25F` algorithm.
The `limit` parameter sets the maximum number of results to return.

The `return_metadata` parameter takes an instance of the `MetadataQuery` class to set the metadata returned with the search results.
This query requests the `score`, <u>which is the BM25 score of each result.</u>

### Example results

```text
American History X 1998
BM25 score: 2.707

A Beautiful Mind 2001
BM25 score: 1.896

Legends of the Fall 1994
BM25 score: 1.663

Hacksaw Ridge 2016
BM25 score: 1.554

Night at the Museum 2006
BM25 score: 1.529
```
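For intuition about where a BM25 score comes from, the sketch below implements plain Okapi BM25 over a toy corpus. It is a simplification under stated assumptions: Weaviate uses the BM25F variant, which also weights individual properties (fields), and its tokenization differs from the lowercase-and-split used here. `k1` (term-frequency saturation) and `b` (document-length normalization) are set to the commonly used defaults of 1.2 and 0.75; the documents are made up.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25, treating each doc as one field."""
    tokenized = [doc.lower().split() for doc in docs]  # simplified tokenization (assumption)
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rarer terms across the corpus get a higher inverse document frequency
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Repeated terms saturate (k1); longer documents are penalized (b)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "a history of history told through family history",
    "a film about family history",
    "a comedy set in a museum",
]
for doc, score in zip(docs, bm25_scores("history", docs)):
    print(f"{score:.3f}  {doc}")
```

The third document never mentions "history", so it scores 0; the first outranks the second because the term appears more often, even after the length penalty.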



## ➡️🧠🤓 Hybrid search

This example <u>finds entries in "Movie" with the **highest hybrid search scores** for the term "history"</u>, and prints out the **title** and **release year** of the **top 5** matches.

```python
import weaviate
import weaviate.classes.query as wq
import os


# Instantiate your client (not shown). e.g.:
# headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}  # Replace with your OpenAI API key
# client = weaviate.connect_to_weaviate_cloud(..., headers=headers) or
# client = weaviate.connect_to_local(..., headers=headers)

# Get the collection
movies = client.collections.use("Movie")

# Perform query
response = movies.query.hybrid(
    query="history", limit=5, return_metadata=wq.MetadataQuery(score=True)
)

# Inspect the response
for o in response.objects:
    print(
        o.properties["title"], o.properties["release_date"].year
    )  # Print the title and release year (note the release date is a datetime object)
    print(
        f"Hybrid score: {o.metadata.score:.3f}\n"
    )  # Print the hybrid search score of the object from the query

client.close()

```

### ◽Explain the code

The results are ranked by a hybrid search score. A **hybrid search** <u>blends the results of **BM25** and **semantic/vector searches**</u> into a single ranking.

The `limit` parameter sets the maximum number of results to return.

The `return_metadata` parameter takes an instance of the `MetadataQuery` class to set the metadata returned with the search results.
This query requests the `score`, which is the hybrid score of each result.

### Example results

```text
Legends of the Fall 1994
Hybrid score: 0.016

Hacksaw Ridge 2016
Hybrid score: 0.016

A Beautiful Mind 2001
Hybrid score: 0.015

The Butterfly Effect 2004
Hybrid score: 0.015

Night at the Museum 2006
Hybrid score: 0.012

```
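Weaviate's hybrid search exposes an `alpha` parameter (1 is pure vector search, 0 is pure keyword search) and lets you choose between ranked fusion and relative score fusion. The sketch below shows the idea behind each fusion method on toy data. The function names, the rank constant `k=60`, and `alpha=0.5` are assumptions for illustration, not Weaviate's exact internals.

```python
def ranked_fusion(keyword_ranking, vector_ranking, alpha=0.5, k=60):
    """Each ranked list contributes weight / (k + rank); scores are summed per item."""
    scores = {}
    for weight, ranking in ((1 - alpha, keyword_ranking), (alpha, vector_ranking)):
        for rank, item in enumerate(ranking):
            scores[item] = scores.get(item, 0.0) + weight / (k + rank + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def relative_score_fusion(keyword_scores, vector_scores, alpha=0.5):
    """Min-max normalize each result set's scores to [0, 1], then take a weighted sum."""
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        return {item: (s - lo) / (hi - lo) if hi > lo else 1.0 for item, s in scores.items()}

    fused = {}
    for weight, scores in ((1 - alpha, normalize(keyword_scores)), (alpha, normalize(vector_scores))):
        for item, s in scores.items():
            fused[item] = fused.get(item, 0.0) + weight * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Toy results: keyword (BM25) scores and vector similarities (higher is better for both)
keyword_scores = {"A": 2.7, "B": 1.9, "C": 1.5}
vector_scores = {"B": 0.82, "D": 0.80, "A": 0.70}
keyword_ranking = sorted(keyword_scores, key=keyword_scores.get, reverse=True)
vector_ranking = sorted(vector_scores, key=vector_scores.get, reverse=True)

print(ranked_fusion(keyword_ranking, vector_ranking))
print(relative_score_fusion(keyword_scores, vector_scores))
```

Ranked fusion only looks at positions, so its scores stay small and close together; relative score fusion keeps information about how far apart the original scores were.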

## ➡️🧠🤓 Filters

`Filters` can be used to precisely <u>refine search results</u>. You can filter by **properties** as well as **metadata**, and you can <u>combine **multiple filters**</u> with `and` or `or` **conditions** (in the Python client, with the `&` and `|` operators) to express more precise criteria.
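The pure-Python sketch below only mimics `and`/`or` filter semantics over hypothetical in-memory rows (including a made-up `runtime` property); it does not use the Weaviate client. It shows that `and` narrows the result set while `or` widens it.

```python
from datetime import datetime, timezone
from typing import Callable

Predicate = Callable[[dict], bool]


def all_of(*preds: Predicate) -> Predicate:
    """AND: a row passes only if every predicate is true."""
    return lambda row: all(p(row) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """OR: a row passes if at least one predicate is true."""
    return lambda row: any(p(row) for p in preds)


# Hypothetical rows standing in for "Movie" objects (toy data, made-up "runtime" property)
rows = [
    {"title": "Movie A", "release_date": datetime(2019, 6, 1, tzinfo=timezone.utc), "runtime": 95},
    {"title": "Movie B", "release_date": datetime(2021, 3, 5, tzinfo=timezone.utc), "runtime": 150},
    {"title": "Movie C", "release_date": datetime(2022, 8, 9, tzinfo=timezone.utc), "runtime": 100},
]
cutoff = datetime(2020, 1, 1, tzinfo=timezone.utc)


def is_recent(row: dict) -> bool:
    return row["release_date"] > cutoff


def is_long(row: dict) -> bool:
    return row["runtime"] > 120


and_titles = [r["title"] for r in rows if all_of(is_recent, is_long)(r)]  # recent AND long
or_titles = [r["title"] for r in rows if any_of(is_recent, is_long)(r)]  # recent OR long
print(and_titles)
print(or_titles)
```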

This example <u>finds entries in "Movie" based on their similarity to the query "dystopian future"</u>, only from those **released after January 1, 2020**. It prints out the **title** and **release year** of the **top 5** matches.

```python
import weaviate
import weaviate.classes.query as wq
import os

from datetime import datetime, timezone


# Instantiate your client (not shown). e.g.:
# headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_APIKEY")}  # Replace with your OpenAI API key
# client = weaviate.connect_to_weaviate_cloud(..., headers=headers) or
# client = weaviate.connect_to_local(..., headers=headers)

# Get the collection
movies = client.collections.use("Movie")

# Perform query
response = movies.query.near_text(
    query="dystopian future",
    limit=5,
    return_metadata=wq.MetadataQuery(distance=True),
    # Keep only objects released after 2020-01-01 (timezone-aware to avoid ambiguity)
    filters=wq.Filter.by_property("release_date").greater_than(
        datetime(2020, 1, 1, tzinfo=timezone.utc)
    ),
)

# Inspect the response
for o in response.objects:
    print(
        o.properties["title"], o.properties["release_date"].year
    )  # Print the title and release year (note the release date is a datetime object)
    print(
        f"Distance to query: {o.metadata.distance:.3f}\n"
    )  # Print the distance of the object from the query

client.close()

```

### ◽Explain the code

This query is identical to the earlier semantic search query, with the addition of a filter. The `filters` parameter <u>takes an instance of the `Filter` class to set the filter conditions</u>. Here, the filter keeps only movies released after January 1, 2020, which is why 2020 releases such as *Tenet* and *Onward* still appear in the results.

### Example results

```text
Dune 2021
Distance to query: 0.199

Tenet 2020
Distance to query: 0.200

Mission: Impossible - Dead Reckoning Part One 2023
Distance to query: 0.207

Onward 2020
Distance to query: 0.214

Jurassic World Dominion 2022
Distance to query: 0.216

```