# `searchlite` Ollama Demo Notebook v2.0 
This notebook walks through how to use `searchlite` with an embedding model from Ollama. This implementation requires the `ollama` Python library. Before running this notebook, **make sure you've installed the optional dependency with pip**:

```bash 
pip install "searchlite[ollama]"
```

In this notebook, we'll load a sample text dataset with some metadata, split the DataFrame into the text and its metadata, load both into `searchlite`, and then run and display a semantic search.

First, import your dependencies. For this example, we need `searchlite`, pandas (for loading our example data), and `os` (for building the file path to our example data).

In [1]:
from searchlite.document import Document
import pandas as pd
import os

## Import and look at data

Next, define the path to the sample data, which lives in the `data` folder one level above the notebook's working directory. After defining the path, use pandas to load the CSV file as a DataFrame.

In [2]:
sample_df = pd.read_csv(
   os.path.join(os.getcwd(), '../data/synthetic_data.csv'),
   index_col=0
)
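The relative path above only works when the notebook's working directory is a folder that sits next to `data`. If you run the notebook from somewhere else, an explicit existence check gives a clearer error than a failed `read_csv`. Here is a minimal sketch using `pathlib`; `resolve_data_path` is a hypothetical helper, not part of `searchlite` or pandas.

```python
from pathlib import Path
from typing import Optional


def resolve_data_path(relative: str, base: Optional[Path] = None) -> Path:
    """Resolve `relative` against `base` (default: the current working directory).

    Hypothetical helper: raises FileNotFoundError showing the resolved path
    if the file is missing, so a wrong working directory is easy to spot.
    """
    base_dir = Path.cwd() if base is None else Path(base)
    # resolve() collapses the '..' so the error message shows the real location
    path = (base_dir / relative).resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"Sample data not found at {path}. Check the notebook's working directory."
        )
    return path
```

In the notebook you could then write `pd.read_csv(resolve_data_path('../data/synthetic_data.csv'), index_col=0)`.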

Let's take a look at our sample data below. The data consists of 15 distinct pieces of text, each with a corresponding `id` and `category` value. Because the topics are quite different from one another, you can test the semantic search with different queries and check whether the results make sense.

In [3]:
sample_df

|   | id | category | text |
|---|----|----------|------|
| 0 | 1 | Product Description | Experience unparalleled sound quality with the... |
| 1 | 2 | Movie Synopsis | In a world ravaged by climate change, a group ... |
| 2 | 3 | News Article | The city council approved the new public trans... |
| 3 | 4 | Recipe | Preheat the oven to 375°F. Mix flour, sugar, a... |
| 4 | 5 | Travel Guide | Discover the hidden gems of Kyoto, from tranqu... |
| 5 | 6 | Scientific Abstract | This study investigates the effects of micropl... |
| 6 | 7 | Book Review | An evocative tale of love and loss, 'The Silen... |
| 7 | 8 | Job Posting | Looking for a skilled software engineer profic... |
| 8 | 9 | User Manual | To reset your device, hold the power button fo... |
| 9 | 10 | Historical Event | The Berlin Wall, constructed in 1961, symboliz... |


Before initializing the `Document` class, split the DataFrame into the text you want to embed and its corresponding metadata (shown below). To do this, isolate the text column, then use the `.to_dict(orient="records")` method to convert the metadata columns into a list of dictionaries, with one entry per row of the DataFrame.

In [4]:
sample_texts = sample_df["text"]
sample_metadata = sample_df[["id", "category"]].to_dict(orient = "records")

In [5]:
sample_texts[0:3]

0    Experience unparalleled sound quality with the...
1    In a world ravaged by climate change, a group ...
2    The city council approved the new public trans...
Name: text, dtype: object

In [6]:
sample_metadata[0:3]

[{'id': 1, 'category': 'Product Description'},
 {'id': 2, 'category': 'Movie Synopsis'},
 {'id': 3, 'category': 'News Article'}]
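The split produces two parallel lists: the i-th text lines up with the i-th metadata dictionary. As a plain-Python illustration of that shape (not how pandas or `searchlite` implement it), here is a sketch that reads CSV text with the standard `csv` module; `split_texts_and_metadata` is a hypothetical name.

```python
import csv
import io


def split_texts_and_metadata(csv_text, text_col="text", meta_cols=("id", "category")):
    """Return (texts, metadata) as parallel lists, one entry per CSV row.

    Hypothetical helper mirroring sample_df["text"] and
    sample_df[["id", "category"]].to_dict(orient="records").
    Note: the csv module keeps every value as a string, whereas pandas
    infers numeric columns such as `id` as integers.
    """
    texts, metadata = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        texts.append(row[text_col])
        # One dict per row, holding only the metadata columns
        metadata.append({col: row[col] for col in meta_cols})
    return texts, metadata


sample = "id,category,text\n1,Recipe,Preheat the oven.\n2,Travel Guide,Discover Kyoto.\n"
texts, metadata = split_texts_and_metadata(sample)
print(texts)     # ['Preheat the oven.', 'Discover Kyoto.']
print(metadata)  # [{'id': '1', 'category': 'Recipe'}, {'id': '2', 'category': 'Travel Guide'}]
```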

## Use `searchlite` to embed text and run semantic search

Now you can initialize the `Document` class. As shown below, both the texts and the metadata are stored as attributes of the instance. Before running a search, you must generate embeddings for the texts stored in the `Document` instance.

We'll use the `nomic-embed-text` embedding model from Ollama for this demo. Before writing any code, **make sure Ollama is running and serving its API endpoint**. The Ollama desktop application runs a server for you while it is open; otherwise, you can start one by running the following in your terminal:

```bash
ollama serve
```
Next, make sure you've downloaded the model you want to use. If you want to use the same model as this demo, run the following in your terminal.

```bash
ollama pull nomic-embed-text
```
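If you're not sure the pull finished, running `ollama list` in your terminal shows the models available locally. The same information comes from the server's `GET /api/tags` endpoint, which the sketch below queries using only the standard library. It assumes Ollama's default address, `http://localhost:11434`; `list_local_models` and `has_model` are hypothetical helpers, and `searchlite` does not necessarily perform this check.

```python
import json
import urllib.request

# Ollama's default address; change it if your server listens elsewhere
OLLAMA_URL = "http://localhost:11434"


def list_local_models(base_url=OLLAMA_URL, timeout=2.0):
    """Return the names of locally pulled models from Ollama's GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


def has_model(model_names, wanted):
    """True if `wanted` is in `model_names`; a bare name is matched against its ':latest' tag."""
    if ":" not in wanted:
        wanted = f"{wanted}:latest"
    return wanted in model_names
```

With the server running, `has_model(list_local_models(), "nomic-embed-text")` should return `True` once the pull has completed.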

`searchlite` automatically checks that an Ollama API endpoint is reachable before allowing you to embed anything. If you get an error, double-check that Ollama is running and that its server is reachable.
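To check the server yourself before creating the embedder, you can request Ollama's base URL, which answers with a short status message while the server is up. This is a minimal standard-library sketch, assuming the default address `http://localhost:11434`; `ollama_is_running` is a hypothetical helper and is separate from the check `searchlite` performs.

```python
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url, else False."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError (e.g. connection refused), HTTPError and timeouts all subclass OSError
        return False
```

Calling `ollama_is_running()` returns `True` while the server is up and `False` if nothing answers at that address.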

In [7]:
from searchlite.embedders.ollama import OllamaEmbedder

In [8]:
embedder = OllamaEmbedder(model_name = "nomic-embed-text")
embedder

Ollama Embedder object. Chosen model: nomic-embed-text. 
Ollama server status: Running

In [9]:
doc = Document(texts = sample_texts, metadata = sample_metadata, embedder = embedder)

In [10]:
doc

Document instance with 15 texts. Metadata contains the following fields: id, category. Embeddings: Not Ready.

In [11]:
doc.embed()
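`doc.embed()` sends the texts to the Ollama server and stores the returned vectors. The notebook doesn't show how `searchlite` makes that call. For a rough picture of an embedding request, here is a standard-library sketch against Ollama's `POST /api/embed` endpoint, which takes a model name and a list of inputs and returns one vector per input under `embeddings`. The helper names are hypothetical, and this is not necessarily the endpoint or client `searchlite` uses internally (the notebook only says it relies on the `ollama` Python library).

```python
import json
import urllib.request


def build_embed_request(texts, model="nomic-embed-text", base_url="http://localhost:11434"):
    """Build a POST request for Ollama's /api/embed endpoint (one vector per input text)."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_texts(texts, model="nomic-embed-text", base_url="http://localhost:11434", timeout=60.0):
    """Send the request and return the list of vectors from the response's 'embeddings' field."""
    request = build_embed_request(texts, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["embeddings"]
```

With the server running, `embed_texts(list(sample_texts))` should return one vector per text, in the same order as the inputs.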

In [12]:
res = doc.query(query_text = "wireless earbuds with good battery life")
res

[{'id': 1,
  'category': 'Product Description',
  'text': 'Experience unparalleled sound quality with the EchoSphere wireless earbuds, featuring noise cancellation, 12-hour battery life, and an ergonomic design perfect for workouts.',
  'similarity score': 0.7513732317790955},
 {'id': 9,
  'category': 'User Manual',
  'text': 'To reset your device, hold the power button for 10 seconds until the LED indicator flashes. Release the button and wait for the system reboot.',
  'similarity score': 0.4745084895063595},
 {'id': 12,
  'category': 'Health & Fitness',
  'text': 'Regular cardio workouts not only improve heart health but also boost mental clarity and reduce stress levels.',
  'similarity score': 0.46145138112268447}]
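Each result carries its metadata, the original text, and a `similarity score`, with the best match first. The notebook doesn't say which similarity measure `searchlite` computes; cosine similarity is a common choice for text embeddings, so the sketch below assumes it. It ranks toy 2-D vectors (stand-ins for real `nomic-embed-text` embeddings) and builds result dictionaries with the same keys as the output above. `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, texts, metadata, k=3):
    """Score every document against the query and return the k best as result dicts."""
    scored = []
    for vec, text, meta in zip(doc_vecs, texts, metadata):
        record = dict(meta)  # copy so the caller's metadata is not modified
        record["text"] = text
        record["similarity score"] = cosine_similarity(query_vec, vec)
        scored.append(record)
    scored.sort(key=lambda r: r["similarity score"], reverse=True)
    return scored[:k]


toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
toy_texts = ["wireless earbuds", "cookie recipe", "device manual"]
toy_metadata = [
    {"id": 1, "category": "Product Description"},
    {"id": 2, "category": "Recipe"},
    {"id": 3, "category": "User Manual"},
]
results = top_k([1.0, 0.1], toy_vectors, toy_texts, toy_metadata, k=2)
print([r["id"] for r in results])  # [1, 3]
```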

In [13]:
doc.display_results(res, style = "f-string")

Result 1:
    id: 1
    category: Product Description
    text: Experience unparalleled sound quality with the EchoSphere wireless earbuds, featuring noise cancellation, 12-hour battery life, and an ergonomic design perfect for workouts.
    similarity score: 0.7513732317790955

Result 2:
    id: 9
    category: User Manual
    text: To reset your device, hold the power button for 10 seconds until the LED indicator flashes. Release the button and wait for the system reboot.
    similarity score: 0.4745084895063595

Result 3:
    id: 12
    category: Health & Fitness
    text: Regular cardio workouts not only improve heart health but also boost mental clarity and reduce stress levels.
    similarity score: 0.46145138112268447
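The `f-string` style prints one numbered block per result, with each field on its own indented line. If you want a similar layout outside `searchlite` (for example, when writing results to a log file), here is a minimal sketch; `format_results` is a hypothetical helper that approximates the output above and is not `searchlite`'s implementation.

```python
def format_results(results):
    """Render result dicts as 'Result N:' blocks with one indented 'key: value' line per field."""
    blocks = []
    for i, result in enumerate(results, start=1):
        lines = [f"Result {i}:"]
        lines.extend(f"    {key}: {value}" for key, value in result.items())
        blocks.append("\n".join(lines))
    # A blank line separates consecutive results
    return "\n\n".join(blocks)


print(format_results([{"id": 1, "category": "Recipe", "similarity score": 0.5}]))
```

Unlike this sketch, the `pprint` style in the next cell orders keys alphabetically, because `pprint` sorts dictionary keys by default.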



In [14]:
doc.display_results(res, style = "pprint")

[{'category': 'Product Description',
  'id': 1,
  'similarity score': 0.7513732317790955,
  'text': 'Experience unparalleled sound quality with the EchoSphere wireless '
          'earbuds, featuring noise cancellation, 12-hour battery life, and an '
          'ergonomic design perfect for workouts.'},
 {'category': 'User Manual',
  'id': 9,
  'similarity score': 0.4745084895063595,
  'text': 'To reset your device, hold the power button for 10 seconds until '
          'the LED indicator flashes. Release the button and wait for the '
          'system reboot.'},
 {'category': 'Health & Fitness',
  'id': 12,
  'similarity score': 0.46145138112268447,
  'text': 'Regular cardio workouts not only improve heart health but also '
          'boost mental clarity and reduce stress levels.'}]


In [15]:
doc.display_results(res, style = "tabulate")

+------+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
|   id | category            | text                                                                                                                                                                          |   similarity score |
|    1 | Product Description | Experience unparalleled sound quality with the EchoSphere wireless earbuds, featuring noise cancellation, 12-hour battery life, and an ergonomic design perfect for workouts. |           0.751373 |
+------+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+
|    9 | User Manual         | To reset your device, hold the power button for 10 second