# Tutorial

## Introduction

Let's get started by instantiating the Lexy client. By default, this will connect to the Lexy server running at `http://localhost:9900`. You can visit the OpenAPI documentation for the server at [`http://localhost:9900/docs`](http://localhost:9900/docs).

```python
from lexy_py import LexyClient

lx = LexyClient()
```

We can get more information about the Lexy server by calling the `info` method. The output lists the existing Collections, Indexes, Transformers, and Bindings.

```python
lx.info()
```

Let's add some documents to our "**default**" collection using the `LexyClient.add_documents` method.

```python
lx.add_documents([
    {"content": "This is my first document! It's great!"},
    {"content": "Starlink is a satellite internet constellation operated by American aerospace company SpaceX, providing coverage to over 60 countries."},
    {"content": "A latent space is an embedding of a set of items within a manifold in which items resembling each other are positioned closer to one another."}
])
```

Documents that are added to the "**default**" collection are automatically embedded, and the embeddings are stored in the index "**default_text_embeddings**".

<div style="text-align: center;">

```mermaid
flowchart LR
    collection["Collection
      
    &quot;default&quot;"] 
    --> 
    transformer["Transformer 
    
    &quot;text.embeddings.minilm&quot;"]
    -->
    index["Index
    
    &quot;default_text_embeddings&quot;"];   
```

</div>
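To make this flow concrete, here is a minimal, self-contained sketch of a collection feeding an index. It is not Lexy's implementation: `toy_embed` is a hypothetical stand-in for the "**text.embeddings.minilm**" transformer that hashes words into a small vector instead of running a neural model, and the collection and index are plain Python lists.

```python
import hashlib
import math

DIMS = 16  # toy dimensionality; the real MiniLM model outputs 384-dim vectors


def toy_embed(text: str, dims: int = DIMS) -> list[float]:
    """Hypothetical stand-in for a text-embedding transformer.

    Hashes each lowercase word into one of `dims` buckets, then
    L2-normalizes the counts. It only mimics the shape of the pipeline.
    """
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


# "Collection": the raw documents
collection = [
    {"content": "This is my first document! It's great!"},
    {"content": "Starlink is a satellite internet constellation."},
]

# "Index": one record per document, holding its embedding
index = [
    {"document_id": i, "embedding": toy_embed(doc["content"])}
    for i, doc in enumerate(collection)
]

print(len(index), len(index[0]["embedding"]))  # 2 16
```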

We can query the default index for "_what is deep learning_" and see our documents ranked by cosine similarity.

```python
lx.query_index('what is deep learning')
```
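Cosine similarity compares the direction of two vectors and ignores their length: cos(a, b) = (a · b) / (‖a‖ ‖b‖), so 1.0 means the vectors point the same way. The sketch below ranks a few made-up three-dimensional vectors against a query vector using that rule; it illustrates the scoring idea only and is not Lexy's internal code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dim "embeddings" (made-up numbers, for illustration only)
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.2],
    "doc_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

# Highest similarity first
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked)  # ['doc_a', 'doc_c', 'doc_b']
```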

## Example: Famous biographies

Let's go through a longer example to see how **Collections**, **Documents**, **Indexes**, **Bindings**, and **Transformers** interact with one another. We'll use Lexy to create and query embeddings for a new collection of documents.

### Collections

We can see that there are currently two collections, "**default**" and "**code**".

```python
lx.collections
```

Let's create a new "**bios**" collection for famous biographies.

```python
bios = lx.create_collection('bios', description='Famous biographies')
bios
```

### Documents

We can use the `Collection.list_documents` method to see that our new collection is empty.

```python
bios.list_documents()
```

Let's add a few documents to our collection.

```python
bios.add_documents([
    {"content": "Stephen Curry is an American professional basketball player for the Golden State Warriors."},
    {"content": "Dwayne 'The Rock' Johnson is a well-known actor, former professional wrestler, and businessman."},
    {"content": "Taylor Swift is a singer known for her songwriting, musical versatility, and artistic reinventions."}
])
```

### Transformers

Now we want to create embeddings for the documents in our new collection. We'll use a **`Transformer`** to generate embeddings for our documents. We can use the `LexyClient.transformers` property to see a list of available transformers. 

```python
# list of available transformers
lx.transformers
```

For our embeddings, we'll use the "**text.embeddings.minilm**" transformer, which uses the [MiniLM sentence transformer](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model to generate embeddings for text.

### Indexes

Before we can bind this transformer to our collection, we need to create an **`Index`** for storing the resulting embeddings. 

Let's create a new index called "**bios_index**" with embeddings for our new collection. Our index will have a single field called **`bio_embedding`** that will store the embeddings output from the MiniLM sentence transformer.

```python
# define index fields
index_fields = {
    "bio_embedding": {
        "type": "embedding", "extras": {"dims": 384, "model": "text.embeddings.minilm"}
    }
}

# create index
index = lx.create_index(
    index_id='bios_index', 
    description='Biography embeddings', 
    index_fields=index_fields)
index
```
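The `dims` value has to agree with the transformer's output: all-MiniLM-L6-v2 maps each text to a 384-dimensional vector. As an illustration only, the hypothetical `check_embedding` helper below (not part of `lexy_py`) shows the consistency check this field definition implies; whether and where Lexy enforces such a check is an assumption here.

```python
# Same field definition as the index above
index_fields = {
    "bio_embedding": {
        "type": "embedding", "extras": {"dims": 384, "model": "text.embeddings.minilm"}
    }
}


def check_embedding(field_name: str, vector: list[float], fields: dict) -> None:
    """Hypothetical check: raise if `vector` does not fit the declared field."""
    spec = fields[field_name]
    if spec["type"] != "embedding":
        raise TypeError(f"{field_name!r} is not an embedding field")
    expected = spec["extras"]["dims"]
    if len(vector) != expected:
        raise ValueError(f"{field_name!r} expects {expected} dims, got {len(vector)}")


check_embedding("bio_embedding", [0.0] * 384, index_fields)  # passes silently
try:
    check_embedding("bio_embedding", [0.0] * 768, index_fields)
except ValueError as err:
    print(err)  # 'bio_embedding' expects 384 dims, got 768
```

Swapping in a model with a different output size would mean changing `dims` to match.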

### Bindings

Now let's create a **`Binding`**. Our binding will:
1) Feed the documents in our "**bios**" collection into the "**text.embeddings.minilm**" transformer, then 
2) Insert the resulting output in our newly created index, "**bios_index**".

<div style="text-align: center;">

```mermaid
flowchart LR
    collection["Collection
      
    &quot;bios&quot;"]
    -->
    transformer["Transformer
    
    &quot;text.embeddings.minilm&quot;"]
    -->
    index["Index
    
    &quot;bios_index&quot;"];
```

</div>

```python
binding = lx.create_binding(
    collection_name='bios',
    transformer_id='text.embeddings.minilm',
    index_id='bios_index'
)
binding
```
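Conceptually, a binding is a small pipeline that connects a collection, a transformer, and an index. The sketch below models it with a hypothetical `ToyBinding` dataclass and a fake two-number "embedding" in place of MiniLM; none of these names come from `lexy_py`. The `backfill` step reflects an assumption based on the query that follows: documents added before the binding existed appear to be processed once the binding is created.

```python
from dataclasses import dataclass, field
from typing import Callable

Vector = list[float]


@dataclass
class ToyBinding:
    """Hypothetical model of a binding: collection -> transformer -> index."""
    collection: list[dict]
    transformer: Callable[[str], Vector]
    index: list[dict] = field(default_factory=list)

    def backfill(self) -> int:
        """Run every existing document through the transformer into the index."""
        for doc_id, doc in enumerate(self.collection):
            self.index.append(
                {"document_id": doc_id, "bio_embedding": self.transformer(doc["content"])}
            )
        return len(self.index)


def fake_transformer(text: str) -> Vector:
    # Stand-in for text.embeddings.minilm: a 2-number "embedding" from text stats
    return [float(len(text)), float(text.count(" "))]


bios = [
    {"content": "Stephen Curry is an American professional basketball player."},
    {"content": "Taylor Swift is a singer known for her songwriting."},
]
binding = ToyBinding(collection=bios, transformer=fake_transformer)
print(binding.backfill())  # 2
```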

We can now query our index for "_famous artists_" and see the results ranked by cosine similarity.

```python
index.query(query_text='famous artists', query_field='bio_embedding', k=3)
```
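The `k=3` argument limits the response to the three most similar documents. A minimal way to express top-k retrieval is to score every stored vector and keep the k best with `heapq.nlargest`, as in the sketch below. The vectors and document IDs are made up, and a production index would likely use a more efficient search structure than scanning every record.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: list[float], records: dict, k: int = 3) -> list[tuple[float, str]]:
    """Return the k (score, doc_id) pairs with the highest cosine similarity."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in records.items())
    return heapq.nlargest(k, scored)


# Made-up 2-dim vectors, for illustration only
records = {
    "doc_a": [0.1, 0.9],
    "doc_b": [0.5, 0.5],
    "doc_c": [0.9, 0.1],
    "doc_d": [0.95, 0.05],
}
for score, doc_id in top_k([1.0, 0.0], records, k=3):
    print(doc_id, round(score, 3))  # doc_d, then doc_c, then doc_b
```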

Because our binding has status set to "`ON`", any new documents added to our collection will automatically be processed by our transformer and inserted into our index as embeddings.
 
Let's add another document.

```python
bios.add_documents([
    {"content": "Beyoncé is a singer and songwriter recognized for her boundary-pushing artistry, vocals, and performances."}
])
```

Now let's run the same query again for "_famous artists_". We can see the results have been updated and include our new document.

```python
index.query(query_text='famous artists', query_field='bio_embedding', k=3)
```
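The automatic update can be pictured as a hook on `add_documents` that checks each binding's status. In this minimal sketch, `ToyCollection` and the dict-based bindings are hypothetical, and `str.upper` stands in for a transformer; Lexy may well process new documents asynchronously on the server rather than inline like this.

```python
from typing import Callable


class ToyCollection:
    """Hypothetical collection that notifies bindings when documents are added."""

    def __init__(self) -> None:
        self.documents: list[dict] = []
        self.bindings: list[dict] = []

    def add_documents(self, docs: list[dict]) -> None:
        for doc in docs:
            self.documents.append(doc)
            # Only bindings whose status is "ON" process new documents
            for b in self.bindings:
                if b["status"] == "ON":
                    b["index"].append(b["transformer"](doc["content"]))


upper: Callable[[str], str] = str.upper  # trivial stand-in "transformer"
active_index: list = []
paused_index: list = []

bios = ToyCollection()
bios.bindings.append({"status": "ON", "transformer": upper, "index": active_index})
bios.bindings.append({"status": "OFF", "transformer": upper, "index": paused_index})

bios.add_documents([{"content": "Beyonce is a singer and songwriter."}])
print(len(active_index), len(paused_index))  # 1 0
```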

## Next steps

### Custom transformers

So far, we've only used the default transformers included in Lexy. This section will show how to create your own transformers.

_Coming soon._

### Document filters

_Coming soon._