## Summary

📇 Embedding Index
- A data structure used to store and search embeddings efficiently

- Purpose: Enables fast nearest-neighbor lookup (e.g., for semantic search or retrieval)

- Often built using libraries like:

    - FAISS (Facebook)

    - Annoy (Spotify)

    - ScaNN (Google)

    - Chroma, Weaviate, Pinecone (vector DBs)
  
🔄 Relationship

| Component       | Role                                     |
|-----------------|------------------------------------------|
| Embedding Model | Generates vector representations         |
| Embedding Index | Stores those vectors and supports search |

🔧 Example Workflow
- Use an embedding model (e.g., OpenAI or SBERT) to encode documents.

- Store those embeddings in an index (e.g., FAISS).

- When a query comes in:

    - Encode it with the same model

    - Search the index for nearest neighbors (semantic matches)
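
The workflow above can be sketched end to end without any external service. This is a minimal illustration, not how FAISS or a vector DB works internally: `toy_embed`, `VOCABULARY`, and `BruteForceIndex` are hypothetical names invented here, the "model" just counts words from a tiny fixed vocabulary, and the index scans every stored vector instead of using an approximate nearest-neighbor structure.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class BruteForceIndex:
    """Stores documents with their vectors and compares the query to every one of them."""

    def __init__(self):
        self.documents = []
        self.vectors = []

    def add(self, document, vector):
        self.documents.append(document)
        self.vectors.append(vector)

    def search(self, query_vector, k=1):
        # Score every stored vector, then keep the k most similar documents.
        scored = [
            (cosine_similarity(query_vector, vector), document)
            for vector, document in zip(self.vectors, self.documents)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


VOCABULARY = ["cat", "dog", "stock", "market", "pet"]
documents = [
    "the cat is a popular pet",
    "the stock market fell today",
    "a dog is a loyal pet",
]

# Steps 1 and 2: encode the documents and store the vectors in the index.
index = BruteForceIndex()
for doc in documents:
    index.add(doc, toy_embed(doc, VOCABULARY))

# Step 3: encode the query with the same "model" and search for the nearest neighbor.
query = "stock market news"
results = index.search(toy_embed(query, VOCABULARY), k=1)
print(results[0][1])
```

The important detail is that the query and the documents go through the same embedding function; vectors from different models are not comparable. A real index library replaces the full scan in `search` with a faster lookup structure.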

### Embeddings for 2022 Events Case Study
For our dataset, we will use an OpenAI embedding model, specifically `text-embedding-ada-002`. This model [produces embeddings with 1,536 dimensions](https://openai.com/blog/new-and-improved-embedding-model). Read more in the [API documentation](https://platform.openai.com/docs/guides/embeddings/embeddings).

#### Basic Example
A basic example of using this model looks like this:

    # Generic example code (legacy interface, openai Python package < 1.0)
    import openai

    openai.Embedding.create(
        input=["text", "input", "here"],
        engine="name-of-model"
    )

#### Embeddings Code for 2022 Events Case Study
We can generate embeddings for the 2022 Wikipedia page using a similar process. In this case, we'll send the text in batches of 100 rows to avoid rate-limiting issues.

    import openai

    # Assumes `df` is the case study dataframe with a "text" column
    EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
    batch_size = 100
    embeddings = []
    for i in range(0, len(df), batch_size):
        # Send one batch of text to the OpenAI model to get embeddings
        response = openai.Embedding.create(
            input=df.iloc[i:i+batch_size]["text"].tolist(),
            engine=EMBEDDING_MODEL_NAME
        )

        # Add this batch's embeddings to the list, preserving row order
        embeddings.extend([data["embedding"] for data in response["data"]])

    # Add embeddings list to dataframe (one embedding per row)
    df["embeddings"] = embeddings
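
To see how the batching loop covers every row, including a final partial batch, here is an offline sketch of the same slicing pattern. It makes no API calls: `fake_embed_batch` is a hypothetical stand-in that returns a made-up 3-number vector per text, and `embed_in_batches` is an illustrative helper name, not part of the case study notebook.

```python
def fake_embed_batch(texts):
    """Hypothetical stand-in for the API call: one small vector per input text."""
    return [[float(len(t)), float(t.count(" ")), 1.0] for t in texts]


def embed_in_batches(texts, batch_size, embed_batch):
    """Embed texts batch by batch, returning the embeddings and the number of calls made."""
    embeddings = []
    calls = 0
    for start in range(0, len(texts), batch_size):
        # Slicing past the end of the list is safe, so the last batch may be shorter.
        batch = texts[start:start + batch_size]
        embeddings.extend(embed_batch(batch))
        calls += 1
    return embeddings, calls


texts = [f"row {i}" for i in range(250)]
embeddings, calls = embed_in_batches(texts, batch_size=100, embed_batch=fake_embed_batch)
print(calls, len(embeddings))  # → 3 250
```

With 250 rows and a batch size of 100, the loop makes three calls (100, 100, and 50 rows), and the result list stays in the same order as the input, which is what lets it be assigned back as a dataframe column.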

**Reminder: All of this code for the case study is available in a Jupyter Notebook on the Case Study Workspace page**

After this step, we will have generated embeddings for every row of our dataset and stored them in the `embeddings` column of the dataframe. This is also known as creating an **embedding index**.

You can practice building your own embedding index on the next page.
