# Ask Python

This project showcases how to use Milvus for semantic search and **retrieval-augmented generation** (**RAG**) with a **large language model** (**LLM**). It demonstrates how to use `llama-index` with Milvus to download, extract, and load data from plain text files into the Milvus vector store, and then use that data as context when querying an LLM of your choice.

## Prerequisites
While 


In [None]:
!pip install milvus pymilvus llama-index requests openai transformers torch

In [2]:
# Standard library imports
import requests
import tarfile

# External imports
from IPython.display import display
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores import MilvusVectorStore
from milvus import default_server
import openai
from pymilvus import connections, utility

# Internal imports
from config import OPENAI_KEY

# Only run this if you're planning to use OpenAI
openai.api_key = OPENAI_KEY

In [3]:
DOCS_URL = "https://docs.python.org/3/archives/python-3.11.5-docs-text.tar.bz2"

If you don't have the dataset from the repo, you can download and extract the full set in the cell below. You can embed the entire Python documentation, but to reduce the cost and time of generating embeddings, you can remove all directories and files except for `faq`, `howto`, `library`, `tutorial`, and `glossary.txt`.

**Note**: If using OpenAI, make sure you have a payment method set up so that you aren't subject to daily request limits.

In [None]:
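# Stream the archive and unpack it on the fly instead of saving the tarball first
response = requests.get(DOCS_URL, stream=True)
response.raise_for_status()  # stop early on a bad HTTP status
with tarfile.open(fileobj=response.raw, mode="r|bz2") as archive:
    archive.extractall(path="python_docs")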
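
The pruning step described above can be done by hand, or with a small helper like the sketch below. `prune_docs` and `KEEP` are hypothetical names introduced here, not part of the project. The sketch assumes you pass it the directory that directly contains `faq`, `howto`, and the other entries, which may be a subdirectory of `python_docs` depending on how the archive unpacks.

```python
import shutil
from pathlib import Path

# Entry names to keep, taken from the text above (assumed to sit directly under the root).
KEEP = {"faq", "howto", "library", "tutorial", "glossary.txt"}


def prune_docs(root, keep=KEEP):
    """Delete every entry directly under `root` whose name is not in `keep`.

    Returns the sorted names of the entries that were removed.
    """
    removed = []
    # Materialize the listing first so deletions do not disturb the iteration.
    for entry in list(Path(root).iterdir()):
        if entry.name in keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a directory and everything in it
        else:
            entry.unlink()  # remove a single file or symlink
        removed.append(entry.name)
    return sorted(removed)
```

The helper deletes files irreversibly, so it is worth printing the directory listing and double-checking the path before calling it.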

Next, we will use `SimpleDirectoryReader` to load the text data into a `docs` object.

In [16]:
reader = SimpleDirectoryReader(
    input_dir="python_docs", required_exts=[".txt"], recursive=True
)
docs = reader.load_data()

Now that we've loaded the extracted data, we can generate embeddings and save them to Milvus. First, start a Milvus server with `milvus-server` (alternatively, see Milvus' [Getting Started](https://milvus.io/docs/milvus_lite.md) page for more ways to connect to the server). If you're starting the server programmatically, it's a good idea to connect with a [context manager](https://realpython.com/python-with-statement/) to automatically close your server connection when your code leaves the context manager's scope.
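
As a minimal sketch of the context-manager idea, the helper below starts a server object on entry and always stops it on exit, even if the body raises. `running_server` is a hypothetical name, and the sketch only assumes the server object has `start()` and `stop()` methods, which is how `default_server` from the `milvus` package is expected to behave.

```python
from contextlib import contextmanager


@contextmanager
def running_server(server):
    """Start `server`, hand it to the `with` body, and stop it afterwards.

    Assumes `server` exposes start() and stop(); stop() runs even if the
    body of the `with` block raises an exception.
    """
    server.start()
    try:
        yield server
    finally:
        server.stop()
```

With the imports from the earlier cell, usage would look like `with running_server(default_server) as server:` followed by `connections.connect(host="127.0.0.1", port=server.listen_port)` and the indexing code inside the block.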

Now we can create a `MilvusVectorStore` object, configure it, and then build an index on it from the loaded documents. Run only one of the two cells below, depending on whether you want to use OpenAI or a local model. OpenAI usage may be rate-limited or incur charges, while a local model may take longer to download and need more resources to generate embeddings.

In [17]:
# Using OpenAI

vector_store = MilvusVectorStore(
    dim=1536,
    host="127.0.0.1",
    port=default_server.listen_port,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)

In [None]:
# Using a local embedding model
service_context = ServiceContext.from_defaults(embed_model="local:BAAI/bge-large-en")

vector_store = MilvusVectorStore(
    dim=1024,
    host="127.0.0.1",
    port=default_server.listen_port,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    docs, storage_context=storage_context, service_context=service_context
)
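
The two cells above differ mainly in the `dim` value, which has to match the size of the vectors the chosen embedding model produces. One way to keep that pairing in one place is a small lookup like the sketch below. `EMBED_DIMS`, `milvus_kwargs`, and the `"openai"` label are hypothetical names introduced for illustration; the dimensions are the ones used in the cells above.

```python
# Embedding sizes copied from the two cells above; the keys are illustrative labels.
EMBED_DIMS = {
    "openai": 1536,                   # dim used in the OpenAI cell
    "local:BAAI/bge-large-en": 1024,  # dim used in the local-model cell
}


def milvus_kwargs(backend, port, host="127.0.0.1"):
    """Return keyword arguments for the vector store for a given backend label."""
    if backend not in EMBED_DIMS:
        raise ValueError(f"unknown embedding backend: {backend!r}")
    return {"dim": EMBED_DIMS[backend], "host": host, "port": port}
```

The result could then be unpacked into the constructor, for example `MilvusVectorStore(**milvus_kwargs("openai", default_server.listen_port))`.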

Once indexing is done, we can create a query engine from the index and use it to ask questions.

In [19]:
query_engine = index.as_query_engine()
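
To give a feel for what happens conceptually when a RAG query runs, here is a toy sketch in plain Python: rank stored chunks by similarity to the question's embedding, then place the best matches in the prompt as context. This is not how `llama-index` or Milvus implement it internally. The function names, the cosine metric, and the prompt layout are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the texts of the k chunks most similar to `query_vec`.

    `chunks` is a list of (text, vector) pairs.
    """
    ranked = sorted(
        chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n---\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

In the real pipeline, the vectors come from the embedding model, the similarity search runs inside Milvus, and the prompt is sent to the LLM by the query engine.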

In [20]:
print(query_engine.query("What is a list?"))
print("---")
print(query_engine.query("How do I create a dictionary?"))

A list is a compound data type in Python that is used to group together other values. It is written as a list of comma-separated values between square brackets. Lists can contain items of different types, but usually, the items have the same type. Lists are mutable, which means their content can be changed. They support operations like indexing, slicing, concatenation, and adding new items. Lists can also be nested, meaning they can contain other lists.
---
To create a dictionary in Python, you can use a pair of braces "{}" to create an empty dictionary. If you want to add initial key-value pairs to the dictionary, you can place a comma-separated list of key:value pairs within the braces. For example:

```
my_dict = {}  # creates an empty dictionary

my_dict = {'key1': 'value1', 'key2': 'value2'}  # creates a dictionary with initial key-value pairs
```

You can also use the "dict()" constructor to build dictionaries directly from sequences of key-value pairs. For example:

```
my_dict 