### Import Python Modules

Import the required Python modules from LangChain.

* **RecursiveCharacterTextSplitter** - Text splitter. This is required to split our Nutanix Bible content into chunks.
* **HuggingFaceEmbeddings** - This gives the code access to embedding models hosted on Hugging Face.
* **Milvus** - This allows the code to use and manage our Milvus vector database.
* **WebBaseLoader** - This loads HTML content into a document format we can use with our database.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Milvus
from langchain.document_loaders import WebBaseLoader

### Initialize an instance of HuggingFaceEmbeddings

This code initializes an instance of [HuggingFaceEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.huggingface.HuggingFaceEmbeddings.html) from LangChain, which gives us access to the [Sentence Transformers](https://huggingface.co/sentence-transformers) embedding models on Hugging Face.

An embedding is a numerical representation of an object for use in machine learning systems. The text content from the Nutanix Bible needs to be translated into multi-dimensional vectors. As with the foundation models used for chatbots (e.g. Llama 2), there are many pre-trained embedding models that can do this translation for us. These models have learned from a huge amount of text how words are normally used and in which contexts they appear.
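
To make the idea of a "numerical representation" concrete, here is a minimal sketch using hand-written three-dimensional vectors. The values are hypothetical and were not produced by any model (the model used in this lab produces 768-dimensional vectors). The point is only that texts with related meaning should map to vectors pointing in similar directions, which cosine similarity measures.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for illustration only.
toy_embeddings = {
    "hypervisor": [0.9, 0.1, 0.0],
    "virtual machine": [0.8, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.9],
}

sim_related = cosine_similarity(
    toy_embeddings["hypervisor"], toy_embeddings["virtual machine"]
)
sim_unrelated = cosine_similarity(
    toy_embeddings["hypervisor"], toy_embeddings["banana"]
)

print(f"hypervisor vs virtual machine: {sim_related:.3f}")
print(f"hypervisor vs banana:          {sim_unrelated:.3f}")
```

The related pair scores close to 1, while the unrelated pair scores close to 0.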

In this lab, we are using the [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) embedding model.

#### Note
When running this code, you can ignore the following warning:

```
/home/jovyan/.local/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
```

In [None]:
modelPath = "sentence-transformers/all-mpnet-base-v2"

model_kwargs = {}

# Create a dictionary with encoding options, specifically setting 'normalize_embeddings' to True
encode_kwargs = {'normalize_embeddings': True}

# Initialize an instance of HuggingFaceEmbeddings with the specified parameters
embeddings = HuggingFaceEmbeddings(
    model_name=modelPath,     # Provide the pre-trained model's path
    model_kwargs=model_kwargs, # Pass the model configuration options
    encode_kwargs=encode_kwargs # Pass the encoding options
)
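
The `normalize_embeddings` option scales each embedding to unit length. The sketch below shows that effect in plain Python on made-up two-dimensional vectors, not real model output. Once both vectors have length 1, their dot product equals their cosine similarity.

```python
import math

def l2_normalize(vec):
    """Scale vec so that its Euclidean (L2) length is 1."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]

# Made-up vectors for illustration; real embeddings have many more dimensions.
a = [3.0, 4.0]
b = [4.0, 3.0]

a_unit = l2_normalize(a)
b_unit = l2_normalize(b)

# Length of the normalized vector, and dot product of the two unit vectors.
unit_length = math.sqrt(sum(x * x for x in a_unit))
dot_of_units = sum(x * y for x, y in zip(a_unit, b_unit))

print(a_unit, unit_length, dot_of_units)
```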

### Load the Nutanix Bible content

This code uses the <a href="https://python.langchain.com/docs/integrations/document_loaders/web_base" target="_blank">WebBaseLoader</a> component of LangChain to load our Nutanix Bible content into a document format that can be ingested into the database.

In [None]:
loader = WebBaseLoader("https://www.nutanixbible.com/classic")
data = loader.load()
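
Conceptually, loading a web page means stripping the markup and keeping the text, plus some metadata about where it came from. The sketch below imitates that idea with the standard library's `html.parser` on an inline HTML string. It is not how `WebBaseLoader` is implemented; the sample HTML and URL are hypothetical, and the dictionary only mirrors the `page_content` and `metadata` fields of a LangChain document.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)

# Hypothetical page content and URL, used only for this illustration.
sample_html = (
    "<html><body><h1>Storage</h1>"
    "<p>Data is stored in <b>extents</b>.</p></body></html>"
)

parser = TextExtractor()
parser.feed(sample_html)
parser.close()

toy_document = {
    "page_content": " ".join(parser.parts),
    "metadata": {"source": "https://example.com/sample"},
}

print(toy_document["page_content"])
```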

### Split contents of the data object

We don’t want to store the data as one vector. In this code, we’ll split the data into chunks of at most 500 characters each, with no overlap between chunks, using the [RecursiveCharacterTextSplitter](https://api.python.langchain.com/en/latest/text_splitter/langchain.text_splitter.RecursiveCharacterTextSplitter.html) component of LangChain.

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
docs = text_splitter.split_documents(data)
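
To illustrate what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified fixed-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses, which first tries to break on separators such as paragraph breaks, newlines, and spaces. The function name and sample text are hypothetical.

```python
def split_fixed(text, chunk_size, chunk_overlap=0):
    """Split text into windows of at most chunk_size characters.

    Each window starts (chunk_size - chunk_overlap) characters after
    the previous one, so consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghijklmnopqrstuvwxyz"

no_overlap = split_fixed(sample, chunk_size=10)
with_overlap = split_fixed(sample, chunk_size=10, chunk_overlap=2)

print(no_overlap)
print(with_overlap)
```

With an overlap of 0, as in this lab, every character lands in exactly one chunk. A non-zero overlap repeats the tail of each chunk at the start of the next, which can help when a sentence straddles a chunk boundary.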

### Ingest the documents into Milvus

In this code, we'll pass our documents and our embeddings object to the LangChain Milvus integration, which creates a vector embedding of each document and stores the results in the `nutanixbible_web` collection.

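Note that `connection_args` identifies the Milvus database service. The `host` value shown here is an IP address; depending on your environment, this may instead be the internal name of the database service.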

In [None]:
vector_db = Milvus.from_documents(
    docs,
    embeddings,
    collection_name="nutanixbible_web",
    connection_args={"host":"10.38.35.41","port":"19530"}
)
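
Conceptually, the ingestion step embeds every chunk and stores each vector alongside its text, so that a later query can be matched against the nearest vectors. The toy in-memory version below shows that flow. `toy_embed` is a hypothetical word-count stand-in for a real embedding model, and the list-based store does not reflect how Milvus indexes or persists data.

```python
import math
from collections import Counter

# Hypothetical vocabulary; each word is one dimension of the toy embedding.
VOCAB = ["storage", "cluster", "network", "vm", "snapshot"]

def toy_embed(text):
    """Stand-in embedding: count vocabulary words, then L2-normalize."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in VOCAB]
    length = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / length for x in vec]

def ingest(chunks):
    """Embed each chunk and keep (vector, text) pairs."""
    return [(toy_embed(chunk), chunk) for chunk in chunks]

def search(store, query, k=1):
    """Return the k stored texts with the highest dot product against the query."""
    q = toy_embed(query)
    ranked = sorted(
        store,
        key=lambda row: sum(a * b for a, b in zip(row[0], q)),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]

store = ingest([
    "a snapshot captures vm state",
    "the cluster network uses vlans",
    "storage is pooled across the cluster",
])

top_hit = search(store, "how does vm snapshot work")
print(top_hit)
```

With the real vector store, the analogous lookup would go through the `similarity_search` method of the `vector_db` object created above.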

### Return to the Lab Guide

Move on to the **View Documents in Milvus** section of the lab guide.