### Library Installation

- llama-index: The starter bundle for LlamaIndex. It installs the core package together with a small set of default integrations.
- llama-index-core: The core LlamaIndex package, containing essential components such as VectorStoreIndex and SimpleDirectoryReader.
- llama-index-embeddings-huggingface: Integrates Hugging Face embedding models with LlamaIndex, so you can encode text with pretrained models.
- llama-index-llms-cohere: Integrates Cohere LLMs with LlamaIndex for text generation and related tasks.

In [None]:
#!pip install llama-index llama-index-core llama-index-embeddings-huggingface llama-index-llms-cohere python-dotenv

### Library Imports

- from llama_index.core import (Settings, VectorStoreIndex, SimpleDirectoryReader): Imports the global Settings object, the VectorStoreIndex class for building a vector index, and SimpleDirectoryReader for reading documents from a directory.
- from llama_index.core.node_parser import SentenceSplitter: Imports a class that splits text into chunks while trying to keep sentences intact.
- from llama_index.embeddings.huggingface import HuggingFaceEmbedding: Imports a class for using Hugging Face embeddings to encode text.
- from llama_index.llms.cohere import Cohere: Imports a class for using Cohere as the language model.
- from dotenv import load_dotenv: Imports the function that loads environment variables from a .env file.
- import os: Imports the built-in os module for interacting with the operating system.

In [1]:
from llama_index.core import (
    Settings,
    VectorStoreIndex, 
    SimpleDirectoryReader, 
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.cohere import Cohere
from dotenv import load_dotenv
import os

### LlamaIndex Settings

- load_dotenv(): Loads environment variables from the .env file, if it exists. This is typically used to keep sensitive information like API keys out of the notebook.
- Settings: The global configuration object for LlamaIndex. It holds defaults such as the LLM, the embedding model, and the text splitter (or chunk size and overlap), which indexes and query engines use unless you override them.
- VectorStoreIndex: This class is used to create a vector store index. It stores an embedding for each chunk of the documents and retrieves the chunks most similar to a query.
- SimpleDirectoryReader: This class is used to read documents from a directory and create a list of Document objects.
- SentenceSplitter: This class is used to split text into chunks of a target size while trying to keep sentences intact. It breaks long documents into smaller units for indexing.
- HuggingFaceEmbedding: This class is used to encode text using Hugging Face embedding models. You specify the model by name (e.g., BAAI/bge-small-en-v1.5 or sentence-transformers/all-MiniLM-L6-v2).
- Cohere: This class is used to interact with a Cohere language model. You can use it to generate text, summarize text, or answer questions.
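
To make the chunking idea more concrete, here is a minimal, standard-library sketch of sentence-aware splitting. It is not how SentenceSplitter is implemented: the function name `split_into_chunks` is hypothetical, the size is measured in characters rather than tokens, and there is no overlap between chunks.

```python
import re


def split_into_chunks(text: str, chunk_size: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `chunk_size` characters.

    Illustrative only: a single sentence longer than `chunk_size` is kept whole.
    """
    # Split after ., ! or ? followed by whitespace; drop empty pieces.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Sentences are never cut in half here, which is the property that matters for retrieval: each chunk stays readable on its own.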

In [None]:
load_dotenv()
COHERE_API_KEY = os.getenv("COHERE_API_KEY")

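# Embedding model from Hugging Face (downloaded on first use if not cached).
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Global defaults picked up by the index and query engine created later.
Settings.text_splitter = SentenceSplitter(chunk_size=1024)
Settings.llm = Cohere(model="command-r", api_key=COHERE_API_KEY)
Settings.embed_model = embed_model
# Alternative to setting a text splitter explicitly:
# Settings.chunk_size = 512
# Settings.chunk_overlap = 20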
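
Because os.getenv returns None when a variable is missing, a forgotten .env entry only surfaces later as an authentication error. One possible guard is sketched below; `require_env` is a hypothetical helper, not part of LlamaIndex or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value
```

With this in place, the key could be read as `COHERE_API_KEY = require_env("COHERE_API_KEY")` right after `load_dotenv()`, so the notebook fails early with a clear message.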

### Loading Files

In [3]:
reader = SimpleDirectoryReader(input_dir="sample_data_files")
documents = reader.load_data()

### VectorStore Creation for Indexing

In [4]:
index = VectorStoreIndex.from_documents(documents)

### Querying and Generating Response

In [5]:
query_engine = index.as_query_engine()
response = query_engine.query("What is the data about?")
response
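
Conceptually, a query against a vector index embeds the question, ranks the stored chunks by similarity, and passes the best matches to the LLM as context. The toy sketch below illustrates only the ranking step, using word counts in place of a real embedding model. The names `embed`, `cosine`, and `top_k` are hypothetical and this is not LlamaIndex's implementation.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real model returns dense float vectors)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the `k` chunks most similar to `query`, best match first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]
```

For example, `top_k("cats purr", ["cats purr loudly", "stocks fell today"], k=1)` selects the first chunk, since it is the only one sharing words with the query.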

