# LangChain Experimentation
Feel free to peek at the repo, but I'm simply using Jupyter notebooks to explore the LangChain framework and gain an understanding of how it interacts with LLMs and supports things like RAG and agentic workflows. I have no specific goal in mind.


## Install Dependencies

In [None]:
!pipenv install

## Prepare the LLM
You'll need a `.env` file with the following variables:
```text
HUGGINGFACE_API_TOKEN=api_key
```
Alternatively, you can change the `repo_id` to the Llama 3 8B Instruct model, which
should be available for free, or swap in a different LLM altogether.
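
The next cell reads the token from `os.environ`, so the `.env` file has to be loaded into the environment first. `pipenv run` and `pipenv shell` do this automatically, but a kernel started some other way may not see the variable. As a minimal sketch for that case, the hypothetical helper `load_env_file` below (not part of this repo or of LangChain) parses simple `KEY=VALUE` lines using only the standard library:

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical helper: read KEY=VALUE lines and export them to os.environ."""
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        # Values already present in the environment take precedence.
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded


load_env_file()
if "HUGGINGFACE_API_TOKEN" not in os.environ:
    print("HUGGINGFACE_API_TOKEN is not set; the next cell will raise a KeyError.")
```

This only handles the plain format shown above; a dedicated package is a better fit for anything more elaborate.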

In [None]:
from langchain.llms.huggingface_endpoint import HuggingFaceEndpoint
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
import os

HF_API_TOKEN = os.environ['HUGGINGFACE_API_TOKEN']

llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-70B-Instruct",
    temperature=0.001,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    huggingfacehub_api_token=HF_API_TOKEN,
)

## Load the Content
You'll see commented-out code that loaded a text file containing the Declaration of Independence. It has been replaced by a YouTube video transcript loader.

In [None]:
# from langchain_community.document_loaders import TextLoader
# SOURCE = "./declaration-of-independence.txt"
# loader = TextLoader(SOURCE)

from langchain_community.document_loaders import YoutubeLoader
SOURCE = "https://youtu.be/2XlYSmIlpfs?si=qZSiBtDroP1vvvJX"
loader = YoutubeLoader.from_youtube_url(
    SOURCE, add_video_info=False
)
docs = loader.load()
char_count = len(docs[0].page_content)
print(f"Char count in {SOURCE}: {char_count}")

## Split the Content into Chunks
We want the content to be split into chunks that are small enough to be processed by the LLM. That is, the context window needs to accommodate all chunks passed to it as relevant information, the query, the template/instructions, and the response. So, we split the content into chunks of 100 tokens with a 10-token overlap.
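
To make that budget concrete, here is a rough back-of-the-envelope sketch. Every number except the chunk size is an assumption: the 8192-token context window is what I assume for Llama 3 Instruct (a hosted endpoint may enforce a lower limit), the prompt overhead is a guess, and `k=30` is taken from the similarity search later in this notebook. The chunk size is also measured with the GPT-2 tokenizer rather than Llama's, so treat the result as an estimate.

```python
# All numbers below are assumptions for illustration, not measured values.
CONTEXT_WINDOW = 8192    # assumed context length of Llama 3 Instruct
CHUNK_TOKENS = 100       # chunk_size used by the splitter
RETRIEVED_CHUNKS = 30    # k passed to the similarity search later on
PROMPT_OVERHEAD = 150    # rough guess for template + query tokens

# Worst case: every retrieved chunk is a full 100 tokens.
context_tokens = CHUNK_TOKENS * RETRIEVED_CHUNKS
response_budget = CONTEXT_WINDOW - PROMPT_OVERHEAD - context_tokens

print(f"Tokens used by retrieved chunks: {context_tokens}")
print(f"Tokens left for the response:    {response_budget}")
```

Under these assumptions the retrieved chunks take up 3000 tokens, which leaves plenty of room for the response.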

We also add metadata for each chunk that can be used for later processing.

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from transformers import GPT2TokenizerFast

tokenizer: GPT2TokenizerFast = GPT2TokenizerFast.from_pretrained("gpt2")
# from_huggingface_tokenizer is a classmethod that builds a new splitter, so the
# separators are passed to it directly; set on a separate instance they are ignored.
splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
    tokenizer=tokenizer,
    chunk_size=100,
    chunk_overlap=10,
    separators=["\n\n", "\n", " ", ""],
)
split_docs = splitter.split_documents(docs)
print(f"Split document count: {len(split_docs)}")

for index, doc in enumerate(split_docs):
    doc.metadata.update({"chunk_id": index})
    doc.metadata.update({"token_count": len(tokenizer.tokenize(doc.page_content))})
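
To illustrate what a 100-token chunk with a 10-token overlap means, here is a simplified sliding-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splits on separators first and then merges pieces up to the size limit); the function name `sliding_window_chunks` and the toy token list are made up for illustration.

```python
def sliding_window_chunks(tokens, chunk_size=100, chunk_overlap=10):
    """Yield fixed-size token windows; each window starts with the last
    `chunk_overlap` tokens of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    for start in range(0, len(tokens), step):
        yield tokens[start:start + chunk_size]
        # Stop once a window has reached the end of the token list.
        if start + chunk_size >= len(tokens):
            break


# Toy "tokens" standing in for a tokenized transcript.
tokens = [f"t{i}" for i in range(250)]
chunks = list(sliding_window_chunks(tokens))

print([len(chunk) for chunk in chunks])   # [100, 100, 70]
print(chunks[0][-10:] == chunks[1][:10])  # True: the overlap is shared
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk.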

## Vectorize (Create Embeddings)
The content needs to be vectorized so that we can use semantic search to find relevant chunks. The chosen sentence transformer is part of the Inference API offered by Hugging Face PRO. If you do not have an API key, you can delete the `model_name` parameter and use a model that is freely available on Hugging Face. Alternatively, you can swap out the Hugging Face embeddings for the OpenAI embeddings.

In [None]:
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=HF_API_TOKEN, model_name="sentence-transformers/all-mpnet-base-v2"
)

doc_content = [doc.page_content for doc in split_docs]
vectorized_docs = embeddings.embed_documents(doc_content)
len(vectorized_docs[0])

## Generate a Query and Query Vector
We generate a query and vectorize it. This query will be used to find relevant chunks in the content.

In [None]:
query = "Make a detailed summary of the youtube video conversation."
query_vector = embeddings.embed_query(query)
len(query_vector)

## Create a Facebook AI Similarity Search (FAISS) Database
We create a FAISS database from the embeddings of the content chunks, which lets us quickly find the chunks most relevant to the query. Because both the chunks and the query are represented as vectors, we can rank chunks with a similarity or distance measure such as cosine similarity (note that LangChain's FAISS wrapper defaults to Euclidean distance). This mathematical search is much more powerful than a literal string search because the embeddings encode meaning rather than exact wording.
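
As a small illustration of vector-based ranking, the sketch below computes cosine similarity by hand on made-up three-dimensional vectors. Real embeddings have hundreds of dimensions and FAISS uses optimized indexes, so the names and numbers here are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings; each dimension loosely stands for a topic.
toy_chunks = {
    "chunk about regulation": [0.9, 0.1, 0.0],
    "chunk about mining": [0.1, 0.9, 0.2],
    "chunk about cooking": [0.0, 0.1, 0.9],
}
toy_query = [0.8, 0.2, 0.1]

# Rank chunk names from most to least similar to the query.
ranked = sorted(
    toy_chunks,
    key=lambda name: cosine_similarity(toy_query, toy_chunks[name]),
    reverse=True,
)
print(ranked)
```

The query vector points mostly along the first dimension, so the "regulation" chunk ranks first even though no words are compared.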

In [None]:
from langchain_community.vectorstores import FAISS

db = FAISS.from_documents(split_docs, embeddings)
found = db.similarity_search_by_vector(query_vector, k=30)
sorted_found = sorted(found, key=lambda x: x.metadata["chunk_id"])
sorted_found[:3]

## Execute the LLM
The LLM is executed once with the query and all of the relevant chunks, which are inserted into a single prompt. Because the chunks are sorted back into their original order, the response can serve as a summary of the content. The template is a simple one that asks the LLM to answer the query using the provided chunks, and you can easily modify it below.

In [None]:
from langchain.prompts import PromptTemplate
from langchain_core.runnables import Runnable

template="""
You are a cryptocurrency policy expert. You will answer
the question below using the relevant information from the
similar chunks of documents provided.

Question: {question}

Relevant Information: {similar_chunks}

Answer:"""
prompt = PromptTemplate(template=template, input_variables=["question", "similar_chunks"])

chain: Runnable = prompt | llm

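# Join the chunks into one block of text so the prompt does not contain a Python list repr.
chain.invoke({"question": query, "similar_chunks": "\n\n".join(doc.page_content for doc in sorted_found)})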
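
How the retrieved chunks end up in the prompt depends on whether they are passed as a list or as a single string. The sketch below uses plain `str.format` as a stand-in, on the assumption that `PromptTemplate`'s default format fills simple placeholders the same way; the shortened template and the chunk texts are made up.

```python
# Shortened stand-in for the notebook's template.
template = """Question: {question}

Relevant Information: {similar_chunks}

Answer:"""

toy_chunks = ["first retrieved chunk", "second retrieved chunk"]

# Passing the list directly renders its Python repr (brackets and quotes).
as_list = template.format(question="What was said?", similar_chunks=toy_chunks)

# Joining first gives the model plain text with a blank line between chunks.
as_text = template.format(
    question="What was said?", similar_chunks="\n\n".join(toy_chunks)
)

print(as_list)
print("-----")
print(as_text)
```

Either form can work, but the joined text avoids spending tokens on brackets and quote characters.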