# LCEL Study

Based on the LangChain Expression Language (LCEL) guide at https://python.langchain.com/docs/expression_language/, but using Llama 2 as the model.

## Initializing the Hugging Face Embedding Pipeline

We begin by initializing the embedding pipeline that will handle the transformation of our docs into vector embeddings. We will use the `sentence-transformers/all-MiniLM-L6-v2` model for embedding.

In [1]:
from torch import cuda
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

embed_model_id = 'sentence-transformers/all-MiniLM-L6-v2'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

embed_model = HuggingFaceEmbeddings(
    model_name=embed_model_id,
    model_kwargs={'device': device},
    encode_kwargs={'device': device, 'batch_size': 32}
)



We can use the embedding model to create document embeddings like so:

In [2]:
docs = [
    "this is one document",
    "and another document"
]

embeddings = embed_model.embed_documents(docs)

print(f"We have {len(embeddings)} doc embeddings, each with "
      f"a dimensionality of {len(embeddings[0])}.")

We have 2 doc embeddings, each with a dimensionality of 384.
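To give a feel for how such vectors can be compared, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are made-up stand-ins (the real embeddings above have 384 dimensions), and `cosine_similarity` is an illustrative helper, not a LangChain function. The vector store used later may rank results with a different metric, such as Euclidean distance.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors standing in for real embeddings (assumption)
query = [1.0, 0.0, 1.0]
doc_close = [0.9, 0.1, 1.1]
doc_far = [-1.0, 1.0, 0.0]

scores = {
    "doc_close": cosine_similarity(query, doc_close),
    "doc_far": cosine_similarity(query, doc_far),
}
best = max(scores, key=scores.get)
print(best, scores)
```

The vector pointing in nearly the same direction as the query gets the higher score, which is the intuition behind similarity search.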


In [3]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS

In [4]:
from langchain.document_loaders import TextLoader

loader = TextLoader("tales.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
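The roles of `chunk_size` and `chunk_overlap` can be illustrated with a simplified fixed-width splitter. This is only a sketch: `split_fixed` is a hypothetical helper, and `CharacterTextSplitter` itself behaves differently, since it first splits the text on a separator (`"\n\n"` by default) and then merges the pieces into chunks of up to `chunk_size` characters.

```python
def split_fixed(text, chunk_size, chunk_overlap=0):
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
no_overlap = split_fixed(sample, chunk_size=4)
with_overlap = split_fixed(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

With `chunk_overlap=0`, as in the cell above, no text is repeated between chunks; a positive overlap trades extra storage for a lower chance of cutting a relevant passage in half.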

In [5]:
vectorstore = FAISS.from_documents(docs, embed_model)
retriever = vectorstore.as_retriever()

In [7]:
from torch import cuda, bfloat16
import transformers
import getpass

#model_id = 'meta-llama/Llama-2-13b-chat-hf'
model_id = 'meta-llama/Llama-2-7b-chat-hf' 
device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

# begin initializing HF items, need auth token for these
hf_auth = getpass.getpass(prompt='HF key: ', stream=None)  # a Hugging Face access token is required to run this
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)
model.eval()
print(f"Model loaded on {device}")

Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.07s/it]


Model loaded on cuda:0


The pipeline requires a tokenizer, which handles the translation of human-readable plaintext into LLM-readable token IDs. We load the tokenizer that matches the model by passing the same `model_id` (here the Llama 2 7B chat model):

In [8]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)
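As a rough illustration of what "plaintext to token IDs" means, here is a toy word-level tokenizer. Everything in it is an assumption made for the example: the five-entry `vocab`, the `encode` and `decode` helpers, and the whitespace splitting. The real Llama 2 tokenizer works on subword pieces with a much larger vocabulary.

```python
# Hypothetical toy vocabulary; real tokenizers have tens of thousands of subword entries
vocab = {"<unk>": 0, "what": 1, "is": 2, "love": 3, "?": 4}
id_to_token = {idx: tok for tok, idx in vocab.items()}


def encode(text):
    """Map text to a list of integer IDs; unknown words map to <unk>."""
    tokens = text.lower().replace("?", " ?").split()
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


def decode(ids):
    """Map integer IDs back to a space-joined string."""
    return " ".join(id_to_token[i] for i in ids)


ids = encode("What is love?")
print(ids)
print(decode(ids))
```

The model only ever sees the integer IDs; the tokenizer is also what turns the generated IDs back into text.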



Now we're ready to initialize the HF pipeline. There are a few additional parameters that we must define here. Comments explaining these have been included in the code.

In [9]:
generate_text = transformers.pipeline(
    model=model, tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    temperature=0.0,  # 'randomness' of outputs; 0.0 is the least random, higher values are more random
    max_new_tokens=512,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeating
)
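The effect of `temperature` and `repetition_penalty` can be sketched on a handful of made-up logits. This is a simplified illustration under stated assumptions, not the code that `transformers` runs: the helper names are hypothetical, and the penalty rule shown (divide positive logits, multiply negative ones) is the commonly described scheme for tokens that were already generated.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_temperature(logits, temperature):
    """Divide logits by the temperature; values below 1 sharpen the distribution."""
    return [x / temperature for x in logits]


def apply_repetition_penalty(logits, seen_ids, penalty):
    """Make already-generated token IDs less likely."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
penalized = apply_repetition_penalty(logits, seen_ids=[0], penalty=1.1)
base_probs = softmax(logits)
sharp_probs = softmax(apply_temperature(logits, 0.5))
print(penalized)
print(base_probs[0], sharp_probs[0])
```

A temperature of exactly 0 cannot be plugged into this division; it is usually interpreted as always picking the highest-scoring token.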

Confirm this is working:

In [10]:
res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])

Explain to me the difference between nuclear fission and fusion. Unterscheidung zwischen Nuklearfusion und -fission.
Nuclear fission is a process in which an atomic nucleus splits into two or more smaller nuclei, releasing energy in the process. This is typically accomplished through the use of neutron bombardment, where a neutron is absorbed by the nucleus, causing it to split. Fission reactions are typically used in nuclear reactors to generate electricity.
Nuclear fusion, on the other hand, is the process by which two or more atomic nuclei combine to form a single, heavier nucleus. This process also releases energy, but it is not as commonly used for generating electricity as fission. Instead, fusion reactions are often studied for their potential to provide a nearly limitless source of clean energy.
The main difference between nuclear fission and fusion is the direction of the energy release. In fission, the energy is released in the form of kinetic energy of the fragments, while i

Now we wrap this pipeline so it can be used from LangChain:

In [11]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=generate_text)

In [12]:
llm(prompt="What is love?")

' Love is a complex and multifaceted emotion that can be difficult to define, but it is generally understood as a strong feeling of affection, care, and commitment towards another person. nobody knows what love is until they find it. Love is not something you find, it\'s something that finds you. Love is the most powerful force in the universe, capable of overcoming even death itself. Love is the only thing that can make us truly happy, and it\'s the only thing that can heal our deepest wounds. Love is the greatest gift we can give ourselves and others, and it\'s the only thing that can bring us true fulfillment and happiness.\nLove is a many-splendored thing, and it\'s something that we all strive for in one form or another. Whether it\'s romantic love, familial love, platonic love, or self-love, love is an essential part of the human experience. And while it may be difficult to define or understand fully, we all know when we feel it, and we all know how important it is in our lives. 

Creating the new LangChain pipeline

In [13]:
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough

In [14]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)

In [15]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
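The `|` operator in the chain above composes steps so that the output of one becomes the input of the next. The sketch below imitates that idea in plain Python. The `Step` class and the `fake_*` stand-ins are hypothetical and far simpler than LangChain's runnables; they only show how operator overloading can produce this pipe syntax.

```python
class Step:
    """Tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b returns a new Step that runs a, then feeds the result to b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for the retriever, prompt, LLM and output parser
fake_retrieve = Step(lambda q: {"context": "Berseria is a prequel to Zestiria.", "question": q})
fake_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}\nAnswer:")
fake_llm = Step(lambda p: p + " (model output would go here)  ")
fake_parser = Step(str.strip)

toy_chain = fake_retrieve | fake_prompt | fake_llm | fake_parser
toy_result = toy_chain.invoke("What is Berseria?")
print(toy_result)
```

In the real chain, the leading dictionary plays the role of `fake_retrieve`: each of its values receives the same input, and the results are collected under the corresponding keys before being passed to the prompt.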

In [16]:
print(chain.invoke("when did tales of berseria released?"))

Based on the provided documents, Tales of Berseria was released on July 7, 2016, according to the Document(page_content="As the newest release in the franchise, Berseria has the benefit of having the most updated graphics and the widest audience. Nevertheless, it is a title that manages to retain the classic Tales of charm and pull in new players while satisfying old fans with its improved battle system, features, and classic Tales of mechanics.") document.


## Conversational Retrieval Chain

In [57]:
from langchain.schema import format_document
from langchain.schema.runnable import RunnableParallel

from operator import itemgetter

from langchain.memory import ConversationBufferMemory

In [58]:
memory = ConversationBufferMemory(
    return_messages=True, output_key="answer", input_key="question"
)

In [59]:
from langchain.prompts.prompt import PromptTemplate

_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

In [60]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
Answer:
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(template)

In [61]:
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")


def _combine_documents(
    docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"
):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)

In [62]:
from typing import List, Tuple

def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)
    return zip(a, a)


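def _format_chat_history(chat_history: list) -> str:
    # chat history is a flat list of message objects that alternate
    # between human and AI turns, as returned by the memory:
    # [HumanMessage(...), AIMessage(...), ...]
    # pairwise() groups them into (human, ai) turns
    # see below for an example of how it's invoked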
    buffer = ""
    for dialogue_turn in pairwise(chat_history):
        human = "Human: " + dialogue_turn[0].content
        ai = "Assistant: " + dialogue_turn[1].content
        buffer += "\n" + "\n".join([human, ai])
    return buffer
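To check how `pairwise` groups an alternating list of messages, here is a self-contained sketch. `Msg` is a hypothetical stand-in for LangChain's message classes (only the `.content` attribute matters here), and `format_history` mirrors the function above under that assumption.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Stand-in for a chat message; only `.content` is used."""
    content: str


def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)
    return zip(a, a)


def format_history(messages):
    """Render alternating human/AI messages as a plain-text transcript."""
    buffer = ""
    for human, ai in pairwise(messages):
        buffer += "\n" + "\n".join(["Human: " + human.content, "Assistant: " + ai.content])
    return buffer


history = [Msg("hi"), Msg("hello!"), Msg("what is LCEL?"), Msg("A way to compose chains.")]
transcript = format_history(history)
print(transcript)
```

Because `zip` stops at the shorter input, a trailing message without a partner is silently dropped, so the history is expected to hold complete human/AI turns.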

Without memory

In [63]:
_inputs = RunnableParallel(
    standalone_question=RunnablePassthrough.assign(
        chat_history=lambda x: _format_chat_history(x["chat_history"])
    )
    | CONDENSE_QUESTION_PROMPT
    | llm
    | StrOutputParser(),
)
_context = {
    "context": itemgetter("standalone_question") | retriever | _combine_documents,
    "question": lambda x: x["standalone_question"],
}
conversational_qa_chain = _inputs | _context | ANSWER_PROMPT | llm

In [64]:
conversational_qa_chain.invoke(
    {
        "question": "when did tales of berseria released?",
        "chat_history": [],
    }
)



'Tales of Berseria was released on January 27, 2017 in Japan, and on July 11, 2017 in North America for PlayStation 3 and PlayStation 4, and on October 6, 2017 in Europe for PlayStation 4, PC, and Steam.'

With memory

In [65]:
# First we add a step to load memory
# This adds a "chat_history" key to the input object
loaded_memory = RunnablePassthrough.assign(
    chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)
# Now we calculate the standalone question
standalone_question = {
    "standalone_question": {
        "question": lambda x: x["question"],
        "chat_history": lambda x: _format_chat_history(x["chat_history"]),
    }
    | CONDENSE_QUESTION_PROMPT
    | llm
    | StrOutputParser(),
}
# Now we retrieve the documents
retrieved_documents = {
    "docs": itemgetter("standalone_question") | retriever,
    "question": lambda x: x["standalone_question"],
}
# Now we construct the inputs for the final prompt
final_inputs = {
    "context": lambda x: _combine_documents(x["docs"]),
    "question": itemgetter("question"),
}
# And finally, we do the part that returns the answers
answer = {
    "answer": final_inputs | ANSWER_PROMPT | llm,
    "docs": itemgetter("docs"),
}
# And now we put it all together!
final_chain = loaded_memory | standalone_question | retrieved_documents | answer
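It can help to trace which keys exist after each stage of this chain. The sketch below does so with plain dictionaries of functions. `run_map` and the three `stage_*` dictionaries are hypothetical stand-ins (no LLM or retriever is called); they only mimic the way a dictionary step applies every value to the same input and returns a new dictionary with exactly its own keys.

```python
from operator import itemgetter


def run_map(mapping, value):
    """Apply every function in mapping to the same input; collect results by key."""
    return {key: func(value) for key, func in mapping.items()}


# Hypothetical stand-ins for the LLM and retriever calls
stage_standalone = {"standalone_question": lambda x: x["question"].strip()}
stage_retrieve = {
    "docs": lambda x: ["doc about " + x["standalone_question"]],
    "question": itemgetter("standalone_question"),
}
stage_answer = {
    "answer": lambda x: f"answer built from {len(x['docs'])} doc(s)",
    "docs": itemgetter("docs"),
}

state = {"question": " what is tales of berseria? ", "chat_history": []}
keys_per_stage = []
for stage in (stage_standalone, stage_retrieve, stage_answer):
    state = run_map(stage, state)
    keys_per_stage.append(sorted(state))
print(keys_per_stage)
print(state)
```

Each dictionary stage replaces the previous state, which is why `docs` has to be forwarded explicitly in the last step. `RunnablePassthrough.assign`, used for `loaded_memory`, instead keeps the existing keys and adds new ones.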

In [66]:
inputs = {"question": "what is tales of berseria?"}
result = final_chain.invoke(inputs)
result



{'answer': 'Tales of Berseria is set in the land of Midgand, where the people live in fear of the malevolent entities known as the "Abode of the Ancients." The game follows Velvet Crowe, a young woman who lives in the village of Lulukoko, as she discovers her destiny as a "Chosen One" tasked with defeating the Abode of the Ancients. Alongside her are a diverse group of allies, each with their own unique personalities and motivations.\n\nRELATED:\nTop 10 JRPGs With The Best Storytelling\n\nThroughout the game, players will encounter various factions vying for power and influence in Midgand, as well as a host of memorable characters both friend and foe. The story is full of unexpected twists and turns, keeping players engaged and invested until the very end.',
 'docs': [Document(page_content="There are many classic Tales of elements found in Abyss, including skits, cooking, titles being significant, and earning Grades. Because of this, it's very approachable for new Tales of players, whi

In [67]:
# Note that the memory does not save automatically
# This will be improved in the future
# For now you need to save it yourself
memory.save_context(inputs, {"answer": result["answer"]})
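The save-then-load cycle can be pictured with a minimal buffer. `ToyBufferMemory` is a hypothetical class written for this sketch, not LangChain's `ConversationBufferMemory`; it stores plain tuples instead of message objects and assumes the same `question` and `answer` keys used above.

```python
class ToyBufferMemory:
    """Keeps every question/answer pair in a growing list."""

    def __init__(self):
        self.messages = []

    def save_context(self, inputs, outputs):
        # Store the human turn first, then the AI turn
        self.messages.append(("human", inputs["question"]))
        self.messages.append(("ai", outputs["answer"]))

    def load_memory_variables(self, _inputs):
        # Return a copy so callers cannot mutate the buffer by accident
        return {"history": list(self.messages)}


toy_memory = ToyBufferMemory()
before = toy_memory.load_memory_variables({})
toy_memory.save_context({"question": "what is LCEL?"}, {"answer": "A way to compose chains."})
after = toy_memory.load_memory_variables({})
print(before)
print(after)
```

Nothing is stored until `save_context` is called, which matches the note above: the chain reads from memory on each call but does not write to it on its own.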

In [68]:
memory.load_memory_variables({})

{'history': [HumanMessage(content='what is tales of berseria?'),
  AIMessage(content='Tales of Berseria is set in the land of Midgand, where the people live in fear of the malevolent entities known as the "Abode of the Ancients." The game follows Velvet Crowe, a young woman who lives in the village of Lulukoko, as she discovers her destiny as a "Chosen One" tasked with defeating the Abode of the Ancients. Alongside her are a diverse group of allies, each with their own unique personalities and motivations.\n\nRELATED:\nTop 10 JRPGs With The Best Storytelling\n\nThroughout the game, players will encounter various factions vying for power and influence in Midgand, as well as a host of memorable characters both friend and foe. The story is full of unexpected twists and turns, keeping players engaged and invested until the very end.')]}

In [69]:
inputs = {"question": "when did it released?"}
result = final_chain.invoke(inputs)
result



{'answer': 'Tales of Berseria was released on January 27, 2016 in Japan, and on July 11, 2017 in North America and Europe for PlayStation 4 and PC.',
 'docs': [Document(page_content="There are many classic Tales of elements found in Abyss, including skits, cooking, titles being significant, and earning Grades. Because of this, it's very approachable for new Tales of players, while still managing to improve upon many of the mechanics associated with the franchise. Though dated by its graphics, Abyss does maintain its notoriety with its easy-to-understand combat and well-rounded storytelling.\n4 Tales Of Berseria (80)\nTales of Berseria\n\n    Platforms: PlayStation 4, PC\n    Japan Only: PlayStation 3\n\nTales of Berseria serves as a prequel to Tales of Zestiria and follows Velvet Crowe (the franchise's first sole-female protagonist). As a game that precedes Zestiria, much of Zestiria's lore is established in Berseria, including the first Shepherd (a position Sorey later takes on in Zes