In [1]:
%%capture
!pip install -r requirements.txt

In [2]:
import os
from typing import Optional, Tuple

## Loading Embedding Model

In [3]:
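import torch
from langchain.embeddings import HuggingFaceEmbeddings

model_name = "intfloat/e5-base-v2"
# Use the GPU when one is available; otherwise fall back to the CPU.
model_kwargs = {'device': 'cuda' if torch.cuda.is_available() else 'cpu'}
# Unit-length vectors, so inner product and cosine similarity coincide.
encode_kwargs = {'normalize_embeddings': True}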
embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)
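
Setting `normalize_embeddings` to `True` scales every embedding to unit length. As a rough illustration in plain Python (the three-dimensional vectors and helper names below are made up for the example, not real model output), the dot product of two unit vectors equals their cosine similarity, and their squared Euclidean distance becomes `2 - 2 * cosine`, so ranking by either measure gives the same order.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (the assumed effect of normalize_embeddings=True)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy vectors standing in for embeddings (hypothetical values).
query = [3.0, 4.0, 0.0]
passage = [1.0, 2.0, 2.0]

q_unit = l2_normalize(query)
p_unit = l2_normalize(passage)

cos_raw = cosine(query, passage)        # cosine of the unnormalized vectors
dot_unit = dot(q_unit, p_unit)          # dot product after normalization
dist_unit = squared_l2(q_unit, p_unit)  # squared L2 distance after normalization

print(f"cosine of raw vectors:      {cos_raw:.4f}")
print(f"dot of unit vectors:        {dot_unit:.4f}")
print(f"squared L2 of unit vectors: {dist_unit:.4f}")
```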

## Data Ingestion

In [7]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.faiss import FAISS
from langchain.docstore.document import Document
import pickle

In [8]:
input_txt = """INDIANAPOLIS, Aug. 14, 2023 /PRNewswire/ -- Eli Lilly and Company (NYSE: LLY) today announced the successful completion of its acquisition of Sigilon Therapeutics, Inc. (NASDAQ: SGTX). The acquisition allows Lilly to continue researching and developing encapsulated cell therapies, including SIG-002, for the treatment of type 1 diabetes.

"Make life better – that's the phrase that guides everything we do at Lilly," said Ruth Gimeno, Ph.D., group vice president, diabetes, obesity and cardiometabolic research at Lilly. "We are excited to welcome our new colleagues from Sigilon to Lilly; together, we will strive to provide solutions for people living with type 1 diabetes that absolves them of constant disease management, and advance Sigilon's technology for patients."

The Offer and the Merger

As previously announced, Lilly and Sigilon entered into a Merger Agreement dated as of June 28, 2023, and pursuant thereto, on July 13, 2023, Lilly and a wholly owned subsidiary ("Purchaser") commenced a tender offer (the "Offer") to purchase all of the issued and outstanding shares ("Shares") of Sigilon's common stock in exchange for (a) $14.92 per Share, net to the stockholder in cash, without interest (the "Cash Consideration") and less any applicable tax withholding, plus (b) one non-tradable contingent value right ("CVR" and, together with the Cash Consideration, the "Offer Price") per Share, which represents the contractual right to receive contingent payments of up to $111.64 per Share in cash, net to the stockholder in cash, without interest and less any applicable tax withholding, upon the achievement of certain specified milestones. There can be no assurance that any payments will be made with respect to the CVRs. The Offer expired as scheduled on Aug. 9, 2023, with 1,718,493 Shares validly tendered and not validly withdrawn, which together with Shares previously owned by Lilly, represented 76.61% of the issued and outstanding Shares. In accordance with the terms of the Offer, Purchaser accepted for payment all such validly tendered and not validly withdrawn Shares.

Following consummation of the Offer, on Aug. 11, 2023, Lilly completed its acquisition of Sigilon through the merger of Purchaser with and into Sigilon in accordance with Section 251(h) of the General Corporation Law of the State of Delaware), with Sigilon surviving such merger as a wholly owned subsidiary of Lilly. In connection with the merger, each Share issued and outstanding immediately prior to the effective time of the merger (other than (i) Shares held in Sigilon's treasury or owned by Sigilon, or owned by Lilly, Purchaser or any direct or indirect wholly-owned subsidiary of Lilly or Purchaser or (ii) Shares held by any stockholder of Sigilon who was entitled to demand and properly demanded appraisal for such Shares in accordance with Section 262 of the DGCL), including each Share that was subject to vesting or forfeiture restrictions granted pursuant to a Sigilon equity incentive plan, program or arrangement, was canceled and converted into the right to receive the Offer Price, without interest, less any applicable tax withholding. Sigilon's common stock has been delisted from the NASDAQ Global Select Market and will be deregistered under the Securities Exchange Act of 1934, as amended.

For Lilly, Morgan, Lewis & Bockius LLP is acting as legal counsel. For Sigilon, Lazard is acting as lead financial advisor and Ropes & Gray LLP is acting as legal counsel. Canaccord Genuity also acted as financial advisor to Sigilon.

About Lilly 

Lilly unites caring with discovery to create medicines that make life better for people around the world. We've been pioneering life-changing discoveries for nearly 150 years, and today our medicines help more than 51 million people across the globe. Harnessing the power of biotechnology, chemistry and genetic medicine, our scientists are urgently advancing new discoveries to solve some of the world's most significant health challenges, redefining diabetes care, treating obesity and curtailing its most devastating long-term effects, advancing the fight against Alzheimer's disease, providing solutions to some of the most debilitating immune system disorders, and transforming the most difficult-to-treat cancers into manageable diseases. With each step toward a healthier world, we're motivated by one thing: making life better for millions more people. That includes delivering innovative clinical trials that reflect the diversity of our world and working to ensure our medicines are accessible and affordable. To learn more, visit Lilly.com and Lilly.com/newsroom or follow us on Facebook, Instagram, Twitter and LinkedIn. C-LLY

Cautionary Statement Regarding Forward-Looking Statements

This press release contains forward-looking statements regarding Lilly's acquisition of Sigilon. All statements other than statements of historical fact are statements that could be deemed forward-looking statements. Forward-looking statements reflect current beliefs and expectations; however, these statements involve inherent risks and uncertainties, including with respect to drug research, development and commercialization, Lilly's evaluation of the accounting treatment of the acquisition and its impact on its financial results and financial guidance, the effects of the acquisition on Sigilon's relationships with key third parties or governmental entities, transaction costs, risks that the acquisition disrupts current plans and operations or adversely affects employee retention, and any legal proceedings that may be instituted related to the acquisition. Actual results could differ materially due to various factors, risks and uncertainties. Among other things, there can be no guarantee that Lilly will realize the expected benefits of the acquisition, that product candidates will be approved on anticipated timelines or at all, that any products, if approved, will be commercially successful, that all or any of the contingent consideration will become payable on the terms described herein or at all, that Lilly's financial results will be consistent with its expected 2023 guidance or that Lilly can reliably predict the impact of the acquisition on its financial results or financial guidance. For further discussion of these and other risks and uncertainties, see Lilly's most recent Form 10-K and Form 10-Q filings with the United States Securities and Exchange Commission. Except as required by law, Lilly does not undertake any duty to update forward-looking statements to reflect events after the date of this press release.

"""

In [9]:
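print("Splitting text...")
# Split on blank lines only; paragraphs longer than chunk_size are kept whole,
# which is what triggers the "Created a chunk of size ..." warnings in the output.
text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=600,
    chunk_overlap=100,
    length_function=len,
)

texts = text_splitter.split_text(input_txt)
# Wrap each chunk in a Document so it can be indexed by the vector store.
docs = [Document(page_content=t) for t in texts]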

Created a chunk of size 1296, which is longer than the specified 600
Created a chunk of size 1214, which is longer than the specified 600
Created a chunk of size 1138, which is longer than the specified 600


Splitting text...
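
The "Created a chunk of size ..." warnings appear because `CharacterTextSplitter` cuts only at the given separator: a paragraph longer than `chunk_size` contains no `"\n\n"` and is therefore emitted whole. The sketch below imitates that behaviour in plain Python. It is a simplified illustration (the function name `split_on_separator` is hypothetical and overlap handling is left out), not LangChain's exact algorithm.

```python
def split_on_separator(text, separator="\n\n", chunk_size=600):
    """Greedy sketch: merge separator-delimited pieces up to chunk_size.

    Overlap handling is omitted, so this is not LangChain's exact algorithm.
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # No separator inside this piece, so it cannot be cut any further.
            print(f"Oversized chunk of size {len(piece)} (limit {chunk_size})")
        current = piece
    if current:
        chunks.append(current)
    return chunks

# A short paragraph, one paragraph longer than the limit, another short one.
sample = "short intro\n\n" + "x" * 50 + "\n\nshort outro"
chunks = split_on_separator(sample, chunk_size=30)
print([len(c) for c in chunks])
```

If chunks that respect the size limit are needed, `RecursiveCharacterTextSplitter` from the same module is one option, since it falls back to finer separators when a piece is still too long.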


### Embedding and Storing Documents in Vector Store

In [11]:
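print("Creating vectorstore...")
# Embed every chunk and build an in-memory FAISS index.
vectorstore = FAISS.from_documents(docs, embeddings)
# Persist the store so the retrieval cells can reload it without re-embedding.
with open("vectorstore.pkl", "wb") as f:
    pickle.dump(vectorstore, f)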

Creating vectorstore...
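
Conceptually, the vector store keeps one vector per chunk and, at query time, returns the chunks whose vectors lie closest to the query vector. The sketch below does this by brute force in plain Python. The class `ToyVectorStore` and the two-dimensional vectors are hypothetical stand-ins for illustration only; they are not FAISS and not real embeddings.

```python
import math

class ToyVectorStore:
    """Hypothetical in-memory store with brute-force nearest-neighbour search."""

    def __init__(self):
        self._items = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self._items.append((vector, text))

    def similarity_search(self, query_vector, k=2):
        # Rank stored texts by Euclidean distance to the query (smaller is closer).
        ranked = sorted(self._items, key=lambda item: math.dist(item[0], query_vector))
        return [text for _, text in ranked[:k]]

store = ToyVectorStore()
store.add([1.0, 0.0], "chunk about the acquisition")
store.add([0.0, 1.0], "chunk about forward-looking statements")
store.add([0.8, 0.6], "chunk about the tender offer")

# The query vector is closest to the first chunk, then to the third.
results = store.similarity_search([0.9, 0.1], k=2)
print(results)
```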


## Answer Generation Pipeline

In [12]:
import pickle

import torch
from transformers import pipeline

from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.llms import HuggingFacePipeline
from langchain.memory import ConversationBufferMemory
from langchain.prompts.prompt import PromptTemplate
from langchain.vectorstores.base import VectorStoreRetriever

In [13]:
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

template = """You are an AI assistant for answering questions about the provided text.
You are given the following extracted parts of a long document and a question. Provide a conversational answer.
If you don't know the answer, just say "Hmm, I'm not sure." Don't try to make up an answer.
If the question is not related to the text, politely inform them that you do not have the answer.
Question: {question}
=========
{context}
=========
Answer in Markdown:"""
QA_PROMPT = PromptTemplate(template=template, input_variables=["question", "context"])

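The condense prompt is what lets a follow-up such as "How much did they pay?" be rewritten into a question that makes sense without the earlier turns. As a rough sketch, the placeholders can be filled with plain `str.format`; the function `render_condense_prompt` and the "Human:" / "Assistant:" layout of the history are assumptions for illustration, not LangChain's exact internals.

```python
CONDENSE_TEMPLATE = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

def render_condense_prompt(history, question):
    """Fill the template; history is a list of (human, ai) string pairs."""
    # Assumed layout: one "Human:" and one "Assistant:" line per turn.
    lines = []
    for human, ai in history:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {ai}")
    return CONDENSE_TEMPLATE.format(chat_history="\n".join(lines), question=question)

history = [("Which company did Eli Lilly acquire?", "Eli Lilly acquired Sigilon.")]
prompt = render_condense_prompt(history, "How much did they pay?")
print(prompt)
```

The LLM receives this rendered text and is expected to complete it with a self-contained question, which is then used for retrieval.
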
In [14]:
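def load_retriever():
    """Reload the pickled vector store and wrap it in a retriever."""
    # Only unpickle files you created yourself; pickle can execute arbitrary code.
    with open("vectorstore.pkl", "rb") as f:
        vectorstore = pickle.load(f)
    retriever = VectorStoreRetriever(vectorstore=vectorstore)
    return retriever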

In [15]:
def get_basic_qa_chain(llm):
    retriever = load_retriever()
    memory = ConversationBufferMemory(
        memory_key="chat_history", return_messages=True)
    model = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory)
    return model


def get_custom_prompt_qa_chain(llm):
    retriever = load_retriever()
    memory = ConversationBufferMemory(
        memory_key="chat_history", return_messages=True)
    # see: https://github.com/langchain-ai/langchain/issues/6635
    # see: https://github.com/langchain-ai/langchain/issues/1497
    model = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory,
        combine_docs_chain_kwargs={"prompt": QA_PROMPT})
    return model


def get_condense_prompt_qa_chain(llm):
    retriever = load_retriever()
    memory = ConversationBufferMemory(
        memory_key="chat_history", return_messages=True)
    # see: https://github.com/langchain-ai/langchain/issues/5890
    model = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory,
        condense_question_prompt=CONDENSE_QUESTION_PROMPT,
        combine_docs_chain_kwargs={"prompt": QA_PROMPT})
    return model


def get_qa_with_sources_chain(llm):
    retriever = load_retriever()
    history = []
    model = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        return_source_documents=True)

    def model_func(inputs):
        # Workaround: the built-in memory does not work when source documents
        # are returned, so the chat history is tracked manually in `history`.
        # see: https://github.com/langchain-ai/langchain/issues/5630
        new_input = {"question": inputs['question'], "chat_history": history}
        result = model(new_input)
        history.append((inputs['question'], result['answer']))
        return result

    return model_func


chain_options = {
    "basic": get_basic_qa_chain,
    "with_sources": get_qa_with_sources_chain,
    "custom_prompt": get_custom_prompt_qa_chain,
    "condense_prompt": get_condense_prompt_qa_chain
}
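
The `with_sources` variant keeps the conversation in a closure instead of a memory object. That pattern can be tried without loading an LLM. In the sketch below, `make_chat_func` and `fake_chain` are hypothetical names, and `fake_chain` is a stand-in that only reports how many earlier turns it received.

```python
def make_chat_func(chain):
    """Wrap a chain-like callable so each call sees all earlier (question, answer) turns."""
    history = []

    def ask(question):
        # Pass a copy so the chain cannot mutate the stored history.
        result = chain({"question": question, "chat_history": list(history)})
        history.append((question, result["answer"]))
        return result

    return ask

def fake_chain(inputs):
    # Hypothetical stand-in for ConversationalRetrievalChain: no retrieval, no LLM.
    turns = len(inputs["chat_history"])
    return {"answer": f"answer after {turns} earlier turn(s)"}

ask = make_chat_func(fake_chain)
first = ask("Which company was acquired?")
second = ask("How much was paid?")
print(first["answer"])
print(second["answer"])
```

Each call to `make_chat_func` creates its own `history` list, so separate conversations do not share state.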

In [26]:
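# Load Dolly v2 (3B) in bfloat16; device_map="auto" places it on the available device(s).
generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16,
                         trust_remote_code=True, return_full_text=True, device_map="auto")
# Wrap the transformers pipeline so LangChain can call it as an LLM.
hf_pipeline = HuggingFacePipeline(pipeline=generate_text)

chain = get_basic_qa_chain(llm=hf_pipeline)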

In [27]:
response = chain({"question": "Which Company did Eli Lilly acquired?"})
print("Question ::  {}".format(response['question']))
print("Answer ::  {}".format(response['answer'].strip()))
print("History ::  {}".format(response['chat_history']))

Question ::  Which Company did Eli Lilly acquired?
Answer ::  Eli Lilly acquired Sigilon, a biotechnology company that develops cell therapies for diabetes.
History ::  [HumanMessage(content='Which Company did Eli Lilly acquired?', additional_kwargs={}, example=False), AIMessage(content='\nEli Lilly acquired Sigilon, a biotechnology company that develops cell therapies for diabetes.', additional_kwargs={}, example=False)]


In [28]:
response = chain({"question": "How much did they pay?"})
print("Question ::  {}".format(response['question']))
print("Answer ::  {}".format(response['answer'].strip()))
print("History ::  {}".format(response['chat_history']))

Question ::  How much did they pay?
Answer ::  The Offer was $14.92 per Share, net to the stockholder in cash, without interest, less any applicable tax withholding. Lilly paid 1,718,493 Shares for 76.61% of the issued and outstanding Shares.
History ::  [HumanMessage(content='Which Company did Eli Lilly acquired?', additional_kwargs={}, example=False), AIMessage(content='\nEli Lilly acquired Sigilon, a biotechnology company that develops cell therapies for diabetes.', additional_kwargs={}, example=False), HumanMessage(content='How much did they pay?', additional_kwargs={}, example=False), AIMessage(content='\nThe Offer was $14.92 per Share, net to the stockholder in cash, without interest, less any applicable tax withholding. Lilly paid 1,718,493 Shares for 76.61% of the issued and outstanding Shares.', additional_kwargs={}, example=False)]


In [29]:
response = chain({"question": "Is Sigilon publicly listed in a stock market?"})
print("Question ::  {}".format(response['question']))
print("Answer ::  {}".format(response['answer'].strip()))
print("History ::  {}".format(response['chat_history']))

Question ::  Is Sigilon publicly listed in a stock market?
Answer ::  Sigilon is an investor friendly company. Investors can log on to its website and conduct an independent securities research.
History ::  [HumanMessage(content='Which Company did Eli Lilly acquired?', additional_kwargs={}, example=False), AIMessage(content='\nEli Lilly acquired Sigilon, a biotechnology company that develops cell therapies for diabetes.', additional_kwargs={}, example=False), HumanMessage(content='How much did they pay?', additional_kwargs={}, example=False), AIMessage(content='\nThe Offer was $14.92 per Share, net to the stockholder in cash, without interest, less any applicable tax withholding. Lilly paid 1,718,493 Shares for 76.61% of the issued and outstanding Shares.', additional_kwargs={}, example=False), HumanMessage(content='Is Sigilon publicly listed in a stock market?', additional_kwargs={}, example=False), AIMessage(content='\nSigilon is an investor friendly company. Investors can log on to 