## Stammzelltisch 9.4.2024 - Large language model tutorial

## Background
* Several large language models (LLMs) are available, for example:
    * OpenAI models
        * GPT-3.5
        * GPT-4
    * Llama 2
        * Available with 7, 13, or 70 billion parameters
        * Used here: 7 billion parameters, quantized to 4 bits, so it can run on a laptop with 8 GB of memory
    * Gemma
    * ...
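
The memory figure for Llama 2 can be checked with a rough back-of-the-envelope estimate. The sketch below counts only the storage for the weights and ignores runtime overhead (activations, context cache, the operating system), so the real footprint is somewhat higher. The helper name `model_memory_gb` is made up for this illustration.

```python
def model_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    bytes_total = n_params * bits_per_param / 8  # 8 bits per byte
    return bytes_total / 1e9


# Compare common precisions for a 7-billion-parameter model
for bits in (16, 8, 4):
    print(f"7B parameters at {bits} bits: {model_memory_gb(7e9, bits):.1f} GB")
```

At 4 bits the weights take roughly 3.5 GB, which is why the 7-billion-parameter model fits on an 8 GB laptop while the same model at 16 bits would not.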

### Initialize large language model
* Important: the Ollama server must be running in the background. It can be started in a terminal:
```
ollama serve
```
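
Before running the cells below, it can help to confirm that the server is reachable. This is a minimal sketch that assumes Ollama listens on its default address, `http://localhost:11434`; adjust `base_url` if your setup differs. The helper name `ollama_is_running` is made up for this illustration.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: treat as "not running"
        return False


print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, start the server with `ollama serve` and try again.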

In [None]:
from langchain_community.llms import Ollama
llm = Ollama(model="llama2")

### Query large language model

In [None]:
print(llm.invoke("Please show me a recipe for a vegan Tiramisu!"))

### Change the model's behavior by modifying the prompt

In [None]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You talk like an Australian guy."),
    ("user", "{input}")
])

chain = prompt | llm 
print(chain.invoke({"input": "What is life all about?"}))
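
To make clear what the template does before the text reaches the model, the following plain-Python sketch fills the `{input}` placeholder by hand. It is only an illustration of the idea, not LangChain's actual implementation; the `render` helper is hypothetical.

```python
# Same messages as in the cell above: a fixed system instruction
# plus a user message with a placeholder
messages_template = [
    ("system", "You talk like an Australian guy."),
    ("user", "{input}"),
]


def render(template, **variables):
    """Fill the placeholders of every message with the given variables."""
    return [(role, text.format(**variables)) for role, text in template]


for role, text in render(messages_template, input="What is life all about?"):
    print(f"{role}: {text}")
```

The system message stays the same for every question; only the user message changes with each call.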

### Test what the LLM knows about stem cell biology

In [None]:
print(llm.invoke("Please explain the concept of stemness of stem cells to me!"))

In [None]:
print(llm.invoke("Please explain the concept of within-tissue plasticity to me!"))

## Use retrieval-augmented generation (RAG)

In [None]:
from langchain_community.document_loaders import WebBaseLoader

# Download the PubMed page and parse it into LangChain documents
loader = WebBaseLoader("https://pubmed.ncbi.nlm.nih.gov/12160836/")
docs = loader.load()

In [None]:
from langchain_community.embeddings import OllamaEmbeddings

# Embeddings are computed by the local Ollama server
embeddings = OllamaEmbeddings()

In [None]:
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the page into chunks, embed each chunk, and index the vectors with FAISS
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)
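
The splitting step can be pictured with a much simpler scheme: cut the text into fixed-size pieces that overlap slightly, so that a sentence falling on a boundary still appears whole in one chunk. The sketch below is a simplified stand-in, not the recursive algorithm that `RecursiveCharacterTextSplitter` uses; the function name and parameters are chosen for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into chunks of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop early enough that the last chunk is not just a repeat of the previous tail
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

Each chunk starts with the last character of the previous one, which is the overlap.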

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

# Chain that inserts the retrieved documents into {context} and sends the prompt to the LLM
document_chain = create_stuff_documents_chain(llm, prompt)

In [None]:
from langchain.chains import create_retrieval_chain

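# The retriever looks up the chunks most similar to the question;
# the retrieval chain passes them to the document chain as context
retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)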

response = retrieval_chain.invoke({"input": "Please explain the concept of stemness to me!"})
print(response["answer"])
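
What the retrieval chain does can be sketched end to end without any LLM: embed the question, rank the stored chunks by similarity, and paste the best ones into the prompt. The example below is a toy model under clear assumptions: it uses word counts as "embeddings" and cosine similarity as the ranking measure, whereas the cells above use Ollama embeddings and a FAISS index, which may rank by a different distance. The chunks, `embed`, and `retrieve` are made up for this illustration.

```python
import math

# Toy "document store": three short chunks
chunks = [
    "stem cells can self-renew and differentiate",
    "techno clubs in berlin are open all weekend",
    "stemness describes shared properties of stem cells",
]
vocabulary = sorted({word for chunk in chunks for word in chunk.split()})


def embed(text):
    """Toy bag-of-words embedding: count how often each vocabulary word occurs."""
    words = text.lower().split()
    return [words.count(word) for word in vocabulary]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(question, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


question = "what is stemness of stem cells"
context = "\n\n".join(retrieve(question))

# "Stuff" the retrieved chunks into the same prompt layout as above
prompt_text = (
    "Answer the following question based only on the provided context:\n\n"
    f"<context>\n{context}\n</context>\n\n"
    f"Question: {question}"
)
print(prompt_text)
```

The unrelated chunk about techno clubs shares no words with the question, so it is not retrieved and never reaches the prompt.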


## Use RAG to get information about a specific article

In [None]:
# Baseline: ask the model directly, without any retrieved context
result = llm.invoke("What did Lutz Leichsenring tell the German broadcaster DW?")
print(result)


In [None]:
# Load the Guardian article and rebuild the vector store from its chunks
loader = WebBaseLoader("https://www.theguardian.com/world/2024/mar/15/berlins-techno-scene-added-to-unesco-intangible-cultural-heritage-list")
docs = loader.load()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)

In [None]:
# Ask the same question again, this time with the article as context
retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

response = retrieval_chain.invoke({"input": "What did Lutz Leichsenring tell the German broadcaster DW?"})
print(response["answer"])
