# Download Llama2 Model weights
First, we need to download the weights of the Llama2 model. To access the models, you might have to be registered on the Hugging Face Hub.
The GGUF model weights can be downloaded from the Hugging Face Hub user TheBloke, either as the base model (<a href="https://huggingface.co/TheBloke/Llama-2-7B-GGUF">TheBloke Llama 2 7B</a>) or as the chat version (<a href="https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF">TheBloke Llama2 7B Chat Version</a>). We considered the following quantized weights; the file that is loaded can be changed in the PoC code:
- llama-2-7b-chat.Q4_K_M.gguf (MAX RAM required: 6.58 GB; SPACE required: 3.9 GB)
- llama-2-7b-chat.Q5_K_S.gguf (MAX RAM required: 7.15 GB; SPACE required: 4.5 GB)
- llama-2-7b-chat.Q5_K_M.gguf (MAX RAM required: 7.28 GB; SPACE required: 4.9 GB)

We are using llama-2-7b-chat.Q4_K_M.gguf for our use case.
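If the PoC has to run on machines with different amounts of memory, the choice of file can be made explicit. The sketch below only encodes the figures from the list above and picks the largest listed file whose maximum RAM requirement fits a given budget. The helper name `pick_weights` and the "largest file that fits" rule are our own assumptions (on the usual expectation that higher-bit quantizations trade memory for output quality), not part of the original setup.

```python
# (max RAM in GB, disk space in GB), copied from the list above
WEIGHTS = {
    "llama-2-7b-chat.Q4_K_M.gguf": (6.58, 3.9),
    "llama-2-7b-chat.Q5_K_S.gguf": (7.15, 4.5),
    "llama-2-7b-chat.Q5_K_M.gguf": (7.28, 4.9),
}


def pick_weights(available_ram_gb):
    """Hypothetical helper: return the file with the highest RAM requirement
    that still fits into `available_ram_gb`, or None if no file fits."""
    fitting = [(ram, name) for name, (ram, _disk) in WEIGHTS.items() if ram <= available_ram_gb]
    if not fitting:
        return None
    return max(fitting)[1]


print(pick_weights(8))  # llama-2-7b-chat.Q5_K_M.gguf
print(pick_weights(7))  # llama-2-7b-chat.Q4_K_M.gguf
```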

## Alternative download:
- Create a folder and `cd` into it.
- Run `wget https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_K_M.gguf` for the base model, or `wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf` for the chat version (without a `?download` suffix, so that `wget` saves the file under its plain name).
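As a pure-Python alternative to `wget` (which, for example, is not installed on Windows by default), the same URL pattern can be used with the standard library. This is a minimal sketch: `hf_resolve_url` and `download_weights` are hypothetical helper names, and the official `huggingface_hub` package (not used here) would be the more robust option.

```python
import urllib.request
from pathlib import Path


def hf_resolve_url(repo_id, filename, revision="main"):
    # Same URL pattern as in the wget commands above
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_weights(repo_id, filename, target_dir="."):
    """Download `filename` into `target_dir` unless it already exists; return its path."""
    target = Path(target_dir) / filename
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(hf_resolve_url(repo_id, filename), str(target))
    return target


# Example (several GB, so not executed here):
# download_weights("TheBloke/Llama-2-7B-Chat-GGUF", "llama-2-7b-chat.Q4_K_M.gguf", "../llama_weights")
```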

The download may take a while.
Side note: a 13B model would be preferable, but even the smallest recommended 13B variant requires 11.47 GB of RAM, which is too much for our hardware.

# Install the requirements for Llama2

Once the weights are downloaded, we can use them through llama.cpp (via its Python bindings, `llama-cpp-python`) together with LangChain. First, install both libraries by running:

```
pip install langchain llama-cpp-python
```

Now everything should be set up, and you should be able to use the quantized Llama2 model with LangChain.
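A quick way to confirm the installation is to check that both packages can be found without loading a model. Note that `llama-cpp-python` is imported as `llama_cpp`. The helper name `check_packages` is our own.

```python
import importlib.util


def check_packages(modules=("langchain", "llama_cpp")):
    """Return {module_name: True/False} depending on whether the module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


print(check_packages())
```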

Troubleshooting:
- Using Anaconda on Windows, I repeatedly ran into the following error message:
```
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
```

Following the instructions from <a href="https://stackoverflow.com/questions/73969269/error-could-not-build-wheels-for-hnswlib-which-is-required-to-install-pyprojec">Varada</a> solved the issue:
1. Download and run the Visual Studio Build Tools installer ("VSC builder").
2. Under "Individual Components", check the boxes listed in that answer (the C++ build tools, i.e. the MSVC compiler, and a Windows SDK) and finish the installation.
3. Run `pip install llama-cpp-python` again; the build should now succeed.

# Install prerequisites

In [None]:
# !pip install langchain llama-cpp-python 

In [None]:
# Run this cell in a suitable directory (example: ../llama_weights)
# !wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf

# Imports and Initial Prompts

In [None]:
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate

In [None]:
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
# DEFAULT_SYSTEM_PROMPT = "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information"
DEFAULT_SYSTEM_PROMPT = "You are an English teacher for very formal language. Emphasize any mistake without correcting it."

SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS

In [None]:
def get_prompt(instruction):
    # Llama 2 chat layout: "[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{instruction} [/INST]"
    return f"{B_INST} {SYSTEM_PROMPT}{instruction.strip()} {E_INST}"

# chat_history = []
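The commented-out `chat_history` hints at multi-turn chat. The sketch below assembles a multi-turn Llama 2 prompt following the layout of Meta's reference code: the system prompt is folded into the first user message, and every finished turn is wrapped as `[INST] ... [/INST] answer`. `build_chat_prompt` is a hypothetical helper. Meta's code inserts BOS/EOS as token IDs; writing them here as the text markers `<s>`/`</s>` is an assumption, so check how your llama.cpp version tokenizes them (it may also add a BOS token itself, in which case the leading `<s>` should be dropped).

```python
# Same markers as in the notebook, repeated so the sketch runs on its own
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
BOS, EOS = "<s>", "</s>"  # assumption: BOS/EOS written as plain text markers


def build_chat_prompt(system_prompt, history, user_message):
    """Hypothetical helper for multi-turn Llama 2 prompts.

    history: list of (user_text, assistant_text) pairs from earlier turns.
    """
    turns = list(history) + [(user_message, None)]
    # Fold the system prompt into the first user message
    first_user, first_answer = turns[0]
    turns[0] = (B_SYS + system_prompt + E_SYS + first_user, first_answer)

    parts = []
    for user_text, answer in turns:
        if answer is None:
            # The last turn stays open so that the model writes the answer
            parts.append(f"{BOS}{B_INST} {user_text.strip()} {E_INST}")
        else:
            parts.append(f"{BOS}{B_INST} {user_text.strip()} {E_INST} {answer.strip()} {EOS}")
    return "".join(parts)


# The assistant reply below is made up, for illustration only
print(build_chat_prompt("You are an English teacher.",
                        [("Hai tere!", "There are two mistakes.")],
                        "Wat was the first mistake?"))
```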

In [None]:
import os

# Check that the downloaded weights are in the expected folder (adjust the path to your machine)
os.listdir("/Users/josi/Llama2_weights")

In [None]:
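llm = LlamaCpp(
    # Path to the downloaded weights (the Q4_K_M file chosen above); adjust to your machine
    model_path="/Users/josi/Llama2_weights/llama-2-7b-chat.Q4_K_M.gguf",
    n_ctx=4096,  # Llama 2 context window; LangChain's default (512) is too small for the conversation chains below
    temperature=0.75,
    max_tokens=2048,
    top_p=1,
    # callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)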
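`temperature` and `top_p` control how each next token is sampled. The toy sampler below works on a hand-made dictionary of scores instead of the real model and only illustrates the idea (temperature scaling, then nucleus/top-p filtering). llama.cpp's actual sampler chain has more stages and may order them differently; `sample_next_token` is a hypothetical name.

```python
import math
import random


def sample_next_token(logits, temperature=0.75, top_p=1.0, rng=random):
    """Toy sampler: temperature scaling, then nucleus (top-p) filtering.

    logits: dict mapping candidate token -> raw score (made-up values).
    """
    if temperature <= 0:
        raise ValueError("this sketch requires temperature > 0")
    # 1) Temperature: lower values sharpen the distribution, higher values flatten it
    scaled = {tok: score / temperature for tok, score in logits.items()}
    # 2) Softmax (subtract the max for numerical stability)
    m = max(scaled.values())
    weights = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda pair: pair[1], reverse=True)
    # 3) Top-p: keep the smallest set of most likely tokens whose mass reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # 4) Sample from the kept tokens, renormalised
    r = rng.random() * sum(p for _, p in kept)
    acc = 0.0
    for tok, p in kept:
        acc += p
        if r < acc:
            return tok
    return kept[-1][0]  # guard against floating-point round-off


toy_logits = {"weather": 3.0, "wether": 1.0, "whether": 0.5}
print(sample_next_token(toy_logits, rng=random.Random(0)))
```

With `top_p=1` as in the notebook, the nucleus step keeps every candidate, so only the temperature shapes the output.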

In [None]:
prompt = get_prompt("Correct the following sentence: Hau is the wether today?")
prompt

In [None]:
llm(prompt)

In [None]:
prompt_template = f"{B_INST} {SYSTEM_PROMPT}{{user_message}} {E_INST}"  # {{ }} keeps a literal {user_message} placeholder
prompt_template

In [None]:
llm_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(prompt_template))
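`PromptTemplate.from_template` treats the string as an f-string-style template: every `{name}` becomes an input variable, and literal braces must be doubled. As far as we know, LangChain infers these variables in a similar way with Python's `string.Formatter`; the helper below is our own illustration, not LangChain code, and the short system prompt in it is made up.

```python
import string


def template_variables(template):
    """List the {placeholders} in an f-string style template, ignoring escaped {{ }}."""
    return sorted({field for _, field, _, _ in string.Formatter().parse(template) if field})


B_INST, E_INST = "[INST]", "[/INST]"
example_template = f"{B_INST} <<SYS>>\nYou are an English teacher.\n<</SYS>>\n\n{{user_message}} {E_INST}"

print(template_variables(example_template))  # ['user_message']
print(example_template.format(user_message="Hai tere!"))
```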

In [None]:
llm_chain.run("Hai tere!")

# Test out LangChain's ConversationChain

In [None]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

In [None]:
conversation = ConversationChain(
    llm=llm, verbose=True, memory=ConversationBufferMemory()
)

In [None]:
conversation.predict(input="Hi there!")

# English Conversation Assistant

In [None]:
template = """
[INST] <<SYS>>
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.
<</SYS>>

Current conversation:
{history}
Human: {input}
AI Assistant:[/INST]"""

PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
conversation = ConversationChain(
    prompt=PROMPT,
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory(ai_prefix="AI Assistant"),
)
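`ai_prefix="AI Assistant"` matters because the memory writes past turns into `{history}` with that label, so it should match the `AI Assistant:` cue at the end of the template (the default prefix is `AI`). The class below is a simplified stand-in that mimics how a buffer memory renders the history; it is not LangChain's implementation, and the assistant reply is made up.

```python
class SimpleBufferMemory:
    """Simplified stand-in for ConversationBufferMemory (not LangChain's code)."""

    def __init__(self, human_prefix="Human", ai_prefix="AI"):
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix
        self.turns = []  # list of (human_text, ai_text)

    def save_context(self, human_text, ai_text):
        self.turns.append((human_text, ai_text))

    @property
    def buffer(self):
        lines = []
        for human_text, ai_text in self.turns:
            lines.append(f"{self.human_prefix}: {human_text}")
            lines.append(f"{self.ai_prefix}: {ai_text}")
        return "\n".join(lines)


TEMPLATE = "Current conversation:\n{history}\nHuman: {input}\nAI Assistant:[/INST]"

memory = SimpleBufferMemory(ai_prefix="AI Assistant")
memory.save_context("Hi there!", "Hello! How can I help you today?")  # made-up reply
print(TEMPLATE.format(history=memory.buffer, input="How is the weather?"))
```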

In [None]:
a = conversation.predict(input="Hi there!")
print(a)

In [None]:
b = conversation.predict(input="How is the weather?")
print(b)

In [None]:
c = conversation.predict(input="Do you like spagghetti?")
print(c)

After the conversation, the full history between the human and the AI assistant can be stored with LangChain's `RedisChatMessageHistory`. The messages can then be loaded from the database to analyse the user's conversations.
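What such an analysis could look like is still open (`analyse_chat = ...`). Below is a hypothetical first sketch that only counts turns and words. It assumes the stored messages have been converted to `(role, text)` pairs, e.g. via `[(m.type, m.content) for m in history.messages]`, where LangChain message objects report `type` as `"human"` or `"ai"`. How `history` is created (for example a `RedisChatMessageHistory` passed to `ConversationBufferMemory` as `chat_memory`) depends on the LangChain version and is not shown.

```python
from collections import Counter


def analyse_chat(messages):
    """Hypothetical analysis of a stored chat given as (role, text) pairs."""
    human_texts = [text for role, text in messages if role == "human"]
    # Lower-case words with simple punctuation stripped
    words = [w.strip(".,!?").lower() for text in human_texts for w in text.split()]
    words = [w for w in words if w]
    return {
        "human_turns": len(human_texts),
        "ai_turns": sum(1 for role, _ in messages if role == "ai"),
        "avg_words_per_turn": len(words) / len(human_texts) if human_texts else 0.0,
        "most_common_words": Counter(words).most_common(3),
    }


chat = [("human", "Hi there!"), ("ai", "Hello!"),
        ("human", "How is the weather?"), ("ai", "I do not know.")]
print(analyse_chat(chat))
```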

In [None]:
# analyse_chat = ...

# English Grammar Helping Assistant

Note: a plain `LLMChain` may be the better approach for this use case.

In [None]:
template = """
[INST] <<SYS>>
The following is a grammar lesson between an AI and a human. The AI emphasizes mistakes made by the human without giving the solution. If the AI does not know the answer to a question, it truthfully says it does not know.
<</SYS>>

Current conversation:
{history}
Human: {input}
AI Assistant:[/INST]"""

PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
conversation = ConversationChain(
    prompt=PROMPT,
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory(ai_prefix="AI Assistant"),
)

In [None]:
a = conversation.predict(input="Hai tere!")
print(a)

In [None]:
b = conversation.predict(input="Wat was the first mistake?")
print(b)

In [None]:
# Alternative without the conversational framework: just the single prompt itself
prompt_template = f"{B_INST} {SYSTEM_PROMPT}{{user_message}} {E_INST}"
llm_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(prompt_template))
llm_chain.run("Hai tere!")

# Local Translator from German to French

A plain `LLMChain` is sufficient here.

In [None]:
SYSTEM_PROMPT = (
    B_SYS
    + "You are a translator that translates text from German to French. Only return the translated text, nothing else."
    + E_SYS
)

In [None]:
prompt_template = f"{B_INST} {SYSTEM_PROMPT}{{user_message}} {E_INST}"
prompt_template

In [None]:
llm_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(prompt_template))
llm_chain.run("Translate the following text into French: Ich mag es Basketball zu spielen.")
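To translate several sentences with the same chain, the call can be wrapped in a small loop. `translate_batch` is a hypothetical helper that takes the run function as a parameter, so it can be tried with a stand-in before passing `llm_chain.run`; stripping surrounding whitespace is our own choice.

```python
def translate_batch(sentences, run, instruction="Translate the following text into French: "):
    """Run one translation per sentence with `run` and strip surrounding whitespace."""
    return [run(instruction + sentence).strip() for sentence in sentences]


# With the chain above (not executed here):
# translate_batch(["Ich mag es Basketball zu spielen.", "Wie ist das Wetter heute?"], llm_chain.run)

# Stand-in for llm_chain.run, for illustration only
fake_run = lambda prompt: "  " + prompt.upper() + "\n"
print(translate_batch(["Guten Morgen."], fake_run))
```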