# Chatbot
In this tutorial, we'll build a chatbot that retains previous prompts and responses, allowing it to maintain context throughout a conversation. This sets it apart from a plain LLM call, which is stateless: each prompt is processed on its own, with no knowledge of earlier exchanges.


---
## 1.&nbsp; Installations and Settings 🛠️
Let's install LangChain and llama-cpp-python (built with CUDA support), then download a quantized Mistral 7B Instruct model in GGUF format.

In [None]:
!pip3 install -qqq langchain --progress-bar off
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install -qqq llama-cpp-python --progress-bar off

!huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.1-GGUF mistral-7b-instruct-v0.1.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for llama-cpp-python (pyproject.toml) ... done
Consider using `hf_transfer` for faster downloads. This solution comes with some limitations. See https://huggingface.co/docs/huggingface_hub/hf_transfer for more details.
downloading https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf to /root/.cache/huggingface/hub/tmp6zeylnrv
mistral-7b-instruct-v0.1.Q4_K_M.gguf: 100% 4.37G/4.37G [00:20<00:00, 209MB/s]
./mistral-7b-instruct-v0.1.Q4_K_M.gguf
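
The download above writes the model to the current directory, while the next section loads it from `/content`, which is the working directory on Colab. If you run the notebook elsewhere, a quick check like the following may save a confusing load error. This is only a sketch: `find_model` is a hypothetical helper, and the list of directories to search is an assumption.

```python
from pathlib import Path


def find_model(filename, search_dirs=(".", "/content")):
    """Return the absolute path of the first matching model file, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate.resolve()
    return None


model_file = find_model("mistral-7b-instruct-v0.1.Q4_K_M.gguf")
if model_file is None:
    print("Model file not found; re-run the download cell.")
else:
    print(f"Using model at {model_file}")
```

The returned path could then be passed as `model_path` when creating the LLM.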


---
## 2.&nbsp; Setting up your LLM 🧠

In [None]:
from langchain.llms import LlamaCpp

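llm = LlamaCpp(model_path = "/content/mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # the GGUF file downloaded above
               max_tokens = 2000,     # upper limit on tokens generated per response
               temperature = 0.1,     # low temperature for more deterministic answers
               top_p = 1,             # sample from the full token distribution
               n_gpu_layers = -1)     # offload all model layers to the GPU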

llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /content/mistral-7b-instruct-v0.1.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.1
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 l

### 2.1.&nbsp; Test your LLM

In [None]:
answer_1 = llm.invoke("Write a poem about data science.")
print(answer_1)


llama_print_timings:        load time =     269.28 ms
llama_print_timings:      sample time =     110.52 ms /   183 runs   (    0.60 ms per token,  1655.88 tokens per second)
llama_print_timings: prompt eval time =     269.22 ms /     8 tokens (   33.65 ms per token,    29.72 tokens per second)
llama_print_timings:        eval time =    4441.99 ms /   182 runs   (   24.41 ms per token,    40.97 tokens per second)
llama_print_timings:       total time =    5597.32 ms /   190 tokens



Data Science is the future, it's where we're all headed,
With algorithms and models, predictions are guaranteed.
We take in information from all around,
And use it to create insights that astound.

From machine learning to deep learning too,
There's nothing that we can't do.
We analyze data with techniques so precise,
And make predictions with incredible accuracy.

Data Science is the key to unlocking the unknown,
It helps us make decisions that have never been shown.
With statistical analysis and visualization tools,
We can find patterns and make them known.

So if you want to be ahead of the game,
Learn Data Science, it's not a shame.
With its endless possibilities and potential for growth,
It's the future of technology, there's no doubt.
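
The `llama_print_timings` lines printed with the poem report generation speed. As a rough illustration of where the "tokens per second" figure comes from, the sketch below recomputes it from the elapsed time and token count on the `eval time` line. `tokens_per_second` is a hypothetical helper, and the regular expression assumes the line layout shown in the output above.

```python
import re


def tokens_per_second(line: str) -> float:
    """Compute throughput from a timing line of the form '= <ms> ms / <n> runs'."""
    match = re.search(r"=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*(?:runs|tokens)", line)
    if match is None:
        raise ValueError("not a timing line with a token count")
    elapsed_ms = float(match.group(1))
    count = int(match.group(2))
    return count / (elapsed_ms / 1000.0)


eval_line = (
    "llama_print_timings:        eval time =    4441.99 ms /   182 runs   "
    "(   24.41 ms per token,    40.97 tokens per second)"
)
rate = tokens_per_second(eval_line)
print(f"{rate:.2f} tokens/s")
```

Here 182 tokens in 4441.99 ms works out to roughly 41 tokens per second, matching the figure reported on the same line.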


---
## 3.&nbsp; Making a chatbot 💬
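To turn a basic LLM into a chatbot, we need three additional components: a prompt template that lays out the system instruction, the conversation history, and the new question; a memory that stores earlier messages; and a chain that connects the prompt, the memory, and the LLM.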

In [None]:
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, MessagesPlaceholder, SystemMessagePromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.chains import LLMChain

### Prompt ###
prompt = ChatPromptTemplate(
    messages=[
        SystemMessagePromptTemplate.from_template(
            "Keep your answers very short, succinct, and to the point."
        ),
        # `variable_name` must match the `memory_key` set on the memory object
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("{question}"),
    ]
)

### Memory ###
# `return_messages=True` returns the history as message objects, which the MessagesPlaceholder expects
# `memory_key` must match the MessagesPlaceholder's `variable_name` ("chat_history")
memory = ConversationBufferMemory(memory_key = "chat_history",
                                  return_messages=True)

### Chain ###
conversation = LLMChain(llm = llm,
                        prompt = prompt,
                        verbose = True,
                        memory = memory)
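
To make the roles of the prompt, memory, and chain more concrete, here is a minimal pure-Python sketch of the same idea. It is not how LangChain is implemented: `BufferMemory`, `format_prompt`, `chat`, and `fake_llm` are hypothetical names used only for illustration, and the `System:` / `Human:` / `AI:` layout is an assumption that mirrors the formatted prompt printed by the verbose chain output in section 3.1.

```python
class BufferMemory:
    """Toy stand-in for a conversation buffer: keeps every turn in order."""

    def __init__(self):
        self.messages = []  # list of (role, content) tuples

    def save_turn(self, question, answer):
        self.messages.append(("Human", question))
        self.messages.append(("AI", answer))


def format_prompt(system, memory, question):
    """Build one prompt string: system text, stored history, then the new question."""
    lines = [f"System: {system}"]
    lines += [f"{role}: {content}" for role, content in memory.messages]
    lines.append(f"Human: {question}")
    return "\n".join(lines)


def chat(llm, system, memory, question):
    """Play the role of the chain: format the prompt, call the model, store the turn."""
    prompt = format_prompt(system, memory, question)
    answer = llm(prompt)
    memory.save_turn(question, answer)
    return answer


def fake_llm(prompt):
    """Placeholder model that reports how many human messages it was shown."""
    return f"seen {prompt.count('Human:')} human message(s)"


memory_sketch = BufferMemory()
print(chat(fake_llm, "Be brief.", memory_sketch, "hi"))
print(chat(fake_llm, "Be brief.", memory_sketch, "and again"))
```

The second call sees both human messages because the stored history is replayed into the prompt on every turn, which is also why the prompt grows as a conversation goes on.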

### 3.1.&nbsp; Test the chatbot

In [None]:
conversation.invoke({"question": "hi"})

Llama.generate: prefix-match hit

> Entering new LLMChain chain...
Prompt after formatting:
System: Keep your answers very short, succinct, and to the point.
Human: hi

llama_print_timings:        load time =     269.28 ms
llama_print_timings:      sample time =       8.79 ms /    13 runs   (    0.68 ms per token,  1478.45 tokens per second)
llama_print_timings: prompt eval time =     216.19 ms /    21 tokens (   10.29 ms per token,    97.13 tokens per second)
llama_print_timings:        eval time =     324.60 ms /    12 runs   (   27.05 ms per token,    36.97 tokens per second)
llama_print_timings:       total time =     644.82 ms /    33 tokens

> Finished chain.


{'question': 'hi',
 'chat_history': [HumanMessage(content='hi'),
  AIMessage(content='\nAI: Hello! How can I assist you today?')],
 'text': '\nAI: Hello! How can I assist you today?'}

In [None]:
conversation.invoke({"question": "Translate this sentence from English to French: I love programming."})

Llama.generate: prefix-match hit

> Entering new LLMChain chain...
Prompt after formatting:
System: Keep your answers very short, succinct, and to the point.
Human: hi
AI: 
AI: Hello! How can I assist you today?
Human: Translate this sentence from English to French: I love programming.

llama_print_timings:        load time =     269.28 ms
llama_print_timings:      sample time =       8.12 ms /    10 runs   (    0.81 ms per token,  1231.38 tokens per second)
llama_print_timings: prompt eval time =     294.33 ms /    30 tokens (    9.81 ms per token,   101.93 tokens per second)
llama_print_timings:        eval time =     218.00 ms /     9 runs   (   24.22 ms per token,    41.28 tokens per second)
llama_print_timings:       total time =     585.81 ms /    39 tokens

> Finished chain.


{'question': 'Translate this sentence from English to French: I love programming.',
 'chat_history': [HumanMessage(content='hi'),
  AIMessage(content='\nAI: Hello! How can I assist you today?'),
  HumanMessage(content='Translate this sentence from English to French: I love programming.'),
  AIMessage(content='\nAI: Je aime programmer.')],
 'text': '\nAI: Je aime programmer.'}

In [None]:
conversation.invoke({"question": "Now translate the sentence to German."})

Llama.generate: prefix-match hit

> Entering new LLMChain chain...
Prompt after formatting:
System: Keep your answers very short, succinct, and to the point.
Human: hi
AI: 
AI: Hello! How can I assist you today?
Human: Translate this sentence from English to French: I love programming.
AI: 
AI: Je aime programmer.
Human: Now translate the sentence to German.

llama_print_timings:        load time =     269.28 ms
llama_print_timings:      sample time =       9.13 ms /    15 runs   (    0.61 ms per token,  1642.76 tokens per second)
llama_print_timings: prompt eval time =     206.91 ms /    21 tokens (    9.85 ms per token,   101.49 tokens per second)
llama_print_timings:        eval time =     378.42 ms /    14 runs   (   27.03 ms per token,    37.00 tokens per second)
llama_print_timings:       total time =     666.37 ms /    35 tokens

> Finished chain.


{'question': 'Now translate the sentence to German.',
 'chat_history': [HumanMessage(content='hi'),
  AIMessage(content='\nAI: Hello! How can I assist you today?'),
  HumanMessage(content='Translate this sentence from English to French: I love programming.'),
  AIMessage(content='\nAI: Je aime programmer.'),
  HumanMessage(content='Now translate the sentence to German.'),
  AIMessage(content='\nAI: \nAI: Ich liebe Programmieren.')],
 'text': '\nAI: \nAI: Ich liebe Programmieren.'}

In [None]:
conversation.invoke({"question": "Which contains more characters, the French translation or the German?"})

Llama.generate: prefix-match hit

> Entering new LLMChain chain...
Prompt after formatting:
System: Keep your answers very short, succinct, and to the point.
Human: hi
AI: 
AI: Hello! How can I assist you today?
Human: Translate this sentence from English to French: I love programming.
AI: 
AI: Je aime programmer.
Human: Now translate the sentence to German.
AI: 
AI: 
AI: Ich liebe Programmieren.
Human: Which contains more characters, the French translation or the German?

llama_print_timings:        load time =     269.28 ms
llama_print_timings:      sample time =       7.68 ms /    15 runs   (    0.51 ms per token,  1952.11 tokens per second)
llama_print_timings: prompt eval time =     283.79 ms /    27 tokens (   10.51 ms per token,    95.14 tokens per second)
llama_print_timings:        eval time =     352.87 ms /    14 runs   (   25.20 ms per token,    39.67 tokens per second)
llama_print_timings:       total time =     688.69 ms /    41 tokens

> Finished chain.


{'question': 'Which contains more characters, the French translation or the German?',
 'chat_history': [HumanMessage(content='hi'),
  AIMessage(content='\nAI: Hello! How can I assist you today?'),
  HumanMessage(content='Translate this sentence from English to French: I love programming.'),
  AIMessage(content='\nAI: Je aime programmer.'),
  HumanMessage(content='Now translate the sentence to German.'),
  AIMessage(content='\nAI: \nAI: Ich liebe Programmieren.'),
  HumanMessage(content='Which contains more characters, the French translation or the German?'),
  AIMessage(content='\nAI: \nAI: The German translation contains more characters.')],
 'text': '\nAI: \nAI: The German translation contains more characters.'}
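
The replies stored in `chat_history` start with one or more `AI:` labels, which then get replayed into later prompts. One possible way to tidy them for display, and to double-check the chatbot's last answer, is sketched below. `strip_ai_prefix` is a hypothetical helper, not part of LangChain, and the two reply strings are copied from the outputs above.

```python
import re


def strip_ai_prefix(text: str) -> str:
    """Remove leading whitespace and any repeated 'AI:' labels from a reply."""
    return re.sub(r"^(?:\s*AI:)+\s*", "", text)


french = strip_ai_prefix("\nAI: Je aime programmer.")
german = strip_ai_prefix("\nAI: \nAI: Ich liebe Programmieren.")

print(french, len(french))
print(german, len(german))
print("German is longer" if len(german) > len(french) else "French is longer or equal")
```

With the labels removed, the German sentence has 24 characters and the French one 19, so the chatbot's answer to the final question is correct.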