### Objective

Build a complete pipeline to deploy a chatbot based on a large language model (LLM), with automated performance monitoring.

### Instructions

1. Data preparation:

    - Use a pre-trained language model (such as Phi3, Mistral7B, Llama3, …) available through the Hugging Face transformers library.

2. Chatbot development:

    - Implement a simple chatbot using the chosen language model.
    - Use streamlit to build an interactive user interface for the chatbot.

3. Chatbot deployment:

    - Deploy the streamlit application for free on the Streamlit Share platform.
    - Provide a link to the deployed application.

4. Chatbot monitoring:

    - Implement a monitoring mechanism to track the chatbot's interactions. For example, record the questions asked by users and the answers given by the chatbot.
    - Use SQLite (a free, embedded database) to store the performance logs.

### Deliverables

- A Jupyter Notebook containing:
    - The source code for developing and deploying the chatbot.
    - Detailed documentation of each step.
- A link to the deployed streamlit application: **https://peaks-chatbot-nelson.streamlit.app/**

**The rest of the test is written exclusively in English.**

## Install & import the required libraries

Pinning exact versions helps ensure compatibility over time. These libraries should also be listed in the `requirements.txt` file used for deployment (see the deployment section).

In [None]:
%pip install transformers==4.41.2 torch==2.3.1 streamlit==1.35.0
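
As a minimal sketch of keeping the deployment file in sync with the install command above, the pinned versions can be rendered as `requirements.txt` lines. The names `PINNED_VERSIONS` and `requirements_text` are hypothetical helpers introduced only for this illustration.

```python
# Pinned versions copied from the %pip install line above.
PINNED_VERSIONS = {
    "transformers": "4.41.2",
    "torch": "2.3.1",
    "streamlit": "1.35.0",
}


def requirements_text(pins: dict) -> str:
    """Render one 'name==version' line per package (hypothetical helper)."""
    return "\n".join(f"{name}=={version}" for name, version in pins.items()) + "\n"


if __name__ == "__main__":
    # Print the content that would go into requirements.txt.
    print(requirements_text(PINNED_VERSIONS), end="")
```

Writing this text to `requirements.txt` next to the app file gives the deployment platform the same versions that were used locally.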

In [53]:
import streamlit as st
import sqlite3
from transformers import pipeline, Conversation, AutoTokenizer
from uuid import uuid4
from datetime import datetime

## Prepare the data: download model from Hugging Face

Since model quality is not the objective, we chose a small model to limit download time and, potentially, inference time as well.

### Load the model

Caching the loading of the model allows the app to run more efficiently as it doesn't have to load the model for each request.

In this case, we chose a model that supports the conversational task, so it can hold a back-and-forth exchange with the user.

In [54]:
@st.cache_resource # load the model only once
def load_model():
    print("Loading model...")
    return pipeline(task="conversational", model="facebook/blenderbot-400M-distill")

chatbot = load_model()

Loading model...
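
The effect of caching a loader can be illustrated without Streamlit. The sketch below uses `functools.lru_cache` purely as a stand-in for `@st.cache_resource` (they are not the same mechanism: the Streamlit decorator keeps the object alive across reruns and sessions of the app). The `load_fake_model` function and `load_calls` counter are hypothetical.

```python
from functools import lru_cache

load_calls = 0  # counts how many times the expensive load actually runs


@lru_cache(maxsize=None)  # stand-in for @st.cache_resource in this sketch
def load_fake_model() -> dict:
    """Pretend to load a model; the real loader would call pipeline(...)."""
    global load_calls
    load_calls += 1
    return {"name": "fake-model"}


first = load_fake_model()
second = load_fake_model()  # served from the cache, the body does not run again

print(load_calls)       # 1
print(first is second)  # True
```

The same object is returned on every call, which is why the "Loading model..." message is expected only once per server process.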


### Load the tokenizer

Caching the loading of the tokenizer allows the app to run more efficiently as it doesn't have to load the tokenizer for each request.

Note that we also render a sample conversation with `apply_chat_template`. This is not required, since the result is discarded, but it makes the chat templating step explicit.

In [55]:
@st.cache_resource # load the tokenizer only once
def load_tokenizer():
    print("Loading tokenizer...")
    tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
    chat = [
        {"role": "user", "content": "Hello, how are you?"},
        {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
        {"role": "user", "content": "I'd like to show off how chat templating works!"},
    ]
    tokenizer.apply_chat_template(chat, tokenize=False)
    return tokenizer

tokenizer = load_tokenizer()

Loading tokenizer...


## Set up SQLite database

### Connect to SQLite database (or create it if it doesn't exist)

In [56]:
conn = sqlite3.connect('chat_history.db')
c = conn.cursor()

### Create empty table to store chat history

A timestamp is included for additional information. It also allows for chronological sorting.

In [57]:
@st.cache_resource # create the table only once
def create_table():
    print("Creating SQLite table...")
    c.execute('''
        CREATE TABLE IF NOT EXISTS chat_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            role TEXT NOT NULL,
            content TEXT NOT NULL,
            timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
        )
    ''')
    conn.commit()

create_table()

Creating SQLite table...


### Functions: interact with the database

Read and write functions.

In [62]:
# Return all (role, content) rows, oldest first
def get_chat_history():
    # id breaks ties between messages that share the same timestamp
    c.execute("SELECT role, content FROM chat_history ORDER BY timestamp ASC, id ASC")
    return c.fetchall()

# Add a new message to the chat history
def add_to_chat_history(role, content):
    # Store an ISO string: sqlite3's implicit datetime adapter is deprecated since Python 3.12
    timestamp = datetime.now().isoformat(sep=" ")
    c.execute("INSERT INTO chat_history (role, content, timestamp) VALUES (?, ?, ?)", (role, content, timestamp))
    conn.commit()
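
The read and write functions can be tried outside the app. The following is a minimal sketch that uses an in-memory database instead of `chat_history.db`. The names `add_message`, `get_messages` and `role_counts` are hypothetical, and the per-role count is only one possible way to read the logs for monitoring.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("""
    CREATE TABLE chat_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        role TEXT NOT NULL,
        content TEXT NOT NULL,
        timestamp TEXT NOT NULL
    )
""")


def add_message(role: str, content: str) -> None:
    """Insert one message with an ISO-formatted local timestamp."""
    timestamp = datetime.now().isoformat(sep=" ", timespec="microseconds")
    conn.execute(
        "INSERT INTO chat_history (role, content, timestamp) VALUES (?, ?, ?)",
        (role, content, timestamp),
    )
    conn.commit()


def get_messages() -> list:
    """Return (role, content) rows, oldest first; id breaks timestamp ties."""
    query = "SELECT role, content FROM chat_history ORDER BY timestamp ASC, id ASC"
    return conn.execute(query).fetchall()


def role_counts() -> list:
    """Simple monitoring query: number of logged messages per role."""
    query = "SELECT role, COUNT(*) FROM chat_history GROUP BY role ORDER BY role"
    return conn.execute(query).fetchall()


add_message("user", "Hello")
add_message("assistant", "Hi there")
print(get_messages())  # [('user', 'Hello'), ('assistant', 'Hi there')]
print(role_counts())   # [('assistant', 1), ('user', 1)]
```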

## App functionality

### Basic display

The app (**Nelson's Simple Chatbot**) displays a text submission form that returns an answer from the language model.

The chat history (prompts & responses) will be displayed above the form in chronological order.

In [63]:
st.title("Nelson's Simple Chatbot")

# Display chat history
chat_history = get_chat_history()
for role, content in chat_history:
    st.write(f"{role.capitalize()}: {content}")

# Input area for new messages
with st.form(key='chat_form', clear_on_submit=True):
    user_input = st.text_input("You: ", "")
    submit_button = st.form_submit_button("Submit")

### Processing user input
When the user submits a prompt, it is added to the chat history and a response is generated by the language model.

The model keeps no state between calls, so to preserve its memory of the conversation we retrieve the full chat history and pass it as input for each response.

The tokenizer is then used to check that the model's token limit (128) has not been exceeded, before generating the response and updating the display. If the token limit has been reached, the user is asked to start a new chat.

In [60]:
# Process user input
if submit_button and user_input:

    add_to_chat_history('user', user_input)

    full_chat = [{"role": role, "content": content} for role, content in get_chat_history()]

    # Process full chat history into single string to count tokens
    chat_input = " ".join([f"{message['role']}: {message['content']}" for message in full_chat])

    # Check if the token limit is exceeded
    if len(tokenizer.tokenize(chat_input)) < tokenizer.model_max_length:
        
        conversation = Conversation(messages=full_chat)
        conversation = chatbot(conversation)
        
        response = conversation[-1]['content'].strip()
        add_to_chat_history('assistant', response)

        # Refresh the app to display the new chat
        st.rerun()
    
    else:
        st.write("The token limit is exceeded. Please start a new chat.")
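
The token limit check above can be sketched in isolation. The example below assumes a hypothetical stand-in tokenizer, `count_tokens`, that counts whitespace-separated words. The app uses the model's own tokenizer, so real counts will differ. `fits_in_context` is also a hypothetical name.

```python
def count_tokens(text: str) -> int:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fits_in_context(messages: list, max_tokens: int) -> bool:
    """Join the history the same way as the app and compare against the limit."""
    chat_input = " ".join(f"{m['role']}: {m['content']}" for m in messages)
    return count_tokens(chat_input) < max_tokens


history = [
    {"role": "user", "content": "Hello there"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]

# The joined string has 9 whitespace-separated words.
print(fits_in_context(history, max_tokens=128))  # True
print(fits_in_context(history, max_tokens=5))    # False
```

Because the whole history is counted on every submission, the limit is reached after a few exchanges, which is why the app then asks the user to start a new chat.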

### Starting a new conversation
If at any point you wish to start a new conversation, you may click "Start New Chat".
This will delete chat history in order to start over.

Of course, in a real scenario, we would likely not delete the chat history but rather archive it for any future use. In this case, since we will not be using the data, we simply delete it.

The app display is then updated using `st.rerun()`.

In [61]:
if st.button("Start New Chat"):
    c.execute("DELETE FROM chat_history")
    conn.commit()
    st.rerun()
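
Archiving instead of deleting, as mentioned above, could look like the following sketch. It is not part of the deployed app: the `chat_archive` table, the `archive_and_clear` function and the use of a `uuid4` value as a conversation identifier are assumptions made for illustration.

```python
import sqlite3
from uuid import uuid4

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.executescript("""
    CREATE TABLE chat_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        role TEXT NOT NULL,
        content TEXT NOT NULL,
        timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
    );
    -- Hypothetical archive table: same columns plus one chat_id per conversation
    CREATE TABLE chat_archive (
        chat_id TEXT NOT NULL,
        role TEXT NOT NULL,
        content TEXT NOT NULL,
        timestamp TIMESTAMP NOT NULL
    );
""")


def archive_and_clear(conn: sqlite3.Connection, chat_id: str) -> None:
    """Copy the current chat into the archive, then empty the live table."""
    with conn:  # commits both statements together, rolls back on error
        conn.execute(
            "INSERT INTO chat_archive (chat_id, role, content, timestamp) "
            "SELECT ?, role, content, timestamp FROM chat_history ORDER BY id",
            (chat_id,),
        )
        conn.execute("DELETE FROM chat_history")


conn.execute("INSERT INTO chat_history (role, content) VALUES ('user', 'Hello')")
conn.execute("INSERT INTO chat_history (role, content) VALUES ('assistant', 'Hi there')")
conn.commit()

archive_and_clear(conn, str(uuid4()))
live = conn.execute("SELECT COUNT(*) FROM chat_history").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM chat_archive").fetchone()[0]
print(live, archived)  # 0 2
```

Running the copy and the delete in one transaction avoids losing messages if the archive step fails.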

## Running the app

To run the app, convert this notebook into a `.py` file, remove the `%pip` magic line (it is not valid Python outside a notebook), and run `streamlit run app.py`, where `app.py` is the name of your file.
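
One way to do the conversion is `jupyter nbconvert --to script`. As an alternative, here is a minimal standard-library sketch that assumes the notebook is a version 4 `.ipynb` JSON file. It keeps only code cells and drops lines starting with `%` or `!`, which are notebook-only commands. The function name `notebook_to_script` is hypothetical.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Concatenate the code cells of a notebook, skipping magic and shell lines."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown cells are documentation, not app code
        source = cell.get("source", [])
        if isinstance(source, list):  # source may be a list of lines or one string
            source = "".join(source)
        lines = [
            line for line in source.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        if lines:
            chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


# Tiny example notebook with one markdown cell and two code cells.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["%pip install streamlit==1.35.0\n"]},
        {"cell_type": "code", "source": ["import streamlit as st\n", "st.title('Demo')"]},
    ]
})
print(notebook_to_script(example), end="")
```

The printed text can be saved as `app.py`. Cell outputs are ignored, since only the `source` field of each code cell is read.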

## Deploying the app

In order to deploy the app to Streamlit Share, you must upload the project to GitHub. Your project directory should look something like this:
```
your-repository/
├── your_app.py
└── requirements.txt
```

or like this (if you include any custom configurations):

```
your-repository/
├── .streamlit/
│   └── config.toml
├── your_app.py
└── requirements.txt
```

You can then deploy the app directly from the streamlit website based on your GitHub repository: https://share.streamlit.io/new