<a href="https://colab.research.google.com/github/ric4234/AI-Fridays/blob/main/Analisi%20Di%20Testi/01_NLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Natural Language Processing (NLP)





Natural language processing (NLP) is an interdisciplinary subfield of computer science and linguistics, primarily concerned with giving computers the ability to support and manipulate human language. It involves processing natural language datasets, such as text corpora or speech corpora, using either rule-based or probabilistic (i.e. statistical and, most recently, neural network-based) machine learning approaches (https://en.wikipedia.org/wiki/Natural_language_processing). Transformers brought a significant improvement to this field, starting with the paper "Attention Is All You Need" in 2017 (https://arxiv.org/abs/1706.03762).

The goal of this exercise is to build a chatbot using the transformers library (in particular the `pipeline` function).
To do that, we will use the open-source blenderbot-400M-distill model, built by Meta (https://huggingface.co/facebook/blenderbot-400M-distill).

We decided to use this model because it is very small (only 400M parameters) and performs quite well.



#### 1 - Install dependencies and create utils functions

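First, we install the required libraries. The `conversational` pipeline and the `Conversation` class used below were deprecated and later removed from transformers (the deprecation notice named v4.42), so we pin an earlier release.

In [None]:
!pip install "transformers<4.42"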

Suppress the library's warning messages by showing only errors.

In [2]:
from transformers.utils import logging
logging.set_verbosity_error()

#### 2 - Build and use a chatbot

At this point, we create a conversational pipeline using the blenderbot-400M-distill model from Facebook (https://huggingface.co/facebook/blenderbot-400M-distill).
You can also find many other conversational models on the Hugging Face Hub by filtering models by the Text2Text Generation type (https://huggingface.co/models?pipeline_tag=text2text-generation&sort=trending).

In [4]:
from transformers import pipeline
chatbot = pipeline(task="conversational",
                   model="facebook/blenderbot-400M-distill")

Now that the chatbot is loaded, let's define the user message.

In [18]:
user_message = """
What are some fun activities I can do in the winter?
"""
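
A triple-quoted string keeps the line break right after the opening quotes and the one before the closing quotes, which is why the printed conversations in this notebook show blank lines around the message. A minimal sketch, assuming you would rather pass the text without them (the name `clean_message` is illustrative and not used elsewhere in the notebook):

```python
# The same message as in the notebook, written with triple quotes.
user_message = """
What are some fun activities I can do in the winter?
"""

# The literal starts and ends with a newline character.
print(repr(user_message))

# str.strip() removes leading and trailing whitespace, including newlines.
clean_message = user_message.strip()
print(repr(clean_message))
```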

To pass the user message to the chatbot, we first need to wrap it in a `Conversation` object.

In [29]:
from transformers import Conversation

In [30]:
conversation = Conversation(user_message)
conversation

Conversation id: 6a3e5f0c-8dc3-4c5f-97bc-776b24903e67
user: 
What are some fun activities I can do in the winter?


Then we pass the conversation to the chatbot, which returns it with the model's reply appended.

In [31]:
conversation = chatbot(conversation)

In [32]:
conversation

Conversation id: 6a3e5f0c-8dc3-4c5f-97bc-776b24903e67
user: 
What are some fun activities I can do in the winter?

assistant:  I like snowboarding and skiing.  What do you like to do in winter?
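
The printed conversation is a sequence of messages, each with a role and some content. As a rough sketch of how the latest reply could be pulled out programmatically, the helper below works on a plain list of dictionaries in the `{"role": ..., "content": ...}` shape. The function name `last_assistant_reply` is hypothetical, and whether the `Conversation` object exposes its history as such a list (for example through a `messages` attribute) is an assumption to check against the installed transformers version.

```python
from typing import Dict, List, Optional


def last_assistant_reply(messages: List[Dict[str, str]]) -> Optional[str]:
    """Return the most recent assistant message, or None if there is none."""
    for message in reversed(messages):
        if message["role"] == "assistant":
            return message["content"].strip()
    return None


# Hand-written copy of the exchange shown above (not produced by the model here).
example_messages = [
    {"role": "user",
     "content": "What are some fun activities I can do in the winter?"},
    {"role": "assistant",
     "content": " I like snowboarding and skiing.  What do you like to do in winter?"},
]

print(last_assistant_reply(example_messages))
```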

We can also try to continue the conversation by sending a follow-up question in a new `Conversation` object:

In [25]:
user_message_1 = """
What else do you recommend?
"""
conversation_1 = Conversation(user_message_1)

conversation_1

Conversation id: 5aad45a5-47fb-4d94-bd88-d3191fbe3e76
user: 
What else do you recommend?


Because this new `Conversation` object contains only the follow-up question, the model has no memory of the earlier exchange when it answers:

In [26]:
conversation_1 = chatbot(conversation_1)

conversation_1

Conversation id: 5aad45a5-47fb-4d94-bd88-d3191fbe3e76
user: 
What else do you recommend?

assistant:  Well, there are a lot of things to choose from.  What do you have in mind?

To include the prior exchange in the model's context, we add the new user message to the original `conversation` object, which already holds the chat history.

In [33]:
conversation.add_message(
    {"role": "user",
     "content": """
What else do you recommend?
"""
    })

conversation

Conversation id: 6a3e5f0c-8dc3-4c5f-97bc-776b24903e67
user: 
What are some fun activities I can do in the winter?

assistant:  I like snowboarding and skiing.  What do you like to do in winter?
user: 
What else do you recommend?


In [34]:
conversation = chatbot(conversation)

print(conversation)

Conversation id: 6a3e5f0c-8dc3-4c5f-97bc-776b24903e67
user: 
What are some fun activities I can do in the winter?

assistant:  I like snowboarding and skiing.  What do you like to do in winter?
user: 
What else do you recommend?

assistant:  Snowboarding is a lot of fun.  You can do it indoors or outdoors.
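
The pattern above (add the user message to the shared conversation, call the chatbot, repeat) can be wrapped in a small loop. The sketch below is a simplified illustration that does not load the model: `run_dialogue` and `counting_bot` are hypothetical names, and `counting_bot` is a stand-in that only reports how many user messages it can see, to make the effect of a shared history visible.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def run_dialogue(user_turns: List[str],
                 reply_fn: Callable[[List[Message]], str]) -> List[Message]:
    """Feed user turns one by one, keeping a single shared history."""
    history: List[Message] = []
    for turn in user_turns:
        history.append({"role": "user", "content": turn})
        # The reply function receives the full history, not just the last turn.
        answer = reply_fn(history)
        history.append({"role": "assistant", "content": answer})
    return history


def counting_bot(history: List[Message]) -> str:
    """Stand-in for a real model: reports how many user messages it can see."""
    seen = sum(1 for m in history if m["role"] == "user")
    return f"I can see {seen} user message(s)."


dialogue = run_dialogue(
    ["What are some fun activities I can do in the winter?",
     "What else do you recommend?"],
    counting_bot,
)
for message in dialogue:
    print(f'{message["role"]}: {message["content"]}')
```

With the real pipeline, the equivalent steps are the ones used in this notebook: `conversation.add_message(...)` followed by `chatbot(conversation)`.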

