# NATURAL LANGUAGE PROCESSING

Natural Language Processing (NLP) is the branch of Artificial Intelligence that analyses human language. Typical tasks include:

* **Sentiment Analysis**
* **Summarization**
* **Named Entity Recognition (NER)**
* Topic Modeling
* Text & Emoji Classification
* Sentence Tokenization
* Stop-Word Removal
* Stemming
* Part of Speech Tagging (POS)
* Lemmatization 
* Text Generation
* **Question Answering**

...AND MORE

This notebook explores some of those tasks: the ones highlighted in bold above.
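Two of the simpler tasks in the list, sentence tokenization and stemming, can be illustrated without any library. The block below is only a minimal sketch: `split_sentences` and `crude_stem` are hypothetical helper names, the splitting rule is naive (it would wrongly split after an abbreviation such as "U.S."), and the suffix stripping is far cruder than a real stemmer.

```python
import re

def split_sentences(text):
    """Naive sentence tokenization: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def crude_stem(word):
    """Strip a few common English suffixes (illustrative only, not a real stemmer)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so very short words are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

sentences = split_sentences("NLP is fun. Is this a sentence? Yes!")
stems = [crude_stem(w) for w in ["planting", "planted", "plants"]]
print(sentences)
print(stems)
```

In practice a dedicated NLP library would handle abbreviations, irregular forms, and other languages.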

<img src="https://images.pexels.com/photos/267669/pexels-photo-267669.jpeg?auto=compress&cs=tinysrgb&dpr=3&h=750&w=1260" style="width: 455px; height: 355px;">

# CHOOSE SOME EXAMPLES: POSITIVE VS NEGATIVE

### Positive: "Mucilon to support reforestation in Brazil"

In [1]:
title = "Mucilon to support reforestation in Brazil"
content_positive = "'Mucilon: Taking Care of the Planet for Your Children.' This is the latest initiative of Nestlé's leading infant cereal brand in Brazil. As part of Nestlé's reforestation program announced earlier this year, Mucilon is offering families the opportunity to dedicate one of the one million trees Nestlé will plant in Brazil’s Atlantic Forest to a child. In 2020 alone, the company will plant an estimated 500,000 seedlings. \n To encourage those children to learn about and take responsibility for protecting the planet, they will each receive a confirmation of their personal tree's number.\n 'We want to act as a mobilizing force, a force for good that helps build a future which is better for everyone and, in particular, for our children', says Marcelo Melchior, CEO of Nestlé Brazil.\n The aim of the reforestation program is to contribute to the expansion of the forest's green area, which covers approximately 1,200 hectares of replanted forest – the equivalent of 1,110 soccer fields. On top of that, more than 100 tree species will be restored, including some that are at risk of extinction."
content_positive

"'Mucilon: Taking Care of the Planet for Your Children.' This is the latest initiative of Nestlé's leading infant cereal brand in Brazil. As part of Nestlé's reforestation program announced earlier this year, Mucilon is offering families the opportunity to dedicate one of the one million trees Nestlé will plant in Brazil’s Atlantic Forest to a child. In 2020 alone, the company will plant an estimated 500,000 seedlings. \n To encourage those children to learn about and take responsibility for protecting the planet, they will each receive a confirmation of their personal tree's number.\n 'We want to act as a mobilizing force, a force for good that helps build a future which is better for everyone and, in particular, for our children', says Marcelo Melchior, CEO of Nestlé Brazil.\n The aim of the reforestation program is to contribute to the expansion of the forest's green area, which covers approximately 1,200 hectares of replanted forest – the equivalent of 1,110 soccer fields. On top o

### Negative: "COVID-19 cases rising in 39 states as infections roll across nation: 'We are overwhelmed'"

In [2]:
content_negative = "U.S. coronavirus cases surpassed 7.5 million on Wednesday with most states seeing a rise in cases and a startling nine of them setting ominous seven-day records for infections.\n A USA TODAY analysis of Johns Hopkins data through late Tuesday shows Alaska, Indiana, Kansas, Kentucky, Minnesota, Montana, North Dakota, Utah and Wyoming all set state records in the seven-day period. In all, 39 states reported more coronavirus cases in the last week than they had in the week before. \n More than 210,000 Americans have died, and Wisconsin and Hawaii reported record numbers of deaths in their states for a seven-day period. \n President Donald Trump's assertion that Americans need not worry because 'some really great drugs and knowledge' have been developed under his administration has prompted dismay for many in the medical community."

In [3]:
content_negative

"U.S. coronavirus cases surpassed 7.5 million on Wednesday with most states seeing a rise in cases and a startling nine of them setting ominous seven-day records for infections.\n A USA TODAY analysis of Johns Hopkins data through late Tuesday shows Alaska, Indiana, Kansas, Kentucky, Minnesota, Montana, North Dakota, Utah and Wyoming all set state records in the seven-day period. In all, 39 states reported more coronavirus cases in the last week than they had in the week before. \n More than 210,000 Americans have died, and Wisconsin and Hawaii reported record numbers of deaths in their states for a seven-day period. \n President Donald Trump's assertion that Americans need not worry because 'some really great drugs and knowledge' have been developed under his administration has prompted dismay for many in the medical community."

# NLP TASK 1: Sentiment Analysis

In [4]:
from transformers import pipeline

# No model is named here, so the pipeline loads its default English sentiment model
bert_sent_model = pipeline('sentiment-analysis')

In [5]:
ans = bert_sent_model(content_positive)


In [6]:
ans

[{'label': 'POSITIVE', 'score': 0.9928104281425476}]

In [7]:
ans = bert_sent_model(content_negative)


In [8]:
ans

[{'label': 'NEGATIVE', 'score': 0.9670604467391968}]

In [9]:
bert_sent_model("2 lines of code? WOW. This is magic! And I can write whatever you want")

[{'label': 'POSITIVE', 'score': 0.9994400143623352}]
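Each result is a dictionary with a `label` and a `score`, so downstream code usually has to turn it into a single number. The block below is a minimal sketch of one way to do that, using the outputs recorded above rather than calling the model. The helper name `signed_sentiment` and the 0.75 cutoff are assumptions for illustration, not part of the `transformers` API.

```python
def signed_sentiment(result, min_confidence=0.75):
    """Map one pipeline result to a value in [-1, 1]; return 0.0 when confidence is low."""
    score = result["score"]
    if score < min_confidence:
        return 0.0  # treat low-confidence predictions as neutral (assumed policy)
    return score if result["label"] == "POSITIVE" else -score

# Outputs copied from the cells above
recorded = [
    {"label": "POSITIVE", "score": 0.9928104281425476},
    {"label": "NEGATIVE", "score": 0.9670604467391968},
]

for r in recorded:
    print(r["label"], round(signed_sentiment(r), 3))
```

A signed score like this is convenient for averaging over many articles, but the cutoff should be tuned on real data.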

# NLP TASK 2: SUMMARIZATION

In [10]:

# AutoModelWithLMHead is deprecated; T5 is an encoder-decoder model, so load it with AutoModelForSeq2SeqLM
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")

model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")



In [11]:
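# Encode the article as token ids (PyTorch tensor)
input_ids = tokenizer.encode(content_positive, return_tensors="pt", add_special_tokens=True)

# max_length caps the number of generated tokens, so a summary can stop mid-sentence
generated_ids = model.generate(input_ids=input_ids, max_length=100)

# Decode the generated ids back into text
preds_summary = [tokenizer.decode(g, skip_special_tokens=True,
                                  clean_up_tokenization_spaces=True)
                 for g in generated_ids]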



In [12]:
preds_summary

["Mucilon is offering families the opportunity to dedicate one of the one million trees planted in Brazil's Atlantic Forest to a child as part of Nestlé's reforestation program announced earlier this year. The company plans to plant around 500,000 seedlings in 2020. The children will receive a confirmation of their personal tree's number. As part of the program, Nestlé will plant around 100 trees, making it the equivalent of 1,110"]

In [13]:
input_ids = tokenizer.encode(content_negative, return_tensors="pt", add_special_tokens=True)

generated_ids = model.generate(input_ids=input_ids, max_length=50)

preds_summary = [tokenizer.decode(g, skip_special_tokens=True, 
                          clean_up_tokenization_spaces=True)
                 for g in generated_ids]



In [14]:
preds_summary

['on Wednesday surpassed 7.5 million, with most states seeing a rise in cases and a startling nine of them setting ominous seven-day records for infections. The number of cases surpassed 7.5 million on Wednesday']
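Both summaries above end mid-sentence because generation stops once `max_length` tokens have been produced (the first one ends at "the equivalent of 1,110"). One possible post-processing step is to drop the trailing fragment. The block below is a sketch under that assumption; `trim_to_last_sentence` is a hypothetical helper, not something the model or tokenizer provides.

```python
import re

def trim_to_last_sentence(summary):
    """Drop a trailing fragment that is not closed by '.', '!' or '?'."""
    summary = summary.strip()
    # Sentence-ending punctuation must be followed by whitespace or the end of the
    # string, so decimals such as "7.5" are not mistaken for sentence boundaries.
    matches = list(re.finditer(r"[.!?](?=\s|$)", summary))
    if not matches:
        return summary  # no complete sentence found; leave the text unchanged
    return summary[: matches[-1].end()]

truncated = "Cases surpassed 7.5 million on Wednesday. The number of cases surpassed"
print(trim_to_last_sentence(truncated))
```

Raising `max_length` is the other option, at the cost of longer summaries and slower generation.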

# NLP TASK 3: NAMED ENTITY RECOGNITION

In [15]:
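# grouped_entities=True merges word pieces of the same entity; newer transformers releases use aggregation_strategy instead
bert_ner = pipeline('ner', grouped_entities=True)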

In [16]:
bert_ner(content_positive)

[{'entity_group': 'I-MISC', 'score': 0.8006271123886108, 'word': 'Mucilon'},
 {'entity_group': 'I-MISC', 'score': 0.488721638917923, 'word': 'Taking'},
 {'entity_group': 'I-MISC', 'score': 0.8084900975227356, 'word': 'Planet'},
 {'entity_group': 'I-MISC', 'score': 0.7611815333366394, 'word': 'Children'},
 {'entity_group': 'I-ORG', 'score': 0.9968330264091492, 'word': 'Nestlé'},
 {'entity_group': 'I-LOC', 'score': 0.9988975524902344, 'word': 'Brazil'},
 {'entity_group': 'I-ORG', 'score': 0.997764786084493, 'word': 'Nestlé'},
 {'entity_group': 'I-ORG', 'score': 0.9901615579922994, 'word': 'Mucilon'},
 {'entity_group': 'I-ORG', 'score': 0.9954317212104797, 'word': 'Nestlé'},
 {'entity_group': 'I-LOC', 'score': 0.9995310306549072, 'word': 'Brazil'},
 {'entity_group': 'I-LOC',
  'score': 0.9962860643863678,
  'word': 'Atlantic Forest'},
 {'entity_group': 'I-PER',
  'score': 0.9988093852996827,
  'word': 'Marcelo Melchior'},
 {'entity_group': 'I-ORG',
  'score': 0.9842033535242081,
  'word':

In [17]:
bert_ner(content_negative)

[{'entity_group': 'I-LOC', 'score': 0.9987306594848633, 'word': 'U'},
 {'entity_group': 'I-LOC', 'score': 0.6636754423379898, 'word': 'S .'},
 {'entity_group': 'I-LOC', 'score': 0.6849371194839478, 'word': 'USA'},
 {'entity_group': 'I-ORG', 'score': 0.524163007736206, 'word': 'TO'},
 {'entity_group': 'I-ORG',
  'score': 0.9582788050174713,
  'word': 'Johns Hopkins'},
 {'entity_group': 'I-LOC', 'score': 0.9992624521255493, 'word': 'Alaska'},
 {'entity_group': 'I-LOC', 'score': 0.9972076416015625, 'word': 'Indiana'},
 {'entity_group': 'I-LOC', 'score': 0.9989084005355835, 'word': 'Kansas'},
 {'entity_group': 'I-LOC', 'score': 0.998690128326416, 'word': 'Kentucky'},
 {'entity_group': 'I-LOC', 'score': 0.9986274242401123, 'word': 'Minnesota'},
 {'entity_group': 'I-LOC', 'score': 0.9975356459617615, 'word': 'Montana'},
 {'entity_group': 'I-LOC',
  'score': 0.9989486634731293,
  'word': 'North Dakota'},
 {'entity_group': 'I-LOC', 'score': 0.9991711974143982, 'word': 'Utah'},
 {'entity_group'
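The raw NER output contains repeated entities (for example `Nestlé` several times) and some low-confidence hits such as `Taking` or `TO`. A small post-processing step can make it easier to read. The block below is a minimal sketch that works on a few entries copied from the output above. The helper `group_entities` and the 0.9 score cutoff are assumptions chosen for illustration.

```python
from collections import defaultdict

def group_entities(entities, min_score=0.9):
    """Return {entity_group: [unique words in first-seen order]} above a score cutoff."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] < min_score:
            continue  # skip low-confidence detections
        words = grouped[ent["entity_group"]]
        if ent["word"] not in words:
            words.append(ent["word"])
    return dict(grouped)

# A subset of the entities recorded for content_positive
sample = [
    {"entity_group": "I-MISC", "score": 0.488721638917923, "word": "Taking"},
    {"entity_group": "I-ORG", "score": 0.9968330264091492, "word": "Nestlé"},
    {"entity_group": "I-LOC", "score": 0.9988975524902344, "word": "Brazil"},
    {"entity_group": "I-ORG", "score": 0.997764786084493, "word": "Nestlé"},
    {"entity_group": "I-LOC", "score": 0.9962860643863678, "word": "Atlantic Forest"},
    {"entity_group": "I-PER", "score": 0.9988093852996827, "word": "Marcelo Melchior"},
]

print(group_entities(sample))
```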

# NLP TASK 4: QUESTION ANSWERING

In [17]:
bert_qa = pipeline('question-answering')

In [18]:
bert_qa(context = content_positive, question = 'What is Nestle doing?')

{'answer': 'offering families the opportunity to dedicate one of the one million trees',
 'end': 293,
 'score': 0.018586063757538795,
 'start': 219}

In [19]:
bert_qa(context = content_positive, question = 'Is Nestle good?')

{'answer': 'a force for good that helps build a future which is better for everyone',
 'end': 700,
 'score': 0.04284476116299629,
 'start': 629}

In [20]:
bert_qa(context = content_positive, question = 'How is Nestle making and impact?')

{'answer': "expansion of the forest's green area,",
 'end': 884,
 'score': 0.07284179329872131,
 'start': 847}

In [21]:
bert_qa(context = content_negative, question = "How is the Coronavirus Pandemic evolving in the US?")

{'answer': 'U.S. coronavirus cases surpassed 7.5 million',
 'end': 44,
 'score': 0.043662771582603455,
 'start': 0}

In [22]:
bert_qa(context = content_negative, question = "Is there a solution to the problem?")

{'answer': 'setting ominous seven-day records for infections.',
 'end': 177,
 'score': 0.011558104306459427,
 'start': 127}
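Note that every answer above has a score below 0.08: the model always returns the best span it can find, even when it is not confident. The `start` and `end` fields are character offsets into the context. The block below is a sketch of how an application might reject low-confidence answers. `confident_answer` and the 0.5 cutoff are assumptions for illustration, and the context is shortened to its first sentence fragment.

```python
def confident_answer(result, min_score=0.5):
    """Return the answer text only if the model's score reaches the cutoff, else None."""
    return result["answer"] if result["score"] >= min_score else None

# Result recorded above for the first question on content_negative
recorded = {
    "answer": "U.S. coronavirus cases surpassed 7.5 million",
    "end": 44,
    "score": 0.043662771582603455,
    "start": 0,
}

# The offsets slice the answer out of the context (only its beginning is needed here)
context_start = "U.S. coronavirus cases surpassed 7.5 million on Wednesday"
print(context_start[recorded["start"]:recorded["end"]])

# With a 0.5 cutoff this answer is rejected
print(confident_answer(recorded))
```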

# LET'S GET REAL! 

Some real-life use cases of NLP are:
* Match Clinical Trials
* Auto-complete sentences
* ChatBots
* Customer Tickets (AutoTagging, Text Generation ...)
* Speech Recognition (transcribe customer service calls into text)
* Spam filter
* Computer-assisted coding
* Translation
* Hiring and Recruitment recommendations
* Security Authentication 
* Intelligence gathering for forecasting (stock prediction, Nestlé sales...)

Nestlé use cases:
* **HCP Sensing for new trends discovery**: Using NER to detect biological entities in text, we can spot the latest trends in the biological field.
* **Keywords Extraction**: Preprocessing text step by step (tokenization, lemmatization, POS tagging) with models for five different languages lets us find the most relevant words.
* **Infant Nutrition Sensing**: Topic modeling detects the main topics in positive and negative reviews of Nestlé and competitors' products.
* **PFME Chatbot**: Using an out-of-the-box solution called LUIS, we can train a question model.
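
The keyword-extraction idea can be illustrated with a much simpler stand-in. The block below is only a sketch, not the pipeline described above: it replaces lemmatization and POS tagging with plain stop-word filtering and frequency counting. The stop-word list and the helper `top_keywords` are assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (real lists are much longer and language-specific)
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "is", "will"}

def top_keywords(text, k=2):
    """Return the k most frequent words after lowercasing and stop-word removal."""
    # [^\W\d_]+ matches runs of Unicode letters, so accented words like "nestlé" survive
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(k)]

text = "Nestlé will plant trees. The trees will grow in the forest. The forest needs trees."
print(top_keywords(text))
```

Frequency alone favours common words, which is why a production pipeline adds lemmatization and POS filtering.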