# Naas - NLP Examples
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/Naas/Naas_NLP_Example.ipynb" target="_parent"><img src="https://img.shields.io/badge/-Open%20in%20Naas-success?labelColor=000000&logo="/></a>

#nlp #huggingface #api #models #transformers

In [1]:
from naas_drivers import nlp

## How it works
Every Naas NLP call follows the same format:
```
nlp.get(task, model, tokenizer)(inputs)
```
The supported tasks are the following:

- text-generation (model: gpt2)
- summarization (model: t5-small)
- fill-mask (model: distilroberta-base)
- text-classification (model: distilbert-base-uncased-finetuned-sst-2-english)
- feature-extraction (model: distilbert-base-cased)
- token-classification (model: dslim/bert-base-NER)
- question-answering
- translation

We use the [Hugging Face API](https://huggingface.co/models) under the hood to access the models.
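As a minimal sketch, the task list above can be kept in a small lookup table so that the task name alone selects the model. The names `DEFAULT_MODELS` and `default_arguments` are hypothetical helpers, not part of `naas_drivers`. Reusing the model name as the tokenizer name is an assumption based on the examples in this notebook. Tasks for which no model is listed map to `None`.

```python
# Models as listed in the task list above. None means this notebook does not
# name a model for that task, so one has to be passed explicitly.
DEFAULT_MODELS = {
    "text-generation": "gpt2",
    "summarization": "t5-small",
    "fill-mask": "distilroberta-base",
    "text-classification": "distilbert-base-uncased-finetuned-sst-2-english",
    "feature-extraction": "distilbert-base-cased",
    "token-classification": "dslim/bert-base-NER",
    "question-answering": None,
    "translation": None,
}


def default_arguments(task):
    """Return (task, kwargs) in the shape used by nlp.get(task, **kwargs)."""
    if task not in DEFAULT_MODELS:
        raise ValueError(f"Unsupported task: {task!r}")
    model = DEFAULT_MODELS[task]
    if model is None:
        raise ValueError(f"No model is listed for task {task!r}; pass one explicitly")
    # Assumption: the tokenizer name matches the model name, as in every
    # example in this notebook.
    return task, {"model": model, "tokenizer": model}


task, kwargs = default_arguments("summarization")
print(task, kwargs)
# With naas_drivers installed, the call would then read:
# nlp.get(task, **kwargs)("text to summarize")
```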

## Text Generation

In [2]:
nlp.get("text-generation", model="gpt2", tokenizer="gpt2")("What is the most important thing in your life right now?")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'What is the most important thing in your life right now?\n\nThe most important thing in my life right now is my body; my mind. How does my body relate to this moment on the physical planet? Because it says what it is.'}]
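The result is a list with one dictionary, and `generated_text` repeats the prompt before the continuation. The sketch below shows one way to keep only the newly generated part. `continuation` is a hypothetical helper, and the sample output is a shortened copy of the result shown above.

```python
def continuation(prompt, outputs):
    """Return the generated text with the leading prompt removed."""
    text = outputs[0]["generated_text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


prompt = "What is the most important thing in your life right now?"
# Shortened copy of the output shown above.
outputs = [
    {
        "generated_text": prompt
        + "\n\nThe most important thing in my life right now is my body; my mind."
    }
]
print(continuation(prompt, outputs))
```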

## Text Summarization
Summarizes the given text. The maximum length of the summary is set to 200 tokens.

In [11]:
nlp.get("summarization", model="t5-small", tokenizer="t5-small")('''

There will be fewer and fewer jobs that a robot cannot do better. 
What to do about mass unemployment this is gonna be a massive social challenge and 
I think ultimately we will have to have some kind of universal basic income.

I think some kind of a universal basic income is going to be necessary 
now the output of goods and services will be extremely high 
so with automation they will they will come abundance there will be or almost everything will get very cheap.

The harder challenge much harder challenge is how do people then have meaning like a lot of people 
they find meaning from their employment so if you don't have if you're not needed if 
there's not a need for your labor how do you what's the meaning if you have meaning 
if you feel useless these are much that's a much harder problem to deal with. 

''')

Your max_length is set to 200, but you input_length is only 183. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=50)


[{'summary_text': 'there will be fewer and fewer jobs that a robot cannot do better . what to do about mass unemployment this is gonna be a massive social challenge . we will have to have some kind of universal basic income .'}]
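The warning above appears because the limit of 200 tokens is larger than the 183-token input. The sketch below illustrates one simple way to derive a smaller limit from the input length. `suggest_max_length` and its `ratio` and `floor` parameters are hypothetical, and this notebook does not confirm whether the callable returned by `nlp.get` accepts a `max_length` keyword the way the warning's `summarizer('...', max_length=50)` example does.

```python
def suggest_max_length(input_length, ratio=0.5, floor=10):
    """Suggest a summary length limit (in tokens) below the input length.

    ratio and floor are illustrative defaults, not values from the library.
    """
    return max(floor, int(input_length * ratio))


# 183 is the input length reported in the warning above.
print(suggest_max_length(183))
```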

## Text Classification
Basic sentiment analysis of a text.<br>
Returns a `label` (`POSITIVE` or `NEGATIVE` for this model) and a confidence `score` between 0 and 1.

In [15]:
nlp.get("text-classification", 
        model="distilbert-base-uncased-finetuned-sst-2-english",
        tokenizer="distilbert-base-uncased-finetuned-sst-2-english")('''

It was a weird concept. Why would I really need to generate a random paragraph? 
Could I actually learn something from doing so? 
All these questions were running through her head as she pressed the generate button. 
To her surprise, she found what she least expected to see.

''')

[{'label': 'POSITIVE', 'score': 0.7975085377693176}]
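The output above pairs a label with a score. When a single signed number is more convenient, for example to sort texts from most negative to most positive, the two fields can be combined as in this sketch. `signed_sentiment` is a hypothetical helper, and the label names are taken from the output shown above.

```python
def signed_sentiment(result):
    """Map a {'label', 'score'} result to a signed value: positive above zero, negative below."""
    label = result["label"].upper()
    score = result["score"]
    if label == "POSITIVE":
        return score
    if label == "NEGATIVE":
        return -score
    raise ValueError(f"Unexpected label: {result['label']!r}")


# Result copied from the output above.
result = {"label": "POSITIVE", "score": 0.7975085377693176}
print(signed_sentiment(result))
```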

## Fill Mask

Fills the blank (`<mask>`) in a given sentence with several proposals.<br>
Each proposal has a `sequence` (the completed sentence), a `score` (the model's confidence), a `token` (the numeric ID of the proposed word), and a `token_str` (the proposed word).

In [19]:
nlp.get("fill-mask",
        model="distilroberta-base",
        tokenizer="distilroberta-base")('''

It was a beautiful <mask>.

''')

[{'sequence': '\n\nIt was a beautiful sunset.\n\n',
  'score': 0.09137986600399017,
  'token': 18820,
  'token_str': ' sunset'},
 {'sequence': '\n\nIt was a beautiful day.\n\n',
  'score': 0.07021963596343994,
  'token': 183,
  'token_str': ' day'},
 {'sequence': '\n\nIt was a beautiful sight.\n\n',
  'score': 0.062469232827425,
  'token': 6112,
  'token_str': ' sight'},
 {'sequence': '\n\nIt was a beautiful night.\n\n',
  'score': 0.05541374906897545,
  'token': 363,
  'token_str': ' night'},
 {'sequence': '\n\nIt was a beautiful evening.\n\n',
  'score': 0.051386620849370956,
  'token': 1559,
  'token_str': ' evening'}]
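Each proposal is a plain dictionary, so ranking and cleaning them takes only a few lines. In this sketch, `best_proposals` is a hypothetical helper and the sample data is the first three entries of the output above. Note that `token_str` carries a leading space, which is stripped here.

```python
def best_proposals(proposals, top_k=3):
    """Return the top_k (word, score) pairs, highest score first."""
    ranked = sorted(proposals, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 4)) for p in ranked[:top_k]]


# First three proposals copied from the output above.
proposals = [
    {"sequence": "\n\nIt was a beautiful sunset.\n\n", "score": 0.09137986600399017,
     "token": 18820, "token_str": " sunset"},
    {"sequence": "\n\nIt was a beautiful day.\n\n", "score": 0.07021963596343994,
     "token": 183, "token_str": " day"},
    {"sequence": "\n\nIt was a beautiful sight.\n\n", "score": 0.062469232827425,
     "token": 6112, "token_str": " sight"},
]
for word, score in best_proposals(proposals):
    print(word, score)
```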

## Feature extraction
Generates word embeddings, that is, numerical representations of the text.<br>
The output is a list of numerical values.

In [None]:
nlp.get("feature-extraction", model="distilbert-base-cased", tokenizer="distilbert-base-cased")("Life is a super cool thing")
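A common next step is to average the per-token vectors into a single vector for the whole sentence (mean pooling). The sketch below assumes the result is a nested list shaped `[batch][token][dimension]`. This notebook does not show the output, so treat that shape as an assumption. `mean_pool` is a hypothetical helper and the toy values are not real embeddings.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one vector."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


# Toy stand-in for a feature-extraction result: 1 text, 3 tokens, 2 dimensions.
features = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]
print(mean_pool(features[0]))  # [3.0, 4.0]
```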

## Token classification
Token classification is used here for named entity recognition (NER): given a text, the model detects names, locations, and other entities.<br>

| Entity abbreviation | Description                                                                  |
|---------------------|------------------------------------------------------------------------------|
| O                   | Outside of a named entity                                                    |
| B-MIS               | Beginning of a miscellaneous entity right after another miscellaneous entity |
| I-MIS               | Miscellaneous entity                                                         |
| B-PER               | Beginning of a person’s name right after another person’s name               |
| I-PER               | Person’s name                                                                |
| B-ORG               | Beginning of an organization right after another organization                |
| I-ORG               | Organization                                                                 |
| B-LOC               | Beginning of a location right after another location                         |
| I-LOC               | Location                                                                     |


Full documentation: https://huggingface.co/dslim/bert-base-NER<br>

In [23]:
nlp.get("token-classification", model="dslim/bert-base-NER", tokenizer="dslim/bert-base-NER")('''

My name is Wolfgang and I live in Berlin

''')

[{'word': 'Wolfgang',
  'score': 0.9990139603614807,
  'entity': 'B-PER',
  'index': 4,
  'start': 13,
  'end': 21},
 {'word': 'Berlin',
  'score': 0.9996449947357178,
  'entity': 'B-LOC',
  'index': 9,
  'start': 36,
  'end': 42}]
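The `start` and `end` fields are character offsets into the input string, including the leading blank lines of the triple-quoted text. The sketch below uses them to recover each entity from the original text and to split the tag into its entity type. `describe` is a hypothetical helper, and the entity list is copied from the output above with the `score` and `index` fields omitted.

```python
def describe(text, entities):
    """Return (surface text, entity type) pairs using the character offsets."""
    rows = []
    for ent in entities:
        # "B-PER" -> prefix "B", kind "PER"
        _prefix, _, kind = ent["entity"].partition("-")
        rows.append((text[ent["start"]:ent["end"]], kind))
    return rows


# Same input as the cell above, written with explicit newlines.
text = "\n\nMy name is Wolfgang and I live in Berlin\n\n"
entities = [
    {"word": "Wolfgang", "entity": "B-PER", "start": 13, "end": 21},
    {"word": "Berlin", "entity": "B-LOC", "start": 36, "end": 42},
]
print(describe(text, entities))  # [('Wolfgang', 'PER'), ('Berlin', 'LOC')]
```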