# Test Hugging Face Transformers

In [None]:
from transformers import pipeline, TRANSFORMERS_CACHE
import pandas as pd

## Downloading models and managing them

Whenever `pipeline` is asked for a model that is not already available locally, the model is downloaded into a local cache directory whose path is stored in `TRANSFORMERS_CACHE`. Cached models can be listed and deleted with the Hugging Face Hub CLI: install it with `pip install 'huggingface-hub[cli]'`, then run `huggingface-cli scan-cache` to list the cached models and `huggingface-cli delete-cache` to delete them.
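
The cache can also be inspected from Python. The sketch below is a minimal, standard-library-only illustration: it assumes the Hub cache layout in which every model lives in a folder named `models--<org>--<name>`, and `list_cached_models` is a hypothetical helper, not part of `transformers`. The `huggingface_hub` package offers `scan_cache_dir()` for a more complete report.

```python
from pathlib import Path


def list_cached_models(cache_dir):
    """Return (repo_id, size_in_bytes) pairs for the model folders in cache_dir.

    Assumption: each model is stored in a folder named 'models--<org>--<name>'.
    """
    results = []
    for repo_dir in sorted(Path(cache_dir).glob('models--*')):
        if not repo_dir.is_dir():
            continue
        # 'models--org--name' -> 'org/name'
        repo_id = repo_dir.name[len('models--'):].replace('--', '/')
        # Skip symlinks so that files linked from snapshots are not counted twice.
        size = sum(
            f.stat().st_size
            for f in repo_dir.rglob('*')
            if f.is_file() and not f.is_symlink()
        )
        results.append((repo_id, size))
    return results
```

In this notebook it could be called as `list_cached_models(TRANSFORMERS_CACHE)`.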

## Sentiment analysis

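Sentiment analysis is the default task of the `text-classification` pipeline: when no model is specified, `pipeline` loads a default model trained for sentiment classification.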

In [None]:
# An abstraction around models and NLP tasks.
classifier = pipeline('text-classification')

In [None]:
text_unacceptable = """
Not sure who to send this to, but I live in New York and I keep seeing four turtle-shaped
individuals entering the sewers from a manhole that I can clearly
see from my back window... sometimes accompanied by a huge rat!
This is not acceptable!
"""

text_awesome = """
Not sure who to send this to, but I live in New York and I keep seeing four turtle-shaped
individuals entering the sewers from a manhole that I can clearly
see from my back window... sometimes accompanied by a huge rat!
This is awesome!
"""

for text in [text_unacceptable, text_awesome]:
    print('Text:')
    print(text)
    print('Model output:', classifier(text))
    print('\n')
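
The classifier returns a list with one dictionary per input, each holding a `label` and a `score`. As a rough sketch of how that output could be consumed, the hypothetical helper `top_label` below falls back to `'UNCERTAIN'` when the score is under a threshold. The threshold value and the sample output are assumptions made up for illustration, not results produced by the model.

```python
def top_label(predictions, min_score=0.9):
    """Return the label of the first prediction, or 'UNCERTAIN' if its score is too low."""
    best = predictions[0]
    return best['label'] if best['score'] >= min_score else 'UNCERTAIN'


# Hypothetical output in the pipeline's format (not a real model result).
example_output = [{'label': 'NEGATIVE', 'score': 0.97}]
print(top_label(example_output))
```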


## Named entity recognition

In [None]:
# The aggregation_strategy sets how words are grouped together
# to form single entities.
ner_tagger = pipeline('ner', aggregation_strategy='simple')

In [None]:
ner_text = """
I would like to notify you that the individual
Mr. Krang repeatedly tried to kidnap Dr. Baxter
Stockman from his house in Newark in order to
steal the blueprints for his particle accelerator,
which he intended to mount on his Harley Davidson
motorcycle.
"""

pd.DataFrame(ner_tagger(ner_text))
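
With `aggregation_strategy='simple'`, each detected entity is a dictionary with the keys `entity_group`, `score`, `word`, `start` and `end`. The sketch below collects the entity words by type. `group_entities` is a hypothetical helper, and the sample entities are made up to mimic the output format rather than taken from a real run.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Map each entity type to the list of words tagged with it."""
    grouped = defaultdict(list)
    for ent in entities:
        # Drop low-confidence entities before grouping.
        if ent['score'] >= min_score:
            grouped[ent['entity_group']].append(ent['word'])
    return dict(grouped)


# Hypothetical entities in the pipeline's output format (not a real model result).
sample_entities = [
    {'entity_group': 'PER', 'score': 0.98, 'word': 'Krang', 'start': 52, 'end': 57},
    {'entity_group': 'PER', 'score': 0.95, 'word': 'Baxter Stockman', 'start': 92, 'end': 107},
    {'entity_group': 'LOC', 'score': 0.60, 'word': 'Newark', 'start': 126, 'end': 132},
]
print(group_entities(sample_entities, min_score=0.8))
```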

## Question answering

In [None]:
reader = pipeline('question-answering')

In [None]:
question = 'What does Krang want from Dr. Stockman?'

reader(question=question, context=ner_text)

In [None]:
question = 'Why does Krang want a particle accelerator from Dr. Stockman?'

reader(question=question, context=ner_text)
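
The reader returns a dictionary with the keys `score`, `start`, `end` and `answer`, where `start` and `end` are character offsets into the context. A minimal sketch of using those offsets to show the answer in its surroundings follows. `highlight_answer` is a hypothetical helper, and the sample result is constructed by hand rather than produced by the model.

```python
def highlight_answer(result, context, window=30):
    """Return the answer span in brackets, with `window` characters of context on each side."""
    start, end = result['start'], result['end']
    before = context[max(0, start - window):start]
    after = context[end:end + window]
    return f"...{before}[{context[start:end]}]{after}..."


# Hand-made example in the pipeline's output format (not a real model result).
sample_context = 'He wanted to steal the blueprints for the machine.'
answer = 'the blueprints'
offset = sample_context.index(answer)
sample_result = {'score': 0.5, 'start': offset, 'end': offset + len(answer), 'answer': answer}
print(highlight_answer(sample_result, sample_context))
```

In this notebook it could be called as `highlight_answer(reader(question=question, context=ner_text), ner_text)`.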

## Summarization

In [None]:
summarizer = pipeline('summarization')

In [None]:
summ_text = """
This book is the fruit of several decades of reading, teaching, and thinking about the intersections between literature, philosophy, and physics. Obviously that trajectory encompassed so many more writers, thinkers, and scientists than these three. So when the time came to wrangle the project into a book, the question became: How to organize it? Who are the best characters through whom to tell this story, and how many should there be?

My first stabs at a structure were more expansive. I liked the idea of telling stories about specific human beings and distilling their insights out of those stories. But at first there were simply too many stories there — I believe I outlined a 12-chapter book with a different central character for each chapter — and the book felt scattered, even if the core intellectual project was the same. After that I reigned it in, but perhaps a little too much.

I sketched out what would have been a literary biography of one man. It was Boethius, believe it or not, who still plays a minor role in one of the chapters. But Boethius was too far away historically from some of the major innovations of 20th-century physics that I wanted to engage with. Then it hit me. Some years ago, I had published a little article on what would become the topic of the book in The New York Times. It had three central characters: [Jorge Luis] Borges, [Werner] Heisenberg, and [Immanuel] Kant. They had been there all the time! Returning to those specific figures I realized that among them, they had all the elements I needed.

The core idea was always to show how thinking deeply about a problem can lead to profound insights independently of the specific field of the thinker. In other words, “soft” humanistic approaches can enlighten “hard” scientific ones, and vice versa. In these three I believed I had found my proof of concept, since reading Borges, and then using Kant to think through some of the questions provoked by Borges, had over the years led me to a deeper understanding of what Heisenberg had actually discovered than had just reading Heisenberg and explanations of his discovery.

To be a bit more specific, Borges’ story about a man who becomes incapable of forgetting anything, incapable of any slippages or gaps in his perception of the world, when read through Kant’s analysis of the synthesis required for any experience in time and space to take place in the first place, lay bare in clarion (albeit non-mathematical) logic what Heisenberg had proven in his 1927 paper: An observation, say of a particle in motion, can’t ever achieve perfection because the very essence of an observation depends on there being a minimal difference between what is observed and what is observing.
"""

summary = summarizer(summ_text, max_length=100, clean_up_tokenization_spaces=True)

In [None]:
print(summary[0]['summary_text'])

## Translation

In [None]:
translator = pipeline('translation_en_to_fr')

In [None]:
translation = translator(
    ner_text,
    clean_up_tokenization_spaces=True,
    min_length=100
)

translation

In [None]:
translation = translator(
    summ_text[:600],
    clean_up_tokenization_spaces=True,
    min_length=100
)

translation
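
The character slice `summ_text[:600]` may cut the input in the middle of a sentence. One possible alternative, sketched below, is to cut at the last sentence-ending punctuation mark before the limit. `truncate_at_sentence` is a hypothetical helper, and the rule is deliberately naive: it would also stop after abbreviations such as "Dr.".

```python
def truncate_at_sentence(text, max_chars):
    """Shorten text to at most max_chars, ending at the last '.', '?' or '!' when one exists."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Position of the last sentence-ending mark inside the cut, or -1 if none.
    last_end = max(cut.rfind(mark) for mark in '.?!')
    return cut[:last_end + 1] if last_end != -1 else cut


print(truncate_at_sentence('One. Two. Three.', 12))
```

In this notebook it could be called as `translator(truncate_at_sentence(summ_text, 600), ...)`.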

## Text generation

In [None]:
generator = pipeline('text-generation')

In [None]:
response = 'Good evening sir, this is the police, we just received your message.'
prompt = ner_text + '\n\nAnswer from the police:\n' + response

generated_response = generator(prompt, max_length=200)

# Raw pipeline output (a list of dicts), then only the generated text.
print(generated_response)
print(generated_response[0]['generated_text'])