# pysentimiento: A transformer-based Sentiment Analysis library for Spanish

In this notebook we show a brief example of how to use [pysentimiento](https://github.com/finiteautomata/pysentimiento/), a sentiment analysis library for Spanish.

`pysentimiento` is a small wrapper around pre-trained [transformers](https://github.com/huggingface/transformers) models. It uses [BETO](https://github.com/dccuchile/beto), the Spanish implementation of BERT, and data from the 2020 edition of the [Taller de Análisis de Sentimiento (TASS), the sentiment analysis workshop of the Sociedad Española de Procesamiento de Lenguaje Natural (SEPLN)](http://tass.sepln.org/2020/?page_id=74).

First, we install the library:

In [None]:
!pip install pysentimiento==0.2.0

We create a sentiment analyzer. It takes the language as a parameter (currently either `es` or `en`).

In [None]:
from pysentimiento import SentimentAnalyzer
analyzer = SentimentAnalyzer(lang="es")


Let's look at some examples:

In [None]:
analyzer.predict("Qué gran jugador es Messi")

SentimentOutput(output=POS, probas={POS: 0.998, NEG: 0.002, NEU: 0.000})

In [None]:
analyzer.predict("Esto es pésimo")

SentimentOutput(output=NEG, probas={NEG: 0.999, POS: 0.001, NEU: 0.000})

In [None]:
analyzer.predict("Qué es esto?")

SentimentOutput(output=NEU, probas={NEU: 0.993, NEG: 0.005, POS: 0.002})

It also supports emojis:

In [None]:
analyzer.predict("🤢")

SentimentOutput(output=NEG, probas={NEG: 0.999, NEU: 0.001, POS: 0.001})

Or hashtags:

In [None]:
analyzer.predict("#EstoEsUnaMierda")

SentimentOutput(output=NEG, probas={NEG: 0.999, POS: 0.001, NEU: 0.000})
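In every output above, the reported label matches the entry with the highest value in `probas`. The snippet below is a minimal sketch of that relationship. It uses a plain dict with the values copied from the last output as a stand-in for the library's result object, so the dict and the variable names are assumptions for illustration only.

```python
# Probabilities copied from the last output above. A plain dict stands in
# for the library's result object (an assumption made for illustration).
probas = {"NEG": 0.999, "POS": 0.001, "NEU": 0.000}

# The predicted label is the key with the highest probability.
label = max(probas, key=probas.get)

# The (rounded) probabilities should add up to roughly 1.
total = sum(probas.values())

print(label, round(total, 3))
```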

## Emotion Analysis

`pysentimiento` provides emotion analysis through models pre-trained on the [EmoEvent](https://github.com/fmplaza/EmoEvent-multilingual-corpus/) datasets.

In [None]:
from pysentimiento import EmotionAnalyzer

emotion_analyzer = EmotionAnalyzer(lang="en")

In [None]:
emotion_analyzer.predict("This is so terrible...")

EmotionOutput(output=sadness, probas={sadness: 0.950, fear: 0.016, disgust: 0.010, anger: 0.007, surprise: 0.007, others: 0.006, joy: 0.005})

In [None]:
emotion_analyzer.predict("omg")

EmotionOutput(output=surprise, probas={surprise: 0.918, joy: 0.020, fear: 0.016, others: 0.016, sadness: 0.014, anger: 0.011, disgust: 0.005})

In [None]:
emotion_analyzer.predict("yayyyy")

EmotionOutput(output=joy, probas={joy: 0.723, others: 0.198, surprise: 0.038, disgust: 0.011, sadness: 0.011, fear: 0.010, anger: 0.009})

In [None]:
emotion_analyzer.predict("People in the world is really worried because of Coronavirus")

EmotionOutput(output=fear, probas={fear: 0.934, others: 0.025, sadness: 0.011, surprise: 0.011, anger: 0.007, disgust: 0.007, joy: 0.007})
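Not every prediction is equally confident: the "yayyyy" example gives `joy` only 0.723, with `others` at 0.198. One possible way to handle this, sketched below, is to rank the labels and flag predictions whose top probability falls under a cutoff. The helper names `top_k` and `is_confident` and the 0.9 threshold are hypothetical and not part of `pysentimiento`; the probabilities are copied from the output above into a plain dict.

```python
# Probabilities copied from the "yayyyy" output above (plain dict stand-in).
probas = {
    "joy": 0.723, "others": 0.198, "surprise": 0.038, "disgust": 0.011,
    "sadness": 0.011, "fear": 0.010, "anger": 0.009,
}

def top_k(probas, k=2):
    """Return the k (label, probability) pairs with the highest probability."""
    return sorted(probas.items(), key=lambda kv: kv[1], reverse=True)[:k]

def is_confident(probas, threshold=0.9):
    """True if the top probability reaches the (arbitrary) threshold."""
    return max(probas.values()) >= threshold

ranked = top_k(probas)
confident = is_confident(probas)
print(ranked, confident)
```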

## Preprocessing

`pysentimiento` has a tweet preprocessing module with several options for handling hashtags, emojis, character repetition, and more.

In [None]:
from pysentimiento.preprocessing import preprocess_tweet

preprocess_tweet("📢 @realDonaldTrump ha sido banneado de Twitter #BreakingNews")

'[EMOJI] altavoz de mano [EMOJI] [USER] ha sido banneado de Twitter breaking news'
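To give a rough idea of what two of these transformations involve, here is a simplified sketch that replaces user handles with `[USER]` and splits CamelCase hashtags into lowercase words. It is not the library's implementation: `simple_preprocess` is a hypothetical function, it skips emoji handling entirely, and its hashtag splitting only covers unaccented ASCII letters and digits.

```python
import re

def simple_preprocess(text):
    """Hypothetical, simplified stand-in for tweet preprocessing.

    Not pysentimiento's code: it only handles @handles and hashtags.
    """
    # Replace every @handle with a placeholder token.
    text = re.sub(r"@\w+", "[USER]", text)

    def split_hashtag(match):
        # Split the hashtag body into capitalized words, lowercase runs,
        # uppercase runs, or digit runs, then lowercase everything.
        words = re.findall(r"[A-Z][a-z]+|[a-z]+|[A-Z]+|\d+", match.group(1))
        return " ".join(word.lower() for word in words)

    # Drop the leading '#' and expand the hashtag body into words.
    return re.sub(r"#(\w+)", split_hashtag, text)

result = simple_preprocess(
    "@realDonaldTrump ha sido banneado de Twitter #BreakingNews"
)
print(result)
```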