# Spanish NLP: Classify Notebook

For more information, visit the [spanish_nlp](https://github.com/jorgeortizfuentes/spanish_nlp) repository on GitHub.

## Available models

| **Model name**     | **Sources**                            |
|--------------------|----------------------------------------|
| hate_speech        | robertuito                             |
| incivility         | bert                                   |
| toxic_speech       | political-tweets-es                    |
| sentiment_analysis | robertuito                             |
| emotion_analysis   | robertuito                             |
| irony_analysis     | robertuito                             |
| sexist_analysis    | sexist_analysis_metwo                  |
| racism_analysis    | racism_paula_lobo_et_al_average_strict |


## Quick usage

In [1]:
from spanish_nlp import classifiers

sc = classifiers.SpanishClassifier(model_name="hate_speech", device="cpu")
t1 = "LAS MUJERES Y GAYS DEBERÍAN SER EXTERMINADOS"
t2 = (
    "El presidente convocó a una reunión a los representantes de los partidos políticos"
)
p1 = sc.predict(t1)
p2 = sc.predict(t2)

print("Text 1: ", t1)
print("Prediction 1: ", p1)
print("Text 2: ", t2)
print("Prediction 2: ", p2)


Text 1:  LAS MUJERES Y GAYS DEBERÍAN SER EXTERMINADOS
Prediction 1:  {'hate_speech': 0.7544152736663818, 'not_hate_speech': 0.24558477103710175}
Text 2:  El presidente convocó a una reunión a los representantes de los partidos políticos
Prediction 2:  {'not_hate_speech': 0.9793208837509155, 'hate_speech': 0.02067909575998783}
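
Each prediction is a dictionary that maps labels to scores. As a minimal sketch, the dictionary can be reduced to a single label as shown below. The helper `top_label` and its `threshold` parameter are illustrative assumptions and not part of spanish_nlp; the two dictionaries are copied from the output above.

```python
def top_label(prediction, threshold=0.5):
    """Return (label, score) for the highest-scoring label.

    If the best score is below `threshold`, the label is None so that
    low-confidence predictions can be handled separately.
    """
    label = max(prediction, key=prediction.get)
    score = prediction[label]
    return (label, score) if score >= threshold else (None, score)


# Predictions copied from the Quick usage output
p1 = {"hate_speech": 0.7544152736663818, "not_hate_speech": 0.24558477103710175}
p2 = {"not_hate_speech": 0.9793208837509155, "hate_speech": 0.02067909575998783}

print("Label 1:", top_label(p1))
print("Label 2:", top_label(p2))
```

The default threshold of 0.5 is arbitrary; for multi-class models such as `emotion_analysis` a lower value may be more appropriate.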


## Apply classification to a pandas DataFrame

In [2]:
from datetime import datetime

import pandas as pd
import swifter  # registers the .swifter accessor on pandas objects

from spanish_nlp.classifiers import SpanishClassifier
from spanish_nlp import preprocess

# Create DataFrame

texts = ["Deberían ser exterminados los pueblos indígenas",
         "El presidente convocó a una reunión a los representantes de los partidos políticos",
         "Los pingüinos son animales",
         "La vacuna contra el covid-19 ya está disponible",
         "Hay que matar a todos los extranjeros"]

df = pd.DataFrame(texts, columns=["text"])

In [3]:
# Preprocess texts

sp = preprocess.SpanishPreprocess(
        lower=False,
        remove_url=True,
        remove_hashtags=False,
        split_hashtags=True,
        normalize_breaklines=True,
        remove_emoticons=False,
        remove_emojis=False,
        convert_emoticons=False,
        convert_emojis=False,
        normalize_inclusive_language=True,
        reduce_spam=True,
        remove_vowels_accents=True,
        remove_multiple_spaces=True,
        remove_punctuation=True,
        remove_unprintable=True,
        remove_numbers=True,
        remove_stopwords=False,
        stopwords_list=None,
        lemmatize=False,
        stem=False,
        remove_html_tags=True,
)

df["text"] = df["text"].swifter.apply(sp.transform)

df = df[df.text.notnull()]
df = df[df.text != ""]
df = df[df["text"].apply(lambda x: isinstance(x, str))]
df = df.reset_index(drop=True)


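def predict_label(text, model):
    """Return the model's prediction for `text`, or None if prediction fails."""
    try:
        return model.predict(text)
    except Exception as e:
        # Log the error with a timestamp and continue with the remaining rows
        time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        print(f"{time} - {e}")
        return None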


classifiers_names = [
    "hate_speech",
    "toxic_speech",
    "sentiment_analysis",
    "emotion_analysis",
    "irony_analysis",
    "sexist_analysis",
    "racism_analysis",
]
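classifiers = {}

# Load each model once, keep it for reuse, and store its predictions
# in a column named after the model
for n in classifiers_names:
    model = SpanishClassifier(model_name=n, device="cpu")
    classifiers[n] = model
    df[n] = df["text"].swifter.apply(lambda x: predict_label(x, model))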




|   | **text** | **hate_speech** | **toxic_speech** | **sentiment_analysis** | **emotion_analysis** | **irony_analysis** | **sexist_analysis** | **racism_analysis** |
|---|----------|-----------------|------------------|------------------------|----------------------|--------------------|---------------------|---------------------|
| 0 | Deberian ser exterminados los pueblos indigenas | `{'not_hate_speech': 0.8328660130500793, 'hate_...` | `{'toxic': 0.6276343464851379, 'very_toxic': 0....` | `{'negative': 0.8032280802726746, 'neutral': 0....` | `{'others': 0.748774528503418, 'anger': 0.16288...` | `{'not_ironic': 0.9995823502540588, 'ironic': 0...` | `{'not_sexist': 0.9762647747993469, 'sexist': 0...` | `{'non-racist': 0.999099612236023, 'racist': 0....` |
| 1 | El presidente convoco a una reunion a los repr... | `{'not_hate_speech': 0.983544647693634, 'hate_s...` | `{'toxic': 0.00566514628008008, 'very_toxic': 0...` | `{'neutral': 0.8114618062973022, 'positive': 0....` | `{'others': 0.9919043183326721, 'joy': 0.002639...` | `{'not_ironic': 0.9993013143539429, 'ironic': 0...` | `{'not_sexist': 0.9759377837181091, 'sexist': 0...` | `{'non-racist': 0.9996436834335327, 'racist': 0...` |
| 2 | Los pinguinos son animalos | `{'not_hate_speech': 0.9637648463249207, 'hate_...` | `{'toxic': 0.0185139998793602, 'very_toxic': 0....` | `{'positive': 0.5787513256072998, 'neutral': 0....` | `{'others': 0.9116767644882202, 'joy': 0.024299...` | `{'not_ironic': 0.7218025922775269, 'ironic': 0...` | `{'not_sexist': 0.9535900950431824, 'sexist': 0...` | `{'non-racist': 0.9981189370155334, 'racist': 0...` |
| 3 | La vacuna contra el covid ya esta disponible | `{'not_hate_speech': 0.9811059832572937, 'hate_...` | `{'toxic': 0.13025140762329102, 'very_toxic': 0...` | `{'positive': 0.5552893877029419, 'neutral': 0....` | `{'others': 0.9687969088554382, 'joy': 0.019537...` | `{'not_ironic': 0.9697375297546387, 'ironic': 0...` | `{'not_sexist': 0.9818084836006165, 'sexist': 0...` | `{'non-racist': 0.9996614456176758, 'racist': 0...` |
| 4 | Hay que matar a todos los extranjeros | `{'hate_speech': 0.8915936350822449, 'not_hate_...` | `{'toxic': 0.7360461950302124, 'very_toxic': 0....` | `{'negative': 0.7249139547348022, 'neutral': 0....` | `{'anger': 0.626744270324707, 'disgust': 0.3094...` | `{'not_ironic': 0.9974295496940613, 'ironic': 0...` | `{'not_sexist': 0.9626052379608154, 'sexist': 0...` | `{'racist': 0.9961186647415161, 'non-racist': 0...` |
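
Because each classifier column holds a dictionary per row, it is often convenient to derive flat columns from it for filtering or aggregation. The following is a sketch under assumptions: the column names `hate_speech_label` and `hate_speech_score` and the helper `best_label` are hypothetical, and the sample DataFrame reuses the two complete predictions from the Quick usage output rather than the truncated values shown in the table.

```python
import pandas as pd

# Sample data: texts and predictions copied from the Quick usage output
df = pd.DataFrame(
    {
        "text": [
            "LAS MUJERES Y GAYS DEBERÍAN SER EXTERMINADOS",
            "El presidente convocó a una reunión a los representantes de los partidos políticos",
        ],
        "hate_speech": [
            {"hate_speech": 0.7544152736663818, "not_hate_speech": 0.24558477103710175},
            {"not_hate_speech": 0.9793208837509155, "hate_speech": 0.02067909575998783},
        ],
    }
)


def best_label(prediction):
    """Return the highest-scoring label, or None for rows without a prediction dict."""
    if not isinstance(prediction, dict) or not prediction:
        return None
    return max(prediction, key=prediction.get)


# Hypothetical flat columns: the winning label and its score
df["hate_speech_label"] = df["hate_speech"].apply(best_label)
df["hate_speech_score"] = df["hate_speech"].apply(
    lambda d: max(d.values()) if isinstance(d, dict) and d else None
)

print(df[["hate_speech_label", "hate_speech_score"]])
```

The same pattern can be repeated for the other classifier columns, for example by looping over `classifiers_names`.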
