# Spanish NLP: Classify Notebook

For more information, visit the [spanish_nlp](https://github.com/jorgeortizfuentes/spanish_nlp) repository on GitHub.

## Available models

| **Model name**     | **Sources**                            |
|--------------------|----------------------------------------|
| hate_speech        | bert, robertuito                       |
| incivility         | bert                                   |
| toxic_speech       | political-tweets-es                    |
| sentiment_analysis | robertuito                             |
| emotion_analysis   | robertuito                             |
| irony_analysis     | robertuito                             |
| sexist_analysis    | sexist_analysis_metwo                  |
| racism_analysis    | racism_paula_lobo_et_al_average_strict |


## Quick usage

In [6]:
from spanish_nlp import classifiers

sc = classifiers.SpanishClassifier(model_name="hate_speech", device="cpu")
t1 = "LAS MUJERES Y GAYS DEBERÍAN SER EXTERMINADOS"
t2 = (
    "El presidente convocó a una reunión a los representantes de los partidos políticos"
)
p1 = sc.predict(t1)
p2 = sc.predict(t2)

print("Text 1: ", t1)
print("Prediction 1: ", p1)
print("Text 2: ", t2)
print("Prediction 2: ", p2)


Text 1:  LAS MUJERES Y GAYS DEBERÍAN SER EXTERMINADOS
Prediction 1:  {'no_hate': 0.8702716827392578, 'hate': 0.12972833216190338}
Text 2:  El presidente convocó a una reunión a los representantes de los partidos políticos
Prediction 2:  {'no_hate': 0.9976341724395752, 'hate': 0.002365865046158433}
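
Each prediction is a dictionary that maps labels to scores, as the output above shows. A minimal sketch of reducing that dictionary to a single label follows. It assumes the scores behave like probabilities; `top_label` and the 0.5 threshold are illustrative choices, not part of spanish_nlp.

```python
def top_label(prediction, threshold=0.5):
    """Return (label, score) for the highest-scoring label.

    `prediction` is a dict such as {'no_hate': 0.87, 'hate': 0.13}.
    Returns (None, score) when the best score is below `threshold`.
    """
    label = max(prediction, key=prediction.get)
    score = prediction[label]
    if score < threshold:
        return None, score
    return label, score


# Scores copied from "Prediction 2" above
p2 = {"no_hate": 0.9976341724395752, "hate": 0.002365865046158433}
label, score = top_label(p2)
print(label, round(score, 4))
```

Returning `None` below the threshold keeps low-confidence rows easy to filter later; whether 0.5 is a sensible cut-off depends on the model and the task.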


## Apply classification to a dataset in pandas

### Load dataset

In [7]:
from datetime import datetime

import pandas as pd
import swifter  # registers the .swifter accessor on pandas objects

from spanish_nlp.classifiers import SpanishClassifier
from spanish_nlp import SpanishPreprocess

# Create DataFrame

texts = ["Deberían ser exterminados los pueblos indígenas",
         "El presidente convocó a una reunión a los representantes de los partidos políticos",
         "Los pingüinos son animales",
         "La vacuna contra el covid-19 ya está disponible",
         "Hay que matar a todos los extranjeros"]

df = pd.DataFrame(texts, columns=["text"])

### Preprocess dataset

In [8]:
# Preprocess texts

sp = SpanishPreprocess(
        lower=False,
        remove_url=True,
        remove_hashtags=False,
        split_hashtags=True,
        normalize_breaklines=True,
        remove_emoticons=False,
        remove_emojis=False,
        convert_emoticons=False,
        convert_emojis=False,
        normalize_inclusive_language=True,
        reduce_spam=True,
        remove_vowels_accents=True,
        remove_multiple_spaces=True,
        remove_punctuation=True,
        remove_unprintable=True,
        remove_numbers=True,
        remove_stopwords=False,
        stopwords_list=None,
        lemmatize=False,
        stem=False,
        remove_html_tags=True,
)

df["text"] = df["text"].swifter.apply(sp.transform)

df = df[df.text.notnull()]
df = df[df.text != ""]
df = df[df["text"].apply(lambda x: isinstance(x, str))]
df = df.reset_index(drop=True)

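
To give a feel for what a few of the options above do, here is a rough standard-library approximation of `remove_url`, `remove_vowels_accents`, and `remove_multiple_spaces`. This is a sketch under assumptions, not the library's implementation: `rough_clean`, the URL pattern, and the decision to leave `ñ` and `ü` untouched are all illustrative.

```python
import re

# Map accented vowels to their plain form; ñ and ü are left untouched
ACCENTED_VOWELS = str.maketrans("áéíóúÁÉÍÓÚ", "aeiouAEIOU")
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def rough_clean(text):
    """Approximate three preprocessing options; not the library's code."""
    text = URL_PATTERN.sub(" ", text)         # similar to remove_url=True
    text = text.translate(ACCENTED_VOWELS)    # similar to remove_vowels_accents=True
    text = re.sub(r"\s+", " ", text).strip()  # similar to remove_multiple_spaces=True
    return text


print(rough_clean("La vacuna   ya está disponible: https://example.com/info"))
```

The actual `SpanishPreprocess` output may differ, for example in how it orders the steps or treats punctuation.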

### Classify dataset

#### Models:
* hate_speech
* incivility
* sentiment_analysis
* emotion_analysis
* irony_analysis
* sexist_analysis
* racism_analysis

In [9]:
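def predict_label(text, model):
    """Return the model's prediction for `text`, or None if prediction fails."""
    try:
        return model.predict(text)
    except Exception as e:
        # Log the failure with a timestamp and keep processing the other rows
        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        print(f"{timestamp} - {e}")
        return None


classifiers_names = [
    "hate_speech",
    "incivility",
    "sentiment_analysis",
    "emotion_analysis",
    "irony_analysis",
    "sexist_analysis",
    "racism_analysis",
]
classifiers = {}

for n in classifiers_names:
    # Load each model once and keep it in the dict for later reuse
    classifiers[n] = SpanishClassifier(model_name=n, device="cpu")
    # Add one column per classifier with the prediction for each text
    df[n] = df["text"].swifter.apply(lambda x: predict_label(x, classifiers[n]))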




In [None]:
df
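
After classification, each classifier column holds a dictionary of scores, which is awkward to sort or filter. The sketch below flattens one row into `<name>_label` and `<name>_score` entries. `flatten_predictions` is a hypothetical helper, and the example row uses scores invented for illustration, not real model output.

```python
def flatten_predictions(row, classifier_names):
    """Flatten per-classifier score dicts into label and score entries.

    A missing or None prediction (for example, a failed call) yields
    None for both the label and the score.
    """
    flat = {}
    for name in classifier_names:
        prediction = row.get(name)
        if prediction:
            label = max(prediction, key=prediction.get)
            flat[f"{name}_label"] = label
            flat[f"{name}_score"] = prediction[label]
        else:
            flat[f"{name}_label"] = None
            flat[f"{name}_score"] = None
    return flat


# Hypothetical row: the scores are made up for illustration
row = {
    "text": "Los pinguinos son animales",
    "hate_speech": {"no_hate": 0.99, "hate": 0.01},
    "incivility": None,
}
print(flatten_predictions(row, ["hate_speech", "incivility"]))
```

Because the helper only relies on `.get`, it should also accept a DataFrame row, for instance through `df.apply(lambda r: pd.Series(flatten_predictions(r, classifiers_names)), axis=1)`.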