## **Classification_Naive_Bayes**

This is a text classification and sentiment analysis problem solved with the Naive Bayes algorithm. The goal is to classify Amazon product reviews as positive or negative based on the text of each review.

Positive and negative reviews of an Amazon product are loaded, and a bag of words is built for each group. A Naive Bayes model is then trained on those words to predict whether a new review is positive or negative.
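
To make the idea concrete, here is a minimal sketch of a Naive Bayes classifier over boolean word-presence features, written in plain Python. The toy reviews, the `train` and `predict_proba` names, and the add-one smoothing are all assumptions of this sketch. It is not the NLTK implementation used in the notebook, whose probability estimator may differ.

```python
import math

def train(labeled_docs):
    """labeled_docs: list of (set_of_words, label) pairs. Returns a model dict."""
    vocab = set().union(*(words for words, _ in labeled_docs))
    labels = {label for _, label in labeled_docs}
    doc_counts = {label: 0 for label in labels}
    word_counts = {label: {w: 0 for w in vocab} for label in labels}
    for words, label in labeled_docs:
        doc_counts[label] += 1
        for w in words:
            word_counts[label][w] += 1
    return {"vocab": vocab, "doc_counts": doc_counts, "word_counts": word_counts}

def predict_proba(model, words):
    """Return {label: probability} for a set of words."""
    total_docs = sum(model["doc_counts"].values())
    log_scores = {}
    for label, n_docs in model["doc_counts"].items():
        # Prior: fraction of training documents with this label
        score = math.log(n_docs / total_docs)
        for w in model["vocab"]:
            # Add-one smoothing (assumption of this sketch)
            p_present = (model["word_counts"][label][w] + 1) / (n_docs + 2)
            score += math.log(p_present if w in words else 1 - p_present)
        log_scores[label] = score
    # Normalize the log scores into probabilities
    top = max(log_scores.values())
    unnormalized = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {label: v / z for label, v in unnormalized.items()}

# Hypothetical toy data, not taken from the notebook's review files
reviews = [
    ({"muy", "buena"}, "Positive"),
    ({"excelente", "producto"}, "Positive"),
    ({"muy", "mala"}, "Negative"),
    ({"no", "funciona"}, "Negative"),
]
model = train(reviews)
probs = predict_proba(model, {"excelente"})
for label in sorted(probs):
    print(f"{label}: {probs[label]:.4f}")
```

With these four toy reviews, the text `excelente` is scored as Positive with probability 0.75. The word appears in one of the two positive reviews and in none of the negative ones, and every other feature contributes equally to both labels.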

**Note:** The first part of the code classifies a single comment entered by the user. The second part does the same, but lets you choose how many comments to enter, so the code does not have to be re-run for every new comment.

### Predict whether **one** review is positive or negative

In [None]:
import nltk

# Download the tokenizer models used by nltk.word_tokenize
nltk.download('punkt')

def main():
    # Load the set of unique words from each review file (uploaded to Colab)
    positives = load_text("Navaja_positivo.txt")
    negatives = load_text("Navaja_negativo.txt")

    # Create a set of all words
    words = set()
    words.update(positives)
    words.update(negatives)

    # Extract features from text
    training = []
    training.extend(generate_features(positives, words, "Positive"))
    training.extend(generate_features(negatives, words, "Negative"))

    # Train the Naive Bayes classifier on the labeled features
    classifier = nltk.NaiveBayesClassifier.train(training)

    # Input a text for classification
    s = input("Ingrese un texto para clasificar: ")
    result = classify(classifier, s, words)

    # Print the probabilities
    for key in result.samples():
        print(f"{key}: {result.prob(key):.4f}")

def load_text(file_name):
    # Return the set of unique words in the file (extract_words lowercases them)
    with open(file_name, 'r', encoding='utf-8') as file:
        return extract_words(file.read())

def extract_words(document):
    return set(
        word.lower() for word in nltk.word_tokenize(document)
        if any(c.isalpha() for c in word)
    )

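def generate_features(documents, words, label):
    # Build one (features, label) pair per item in `documents`.
    # Note: load_text returns a set of words, so each `document` here is a
    # single word and `word in document` is a substring test.
    features = []
    for document in documents:
        features.append(({
            word: (word in document)
            for word in words
        }, label))
    return features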

def classify(classifier, document, words):
    document_words = extract_words(document)
    features = {
        word: (word in document_words)
        for word in words
    }
    return classifier.prob_classify(features)

if __name__ == "__main__":
    main()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Ingrese un texto para clasificar: Es excelente
Positive: 0.7933
Negative: 0.2067
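
The code above builds one training example per unique word, because `load_text` returns a single set for the whole file. If each line of a file holds one review (an assumption, since the notebook does not state the file layout), a per-review variant could look like the sketch below. `load_reviews` and `extract_words_simple` are hypothetical helpers, and the regular expression is a simple stand-in for `nltk.word_tokenize`, so tokens may differ slightly.

```python
import re

def extract_words_simple(text):
    # Stand-in for nltk.word_tokenize: lowercase runs of letters (Unicode-aware)
    return set(re.findall(r"[^\W\d_]+", text.lower()))

def load_reviews(lines):
    # One set of words per non-empty line (assumes one review per line)
    return [extract_words_simple(line) for line in lines if line.strip()]

def generate_features(documents, words, label):
    # Here each document is a set of words, so `word in document`
    # is a set-membership test rather than a substring test
    return [
        ({word: (word in document) for word in words}, label)
        for document in documents
    ]

# Hypothetical in-memory lines standing in for the two review files
positives = load_reviews(["Es excelente", "Muy buena navaja"])
negatives = load_reviews(["Me defraudó mucho", ""])

# Vocabulary: every word seen in any review
words = set().union(*positives, *negatives)

training = (
    generate_features(positives, words, "Positive")
    + generate_features(negatives, words, "Negative")
)
print(len(training), "training examples,", len(words), "vocabulary words")
```

The resulting `training` list has the same `(features, label)` shape that the notebook passes to `nltk.NaiveBayesClassifier.train`, with one example per review instead of one per word.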


### Predict whether each of **n** reviews is positive or negative

In [None]:
import nltk

# Download the tokenizer models used by nltk.word_tokenize
nltk.download('punkt')

def train_classifier(positives, negatives, words):
    # Extract features from text
    training = []
    training.extend(generate_features(positives, words, "Positive"))
    training.extend(generate_features(negatives, words, "Negative"))

    # Train the classifier
    classifier = nltk.NaiveBayesClassifier.train(training)
    return classifier

def main():
    # Load the set of unique words from each review file (uploaded to Colab)
    positives = load_text("Navaja_positivo.txt")
    negatives = load_text("Navaja_negativo.txt")

    # Create a set of all words
    words = set()
    words.update(positives)
    words.update(negatives)

    # Train the classifier
    classifier = train_classifier(positives, negatives, words)

    # Classify multiple samples
    num_samples = int(input("Ingrese el número de textos para clasificar: "))
    for _ in range(num_samples):
        s = input("Ingrese un texto para clasificar: ")
        result = classify(classifier, s, words)

        # Print the probabilities
        for key in result.samples():
            print(f"{key}: {result.prob(key):.4f}")

def load_text(file_name):
    # Return the set of unique words in the file (extract_words lowercases them)
    with open(file_name, 'r', encoding='utf-8') as file:
        return extract_words(file.read())

def extract_words(document):
    return set(
        word.lower() for word in nltk.word_tokenize(document)
        if any(c.isalpha() for c in word)
    )

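def generate_features(documents, words, label):
    # Build one (features, label) pair per item in `documents`.
    # Note: load_text returns a set of words, so each `document` here is a
    # single word and `word in document` is a substring test.
    features = []
    for document in documents:
        features.append(({
            word: (word in document)
            for word in words
        }, label))
    return features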

def classify(classifier, document, words):
    document_words = extract_words(document)
    features = {
        word: (word in document_words)
        for word in words
    }
    return classifier.prob_classify(features)

if __name__ == "__main__":
    main()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Ingrese el número de textos para clasificar: 3
Ingrese un texto para clasificar: Me defraudo mucho
Positive: 0.0508
Negative: 0.9492
Ingrese un texto para clasificar: Es muy buena
Positive: 0.7907
Negative: 0.2093
Ingrese un texto para clasificar: Falta funciones, muy sencilla
Positive: 0.2115
Negative: 0.7885
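
As an alternative to asking for the number of texts up front, the loop could read until an empty line is entered. The sketch below assumes a hypothetical helper, `read_until_blank`. Its `read_line` parameter exists only so the function can be exercised without a keyboard.

```python
def read_until_blank(read_line=input, prompt="Ingrese un texto para clasificar: "):
    """Collect texts until an empty line is submitted."""
    texts = []
    while True:
        text = read_line(prompt).strip()
        if not text:
            break
        texts.append(text)
    return texts

# Example with canned input instead of the keyboard
canned = iter(["Es muy buena", "Me defraudo mucho", ""])
texts = read_until_blank(read_line=lambda _prompt: next(canned))
print(texts)
```

Each entry in `texts` could then be passed to `classify(classifier, text, words)`, as in the loop above.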
