#### Natural Language Processing (NLP)

Natural Language Processing (NLP) is a field that studies how computers can understand, process, and manipulate human language. It is a cross-disciplinary field that combines computer science, linguistics, and artificial intelligence.

NLP tasks include interpreting semantic meaning, translating between languages, and recognizing patterns in text.

##### Spam classification with a basic NLP model

The code in this notebook uses a basic NLP model to classify text messages as spam or not spam. The two main steps are explained here:

**Text Processing**:

`CountVectorizer` converts the text messages into a matrix of token counts. This is a form of feature extraction: each unique word (token) in the text data becomes one feature column, and each message becomes one row of counts.

**Model Training**:

A **MultinomialNB** (Naive Bayes) classifier is used for training. This algorithm is particularly suited for classification with discrete features (like word counts for text classification).

In [1]:
import pandas as pd
# CountVectorizer converts a collection of text documents to a matrix of token counts
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pickle

In [2]:
# Load dataset
df = pd.read_csv(r"C:\Users\karee\Downloads\spam.csv", encoding='latin1')
df

Unnamed: 0,class,message,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,ham,"Go until jurong point, crazy.. Available only ...",,,
1,ham,Ok lar... Joking wif u oni...,,,
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...,,,
3,ham,U dun say so early hor... U c already then say...,,,
4,ham,"Nah I don't think he goes to usf, he lives aro...",,,
...,...,...,...,...,...
5567,spam,This is the 2nd time we have tried 2 contact u...,,,
5568,ham,Will Ì_ b going to esplanade fr home?,,,
5569,ham,"Pity, * was in mood for that. So...any other s...",,,
5570,ham,The guy did some bitching but I acted like i'd...,,,


In [3]:
# Drop unnecessary columns
df.drop(['Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4'], axis=1, inplace=True)

In [4]:
# Rename the remaining columns for clarity
df.columns = ['label', 'message']
df


Unnamed: 0,label,message
0,ham,"Go until jurong point, crazy.. Available only ..."
1,ham,Ok lar... Joking wif u oni...
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...
3,ham,U dun say so early hor... U c already then say...
4,ham,"Nah I don't think he goes to usf, he lives aro..."
...,...,...
5567,spam,This is the 2nd time we have tried 2 contact u...
5568,ham,Will Ì_ b going to esplanade fr home?
5569,ham,"Pity, * was in mood for that. So...any other s..."
5570,ham,The guy did some bitching but I acted like i'd...


In [5]:
# Encode the labels: 'ham' -> 0, 'spam' -> 1
df['label'] = df['label'].map({'ham': 0, 'spam': 1})
df

Unnamed: 0,label,message
0,0,"Go until jurong point, crazy.. Available only ..."
1,0,Ok lar... Joking wif u oni...
2,1,Free entry in 2 a wkly comp to win FA Cup fina...
3,0,U dun say so early hor... U c already then say...
4,0,"Nah I don't think he goes to usf, he lives aro..."
...,...,...
5567,1,This is the 2nd time we have tried 2 contact u...
5568,0,Will Ì_ b going to esplanade fr home?
5569,0,"Pity, * was in mood for that. So...any other s..."
5570,0,The guy did some bitching but I acted like i'd...
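
One thing worth checking after a label mapping like the one above: `Series.map` returns `NaN` for any value that is missing from the dictionary. The sketch below uses made-up labels (`toy_labels` is hypothetical) to show how a stray value such as a capitalised `'Spam'` could be detected before training.

```python
import pandas as pd

# Hypothetical labels; 'Spam' (capitalised) is deliberately not in the mapping.
toy_labels = pd.Series(["ham", "spam", "Spam"])
encoded = toy_labels.map({"ham": 0, "spam": 1})

# Unmapped values become NaN, so count them before training.
n_unmapped = int(encoded.isna().sum())
print(n_unmapped)
```

In the notebook, the same check would be `df['label'].isna().sum()`, which should be zero if every row is either `ham` or `spam`.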


In [6]:
# Extract features and labels
X = df['message']
y = df['label']

In [7]:
# Convert the text data into numerical data
cv = CountVectorizer()
X = cv.fit_transform(X)  # Fit and transform the data


In [8]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
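
In the cells above, `CountVectorizer` is fitted on every message before the split, so words that occur only in test messages still enter the vocabulary. The following is a minimal sketch of one alternative, assuming scikit-learn's `make_pipeline` and hypothetical toy data (`toy_messages`, `toy_labels`) in place of the real dataset: split the raw text first, then let a pipeline fit the vectorizer on the training portion only. It is an illustration rather than the notebook's method, and it would not necessarily reproduce the accuracy reported in this notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for df['message'] and df['label'].
toy_messages = [
    "free prize claim now", "win cash free entry", "urgent free offer call now",
    "claim your free reward", "are you home yet", "see you at lunch",
    "call me when you arrive", "ok talk later",
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Split the raw text first, so the vectorizer never sees the test messages.
msg_train, msg_test, lab_train, lab_test = train_test_split(
    toy_messages, toy_labels, test_size=0.25, random_state=42, stratify=toy_labels
)

# The pipeline fits CountVectorizer on the training text only,
# then trains MultinomialNB on the resulting counts.
pipe = make_pipeline(CountVectorizer(), MultinomialNB())
pipe.fit(msg_train, lab_train)

# Words in msg_test that were never seen during fitting are ignored.
predictions = pipe.predict(msg_test)
print(len(predictions))
```

A side benefit of this design is that the vectorizer and the classifier live in one object, so they can be saved and loaded together.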

**MultinomialNB** is a Naive Bayes classifier for discrete features. It is commonly used for document classification, where the features are the frequencies of the words in each document.

**What is MultinomialNB?**

The MultinomialNB classifier assumes that, within each class, the features follow a multinomial distribution and are conditionally independent of one another. This makes it well suited to NLP tasks such as spam detection, where the features are word counts or term frequencies.

**How MultinomialNB Works**

**Training**: The model calculates the prior probability of each class from the training data. For each feature (word), it calculates the likelihood of that feature given each class.

**Prediction**: For a new document, the model calculates the posterior probability of each class given the document's features. It predicts the class with the highest posterior probability.
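
The training and prediction steps described above can be reproduced by hand on a tiny example. The sketch below is illustrative only: the count matrix, labels, and `new_doc` are made up, and `alpha = 1.0` is assumed because it is the `MultinomialNB` default for Laplace smoothing. The hand-computed posterior is then compared with scikit-learn's result.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word counts: rows are documents, columns are three vocabulary words.
counts = np.array([
    [2, 1, 0],   # spam
    [3, 0, 0],   # spam
    [0, 1, 2],   # ham
    [0, 2, 1],   # ham
    [1, 1, 1],   # ham
])
labels = np.array([1, 1, 0, 0, 0])  # 1 = spam, 0 = ham
alpha = 1.0  # Laplace smoothing, the MultinomialNB default

# Training: priors are class frequencies; likelihoods are smoothed
# per-class word frequencies.
classes = np.array([0, 1])
log_prior = np.log(np.array([(labels == c).mean() for c in classes]))
word_totals = np.array([counts[labels == c].sum(axis=0) for c in classes])
log_likelihood = np.log(
    (word_totals + alpha) / (word_totals + alpha).sum(axis=1, keepdims=True)
)

# Prediction: add the log prior to the count-weighted log likelihoods,
# then normalise to obtain posterior probabilities.
new_doc = np.array([[2, 0, 1]])
joint = log_prior + new_doc @ log_likelihood.T
posterior = np.exp(joint - joint.max())
posterior /= posterior.sum()
predicted_class = classes[posterior.argmax()]

# Cross-check against scikit-learn's implementation.
nb = MultinomialNB(alpha=alpha).fit(counts, labels)
matches_sklearn = np.allclose(posterior, nb.predict_proba(new_doc))
print(matches_sklearn, predicted_class == nb.predict(new_doc)[0])
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.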

In [9]:
# Train the Naive Bayes classifier
clf = MultinomialNB()
clf.fit(X_train, y_train)

In [10]:
# Evaluate the model
y_pred = clf.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')


Accuracy: 0.9793365959760739
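
Accuracy alone can hide how a classifier treats each class, particularly if one class is much rarer than the other. As a hedged sketch, the example below computes a confusion matrix, precision, and recall on made-up labels (`toy_true` and `toy_pred` are hypothetical and are not the notebook's results).

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels and predictions (1 = spam, 0 = ham), not the notebook's results.
toy_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
toy_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(toy_true, toy_pred)
print(cm)

# Precision: share of predicted spam that really is spam.
# Recall: share of real spam that was caught.
precision = precision_score(toy_true, toy_pred)
recall = recall_score(toy_true, toy_pred)
print(precision, recall)
```

In the notebook, the same functions could be called with `y_test` and `y_pred` in place of the toy lists.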


In [11]:
# Save the trained model and CountVectorizer for later use
with open('nlp_model.pkl', 'wb') as model_file:
    pickle.dump(clf, model_file)
    
with open('tranform.pkl', 'wb') as cv_file:
    pickle.dump(cv, cv_file)
print("model saved.")

model saved.
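
A quick way to confirm that saved objects work is a round trip: save, reload, and compare predictions. The sketch below is one possible variant, not the notebook's approach. It trains a made-up toy model (`toy_clf`, `toy_cv`) and stores both objects in a single hypothetical file, `toy_bundle.pkl`, inside a temporary directory. Note that `pickle.load` can execute arbitrary code, so it should only be used on files from a trusted source.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy model standing in for the notebook's clf and cv.
toy_messages = ["free prize now", "win free cash", "see you soon", "call me later"]
toy_labels = [1, 1, 0, 0]
toy_cv = CountVectorizer()
toy_clf = MultinomialNB().fit(toy_cv.fit_transform(toy_messages), toy_labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_bundle.pkl")  # hypothetical file name
    # Save both objects together so they cannot get out of sync.
    with open(path, "wb") as f:
        pickle.dump({"model": toy_clf, "vectorizer": toy_cv}, f)
    # Reload while the temporary file still exists.
    with open(path, "rb") as f:
        bundle = pickle.load(f)

# The restored pair should predict exactly like the original pair.
sample = ["free cash prize"]
original = toy_clf.predict(toy_cv.transform(sample))
restored = bundle["model"].predict(bundle["vectorizer"].transform(sample))
round_trip_ok = bool((original == restored).all())
print(round_trip_ok)
```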


In [12]:
streamlit_code = """
import streamlit as st
import pickle

# Load the pre-trained model and CountVectorizer
model_filename = 'nlp_model.pkl'
cv_filename = 'tranform.pkl'

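# Use context managers so the files are closed after loading
with open(model_filename, 'rb') as model_file:
    clf = pickle.load(model_file)
with open(cv_filename, 'rb') as cv_file:
    cv = pickle.load(cv_file)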

# Title of the web app
st.title('Spam Detector')

# Text input for the message
message = st.text_area('Enter a message:')

# Predict button
if st.button('Predict'):
    if message:
        data = [message]
        vect = cv.transform(data).toarray()
        my_prediction = clf.predict(vect)
        
        # Display the prediction result
        if my_prediction[0] == 1:
            st.write('The message is **spam**.')
        else:
            st.write('The message is **not spam**.')
    else:
        st.write('Please enter a message to predict.')
"""
# Specify the path where the Streamlit app script will be saved
file_path = 'spamapp.py'

# Write the content to the file
with open(file_path, 'w') as file:
    file.write(streamlit_code)

print(f"File '{file_path}' has been saved.")


File 'spamapp.py' has been saved.


In [13]:
import tkinter as tk
from tkinter import messagebox
import pickle

# Load the pre-trained model and CountVectorizer
model_filename = 'nlp_model.pkl'
cv_filename = 'tranform.pkl'

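# Use context managers so the files are closed after loading
with open(model_filename, 'rb') as model_file:
    clf = pickle.load(model_file)
with open(cv_filename, 'rb') as cv_file:
    cv = pickle.load(cv_file)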

# Function to predict whether a message is spam or not
def predict_spam():
    message = entry.get()  # Get the text entered by the user
    if not message:
        messagebox.showwarning("Input Error", "Please enter a message!")
        return

    # Preprocess the input and make the prediction
    data = [message]
    vect = cv.transform(data).toarray()
    my_prediction = clf.predict(vect)
    
    # Display the result (Tkinter labels show text literally, so no Markdown markers)
    if my_prediction[0] == 1:
        result_label.config(text="The message is spam.", fg="red")
    else:
        result_label.config(text="The message is not spam.", fg="green")

# Setting up the Tkinter window
window = tk.Tk()
window.title("Spam Detector")
window.geometry("500x300")

# Label for instructions
instruction_label = tk.Label(window, text="Enter the message to check if it's spam:", font=("Arial", 12))
instruction_label.pack(pady=10)

# Entry box for message input
entry = tk.Entry(window, width=50)
entry.pack(pady=10)

# Button to trigger prediction
predict_button = tk.Button(window, text="Predict", command=predict_spam, font=("Arial", 10))
predict_button.pack(pady=10)

# Label to display the result
result_label = tk.Label(window, text="", font=("Arial", 14))
result_label.pack(pady=20)

# Run the Tkinter event loop
window.mainloop()
