
**Women's Clothing Review Prediction with Multinomial Naive Bayes**

**Objective**

The objective of this project is to build a Multinomial Naive Bayes model that classifies women's clothing reviews as positive or negative. This involves cleaning the review data, converting the text into numerical TF-IDF features, and then training and evaluating the model.

**Data Source**

The data source for this project is a dataset of women's clothing reviews, such as reviews collected from an online retail platform like Amazon or a public dataset hosted on Kaggle. The dataset must include the review texts and a corresponding sentiment label for each review.
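Before running the rest of the notebook, it can help to confirm that the file actually contains the two columns the later cells rely on. The sketch below is only an illustration: the column names `review_text` and `sentiment` are the names assumed elsewhere in this notebook, the helper `check_schema` is hypothetical, and the two sample rows are made up to stand in for the real CSV.

```python
import pandas as pd

# Column names assumed by this notebook; adjust them to match the actual dataset.
REQUIRED_COLUMNS = ("review_text", "sentiment")


def check_schema(df, required=REQUIRED_COLUMNS):
    """Return the required columns that are missing from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# Tiny made-up sample standing in for the real CSV file.
sample = pd.DataFrame({
    "review_text": ["Lovely fabric, true to size.", "Seams came apart after one wash."],
    "sentiment": ["positive", "negative"],
})

missing = check_schema(sample)
print("Missing columns:", missing)
```

If the list is not empty, rename the dataset's columns or change the column names used in the cells that follow.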

**Import Libraries**

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


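**Import Data**

In [None]:
# Replace 'path_to_your_dataset.csv' with the actual file path
data = pd.read_csv('path_to_your_dataset.csv')
print(data.head())
data.info()  # info() prints its summary directly and returns None
print(data.describe())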



**Describe Data**

In [None]:
data.info()  # info() prints its summary directly and returns None
print(data.describe())
print(data['sentiment'].value_counts())
print(data.head())
print(data.columns)


**Data Visualization**

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

sentiment_counts = data['sentiment'].value_counts()

plt.figure(figsize=(8, 6))
sns.barplot(x=sentiment_counts.index, y=sentiment_counts.values, palette='Set2')
plt.title('Distribution of Sentiment Labels')
plt.xlabel('Sentiment')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()



**Data Preprocessing**

In [None]:
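import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the dataset (replace 'path_to_your_dataset.csv' with the actual file path)
data = pd.read_csv('path_to_your_dataset.csv')

# Drop rows that have a missing value in any column
data = data.dropna()

# Review texts are the features; sentiment labels are the target
X = data['review_text']
y = data['sentiment']

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the vocabulary and IDF weights on the training set only, then reuse them on the test set
vectorizer = TfidfVectorizer(max_features=5000)  # You can adjust max_features as needed
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)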

print("Shape of X_train_tfidf:", X_train_tfidf.shape)
print("Shape of X_test_tfidf:", X_test_tfidf.shape)


**Explanation:**



Load Dataset: Loads the dataset containing women's clothing reviews.

Data Cleaning: Drops any rows with missing values (dropna()).

Split Data: Splits the dataset into training and testing sets using train_test_split().

TF-IDF Vectorization: Converts the textual data into numerical TF-IDF vectors using TfidfVectorizer.
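To make the TF-IDF step concrete, here is a minimal sketch on a made-up toy corpus (the texts are illustrative, not taken from the dataset). It shows that the vocabulary is learned from the training texts only, and that words the vectorizer never saw during fitting are simply ignored when transforming new text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up toy texts; the real notebook uses the review column of the dataset.
train_texts = ["great dress great fit", "poor fabric", "great fabric"]
test_texts = ["poor fit", "unknown word"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # learns vocabulary and IDF weights
X_test = vectorizer.transform(test_texts)        # reuses the fitted vocabulary

vocab = sorted(vectorizer.vocabulary_)
print("Vocabulary:", vocab)
print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)

# The second test text contains only unseen words, so its row is all zeros.
print("Sum of weights for 'unknown word':", X_test[1].sum())
```

Each row is one text and each column is one vocabulary term, which is why the training and test matrices share the same number of columns.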

**Define Target Variable (y) and Feature Variable (X)**



In [None]:
# Assuming 'review_text' is the column containing the review texts and 'sentiment' is the column containing the sentiment labels (positive or negative)

# Define feature variable (X) and target variable (y)
X = data['review_text']  # Feature variable (review texts)
y = data['sentiment']  # Target variable (sentiment labels)

# Display the shape of X and y to verify
print("Shape of X:", X.shape)
print("Shape of y:", y.shape)

**Train Test Split**



In [None]:
from sklearn.model_selection import train_test_split

# Assuming 'review_text' is the column with review text and 'sentiment' is the column with the sentiment label
X = data['review_text']
y = data['sentiment']

# Split the dataset into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the size of the training and testing sets
print(f"Training set size: {X_train.shape[0]}")
print(f"Testing set size: {X_test.shape[0]}")
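The split above is purely random. If the sentiment labels turn out to be imbalanced, one option is a stratified split, which keeps the label proportions the same in both sets. The sketch below assumes a made-up 80/20 class mix and uses the `stratify` parameter of `train_test_split`; it is an optional variation, not part of the original workflow.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up imbalanced toy data: 40 positive and 10 negative reviews.
toy = pd.DataFrame({
    "review_text": [f"review {i}" for i in range(50)],
    "sentiment": ["positive"] * 40 + ["negative"] * 10,
})

# stratify=... keeps the positive/negative ratio the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy["review_text"],
    toy["sentiment"],
    test_size=0.2,
    random_state=42,
    stratify=toy["sentiment"],
)

print("Train label counts:", y_tr.value_counts().to_dict())
print("Test label counts:", y_te.value_counts().to_dict())
```

Both sets keep the 4:1 ratio of the toy data, so the test metrics are not skewed by an unlucky draw of the minority class.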


**Modeling**



In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the dataset (replace 'path_to_your_dataset.csv' with the actual file path)
data = pd.read_csv('path_to_your_dataset.csv')

# Drop rows with missing values; TfidfVectorizer cannot process NaN review texts
data = data.dropna()

# Assuming the dataset has 'review_text' and 'sentiment' columns
# Split the data into features and target variable
X = data['review_text']
y = data['sentiment']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Transform the text data to TF-IDF features
tfidf = TfidfVectorizer(stop_words='english')
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)

# Initialize and train the Multinomial Naive Bayes model
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test_tfidf)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

# Print evaluation metrics
print(f'Accuracy: {accuracy}')
print('Classification Report:')
print(report)
print('Confusion Matrix:')
print(conf_matrix)
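The vectorizer and the classifier can also be chained into a single scikit-learn pipeline, so that raw text goes in and labels come out. The following is a minimal sketch on six made-up reviews; the texts and labels are illustrative only and stand in for the real training split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy reviews and labels standing in for the real training data.
texts = [
    "love this dress",
    "perfect fit and lovely color",
    "love the soft fabric",
    "poor quality fabric",
    "terrible fit, returned it",
    "cheap material and poor stitching",
]
labels = ["positive", "positive", "positive", "negative", "negative", "negative"]

# The pipeline applies TF-IDF and then Multinomial Naive Bayes in one object.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(texts, labels)

# Raw strings can be passed directly; the pipeline vectorizes them internally.
print(clf.predict(["love this dress", "poor quality fabric"]))
```

Keeping both steps in one object avoids accidentally fitting the vectorizer on test data, because `fit` and `predict` always run the steps in the right order.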



**Model Evaluation**

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Assuming the dataset has columns 'review_text' for the reviews and 'sentiment' for the labels
X = data['review_text']
y = data['sentiment']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert the text data into TF-IDF features
tfidf = TfidfVectorizer()
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)

# Train the Multinomial Naive Bayes model
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test_tfidf)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

# Store the result under a new name so the imported classification_report function is not overwritten
report = classification_report(y_test, y_pred)
print('Classification Report:')
print(report)

conf_matrix = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:')
print(conf_matrix)
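To read the confusion matrix, it helps to see how precision and recall follow from its four cells. The sketch below uses six made-up true and predicted labels (not results from this project) and fixes the label order explicitly so that the cell positions are unambiguous.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration; these are not results from the project.
y_true = ["positive", "positive", "positive", "negative", "negative", "positive"]
y_hat = ["positive", "negative", "positive", "negative", "positive", "positive"]

# With labels=["negative", "positive"], rows are true labels and columns are
# predicted labels, so the flattened order is TN, FP, FN, TP.
cm = confusion_matrix(y_true, y_hat, labels=["negative", "positive"])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # of the reviews predicted positive, how many were positive
recall = tp / (tp + fn)     # of the truly positive reviews, how many were found

print(cm)
print("Precision (positive):", precision)
print("Recall (positive):", recall)

# The same values come from scikit-learn's helpers.
print(precision_score(y_true, y_hat, pos_label="positive"))
print(recall_score(y_true, y_hat, pos_label="positive"))
```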

**Prediction**


In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the dataset (replace 'path_to_your_dataset.csv' with the actual file path)
data = pd.read_csv('path_to_your_dataset.csv')

# Drop rows with missing values; TfidfVectorizer cannot process NaN review texts
data = data.dropna()

# Display the first few rows of the dataset
print(data.head())

# Display basic information about the dataset
data.info()  # info() prints its summary directly and returns None

# Display the distribution of target labels (assuming the column name for the sentiment label is 'sentiment')
print(data['sentiment'].value_counts())

# Split the dataset into features and target variable
X = data['review_text']  # Assuming the review text is in a column named 'review_text'
y = data['sentiment']    # Assuming the sentiment label is in a column named 'sentiment'

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert text data into TF-IDF features
vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Initialize the Multinomial Naive Bayes model
model = MultinomialNB()

# Train the model
model.fit(X_train_tfidf, y_train)

# Make predictions on the test data
y_pred = model.predict(X_test_tfidf)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

print('Classification Report:')
print(classification_report(y_test, y_pred))

print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))

# Example prediction on new reviews
new_reviews = ["This dress is amazing and fits perfectly!", "The material is poor and it doesn't look good."]
new_reviews_tfidf = vectorizer.transform(new_reviews)
predictions = model.predict(new_reviews_tfidf)

# Display predictions
for review, sentiment in zip(new_reviews, predictions):
    print(f'Review: "{review}" - Predicted Sentiment: {sentiment}')
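Besides the predicted label, `MultinomialNB` can report a probability for each class through `predict_proba`, which gives a rough sense of how confident a prediction is. The sketch below is a hedged illustration on made-up training texts; the columns of the returned array follow the order of `model.classes_`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up toy training data standing in for the real reviews.
train_texts = [
    "amazing dress and fits perfectly",
    "beautiful color and great quality",
    "poor material and looks bad",
    "bad stitching and poor fit",
]
train_labels = ["positive", "positive", "negative", "negative"]

vectorizer = TfidfVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(train_texts), train_labels)

new_reviews = ["This dress is amazing", "The material is poor"]
probs = model.predict_proba(vectorizer.transform(new_reviews))

# Pair each probability with its class label using model.classes_.
for review, row in zip(new_reviews, probs):
    scores = {label: round(float(p), 3) for label, p in zip(model.classes_, row)}
    print(f'Review: "{review}" - Class probabilities: {scores}')
```

Naive Bayes probabilities tend to be poorly calibrated, so they are better treated as a ranking signal than as exact confidence values.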



**Explanation**

In this project, we built a Multinomial Naive Bayes model to classify women's clothing reviews as positive or negative. We imported and inspected the dataset, converted the review texts into TF-IDF features, trained the model on the training split, and evaluated it on the held-out test split using accuracy, a classification report, and a confusion matrix. How well the model performs depends on the dataset supplied, so these metrics should be checked before relying on its predictions. The same workflow can be extended to other sentiment analysis problems and provides a baseline for building more advanced sentiment analysis models.