#### Installing & importing necessary libraries

In [6]:
import pandas as pd
import numpy as np

In [7]:
!pip install contractions






In [8]:
import nltk
import string
import re
import contractions
nltk.download('wordnet')
nltk.download('omw-1.4')
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords,wordnet
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer

[nltk_data] Downloading package wordnet to C:\Users\Shubham
[nltk_data]     Idekar\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to C:\Users\Shubham
[nltk_data]     Idekar\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


In [None]:
column_name= ['Label','Text']
df = pd.read_csv("ecommerceDataset.csv",names=column_name, header=None)

### Preprocessing data

1) Expand contractions
2) Remove punctuation
3) Tokenization
4) Convert to lower case
5) Remove words containing numerical digits
6) Remove stopwords
7) Stemming or Lemmatization

In [10]:
#Checking a sample value
df['Text'][0]

'Paper Plane Design Framed Wall Hanging Motivational Office Decor Art Prints (8.7 X 8.7 inch) - Set of 4 Painting made up in synthetic frame with uv textured print which gives multi effects and attracts towards it. This is an special series of paintings which makes your wall very beautiful and gives a royal touch. This painting is ready to hang, you would be proud to possess this unique painting that is a niche apart. We use only the most modern and efficient printing technology on our prints, with only the and inks and precision epson, roland and hp printers. This innovative hd printing technique results in durable and spectacular looking prints of the highest that last a lifetime. We print solely with top-notch 100% inks, to achieve brilliant and true colours. Due to their high level of uv resistance, our prints retain their beautiful colours for many years. Add colour and style to your living space with this digitally printed painting. Some are for pleasure and some for eternal blis

In [11]:
# Drop null values 
df.dropna(inplace=True)

##### 1) Expand Contractions

Contracted words are a common feature of natural language, especially in informal settings such as social media or messaging platforms.

Contractions are shortened versions of words or phrases that are formed by combining two words and replacing one or more letters with an apostrophe. Examples of contractions include:

"can't" (from "cannot")
<br>
"won't" (from "will not")
<br>
"it's" (from "it is" or "it has")
<br>
"shouldn't" (from "should not")
<br>
"didn't" (from "did not")
<br>
"you'll" (from "you will")

Expanding contractions before further cleaning keeps negations and auxiliary words intact once punctuation is stripped (otherwise "can't" would become "cant"). We will use the `contractions` library for this.

In [12]:
# Expand contractions word by word, then rejoin the words into a single string
df['No_contractions'] = df['Text'].apply(lambda x: ' '.join(contractions.fix(word) for word in x.split()))
df.head()

Unnamed: 0,Label,Text,No_contractions
0,Household,Paper Plane Design Framed Wall Hanging Motivat...,Paper Plane Design Framed Wall Hanging Motivat...
1,Household,"SAF 'Floral' Framed Painting (Wood, 30 inch x ...","SAF 'Floral' Framed Painting (Wood, 30 inch x ..."
2,Household,SAF 'UV Textured Modern Art Print Framed' Pain...,SAF 'UV Textured Modern Art Print Framed' Pain...
3,Household,"SAF Flower Print Framed Painting (Synthetic, 1...","SAF Flower Print Framed Painting (Synthetic, 1..."
4,Household,Incredible Gifts India Wooden Happy Birthday U...,Incredible Gifts India Wooden Happy Birthday U...
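
To make the idea concrete, here is a minimal dictionary-based sketch of contraction expansion. The `CONTRACTION_MAP` table and the `expand_contractions` helper are hypothetical names introduced for illustration; the `contractions` library ships a much larger mapping, and its internals may differ from this sketch.

```python
import re

# Hypothetical, deliberately tiny mapping; real libraries ship far larger tables.
# Ambiguous forms such as "it's" ("it is" or "it has") are left out on purpose.
CONTRACTION_MAP = {
    "can't": "cannot",
    "won't": "will not",
    "shouldn't": "should not",
    "didn't": "did not",
    "you'll": "you will",
}

# One alternation pattern, matched case-insensitively on word boundaries.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTION_MAP) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text):
    """Replace each known contraction with its (lower-case) expanded form."""
    return _PATTERN.sub(lambda m: CONTRACTION_MAP[m.group(0).lower()], text)


print(expand_contractions("It didn't break and it can't fail."))
```

Because the lookup is lower-cased, the expansion loses the original capitalisation; that is harmless here since the pipeline lower-cases all tokens in a later step.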


##### 2) Remove Punctuation

Punctuation is often removed to simplify the analysis and reduce the vocabulary size while preserving the meaningful content of the text.

We will use the `punctuation` constant from Python's built-in `string` module together with a regular expression.

In [13]:
# Build one character class that matches any ASCII punctuation mark and delete every match
punc = string.punctuation
df['No_punc'] = df['No_contractions'].apply(lambda x: re.sub('[%s]' % re.escape(punc), '', x))
df.head()

Unnamed: 0,Label,Text,No_contractions,No_punc
0,Household,Paper Plane Design Framed Wall Hanging Motivat...,Paper Plane Design Framed Wall Hanging Motivat...,Paper Plane Design Framed Wall Hanging Motivat...
1,Household,"SAF 'Floral' Framed Painting (Wood, 30 inch x ...","SAF 'Floral' Framed Painting (Wood, 30 inch x ...",SAF Floral Framed Painting Wood 30 inch x 10 i...
2,Household,SAF 'UV Textured Modern Art Print Framed' Pain...,SAF 'UV Textured Modern Art Print Framed' Pain...,SAF UV Textured Modern Art Print Framed Painti...
3,Household,"SAF Flower Print Framed Painting (Synthetic, 1...","SAF Flower Print Framed Painting (Synthetic, 1...",SAF Flower Print Framed Painting Synthetic 135...
4,Household,Incredible Gifts India Wooden Happy Birthday U...,Incredible Gifts India Wooden Happy Birthday U...,Incredible Gifts India Wooden Happy Birthday U...
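
As an alternative to the regular expression in the cell above, the same deletion can be sketched with `str.translate`, which needs no pattern escaping. The helper name `strip_punctuation` is an assumption made for this example.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_DELETE_PUNCT = str.maketrans("", "", string.punctuation)


def strip_punctuation(text):
    """Return text with all ASCII punctuation characters removed."""
    return text.translate(_DELETE_PUNCT)


sample = "SAF 'Floral' Framed Painting (Wood, 30 inch x 10 inch)"
print(strip_punctuation(sample))
print(strip_punctuation("top-notch 100% inks"))
```

Note that deleting punctuation fuses hyphenated words ("top-notch" becomes "topnotch"). If such words should stay separate, replacing punctuation with a space instead of an empty string is one option.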


##### 3) Tokenization

Tokenization is the process of breaking down text into individual words, phrases, or other meaningful elements, called tokens.

We will use NLTK's `word_tokenize()` function to create a new column named `Tokenized`.

In [14]:
nltk.download('punkt')
df['Tokenized'] = df['No_punc'].apply(word_tokenize)
df.head()

[nltk_data] Downloading package punkt to C:\Users\Shubham
[nltk_data]     Idekar\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Unnamed: 0,Label,Text,No_contractions,No_punc,Tokenized
0,Household,Paper Plane Design Framed Wall Hanging Motivat...,Paper Plane Design Framed Wall Hanging Motivat...,Paper Plane Design Framed Wall Hanging Motivat...,"[Paper, Plane, Design, Framed, Wall, Hanging, ..."
1,Household,"SAF 'Floral' Framed Painting (Wood, 30 inch x ...","SAF 'Floral' Framed Painting (Wood, 30 inch x ...",SAF Floral Framed Painting Wood 30 inch x 10 i...,"[SAF, Floral, Framed, Painting, Wood, 30, inch..."
2,Household,SAF 'UV Textured Modern Art Print Framed' Pain...,SAF 'UV Textured Modern Art Print Framed' Pain...,SAF UV Textured Modern Art Print Framed Painti...,"[SAF, UV, Textured, Modern, Art, Print, Framed..."
3,Household,"SAF Flower Print Framed Painting (Synthetic, 1...","SAF Flower Print Framed Painting (Synthetic, 1...",SAF Flower Print Framed Painting Synthetic 135...,"[SAF, Flower, Print, Framed, Painting, Synthet..."
4,Household,Incredible Gifts India Wooden Happy Birthday U...,Incredible Gifts India Wooden Happy Birthday U...,Incredible Gifts India Wooden Happy Birthday U...,"[Incredible, Gifts, India, Wooden, Happy, Birt..."
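
For intuition only, a very rough stand-in for a tokenizer can be written with a single regular expression. This is a simplified sketch, not what `word_tokenize` does internally: NLTK applies additional rules that this version ignores. The name `simple_tokenize` is hypothetical.

```python
import re


def simple_tokenize(text):
    """Split text into runs of letters, digits, and underscores."""
    return re.findall(r"\w+", text)


print(simple_tokenize("SAF Floral Framed Painting Wood 30 inch x 10 inch"))
```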


##### 4) Convert to Lower Case

All the alphabetic characters in a text are transformed to their corresponding lower case representation to reduce the vocabulary size and avoid duplication of words during text analysis.

In [15]:
df['Lower'] = df['Tokenized'].apply(lambda x: [text.lower() for text in x])
df.head()

Unnamed: 0,Label,Text,No_contractions,No_punc,Tokenized,Lower
0,Household,Paper Plane Design Framed Wall Hanging Motivat...,Paper Plane Design Framed Wall Hanging Motivat...,Paper Plane Design Framed Wall Hanging Motivat...,"[Paper, Plane, Design, Framed, Wall, Hanging, ...","[paper, plane, design, framed, wall, hanging, ..."
1,Household,"SAF 'Floral' Framed Painting (Wood, 30 inch x ...","SAF 'Floral' Framed Painting (Wood, 30 inch x ...",SAF Floral Framed Painting Wood 30 inch x 10 i...,"[SAF, Floral, Framed, Painting, Wood, 30, inch...","[saf, floral, framed, painting, wood, 30, inch..."
2,Household,SAF 'UV Textured Modern Art Print Framed' Pain...,SAF 'UV Textured Modern Art Print Framed' Pain...,SAF UV Textured Modern Art Print Framed Painti...,"[SAF, UV, Textured, Modern, Art, Print, Framed...","[saf, uv, textured, modern, art, print, framed..."
3,Household,"SAF Flower Print Framed Painting (Synthetic, 1...","SAF Flower Print Framed Painting (Synthetic, 1...",SAF Flower Print Framed Painting Synthetic 135...,"[SAF, Flower, Print, Framed, Painting, Synthet...","[saf, flower, print, framed, painting, synthet..."
4,Household,Incredible Gifts India Wooden Happy Birthday U...,Incredible Gifts India Wooden Happy Birthday U...,Incredible Gifts India Wooden Happy Birthday U...,"[Incredible, Gifts, India, Wooden, Happy, Birt...","[incredible, gifts, india, wooden, happy, birt..."


##### 5) Remove words containing digits

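Words that contain numeric characters (such as "30" or "8gb") are removed to reduce noise in the vocabulary.

We will eliminate these words using a regular expression. Note that the substitution replaces each such word with an empty string rather than dropping it, so empty tokens remain in the lists (visible as `, ,` in the output below).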

In [16]:
df['No_num'] = df['Lower'].apply(lambda x: [re.sub(r'\w*\d\w*','',text) for text in x])
df.head()

Unnamed: 0,Label,Text,No_contractions,No_punc,Tokenized,Lower,No_num
0,Household,Paper Plane Design Framed Wall Hanging Motivat...,Paper Plane Design Framed Wall Hanging Motivat...,Paper Plane Design Framed Wall Hanging Motivat...,"[Paper, Plane, Design, Framed, Wall, Hanging, ...","[paper, plane, design, framed, wall, hanging, ...","[paper, plane, design, framed, wall, hanging, ..."
1,Household,"SAF 'Floral' Framed Painting (Wood, 30 inch x ...","SAF 'Floral' Framed Painting (Wood, 30 inch x ...",SAF Floral Framed Painting Wood 30 inch x 10 i...,"[SAF, Floral, Framed, Painting, Wood, 30, inch...","[saf, floral, framed, painting, wood, 30, inch...","[saf, floral, framed, painting, wood, , inch, ..."
2,Household,SAF 'UV Textured Modern Art Print Framed' Pain...,SAF 'UV Textured Modern Art Print Framed' Pain...,SAF UV Textured Modern Art Print Framed Painti...,"[SAF, UV, Textured, Modern, Art, Print, Framed...","[saf, uv, textured, modern, art, print, framed...","[saf, uv, textured, modern, art, print, framed..."
3,Household,"SAF Flower Print Framed Painting (Synthetic, 1...","SAF Flower Print Framed Painting (Synthetic, 1...",SAF Flower Print Framed Painting Synthetic 135...,"[SAF, Flower, Print, Framed, Painting, Synthet...","[saf, flower, print, framed, painting, synthet...","[saf, flower, print, framed, painting, synthet..."
4,Household,Incredible Gifts India Wooden Happy Birthday U...,Incredible Gifts India Wooden Happy Birthday U...,Incredible Gifts India Wooden Happy Birthday U...,"[Incredible, Gifts, India, Wooden, Happy, Birt...","[incredible, gifts, india, wooden, happy, birt...","[incredible, gifts, india, wooden, happy, birt..."
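
The substitution in the cell above leaves empty strings behind (for example `wood, , inch`). A possible variant, sketched here under the assumption that such tokens should be dropped entirely, filters the list instead of rewriting each token. The helper name `drop_tokens_with_digits` is hypothetical.

```python
import re

_HAS_DIGIT = re.compile(r"\d")


def drop_tokens_with_digits(tokens):
    """Keep only the tokens that contain no numeric digit."""
    return [tok for tok in tokens if not _HAS_DIGIT.search(tok)]


tokens = ["saf", "floral", "wood", "30", "inch", "8gb", "x"]
print(drop_tokens_with_digits(tokens))
```

On a DataFrame this could be applied as `df['Lower'].apply(drop_tokens_with_digits)`, which would change the outputs shown in the rest of this notebook only by removing the empty entries.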


##### 6) Remove Stopwords
Stopword removal eliminates common words such as "the", "a", "an", and "in" to reduce the dimensionality of the data and to focus on the words that carry most of the meaning.

We will use the `stopwords` corpus from NLTK.

In [17]:
nltk.download('stopwords')
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
df['Stopwords_removed'] = df['No_num'].apply(lambda x: [word for word in x if word not in stop_words])
df.head()

[nltk_data] Downloading package stopwords to C:\Users\Shubham
[nltk_data]     Idekar\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Unnamed: 0,Label,Text,No_contractions,No_punc,Tokenized,Lower,No_num,Stopwords_removed
0,Household,Paper Plane Design Framed Wall Hanging Motivat...,Paper Plane Design Framed Wall Hanging Motivat...,Paper Plane Design Framed Wall Hanging Motivat...,"[Paper, Plane, Design, Framed, Wall, Hanging, ...","[paper, plane, design, framed, wall, hanging, ...","[paper, plane, design, framed, wall, hanging, ...","[paper, plane, design, framed, wall, hanging, ..."
1,Household,"SAF 'Floral' Framed Painting (Wood, 30 inch x ...","SAF 'Floral' Framed Painting (Wood, 30 inch x ...",SAF Floral Framed Painting Wood 30 inch x 10 i...,"[SAF, Floral, Framed, Painting, Wood, 30, inch...","[saf, floral, framed, painting, wood, 30, inch...","[saf, floral, framed, painting, wood, , inch, ...","[saf, floral, framed, painting, wood, , inch, ..."
2,Household,SAF 'UV Textured Modern Art Print Framed' Pain...,SAF 'UV Textured Modern Art Print Framed' Pain...,SAF UV Textured Modern Art Print Framed Painti...,"[SAF, UV, Textured, Modern, Art, Print, Framed...","[saf, uv, textured, modern, art, print, framed...","[saf, uv, textured, modern, art, print, framed...","[saf, uv, textured, modern, art, print, framed..."
3,Household,"SAF Flower Print Framed Painting (Synthetic, 1...","SAF Flower Print Framed Painting (Synthetic, 1...",SAF Flower Print Framed Painting Synthetic 135...,"[SAF, Flower, Print, Framed, Painting, Synthet...","[saf, flower, print, framed, painting, synthet...","[saf, flower, print, framed, painting, synthet...","[saf, flower, print, framed, painting, synthet..."
4,Household,Incredible Gifts India Wooden Happy Birthday U...,Incredible Gifts India Wooden Happy Birthday U...,Incredible Gifts India Wooden Happy Birthday U...,"[Incredible, Gifts, India, Wooden, Happy, Birt...","[incredible, gifts, india, wooden, happy, birt...","[incredible, gifts, india, wooden, happy, birt...","[incredible, gifts, india, wooden, happy, birt..."


##### 7) Stemming or Lemmatization

Stemming and lemmatization are two techniques used in NLP to normalize words by reducing them to their base or root form; stemming chops off the end of words, while lemmatization uses a vocabulary and morphological analysis to reduce words to their canonical form.

Stemming: The stem of "running" is "run". A stemming algorithm reduces "running" and "runs" to "run" by stripping suffixes. NLTK's Porter stemmer leaves "runner" unchanged, and the stems it produces need not be dictionary words (for example, "happy" becomes "happi").

Lemmatization: The lemma of "running" is "run" when the word is used as a verb. A lemmatizer maps "running" and "runs" to "run" but keeps "runner" as it is, because "runner" is itself a dictionary form (a noun).

Lemmatization depends on the part of speech (i.e. noun, verb, adverb, etc.) of each word, so a lemmatizer is usually paired with a part-of-speech tagger. This notebook uses stemming, which needs no tags.

There are various stemmers and one lemmatizer in NLTK, the most common being:

- Porter Stemmer from Porter (1980)
- WordNet Lemmatizer (a port of Morphy: https://wordnet.princeton.edu/man/morphy.7WN.html)

**Action** : We will apply NLTK’s Porter Stemmer within our trusty list comprehension.

In [18]:
stemmer = PorterStemmer()
df['Stem'] = df['Stopwords_removed'].apply(lambda x: [stemmer.stem(word) for word in x])
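
For intuition about what "chopping off the end of words" means, here is a toy suffix stripper. It is explicitly not the Porter algorithm: the `SUFFIXES` rule list and the `toy_stem` function are hypothetical, and the real stemmer applies many more rules and conditions.

```python
# Toy suffix-stripping stemmer; NOT the Porter algorithm, only an illustration.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical rule list, tried in this order


def toy_stem(word):
    """Strip at most one known suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word


print([toy_stem(w) for w in ["running", "runs", "runner", "hanging"]])
```

Even this crude version needs no dictionary, which is why stemming is fast; the price is that the results are not guaranteed to be real words.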

In [19]:
# Displaying the original text and processed data
df[['Text','Stem']]

Unnamed: 0,Text,Stem
0,Paper Plane Design Framed Wall Hanging Motivat...,"[paper, plane, design, frame, wall, hang, moti..."
1,"SAF 'Floral' Framed Painting (Wood, 30 inch x ...","[saf, floral, frame, paint, wood, , inch, x, ,..."
2,SAF 'UV Textured Modern Art Print Framed' Pain...,"[saf, uv, textur, modern, art, print, frame, p..."
3,"SAF Flower Print Framed Painting (Synthetic, 1...","[saf, flower, print, frame, paint, synthet, , ..."
4,Incredible Gifts India Wooden Happy Birthday U...,"[incred, gift, india, wooden, happi, birthday,..."
...,...,...
50420,Strontium MicroSD Class 10 8GB Memory Card (Bl...,"[strontium, microsd, class, , , memori, card, ..."
50421,CrossBeats Wave Waterproof Bluetooth Wireless ...,"[crossbeat, wave, waterproof, bluetooth, wirel..."
50422,Karbonn Titanium Wind W4 (White) Karbonn Titan...,"[karbonn, titanium, wind, , white, karbonn, ti..."
50423,"Samsung Guru FM Plus (SM-B110E/D, Black) Colou...","[samsung, guru, fm, plu, , black, colourblack,..."


In [20]:
df[['Label','Text','Stem']]

Unnamed: 0,Label,Text,Stem
0,Household,Paper Plane Design Framed Wall Hanging Motivat...,"[paper, plane, design, frame, wall, hang, moti..."
1,Household,"SAF 'Floral' Framed Painting (Wood, 30 inch x ...","[saf, floral, frame, paint, wood, , inch, x, ,..."
2,Household,SAF 'UV Textured Modern Art Print Framed' Pain...,"[saf, uv, textur, modern, art, print, frame, p..."
3,Household,"SAF Flower Print Framed Painting (Synthetic, 1...","[saf, flower, print, frame, paint, synthet, , ..."
4,Household,Incredible Gifts India Wooden Happy Birthday U...,"[incred, gift, india, wooden, happi, birthday,..."
...,...,...,...
50420,Electronics,Strontium MicroSD Class 10 8GB Memory Card (Bl...,"[strontium, microsd, class, , , memori, card, ..."
50421,Electronics,CrossBeats Wave Waterproof Bluetooth Wireless ...,"[crossbeat, wave, waterproof, bluetooth, wirel..."
50422,Electronics,Karbonn Titanium Wind W4 (White) Karbonn Titan...,"[karbonn, titanium, wind, , white, karbonn, ti..."
50423,Electronics,"Samsung Guru FM Plus (SM-B110E/D, Black) Colou...","[samsung, guru, fm, plu, , black, colourblack,..."


**Pre-processing** of text data is an essential step in natural language processing (NLP) that involves cleaning and transforming raw text into a format suitable for analysis by NLP algorithms. Techniques such as tokenization, converting to lower case, removing digits and punctuation, and eliminating stopwords reduce the dimensionality of the data and can improve model accuracy. Stemming and lemmatization further normalize the text by reducing words to their base or root form.

Overall, pre-processing plays a crucial role in preparing text data for various NLP tasks such as sentiment analysis, text classification, and language translation.

#### Baseline Model - Multinomial Naive Bayes

- Text Vectorization
<br>
In order to perform machine learning on text data, we must transform the documents into vector representations. In natural language processing, **text vectorization** is the process of converting words, sentences, or even larger units of text data to numerical vectors.

In [21]:
from sklearn.model_selection import train_test_split

# Split the dataset into a training set and a testing set while preserving the class distribution
X = df['Stem']  # Features
y = df['Label']  # Target variable

# Use stratified sampling to ensure the same class distribution in both sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

In [22]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Convert the lists of words into strings
X_train = X_train.apply(lambda word_list: ' '.join(word_list))
X_test = X_test.apply(lambda word_list: ' '.join(word_list))

tfidf_vectorizer = TfidfVectorizer()
X_train = tfidf_vectorizer.fit_transform(X_train)
X_test = tfidf_vectorizer.transform(X_test)
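
To show what the TF-IDF weights represent, the following pure-Python sketch computes them for a few tokenized documents. It assumes raw term counts, the smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalisation, which is the formula scikit-learn documents for its default settings. The function name `tfidf` and the sample documents are made up for illustration, and the sketch skips the tokenization and lower-casing that `TfidfVectorizer` also performs.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        weights = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalise so that every non-empty document has unit length.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


docs = [
    ["frame", "paint", "wall"],
    ["memori", "card", "class"],
    ["frame", "print", "wall"],
]
for vector in tfidf(docs):
    print({term: round(weight, 3) for term, weight in vector.items()})
```

In the first document, "paint" appears in only one document and therefore receives a higher weight than "frame" or "wall", which appear in two.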

In [23]:
from sklearn.naive_bayes import MultinomialNB

nb_classifier = MultinomialNB()
nb_classifier.fit(X_train, y_train)
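
The decision rule behind multinomial Naive Bayes can be sketched in a few lines of plain Python. This is a simplified illustration that assumes raw token counts and Laplace smoothing with `alpha=1.0`; the notebook itself feeds TF-IDF weights to scikit-learn's `MultinomialNB`. The names `train_nb` and `predict_nb` and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    vocab = {term for doc in docs for term in doc}
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        total = sum(term_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((term_counts[c][t] + alpha) / total) for t in vocab}
    return log_prior, log_like


def predict_nb(doc, log_prior, log_like):
    """Return the class with the highest log posterior score."""
    def score(c):
        # Terms that were never seen during training are ignored.
        return log_prior[c] + sum(log_like[c][t] for t in doc if t in log_like[c])
    return max(log_prior, key=score)


docs = [
    ["frame", "paint", "wall"],
    ["paint", "wall", "art"],
    ["memori", "card", "gb"],
    ["card", "bluetooth", "wireless"],
]
labels = ["Household", "Household", "Electronics", "Electronics"]
log_prior, log_like = train_nb(docs, labels)
print(predict_nb(["wall", "paint"], log_prior, log_like))
print(predict_nb(["card", "wireless"], log_prior, log_like))
```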

In [24]:
predictions = nb_classifier.predict(X_test)
print(predictions)

['Household' 'Household' 'Electronics' ... 'Household' 'Household'
 'Clothing & Accessories']


In [25]:
from sklearn.metrics import classification_report, accuracy_score

accuracy = accuracy_score(y_test, predictions)
report = classification_report(y_test, predictions)
print(f"Accuracy: {accuracy}")
print(report)

Accuracy: 0.9411998016856717
                        precision    recall  f1-score   support

                 Books       0.98      0.92      0.95      2364
Clothing & Accessories       0.98      0.94      0.96      1734
           Electronics       0.96      0.90      0.93      2124
             Household       0.90      0.98      0.94      3863

              accuracy                           0.94     10085
             macro avg       0.95      0.93      0.94     10085
          weighted avg       0.94      0.94      0.94     10085



##### Multinomial Naive Bayes Model Evaluation:
**Precision**:
<br>
- Books: 0.98 - Among the instances predicted as "Books," 98% are correctly classified.
- Clothing & Accessories: 0.98 - 98% of instances predicted as "Clothing & Accessories" are correct.
- Electronics: 0.96 - 96% of instances predicted as "Electronics" are correct.
- Household: 0.90 - 90% of instances predicted as "Household" are correct.

**Recall**:
<br>
- Books: 0.92 - The model correctly identifies 92% of the actual instances of "Books."
- Clothing & Accessories: 0.94 - 94% of instances of "Clothing & Accessories" are correctly identified.
- Electronics: 0.90 - The model captures 90% of instances of "Electronics."
- Household: 0.98 - An impressive 98% of instances of "Household" are correctly identified.

**F1-Score**:
<br>
- Books: 0.95 - The harmonic mean of precision and recall for "Books."
- Clothing & Accessories: 0.96 - The F1-score for "Clothing & Accessories."
- Electronics: 0.93 - The harmonic mean of precision and recall for "Electronics."
- Household: 0.94 - The F1-score for "Household."

**Support**:
<br>
- Books: 2364 - There are 2364 instances of "Books" in the test set.
- Clothing & Accessories: 1734 - There are 1734 instances of "Clothing & Accessories."
- Electronics: 2124 - There are 2124 instances of "Electronics."
- Household: 3863 - There are 3863 instances of "Household."

**Accuracy**:
<br>
The overall accuracy of the MultinomialNB model is approximately 94%, meaning it correctly classifies instances around 94% of the time.

**Macro Avg and Weighted Avg**:
<br>
- Macro Avg: 0.95 precision, 0.93 recall, 0.94 F1-score - The unweighted mean of each metric across all classes, so every class counts equally regardless of its size.
- Weighted Avg: 0.94 - Similar to macro avg, but considering the number of samples for each class, giving more weight to classes with more instances.
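
The difference between the two averages can be checked by hand with the recall column of the report above. The values below are the rounded figures copied from that report, so the results only approximate what scikit-learn computed from unrounded numbers.

```python
# Per-class recall and support, copied (rounded) from the classification report.
recall = {
    "Books": 0.92,
    "Clothing & Accessories": 0.94,
    "Electronics": 0.90,
    "Household": 0.98,
}
support = {
    "Books": 2364,
    "Clothing & Accessories": 1734,
    "Electronics": 2124,
    "Household": 3863,
}

# Macro average: every class counts equally.
macro_recall = sum(recall.values()) / len(recall)

# Weighted average: every class counts in proportion to its support.
total = sum(support.values())
weighted_recall = sum(recall[c] * support[c] for c in recall) / total

print(f"macro recall:    {macro_recall:.3f}")
print(f"weighted recall: {weighted_recall:.3f}")
```

The weighted figure is pulled upward by "Household", the largest class and the one with the highest recall, which is why it lands closer to the overall accuracy than the macro figure does.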

**Interpretation**:
<br>
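The MultinomialNB model shows good performance across all classes.
It is most precise on "Books" and "Clothing & Accessories" (0.98 each), while "Household" has the highest recall (0.98) but the lowest precision (0.90), which suggests that items from other categories are sometimes misclassified as "Household."
Because the weighted average accounts for class imbalance, it tracks the overall accuracy more closely than the macro average does.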

In [26]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, accuracy_score
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense, Dropout
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [24]:
# Split the data into training and testing sets, preserving the class distribution
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42, stratify=df['Label'])


# Advanced Model: Convolutional Neural Network (CNN)
# Encode the string labels as integers, as required by sparse_categorical_crossentropy
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(train_data['Label'])
y_test_encoded = label_encoder.transform(test_data['Label'])

# Build the vocabulary from the training tokens only, then map tokens to integer ids
tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_data['Stem'])
X_train_sequences = tokenizer.texts_to_sequences(train_data['Stem'])
X_test_sequences = tokenizer.texts_to_sequences(test_data['Stem'])

# Pad every sequence with trailing zeros up to the length of the longest sequence
max_sequence_length = max(max(len(seq) for seq in X_train_sequences), max(len(seq) for seq in X_test_sequences))
X_train_padded = pad_sequences(X_train_sequences, maxlen=max_sequence_length, padding='post')
X_test_padded = pad_sequences(X_test_sequences, maxlen=max_sequence_length, padding='post')

embedding_dim = 50
vocab_size = len(tokenizer.word_index) + 1  # +1 because index 0 is reserved for padding

# Embedding -> 1D convolution -> global max pooling -> dense classifier
cnn_model = Sequential()
cnn_model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_sequence_length))
cnn_model.add(Conv1D(filters=128, kernel_size=5, activation='relu'))
cnn_model.add(GlobalMaxPooling1D())
cnn_model.add(Dense(64, activation='relu'))
cnn_model.add(Dropout(0.5))
cnn_model.add(Dense(len(label_encoder.classes_), activation='softmax'))

# Train for 5 epochs, holding out 20% of the training data for validation
cnn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
cnn_model.fit(X_train_padded, y_train_encoded, epochs=5, batch_size=64, validation_split=0.2)

# evaluate() returns [loss, accuracy]; keep the accuracy
cnn_accuracy = cnn_model.evaluate(X_test_padded, y_test_encoded, verbose=0)[1]
print("\nAdvanced Model (Convolutional Neural Network) Results:")
print(f"Accuracy: {cnn_accuracy}")


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

Advanced Model (Convolutional Neural Network) Results:
Accuracy: 0.9734258651733398


In [25]:
from sklearn.metrics import classification_report

cnn_predictions = cnn_model.predict(X_test_padded)
cnn_predicted_labels = label_encoder.inverse_transform(cnn_predictions.argmax(axis=1))
cnn_true_labels = test_data['Label']

# Convert labels to numerical values for scikit-learn metrics
cnn_true_encoded = label_encoder.transform(cnn_true_labels)

# Generate classification report
cnn_report = classification_report(cnn_true_encoded, cnn_predictions.argmax(axis=1), target_names=label_encoder.classes_)

# Print the classification report
print("Classification Report for CNN Model:")
print(cnn_report)

Classification Report for CNN Model:
                        precision    recall  f1-score   support

                 Books       0.97      0.97      0.97      2364
Clothing & Accessories       0.98      0.98      0.98      1734
           Electronics       0.97      0.97      0.97      2124
             Household       0.97      0.97      0.97      3863

              accuracy                           0.97     10085
             macro avg       0.97      0.97      0.97     10085
          weighted avg       0.97      0.97      0.97     10085



##### CNN Model Evaluation: 
- **Precision**:
<br>
Precision is the ratio of correctly predicted positive observations to the total predicted positives.
For each class, precision is calculated as TP / (TP + FP), where TP is the number of true positives and FP is the number of false positives.
In this report, precision values range from 0.97 to 0.98, indicating high precision for all classes.
<br>
- **Recall**:
<br>
Recall (also called sensitivity or true positive rate) is the ratio of correctly predicted positive observations to all observations in the actual class.
For each class, recall is calculated as TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.
In this report, recall values range from 0.97 to 0.98, indicating high recall for all classes.
<br>
- **F1-Score**:
<br>
F1-score is the harmonic mean of precision and recall. It accounts for both false positives and false negatives.
For each class, F1-score is calculated as 2 * (precision * recall) / (precision + recall).
In this report, F1-score values range from 0.97 to 0.98, indicating a good balance between precision and recall for all classes.
<br>
- **Support**:
<br>
Support is the number of actual occurrences of the class in the specified dataset.
It gives you an idea of how many samples in your test set belong to each class.
<br>
- **Overall Accuracy**:
<br>
The accuracy is the overall correctly predicted instances divided by the total instances.
Here, the overall accuracy is around 97%, compared with about 94% for the Multinomial Naive Bayes baseline.
<br>
- **Macro Avg and Weighted Avg**:
<br>
Macro avg is the unweighted mean of precision, recall, and F1-score across all classes, so every class counts equally regardless of its size.
Weighted avg averages the same metrics but weights each class by its support, giving more influence to classes with more instances.
<br>
- **Interpretation**:
<br>
The high precision, recall, and F1-score values for each class and the overall accuracy indicate that our CNN model is performing well on the test set.
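
The formulas above can be checked on a small example. The following sketch computes precision, recall, and F1-score for one class from lists of true and predicted labels; the function name `per_class_metrics` and the six sample labels are invented for illustration and are not taken from the dataset.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = ["Books", "Books", "Household", "Electronics", "Household", "Books"]
y_pred = ["Books", "Household", "Household", "Electronics", "Household", "Books"]

for name in ("Books", "Household", "Electronics"):
    p, r, f = per_class_metrics(y_true, y_pred, name)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy example one "Books" item is predicted as "Household", which lowers the recall of "Books" and the precision of "Household" while leaving "Electronics" untouched.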