# E-commerce Text Classification

### Data Description:
This is a classification dataset of e-commerce text covering 4 categories, "Electronics", "Household", "Books" and "Clothing & Accessories", which together cover almost 80% of any e-commerce website.
The dataset is in ".csv" format with two columns: the first column is the class name and the second is the data point of that class. Each data point is the product name and description taken from the e-commerce website.

### Dataset:
The dataset has the following features:
1. Data Set Characteristics: Multivariate
2. Number of Instances: 50424
3. Number of classes: 4

### Objective:
To implement the techniques learnt as a part of the course.

#### Import the libraries, load dataset. (3 Marks)

In [2]:
# install and import necessary libraries.

!pip install contractions

import re, string, unicodedata                          # Import Regex, string and unicodedata.
import contractions                                     # Import contractions library.
from bs4 import BeautifulSoup                           # Import BeautifulSoup.

import numpy as np                                      # Import numpy.
import pandas as pd                                     # Import pandas.
import nltk                                             # Import Natural Language Tool-Kit.

nltk.download('stopwords')                              # Download Stopwords.
nltk.download('punkt')
nltk.download('wordnet')

from nltk.corpus import stopwords                       # Import stopwords.
from nltk.tokenize import word_tokenize, sent_tokenize  # Import Tokenizer.
from nltk.stem.wordnet import WordNetLemmatizer         # Import Lemmatizer.



[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Prachi\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Prachi\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Prachi\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


#### Exploratory Data Analysis and Understanding of data-columns: (12 Marks)
1. Print Shape of data.
2. Print data description and info about the data. Comment about the result.
3. Check the data-type of Text column’s first value.
4. Check for null values and remove the rows in which null values are present.
5. Check for unique labels in the ‘Label’ column.
6. Save the unique labels in the list named ‘labels’.
7. Print first 5 rows of data.

In [5]:
# Loading data into pandas dataframe
data = pd.read_csv('ecommerceDataset.csv')

In [12]:
# Printing Shape of data.
data.shape

(50428, 204)

In [9]:
# Printing the info 
data.info()

# info() reports 204 columns, although the dataset should have only 2 (Label and Text).

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50428 entries, 0 to 50427
Columns: 204 entries, Label to Unnamed: 203
dtypes: object(204)
memory usage: 78.5+ MB
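One way to investigate unexpected extra columns is to count the fields in each raw row. The sketch below does this with Python's `csv` module on a small in-memory sample. The sample rows and the helper name `field_counts` are made up for illustration and are not taken from `ecommerceDataset.csv`, so it shows a diagnostic approach rather than the confirmed cause of the 204 columns.

```python
import csv
import io
from collections import Counter

def field_counts(csv_text):
    """Return a Counter mapping 'number of fields' -> 'number of rows'."""
    reader = csv.reader(io.StringIO(csv_text))
    return Counter(len(row) for row in reader)

# Hypothetical sample: the last row has unquoted commas in its text,
# so the parser splits it into more than two fields.
sample = (
    'Label,Text\n'
    'Books,"A novel, in three parts"\n'
    'Household,Steel rack, black, 7 hooks\n'
)

counts = field_counts(sample)
print(counts)
```

Rows whose field count differs from the header's are the ones worth inspecting by hand.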


In [19]:
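# Keep only the two real columns; .copy() avoids a SettingWithCopyWarning on later in-place edits
data = data[['Label', 'Text']].copy()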

In [20]:
# Printing the description
data.describe().transpose()

Unnamed: 0,count,unique,top,freq
Label,50428,7,Household,19312
Text,50427,27803,"Think & Grow Rich About the Author NAPOLEON HILL, born in Pound, Southwest Virginia in 1883, was a very successful American author in the area of the new thought movement—one of the earliest producers of the modern genre of personal-success literature. He is widely considered to be one of the great writers on success. The turning point in Hill’s life occurred in the year 1908 when he interviewed the industrialist Andrew Carnegie—one of the most powerful men in the world at that time, as part of an assignment—an interview which ultimately led to the publication of Think and Grow Rich, one of his best-selling books of all time. the book examines the power of personal beliefs and the role they play in personal success. Hill, who had even served as the advisor to President Franklin D. Roosevelt from 1933-36, passed away at the age of 87.",30


In [21]:
# Printing the data types of the columns
data.dtypes

# Both columns, including Text, have dtype object (strings)

Label    object
Text     object
dtype: object

In [27]:
# Checking for any null values in the data set.
data.isnull().values.any()

True

In [28]:
data.isnull().sum(axis=0)
# Text column has one null value

Label    0
Text     1
dtype: int64

In [29]:
# Dropping the null value from Text column
data.dropna(subset=['Text'], inplace=True)

In [34]:
data.isnull().sum(axis=0)

Label    0
Text     0
dtype: int64

In [30]:
# Show full column contents instead of truncating long strings.
pd.set_option('display.max_colwidth', None)

In [32]:
# Checking for unique labels
data['Label'].unique()

array(['Household', 'Clothing & Accessories', 'Electronics', 'Books',
       'cularists have little personal experience of religion and can be strikingly ignorant on religious \xa0subjects. \xa0There’s also a reflexive hostility to institutional religion',
       'arate us. We will probably be married another ten years.Elizabeth Taylor',
       ' our lives in our own hands. 2Certainty Is an IllusionNothing will ever separate us. We will probably be married another ten years.Elizabeth Taylor'],
      dtype=object)

In [33]:
data['Label'].value_counts()

Household                                                                                                                                                                     19312
Books                                                                                                                                                                         11820
Electronics                                                                                                                                                                   10621
Clothing & Accessories                                                                                                                                                         8670
arate us. We will probably be married another ten years.Elizabeth Taylor                                                                                                          2
cularists have little personal experience of religion and can be strikingly ignorant on religious  s
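
The unique values above include text fragments that are not real categories. A minimal sketch of how the four expected categories could be stored in `labels` and the stray rows dropped, using a small made-up DataFrame in place of `data` (the name `toy` and its rows are assumptions for illustration):

```python
import pandas as pd

# The four categories named in the data description.
labels = ['Household', 'Books', 'Electronics', 'Clothing & Accessories']

# Made-up stand-in for `data`; the last row mimics a text fragment in 'Label'.
toy = pd.DataFrame({
    'Label': ['Household', 'Books', 'Electronics', 'ten years.Elizabeth Taylor'],
    'Text': ['coat rack', 'think and grow rich', 'can opener', 'stray text'],
})

# Keep only rows whose label is one of the expected categories.
clean = toy[toy['Label'].isin(labels)].reset_index(drop=True)

print(clean['Label'].unique())
```

Applying the same `isin` filter to `data` would leave only the four classes that the confusion matrix later assumes.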

In [35]:
# Printing the first 5 rows of data.
data.head(5)

Unnamed: 0,Label,Text
0,Household,"Styleys Wrought Iron Coat Rack Hanger Creative Fashion Bedroom for Hanging Clothes Shelves, Wrought Iron Racks Standing Coat Rack (Black) Color Name:Black Styleys Coat Stand is great for homes and rooms with limited space, as having one standing rack takes up less space compared to drawers and cupboards. Easy for guests to keep their items, especially bags and scarves, when visiting, as they can always keep an eye on it and easily grab it when they're leaving. Makes a smart décor piece for your home or room as occupied stands can show off your stylish handbags, accessories, and hangman achievement medals. Dimensions: 45cm x 31cm x 175cm Weight: 2.4kg Material: steel Colour: white, black, or pink No. of hook: 7 + 3 (straight pegs) Suitable to hang coats, clothes, scarves, handbags, hats, and accessories"
1,Household,"Cuisinart CCO-50BKN Deluxe Electric Can Opener, Black Size:None | Color Name:Black Style, convenience, and power come together in the Cuisinart electric can open. With chrome accents and elegant contours, it fits nicely with other modern countertop appliances. The easy single-touc"
2,Household,Anchor Penta 6 Amp 1 -Way Switch (White) - Pack of 20 Anchor Penta 6 Amp 1 -Way Switch (White)- Pack of 20 comes with Spark Shield - Concealed Terminals - Silver Cadmium Contacts - IP 20 Protection - Captive Screw.
3,Clothing & Accessories,"Proline Men's Track Jacket Proline Woven, 100% Polyester High neck Wind Cheater with colour Blocked Detail"
4,Household,"Chef's Garage 2 Slot Edge Grip Kitchen Knife Sharpener, Helps to Sharpen The Dull Knives (Black) Chef's Garage Mini Knife sharpener helps to sharpen your dull knives. This tiny knife sharpener has 2 stage sharpening system. First stage is for damaged and dull knives, it will sharpen the knife on the coarse slot. The coarse slot is made of carbide. Second stage is fine slot, once you have honed the knife on coarse slot it will helps to give the finishing touch. The fine slot is made of ceramic for fine sharpening. It’s give a quick touch up on already sharper knives or for finishing off knives that have already passed through the coarse slot.Also it comes with one of the unique edge grip feature to sharpen on the edge of the table or counter top. Key Features: Very easy to use. Non-slip base for added stability and control Carbide and ceramic blades on these sharpening slots are long lasting. Strong and hard with flexibility of an edge grip feature for bigger knives Small in size 9.50 x 5.0 x 4.50 cms. Weights less - 70 grams Instructions:1. Insert the blade into the slot at a 90-degree angle to the mini sharpener.2. Place the edge in coarse slot (Black in color)3. Pull the knife straight back towards you 2 to 3 times while applying a light pressure.4. Place the blade in fine slot (White in color)5. Pull the knife straight back towards you 5 to 6 times while applying a heavy pressure.6. If blade is still dull repeat these steps until blade is sharp."


#### Text pre-processing: Data preparation. (15 Marks)
1. HTML tag removal.
2. Remove the numbers.
3. Tokenization.
4. Removal of Special Characters and Punctuations.
5. Conversion to lowercase.
6. Lemmatization or stemming.
7. Join the words in the list to convert back to text string in the dataframe. (So that each row contains the data in text format.)
8. Print first 5 rows of data after pre-processing.

In [36]:
# Removal of html tags
def strip_html(text):
    soup = BeautifulSoup(text, "html.parser")
    return soup.get_text()

data['Text'] = data['Text'].apply(lambda x: strip_html(x))
data.head()

Unnamed: 0,Label,Text
0,Household,"Styleys Wrought Iron Coat Rack Hanger Creative Fashion Bedroom for Hanging Clothes Shelves, Wrought Iron Racks Standing Coat Rack (Black) Color Name:Black Styleys Coat Stand is great for homes and rooms with limited space, as having one standing rack takes up less space compared to drawers and cupboards. Easy for guests to keep their items, especially bags and scarves, when visiting, as they can always keep an eye on it and easily grab it when they're leaving. Makes a smart décor piece for your home or room as occupied stands can show off your stylish handbags, accessories, and hangman achievement medals. Dimensions: 45cm x 31cm x 175cm Weight: 2.4kg Material: steel Colour: white, black, or pink No. of hook: 7 + 3 (straight pegs) Suitable to hang coats, clothes, scarves, handbags, hats, and accessories"
1,Household,"Cuisinart CCO-50BKN Deluxe Electric Can Opener, Black Size:None | Color Name:Black Style, convenience, and power come together in the Cuisinart electric can open. With chrome accents and elegant contours, it fits nicely with other modern countertop appliances. The easy single-touc"
2,Household,Anchor Penta 6 Amp 1 -Way Switch (White) - Pack of 20 Anchor Penta 6 Amp 1 -Way Switch (White)- Pack of 20 comes with Spark Shield - Concealed Terminals - Silver Cadmium Contacts - IP 20 Protection - Captive Screw.
3,Clothing & Accessories,"Proline Men's Track Jacket Proline Woven, 100% Polyester High neck Wind Cheater with colour Blocked Detail"
4,Household,"Chef's Garage 2 Slot Edge Grip Kitchen Knife Sharpener, Helps to Sharpen The Dull Knives (Black) Chef's Garage Mini Knife sharpener helps to sharpen your dull knives. This tiny knife sharpener has 2 stage sharpening system. First stage is for damaged and dull knives, it will sharpen the knife on the coarse slot. The coarse slot is made of carbide. Second stage is fine slot, once you have honed the knife on coarse slot it will helps to give the finishing touch. The fine slot is made of ceramic for fine sharpening. It’s give a quick touch up on already sharper knives or for finishing off knives that have already passed through the coarse slot.Also it comes with one of the unique edge grip feature to sharpen on the edge of the table or counter top. Key Features: Very easy to use. Non-slip base for added stability and control Carbide and ceramic blades on these sharpening slots are long lasting. Strong and hard with flexibility of an edge grip feature for bigger knives Small in size 9.50 x 5.0 x 4.50 cms. Weights less - 70 grams Instructions:1. Insert the blade into the slot at a 90-degree angle to the mini sharpener.2. Place the edge in coarse slot (Black in color)3. Pull the knife straight back towards you 2 to 3 times while applying a light pressure.4. Place the blade in fine slot (White in color)5. Pull the knife straight back towards you 5 to 6 times while applying a heavy pressure.6. If blade is still dull repeat these steps until blade is sharp."


In [38]:
# Removal of the numbers
def remove_numbers(text):
    # Delete every run of digits, including digits inside tokens such as model codes
    return re.sub(r'\d+', '', text)

data['Text'] = data['Text'].apply(remove_numbers)
data.head()

Unnamed: 0,Label,Text
0,Household,"Styleys Wrought Iron Coat Rack Hanger Creative Fashion Bedroom for Hanging Clothes Shelves, Wrought Iron Racks Standing Coat Rack (Black) Color Name:Black Styleys Coat Stand is great for homes and rooms with limited space, as having one standing rack takes up less space compared to drawers and cupboards. Easy for guests to keep their items, especially bags and scarves, when visiting, as they can always keep an eye on it and easily grab it when they're leaving. Makes a smart décor piece for your home or room as occupied stands can show off your stylish handbags, accessories, and hangman achievement medals. Dimensions: cm x cm x cm Weight: .kg Material: steel Colour: white, black, or pink No. of hook: + (straight pegs) Suitable to hang coats, clothes, scarves, handbags, hats, and accessories"
1,Household,"Cuisinart CCO-BKN Deluxe Electric Can Opener, Black Size:None | Color Name:Black Style, convenience, and power come together in the Cuisinart electric can open. With chrome accents and elegant contours, it fits nicely with other modern countertop appliances. The easy single-touc"
2,Household,Anchor Penta Amp -Way Switch (White) - Pack of Anchor Penta Amp -Way Switch (White)- Pack of comes with Spark Shield - Concealed Terminals - Silver Cadmium Contacts - IP Protection - Captive Screw.
3,Clothing & Accessories,"Proline Men's Track Jacket Proline Woven, % Polyester High neck Wind Cheater with colour Blocked Detail"
4,Household,"Chef's Garage Slot Edge Grip Kitchen Knife Sharpener, Helps to Sharpen The Dull Knives (Black) Chef's Garage Mini Knife sharpener helps to sharpen your dull knives. This tiny knife sharpener has stage sharpening system. First stage is for damaged and dull knives, it will sharpen the knife on the coarse slot. The coarse slot is made of carbide. Second stage is fine slot, once you have honed the knife on coarse slot it will helps to give the finishing touch. The fine slot is made of ceramic for fine sharpening. It’s give a quick touch up on already sharper knives or for finishing off knives that have already passed through the coarse slot.Also it comes with one of the unique edge grip feature to sharpen on the edge of the table or counter top. Key Features: Very easy to use. Non-slip base for added stability and control Carbide and ceramic blades on these sharpening slots are long lasting. Strong and hard with flexibility of an edge grip feature for bigger knives Small in size . x . x . cms. Weights less - grams Instructions:. Insert the blade into the slot at a -degree angle to the mini sharpener.. Place the edge in coarse slot (Black in color). Pull the knife straight back towards you to times while applying a light pressure.. Place the blade in fine slot (White in color). Pull the knife straight back towards you to times while applying a heavy pressure.. If blade is still dull repeat these steps until blade is sharp."
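
As the output shows, deleting every digit also alters tokens such as `CCO-50BKN` and `2.4kg`. If such tokens are worth keeping, one alternative is to remove only numbers that stand alone. The sketch below is an assumption about what could be useful here, not part of the assignment; the name `remove_standalone_numbers` and the sample string are hypothetical.

```python
import re

# Matches integers or decimals that stand alone: not glued to letters,
# and not part of a larger decimal such as the "4" in "2.4kg".
STANDALONE_NUMBER = re.compile(r'(?<![\w.])\d+(?:\.\d+)?(?!\w|\.\d)')

def remove_standalone_numbers(text):
    """Drop free-standing numbers, then collapse the leftover double spaces."""
    text = STANDALONE_NUMBER.sub('', text)
    return re.sub(r'\s{2,}', ' ', text).strip()

cleaned = remove_standalone_numbers('Cuisinart CCO-50BKN Pack of 20 Weight: 2.4kg')
print(cleaned)
```

Here only the free-standing `20` is removed, while the model code and the measurement keep their digits.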


In [39]:
# Tokenization of data
data['Text'] = data['Text'].apply(word_tokenize)

In [40]:
# Data after it has been tokenized
data.head()                                                                   

Unnamed: 0,Label,Text
0,Household,"[Styleys, Wrought, Iron, Coat, Rack, Hanger, Creative, Fashion, Bedroom, for, Hanging, Clothes, Shelves, ,, Wrought, Iron, Racks, Standing, Coat, Rack, (, Black, ), Color, Name, :, Black, Styleys, Coat, Stand, is, great, for, homes, and, rooms, with, limited, space, ,, as, having, one, standing, rack, takes, up, less, space, compared, to, drawers, and, cupboards, ., Easy, for, guests, to, keep, their, items, ,, especially, bags, and, scarves, ,, when, visiting, ,, as, they, can, always, keep, an, eye, on, it, and, easily, grab, it, when, they, 're, leaving, ., Makes, a, smart, décor, piece, for, your, home, or, room, as, ...]"
1,Household,"[Cuisinart, CCO-BKN, Deluxe, Electric, Can, Opener, ,, Black, Size, :, None, |, Color, Name, :, Black, Style, ,, convenience, ,, and, power, come, together, in, the, Cuisinart, electric, can, open, ., With, chrome, accents, and, elegant, contours, ,, it, fits, nicely, with, other, modern, countertop, appliances, ., The, easy, single-touc]"
2,Household,"[Anchor, Penta, Amp, -Way, Switch, (, White, ), -, Pack, of, Anchor, Penta, Amp, -Way, Switch, (, White, ), -, Pack, of, comes, with, Spark, Shield, -, Concealed, Terminals, -, Silver, Cadmium, Contacts, -, IP, Protection, -, Captive, Screw, .]"
3,Clothing & Accessories,"[Proline, Men, 's, Track, Jacket, Proline, Woven, ,, %, Polyester, High, neck, Wind, Cheater, with, colour, Blocked, Detail]"
4,Household,"[Chef, 's, Garage, Slot, Edge, Grip, Kitchen, Knife, Sharpener, ,, Helps, to, Sharpen, The, Dull, Knives, (, Black, ), Chef, 's, Garage, Mini, Knife, sharpener, helps, to, sharpen, your, dull, knives, ., This, tiny, knife, sharpener, has, stage, sharpening, system, ., First, stage, is, for, damaged, and, dull, knives, ,, it, will, sharpen, the, knife, on, the, coarse, slot, ., The, coarse, slot, is, made, of, carbide, ., Second, stage, is, fine, slot, ,, once, you, have, honed, the, knife, on, coarse, slot, it, will, helps, to, give, the, finishing, touch, ., The, fine, slot, is, made, of, ceramic, for, ...]"


In [48]:
# Removal of Special Characters and Punctuations
def remove_punctuation(text):
    new_text = []                        
    for word in text:
        new_word = re.sub(r'[^\w\s]', '', word)
        if new_word != '':
            new_text.append(new_word)    
    return new_text

data['Text'] = data['Text'].apply(lambda x: remove_punctuation(x))
data.head()

Unnamed: 0,Label,Text
0,Household,"[Styleys, Wrought, Iron, Coat, Rack, Hanger, Creative, Fashion, Bedroom, for, Hanging, Clothes, Shelves, Wrought, Iron, Racks, Standing, Coat, Rack, Black, Color, Name, Black, Styleys, Coat, Stand, is, great, for, homes, and, rooms, with, limited, space, as, having, one, standing, rack, takes, up, less, space, compared, to, drawers, and, cupboards, Easy, for, guests, to, keep, their, items, especially, bags, and, scarves, when, visiting, as, they, can, always, keep, an, eye, on, it, and, easily, grab, it, when, they, re, leaving, Makes, a, smart, décor, piece, for, your, home, or, room, as, occupied, stands, can, show, off, your, stylish, handbags, accessories, and, ...]"
1,Household,"[Cuisinart, CCOBKN, Deluxe, Electric, Can, Opener, Black, Size, None, Color, Name, Black, Style, convenience, and, power, come, together, in, the, Cuisinart, electric, can, open, With, chrome, accents, and, elegant, contours, it, fits, nicely, with, other, modern, countertop, appliances, The, easy, singletouc]"
2,Household,"[Anchor, Penta, Amp, Way, Switch, White, Pack, of, Anchor, Penta, Amp, Way, Switch, White, Pack, of, comes, with, Spark, Shield, Concealed, Terminals, Silver, Cadmium, Contacts, IP, Protection, Captive, Screw]"
3,Clothing & Accessories,"[Proline, Men, s, Track, Jacket, Proline, Woven, Polyester, High, neck, Wind, Cheater, with, colour, Blocked, Detail]"
4,Household,"[Chef, s, Garage, Slot, Edge, Grip, Kitchen, Knife, Sharpener, Helps, to, Sharpen, The, Dull, Knives, Black, Chef, s, Garage, Mini, Knife, sharpener, helps, to, sharpen, your, dull, knives, This, tiny, knife, sharpener, has, stage, sharpening, system, First, stage, is, for, damaged, and, dull, knives, it, will, sharpen, the, knife, on, the, coarse, slot, The, coarse, slot, is, made, of, carbide, Second, stage, is, fine, slot, once, you, have, honed, the, knife, on, coarse, slot, it, will, helps, to, give, the, finishing, touch, The, fine, slot, is, made, of, ceramic, for, fine, sharpening, It, s, give, a, quick, touch, up, on, ...]"


In [55]:
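# Helper for stripping special characters from a raw string. It is not applied here:
# 'Text' already holds token lists cleaned by remove_punctuation above.
def remove_special_characters(text, remove_digits=False):
    # Use A-Z, not A-z: the range A-z also matches [ \ ] ^ _ and `
    pattern = r'[^a-zA-Z0-9\s]' if not remove_digits else r'[^a-zA-Z\s]'
    return re.sub(pattern, '', text)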

data.head()

Unnamed: 0,Label,Text
0,Household,"[Styleys, Wrought, Iron, Coat, Rack, Hanger, Creative, Fashion, Bedroom, for, Hanging, Clothes, Shelves, Wrought, Iron, Racks, Standing, Coat, Rack, Black, Color, Name, Black, Styleys, Coat, Stand, is, great, for, homes, and, rooms, with, limited, space, as, having, one, standing, rack, takes, up, less, space, compared, to, drawers, and, cupboards, Easy, for, guests, to, keep, their, items, especially, bags, and, scarves, when, visiting, as, they, can, always, keep, an, eye, on, it, and, easily, grab, it, when, they, re, leaving, Makes, a, smart, décor, piece, for, your, home, or, room, as, occupied, stands, can, show, off, your, stylish, handbags, accessories, and, ...]"
1,Household,"[Cuisinart, CCOBKN, Deluxe, Electric, Can, Opener, Black, Size, None, Color, Name, Black, Style, convenience, and, power, come, together, in, the, Cuisinart, electric, can, open, With, chrome, accents, and, elegant, contours, it, fits, nicely, with, other, modern, countertop, appliances, The, easy, singletouc]"
2,Household,"[Anchor, Penta, Amp, Way, Switch, White, Pack, of, Anchor, Penta, Amp, Way, Switch, White, Pack, of, comes, with, Spark, Shield, Concealed, Terminals, Silver, Cadmium, Contacts, IP, Protection, Captive, Screw]"
3,Clothing & Accessories,"[Proline, Men, s, Track, Jacket, Proline, Woven, Polyester, High, neck, Wind, Cheater, with, colour, Blocked, Detail]"
4,Household,"[Chef, s, Garage, Slot, Edge, Grip, Kitchen, Knife, Sharpener, Helps, to, Sharpen, The, Dull, Knives, Black, Chef, s, Garage, Mini, Knife, sharpener, helps, to, sharpen, your, dull, knives, This, tiny, knife, sharpener, has, stage, sharpening, system, First, stage, is, for, damaged, and, dull, knives, it, will, sharpen, the, knife, on, the, coarse, slot, The, coarse, slot, is, made, of, carbide, Second, stage, is, fine, slot, once, you, have, honed, the, knife, on, coarse, slot, it, will, helps, to, give, the, finishing, touch, The, fine, slot, is, made, of, ceramic, for, fine, sharpening, It, s, give, a, quick, touch, up, on, ...]"
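
A character-class range written as `A-z` rather than `A-Z` spans ASCII codes 65 to 122, which includes the six characters between `Z` and `a` (`[`, `\`, `]`, `^`, `_` and the backtick). The short sketch below compares the two patterns on a made-up string; the variable names are illustrative only.

```python
import re

buggy = r'[^a-zA-z0-9\s]'   # A-z spans ASCII 65..122, so [ \ ] ^ _ ` are kept
fixed = r'[^a-zA-Z0-9\s]'   # A-Z keeps letters only

sample = 'size_[L] ^ 50% off!'

buggy_out = re.sub(buggy, '', sample)
fixed_out = re.sub(fixed, '', sample)

print(repr(buggy_out))
print(repr(fixed_out))
```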


In [56]:
# Conversion to lowercase
def to_lowercase(text):
    new_text = []                        
    for word in text:
        new_word = word.lower()           
        new_text.append(new_word)        
    return new_text

data['Text'] = data['Text'].apply(lambda x: to_lowercase(x))
data.head()

Unnamed: 0,Label,Text
0,Household,"[styleys, wrought, iron, coat, rack, hanger, creative, fashion, bedroom, for, hanging, clothes, shelves, wrought, iron, racks, standing, coat, rack, black, color, name, black, styleys, coat, stand, is, great, for, homes, and, rooms, with, limited, space, as, having, one, standing, rack, takes, up, less, space, compared, to, drawers, and, cupboards, easy, for, guests, to, keep, their, items, especially, bags, and, scarves, when, visiting, as, they, can, always, keep, an, eye, on, it, and, easily, grab, it, when, they, re, leaving, makes, a, smart, décor, piece, for, your, home, or, room, as, occupied, stands, can, show, off, your, stylish, handbags, accessories, and, ...]"
1,Household,"[cuisinart, ccobkn, deluxe, electric, can, opener, black, size, none, color, name, black, style, convenience, and, power, come, together, in, the, cuisinart, electric, can, open, with, chrome, accents, and, elegant, contours, it, fits, nicely, with, other, modern, countertop, appliances, the, easy, singletouc]"
2,Household,"[anchor, penta, amp, way, switch, white, pack, of, anchor, penta, amp, way, switch, white, pack, of, comes, with, spark, shield, concealed, terminals, silver, cadmium, contacts, ip, protection, captive, screw]"
3,Clothing & Accessories,"[proline, men, s, track, jacket, proline, woven, polyester, high, neck, wind, cheater, with, colour, blocked, detail]"
4,Household,"[chef, s, garage, slot, edge, grip, kitchen, knife, sharpener, helps, to, sharpen, the, dull, knives, black, chef, s, garage, mini, knife, sharpener, helps, to, sharpen, your, dull, knives, this, tiny, knife, sharpener, has, stage, sharpening, system, first, stage, is, for, damaged, and, dull, knives, it, will, sharpen, the, knife, on, the, coarse, slot, the, coarse, slot, is, made, of, carbide, second, stage, is, fine, slot, once, you, have, honed, the, knife, on, coarse, slot, it, will, helps, to, give, the, finishing, touch, the, fine, slot, is, made, of, ceramic, for, fine, sharpening, it, s, give, a, quick, touch, up, on, ...]"


In [59]:
# Lemmatization and Joining the words in the list to convert back to text string in the dataframe.

lemmatizer = WordNetLemmatizer()

def to_lowercase(words):
    new_words = []
    for word in words:
        new_word = word.lower()
        new_words.append(new_word)
    return new_words

def remove_punctuation(words):
    new_words = []
    for word in words:
        new_word = re.sub(r'[^\w\s]', '', word)
        if new_word != '':
            new_words.append(new_word)
    return new_words

def lemmatize_list(words):
    new_words = []
    for word in words:
        # pos='v' lemmatizes every token as a verb (e.g. 'hanging' -> 'hang')
        new_words.append(lemmatizer.lemmatize(word, pos='v'))
    return new_words

def normalize(words):
    words = to_lowercase(words)
    words = remove_punctuation(words)
    words = lemmatize_list(words)
    return ' '.join(words)

data['Text'] = data.apply(lambda row: normalize(row['Text']), axis=1)
data.head()

Unnamed: 0,Label,Text
0,Household,styleys work iron coat rack hanger creative fashion bedroom for hang clothe shelve work iron rack stand coat rack black color name black styleys coat stand be great for home and room with limit space as have one stand rack take up less space compare to drawers and cupboards easy for guests to keep their items especially bag and scarves when visit as they can always keep an eye on it and easily grab it when they re leave make a smart décor piece for your home or room as occupy stand can show off your stylish handbags accessories and hangman achievement medals dimension cm x cm x cm weight kg material steel colour white black or pink no of hook straight peg suitable to hang coat clothe scarves handbags hat and accessories
1,Household,cuisinart ccobkn deluxe electric can opener black size none color name black style convenience and power come together in the cuisinart electric can open with chrome accent and elegant contour it fit nicely with other modern countertop appliances the easy singletouc
2,Household,anchor penta amp way switch white pack of anchor penta amp way switch white pack of come with spark shield conceal terminals silver cadmium contact ip protection captive screw
3,Clothing & Accessories,proline men s track jacket proline weave polyester high neck wind cheater with colour block detail
4,Household,chef s garage slot edge grip kitchen knife sharpener help to sharpen the dull knives black chef s garage mini knife sharpener help to sharpen your dull knives this tiny knife sharpener have stage sharpen system first stage be for damage and dull knives it will sharpen the knife on the coarse slot the coarse slot be make of carbide second stage be fine slot once you have hone the knife on coarse slot it will help to give the finish touch the fine slot be make of ceramic for fine sharpen it s give a quick touch up on already sharper knives or for finish off knives that have already pass through the coarse slotalso it come with one of the unique edge grip feature to sharpen on the edge of the table or counter top key feature very easy to use nonslip base for add stability and control carbide and ceramic blades on these sharpen slot be long last strong and hard with flexibility of an edge grip feature for bigger knives small in size x x cms weight less grams instructions insert the blade into the slot at a degree angle to the mini sharpener place the edge in coarse slot black in color pull the knife straight back towards you to time while apply a light pressure place the blade in fine slot white in color pull the knife straight back towards you to time while apply a heavy pressure if blade be still dull repeat these step until blade be sharp


In [60]:
# Print first 5 rows of data after pre-processing
data.head(5)

Unnamed: 0,Label,Text
0,Household,styleys work iron coat rack hanger creative fashion bedroom for hang clothe shelve work iron rack stand coat rack black color name black styleys coat stand be great for home and room with limit space as have one stand rack take up less space compare to drawers and cupboards easy for guests to keep their items especially bag and scarves when visit as they can always keep an eye on it and easily grab it when they re leave make a smart décor piece for your home or room as occupy stand can show off your stylish handbags accessories and hangman achievement medals dimension cm x cm x cm weight kg material steel colour white black or pink no of hook straight peg suitable to hang coat clothe scarves handbags hat and accessories
1,Household,cuisinart ccobkn deluxe electric can opener black size none color name black style convenience and power come together in the cuisinart electric can open with chrome accent and elegant contour it fit nicely with other modern countertop appliances the easy singletouc
2,Household,anchor penta amp way switch white pack of anchor penta amp way switch white pack of come with spark shield conceal terminals silver cadmium contact ip protection captive screw
3,Clothing & Accessories,proline men s track jacket proline weave polyester high neck wind cheater with colour block detail
4,Household,chef s garage slot edge grip kitchen knife sharpener help to sharpen the dull knives black chef s garage mini knife sharpener help to sharpen your dull knives this tiny knife sharpener have stage sharpen system first stage be for damage and dull knives it will sharpen the knife on the coarse slot the coarse slot be make of carbide second stage be fine slot once you have hone the knife on coarse slot it will help to give the finish touch the fine slot be make of ceramic for fine sharpen it s give a quick touch up on already sharper knives or for finish off knives that have already pass through the coarse slotalso it come with one of the unique edge grip feature to sharpen on the edge of the table or counter top key feature very easy to use nonslip base for add stability and control carbide and ceramic blades on these sharpen slot be long last strong and hard with flexibility of an edge grip feature for bigger knives small in size x x cms weight less grams instructions insert the blade into the slot at a degree angle to the mini sharpener place the edge in coarse slot black in color pull the knife straight back towards you to time while apply a light pressure place the blade in fine slot white in color pull the knife straight back towards you to time while apply a heavy pressure if blade be still dull repeat these step until blade be sharp


#### Vectorization: (10 Marks)
1. Use CountVectorizer. (use parameter: max_features=1000)
2. Use TfidfVectorizer. (use parameter: max_features=1000)
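
The two vectorizers can be compared on a tiny made-up corpus before running them on the full data. This is a minimal sketch: the `corpus` strings are invented, and the matrices are left in their sparse form, which scikit-learn estimators such as `RandomForestClassifier` accept directly, so the dense `.toarray()` conversion is optional.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Invented mini-corpus standing in for the pre-processed 'Text' column.
corpus = [
    'steel coat rack black',
    'knife sharpener black',
    'track jacket polyester',
]

count_vec = CountVectorizer(max_features=1000)
counts = count_vec.fit_transform(corpus)   # sparse matrix of raw term counts

tfidf_vec = TfidfVectorizer(max_features=1000)
tfidf = tfidf_vec.fit_transform(corpus)    # sparse matrix of TF-IDF weights

print(counts.shape, tfidf.shape)
print(sorted(count_vec.vocabulary_))       # the learned vocabulary terms
```

With `max_features=1000`, each vectorizer keeps at most the 1,000 most frequent terms, which is why both feature matrices in this notebook have 1,000 columns.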

#### Fit and evaluate model using both types of vectorization. Print confusion matrix. (6+6 Marks)

In [61]:
# Count vectorizer
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(max_features=1000)                
data_features = vectorizer.fit_transform(data['Text'])

data_features = data_features.toarray() 

In [62]:
data_features.shape

(50427, 1000)

In [63]:
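X = data_features

# The target is the class label. Using data.Text here would turn almost every
# description into its own class, which can exhaust memory when the forest is fit.
y = data['Label']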

In [64]:
# Split data into training and testing set.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [66]:
# Using Random Forest to build a model for classifying the product descriptions.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

forest = RandomForestClassifier(n_estimators=5, n_jobs=2)

forest = forest.fit(X_train, y_train)

print(forest)

print(np.mean(cross_val_score(forest, X, y, cv=10)))

MemoryError: could not allocate 6376652800 bytes

In [67]:
# Predicting the result for test data

result = forest.predict(X_test)

IndexError: list index out of range

In [68]:
# Plotting the confusion matrix

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Pass the label order explicitly so the matrix rows and columns match the axis names
# (by default scikit-learn sorts the labels alphabetically).
labels = ['Household', 'Books', 'Electronics', 'Clothing & Accessories']
conf_mat = confusion_matrix(y_test, result, labels=labels)

print(conf_mat)

df_cm = pd.DataFrame(conf_mat, index=labels, columns=labels)
plt.figure(figsize=(10, 7))
sns.heatmap(df_cm, annot=True, fmt='g')

NameError: name 'result' is not defined
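
The fit, predict and confusion-matrix steps above can be sketched on a tiny made-up dataset, with the class labels as the target `y`. Everything in this block (the `texts`, the two-class `labels` list, the `stratify` argument and the fixed `random_state`) is an assumption chosen to keep the example small and repeatable; it does not reproduce results for the real dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Invented examples: four per class.
texts = ['coat rack steel', 'kitchen knife sharpener', 'wall shelf wood', 'steel can opener',
         'track jacket polyester', 'cotton shirt men', 'wool scarf women', 'denim jacket blue']
y = ['Household'] * 4 + ['Clothing & Accessories'] * 4
labels = ['Household', 'Clothing & Accessories']

X = CountVectorizer(max_features=1000).fit_transform(texts)

# stratify=y keeps one example of each class in the 2-row test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

forest = RandomForestClassifier(n_estimators=5, random_state=42)
forest.fit(X_train, y_train)
result = forest.predict(X_test)

# labels=... fixes the row/column order of the matrix.
conf_mat = confusion_matrix(y_test, result, labels=labels)
print(conf_mat)
```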

In [69]:
# Tfidf vectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(max_features=1000)
data_features = vectorizer.fit_transform(data['Text'])

data_features = data_features.toarray()

data_features.shape

(50427, 1000)

In [70]:
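from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

import numpy as np

# Rebuild X and the train/test split from the TF-IDF features; otherwise the
# model would be refit on the CountVectorizer split from above.
X = data_features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

forest = RandomForestClassifier(n_estimators=5, n_jobs=2)

forest = forest.fit(X_train, y_train)

print(forest)

print(np.mean(cross_val_score(forest, X, y, cv=10)))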

MemoryError: could not allocate 6376652800 bytes

In [71]:
result = forest.predict(X_test)

IndexError: list index out of range

In [72]:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Pass the label order explicitly so the matrix rows and columns match the axis names.
labels = ['Household', 'Books', 'Electronics', 'Clothing & Accessories']
conf_mat = confusion_matrix(y_test, result, labels=labels)

df_cm = pd.DataFrame(conf_mat, index=labels, columns=labels)
plt.figure(figsize=(10, 7))
sns.heatmap(df_cm, annot=True, fmt='g')

NameError: name 'result' is not defined

#### Summarize your understanding of the application of Various Pre-processing and Vectorization and performance of your model on this dataset. (8 Marks)

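1. We used an e-commerce dataset of product text with a label for each entry ('Household', 'Books', 'Electronics', 'Clothing & Accessories').
2. We pre-processed the data using various techniques and libraries: HTML tag removal, number removal, tokenization, punctuation removal, lowercasing and lemmatization.
3. The pre-processed text was converted to numeric features with CountVectorizer and TfidfVectorizer (1,000 features each), so that it could be fed to the model.
4. We built a Random Forest classification model to predict the labels of the test data.
5. The recorded outputs show a MemoryError during training, followed by errors during prediction and plotting, so this run produced no accuracy score or confusion matrix. The model's performance on this dataset still has to be measured.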