### 🗞️Ecommerce Dataset

📂Step 1: Install & Import Libraries

In [17]:
import pandas as pd
import spacy

🗂️Step 2: Load Dataset

In [7]:
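# Note: the file has no header row, so read_csv uses the first record as column names
# (see the output below).
df = pd.read_csv("ecommerceDataset.csv")
df.head()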

Unnamed: 0,Household,"Paper Plane Design Framed Wall Hanging Motivational Office Decor Art Prints (8.7 X 8.7 inch) - Set of 4 Painting made up in synthetic frame with uv textured print which gives multi effects and attracts towards it. This is an special series of paintings which makes your wall very beautiful and gives a royal touch. This painting is ready to hang, you would be proud to possess this unique painting that is a niche apart. We use only the most modern and efficient printing technology on our prints, with only the and inks and precision epson, roland and hp printers. This innovative hd printing technique results in durable and spectacular looking prints of the highest that last a lifetime. We print solely with top-notch 100% inks, to achieve brilliant and true colours. Due to their high level of uv resistance, our prints retain their beautiful colours for many years. Add colour and style to your living space with this digitally printed painting. Some are for pleasure and some for eternal bliss.so bring home this elegant print that is lushed with rich colors that makes it nothing but sheer elegance to be to your friends and family.it would be treasured forever by whoever your lucky recipient is. Liven up your place with these intriguing paintings that are high definition hd graphic digital prints for home, office or any room."
0,Household,"SAF 'Floral' Framed Painting (Wood, 30 inch x ..."
1,Household,SAF 'UV Textured Modern Art Print Framed' Pain...
2,Household,"SAF Flower Print Framed Painting (Synthetic, 1..."
3,Household,Incredible Gifts India Wooden Happy Birthday U...
4,Household,Pitaara Box Romantic Venice Canvas Painting 6m...


In [8]:
df.columns = ['label', 'text']
df.head()

Unnamed: 0,label,text
0,Household,"SAF 'Floral' Framed Painting (Wood, 30 inch x ..."
1,Household,SAF 'UV Textured Modern Art Print Framed' Pain...
2,Household,"SAF Flower Print Framed Painting (Synthetic, 1..."
3,Household,Incredible Gifts India Wooden Happy Birthday U...
4,Household,Pitaara Box Romantic Venice Canvas Painting 6m...
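The first `head()` output shows a product record sitting in the header position, which suggests the CSV has no header row, so renaming the columns afterwards discards that record. A minimal sketch of loading such a file without losing it, assuming the two columns are label and text in that order; the in-memory `sample_csv` is a made-up stand-in for `ecommerceDataset.csv`:

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for a headerless label,text CSV.
sample_csv = io.StringIO(
    'Household,"Framed wall painting, set of 4"\n'
    'Books,"Introductory statistics textbook"\n'
)

# header=None keeps the first record as data; names= supplies the column names.
sample_df = pd.read_csv(sample_csv, header=None, names=["label", "text"])
print(sample_df.shape)  # both records survive as rows
```

With the real file this would be `pd.read_csv("ecommerceDataset.csv", header=None, names=["label", "text"])`; the row counts in the following cells would then likely differ by one from those shown.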


🔍Step 3: Explore Data

In [9]:
df.shape

(50424, 2)

In [10]:
df.isnull().sum()

label    0
text     1
dtype: int64

In [11]:
# Drop the single row whose text is missing
df.dropna(inplace=True)

In [12]:
df.duplicated().sum()

22622

In [14]:
# Remove exact duplicate rows (same label and same text)
df.drop_duplicates(inplace=True)
df.shape

(27801, 2)
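`df.duplicated()` compares whole rows, so a description that appears under two different labels is not counted as a duplicate. A hedged sketch of how one might check for that case; the `toy` frame is invented for illustration, and whether the real dataset contains such conflicts is not shown here:

```python
import pandas as pd

# Invented rows: one exact duplicate ("novel") and one text with two labels ("usb cable").
toy = pd.DataFrame(
    {
        "label": ["Books", "Books", "Household", "Electronics"],
        "text": ["novel", "novel", "usb cable", "usb cable"],
    }
)

# Count distinct labels per text; more than one means conflicting labels.
labels_per_text = toy.groupby("text")["label"].nunique()
conflicting = labels_per_text[labels_per_text > 1].index.tolist()
print(conflicting)

# Drop exact duplicates, then rebuild a contiguous index.
deduped = toy.drop_duplicates().reset_index(drop=True)
print(deduped.shape)
```

The `reset_index(drop=True)` call is optional; it only matters if later code relies on positional row labels.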

In [15]:
df.label.value_counts()

label
Household                 10563
Books                      6256
Clothing & Accessories     5674
Electronics                5308
Name: count, dtype: int64

In [16]:
# Rename the label so it contains no spaces or '&'
df['label'] = df['label'].replace('Clothing & Accessories', 'Clothing_Accessories')
df.label.value_counts()

label
Household               10563
Books                    6256
Clothing_Accessories     5674
Electronics              5308
Name: count, dtype: int64
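The counts above are uneven, with Household the largest class by a wide margin. A small sketch that turns the reported counts into class shares; on the real frame, `df.label.value_counts(normalize=True)` should give the same proportions directly:

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series(
    {
        "Household": 10563,
        "Books": 6256,
        "Clothing_Accessories": 5674,
        "Electronics": 5308,
    },
    name="count",
)

# Share of each class in the deduplicated data, rounded to three decimals.
shares = (counts / counts.sum()).round(3)
print(shares)
```

Household accounts for about 38% of the rows, which may be worth keeping in mind when splitting the data or choosing evaluation metrics.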

🔃Step 4: Preprocess Text with spaCy

- Lemmatize
- Remove stopwords
- Remove punctuation
- Lowercase

In [18]:
nlp = spacy.load("en_core_web_sm")

def preprocess(text):
    doc = nlp(text)
    # Lemmatize, remove stopwords, punctuation, and lowercase
    tokens = [token.lemma_.lower() for token in doc 
              if not token.is_stop and not token.is_punct and not token.is_space]
    return " ".join(tokens)


In [19]:
df["processed_text"] = df["text"].apply(preprocess)
df.head()

Unnamed: 0,label,text,processed_text
0,Household,"SAF 'Floral' Framed Painting (Wood, 30 inch x ...",saf floral framed painting wood 30 inch x 10 i...
1,Household,SAF 'UV Textured Modern Art Print Framed' Pain...,saf uv texture modern art print framed paintin...
2,Household,"SAF Flower Print Framed Painting (Synthetic, 1...",saf flower print framed painting synthetic 13....
3,Household,Incredible Gifts India Wooden Happy Birthday U...,incredible gifts india wooden happy birthday u...
4,Household,Pitaara Box Romantic Venice Canvas Painting 6m...,pitaara box romantic venice canvas painting 6 ...
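`df["text"].apply(preprocess)` runs the pipeline once per row, which can be slow on roughly 28,000 descriptions. A sketch of the same filtering using `nlp.pipe`, spaCy's batched interface; the name `preprocess_many` and the batch size of 256 are illustrative choices, and `nlp` is assumed to be the pipeline loaded earlier:

```python
def preprocess_many(texts, nlp, batch_size=256):
    """Lemmatize, lowercase, and drop stopwords, punctuation, and whitespace tokens."""
    processed = []
    # nlp.pipe streams the texts through the pipeline in batches.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        tokens = [
            token.lemma_.lower()
            for token in doc
            if not token.is_stop and not token.is_punct and not token.is_space
        ]
        processed.append(" ".join(tokens))
    return processed


# Usage in the notebook (assumes `df` and `nlp` from the cells above):
# df["processed_text"] = preprocess_many(df["text"], nlp)
```

Assuming only lemmas are needed, loading the model with `spacy.load("en_core_web_sm", disable=["parser", "ner"])` may speed this up further, since those components are not used by the filter.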


In [22]:
df.to_csv("processed_ecommerceDataset.csv", index=False)
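One thing worth checking before reusing the saved file: a description made up only of stopwords or punctuation produces an empty `processed_text`, and pandas reads empty CSV fields back as `NaN` by default. A small sketch with invented rows, using an in-memory buffer in place of the real file; whether the real dataset contains such rows is not shown here:

```python
import io

import pandas as pd

# Invented rows: the first one has an empty processed_text.
toy = pd.DataFrame(
    {
        "label": ["Books", "Household"],
        "text": ["!!!", "wooden frame"],
        "processed_text": ["", "wooden frame"],
    }
)

# Round-trip through CSV using an in-memory buffer.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# The empty string written above comes back as a missing value.
print(reloaded["processed_text"].isna().sum())

# One option: drop rows whose processed text is empty before saving.
non_empty = toy[toy["processed_text"].str.strip() != ""]
print(len(non_empty))
```

Alternatively, passing `keep_default_na=False` to `pd.read_csv` keeps empty fields as empty strings when the file is loaded again.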