## 2 -- DATA WRANGLING

Data wrangling (also called data cleaning, data remediation, or data munging) covers the processes that transform raw data into a format that is easier to analyze. The exact methods differ from project to project, depending on the data you are working with and the goal you are trying to achieve.

### 2.1 -- Importing Dependencies

In [1]:
import pandas as pd
from tqdm import tqdm
import random # Standard-library module for random operations, such as generating random numbers or picking a random element from a list or string.
import warnings # Standard-library module for issuing and filtering warnings (the built-in Warning class is a subclass of Exception).

### 2.2 -- Reading the CSV File to Perform Data Wrangling

In [2]:
# Using the pandas read_csv function to read our CSV file
df_2 = pd.read_csv('01_Scraping_Unstructured_Data.csv')

In [3]:
# Formatting date & time
df_2['REVIEW_DATE'] = pd.to_datetime(df_2['REVIEW_DATE/TIME']).dt.date
df_2['REVIEW_TIME'] = pd.to_datetime(df_2['REVIEW_DATE/TIME']).dt.time
df_2['DATE_OF_CREATION'] = pd.to_datetime(df_2['DATE_OF_CREATION']).dt.date
df_2['LAST_UPDATED_DATE'] = pd.to_datetime(df_2['LAST_UPDATED_DATE']).dt.date
df_2

Unnamed: 0,ID,SKU,PRODUCT_NAME,PRICE,SLUG,URL,value,unit,PACK_SIZE,ALGOLIA_OBJECT_ID,...,CUSTOM_ATTRIBUTES,REVIEW_COUNT,REVIEW_DATE/TIME,REVIEWER_NAME,PRICE_RATING,QUALITY_RATING,VALUE_RATING,REVIEW_CONTENT,REVIEW_DATE,REVIEW_TIME
0,639,8904417300048,ME White Musk Eau De Parfum For a Fragrance Cl...,699.00,me-white-musk-eau-de-parfum-for-a-fragrance-cl...,https://mamaearth.in/product/me-white-musk-eau...,50,ml,50ml,1990186000,...,,,,,,,,,NaT,NaT
1,638,8904417300031,ME Floral Eau De Parfum - Live in the Moment -...,699.00,me-floral-eau-de-parfum-live-in-the-moment-50-ml,https://mamaearth.in/product/me-floral-eau-de-...,50,ml,50ml,1990184000,...,,,,,,,,,NaT,NaT
2,636,8904417300024,ME Oud Eau De Parfum to Unleash Your Confidenc...,699.00,me-oud-eau-de-parfum-to-unleash-your-confidenc...,https://mamaearth.in/product/me-oud-eau-de-par...,50,ml,50ml,1661771144595,...,,,,,,,,,NaT,NaT
3,634,8904417300055,ME First Rain Eau De Parfum to Refresh Your Se...,699.00,first-rain-eau-de-parfum-to-refresh-your-sense...,https://mamaearth.in/product/first-rain-eau-de...,50,ml,50ml,1575346001,...,,,,,,,,,NaT,NaT
4,626,8904417301540,Lash Care Volumizing Mascara with Castor Oil &...,499.00,mamaearth-lash-care-volumizing-mascara-with-ca...,https://mamaearth.in/product/mamaearth-lash-ca...,13,g,13g,1953270000,...,,,,,,,,,NaT,NaT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28263,219,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,plant-based-multipurpose-cleanser-for-babies-5...,https://mamaearth.in/product/plant-based-multi...,500,ml,500ml,8599427000,...,"[{'attribute_code': 'image', 'value': '/p/l/pl...",20.0,2020-11-23 10:38:25,Pooja,5.0,0.0,0.0,I use it as baby bottle cleanser and then wash...,2020-11-23,10:38:25
28264,219,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,plant-based-multipurpose-cleanser-for-babies-5...,https://mamaearth.in/product/plant-based-multi...,500,ml,500ml,8599427000,...,"[{'attribute_code': 'image', 'value': '/p/l/pl...",20.0,2020-11-23 10:38:01,Mamta,5.0,0.0,0.0,Best cleanser for baby toys ever! It's natural...,2020-11-23,10:38:01
28265,219,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,plant-based-multipurpose-cleanser-for-babies-5...,https://mamaearth.in/product/plant-based-multi...,500,ml,500ml,8599427000,...,"[{'attribute_code': 'image', 'value': '/p/l/pl...",20.0,2020-11-23 10:37:41,Sushma,5.0,0.0,0.0,I bought this baby liquid cleanser for toys an...,2020-11-23,10:37:41
28266,219,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,plant-based-multipurpose-cleanser-for-babies-5...,https://mamaearth.in/product/plant-based-multi...,500,ml,500ml,8599427000,...,"[{'attribute_code': 'image', 'value': '/p/l/pl...",20.0,2020-10-30 17:58:11,Niketa,5.0,0.0,0.0,I was recommended by one of my friend ..to use...,2020-10-30,17:58:11
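The date formatting above parses `REVIEW_DATE/TIME` twice. A minimal sketch of parsing it once and deriving both columns from the result is shown below. The three-row `sample` frame is made up for illustration, and `errors="coerce"` is an assumption about how unparseable values should be handled (they become `NaT`).

```python
import pandas as pd

# Hypothetical miniature of the review timestamp column; None stands for a product without reviews
sample = pd.DataFrame(
    {"REVIEW_DATE/TIME": ["2020-11-23 10:38:25", None, "2020-10-30 17:58:11"]}
)

# Parse the timestamps once; missing or malformed values become NaT instead of raising
parsed = pd.to_datetime(sample["REVIEW_DATE/TIME"], errors="coerce")

# Derive the date and time parts from the single parsed Series
sample["REVIEW_DATE"] = parsed.dt.date
sample["REVIEW_TIME"] = parsed.dt.time

print(sample)
```

Rows without a review keep `NaT` in both derived columns, which matches the first rows of the output above.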


In [4]:
# Rearranging the columns
df_2 = df_2.reindex(columns=['ID','SKU','PRODUCT_NAME','PRICE','PRODUCT_CATEGORY','PACK_SIZE','SLUG','REVIEW_COUNT','REVIEW_DATE/TIME','REVIEW_DATE','REVIEW_TIME','REVIEWER_NAME','PRICE_RATING','QUALITY_RATING','VALUE_RATING','REVIEW_CONTENT','URL','CUSTOM_ATTRIBUTES','PARENT','SIBLINGS','IS_IN_STOCK','IS_SALEABLE','ALGOLIA_OBJECT_ID','CATEGORIES','CONFIGURABLE_OPTION','DATE_OF_CREATION','LAST_UPDATED_DATE','STATUS','TYPE','VISIBILITY'])
df_2

Unnamed: 0,ID,SKU,PRODUCT_NAME,PRICE,PRODUCT_CATEGORY,PACK_SIZE,SLUG,REVIEW_COUNT,REVIEW_DATE/TIME,REVIEW_DATE,...,IS_IN_STOCK,IS_SALEABLE,ALGOLIA_OBJECT_ID,CATEGORIES,CONFIGURABLE_OPTION,DATE_OF_CREATION,LAST_UPDATED_DATE,STATUS,TYPE,VISIBILITY
0,639,8904417300048,ME White Musk Eau De Parfum For a Fragrance Cl...,699.00,,50ml,me-white-musk-eau-de-parfum-for-a-fragrance-cl...,,,NaT,...,1,True,1990186000,"['2', '21', '45', '197']","{'attributeId': 222, 'optionValue': '94', 'att...",2022-08-29,2022-09-02,1,simple,4
1,638,8904417300031,ME Floral Eau De Parfum - Live in the Moment -...,699.00,,50ml,me-floral-eau-de-parfum-live-in-the-moment-50-ml,,,NaT,...,1,True,1990184000,"['2', '21', '45', '197']","{'attributeId': 222, 'optionValue': '93', 'att...",2022-08-29,2022-09-02,1,simple,4
2,636,8904417300024,ME Oud Eau De Parfum to Unleash Your Confidenc...,699.00,,50ml,me-oud-eau-de-parfum-to-unleash-your-confidenc...,,,NaT,...,1,True,1661771144595,"['2', '21', '45', '197']","{'attributeId': 221, 'optionValue': '91', 'att...",2022-08-29,2022-09-02,1,simple,4
3,634,8904417300055,ME First Rain Eau De Parfum to Refresh Your Se...,699.00,,50ml,first-rain-eau-de-parfum-to-refresh-your-sense...,,,NaT,...,1,True,1575346001,"['2', '45', '197', '21']",Not found,2022-08-29,2022-09-02,1,simple,4
4,626,8904417301540,Lash Care Volumizing Mascara with Castor Oil &...,499.00,,13g,mamaearth-lash-care-volumizing-mascara-with-ca...,,,NaT,...,1,True,1953270000,"['2', '21', '195']",Not found,2022-08-22,2022-08-31,1,simple,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28263,219,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,ba_body,500ml,plant-based-multipurpose-cleanser-for-babies-5...,20.0,2020-11-23 10:38:25,2020-11-23,...,0,True,8599427000,"['2', '6', '5', '8', '10', '64']",Not found,2021-03-25,2022-09-02,1,simple,4
28264,219,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,ba_body,500ml,plant-based-multipurpose-cleanser-for-babies-5...,20.0,2020-11-23 10:38:01,2020-11-23,...,0,True,8599427000,"['2', '6', '5', '8', '10', '64']",Not found,2021-03-25,2022-09-02,1,simple,4
28265,219,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,ba_body,500ml,plant-based-multipurpose-cleanser-for-babies-5...,20.0,2020-11-23 10:37:41,2020-11-23,...,0,True,8599427000,"['2', '6', '5', '8', '10', '64']",Not found,2021-03-25,2022-09-02,1,simple,4
28266,219,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,ba_body,500ml,plant-based-multipurpose-cleanser-for-babies-5...,20.0,2020-10-30 17:58:11,2020-10-30,...,0,True,8599427000,"['2', '6', '5', '8', '10', '64']",Not found,2021-03-25,2022-09-02,1,simple,4
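One thing to watch with `reindex(columns=...)`: a name that does not exist in the frame is not an error, it silently becomes an all-NaN column. The sketch below, on a made-up two-column frame, checks for such names before reordering. The `wanted` and `missing` names are illustrative only.

```python
import pandas as pd

# Hypothetical small frame standing in for df_2
frame = pd.DataFrame({"SKU": [8904417300048, 8904417300031], "PRICE": [699.0, 699.0]})

# Desired column order; "PACK_SIZE" is deliberately absent from the frame
wanted = ["PRICE", "SKU", "PACK_SIZE"]

# Names that reindex would create as empty (all-NaN) columns
missing = [name for name in wanted if name not in frame.columns]
print("Columns not present in the frame:", missing)

reordered = frame.reindex(columns=wanted)
print(reordered)
```

A misspelled column name in a long list such as the one above shows up in `missing` instead of passing unnoticed.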


In [5]:
# Dropping unwanted columns
df_2.drop(['ID','SLUG','REVIEW_DATE/TIME','REVIEWER_NAME','CUSTOM_ATTRIBUTES','PARENT','SIBLINGS','IS_IN_STOCK','IS_SALEABLE','ALGOLIA_OBJECT_ID','CATEGORIES','CONFIGURABLE_OPTION','STATUS','TYPE','VISIBILITY'], axis = 1, inplace = True)
df_2

Unnamed: 0,SKU,PRODUCT_NAME,PRICE,PRODUCT_CATEGORY,PACK_SIZE,REVIEW_COUNT,REVIEW_DATE,REVIEW_TIME,PRICE_RATING,QUALITY_RATING,VALUE_RATING,REVIEW_CONTENT,URL,DATE_OF_CREATION,LAST_UPDATED_DATE
0,8904417300048,ME White Musk Eau De Parfum For a Fragrance Cl...,699.00,,50ml,,NaT,NaT,,,,,https://mamaearth.in/product/me-white-musk-eau...,2022-08-29,2022-09-02
1,8904417300031,ME Floral Eau De Parfum - Live in the Moment -...,699.00,,50ml,,NaT,NaT,,,,,https://mamaearth.in/product/me-floral-eau-de-...,2022-08-29,2022-09-02
2,8904417300024,ME Oud Eau De Parfum to Unleash Your Confidenc...,699.00,,50ml,,NaT,NaT,,,,,https://mamaearth.in/product/me-oud-eau-de-par...,2022-08-29,2022-09-02
3,8904417300055,ME First Rain Eau De Parfum to Refresh Your Se...,699.00,,50ml,,NaT,NaT,,,,,https://mamaearth.in/product/first-rain-eau-de...,2022-08-29,2022-09-02
4,8904417301540,Lash Care Volumizing Mascara with Castor Oil &...,499.00,,13g,,NaT,NaT,,,,,https://mamaearth.in/product/mamaearth-lash-ca...,2022-08-22,2022-08-31
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28263,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,ba_body,500ml,20.0,2020-11-23,10:38:25,5.0,0.0,0.0,I use it as baby bottle cleanser and then wash...,https://mamaearth.in/product/plant-based-multi...,2021-03-25,2022-09-02
28264,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,ba_body,500ml,20.0,2020-11-23,10:38:01,5.0,0.0,0.0,Best cleanser for baby toys ever! It's natural...,https://mamaearth.in/product/plant-based-multi...,2021-03-25,2022-09-02
28265,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,ba_body,500ml,20.0,2020-11-23,10:37:41,5.0,0.0,0.0,I bought this baby liquid cleanser for toys an...,https://mamaearth.in/product/plant-based-multi...,2021-03-25,2022-09-02
28266,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,ba_body,500ml,20.0,2020-10-30,17:58:11,5.0,0.0,0.0,I was recommended by one of my friend ..to use...,https://mamaearth.in/product/plant-based-multi...,2021-03-25,2022-09-02
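Because the drop above uses `inplace = True`, running the cell a second time raises a `KeyError`: the columns are already gone. A small sketch of a re-runnable variant follows, using a made-up frame. `errors="ignore"` skips labels that are not present, and the result is returned as a new frame.

```python
import pandas as pd

# Hypothetical frame with a subset of the original columns
frame = pd.DataFrame(
    {"SKU": [8904417300048], "SLUG": ["me-white-musk"], "PRICE": [699.0]}
)

# "ID" is intentionally not in the frame, to show that missing labels are skipped
unwanted = ["ID", "SLUG"]

# drop returns a new DataFrame; the original frame is left untouched
trimmed = frame.drop(columns=unwanted, errors="ignore")
print(trimmed.columns.tolist())
```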


In [6]:
# Cleaning the category column
df_2['PRODUCT_CATEGORY']=df_2['PRODUCT_CATEGORY'].str.replace('be_','').str.replace('ba_','')
df_2

Unnamed: 0,SKU,PRODUCT_NAME,PRICE,PRODUCT_CATEGORY,PACK_SIZE,REVIEW_COUNT,REVIEW_DATE,REVIEW_TIME,PRICE_RATING,QUALITY_RATING,VALUE_RATING,REVIEW_CONTENT,URL,DATE_OF_CREATION,LAST_UPDATED_DATE
0,8904417300048,ME White Musk Eau De Parfum For a Fragrance Cl...,699.00,,50ml,,NaT,NaT,,,,,https://mamaearth.in/product/me-white-musk-eau...,2022-08-29,2022-09-02
1,8904417300031,ME Floral Eau De Parfum - Live in the Moment -...,699.00,,50ml,,NaT,NaT,,,,,https://mamaearth.in/product/me-floral-eau-de-...,2022-08-29,2022-09-02
2,8904417300024,ME Oud Eau De Parfum to Unleash Your Confidenc...,699.00,,50ml,,NaT,NaT,,,,,https://mamaearth.in/product/me-oud-eau-de-par...,2022-08-29,2022-09-02
3,8904417300055,ME First Rain Eau De Parfum to Refresh Your Se...,699.00,,50ml,,NaT,NaT,,,,,https://mamaearth.in/product/first-rain-eau-de...,2022-08-29,2022-09-02
4,8904417301540,Lash Care Volumizing Mascara with Castor Oil &...,499.00,,13g,,NaT,NaT,,,,,https://mamaearth.in/product/mamaearth-lash-ca...,2022-08-22,2022-08-31
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28263,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,body,500ml,20.0,2020-11-23,10:38:25,5.0,0.0,0.0,I use it as baby bottle cleanser and then wash...,https://mamaearth.in/product/plant-based-multi...,2021-03-25,2022-09-02
28264,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,body,500ml,20.0,2020-11-23,10:38:01,5.0,0.0,0.0,Best cleanser for baby toys ever! It's natural...,https://mamaearth.in/product/plant-based-multi...,2021-03-25,2022-09-02
28265,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,body,500ml,20.0,2020-11-23,10:37:41,5.0,0.0,0.0,I bought this baby liquid cleanser for toys an...,https://mamaearth.in/product/plant-based-multi...,2021-03-25,2022-09-02
28266,8906087772927,Plant-Based Multipurpose Cleanser for Babies -...,499.00,body,500ml,20.0,2020-10-30,17:58:11,5.0,0.0,0.0,I was recommended by one of my friend ..to use...,https://mamaearth.in/product/plant-based-multi...,2021-03-25,2022-09-02
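The chained `str.replace` calls above remove `be_` and `ba_` wherever they occur in the string, not only at the start. If the intent is to strip them as prefixes, an anchored regular expression is one option. The category values below are invented for illustration (only `ba_body` appears in the output above).

```python
import pandas as pd

# Hypothetical category values; "tube_be_x" contains "be_" in the middle on purpose
cats = pd.Series(["ba_body", "be_face", "tube_be_x", None])

# ^ anchors the match to the start of the string, so only a leading prefix is removed
cleaned = cats.str.replace(r"^(?:be_|ba_)", "", regex=True)
print(cleaned.tolist())
```

Missing values pass through unchanged, so the later `fillna` step still applies to them.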


In [7]:
# Dropping rows with a blank 'REVIEW_COUNT' (left commented out, so products without reviews are kept)
# df_2 = df_2.dropna(subset=['REVIEW_COUNT'])

In [8]:
# Creating the 'Other' category for products that have no category
values = {"PRODUCT_CATEGORY": 'Other'}
df_2 = df_2.fillna(value=values)

In [9]:
# Filling PACK_SIZE with 'No_data' for products whose pack size is not given
values = {"PACK_SIZE": 'No_data'}
df_2 = df_2.fillna(value=values)
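The two `fillna` calls can also be combined: when `value` is a dictionary, each key names a column and only those columns are filled. A short sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical rows with gaps in three columns
frame = pd.DataFrame(
    {
        "PRODUCT_CATEGORY": ["body", None],
        "PACK_SIZE": [None, "50ml"],
        "REVIEW_COUNT": [20.0, None],
    }
)

# One dictionary covers both columns; REVIEW_COUNT is not listed, so its NaN stays
fill_values = {"PRODUCT_CATEGORY": "Other", "PACK_SIZE": "No_data"}
filled = frame.fillna(value=fill_values)
print(filled)
```

Columns that are not in the dictionary, such as the rating columns, keep their missing values.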

### 2.3 -- Assigning the Random States

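In this step we assign a randomly drawn Indian state to each row so that the later analysis can be broken down by geographical location. The states are synthetic: they are sampled from a weighted list, not taken from the scraped data.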

In [10]:
# Ignoring warnings
warnings.filterwarnings('ignore')
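A blanket `filterwarnings('ignore')` silences every warning for the rest of the session, including ones that point at real problems. As a hedged alternative, the standard-library `warnings.catch_warnings()` context manager limits the filter to one block. The `noisy_step` function below is a hypothetical stand-in for any call that emits a warning.

```python
import warnings


def noisy_step():
    # Hypothetical function that warns and then returns a value
    warnings.warn("this step is deprecated", DeprecationWarning)
    return 1


# Suppress only DeprecationWarning, and only inside this block
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    result = noisy_step()

# Outside the block the previous filters are restored; record=True collects what is emitted
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_step()

print(result, len(caught))
```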

In [11]:
# Creating a list of states to sample from
states = ['Andhra Pradesh','Arunachal Pradesh','Goa','Gujarat','Haryana','Himachal Pradesh','Jammu and Kashmir','Jharkhand','Karnataka','Maharashtra','Odisha','Punjab','Rajasthan','Tamil Nadu','Uttar Pradesh','Uttarakhand','West Bengal','Chandigarh','Madhya Pradesh','National Capital Territory of Delhi']
# Relative sampling weights, one per state, in the same order as the list above
weights = (37,29,23,33,35,22,7,10,31,79,19,40,46,14,54,26,21,24,62,39)
assigned_states = []
for _ in tqdm(range(len(df_2))):
    # random.choices returns a one-element list; take the single weighted-random state for this row
    assigned_states.append(random.choices(states, weights=weights)[0])
# Assigning the whole column at once avoids chained assignment (df_2['STATES'][i] = ...)
df_2['STATES'] = assigned_states

100%|███████████████████████████████████████████████████████████████████████████| 28268/28268 [03:18<00:00, 142.64it/s]
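The loop above draws one state per row. `random.choices` also accepts a `k` argument that draws all values in a single call, which removes the need for a Python-level loop. The sketch below uses a shortened state list, a made-up 1,000-row frame, and a seeded `random.Random` instance so the draw is reproducible. The seed value 42 is an arbitrary assumption, not something the original notebook sets.

```python
import random

import pandas as pd

# Shortened list and weights for illustration; the notebook uses 20 states
states = ["Goa", "Punjab", "Maharashtra"]
weights = (23, 40, 79)

# Hypothetical frame standing in for df_2
frame = pd.DataFrame({"SKU": range(1000)})

# A dedicated, seeded generator makes the assignment repeatable between runs
rng = random.Random(42)

# k=len(frame) draws one weighted-random state per row in a single call
frame["STATES"] = rng.choices(states, weights=weights, k=len(frame))

print(frame["STATES"].value_counts())
```

Without a seed, every run of the notebook produces a different assignment, so the counts reported below would not be reproduced exactly.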


In [12]:
df_2["STATES"].value_counts()

Maharashtra                            3445
Madhya Pradesh                         2786
Uttar Pradesh                          2423
Rajasthan                              1998
Punjab                                 1692
National Capital Territory of Delhi    1632
Andhra Pradesh                         1569
Haryana                                1556
Karnataka                              1445
Gujarat                                1430
Arunachal Pradesh                      1193
Uttarakhand                            1173
Chandigarh                              972
Goa                                     953
West Bengal                             916
Himachal Pradesh                        914
Odisha                                  774
Tamil Nadu                              662
Jharkhand                               429
Jammu and Kashmir                       306
Name: STATES, dtype: int64
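As a sanity check, the counts above can be compared with what the weights predict: each state's expected count is its weight divided by the sum of all weights, multiplied by the number of rows. The sketch below uses the state list, weights, row count (28268), and two of the observed counts from this notebook. The variable names are illustrative.

```python
states = ['Andhra Pradesh', 'Arunachal Pradesh', 'Goa', 'Gujarat', 'Haryana',
          'Himachal Pradesh', 'Jammu and Kashmir', 'Jharkhand', 'Karnataka',
          'Maharashtra', 'Odisha', 'Punjab', 'Rajasthan', 'Tamil Nadu',
          'Uttar Pradesh', 'Uttarakhand', 'West Bengal', 'Chandigarh',
          'Madhya Pradesh', 'National Capital Territory of Delhi']
weights = (37, 29, 23, 33, 35, 22, 7, 10, 31, 79, 19, 40, 46, 14, 54, 26, 21, 24, 62, 39)

total_rows = 28268  # row count reported by the progress bar
observed = {"Maharashtra": 3445, "Jammu and Kashmir": 306}  # taken from value_counts()

# Expected count per state = weight share * number of rows
total_weight = sum(weights)
expected = {state: weight / total_weight * total_rows for state, weight in zip(states, weights)}

for name, count in observed.items():
    print(f"{name}: observed {count}, expected about {expected[name]:.0f}")
```

The observed counts sit close to the expected ones, which is what weighted sampling should produce at this sample size.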

In [13]:
# Saving the result, in case you want a separate data file
df_2.to_csv('02_Data_Wrangling.csv',index=False)
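When the saved file is read back, date columns come in as plain strings unless they are parsed again. The sketch below round-trips a one-row frame through an in-memory buffer (an assumption made to keep the example self-contained, in place of a file on disk) and restores the date with `parse_dates`.

```python
import io

import pandas as pd

# Hypothetical one-row frame with a date column, as produced by .dt.date
frame = pd.DataFrame(
    {"SKU": [8904417300048], "REVIEW_DATE": [pd.Timestamp("2020-11-23").date()]}
)

# index=False keeps the row index out of the file, so no extra unnamed column appears on reload
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates turns the listed columns back into datetime values
restored = pd.read_csv(buffer, parse_dates=["REVIEW_DATE"])
print(restored.dtypes)
```

The restored column holds full timestamps at midnight instead of `date` objects, so `.dt.date` can be applied again if needed.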

This completes the data wrangling of the data scraped in the previous step. We now have structured, cleaned data that should be helpful in our further projects.