# Prepare Macy’s dataset

* https://www.kaggle.com/PromptCloudHQ/innerwear-data-from-victorias-secret-and-others?select=macys_com.csv
* The Macy’s dataset comes from the Kaggle datasets repository. Macy’s is an American department store that sells products such as clothing, footwear, and accessories. The dataset covers innerwear and swimwear products and contains 40897 records and 14 columns (e.g. product_name, price, pdp_url, brand_name, rating) in CSV format. More detailed information about the dataset can be found in Table 3.1 and on the Kaggle dataset page linked above.
* Detailed information about the datasets can be found in the Master's thesis.

In [None]:
import pandas as pd
import numpy as np

pd.options.display.max_columns = None
pd.options.display.max_rows = None

In [2]:
orig_data = pd.read_csv('./../data/macys-dataset.csv')

In [3]:
display(orig_data.shape)
display(orig_data.describe())
orig_data.info()

(40897, 14)

| | rating | review_count |
|---|---|---|
| count | 26092.0 | 26101.0 |
| mean | 4.447934 | 31.035094 |
| std | 0.607859 | 60.745419 |
| min | 0.0 | 1.0 |
| 25% | 4.2 | 2.0 |
| 50% | 4.6 | 7.0 |
| 75% | 4.9 | 25.0 |
| max | 5.0 | 406.0 |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40897 entries, 0 to 40896
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   product_name      40897 non-null  object 
 1   mrp               40897 non-null  object 
 2   price             40897 non-null  object 
 3   pdp_url           40897 non-null  object 
 4   brand_name        40897 non-null  object 
 5   product_category  40897 non-null  object 
 6   retailer          40897 non-null  object 
 7   description       40897 non-null  object 
 8   rating            26092 non-null  float64
 9   review_count      26101 non-null  float64
 10  style_attributes  40897 non-null  object 
 11  total_sizes       40897 non-null  object 
 12  available_size    40897 non-null  object 
 13  color             40897 non-null  object 
dtypes: float64(2), object(12)
memory usage: 4.4+ MB


In [4]:
orig_data["has_synthetic_dq_issue"] = 0
orig_data["synthetic_dq_issues_count"] = 0
orig_data["synthetic_dq_issues"] = ""
orig_data["synthetic_dq_issue_columns"] = ""

In [5]:
orig_data.head()

| | product_name | mrp | price | pdp_url | brand_name | product_category | retailer | description | rating | review_count | style_attributes | total_sizes | available_size | color | has_synthetic_dq_issue | synthetic_dq_issues_count | synthetic_dq_issues | synthetic_dq_issue_columns |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ID String Bikini QF1754 | $20.00 | $20.00 | http://www1.macys.com/shop/product/calvin-klei... | Calvin Klein | Women - Lingerie & Shapewear - Designer Lingerie | Macys US | The perfect amount of coverage in a subtle sil... | NaN | NaN | ["Thin elastic waistband ", "Repeating logo at... | ["XS", "S", "M", "L", "XL"] | ["XS", "S", "M", "L", "XL"] | Black | 0 | 0 | | |
| 1 | ID String Bikini QF1754 | $20.00 | $20.00 | http://www1.macys.com/shop/product/calvin-klei... | Calvin Klein | Women - Lingerie & Shapewear - Designer Lingerie | Macys US | The perfect amount of coverage in a subtle sil... | NaN | NaN | ["Thin elastic waistband ", "Repeating logo at... | ["XS", "S", "M", "L", "XL"] | ["XS", "S", "M", "L", "XL"] | Simple Grid | 0 | 0 | | |
| 2 | ID String Bikini QF1754 | $20.00 | $20.00 | http://www1.macys.com/shop/product/calvin-klei... | Calvin Klein | Women - Lingerie & Shapewear - Designer Lingerie | Macys US | The perfect amount of coverage in a subtle sil... | NaN | NaN | ["Thin elastic waistband ", "Repeating logo at... | ["XS", "S", "M", "L", "XL"] | ["XS", "S", "M", "L", "XL"] | White | 0 | 0 | | |
| 3 | CK Black Collection Embrace Lace-Waist Thong Q... | $26.00 | $26.00 | http://www1.macys.com/shop/product/calvin-klei... | Calvin Klein | Women - Lingerie & Shapewear - Designer Lingerie | Macys US | Exquisitely designed embroidered lace beautifu... | NaN | NaN | ["Elastic lace waistband", "Lace at front and ... | ["M"] | ["M"] | Regal Sensous | 0 | 0 | | |
| 4 | Halo Lace Boyshort 870205 | $15.00 | $15.00 | http://www1.macys.com/shop/product/wacoal-halo... | Wacoal | Women - Lingerie & Shapewear - Designer Lingerie | Macys US | A fit that is heavenly. The stretch lace Halo ... | 5.0 | 2.0 | ["Stretch lace waistband", "All-over floral la... | ["S", "M", "L", "XL"] | ["S", "M", "L", "XL"] | Black | 0 | 0 | | |


## Create corrupted data

### Common methods

In [6]:
def set_synthetic_dq_issues_cols(row, dq_issues):
    row['has_synthetic_dq_issue'] = 1
    row['synthetic_dq_issues'] = dq_issues
    return row

In [7]:
def show_synthetic_dq_issues_cols(row):
    print(row['has_synthetic_dq_issue'])
    print(row['synthetic_dq_issues'])

In [8]:
def print_cols_name_type(data, index):
    # Fetch the row once (by position) instead of once per printed value
    row = data.iloc[index]
    for col_name in data.columns:
        print("[{}] - {} - {}".format(col_name, type(row[col_name]), row[col_name]))

In [9]:
def create_dq_issue_at_index(data, index, column_name, new_column_value, dq_issue_description, append=False):
    """Overwrite one cell with a corrupted value and record the issue in the tracking columns.

    `index` is treated as a row label; with the default RangeIndex it equals the row position.
    """
    tracking_cols = ['has_synthetic_dq_issue', 'synthetic_dq_issues',
                     'synthetic_dq_issues_count', 'synthetic_dq_issue_columns']

    def print_state(title):
        # Print the target column followed by the four tracking columns
        print("############## {} ###################".format(title))
        for col in [column_name] + tracking_cols:
            print("[{}] - {}".format(col, data.at[index, col]))

    print_state("BEFORE")

    data.at[index, column_name] = new_column_value
    data.at[index, 'has_synthetic_dq_issue'] = 1
    data.at[index, 'synthetic_dq_issues_count'] = data.at[index, 'synthetic_dq_issues_count'] + 1

    # Append only if the row already carries an issue; this avoids a leading "; " on a clean row
    if append and data.at[index, 'synthetic_dq_issues']:
        data.at[index, 'synthetic_dq_issues'] = data.at[index, 'synthetic_dq_issues'] + "; " + dq_issue_description
        data.at[index, 'synthetic_dq_issue_columns'] = data.at[index, 'synthetic_dq_issue_columns'] + "; " + column_name
    else:
        data.at[index, 'synthetic_dq_issues'] = dq_issue_description
        data.at[index, 'synthetic_dq_issue_columns'] = column_name

    print_state("AFTER")
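The bookkeeping done by `create_dq_issue_at_index` can be tried out without loading the dataset. The sketch below is a simplified stand-in, not the notebook's function: `record_issue` is a hypothetical helper that works on a plain dict representing one row, and its guard against appending to an empty entry is an assumption of this sketch. The sample values are taken from row 50.

```python
def record_issue(row, column_name, new_value, description, append=False):
    """Hypothetical dict-based stand-in for the DataFrame helper."""
    row[column_name] = new_value
    row['has_synthetic_dq_issue'] = 1
    row['synthetic_dq_issues_count'] += 1
    # Assumption: only append when an earlier issue is already recorded
    if append and row['synthetic_dq_issues']:
        row['synthetic_dq_issues'] += "; " + description
        row['synthetic_dq_issue_columns'] += "; " + column_name
    else:
        row['synthetic_dq_issues'] = description
        row['synthetic_dq_issue_columns'] = column_name
    return row


row = {
    'brand_name': 'Calvin Klein',
    'price': '$46.00',
    'has_synthetic_dq_issue': 0,
    'synthetic_dq_issues_count': 0,
    'synthetic_dq_issues': '',
    'synthetic_dq_issue_columns': '',
}

record_issue(row, 'brand_name', 'Coca-Cola', '[brand_name][Wrong information]')
record_issue(row, 'price', '£16.50', '[price][Wrong format]', append=True)

print(row['synthetic_dq_issues_count'])   # 2
print(row['synthetic_dq_issue_columns'])  # brand_name; price
```

Joining entries with `"; "` keeps the tracking columns as flat strings, which survive a CSV round trip without extra parsing rules.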

### 1. Common data corruption

#### A)

In [10]:
print(len(orig_data))

40897


In [11]:
print_cols_name_type(orig_data, 3386)

[product_name] - <class 'str'> - Skinsense High-Cut Seamless Brief 871254
[mrp] - <class 'str'> - $18.00 
[price] - <class 'str'> - $18.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/wacoal-skinsense-high-cut-seamless-brief-871254?ID=2906506&CategoryID=55805
[brand_name] - <class 'str'> - Wacoal
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Panties
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Light and comfy, this seamless stretch brief from Wacoal disappears under your clothes for the perfect silhouette.
[rating] - <class 'numpy.float64'> - 5.0
[review_count] - <class 'numpy.float64'> - 2.0
[style_attributes] - <class 'str'> - ["High cut", "Heat seal at leg for a clean finish", "Lined at gusset", "Nylon/spandex/cotton", "Hand wash", "Imported", "Web ID: 2906506"]
[total_sizes] - <class 'str'> - ["S", "M", "L", "XL"]
[available_size] - <class 'str'> - ["S", "M", "L", "XL"]
[color] - <class 'str'> - Conch Shell
[has_synt

In [12]:
create_dq_issue_at_index(orig_data, 3386, 'description', 'I do not have time to create a description.', "[description][Incomplete information] Insufficient product description.")

############## BEFORE ###################
[description] - Light and comfy, this seamless stretch brief from Wacoal disappears under your clothes for the perfect silhouette.
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[description] - I do not have time to create a description.
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [description][Incomplete information] Insufficient product description.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - description


#### B)

In [13]:
print_cols_name_type(orig_data, 13386)

[product_name] - <class 'str'> - ID Mesh Logo Thong QF1368
[mrp] - <class 'str'> - $22.00 
[price] - <class 'str'> - $22.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/calvin-klein-id-mesh-logo-thong-qf1368?ID=4548295&CategoryID=65739
[brand_name] - <class 'str'> - Calvin Klein
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - The sporty style of Calvin Klein's logo thong takes a sexy turn by incorporating sheer mesh details.
[rating] - <class 'numpy.float64'> - nan
[review_count] - <class 'numpy.float64'> - nan
[style_attributes] - <class 'str'> - ["Colorblocked elastic waistband with logo at front", "Solid panel at front; sheer fishnet mesh at sides and back", "Lined at gusset", "Cotton/elastane", "Hand wash", "Imported", "Web ID: 4548295"]
[total_sizes] - <class 'str'> - ["S", "M", "L"]
[available_size] - <class 'str'> - ["S", "M", "L"]
[color] - <class 'str'

In [14]:
create_dq_issue_at_index(orig_data, 13386, 'color', 'blakc', "[color][Typo] Wrong color name.")

############## BEFORE ###################
[color] - White
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[color] - blakc
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [color][Typo] Wrong color name.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - color
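One way to confirm that an injected typo such as `blakc` is detectable is fuzzy matching against a reference vocabulary. The sketch below uses `difflib` from the standard library; `KNOWN_COLORS` and `suggest_color` are illustrative names, and the short color list is an assumption (the real `color` column also holds free-form names such as "Simple Grid", for which this approach would not work).

```python
import difflib

# Assumed reference vocabulary; the dataset itself contains many more color names.
KNOWN_COLORS = ["black", "white", "blue", "red", "nude"]


def suggest_color(value, known=KNOWN_COLORS, cutoff=0.7):
    """Return the closest known color for a likely misspelling, or None."""
    value = value.strip().lower()
    if value in known:
        return None  # exact match, nothing to correct
    matches = difflib.get_close_matches(value, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_color("blakc"))  # black
print(suggest_color("White"))  # None
```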


#### C)

In [15]:
print_cols_name_type(orig_data, 24123)

[product_name] - <class 'str'> - Brenton Striped Brazilian Lace Bikini 2V2105
[mrp] - <class 'str'> - $34.00 
[price] - <class 'str'> - $34.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/hanky-panky-brenton-striped-brazilian-lace-bikini-2v2105?ID=3161246&CategoryID=65739
[brand_name] - <class 'str'> - Hanky Panky
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - A cool striped print adds lovely style to this soft, lacy bikini from Hanky Panky.
[rating] - <class 'numpy.float64'> - nan
[review_count] - <class 'numpy.float64'> - nan
[style_attributes] - <class 'str'> - ["Soft lace", "One size fits all", "Lined at gusset", "Nylon; exclusive of trim; gusset: cotton", "Hand wash", "Made in USA", "Web ID: 3161246"]
[total_sizes] - <class 'str'> - ["XS", "S", "M", "L"]
[available_size] - <class 'str'> - ["XS", "S", "M", "L"]
[color] - <class 'str'> - Blue/White
[has_syn

In [16]:
create_dq_issue_at_index(orig_data, 24123, 'price', '$0.50', "[price][Typo] Low product price.")

############## BEFORE ###################
[price] - $34.00 
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[price] - $0.50
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [price][Typo] Low product price.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - price
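A price this far below `mrp` can be flagged by comparing the two columns. The following is a minimal sketch under stated assumptions: both values are `$`-prefixed strings as in the samples above, `mrp` is taken to be the list price, and the 10% threshold is arbitrary. `parse_usd` and `is_suspiciously_cheap` are hypothetical helper names.

```python
from decimal import Decimal


def parse_usd(text):
    """Parse a '$12.34' style string (surrounding whitespace allowed) into a Decimal."""
    return Decimal(text.strip().lstrip("$"))


def is_suspiciously_cheap(price, mrp, min_ratio=Decimal("0.1")):
    """Flag prices below min_ratio of mrp; the threshold is an assumption."""
    return parse_usd(price) < parse_usd(mrp) * min_ratio


print(is_suspiciously_cheap("$0.50", "$34.00 "))   # True
print(is_suspiciously_cheap("$18.90 ", "$36.00 ")) # False
```

`Decimal` is used instead of `float` so that currency values compare exactly.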


#### D)

In [17]:
print_cols_name_type(orig_data, 19386)

[product_name] - <class 'str'> - Intense Power Low-Impact Mesh-Racerback Bralette QF1540
[mrp] - <class 'str'> - $36.00 
[price] - <class 'str'> - $18.90 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/calvin-klein-intense-power-low-impact-mesh-racerback-bralette-qf1540?ID=2747953&CategoryID=55799
[brand_name] - <class 'str'> - Calvin Klein
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Bras
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - The iconic logo is the perfect finishing touch to this comfy bralette from Intense Power by Calvin Klein.
[rating] - <class 'numpy.float64'> - 3.5
[review_count] - <class 'numpy.float64'> - 2.0
[style_attributes] - <class 'str'> - ["Wireless cups", "Wide band below bust with logo", "Racerback with sheer mesh panel", "Pull-on style", "Low-impact sports bra ideal for yoga, pilates and lounging", "Nylon/elastane", "Machine washable", "Imported", "Web ID: 2747953"]
[total_sizes] - <class 'str'> - 

In [18]:
create_dq_issue_at_index(orig_data, 19386, 'product_category', 'Headphones', "[product_category][Wrong categorization] Inadequate category.")

############## BEFORE ###################
[product_category] - Women - Lingerie & Shapewear - Bras
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[product_category] - Headphones
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [product_category][Wrong categorization] Inadequate category.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - product_category


#### E)

In [19]:
print_cols_name_type(orig_data, 23346)

[product_name] - <class 'str'> - Retro Chic Wireless Bra 852186
[mrp] - <class 'str'> - $58.00 
[price] - <class 'str'> - $34.99 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/wacoal-retro-chic-wireless-bra-852186?ID=2692399&CategoryID=65739
[brand_name] - <class 'str'> - Wacoal
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Slip on this comfortable and supportive bra from Wacoal, feauring a gorgeous sheer mesh lace overlay for a feminine touch.
[rating] - <class 'numpy.float64'> - 3.9
[review_count] - <class 'numpy.float64'> - 13.0
[style_attributes] - <class 'str'> - ["Adjustable, stretch straps", "Wireless, full coverage cups with sheer mesh and lace overlay", "Bow and jewel at center gore", "Leotard back with two-ply mesh", "Triple back hook-and-eye closure", "Nylon/spandex; lace: nylon", "Hand wash", "Imported", "Web ID: 2692399"]
[total_sizes] - <class 'st

In [20]:
create_dq_issue_at_index(orig_data, 23346, 'pdp_url', 'http://www1.macys.uk/shop/product/wacoal-retro-chic-wireless-bra-852186?ID=2692399&amp;CategoryID=65739', "[pdp_url][Wrong domain URL] Product URL uses UK domain.")

############## BEFORE ###################
[pdp_url] - http://www1.macys.com/shop/product/wacoal-retro-chic-wireless-bra-852186?ID=2692399&CategoryID=65739
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[pdp_url] - http://www1.macys.uk/shop/product/wacoal-retro-chic-wireless-bra-852186?ID=2692399&amp;CategoryID=65739
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [pdp_url][Wrong domain URL] Product URL uses UK domain.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - pdp_url
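A wrong domain like this can be caught by parsing the URL rather than matching substrings. The sketch below uses `urllib.parse` from the standard library; `has_expected_domain` is a hypothetical helper, and treating `macys.com` plus its subdomains as the only valid hosts is an assumption based on the sample rows.

```python
from urllib.parse import urlparse


def has_expected_domain(url, expected="macys.com"):
    """Return True if the URL's host is `expected` or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == expected or host.endswith("." + expected)


original = "http://www1.macys.com/shop/product/wacoal-retro-chic-wireless-bra-852186?ID=2692399&CategoryID=65739"
corrupted = "http://www1.macys.uk/shop/product/wacoal-retro-chic-wireless-bra-852186?ID=2692399&amp;CategoryID=65739"

print(has_expected_domain(original))   # True
print(has_expected_domain(corrupted))  # False
```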


#### F)

In [21]:
print_cols_name_type(orig_data, 31346)

[product_name] - <class 'str'> - Seductive Comfort Lace Demi Bra QF1444
[mrp] - <class 'str'> - $46.00 
[price] - <class 'str'> - $24.15 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/calvin-klein-seductive-comfort-lace-demi-bra-qf1444?ID=2929187&CategoryID=65739
[brand_name] - <class 'str'> - Calvin Klein
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Lacy detailing adds a touch of elegance to this Seductive Comfort demi bra from Calvin Klein.
[rating] - <class 'numpy.float64'> - 4.5
[review_count] - <class 'numpy.float64'> - 2.0
[style_attributes] - <class 'str'> - ["Adjustable straps", "Lined, push-up, demi cups with lace trim", "Racerback straps convert two ways", "Lace wings", "Double hook-and-eye back closure", "Nylon/elastane", "Hand wash", "Imported", "Web ID: 2929187", "Halter strap at back neck", "Unlined, wireless cups", "Wide lace band under bust", "

In [22]:
create_dq_issue_at_index(orig_data, 31346, 'price', '$24,50', "[price][Format] Wrong currency separator.")

############## BEFORE ###################
[price] - $24.15 
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[price] - $24,50
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [price][Format] Wrong currency separator.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - price
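Format problems in `price` can be checked with a single regular expression. This is a sketch, assuming that a valid price is a dollar sign followed by digits, a dot, and exactly two decimals, which matches the untouched rows shown in this notebook; `PRICE_PATTERN` and `is_well_formed_price` are illustrative names.

```python
import re

# Assumed canonical form: "$" + digits + "." + two decimals, e.g. "$24.15"
PRICE_PATTERN = re.compile(r"\$\d+\.\d{2}")


def is_well_formed_price(text):
    """Check a price string against the assumed canonical form, ignoring outer whitespace."""
    return PRICE_PATTERN.fullmatch(text.strip()) is not None


for value in ["$24.15 ", "$24,50", "$16", "£16.50", "$0.0"]:
    print(repr(value), is_well_formed_price(value))
```

Only the first value passes; the same check also rejects a missing decimal part, a different currency symbol, and a single decimal digit.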


#### G)

In [23]:
print_cols_name_type(orig_data, 4127)

[product_name] - <class 'str'> - Low-Impact Logo Longline Bralette QF1567
[mrp] - <class 'str'> - $36.00 
[price] - <class 'str'> - $20.25 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/calvin-klein-low-impact-logo-longline-bralette-qf1567?ID=2964366&CategoryID=55799
[brand_name] - <class 'str'> - Calvin Klein
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Bras
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - A contrast logo is a chic touch to this comfortable, everyday bralette from Calvin Klein.
[rating] - <class 'numpy.float64'> - 4.5
[review_count] - <class 'numpy.float64'> - 6.0
[style_attributes] - <class 'str'> - ["Adjustable, multiway straps", "Unlined cups", "Elastic band under bust features logo", "Pull-on style, no closure", "Nylon/polyester/elastane", "Machine washable", "Imported", "Web ID: 2964366"]
[total_sizes] - <class 'str'> - ["XS", "S", "M", "L"]
[available_size] - <class 'str'> - ["XS", "M", "L"]
[color] - 

In [24]:
create_dq_issue_at_index(orig_data, 4127, 'rating', 6.0, "[rating][OutOfRange] Rating value is out of range (6).")

############## BEFORE ###################
[rating] - 4.5
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[rating] - 6.0
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [rating][OutOfRange] Rating value is out of range (6).
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - rating
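An out-of-range rating is the simplest of these issues to verify. The sketch below assumes a valid range of 0 to 5, taken from the `min` and `max` in the `describe()` output, and treats missing ratings (NaN) as absent rather than invalid; `rating_in_range` is a hypothetical helper.

```python
import math


def rating_in_range(value, low=0.0, high=5.0):
    """Return False only for ratings outside [low, high]; missing values pass."""
    if value is None or math.isnan(value):
        return True  # a missing rating is a completeness issue, not a range issue
    return low <= value <= high


print(rating_in_range(4.5))           # True
print(rating_in_range(6.0))           # False
print(rating_in_range(float("nan")))  # True
```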


#### H)

In [25]:
print_cols_name_type(orig_data, 928)

[product_name] - <class 'str'> - Stark Beauty Rose-Lace Contour Bra 853225
[mrp] - <class 'str'> - $68.00 
[price] - <class 'str'> - $68.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/wacoal-stark-beauty-rose-lace-contour-bra-853225?ID=4365414&CategoryID=65739
[brand_name] - <class 'str'> - Wacoal
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Create a smooth silhouette with the refined, abstract rose lace of the Stark Beauty bra from Wacoal.
[rating] - <class 'numpy.float64'> - nan
[review_count] - <class 'numpy.float64'> - nan
[style_attributes] - <class 'str'> - ["Split straps adjust at back", "Underwire, contour cups provide modesty and shaping", "Sheer mesh panels at sides and back", "Double hook-and-eye back closure", "Nylon/spandex", "Hand wash", "Imported", "Web ID: 4365414"]
[total_sizes] - <class 'str'> - ["30D", "30DD", "30DDD", "32C", "32D", "32DD

In [26]:
german_text = "Deutsches Ipsum Dolor sit amet, Weltschmerz adipiscing elit, sed do eiusmod Flughafen incididunt ut labore et dolore Die unendliche Geschichte aliqua. Ut enim ad minim Käsefondue quis nostrud exercitation ullamco laboris Brezel ut aliquip ex ea commodo Currywurst Duis aute irure dolor in Müller Rice in voluptate velit esse cillum Deutschland eu fugiat nulla pariatur. Excepteur Vorsprung durch Technik occaecat cupidatat non proident, sunt Doppelscheren-Hubtischwagen culpa qui officia deserunt mollit Reinheitsgebot id est laborum"

In [27]:
create_dq_issue_at_index(orig_data, 928, 'description', german_text, "[description][Wrong language] The description is in German.")

############## BEFORE ###################
[description] - Create a smooth silhouette with the refined, abstract rose lace of the Stark Beauty bra from Wacoal.
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[description] - Deutsches Ipsum Dolor sit amet, Weltschmerz adipiscing elit, sed do eiusmod Flughafen incididunt ut labore et dolore Die unendliche Geschichte aliqua. Ut enim ad minim Käsefondue quis nostrud exercitation ullamco laboris Brezel ut aliquip ex ea commodo Currywurst Duis aute irure dolor in Müller Rice in voluptate velit esse cillum Deutschland eu fugiat nulla pariatur. Excepteur Vorsprung durch Technik occaecat cupidatat non proident, sunt Doppelscheren-Hubtischwagen culpa qui officia deserunt mollit Reinheitsgebot id est laborum
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [description][Wrong language] The description is in German.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - description

#### I)

In [28]:
print_cols_name_type(orig_data, 928)

[product_name] - <class 'str'> - Stark Beauty Rose-Lace Contour Bra 853225
[mrp] - <class 'str'> - $68.00 
[price] - <class 'str'> - $68.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/wacoal-stark-beauty-rose-lace-contour-bra-853225?ID=4365414&CategoryID=65739
[brand_name] - <class 'str'> - Wacoal
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Deutsches Ipsum Dolor sit amet, Weltschmerz adipiscing elit, sed do eiusmod Flughafen incididunt ut labore et dolore Die unendliche Geschichte aliqua. Ut enim ad minim Käsefondue quis nostrud exercitation ullamco laboris Brezel ut aliquip ex ea commodo Currywurst Duis aute irure dolor in Müller Rice in voluptate velit esse cillum Deutschland eu fugiat nulla pariatur. Excepteur Vorsprung durch Technik occaecat cupidatat non proident, sunt Doppelscheren-Hubtischwagen culpa qui officia deserunt mollit Reinheitsgebot id est

In [29]:
total_sizes_wrong_format = "{\"32A\", \"32B\", \"32C\", \"32D\", \"32DD\", \"32DDD\", \"34A\", \"34B\", \"34C\", \"34D\", \"34DD\", \"34DDD\", \"36A\", \"36B\", \"36C\", \"36D\", \"36DD\", \"36DDD\", \"38A\", \"38B\", \"38C\", \"38D\", \"38DD\", \"38DDD\"}"

In [30]:
create_dq_issue_at_index(orig_data, 928, 'total_sizes', total_sizes_wrong_format, "[total_sizes][Wrong format] Curly braces instead of square brackets.", True)

############## BEFORE ###################
[total_sizes] - ["30D", "30DD", "30DDD", "32C", "32D", "32DD", "32DDD", "32G", "34B", "34C", "34D", "34DD", "34DDD", "34G", "36B", "36C", "36D", "36DD", "36DDD", "36G", "38B", "38C", "38D", "38DD", "38DDD", "38G", "40C", "40D", "40DD", "40DDD"]
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [description][Wrong language] The description is in German.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - description
############## AFTER ###################
[total_sizes] - {"32A", "32B", "32C", "32D", "32DD", "32DDD", "34A", "34B", "34C", "34D", "34DD", "34DDD", "36A", "36B", "36C", "36D", "36DD", "36DDD", "38A", "38B", "38C", "38D", "38DD", "38DDD"}
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [description][Wrong language] The description is in German.; [total_sizes][Wrong format] Curly braces instead of square brackets.
[synthetic_dq_issues_count] - 2
[synthetic_dq_issue_columns] - description; total_sizes
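The original `total_sizes` values look like JSON arrays of strings, so the curly-brace variant can be detected by trying to parse the cell. This is a sketch under that assumption; `is_json_list` is a hypothetical helper built on the standard `json` module.

```python
import json


def is_json_list(text):
    """Return True if the text parses as a JSON array, False otherwise."""
    try:
        return isinstance(json.loads(text), list)
    except json.JSONDecodeError:
        return False


print(is_json_list('["S", "M", "L", "XL"]'))   # True
print(is_json_list('{"32A", "32B", "32C"}'))   # False
```

The curly-brace string is rejected because a JSON object requires `key: value` pairs, so parsing fails before the type check is reached.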


#### J)

In [31]:
print_cols_name_type(orig_data, 50)

[product_name] - <class 'str'> - Modern Logo Pants D1632
[mrp] - <class 'str'> - $46.00 
[price] - <class 'str'> - $46.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/calvin-klein-modern-logo-pants-d1632?ID=2926654&CategoryID=65739
[brand_name] - <class 'str'> - Calvin Klein
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Comfy and low-key, these Modern pants from Calvin Klein are the perfect layering piece for almost any occasion.
[rating] - <class 'numpy.float64'> - 5.0
[review_count] - <class 'numpy.float64'> - 1.0
[style_attributes] - <class 'str'> - ["Wide elastic waistband features logo", "Full length", "Cotton/modal/elastane", "Machine washable", "Imported", "Web ID: 2926654"]
[total_sizes] - <class 'str'> - ["S", "M", "L"]
[available_size] - <class 'str'> - ["S", "M", "L"]
[color] - <class 'str'> - Black
[has_synthetic_dq_issue] - <class 'numpy.int64'> 

In [32]:
create_dq_issue_at_index(orig_data, 50, 'brand_name', 'Coca-Cola', "[brand_name][Wrong information] Product name that does not give contextual meaning.")

############## BEFORE ###################
[brand_name] - Calvin Klein
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[brand_name] - Coca-Cola
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [brand_name][Wrong information] Product name that does not give contextual meaning.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - brand_name


#### K)

In [33]:
print_cols_name_type(orig_data, 50)

[product_name] - <class 'str'> - Modern Logo Pants D1632
[mrp] - <class 'str'> - $46.00 
[price] - <class 'str'> - $46.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/calvin-klein-modern-logo-pants-d1632?ID=2926654&CategoryID=65739
[brand_name] - <class 'str'> - Coca-Cola
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Comfy and low-key, these Modern pants from Calvin Klein are the perfect layering piece for almost any occasion.
[rating] - <class 'numpy.float64'> - 5.0
[review_count] - <class 'numpy.float64'> - 1.0
[style_attributes] - <class 'str'> - ["Wide elastic waistband features logo", "Full length", "Cotton/modal/elastane", "Machine washable", "Imported", "Web ID: 2926654"]
[total_sizes] - <class 'str'> - ["S", "M", "L"]
[available_size] - <class 'str'> - ["S", "M", "L"]
[color] - <class 'str'> - Black
[has_synthetic_dq_issue] - <class 'numpy.int64'> - 1

In [34]:
create_dq_issue_at_index(orig_data, 50, 'price', '£16.50', "[price][Wrong format] Wrong currency.", True)

############## BEFORE ###################
[price] - $46.00 
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [brand_name][Wrong information] Product name that does not give contextual meaning.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - brand_name
############## AFTER ###################
[price] - £16.50
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [brand_name][Wrong information] Product name that does not give contextual meaning.; [price][Wrong format] Wrong currency.
[synthetic_dq_issues_count] - 2
[synthetic_dq_issue_columns] - brand_name; price


In [35]:
print_cols_name_type(orig_data, 50)

[product_name] - <class 'str'> - Modern Logo Pants D1632
[mrp] - <class 'str'> - $46.00 
[price] - <class 'str'> - £16.50
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/calvin-klein-modern-logo-pants-d1632?ID=2926654&CategoryID=65739
[brand_name] - <class 'str'> - Coca-Cola
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Comfy and low-key, these Modern pants from Calvin Klein are the perfect layering piece for almost any occasion.
[rating] - <class 'numpy.float64'> - 5.0
[review_count] - <class 'numpy.float64'> - 1.0
[style_attributes] - <class 'str'> - ["Wide elastic waistband features logo", "Full length", "Cotton/modal/elastane", "Machine washable", "Imported", "Web ID: 2926654"]
[total_sizes] - <class 'str'> - ["S", "M", "L"]
[available_size] - <class 'str'> - ["S", "M", "L"]
[color] - <class 'str'> - Black
[has_synthetic_dq_issue] - <class 'numpy.int64'> - 1


#### L)

In [36]:
print_cols_name_type(orig_data, 129)

[product_name] - <class 'str'> - CK One Logo Thong QF1368
[mrp] - <class 'str'> - $20.00 
[price] - <class 'str'> - $20.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/calvin-klein-ck-one-logo-thong-qf1368?ID=4548343&CategoryID=65739
[brand_name] - <class 'str'> - Calvin Klein
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Calvin Klein's sleek thong is a classic look that puts comfort first.
[rating] - <class 'numpy.float64'> - nan
[review_count] - <class 'numpy.float64'> - nan
[style_attributes] - <class 'str'> - ["Elastic waistband ", "Contrast at waist with repeating logo", "Lined at gusset", "Cotton/elastane", "Hand wash", "Imported", "Web ID: 4548343"]
[total_sizes] - <class 'str'> - ["S", "M", "L"]
[available_size] - <class 'str'> - ["S", "M", "L"]
[color] - <class 'str'> - Desert Sunset
[has_synthetic_dq_issue] - <class 'numpy.int64'> - 0
[synthetic_dq_

In [37]:
create_dq_issue_at_index(orig_data, 129, 'price', '$16', "[price][Wrong format] Missing floating point.")

############## BEFORE ###################
[price] - $20.00 
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[price] - $16
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [price][Wrong format] Missing floating point.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - price


In [38]:
print_cols_name_type(orig_data, 129)

[product_name] - <class 'str'> - CK One Logo Thong QF1368
[mrp] - <class 'str'> - $20.00 
[price] - <class 'str'> - $16
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/calvin-klein-ck-one-logo-thong-qf1368?ID=4548343&CategoryID=65739
[brand_name] - <class 'str'> - Calvin Klein
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Calvin Klein's sleek thong is a classic look that puts comfort first.
[rating] - <class 'numpy.float64'> - nan
[review_count] - <class 'numpy.float64'> - nan
[style_attributes] - <class 'str'> - ["Elastic waistband ", "Contrast at waist with repeating logo", "Lined at gusset", "Cotton/elastane", "Hand wash", "Imported", "Web ID: 4548343"]
[total_sizes] - <class 'str'> - ["S", "M", "L"]
[available_size] - <class 'str'> - ["S", "M", "L"]
[color] - <class 'str'> - Desert Sunset
[has_synthetic_dq_issue] - <class 'numpy.int64'> - 1
[synthetic_dq_issu

#### M)

In [39]:
print_cols_name_type(orig_data, 1291)

[product_name] - <class 'str'> - b.provocative Contrast-Lace Bra 951222
[mrp] - <class 'str'> - $40.00 
[price] - <class 'str'> - $22.50 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/b.temptd-by-wacoal-b.provocative-contrast-lace-bra-951222?ID=4461281&CategoryID=55799
[brand_name] - <class 'str'> - b.tempt'd by Wacoal
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Bras
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Flattering sheer nude panels are decorated with intricate lacework to create a memorable figure in the b.provocative bra from b.tempt'd by Wacoal.
[rating] - <class 'numpy.float64'> - nan
[review_count] - <class 'numpy.float64'> - nan
[style_attributes] - <class 'str'> - ["Adjustable straps", "Delicate elastic at neckline creates illusion of open window", "Sheer, underwire cups", "Double hook-and-eye back closure", "Nylon/spandex", "Hand wash", "Imported", "Web ID: 4461281"]
[total_sizes] - <class 'str'> - ["32C",

In [40]:
create_dq_issue_at_index(orig_data, 1291, 'price', '$0.0', "[price][Suspicious value] Misleading price.")

############## BEFORE ###################
[price] - $22.50 
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[price] - $0.0
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [price][Suspicious value] Misleading price.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - price


In [41]:
print_cols_name_type(orig_data, 1291)

[product_name] - <class 'str'> - b.provocative Contrast-Lace Bra 951222
[mrp] - <class 'str'> - $40.00 
[price] - <class 'str'> - $0.0
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/b.temptd-by-wacoal-b.provocative-contrast-lace-bra-951222?ID=4461281&CategoryID=55799
[brand_name] - <class 'str'> - b.tempt'd by Wacoal
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Bras
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Flattering sheer nude panels are decorated with intricate lacework to create a memorable figure in the b.provocative bra from b.tempt'd by Wacoal.
[rating] - <class 'numpy.float64'> - nan
[review_count] - <class 'numpy.float64'> - nan
[style_attributes] - <class 'str'> - ["Adjustable straps", "Delicate elastic at neckline creates illusion of open window", "Sheer, underwire cups", "Double hook-and-eye back closure", "Nylon/spandex", "Hand wash", "Imported", "Web ID: 4461281"]
[total_sizes] - <class 'str'> - ["32C", "3
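
The zero price injected above is the kind of value a simple parsing rule can catch. What follows is a minimal sketch, assuming prices are strings with a leading `$` as in the rows printed in this notebook. `parse_price` and `find_suspicious_prices` are hypothetical helpers that are not defined elsewhere in the notebook, and the small frame only mimics values shown in the outputs.

```python
import pandas as pd


def parse_price(value):
    """Convert a price string such as '$22.50 ' to a float.

    Returns None when the text does not start with '$' or is not a number.
    """
    text = str(value).strip()
    if not text.startswith('$'):
        return None
    try:
        return float(text[1:].replace(',', ''))
    except ValueError:
        return None


def find_suspicious_prices(df, column='price'):
    """Return index labels whose price is unparseable or not strictly positive."""
    flagged = []
    for idx, raw in df[column].items():
        price = parse_price(raw)
        # `not price > 0` also catches NaN, which compares False to everything.
        if price is None or not price > 0:
            flagged.append(idx)
    return flagged


# Hypothetical sample that mimics rows printed in this notebook.
sample = pd.DataFrame(
    {'price': ['$22.50 ', '$0.0', '£16.50']},
    index=[1290, 1291, 50],
)
flagged = find_suspicious_prices(sample)
print(flagged)
```

The rule flags both the zero price and the price with a non-dollar currency symbol. Whether a non-dollar symbol should count as an issue is an assumption of this sketch.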

#### N)

In [42]:
print_cols_name_type(orig_data, 14121)

[product_name] - <class 'str'> - Lace Kiss Thong 3-Pack 976282
[mrp] - <class 'str'> - $33.00 
[price] - <class 'str'> - $33.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/b.temptd-by-wacoal-lace-kiss-thong-3-pack-976282?ID=2687057&CategoryID=65739
[brand_name] - <class 'str'> - b.tempt'd by Wacoal
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Tap into your softer side with this 3-pack of gorgeous, ultra-feminine lace thongs from b.tempt'd by Wacoal.
[rating] - <class 'numpy.float64'> - nan
[review_count] - <class 'numpy.float64'> - nan
[style_attributes] - <class 'str'> - ["Lace waistband", "Comes in a pack of 3", "Lined at gusset", "Nylon/spandex; gusset: cotton", "Hand wash", "Imported", "Web ID: 2687057"]
[total_sizes] - <class 'str'> - ["S", "M", "L", "XL"]
[available_size] - <class 'str'> - ["S", "M"]
[color] - <class 'str'> - Peacoat/Cherry Tomato/Lim

In [43]:
create_dq_issue_at_index(orig_data, 14121, 'rating', 9.0, "[rating][OutOfRange] Rating value is out of range (9.0).")

############## BEFORE ###################
[rating] - nan
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[rating] - 9.0
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [rating][OutOfRange] Rating value is out of range (9.0).
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - rating


In [44]:
print_cols_name_type(orig_data, 14121)

[product_name] - <class 'str'> - Lace Kiss Thong 3-Pack 976282
[mrp] - <class 'str'> - $33.00 
[price] - <class 'str'> - $33.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/b.temptd-by-wacoal-lace-kiss-thong-3-pack-976282?ID=2687057&CategoryID=65739
[brand_name] - <class 'str'> - b.tempt'd by Wacoal
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Tap into your softer side with this 3-pack of gorgeous, ultra-feminine lace thongs from b.tempt'd by Wacoal.
[rating] - <class 'numpy.float64'> - 9.0
[review_count] - <class 'numpy.float64'> - nan
[style_attributes] - <class 'str'> - ["Lace waistband", "Comes in a pack of 3", "Lined at gusset", "Nylon/spandex; gusset: cotton", "Hand wash", "Imported", "Web ID: 2687057"]
[total_sizes] - <class 'str'> - ["S", "M", "L", "XL"]
[available_size] - <class 'str'> - ["S", "M"]
[color] - <class 'str'> - Peacoat/Cherry Tomato/Lim
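
An out-of-range rating like the one injected above can be found with a bounds check. This is a sketch only: it assumes the valid scale is 0 to 5, which is what the `describe()` summary at the top of the notebook suggests (min 0.0, max 5.0), and `find_out_of_range_ratings` is a hypothetical helper name. Missing ratings are deliberately not flagged, because many untouched rows have none.

```python
import pandas as pd


def find_out_of_range_ratings(df, column='rating', low=0.0, high=5.0):
    """Return index labels whose rating is present but outside [low, high]."""
    # Coerce to numbers so that a value stored as text (e.g. '9.0') is also checked.
    ratings = pd.to_numeric(df[column], errors='coerce')
    out_of_range = ratings.notna() & ~ratings.between(low, high)
    return df.index[out_of_range].tolist()


# Hypothetical sample that mimics rows printed in this notebook.
sample = pd.DataFrame(
    {'rating': [float('nan'), 9.0, 4.2]},
    index=[14120, 14121, 17141],
)
flagged = find_out_of_range_ratings(sample)
print(flagged)
```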

#### O)

In [45]:
print_cols_name_type(orig_data, 17141)

[product_name] - <class 'str'> - Visual Effects Minimizer Bra 857210
[mrp] - <class 'str'> - $65.00 
[price] - <class 'str'> - $65.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/wacoal-visual-effects-minimizer-bra-857210?ID=2135112&CategoryID=65739
[brand_name] - <class 'str'> - Wacoal
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Get the support you've been looking for with the Visual Effects Minimizer Bra by Wacoal. Style #857210
[rating] - <class 'numpy.float64'> - 4.2
[review_count] - <class 'numpy.float64'> - 27.0
[style_attributes] - <class 'str'> - ["Fully-adjustable straps ", "Molded engineered pattern", "Unlined underwire cups with sheer mesh lining", "Double back hook-and-eye closure", "Spandex/nylon", "Hand wash", "Imported", "Web ID: 2135112"]
[total_sizes] - <class 'str'> - ["32D", "32DD", "32DDD", "32G", "32H", "34C", "34D", "34DD", "34DDD", "3

In [46]:
create_dq_issue_at_index(orig_data, 17141, 'description', 'Short desc.', "[description][Incomplete information] Very short description.")

############## BEFORE ###################
[description] - Get the support you've been looking for with the Visual Effects Minimizer Bra by Wacoal. Style #857210
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[description] - Short desc.
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [description][Incomplete information] Very short description.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - description


In [47]:
print_cols_name_type(orig_data, 17141)

[product_name] - <class 'str'> - Visual Effects Minimizer Bra 857210
[mrp] - <class 'str'> - $65.00 
[price] - <class 'str'> - $65.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/wacoal-visual-effects-minimizer-bra-857210?ID=2135112&CategoryID=65739
[brand_name] - <class 'str'> - Wacoal
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - Short desc.
[rating] - <class 'numpy.float64'> - 4.2
[review_count] - <class 'numpy.float64'> - 27.0
[style_attributes] - <class 'str'> - ["Fully-adjustable straps ", "Molded engineered pattern", "Unlined underwire cups with sheer mesh lining", "Double back hook-and-eye closure", "Spandex/nylon", "Hand wash", "Imported", "Web ID: 2135112"]
[total_sizes] - <class 'str'> - ["32D", "32DD", "32DDD", "32G", "32H", "34C", "34D", "34DD", "34DDD", "34G", "34H", "36C", "36D", "36DD", "36DDD", "36G", "36H", "38C", "38D", "38DD", "38DDD", "38
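
A very short description such as the one injected above can be detected by counting words. The sketch below assumes a threshold of five words, which is an arbitrary choice for illustration and not a value taken from the notebook. `find_short_descriptions` is a hypothetical helper.

```python
import pandas as pd


def find_short_descriptions(df, column='description', min_words=5):
    """Return index labels whose description has fewer than `min_words` words."""
    # Missing descriptions become empty strings and therefore count as zero words.
    word_counts = df[column].fillna('').astype(str).str.split().str.len()
    return df.index[word_counts < min_words].tolist()


# Hypothetical sample that mimics rows printed in this notebook.
sample = pd.DataFrame(
    {
        'description': [
            'Short desc.',
            "Calvin Klein's sleek thong is a classic look that puts comfort first.",
        ]
    },
    index=[17141, 17142],
)
flagged = find_short_descriptions(sample)
print(flagged)
```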

#### P)

In [48]:
print_cols_name_type(orig_data, 37141)

[product_name] - <class 'str'> - Plus Size Printed Signature Lace Vikini
[mrp] - <class 'str'> - $45.00 
[price] - <class 'str'> - $45.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/hanky-panky-plus-size-printed-signature-lace-vikini?ID=2944192&CategoryID=65739
[brand_name] - <class 'str'> - Hanky Panky
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Macys US
[description] - <class 'str'> - A gorgeous, lush floral design is the perfect finishing touch for this lacy plus size vikini from Hanky Panky.
[rating] - <class 'numpy.float64'> - nan
[review_count] - <class 'numpy.float64'> - nan
[style_attributes] - <class 'str'> - ["High waist", "Soft, stretch Signature lace", "Scallop trim at leg openings", "Lined at gusset", "Nylon/spandex; gusset: cotton", "Hand wash", "Made in USA", "Web ID: 2944192"]
[total_sizes] - <class 'str'> - ["1X", "2X"]
[available_size] - <class 'str'> - ["1X", "2X"]
[color] - <c

In [49]:
create_dq_issue_at_index(orig_data, 37141, 'retailer', 'Kaufland US', "[retailer][Nonsense information] Nonsense retailer name.")

############## BEFORE ###################
[retailer] - Macys US
[has_synthetic_dq_issue] - 0
[synthetic_dq_issues] - 
[synthetic_dq_issues_count] - 0
[synthetic_dq_issue_columns] - 
############## AFTER ###################
[retailer] - Kaufland US
[has_synthetic_dq_issue] - 1
[synthetic_dq_issues] - [retailer][Nonsense information] Nonsense retailer name.
[synthetic_dq_issues_count] - 1
[synthetic_dq_issues_count] - 1
[synthetic_dq_issue_columns] - retailer


In [50]:
print_cols_name_type(orig_data, 37141)

[product_name] - <class 'str'> - Plus Size Printed Signature Lace Vikini
[mrp] - <class 'str'> - $45.00 
[price] - <class 'str'> - $45.00 
[pdp_url] - <class 'str'> - http://www1.macys.com/shop/product/hanky-panky-plus-size-printed-signature-lace-vikini?ID=2944192&CategoryID=65739
[brand_name] - <class 'str'> - Hanky Panky
[product_category] - <class 'str'> - Women - Lingerie & Shapewear - Designer Lingerie
[retailer] - <class 'str'> - Kaufland US
[description] - <class 'str'> - A gorgeous, lush floral design is the perfect finishing touch for this lacy plus size vikini from Hanky Panky.
[rating] - <class 'numpy.float64'> - nan
[review_count] - <class 'numpy.float64'> - nan
[style_attributes] - <class 'str'> - ["High waist", "Soft, stretch Signature lace", "Scallop trim at leg openings", "Lined at gusset", "Nylon/spandex; gusset: cotton", "Hand wash", "Made in USA", "Web ID: 2944192"]
[total_sizes] - <class 'str'> - ["1X", "2X"]
[available_size] - <class 'str'> - ["1X", "2X"]
[color] -
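
Every untouched row shown in this notebook has `Macys US` as the retailer, so an unexpected retailer can be found by comparing against a list of allowed values. This is a minimal sketch under that assumption. The allowed list and the helper name `find_unexpected_retailers` are hypothetical.

```python
import pandas as pd


def find_unexpected_retailers(df, column='retailer', allowed=('Macys US',)):
    """Return index labels whose retailer is not one of the allowed values."""
    # Strip surrounding whitespace so that 'Macys US ' is not reported as a mismatch.
    normalized = df[column].fillna('').astype(str).str.strip()
    return df.index[~normalized.isin(list(allowed))].tolist()


# Hypothetical sample that mimics rows printed in this notebook.
sample = pd.DataFrame(
    {'retailer': ['Macys US', 'Kaufland US', 'Macys US ']},
    index=[37140, 37141, 37142],
)
flagged = find_unexpected_retailers(sample)
print(flagged)
```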

## Create dataset with synthetic DQ issues

In [51]:
orig_data[50:51]

Unnamed: 0,product_name,mrp,price,pdp_url,brand_name,product_category,retailer,description,rating,review_count,style_attributes,total_sizes,available_size,color,has_synthetic_dq_issue,synthetic_dq_issues_count,synthetic_dq_issues,synthetic_dq_issue_columns
50,Modern Logo Pants D1632,$46.00,£16.50,http://www1.macys.com/shop/product/calvin-klei...,Coca-Cola,Women - Lingerie & Shapewear - Designer Lingerie,Macys US,"Comfy and low-key, these Modern pants from Cal...",5.0,1.0,"[""Wide elastic waistband features logo"", ""Full...","[""S"", ""M"", ""L""]","[""S"", ""M"", ""L""]",Black,1,2,[brand_name][Wrong information] Product name t...,brand_name; price


In [52]:
orig_data.head()

Unnamed: 0,product_name,mrp,price,pdp_url,brand_name,product_category,retailer,description,rating,review_count,style_attributes,total_sizes,available_size,color,has_synthetic_dq_issue,synthetic_dq_issues_count,synthetic_dq_issues,synthetic_dq_issue_columns
0,ID String Bikini QF1754,$20.00,$20.00,http://www1.macys.com/shop/product/calvin-klei...,Calvin Klein,Women - Lingerie & Shapewear - Designer Lingerie,Macys US,The perfect amount of coverage in a subtle sil...,,,"[""Thin elastic waistband "", ""Repeating logo at...","[""XS"", ""S"", ""M"", ""L"", ""XL""]","[""XS"", ""S"", ""M"", ""L"", ""XL""]",Black,0,0,,
1,ID String Bikini QF1754,$20.00,$20.00,http://www1.macys.com/shop/product/calvin-klei...,Calvin Klein,Women - Lingerie & Shapewear - Designer Lingerie,Macys US,The perfect amount of coverage in a subtle sil...,,,"[""Thin elastic waistband "", ""Repeating logo at...","[""XS"", ""S"", ""M"", ""L"", ""XL""]","[""XS"", ""S"", ""M"", ""L"", ""XL""]",Simple Grid,0,0,,
2,ID String Bikini QF1754,$20.00,$20.00,http://www1.macys.com/shop/product/calvin-klei...,Calvin Klein,Women - Lingerie & Shapewear - Designer Lingerie,Macys US,The perfect amount of coverage in a subtle sil...,,,"[""Thin elastic waistband "", ""Repeating logo at...","[""XS"", ""S"", ""M"", ""L"", ""XL""]","[""XS"", ""S"", ""M"", ""L"", ""XL""]",White,0,0,,
3,CK Black Collection Embrace Lace-Waist Thong Q...,$26.00,$26.00,http://www1.macys.com/shop/product/calvin-klei...,Calvin Klein,Women - Lingerie & Shapewear - Designer Lingerie,Macys US,Exquisitely designed embroidered lace beautifu...,,,"[""Elastic lace waistband"", ""Lace at front and ...","[""M""]","[""M""]",Regal Sensous,0,0,,
4,Halo Lace Boyshort 870205,$15.00,$15.00,http://www1.macys.com/shop/product/wacoal-halo...,Wacoal,Women - Lingerie & Shapewear - Designer Lingerie,Macys US,A fit that is heavenly. The stretch lace Halo ...,5.0,2.0,"[""Stretch lace waistband"", ""All-over floral la...","[""S"", ""M"", ""L"", ""XL""]","[""S"", ""M"", ""L"", ""XL""]",Black,0,0,,


In [53]:
orig_data.to_csv('./../data/macys-dataset-experimental-synthetic-dq-issues.csv', index_label='id')
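
Before relying on the saved file, it can help to summarise how many synthetic issues touch each column. The sketch below assumes that `synthetic_dq_issue_columns` holds column names separated by `;`, as in the row with `brand_name; price` shown above. `count_issues_per_column` is a hypothetical helper, and the sample frame stands in for the real data.

```python
import pandas as pd


def count_issues_per_column(df, column='synthetic_dq_issue_columns', sep=';'):
    """Count how often each column name appears in the separator-delimited field."""
    names = (
        df[column]
        .fillna('')        # rows without issues may be empty strings or NaN
        .astype(str)
        .str.split(sep)
        .explode()         # one row per listed column name
        .str.strip()
    )
    names = names[names != '']
    return names.value_counts().to_dict()


# Hypothetical sample that mimics the issue columns recorded in this notebook.
sample = pd.DataFrame(
    {'synthetic_dq_issue_columns': ['brand_name; price', 'price', '', None, 'rating']}
)
counts = count_issues_per_column(sample)
print(counts)
```

When the CSV is read back, passing `index_col='id'` to `pd.read_csv` restores the index written by `index_label='id'`. Empty strings in the issue columns come back as `NaN` by default, which is why the helper calls `fillna('')` first.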