# Project VI

This dataset provides information on the price, composition and eco-tagging of Zara's clothing items. Specifically, it includes item codes, names, descriptions, Join Life titles, Join Life descriptions, item prices and compositions. With this dataset, we can analyze and compare the prices, compositions and eco-tagging of Zara clothing items to determine which items are the most affordable and sustainable.

We have two DataFrames: 
1. Composition: 
    - item_code: ``int``, foreign key referencing items
    - part_name: ``str``, part of the clothing item (e.g. Inside/Outside)
    - material: ``str``, name of the material
    - percent: ``str``, percentage of the material relative to the clothing part
    
    
2. Items: 
    - item_code: ``int``, numeric code assigned to each clothing item (primary key)
    - item_name: ``str``, name of the item
    - item_desc: ``str``, description of the item
    - join_life: ``bool``, whether or not the item is eco-tagged
    - joinlife_title: ``str``, title of the eco-label
    - joinlife_desc: ``str``, description of the ecological action taken
    - item_price: ``float``, price of the item in euro cents
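
The primary key / foreign key relationship described above can be checked before any analysis. The following is a minimal sketch on toy frames: the names `items_demo`, `composition_demo`, `pk_is_unique` and `orphans` are hypothetical, and the code `999999` is made up to show what an unmatched row looks like.

```python
import pandas as pd

# Toy stand-ins for the two DataFrames (illustrative values only)
items_demo = pd.DataFrame({
    "item_code": [200000, 200001, 200002],
    "item_name": ["CAMISA POPELÍN", "CAMISA POPELÍN", "BLUSA HILO METALIZADO"],
})
composition_demo = pd.DataFrame({
    "item_code": [200000, 200001, 200002, 200002, 999999],
    "material": ["algodon", "algodon", "viscosa", "elastano", "poliester"],
})

# Primary key: every item_code should appear exactly once in items
pk_is_unique = items_demo["item_code"].is_unique

# Foreign key: composition rows whose item_code has no match in items
orphans = composition_demo[~composition_demo["item_code"].isin(items_demo["item_code"])]

print(pk_is_unique)
print(orphans)
```

Running the same two checks on the real ``items`` and ``composition`` frames would show whether every composition row can be joined to an item.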

In [1]:
import pandas as pd 
import re
import numpy as np

In [2]:
composition = pd.read_csv('C:/Users/manya/Documents/Ironhack/Course/Project-IV/data/fastFashionCompDim.csv', sep = '|')

In [3]:
composition

Unnamed: 0,item_code,part_name,material,percent
0,200000,EXTERIOR,algodon,100%
1,200001,EXTERIOR,algodon,100%
2,200002,EXTERIOR,viscosa,62%
3,200002,EXTERIOR,fibra metalizada,37%
4,200002,EXTERIOR,elastano,1%
...,...,...,...,...
452,500036,EXTERIOR,algodon,97%
453,500036,EXTERIOR,elastano,3%
454,500037,EXTERIOR,poliester,67%
455,500037,EXTERIOR,viscosa,29%


In [4]:
items = pd.read_csv('C:/Users/manya/Documents/Ironhack/Course/Project-IV/data/fastFasionItemsDim.csv')

In [5]:
items

Unnamed: 0,item_code,item_name,item_desc,join_life,joinlife_title,joinlife_desc,item_price
0,200000,CAMISA POPELÍN,"""Camisa de cuello solapa y escote pico. Manga ...",True,JOIN LIFE Care for fiber: 100% algodon organico.,"""Algodon cultivado utilizando fertilizantes y ...",1995.0
1,200001,CAMISA POPELÍN,"""Camisa de cuello solapa y escote pico. Manga ...",True,JOIN LIFE Care for fiber: 100% algodon organico.,"""Algodon cultivado utilizando fertilizantes y ...",1995.0
2,200002,BLUSA HILO METALIZADO,"""Blusa semitransparente de cuello solapa y esc...",False,,,3995.0
3,200003,BLUSA SATINADA ALAMARES,"""Blusa de cuello subido y escote pico. Manga l...",False,,,2995.0
4,200004,BLUSA ESTAMPADA CROPPED,"""Blusa satinada de cuello solapa y manga larga...",False,,,1995.0
...,...,...,...,...,...,...,...
271,500033,PANTALÓN PITILLO,"""Pantalon de tiro medio. Cintura con elastico ...",True,JOIN LIFE Care for fiber: al menos 25% poliest...,"""Esta fibra se obtiene a partir del reciclaje ...",1299.0
272,500034,PANTALÓN CINTURÓN RAFIA,"""Pantalon de tiro alto con cintura elastica. B...",False,,,1599.0
273,500035,PANTALÓN ESTAMPADO,"""Pantalon de tiro alto. Cintura elastica ajust...",False,,,1599.0
274,500036,PANTALÓN GARDEN,"""The Garden Pant In Grey.<br/><br/>Pantalon de...",False,,,1599.0


## Cleaning dataframes 

From the ``items`` dataframe, the following columns will be cleaned: 

     - item_price: convert the values from cents to euros
     - item_name: convert the values to lower case
     - item_desc and joinlife_desc: remove the quotation marks from the descriptions
     - item_category: create this new column from the first word of item_name
     - join_life: replace True with Eco-label and False with Not Eco-label
     
From the ``composition`` dataframe, the following columns will be cleaned: 

    - percent: convert the percentage strings to float fractions (e.g. 62% becomes 0.62)
    - part_name: convert the values to lower case

In [6]:
def cents_euros (df, column_name):
    
    """Function to convert the cents to euros applying a lambda to the column's value.
    It takes two arguments, a DataFrame and the column name"""
    
    df[column_name] = df[column_name].apply(lambda x: x/100)
    
    return df

In [7]:
def lower_case (df, column_name): 
    """Function to transform the string values of a column into lower case. 
    It takes two arguments, a DataFrame and the column name """
    
    df[column_name] = df[column_name].str.lower()
    
    return df 

In [8]:
def percent_to_float (df, column_name): 
    
    """Function to convert the string % into a float applying it to the values
    of a column. It takes two arguments, a DataFrame and the column name."""
    
    df[column_name] = df[column_name].str.replace('%','').astype(float)
    df[column_name] =  df[column_name].apply(lambda x: x/100)
    
    return df 
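
As a quick illustration of what this conversion does, here is a small sketch on a toy column (`percent_demo` is a hypothetical name and the values are illustrative). It performs the same two steps, stripping the `%` sign and scaling to a 0-1 fraction, in a single vectorized expression instead of a lambda.

```python
import pandas as pd

percent_demo = pd.DataFrame({"percent": ["100%", "62%", "1%"]})

# Strip the literal '%' sign, cast to float, then scale to a 0-1 fraction
percent_demo["percent"] = (
    percent_demo["percent"].str.replace("%", "", regex=False).astype(float) / 100
)

print(percent_demo)
```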

In [9]:
def remove_commas (df,column_name): 
    
    """Function to remove double (") and single (') quotation marks from the values
    of a column, only when the value is a string (other values are left as they are).
    Despite its name, it does not remove commas. It takes two arguments, a DataFrame and the column name."""
    
    df[column_name] = df[column_name].apply(lambda x: x.replace('"', '').replace("'", "") if isinstance(x, str) else x)
    
    return df 

In [10]:
def first_word (text): 
    """Function to return the first word of a string, or an empty string if there is none.
    It is applied later on to a column in order to create a new one."""
    match = re.search(r'^\w+', text)
    if match:
        return match.group()
    return ''
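
A short usage sketch of this helper follows (the function is repeated so the snippet runs on its own; the inputs are illustrative). In Python 3, `\w` matches accented letters in `str` patterns, so a name such as `pantalón pitillo` keeps its full first word. Note that a non-string input such as a missing value would make `re.search` raise a `TypeError`, so this assumes the column has no missing names.

```python
import re

def first_word(text):
    # Match one or more word characters anchored at the start of the string
    match = re.search(r'^\w+', text)
    return match.group() if match else ''

accented = first_word('pantalón pitillo')
plain = first_word('blusa hilo metalizado')
empty = first_word('')

print(accented, plain, repr(empty))
```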
    

In [11]:
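def eco_label (df, column_name):
    """Function to replace the boolean values of a column with the labels
    'Eco-label' (True) and 'Not Eco-label' (any other value). It modifies the DataFrame
    in place and takes two arguments, a DataFrame and the column name."""
    
    df[column_name] = df[column_name].apply(lambda x: 'Eco-label' if x == True else 'Not Eco-label')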
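
One thing to keep in mind with this function is that running it twice on the same column turns every value into `Not Eco-label`, because the labels are no longer equal to `True`. As a possible alternative (a sketch, not the approach used in this notebook), an explicit mapping makes that situation visible: values not present in the mapping become `NaN`. The names `labels_demo` and `label_map` are hypothetical.

```python
import pandas as pd

labels_demo = pd.DataFrame({"join_life": [True, False, True]})

# Explicit mapping from the boolean flag to its label
label_map = {True: "Eco-label", False: "Not Eco-label"}
labels_demo["join_life"] = labels_demo["join_life"].map(label_map)

print(labels_demo)
```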

Applying the functions to the ``items`` dataframe:

In [12]:

cents_euros(items, 'item_price')
lower_case(items, 'item_name')
remove_commas(items,'item_desc')
remove_commas(items,'joinlife_desc')
items['item_category'] = items['item_name'].apply(first_word)

In [13]:
eco_label(items, 'join_life')

In [14]:
items

Unnamed: 0,item_code,item_name,item_desc,join_life,joinlife_title,joinlife_desc,item_price,item_category
0,200000,camisa popelín,Camisa de cuello solapa y escote pico. Manga a...,Eco-label,JOIN LIFE Care for fiber: 100% algodon organico.,Algodon cultivado utilizando fertilizantes y p...,19.95,camisa
1,200001,camisa popelín,Camisa de cuello solapa y escote pico. Manga a...,Eco-label,JOIN LIFE Care for fiber: 100% algodon organico.,Algodon cultivado utilizando fertilizantes y p...,19.95,camisa
2,200002,blusa hilo metalizado,Blusa semitransparente de cuello solapa y esco...,Not Eco-label,,,39.95,blusa
3,200003,blusa satinada alamares,Blusa de cuello subido y escote pico. Manga la...,Not Eco-label,,,29.95,blusa
4,200004,blusa estampada cropped,Blusa satinada de cuello solapa y manga larga ...,Not Eco-label,,,19.95,blusa
...,...,...,...,...,...,...,...,...
271,500033,pantalón pitillo,Pantalon de tiro medio. Cintura con elastico e...,Eco-label,JOIN LIFE Care for fiber: al menos 25% poliest...,Esta fibra se obtiene a partir del reciclaje d...,12.99,pantalón
272,500034,pantalón cinturón rafia,Pantalon de tiro alto con cintura elastica. Bo...,Not Eco-label,,,15.99,pantalón
273,500035,pantalón estampado,Pantalon de tiro alto. Cintura elastica ajusta...,Not Eco-label,,,15.99,pantalón
274,500036,pantalón garden,The Garden Pant In Grey.<br/><br/>Pantalon de ...,Not Eco-label,,,15.99,pantalón


In [15]:
items.item_category.value_counts()

vestido     58
blusa       41
falda       40
pantalón    32
camiseta    25
body        16
camisa      15
bermuda     13
cuerpo      11
shorts       7
top          5
chaqueta     5
jogging      3
sudadera     2
legging      2
braguita     1
Name: item_category, dtype: int64

In [16]:
# Some first words are not categories of their own, so we assign them to another category:
#   cuerpo -> camiseta
#   jogging, sudadera, legging -> sportswear
    
items['item_category'] = items['item_category'].replace(['jogging', 'sudadera', 'legging'], 'sportswear')
items['item_category'] = items['item_category'].replace('cuerpo', 'camiseta')   
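
The same regrouping can also be written with a single dictionary, which keeps all the rules in one place if more categories need merging later. This is a minimal sketch on a toy Series; `category_demo`, `category_map` and `merged` are hypothetical names.

```python
import pandas as pd

category_demo = pd.Series(["cuerpo", "jogging", "sudadera", "legging", "vestido"])

# One mapping for every category that is merged into another
category_map = {
    "cuerpo": "camiseta",
    "jogging": "sportswear",
    "sudadera": "sportswear",
    "legging": "sportswear",
}

# Values that are not keys of the mapping (e.g. 'vestido') are left unchanged
merged = category_demo.replace(category_map)

print(merged.value_counts())
```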

Applying the functions to dataframe  ``composition``:

In [17]:
percent_to_float(composition, 'percent')

Unnamed: 0,item_code,part_name,material,percent
0,200000,EXTERIOR,algodon,1.00
1,200001,EXTERIOR,algodon,1.00
2,200002,EXTERIOR,viscosa,0.62
3,200002,EXTERIOR,fibra metalizada,0.37
4,200002,EXTERIOR,elastano,0.01
...,...,...,...,...
452,500036,EXTERIOR,algodon,0.97
453,500036,EXTERIOR,elastano,0.03
454,500037,EXTERIOR,poliester,0.67
455,500037,EXTERIOR,viscosa,0.29


In [18]:
lower_case(composition,'part_name')

Unnamed: 0,item_code,part_name,material,percent
0,200000,exterior,algodon,1.00
1,200001,exterior,algodon,1.00
2,200002,exterior,viscosa,0.62
3,200002,exterior,fibra metalizada,0.37
4,200002,exterior,elastano,0.01
...,...,...,...,...
452,500036,exterior,algodon,0.97
453,500036,exterior,elastano,0.03
454,500037,exterior,poliester,0.67
455,500037,exterior,viscosa,0.29
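
With `percent` stored as a fraction, a natural sanity check is whether the materials of each item part add up to 1. The following is a sketch on toy data: the rows for `200002` mirror the ones shown above, while item `999999` is made up to illustrate an incomplete composition. The names `comp_demo`, `totals` and `incomplete` are hypothetical.

```python
import pandas as pd

comp_demo = pd.DataFrame({
    "item_code": [200002, 200002, 200002, 999999, 999999],
    "part_name": ["exterior"] * 5,
    "material": ["viscosa", "fibra metalizada", "elastano", "poliester", "viscosa"],
    "percent": [0.62, 0.37, 0.01, 0.67, 0.29],
})

# Sum the fractions for each (item, part) pair
totals = comp_demo.groupby(["item_code", "part_name"])["percent"].sum()

# Use a small tolerance because floating-point sums are rarely exactly 1.0
incomplete = totals[(totals - 1.0).abs() > 1e-6]

print(incomplete)
```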


Now that the dataframes are clean, we save them as .csv files in a ``clean`` folder inside ``data``.

In [19]:
composition.to_csv('C:/Users/manya/Documents/Ironhack/Course/Project-IV/data/clean/composition.csv', index = False)

In [20]:
items.to_csv('C:/Users/manya/Documents/Ironhack/Course/Project-IV/data/clean/items.csv', index = False)
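
Both dataframes were loaded with an `Unnamed: 0` column, which pandas typically creates when a CSV was written together with its index. Because that column is now ordinary data, `index = False` does not remove it from the saved files. The sketch below uses in-memory buffers instead of the real paths to show the effect and one way to drop the column; `demo`, `reloaded`, `tidy` and `reloaded_clean` are hypothetical names.

```python
import io
import pandas as pd

demo = pd.DataFrame({"item_code": [200000, 200001], "material": ["algodon", "algodon"]})

# Saving WITH the index (the default) writes it as an unnamed first column
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
reloaded = pd.read_csv(with_index)
print(list(reloaded.columns))

# The leftover column can be dropped explicitly before saving
tidy = reloaded.drop(columns="Unnamed: 0")

# Saving with index=False keeps only the real columns on reload
without_index = io.StringIO()
tidy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)
print(list(reloaded_clean.columns))
```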