# From data understanding to data preparation

## Objectives

*   Understand the data.
*   Prepare the data for analysis and inference.


 We continue learning about the data science methodology, focusing on the **Data understanding** and **Data preparation** stages.


## Contents


1.  [Recap](#0)<br>
2.  [Data understanding](#2)<br>
3.  [Data preparation](#4)<br>

<hr>


# Recap <a id="0"></a>


 We have learned that the data we need to answer the question developed in the business understanding stage, namely *can we automate the process of determining the cuisine of a given recipe?*, is readily available. A researcher named Yong-Yeol Ahn scraped tens of thousands of food recipes (cuisines and ingredients) from three different websites:


<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0103EN-SkillsNetwork/labs/Module%202/images/lab2_fig3_allrecipes.png" width=500>
<div align="center">
www.allrecipes.com
</div>
<br/><br/>
<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0103EN-SkillsNetwork/labs/Module%202/images/lab2_fig4_epicurious.png" width=500>
<div align="center">
www.epicurious.com
</div>
<br/><br/>
<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0103EN-SkillsNetwork/labs/Module%202/images/lab2_fig5_menupan.png" width=500>
<div align="center">
www.menupan.com
</div>
<br/><br/>


To learn more about Yong-Yeol Ahn and his research, see his paper [Flavor Network and the Principles of Food Pairing](http://yongyeol.com/papers/ahn-flavornet-2011.pdf?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDS0103ENSkillsNetwork20083987-2022-01-01).


# Data understanding <a id="2"></a>


<figure style = 'text-align:center'><img src="Datos/metodologia_ent.jpg" width=350></figure>


In [26]:
import pandas as pd # library for reading data as dataframes
import numpy as np # library for numerical arrays
import re # library for regular expressions

In [27]:
recipes = pd.read_csv('Datos/recipes.csv')

In [28]:
recipes.head()

Unnamed: 0,country,angelica,anise,anise_seed,apple,apple_brandy,apricot,armagnac,artemisia,artichoke,...,whiskey,white_bread,white_wine,whole_grain_wheat_flour,wine,wood,yam,yeast,yogurt,zucchini
0,Vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
1,Vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
2,Vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
3,Vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
4,Vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No


In [29]:
recipes.shape

(57691, 383)

 Our dataset consists of 57,691 recipes. Each row represents one recipe, and for each recipe the dataset records the corresponding cuisine and whether each of 382 ingredients is present, starting with angelica and ending with zucchini.

We know that a basic sushi recipe includes these ingredients:

* rice
* soy sauce
* wasabi
* some fish/vegetables


Let's check that these ingredients exist in our dataframe:

In [30]:
ingredientes = list(recipes.columns.values)

# for each keyword, print every column name that contains it
for palabra in ["rice", "wasabi", "soy"]:
    print([ingrediente for ingrediente in ingredientes if re.search(palabra, ingrediente)])

['brown_rice', 'licorice', 'rice']
['wasabi']
['soy_sauce', 'soybean', 'soybean_oil']


Yes, they exist:

*   Rice exists as rice.
*   Wasabi exists as wasabi.
*   Soy sauce exists as soy_sauce.
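
The column search above can be wrapped in a small helper. This is a minimal sketch: `find_columns` and the toy dataframe are hypothetical names introduced for illustration, and only the column names come from the output above. It also shows why a plain substring search returns `licorice` when we look for `rice`, and one possible pattern that avoids it.

```python
import re

import pandas as pd


def find_columns(df, keyword):
    """Return the column names of df that match keyword (regex search)."""
    pattern = re.compile(keyword)
    return [name for name in df.columns if pattern.search(name)]


# Toy dataframe (hypothetical) that reuses a few of the real column names
toy = pd.DataFrame(columns=["cuisine", "brown_rice", "licorice", "rice", "soy_sauce", "wasabi"])

rice_columns = find_columns(toy, "rice")        # substring match, includes 'licorice'
exact_rice = find_columns(toy, r"(^|_)rice$")   # 'rice' as a whole word at the end of the name
print(rice_columns)
print(exact_rice)
```

A stricter pattern like the second one is only a convenience; reading the list of matches by eye, as done above, works just as well for a handful of keywords.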

So, if a recipe contains all three ingredients, rice, wasabi and soy_sauce, perhaps we can say with some confidence that it belongs to **Japanese** cuisine. We will want to keep this in mind.
***


# Data preparation <a id="4"></a>

<figure style = 'text-align:center'><img src="Datos/metodologia_prep.jpg" width=350></figure>



In this section, we prepare the data for the next stage of the data science methodology, which is modeling. This stage involves exploring the data further and making sure it is in the right format for the machine learning algorithm we selected in the analytic approach stage, which was decision trees.

First we look at the data to determine whether it needs cleaning.

In [32]:
recipes["country"].value_counts() # frequency table

American        40150
Mexico           1754
Italian          1715
Italy            1461
Asian            1176
                ...  
Indonesia          12
Belgium            11
East-African       11
Israel              9
Bangladesh          4
Name: country, Length: 69, dtype: int64

Looking at the table above, we can make the following observations:

1. The cuisine column is labeled country, which is inaccurate.
2. The cuisine names are not consistent, since not all of them start with a capital letter.
3. Some cuisines are duplicated as a variation of the country name, such as Vietnam and Vietnamese.
4. Some cuisines have very few recipes.
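
Observations 2 and 3 can also be surfaced programmatically. The sketch below is only an illustration on a hypothetical handful of labels, not the real column; the four-letter prefix grouping is a crude heuristic chosen for this example and would need manual review on real data.

```python
import pandas as pd

# Toy labels (hypothetical sample) that mimic the problems listed above
labels = pd.Series(["Vietnamese", "Vietnam", "Italian", "Italy", "asian", "Israel"])

# Observation 2: labels that do not start with an uppercase letter
not_capitalized = labels[~labels.str[0].str.isupper()].tolist()

# Observation 3: after lowercasing, group labels that share their first
# four letters to find candidates such as 'italy' / 'italian'
lowered = labels.str.lower()
prefix_groups = lowered.groupby(lowered.str[:4]).unique()
candidates = {prefix: sorted(names) for prefix, names in prefix_groups.items() if len(names) > 1}

print(not_capitalized)
print(candidates)
```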

#### Let's fix these problems.


    1- Fix the name of the column that holds the cuisine, currently labeled 'country'.

In [33]:
recipes.columns.values[0] 

'country'

In [34]:
# rename the column without mutating the underlying index array in place
recipes = recipes.rename(columns={"country": "cuisine"})

recipes

Unnamed: 0,cuisine,angelica,anise,anise_seed,apple,apple_brandy,apricot,armagnac,artemisia,artichoke,...,whiskey,white_bread,white_wine,whole_grain_wheat_flour,wine,wood,yam,yeast,yogurt,zucchini
0,Vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
1,Vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
2,Vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
3,Vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
4,Vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57686,Japan,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
57687,Japan,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
57688,Japan,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
57689,Japan,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No


We make all the cuisine names lowercase.

In [35]:
recipes["cuisine"] = recipes["cuisine"].str.lower()

We make the names more consistent.

In [36]:
recipes.loc[recipes["cuisine"] == "austria", "cuisine"] = "austrian"
recipes.loc[recipes["cuisine"] == "belgium", "cuisine"] = "belgian"
recipes.loc[recipes["cuisine"] == "china", "cuisine"] = "chinese"
recipes.loc[recipes["cuisine"] == "canada", "cuisine"] = "canadian"
recipes.loc[recipes["cuisine"] == "netherlands", "cuisine"] = "dutch"
recipes.loc[recipes["cuisine"] == "france", "cuisine"] = "french"
recipes.loc[recipes["cuisine"] == "germany", "cuisine"] = "german"
recipes.loc[recipes["cuisine"] == "india", "cuisine"] = "indian"
recipes.loc[recipes["cuisine"] == "indonesia", "cuisine"] = "indonesian"
recipes.loc[recipes["cuisine"] == "iran", "cuisine"] = "iranian"
recipes.loc[recipes["cuisine"] == "italy", "cuisine"] = "italian"
recipes.loc[recipes["cuisine"] == "japan", "cuisine"] = "japanese"
recipes.loc[recipes["cuisine"] == "israel", "cuisine"] = "israeli"
recipes.loc[recipes["cuisine"] == "korea", "cuisine"] = "korean"
recipes.loc[recipes["cuisine"] == "lebanon", "cuisine"] = "lebanese"
recipes.loc[recipes["cuisine"] == "malaysia", "cuisine"] = "malaysian"
recipes.loc[recipes["cuisine"] == "mexico", "cuisine"] = "mexican"
recipes.loc[recipes["cuisine"] == "pakistan", "cuisine"] = "pakistani"
recipes.loc[recipes["cuisine"] == "philippines", "cuisine"] = "philippine"
recipes.loc[recipes["cuisine"] == "scandinavia", "cuisine"] = "scandinavian"
recipes.loc[recipes["cuisine"] == "spain", "cuisine"] = "spanish_portuguese"
recipes.loc[recipes["cuisine"] == "portugal", "cuisine"] = "spanish_portuguese"
recipes.loc[recipes["cuisine"] == "switzerland", "cuisine"] = "swiss"
recipes.loc[recipes["cuisine"] == "thailand", "cuisine"] = "thai"
recipes.loc[recipes["cuisine"] == "turkey", "cuisine"] = "turkish"
recipes.loc[recipes["cuisine"] == "vietnam", "cuisine"] = "vietnamese"
recipes.loc[recipes["cuisine"] == "uk-and-ireland", "cuisine"] = "uk-and-irish"
recipes.loc[recipes["cuisine"] == "irish", "cuisine"] = "uk-and-irish"

recipes

Unnamed: 0,cuisine,angelica,anise,anise_seed,apple,apple_brandy,apricot,armagnac,artemisia,artichoke,...,whiskey,white_bread,white_wine,whole_grain_wheat_flour,wine,wood,yam,yeast,yogurt,zucchini
0,vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
1,vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
2,vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
3,vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
4,vietnamese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57686,japanese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
57687,japanese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
57688,japanese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
57689,japanese,No,No,No,No,No,No,No,No,No,...,No,No,No,No,No,No,No,No,No,No
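
The chain of `.loc` assignments above can also be expressed as a single lookup table. The following is a sketch on toy data: `country_to_cuisine` is a hypothetical name and lists only a few of the pairs used above.

```python
import pandas as pd

# Toy data (hypothetical rows) standing in for the 'cuisine' column
toy = pd.DataFrame({"cuisine": ["Japan", "Japanese", "Italy", "Mexico", "irish"]})

# A few of the country -> cuisine pairs from the cell above
country_to_cuisine = {
    "japan": "japanese",
    "italy": "italian",
    "mexico": "mexican",
    "irish": "uk-and-irish",
}

# Lowercase first, then replace whole values that appear in the mapping;
# values that are not keys of the mapping are left unchanged
toy["cuisine"] = toy["cuisine"].str.lower().replace(country_to_cuisine)
print(toy["cuisine"].tolist())
```

Keeping the pairs in one dictionary makes it easier to review the mapping and to add a missing pair later.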


    2- Remove the cuisines with 50 or fewer recipes.


In [37]:
recipes["cuisine"].value_counts()

american                   40150
italian                     3250
mexican                     2390
french                      1264
asian                       1193
east_asian                   951
korean                       799
canadian                     774
indian                       598
western                      450
chinese                      442
spanish_portuguese           416
uk-and-irish                 368
southern_soulfood            346
japanese                     320
jewish                       320
thai                         289
german                       289
mediterranean                289
scandinavian                 250
middleeastern                248
central_southamerican        241
eastern-europe               235
greek                        225
english_scottish             204
caribbean                    183
easterneuropean_russian      146
cajun_creole                 146
moroccan                     137
african                      115
southweste

In [38]:
# get the list of cuisines to keep (those with more than 50 recipes)
recipes_counts = recipes["cuisine"].value_counts()
cuisines_indices = recipes_counts > 50

cuisines_to_keep = list(recipes_counts[cuisines_indices].index)

In [39]:
filas_antes = recipes.shape[0] # number of rows of the original dataframe
print("Number of rows of the original dataframe is {}.".format(filas_antes))

recipes = recipes.loc[recipes['cuisine'].isin(cuisines_to_keep)]

filas_despues = recipes.shape[0] # number of rows of the processed dataframe
print("Number of rows of the processed dataframe is {}.".format(filas_despues))

print("{} rows removed.".format(filas_antes - filas_despues))

Number of rows of the original dataframe is 57691.
Number of rows of the processed dataframe is 57394.
297 rows removed.
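
The same filter can be written without building an intermediate list, by mapping each row to the size of its cuisine. This is a sketch on toy data; `min_recipes` is a hypothetical parameter set to 2 only so that the small example removes something (the cell above uses 50).

```python
import pandas as pd

# Toy data (hypothetical): two frequent cuisines and one rare cuisine
toy = pd.DataFrame({"cuisine": ["american"] * 5 + ["italian"] * 3 + ["bangladesh"]})
min_recipes = 2  # hypothetical threshold for this toy example

# For every row, look up how many recipes its cuisine has
counts_per_row = toy["cuisine"].map(toy["cuisine"].value_counts())

# Keep only the rows whose cuisine has more than min_recipes recipes
filtered = toy[counts_per_row > min_recipes]
rows_removed = len(toy) - len(filtered)
print("{} rows removed.".format(rows_removed))
```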


      3 - Convert every [Yes] to 1 and every [No] to 0

In [40]:
recipes = recipes.replace(to_replace="Yes", value=1)
recipes = recipes.replace(to_replace="No", value=0)
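
After a conversion like this it is worth checking that the ingredient columns really hold only 0 and 1. The sketch below uses toy data and a comparison-based encoding instead of `replace`; it assumes the ingredient columns contain nothing but "Yes" and "No", because any other value would silently become 0.

```python
import pandas as pd

# Toy data (hypothetical rows) in the same Yes/No layout as the dataset
toy = pd.DataFrame({
    "cuisine": ["japanese", "asian"],
    "rice": ["Yes", "No"],
    "wasabi": ["Yes", "Yes"],
})

ingredient_columns = toy.columns[1:]  # every column except 'cuisine'

# Encode: True where the cell equals "Yes", then cast booleans to integers
encoded = toy.copy()
encoded[ingredient_columns] = (toy[ingredient_columns] == "Yes").astype(int)

# Sanity check: the ingredient columns now contain only 0 and 1
only_binary = bool(encoded[ingredient_columns].isin([0, 1]).all().all())
print(encoded)
print(only_binary)
```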

#### Let's analyze the data a little more to learn about it and note any interesting preliminary observations.


We run the code below to see the recipes that contain **rice** *and* **soy_sauce** *and* **wasabi** *and* **seaweed**.


In [41]:
recipes.head()

Unnamed: 0,cuisine,angelica,anise,anise_seed,apple,apple_brandy,apricot,armagnac,artemisia,artichoke,...,whiskey,white_bread,white_wine,whole_grain_wheat_flour,wine,wood,yam,yeast,yogurt,zucchini
0,vietnamese,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,vietnamese,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,vietnamese,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,vietnamese,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,vietnamese,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [42]:
check_recipes = recipes.loc[
    (recipes["rice"] == 1) &
    (recipes["soy_sauce"] == 1) &
    (recipes["wasabi"] == 1) &
    (recipes["seaweed"] == 1)]

check_recipes

Unnamed: 0,cuisine,angelica,anise,anise_seed,apple,apple_brandy,apricot,armagnac,artemisia,artichoke,...,whiskey,white_bread,white_wine,whole_grain_wheat_flour,wine,wood,yam,yeast,yogurt,zucchini
11306,japanese,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
11321,japanese,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
11361,japanese,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12171,asian,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
12385,asian,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
13010,asian,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
13159,asian,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
13513,japanese,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
13586,japanese,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
13625,east_asian,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Based on this result, can we classify every recipe that contains **rice** *and* **soy_sauce** *and* **wasabi** *and* **seaweed** as **Japanese cuisine**?<br>

    · No, because recipes labeled asian and east_asian also contain these ingredients.
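
Among the ten rows shown above, five are labeled japanese, four asian and one east_asian. A quick way to quantify this kind of answer is to count the cuisines of the matching recipes. The sketch below runs on hypothetical toy rows, not on the real dataset.

```python
import pandas as pd

# Toy data (hypothetical rows) already encoded as 0/1
toy = pd.DataFrame({
    "cuisine":   ["japanese", "japanese", "asian", "east_asian", "american"],
    "rice":      [1, 1, 1, 1, 1],
    "soy_sauce": [1, 1, 1, 1, 0],
    "wasabi":    [1, 1, 1, 1, 0],
    "seaweed":   [1, 0, 1, 1, 0],
})

required = ["rice", "soy_sauce", "wasabi", "seaweed"]

# Keep the recipes that contain every required ingredient
matches = toy[(toy[required] == 1).all(axis=1)]

# Fraction of the matching recipes that belongs to each cuisine
cuisine_share = matches["cuisine"].value_counts(normalize=True)
print(cuisine_share)
```

If one cuisine does not account for nearly all of the matches, the ingredient rule alone is not a reliable classifier.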


Let's count the ingredients across all recipes.

In [43]:
recipes.iloc[:, 1:].sum(axis = 0)

angelica           1
anise            223
anise_seed        87
apple           2420
apple_brandy      37
                ... 
wood              33
yam               85
yeast           3385
yogurt          1033
zucchini        1102
Length: 382, dtype: int64

In [44]:
# sum each ingredient column
ing = recipes.iloc[:, 1:].sum(axis=0)

In [45]:
# define each column as a pandas series
ingredient = pd.Series(ing.index.values, index = np.arange(len(ing)))
count = pd.Series(list(ing), index = np.arange(len(ing)))

# create the dataframe
ing_df = pd.DataFrame(dict(ingredient = ingredient, count = count))
ing_df = ing_df[["ingredient", "count"]]
print(ing_df.to_string())

                  ingredient  count
0                   angelica      1
1                      anise    223
2                 anise_seed     87
3                      apple   2420
4               apple_brandy     37
5                    apricot    620
6                   armagnac     11
7                  artemisia     13
8                  artichoke    391
9                  asparagus    460
10                   avocado    660
11                     bacon   2169
12              baked_potato      9
13                      balm      3
14                    banana    989
15                    barley    266
16             bartlett_pear     23
17                     basil   3842
18                       bay   1463
19                      bean   1992
20                     beech      1
21                      beef   4902
22                beef_broth    845
23                beef_liver     10
24                      beer    307
25                      beet    233
26               bell_pepper

Now we have a dataframe of ingredients and their total counts across all recipes. Let's sort this dataframe in descending order.

In [46]:
ing_df.sort_values(["count"], ascending=False, inplace=True)
ing_df.reset_index(inplace=True, drop=True)

print(ing_df)

        ingredient  count
0              egg  21022
1            wheat  20775
2           butter  20715
3            onion  18078
4           garlic  17351
..             ...    ...
377    roasted_nut      1
378  roasted_pecan      1
379       geranium      1
380       angelica      1
381         durian      0

[382 rows x 2 columns]


#### What are the 3 most popular ingredients?


1.  Egg with <strong>21,022</strong> occurrences.

2.  Wheat with <strong>20,775</strong> occurrences.

3.  Butter with <strong>20,715</strong> occurrences.
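
The sorted table and the top three can also be produced in a few chained calls. This sketch starts from a small Series whose five counts are copied from the sorted output above; the variable names are reused only for illustration.

```python
import pandas as pd

# Five ingredient totals copied from the sorted output above
ing = pd.Series({"garlic": 17351, "egg": 21022, "onion": 18078, "butter": 20715, "wheat": 20775})

# Sort the totals, name the index, and turn the Series into a two-column dataframe
ing_df = (
    ing.sort_values(ascending=False)
       .rename_axis("ingredient")
       .reset_index(name="count")
)

top3 = ing_df.head(3)
print(top3)
```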


Note that there is a problem with the table above. Our <i>dataframe</i> contains 40,150 American recipes, which means the totals are biased toward the ingredients of American cuisine.
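
To put a number on the imbalance, we can divide the American count by the number of rows left after filtering. Both values are copied from the outputs shown earlier; the calculation gives a share of roughly 70%.

```python
# Counts taken from the outputs shown earlier in this notebook
american_recipes = 40150  # value_counts() entry for 'american'
total_recipes = 57394     # rows left after removing the small cuisines

american_share = american_recipes / total_recipes
print("American share of recipes: {:.1%}".format(american_share))
```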


**Therefore**, we will compute a more objective summary of the ingredients by looking at each ingredient per cuisine.


#### Let's create a profile for each cuisine.

In other words, let's find out which ingredients each cuisine typically uses, for example what characterizes Chinese or **Canadian** food.


In [47]:
cuisines = recipes.groupby("cuisine").mean()
cuisines.head()

cuisine,angelica,anise,anise_seed,apple,apple_brandy,apricot,armagnac,artemisia,artichoke,asparagus,...,whiskey,white_bread,white_wine,whole_grain_wheat_flour,wine,wood,yam,yeast,yogurt,zucchini
african,0.0,0.0,0.0,0.034783,0.0,0.069565,0.0,0.0,0.0,0.0,...,0.0,0.008696,0.043478,0.008696,0.017391,0.0,0.008696,0.017391,0.0,0.034783
american,2.5e-05,0.003014,0.000573,0.052055,0.000623,0.011308,0.0001,0.0,0.006351,0.007522,...,0.002964,0.006874,0.030809,0.014819,0.011009,0.000672,0.001445,0.068219,0.016912,0.01863
asian,0.0,0.000838,0.002515,0.012573,0.0,0.005029,0.0,0.0,0.0,0.019279,...,0.000838,0.001676,0.038558,0.001676,0.124895,0.0,0.001676,0.004191,0.010897,0.011735
cajun_creole,0.0,0.0,0.0,0.006849,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.006849,0.082192,0.0,0.191781,0.0,0.006849,0.034247,0.006849,0.0
canadian,0.0,0.0,0.0,0.036176,0.0,0.002584,0.0,0.0,0.001292,0.007752,...,0.002584,0.003876,0.029716,0.020672,0.003876,0.0,0.001292,0.067183,0.01938,0.011628


As shown above, we have created a dataframe where each row is a cuisine (the index) and each column is an ingredient. Each value is the fraction of that cuisine's recipes that contain the ingredient, so multiplying by 100 gives a percentage.

**For example**:

*   *apricot* is present in 6.96% of all **African** recipes.
*   *butter* is present in 38.11% of all **Canadian** recipes.
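
The reason `groupby("cuisine").mean()` yields these fractions is that the mean of a 0/1 column is the share of rows equal to 1. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Toy data (hypothetical): four Canadian and two African recipes
toy = pd.DataFrame({
    "cuisine": ["canadian", "canadian", "canadian", "canadian", "african", "african"],
    "butter":  [1, 1, 0, 0, 0, 1],
    "cumin":   [0, 0, 0, 0, 1, 1],
})

# Mean of a 0/1 column within each group = fraction of recipes with the ingredient
profile = toy.groupby("cuisine").mean()

# Express the fractions as percentages
percent = (profile * 100).round(1)
print(percent)
```

Here two of the four Canadian recipes use butter, so the Canadian butter value is 0.5, or 50%.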


Let's print out the profile for each cuisine by displaying the top four ingredients in each cuisine.


In [48]:
num_ingredients = 4 # define number of top ingredients to print

# define a function that prints the top ingredients for each cuisine
def print_top_ingredients(row):
    print(row.name.upper())
    row_sorted = row.sort_values(ascending=False)*100
    top_ingredients = list(row_sorted.index.values)[0:num_ingredients]
    row_sorted = list(row_sorted)[0:num_ingredients]

    for ind, ingredient in enumerate(top_ingredients):
        print("%s (%d%%)" % (ingredient, row_sorted[ind]), end=' ')
    print("\n")

# apply the function to each row of the cuisines dataframe
create_cuisines_profiles = cuisines.apply(print_top_ingredients, axis=1)

AFRICAN
onion (53%) olive_oil (52%) garlic (49%) cumin (42%) 

AMERICAN
butter (41%) egg (40%) wheat (39%) onion (29%) 

ASIAN
soy_sauce (49%) ginger (48%) garlic (47%) rice (41%) 

CAJUN_CREOLE
onion (69%) cayenne (56%) garlic (48%) butter (36%) 

CANADIAN
wheat (39%) butter (38%) egg (35%) onion (34%) 

CARIBBEAN
onion (51%) garlic (50%) black_pepper (31%) vegetable_oil (31%) 

CENTRAL_SOUTHAMERICAN
garlic (56%) onion (54%) cayenne (51%) tomato (41%) 

CHINESE
soy_sauce (68%) ginger (53%) garlic (52%) scallion (48%) 

EAST_ASIAN
garlic (55%) soy_sauce (50%) scallion (49%) cayenne (47%) 

EASTERN-EUROPE
wheat (53%) egg (52%) butter (48%) onion (45%) 

EASTERNEUROPEAN_RUSSIAN
butter (60%) egg (50%) wheat (49%) onion (38%) 

ENGLISH_SCOTTISH
butter (67%) wheat (62%) egg (53%) cream (41%) 

FRENCH
butter (50%) egg (44%) wheat (37%) olive_oil (27%) 

GERMAN
wheat (64%) egg (60%) butter (47%) onion (34%) 

GREEK
olive_oil (76%) garlic (44%) onion (36%) lemon_juice (33%) 

INDIAN
cumin (60%

At this point, we can say that we understand the data well and that the data is ready and in the right format for building models.

***
