# Data ingestion for the exercises

We will create the index used in the exercises and ingest the data needed to run the queries.

We will store cooking recipes. First, let's look at the format of the data we are going to insert.

In [14]:
import pandas as pd

# The file is in JSON Lines format (one recipe per line), hence lines=True.
df = pd.read_json("../../data/elasticsearch/recipes/recipes_copia.json", lines=True)
df.head()

| | author | date | description | ingredients | instructions | picture_link | rating | summary | title | url |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hetty McKinnon | July 2021 | Grilled broccoli is one of life’s simple pleas... | [8 large eggs, 1½ lb. broccoli (about 3 small ... | [Bring a medium pot of water to a boil. Carefu... | https://assets.epicurious.com/photos/60ff0fc09... | {'ratingValue': '0', 'bestRating': '4', 'worst... | {'yield': '4 Servings', 'active-time': '22 min... | Egg Salad With Grilled Broccoli and Chili Crisp | http://www.epicurious.com/recipes/food/views/e... |
| 1 | Hetty McKinnon | April 2021 | This chile–oat crisp can be used like chile oi... | [3 shallots, finely diced, 2 garlic cloves, fi... | [To make the chili crisp, place the shallots, ... | https://assets.epicurious.com/photos/6107f447d... | {'ratingValue': '4', 'bestRating': '4', 'worst... | | Chili Crisp With Oats | http://www.epicurious.com/recipes/food/views/h... |
| 2 | Katie Button | October 2016 | Esqueixar means “to shred” and that’s what’s d... | [1 lemon, 1½ teaspoon honey, ½ teaspoon kosher... | [To make the lemon vinaigrette: To make the vi... | https://assets.epicurious.com/photos/60fecf88d... | {'ratingValue': '0', 'bestRating': '4', 'worst... | {'yield': 'Serves 4 as a main dish or 8 as a s... | Esqueixada de Montaña (Cured Trout With Tomato... | http://www.epicurious.com/recipes/food/views/c... |
| 3 | Salma Hage | April 2016 | A colorful treat that I often made for my gran... | [2⁄3 cup (5 fl oz/150 ml) pomegranate juice, 2... | [Pour equal amounts of pomegranate juice into ... | https://assets.epicurious.com/photos/60fee41bf... | {'ratingValue': '0', 'bestRating': '4', 'worst... | {'yield': '6 ice pops (ice lollies)', 'active-... | Pomegranate-Yogurt Ice Pops | http://www.epicurious.com/recipes/food/views/p... |
| 4 | Katie Button | October 2016 | My chefs and I like to joke that this salad ha... | [1 tablespoon honey, 1½ tablespoons reserve sh... | [Whisk the honey, vinegar, and ½ teaspoon salt... | https://assets.epicurious.com/photos/60fee610f... | {'ratingValue': '0', 'bestRating': '4', 'worst... | {'yield': 'Serves 8 as a small plate'} | Ensalada de Sandía y Tomate (Watermelon Tomato... | http://www.epicurious.com/recipes/food/views/w... |
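
The `rating` and `summary` columns hold nested objects whose values are strings. The following is a minimal sketch of how those objects flatten into dotted field names such as `rating.ratingValue`. It assumes two hand-written sample records that use only the values fully visible in the preview above; the truncated fields are deliberately left out.

```python
import pandas as pd

# Hypothetical sample, reduced to the fields that are fully visible in the preview.
sample = [
    {
        "title": "Egg Salad With Grilled Broccoli and Chili Crisp",
        "rating": {"ratingValue": "0", "bestRating": "4"},
        "summary": {"yield": "4 Servings"},
    },
    {
        # This recipe has no summary at all, so the key is simply absent.
        "title": "Chili Crisp With Oats",
        "rating": {"ratingValue": "4", "bestRating": "4"},
    },
]

# json_normalize expands nested dicts into columns named "parent.child".
flat = pd.json_normalize(sample)

# Ratings arrive as strings; converting them makes the numeric intent explicit.
rating_values = flat["rating.ratingValue"].astype(float)

print(sorted(flat.columns))
print(rating_values.tolist())
```

The dotted column names printed here have the same shape as the field paths used to address nested object fields in Elasticsearch queries.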


## Step 1: Create the index

To create the index, run the following request in the Kibana Dev Tools console.

```
PUT recipes
{
  "aliases": {},
  "settings": {
    "index": {
      "number_of_shards": 4,
      "number_of_replicas": 1
    },
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 2,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "author": {
        "type": "keyword",
        "eager_global_ordinals": true,
        "fields": {
          "text": {
            "type": "text"
          }
        }
      },
      "date": {
        "type": "date",
        "format": "[MMMM yyyy]",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "ingredients": {
        "type": "text",
        "analyzer": "english"
      },
      "instructions": {
        "type": "text",
        "analyzer": "english"
      },
      "picture_link": {
        "type": "keyword"
      },
      "rating": {
        "properties": {
          "bestRating": {
            "type": "float"
          },
          "prepareAgainPct": {
            "type": "float"
          },
          "ratingValue": {
            "type": "float"
          },
          "worstRating": {
            "type": "float"
          }
        }
      },
      "recipe_id": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "summary": {
        "properties": {
          "active-time": {
            "type": "keyword"
          },
          "total-time": {
            "type": "keyword"
          },
          "yield": {
            "type": "keyword"
          }
        }
      },
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "suggestion": {
            "type": "completion",
            "analyzer": "english",
            "preserve_separators": false,
            "preserve_position_increments": false,
            "max_input_length": 50
          },
          "ngram": {
            "type": "text",
            "analyzer": "autocomplete",
            "search_analyzer": "autocomplete_search"
          }
        }
      },
      "url": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
```
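
To get a feel for what the `autocomplete` analyzer stores in `title.ngram`, here is a rough pure-Python emulation of an `ngram` tokenizer with `min_gram` 1, `max_gram` 2 and `token_chars` limited to letters, followed by lowercasing. It is a sketch for illustration only, not Elasticsearch's implementation: the function name `ngram_tokens` is made up, and `str.isalpha()` is assumed to be a close enough stand-in for the `letter` character class.

```python
def ngram_tokens(text, min_gram=1, max_gram=2):
    """Emulate a letters-only ngram tokenizer followed by a lowercase filter."""
    tokens = []
    word = []
    # A trailing space acts as a sentinel so the last word is flushed as well.
    for ch in text + " ":
        if ch.isalpha():
            word.append(ch)
            continue
        # Non-letter reached: emit every n-gram of the word collected so far,
        # ordered by start position and then by length.
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append("".join(word[start:start + size]).lower())
        word = []
    return tokens


print(ngram_tokens("Egg"))  # ['e', 'eg', 'g', 'gg', 'g']
```

Under this emulation no indexed token is longer than two letters. A search-time analyzer that keeps whole words, such as one built on the `lowercase` tokenizer, would therefore find an identical token for `eg` but not for `egg`, which is worth keeping in mind when querying `title.ngram`.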

## Step 2: Start Logstash

We will ingest the data with Logstash, running it from a Docker image:

* So that it can reach the Elasticsearch service, we attach the container to our lab's network with `--network=curso-els-paradigma_default`. You can list the networks created by your compose file with the command `docker network ls`.
* We mount the volume that holds our pipeline file onto the container folder where Logstash expects to find that configuration: `-v /Users/rgarrote/desarrollo/cursoELS/curso-els-paradigma/work/data/elasticsearch/recipies/pipeline/:/usr/share/logstash/pipeline/`.
* We mount the volume where we will drop the files to parse: `-v /Users/rgarrote/desarrollo/cursoELS/curso-els-paradigma/work/data/elasticsearch/recipies/data/:/tmp/data/`.

In [None]:
docker run --rm -it --network=curso-els-paradigma_default \
    -v /Users/rgarrote/desarrollo/cursoELS/curso-els-paradigma/work/data/elasticsearch/recipies/pipeline/:/usr/share/logstash/pipeline/ \
    -v /Users/rgarrote/desarrollo/cursoELS/curso-els-paradigma/work/data/elasticsearch/recipies/data/:/tmp/data/ \
docker.elastic.co/logstash/logstash:8.3.3
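
The host paths in the command are specific to one machine. As a hedged sketch, the same command can be assembled from a base directory so that only one value changes per machine. The helper name `logstash_command` and the example base directory are hypothetical; the network name, container paths and image tag are the ones used in the command.

```python
import shlex
from pathlib import PurePosixPath


def logstash_command(base_dir,
                     network="curso-els-paradigma_default",
                     image="docker.elastic.co/logstash/logstash:8.3.3"):
    """Build the docker run argument list for a given local base directory."""
    base = PurePosixPath(base_dir)
    return [
        "docker", "run", "--rm", "-it", f"--network={network}",
        # Pipeline definition, mounted where Logstash looks for it.
        "-v", f"{base / 'pipeline'}/:/usr/share/logstash/pipeline/",
        # Folder where the files to ingest will be dropped.
        "-v", f"{base / 'data'}/:/tmp/data/",
        image,
    ]


# Hypothetical base directory; replace it with the path on your machine.
cmd = logstash_command("/path/to/work/data/elasticsearch/recipies")
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be passed to `subprocess.run` unchanged.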

## Step 3: Ingest the data

Once Logstash has started and is ready to process files, copy the file located at `work/data/elasticsearch/recipes/recipes.json` into the folder `work/data/elasticsearch/recipies/data`.

A dot is printed on screen for each ingested document.
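
The copy can also be done from Python. This is a minimal sketch that assumes it is run from the repository root, so that the relative paths resolve; the helper name `drop_into_data_dir` is made up for illustration.

```python
import shutil
from pathlib import Path


def drop_into_data_dir(source, data_dir):
    """Copy `source` into `data_dir` and return the path of the copy."""
    data_dir = Path(data_dir)
    # Create the target folder if it does not exist yet.
    data_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(source, data_dir))


# Only attempt the copy when the source file is actually present.
source = Path("work/data/elasticsearch/recipes/recipes.json")
if source.exists():
    copied = drop_into_data_dir(source, "work/data/elasticsearch/recipies/data")
    print(f"Copied to {copied}")
```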


## Step 4: Check that the process is running correctly

1. Check in Kibana that the `recipes` index has been created.
2. From the Index Management section, find out how many recipes have been inserted into the `recipes` index in Elasticsearch.
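
The document count can also be read through the `_count` API, which returns a JSON body with a `count` field. The sketch below assumes Elasticsearch is reachable at `http://localhost:9200` without authentication. Both are assumptions about the lab setup: a cluster with security enabled would need HTTPS and credentials.

```python
import json
from urllib.request import urlopen


def count_url(index, base_url="http://localhost:9200"):
    """Build the URL of the _count endpoint for an index."""
    return f"{base_url.rstrip('/')}/{index}/_count"


def parse_count(body):
    """Extract the document count from a _count response body."""
    return int(json.loads(body)["count"])


def count_documents(index, base_url="http://localhost:9200"):
    """Ask Elasticsearch how many documents the index holds (needs a live cluster)."""
    with urlopen(count_url(index, base_url)) as response:
        return parse_count(response.read())


# Example with a hand-written response body, so no cluster is required here.
print(parse_count('{"count": 3, "_shards": {"total": 4, "successful": 4, "skipped": 0, "failed": 0}}'))
```

Calling `count_documents("recipes")` while Logstash is still running should return a growing number until the whole file has been ingested.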