# Map, Filter, and Reduce in Python

Let's work through a few examples to see how these functions complement each other when processing data in a functional style.

In [23]:
import pandas as pd
import numpy as np
from functools import reduce

In [11]:

# Load the dataset
df = pd.read_csv('datasets/global-missing-migrants-dataset.csv')
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13020 entries, 0 to 13019
Data columns (total 19 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   Incident Type                        13020 non-null  object 
 1   Incident year                        13020 non-null  int64  
 2   Reported Month                       13020 non-null  object 
 3   Region of Origin                     12998 non-null  object 
 4   Region of Incident                   13020 non-null  object 
 5   Country of Origin                    13012 non-null  object 
 6   Number of Dead                       12470 non-null  float64
 7   Minimum Estimated Number of Missing  13020 non-null  int64  
 8   Total Number of Dead and Missing     13020 non-null  int64  
 9   Number of Survivors                  13020 non-null  int64  
 10  Number of Females                    13020 non-null  int64  
 11  Number of Males             

## Map

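We'll use `map` to convert all the region names to lowercase. (The first output below already appears in lowercase, presumably because the cell had been run earlier in the same session.)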

In [50]:
df['Region of Origin'].head(5)

0                  central america
1    latin america / caribbean (p)
2    latin america / caribbean (p)
3                  central america
4                  northern africa
Name: Region of Origin, dtype: object

In [52]:
df['Region of Origin'] = df['Region of Origin'].astype(str).map(lambda x: x.lower())

In [53]:
df['Region of Origin'].head(5)

0                  central america
1    latin america / caribbean (p)
2    latin america / caribbean (p)
3                  central america
4                  northern africa
Name: Region of Origin, dtype: object
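
`Series.map` applies a function to each element, much like Python's built-in `map`, which works on any iterable and returns a lazy iterator. The sketch below shows the built-in version on a plain list; the sample values are made up for illustration and are not read from the dataset. For string columns, pandas also offers the vectorized `Series.str.lower()`, which leaves missing values untouched, whereas `astype(str)` may turn them into the literal string 'nan', depending on the pandas version.

```python
# Illustrative values only; not read from the dataset.
regions = ['Central America', 'Latin America / Caribbean (P)', 'Northern Africa']

# The built-in map is lazy: it returns an iterator, not a list.
lowered_iter = map(str.lower, regions)

# Consume the iterator to materialize the results.
lowered = list(lowered_iter)
print(lowered)
```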

We can also use `map()` to create a column based on another one. For example, let's create a new column that indicates whether children were involved in the incident. The only column that can tell us this is `Number of Children`, so we'll use it. The new column is named `Hubo niños` ("children involved") and holds `si` (yes) or `no`.

In [49]:
def has_children(x):
    if x == 0:
        return 'no'
    else:
        return 'si'

df['Hubo niños'] = df['Number of Children'].astype(int).map(has_children)
df.head(10)

Unnamed: 0,Incident Type,Incident year,Reported Month,Region of Origin,Region of Incident,Country of Origin,Number of Dead,Minimum Estimated Number of Missing,Total Number of Dead and Missing,Number of Survivors,Number of Females,Number of Males,Number of Children,Cause of Death,Migration route,Location of death,Information Source,Coordinates,UNSD Geographical Grouping,Hubo niños
0,Incident,2014,January,central america,North America,Guatemala,1.0,0,1,0,0,1,0,Mixed or unknown,US-Mexico border crossing,Pima Country Office of the Medical Examiner ju...,Pima County Office of the Medical Examiner (PC...,"31.650259, -110.366453",Northern America,no
1,Incident,2014,January,latin america / caribbean (p),North America,Unknown,1.0,0,1,0,0,0,0,Mixed or unknown,US-Mexico border crossing,Pima Country Office of the Medical Examiner ju...,Pima County Office of the Medical Examiner (PC...,"31.59713, -111.73756",Northern America,no
2,Incident,2014,January,latin america / caribbean (p),North America,Unknown,1.0,0,1,0,0,0,0,Mixed or unknown,US-Mexico border crossing,Pima Country Office of the Medical Examiner ju...,Pima County Office of the Medical Examiner (PC...,"31.94026, -113.01125",Northern America,no
3,Incident,2014,January,central america,North America,Mexico,1.0,0,1,0,0,1,0,Violence,US-Mexico border crossing,"near Douglas, Arizona, USA","Ministry of Foreign Affairs Mexico, Pima Count...","31.506777, -109.315632",Northern America,no
4,Incident,2014,January,northern africa,Europe,Sudan,1.0,0,1,2,0,1,0,Harsh environmental conditions / lack of adequ...,,Border between Russia and Estonia,EUBusiness (Agence France-Presse),"59.1551, 28",Northern Europe,no
5,Incident,2014,January,latin america / caribbean (p),North America,Unknown,1.0,0,1,0,0,0,0,Violence,US-Mexico border crossing,Pima Country Office of the Medical Examiner ju...,Pima County Office of the Medical Examiner (PC...,"32.45435, -113.18402",Northern America,no
6,Incident,2014,January,unknown,Mediterranean,"Afghanistan,Syrian Arab Republic",12.0,0,12,0,9,0,3,Drowning,Eastern Mediterranean,Waters near Greece while being towed back to T...,European Council on Refugees and Exiles,"37.2832, 27",Uncategorized,si
7,Incident,2014,January,latin america / caribbean (p),North America,Unknown,1.0,0,1,0,0,0,0,Mixed or unknown,US-Mexico border crossing,Pima Country Office of the Medical Examiner ju...,Pima County Office of the Medical Examiner (PC...,"32.478317, -113.182833",Northern America,no
8,Incident,2014,January,latin america / caribbean (p),North America,Unknown,1.0,0,1,0,0,0,0,Mixed or unknown,US-Mexico border crossing,Pima Country Office of the Medical Examiner ju...,Pima County Office of the Medical Examiner (PC...,"31.81154, -111.01101",Northern America,no
9,Incident,2014,January,latin america / caribbean (p),North America,Unknown,1.0,0,1,0,0,0,0,Mixed or unknown,US-Mexico border crossing,Pima Country Office of the Medical Examiner ju...,Pima County Office of the Medical Examiner (PC...,"32.174017, -112.174583",Northern America,no


## Filter

We'll use Python's built-in `filter` function here only to show how it would work in this context. In pandas, the usual way to select rows by value is boolean indexing (for example `df[df['Incident year'] > 2014]`), which is more appropriate here and gives more readable code. Note that pandas' own `DataFrame.filter` method selects rows or columns by label, not by value. Since we're learning, though, let's see how the built-in version looks.

`filter` accepts any iterable, but iterating over a DataFrame directly yields its column labels rather than its rows. We therefore use the DataFrame's `to_dict('records')` method to convert it to a list of dictionaries, one per row, and pass that to `filter` together with a lambda that does the filtering. In addition, the output of `filter` is neither a DataFrame nor a list but a lazy `filter` object. To recover the filtered rows, we first convert it to a list and then to a DataFrame.

In [33]:
central_america_incidents = pd.DataFrame(list(filter(
    lambda row: 'central america' in row['Region of Origin'],
    df.to_dict('records')
)))
central_america_incidents.head(5)

Unnamed: 0,Incident Type,Incident year,Reported Month,Region of Origin,Region of Incident,Country of Origin,Number of Dead,Minimum Estimated Number of Missing,Total Number of Dead and Missing,Number of Survivors,Number of Females,Number of Males,Number of Children,Cause of Death,Migration route,Location of death,Information Source,Coordinates,UNSD Geographical Grouping
0,Incident,2014,January,central america,North America,Guatemala,1.0,0,1,0,0,1,0,Mixed or unknown,US-Mexico border crossing,Pima Country Office of the Medical Examiner ju...,Pima County Office of the Medical Examiner (PC...,"31.650259, -110.366453",Northern America
1,Incident,2014,January,central america,North America,Mexico,1.0,0,1,0,0,1,0,Violence,US-Mexico border crossing,"near Douglas, Arizona, USA","Ministry of Foreign Affairs Mexico, Pima Count...","31.506777, -109.315632",Northern America
2,Incident,2014,January,central america,North America,Mexico,1.0,0,1,0,0,1,0,Mixed or unknown,US-Mexico border crossing,Pima Country Office of the Medical Examiner ju...,Pima County Office of the Medical Examiner (PC...,"32.40291, -113.02935",Northern America
3,Incident,2014,February,central america,North America,Mexico,1.0,0,1,2,0,1,0,Violence,US-Mexico border crossing,"California-Mexico border near San Diego, Calif...",CNN,"32.5543, -117",Northern America
4,Incident,2014,March,central america,North America,Guatemala,1.0,0,1,0,1,0,0,Harsh environmental conditions / lack of adequ...,US-Mexico border crossing,Pima Country Office of the Medical Examiner ju...,Pima County Office of the Medical Examiner (PC...,"31.759, -110.51916",Northern America
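
The laziness of `filter` can be checked on a small example. This is a minimal sketch: the `records` list is hypothetical and only mimics the shape that `df.to_dict('records')` produces.

```python
# Hypothetical records in the shape produced by df.to_dict('records').
records = [
    {'Region of Origin': 'central america', 'Incident year': 2014},
    {'Region of Origin': 'northern africa', 'Incident year': 2014},
    {'Region of Origin': 'central america', 'Incident year': 2016},
]

# filter returns a lazy iterator; nothing is evaluated until it is consumed.
matches = filter(lambda row: 'central america' in row['Region of Origin'], records)

first_pass = list(matches)   # consumes the iterator
second_pass = list(matches)  # the iterator is now exhausted, so this is empty

print(len(first_pass), len(second_pass))
```

This is why the notebook wraps the result in `list(...)` right away: a `filter` object can only be traversed once.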


Let's extend our filter so that it also keeps only cases after 2014 (the condition `> 2014` excludes 2014 itself). First, let's check whether the `Incident year` column is ready for filtering or needs some transformation.

In [34]:
df['Incident year'].value_counts()

Incident year
2022    2183
2019    1804
2021    1800
2018    1637
2017    1341
2020    1296
2016    1277
2015     821
2023     565
2014     296
Name: count, dtype: int64

The data type is correct and there are no nulls, so we can go ahead with the extended filter.

In [36]:
central_america_incidents_after_2014 = pd.DataFrame(list(filter(
    lambda row: ('central america' in row['Region of Origin']) and (row['Incident year'] > 2014),
    df.to_dict('records')
)))
central_america_incidents_after_2014.head(5)

Unnamed: 0,Incident Type,Incident year,Reported Month,Region of Origin,Region of Incident,Country of Origin,Number of Dead,Minimum Estimated Number of Missing,Total Number of Dead and Missing,Number of Survivors,Number of Females,Number of Males,Number of Children,Cause of Death,Migration route,Location of death,Information Source,Coordinates,UNSD Geographical Grouping
0,Incident,2015,January,central america,Central America,Honduras,1.0,0,1,0,0,1,0,Violence,,"Gregorio Mendez, Tabasco, Mexico",La 72,"18.056331, -93.084468",Central America
1,Incident,2015,January,central america,Central America,Honduras,1.0,0,1,0,0,1,0,Violence,,"Near railway tracks, Villahermosa, Tabasco, Me...",Excelsior,"17.983326, -92.916667",Central America
2,Incident,2015,January,central america,Central America,El Salvador,1.0,0,1,9,0,1,1,Vehicle accident / death linked to hazardous t...,,"Highway out of Reforma, Tabasco, Mexico",Noticieros Televisa,"17.875807, -93.151203",Central America
3,Incident,2015,February,central america,North America,El Salvador,1.0,0,1,0,0,0,0,Mixed or unknown,US-Mexico border crossing,"AP Ranch, Webb County, Texas, USA",Operation Identification - U. of Texas,"27.87871582, -99.142943",Northern America
4,Incident,2015,February,central america,Central America,Honduras,1.0,0,1,0,0,1,0,Mixed or unknown,,La Cuchilla sector in municipality of Concepci...,Prensa Libre,"14.633328, -92.133327",Central America
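
For comparison, the same selection can be written with a boolean mask, the usual pandas idiom for filtering rows by value. The sketch below uses a toy DataFrame with made-up values (not the real dataset) and checks that both approaches select the same rows.

```python
import pandas as pd

# Toy frame with the two columns used by the filter (values are illustrative).
toy = pd.DataFrame({
    'Region of Origin': ['central america', 'northern africa', 'central america'],
    'Incident year': [2014, 2016, 2016],
})

# Approach from the notebook: built-in filter over a list of row dictionaries.
with_filter = pd.DataFrame(list(filter(
    lambda row: 'central america' in row['Region of Origin'] and row['Incident year'] > 2014,
    toy.to_dict('records')
)))

# Boolean indexing: element-wise conditions on Series are combined with `&`.
mask = (
    toy['Region of Origin'].str.contains('central america', regex=False)
    & (toy['Incident year'] > 2014)
)
with_mask = toy[mask].reset_index(drop=True)

print(with_mask)
```

Inside the lambda each row is a plain dictionary, so Python's `and` is the natural operator. On whole Series the element-wise `&` is required, with each condition in parentheses.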


## Reduce

Now we'll use `reduce` to compute some metrics. Let's start with the total number of dead and missing in our first filtered set, `central_america_incidents`.

In [37]:
total_dead_missing = reduce(lambda x, y: x + y, central_america_incidents['Total Number of Dead and Missing'])
print(f'Total migrants dead or missing from Central America since 2014: {total_dead_missing}')


Total migrants dead or missing from Central America since 2014: 2109


Now let's look at the maximum number of dead and missing in a single incident for the region.

In [47]:
max_dead_missing = reduce(lambda x, y: x if x > y else y, central_america_incidents['Total Number of Dead and Missing'])
print(f'Maximum dead or missing in a single incident from Central America since 2014: {max_dead_missing}')

Maximum dead or missing in a single incident from Central America since 2014: 64
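
Both reductions have built-in equivalents (`sum` and `max`), which makes it easy to sanity-check the lambdas. The sketch below uses a short made-up list rather than the real column. It also shows the optional third argument of `reduce`, the initial value, which makes the call safe on an empty sequence.

```python
from functools import reduce

# Illustrative counts; not taken from the dataset.
counts = [1, 12, 64, 3]

# Sum with an explicit initial value of 0.
total = reduce(lambda x, y: x + y, counts, 0)

# Running maximum: keep whichever of the two values is larger.
largest = reduce(lambda x, y: x if x > y else y, counts)

# With an initial value, reducing an empty sequence returns that value.
empty_total = reduce(lambda x, y: x + y, [], 0)

# Without an initial value, reducing an empty sequence raises TypeError.
try:
    reduce(lambda x, y: x + y, [])
    raised = False
except TypeError:
    raised = True

print(total, largest, empty_total, raised)
```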


To see the incident itself, we can look it up with `.loc`:

In [48]:
central_america_incidents.loc[central_america_incidents['Total Number of Dead and Missing'] == 64]

Unnamed: 0,Incident Type,Incident year,Reported Month,Region of Origin,Region of Incident,Country of Origin,Number of Dead,Minimum Estimated Number of Missing,Total Number of Dead and Missing,Number of Survivors,Number of Females,Number of Males,Number of Children,Cause of Death,Migration route,Location of death,Information Source,Coordinates,UNSD Geographical Grouping
20,Cumulative Incident,2014,August,central america,Central America,El Salvador,64.0,0,64,0,0,0,0,Mixed or unknown,,Mexico (likely),Permanent Mission of el Salvador to the UN in ...,"19.1452, -101.074",Central America


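We can get the number of incidents per country of origin within the region using the `Counter` class from the `collections` module. pandas offers friendlier alternatives (`groupby` or `value_counts`), but with `reduce` and `Counter` it looks like this. The lambda relies on `Counter.update` returning `None`, so `x.update([y]) or x` updates the counter in place and then returns it as the accumulator. The result columns are named `Pais` (country) and `Incidentes` (incidents).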

In [46]:
from collections import Counter
region_counts = reduce(lambda x, y: x.update([y]) or x, central_america_incidents['Country of Origin'], Counter())
pd.DataFrame(region_counts.most_common(10), columns=['Pais', 'Incidentes'])

Unnamed: 0,Pais,Incidentes
0,Mexico,705
1,Honduras,307
2,Guatemala,296
3,El Salvador,132
4,Nicaragua,62
5,Unknown,29
6,"Guatemala,Honduras",4
7,"Guatemala,Honduras,Mexico",3
8,"El Salvador,Mexico",3
9,"Honduras,Mexico",2
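
As a quick check of the accumulator trick, the sketch below compares the `reduce` version with passing the iterable straight to `Counter`, which counts the elements directly. The country list is made up for illustration.

```python
from collections import Counter
from functools import reduce

# Illustrative values; not taken from the dataset.
countries = ['Mexico', 'Honduras', 'Mexico', 'Guatemala', 'Mexico']

# Counter.update returns None, so `or acc` hands the updated counter back to reduce.
via_reduce = reduce(lambda acc, c: acc.update([c]) or acc, countries, Counter())

# Counter accepts an iterable and counts its elements in one step.
direct = Counter(countries)

print(via_reduce.most_common(1))
```

Wrapping each value in a list (`[c]`) matters: `update('Mexico')` would count the individual characters of the string instead of the country name.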


To learn how to use Python's `Counter` class, I recommend this tutorial: https://realpython.com/python-counter/ or the official documentation: https://docs.python.org/es/3.8/library/collections.html#collections.Counter

# Suggested exercises

- Use `map()` to create a new column, based on the month, that gives the month number (January is 1, February is 2, and so on).
- Create a filter with `filter()` that keeps only the incidents involving violence.
- Use `reduce()` to get the total number of dead in the violence cases.