# Pair: Data Merging and Cleaning

For these exercises you will need the datasets world-data-2023-part1.csv and world-data-2023-part2.csv.

## 1. Data Merging Exercises

You have two datasets at your disposal, "world-data-2023-part1.csv" and "world-data-2023-part2.csv", which contain a series of indicators and other data for different countries. Your task is to explore these datasets and determine what they have in common in terms of columns and data.

Then, create a new DataFrame that combines the information from both datasets into a single one. To do so, choose the pandas joining method you consider most appropriate for this situation and justify in your report why you think it is the best choice.

Make sure you carry out the following steps:

1. Load both datasets into pandas DataFrames and explore them.

2. Identify the columns the two datasets have in common.

3. Use the pandas joining method you consider most suitable to combine the data from both files into a single DataFrame.

4. Explain why you chose that joining method and how you carried out the previous steps.


In [1]:
# import the libraries we need

# Data handling
# -----------------------------------------------------------------------
import pandas as pd
import numpy as np


# Configuration
# -----------------------------------------------------------------------
pd.set_option('display.max_columns', None) # show every column when displaying DataFrames

In [2]:
df1 = pd.read_csv("files/world-data-2023_part1.csv", index_col= 0)
df2 = pd.read_csv("files/world-data-2023_part2.csv", index_col= 0)

In [3]:
df1.head()

Unnamed: 0,Country,Density\n(P/Km2),Abbreviation,Agricultural Land( %),Land Area(Km2),Armed Forces size,Birth Rate,Calling Code,Capital/Major City,Co2-Emissions,CPI,CPI Change (%),Currency-Code,Fertility Rate,Forested Area (%),Gasoline Price
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97


In [4]:
df2.head()

Unnamed: 0,GDP,Gross primary education enrollment (%),Gross tertiary education enrollment (%),Infant mortality,Largest city,Life expectancy,Maternal mortality ratio,Minimum wage,Official language,Out of pocket health expenditure,Physicians per thousand,Population,Population: Labor force participation (%),Tax revenue (%),Total tax rate,Unemployment rate,Urban_population,country,coordinates
0,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,Afghanistan,"('33.93911 ', '67.709953')"
1,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,Albania,"('41.153332 ', '20.168331')"
2,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,Algeria,"('28.033886 ', '1.659626')"
3,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,Andorra,"('42.506285 ', '1.521801')"
4,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,Angola,"('-11.202692 ', '17.873887')"


Both tables appear to share the country column, named "Country" in df1 and "country" in df2. Let's look more closely at this column in both DataFrames.
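One way to find shared columns programmatically is to compare the column names as sets. The sketch below uses two tiny stand-in DataFrames (`left` and `right` are hypothetical samples, not the real files) and shows why a case-insensitive comparison is needed here: an exact comparison misses the key because of the capital "C".

```python
import pandas as pd

# Small stand-ins for df1 and df2 (hypothetical samples, not the real CSV files)
left = pd.DataFrame({"Country": ["Afghanistan", "Albania"], "Abbreviation": ["AF", "AL"]})
right = pd.DataFrame({"country": ["Afghanistan", "Albania"], "Population": [38041754, 2854191]})

# An exact comparison finds nothing, because "Country" != "country"
exact_common = set(left.columns) & set(right.columns)

# Comparing lowercased names reveals the shared key and keeps the original spellings
left_lower = {col.lower(): col for col in left.columns}
right_lower = {col.lower(): col for col in right.columns}
common = {
    name: (left_lower[name], right_lower[name])
    for name in left_lower.keys() & right_lower.keys()
}

print(exact_common)  # set()
print(common)        # {'country': ('Country', 'country')}
```

The original spellings in `common` are the values to pass to `left_on` and `right_on` when merging.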

In [5]:
df1.shape # check the number of countries (one row per country; here it looks like 195)

(195, 16)

In [6]:
df1["Country"].nunique() # and none of them is repeated

195

In [7]:
df2.shape # it also has 195 rows (= countries)

(195, 19)

In [8]:
df2["country"].nunique() # and none of them is repeated

195

To combine the data:

- both DataFrames match on a column, so `merge` is the natural choice. (Note: if they matched on the index, `join` would be the usual shortcut.)

- type of merge: at first glance an `inner` and a `left` merge should give the same result, because we expect both DataFrames to contain the same countries. If I did not trust the source of the data, I could write a for loop to check whether the two country columns really match, for example the following one:

(although we assume this is not strictly necessary)

In [9]:
coincidencias = []
disparidades = {}

# compare row by row; this assumes both columns are in the same (alphabetical) order
for x, y in zip(df1['Country'], df2['country']):
    if x == y:
        coincidencias.append(x)
    else:
        disparidades[x] = y  # key is the df1 value, not the literal string 'x'


#print('coincidencias:', coincidencias)
print(len(coincidencias))

print('disparidades:', disparidades)
print(len(disparidades))

195
disparidades: {}
0
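The row-by-row loop only works if both columns are sorted identically. As an order-independent alternative, a minimal sketch with set differences follows; the two Series are made-up samples (one country deliberately missing on each side) so that the result is not empty.

```python
import pandas as pd

# Hypothetical samples: "Afghanistan" is missing on the right, "Andorra" on the left
countries_1 = pd.Series(["Afghanistan", "Albania", "Algeria"], name="Country")
countries_2 = pd.Series(["Algeria", "Andorra", "Albania"], name="country")

# Set differences ignore row order and report what each side lacks
only_in_1 = set(countries_1) - set(countries_2)
only_in_2 = set(countries_2) - set(countries_1)

print(only_in_1)  # {'Afghanistan'}
print(only_in_2)  # {'Andorra'}
```

If both sets come back empty, an `inner`, `left`, `right` or `outer` merge on that key returns the same rows.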


In [10]:
# do the merge. The default is how="inner", so we do not need to specify it

df_merge = df1.merge(df2, left_on = "Country", right_on = "country")
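pandas can also do the sanity check during the merge itself. The sketch below is one possible approach, not part of the exercise solution: `validate="one_to_one"` raises an error if a key is duplicated on either side, and `indicator=True` adds a `_merge` column saying where each row came from. The sample frames are hypothetical and intentionally do not match perfectly.

```python
import pandas as pd

# Hypothetical samples with one unmatched country on each side
left = pd.DataFrame({"Country": ["Afghanistan", "Albania", "Algeria"],
                     "Abbreviation": ["AF", "AL", "DZ"]})
right = pd.DataFrame({"country": ["Albania", "Algeria", "Andorra"],
                      "Population": [2854191, 43053054, 77142]})

# An outer merge keeps every row; _merge marks each as left_only, right_only or both
check = left.merge(
    right,
    left_on="Country",
    right_on="country",
    how="outer",
    validate="one_to_one",
    indicator=True,
)

n_both = int((check["_merge"] == "both").sum())
n_left_only = int((check["_merge"] == "left_only").sum())
n_right_only = int((check["_merge"] == "right_only").sum())
print(n_both, n_left_only, n_right_only)  # 2 1 1
```

When the two "only" counts are zero, as the loop above suggests for the real files, the default inner merge loses no rows.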

In [11]:
df_merge.head()

Unnamed: 0,Country,Density\n(P/Km2),Abbreviation,Agricultural Land( %),Land Area(Km2),Armed Forces size,Birth Rate,Calling Code,Capital/Major City,Co2-Emissions,CPI,CPI Change (%),Currency-Code,Fertility Rate,Forested Area (%),Gasoline Price,GDP,Gross primary education enrollment (%),Gross tertiary education enrollment (%),Infant mortality,Largest city,Life expectancy,Maternal mortality ratio,Minimum wage,Official language,Out of pocket health expenditure,Physicians per thousand,Population,Population: Labor force participation (%),Tax revenue (%),Total tax rate,Unemployment rate,Urban_population,country,coordinates
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,Afghanistan,"('33.93911 ', '67.709953')"
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,Albania,"('41.153332 ', '20.168331')"
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,Algeria,"('28.033886 ', '1.659626')"
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,Andorra,"('42.506285 ', '1.521801')"
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,Angola,"('-11.202692 ', '17.873887')"


## 2. Cleaning Exercises

1. After merging, we have two "country" columns. Drop one of them.

2. The column names are not consistent. Rename the columns so that:

- They have no spaces.

- They are lowercase.

- They have no parentheses, i.e. remove "(%)", "(Km2)".

- Some column names contain "\n". Remove it from the column names.

- Some column names contain ":". Remove it from the column names.

3. The coordinates column holds the latitude and the longitude in a single column. Create two new columns, one with the longitude and one with the latitude. Once done, drop the coordinates column.

4. The columns unemployment_rate, total_tax_rate, tax_revenue, population_labor_force_participation, out_of_pocket_health_expenditure, gross_tertiary_education_enrollment, gross_primary_education_enrollment, forested_area, cpi_change, agricultural_land contain "%". Remove the "%" from the values in these columns.

5. Do the same for the columns gasoline_price, gdp, minimum_wage, but removing "$".

6. Save the DataFrame for later use.


In [12]:
# After merging, we have two "country" columns. Drop one of them.
# I will drop the lowercase one, which is the one at the end of the DataFrame:

df_merge.drop(columns = ['country'], inplace = True)

In [13]:
# check:
df_merge.head()

Unnamed: 0,Country,Density\n(P/Km2),Abbreviation,Agricultural Land( %),Land Area(Km2),Armed Forces size,Birth Rate,Calling Code,Capital/Major City,Co2-Emissions,CPI,CPI Change (%),Currency-Code,Fertility Rate,Forested Area (%),Gasoline Price,GDP,Gross primary education enrollment (%),Gross tertiary education enrollment (%),Infant mortality,Largest city,Life expectancy,Maternal mortality ratio,Minimum wage,Official language,Out of pocket health expenditure,Physicians per thousand,Population,Population: Labor force participation (%),Tax revenue (%),Total tax rate,Unemployment rate,Urban_population,coordinates
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,"('33.93911 ', '67.709953')"
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,"('41.153332 ', '20.168331')"
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,"('28.033886 ', '1.659626')"
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,"('42.506285 ', '1.521801')"
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,"('-11.202692 ', '17.873887')"


Cleaning the column names:


- str.strip() = removes whitespace at the start and end of the string (string = column name)

- str.lower() = converts the string to lowercase

- str.split() = splits the string on the separator you give it. Here, when a name contains parentheses, we keep only what comes before the opening parenthesis. It returns a list split on the separator "(", e.g. "Agricultural Land( %)" becomes ["Agricultural Land", " %)"]. We keep only "Agricultural Land" with `lista[0]`

- str.replace() = for the "\n" and ":". It replaces the given substring with another one: here "\n" is removed and ":" becomes a space, which is later turned into "_" along with the other spaces.
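The four steps listed above can be wrapped in a single helper. This is a minimal sketch rather than the notebook's own code: `clean_column_name` is a hypothetical name, and it swaps the final `replace(' ', '_')` for a regular expression that collapses runs of whitespace, so a name containing ": " ends up with one underscore instead of two.

```python
import re

def clean_column_name(name: str) -> str:
    """Return a lowercase, underscore-separated version of a column name."""
    name = name.split("(")[0]                         # drop "(%)", "(Km2)", "(P/Km2)"
    name = name.replace("\n", " ").replace(":", " ")  # line breaks and colons become spaces
    name = name.strip().lower()                       # trim the ends and lowercase
    return re.sub(r"\s+", "_", name)                  # each whitespace run -> one underscore

print(clean_column_name("Density\n(P/Km2)"))                            # density
print(clean_column_name("Agricultural Land( %)"))                       # agricultural_land
print(clean_column_name("Population: Labor force participation (%)"))   # population_labor_force_participation
```

Characters such as "/" and "-" are left untouched, since the exercise does not ask for them to be removed.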

In [14]:
# first build a list of the columns:
lista_columnas = [col for col in df_merge.columns]

lista_columnas[:3] # show the first 3 elements to check it looks right

['Country', 'Density\n(P/Km2)', 'Abbreviation']

In [15]:
# build a new list of clean column names, to use in the rename afterwards
lista_columnas_limpia = []

for col in lista_columnas:
    col1 = col.strip().lower().split("(")
    col2 = str(col1[0]) # keep the first element of the list: lists have no replace method, only strings do
    
    # replace: note that ": " turns into two spaces, hence the "__" in population__labor_force_participation
    col3 = col2.replace('\n', '').replace(':', ' ').strip().replace(' ', '_')
    
    lista_columnas_limpia.append(col3)

print(lista_columnas_limpia)

['country', 'density', 'abbreviation', 'agricultural_land', 'land_area', 'armed_forces_size', 'birth_rate', 'calling_code', 'capital/major_city', 'co2-emissions', 'cpi', 'cpi_change', 'currency-code', 'fertility_rate', 'forested_area', 'gasoline_price', 'gdp', 'gross_primary_education_enrollment', 'gross_tertiary_education_enrollment', 'infant_mortality', 'largest_city', 'life_expectancy', 'maternal_mortality_ratio', 'minimum_wage', 'official_language', 'out_of_pocket_health_expenditure', 'physicians_per_thousand', 'population', 'population__labor_force_participation', 'tax_revenue', 'total_tax_rate', 'unemployment_rate', 'urban_population', 'coordinates']


In [16]:
# rename takes a mapping {old name: new name} (a function also works),
# so we build that dictionary with zip():

dicc_col = {}

for x,y in zip(lista_columnas, lista_columnas_limpia):
    dicc_col[x] = y

dicc_col

{'Country': 'country',
 'Density\n(P/Km2)': 'density',
 'Abbreviation': 'abbreviation',
 'Agricultural Land( %)': 'agricultural_land',
 'Land Area(Km2)': 'land_area',
 'Armed Forces size': 'armed_forces_size',
 'Birth Rate': 'birth_rate',
 'Calling Code': 'calling_code',
 'Capital/Major City': 'capital/major_city',
 'Co2-Emissions': 'co2-emissions',
 'CPI': 'cpi',
 'CPI Change (%)': 'cpi_change',
 'Currency-Code': 'currency-code',
 'Fertility Rate': 'fertility_rate',
 'Forested Area (%)': 'forested_area',
 'Gasoline Price': 'gasoline_price',
 'GDP': 'gdp',
 'Gross primary education enrollment (%)': 'gross_primary_education_enrollment',
 'Gross tertiary education enrollment (%)': 'gross_tertiary_education_enrollment',
 'Infant mortality': 'infant_mortality',
 'Largest city': 'largest_city',
 'Life expectancy': 'life_expectancy',
 'Maternal mortality ratio': 'maternal_mortality_ratio',
 'Minimum wage': 'minimum_wage',
 'Official language': 'official_language',
 'Out of pocket health expe

In [17]:
# rename the columns:

df_merge.rename(columns=dicc_col, inplace = True)
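For reference, the mapping can also be built in one line with `dict(zip(...))`, and `rename` accepts a function in place of a dictionary. The sketch below shows both on a two-column toy DataFrame (the data and the simplified cleaning rule are illustrative assumptions); it returns new DataFrames instead of using `inplace=True`.

```python
import pandas as pd

# Toy DataFrame with two of the original column names
df = pd.DataFrame({"Birth Rate": [32.49], "Largest city": ["Kabul"]})

old_names = list(df.columns)
new_names = [name.lower().replace(" ", "_") for name in old_names]

# Option 1: build the {old: new} mapping directly from the two lists
mapping = dict(zip(old_names, new_names))
renamed = df.rename(columns=mapping)

# Option 2: pass a function that is applied to every column name
renamed_fn = df.rename(columns=lambda name: name.lower().replace(" ", "_"))

print(list(renamed.columns))     # ['birth_rate', 'largest_city']
print(list(renamed_fn.columns))  # ['birth_rate', 'largest_city']
```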

In [18]:
df_merge.head()

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population__labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,coordinates
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,"('33.93911 ', '67.709953')"
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,"('41.153332 ', '20.168331')"
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,"('28.033886 ', '1.659626')"
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,"('42.506285 ', '1.521801')"
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,"('-11.202692 ', '17.873887')"


The `coordinates` column holds the latitude and the longitude in a single column. We need two new columns, one with the latitude and one with the longitude, and then drop `coordinates`.

To do this, we use: 
```python
df[nombre_columna].str.split(',', expand = True)
```

- `.str` = lets you apply a string method to every value in a column; in this case we use `.split()`

- `.split()`: you pass the separator to split on. The split turns each value into a list, and `expand = True` creates one column per element of that list: ["('33.93911 '", " '67.709953')"]. Here it creates two columns.
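To see exactly what `expand = True` returns, here is a small sketch on two coordinate strings copied from the first rows of the table. It shows that the pieces still carry the parentheses, the quote characters and some spaces, which is why a cleaning step is needed afterwards.

```python
import pandas as pd

# Two values copied from the coordinates column
coords = pd.Series(["('33.93911 ', '67.709953')", "('41.153332 ', '20.168331')"])

# One column per piece; the columns are labelled 0 and 1
parts = coords.str.split(",", expand=True)

print(parts.shape)             # (2, 2)
print(repr(parts.iloc[0, 0]))  # "('33.93911 '"
print(repr(parts.iloc[0, 1]))  # " '67.709953')"
```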




In [19]:
# the result of the code above has to be assigned to two new columns so that they appear in the DataFrame:

df_merge[['latitud', 'longitud']] = df_merge['coordinates'].str.split(',', expand = True)

In [20]:
df_merge.head() # now we need to clean the values in these new columns.

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population__labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,coordinates,latitud,longitud
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,"('33.93911 ', '67.709953')",('33.93911 ','67.709953')
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,"('41.153332 ', '20.168331')",('41.153332 ','20.168331')
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,"('28.033886 ', '1.659626')",('28.033886 ','1.659626')
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,"('42.506285 ', '1.521801')",('42.506285 ','1.521801')
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,"('-11.202692 ', '17.873887')",('-11.202692 ','17.873887')


In [21]:
df_merge['latitud'] = df_merge['latitud'].str.replace('(', '', regex=False).replace(' ', '')
df_merge['latitud'] = df_merge['latitud'].str.strip() 
# neither the chained .replace(' ', '') nor .str.strip() removes the trailing space:
# the chained call is Series.replace, which only matches whole values, and the space
# sits inside the quote characters, so it is not at the end of the string.
df_merge.head()

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population__labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,coordinates,latitud,longitud
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,"('33.93911 ', '67.709953')",'33.93911 ','67.709953')
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,"('41.153332 ', '20.168331')",'41.153332 ','20.168331')
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,"('28.033886 ', '1.659626')",'28.033886 ','1.659626')
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,"('42.506285 ', '1.521801')",'42.506285 ','1.521801')
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,"('-11.202692 ', '17.873887')",'-11.202692 ','17.873887')


In [22]:
# cast from object to str and strip again: this still does not work, because the values
# were already strings; the quote characters around the number are what block strip()
df_merge['latitud'] = df_merge['latitud'].astype(str).str.strip()

In [23]:
df_merge.head()

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population__labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,coordinates,latitud,longitud
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,"('33.93911 ', '67.709953')",'33.93911 ','67.709953')
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,"('41.153332 ', '20.168331')",'41.153332 ','20.168331')
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,"('28.033886 ', '1.659626')",'28.033886 ','1.659626')
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,"('42.506285 ', '1.521801')",'42.506285 ','1.521801')
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,"('-11.202692 ', '17.873887')",'-11.202692 ','17.873887')


In [24]:
# now the same for the other column (the quote characters are still left over):
df_merge['longitud'] = df_merge['longitud'].str.replace(')', '', regex=False).replace(' ', '')
df_merge.head()

Unnamed: 0,country,density,abbreviation,agricultural_land,land_area,armed_forces_size,birth_rate,calling_code,capital/major_city,co2-emissions,cpi,cpi_change,currency-code,fertility_rate,forested_area,gasoline_price,gdp,gross_primary_education_enrollment,gross_tertiary_education_enrollment,infant_mortality,largest_city,life_expectancy,maternal_mortality_ratio,minimum_wage,official_language,out_of_pocket_health_expenditure,physicians_per_thousand,population,population__labor_force_participation,tax_revenue,total_tax_rate,unemployment_rate,urban_population,coordinates,latitud,longitud
0,Afghanistan,60,AF,58.10%,652230,323000.0,32.49,93.0,Kabul,8672,149.9,2.30%,AFN,4.47,2.10%,$0.70,"$19,101,353,833",104.00%,9.70%,47.9,Kabul,64.5,638.0,$0.43,Pashto,78.40%,0.28,38041754,48.90%,9.30%,71.40%,11.12%,9797273,"('33.93911 ', '67.709953')",'33.93911 ','67.709953'
1,Albania,105,AL,43.10%,28748,9000.0,11.78,355.0,Tirana,4536,119.05,1.40%,ALL,1.62,28.10%,$1.36,"$15,278,077,447",107.00%,55.00%,7.8,Tirana,78.5,15.0,$1.12,Albanian,56.90%,1.2,2854191,55.70%,18.60%,36.60%,12.33%,1747593,"('41.153332 ', '20.168331')",'41.153332 ','20.168331'
2,Algeria,18,DZ,17.40%,2381741,317000.0,24.28,213.0,Algiers,150006,151.36,2.00%,DZD,3.02,0.80%,$0.28,"$169,988,236,398",109.90%,51.40%,20.1,Algiers,76.7,112.0,$0.95,Arabic,28.10%,1.72,43053054,41.20%,37.20%,66.10%,11.70%,31510100,"('28.033886 ', '1.659626')",'28.033886 ','1.659626'
3,Andorra,164,AD,40.00%,468,,7.2,376.0,Andorra la Vella,469,,,EUR,1.27,34.00%,$1.51,"$3,154,057,987",106.40%,,2.7,Andorra la Vella,,,$6.63,Catalan,36.40%,3.33,77142,,,,,67873,"('42.506285 ', '1.521801')",'42.506285 ','1.521801'
4,Angola,26,AO,47.50%,1246700,117000.0,40.73,244.0,Luanda,34693,261.73,17.10%,AOA,5.52,46.30%,$0.97,"$94,635,415,870",113.50%,9.30%,51.6,Luanda,60.8,241.0,$0.71,Portuguese,33.40%,0.21,31825295,77.50%,9.20%,49.10%,6.89%,21061025,"('-11.202692 ', '17.873887')",'-11.202692 ','17.873887'
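The outputs above still show quote characters (and a trailing space in `latitud`). One way to finish the cleanup, sketched here on a two-row sample copied from the table, is to pass the unwanted characters to `.str.strip()` and then convert to numbers with `pd.to_numeric`. This is an assumed alternative to the cells above, not the notebook's own solution; the column names `latitud` and `longitud` follow the notebook.

```python
import pandas as pd

# Two rows copied from the coordinates column
df = pd.DataFrame({"coordinates": ["('33.93911 ', '67.709953')",
                                   "('-11.202692 ', '17.873887')"]})

parts = df["coordinates"].str.split(",", expand=True)

for column, new_name in zip(parts.columns, ["latitud", "longitud"]):
    # strip() with an argument removes any of these characters from both ends:
    # parentheses, single quotes and spaces
    cleaned = parts[column].str.strip("()' ")
    df[new_name] = pd.to_numeric(cleaned)

# the original column is no longer needed
df = df.drop(columns="coordinates")

print(df["latitud"].tolist())   # [33.93911, -11.202692]
print(df["longitud"].tolist())  # [67.709953, 17.873887]
```

Because the characters are stripped from the ends only, the minus sign and the decimal point inside the number are preserved.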
