**Course**: Geospatial Analysis (Análisis Geoespacial), Departamento de Geociencias y Medio Ambiente, Universidad Nacional de Colombia - sede Medellín <br/>
**Instructor**: Edier Aristizábal (evaristizabalg@unal.edu.co) <br />
**Credits**: The content of this notebook is taken from [Sharone Li](https://towardsdatascience.com/concatenate-multiple-and-messy-dataframes-efficiently-80847b4da12b) and [Shinichi Okada](https://towardsdatascience.com/how-to-create-an-animated-choropleth-map-with-less-than-15-lines-of-code-2ff04921c60b#189e). Every effort has been made to trace the copyright holders of the materials used in this notebook. The author apologizes for any unintentional omissions and would be pleased to add an acknowledgment in future editions.


# Plotly
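Plotly Express makes it easy to create animated graphs. The official documentation notes, however, that “Although Plotly Express supports animation for many chart and map types, smooth inter-frame transitions are today only possible for scatter and bar.” The animated choropleth maps built in this notebook therefore jump from one frame to the next instead of interpolating between them.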

In [None]:
import glob
import numpy as np
import os
import pandas as pd
all_files = glob.glob("/content/drive/MyDrive/CATEDRA/ANALISISGEOESPACIAL/AnalisisGeoespacial/data/2*.csv")

In [None]:
all_files

['/content/drive/MyDrive/CATEDRA/ANALISISGEOESPACIAL/AnalisisGeoespacial/data/2019.csv',
 '/content/drive/MyDrive/CATEDRA/ANALISISGEOESPACIAL/AnalisisGeoespacial/data/2020.csv',
 '/content/drive/MyDrive/CATEDRA/ANALISISGEOESPACIAL/AnalisisGeoespacial/data/2018.csv',
 '/content/drive/MyDrive/CATEDRA/ANALISISGEOESPACIAL/AnalisisGeoespacial/data/2022.csv',
 '/content/drive/MyDrive/CATEDRA/ANALISISGEOESPACIAL/AnalisisGeoespacial/data/2021.csv',
 '/content/drive/MyDrive/CATEDRA/ANALISISGEOESPACIAL/AnalisisGeoespacial/data/2016.csv',
 '/content/drive/MyDrive/CATEDRA/ANALISISGEOESPACIAL/AnalisisGeoespacial/data/2017.csv',
 '/content/drive/MyDrive/CATEDRA/ANALISISGEOESPACIAL/AnalisisGeoespacial/data/2015.csv']
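Note that `glob` returns the files in arbitrary order, as the listing above shows (2019 comes first, 2015 last). Any step that relies on the position of a file in the list to infer its year will therefore mislabel the data. A safer approach, sketched here with short hypothetical paths, is to read the year from the file name and sort on it; `year_from_path` is an illustrative helper, not part of the original notebook.

```python
from pathlib import Path

def year_from_path(path):
    """Return the year encoded in a file name such as 'data/2019.csv'."""
    return int(Path(path).stem)

# Hypothetical paths, deliberately out of order like the glob result above
unsorted_files = ["data/2019.csv", "data/2020.csv", "data/2015.csv"]

ordered = sorted(unsorted_files, key=year_from_path)
years = [year_from_path(p) for p in ordered]
print(ordered)
print(years)
```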

Next, let’s use a for loop to read all the files into pandas dataframes. If the datasets all have the same column names, in the same order, we can concatenate them directly with pd.concat(). The following code checks whether that is the case; inside the loop, all column names are converted to lower case to make the comparison easier.

In [None]:
import pathlib
col_names = []

data_dir = pathlib.Path("/content/drive/MyDrive/CATEDRA/ANALISISGEOESPACIAL/AnalisisGeoespacial/data/")
for txt_file in data_dir.glob('2*.csv'):
    df = pd.read_csv(txt_file)  # read each yearly file before inspecting it
    df.columns = [col.lower() for col in df.columns]
    col_names.append(df.columns.tolist())

col_names

Since the files do not share identical column names, the country and score columns have to be picked out by name before concatenating.
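The same comparison can be reduced to a single boolean. A minimal sketch with two made-up frames (the column names and values are illustrative and are not claimed to match the actual file headers):

```python
import pandas as pd

# Two toy frames standing in for two yearly files
frames = {
    2015: pd.DataFrame({"Country": ["X"], "Happiness Score": [7.0]}),
    2019: pd.DataFrame({"Country or region": ["X"], "Score": [7.1]}),
}

# Lower-case the column names of every frame
lowered = {year: [c.lower() for c in df.columns] for year, df in frames.items()}

# Compare every layout against the first one
reference = next(iter(lowered.values()))
same_layout = all(cols == reference for cols in lowered.values())
print(same_layout)
```

If `same_layout` is `False`, a plain `pd.concat()` would produce a wide table full of missing values, because columns are aligned by name.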

In [None]:
dfs = []
for filename in all_files:
    df = pd.read_csv(filename)
    df.columns = [col.lower() for col in df.columns]
    # Column names vary between years, so match them by name
    country_col = [col for col in df.columns if 'country' in col]
    score_col = [col for col in df.columns if col in ['score', 'happiness_score', 'happiness.score', 'happiness score', 'ladder score']]

    df1 = pd.concat([df[country_col], df[score_col]], axis=1)
    # Take the year from the file name (e.g. '2019.csv'): glob does not return the files in chronological order
    df1['year'] = int(os.path.splitext(os.path.basename(filename))[0])
    df1.columns = ['country', 'happiness_score', 'year']

    dfs.append(df1)

Line 1: we create an empty list dfs. We will later append each dataframe to this list as the for loop iterates.
Lines 6–7: we extract the ‘country’ column and the ‘happiness score’ column from each dataframe, based on the patterns observed in their column names.
Lines 9–11: we create a new dataframe df1 that has only the three columns we need: country, score, and year. The year is read from the file name rather than from the position of the file in all_files, because that list is not sorted.
Line 12: we assign the same column names to every dataframe so that we can concatenate them in the next step, which requires identical column names.
Line 14: we append each of the 8 pre-processed dataframes to the list dfs.

In [None]:
dfs[0].head()

| | country | happiness_score | year |
|---|---|---|---|
| 0 | Finland | 7.769 | 2019 |
| 1 | Denmark | 7.600 | 2019 |
| 2 | Norway | 7.554 | 2019 |
| 3 | Iceland | 7.494 | 2019 |
| 4 | Netherlands | 7.488 | 2019 |
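The column-selection logic of the loop can also be packaged as a small function, which makes it easy to test on toy data. This is a sketch rather than the original author's code: `tidy_frame` and `SCORE_NAMES` are hypothetical names, and the input header names are assumptions chosen for illustration.

```python
import pandas as pd

# Score column names accepted by the notebook's loop
SCORE_NAMES = ["score", "happiness_score", "happiness.score",
               "happiness score", "ladder score"]

def tidy_frame(df, year):
    """Return a three-column frame: country, happiness_score, year."""
    df = df.rename(columns=str.lower)
    # next() raises StopIteration if no matching column exists
    country_col = next(c for c in df.columns if "country" in c)
    score_col = next(c for c in df.columns if c in SCORE_NAMES)
    out = df[[country_col, score_col]].copy()
    out.columns = ["country", "happiness_score"]
    out["year"] = year
    return out

# Assumed header names; the single value is the first row shown above
raw_2019 = pd.DataFrame({"Country or region": ["Finland"], "Score": [7.769]})
tidy_2019 = tidy_frame(raw_2019, 2019)
print(tidy_2019)
```

Failing loudly when a column is missing is intentional: a silent empty selection would only surface later as a confusing error when the column names are assigned.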


Finally, with all the dataframes sharing the same column names and order, we can concatenate them with pd.concat(). The first line concatenates all the dataframes in the list dfs; the remaining lines clean the ‘happiness score’ and ‘country’ columns so that the data is ready for visualization in the next section (the score conversion for 2022 is left commented out).

In [None]:
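df_all = pd.concat(dfs, axis=0, ignore_index=True)
# Uncomment if the 2022 scores are stored as text with a decimal comma:
#df_all['happiness_score']=np.where(df_all['year']==2022, df_all['happiness_score'].str.replace(',','.'),df_all['happiness_score'])
# Remove the literal asterisk attached to some country names; regex=False treats '*' as plain text
df_all['country'] = df_all['country'].str.replace('*', '', regex=False)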
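To see what the concatenation and the asterisk clean-up do, here is a small self-contained sketch on made-up rows (the names and scores are illustrative only). Passing `regex=False` matters because `*` is a special character in regular expressions.

```python
import pandas as pd

# Toy frames with the uniform three-column layout
a = pd.DataFrame({"country": ["Finland", "Luxembourg*"],
                  "happiness_score": [7.8, 7.4],
                  "year": 2022})
b = pd.DataFrame({"country": ["Finland"],
                  "happiness_score": [7.7],
                  "year": 2019})

# ignore_index=True gives the result a fresh 0..n-1 index
combined = pd.concat([a, b], ignore_index=True)

# Treat '*' as a literal character, not as a regular expression
combined["country"] = combined["country"].str.replace("*", "", regex=False)
print(combined)
```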



In [None]:
import plotly.express as px

# Lookup table with country names and ISO alpha-3 codes
df_ISO = pd.read_csv('/content/drive/MyDrive/CATEDRA/ANALISISGEOESPACIAL/AnalisisGeoespacial/data/countries_continents_codes_flags_url.csv')
# Attach the ISO code to every row; countries without a match get NaN and are dropped
df_final = pd.merge(df_all, df_ISO, how="left", on=["country"])
df_final = df_final[['country', 'happiness_score', 'year', 'alpha-3']].dropna()
# Sort by year so that the animation frames play in chronological order
df_final = df_final.sort_values('year')

fig = px.choropleth(df_final,
                    locations='alpha-3',
                    color='happiness_score',
                    color_continuous_scale="rdylgn",
                    animation_frame='year'
                    )

fig.update_layout(
    title_text='World Happiness (2015-2022)',
    title_font_family="Times New Roman",
    title_font_size=22,
    title_font_color="black",
    title_x=0.46,
)

fig.show()
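A left merge followed by `dropna()` silently discards every country whose name is spelled differently in the two tables. Before dropping rows, it can be worth listing what failed to match. The following sketch uses two made-up tables (the names, scores, and the spelling mismatch are illustrative) and the `indicator` option of `merge`:

```python
import pandas as pd

# Toy score table and toy ISO lookup table with one spelling mismatch
scores = pd.DataFrame({"country": ["Finland", "Czech Republic"],
                       "happiness_score": [7.8, 6.9]})
codes = pd.DataFrame({"country": ["Finland", "Czechia"],
                      "alpha-3": ["FIN", "CZE"]})

# indicator=True adds a '_merge' column telling where each row came from
merged = scores.merge(codes, how="left", on="country", indicator=True)

# Rows present only in the left table found no ISO code
unmatched = merged.loc[merged["_merge"] == "left_only", "country"].tolist()
print(unmatched)
```

Countries that appear in `unmatched` can then be renamed in one of the tables before merging, so they are not lost from the map.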

## Murder rate in the US

In [10]:
import plotly.express as px
import pandas as pd
# State-by-state crime statistics, 1960-2014
df = pd.read_csv('https://gist.githubusercontent.com/shinokada/f01139d3a024de375ede23cec5d52360/raw/424ac0055ed71a04e6f45badfaef73df96ad0aad/CrimeStatebyState_1960-2014.csv')
# Exclude the District of Columbia rows
df = df[df['State'] != 'District of Columbia']
df

| | State | Year | Population | Violent_crime_total | Murder_and_nonnegligent_Manslaughter | Murder_per100000 | Legacy_rape_/1 | Revised_rape_/2 | Robbery | Aggravated_assault | State_code |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | 1960 | 3266740 | 6097 | 406 | 12.428292 | 281 | NaN | 898 | 4512 | AL |
| 1 | Alabama | 1961 | 3302000 | 5564 | 427 | 12.931557 | 252 | NaN | 630 | 4255 | AL |
| 2 | Alabama | 1962 | 3358000 | 5283 | 316 | 9.410363 | 218 | NaN | 754 | 3995 | AL |
| 3 | Alabama | 1963 | 3347000 | 6115 | 340 | 10.158351 | 192 | NaN | 828 | 4755 | AL |
| 4 | Alabama | 1964 | 3407000 | 7260 | 316 | 9.275022 | 397 | NaN | 992 | 5555 | AL |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2795 | Wyoming | 2010 | 564554 | 1117 | 8 | 1.417048 | 162 | NaN | 77 | 870 | WY |
| 2796 | Wyoming | 2011 | 567356 | 1245 | 18 | 3.172611 | 146 | NaN | 71 | 1010 | WY |
| 2797 | Wyoming | 2012 | 576626 | 1161 | 14 | 2.427917 | 154 | NaN | 61 | 932 | WY |
| 2798 | Wyoming | 2013 | 583223 | 1212 | 17 | 2.914837 | 144 | 204.0 | 74 | 917 | WY |
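Judging from the rows shown, `Murder_per100000` appears to be the murder count divided by the population and multiplied by 100,000. A quick check on the first two Alabama rows (values copied from the output above) supports that reading; this is a sanity check, not a statement about how the dataset was actually built.

```python
import pandas as pd

# First two rows of the table, copied from the notebook output
sample = pd.DataFrame({
    "Population": [3266740, 3302000],
    "Murder_and_nonnegligent_Manslaughter": [406, 427],
    "Murder_per100000": [12.428292, 12.931557],
})

# Recompute the rate: murders per 100,000 inhabitants
recomputed = (sample["Murder_and_nonnegligent_Manslaughter"]
              / sample["Population"] * 100_000)

# Largest absolute difference from the published column
max_diff = (recomputed - sample["Murder_per100000"]).abs().max()
print(recomputed.round(6).tolist(), max_diff)
```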


In [11]:
px.choropleth(df, 
              locations = 'State_code',
              color="Murder_per100000", 
              animation_frame="Year",
              color_continuous_scale="Inferno",
              locationmode='USA-states',
              scope="usa",
              range_color=(0, 20),
              title='Crime by State',
              height=600
             )

## Animated choropleth world

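Plotly Express ships with several sample datasets; here we use Gapminder, which includes an ISO alpha-3 country code (`iso_alpha`) that can be passed directly as the map location.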

In [12]:
import plotly.express as px 
gapminder = px.data.gapminder()
display(gapminder)

| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 | AFG | 4 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 | AFG | 4 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 | AFG | 4 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 | AFG | 4 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 | AFG | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1699 | Zimbabwe | Africa | 1987 | 62.351 | 9216418 | 706.157306 | ZWE | 716 |
| 1700 | Zimbabwe | Africa | 1992 | 60.377 | 10704340 | 693.420786 | ZWE | 716 |
| 1701 | Zimbabwe | Africa | 1997 | 46.809 | 11404948 | 792.449960 | ZWE | 716 |
| 1702 | Zimbabwe | Africa | 2002 | 39.989 | 11926563 | 672.038623 | ZWE | 716 |


In [13]:
import plotly.express as px
gapminder = px.data.gapminder()
px.choropleth(gapminder,               
              locations="iso_alpha",               
              color="lifeExp",
              hover_name="country",  
              animation_frame="year",    
              color_continuous_scale='Plasma',  
              height=600             
)