# Map Visualization in Python

> Map Visualization

- toc: true 
- badges: true
- comments: true
- categories: [Map Visualization]
- image: images/sql_1.png

## 1) Import libraries

In [112]:
import json
import requests
import pandas as pd
import plotly.express as px

## 2) Data preparation 

The original dataset source: https://raw.githubusercontent.com/suchith91/wdi/master/WDI_Data_Selected.csv

### 2.1) Load dataset

In [113]:
# This line will disappear in the portfolio page
# Step 1: Import dataset
data = pd.read_csv("https://raw.githubusercontent.com/leonardodecastro/data/main/WDI_Data_Selected.csv", encoding='cp1252').drop(['Indicator Code'], axis= 1)

# Step 2: Reshape from wide format (one column per year) to long format (one row per country, indicator and year), as expected by the map libraries
world_data_df = pd.melt(data, id_vars=['Country Name', 'Country Code','Indicator Name'], var_name='Year', value_name='Indicator Value')
world_data_df['Year'] = world_data_df['Year'].astype('int')
world_data_df.head(2)

|   | Country Name | Country Code | Indicator Name | Year | Indicator Value |
|---|---|---|---|---|---|
| 0 | Arab World | ARB | CO2 emissions (metric tons per capita) | 1960 | 0.644 |
| 1 | Arab World | ARB | Exports of goods and services (% of GDP) | 1960 | NaN |
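
The reshape in Step 2 turns a table with one column per year into one row per country, indicator and year. A minimal sketch on a made-up frame (the country names, codes and values below are illustrative assumptions, not rows from the WDI file):

```python
import pandas as pd

# Toy wide-format frame: one column per year (all values are made up)
wide = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "Country Code": ["AAA", "BBB"],
    "Indicator Name": ["GDP growth (annual %)"] * 2,
    "1960": [1.5, 2.0],
    "1961": [3.0, None],
})

# Every year column becomes a (Year, Indicator Value) pair
long_df = pd.melt(
    wide,
    id_vars=["Country Name", "Country Code", "Indicator Name"],
    var_name="Year",
    value_name="Indicator Value",
)
long_df["Year"] = long_df["Year"].astype(int)  # column labels are strings after melt
print(long_df)
```

Each of the two input rows yields one output row per year column, so the result has four rows.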


### 2.2) Create datasets for each topic

In [114]:
# This line will disappear in the portfolio page
# Create dataset 1: CO2 emissions per year
CO2_emissions_df = world_data_df[world_data_df['Indicator Name'] == 'CO2 emissions (metric tons per capita)']

# Create dataset 2: Exports of goods per year
export_perc_GDP_df = world_data_df[world_data_df['Indicator Name'] == 'Exports of goods and services (% of GDP)']

# Create dataset 3: Forest area per year
land_use_perc_df = world_data_df[world_data_df['Indicator Name'] == 'Forest area (% of land area)']

# Create dataset 4: GDP Growth per year
GDP_growth_df = world_data_df[world_data_df['Indicator Name'] == 'GDP growth (annual %)']

# Create dataset 5: Imports of goods per year
import_perc_GDP_df = world_data_df[world_data_df['Indicator Name'] == 'Imports of goods and services (% of GDP)']

# Create dataset 6: Poverty headcount per year
poverty_perc_pop_df = world_data_df[world_data_df['Indicator Name'] == 'Poverty headcount ratio at national poverty lines (% of population)']

# Create dataset 7: Unemployment per year
unemployment_perc_df = world_data_df[world_data_df['Indicator Name'] == 'Unemployment, total (% of total labor force) (national estimate)']

# Create dataset 8: Youth literacy per year
youth_literacy_perc_df = world_data_df[world_data_df['Indicator Name'] == 'Youth literacy rate, population 15-24 years, both sexes (%)']
youth_literacy_perc_df.head(2)

|   | Country Name | Country Code | Indicator Name | Year | Indicator Value |
|---|---|---|---|---|---|
| 7 | Arab World | ARB | Youth literacy rate, population 15-24 years, b... | 1960 | NaN |
| 15 | Caribbean small states | CSS | Youth literacy rate, population 15-24 years, b... | 1960 | NaN |
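
The eight filters above all follow the same pattern. As an alternative sketch, not the approach used in this notebook, the per-indicator frames could be collected in a dictionary keyed by indicator name. The toy frame and the name `frames` are assumptions for illustration:

```python
import pandas as pd

# Toy long-format frame with two indicators (values are made up)
toy = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "BBB", "BBB"],
    "Indicator Name": ["GDP growth (annual %)", "Forest area (% of land area)"] * 2,
    "Year": [1990] * 4,
    "Indicator Value": [2.1, 30.0, -0.4, 12.5],
})

# One independent DataFrame per indicator; .copy() detaches each group from `toy`
frames = {name: group.copy() for name, group in toy.groupby("Indicator Name")}

gdp = frames["GDP growth (annual %)"]
print(gdp)
```

Separate named variables, as in the cell above, are easier to read when only a handful of indicators is needed.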


### 2.3) Create value ranges for certain visualizations

In [115]:
# This line will disappear in the portfolio page
# Create value ranges for dataset 4: GDP Growth per year
# Work on an explicit copy so that adding a column does not trigger a SettingWithCopyWarning
GDP_growth_df = GDP_growth_df.copy()
bins = [GDP_growth_df['Indicator Value'].min()-1, -10, -5, 0, 5, 10, GDP_growth_df['Indicator Value'].max()+1]
labels = ['< -10 %' , '-10% to -5%', '-5% to 0%','0% to 5%','5% to 10%','> 10%']
GDP_growth_df['Growth Ranges'] = pd.cut(GDP_growth_df['Indicator Value'], bins=bins, labels=labels, right=False).astype('str')
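
Two details of the binning cell are easy to miss: `right=False` makes every bin include its left edge and exclude its right edge, and assigning a new column to a filtered DataFrame can raise pandas' `SettingWithCopyWarning` unless the filter result is copied first. A small sketch on made-up values (the frame `all_rows` and its contents are assumptions):

```python
import pandas as pd

all_rows = pd.DataFrame({
    "Indicator Name": ["GDP growth (annual %)"] * 5 + ["Other"],
    "Indicator Value": [-12.0, -5.0, 0.0, 4.9, 10.0, 99.0],
})

# .copy() gives an independent frame, so the column assignment below is unambiguous
growth = all_rows[all_rows["Indicator Name"] == "GDP growth (annual %)"].copy()

bins = [growth["Indicator Value"].min() - 1, -10, -5, 0, 5, 10,
        growth["Indicator Value"].max() + 1]
labels = ['< -10 %', '-10% to -5%', '-5% to 0%', '0% to 5%', '5% to 10%', '> 10%']

# right=False: intervals are [left, right), so -5.0 falls in '-5% to 0%'
growth["Growth Ranges"] = pd.cut(
    growth["Indicator Value"], bins=bins, labels=labels, right=False
).astype(str)
print(growth[["Indicator Value", "Growth Ranges"]])
```

Padding the outer bin edges by one unit, as the notebook does, keeps the minimum and maximum values inside the first and last bins.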



## 3) Time Series Visualizations (choropleth)

### 3.1.1) Using continuous color schemes

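By default, `px.choropleth` matches `locations` against three-letter ISO 3166-1 alpha-3 country codes (`locationmode='ISO-3'`), so the `Country Code` column must contain codes in that format. Codes that do not correspond to a country, such as the regional aggregate `ARB` (Arab World), are not drawn on the map.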
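
A quick sanity check on the code column can catch formatting problems before plotting. The sketch below only tests the format (three uppercase letters) and excludes two aggregate codes that appear in the outputs earlier in this notebook; it does not verify membership in the ISO standard, and the `aggregates` set is an incomplete, illustrative assumption:

```python
import pandas as pd

codes = pd.Series(["BRA", "ARB", "br", "CSS", "USA"])

# Format check only: exactly three uppercase ASCII letters
looks_like_alpha3 = codes.str.fullmatch(r"[A-Z]{3}")

# Regional aggregates seen in the sample outputs (ARB, CSS); this list is not exhaustive
aggregates = {"ARB", "CSS"}
is_country_like = looks_like_alpha3 & ~codes.isin(aggregates)

print(codes[is_country_like].tolist())
```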

In [116]:
# This line will disappear in the portfolio page
# Step 1: Create visualization
fig = px.choropleth(GDP_growth_df[GDP_growth_df['Year']>=1961],         # Limit the analysis to years for which there is plenty of data
                    locations="Country Code",                           # Column where country code with 3 letters can be found
                    color="Indicator Value",                            # Indicator Value is the numerical value we want to examine
                    hover_name="Country Code",                          # Column to add to hover information
                    animation_frame='Year',                             # Show column that will be used in the animation frame
                    color_continuous_scale=px.colors.diverging.RdYlGn,  # Select the type of divergent color scheme to be used
                    height = 700)                                       # Adjust the size of the figure

# Step 2: Control the speed of the transitions
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 1000
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 10

# Step 3: Add title to the plot
fig.update_layout(title_text='GDP Yearly Growth (%)', title_x=0.5)
fig.show()

### 3.1.2) Using discrete color schemes

In [117]:
# This line will disappear in the portfolio page
# Step 1: Create a dictionary to map colors to each of the categories
color_discrete_dict = {'nan': '#4d4d4d', '< -10 %': '#d73027', '-10% to -5%' : '#fc8d59', '-5% to 0%' : '#fee08b',
                                          '0% to 5%' : '#d9ef8b', '5% to 10%' : '#91cf60', '> 10%' : '#1a9850'}

# Step 2: Create a dictionary with order of the legend labels
category_orders_dict = {'Growth Ranges' : ['nan', '< -10 %' , '-10% to -5%', '-5% to 0%','0% to 5%','5% to 10%', '> 10%']}

# Step 3: Create visualization
fig = px.choropleth(GDP_growth_df[GDP_growth_df['Year']>=1961],        # Limit the analysis to years for which there is plenty of data
                    locations="Country Code",                          # Column where country code with 3 letters can be found
                    color="Growth Ranges",                             # Growth Ranges is the categorical column used to color each country
                    color_discrete_map = color_discrete_dict,          # Dictionary to map colors to each of the categories
                    category_orders= category_orders_dict,             # Dictionary with order of the legend labels
                    hover_name="Country Code",                         # Column to add to hover information
                    animation_frame='Year',                            # Show column that will be used in the animation frame
                    height = 700)                                      # Adjust the size of the figure

# Step 4: Control the speed of the transitions
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 1000
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 10

# Step 5: Add title to the plot
fig.update_layout(title_text='GDP Yearly Growth (%)', title_x=0.5)

fig.show()
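
The color dictionary and the legend order both derive from the same ordered list of labels, so they can be built together to keep them in sync. A small sketch using the labels and hex colors from the cell above (the variable names are illustrative):

```python
# Ordered bin labels and the matching red-to-green palette from the cell above
labels = ['< -10 %', '-10% to -5%', '-5% to 0%', '0% to 5%', '5% to 10%', '> 10%']
palette = ['#d73027', '#fc8d59', '#fee08b', '#d9ef8b', '#91cf60', '#1a9850']

# Missing values were converted to the string 'nan' and get a neutral grey
missing_label, missing_color = 'nan', '#4d4d4d'

color_discrete_dict = {missing_label: missing_color, **dict(zip(labels, palette))}
category_orders_dict = {'Growth Ranges': [missing_label, *labels]}

print(color_discrete_dict)
print(category_orders_dict)
```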

### 3.1.3) More than one animated map side by side

We first select the two variables we want to investigate and create a dataframe that contains both of them.

In [118]:
# This line will disappear in the portfolio page
# Step 1: Select 2 variables that we seek to investigate (copy so that new columns can be added without a SettingWithCopyWarning)
data = world_data_df[world_data_df['Indicator Name'].isin(['Unemployment, total (% of total labor force) (national estimate)','GDP growth (annual %)'])].copy()

# Step 2: Encode these variables as quintiles for an easier interpretation when they are compared later on
data['Quantile'] = data.groupby(['Indicator Name'])['Indicator Value'].transform(lambda x: pd.qcut(x, q=[0,.2,.4,.6,.8,1], labels=['Q1','Q2','Q3','Q4','Q5']))

# Step 3: Make sure the categories are in the string format
data['Quantile'] = data['Quantile'].astype('str')

# Step 4: Rename the indicators for better visualization later
data['Indicator Name'] = data['Indicator Name'].map({'GDP growth (annual %)':'GDP Growth (annual %)','Unemployment, total (% of total labor force) (national estimate)':'Unemployment (% of total labor force)'})
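
Because the quantiles are computed inside `groupby(...).transform(...)`, each indicator is ranked against its own distribution (pooled over all countries and years), which is what makes the two maps comparable. A toy sketch with made-up values and shortened indicator names:

```python
import pandas as pd

toy = pd.DataFrame({
    "Indicator Name": ["growth"] * 5 + ["unemployment"] * 5,
    "Indicator Value": [-2.0, 0.5, 1.0, 3.0, 9.0, 4.0, 5.0, 6.0, 12.0, 25.0],
})

labels = ["Q1", "Q2", "Q3", "Q4", "Q5"]

# Quintiles are computed separately for each indicator
toy["Quantile"] = (
    toy.groupby("Indicator Name")["Indicator Value"]
       .transform(lambda s: pd.qcut(s, q=5, labels=labels))
       .astype(str)
)
print(toy)
```

A value of 9.0 is the top quintile for growth, while 12.0 is only the fourth quintile for unemployment, even though it is the larger number.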



Create the visualization to evaluate if **GDP Growth** is often negatively correlated with **Unemployment Rates**. 

In [119]:
# This line will disappear in the portfolio page
# Step 1: Create a dictionary to map colors to each of the categories
color_discrete_dict = {'nan': '#4d4d4d', 'Q1': '#d7191c', 'Q2' : '#fdae61', 'Q3' : '#ffffbf',
                                          'Q4' : '#a6d96a', 'Q5' : '#1a9641'}

# Step 2: Create a dictionary with order of the legend labels
category_orders_dict = {'Quantile' : ['nan', 'Q1','Q2','Q3','Q4','Q5']}

# Step 3: Create visualization
fig = px.choropleth(data[data['Year'].isin(list(range(1980,2015)))], # Limit the analysis to years for which there is plenty of data
                    locations="Country Code",                        # Column where country code with 3 letters can be found
                    color="Quantile",                                # Quantile is the categorical column used to color each country
                    color_discrete_map = color_discrete_dict,        # Dictionary to map colors to each of the categories
                    category_orders= category_orders_dict,           # Dictionary with order of the legend labels
                    hover_name="Country Code",                       # Column to add to hover information
                    animation_frame='Year',                          # Show column that will be used in the animation frame
                    facet_col = 'Indicator Name')                    # Feature that determines the split into 2 columns with different types of info

# Step 4: Control the speed of the transitions
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 1000
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 10

# Step 5: Prevent redundant legend
names = set()
fig.for_each_trace(lambda trace: trace.update(showlegend=False) if (trace.name in names) else names.add(trace.name))

# Step 6: Prevent "Indicator Name" from appearing as the label of the facet column
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))

# Step 7: Add title to the plot
fig.update_layout(title_text='Growth VS Unemployment', title_x=0.5)

fig.show()

## 4) Time Series Visualizations (Movement Tracking)

In [120]:
import pandas as pd
import plotly.express as px

# Load the us-cities-top-1k dataset and keep the cities of two states
us_cities = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/us-cities-top-1k.csv")
us_cities = us_cities.query("State in ['New York', 'Ohio']")

# Connect the cities with one line per state
fig = px.line_mapbox(us_cities, lat="lat", lon="lon", color="State", zoom=3, height=300)

# "open-street-map" needs no access token; the Stamen styles now require a Stadia Maps account
fig.update_layout(mapbox_style="open-street-map", mapbox_zoom=4, mapbox_center_lat = 41,
    margin={"r":0,"t":0,"l":0,"b":0})

fig.show()

In [121]:
import pandas as pd
import plotly.express as px
import io

road = pd.read_csv(io.StringIO("""lat,lon
49.9138598,8.6546538
49.9056928,8.6609511
49.9137963,8.6547367
49.9031554,8.6602001
49.9036356,8.6605441
49.9101238,8.6713172
49.9031909,8.6578803
49.9031801,8.6584378
49.908962,8.657051
49.9031524,8.6603878
49.908962,8.657029
49.9089642,8.6571283
49.9089628,8.6570803
49.9089621,8.656988
49.9031758,8.6585998
49.9031293,8.6629121
49.9031306,8.6628223
49.9089685,8.657193
49.9061892,8.6563651
49.9061913,8.6564502
49.9089642,8.6571283
49.9090377,8.6580812
49.904322,8.6607755
49.9089749,8.6572878
49.9122419,8.666437
49.9090437,8.6589592
49.909642,8.6555025
49.9031476,8.6612382
49.9033839,8.6557096
49.9033763,8.6557774
49.9113246,8.6590449"""))

fig = px.scatter_mapbox(road, lat="lat", lon="lon",
                        hover_name=road.index,
                        hover_data={'lat':False, 'lon':False}, zoom=11, height=500)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})

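# Order in which to visit the points (row labels of `road`); the name suggests a travelling-salesman tour
tsp = [0, 2, 30, 25, 21, 23, 17, 20, 11, 12, 8, 10, 13, 26, 18, 19, 28, 29, 6, 7, 14, 3, 9, 27, 16, 15, 4, 22, 1, 5, 24]

# Draw the route by connecting the points in that order
fig.add_traces(px.line_mapbox(road.loc[tsp], lat="lat", lon="lon").data)
fig.show()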
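
The route drawn above depends entirely on the visiting order. One way to compare two orderings is the total great-circle length of the path. The sketch below uses the haversine formula on the first three coordinates of `road`; the helper names `haversine_km` and `path_length_km` are hypothetical, and a mean Earth radius of 6371 km is assumed:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

def path_length_km(points, order):
    """Total length of the open path that visits `points` in the given order."""
    ordered = [points[i] for i in order]
    return sum(haversine_km(*p, *q) for p, q in zip(ordered, ordered[1:]))

# First three rows of `road` (lat, lon)
points = [(49.9138598, 8.6546538), (49.9056928, 8.6609511), (49.9137963, 8.6547367)]

direct = path_length_km(points, [0, 2, 1])  # visit the nearby point first
detour = path_length_km(points, [0, 1, 2])  # go far away and come back
print(round(direct, 3), round(detour, 3))
```

Points 0 and 2 are only a few metres apart, so visiting them consecutively roughly halves the path length compared with the detour.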

Dataset source for the seabird tracks: https://www.kaggle.com/code/docxian/visualize-seabird-tracks/data

In [122]:
df = pd.read_csv('anon_gps_tracks_with_dive.csv')

In [123]:
df_short = df[df['bird'].isin([9,74,107])][['lat','lon','bird','date_time']]

In [124]:
df.head(1)

|   | Unnamed: 0 | lat | lon | alt | unix | bird | species | year | date_time | max_depth.m | colony2 | coverage_ratio | is_dive | is_dive_1m | is_dive_2m | is_dive_4m | is_dive_5m | is_dive_0m |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 56.095451 | -6.233089 | -23.059999 | 1340627854 | 1 | tCOGU | t2012 | 2012-06-25 13:37:34 | -2.172046 | 1 | 0.5 | False | False | False | False | False | False |


In [125]:
df_short['date_time'] = pd.to_datetime(df_short['date_time'])
df_short.set_index("date_time",inplace=True)

In [126]:
df_final = df_short.groupby('bird').resample('30min').mean()
df_final = df_final.reset_index(level=1)
df_final['bird'] = df_final['bird'].astype('int').astype('str')
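
The resampling step averages each bird's positions within 30-minute windows, which thins the track to a manageable number of animation frames. A toy sketch of the same idea follows; selecting `[['lat', 'lon']]` before resampling is an assumption made here so the result has the same shape across pandas versions, and the coordinates are made up:

```python
import pandas as pd

tracks = pd.DataFrame({
    "bird": [1, 1, 1, 2, 2],
    "date_time": pd.to_datetime([
        "2012-06-25 10:00", "2012-06-25 10:10", "2012-06-25 10:40",
        "2012-06-25 10:05", "2012-06-25 10:20",
    ]),
    "lat": [56.0, 56.2, 56.4, 57.0, 57.2],
    "lon": [-6.0, -6.2, -6.4, -7.0, -7.2],
})

# Resampling needs a DatetimeIndex; group by bird so tracks are never mixed
resampled = (
    tracks.set_index("date_time")
          .groupby("bird")[["lat", "lon"]]
          .resample("30min")
          .mean()
          .reset_index()
)
print(resampled)
```

Bird 1 has fixes in two windows and bird 2 in one, so five raw fixes collapse to three averaged positions.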

In [127]:
df_final.shape

(272, 4)

In [128]:
fig = px.scatter_geo(df_final,
                    lat = 'lat',
                    lon = 'lon',
                    color = 'bird')
fig.update_geos(fitbounds='locations')

fig.show()

In [129]:
df_final.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 272 entries, 9 to 107
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   date_time  272 non-null    datetime64[ns]
 1   lat        272 non-null    float64       
 2   lon        272 non-null    float64       
 3   bird       272 non-null    object        
dtypes: datetime64[ns](1), float64(2), object(1)
memory usage: 10.6+ KB


In [130]:
df_final['date_time'] = df_final['date_time'].astype('str')

In [131]:
fig = px.scatter_geo(df_final,
                    lat = 'lat',
                    lon = 'lon',
                    color = 'bird',
                    animation_frame = 'date_time')
fig.update_geos(fitbounds='locations')
fig.show()

In [135]:
from plotly.subplots import make_subplots
import numpy as np

In [136]:
import plotly.graph_objects as go

In [137]:
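# Years to show, one map per subplot. Illustrative choice: the list this cell originally
# iterated over is not defined in the cells shown, so the exact years are an assumption.
years = [1965, 1975, 1985, 1995, 2005, 2013]
rows = 3
cols = 2
fig = make_subplots(rows=rows, cols=cols, specs = [[{'type': 'choropleth'} for c in np.arange(cols)] for r in np.arange(rows)],
                    subplot_titles = [str(year) for year in years], horizontal_spacing = 0.05, vertical_spacing = 0.05)

In [138]:
for i, year in enumerate(years):
    # Keep only the rows of the year shown in this subplot
    result = GDP_growth_df[GDP_growth_df.Year == year]
    
    fig.add_trace(go.Choropleth(locations=result['Country Code'],
        z = result['Indicator Value'],
        locationmode = 'ISO-3',
        marker_line_color='white',
        colorbar_title = "GDP Growth by Year",
    ), row = i//cols+1, col = i%cols+1)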
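
The `row = i//cols+1, col = i%cols+1` arguments map a running index to a 1-based grid position, filling the grid row by row. A short sketch of that arithmetic (the helper name `grid_position` is hypothetical):

```python
def grid_position(index, cols):
    """Return the 1-based (row, col) of the index-th subplot, filled row by row."""
    return index // cols + 1, index % cols + 1

# Six subplots on a grid with two columns
positions = [grid_position(i, cols=2) for i in range(6)]
print(positions)
```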

In [139]:
fig.update_layout(title_text = 'GDP Growth by Year', margin={'l': 0, 'r': 0, 't': 100, 'b': 0}, height=1000)

for index, trace in enumerate(fig.data):
    fig.data[index].hovertemplate = 'Country: %{location}<br>GDP Growth by Year: %{z:.2f}<extra></extra>'

fig.show()

In [140]:
import plotly.express as px

df = px.data.election()
geojson = px.data.election_geojson()

colorscales = [
    ((0.0, '#636efa'), (1.0, '#636efa')),
    ((0.0, '#EF553B'), (1.0, '#EF553B')),
    ((0.0, '#00cc96'), (1.0, '#00cc96'))
]

fig = go.Figure()
for i, winner in enumerate(df['winner'].unique()):
    dfp = df[df['winner'] == winner]
    fig.add_choroplethmapbox(geojson=geojson, locations=dfp['district'],
                             z=[i,] * len(dfp), featureidkey="properties.district",
                             showlegend=True, name=winner,
                             colorscale=colorscales[i], showscale=False)

fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0},
                  mapbox=dict(style="carto-positron", zoom=9,
                              center={"lat": 45.5517, "lon": -73.7073},))

fig.show()

In [141]:
from plotly.subplots import make_subplots

fig_t1 = px.choropleth(GDP_growth_df[GDP_growth_df['Year']>=1961],
                    locations="Country Code",        # Column where country code with 3 letters can be found
                    color="Indicator Value",         # Indicator Value is the numerical value we want to examine
                    hover_name="Country Code",       # Column to add to hover information
                    animation_frame='Year',          # Show column that will be used in the animation frame
                    title='GDP Yearly Growth (%)',
                    color_continuous_scale=px.colors.diverging.RdYlGn)

fig_t2 = px.choropleth(GDP_growth_df[GDP_growth_df['Year']>=1961],
                    locations="Country Code",        # Column where country code with 3 letters can be found
                    color="Indicator Value",         # Indicator Value is the numerical value we want to examine
                    hover_name="Country Code",       # Column to add to hover information
                    animation_frame='Year',          # Show column that will be used in the animation frame
                    title='GDP Yearly Growth (%)',
                    color_continuous_scale=px.colors.diverging.RdYlGn)

# Only the traces of the first animation frame of each figure are copied; the animation frames are not
fig = make_subplots(rows=1, cols=2, subplot_titles=['fig_t1 (first frame)', 'fig_t2 (first frame)'], specs=[[{'type': 'choropleth'}, {'type': 'choropleth'}]])

fig.add_trace(fig_t1['data'][0], row=1, col=1)
fig.add_trace(fig_t2['data'][0], row=1, col=2)

fig.update_layout(coloraxis_showscale=True) # Show the shared color bar
fig.update_layout(title_text='GDP Yearly Growth (%)', margin={'l': 0, 'r': 0, 't': 100, 'b': 0}, height=500)

fig.show()

In [142]:
import plotly.graph_objects as go
import pandas as pd

df_shootings = pd.read_csv('https://raw.githubusercontent.com/washingtonpost/data-police-shootings/master/fatal-police-shootings-data.csv')

state_count = df_shootings.groupby(['state', 'race']).size().reset_index(name='total')

races = pd.DataFrame({'W': 'White, non-Hispanic',
    'B': 'Black, non-Hispanic',
    'A': 'Asian',
    'N': 'Native American',
    'H': 'Hispanic'}, index=[0])
races
fig = go.Figure()
layout = dict(
    title_text = "Fatal Police Shootings Data",
    geo_scope='usa',
)

for index, race in enumerate(races):
    result = state_count[['state', 'total']][state_count.race == race]
    geo_key = 'geo'+str(index+1) if index != 0 else 'geo'  
    fig.add_trace(
        go.Choropleth(
            locations=result.state,
            z = result.total,
            locationmode = 'USA-states', # set of locations match entries in `locations`
            marker_line_color='white',
            colorbar_title = "Shooting deaths",
            geo=geo_key,
            name=races[race].values[0],
            coloraxis = 'coloraxis',
        )
    )
    
    layout[geo_key] = dict(
        scope = 'usa',
        domain = dict( x = [], y = [] ),
    )

layout
z = 0
COLS = 3
ROWS = 2
for y in reversed(range(ROWS)):
    for x in range(COLS):
        geo_key = 'geo'+str(z+1) if z != 0 else 'geo'
        layout[geo_key]['domain']['x'] = [float(x)/float(COLS), float(x+1)/float(COLS)]
        layout[geo_key]['domain']['y'] = [float(y)/float(ROWS), float(y+1)/float(ROWS)]
        z=z+1
        if z > 4:
            break
            
fig.update_layout(layout)   
fig.show()
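
The nested loops above give each `geo` subplot a rectangular `domain` in figure coordinates (0 to 1 on both axes), filling the grid from the top row down. A sketch of the same computation as a standalone function (the name `grid_domains` is hypothetical):

```python
def grid_domains(n_maps, rows, cols):
    """Return x/y domains for n_maps cells of a rows x cols grid, top row first."""
    domains = []
    for k in range(n_maps):
        row, col = divmod(k, cols)   # row 0 is the top row
        y_top = 1 - row / rows       # figure y runs from bottom (0) to top (1)
        domains.append({
            "x": [col / cols, (col + 1) / cols],
            "y": [y_top - 1 / rows, y_top],
        })
    return domains

# Five maps on a 2 x 3 grid, as in the cell above
domains = grid_domains(5, rows=2, cols=3)
print(domains[0], domains[4])
```

The first map occupies the top-left cell and the fifth map the middle cell of the bottom row; the sixth cell stays empty.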

In [143]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd
import numpy as np

df_shootings = pd.read_csv('https://raw.githubusercontent.com/washingtonpost/data-police-shootings/master/fatal-police-shootings-data.csv')

state_count = df_shootings.groupby(['state', 'race']).size().reset_index(name='total')

races = pd.DataFrame({'W': 'White, non-Hispanic',
    'B': 'Black, non-Hispanic',
    'A': 'Asian',
    'N': 'Native American',
    'H': 'Hispanic'}, index=[0])

In [144]:
rows = 2
cols = 3
fig = make_subplots(
    rows=rows, cols=cols,
    specs = [[{'type': 'choropleth'} for c in np.arange(cols)] for r in np.arange(rows)],
    subplot_titles = list(races.loc[0,:]))

In [145]:
for i, race in enumerate(races):
    result = state_count[['state', 'total']][state_count.race == race]
    fig.add_trace(go.Choropleth(
        locations=result.state,
        z = result.total,
        locationmode = 'USA-states', # set of locations match entries in `locations`
        marker_line_color='white',
        zmin = 0,
        zmax = max(state_count['total']),
        colorbar_title = "Shooting deaths",
    ), row = i//cols+1, col = i%cols+1)

In [146]:
fig.update_layout(
    title_text = 'Shooting Deaths by Race',
    **{'geo' + str(i) + '_scope': 'usa' for i in [''] + np.arange(2,rows*cols+1).tolist()},)

for index, trace in enumerate(fig.data):
    fig.data[index].hovertemplate = 'State: %{location}<br>Shooting deaths: %{z:.2f}<extra></extra>'
fig.show()
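
The `state_count` table behind these maps is a plain group-and-count. A toy sketch with made-up rows shows the shape of the result; note that `groupby` drops rows whose group key is missing by default, so records without a `race` value would not be counted:

```python
import pandas as pd

# Made-up records: one row per incident
toy = pd.DataFrame({
    "state": ["CA", "CA", "CA", "TX", "TX"],
    "race": ["W", "W", "H", "B", "W"],
})

# Count rows per (state, race) pair and turn the result back into a flat table
counts = toy.groupby(["state", "race"]).size().reset_index(name="total")
print(counts)
```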

In [147]:
!pip install raceplotly
from raceplotly.plots import barplot



In [148]:
CO2_emissions_df = world_data_df[world_data_df['Indicator Name'] == 'CO2 emissions (metric tons per capita)']

In [149]:
CO2_emissions_df_short = CO2_emissions_df[CO2_emissions_df['Year'].isin(list(range(1961,2014)))]

In [150]:
my_raceplot = barplot(CO2_emissions_df_short,  item_column='Country Name', value_column='Indicator Value', time_column='Year')
my_raceplot.plot(item_label = 'Top Country', value_label = 'CO2 Emissions per Capita', frame_duration = 600)



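
Conceptually, a bar chart race shows the highest-ranked items at each time step. The sketch below reproduces only that ranking step with pandas on made-up data; it is not how `raceplotly` is implemented, and `top_n` is an illustrative parameter:

```python
import pandas as pd

toy = pd.DataFrame({
    "Country Name": ["A", "B", "C", "A", "B", "C"],
    "Year": [2000] * 3 + [2001] * 3,
    "Indicator Value": [5.0, 9.0, 1.0, 7.0, 6.0, 2.0],
})

top_n = 2

# Sort within each year from largest to smallest, then keep the first top_n rows per year
top_per_year = (
    toy.sort_values(["Year", "Indicator Value"], ascending=[True, False])
       .groupby("Year")
       .head(top_n)
)
print(top_per_year)
```

Country B leads in 2000 and country A in 2001, which is the kind of rank change the animation makes visible.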

