# Analysis and Visualization of Complex Agro-Environmental Data
---
## Visualization of Geospatial Data 



### 1. The `Geopandas` module

`GeoPandas` brings GIS functionality to Python. It extends the data types used by pandas to allow spatial operations on geometric types. Geometric operations are performed by the `shapely` package; file access relies on `fiona` (recent versions use `pyogrio` by default) and plotting on `matplotlib`.

We will show how to import shapefiles and merge tables, using as an example a visualization of population density in Portuguese municipalities.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import geopandas as gpd
from matplotlib_scalebar.scalebar import ScaleBar
import matplotlib.patches as mpatches


%matplotlib inline

Import shapefiles (you may need to install `geopandas`, `mapclassify` and `matplotlib-scalebar`)

In [None]:
# Load the shapefile containing Portuguese civil parishes ("freguesias") into a GeoDataFrame
# NOTE: Update the path if your file is located somewhere else
port_regions = gpd.read_file("Shapes/DGT/CAOP_2020.shp")

# Display the first 5 rows of the GeoDataFrame to inspect its structure and contents
port_regions.head()

Plot map

In [None]:
# Plot the geometries from the GeoDataFrame using GeoPandas' built-in plotting functionality
# 'figsize' sets the size of the figure in inches (width=5, height=5)
port_regions.plot(figsize=(5, 5))

Change colors

In [None]:
# Plot the polygons in the GeoDataFrame with custom color styling
# 'color' fills each polygon with dark green
# 'edgecolor' draws the borders of the polygons in black
port_regions.plot(figsize=(15, 15), color="darkgreen", edgecolor="black")

Add the most common map elements (title, scale bar and north arrow)

In [None]:

# Create a figure and axis with a custom size (15x15 inches)
fig, ax = plt.subplots(figsize=(15, 15))

# Plot the GeoDataFrame on the created axes
# 'legend=True' is typically used with categorical or numeric columns,
# but doesn't do anything meaningful unless a 'column=' is specified.
port_regions.plot(ax=ax, legend=True)

# Add a title to the map with a custom font size
ax.set_title("Portuguese Parishes Map", fontsize=20)

# Add a scale bar to the lower right corner of the plot
# The first argument (dx) is the length of one map unit: 1 means 1 map unit = 1 meter
# This is only correct if the CRS is projected in meters (e.g. ETRS89 / Portugal TM06, EPSG:3763)
scalebar = ScaleBar(1, location='lower right')
ax.add_artist(scalebar)
# Note: if your shapefile uses a geographic CRS (e.g. EPSG:4326, in degrees), ScaleBar(1) will be inaccurate,
#       because degrees are not meters. Check port_regions.crs and, if needed, reproject the layer
#       before plotting it: port_regions = port_regions.to_crs(epsg=3763)

# Add a simple north arrow
# x, y are relative positions in axes coordinates (0 to 1); arrow_length sets the arrow height
x, y, arrow_length = 0.1, 0.9, 0.1
ax.annotate('N',
            xy=(x, y), xytext=(x, y - arrow_length),  # Arrow from bottom to top
            arrowprops=dict(facecolor='black', width=5, headwidth=15),  # Arrow styling
            ha='center', va='center',  # Center-align the text
            fontsize=16,
            xycoords=ax.transAxes)  # Coordinates relative to the axes frame

# Optional: Remove axis ticks and labels for a cleaner map look
# ax.set_xticks([])
# ax.set_yticks([])
# ax.set_xlabel('')
# ax.set_ylabel('')

# Display the plot
plt.show()
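
When the layer is in a geographic CRS (degrees), one alternative to reprojecting is to give `ScaleBar` the length of one degree of longitude in meters at the map's central latitude. The sketch below computes that length with a spherical approximation (mean Earth radius). The helper name `meters_per_degree_lon` and the central latitude of 39.5°N for mainland Portugal are our own assumptions, and the result is only accurate near that latitude.

```python
import math

# Mean Earth radius in meters (spherical approximation)
EARTH_RADIUS_M = 6_371_008.8


def meters_per_degree_lon(lat_deg):
    """Hypothetical helper: approximate length, in meters, of one degree of
    longitude at latitude lat_deg. Parallels shrink with cos(latitude)."""
    return math.radians(1.0) * EARTH_RADIUS_M * math.cos(math.radians(lat_deg))


# Assumed central latitude of mainland Portugal (roughly 37 to 42 degrees N)
dx = meters_per_degree_lon(39.5)
print(f"1 degree of longitude at 39.5 N is about {dx / 1000:.1f} km")

# Possible use with matplotlib-scalebar on a map drawn in EPSG:4326 (not executed here):
# ax.add_artist(ScaleBar(dx, units="m", location="lower right"))
```

For a country-sized map this is a reasonable approximation; for precise work, reprojecting to a metric CRS remains the safer option.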

Import the shapefile of Portuguese municipalities (polygon vector layer)

In [None]:
# Load shapefile of Portuguese municipalities ("Concelhos") into a GeoDataFrame
# NOTE: Update the file path if your shapefile is in a different location
port_munic = gpd.read_file("Shapes/DGT/Concelhos_dd.shp")

# Display the first 5 rows of the GeoDataFrame to inspect columns and geometry
port_munic.head()

Convert polygon vector layer to point vector layer (using the centroid)

In [None]:
# Create a copy of the GeoDataFrame containing municipalities
# This avoids modifying the original GeoDataFrame
port_munic_cent = port_munic.copy()

# Replace the 'geometry' column with the centroid of each polygon
# The centroid is the area-weighted center of each municipality polygon
# Note: if the layer is in a geographic CRS (degrees), GeoPandas warns that centroids may be inaccurate;
#       to avoid this, reproject to a projected CRS, compute the centroids, then reproject back
port_munic_cent.geometry = port_munic_cent['geometry'].centroid

# The centroid GeoSeries already keeps the CRS of the original layer;
# this line only makes it explicit that both GeoDataFrames share the same CRS
port_munic_cent.crs = port_munic.crs

# Display the first 5 rows of the GeoDataFrame with centroid geometries
port_munic_cent.head()
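
The centroid is the polygon's area-weighted center of mass. A minimal pure-Python sketch of that computation with the shoelace formula, for a single simple polygon in planar coordinates, shows what `centroid` returns. The function `polygon_centroid` is our own illustration, not part of GeoPandas or shapely.

```python
def polygon_centroid(coords):
    """Area-weighted centroid of a simple (non-self-intersecting) polygon.

    coords: list of (x, y) vertices; the ring may be open or closed.
    Uses the shoelace formula, so it assumes planar (projected) coordinates.
    """
    pts = list(coords)
    if pts[0] != pts[-1]:
        pts.append(pts[0])  # close the ring
    twice_area = 0.0
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
        cross = x0 * y1 - x1 * y0  # twice the signed area of triangle (origin, p0, p1)
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum(...) / (6 * A), and 6 * A == 3 * twice_area
    return cx / (3 * twice_area), cy / (3 * twice_area)


# A U-shaped polygon: a 3 x 3 square with a 1 x 2 notch cut from the top middle
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
print(polygon_centroid(u_shape))
```

The U-shape's centroid, about (1.5, 1.36), lies inside the notch and therefore outside the polygon itself. This can also happen with concave municipalities; if every point must fall inside its polygon, `GeoSeries.representative_point()` is an alternative to `centroid`.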

Plot the map

In [None]:
# Plot the polygons from the municipalities GeoDataFrame (size - width=15 inches, height=15 inches)
port_munic.plot(figsize=(15, 15))

Plot map and centroids

In [None]:
# Plot the municipalities polygons and capture the matplotlib Axes object
ax = port_munic.plot(figsize=(15, 15))

# Plot the centroids (points) on the same Axes 'ax'
# color="white" fills the points with white
# alpha=0.7 sets the transparency to 70%
# ax=ax ensures both layers plot on the same figure
port_munic_cent.plot(color="white", alpha=0.7, ax=ax)

Import a CSV table with population density per municipality in Portugal

In [None]:
# Read a CSV file containing population density data by municipality
# 'sep=";"' specifies that the CSV uses semicolons as separators (common in some locales)
# 'encoding="CP1252"' sets the character encoding to Windows Latin-1 (useful for special characters)
# Change the file path to where your CSV is stored
dens_pop = pd.read_csv("Shapes/Dens_pop_municipal.csv", sep=";", encoding="CP1252")

# Display the first 5 rows of the DataFrame to inspect its structure and contents
dens_pop.head()

Join the table with the imported shapefiles (polygons and centroids)

In [None]:
# Merge the municipalities GeoDataFrame with the population density DataFrame

# Join on 'Concelho' column in port_munic and 'Nome' column in dens_pop
# This adds population density attributes to the spatial polygons GeoDataFrame
port_munic_denspop = port_munic.merge(dens_pop, left_on="Concelho", right_on="Nome")

# Similarly, merge the centroids GeoDataFrame with population density data
# This attaches density data to the centroid points GeoDataFrame
port_munic_denspop_cent = port_munic_cent.merge(dens_pop, left_on="Concelho", right_on="Nome")

# Display the first 5 rows of the merged GeoDataFrame (with polygons and density data)
port_munic_denspop.head()

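# Notes: .merge() performs a database-style join between a GeoDataFrame and a DataFrame on the specified columns.
#        left_on is the key column in the left GeoDataFrame (port_munic / port_munic_cent).
#        right_on is the key column in the right DataFrame (dens_pop).
#        The default join is an inner join: municipalities whose names are spelled differently
#        in the two tables (accents, capitalization, extra spaces) are silently dropped.
#        After merging, port_munic_denspop and port_munic_denspop_cent contain both the spatial
#        geometries and the population density attributes.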
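
Because the default join is an inner join, it is worth checking which municipalities fail to match before plotting. A minimal sketch, assuming that differences in accents or capitalization are the cause: a left merge with `indicator=True` lists the unmatched names, and a hypothetical `normalise_name` helper builds an accent-insensitive key. The toy tables and values below are made up and only mimic the `Concelho`, `Nome` and `"2021"` columns used above.

```python
import unicodedata

import pandas as pd

# Toy stand-ins for port_munic and dens_pop (values are illustrative, not real densities)
munic = pd.DataFrame({"Concelho": ["Lisboa", "Porto", "Vila Real de Santo António"]})
dens = pd.DataFrame({"Nome": ["LISBOA", "Porto", "Vila Real de Santo Antonio"],
                     "2021": [100.0, 200.0, 300.0]})

# 1) Left join with an indicator column to find names that have no partner
check = munic.merge(dens, left_on="Concelho", right_on="Nome", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "Concelho"].tolist()
print("Unmatched:", unmatched)


def normalise_name(name):
    """Hypothetical helper: strip accents, trim spaces and ignore case."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.strip().casefold()


# 2) Merge on normalized keys instead of the raw names
munic["key"] = munic["Concelho"].map(normalise_name)
dens["key"] = dens["Nome"].map(normalise_name)
merged = munic.merge(dens, on="key")
print(len(merged), "of", len(munic), "municipalities matched")
```

Normalizing both sides is a pragmatic fix; if both tables provide a shared official municipality code, joining on that code is more robust than joining on names.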

Create a choropleth map

In [None]:
# Plot the GeoDataFrame containing municipalities with population density data

# column="2021" specifies the data column to use for coloring the polygons (population density for 2021)
# legend=True adds a legend to the plot indicating the color scale for the "2021" values
port_munic_denspop.plot(figsize=(10, 10), 
                        column="2021", 
                        legend=True
                        )


Change the choropleth classification scheme

In [None]:
# Plot the GeoDataFrame with municipalities colored by the "2021" column values

# scheme='quantiles' classifies the data into quantile bins for color breaks 
# Note: 'scheme' requires the 'mapclassify' package to be installed for classification
port_munic_denspop.plot(figsize=(10, 10), 
                        column="2021",
                        legend=True,
                        scheme='quantiles'  # classification scheme for choropleth
                       )
#Notes: scheme='quantiles' divides the data into bins so that each bin contains approximately the same number of observations (equal quantiles)
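
To see what a quantile scheme does compared with equal-width classes, the sketch below classifies a small, made-up and strongly skewed set of values into four classes with pandas' `qcut` (quantiles) and `cut` (equal intervals). Pandas is used here as a stand-in for `mapclassify`, which GeoPandas relies on for `scheme`; the class edges `mapclassify` computes may differ slightly.

```python
import pandas as pd

# Made-up, right-skewed values (density data often looks like this)
values = pd.Series([5, 8, 10, 12, 15, 20, 30, 50, 120, 900])
k = 4

# Quantile classes: roughly the same number of observations per class
quantile_classes = pd.qcut(values, q=k, labels=False)
# Equal-interval classes: classes of equal width between min and max
equal_classes = pd.cut(values, bins=k, labels=False)

# Count the observations per class (reindex keeps empty classes as 0)
quantile_counts = quantile_classes.value_counts().reindex(range(k), fill_value=0).tolist()
equal_counts = equal_classes.value_counts().reindex(range(k), fill_value=0).tolist()
print("quantiles:     ", quantile_counts)
print("equal interval:", equal_counts)
```

With equal intervals, the single extreme value stretches the range so that almost every observation ends up in the first class; quantiles spread the observations over all classes, which is why they often give a more readable choropleth for skewed data such as population density.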

Change the color palette

In [None]:
# Plot the GeoDataFrame with municipalities colored by the "2021" population density column

# cmap='OrRd' sets the colormap to "Orange-Red" sequential palette for better visualization
# scheme='quantiles' classifies data into quantile bins for choropleth coloring (requires mapclassify)
port_munic_denspop.plot(figsize=(10, 10), 
                        column="2021", 
                        legend=True, 
                        cmap='OrRd',         # Orange-Red color palette
                        scheme='quantiles'   # Use quantile classification for breaks
                       )
#Notes: the cmap parameter can be used to customize the color scheme (e.g., cmap='plasma').

Create a scatter plot map

In [None]:
# Create a plot of the municipalities polygons and return the Axes object.
ax = port_munic.plot(figsize=(10, 10)) # ax is the matplotlib axes object used to overlay more plots

# On the same axes, plot the municipality centroids colored by the "2021" population density data

# ax=ax overlays this plot on the existing axes to combine both layers
port_munic_denspop_cent.plot(column="2021", 
                             legend=True,
                             scheme='quantiles',
                             ax=ax  #Both plots share the same axes (ax=ax) so they overlay correctly.
                            )

Create a bubble plot map

In [None]:
# Plot the municipality polygons as the base layer
ax = port_munic.plot(figsize=(10, 10))

# Overlay a second layer: municipality centroids, sized by the "2021" population density column
# markersize is in points^2; rescale the column (e.g. divide it by a constant) if the bubbles are too large
# alpha=0.4 makes the points partially transparent
# legend and scheme only apply when the points are colored by a column (column=...);
# here all points share one fixed color, so those options are omitted
# ax=ax ensures both layers are plotted on the same axes
port_munic_denspop_cent.plot(markersize="2021",
                             color="pink",
                             alpha=0.4,
                             ax=ax
                            )

### 2. The `Folium` module

`Folium` makes it easy to visualize data that has been manipulated in Python on an interactive `leaflet` map. It enables both the binding of data to a map for choropleth visualizations as well as passing rich vector/raster/HTML visualizations as markers on the map.

The library has a number of built-in tilesets (OpenStreetMap is the default) and supports custom tilesets, for example from Mapbox with an API key. Folium supports image, video, GeoJSON and TopoJSON overlays.

Folium can also export an interactive map to a standalone HTML file, which makes it a useful tool in web development.

In [None]:
import folium
import math

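We first need to add latitude and longitude columns to the `port_munic_denspop_cent` attribute table. This assumes the layer is in a geographic CRS (decimal degrees); otherwise, reproject it first with `to_crs(epsg=4326)`.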

In [None]:
# Create a new column "Long" by extracting the x-coordinate (longitude) from the centroid geometry

# This is required because Folium expects separate lat/lon values for placing markers or circles
port_munic_denspop_cent["Long"] = port_munic_denspop_cent['geometry'].x
# Create a new column "Lat" by extracting the y-coordinate (latitude) from the centroid geometry
port_munic_denspop_cent["Lat"] = port_munic_denspop_cent['geometry'].y

# Display the first few rows of the updated DataFrame to check the new columns
port_munic_denspop_cent.head()

Create a basemap

In [None]:
# Create a folium map object, in this case named 'm'

# 'location' defines the initial center of the map using latitude and longitude
m = folium.Map(location=[40, -9],       # [40, -9] centers the map around central Portugal (approx. latitude 40°N, longitude 9°W)
               zoom_start=6)            # 'zoom_start=6' sets the initial zoom level; lower values show a larger area

# Display the map in Jupyter Notebook
# writing `m` in the last line of the cell displays the map
m

Save to `html`

In [None]:
# Save the current folium map object 'm' to an HTML file, in this case called 'my_map.html'

# This allows you to open and interact with the map in any web browser outside of the notebook
m.save('my_map.html')

#notes: this allows you to:
#           Open the map in any browser
#           Share the HTML file with others
#           Embed the map in a webpage or documentation
#           View the map even if you’re not running Python.

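# To open the saved HTML file in your default web browser from Python:
import webbrowser
from pathlib import Path
webbrowser.open(Path('my_map.html').resolve().as_uri())  # a file:// URI works on all platforms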

Create an interactive bubble plot map

In [None]:
# Create a new folium map centered on the average coordinates of all municipalities

# This ensures the map auto-centers over your dataset
m = folium.Map(
    location=[port_munic_denspop_cent['Lat'].mean(),  # Average latitude of all centroids
              port_munic_denspop_cent['Long'].mean()], # Average longitude of all centroids
    zoom_start=8  # Initial zoom level; lower it (e.g. to 6 or 7) to see the whole country at once
)

# Define a helper function to scale population density values into marker radius sizes
# Logarithmic scaling avoids overly large circles caused by the skewed density distribution
# (math.log requires values > 0; values between 0 and 1 would give negative radii)
def get_radius(density):
    return math.log(density) * 2  # Adjust the multiplier to control circle size

# Loop through each row in the GeoDataFrame and create a CircleMarker on the map
port_munic_denspop_cent.apply(
    lambda row: folium.CircleMarker(
        location=[row['Lat'], row['Long']],      # Latitude and longitude of the municipality center
        radius=get_radius(row['2021']),           # Size based on the population density in 2021
        popup=row['2021'],                        # Click to see the exact population density
#        popup=folium.Popup(f"<b>Population density:</b> {row['2021']:,}", max_width=200),    # formatted popup, easier to read
        tooltip='<h5>Click here for more info</h5>',  # Hover tooltip
        stroke=True,                              # Outline circle
        weight=1,                                 # Outline thickness
        color="#3186cc",                          # Outline color
        fill=True,                                # Fill the circle
        fill_color="#3186cc",                     # Fill color
        opacity=0.9,                              # Overall marker opacity
        fill_opacity=0.3,                         # Fill transparency (makes it more visually pleasing)
    ).add_to(m),                                  # Add the marker to the map object
    axis=1                                        # Apply the lambda function row-wise
)

# Display the map (only works directly in Jupyter or similar environments)
m
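
The logarithmic radius above compresses the large spread in density values. A small sketch comparing linear and logarithmic scaling for two made-up densities 1000 times apart; it uses `math.log1p` (log(1 + x)), a variant of the `math.log` used above that also stays defined at 0. The helper names and multipliers are illustrative only.

```python
import math


def linear_radius(value, scale=0.01):
    """Radius proportional to the value (illustrative scale factor)."""
    return value * scale


def log_radius(value, multiplier=2):
    """Radius proportional to log(1 + value); defined for any value >= 0."""
    return math.log1p(value) * multiplier


low, high = 10, 10_000  # made-up densities, 1000x apart
print("linear ratio:", linear_radius(high) / linear_radius(low))
print("log ratio:   ", round(log_radius(high) / log_radius(low), 2))
```

With linear scaling the largest circle would be 1000 times wider than the smallest one; with logarithmic scaling the ratio drops below 4, so sparsely populated municipalities stay visible without the dense ones covering the map.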

Create a choropleth map

Folium’s built-in `Choropleth` has no `scheme` argument for choosing a classification method (like quantiles or natural breaks) the way GeoPandas does. It does accept a `bins` argument: an integer gives that many equal-width bins between the minimum and maximum of the data, while a list of numbers is used directly as the bin edges.
Options:
- Pre-classify your data. Compute the class edges beforehand with `mapclassify` or pandas and pass them to `Choropleth` through `bins` (or store each municipality's class in a new column and color by that column).
- Use `branca.colormap` manually: create a colormap with explicit bins and add it to the map yourself. As with `Choropleth`, there is no argument for choosing a method such as 'quantiles' or 'natural breaks'.
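
For the pre-classification option, a minimal sketch of computing quantile-based bin edges with pandas. The helper `quantile_bin_edges` is hypothetical; the commented lines at the end show how its result could be passed to the `bins` argument of `folium.Choropleth`, assuming the `port_munic_denspop` GeoDataFrame, its `"2001"` column and the map `m` from the cell below.

```python
import pandas as pd


def quantile_bin_edges(values, k=5):
    """Hypothetical helper: up to k + 1 quantile-based bin edges spanning min to max.

    Repeated edges (caused by many tied values) are removed, so fewer
    than k classes may result.
    """
    s = pd.Series(values, dtype="float64").dropna()
    edges = s.quantile([i / k for i in range(k + 1)]).tolist()
    return sorted(set(edges))


edges = quantile_bin_edges(range(1, 11), k=5)
print(edges)

# Possible use with folium (not executed here):
# folium.Choropleth(geo_data=port_munic_denspop, data=port_munic_denspop,
#                   columns=["Nome", "2001"], key_on="feature.properties.Nome",
#                   bins=quantile_bin_edges(port_munic_denspop["2001"], k=5),
#                   fill_color="YlGn").add_to(m)
```

Because the first and last quantiles are the minimum and maximum, the edges always cover the full data range.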

In [None]:
# Create a folium Map centered around latitude 40, longitude -9 (Portugal), with zoom level 6
m = folium.Map(location=[40, -9], zoom_start=6)

# Add a choropleth layer to the map
folium.Choropleth(
    geo_data=port_munic_denspop,             # GeoDataFrame containing both geometry and population data
    name="choropleth",                       # Name of the layer (used in layer controls)
    
    data=port_munic_denspop,                 # Same dataset (could be a separate df if needed)
    columns=["Nome", "2001"],                # Column 1: Key for matching shapes, Column 2: values to color (e.g., population density in 2001)
    
    key_on="feature.properties.Nome",        # Match 'Nome' in geojson features with 'Nome' column in the DataFrame
    
    fill_color="YlGn",                       # Color scale: Yellow to Green
    fill_opacity=0.7,                        # Transparency of the colored regions
    legend_name="Population density",        # Legend title displayed on the map
).add_to(m)                                  # Add the layer to the map

# Display the map (in Jupyter or JupyterLab)
m

### 3. Interactive geospatial visualization with `plotly`

`Plotly` also provides interactive geospatial visualization functionality. It is especially useful for generating a variety of geographical plots that are easy to build, debug and customize.

We will use `plotly` to demonstrate how to generate different classes of geographical plots with several available datasets from a variety of contexts.

Let's start with a quick interactive map using `plotly express` (https://plotly.com/python/scatter-plots-on-maps/)

Set the default renderer for plotly figures

In [None]:
# Import the Plotly Input/Output module, which controls rendering options
import plotly.io as pio  

# Set the default renderer to 'vscode', so figures are displayed inline in Visual Studio Code
pio.renderers.default = 'vscode'

# Another option that works in VS Code and in Jupyter Notebook:
# pio.renderers.default = 'notebook'  # figures are displayed inline in the notebook

# If Plotly figures do not render inline (e.g. in some VS Code or Jupyter setups),
# you can set the default renderer to open each figure in your default web browser instead:
# pio.renderers.default = 'browser'

Create a quick interactive map using `plotly express`

In [None]:
# Import the Plotly Express module, which is a high-level API for creating visualizations easily
import plotly.express as px  

# Load the built-in Gapminder dataset using Plotly Express
# Filter the data to only include records from the year 2007
df = px.data.gapminder().query("year == 2007")  
# The dataset includes fields like country, continent, year, lifeExp, pop, gdpPercap, and iso_alpha

# Create a geographic scatter plot (globe-style) where each country's marker size reflects its population
fig = px.scatter_geo(
    df,                      # The filtered DataFrame for the year 2007
    locations="iso_alpha",   # Specifies the column that contains ISO 3166-1 alpha-3 country codes for plotting on the map
    size="pop"               # Use the "pop" column to determine the size of the scatter markers (population of each country)
)

# Render the interactive plot in the current environment using the default renderer
fig.show()  

#### 3.1 Create choropleth maps (world renewable production and consumption)

In [None]:
import pandas as pd
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt

Import the renewable energy production dataset

In [None]:
# Define the URL for the CSV dataset.
renewable_energy_prod_url = "https://raw.githubusercontent.com/TrainingByPackt/Interactive-Data-Visualization-with-Python/master/datasets/share-of-electricity-production-from-renewable-sources.csv"
# This dataset contains information about the share of electricity production from renewable sources by country and year.

# Use pandas to read the CSV file directly from the URL into a DataFrame.
renewable_energy_prod_df = pd.read_csv(renewable_energy_prod_url)
# This line downloads and parses the CSV contents into tabular format.

# Helps you verify what columns are available before attempting to filter or plot
print(renewable_energy_prod_df.columns)

# Display the first 5 rows of the DataFrame to quickly inspect the structure and confirm successful loading.
renewable_energy_prod_df.head()

Sort the production DataFrame based on the feature 'Year'

In [None]:
# This line sorts the entire DataFrame by the 'Year' column in ascending order (default behavior).
renewable_energy_prod_df.sort_values(by=['Year'], inplace=True)
# `inplace=True` means the sorting will modify the DataFrame directly instead of returning a sorted copy.
# This is useful when you want the data in chronological order for plotting time series or animations.

# Displays the first 5 rows of the sorted DataFrame to verify the sorting operation.
renewable_energy_prod_df.head()

Generate a choropleth map with `plotly express`, animated by 'Year' (2008 to 2016)

In [None]:
# Filter the DataFrame for years strictly between 2007 and 2017 (i.e., 2008 to 2016)
renewable_energy_prod = renewable_energy_prod_df.query('Year < 2017 and Year > 2007')

# Create a choropleth map using Plotly Express
fig = px.choropleth(
    renewable_energy_prod,  # Use the filtered DataFrame (2008 to 2016) for the choropleth
    locations="Code",          # Column with country ISO codes for geographic locations
    color="Renewable electricity (% electricity production)",  # Column to determine color intensity
    hover_name="Country",      # Column for names shown when hovering over a country
    animation_frame="Year",    # Create an animation with one frame per year
    color_continuous_scale='Greens'  # Color scale used for the intensity (shades of green)
)

# Display the figure
fig.show()
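
The `query` string above keeps the years 2008 to 2016. For integer years, `Series.between`, which includes both end points by default, expresses the same filter; a small sketch on made-up data shows that the two give identical rows.

```python
import pandas as pd

# Made-up table with one row per year
years = pd.DataFrame({"Year": list(range(2005, 2020)), "value": range(15)})

via_query = years.query("Year < 2017 and Year > 2007")
via_between = years[years["Year"].between(2008, 2016)]

print(via_query["Year"].tolist())
print(via_query.equals(via_between))
```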

Update layout to include suitable title text and projection style

In [None]:
# Update the layout of the existing Plotly figure 'fig'
fig.update_layout(
    
    # Set the title of the plot
    title_text='Renewable energy production across the world (% of electricity production)',

    # Configure geographic settings for the map
    geo=dict(
        projection={'type': 'natural earth'}  # Use 'natural earth' projection instead of default 'equirectangular'
    )
)

# Display the updated figure
fig.show()

Now let's import the renewable energy consumption dataset

In [None]:
# URL pointing to the CSV file containing renewable energy consumption data by country
renewable_energy_cons_url = "https://raw.githubusercontent.com/TrainingByPackt/Interactive-Data-Visualization-with-Python/master/datasets/renewable-energy-consumption-by-country.csv"

# Read the CSV data from the URL into a pandas DataFrame
renewable_energy_cons_df = pd.read_csv(renewable_energy_cons_url)

# Display the first few rows of the DataFrame to verify successful loading
renewable_energy_cons_df.head()

Reshape the DataFrame from wide to long format

In [None]:
# Reshape the DataFrame from wide to long format using pd.melt
renewable_energy_cons_df = pd.melt(
    renewable_energy_cons_df,                  # Input DataFrame to reshape
    id_vars=['Country', 'Code', 'Year'],      # Columns to keep as identifiers (not unpivoted)
    var_name="Energy Source",                  # Name for the new column that holds former column headers
    value_name="Consumption (terrawatt-hours)"  # Name for the new column that holds values from melted columns
)

# Show the first few rows of the reshaped DataFrame to verify the transformation
renewable_energy_cons_df.head()
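
To make the reshaping concrete, here is `pd.melt` on a tiny made-up wide table. The energy-source column names (`Hydro`, `Wind`) and the values are placeholders, not necessarily the columns of the real dataset.

```python
import pandas as pd

# Wide format: one column per energy source (placeholder names and values)
wide = pd.DataFrame({
    "Country": ["A", "B"],
    "Code": ["AAA", "BBB"],
    "Year": [2010, 2010],
    "Hydro": [1.0, 2.0],
    "Wind": [0.5, 0.1],
})

# Long format: one row per (Country, Code, Year, Energy Source)
long_df = pd.melt(
    wide,
    id_vars=["Country", "Code", "Year"],
    var_name="Energy Source",
    value_name="Consumption",
)
print(long_df)
```

Each of the 2 rows of the wide table becomes 2 rows (one per melted column), so the long table has 2 × 2 = 4 rows. The long layout is what lets us filter on a single `Energy Source` value, such as `'Total'` in the cells below, and map one variable at a time.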

Sort the consumption DataFrame based on the Year

In [None]:
# Sort the DataFrame based on the 'Year' column in ascending order
renewable_energy_cons_df.sort_values(by=['Year'], inplace=True)

# Display the first few rows of the sorted DataFrame to verify the sorting
renewable_energy_cons_df.head()

Generate a choropleth map for renewable energy consumption animated based on 'Year'

In [None]:
# Filter the DataFrame to only include rows where 'Energy Source' is 'Total' 
renewable_energy_total_cons = renewable_energy_cons_df[
    renewable_energy_cons_df['Energy Source'] == 'Total'
].query('Year < 2017 and Year > 2007') # consider 'Year' just between 2007 and 2017 (exclusive)

# Create a choropleth map:
fig = px.choropleth(
    renewable_energy_total_cons,
    locations="Code",                # - locations are identified by 'Code' (country codes)
    color="Consumption (terrawatt-hours)", # - color of each country represents energy consumption in terawatt-hours
    hover_name="Country",            # - hover shows the country name
    animation_frame="Year",          # - animation frames represent different years
    color_continuous_scale='Blues'   # - color scale set to a blue palette
)

# Display the interactive choropleth plot
fig.show()

Update layout to include suitable title text and projection style

In [None]:
# Update the layout of the Plotly figure
fig.update_layout(
    title_text='Renewable energy consumption across the world (terawatt-hours)',  # add a title text for the plot
    geo=dict(projection={'type': 'natural earth'})  # set projection style for the plot; default is 'equirectangular'
)

# Display the updated interactive plot
fig.show()

#### 3.2 Add animation to a choropleth map

The next example uses data on worldwide internet usage.

In [None]:
# Define the URL to the CSV file containing internet usage data
internet_usage_url = "https://raw.githubusercontent.com/TrainingByPackt/Interactive-Data-Visualization-with-Python/master/datasets/share-of-individuals-using-the-internet.csv"

# Read the CSV data from the URL into a pandas DataFrame
internet_usage_df = pd.read_csv(internet_usage_url)

# Display the first few rows of the DataFrame to check the data loaded correctly
internet_usage_df.head()

Plot a choropleth map using the 2016 subset of the data

In [None]:
# Subset the DataFrame to only include data for the year 2016
internet_usage_2016 = internet_usage_df.query("Year == 2016")

# Create a choropleth map using the 2016 internet usage data
fig = px.choropleth(
    internet_usage_2016,
    locations="Code",  # Use country codes for geographic locations
    color="Individuals using the Internet (% of population)",  # Color-code by percentage of individuals using the internet
    hover_name="Country",  # Show country name on hover
    color_continuous_scale=px.colors.sequential.Plasma  # Use the 'Plasma' color scale
)

# Update the layout to add a title to the plot
fig.update_layout(
    title_text='Internet usage across the world (% population) - 2016'  # Title describing the map content
)

# Display the interactive choropleth map
fig.show()

Create a choropleth map focused only on Europe

In [None]:
# Create a choropleth map focused on 2016 internet usage data
fig = px.choropleth(
    internet_usage_2016,
    locations="Code",  # Use country codes to identify locations on the map
    color="Individuals using the Internet (% of population)",  # Color countries by internet usage percentage
    hover_name="Country",  # Show country name when hovering over areas
    color_continuous_scale=px.colors.sequential.Plasma  # Use the 'Plasma' color scale for the values
)

# Update the layout of the figure
fig.update_layout(
    title_text='Internet usage across the European Continent (% population) - 2016',  # Add a descriptive title
    geo_scope='europe'  # Limit the geographic scope to Europe (alternatives: north america, south america, africa, asia, usa)
)

# Display the interactive choropleth map
fig.show()

Update layout to include suitable projection style

In [None]:
# Create a choropleth map for internet usage in 2016:
fig = px.choropleth(
    internet_usage_2016,  # DataFrame filtered to the year 2016
    locations="Code",  # Use the country codes for geographic locations
    color="Individuals using the Internet (% of population)",  # Color countries by internet usage percentage
    hover_name="Country",  # Show country name when hovering over each country
    color_continuous_scale=px.colors.sequential.Plasma  # Use the Plasma sequential color scale for coloring
)

# Update layout settings for the figure:
fig.update_layout(
    title_text='Internet usage across Europe (% population) - 2016',  # Add a title to the plot

    # Set the map projection style and limit the scope to Europe only
    geo=dict(
        scope='europe',               # Focus map on Europe
        projection={'type': 'natural earth'}  # Use the natural earth projection
    )
)

# Display the interactive plot
fig.show()

Animate the map across the years

In [None]:
# Create an animated choropleth map to visualize internet usage over time
fig = px.choropleth(
    internet_usage_df,  # full dataset with multiple years of internet usage
    locations="Code",   # - uses ISO 3-letter country codes to map data to countries
    color="Individuals using the Internet (% of population)",  # - the variable used for color intensity
    hover_name="Country",  # - displays the country name when hovering
    animation_frame="Year",  # - creates an animated map with frames for each year
    color_continuous_scale=px.colors.sequential.Plasma  # - uses the Plasma color scale for visual effect
)

# Update the map layout to include projection style and a title
fig.update_layout(
    title_text='Internet usage across the world (% population)',  # - set the chart's title
    geo=dict(projection={'type':'natural earth'})  # - apply the "natural earth" projection to the map
)

# Display the interactive choropleth map
fig.show()

# Note: the years in the animation slider are not in ascending order

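The years are not in chronological order: Plotly Express builds the animation frames in the order in which each year first appears in the DataFrame.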
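
A minimal sketch of the effect on a made-up long-format table: `unique()` returns the years in order of first appearance, the same order the animation frames follow, and sorting the rows fixes it. The column names and values are illustrative only. Passing `category_orders={"Year": sorted(df["Year"].unique())}` to `px.choropleth` should be an alternative that leaves the row order untouched.

```python
import pandas as pd

# Made-up rows in non-chronological order
df = pd.DataFrame({
    "Code": ["PRT", "ESP", "PRT", "ESP", "PRT", "ESP"],
    "Year": [2016, 2016, 1990, 1990, 2005, 2005],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

frames_before = df["Year"].unique().tolist()  # order of first appearance
frames_after = df.sort_values("Year", kind="stable")["Year"].unique().tolist()

print("before sorting:", frames_before)  # [2016, 1990, 2005]
print("after sorting: ", frames_after)   # [1990, 2005, 2016]
```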

Sort the data by year, then recreate the animated map

In [None]:
# Sort the dataset by 'Year' to ensure proper chronological order for animation
internet_usage_df.sort_values(by=["Year"], inplace=True)

# Display the first few rows of the sorted DataFrame (optional, for inspection)
internet_usage_df.head()

# Create an animated choropleth map using the sorted internet usage dataset
fig = px.choropleth(
    internet_usage_df,  # - full dataset with internet usage data over multiple years
    locations="Code",  # - ISO 3-letter country codes to identify countries on the map
    color="Individuals using the Internet (% of population)",  # - used to determine color intensity
    hover_name="Country",  # - display country name on hover
    animation_frame="Year",  # - animate the map based on the 'Year' column
    color_continuous_scale=px.colors.sequential.Plasma  # - apply the Plasma color scale
)

# Customize the map layout
fig.update_layout(
    title_text='Internet usage across the world (% population)',  # - set the title of the plot
    geo=dict(
        # scope='europe',  # uncomment to restrict the map to Europe
        projection={'type': 'natural earth'}  # - set map projection to "natural earth"
    )
)

# Display the animated interactive choropleth map
fig.show()

#### 3.3 Create bubble plots in a map

In [None]:
# Define the URL to the dataset containing the number of internet users by country over time
internet_users_url = "https://raw.githubusercontent.com/TrainingByPackt/Interactive-Data-Visualization-with-Python/master/datasets/number-of-internet-users-by-country.csv"

# Read the CSV data directly from the URL into a pandas DataFrame
internet_users_df = pd.read_csv(internet_users_url)

# Display the first 5 rows of the DataFrame to inspect its structure and contents
internet_users_df.head()

Create a scatter geo-plot for the year 2016

In [None]:
# Sort the DataFrame by the 'Year' column to ensure chronological order
internet_users_df.sort_values(by=['Year'], inplace=True)

# Display the first 5 rows of the sorted DataFrame to verify the sort
internet_users_df.head()

# Create a scatter geo-plot for the year 2016 using a subset of the data
fig = px.scatter_geo(
    internet_users_df.query("Year==2016"),  # Filter data for the year 2016
    locations="Code",                       # ISO country codes used to place countries on the map
    size="Number of internet users (users)",  # Column determining the size of each country's marker
    hover_name="Country",                   # Column to display as hover information
    size_max=80,                            # Maximum size for the largest bubble
    color_continuous_scale=px.colors.sequential.Plasma  # Only takes effect if a numeric `color` column is also mapped
)

# Update the layout of the plot
fig.update_layout(
    # Add a descriptive title to the plot
    title_text='Internet users across the world - 2016',
    # Set the map projection style to 'natural earth' for a more realistic look
    geo=dict(projection={'type': 'natural earth'})  # Default is 'equirectangular', this is more aesthetic
)

# Display the interactive scatter geo-plot
fig.show()
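`size_max` sets the marker size of the largest bubble. Plotly Express scales bubbles by *area* rather than diameter, so a country with a quarter of the maximum number of users gets a bubble half as wide. The sketch below illustrates that relationship. It assumes the default area-based scaling (`sizemode='area'`), and it gives only approximate, relative marker sizes, not exact on-screen pixels. The user counts are hypothetical.

```python
import numpy as np


def relative_bubble_sizes(values, size_max=80):
    """Approximate marker sizes under area-based bubble scaling.

    Assumes bubble *area* is proportional to the value, so the marker size
    (diameter) grows with the square root of the value, and the largest value
    maps to `size_max`.
    """
    values = np.asarray(values, dtype=float)
    return size_max * np.sqrt(values / values.max())


# Hypothetical user counts (not taken from the dataset)
users = [700e6, 175e6, 43.75e6]
print(relative_bubble_sizes(users))  # → [80. 40. 20.]
```

Because size grows with the square root, the largest countries dominate the map and small ones almost vanish. Transforming the size column (for example with a logarithm) is one option if smaller countries matter.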

Update to show an animation of the bubble plot across the years

In [None]:
# Create an animated scatter geo-plot using Plotly Express
fig = px.scatter_geo(
    internet_users_df,                         # DataFrame containing internet user data
    locations="Code",                          # ISO 3-letter country codes used to position bubbles on the map
    size="Number of internet users (users)",   # Column used to size each bubble by number of internet users
    hover_name="Country",                      # Column to display when hovering over a bubble
    size_max=80,                               # Maximum size for the largest bubble
    animation_frame="Year"                     # Column used to create animation frames over time
)

# Update the layout of the plot
fig.update_layout(
    title_text='Internet users across the world', # Add a descriptive title to the plot
    # Set the map projection style to 'natural earth' for a more realistic geographic visualization
    geo=dict(projection={'type': 'natural earth'})  # 'natural earth' is a compromise world projection with rounded edges
)

# Display the interactive animated scatter geo-plot
fig.show()
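Each distinct value of `Year` becomes one animation frame, so the number of frames, and the time needed to render the figure, grows with the span of years. Plotly Express appears to order frames as the values first appear in the data (unless `category_orders` is given), which is presumably why the DataFrame was sorted by `Year` earlier. The sketch below restricts the data to a year window and lists the distinct years that would become frames. The window bounds and the demo data are hypothetical.

```python
import pandas as pd


def restrict_years(df: pd.DataFrame, start: int, end: int, year_col: str = "Year") -> pd.DataFrame:
    """Keep rows with start <= year <= end, sorted chronologically."""
    window = df[df[year_col].between(start, end)]  # between() is inclusive on both ends
    return window.sort_values(year_col)


# Synthetic example (not the real dataset)
demo = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "A"],
    "Year": [2016, 1995, 2005, 2016, 2005],
    "Number of internet users (users)": [50, 1, 20, 30, 10],
})
subset = restrict_years(demo, 2000, 2016)
frames = subset["Year"].unique().tolist()  # years in order of first appearance
print(frames)  # → [2005, 2016]
```

Passing `restrict_years(internet_users_df, 2000, 2016)` to `px.scatter_geo` instead of the full DataFrame would produce a shorter animation.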

#### 3.4 Create a line flow map

In the next example we will show how to plot lines on a map with `plotly`, using flight connections in the USA.

Import airport locations

In [None]:
# Define the URL where the dataset (CSV) containing U.S. airports is hosted
us_airports_url = "https://raw.githubusercontent.com/TrainingByPackt/Interactive-Data-Visualization-with-Python/master/datasets/airports.csv"

# Load the CSV data from the provided URL into a pandas DataFrame
us_airports_df = pd.read_csv(us_airports_url)

# Display the first five rows of the DataFrame to inspect the structure and sample data
us_airports_df.head()
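Before plotting, it is worth checking that every airport has coordinates. Rows with a missing `LATITUDE` or `LONGITUDE` cannot be placed on the map, and they would also produce incomplete flight lines after the merges further down. The sketch below splits the table into located and unlocated airports. It uses a small synthetic table with the same column names, and `XXX` is a made-up code.

```python
import numpy as np
import pandas as pd


def split_by_coordinates(airports: pd.DataFrame):
    """Split an airport table into rows with and without usable coordinates."""
    has_coords = airports[["LATITUDE", "LONGITUDE"]].notna().all(axis=1)
    return airports[has_coords], airports[~has_coords]


# Synthetic rows (coordinates rounded, for illustration only)
demo = pd.DataFrame({
    "IATA_CODE": ["JFK", "LAX", "XXX"],
    "LATITUDE": [40.64, 33.94, np.nan],
    "LONGITUDE": [-73.78, -118.41, np.nan],
})
located, missing = split_by_coordinates(demo)
print(missing["IATA_CODE"].tolist())  # → ['XXX']
```

Running `split_by_coordinates(us_airports_df)` shows whether any airports in the real file need to be dropped or fixed.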

Scatter plot on a map

In [None]:
# Import the low-level graphing interface from Plotly, which provides more control than plotly.express
import plotly.graph_objects as go

# Create a new figure for plotting
fig = go.Figure()

# Add a scatter geo trace to the figure for visualizing airport locations
fig.add_trace(go.Scattergeo(
    locationmode='USA-states',        # - only used to match `locations`; here markers are placed by lat/lon
    lon=us_airports_df['LONGITUDE'],  # - set the longitude values for each airport
    lat=us_airports_df['LATITUDE'],   # - set the latitude values for each airport
    hoverinfo='text',                 # - enable custom text to be shown when hovering
    text=us_airports_df['AIRPORT'],   # - airport names shown on hover
    mode='markers',                   # - render each location as a marker point
    marker=dict(size=5, color='black')  # - define marker size and color
))

# Update the layout of the figure
fig.update_layout(
    title_text='Airports in USA',      # - title of the plot
    showlegend=False,                  # - legend is not required as all markers are the same
    geo=go.layout.Geo(
        scope='usa'                    # - restrict map view to the USA region
    ),
)

# Display the interactive plot
fig.show()
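The hover label above shows only the airport name. A richer label can be built as a column and passed as `text=`. The sketch below assumes the airport table also has `CITY` and `STATE` columns, so check `us_airports_df.columns` first. The demo row is invented.

```python
import pandas as pd


def airport_hover_text(airports: pd.DataFrame) -> pd.Series:
    """Combine name, city and state into one hover label per airport.

    Assumes string columns AIRPORT, CITY and STATE exist in the table.
    """
    return airports["AIRPORT"] + " (" + airports["CITY"] + ", " + airports["STATE"] + ")"


demo = pd.DataFrame({
    "AIRPORT": ["Example Regional Airport"],
    "CITY": ["Springfield"],
    "STATE": ["IL"],
})
print(airport_hover_text(demo).tolist())  # → ['Example Regional Airport (Springfield, IL)']
```

In the cell above, `text=airport_hover_text(us_airports_df)` would replace `text=us_airports_df['AIRPORT']`.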

Import the flight records

In [None]:
# Define the URL of the dataset containing delayed flights on New Year's Day in 2015
new_year_2015_flights_url = "https://raw.githubusercontent.com/TrainingByPackt/Interactive-Data-Visualization-with-Python/master/datasets/new_year_day_2015_delayed_flights.csv"

# Read the CSV file from the URL into a pandas DataFrame
new_year_2015_flights_df = pd.read_csv(new_year_2015_flights_url)

# Display the first 5 rows of the DataFrame to preview the data
new_year_2015_flights_df.head()
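Many flights share the same origin and destination, and the line flow map further down draws one line per flight, so repeated routes end up drawn on top of each other. Counting flights per route is a quick way to see how much duplication there is. The sketch below does this with `groupby` on synthetic data that uses the same column names.

```python
import pandas as pd


def count_routes(flights: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (origin, destination) pair with its number of flights."""
    return (
        flights.groupby(["ORIGIN_AIRPORT", "DESTINATION_AIRPORT"])
        .size()                              # number of flights per route
        .reset_index(name="N_FLIGHTS")
        .sort_values("N_FLIGHTS", ascending=False, ignore_index=True)
    )


demo = pd.DataFrame({
    "ORIGIN_AIRPORT": ["ATL", "ATL", "ORD", "ATL"],
    "DESTINATION_AIRPORT": ["LAX", "LAX", "JFK", "ORD"],
})
routes = count_routes(demo)
print(routes)
```

If the number of distinct routes is much smaller than the number of flights, drawing one line per route instead of one per flight gives the same map with far fewer traces.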

Flight origin dataset

Along with the origin and destination airports of each flight, we need the longitude and latitude of those airports. To get them, we merge the DataFrames containing the airport and flight data. Let's first merge on the origin airport codes to obtain the longitude and latitude of the origin airport of every flight.

In [None]:
# Merge the flights DataFrame with the airports DataFrame to get latitude and longitude for origin airports
new_year_2015_flights_df = new_year_2015_flights_df.merge(
    us_airports_df[['IATA_CODE', 'LATITUDE', 'LONGITUDE']],  # select only relevant columns from airport DataFrame
    left_on='ORIGIN_AIRPORT',  # match flights' origin airport code...
    right_on='IATA_CODE',      # ...with IATA code in airport DataFrame
    how='inner'                # use inner join to keep only matched records
)

# Drop the now-duplicate 'IATA_CODE' column since 'ORIGIN_AIRPORT' already contains the code
new_year_2015_flights_df.drop(columns=['IATA_CODE'], inplace=True)

# Rename latitude and longitude columns to indicate they belong to the origin airport
new_year_2015_flights_df.rename(
    columns={
        "LATITUDE": "ORIGIN_AIRPORT_LATITUDE",
        "LONGITUDE": "ORIGIN_AIRPORT_LONGITUDE"
    },
    inplace=True
)

# Display the first 5 rows of the updated DataFrame to inspect the changes
new_year_2015_flights_df.head()
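Because `how='inner'` silently drops flights whose airport code has no match in the airport table, it can be useful to check which codes go missing. The sketch below does a left join with `indicator=True` (a real `DataFrame.merge` option) and lists the unmatched codes. The data is synthetic, and the numeric code `10397` is just an example of a code that would fail to match.

```python
import pandas as pd


def unmatched_codes(flights, airports, code_col="ORIGIN_AIRPORT"):
    """List airport codes in `flights[code_col]` that have no row in `airports`."""
    merged = flights[[code_col]].merge(
        airports[["IATA_CODE"]],
        left_on=code_col,
        right_on="IATA_CODE",
        how="left",
        indicator=True,   # adds a '_merge' column: 'both' or 'left_only'
    )
    missing = merged.loc[merged["_merge"] == "left_only", code_col]
    return sorted(missing.unique())


flights_demo = pd.DataFrame({"ORIGIN_AIRPORT": ["ATL", "ORD", "10397", "ATL"]})
airports_demo = pd.DataFrame({"IATA_CODE": ["ATL", "ORD", "LAX"]})
print(unmatched_codes(flights_demo, airports_demo))  # → ['10397']
```

Calling it with `code_col='DESTINATION_AIRPORT'` checks the destination side in the same way.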

Destination dataset

In [None]:
# Now, we will perform a similar merging to get the latitude and longitude data 
# for the destination airports of all flights.

# Merge the flights DataFrame with the airports DataFrame to get lat/lon for destination airports
new_year_2015_flights_df = new_year_2015_flights_df.merge(
    us_airports_df[['IATA_CODE', 'LATITUDE', 'LONGITUDE']],  # select only necessary columns from airport data
    left_on='DESTINATION_AIRPORT',  # match flights' destination airport code...
    right_on='IATA_CODE',           # ...with IATA code in airport DataFrame
    how='inner'                     # use inner join to retain only matching records
)

# Drop the now-redundant 'IATA_CODE' column (already represented by 'DESTINATION_AIRPORT')
new_year_2015_flights_df.drop(columns=['IATA_CODE'], inplace=True)

# Rename the latitude and longitude columns to indicate they belong to the destination airport
new_year_2015_flights_df.rename(
    columns={
        'LATITUDE': 'DESTINATION_AIRPORT_LATITUDE', 
        'LONGITUDE': 'DESTINATION_AIRPORT_LONGITUDE'
    },
    inplace=True
)

# Display the first 5 rows of the updated DataFrame to confirm successful merge and renaming
new_year_2015_flights_df.head()
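The origin and destination cells repeat the same merge, drop and rename steps. They could be factored into one helper. `add_airport_coords` below is a hypothetical name, not part of pandas. It uses the same inner-join behaviour as the cells above and produces the same column names when the prefix is the code column's name. The demo data is synthetic.

```python
import pandas as pd


def add_airport_coords(flights, airports, code_col, prefix):
    """Attach <prefix>_LATITUDE and <prefix>_LONGITUDE columns via an inner join on IATA code."""
    coords = airports[["IATA_CODE", "LATITUDE", "LONGITUDE"]].rename(
        columns={"LATITUDE": f"{prefix}_LATITUDE", "LONGITUDE": f"{prefix}_LONGITUDE"}
    )
    merged = flights.merge(coords, left_on=code_col, right_on="IATA_CODE", how="inner")
    return merged.drop(columns=["IATA_CODE"])  # code is already in `code_col`


airports_demo = pd.DataFrame({
    "IATA_CODE": ["ATL", "LAX"],
    "LATITUDE": [33.64, 33.94],
    "LONGITUDE": [-84.43, -118.41],
})
flights_demo = pd.DataFrame({"ORIGIN_AIRPORT": ["ATL"], "DESTINATION_AIRPORT": ["LAX"]})

out = add_airport_coords(flights_demo, airports_demo, "ORIGIN_AIRPORT", "ORIGIN_AIRPORT")
out = add_airport_coords(out, airports_demo, "DESTINATION_AIRPORT", "DESTINATION_AIRPORT")
print(out.columns.tolist())
```

Applied to the real data, this would replace both merge cells with two calls.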

Create line flow map

In [None]:
# Now, we draw one line per flight between its origin and destination airports.
# NOTE: this adds traces to the `fig` created in the "Scatter plot on a map" cell,
# so the routes are drawn on top of the airport markers; run that cell first.
# We use Scattergeo with mode='lines' (instead of markers) to draw these flight paths.
# Adding one trace per flight is slow and may take a few minutes for many flights.

for row in new_year_2015_flights_df.itertuples(index=False):
    fig.add_trace(
        go.Scattergeo(
            # longitudes of the line start (origin) and end (destination)
            lon=[row.ORIGIN_AIRPORT_LONGITUDE, row.DESTINATION_AIRPORT_LONGITUDE],
            # latitudes of the line start (origin) and end (destination)
            lat=[row.ORIGIN_AIRPORT_LATITUDE, row.DESTINATION_AIRPORT_LATITUDE],
            mode='lines',                     # draw a line connecting origin and destination
            line=dict(width=1, color='red')   # set line width and color (red)
        )
    )

# Update layout to set the title, disable legend, and set map scope to USA
fig.update_layout(
    title_text='Flight routes',  # title of the plot
    showlegend=False,            # hide legend since it is not needed here
    geo=go.layout.Geo(
        scope='usa'              # limit map display to the USA region
    ),
)

# Display the final plot with all flight route lines added
fig.show()
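Adding one trace per flight is what makes the loop above slow. An alternative, assuming all routes share one line style, is to put every route into a single trace and separate consecutive segments with `None`. Plotly leaves a gap at `None` values because `connectgaps` defaults to `False`. The helper below only builds the coordinate lists. `route_segments` is a hypothetical name, and the demo coordinates are synthetic.

```python
import pandas as pd


def route_segments(flights: pd.DataFrame):
    """Build lon/lat lists for one line trace: origin, destination, None per route."""
    lons, lats = [], []
    for row in flights.itertuples(index=False):
        # the trailing None breaks the line so routes are not chained together
        lons += [row.ORIGIN_AIRPORT_LONGITUDE, row.DESTINATION_AIRPORT_LONGITUDE, None]
        lats += [row.ORIGIN_AIRPORT_LATITUDE, row.DESTINATION_AIRPORT_LATITUDE, None]
    return lons, lats


demo = pd.DataFrame({
    "ORIGIN_AIRPORT_LONGITUDE": [-84.43, -87.90],
    "ORIGIN_AIRPORT_LATITUDE": [33.64, 41.98],
    "DESTINATION_AIRPORT_LONGITUDE": [-118.41, -73.78],
    "DESTINATION_AIRPORT_LATITUDE": [33.94, 40.64],
})
lons, lats = route_segments(demo)
print(len(lons), len(lats))
```

The two lists can then go into a single `go.Scattergeo(lon=lons, lat=lats, mode='lines', line=dict(width=1, color='red'))` trace. The trade-off is that per-route hover text is harder to attach to one combined trace.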

## References

Folium. https://python-visualization.github.io/folium/

Geospatial Data in Python - Interactive Visualization. https://www.codementor.io/@abdelfettahbesbes/geospatial-data-in-python-interactive-visualization-1oti7dtr2v

Interactive Data Visualization with Python. https://github.com/TrainingByPackt/Interactive-Data-Visualization-with-Python

Introduction to GeoPandas. https://geopandas.org/en/stable/getting_started/introduction.html