## Data Visualization
### Visualizing Geographic and Spatial Data
- Xianli Zeng
- School of Economy, XMU

### Today's Tasks

- <font color="red"> ***Projection***</font>

- Map Drawing

- Choropleth Maps

- Cartogram
    
- Interactive and Animation

### Projection
#### Location on Earth:
- <font color="red">***Longitude***:</font> The position along the equator, ranging from 180° West to 180° East, denoted as [-180°, 180°].

- <font color="red">***Latitude***:</font> The angular distance north or south of the equator, ranging from 90° South to 90° North, denoted as [-90°, 90°].
- <font color="red">***Altitude***:</font> The distance from the center of the Earth; rarely used in visualizations.
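
The coordinate ranges can be checked in code. The sketch below wraps any longitude into [-180°, 180°) and rejects latitudes outside [-90°, 90°]. The helper names `normalize_longitude` and `validate_latitude` are illustrative and do not come from any library.

```python
def normalize_longitude(lon_deg: float) -> float:
    """Wrap any longitude in degrees into the interval [-180, 180)."""
    return (lon_deg + 180.0) % 360.0 - 180.0


def validate_latitude(lat_deg: float) -> float:
    """Return the latitude unchanged, or raise if it lies outside [-90, 90]."""
    if not -90.0 <= lat_deg <= 90.0:
        raise ValueError(f"latitude {lat_deg} is outside [-90, 90]")
    return lat_deg


# 190 degrees East is the same meridian as 170 degrees West
print(normalize_longitude(190.0))
print(validate_latitude(24.4))
```

Note that 180° East and 180° West are the same meridian, so this sketch reports both as -180.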

#### Projection refers to flattening the spherical surface of the Earth onto a 2D plane for visualization purposes.

This process inevitably introduces distortion and errors.
- <font color="red"> *Conformal Projection*:</font> Preserves angles.
- <font color="red"> *Equal-area Projection*:</font>  Preserves areas.
- <font color="red"> *Equidistant Projection*:</font>    Preserves distances from a reference point or line.

### Types of projections:
- Mercator
- Albers Equal-Area
- Azimuthal Equidistant

#### <font color="red"> Mercator Projection:</font>  A conformal projection that projects the Earth onto a cylinder, which is then unwrapped into a flat map.
- Accurately preserves shapes, especially near the equator
- However, it distorts areas near the poles, making them appear much larger
- Most commonly used in online maps such as Google Maps and Baidu Maps

<div>
<img src="./Mercator.png" width="250"/>
</div>
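
To make the polar distortion concrete, here is a minimal sketch of the forward Mercator formulas on a sphere, together with the factor by which areas are inflated at a given latitude. It assumes a spherical Earth with a mean radius of 6371 km; online maps use a closely related variant, so treat this as an illustration rather than their exact implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def mercator_xy(lon_deg: float, lat_deg: float, radius: float = EARTH_RADIUS_KM):
    """Forward spherical Mercator projection; returns (x, y) in the units of radius."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def mercator_area_scale(lat_deg: float) -> float:
    """Factor by which Mercator inflates areas at a latitude (secant squared)."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


for lat in (0, 30, 60, 80):
    print(lat, round(mercator_area_scale(lat), 2))
```

At 60° latitude the area factor is 4, so a region there is drawn four times larger than an equal-sized region on the equator. The factor grows without bound toward the poles.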

#### <font color="red"> Albers Projection:</font>  A standard conic projection with equal-area property, typically used in its regular (non-oblique) form.
- Widely used for maps emphasizing the area of countries or regions
- Particularly suitable for mid- to low-latitude regions with large east-west extent

<div>
<img src="./Albers.png" width="250"/>
</div>
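
For readers who want to see the formulas, the following sketch implements the forward Albers equal-area conic projection on a unit sphere. The standard parallels (25°N and 47°N) and the origin (105°E, 35°N) used in the example are illustrative assumptions, not values taken from these slides.

```python
import math


def albers_xy(lon_deg, lat_deg, lon0_deg, lat0_deg, lat1_deg, lat2_deg):
    """Forward Albers equal-area conic projection on a unit sphere.

    (lon0, lat0) is the projection origin; lat1 and lat2 are the standard parallels.
    """
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    lam0, phi0 = math.radians(lon0_deg), math.radians(lat0_deg)
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)

    n = (math.sin(phi1) + math.sin(phi2)) / 2          # cone constant
    c = math.cos(phi1) ** 2 + 2 * n * math.sin(phi1)
    rho = math.sqrt(c - 2 * n * math.sin(phi)) / n     # radius of the parallel
    rho0 = math.sqrt(c - 2 * n * math.sin(phi0)) / n   # radius at the origin
    theta = n * (lam - lam0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


# Assumed parameters for illustration only
print(albers_xy(116.4, 39.9, 105, 35, 25, 47))
```

Parallels become concentric arcs and meridians become straight lines that converge toward the pole, which is why the projection suits regions that extend mainly east to west.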

#### <font color="red"> Azimuthal Equidistant Projection: </font> An equidistant projection centered on a chosen point
- Distances and directions from the center point to every other point are preserved; in the polar aspect, these lines of true distance are the meridians (longitude lines)
- Commonly used in earthquake impact maps, where the epicenter is set as the center of the projection to accurately show the range of affected areas

<div>
<img src="./Azimuthal.png" width="250"/>
</div>
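
The distance-preserving property can be verified numerically. The sketch below implements the forward azimuthal equidistant projection on a sphere (radius 6371 km assumed) and compares the map distance from the center with the great-circle distance. The center and target coordinates are made-up example values, not data from an actual earthquake.

```python
import math


def azimuthal_equidistant_xy(lon_deg, lat_deg, lon0_deg, lat0_deg, radius=6371.0):
    """Forward azimuthal equidistant projection centered on (lon0, lat0)."""
    phi, phi0 = math.radians(lat_deg), math.radians(lat0_deg)
    dlam = math.radians(lon_deg - lon0_deg)

    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(dlam))
    c = math.acos(max(-1.0, min(1.0, cos_c)))  # angular distance from the center
    if c < 1e-12:
        return 0.0, 0.0                        # the center maps to the origin
    k = c / math.sin(c)                        # undefined at the antipode of the center
    x = radius * k * math.cos(phi) * math.sin(dlam)
    y = radius * k * (math.cos(phi0) * math.sin(phi)
                      - math.sin(phi0) * math.cos(phi) * math.cos(dlam))
    return x, y


# Hypothetical center and target point (longitude, latitude)
x, y = azimuthal_equidistant_xy(116.4, 39.9, 103.4, 31.0)
print(round(math.hypot(x, y), 1), "km from the center on the map")
```

Because the distance from the map origin equals the great-circle distance, circles drawn around the center correspond to true distance rings on the ground.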

### Today's Tasks

- Projection

- <font color="red"> ***Map Drawing***</font>

- Choropleth Maps

- Cartogram
    
- Interactive and Animation

Tools for Map Visualization:

- GeoPandas + Matplotlib (Static)
- Plotly (Animation)

### GeoPandas

- GeoPandas is an open-source Python library designed to make working with geospatial data easy.

- It extends the capabilities of pandas to handle geometry operations.

- Built on top of:

    - pandas for data manipulation

    - shapely for geometry operations

    - fiona (or pyogrio in recent GeoPandas versions) for file input/output

    - pyproj for coordinate transformations



### Why Use GeoPandas?
- Simplifies working with spatial data (points, lines, polygons)

- Directly reads common GIS formats (e.g., Shapefile, GeoJSON)

- Integrates easily with matplotlib for plotting

- Supports spatial joins, projections, buffering, intersection, and more

- Bridges the gap between data science and GIS

### GeoPandas Data Structure
- GeoDataFrame:

    - An extension of pandas DataFrame with a special "geometry" column.

    - Each row represents a spatial feature (e.g., a point, line, or polygon).

    - Other columns store feature attributes.

- Geometry types:

    - Point

    - LineString

    - Polygon

    - MultiPoint, MultiLineString, MultiPolygon
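
As a small illustration of this structure, the sketch below builds a GeoDataFrame by hand from three shapely geometries. The feature names and coordinates are made up for the example; `EPSG:4326` declares that the coordinates are longitude/latitude.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point, Polygon

# Toy features with made-up (longitude, latitude) coordinates
gdf = gpd.GeoDataFrame(
    {
        "name": ["site", "route", "zone"],  # an ordinary attribute column
        "geometry": [                       # the special geometry column
            Point(118.1, 24.5),
            LineString([(118.0, 24.4), (118.2, 24.6)]),
            Polygon([(118.0, 24.4), (118.2, 24.4), (118.2, 24.6)]),
        ],
    },
    crs="EPSG:4326",
)

print(gdf.geom_type.tolist())  # geometry type of each row
print(gdf.geometry.name)       # name of the active geometry column
```

Each row is one feature, so pandas operations such as filtering or merging apply to the attribute columns while the geometry travels along with them.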

### Basic Workflow in GeoPandas
- Load geospatial data

- Inspect and manipulate data

- Plot the data

- Perform spatial analysis

- Export to GIS file formats
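
The workflow can be sketched end to end on a tiny in-memory dataset. The two points and their labels are invented for the example. Reprojection to Web Mercator (`EPSG:3857`) stands in for the spatial analysis step, and the export step serializes to GeoJSON text instead of writing a file.

```python
import geopandas as gpd
from shapely.geometry import Point

# 1. "Load": build a tiny dataset in memory (made-up points, lon/lat)
cities = gpd.GeoDataFrame(
    {"city": ["A", "B"], "geometry": [Point(0.0, 0.0), Point(10.0, 20.0)]},
    crs="EPSG:4326",
)

# 2. Inspect: coordinate reference system and bounding box
print(cities.crs)
print(cities.total_bounds)

# 3. Spatial operation: reproject to Web Mercator (coordinates in metres)
cities_3857 = cities.to_crs(epsg=3857)

# 4. Plot: cities_3857.plot() would draw the points with matplotlib

# 5. Export: serialize to GeoJSON text (to_file() would write it to disk)
geojson_text = cities_3857.to_json()
print(geojson_text[:60])
```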


How Geometry data works:


In [None]:

from shapely.geometry import Polygon

poly = Polygon([(0, 0), (1, 1), (1, 0)])  # create a triangle
print(poly)
print(type(poly))

In [None]:
from shapely.geometry import Polygon
import matplotlib.pyplot as plt

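# Create the polygon
poly = Polygon([(0, 0), (1, 1), (1, 0)])

# Create the figure and axes
fig, ax = plt.subplots()

# Extract the polygon's x and y coordinates
x, y = poly.exterior.xy  # note: exterior is the outer boundary

# Draw the polygon
ax.plot(x, y)            # outline
ax.fill(x, y, alpha=0.5) # fill color with 50% opacity

# Set the title
ax.set_title('Triangle Polygon')

# Keep the x and y axes at the same scale
ax.set_aspect('equal')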

plt.show()

### Data Source:
- World map: https://www.naturalearthdata.com/downloads/110m-cultural-vectors/
- China: https://datav.aliyun.com/portal/school/atlas/area_selector

### World map

In [None]:
import geopandas as gpd

# Load the world map data (comparable to map_data("world") in R)
world = gpd.read_file("./ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp")

world.plot(edgecolor="white", figsize=(12, 6))

### China Map

In [None]:
import geopandas as gpd

China_map = gpd.read_file("China.json")

China_map.plot(edgecolor="black", facecolor="lightblue")

### Today's Tasks

- Projection

- Map Drawing

- <font color="red"> ***Choropleth Maps***</font>

- Cartogram
    
- Interactive and Animation

### Choropleth Maps

- Color each spatial region according to the data value it represents.

- Also known as shaded maps.

***Notes***: Continuous vs. Discrete Color Scales:

- Continuous color scales allow for finer gradations, but may be harder to interpret.

- Discrete color scales group data into categories and assign different colors to match specific value ranges.
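
The difference between the two scale types can be sketched without any plotting library. The continuous scale below interpolates linearly between two colors, while the discrete scale looks up the class that contains the value. The class boundaries and colors are illustrative choices for values in [0, 1].

```python
from bisect import bisect_right

LOW, HIGH = "#fafafa", "#191970"   # endpoints of the continuous gradient
BREAKS = [0.2, 0.4, 0.6, 0.8]      # interior class boundaries for values in [0, 1]
CLASS_COLORS = ["#EEEEFB", "#BDBDEF", "#8C8CE3", "#4A4AD3", "#191970"]


def hex_to_rgb(color):
    """Convert '#rrggbb' into a tuple of three integers."""
    return tuple(int(color[i:i + 2], 16) for i in (1, 3, 5))


def continuous_color(value):
    """Linearly interpolate between LOW and HIGH for a value in [0, 1]."""
    t = min(max(value, 0.0), 1.0)
    lo, hi = hex_to_rgb(LOW), hex_to_rgb(HIGH)
    rgb = (round(a + t * (b - a)) for a, b in zip(lo, hi))
    return "#" + "".join(f"{channel:02x}" for channel in rgb)


def discrete_color(value):
    """Return the color of the class containing the value.

    A value equal to a boundary falls into the upper class.
    """
    return CLASS_COLORS[bisect_right(BREAKS, value)]


for v in (0.05, 0.25, 0.31, 0.95):
    print(v, continuous_color(v), discrete_color(v))
```

With the discrete scale, 0.25 and 0.31 receive the same color because both fall in the 20-40% class. The continuous scale gives them slightly different shades that a reader may struggle to tell apart.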



***Notes***: Beware of Asymmetry Between Data Distribution and Geographic Area Size:

- Data often clusters in densely populated areas.

- Sparsely populated regions may occupy disproportionately large areas on the map display.

In [None]:
import requests
import geopandas as gpd
import matplotlib.pyplot as plt

# Step 1: download the county-level GeoJSON
url = "https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json"
response = requests.get(url)
response.raise_for_status()  # stop early if the download failed
geojson_data = response.json()

# Step 2: keep only the Texas counties (FIPS codes starting with '48')
tx_features = [f for f in geojson_data["features"] if f["id"].startswith("48")]
tx_geojson = {"type": "FeatureCollection", "features": tx_features}

# Step 3: copy each feature's id into its properties
for feature in tx_geojson["features"]:
    feature["properties"]["fips"] = feature["id"]

# Step 4: convert to a GeoDataFrame and save it
tx_gdf = gpd.GeoDataFrame.from_features(tx_geojson)
tx_gdf.to_file("texas_counties.geojson", driver="GeoJSON")



In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt

# Load the Texas counties map
tx = gpd.read_file("texas_counties.geojson")
# print(tx)
# Display the map
tx.plot(edgecolor='white', figsize=(10, 8))
plt.title("Texas Counties Map")
plt.show()

In [None]:
import pandas as pd

tx_loaded = pd.read_csv("texus.csv").dropna()
merged_data = tx.merge(tx_loaded, left_on="NAME", right_on="Name", how="left").drop('Name',axis=1)
merged_data

In [None]:
import matplotlib.pyplot as plt
import matplotlib as mpl

fig, ax = plt.subplots(1, 1, figsize=(10, 8))

# Draw the map without the built-in legend
merged_data.plot(
    column="Hispanic",
    cmap="OrRd",
    linewidth=0.3,
    edgecolor="black",
    ax=ax
)

# Add a colorbar that spans the data range
sm = plt.cm.ScalarMappable(
    cmap="OrRd",
    norm=plt.Normalize(vmin=merged_data["Hispanic"].min(), vmax=merged_data["Hispanic"].max())
)

cbar = fig.colorbar(sm, ax=ax, shrink=0.5)
cbar.set_label("Hispanic Population Percentage")

# Other settings
ax.set_title("Texas Counties: Hispanic Population Percentage", fontsize=14)
ax.axis("off")
plt.tight_layout()
plt.show()

In [None]:
import matplotlib.pyplot as plt
import matplotlib as mpl

fig, ax = plt.subplots(1, 1, figsize=(10, 8))

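# Custom light-to-dark-blue gradient, shared by the map and the colorbar
custom_blue = mpl.colors.LinearSegmentedColormap.from_list("custom_blue", ["#fafafa", "#191970"])

# Draw the map with white county borders
merged_data.plot(
    column="Hispanic",
    cmap=custom_blue,
    edgecolor="white",
    linewidth=0.4,
    ax=ax
)

# Add a colorbar with explicit breaks and labels.
# Note: these limits assume "Hispanic" is stored as a fraction in [0, 1]. The map above
# uses the data's own min/max, so pass the same vmin/vmax to plot() if the two must match.
sm = plt.cm.ScalarMappable(
    cmap=custom_blue,
    norm=plt.Normalize(vmin=0, vmax=0.75)
)
sm.set_array([])
cbar = fig.colorbar(sm, ax=ax, ticks=[0, 0.25, 0.5, 0.75], shrink=0.6)
cbar.ax.set_yticklabels(["0%", "25%", "50%", "75%"])
cbar.set_label("")  # no colorbar title (equivalent to name = NULL in R)

# Figure details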
ax.set_title("Texas Counties: Hispanic Population Percentage", fontsize=14)
ax.axis("off")
plt.tight_layout()
plt.show()

### Group data values and represent them with different colors: typically 4–6 groups.



In [None]:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Assumes merged_data already contains the Hispanic percentage column

# 1. Bin the percentages
bins = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
labels = ["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"]
merged_data["Hispanic"] = pd.to_numeric(merged_data["Hispanic"], errors="coerce")

merged_data["perc_bin"] = pd.cut(merged_data["Hispanic"] / 100, bins=bins, labels=labels, include_lowest=True)

# 2. Define the matching colors (same as in the R version)
color_list = ["#EEEEFB", "#BDBDEF", "#8C8CE3", "#4A4AD3", "#191970"]
cmap = ListedColormap(color_list)

# 3. Draw the map
fig, ax = plt.subplots(1, 1, figsize=(10, 8))
merged_data.plot(
    column="perc_bin",
    cmap=cmap,
    edgecolor="white",
    linewidth=0.5,
    legend=True,
    ax=ax,
    categorical=True
)

# 4. Legend position and style
leg = ax.get_legend()
leg.set_bbox_to_anchor((0.17, 0.8))  # same position as c(0.17, 0.8) in the R version
leg.set_title("")  # hide the legend title

# 5. Title, and remove the axes (mimics theme_void)
ax.set_title("Percentage of Hispanic people by county in Texas", fontsize=14)
ax.annotate("2019 population estimate", xy=(0.5, 0.92), xycoords='figure fraction', ha='center', fontsize=10)
ax.axis("off")

plt.tight_layout()
plt.show()

### China GDP data

In [None]:
import pandas as pd
China_gdp =  pd.read_csv("gdp_china.csv",encoding='utf-8',sep = '\t')

data_China = China_map.merge(China_gdp,how = 'outer',left_on = 'name', right_on = 'Name').drop('Name',axis=1)
data_China

In [None]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 10))
data_China.plot(
    column='2023',          # year whose GDP determines the color
    cmap='OrRd',             # color scheme (Orange-Red)
    legend=True,             # add a colorbar legend
    edgecolor='black',       # color of the province borders
    linewidth=0.8,           # width of the province borders
    ax=ax                    # axes to draw on
)

# Set the title
ax.set_title('China Provinces GDP in 2023', fontsize=20)

# Remove the x/y axes
# ax.axis('off')

In [None]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 10))
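data_China.plot(
    column='2023',          # year whose GDP determines the color
    cmap='OrRd',             # color scheme (Orange-Red)
    legend=True,             # add a colorbar legend
    edgecolor='black',       # color of the province borders
    linewidth=0.8,           # width of the province borders
    ax=ax                    # axes to draw on
)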
for idx, row in data_China.iterrows():
    if row['geometry'].geom_type in ['Polygon', 'MultiPolygon']:
        x, y = row['geometry'].centroid.coords[0]
        gdp_val = row['2023']
        if pd.notnull(gdp_val):
            ax.text(
                x, y,
                f'{gdp_val:.0f}',  # show as a whole number
                fontsize=6,
                ha='center',
                va='center',
                color='blue'
            )

# Set the title
ax.set_title('China Provinces GDP in 2023', fontsize=20)

plt.show()

### Today's Tasks

- Projection

- Map Drawing

- Choropleth Maps

- <font color="red"> ***Cartogram***</font>
    
- Interactive and Animation

### Cartograms

- Geographical shapes are distorted so that their size is proportional to a selected variable.

- Sometimes difficult to interpret.
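
One simple way to see the idea is to compute, for each region, the linear factor by which its shape must be scaled so that its area becomes proportional to the variable. The sketch below does this while keeping the total map area unchanged. The function name, region labels, and numbers are hypothetical, and this is only the scaling step of a non-contiguous cartogram, not a full cartogram algorithm.

```python
import math


def cartogram_scale_factors(areas, values):
    """Linear scale factor per region so that scaled area is proportional to value.

    The total area of all regions is kept unchanged.
    """
    total_area = sum(areas.values())
    total_value = sum(values.values())
    factors = {}
    for region, area in areas.items():
        target_area = total_area * values[region] / total_value
        # Area scales with the square of the linear factor
        factors[region] = math.sqrt(target_area / area)
    return factors


# Made-up example: two regions of equal area, one with three times the value
areas = {"A": 1.0, "B": 1.0}
values = {"A": 1.0, "B": 3.0}
print(cartogram_scale_factors(areas, values))
```

With shapely geometries, a factor `f` of this kind could be applied with `shapely.affinity.scale(geom, xfact=f, yfact=f, origin="centroid")`, which shrinks or enlarges each region around its own centroid.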



### Cartogram Heatmaps

- Represent each country/state/province/county as a colored block.

- All geographic units are treated equally, avoiding visual bias caused by differences in shape or size.
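
The layout logic behind such a map is small enough to sketch in plain Python. Every unit gets one equally sized cell at a fixed (row, column) position, and only the shading changes with the data. The tile positions, unit labels, and values below are hypothetical and do not represent a real layout. Text characters stand in for colors.

```python
# Hypothetical tile positions (row, column); not a real geographic layout
TILE_POSITIONS = {"NW": (0, 0), "NE": (0, 1), "SW": (1, 0), "SE": (1, 1)}
VALUES = {"NW": 10.0, "NE": 40.0, "SW": 25.0, "SE": 55.0}  # made-up data
SHADES = ".:+*#"  # light to dark


def shade(value, vmin, vmax):
    """Map a value to a shade character on an equal-interval scale."""
    span = (vmax - vmin) or 1.0  # avoid division by zero when all values are equal
    t = (value - vmin) / span
    return SHADES[min(int(t * len(SHADES)), len(SHADES) - 1)]


def render_tiles(positions, values):
    """Return one text line per grid row; every unit occupies a same-sized cell."""
    vmin, vmax = min(values.values()), max(values.values())
    n_rows = 1 + max(row for row, _ in positions.values())
    n_cols = 1 + max(col for _, col in positions.values())
    grid = [["    "] * n_cols for _ in range(n_rows)]
    for unit, (row, col) in positions.items():
        grid[row][col] = f"{unit}{shade(values[unit], vmin, vmax)} "
    return ["".join(row) for row in grid]


for line in render_tiles(TILE_POSITIONS, VALUES):
    print(line)
```

A plotting version would draw one colored square per cell instead of a character, which is in essence what the `statebins` package does for US states.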

In [None]:
import os
os.environ["R_HOME"] = "C:/Program Files/R/R-4.4.3"

from rpy2.robjects import r, pandas2ri
from rpy2.robjects.packages import importr
from rpy2.robjects.conversion import localconverter

# Enable automatic conversion between pandas DataFrames and R data.frames
pandas2ri.activate()

df = pd.read_csv("us_rent_income.csv")

# Map state names to their postal abbreviations (requires a lookup table)
state_abbrev = {
    'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR',
    'California': 'CA', 'Colorado': 'CO', 'Connecticut': 'CT', 'Delaware': 'DE',
    'District of Columbia': 'DC', 'Florida': 'FL', 'Georgia': 'GA', 'Hawaii': 'HI',
    'Idaho': 'ID', 'Illinois': 'IL', 'Indiana': 'IN', 'Iowa': 'IA',
    'Kansas': 'KS', 'Kentucky': 'KY', 'Louisiana': 'LA', 'Maine': 'ME',
    'Maryland': 'MD', 'Massachusetts': 'MA', 'Michigan': 'MI', 'Minnesota': 'MN',
    'Mississippi': 'MS', 'Missouri': 'MO', 'Montana': 'MT', 'Nebraska': 'NE',
    'Nevada': 'NV', 'New Hampshire': 'NH', 'New Jersey': 'NJ', 'New Mexico': 'NM',
    'New York': 'NY', 'North Carolina': 'NC', 'North Dakota': 'ND', 'Ohio': 'OH',
    'Oklahoma': 'OK', 'Oregon': 'OR', 'Pennsylvania': 'PA', 'Rhode Island': 'RI',
    'South Carolina': 'SC', 'South Dakota': 'SD', 'Tennessee': 'TN', 'Texas': 'TX',
    'Utah': 'UT', 'Vermont': 'VT', 'Virginia': 'VA', 'Washington': 'WA',
    'West Virginia': 'WV', 'Wisconsin': 'WI', 'Wyoming': 'WY',
    'Puerto Rico': 'PR'
}
rent_df = df[df['variable'] == 'rent'].copy()
rent_df['state_abbrev'] = rent_df['NAME'].map(state_abbrev)

income_df = df[df['variable'] == 'income'].copy()
income_df['state_abbrev'] = income_df['NAME'].map(state_abbrev)
# Inspect the pandas DataFrame that will be passed to R
income_df

In [None]:
pandas2ri.activate()
ggplot2 = importr("ggplot2")
statebins = importr("statebins")

with localconverter(pandas2ri.converter):
    r_rent_df = pandas2ri.py2rpy(rent_df)
# print(r_rent_df)
# r_rent_df
# Define the R plotting function
r( """
library(ggplot2)
library(statebins)

draw_statbins <- function(df, filename) {

p <- ggplot(df, aes(state = state_abbrev, fill = estimate)) +
  geom_statebins() +
  scale_fill_viridis_c() +
  labs(title = "Average Rent per State", fill = "Rent Estimate") +
  theme_statebins()
  ggsave(filename, p, width = 8, height = 5, dpi = 500)
}
""")

# Call the R function
r["draw_statbins"](r_rent_df, "rent.png")


In [None]:
# Display the saved image in Python
from PIL import Image
import matplotlib.pyplot as plt

img = Image.open("rent.png")
plt.imshow(img)
plt.axis('off')
plt.show()

In [None]:
pandas2ri.activate()
ggplot2 = importr("ggplot2")
statebins = importr("statebins")

with localconverter(pandas2ri.converter):
    r_income_df = pandas2ri.py2rpy(income_df)
print(income_df)

r( """
library(ggplot2)
library(statebins)

draw_statbins <- function(df, filename) {

p <- ggplot(df, aes(state = state_abbrev, fill = estimate)) +
  geom_statebins() +
  scale_fill_viridis_c() +
  labs(title = "Average Income per State", fill = "Income Estimate") +
  theme_statebins()
  ggsave(filename, p, width = 8, height = 5, dpi = 300)
}
""")

# Call the R function, then display the saved image
r["draw_statbins"](r_income_df, "income.png")
from PIL import Image
import matplotlib.pyplot as plt

img = Image.open("income.png")
plt.imshow(img)
plt.axis('off')
plt.show()

### Today's Tasks

- Projection

- Map Drawing

- Choropleth Maps

- Cartogram
    
- <font color="red"> ***Interactive and Animation***</font>

In [None]:
import plotly.express as px

# Example data: country names and GDP per capita
df = px.data.gapminder().query("year == 2007")

fig = px.choropleth(df, 
                    locations="iso_alpha", 
                    color="gdpPercap",
                    hover_name="country",
                    color_continuous_scale=px.colors.sequential.Plasma,
                    title="World GDP per Capita (2007)")
fig.show()

In [None]:
import plotly.express as px

df = px.data.gapminder()

fig = px.choropleth(
    df,
    locations="iso_alpha",
    color="gdpPercap",
    hover_name="country",
    animation_frame="year",       # 🎬 animation slider over the years
    color_continuous_scale="Viridis",
    title="🌍 GDP per Capita Over Time (1952–2007)"
)

fig.update_layout(geo=dict(bgcolor='rgba(0,0,0,0)'))
fig.show()

In [None]:

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import matplotlib.cm as cm
import matplotlib.colors as colors

# Year columns (every column except 'Name'), reversed relative to the column order
years = [col for col in China_gdp.columns if col != 'Name'][::-1]
fig, ax = plt.subplots(figsize=(12, 10))

# Use one color scale for all years; otherwise the colors jump between frames
vmin = data_China[years].min().min()
vmax = data_China[years].max().max()

# Fixed ScalarMappable that ties the colormap to the shared value range
# (plt.get_cmap is used because cm.get_cmap was removed in recent Matplotlib releases)
cmap = plt.get_cmap('OrRd')
norm = colors.Normalize(vmin=vmin, vmax=vmax)
sm = cm.ScalarMappable(cmap=cmap, norm=norm)

# Add the colorbar only once
cbar = fig.colorbar(sm, ax=ax, shrink=0.5)

# Draw the initial frame on the shared color scale
data_China.plot(
    column=years[0],
    cmap='OrRd',
    edgecolor='black',
    linewidth=0.5,
    ax=ax,
    legend=False,
    vmin=vmin,
    vmax=vmax
)

def update(i):
    # Clear the axes and redraw the map for the i-th year
    ax.clear()
    year = years[i]
    data_China.plot(
        column=year,
        cmap='OrRd',
        edgecolor='black',
        linewidth=0.5,
        ax=ax,
        legend=False,
        vmin=vmin,
        vmax=vmax
    )
    ax.set_title(f'China Provinces GDP in {year}', fontsize=18)
    ax.axis('off')

anim = FuncAnimation(fig, update, frames=len(years), interval=1000, repeat=True)

# Save the animation as a GIF (requires Pillow)
anim.save('china_gdp_animation.gif', writer='pillow')
