In this project, I'd like to practice creating a 3D spike map. It's something I've seen going around and thought was cool. As a starter, this project uses population data for New York.

Step 1: Importing Libraries and Reading Data

I'd like to understand how the data is presented, so I have a better sense of how to clean and process it. The first stage is therefore reading the statistical and spatial data.

In [None]:
import pandas as pd
import geopandas as gpd
import pydeck as pdk

In [None]:
#Reading the Population Data
df=pd.read_csv(r"C:\Users\Shell\Desktop\Self Upgrade\Python\Portfolio Building\3D Map\DECENNIALDHC2020.P1_2025-11-16T063522\DECENNIALDHC2020.P1-Data.csv")
df.head(10)

In [None]:
df.shape

In [None]:
df.info()

In [None]:
#Reading the shape file
gdf = gpd.read_file(r"C:\Users\Shell\Desktop\Self Upgrade\Python\Portfolio Building\3D Map\tl_2020_36_tract\tl_2020_36_tract.shp")
gdf.head(10)

In [None]:
gdf.shape

In [None]:
gdf.info()

To create the map, the statistical and spatial data must be combined, which requires them to share at least one common attribute that can be used for merging. I spent some time searching for files that could be matched and combined, which eventually led me to the two data files above.

The census tract name or the geographic ID (GEO_ID in the statistical data, GEOID in the spatial data) is present in both datasets and can be used as the common identifier. However, we need to clean the statistical data first, as this attribute does not appear there as a standalone column.
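As a rough sketch of why the identifier has to be extracted, assume a GEO_ID in the statistical file looks like the illustrative value below: a prefix ending in "US", followed by an 11-digit tract code. The sample value and the variable names are my own and are not read from the dataset.

```python
# Illustrative GEO_ID; an assumed example value, not taken from the file
geo_id = "1400000US36061000100"

# The last 11 characters are the tract identifier used for the join
tract_geoid = geo_id[-11:]

# Those 11 digits break down into state (2), county (3) and tract (6) codes
state_fips = tract_geoid[:2]
county_fips = tract_geoid[2:5]
tract_code = tract_geoid[5:]

print(tract_geoid, state_fips, county_fips, tract_code)
```

Keeping the identifier as a string matters here, since converting it to a number would drop any leading zeros.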

Step 2: Data Cleaning

Here is what I'm going to do for the statistical data:
1. Drop the Unnamed: 3 column and row 0
2. Parse NAME (the geographic area name) into CensusTract, County, and State
3. Rename P1_001N to PopulationSize
4. Change the dtype of PopulationSize to int
5. Add a new column called GEOID which contains the last 11 characters of GEO_ID

and for the spatial data:
1. Rename NAMELSAD to CensusTract

In [None]:
#Cleaning statistical data [1],[3],[4]
dfcleaned = (
    df.drop(columns=['Unnamed: 3'], index=[0])
    .rename(columns={'P1_001N': 'PopulationSize'})
    .astype({'PopulationSize': 'int'})
)
dfcleaned.head()

In [None]:
#Parsing [2]. I'm using a lambda because the parsing is multi-step: split on ";", take one element, then strip the surrounding whitespace.
dfcleaned['CensusTract'] = dfcleaned['NAME'].apply(lambda x: x.split(";")[0].strip())
dfcleaned['County'] = dfcleaned['NAME'].apply(lambda x: x.split(";")[1].strip())
dfcleaned['State'] = dfcleaned['NAME'].apply(lambda x: x.split(";")[2].strip())

#Create the GEOID column by slicing the last 11 characters of GEO_ID [5]. No lambda needed, as this is a single string operation.
dfcleaned['GEOID'] = dfcleaned['GEO_ID'].str[-11:]

#Checking what the results look like
dfcleaned.head()
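As an alternative to three separate lambdas, the same parsing can be sketched with pandas' vectorized string methods. This is only an illustration on made-up sample names, and it assumes every NAME value has exactly three ";"-separated parts.

```python
import pandas as pd

# Made-up sample values in the "tract; county; state" layout
names = pd.Series([
    "Census Tract 1; New York County; New York",
    "Census Tract 2.01; New York County; New York",
])

# Split once into three columns, then trim the whitespace left after each ";"
parts = names.str.split(";", expand=True)
parts = parts.apply(lambda col: col.str.strip())
parts.columns = ["CensusTract", "County", "State"]

print(parts)
```

Splitting once with `expand=True` avoids re-splitting the same string for each new column.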

In [None]:
#Cleaning Spatial Data
gdfcleaned=gdf.rename(columns={'NAMELSAD':'CensusTract'})
gdfcleaned.head()

Step 3: Further Data Processing

1. Merging the spatial and statistical data
2. Converting the result to GeoJSON

In [None]:
#Merging statistical and spatial data. The spatial data needs to be the left ("main") table that the statistics are merged into, so every tract keeps its geometry.
CombinedGDF = gdfcleaned.merge(dfcleaned, how="left", on="GEOID")
CombinedGDF.head()

In [None]:
#Cleaning the merged data
CombinedGDFCleaned = (
    CombinedGDF.drop(columns=['STATEFP', 'CensusTract_x', 'GEO_ID', 'MTFCC', 'FUNCSTAT', 'COUNTYFP', 'TRACTCE', 'NAME_x', 'NAME_y'])
    .rename(columns={'CensusTract_y': 'CensusTract'})
)
CombinedGDFCleaned.head()

In [None]:
#Checking if the merge ran well
CombinedGDFCleaned.shape

In [None]:
CombinedGDFCleaned.info()
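Besides looking at the shape, one way to check a left merge is pandas' `indicator=True` option, which adds a `_merge` column marking rows that found no match. The sketch below uses small made-up tables (the names `tracts` and `population` are hypothetical), not the actual files.

```python
import pandas as pd

# Made-up stand-ins for the spatial and statistical tables
tracts = pd.DataFrame({"GEOID": ["36061000100", "36061000201", "36061000202"]})
population = pd.DataFrame({
    "GEOID": ["36061000100", "36061000201"],
    "PopulationSize": [0, 2500],
})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left merge
checked = tracts.merge(population, how="left", on="GEOID", indicator=True)

# Rows that exist in the spatial table but found no population record
unmatched = checked[checked["_merge"] == "left_only"]

print(len(checked), len(unmatched))
```

If `unmatched` is empty and the row count equals that of the spatial table, every tract received exactly one population record.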

In [None]:
#Converting the merged data to GeoJSON (lighter, better)
CombinedGDFCleaned.to_file("cleaneddata.geojson", driver="GeoJSON")

In [None]:
#Reading the GeoJSON back in
jsonfile = gpd.read_file(r"C:\Users\Shell\Desktop\Self Upgrade\Python\Portfolio Building\3D Map\cleaneddata.geojson")

In [None]:
# Normalization: this step rescales the population values to the range used for colour values, which gives better control over the colours of the map.
# Colour values range from 0 to 255, so the data needs to be normalized to that range.
# To do that, first rescale the data to the 0-1 range, which gives the relative position of each value, then multiply by 255.

pmin = jsonfile["PopulationSize"].min()
pmax = jsonfile["PopulationSize"].max()

jsonfile["PopSizeNorm"] = jsonfile["PopulationSize"].apply(
    lambda x: 255 * ((x - pmin) / (pmax - pmin))
)
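To make the arithmetic concrete, here is a small worked example of the same min-max scaling, written in vectorized form on made-up numbers. The helper name `scale_to_255` and the guard for a constant column are my own additions for illustration.

```python
import pandas as pd

def scale_to_255(values: pd.Series) -> pd.Series:
    """Min-max scale a numeric Series to the 0-255 range (illustrative helper)."""
    vmin, vmax = values.min(), values.max()
    if vmax == vmin:
        # All values identical: avoid dividing by zero and return zeros
        return pd.Series(0.0, index=values.index)
    return 255 * (values - vmin) / (vmax - vmin)

# Made-up population sizes: the minimum, the midpoint and the maximum
sample = pd.Series([0, 2500, 5000])
scaled = scale_to_255(sample)

print(scaled.tolist())
```

The smallest value maps to 0, the largest to 255, and a value halfway between them to 127.5.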

In [None]:
jsonfile.describe()

In [None]:
#Checking if there are null values
jsonfile.isnull().sum()

No null values, which means the data is good to go!

Step 4: Creating Maps

In [None]:
# I'm using pydeck to create the 3D map
# The parameters reference: https://deckgl.readthedocs.io/en/latest/gallery/geojson_layer.html

# Designing the view state
INITIAL_VIEW_STATE = pdk.ViewState(
    latitude=40.71427,
    longitude=-74.00597,
    zoom=9,
    max_zoom=16,
    pitch=45,
    bearing=0
)

#Designing the map
designedmap = pdk.Layer(
    "GeoJsonLayer",
    jsonfile,
    opacity=1,
    stroked=False,
    filled=True,
    extruded=True,
    wireframe=False,
    pickable=True,
    get_elevation="PopulationSize",
    # get_fill_color and get_line_color are conditional (if/else) expressions of the form: condition ? colour_if_true : colour_if_false.
    # If PopulationSize is 0, the colour is [0,0,0,0]: the first three values are RGB and the last is opacity, so empty tracts are fully transparent.
    # Otherwise the colour is the [R,G,B] list after the colon, built from the normalized population (PopSizeNorm): red grows and blue shrinks as the population increases.
    get_fill_color="PopulationSize==0?[0,0,0,0]:[PopSizeNorm, 0, 255-PopSizeNorm]",
    get_line_color="PopulationSize==0?[0,0,0,0]:[PopSizeNorm+50, PopSizeNorm+50, PopSizeNorm+50]",
)

# Combining the view state and the map
Map3D = pdk.Deck(layers=[designedmap], initial_view_state=INITIAL_VIEW_STATE)

# Exporting as an html file
Map3D.to_html("3dmap.html")
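To get a feel for the colours the fill rule is meant to produce, here is a plain-Python mirror of it: transparent for empty tracts, otherwise a blend that moves from blue to red as the normalized population grows. The function `fill_color` is a hypothetical helper for illustration only; it is not part of pydeck and not how the expression is evaluated in the browser.

```python
def fill_color(population_size, pop_size_norm):
    """Plain-Python mirror of the intended fill rule (illustration only)."""
    if population_size == 0:
        # Empty tracts: fully transparent RGBA
        return [0, 0, 0, 0]
    # Red channel rises and blue channel falls with the normalized population
    return [pop_size_norm, 0, 255 - pop_size_norm]

print(fill_color(0, 0))         # empty tract
print(fill_color(2500, 127.5))  # mid-range tract
print(fill_color(5000, 255))    # most populous tract
```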


The map shows a significant spike (and a stronger red colour) in New York County, illustrating the higher population in that area.