### Geo Analysis Demo
This is a demo of geospatial analysis using geopandas. We visualise the number of tweets made in each Greater Capital City (GCC) and rest-of-state (rural) area on a map, using shapefiles downloaded from: https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3/jul2021-jun2026/access-and-downloads/digital-boundary-files

We visualised:
* The number of tweets made in each GCC and rural area, using the analysis results from A1Q1
* The happiness score of each place (this demo is in '/notebook/sample_sentiment_analysis.ipynb')

What we can do further: <br>
As the map shows, the granularity of the Greater Capital City Statistical Areas (GCCSA) is coarse and uneven: each area is either very large (rural areas) or very small (capital city areas), so little information can be discovered. Hence, we might consider using the Suburbs and Localities (SAL) shapefile for further analysis.

In [1]:
import geopandas as gpd
import folium
import pandas as pd

In [2]:
# Sample geo analysis code
'''
boundary = gpd.read_file("../../data/raw/Geo/SA2_2021_AUST_SHP_GDA2020/SA2_2021_AUST_GDA2020.shp")
boundary = boundary[boundary.STE_NAME21 == 'Victoria']
geo = boundary[['SA2_CODE21','SA2_NAME21', 'geometry']]
geo.SA2_CODE21 = geo.SA2_CODE21.astype(int)
geoboundary = boundary[['SA2_NAME21', 'geometry']].dropna()'''


In [5]:
gdf = gpd.read_file("../data/raw/GCCSA_2021_AUST_SHP_GDA2020/GCCSA_2021_AUST_GDA2020.shp")
geoboundary = gdf[['STE_CODE21','GCC_CODE21', 'geometry']].dropna()
geoJSON = geoboundary.to_json()
result = pd.read_csv('../data/curated/q1.csv')  # a demo using the Assignment 1 Q1 results
result

Unnamed: 0,gcc,count
0,2GMEL,2286891
1,1GSYD,2218396
2,3GBRI,859994
3,3RQLD,606938
4,5GPER,589322
5,1RNSW,519371
6,4GADE,465908
7,2RVIC,412393
8,8ACTE,202646
9,6GHOB,90816
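
The split between capital city and rural areas can be read straight off the GCCSA codes in the table above. The following is a minimal sketch, assuming the second character of each code marks the area type (`G` for a Greater Capital City, `R` for rest of state, anything else such as the ACT treated as "other"). The helper `region_type` and the dictionary `tweet_counts` are illustrative names, not part of the original analysis; the counts are copied from the rows shown above.

```python
from collections import defaultdict

# Tweet counts copied from the rows of q1.csv displayed above.
tweet_counts = {
    "2GMEL": 2286891,
    "1GSYD": 2218396,
    "3GBRI": 859994,
    "3RQLD": 606938,
    "5GPER": 589322,
    "1RNSW": 519371,
    "4GADE": 465908,
    "2RVIC": 412393,
    "8ACTE": 202646,
    "6GHOB": 90816,
}


def region_type(code):
    """Classify a GCCSA code by its second character (assumed convention)."""
    return {"G": "capital", "R": "rest_of_state"}.get(code[1], "other")


# Sum the tweet counts within each area type.
totals = defaultdict(int)
for code, count in tweet_counts.items():
    totals[region_type(code)] += count
totals = dict(totals)

print(totals)
```

Grouping this way gives a quick sanity check on the map: the capital city areas should account for most of the tweets even though they cover a small share of the land area.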


In [6]:
gdf2 = pd.merge(gdf, result, how='left', left_on='GCC_CODE21', right_on='gcc')
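
Because this is a left merge, any boundary region without a matching row in the results keeps a missing `count`. A minimal sketch of how such regions could be listed, using the `indicator` option of `pd.merge`: the two small tables below are toy stand-ins (`regions` and `counts` are hypothetical names), and which regions are actually missing from `q1.csv` cannot be told from the output shown.

```python
import pandas as pd

# Toy stand-ins for the boundary table and the tweet counts (illustrative only).
regions = pd.DataFrame({"GCC_CODE21": ["1GSYD", "2GMEL", "7GDAR"]})
counts = pd.DataFrame({"gcc": ["1GSYD", "2GMEL"], "count": [2218396, 2286891]})

# indicator=True adds a "_merge" column recording where each row came from.
merged = pd.merge(
    regions,
    counts,
    how="left",
    left_on="GCC_CODE21",
    right_on="gcc",
    indicator=True,
)

# Rows marked "left_only" exist in the boundaries but have no tweet count.
unmatched = merged.loc[merged["_merge"] == "left_only", "GCC_CODE21"].tolist()
print(unmatched)
```

Regions listed here are the ones a choropleth would draw with its `nan_fill_color`.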

In [None]:
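# Centre the map on Melbourne.
# Note: "Stamen Terrain" may not be available as a built-in tileset in recent
# folium releases; if so, swap in another tileset such as "OpenStreetMap".
m = folium.Map(location=[-37.81, 144.96], tiles="Stamen Terrain", zoom_start=10)
# Give the SVG overlay a semi-transparent white background
svg_style = '<style>svg {background-color: rgba(255, 255, 255, 0.5);}</style>'
m.get_root().header.add_child(folium.Element(svg_style))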

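c = folium.Choropleth(
    geo_data=geoJSON,
    name='choropleth',
    data=gdf2,
    columns=['gcc', 'count'],  # key column first, then the value column
    key_on='feature.properties.GCC_CODE21',  # GeoJSON property matched against the key column
    fill_color='Paired',
    nan_fill_color='black',  # areas with no matching count are drawn black
    legend_name='Tweets made per GCCSA',
).add_to(m)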

#m.save('../../plots/population_density_sa2.html')
m