# Explore GPKG File 
Exploring the GPKG files in the GreenSpace download folder.

Conclusion:
- The files can be read by geopandas.
- The ...short_pnt.gpkg file contains POINT geometry data.
- The ...V1_2.gpkg file contains MULTIPOLYGON geometry data.
- Both can be used as area boundaries for the choropleth map.

Note: The ...V1_2.gpkg file is very large, so it may take a few minutes to load.
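
Because a GeoPackage is an SQLite database, the geometry type of each layer can be checked without loading any features. The following is a minimal sketch rather than part of the original workflow: it uses only the standard-library `sqlite3` module and reads the `gpkg_geometry_columns` metadata table defined by the GeoPackage specification. The helper name `list_gpkg_layers` is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_gpkg_layers(gpkg_path):
    """Return (table_name, geometry_type_name) pairs from GeoPackage metadata.

    Hypothetical helper: it reads only the small metadata table,
    not the feature rows, so it should be fast even for large files.
    """
    # Open read-only so a mistyped path raises an error
    # instead of silently creating an empty database file.
    uri = Path(gpkg_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT table_name, geometry_type_name FROM gpkg_geometry_columns"
        ).fetchall()
    finally:
        conn.close()
    return rows
```

Calling `list_gpkg_layers(gpkg_path)` on either file should report the layer name together with its declared geometry type, which is a quick way to confirm the POINT and MULTIPOLYGON observations above.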

In [1]:
import geopandas as gpd
import pandas as pd
import os

In [2]:
gpkg_path = 'GreenspaceDownload/GHS_STAT_UCDB2015MT_GLOBE_R2019A_V1_2.gpkg'

def load_gpkg(gpkg_path):
    """Read a GeoPackage file into a GeoDataFrame."""
    print('Loading large file, may take 1-2 minutes...')
    gdf = gpd.read_file(gpkg_path)
    return gdf

In [3]:
gdf = load_gpkg(gpkg_path)
gdf.head()

The ...V1_2.gpkg file contains columns that are almost identical to those in the greenspace CSV file, with the addition of a geometry column containing MULTIPOLYGON data.
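
One way to make the "almost identical" comparison concrete is to diff the two sets of column names. The sketch below is illustrative only: `compare_columns` is a hypothetical helper, and the column names in the demo are toy values, not the real columns of either file.

```python
def compare_columns(left_columns, right_columns):
    """Return the column names unique to each side, as two sorted lists."""
    left, right = set(left_columns), set(right_columns)
    return sorted(left - right), sorted(right - left)


# Toy column names for illustration only; not taken from the real files.
only_left, only_right = compare_columns(["ID", "NAME", "geometry"], ["ID", "NAME"])
print(only_left)   # ['geometry']
print(only_right)  # []
```

With the real data, the same helper could be called as `compare_columns(gdf.columns, pd.read_csv(csv_path, nrows=0).columns)`, where `csv_path` is a placeholder for the greenspace CSV location. Passing `nrows=0` reads only the header row.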

However, because the ...V1_2.gpkg file takes a long time to load due to its size, it is better to export it as a smaller, more readable GeoJSON file. Additionally, we should retain only the data we are interested in: data about the United States.

Hence, in the following code, I keep only the rows for the United States and write the result to a GeoJSON file named `Greenspace_US.geojson`.

In [4]:
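# Keep only the US rows and write them to a GeoJSON file
def output_US_geojson(gdf, output_path):
    """Return the US subset of gdf, writing it to output_path if that file does not exist yet."""
    df = gdf[gdf['CTR_MN_NM'] == 'United States'].reset_index(drop=True)

    # An existing file is left untouched, so delete it first to regenerate the output.
    if os.path.exists(output_path):
        print(f'{output_path} already exists')
    else:
        print(f'Output {output_path}...')
        df.to_file(output_path, driver='GeoJSON')

    return df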

In [5]:
output_path = 'Greenspace_US.geojson'
us_gdf = output_US_geojson(gdf, output_path)

print(us_gdf.shape)
us_gdf.head()
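
As a quick sanity check on the exported file, the GeoJSON can be inspected with the standard library alone. This is a sketch under the assumption that the file is a GeoJSON FeatureCollection, which is what `to_file(..., driver='GeoJSON')` is expected to write. The helper name `summarize_geojson` is hypothetical.

```python
import json
from collections import Counter


def summarize_geojson(path):
    """Return (feature_count, Counter of geometry types) for a GeoJSON file."""
    with open(path, encoding="utf-8") as f:
        collection = json.load(f)
    features = collection.get("features", [])
    # A feature's geometry may be null in GeoJSON, so guard against None.
    types = Counter((feat.get("geometry") or {}).get("type") for feat in features)
    return len(features), types
```

Calling `summarize_geojson(output_path)` should return a feature count equal to `us_gdf.shape[0]` if the export is complete and current. Because an existing file is not overwritten, a mismatch may simply mean the file on disk is stale.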