# Performing Table Joins

## Overview

This tutorial shows how to use GeoPandas to perform a table join. It also covers advanced data cleaning techniques needed to merge datasets from different sources.

We will work with two data layers for Sri Lanka. Given a shapefile of Admin4 regions and a CSV file containing division-wise population statistics, we will learn how to merge them so that these indicators can be displayed on a map.

Input Layers:
* `lka_admbnda_adm4_slsd_20220816.shp`: A shapefile of all Grama Niladhari (GN) Divisions (Admin Level 4) of Sri Lanka.
* `GN_Division_Age_Group_of_Population.csv`: Age-wise population for all GN Divisions of Sri Lanka.

Output:
* `admin4_pop.shp`: A shapefile containing age-wise population for GN Divisions.
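
To make the idea of a table join concrete, here is a minimal sketch using plain pandas on toy data. The column names (`code`, `name`, `total_pop`) and all values are hypothetical and do not come from the actual datasets. The same `merge` call should also work when the left table is a GeoDataFrame, since GeoPandas builds on pandas.

```python
import pandas as pd

# Toy stand-in for the shapefile's attribute table (hypothetical columns).
boundaries = pd.DataFrame({
    'code': ['A001', 'A002', 'A003'],
    'name': ['Division 1', 'Division 2', 'Division 3'],
})

# Toy stand-in for the population CSV (made-up values).
population = pd.DataFrame({
    'code': ['A001', 'A002', 'A004'],
    'total_pop': [1200, 850, 430],
})

# A left join keeps every boundary row; rows without a match get NaN.
# indicator=True adds a '_merge' column recording where each row came from.
joined = boundaries.merge(population, on='code', how='left', indicator=True)
print(joined)

# List the boundary codes that found no matching population record.
unmatched = joined.loc[joined['_merge'] == 'left_only', 'code'].tolist()
print('Unmatched codes:', unmatched)
```

Checking for unmatched keys like this is a quick way to see how much cleaning the join column needs before the merge is trustworthy.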

Data Credit:
* Sri Lanka Population Statistics: Department of Census and Statistics - Sri Lanka. Downloaded from [DCS Map Portal](http://map.statistics.gov.lk/).
* Sri Lanka - Subnational Administrative Boundaries: Sri Lanka administrative levels 0-4 shapefiles and gazetteer. Downloaded from [HDX portal](https://data.humdata.org/dataset/cod-ab-lka).

## Setup and Data Download

The following blocks of code will install the required packages and download the datasets to your Colab environment.

In [1]:
%%capture
try:
    import geopandas
except ModuleNotFoundError:
    if 'google.colab' in str(get_ipython()):
        !apt install libspatialindex-dev -qq
        !pip install fiona shapely pyproj rtree --quiet
        !pip install geopandas --quiet
    else:
        print('geopandas not found, please install via conda in your environment')


In [2]:
import os
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

In [3]:
data_folder = 'data'
output_folder = 'output'

if not os.path.exists(data_folder):
    os.mkdir(data_folder)
if not os.path.exists(output_folder):
    os.mkdir(output_folder)

In [5]:
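def download(url):
    """Download url into data_folder unless the file already exists."""
    filename = os.path.join(data_folder, os.path.basename(url))
    if not os.path.exists(filename):
        from urllib.request import urlretrieve
        # urlretrieve returns the local path and the response headers
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)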

data_url = 'https://storage.googleapis.com/spatialthoughts-public-data/srilanka/'

shapefile = 'lka_admbnda_adm4_slsd_20220816'
exts = ['.shp', '.shx', '.dbf', '.prj']
csv_file = 'GN_Division_Age_Group_of_Population.csv'

for ext in exts:
    download(data_url + shapefile + ext)

download(data_url + csv_file)

Downloaded data/GN_Division_Age_Group_of_Population.csv
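
Because `download` only prints a message for files it actually fetches, files that were already present produce no output. A shapefile is also made up of several companion files that must all be present. As an optional check, the sketch below lists any expected input that is missing. The helper `missing_files` is a hypothetical name introduced here for illustration, and the folder and file names are repeated from the cells above so the block runs on its own.

```python
import os

def missing_files(folder, filenames):
    """Return the names in filenames that do not exist in folder."""
    return [name for name in filenames
            if not os.path.exists(os.path.join(folder, name))]

# Names repeated from the download cell above.
data_folder = 'data'
shapefile = 'lka_admbnda_adm4_slsd_20220816'
exts = ['.shp', '.shx', '.dbf', '.prj']
csv_file = 'GN_Division_Age_Group_of_Population.csv'

# All shapefile components plus the CSV table.
expected = [shapefile + ext for ext in exts] + [csv_file]

missing = missing_files(data_folder, expected)
if missing:
    print('Missing files:', missing)
else:
    print('All input files are present.')
```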


## Procedure

### Pre-Process Data Table

### Pre-Process Shapefile