This script performs two main tasks:
1. Reverse geocode birth nodes. The raw birth data obtained from WorldPop includes data points from adjacent districts. We use the Geoapify (https://www.geoapify.com/) reverse-geocoding API (https://www.geoapify.com/reverse-geocoding-api) to look up the district name for each latitude-longitude pair, and then keep only the birth nodes that fall in the required district.
2. Geocode PHC locations. The PHC locations for Vikarabad district (https://vikarabad.telangana.gov.in/health-2/) do not include latitude-longitude information, so we use the geocoding API to obtain their coordinates.

#### Part 1 - Reverse Geocode birth nodes

In [3]:
import os
import sys
from pathlib import Path

import numpy as np

In [2]:
# from geopy.geocoders import Nominatim
import pandas as pd
import time
import json

import requests
from requests.structures import CaseInsensitiveDict

In [4]:
path_data = Path.cwd().parent / 'data'

In [78]:
phc_locations_df = pd.read_csv(path_data / '01 in_phc_locations.csv', encoding= 'latin-1')

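# Read the API key and strip whitespace so a trailing newline does not end up in the request URL
geoapify_api_key = (path_data / '01 geoapify_api_key.txt').read_text().strip()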

births_raw_df = pd.read_csv(path_data / '01 in_births.csv')

births_raw_df['District'] = pd.Series(dtype = 'object')

Iterate through all locations and obtain the district name for each one. If you use this script to geocode locations in other districts, it may need slight modifications because the fields returned by the API can differ. Print the `res` object and adjust the code accordingly.
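
As a rough illustration of that inspection step, the sketch below lists which district-like keys are present in a response. The `sample_res` dictionary is a hand-written, hypothetical stand-in that only mimics the nesting used in this script (`res['features'][0]['properties']`); it is not real API output, and the helper name `district_like_keys` is an assumption made for this example.

```python
def district_like_keys(res):
    """Return the property keys of the first feature that look district-related."""
    features = res.get("features", [])
    if not features:
        return []
    properties = features[0].get("properties", {})
    # Keys such as 'state_district' or 'county' are the ones this script relies on
    return sorted(k for k in properties if "district" in k or "county" in k)


# Hypothetical response shape, hand-written for illustration only
sample_res = {
    "features": [
        {"properties": {"county": "Example County", "state": "Example State"}}
    ]
}

print(district_like_keys(sample_res))
```

If the list does not contain `state_district` or `county` for a new region, the lookup in the loop would need a different key.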

In [None]:
for i in births_raw_df.index:

    url = f"https://api.geoapify.com/v1/geocode/reverse?lat={births_raw_df.loc[i,'Latitude']}&lon={births_raw_df.loc[i,'Longitude']}&apiKey={geoapify_api_key}"
    headers = CaseInsensitiveDict()
    headers["Accept"] = "application/json"
    resp = requests.get(url, headers=headers)
    res = json.loads(resp.text)

    try:
        properties = res['features'][0]['properties']
        # Prefer 'state_district'; some responses only carry the name under 'county'
        births_raw_df.loc[i, 'District'] = properties.get('state_district') or properties.get('county')
    except (KeyError, IndexError):
        # No usable match in the response: leave the district empty
        births_raw_df.loc[i, 'District'] = np.nan

    # Pause between requests to stay within the API rate limit
    time.sleep(0.25)
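
The fallback used in the loop above (try `state_district`, then `county`) can also be kept in a small helper, which makes it easier to test without calling the API. This is a minimal sketch; the function name `extract_district` is hypothetical, and the dictionaries passed to it in the usage lines are made-up shapes rather than real responses.

```python
def extract_district(res):
    """Return the district name from a reverse-geocoding response, or None."""
    features = res.get("features") or []
    if not features:
        return None
    properties = features[0].get("properties", {})
    # Prefer 'state_district'; fall back to 'county' as the loop above does
    return properties.get("state_district") or properties.get("county")


# Made-up inputs that only mimic the response nesting
print(extract_district({"features": [{"properties": {"county": "Example County"}}]}))
print(extract_district({"features": []}))
```

Returning `None` for an empty result keeps one unmatched point from stopping the whole loop.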

Keep only the data for the required district

In [None]:
births_clean_district_df = births_raw_df[births_raw_df['District'] == 'Vikarabad District']

births_clean_district_df = births_clean_district_df.drop(columns= 'District')
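
Because the filter relies on an exact string match, it can help to look at the distinct district names before filtering. The sketch below does this on a small made-up frame; the coordinates and the `Other District` label are placeholders, not real results.

```python
import pandas as pd

# Made-up rows for illustration only
demo_df = pd.DataFrame({
    "Latitude": [17.1, 17.2, 17.3],
    "Longitude": [77.8, 77.9, 78.0],
    "District": ["Vikarabad District", "Other District", "Vikarabad District"],
})

# Count rows per district to confirm the exact spelling used in the filter
district_counts = demo_df["District"].value_counts(dropna=False)
print(district_counts)

# Same filter-then-drop pattern as in the cell above
demo_clean_df = demo_df[demo_df["District"] == "Vikarabad District"].drop(columns="District")
```

Passing `dropna=False` also shows how many points received no district at all.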

#### Part 2. Geocode PHC locations 

In [83]:
phc_locations_df['Latitude'], phc_locations_df['Longitude'] = [0.0,0.0]

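# Remove non-breaking spaces; regex=True also matches them inside longer strings, not only whole cells
phc_locations_df = phc_locations_df.replace(u'\xa0', u'', regex=True)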

phc_locations_df['PHC Full Name'] = phc_locations_df['Primary Health Centre Name'] + ", " + phc_locations_df['Primary Health Centre Location'] + ", " + phc_locations_df['Mandal']
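
Concatenating string columns with `+` gives a missing value for the whole name whenever any one part is missing. One possible alternative, sketched here as a hypothetical helper `build_phc_query` (not part of the original script), joins only the parts that are present. The names passed in the usage line are placeholders.

```python
def build_phc_query(*parts):
    """Join the non-empty address parts into a single geocoding query string."""
    cleaned = []
    for part in parts:
        # Skip missing values (None, or NaN, which is not equal to itself)
        if part is None or part != part:
            continue
        # Treat non-breaking spaces as ordinary spaces before trimming
        text = str(part).replace("\xa0", " ").strip()
        if text:
            cleaned.append(text)
    return ", ".join(cleaned)


print(build_phc_query("PHC Example", float("nan"), "Example Mandal"))
```

A helper like this could be applied row by row with `DataFrame.apply(..., axis=1)` if the columns turn out to contain gaps.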

In [84]:
for index, row in phc_locations_df.iterrows():
    url = 'https://api.geoapify.com/v1/geocode/search'
    params = dict(
        text=row['PHC Full Name'],
        apiKey=geoapify_api_key
    )

    resp = requests.get(url=url, params=params)

    res = json.loads(resp.text)
    try:
        # GeoJSON coordinates are ordered [longitude, latitude]
        lon, lat = res['features'][0]['geometry']['coordinates']
    except (IndexError, KeyError):
        # No match for this PHC: record missing coordinates
        lon, lat = np.nan, np.nan
    phc_locations_df.loc[index, ['Longitude', 'Latitude']] = [lon, lat]
    # Pause between requests to stay within the API rate limit
    time.sleep(0.25)
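
The coordinate order is easy to get wrong, since GeoJSON stores points as `[longitude, latitude]`. The sketch below isolates that step in a hypothetical helper, `extract_lat_lon`, which returns NaN values when the response has no usable feature. The input in the usage line is a made-up shape, not a real response.

```python
import math


def extract_lat_lon(res):
    """Return (latitude, longitude) from a geocoding response, or (nan, nan)."""
    try:
        # GeoJSON points are ordered [longitude, latitude]
        lon, lat = res["features"][0]["geometry"]["coordinates"]
    except (KeyError, IndexError, TypeError, ValueError):
        return math.nan, math.nan
    return lat, lon


# Made-up input that only mimics the response nesting
print(extract_lat_lon({"features": [{"geometry": {"coordinates": [77.9, 17.3]}}]}))
```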

In [76]:
births_raw_df.to_csv(path_data / '01 out_births_cleaned_full.csv', index= False)

births_clean_district_df.to_csv(path_data / '01 out_births_cleaned_district.csv', index = False)

In [85]:
phc_locations_df.to_csv(path_data / '01 out_phc_locations_geocoded.csv', index= False)