# Download rainfall data from CHIRPS

In this notebook, I explain how to get the rainfall data at given GPS coordinates, using [CHIRPS daily data](http://legacy.chg.ucsb.edu/data/chirps/).

## Import packages

In [1]:
import platform
print(platform.python_version())

3.6.9


In [2]:
import netCDF4
import csv
import numpy as np
import pandas as pd
import datetime
import wget
from PIL import Image
import os

## Define functions to get rainfall data

In [8]:
def get_precip(lon, lat, lon_array, lat_array, chirps):

    # Find grids surrounding the municipality
    lon_near_argsort = np.argsort(abs(lon_array - lon))
    lon_near1 = lon_near_argsort[0]
    lon_near2 = lon_near_argsort[1]
    lat_near_argsort = np.argsort(abs(lat_array - lat))
    lat_near1 = lat_near_argsort[0]
    lat_near2 = lat_near_argsort[1]

    # Calculate distances from the municipality centroid to each grid
    surrounding_grid1_dist = np.sqrt((lon_array[lon_near1] - lon) ** 2 + (lat_array[lat_near1] - lat) ** 2)
    surrounding_grid2_dist = np.sqrt((lon_array[lon_near1] - lon) ** 2 + (lat_array[lat_near2] - lat) ** 2)
    surrounding_grid3_dist = np.sqrt((lon_array[lon_near2] - lon) ** 2 + (lat_array[lat_near1] - lat) ** 2)
    surrounding_grid4_dist = np.sqrt((lon_array[lon_near2] - lon) ** 2 + (lat_array[lat_near2] - lat) ** 2)

    precip1 = np.array(chirps['precip'][:,lat_near1,lon_near1])
    precip1[precip1 == -9999] = np.nan
    precip2 = np.array(chirps['precip'][:,lat_near1,lon_near2])
    precip2[precip2 == -9999] = np.nan
    precip3 = np.array(chirps['precip'][:,lat_near2,lon_near1])
    precip3[precip3 == -9999] = np.nan
    precip4 = np.array(chirps['precip'][:,lat_near2,lon_near2])
    precip4[precip4 == -9999] = np.nan

    try:
        numerator = 0
        denominator = 0
        for precip, dist in zip([precip1, precip2, precip3, precip4], 
                                [surrounding_grid1_dist, surrounding_grid2_dist, 
                                 surrounding_grid3_dist, surrounding_grid4_dist]):
            if ~np.isnan(precip).any():
                numerator += precip / dist
                denominator += 1 / dist
        chirps_precip = numerator / denominator
    except:
        chirps_precip = np.empty_like(precip1)
        chirps_precip[:] = np.nan
        
    return chirps_precip


In [9]:
def get_precip_all(df_id, df_lon, df_lat, download_year):
    
    url_path = 'https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_daily/netcdf/p05/chirps-v2.0.' + str(download_year) + '.days_p05.nc'
    
    try:
        os.remove('chirps_daily.nc')
    except OSError:
        pass
    
    wget.download(url_path, 'chirps_daily.nc')
    chirps = netCDF4.Dataset('chirps_daily.nc')

    lon_array = np.array(chirps.variables['longitude'][:])
    lat_array = np.array(chirps.variables['latitude'][:])
    num_dates = len(chirps.variables['time'][:])
    
    precipitation_array = np.zeros([len(df_id), num_dates])
    for i, lon, lat in zip(range(len(df_id)), df_lon, df_lat):
         precipitation_array[i, :] = get_precip(lon, lat, lon_array, lat_array, chirps)
    
    precipitation_df = pd.DataFrame(precipitation_array)
    output_df = pd.DataFrame({'id': df_id})
    
    colnames = []
    start_date = datetime.datetime(download_year, 1, 1)
    end_date = datetime.datetime(download_year + 1, 1, 1)

    for i in range((end_date - start_date).days):
        data_date = start_date + datetime.timedelta(i)
        data_date_str = data_date.strftime('%Y%m%d')
        varname = 'precip_' + data_date_str
        colnames.append(varname)

    precipitation_df.columns = colnames
    output_df = pd.concat([output_df, precipitation_df], axis = 1)
    
    filename = 'chirps_' + str(download_year) + '.csv'
    output_df.to_csv(filename, index = False)
    
    try:
        os.remove('chirps_daily.nc')
    except OSError:
        pass
    

The function `get_precip_all` does the followings:
1. Download CHIRPS data for a given year;
2. Get the precipitation information at each GPS coordinate;
3. Repeat step 2 for all GPS coordinates in the dataset;
4. Save the output rainfall data as a csv file;
5. Delete the downloaded CHIRPS data.

Notice that, in the first step this function downloads the CHIRPS data to the local folder.
It might be possible to circumvent this process, but I have not tried it yet.
Since one CHIRPS data is more than 1GB, you might need to have enough storage capacity for this.
Also, in the step 2, since CHIRPS is a gridded data, I calculate the rainfall as a weighted average of values at surrounding grids, with distances from the given GPS coordinate to each grid as weights.


## Example: ACLED

As an example of how to use the functions defined above, I use the conflict data obtained at [ACLED](https://acleddata.com/curated-data-files/).
For simplicity, in this example, I use the first 12 events in the dataset:

In [5]:
acled_data = pd.read_excel('Africa_1997-2020_Apr25_short.xlsx')
acled_data.head()

Unnamed: 0,ISO,EVENT_ID_CNTY,EVENT_ID_NO_CNTY,EVENT_DATE,YEAR,TIME_PRECISION,EVENT_TYPE,SUB_EVENT_TYPE,ACTOR1,ASSOC_ACTOR_1,...,ADMIN3,LOCATION,LATITUDE,LONGITUDE,GEO_PRECISION,SOURCE,SOURCE_SCALE,NOTES,FATALITIES,TIMESTAMP
0,12,ALG1,1,01-January-1997,1997,1,Violence against civilians,Attack,GIA: Armed Islamic Group,,...,,Douaouda,36.672,2.789,1,Algeria Watch,Other,5 January: Beheading of 5 citizens in Douaouda...,5,1582579226
1,12,ALG2,2,02-January-1997,1997,1,Violence against civilians,Attack,GIA: Armed Islamic Group,,...,,Hassasna,36.133,0.883,1,Algeria Watch,Other,Two citizens were beheaded in Hassasna.,2,1582579226
2,12,ALG3,3,03-January-1997,1997,1,Violence against civilians,Attack,GIA: Armed Islamic Group,,...,,Hassi El Abed,34.966,-0.29,1,Algeria Watch,Other,Two citizens were killed in a raid on the vill...,2,1582579226
3,12,ALG4,4,04-January-1997,1997,1,Violence against civilians,Attack,GIA: Armed Islamic Group,,...,,Blida,36.469,2.829,1,Algeria Watch,Other,4 January: 16 citizens were murdered in the vi...,16,1582579226
4,12,ALG5,5,05-January-1997,1997,1,Violence against civilians,Attack,GIA: Armed Islamic Group,,...,,Douaouda,36.672,2.789,1,Algeria Watch,Other,5 January: Killing of 18 citizens in the Olivi...,18,1582579226


To get the rainfall values for these places, I use the function `get_precip_all`.
Notice that there are 4 arguments required in this function:

1. df_id: ID in the dataset, which makes it easy to merge the output data with the original dataset (for example, on Stata, by using `merge` command);
2. df_lon: longitudes in each place;
3. df_lat: latitudes in each place;
4. download_year: the year to download the rainfall information.

In this example, I get the rainfall information in 2000.


In [6]:
df_id = acled_data.EVENT_ID_CNTY
df_lon = acled_data.LONGITUDE
df_lat = acled_data.LATITUDE
download_year = 2000

In [10]:
get_precip_all(df_id, df_lon, df_lat, download_year)

In [11]:
pd.read_csv('chirps_2000.csv')

Unnamed: 0,id,precip_20000101,precip_20000102,precip_20000103,precip_20000104,precip_20000105,precip_20000106,precip_20000107,precip_20000108,precip_20000109,...,precip_20001222,precip_20001223,precip_20001224,precip_20001225,precip_20001226,precip_20001227,precip_20001228,precip_20001229,precip_20001230,precip_20001231
0,ALG1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.850499,...,10.533422,0.0,0.0,10.533422,16.825222,8.481069,9.667308,0.0,15.323618,0.0
1,ALG2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.212355,...,6.325545,3.162773,3.844197,3.162773,12.672057,25.344114,0.0,0.0,0.0,0.0
2,ALG3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,7.556294,0.0,3.589771,3.589771,9.774583,2.707627,0.0,0.0,0.0,0.0
3,ALG4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.468626,...,16.464352,0.0,0.0,11.780624,18.989582,0.0,18.989582,0.0,21.881853,0.0
4,ALG5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.850499,...,10.533422,0.0,0.0,10.533422,16.825222,8.481069,9.667308,0.0,15.323618,0.0
5,ALG6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,5.798553,0.0,0.0,11.597107,20.830961,10.415481,10.415481,0.0,10.415481,0.0
6,ALG7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.072515,...,13.135348,0.0,3.23491,0.0,31.666449,0.0,10.555484,0.0,21.110968,0.0
7,ALG8,14.384044,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.127873,0.241623,2.293149,2.293149,0.0,4.872528,0.0,0.0,4.872528,0.0
8,ALG11,12.62218,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.299604,...,8.531845,0.0,0.858013,0.0,11.922409,18.856878,0.0,4.98794,6.934469,0.0
9,ALG9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.646772,...,6.241776,0.0,0.0,10.63645,18.679432,0.0,12.669548,0.0,12.669548,0.0
