## Load Disaster-related Statistics (BBS, 2016)
The original data come from [Bangladesh Disaster-related Statistics 2015: Climate Change and Natural Disaster Perspectives](http://203.112.218.65:8008/PageWebMenuContent.aspx?MenuKey=242).

In [7]:
import os
import sys
import numpy as np
import pandas as pd
import geopandas as gpd
import rasterio
from sklearn.preprocessing import MinMaxScaler, PowerTransformer, QuantileTransformer
import fhv
from tabula import read_pdf

In [8]:
# zila = gpd.read_file('./data/admin_boundary/bgd_admbnda_adm2_bbs_20180410.shp')
# zila = zila[['ADM1_EN','ADM1_PCODE','ADM2_EN','ADM2_PCODE']]
# zila[['ADM1_PCODE','ADM2_PCODE']] = zila[['ADM1_PCODE','ADM2_PCODE']].astype(int)
# mymensingh = (zila['ADM1_PCODE'] == 45)
# zila.loc[mymensingh, 'ADM2_PCODE'] = zila.loc[mymensingh, 'ADM2_PCODE'] % 100 + 3000
# zila = zila.sort_values(by='ADM2_PCODE').reset_index(drop=True)
# zila.to_excel('./data/zila_list.xlsx')
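
The commented-out cell above recodes the districts of division 45 (the `mymensingh` mask) into the 30xx range by keeping the last two digits of each code. This presumably aligns the boundary file with the seven-division list used later in this notebook, though the notebook does not say so. A minimal sketch of the arithmetic, using illustrative codes rather than rows read from the shapefile:

```python
import pandas as pd

# Illustrative codes only: one district under division 30, two under division 45.
zila = pd.DataFrame({
    "ADM1_PCODE": [30, 45, 45],
    "ADM2_PCODE": [3026, 4539, 4561],
})

# Keep the two-digit district part (code % 100) and prefix it with 30.
mymensingh = zila["ADM1_PCODE"] == 45
zila.loc[mymensingh, "ADM2_PCODE"] = zila.loc[mymensingh, "ADM2_PCODE"] % 100 + 3000

# Same ordering step as in the cell above.
zila = zila.sort_values(by="ADM2_PCODE").reset_index(drop=True)
print(zila["ADM2_PCODE"].tolist())
```

Only `ADM2_PCODE` is changed; `ADM1_PCODE` keeps its original value of 45, as in the original cell.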

### List of tables in the report (extracted tables in bold)
- Table 4: Distribution of household by main source of income and received remittance by division and district, 2014.
- Table 5: Distribution of main source of lighting and cooking fuel by division and district, 2014.
- Table 18: Distribution of annual household income from agricultural products by division and district, 2014.
- Table 20: Distribution of annual household income from non-agricultural sector by division and district, 2014.
- Table 22: Distribution of annual household income from other source by division and district, 2014.
- Table 23: Distribution of Disaster affected times of household by division, 2009-'14.
- Table 24: Distribution of affected households by disaster categories by division, 2009-'14.
- **Table 25: Distribution of affected household and disaster categories by division and district, 2009-'14.**
- Table 26: Distribution of household number of non working days due to last natural disaster by disaster categories and division, 2009-'14.
- Table 27: Distribution of Affected Household got early warning by disaster categories and division, 2009-'14.
- Table 28: Distribution of household got early warning by type of media, disaster categories and division, 2009-'14.
- Table 29: Distribution of affected area and loss of major crops by type of disaster categories and division, 2009-'14.
- Table 30: Distribution of affected area and value of loss and damage of minor crops by type of disaster categories and division, 2009-'14.
- Table 31: Distribution of affected area and loss of major crops by division and district, 2009-'14.
- Table 32: Distribution of affected area and loss of minor crops by division and district, 2009-'14.
- Table 35: Distribution of area and damage value of land by disaster categories and division, 2009-'14.
- Table 36: Distribution of area and damage value of land by division and district, 2009-'14.
- **Table 39: Distribution of population suffering from sickness and injury by sex, disaster categories and division, 2009-'14.**
- Table 40: Distribution of population suffering from sickness and injury by sex, age group and division, 2009-'14.
- **Table 41: Distribution of population suffering from sickness and injury by sex, division and district, 2009-'14.**
- Table 42: Distribution of number of total children and sick children by division and district, 2009-'14.
- **Table 48: Distribution of Children did not attend to School Due to Natural Disaster by Division and District, 2009-'14.**
- Table 51: Distribution of disaster preparedness of household by disaster category and division, 2009-'14.
- **Table 52: Distribution of disaster preparedness of household by division and district, 2009-'14.**
- **Table 53: Distribution of households having disaster precaution measures according to prior-disaster experience by disaster and division, 2009-'14.**
- Table 54: Distribution of household preparedness during disaster period until normal situation by disaster and division, 2009-'14.
- **Table 55: Distribution of household preparedness during disaster period until normal situation by division and district, 2009-'14.**
- **Table 56: Distribution of household taken action (precaution) during disaster period until normal situation by disaster and division, 2009-'14.**
- **Table 57: Distribution of population suffering from disease due to disaster by division and district, 2014.**
- **Table 58: Distribution of population suffering from disease due to natural disaster by sex, age group, division and district, 2014.**
- **Table 59: Distribution of Population Suffering from Disease Due to natural disaster by Type of Disease, Division and District, 2014.**
- Table 60: Distribution of household members suffering from disease before disaster by division and district, 2009-'14.
- Table 61: Distribution of household members suffering from disease during disaster period by division and district, 2009-'14.
- **Table 62: Distribution of household members suffering from disease post disaster period by division and district, 2009-'14.**
- Table 63: Distribution of main probable cause of suffering from disease due to disaster by division and district, 2014.
- **Table 64: Distribution of source of household drinking water during disaster period by division and district, 2009-'14.**
- Table 65: Distribution of other use of water (cooking, sewerage, cleanliness etc.) before disaster period by division and district, 2009-'14.
- **Table 66: Distribution of other use water (cooking, sewerage, cleanliness etc.) during disaster period by division and district, 2009-'14.**
- **Table 67: Distribution of disease status due to insufficient drinking and other use of water supply during/after disaster period by division and district, 2009-'14.** 
- Table 68: Distribution of cause of main disease due to insufficient drinking and other use of water supply during/after disaster period by division and district, 2009-'14.
- Table 71: Distribution of respondent's knowledge and perception about main impact of climate change by division and district, 2014.
- **Table 72: Distribution of respondent's knowledge and perception about disaster by division and district, 2014.**
- **Table 73: Distribution of Respondent's knowledge and perception about disaster management by division and district, 2014.**
- **Table 74: Distribution of household received financial/rehabilitation support from government/non-government agency during/post disaster period by division and district, 2009-'14.**
- Table 75: Distribution of household received financial/rehabilitation support from different organization/ office during/post disaster period by division and district, 2009-'14.
- Table 76: Distribution of households received loan from post disaster period by division and district, 2009-'14.
- **Table A1: Standard error calculate of total income and total damage and loss by division and district.**

In [9]:
# Note: despite the variable name, these are the seven divisions used in the report.
DistrictName = ['Barisal','Chittagong','Dhaka','Khulna','Rajshahi','Rangpur','Sylhet']
DisasterType = ['Drought','Flood','Water logging','Cyclone',
                'Tornado','Storm/Tidal Surge','Thunderstorm','River/Coastal Erosion',
                'Landslides','Salinity','Hailstorm','Others']
Mulcol = pd.MultiIndex.from_product([DistrictName, DisasterType], names=['District','Disaster'])
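# District list (names and codes) kept in the 'Zila' sheet of the workbook
df = pd.read_excel('./data/Disaster-related Statistics 2015.xlsx', sheet_name='Zila')
# Row positions of the national total and the seven division subtotals (dropped in the loader)
rind = np.array([0,1,8,20,38,49,58,67])
# District names assigned as the index of every district-level table
dist_new_name = df['ADM2_EN']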

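def LoadDisasterStat2015(sheet_name):
    """Load one table sheet from the workbook, indexed by district
    (or by division and disaster type for 'Type of Disaster' sheets)."""

    # For the last table (Table_A1)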
    if sheet_name == 'Table_A1':
        df = pd.read_excel('./data/Disaster-related Statistics 2015.xlsx', 
                   sheet_name='Table_A1', 
                   skiprows=1,
                   header=[0])
        df = df.set_index('Division/District')
        df.index.name = 'District'
        df.index = dist_new_name
        return df

    # For other tables
    df = pd.read_excel('./data/Disaster-related Statistics 2015.xlsx', 
                       sheet_name=sheet_name, 
                       skiprows=1,
                       header=[0,1])    
    if df.columns[0][0] == 'Division/District':
        # Number of leading columns with a single header row (second level reads 'Unnamed: ...')
        ind = len([name for name in df.columns.get_level_values(1).astype(str) if 'Unname' in name])
        sub1 = df[df.columns[:ind]]
        sub2 = df[df.columns[ind:]]
        sub1.columns = pd.MultiIndex.from_tuples([(c[0], '') for c in df[df.columns[:ind]] ])
        df = pd.concat([sub1,sub2], axis=1).set_index('Division/District')
        assert df.isna().sum().sum() == 0

        # Reshape dataframe
        df.index.name = 'District'
        assert df.shape[0] == 72
        df = df.drop(df.iloc[rind].index, axis = 0)
        df.index = dist_new_name
        # df = df.drop(['Total Household'], axis=1)

    elif df.columns[0][0] == 'Type of Disaster':
        # Number of leading columns with a single header row (second level reads 'Unnamed: ...')
        ind = len([name for name in df.columns.get_level_values(1).astype(str) if 'Unname' in name])
        sub1 = df[df.columns[:ind]]
        sub2 = df[df.columns[ind:]]
        sub1.columns = pd.MultiIndex.from_tuples([(c[0], '') for c in df[df.columns[:ind]] ])
        df = pd.concat([sub1,sub2], axis=1).set_index('Type of Disaster')

        # Reshape dataframe
        df = df[~(df.isna().sum(1) == df.shape[1])]  # Drop rows that are entirely empty
        # df = df.drop(['Total Household'], axis=1)
        df = df[df.index != 'Total']
        df = df.iloc[12:]    # Drop the first 12 rows (Bangladesh totals, one per disaster type)
        assert df.shape[0] == 84
        df = pd.DataFrame(data=df.values,index=Mulcol,columns=df.columns.get_level_values(1))
        assert df.isna().sum().sum() == 0
        
    return df
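
The header handling in `LoadDisasterStat2015` can be hard to follow without the workbook at hand. The sketch below reproduces the idea on a small synthetic frame: with `header=[0,1]`, columns that have only one header row get a placeholder second level starting with `Unnamed`, and replacing that placeholder with an empty string makes them addressable as `(name, '')`. The frame and its values are made up for illustration and do not come from the report.

```python
import pandas as pd

# Mimic what read_excel(header=[0, 1]) returns when the first columns
# have a single header row: their second level reads 'Unnamed: ...'.
columns = pd.MultiIndex.from_tuples([
    ("Division/District", "Unnamed: 0_level_1"),
    ("Total Household", "Unnamed: 1_level_1"),
    ("Affected Household", "Flood"),
    ("Affected Household", "Cyclone"),
])
raw = pd.DataFrame([["A", 100, 10, 5], ["B", 200, 40, 0]], columns=columns)

# Replace the placeholder second-level labels with empty strings.
flat = raw.copy()
flat.columns = pd.MultiIndex.from_tuples(
    [(top, "" if str(sub).startswith("Unnamed") else sub) for top, sub in raw.columns]
)
flat = flat.set_index("Division/District")

# Single-header columns are now selected with an empty second level.
share = flat["Affected Household", "Flood"] / flat["Total Household", ""]
print(share.round(2).tolist())
```

Unlike the loader, which counts the `Unnamed` columns and splits the frame at that position, this variant renames every placeholder in one pass; the resulting column labels are the same when the single-header columns come first.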

In [10]:
disaster_table = [['PAFFTHOUS','pos','House','Adaptive Capacity','Percent of households affected by floods','MinMax','District'],
                  ['PNOSCHOOL','pos','Person','Adaptive Capacity','Percent of children did not attend to school due to disasters','MinMax','District'],
                  ['PNOPREPARED','pos','House','Adaptive Capacity','Percent of households has not taken disaster preparedness','MinMax','District'],
                  ['PDISEASE','pos','Person','Health','Percent of population who has suffered from disease due to disasters','MinMax','District'],
                  ['PDIARRHEA','pos','Person','Health','Percent of population experienced diarrhea as a main disease due to natural disaster','MinMax','District'],
                  ['PDISEASEDWATER','pos','House','Health','Percent of households with disease due to insufficient drinking water during/after disaster period','MinMax','District'],
                  ['PPERCEPTION','neg','House','Adaptive Capacity','Percent of households with knowledge and perception about disaster','MinMax','District'],
                  ['PSUPPORT','neg','House','Adaptive Capacity','Percent of household received financial support from agencies during/after disaster period','MinMax','District'],
                  ['DAMAGERATIO','pos','Person','Adaptive Capacity','Ratio of total damage and loss to total income in district level','MinMax','District']
                 ]
disaster_table = pd.DataFrame(disaster_table, columns=['Name','Sign','Type','Domain','Description','Normalization','Scale'])
disaster_table['Source'] = 'BBS (2016)'
print(disaster_table[['Description']])

                                         Description
0           Percent of households affected by floods
1  Percent of children did not attend to school d...
2  Percent of households has not taken disaster p...
3  Percent of population who has suffered from di...
4  Percent of population experienced diarrhea as ...
5  Percent of households with disease due to insu...
6  Percent of households with knowledge and perce...
7  Percent of household received financial suppor...
8  Ratio of total damage and loss to total income...
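
The `Sign` and `Normalization` columns are metadata only; this section does not show how they are consumed. One plausible reading, stated here as an assumption, is that each indicator is min-max scaled to [0, 1] and indicators with `Sign == 'neg'` are flipped so that larger values always point the same way. The helper `normalize_indicators` is hypothetical and the values are toy numbers, not report data.

```python
import pandas as pd

def normalize_indicators(data, table):
    """Min-max scale each indicator; flip those whose Sign is 'neg' (assumed convention)."""
    out = pd.DataFrame(index=data.index)
    for _, row in table.iterrows():
        col = data[row["Name"]]
        # A constant column would give 0/0 here and produce NaN.
        scaled = (col - col.min()) / (col.max() - col.min())
        out[row["Name"]] = 1 - scaled if row["Sign"] == "neg" else scaled
    return out

# Toy values for two indicators, one with each sign.
data = pd.DataFrame({"PDISEASE": [0.02, 0.04, 0.10], "PSUPPORT": [0.1, 0.2, 0.3]})
table = pd.DataFrame({"Name": ["PDISEASE", "PSUPPORT"], "Sign": ["pos", "neg"]})

result = normalize_indicators(data, table).round(2)
print(result)
```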


In [11]:
# DataFrame of variables
zila = pd.read_excel('./data/Disaster-related Statistics 2015.xlsx', 
                   sheet_name='Zila',index_col=0)
disaster = pd.DataFrame(index=zila['ADM2_EN'])
disaster.index.name = 'DID'

# PAFFTHOUS: Percent of households affected by floods
df = LoadDisasterStat2015('Table_25')
disaster['PAFFTHOUS'] = df['Affected Household','Flood']/df['Total Household','']

# PNOSCHOOL: Percent of children did not attend to school due to disasters
df = LoadDisasterStat2015('Table_48')
disaster['PNOSCHOOL'] = df[[('Children','Not Attended School'),
                            ('Children','Not School Going')]].sum(1)/df['Children', 'Total']

# PNOPREPARED: Percent of households has not taken disaster preparedness
df = LoadDisasterStat2015('Table_52')
disaster['PNOPREPARED'] = df['Preparedness','Not Taken']/df['Preparedness','Total']

# PDISEASE: Percent of population who has suffered from disease due to disasters
df = LoadDisasterStat2015('Table_57')
disaster['PDISEASE'] = df['Population','Suffering']/df['Population','Total']

# PDIARRHEA: Percent of population experienced diarrhea as a main disease due to natural disaster
df = LoadDisasterStat2015('Table_59')
disaster['PDIARRHEA'] = (df[('Type of Disease','Diarrhoea')]/df[('Total Suffering','')]).astype(float)

# PDISEASEDWATER: Percent of households with disease due to insufficient drinking water during/after disaster period
df = LoadDisasterStat2015('Table_67')
disaster['PDISEASEDWATER'] = df['Disease','Yes']/df['Total Household','']


####
# PPERCEPTION: Percent of households with knowledge and perception about disaster
df = LoadDisasterStat2015('Table_72')
disaster['PPERCEPTION'] = df[[('Knowledge and Perception', 'Critical Situation Caused by Nature/Human'),
                              ('Knowledge and Perception','Continuous Natural Process Occurs in Course of Time')]].sum(1)/df['Knowledge and Perception', 'Total']
# disaster['PPERCEPTION'] = df[('Knowledge and Perception', 'Critical Situation Caused by Nature/Human')]/df['Knowledge and Perception', 'Total']
# df = LoadDisasterStat2015('Table_73')
# disaster['PPERCEPTION'] = df[('Knowledge and Perception', 'In Order to Minimize Losses Pre-, During- & Post-Disaster')]/df['Knowledge and Perception', 'Total']
####

# PSUPPORT: Percent of household received financial support from agencies during/after disaster period
df = LoadDisasterStat2015('Table_74')
disaster['PSUPPORT'] = df[('Financial/Rehabilitation Support', 'Yes')]/df[('Total Household','')]

# DAMAGERATIO: Ratio of total damage and loss to total income in district level
df = LoadDisasterStat2015('Table_A1')
disaster['DAMAGERATIO'] = df['Total damage and loss']/df['Total income']

# Convert District_name to Disaster_code
disaster.index = zila['ADM2_PCODE'].astype(int)
disaster.head()

| ADM2_PCODE | PAFFTHOUS | PNOSCHOOL | PNOPREPARED | PDISEASE | PDIARRHEA | PDISEASEDWATER | PPERCEPTION | PSUPPORT | DAMAGERATIO |
|---|---|---|---|---|---|---|---|---|---|
| 1004 | 0.004874 | 0.450996 | 0.152479 | 0.035781 | 0.116745 | 0.072802 | 0.676733 | 0.285581 | 0.26251 |
| 1006 | 0.10911 | 0.341601 | 0.194746 | 0.019557 | 0.454488 | 0.084649 | 0.783799 | 0.133959 | 0.467403 |
| 1009 | 0.048528 | 0.420014 | 0.117564 | 0.11074 | 0.09713 | 0.249081 | 0.867596 | 0.192321 | 0.332349 |
| 1042 | 0.090412 | 0.278391 | 0.041717 | 0.042946 | 0.069884 | 0.076886 | 0.784465 | 0.336469 | 0.192493 |
| 1078 | 0.061115 | 0.646824 | 0.156958 | 0.038187 | 0.116735 | 0.135836 | 0.83914 | 0.269234 | 0.293171 |
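
Because most indicators are shares of a total, a quick range check can catch a mismatched denominator or a misaligned row before the data are saved. The sketch below is one way to do that; `find_invalid_shares` is a hypothetical helper, and the sample reuses the first two rows of the output above for three columns. `DAMAGERATIO` is a ratio of two amounts and could exceed 1, so it is left out of the check.

```python
import pandas as pd

def find_invalid_shares(df, share_columns):
    """Return the columns that contain NaN or values outside [0, 1]."""
    bad = []
    for name in share_columns:
        col = df[name]
        if col.isna().any() or not col.between(0, 1).all():
            bad.append(name)
    return bad

# First two rows of the output above, restricted to three columns.
sample = pd.DataFrame(
    {
        "PAFFTHOUS": [0.004874, 0.10911],
        "PNOSCHOOL": [0.450996, 0.341601],
        "DAMAGERATIO": [0.26251, 0.467403],
    },
    index=pd.Index([1004, 1006], name="ADM2_PCODE"),
)

print(find_invalid_shares(sample, ["PAFFTHOUS", "PNOSCHOOL"]))
```

An empty list means every checked column is complete and within range.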


In [12]:
# Save data
if True:
    fn = './data/disaster.hdf'
    disaster.to_hdf(fn, key='data')
    print('%s is saved.' % fn)
    fn = './data/disaster_table.hdf'
    disaster_table.to_hdf(fn, key='table')
    print('%s is saved.' % fn)

./data/disaster.hdf is saved.
./data/disaster_table.hdf is saved.
