### Bangladesh Population in Unions (ADM4 Level)
- GeoCode is obtained from [BBS Geo Location Registry](http://app.dghs.gov.bd/bbscode/)
- Bangladesh population is **144,043,697** (BBS) and **149,273,778** (WorldBank) in 2011, and **159,670,593** (WorldBank) in 2017.
- 'bgd_admbnda_adm4_bbs_20180410.shp' has 5,160 unions
- Municipal Corporations, also known as **Paurasava**, are the local governing bodies of the cities and towns in Bangladesh. There are 327 such municipal corporations in eight divisions of Bangladesh. A Paurasava consists of multiple Wards and is spatially represented as a single feature with a PCODE like "XXXXXX99". However, a city corporation has its own Wards that are spatially represented in the shapefile.
- Mymensingh (45) division consists of 4 districts: Sherpur (4589), Jamalpur (4539), Mymensingh (4561), Netrakona (4572). It was created in 2015 from districts previously comprising the northern part of Dhaka Division. In the 2011 Census data, these districts are included in Dhaka division (30), so we need to modify the division code in order to link with the 2018 shapefile.
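
The code layout described above can be sketched as follows. This is only an illustration, assuming every union PCODE is an 8-digit string made of four 2-digit parts (division, zila, upazila, union). The helper names `split_pcode`, `is_paurasava_feature` and `census2011_division` are hypothetical and are not used elsewhere in this notebook.

```python
def split_pcode(pcode: str) -> dict:
    """Split an 8-digit union PCODE into its four 2-digit parts."""
    if len(pcode) != 8 or not pcode.isdigit():
        raise ValueError(f"expected 8 digits, got {pcode!r}")
    return {
        "division": pcode[0:2],
        "zila": pcode[2:4],
        "upazila": pcode[4:6],
        "union": pcode[6:8],
    }


def is_paurasava_feature(pcode: str) -> bool:
    """True when the code has the "XXXXXX99" form used for a Paurasava."""
    return split_pcode(pcode)["union"] == "99"


def census2011_division(pcode: str) -> str:
    """Division code as used in the 2011 Census: Mymensingh (45) was Dhaka (30)."""
    division = split_pcode(pcode)["division"]
    return "30" if division == "45" else division
```

For example, a made-up code such as `"45396199"` (Jamalpur district, 4539) would be flagged as a Paurasava feature and would belong to division 30 in the 2011 Census tables.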

In [1]:
import numpy as np
import pandas as pd
import xlsxwriter
import geopandas as gpd
from tabula import read_pdf
import fhv

In [2]:
shape = gpd.read_file('./data/admin_boundary/bgd_admbnda_adm4_bbs_20180410.shp')
shape = shape[shape.columns[[12,11,10,9,8,7,3,2]]].sort_values('ADM4_PCODE').reset_index(drop=True)
ucode_shape = shape[['ADM4_PCODE','ADM4_EN']]
ucode_shape.columns = ['Ucode','Name']

### Load BBS Geocode Union PDFs
We need to import these files in order to identify and classify city corporations and Paurasavas.

In [3]:
if False:
    # Barisal (10)
    df = read_pdf('./data/union/Geocode Union_Barisal2015.pdf',pages='all',multiple_tables=False,
                  pandas_options={'header':0,'skiprows':3})
    df10 = df.drop(df.columns[-1], axis=1).dropna(axis=0, how='all').reset_index(drop=True)
    df10.columns = ['Division','Zila','Upazila','Paurasava','Union','Name']
    # Chittagong (20)
    df = pd.read_excel('./data/union/Geocode Union_Chittagong2015.xlsx',header=0, skiprows=1)
    df = df.dropna(axis=0, how='all').reset_index(drop=True)
    df.columns = ['Division','Zila','Upazila','Paurasava','Union','Name']
    df = df.loc[df['Division'] != 'Division']
    df20 = df.loc[df['Division'] != -1]
    # Dhaka (30)
    df = read_pdf('./data/union/Geocode Union_Dhaka2015.pdf',pages='all',multiple_tables=False,
                  pandas_options={'header':0,'skiprows':2})
    df30 = df.dropna(axis=0, how='all').reset_index(drop=True)
    df30.columns = ['Division','Zila','Upazila','Paurasava','Union','Name']
    # Khulna (40)
    df = read_pdf('./data/union/Geocode Union_Khulna2015.pdf',pages='all',multiple_tables=False,
                  pandas_options={'header':0,'skiprows':2})
    df40 = df.dropna(axis=0, how='all').reset_index(drop=True)
    df40.columns = ['Division','Zila','Upazila','Paurasava','Union','Name']
    # Rajshahi (50)
    df = read_pdf('./data/union/Geocode Union_Rajshahi2015.pdf',pages='all',multiple_tables=False,
                  pandas_options={'header':0,'skiprows':2})
    df50 = df.dropna(axis=0, how='all').reset_index(drop=True)
    df50.columns = ['Division','Zila','Upazila','Paurasava','Union','Name']
    # Rangpur (55)
    df = read_pdf('./data/union/Geocode Union_Rangpur2015.pdf',pages='all',multiple_tables=False,
                  pandas_options={'header':0,'skiprows':2})
    df55 = df.dropna(axis=0, how='all').reset_index(drop=True)
    df55.columns = ['Division','Zila','Upazila','Paurasava','Union','Name']
    # Sylhet (60)
    df = read_pdf('./data/union/Geocode Union_Sylhet2015.pdf',pages='all',multiple_tables=False,
                  pandas_options={'header':0,'skiprows':2})
    df60 = df.dropna(axis=0, how='all').reset_index(drop=True)
    df60.columns = ['Division','Zila','Upazila','Paurasava','Union','Name']
    # Merge
    df = pd.concat([df10, df20, df30, df40, df50, df55, df60])
    # Remove unnecessary rows
    remove = (df['Name'].isna()) | (df['Name'] == 'Name') | (df['Name'] == '(6)')
    df = df[~remove]
    # Split merged codes in Paurasava column
    targ = df['Paurasava'].str.len() == 5
    new = df.loc[targ, 'Paurasava'].str.split(' ',n=1,expand = True)
    df.loc[targ, 'Paurasava'] = new[0]
    df.loc[targ, 'Union'] = new[1]
    gcode = df.reset_index(drop=True)
    # Change dtype to float
    gcode[gcode.columns[:-1]] = gcode[gcode.columns[:-1]].astype(float)
    gcode.to_excel('./data/union/Geocode_temp.xlsx')
else:
    gcode = pd.read_excel('./data/union/Geocode_temp.xlsx',index_col=0)
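
The "split merged codes" step in the cell above can be hard to picture, so here is a small sketch on made-up values. It assumes, as the cell does, that a merged cell holds two 2-digit codes separated by a single space (5 characters in total), with the first belonging to the Paurasava column and the second to the Union column.

```python
import pandas as pd

# Made-up rows: the second row has the Paurasava and Union codes merged into one cell
df = pd.DataFrame({
    "Paurasava": ["12", "34 56", None],
    "Union": ["17", None, "25"],
})

# A merged cell is exactly 5 characters long; missing values compare as False
merged = df["Paurasava"].str.len() == 5
parts = df.loc[merged, "Paurasava"].str.split(" ", n=1, expand=True)

# Put the first code back into Paurasava and the second into Union
df.loc[merged, "Paurasava"] = parts[0]
df.loc[merged, "Union"] = parts[1]
```

Only the rows selected by `merged` are modified; rows that were already clean keep their values.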

### Load Union Statistics (BBS, 2011)
Reads BBS 2011 Census - Union Statistics (union_stats_extracted.xlsx)

In [4]:
# Load extracted (converted from PDF manually by Donghoon)
df = pd.read_excel('./data/union/union_stats_extracted.xlsx',
                   skiprows=0,header=0,skipfooter=0)

### MANUALLY REMOVE ROWS ###
# Remove row of 'Brahmanpara  Paurashava' (this is an unnecessary row)
df = df.drop(df.index[df['Name'] == 'Brahmanpara  Paurashava'].values, axis=0).reset_index(drop=True)
df = df[df['Name'].str.contains("chittagang cnt.",case=False) == False]

# Add a column of Paurasava
df.insert(3, "Paurasava",np.full(df.shape[0], np.nan))
# Create a subset of data
stat = df[df.columns[:6]].copy()
stat[stat.columns[:5]] = stat[stat.columns[:5]].astype('float')

# Assign Paurasava values from Geocode
union_gcode = gcode['Division']*10**6+gcode['Zila']*10**4+gcode['Upazila']*10**2+gcode['Union']
union_stat = stat['Division']*10**6+stat['Zila']*10**4+stat['Upazila']*10**2+stat['Union']
same = np.intersect1d(union_stat,union_gcode, return_indices=True, assume_unique=False)
stat.iloc[same[1],3] = gcode.iloc[same[2],3].values
# ID for unselected unions
stat.loc[stat['Name'].str.contains(' union',na=False),'Paurasava'] = 99
# ID for Paurashava rows
stat = stat.reset_index(drop=True)
ip = stat.index[stat['Name'].str.contains('paurashava',na=False,case=False)].values                                   
stat.iloc[ip,3] = stat.iloc[ip+1,3].values

# Insert 8 digits UCODE
stat['Ucode'] = stat['Division']*10**6+stat['Zila']*10**4+stat['Upazila']*10**2+stat['Union']
stat = stat[stat.columns[[0,1,2,3,4,6,5]]]

### MANUAL CHANGES (ADD/REMOVE) ###
# - Restore missing names in the official documents
stat.loc[stat['Ucode'] == 50882727, 'Name'] = 'KHAS KAULIA'

# Change names (removing total and union in the strings)
targ = stat['Union'].notna()
name = stat.loc[targ,'Name'].reset_index(drop=True)
name = name.str.replace(' Total','')
name = name.str.replace(' union','')
name = name.str.replace(' Union','')
name = name.str.replace(' Unio','')
name = name.str.replace(' Dakshin','')

stat.loc[targ,'Name'] = name.values
stat['Name'] = stat['Name'].str.upper()

### ADD MISSED ROWS ###
def InsertRows(dfstat, head, rows):
    '''Insert rows right below the head row and return the new DataFrame.
    '''
    dfstat = dfstat.copy()
    rows = pd.DataFrame(rows, columns=dfstat.columns)
    # Position of the first row whose Name equals head
    tdx = np.flatnonzero(dfstat['Name'] == head)[0]
    dfstat = pd.concat([dfstat.iloc[:tdx+1], rows, dfstat.iloc[tdx+1:]]).reset_index(drop=True)
    return dfstat

# Under "CHAPITALA"
rows = [(20,19,81,99,27,20198127,'DARORA'),
        (20,19,81,99,31,20198131,'CHHALIAKANDI'),
        (20,19,81,99,36,20198136,'DHAMGHAR')]
stat = InsertRows(stat, 'CHAPITALA', rows)
# Under "BARA BAKIA"
rows = [(20,22,56,99,55,20225655,'UJANTIA')]
stat = InsertRows(stat, 'BARA BAKIA', rows)
# Under "GHARJAN"
rows = [(50,88,27,99,27,50882727,'KHAS KAULIA')]
stat = InsertRows(stat, 'GHARJAN', rows)

# Retouch Code
stat['Ucode'] = stat['Division']*10**6+stat['Zila']*10**4+stat['Upazila']*10**2+stat['Union']
# - Insert ADM3_PCODE TO stat
stat.insert(3, 'ADM3_PCODE', stat['Division']*10**4+stat['Zila']*10**2+stat['Upazila'])

# Subset of Union codes and names
targ = stat['Union'].notna()
ucode_stat = stat[['Ucode','Name']][targ].reset_index(drop=True)

# Save (temporarily)
stat.to_excel('./data/union/stat_temp.xlsx')
# stat.head(50)
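
The code matching in this cell relies on `np.intersect1d(..., return_indices=True)`. A minimal sketch with made-up union codes shows what the three returned arrays contain. Note that when a code is repeated, only the index of its first occurrence is returned, so duplicated codes on either side are matched once.

```python
import numpy as np

# Made-up 8-digit union codes; 10040927 appears twice on the "stat" side
stat_codes = np.array([10040913, 10040927, 10040927, 10040999])
gcode_codes = np.array([10040927, 10040913, 10040955])

# common  : sorted codes present in both arrays
# i_stat  : position of each common code in stat_codes (first occurrence)
# i_gcode : position of each common code in gcode_codes (first occurrence)
common, i_stat, i_gcode = np.intersect1d(
    stat_codes, gcode_codes, return_indices=True
)
```

Here `stat_codes[i_stat]` and `gcode_codes[i_gcode]` are both equal to `common`, which is why values can be copied from one table to the other by position.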


### Load Union Population by 5-Year Age Group
Reads BBS 2011 Census - Union Age5 population data (e.g., age5_Rangpur.xls)
- Some unions have no rows for some 5-year age groups (no population in that group).

In [5]:
# Load each file
df10 = fhv.LoadUnionAge5('./data/union/age5_Barisal.xls')
df20 = fhv.LoadUnionAge5('./data/union/age5_Chittagong_modified.xls')
df30 = fhv.LoadUnionAge5('./data/union/age5_Dhaka_modified.xls')
df40 = fhv.LoadUnionAge5('./data/union/age5_Khulna.xls')
df50 = fhv.LoadUnionAge5('./data/union/age5_Rajshahi_modified.xls')
df55 = fhv.LoadUnionAge5('./data/union/age5_Rangpur.xls')
df60 = fhv.LoadUnionAge5('./data/union/age5_Sylhet_modified.xls')
df = pd.concat([df10, df20, df30, df40, df50, df55, df60])

# Restore names where Union IDs were assigned
targ = df['Union'].astype(str).str.isdigit()
name = df.loc[targ, 'Union']
name_num = np.unique(name)
name_str = ucode_stat.iloc[np.intersect1d(name_num, ucode_stat['Ucode'],return_indices=True)[2],1].values
for i, val in enumerate(name_num):
    name.loc[name == val] = name_str[i]
df.loc[targ, 'Union'] = name

### MANUAL CHANGES (REMOVE/ADD ROWS) ###
df = df[(df['Union'].str.lower() == "chittagang cnt.") == False].reset_index(drop=True)

# Union names, taken from each union's 'Total' row
ucode_age5 = df[df['Age5'] == 'Total']['Union'].reset_index(drop=True)

# "ucode_stat" and "ucode_age5" are cross-validated (by Donghoon)
if False:
    # Retouch (Remove "Union", "Dakshin")
    ucode_age5 = ucode_age5.str.upper()
    ucode_age5 = ucode_age5.str.replace(' UNION','')
    ucode_age5 = ucode_age5.str.replace(' DAKSHIN','')
    ucode_age5 = ucode_age5.str.replace('LAKSHMIPUR MODEL','SAKHUA')
    # Save Ucodes to comparison
    writer = pd.ExcelWriter('./data/union/ucode_comp.xlsx', engine='xlsxwriter')
    ucode_shape.to_excel(writer, sheet_name='Sheet2')
    ucode_stat.to_excel(writer, sheet_name='Sheet1', startcol=0)
    ucode_age5.to_excel(writer, sheet_name='Sheet1', startcol=3)
    writer.close()
    
# Copy and Modify a dataframe
age5 = df.copy()
diff = age5[age5['Age5'] == 'Total'].index.values + 1
leng = np.append(diff[0],diff[1:] - diff[:-1])
swap = ucode_stat['Ucode'].repeat(leng)
age5['Union'] = swap.values
total = age5.pivot(index='Union',columns='Age5',values='Total')
total.index = total.index.astype(int)
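
The relabel-and-pivot step above can be sketched on a tiny made-up table. The sketch assumes, as the cell above appears to, that each union's block of rows ends with its `'Total'` row and that the list of union codes is in the same order as the blocks. Filling missing age groups with 0 is an assumption added here for illustration; the cell above leaves them as NaN.

```python
import numpy as np
import pandas as pd

# Made-up long table: union "B" has no row for the 5-9 group
long = pd.DataFrame({
    "Union": ["A", "A", "A", "B", "B"],
    "Age5": ["0-4", "5-9", "Total", "0-4", "Total"],
    "Total": [10, 5, 15, 7, 7],
})
codes = pd.Series([10040913, 10040927])  # made-up codes, one per block

# End position (exclusive) of each block, then the length of each block
ends = long.index[long["Age5"] == "Total"].values + 1
lengths = np.diff(ends, prepend=0)

# Repeat each code over its block and reshape to one row per union
long["Union"] = codes.repeat(lengths).values
wide = long.pivot(index="Union", columns="Age5", values="Total").fillna(0)
```

Because the codes are assigned purely by position, any union missing from either source shifts every later label, which is why the two name lists need to be cross-validated first.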

### Link Shapefile IDs to Union Statistics
We temporarily change the division codes of unions in Mymensingh (45) to Dhaka (30), link them to the data, and change them back later.

In [86]:
# Control Mymensingh (45) division
# - Mymensingh (45): Sherpur (4589), Jamalpur (4539), Mymensingh (4561), Netrakona (4572)
# Replace only the leading division prefix; a '45' elsewhere in the code must stay unchanged
f45t30 = '30' + shape.loc[shape['ADM1_PCODE'] == '45', 'ADM4_PCODE'].str[2:]
shape.loc[shape['ADM1_PCODE'] == '45', 'ADM4_PCODE'] = f45t30.values

# Create an empty dataframe to be filled with data
group = total.columns
data = pd.DataFrame(index=shape['ADM4_PCODE'],columns=group).reset_index()
data.insert(1, 'Name1', shape['ADM4_EN'])
data.insert(2, 'Name2', np.nan)
data.columns.name = ''
igroup = data.columns.get_indexer(group)

# Copy matched unions with UCODE
match = np.intersect1d(shape['ADM4_PCODE'].astype(int), total.index, return_indices=True)
data.iloc[match[1],igroup] = total.iloc[match[2],:].values
data.iloc[match[1],2] = ucode_stat.iloc[match[2],1].values
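
The division-code round trip (45 to 30 for linking, then back to 45) can be sketched as below. This is a sketch on made-up codes, assuming 8-digit string PCODEs. It rewrites only the first two characters, because a plain substring replacement would also change a `45` that happens to appear in the zila, upazila or union part.

```python
import pandas as pd

# Made-up codes; the first one contains "45" twice
codes = pd.Series(["45394527", "30261234", "45616145"])
is_mymensingh = codes.str[:2] == "45"

# Swap only the 2-character division prefix, leaving the rest untouched
remapped = codes.where(~is_mymensingh, "30" + codes.str[2:])

# Later, restore the original prefix using the same mask
restored = remapped.where(~is_mymensingh, "45" + remapped.str[2:])

# For contrast: substring replacement also alters the upazila part of the first code
naive = codes.str.replace("45", "30", n=2, regex=False)
```

Keeping the boolean mask `is_mymensingh` is what makes the restore step possible, since after remapping the Mymensingh rows are no longer distinguishable from genuine Dhaka rows by code alone.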

In [135]:
### CONTROL PAURASHAVA
# - Paurashava list
plist = data.loc[data['Name2'].isna(), ['ADM4_PCODE','Name1']]
plist.insert(0, 'ADM3_PCODE', plist['ADM4_PCODE'].str[:6].astype(int))
# - ADM3_PCODE List
adm3List = np.unique(plist['ADM3_PCODE'])

# Apply an Algorithm per ADM3 unit
for adm3 in adm3List:
    
# adm3 = 201553 # two
# adm3 = 201395 # one

    # Potential Paurasava(s) in the current ADM3_PCODE
    adm4List = plist[plist['ADM3_PCODE'] == adm3]

    # Paurasava codes ("99" means normal union)
    targ = (stat['ADM3_PCODE'] == adm3) & (stat['Paurasava'].notna()) & (stat['Paurasava'] != 99) 
    pcode = np.unique(stat.loc[targ,'Paurasava'])
    table = stat.loc[targ,['Paurasava','Union','Ucode','Name']]   # Corresponding table

    # Find the row of Paurashava
    go = table['Ucode'].isna()
    if (adm4List.shape[0] == 1) & (go.sum() == 1):
        # (1) Single unmatched union & Single Paurashava in ADM3

        # Select the rows of Wards to be merged to Paurashava
        go_name = table.loc[go, 'Name'].values[0]
        go_ucode = table.loc[~go,'Ucode'].values
        # Insert Name and Merged data to Paurashava
        data.iloc[adm4List.index, 3:] = total[total.index.isin(go_ucode)].sum().values
        data.iloc[adm4List.index, 2] = go_name

    elif (adm4List.shape[0] > 1) & (go.sum() == 1):
        # (2) Multiple unmatched unions & Single Paurashava in ADM3
        if adm4List['Name1'].str.contains('paurashava', case=False).sum() == 1:
            # Only one Paurashava (Ignore others)
            adm4True = adm4List[adm4List['Name1'].str.contains('paurashava', case=False)]

            # Select the rows of Wards to be merged to Paurashava
            go_name = table.loc[go, 'Name'].values[0]
            go_ucode = table.loc[~go,'Ucode'].values
            # Insert Name and Merged data to Paurashava
            data.iloc[adm4True.index, 3:] = total[total.index.isin(go_ucode)].sum().values
            data.iloc[adm4True.index, 2] = go_name

            
            
            
            
# Save
data.to_excel('./data/union/union_merged.xlsx')
print(data['Name2'].isna().sum())

62
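
Case (1) of the loop above, a single unmatched shapefile feature and a single Paurashava in the ADM3 unit, comes down to summing the ward rows into one record. A minimal sketch on made-up numbers, where the row without a `Ucode` is taken to be the Paurashava header and the remaining rows are its wards:

```python
import pandas as pd

# Made-up population table indexed by ward/union code
total = pd.DataFrame(
    {"0-4": [10, 20, 5], "5-9": [8, 16, 4], "Total": [18, 36, 9]},
    index=[10040911, 10040912, 10040927],
)

# Made-up rows of the statistics table for one Paurashava
table = pd.DataFrame({
    "Ucode": [None, 10040911, 10040912],
    "Name": ["EXAMPLE PAURASHAVA", "WARD NO-01", "WARD NO-02"],
})

# The header row has no Ucode; the other rows are the wards to merge
is_header = table["Ucode"].isna()
ward_codes = table.loc[~is_header, "Ucode"].astype(int)
paurashava_name = table.loc[is_header, "Name"].iloc[0]

# Sum the ward populations per age group into one Paurashava record
paurashava_total = total[total.index.isin(ward_codes)].sum()
```

The third row of `total` belongs to an ordinary union and is left out of the sum. In the loop above, the result is written into `data` with `iloc` using index labels as positions, which works only as long as `data` keeps its default 0..n-1 index.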


In [134]:
go.sum()

0

In [66]:
targ = (stat['ADM3_PCODE'] == adm3) & (stat['Paurasava'].notna()) & (stat['Paurasava'] != 99) 
pcode = np.unique(stat.loc[targ,'Paurasava'])
len(pcode)
# table = stat.loc[targ,['Paurasava','Union','Ucode','Name']]
# table

1

In [60]:
go

1225     True
1226    False
1227    False
1228    False
1229    False
1230    False
1231    False
1232    False
1233    False
1234    False
Name: Ucode, dtype: bool

### Type of House and Tenancy

In [30]:
df = pd.read_excel('./data/union/Type of House and Tenancy.xls',
                   skiprows=11,header=0,index_col=0,skipfooter=8)



### Read Disaster-related Statistics (BBS, 2015)

In [None]:
# ADD RERIGION



# Disaster-related Statistics (BBS, 2015)
Table 4: Distribution of household by main source of income and received remittance by division and district, 2014
Table 5: Distribution of main source of lighting and cooking fuel by division and district, 2014.
Table 18: Distribution of annual household income from agricultural products by division and district, 2014.
Table 20: Distribution of annual household income from non-agricultural sector by division and district, 2014.
Table 22: Distribution of annual household income from other source by division and district, 2014.
Table 23: Distribution of Disaster affected times of household by division, 2009-'14.
Table 24: Distribution of affected households by disaster categories by division, 2009-'14.
Table 25: Distribution of affected household and disaster categories by division and district, 2009-'14.
Table 26: Distribution of household number of non working days due to last natural disaster by disaster categories and division, 2009-'14.
Table 27: Distribution of Affected Household got early warning by disaster categories and division, 2009-'14.
Table 28: Distribution of household got early warning by type of media, disaster categories and division, 2009-'14.
Table 29: Distribution of affected area and loss of major crops by type of disaster categories and division, 2009-'14    
Table 30: Distribution of affected area and value of loss and damage of minor crops by type of disaster categories and division, 2009-'14.
Table 31: Distribution of affected area and loss of major crops by division and district, 2009-'14.
Table 32: Distribution of affected area and loss of minor crops by division and district, 2009-'14.
Table 35: Distribution of area and damage value of land by disaster categories and division, 2009-'14.
Table 36: Distribution of area and damage value of land by division and district, 2009-'14.
Table 39: Distribution of population suffering from sickness and injury by sex, disaster categories and division, 2009-'14.
Table 40: Distribution of population suffering from sickness and injury by sex, age group and division, 2009-'14.
Table 41: Distribution of population suffering from sickness and injury by sex, division and district, 2009-'14.  
Table 42: Distribution of number of total children and sick children by division and district, 2009-'14.
Table 48: Distribution of Children did not attend to School Due to Natural Disaster by Division and District, 2009-'14.
Table 51: Distribution of disaster preparedness of household by disaster category and division, 2009-'14.
Table 52: Distribution of disaster preparedness of household by division and district, 2009-'14.
Table 53: Distribution of households having disaster precaution measures according to prior-disaster experience by disaster and division, 2009-'14.
Table 54: Distribution of household preparedness during disaster period until normal situation by disaster and division, 2009-'14.
Table 55: Distribution of household preparedness during disaster period until normal situation by division and district, 2009-'14.
Table 56: Distribution of household taken action (precaution) during disaster period until normal situation by disaster and division, 2009-'14.
Table 57: Distribution of population suffering from disease due to disaster by division and district, 2014.
Table 58: Distribution of population suffering from disease due to natural disaster by sex, age group, division and district, 2014.
Table 59: Distribution of Population Suffering from Disease Due to natural disaster by Type of Disease, Division and District, 2014.    
Table 60: Distribution of household members suffering from disease before disaster by division and district, 2009-'14.
Table 61: Distribution of household members suffering from disease during disaster period by division and district, 2009-'14
Table 62: Distribution of household members suffering from disease post disaster period by division and district, 2009-'14.
Table 63: Distribution of main probable cause of suffering from disease due to disaster by division and district, 2014.    
Table 64: Distribution of source of household drinking water during disaster period by division and district, 2009-'14.
Table 65: Distribution of other use of water (cooking, sewerage, cleanliness etc.) before disaster period by division and district, 2009-'14.
Table 66: Distribution of other use water (cooking, sewerage, cleanliness etc.) during disaster period by division and district, 2009-'14.
Table 67: Distribution of disease status due to insufficient drinking and other use of water supply during/after disaster period by division and district, 2009-'14.    
Table 68: Distribution of cause of main disease due to insufficient drinking and other use of water supply during/after disaster period by division and district, 2009-'14.
Table 71: Distribution of respondent's knowledge and perception about main impact of climate change by division and district, 2014.
Table 73: Distribution of Respondent's knowledge and perception about disaster management by division and district, 2014.
Table 74: Distribution of household received financial/rehabilitation support from government/non-government agency during/post disaster period by division and district, 2009-'14.
Table 75: Distribution of household received financial/rehabilitation support from different organization/ office during/post disaster period by division and district, 2009-'14.
Table 76: Distribution of households received loan from post disaster period by division and district, 2009-'14.
Table A1: Standard error calculate of total income and total damage and loss by division/district.
    
    
    
    