# Assessing BBS data: Provided in WB folder

- This notebook assesses the quality and type of data provided by the Bangladesh Bureau of Statistics (BBS). It covers only the data delivered in the "WB" directory.
- General notes (Jupyter cell-type shortcuts): Esc-R (Raw), Esc-Y (Code), Esc-M (Markdown)

## Getting necessary packages

In [137]:
import os
import glob
import tzlocal
import numpy as np
import pandas as pd
from dbfread import DBF
import matplotlib.pyplot as plt
import rpy2.robjects as robjects
from rpy2.robjects import packages
from rpy2.robjects import pandas2ri 

## Functions

In [182]:
def import_dbfs(path, pattern):
    """
    path: directory where dbf files are stored
    pattern: glob pattern for the file names, e.g. '*.dbf'
    returns dict mapping each file path to a pandas dataframe
    """
    dbf_files = glob.glob(os.path.join(path, pattern))
    output = dict()
    for file in dbf_files:
        df = pd.DataFrame(iter(DBF(file)))
        output[file] = df
    return output


def import_sav(file):
    """
    file: full path to a SPSS file to be imported
    returns pandas dataframe with one column per SPSS variable
    """
    foreign = packages.importr('foreign')
    pandas2ri.activate()
    df = foreign.read_spss(file, reencode=False)
    return pd.DataFrame(dict(zip(df.names, map(list,list(df)))))


def get_names(dbf_data):
    """
    dbf_data: dict of dataframes as returned by import_dbfs
    returns dataframe with one row per (variable, dataset) pair
    """
    variable_names = list()
    tafsil_names = list()
    for name, df in dbf_data.items():
        variable_names = variable_names + list(df.columns)
        tafsil_names = tafsil_names + ([name] * len(df.columns))
    return pd.DataFrame(np.column_stack([variable_names, tafsil_names]), columns = ['variables', 'dataset'])

## Importing data

### Importing DBF files from WB directory

In [8]:
path = 'wb'
pattern = '*.dbf'
dbf_data = import_dbfs(path=path, pattern=pattern)
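
The dictionary returned by `import_dbfs` is keyed by whatever path string `glob` produces (for example `wb/tafsil_2h.dbf`), so later lookups depend on the directory name and the path separator. A minimal sketch of one alternative is to key by the lower-cased file stem instead. `stem_keys` is a hypothetical helper, not part of the notebook:

```python
from pathlib import PureWindowsPath


def stem_keys(paths):
    """Map lower-cased file stems to paths, e.g. 'tafsil_2h' -> 'wb/tafsil_2h.dbf'."""
    keys = {}
    for p in paths:
        # PureWindowsPath accepts both '/' and '\\' as separators, so the
        # stem is the same whichever platform produced the path string.
        stem = PureWindowsPath(p).stem.lower()
        if stem in keys:
            raise ValueError(f"duplicate stem {stem!r}: {p!r} and {keys[stem]!r}")
        keys[stem] = p
    return keys


example = stem_keys(["wb/tafsil_2h.dbf", "wb\\tafsl_10.dbf"])
print(example)
```

With such a mapping, a lookup would read `dbf_data[example['tafsil_2h']]` regardless of where the files live.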

### Importing SAV file from WB directory

In [9]:
spss_data = import_sav(file='wb/IMPS area 2012.sav')

# Assessing data quality, geo and variable availability

- Using hierarchy from BSVS 

In [165]:
spss_data.head()
spss_data[spss_data['zl']==6.0].head()

Unnamed: 0,psu,psu_new,dv,Div_Name,zl,Zila_Name,upz,Upz_Name,un,Un_Name,psa,mza,Mza_Name,village,rmo,ea,hh
17,1.0,18.0,10.0,Barisal,6.0,Barisal,2.0,Agailjhara,13.0,Bagdha,,19.0,Ambala,1.0,1.0,4.0,120.0
18,128.0,19.0,10.0,Barisal,6.0,Barisal,2.0,Agailjhara,15.0,Bakal,,588.0,Manasi Phulasree,1.0,3.0,4.0,101.0
19,380.0,20.0,10.0,Barisal,6.0,Barisal,2.0,Agailjhara,79.0,Rajiher,,132.0,Basunda,1.0,1.0,1.0,119.0
20,,21.0,10.0,Barisal,6.0,Barisal,3.0,Babuganj,27.0,Chandpasha,,165.0,Chandipur,1.0,1.0,1.0,90.0
21,758.0,22.0,10.0,Barisal,6.0,Barisal,3.0,Babuganj,81.0,Rahmatpur,,803.0,Paschim Rahmatpur,1.0,1.0,5.0,120.0


- There are 2012 primary sampling units
- There are 7 divisions; the new divisions are not included, but they can be mapped from the DBF files
- There are 64 zilas
- There are 516 upazilas, or 95 when counting by `upz`
- Unions are counted as either 96 or 1190, depending on how they are counted
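
Counts like the ones above depend on whether a unit is identified by its code alone, by its name, or by its code together with its parent unit. If, as is common for nested geocodes, a code such as `upz` is only unique within its parent `zl`, counting the bare code undercounts. A small sketch on invented rows (only the column names `zl`, `upz` and `Upz_Name` come from the SPSS file; the values are made up for illustration):

```python
import pandas as pd

# Invented rows: two zilas that both use upazila code 2.
toy = pd.DataFrame({
    "zl": [6, 6, 6, 9],
    "upz": [2, 2, 3, 2],
    "Upz_Name": ["Agailjhara", "Agailjhara", "Babuganj", "Placeholder"],
})

by_code = toy["upz"].nunique()                        # bare code
by_name = toy["Upz_Name"].nunique()                   # name only
by_pair = len(toy[["zl", "upz"]].drop_duplicates())   # code within parent zila

print(by_code, by_name, by_pair)
```

Counting by name has its own pitfall, since spelling variants of the same place inflate the count, so the composite key is usually the safer of the three.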

In [11]:
len(spss_data['Mza_Name'].unique())
#spss_data[['Upz_Name', 'upz']].groupby(['Upz_Name', 'upz']).count()

1840

### Household card - h

In [117]:
print(f"Number of unique DIVISION: {len(dbf_data['wb/tafsil_2h.dbf']['DIV'].unique())}")
print(f"Number of unique NEW DIVISIONS: {len(dbf_data['wb/tafsil_2h.dbf']['DIVNEW'].unique())}")
print(f"Number of unique ZILAS: {len(dbf_data['wb/tafsil_2h.dbf']['ZILA'].unique())}")
print(f"Number of unique UPAZILAS: {len(dbf_data['wb/tafsil_2h.dbf']['UPZA'].unique())}")
print(f"Number of unique UNIONS: {len(dbf_data['wb/tafsil_2h.dbf']['UNION'].unique())}")
print(f"Number of unique MAUZA: {len(dbf_data['wb/tafsil_2h.dbf']['MAUZA'].unique())}")
print(f"Shape of wb/tafsil_2h.dbf data frame {dbf_data['wb/tafsil_2h.dbf'].shape}")
print(f"Number of unique PSU_NO {len(dbf_data['wb/tafsil_2h.dbf']['PSU_NO'].unique())}")
print(f"Number of unique UPZA {len(dbf_data['wb/tafsil_2h.dbf']['UPZA'].unique())}")
dbf_data['wb/tafsil_2h.dbf'].head(5)

Number of unique DIVISION: 7
Number of unique NEW DIVISIONS: 8
Number of unique ZILAS: 64
Number of unique UPAZILAS: 96
Number of unique UNIONS: 97
Number of unique MAUZA: 835
Shape of wb/tafsil_2h.dbf data frame (22087, 32)
Number of unique PSU_NO 2012
Number of unique UPZA 96


Unnamed: 0,D_R,ID,PSU_NO,DIV,DIVNEW,ZILA,UPZA,UNION,MAUZA,RMO,...,Q1_6N,Q1_6A,Q2_1,Q2_2,Q_3,Q_4,Q_5,Q_6,Q_7,WGT
0,,28.0,1.0,10.0,10.0,4,9.0,7.0,250.0,2,...,0.0,0.0,2.0,2.0,3.0,1.0,9.0,1.0,4.0,0.226689
1,,28.0,1.0,10.0,10.0,4,9.0,7.0,250.0,2,...,0.0,0.0,2.0,1.0,1.0,1.0,9.0,3.0,4.0,0.226689
2,,28.0,1.0,10.0,10.0,4,9.0,7.0,250.0,2,...,0.0,0.0,2.0,1.0,3.0,1.0,9.0,1.0,4.0,0.226689
3,,28.0,1.0,10.0,10.0,4,9.0,7.0,250.0,2,...,0.0,0.0,2.0,4.0,5.0,1.0,1.0,1.0,3.0,0.226689
4,,28.0,1.0,10.0,10.0,4,9.0,7.0,250.0,2,...,0.0,0.0,2.0,2.0,5.0,1.0,9.0,1.0,2.0,0.226689


### Household card - P

In [118]:
print(f"Number of unique DIVISION: {len(dbf_data['wb/tafsil_2p.dbf']['DIV'].unique())}")
print(f"Number of unique NEW DIVISIONS: {len(dbf_data['wb/tafsil_2p.dbf']['DIVNEW'].unique())}")
print(f"Number of unique ZILAS: {len(dbf_data['wb/tafsil_2p.dbf']['ZILA'].unique())}")
print(f"Number of unique UPAZILAS: {len(dbf_data['wb/tafsil_2p.dbf']['UPZA'].unique())}")
print(f"Number of unique UNIONS: {len(dbf_data['wb/tafsil_2p.dbf']['UNION'].unique())}")
print(f"Number of unique MAUZA: {len(dbf_data['wb/tafsil_2p.dbf']['MAUZA'].unique())}")
print(f"Shape of wb/tafsil_2p.dbf data frame {dbf_data['wb/tafsil_2p.dbf'].shape}")
print(f"Number of unique PSU_NO {len(dbf_data['wb/tafsil_2p.dbf']['PSU_NO'].unique())}")
print(f"Number of unique UPZA {len(dbf_data['wb/tafsil_2p.dbf']['UPZA'].unique())}")
dbf_data['wb/tafsil_2p.dbf'].head(5)

Number of unique DIVISION: 7
Number of unique NEW DIVISIONS: 8
Number of unique ZILAS: 64
Number of unique UPAZILAS: 96
Number of unique UNIONS: 97
Number of unique MAUZA: 834
Shape of wb/tafsil_2p.dbf data frame (95791, 30)
Number of unique PSU_NO 2012
Number of unique UPZA 96


Unnamed: 0,D_R,ID,PSU_NO,DIV,DIVNEW,ZILA,DIST,UPZA,UNION,MAUZA,...,DUP,Q_15,Q_16,Q_17,Q_18,Q_19,Q_20,Q_21,Q_22,PPWEIGHT
0,,28.0,1.0,10.0,10.0,4,4.0,9.0,7.0,250.0,...,,0.0,3.0,1.0,2.0,1.0,2.0,0.0,24.0,0.2354
1,,28.0,1.0,10.0,10.0,4,4.0,9.0,7.0,250.0,...,,22.0,4.0,2.0,3.0,1.0,1.0,1.0,17.0,0.2354
2,,28.0,1.0,10.0,10.0,4,4.0,9.0,7.0,250.0,...,,36.0,10.0,2.0,3.0,1.0,1.0,1.0,17.0,0.2354
3,,28.0,1.0,10.0,10.0,4,4.0,9.0,7.0,250.0,...,,23.0,9.0,2.0,3.0,1.0,1.0,1.0,17.0,0.2354
4,,28.0,1.0,10.0,10.0,4,4.0,9.0,7.0,250.0,...,,0.0,0.0,2.0,3.0,2.0,2.0,0.0,0.0,0.2354


In [181]:
df_pop = (dbf_data['wb/tafsil_2p.dbf']).copy()
# TOT_POP divided by the person weight PPWEIGHT, stored as the "unweighted" total
df_pop['TOT_POP_UNW'] = df_pop['TOT_POP']/df_pop['PPWEIGHT']
# Sum by zila, then express each zila as a share of the overall total
stratified_sample = df_pop[['TOT_POP_UNW', 'ZILA']].groupby('ZILA').sum()
stratified_sample['TOTAL']= stratified_sample['TOT_POP_UNW'].sum()
stratified_sample['PERCENT']=stratified_sample.TOT_POP_UNW/stratified_sample.TOTAL
stratified_sample.head()

ZILA,TOT_POP_UNW,TOTAL,PERCENT
1,1292.973591,224645.089218,0.005756
3,943.473059,224645.089218,0.0042
4,1818.454492,224645.089218,0.008095
6,40501.916992,224645.089218,0.180293
9,4580.673805,224645.089218,0.020391


In [164]:
plt.hist(df_pop['TOT_POP'],bins=100)

stratified_sample.head()

ZILA,TOT_POP,TOTAL,PERCENT
1,1040.0,98164.0,1.059452
3,478.0,98164.0,0.48694
4,827.0,98164.0,0.842468
6,4928.0,98164.0,5.02017
9,1871.0,98164.0,1.905994


### Birth

In [119]:
print(f"Number of unique DISTRICTS: {len(dbf_data['wb/tafsil_3.dbf']['DIST'].unique())}")
print(f"Number of unique ZILAS: {len(dbf_data['wb/tafsil_3.dbf']['ZILA'].unique())}")
print(f"Number of unique UPAZILAS: {len(dbf_data['wb/tafsil_3.dbf']['UPZA'].unique())}")
print(f"Number of unique UNIONS: {len(dbf_data['wb/tafsil_3.dbf']['UNION'].unique())}")
print(f"Number of unique MAUZA: {len(dbf_data['wb/tafsil_3.dbf']['MAUZA'].unique())}")
print(f"Shape of wb/tafsil_3.dbf data frame {dbf_data['wb/tafsil_3.dbf'].shape}")
print(f"Number of unique PSU_NO {len(dbf_data['wb/tafsil_3.dbf']['PSU_NO'].unique())}")
print(f"Number of unique UPZA {len(dbf_data['wb/tafsil_3.dbf']['UPZA'].unique())}")
dbf_data['wb/tafsil_3.dbf'].head(5)

Number of unique DISTRICTS: 63
Number of unique ZILAS: 63
Number of unique UPAZILAS: 79
Number of unique UNIONS: 70
Number of unique MAUZA: 163
Shape of wb/tafsil_3.dbf data frame (179, 32)
Number of unique PSU_NO 179
Number of unique UPZA 79


Unnamed: 0,D_R,PSU_NO,ZILA,DIST,UPZA,UNION,MAUZA,NEW,RMO,AREA,...,Q_8,Q_9,Q_10,Q_11,Q_12,Q_13,Q_14,Q_15,Q_16,Q_17
0,,17.0,4,4.0,85.0,59.0,459.0,0.0,1,1.0,...,1.0,1.0,1.0,19.0,19.0,11.0,1.0,1.0,1.0,1.0
1,,28.0,6,6.0,10.0,21.0,374.0,0.0,1,1.0,...,1.0,1.0,1.0,29.0,29.0,5.0,1.0,2.0,2.0,1.0
2,,36.0,6,6.0,36.0,67.0,728.0,88.0,1,1.0,...,1.0,1.0,1.0,20.0,20.0,5.0,1.0,1.0,1.0,1.0
3,,47.0,6,6.0,51.0,5.0,295.0,88.0,2,2.0,...,1.0,1.0,1.0,32.0,32.0,5.0,1.0,2.0,2.0,1.0
4,,60.0,6,6.0,51.0,10.0,983.0,0.0,2,2.0,...,1.0,1.0,1.0,24.0,24.0,12.0,1.0,1.0,1.0,1.0


### Death

In [120]:
print(f"Number of unique ZILAS: {len(dbf_data['wb/tafsil_4.dbf']['ZILA'].unique())}")
print(f"Number of unique UPAZILAS: {len(dbf_data['wb/tafsil_4.dbf']['UPZA'].unique())}")
print(f"Number of unique UNIONS: {len(dbf_data['wb/tafsil_4.dbf']['UNION'].unique())}")
print(f"Number of unique MAUZA: {len(dbf_data['wb/tafsil_4.dbf']['MAUZA'].unique())}")
print(f"Shape of wb/tafsil_4.dbf data frame {dbf_data['wb/tafsil_4.dbf'].shape}")
print(f"Number of unique PSU_NO {len(dbf_data['wb/tafsil_4.dbf']['PSU_NO'].unique())}")
print(f"Number of unique UPZA {len(dbf_data['wb/tafsil_4.dbf']['UPZA'].unique())}")
dbf_data['wb/tafsil_4.dbf'].head(5)

Number of unique ZILAS: 64
Number of unique UPAZILAS: 91
Number of unique UNIONS: 89
Number of unique MAUZA: 375
Shape of wb/tafsil_4.dbf data frame (483, 24)
Number of unique PSU_NO 483
Number of unique UPZA 91


Unnamed: 0,D_R,PSU_NO,ZILA,UPZA,UNION,MAUZA,RMO,AREA,HH_NO,MOTH_LIN,...,Q_4,Q_5,Q_6D,Q_6M,Q_6Y,Q_7,Q_8D,Q_8M,Q_8Y,Q_9
0,,7.0,4,28.0,1.0,773.0,2,2.0,65.0,1.0,...,4.0,18.0,2.0,5.0,16.0,2.0,0.0,0.0,0.0,1.0
1,,10.0,4,28.0,57.0,281.0,1,1.0,63.0,1.0,...,1.0,23.0,20.0,1.0,16.0,2.0,0.0,0.0,0.0,1.0
2,,16.0,4,85.0,35.0,357.0,1,1.0,17.0,6.0,...,1.0,55.0,17.0,4.0,16.0,2.0,0.0,0.0,0.0,1.0
3,,19.0,6,2.0,15.0,588.0,1,1.0,33.0,4.0,...,9.0,58.0,15.0,1.0,16.0,2.0,0.0,0.0,0.0,1.0
4,,21.0,6,3.0,27.0,165.0,1,1.0,85.0,1.0,...,1.0,27.0,20.0,1.0,16.0,2.0,0.0,0.0,0.0,1.0


### Marriage

In [121]:
print(f"Number of unique ZILAS: {len(dbf_data['wb/tafsil_5.dbf']['ZILA'].unique())}")
print(f"Number of unique UPAZILAS: {len(dbf_data['wb/tafsil_5.dbf']['UPZA'].unique())}")
print(f"Number of unique UNIONS: {len(dbf_data['wb/tafsil_5.dbf']['UNION'].unique())}")
print(f"Number of unique MAUZA: {len(dbf_data['wb/tafsil_5.dbf']['MAUZA'].unique())}")
print(f"Shape of wb/tafsil_5.dbf data frame {dbf_data['wb/tafsil_5.dbf'].shape}")
print(f"Number of unique PSU_NO {len(dbf_data['wb/tafsil_5.dbf']['PSU_NO'].unique())}")
print(f"Number of unique UPZA {len(dbf_data['wb/tafsil_5.dbf']['UPZA'].unique())}")
dbf_data['wb/tafsil_5.dbf'].head(5)

Number of unique ZILAS: 64
Number of unique UPAZILAS: 95
Number of unique UNIONS: 93
Number of unique MAUZA: 680
Shape of wb/tafsil_5.dbf data frame (1367, 17)
Number of unique PSU_NO 1224
Number of unique UPZA 95


Unnamed: 0,D_R,PSU_NO,ZILA,UPZA,UNION,MAUZA,RMO,HH_NO,MOTH_LIN,Q_1,Q_2,Q_4,Q_5,Q_6,Q_7,Q_8,Q_9
0,,4,4,9,94,98,1,21,6,2.0,1916-04-01,18.0,1.0,1.0,10.0,27.0,1.0
1,,5,4,19,71,535,1,4,5,2.0,1916-06-17,19.0,1.0,1.0,11.0,24.0,1.0
2,,9,4,28,28,469,1,110,3,1.0,1916-09-16,20.0,1.0,1.0,8.0,24.0,1.0
3,,10,4,28,57,281,1,56,2,2.0,1916-11-16,16.0,1.0,1.0,7.0,24.0,2.0
4,,11,4,28,85,919,1,83,6,2.0,1916-06-17,17.0,1.0,1.0,10.0,24.0,1.0


### Divorce/Separation

In [122]:
print(f"Number of unique ZILAS: {len(dbf_data['wb/tafsil_6.dbf']['ZILA'].unique())}")
print(f"Number of unique UPAZILAS: {len(dbf_data['wb/tafsil_6.dbf']['UPZA'].unique())}")
print(f"Number of unique UNIONS: {len(dbf_data['wb/tafsil_6.dbf']['UNION'].unique())}")
print(f"Number of unique MAUZA: {len(dbf_data['wb/tafsil_6.dbf']['MAUZA'].unique())}")
print(f"Shape of wb/tafsil_6.dbf data frame {dbf_data['wb/tafsil_6.dbf'].shape}")
print(f"Number of unique PSU_NO {len(dbf_data['wb/tafsil_6.dbf']['PSU_NO'].unique())}")
print(f"Number of unique UPZA {len(dbf_data['wb/tafsil_6.dbf']['UPZA'].unique())}")
dbf_data['wb/tafsil_6.dbf'].head(5)

Number of unique ZILAS: 54
Number of unique UPAZILAS: 68
Number of unique UNIONS: 65
Number of unique MAUZA: 123
Shape of wb/tafsil_6.dbf data frame (138, 22)
Number of unique PSU_NO 137
Number of unique UPZA 68


Unnamed: 0,D_R,PSU_NO,ZILA,UPZA,UNION,MAUZA,RMO,HH_NO,LIN_NO,Q_1,...,Q_4,Q_5,Q_6,Q_7,Q8_1,Q8_2,Q8_3,Q9_1,Q9_2,Q9_3
0,,16,4,85,35,357,1,23,6,1.0,...,1.0,11.0,10.0,2.0,17.0,0.0,0.0,6.0,0.0,0.0
1,,32,6,32,6,950,2,11,5,1.0,...,1.0,10.0,8.0,2.0,16.0,0.0,0.0,2.0,0.0,0.0
2,,39,6,51,2,89,2,95,6,1.0,...,1.0,4.0,11.0,1.0,21.0,27.0,0.0,4.0,0.0,0.0
3,,57,6,51,9,530,2,72,1,1.0,...,2.0,9.0,11.0,2.0,21.0,0.0,0.0,10.0,0.0,0.0
4,,84,6,51,24,649,2,79,6,1.0,...,1.0,0.0,10.0,1.0,14.0,15.0,0.0,1.0,0.0,0.0


### Out-migration

In [123]:
print(f"Number of unique ZILAS: {len(dbf_data['wb/tafsil_7.dbf']['ZILA'].unique())}")
print(f"Number of unique UPAZILAS: {len(dbf_data['wb/tafsil_7.dbf']['UPZA'].unique())}")
print(f"Number of unique UNIONS: {len(dbf_data['wb/tafsil_7.dbf']['UNION'].unique())}")
print(f"Number of unique MAUZA: {len(dbf_data['wb/tafsil_7.dbf']['MAUZA'].unique())}")
print(f"Shape of wb/tafsil_7.dbf data frame {dbf_data['wb/tafsil_7.dbf'].shape}")
print(f"Number of unique PSU_NO {len(dbf_data['wb/tafsil_7.dbf']['PSU_NO'].unique())}")
print(f"Number of unique UPZA {len(dbf_data['wb/tafsil_7.dbf']['UPZA'].unique())}")
dbf_data['wb/tafsil_7.dbf'].head(5)

Number of unique ZILAS: 64
Number of unique UPAZILAS: 95
Number of unique UNIONS: 95
Number of unique MAUZA: 806
Shape of wb/tafsil_7.dbf data frame (7858, 17)
Number of unique PSU_NO 1838
Number of unique UPZA 95


Unnamed: 0,D_R,PSU_NO,ZILA,UPZA,UNION,MAUZA,RMO,HH_NO,LINE_NO,Q_2,Q_3,Q_4,Q_5,Q_6,Q_7M,Q_7Y,Q_8
0,,1,4,4,7,250,2,31,1,1.0,36.0,2.0,28.0,5.0,3.0,16.0,1.0
1,,2,4,9,39,85,1,6,1,1.0,34.0,2.0,4.0,5.0,10.0,16.0,1.0
2,,2,4,9,39,85,1,22,5,2.0,50.0,2.0,4.0,8.0,10.0,16.0,1.0
3,,2,4,9,39,85,1,41,5,2.0,22.0,2.0,4.0,2.0,10.0,16.0,1.0
4,,3,4,9,71,319,1,37,4,2.0,8.0,3.0,40.0,8.0,4.0,16.0,1.0


### In-migration

In [124]:
print(f"Number of unique ZILAS: {len(dbf_data['wb/tafsil_8.dbf']['ZILA'].unique())}")
print(f"Number of unique UPAZILAS: {len(dbf_data['wb/tafsil_8.dbf']['UPZA'].unique())}")
print(f"Number of unique UNIONS: {len(dbf_data['wb/tafsil_8.dbf']['UNION'].unique())}")
print(f"Number of unique MAUZA: {len(dbf_data['wb/tafsil_8.dbf']['MAUZA'].unique())}")
print(f"Shape of wb/tafsil_8.dbf data frame {dbf_data['wb/tafsil_8.dbf'].shape}")
print(f"Number of unique PSU_NO {len(dbf_data['wb/tafsil_8.dbf']['PSU_NO'].unique())}")
print(f"Number of unique UPZA {len(dbf_data['wb/tafsil_8.dbf']['UPZA'].unique())}")
dbf_data['wb/tafsil_8.dbf'].head(5)

Number of unique ZILAS: 64
Number of unique UPAZILAS: 95
Number of unique UNIONS: 96
Number of unique MAUZA: 785
Shape of wb/tafsil_8.dbf data frame (7483, 17)
Number of unique PSU_NO 1736
Number of unique UPZA 95


Unnamed: 0,D_R,PSU_NO,ZILA,UPZA,UNION,MAUZA,RMO,HH_NO,LINE_NO,Q_2,Q_3,Q_4,Q_5,Q_6,Q_7M,Q_7Y,Q_8
0,,1,4,9,7,250,2,31,2,2.0,28.0,8.0,2.0,25.0,4.0,16.0,1.0
1,,1,4,9,7,250,2,80,2,2.0,20.0,8.0,2.0,11.0,7.0,16.0,1.0
2,,2,4,9,39,85,1,15,2,2.0,21.0,8.0,1.0,4.0,10.0,16.0,1.0
3,,2,4,9,39,85,1,66,3,1.0,8.0,8.0,1.0,4.0,12.0,16.0,1.0
4,,2,4,9,39,85,1,127,3,2.0,4.0,8.0,1.0,99.0,11.0,16.0,2.0


### Contraceptive

In [126]:
print(f"Number of unique ZILAS: {len(dbf_data['wb/tafsil_9.dbf']['ZILA'].unique())}")
print(f"Number of unique UPAZILAS: {len(dbf_data['wb/tafsil_9.dbf']['UPZ'].unique())}")
print(f"Number of unique UNIONS: {len(dbf_data['wb/tafsil_9.dbf']['UNION'].unique())}")
print(f"Number of unique MAUZA: {len(dbf_data['wb/tafsil_9.dbf']['MAUZA'].unique())}")
print(f"Shape of wb/tafsil_9.dbf data frame {dbf_data['wb/tafsil_9.dbf'].shape}")
print(f"Number of unique PSU_NO {len(dbf_data['wb/tafsil_9.dbf']['PSU_NO'].unique())}")
print(f"Number of unique UPZA {len(dbf_data['wb/tafsil_9.dbf']['UPZ'].unique())}")
dbf_data['wb/tafsil_9.dbf'].head(5)

Number of unique ZILAS: 64
Number of unique UPAZILAS: 95
Number of unique UNIONS: 95
Number of unique MAUZA: 833
Shape of wb/tafsil_9.dbf data frame (19893, 25)
Number of unique PSU_NO 1978
Number of unique UPZA 95


Unnamed: 0,D_R,PSU_NO,DIV,ZILA,UPZ,UNION,MAUZA,RMO,HH_NO,LINE,...,Q10,Q11,Q12,Q13_1,Q13_2,Q13_3,Q14,Q15,Q16,Q17
0,,1.0,10.0,4.0,9.0,7.0,250.0,2,64.0,1.0,...,13.0,16.0,1.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0
1,,6.0,10.0,4.0,28.0,1.0,773.0,2,115.0,1.0,...,10.0,25.0,2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0
2,,10.0,10.0,4.0,28.0,57.0,281.0,1,35.0,9.0,...,8.0,27.0,2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0
3,,10.0,10.0,4.0,28.0,57.0,281.0,1,157.0,1.0,...,2.0,25.0,2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0
4,,12.0,10.0,4.0,47.0,7.0,151.0,2,6.0,1.0,...,5.0,25.0,1.0,2.0,0.0,0.0,2.0,0.0,0.0,0.0


### Disability

In [127]:
print(f"Number of unique ZILAS: {len(dbf_data['wb/tafsl_10.dbf']['ZILA'].unique())}")
print(f"Number of unique UPAZILAS: {len(dbf_data['wb/tafsl_10.dbf']['UPZA'].unique())}")
print(f"Number of unique UNIONS: {len(dbf_data['wb/tafsl_10.dbf']['UNION'].unique())}")
print(f"Number of unique MAUZA: {len(dbf_data['wb/tafsl_10.dbf']['MAUZA'].unique())}")
print(f"Shape of wb/tafsl_10.dbf data frame {dbf_data['wb/tafsl_10.dbf'].shape}")
print(f"Number of unique PSU_NO {len(dbf_data['wb/tafsl_10.dbf']['PSU_NO'].unique())}")
print(f"Number of unique UPZA {len(dbf_data['wb/tafsl_10.dbf']['UPZA'].unique())}")
dbf_data['wb/tafsl_10.dbf'].head(5)

Number of unique ZILAS: 64
Number of unique UPAZILAS: 94
Number of unique UNIONS: 93
Number of unique MAUZA: 534
Shape of wb/tafsl_10.dbf data frame (866, 16)
Number of unique PSU_NO 804
Number of unique UPZA 94


Unnamed: 0,D_R,PSU_NO,ZILA,UPZA,UNION,MAUZA,RMO,HH_NO,LIN_NO,Q_2,Q_3,Q_4Y,Q_4M,Q_5,Q_6,Q_7
0,,8,4,28,6,331,2,48,5,1.0,17.0,17.0,6.0,5.0,1.0,1.0
1,,10,4,28,57,281,1,68,4,1.0,13.0,13.0,1.0,6.0,1.0,1.0
2,,15,4,85,11,739,1,47,6,2.0,81.0,0.0,8.0,5.0,1.0,4.0
3,,17,4,85,59,459,1,90,1,1.0,53.0,43.0,2.0,1.0,2.0,2.0
4,,21,6,3,27,165,1,35,3,1.0,5.0,5.0,0.0,5.0,2.0,1.0


### HIV/AIDS

In [128]:
print(f"Number of unique ZILAS: {len(dbf_data['wb/tafsl_11.dbf']['ZILA'].unique())}")
print(f"Number of unique UPAZILAS: {len(dbf_data['wb/tafsl_11.dbf']['UPZA'].unique())}")
print(f"Number of unique UNIONS: {len(dbf_data['wb/tafsl_11.dbf']['UNION'].unique())}")
print(f"Number of unique MAUZA: {len(dbf_data['wb/tafsl_11.dbf']['MAUZA'].unique())}")
print(f"Shape of wb/tafsl_11.dbf data frame {dbf_data['wb/tafsl_11.dbf'].shape}")
print(f"Number of unique PSU_NO {len(dbf_data['wb/tafsl_11.dbf']['PSU_NO'].unique())}")
print(f"Number of unique UPZA {len(dbf_data['wb/tafsl_11.dbf']['UPZA'].unique())}")
dbf_data['wb/tafsl_11.dbf'].head(5)

Number of unique ZILAS: 64
Number of unique UPAZILAS: 96
Number of unique UNIONS: 96
Number of unique MAUZA: 837
Shape of wb/tafsl_11.dbf data frame (26308, 13)
Number of unique PSU_NO 1996
Number of unique UPZA 96


Unnamed: 0,PSU_NO,ZILA,UPZA,UNION,MAUZA,RMO,HH_NO,LINE_NO,Q_2,Q_3,Q_4_1,Q_4_2,Q_4_3
0,1.0,4,9,7,250,2,10.0,6,21.0,1.0,1.0,1.0,1.0
1,1.0,4,9,7,250,2,20.0,2,20.0,1.0,1.0,8.0,1.0
2,1.0,4,9,7,250,2,26.0,2,39.0,1.0,1.0,1.0,8.0
3,1.0,4,9,7,250,2,34.0,4,24.0,1.0,1.0,1.0,1.0
4,1.0,4,9,7,250,2,42.0,2,44.0,1.0,1.0,8.0,8.0


## Observations:

- The provided SVRS files are named tafsil_2h.dbf, tafsil_2p.dbf, tafsil_3.dbf to tafsil_9.dbf, tafsl_10.dbf and tafsl_11.dbf
- We assume that these files correspond to the following naming:
    - Tafsil-1: Household listing
    - Tafsil-2: Household card
    - Tafsil-3: Birth
    - Tafsil-4: Death
    - Tafsil-5: Marriage
    - Tafsil-6: Divorce/Separation
    - Tafsil-7: Out-migration
    - Tafsil-8: In-migration
    - Tafsil-9: Contraceptive
    - Tafsil-10: Disability
    - Tafsil-11: HIV/AIDS
- We did not receive tafsil_1, which should contain the household listing. The Disability and HIV/AIDS data are in tafsl_10.dbf and tafsl_11.dbf; note the different spelling of these two file names
- What we still need to establish is exactly which variables each of these datasets contains:


In [90]:
data = get_names(dbf_data)
data.to_csv('variables_datasets_SVRS.csv')
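
The long table written above has one row per (variable, dataset) pair. To see at a glance which variables are shared across files, it could be pivoted into a presence matrix. A minimal sketch using `pd.crosstab` on an invented stand-in for the output of `get_names` (the dataset labels and rows are made up for illustration):

```python
import pandas as pd

# Invented stand-in for get_names(dbf_data): one row per (variable, dataset).
names = pd.DataFrame({
    "variables": ["PSU_NO", "ZILA", "UPZA", "PSU_NO", "ZILA", "UPZ"],
    "dataset": ["tafsil_8", "tafsil_8", "tafsil_8", "tafsil_9", "tafsil_9", "tafsil_9"],
})

# Rows are variables, columns are datasets, cells count occurrences (0 or 1).
presence = pd.crosstab(names["variables"], names["dataset"])

# Variables that appear in every dataset are candidate merge keys.
shared = presence.index[(presence > 0).all(axis=1)].tolist()
print(presence)
print(shared)
```

Variables that appear under slightly different names, such as `UPZ` and `UPZA`, show up as separate rows and would need to be harmonised by hand before merging.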