# Recommending Assistance to the Needy Based on Coping Strategies

**Objective:** Predict the likelihood that a respondent selects a given coping strategy, or set of coping strategies, based on the shocks the respondent has faced, and recommend appropriate assistance to support early planning and decision-making.

## Setting up antigranular

In [30]:
!pip install antigranular



In [31]:
import antigranular as ag
session = ag.login("Iu++I502WDr1co4pPxwlHUgdHYfdsXw7", "Nm8hkS5PoyVTtaWS2GOv7CPcS9PFYR97xTm2RhDNSNV7bntFAfCj4r3iA0sqgPci", competition = "UN Datathon PETs Track")

Connected to Antigranular server session id: 4c92c332-052f-4c41-a42d-8f07d65dbc9d, the session will time out if idle for 60 minutes
Cell magic '%%ag' registered successfully, use `%%ag` in a notebook cell to execute your python code on Antigranular private python server
🚀 Everything's set up and ready to roll!


### Privacy Odometer

In [32]:
session.privacy_odometer()

{'total_epsilon_used': 0.011999999999999985,
 'total_delta_used': 0.0,
 'library_costs': {'op_pandas': {'total_delta': 0,
   'total_epsilon': 0.011999999999999985,
   'total_requests': 7}},
 'dataset_costs': {'undata_cs': {'delta': 0, 'eps': 0.011999999999999985}}}

## Loading shock data

In [34]:
%%ag
shock_data = load_dataset("undata_shock")
ag_print(shock_data.columns)
ag_print(shock_data.metadata)
ag_print(shock_data.dtypes)

Index(['objectid', 'shock_noshock', 'shock_sicknessordeathofhh',
       'shock_lostemplorwork', 'shock_otherintrahhshock',
       'shock_higherfoodprices', 'shock_higherfuelprices', 'shock_mvtrestrict',
       'shock_othereconomicshock', 'shock_pestoutbreak', 'shock_plantdisease',
       'shock_animaldisease', 'shock_napasture', 'shock_othercropandlivests',
       'shock_coldtemporhail', 'shock_flood', 'shock_hurricane',
       'shock_drought', 'shock_earthquake', 'shock_landslides',
       'shock_firenatural', 'shock_othernathazard', 'shock_violenceinsecconf',
       'shock_theftofprodassets', 'shock_firemanmade',
       'shock_othermanmadehazard', 'shock_dk', 'shock_ref'],
      dtype='object')
{'objectid': (3496, 239871), 'shock_noshock': (0.0, 1.0), 'shock_sicknessordeathofhh': (0.0, 1.0), 'shock_lostemplorwork': (0.0, 1.0), 'shock_otherintrahhshock': (0.0, 1.0), 'shock_higherfoodprices': (0.0, 1.0), 'shock_higherfuelprices': (0.0, 1.0), 'shock_mvtrestrict': (0.0, 1.0), 'shock_othe

## Verifying multiple shocks can happen at the same time

- We assume that multiple shocks can occur simultaneously for a household. Before moving forward, however, we need to verify that this actually happens in the data.

- To do so, we compute the row-wise sum of the shock indicators; a sum greater than 1 means that the respondent's household experienced multiple shocks.
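The row-wise check described above can be sketched in plain pandas on a toy table. This is only an illustration: the values are made up, and the real computation runs through Antigranular's private `op_pandas` with an `eps` argument rather than on a local DataFrame.

```python
import pandas as pd

# Toy stand-in for the private shock table (values are made up).
toy = pd.DataFrame({
    "shock_flood": [1, 0, 1, 0],
    "shock_drought": [1, 0, 0, 0],
    "shock_theftofprodassets": [0, 0, 1, 1],
})

# Row-wise sum gives the number of shocks reported by each household.
shocks_per_household = toy.sum(axis=1)

# Count the households that reported more than one shock.
multi_shock_households = int((shocks_per_household > 1).sum())
print(multi_shock_households)
```

A non-zero count on the real data is what justifies treating shocks as a multi-valued feature set rather than a single category.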

In [35]:
%%ag
# Keep only the binary shock indicator columns
columns = list(shock_data.columns[2:-2])
shock_only = shock_data[columns]
ag_print(shock_only.columns)

Index(['shock_sicknessordeathofhh', 'shock_lostemplorwork',
       'shock_otherintrahhshock', 'shock_higherfoodprices',
       'shock_higherfuelprices', 'shock_mvtrestrict',
       'shock_othereconomicshock', 'shock_pestoutbreak', 'shock_plantdisease',
       'shock_animaldisease', 'shock_napasture', 'shock_othercropandlivests',
       'shock_coldtemporhail', 'shock_flood', 'shock_hurricane',
       'shock_drought', 'shock_earthquake', 'shock_landslides',
       'shock_firenatural', 'shock_othernathazard', 'shock_violenceinsecconf',
       'shock_theftofprodassets', 'shock_firemanmade',
       'shock_othermanmadehazard'],
      dtype='object')



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._df[col].clip(lower=self._metadata[col][0], upper=self._metadata[col][1], inplace=True)



In [36]:
%%ag

# Row-wise sum
total_shocks = shock_only.sum(axis=1, eps=0.5)

# Flag rows (households) with more than one shock, then count those rows
total_shocks_gt_1 = (total_shocks > 1).sum(eps=0.5)
export(total_shocks_gt_1, 'total_shocks_gt_1')

Setting up exported variable in local environment: total_shocks_gt_1


In [37]:
total_shocks_gt_1

33533.404196544856

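Verified: about 33,533 respondents reported more than one shock. The count is not a whole number because the platform adds differential-privacy noise to the result.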
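The fractional count comes from privacy noise. As a rough illustration, the sketch below adds Laplace noise to a count. It assumes a Laplace mechanism with sensitivity 1, which may differ from what `op_pandas` does internally; `noisy_count` is a hypothetical helper, not part of Antigranular.

```python
import random

def noisy_count(true_count: int, eps: float, sensitivity: float = 1.0, rng=None) -> float:
    """Return true_count plus Laplace(0, sensitivity / eps) noise."""
    rng = rng or random.Random()
    scale = sensitivity / eps
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise

# A smaller eps means a larger noise scale, so the released count is less exact.
print(noisy_count(100, eps=0.5, rng=random.Random(0)))
```

Under this assumption, a lower `eps` spends less of the privacy budget but returns a noisier answer.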

## Filtering only shock data
- Select only relevant columns from shock data

In [38]:
%%ag
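# Note: if shock_noshock == 1 flags respondents who reported no shock, this condition should be == 0
# to keep only respondents who did face a shock; confirm the encoding before relying on this filter
shock_yes_data = shock_data.where((shock_data['shock_noshock'] == 1) & (shock_data['shock_dk'] == 0) & (shock_data['shock_ref'] == 0))[[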
       'objectid', 'shock_sicknessordeathofhh',
       'shock_lostemplorwork', 'shock_otherintrahhshock',
       'shock_higherfoodprices', 'shock_higherfuelprices', 'shock_mvtrestrict',
       'shock_othereconomicshock', 'shock_pestoutbreak', 'shock_plantdisease',
       'shock_animaldisease', 'shock_napasture', 'shock_othercropandlivests',
       'shock_coldtemporhail', 'shock_flood', 'shock_hurricane',
       'shock_drought', 'shock_earthquake', 'shock_landslides',
       'shock_firenatural', 'shock_othernathazard', 'shock_violenceinsecconf',
       'shock_theftofprodassets', 'shock_firemanmade',
       'shock_othermanmadehazard'
]].dropna()
ag_print(shock_yes_data.columns)
ag_print(shock_yes_data.metadata)

Index(['objectid', 'shock_sicknessordeathofhh', 'shock_lostemplorwork',
       'shock_otherintrahhshock', 'shock_higherfoodprices',
       'shock_higherfuelprices', 'shock_mvtrestrict',
       'shock_othereconomicshock', 'shock_pestoutbreak', 'shock_plantdisease',
       'shock_animaldisease', 'shock_napasture', 'shock_othercropandlivests',
       'shock_coldtemporhail', 'shock_flood', 'shock_hurricane',
       'shock_drought', 'shock_earthquake', 'shock_landslides',
       'shock_firenatural', 'shock_othernathazard', 'shock_violenceinsecconf',
       'shock_theftofprodassets', 'shock_firemanmade',
       'shock_othermanmadehazard'],
      dtype='object')
{'objectid': (3496, 239871), 'shock_sicknessordeathofhh': (0.0, 1.0), 'shock_lostemplorwork': (0.0, 1.0), 'shock_otherintrahhshock': (0.0, 1.0), 'shock_higherfoodprices': (0.0, 1.0), 'shock_higherfuelprices': (0.0, 1.0), 'shock_mvtrestrict': (0.0, 1.0), 'shock_othereconomicshock': (0.0, 1.0), 'shock_pestoutbreak': (0.0, 1.0), 'shock_p

The metadata bounds of (0.0, 1.0) also confirm that all the shock columns are binary 0 or 1 variables.

## Load coping strategies data

In [39]:
%%ag
cs_data = load_dataset("undata_cs")
ag_print(cs_data.columns)
ag_print(cs_data.metadata)
ag_print(cs_data.dtypes)

Index(['objectid', 'cs_hh_assets', 'cs_spent_savings', 'cs_sold_more_animals',
       'cs_eat_elsewhere', 'cs_borrowed_or_helped', 'cs_credit',
       'cs_borrowed_money', 'cs_changed_school', 'cs_sold_prod_assets',
       'cs_no_school', 'cs_reduced_health_exp', 'cs_harv_immature_crops',
       'cs_consume_seed_stock', 'cs_decrease_input_exp', 'cs_sold_house',
       'cs_begged', 'cs_illegal', 'cs_sold_last_female', 'cs_hh_migration'],
      dtype='object')
{'objectid': (3496, 239871), 'cs_hh_assets': (0.0, 999.0), 'cs_spent_savings': (0.0, 999.0), 'cs_sold_more_animals': (1.0, 999.0), 'cs_eat_elsewhere': (0.0, 999.0), 'cs_borrowed_or_helped': (0.0, 999.0), 'cs_credit': (0.0, 999.0), 'cs_borrowed_money': (1.0, 999.0), 'cs_changed_school': (0.0, 999.0), 'cs_sold_prod_assets': (1.0, 999.0), 'cs_no_school': (0.0, 999.0), 'cs_reduced_health_exp': (0.0, 999.0), 'cs_harv_immature_crops': (0.0, 999.0), 'cs_consume_seed_stock': (0.0, 999.0), 'cs_decrease_input_exp': (0.0, 999.0), 'cs_sold_hou

We can see that the coping strategy columns are not binary like the shock columns: their metadata bounds run up to 999.

### Filtering only the coping strategies that the respondents used
- Select the respondents who used at least one coping strategy

In [59]:
%%ag
cs_filtered = cs_data.where((cs_data['cs_hh_assets'] == 1) |
                            (cs_data['cs_spent_savings'] == 1) |
                            (cs_data['cs_sold_more_animals'] == 1) |
                            (cs_data['cs_eat_elsewhere'] == 1) |
                            (cs_data['cs_borrowed_or_helped'] == 1) |
                            (cs_data['cs_credit'] == 1) |
                            (cs_data['cs_borrowed_money'] == 1) |
                            (cs_data['cs_changed_school'] == 1) |
                            (cs_data['cs_sold_prod_assets'] == 1) |
                            (cs_data['cs_reduced_health_exp'] == 1) |
                            (cs_data['cs_harv_immature_crops'] == 1) |
                            (cs_data['cs_consume_seed_stock'] == 1) |
                            (cs_data['cs_decrease_input_exp'] == 1) |
                            (cs_data['cs_sold_house'] == 1) |
                            (cs_data['cs_begged'] == 1) |
                            (cs_data['cs_illegal'] == 1) |
                            (cs_data['cs_sold_last_female'] == 1) |
                            (cs_data['cs_hh_migration'] == 1) |
                            (cs_data['cs_no_school'] == 1) ).dropna()
ag_print(cs_filtered.columns)

Index(['objectid', 'cs_hh_assets', 'cs_spent_savings', 'cs_sold_more_animals',
       'cs_eat_elsewhere', 'cs_borrowed_or_helped', 'cs_credit',
       'cs_borrowed_money', 'cs_changed_school', 'cs_sold_prod_assets',
       'cs_no_school', 'cs_reduced_health_exp', 'cs_harv_immature_crops',
       'cs_consume_seed_stock', 'cs_decrease_input_exp', 'cs_sold_house',
       'cs_begged', 'cs_illegal', 'cs_sold_last_female', 'cs_hh_migration'],
      dtype='object')



Apart from 1, the responses also include 2, 3, 4, 888, and 999, all of which mean that the respondent did not use the coping strategy.
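In plain pandas, this recoding reduces to a comparison with 1. The sketch below uses a made-up toy frame and is not the private `op_pandas` call, which needs an `eps` argument and a mapped function instead.

```python
import pandas as pd

# Toy responses using the codes seen in the data: 1 means "used", everything else "not used".
toy_cs = pd.DataFrame({
    "cs_spent_savings": [1, 2, 888, 1],
    "cs_begged": [3, 999, 1, 4],
})

# True where the code is 1, then cast to 0/1 integers.
binary_cs = (toy_cs == 1).astype(int)
print(binary_cs)
```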

In [60]:
%%ag
def make_zero(x:int) -> int:
  if x == 1:
    return 1
  else:
    return 0

cs_filtered[list(cs_filtered.columns[1:])] = cs_filtered[list(cs_filtered.columns[1:])].applymap(lambda x: x if x==1 else 0, eps=0.1)
ag_print(cs_filtered.metadata)

{'objectid': (3496, 239871), 'cs_hh_assets': (0.0, 1.0), 'cs_spent_savings': (1.0, 2.0), 'cs_sold_more_animals': (0.0, 1.0), 'cs_eat_elsewhere': (0.0, 1.0), 'cs_borrowed_or_helped': (1.0, 2.0), 'cs_credit': (0.0, 1.0), 'cs_borrowed_money': (1.0, 2.0), 'cs_changed_school': (0.0, 1.0), 'cs_sold_prod_assets': (0.0, 1.0), 'cs_no_school': (0.0, 1.0), 'cs_reduced_health_exp': (0.0, 1.0), 'cs_harv_immature_crops': (0.0, 1.0), 'cs_consume_seed_stock': (0.0, 1.0), 'cs_decrease_input_exp': (0.0, 1.0), 'cs_sold_house': (0.0, 1.0), 'cs_begged': (0.0, 1.0), 'cs_illegal': (0.0, 1.0), 'cs_sold_last_female': (0.0, 1.0), 'cs_hh_migration': (0.0, 1.0)}



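We have now recoded the coping strategies to binary 0 or 1 as well, although the reported metadata bounds for a few columns (for example `cs_spent_savings`) read (1.0, 2.0) rather than (0.0, 1.0).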

## Checking if multiple coping strategies are used or not

In [61]:
%%ag
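# Note: cs_filtered still includes 'objectid', so this row-wise sum appears to add the id to the strategy count;
# restricting the sum to the cs_ columns would give the true number of strategies per respondent
total_cs = cs_filtered.sum(axis=1, eps=0.01)

# Flag rows (respondents) that used more than one coping strategy, then count those rows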
total_cs_gt_1 = (total_cs > 1).sum(eps=0.01)
export(total_cs_gt_1, 'total_cs_gt_1')

Setting up exported variable in local environment: total_cs_gt_1


In [62]:
total_cs_gt_1

7209.0

Verified

## Merge shock and cs data on objectid
- Merge the shocks and coping strategies on `objectid` to create our initial dataset
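An inner merge keeps only the respondents present in both tables. The plain-pandas sketch below uses made-up toy frames to show the effect; the notebook itself uses `op_pandas.merge` on the private data.

```python
import pandas as pd

# Toy tables sharing the key column "objectid" (values are made up).
shocks = pd.DataFrame({"objectid": [1, 2, 3], "shock_flood": [1, 0, 1]})
coping = pd.DataFrame({"objectid": [2, 3, 4], "cs_begged": [0, 1, 1]})

# how="inner" drops ids 1 and 4, which appear in only one table.
merged = pd.merge(shocks, coping, on="objectid", how="inner")
print(merged)
```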

In [66]:
%%ag
import op_pandas
shock_cs = op_pandas.merge(shock_yes_data, cs_filtered, on="objectid", how="inner")
ag_print(shock_cs.columns)

Index(['objectid', 'shock_sicknessordeathofhh', 'shock_lostemplorwork',
       'shock_otherintrahhshock', 'shock_higherfoodprices',
       'shock_higherfuelprices', 'shock_mvtrestrict',
       'shock_othereconomicshock', 'shock_pestoutbreak', 'shock_plantdisease',
       'shock_animaldisease', 'shock_napasture', 'shock_othercropandlivests',
       'shock_coldtemporhail', 'shock_flood', 'shock_hurricane',
       'shock_drought', 'shock_earthquake', 'shock_landslides',
       'shock_firenatural', 'shock_othernathazard', 'shock_violenceinsecconf',
       'shock_theftofprodassets', 'shock_firemanmade',
       'shock_othermanmadehazard', 'cs_hh_assets', 'cs_spent_savings',
       'cs_sold_more_animals', 'cs_eat_elsewhere', 'cs_borrowed_or_helped',
       'cs_credit', 'cs_borrowed_money', 'cs_changed_school',
       'cs_sold_prod_assets', 'cs_no_school', 'cs_reduced_health_exp',
       'cs_harv_immature_crops', 'cs_consume_seed_stock',
       'cs_decrease_input_exp', 'cs_sold_house', 'cs_beg

## Modeling

- We try to build a Naive Bayes model that takes the shock events as features and predicts the probability of each coping strategy.

- Based on the predicted probabilities, we can draw up a plan and roadmap for providing appropriate assistance to the households.
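To make the modeling idea concrete, here is a minimal from-scratch Bernoulli Naive Bayes for one binary target. It is a sketch under stated assumptions rather than the notebook's model: the data are made up, the helper names `fit_bernoulli_nb` and `predict_proba` are hypothetical, and no differential privacy is applied.

```python
import math

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Fit Bernoulli Naive Bayes on binary features X and a binary target y."""
    n_features = len(X[0])
    model = {}
    for cls in (0, 1):
        rows = [x for x, label in zip(X, y) if label == cls]
        prior = (len(rows) + alpha) / (len(X) + 2 * alpha)
        # Laplace-smoothed estimate of P(feature_j = 1 | class).
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_features)]
        model[cls] = (prior, probs)
    return model

def predict_proba(model, x):
    """Return P(target = 1 | x) from the fitted model."""
    log_scores = {}
    for cls, (prior, probs) in model.items():
        score = math.log(prior)
        for value, p in zip(x, probs):
            score += math.log(p if value == 1 else 1 - p)
        log_scores[cls] = score
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    return exp_scores[1] / sum(exp_scores.values())

# Made-up data: features are [shock_flood, shock_drought]; target is cs_spent_savings.
X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 0]]
y = [1, 1, 0, 0, 1, 0]
model = fit_bernoulli_nb(X, y)
print(round(predict_proba(model, [1, 0]), 3))
```

Fitting one such model per coping strategy would give a probability for each strategy given a respondent's shocks, which is the quantity the recommendation step needs.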

## Label Encoding the coping strategies to use as our Target
- The coping strategies are currently stored as separate 0/1 indicator columns (one-hot style)
- To analyze and model the data further, we need to label encode them: for instance, creating visualizations and building the Naive Bayes model both require a single label-encoded target

We first tried replacing each 1 in the dataset with the corresponding strategy name, using a for loop that modifies each column in turn.
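For reference, the target label can be built in plain pandas by joining the names of the active strategy columns in each row. This sketch runs on a made-up local frame; the private `op_pandas` environment restricts such operations.

```python
import pandas as pd

# Toy 0/1 coping strategy indicators (values are made up).
toy = pd.DataFrame({
    "cs_begged": [1, 0, 0],
    "cs_credit": [1, 1, 0],
    "cs_sold_house": [0, 0, 0],
})

cs_cols = [c for c in toy.columns if c.startswith("cs_")]

# Join the names of the columns set to 1; rows with no strategy get an empty label.
toy["label"] = toy[cs_cols].apply(
    lambda row: "|".join(c for c in cs_cols if row[c] == 1), axis=1
)
print(toy["label"].tolist())
```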

In [70]:
%%ag
def return_label(x:int | float, column:str) -> str:
  if x==1:
    return column
  else:
    return ''

In [75]:
%%ag
for column in shock_cs.columns:
  if column.startswith("cs_"):
    col_name = str(column.copy())
    shock_cs[column] = shock_cs[col_name].map(lambda x: return_label(x, col_name), eps=0.01)

UsageError: Calling a non whitelisted Global method: col_name



Since this did not work, we ran the function for each column manually.
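The error message suggests that the lambda's reference to the outer name `col_name` was rejected. In plain Python, one way to avoid free variables in a callback is to bind the column name with `functools.partial`. Whether the Antigranular sandbox accepts this is untested, so treat it as an idea to try rather than a confirmed fix.

```python
from functools import partial

def return_label(x, column: str) -> str:
    """Return the column name when the strategy was used, else an empty string."""
    return column if x == 1 else ''

columns = ["cs_begged", "cs_credit"]

# Each partial carries its own column name, so the callback needs no outer variable.
labelers = {col: partial(return_label, column=col) for col in columns}

print(labelers["cs_begged"](1))
print(repr(labelers["cs_credit"](0)))
```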

In [None]:
# %%ag
# def return_label(x:int | float, column:str) -> str:
#   if x==1:
#     return column
#   else:
#     return ''
# shock_cs['cs_hh_migration_l'] = shock_cs['cs_hh_migration'].map(return_label, eps=0.01)

In [None]:
%%ag
ag_print(shock_cs.columns)

Index(['objectid', 'shock_sicknessordeathofhh', 'shock_lostemplorwork',
       'shock_otherintrahhshock', 'shock_higherfoodprices',
       'shock_higherfuelprices', 'shock_mvtrestrict',
       'shock_othereconomicshock', 'shock_pestoutbreak', 'shock_plantdisease',
       'shock_animaldisease', 'shock_napasture', 'shock_othercropandlivests',
       'shock_coldtemporhail', 'shock_flood', 'shock_hurricane',
       'shock_drought', 'shock_earthquake', 'shock_landslides',
       'shock_firenatural', 'shock_othernathazard', 'shock_violenceinsecconf',
       'shock_theftofprodassets', 'shock_firemanmade',
       'shock_othermanmadehazard', 'cs_hh_assets', 'cs_spent_savings',
       'cs_sold_more_animals', 'cs_eat_elsewhere', 'cs_borrowed_or_helped',
       'cs_credit', 'cs_borrowed_money', 'cs_changed_school',
       'cs_sold_prod_assets', 'cs_no_school', 'cs_reduced_health_exp',
       'cs_harv_immature_crops', 'cs_consume_seed_stock',
       'cs_decrease_input_exp', 'cs_sold_house', 'cs_beg

In [None]:
# %%ag
# label_components = [shock_cs[column].astype(str) for column in list(shock_cs.columns) if column.endswith('_l')]
# shock_cs['label'] = '|'.join(label_components)

# An error occurred here that should not have happened; it appears to be an error on the Oblivious side itself

# Column-wise string concatenation does not work this way in pandas

---





In [None]:
%%ag
shock_cs['label'] = shock_cs['cs_hh_assets_l']+ shock_cs['cs_spent_savings_l'] + shock_cs['cs_sold_more_animals_l']+ shock_cs['cs_eat_elsewhere_l']+ shock_cs['cs_borrowed_or_helped_l']+ shock_cs['cs_credit_l']+ shock_cs['cs_borrowed_money_l']+ shock_cs['cs_changed_school_l']+ shock_cs['cs_sold_prod_assets_l']+ shock_cs['cs_no_school_l']+ shock_cs['cs_reduced_health_exp_l']+ shock_cs['cs_harv_immature_crops_l']+ shock_cs['cs_consume_seed_stock_l']+ shock_cs['cs_decrease_input_exp_l']+ shock_cs['cs_sold_house_l']+ shock_cs['cs_begged_l']+ shock_cs['cs_illegal_l']+ shock_cs['cs_sold_last_female_l']+ shock_cs['cs_hh_migration_l']

ValueError: Tuple should be of size 2 denoting (lower,upper)


In [None]:
# Because of this, we were not able to create a label for prediction

In [None]:
%%ag
shock_cs['label'] = shock_cs['cs_hh_assets_l'].astype(str)+ shock_cs['cs_spent_savings_l'].astype(str) + shock_cs['cs_sold_more_animals_l'].astype(str)+ shock_cs['cs_eat_elsewhere_l'].astype(str)+ shock_cs['cs_borrowed_or_helped_l'].astype(str)+ shock_cs['cs_credit_l'].astype(str)+ shock_cs['cs_borrowed_money_l'].astype(str)+ shock_cs['cs_changed_school_l'].astype(str)+ shock_cs['cs_sold_prod_assets_l'].astype(str)+ shock_cs['cs_no_school_l'].astype(str)+ shock_cs['cs_reduced_health_exp_l'].astype(str)+ shock_cs['cs_harv_immature_crops_l'].astype(str)+ shock_cs['cs_consume_seed_stock_l'].astype(str)+ shock_cs['cs_decrease_input_exp_l'].astype(str)+ shock_cs['cs_sold_house_l'].astype(str)+ shock_cs['cs_begged_l'].astype(str)+ shock_cs['cs_illegal_l'].astype(str)+ shock_cs['cs_sold_last_female_l'].astype(str)+ shock_cs['cs_hh_migration_l'].astype(str)

KeyError: <class 'str'>


In [None]:
def return_label(column:str, x:int | float) -> str:
  if x==1:
    return column
  else:
    return ''

for column in df.columns[range]:
  df[f'{column}_ed'] = df[column].map(return_label,eps=)

In [None]:
%%ag
shock_cs['label'] = shock_cs['cs_hh_assets'].apply

NameError: name 'shock_' is not defined
