# Social networks and the decision to insure
_The `social_insure` data come from Cai, de Janvry, and Sadoulet (2015), a two-round, social-network-based experiment on farmers' decisions to buy weather insurance. See the paper for more details._
_Cai, J., de Janvry, A., and Sadoulet, E. (2015). Social Networks and the Decision to Insure. American Economic Journal: Applied Economics, 7(2): 81–108. http://dx.doi.org/10.1257/app.20130442_

## 1. Introduction

- __Financial decisions__ involve complexities that individuals often find hard to understand, given their own education, information, and experience.

- __Social networks__ can help people make these complex decisions: people can learn about product benefits from their friends, be influenced by their friends’ choices, and/or learn from their friends’ experiences with the product.

- __Causal effect__ is the change in an outcome produced by changing one particular factor (the treatment) while everything else stays fixed. For a single farmer, it is the difference between the take-up decision the farmer would make with the treatment (for example, attending an intensive information session) and the decision the same farmer would make without it.

- __Causal inference__ is the task of estimating such effects from data. Because each unit is observed in only one state (treated or untreated), the missing counterfactual must be inferred by comparing groups; random assignment makes the groups comparable on average, so a simple difference in mean outcomes estimates the average causal effect. Causal inference is used across the social sciences, economics, medicine, and data analysis to design and evaluate interventions and policies.
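To make the potential-outcomes idea concrete, here is a small simulation under an invented data-generating process: the 0.40 and 0.50 take-up probabilities and the helper name `simulate_experiment` are hypothetical, not taken from the paper. Each simulated farmer has two potential outcomes, only one of which is observed; because treatment is assigned by a coin flip, the observed difference in means should land close to the true average treatment effect (ATE).

```python
import random
import statistics


def simulate_experiment(n=10_000, seed=0):
    """Simulate potential outcomes and a randomized assignment.

    Hypothetical process (not from the paper): take-up probability is 0.40
    without treatment and 0.50 with it, so the true ATE is about 0.10.
    Returns (true ATE in the sample, difference-in-means estimate).
    """
    rng = random.Random(seed)
    y0, y1, treated = [], [], []
    for _ in range(n):
        u = rng.random()                 # one latent draw shared by both potential outcomes
        y0.append(1 if u < 0.40 else 0)  # outcome if NOT treated
        y1.append(1 if u < 0.50 else 0)  # outcome if treated
        treated.append(rng.random() < 0.5)  # coin-flip assignment
    # Only computable in a simulation: both potential outcomes are known
    true_ate = statistics.mean(a - b for a, b in zip(y1, y0))
    # In real data we observe just one potential outcome per unit
    y_obs = [a if t else b for a, b, t in zip(y1, y0, treated)]
    mean_t = statistics.mean(y for y, t in zip(y_obs, treated) if t)
    mean_c = statistics.mean(y for y, t in zip(y_obs, treated) if not t)
    return true_ate, mean_t - mean_c


true_ate, estimate = simulate_experiment()
print(f"true ATE = {true_ate:.3f}, difference in means = {estimate:.3f}")
```

Replacing the coin flip with a rule that depends on `u` (for instance, treating only farmers with low `u`) would break the comparability of the two groups and bias the difference in means.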

In 2009, the Chinese government worked with the People’s Insurance Company of China (PICC) to introduce the first rice production insurance policy in selected pilot counties. The policy aimed to ensure food security and to protect farmers against adverse weather. Farmers paid a subsidized premium, deducted from their annual rice production subsidy. The insurance covered natural disasters and paid out according to the loss in yield, so larger losses led to larger payouts. The policy covered 25% of gross income or 50% of production costs. The product was considered favorable to farmers because of its low post-subsidy price and limited moral hazard. The program expanded rapidly in the following years and eventually covered all of China’s main rice-producing counties.

## 2. Understand the data

The authors designed a randomized experiment around the introduction of a new weather insurance policy for rice farmers offered by the People’s Insurance Company of China (PICC), China’s largest insurance provider. Implemented jointly with PICC, the experiment involved 5,300 households across 185 villages of rural China. The `social_insure` extract loaded below contains 1,410 of these households.

```python
# Install dependencies; the %pip magic installs into the running kernel
%pip install causalinference numpy scipy causaldata black plotly pycausalimpact
%pip install --upgrade statsmodels
```
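```python
import causaldata
import causalinference
from causaldata import social_insure

# List the datasets bundled with causaldata
help(causaldata)

# Load the social_insure data; dataset.data holds a pandas DataFrame
dataset = social_insure.load_pandas()
```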

```text
Help on package causaldata:

NAME
    causaldata

PACKAGE CONTENTS
    Mroz (package)
    abortion (package)
    adult_services (package)
    auto (package)
    avocado (package)
    black_politicians (package)
    castle (package)
    close_college (package)
    close_elections_lmb (package)
    cps_mixtape (package)
    credit_cards (package)
    gapminder (package)
    google_stock (package)
    gov_transfers (package)
    gov_transfers_density (package)
    greek_data (package)
    mortgages (package)
    nhefs (package)
    nhefs_codebook (package)
    nhefs_complete (package)
    nsw_mixtape (package)
    organ_donations (package)
    restaurant_inspections (package)
    ri (package)
    scorecard (package)
    snow (package)
    social_insure (package)
    texas (package)
    thornton_hiv (package)
    titanic (package)
    training_bias_reduction (package)
    training_example (package)
    yule (package)

DATA
    __all__ = ['auto', 'black_politicians', 'gapminder', 'google_st
```

| Attribute       | Description                                                                                                     |
|-----------------|-----------------------------------------------------------------------------------------------------------------|
| address         | Natural village                                                                                                 |
| village         | Administrative village                                                                                          |
| takeup_survey   | Whether the farmer ended up purchasing insurance (1 = yes)                                                      |
| age             | Household characteristics: age                                                                                  |
| agpop           | Household characteristics: household size                                                                       |
| ricearea_2010   | Area of rice production (2010)                                                                                  |
| disaster_prob   | Perceived probability of a disaster next year (percent, 0–100)                                                  |
| male            | Gender of household head (1 = male)                                                                             |
| default         | Default option in the experimental session the household was assigned to (1 = default is to buy, 0 = default is to not buy) |
| intensive       | Whether the household was assigned to an "intensive" experimental session (1 = yes)                             |
| risk_averse     | Risk-aversion measure (0–1)                                                                                     |
| literacy        | Literacy (1 = literate, 0 = illiterate)                                                                         |
| pre_takeup_rate | Take-up rate prior to the experiment                                                                            |

```python
import pandas as pd

# Work with the loaded data as a pandas DataFrame
df_origin = pd.DataFrame(dataset.data)
df_origin.head()
```

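|   | address | village | takeup_survey | age | agpop | ricearea_2010 | disaster_prob | male | default | intensive | risk_averse | literacy | pre_takeup_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | beilian2 | beilian | 0 | 62.0 | 2.0 | 10.0 | 30.0 | 1.0 | 1 | 0 | 0.0 | 0.0 | 0.071429 |
| 1 | beilian2 | beilian | 1 | 63.0 | 5.0 | 15.0 | 100.0 | 1.0 | 1 | 0 | 0.0 | 1.0 | 0.071429 |
| 2 | beilian2 | beilian | 1 | 44.0 | 3.0 | 7.5 | 20.0 | 1.0 | 1 | 1 | 0.0 | 1.0 | 0.071429 |
| 3 | beilian2 | beilian | 1 | 76.0 | 6.0 | NaN | 50.0 | 1.0 | 1 | 1 | 0.6 | 1.0 | 0.071429 |
| 4 | beilian2 | beilian | 0 | 52.0 | 6.0 | 11.0 | 0.0 | 1.0 | 1 | 1 | 0.2 | 1.0 | 0.071429 |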


```python
from IPython.display import display

# Rows with no missing values
df_without_na = df_origin.dropna()

# Rows with at least one missing value
df_with_na = df_origin[df_origin.isna().any(axis=1)]

display(
    "df_without_na", df_without_na.describe(),
    "df_with_na", df_with_na.describe(),
    "origin", df_origin.describe()
)
```


`df_without_na`

|   | takeup_survey | age | agpop | ricearea_2010 | disaster_prob | male | default | intensive | risk_averse | literacy | pre_takeup_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1378.0 | 1378.0 | 1378.0 | 1378.0 | 1378.0 | 1378.0 | 1378.0 | 1378.0 | 1378.0 | 1378.0 | 1378.0 |
| mean | 0.460813 | 51.410015 | 4.891872 | 13.488549 | 33.057765 | 0.902032 | 0.482583 | 0.487663 | 0.174456 | 0.793904 | 0.42947 |
| std | 0.498643 | 12.079239 | 2.073257 | 22.035801 | 16.631186 | 0.297379 | 0.499878 | 0.500029 | 0.306005 | 0.404647 | 0.241123 |
| min | 0.0 | 18.0 | 1.0 | 0.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 42.0 | 4.0 | 5.0 | 20.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.222222 |
| 50% | 0.0 | 50.0 | 5.0 | 10.0 | 30.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.421053 |
| 75% | 1.0 | 60.0 | 6.0 | 16.0 | 50.0 | 1.0 | 1.0 | 1.0 | 0.2 | 1.0 | 0.569196 |
| max | 1.0 | 85.0 | 19.0 | 650.0 | 100.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


`df_with_na`

|   | takeup_survey | age | agpop | ricearea_2010 | disaster_prob | male | default | intensive | risk_averse | literacy | pre_takeup_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 32.0 | 28.0 | 26.0 | 23.0 | 32.0 | 29.0 | 32.0 | 32.0 | 32.0 | 11.0 | 32.0 |
| mean | 0.59375 | 54.964286 | 5.384615 | 9.886957 | 35.0 | 0.931034 | 0.5625 | 0.65625 | 0.225 | 0.727273 | 0.395843 |
| std | 0.498991 | 15.007362 | 2.228487 | 6.947747 | 13.678332 | 0.257881 | 0.504016 | 0.482559 | 0.320282 | 0.467099 | 0.245623 |
| min | 0.0 | 33.0 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 40.75 | 4.0 | 5.0 | 20.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.154135 |
| 50% | 1.0 | 54.5 | 5.0 | 10.0 | 40.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.3875 |
| 75% | 1.0 | 67.0 | 6.0 | 14.0 | 50.0 | 1.0 | 1.0 | 1.0 | 0.45 | 1.0 | 0.566667 |
| max | 1.0 | 86.0 | 12.0 | 30.0 | 50.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |


`origin`

|   | takeup_survey | age | agpop | ricearea_2010 | disaster_prob | male | default | intensive | risk_averse | literacy | pre_takeup_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1410.0 | 1406.0 | 1404.0 | 1401.0 | 1410.0 | 1407.0 | 1410.0 | 1410.0 | 1410.0 | 1389.0 | 1410.0 |
| mean | 0.46383 | 51.480797 | 4.900997 | 13.429422 | 33.101844 | 0.90263 | 0.484397 | 0.491489 | 0.175603 | 0.793377 | 0.428707 |
| std | 0.498867 | 12.148048 | 2.07645 | 21.876182 | 16.568483 | 0.296567 | 0.499934 | 0.500105 | 0.30631 | 0.405029 | 0.241189 |
| min | 0.0 | 18.0 | 1.0 | 0.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 42.0 | 4.0 | 5.0 | 20.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.222222 |
| 50% | 0.0 | 51.0 | 5.0 | 10.0 | 30.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.421053 |
| 75% | 1.0 | 60.0 | 6.0 | 16.0 | 50.0 | 1.0 | 1.0 | 1.0 | 0.2 | 1.0 | 0.569196 |
| max | 1.0 | 86.0 | 19.0 | 650.0 | 100.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
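The `count` rows of the three summaries show where the missing values sit. A short tally, using only the non-missing counts printed above for the 11 numeric columns (`address` and `village` are not summarized by `describe()`), makes this explicit. Inside the notebook, `df_origin.isna().sum()` gives the per-column missing counts directly; the hard-coded numbers here simply mirror the printed output.

```python
# Non-missing counts copied from the `count` row of df_origin.describe() above
n_rows = 1410
counts = {
    "takeup_survey": 1410, "age": 1406, "agpop": 1404, "ricearea_2010": 1401,
    "disaster_prob": 1410, "male": 1407, "default": 1410, "intensive": 1410,
    "risk_averse": 1410, "literacy": 1389, "pre_takeup_rate": 1410,
}

# Missing values per column, keeping only columns that have any
missing = {col: n_rows - c for col, c in counts.items() if c < n_rows}
print(missing)  # → {'age': 4, 'agpop': 6, 'ricearea_2010': 9, 'male': 3, 'literacy': 21}

# Share of rows left after dropna(), using the 1,378 complete rows reported above
complete_rows = 1378
print(f"{complete_rows / n_rows:.1%} of rows are complete")  # → 97.7% of rows are complete
```

Literacy accounts for 21 of the 43 missing cells, and since those 43 cells fall in only 32 rows, some households are missing more than one variable. Dropping the incomplete rows keeps about 97.7% of the sample; their mean take-up (0.594) differs from that of the complete cases (0.461), which is worth keeping in mind before discarding them, even though 32 rows is a small group.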


## 3. Formulate a causal question
The experiment builds on the premise that improving farmers’ understanding of insurance increases take-up\*. It then asks why social networks matter:
1. Do networks matter because they __diffuse knowledge among farmers__ about how insurance works and what its expected benefits are?
2. Or do they matter because __farmers learn about each other’s decisions__?

\* A premise that we verify later.
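A natural first step is to check whether assignment to an intensive session changes take-up at all. A minimal sketch, assuming `intensive` marks the treatment and `takeup_survey` the outcome: the hypothetical helper `diff_in_means` compares mean take-up across the two arms and reports a Welch-style standard error with a normal-approximation 95% interval. It ignores the village structure of the design, so its standard error should be treated as approximate; a regression with village fixed effects or village-clustered standard errors (for example, statsmodels OLS with `cov_type="cluster"`) would be more careful. On its own, this comparison also cannot separate the two channels above.

```python
import math
import statistics


def diff_in_means(outcome, treated):
    """Treated-minus-control difference in mean outcome, with a Welch standard error.

    outcome: sequence of numbers (e.g. takeup_survey, coded 0/1)
    treated: sequence of 0/1 flags (e.g. intensive)
    Ignores village clustering/stratification, so the SE is only approximate.
    """
    y_t = [y for y, d in zip(outcome, treated) if d == 1]
    y_c = [y for y, d in zip(outcome, treated) if d == 0]
    est = statistics.mean(y_t) - statistics.mean(y_c)
    # Unequal-variance (Welch) standard error of the difference in means
    se = math.sqrt(statistics.variance(y_t) / len(y_t) + statistics.variance(y_c) / len(y_c))
    ci = (est - 1.96 * se, est + 1.96 * se)  # normal-approximation 95% interval
    return est, se, ci


# Toy data: 4 treated and 4 control households
est, se, ci = diff_in_means([1, 1, 0, 1, 0, 0, 1, 0], [1, 1, 1, 1, 0, 0, 0, 0])
```

In the notebook, passing plain Python lists avoids mixing NumPy scalar types into the `statistics` module, e.g. `diff_in_means(df_origin["takeup_survey"].tolist(), df_origin["intensive"].tolist())`.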

![Figure 1.1. Experimental design: within-village household-level randomization](../assets/001_Social_networks_insurance/Figure_1.1.Experimental_Design_Within-Village_Household-Level_Randomization.jpg)

![Figure 1.2. Experimental design: village-level randomization](../assets/001_Social_networks_insurance/Figure_1.2.Experimental_Design_Village_Level_Randomization.jpg)
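The first figure refers to randomizing households within villages. A minimal sketch of that idea, assuming two arms and a roughly even split inside each village (the function `assign_within_village`, the 50% share, and the seed are illustrative assumptions, not the protocol shown in the figures): shuffling households separately within each village guarantees that every village contributes to both arms, so differences between villages cannot be confounded with treatment.

```python
import random
from collections import defaultdict


def assign_within_village(households, share_treated=0.5, seed=42):
    """Hypothetical within-village (stratified) randomization.

    households: list of (household_id, village) pairs
    Returns {household_id: 1 or 0}; in each village about `share_treated`
    of the households get 1 (e.g. an intensive session).
    """
    rng = random.Random(seed)
    by_village = defaultdict(list)
    for hh_id, village in households:
        by_village[village].append(hh_id)

    assignment = {}
    for village in sorted(by_village):  # fixed order keeps results reproducible
        members = by_village[village]
        rng.shuffle(members)            # random order within this village only
        n_treated = round(len(members) * share_treated)
        for i, hh_id in enumerate(members):
            assignment[hh_id] = 1 if i < n_treated else 0
    return assignment


# Usage: 10 households in village "A", 6 in village "B"
households = [(i, "A" if i < 10 else "B") for i in range(16)]
assignment = assign_within_village(households)
```

Randomizing at the village level would apply the same shuffle to a list of villages rather than to the households inside each one.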