# Respondent-driven sampling 

Hidden or hard-to-reach populations are those for which no sampling frame exists and public knowledge about the members is imprecise, often because of social stigma or criminalization. Standard survey methods such as a demographic census make it hard to estimate sampling probabilities in this scenario, since the target population has low response rates. Examples include sex workers, homeless people, men who have sex with men, and drug users. Respondent-driven sampling (RDS) is a chain-referral procedure with a dual system of structured incentives: participants are rewarded both for taking part and for recruiting others. Starting with a few individuals (seeds) from the target population, each participant recruits others in their network through coupons. Other methods for such populations include snowball sampling, key informant sampling, and targeted sampling.
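
To make the chain-referral mechanism concrete, here is a minimal simulation sketch. It is not part of the study: the toy ring network, the function name `simulate_rds`, and the parameters `coupons` and `target` are assumptions made only for illustration. Each sampled person hands coupons to at most three contacts who are not yet in the sample, and recruitment proceeds wave by wave from the seeds.

```python
import random
from collections import deque


def simulate_rds(neighbors, seeds, coupons=3, target=20, rng=None):
    """Simulate chain-referral recruitment on a known network.

    neighbors: dict mapping each person to a list of contacts.
    Returns (recruiter, recruit, wave) tuples; seeds have recruiter None.
    """
    rng = rng or random.Random(0)
    sampled = set(seeds)
    records = [(None, s, 0) for s in seeds]
    queue = deque((s, 0) for s in seeds)
    while queue and len(sampled) < target:
        person, wave = queue.popleft()
        # Only contacts not yet in the sample can redeem a coupon.
        eligible = [c for c in neighbors[person] if c not in sampled]
        rng.shuffle(eligible)
        for recruit in eligible[:coupons]:
            if len(sampled) >= target:
                break
            sampled.add(recruit)
            records.append((person, recruit, wave + 1))
            queue.append((recruit, wave + 1))
    return records


# Toy population (hypothetical): a ring of 30 people, each tied to the
# two nearest people on either side.
n = 30
neighbors = {i: [(i + d) % n for d in (-2, -1, 1, 2)] for i in range(n)}
records = simulate_rds(neighbors, seeds=[0, 15], coupons=3, target=20)
for recruiter, recruit, wave in records[:6]:
    print(recruiter, recruit, wave)
```

In a real study the network is unknown, which is why each respondent is asked to report their degree.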

In this notebook, we explore the features of this procedure through a dataset of activist refugees from Syria. 

In [12]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
import seaborn as sns
sns.set()

## Data

Here we present the data and explain each column.

`id`: sequential case number. 

`recruit.id`: number that identifies each individual and links participants to one another. Seeds have one digit, and each new wave adds one more digit. For example, individual 1 receives coupons 11, 12 and 13, individual 22 receives coupons 221, 222 and 223, and so on.

`coupon.1`, `coupon.2`, `coupon.3`: the coupons received by the individual to pass on to recruits.

`degree`: the individual's self-reported network degree.

The other columns are answers to demographic and activism questions developed by the researchers.
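
The identifier scheme described for `recruit.id` can be decoded with plain string operations. The sketch below assumes that a recruit's identifier is the coupon they redeemed, so that dropping the last digit gives the recruiter; the function name `decode_recruit_id` is hypothetical and not part of the dataset or of any library.

```python
def decode_recruit_id(recruit_id):
    """Split an RDS identifier into recruiter, wave and the coupons it hands out."""
    rid = str(recruit_id)
    if not rid.isdigit():
        # Rejects values such as "12.0" that come from float columns.
        raise ValueError(f"expected a string of digits, got {recruit_id!r}")
    return {
        "recruiter": rid[:-1] or None,  # seeds (one digit) have no recruiter
        "wave": len(rid) - 1,           # seeds are wave 0
        "coupons": [rid + str(k) for k in (1, 2, 3)],
    }


print(decode_recruit_id("1"))
print(decode_recruit_id("221"))
```

Working with strings rather than numbers keeps every digit of the identifier, which matters for long chains.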


In [21]:
data = pd.read_csv("../data/rds_replication_data.tab", sep = '\t')
data.head()

Unnamed: 0,id,recruit.id,coupon.1,coupon.2,coupon.3,degree,sex,age,education,humanitarian,...,parties,mostrecent,employstatus,timestatus,formality,cooperation,cooperationlocation,syriapre2011,syriapost2011,trust
0,1.0,1.0,11.0,12.0,13.0,30.0,2.0,28.0,6.0,1.0,...,3.0,4.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,2.0
1,2.0,2.0,21.0,22.0,23.0,6.0,1.0,30.0,7.0,1.0,...,3.0,1.0,1.0,2.0,2.0,1.0,3.0,2.0,2.0,2.0
2,3.0,4.0,41.0,42.0,43.0,25.0,1.0,28.0,4.0,2.0,...,3.0,4.0,2.0,1.0,2.0,1.0,3.0,2.0,1.0,1.0
3,4.0,12.0,121.0,122.0,123.0,3.0,1.0,33.0,6.0,1.0,...,2.0,8.0,2.0,2.0,2.0,1.0,1.0,2.0,1.0,1.0
4,5.0,5.0,51.0,52.0,53.0,30.0,1.0,31.0,6.0,1.0,...,3.0,4.0,2.0,1.0,2.0,1.0,3.0,1.0,1.0,2.0


In [35]:
data.iloc[-20:]

Unnamed: 0,id,recruit.id,coupon.1,coupon.2,coupon.3,degree,sex,age,education,humanitarian,...,parties,mostrecent,employstatus,timestatus,formality,cooperation,cooperationlocation,syriapre2011,syriapost2011,trust
156,157.0,52111120000.0,521111200000.0,521111200000.0,521111200000.0,15.0,1.0,25.0,6.0,1.0,...,3.0,1.0,2.0,2.0,1.0,1.0,3.0,2.0,1.0,2.0
157,158.0,521111200000000.0,5211112000000000.0,5211112000000000.0,5211112000000000.0,2.0,1.0,24.0,4.0,2.0,...,3.0,7.0,2.0,2.0,1.0,1.0,2.0,2.0,1.0,2.0
158,159.0,52111120000000.0,521111200000000.0,521111200000000.0,521111200000000.0,10.0,1.0,22.0,4.0,2.0,...,3.0,7.0,2.0,2.0,2.0,1.0,3.0,2.0,1.0,2.0
159,160.0,52111120000.0,521111200000.0,521111200000.0,521111200000.0,25.0,1.0,19.0,4.0,1.0,...,3.0,1.0,3.0,2.0,2.0,1.0,2.0,2.0,2.0,2.0
160,161.0,5221212000000000.0,5.221212e+16,5.221212e+16,5.221212e+16,6.0,2.0,38.0,6.0,1.0,...,3.0,3.0,2.0,2.0,1.0,1.0,2.0,1.0,1.0,1.0
161,162.0,522121200000000.0,5221212000000000.0,5221212000000000.0,5221212000000000.0,30.0,1.0,25.0,6.0,1.0,...,3.0,1.0,2.0,1.0,2.0,1.0,3.0,1.0,2.0,2.0
162,163.0,5211112000000000.0,5.211112e+16,5.211112e+16,5.211112e+16,5.0,1.0,23.0,4.0,1.0,...,3.0,10.0,2.0,2.0,2.0,1.0,3.0,2.0,1.0,2.0
163,164.0,5221212000000000.0,5.221212e+16,5.221212e+16,5.221212e+16,5.0,1.0,40.0,4.0,2.0,...,3.0,3.0,2.0,2.0,2.0,1.0,2.0,2.0,2.0,2.0
164,165.0,5.211112e+16,5.211112e+17,5.211112e+17,5.211112e+17,4.0,1.0,27.0,4.0,1.0,...,3.0,1.0,2.0,2.0,2.0,1.0,2.0,2.0,1.0,2.0
165,166.0,5.211112e+17,5.211112e+18,5.211112e+18,5.211112e+18,6.0,1.0,29.0,4.0,1.0,...,3.0,1.0,2.0,1.0,2.0,1.0,2.0,1.0,1.0,2.0
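
The scientific notation in the last rows hints at a precision problem: identifiers gain one digit per wave, and a `float64` cannot represent every integer above 2**53. The sketch below illustrates the loss with a hypothetical 17-digit identifier (it is not taken from the dataset).

```python
# Hypothetical 17-digit identifier, following the one-digit-per-wave scheme.
rid = "52111121111111111"

as_float = float(rid)  # what a float64 column would store

# False: the float no longer holds the original digits.
print(int(as_float) == int(rid))
# Integers above this bound are not all representable as float64.
print(2 ** 53)
# The wave is still recoverable from the string form.
print(len(rid) - 1)
```

One possible remedy is to read the identifier columns as text, for example with the `dtype` argument of `pd.read_csv`. If the digits were already rounded in the file itself, this will not recover them.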


In [11]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 176 entries, 0 to 175
Data columns (total 27 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   id                   176 non-null    float64
 1   recruit.id           176 non-null    float64
 2   coupon.1             176 non-null    float64
 3   coupon.2             176 non-null    float64
 4   coupon.3             176 non-null    float64
 5   degree               176 non-null    float64
 6   sex                  176 non-null    float64
 7   age                  176 non-null    float64
 8   education            176 non-null    float64
 9   humanitarian         176 non-null    float64
 10  advocacy             176 non-null    float64
 11  development          176 non-null    float64
 12  media                176 non-null    float64
 13  protest              174 non-null    float64
 14  armed                174 non-null    float64
 15  fundraising          176 non-null    flo

## Network 

Here we see the network and the demographic variables associated with each node. 

In [17]:
nodes = data["coupon.1"]

In [19]:
nodes

0      1.100000e+01
1      2.100000e+01
2      4.100000e+01
3      1.210000e+02
4      5.100000e+01
           ...     
171    5.211112e+21
172    5.211112e+21
173    5.211112e+19
174    5.211112e+22
175    5.211112e+22
Name: coupon.1, Length: 176, dtype: float64
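
Because the float values above cannot serve as reliable node labels, one way to obtain the recruitment network is to work from string identifiers. The following sketch uses a short hypothetical list of identifiers rather than the real column, and assumes that a recruiter's identifier is the recruit's identifier without its last digit.

```python
from collections import Counter

# Hypothetical identifiers stored as strings; in the notebook they would come
# from the recruit.id column once it is read without float conversion.
ids = ["1", "2", "4", "5", "12", "21", "41", "121", "122", "411", "333"]

known = set(ids)
# An edge goes from the recruiter (identifier minus its last digit) to the recruit.
edges = [(rid[:-1], rid) for rid in ids if len(rid) > 1 and rid[:-1] in known]
# Recruits whose recruiter is missing from the sample are flagged, not linked.
orphans = [rid for rid in ids if len(rid) > 1 and rid[:-1] not in known]
# Number of respondents per wave (seeds are wave 0).
wave_sizes = Counter(len(rid) - 1 for rid in ids)

print(edges)
print(orphans)
print(sorted(wave_sizes.items()))
```

The resulting edge list could then be passed to `nx.DiGraph(edges)` to draw the recruitment trees and attach demographic variables as node attributes.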

# References

[1] Khoury, Rana B., 2020, "Replication Data for: Hard-to-Survey Populations and Respondent-Driven Sampling: Expanding the Political Science Toolbox", https://doi.org/10.7910/DVN/XKOVUN, Harvard Dataverse, V1, UNF:6:aCejo0iCW+kK0AZVtMP2FA== [fileUNF]