# Stigma against Opioid Use Disorder varies by Personal Use status

```{margin} 
**To follow the full analysis, click through the hidden analysis code below**
```

In [136]:
%matplotlib inline

In [137]:
# import packages
import os
import json
from pathlib import Path
import pandas as pd
import numpy as np
import pyreadstat
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from utils import *
from samplics.estimation import TaylorEstimator
pd.set_option('mode.chained_assignment', None)

### Data cleaning/pre-processing

In [138]:
# inputs
STATE_ABBREVIATIONS = "state_abbrev_mappings.json"
DATAPATH = "P:/3652/Common/HEAL/y3-task-c-collaborative-projects/jcoin-stigma/analyses/data/protocol2/"
DATA_FILE = DATAPATH+"3645_JCOIN_HEAL Initiative 2021_NORC_Jan2022_1.sav"
STRATA_FILE = DATAPATH+"VSTRAT_VPSU_Survey_2039_HEAL_MAIN_21_05_14.csv"

In [139]:
# import data and metadata (data dictionaries)
df, meta = pyreadstat.read_sav(DATA_FILE,apply_value_formats=True)


In [140]:

# lower-case column names 
df.columns = df.columns.str.lower()

In [141]:
vars_of_interest = ["caseid",'p_over','weight1','weight2','stigma_scale_score','expanded_10item_stigma','state','age4','racethnicity','educ5','personaluse_ever','familyuse_ever','personalcrimjust_ever','familycrimjust_ever']
categorical_vars = ['p_over','state','age4','racethnicity','educ5',
    'personaluse_ever','familyuse_ever',
    'personalcrimjust_ever','familycrimjust_ever']



In [142]:
# to enable more granular analysis of the stigma scale score(s) - e.g. parsing impact of current versus past OUD on stigma - bring in the individual ss questions

# ss_a_historywork - agree means low stigma/high val means low stigma
# ss_b_historymarry - agree means low stigma/high val means low stigma
# ss_c_currentwork - agree means low stigma/high val means low stigma
# ss_d_currentmarry - agree means low stigma/high val means low stigma

# -------- use the reverse-coded versions of the history/current work and marry items: these are the only ss questions where agree means low stigma, so reverse coding aligns them with the other items for easier analysis

# ss_a_historywork_rev - already converted to numeric/high val means high stigma
# ss_b_historymarry_rev - already converted to numeric/high val means high stigma
# ss_c_currentwork_rev - already converted to numeric/high val means high stigma
# ss_d_currentmarry_rev - already converted to numeric/high val means high stigma

# ss_e_dangerous - agree means high stigma/high val means high stigma
# ss_f_trust - agree means high stigma/high val means high stigma
# ss_historysteal - agree means high stigma/high val means high stigma
# ss_historyhighrisk - agree means high stigma/high val means high stigma
# ss_currentsteal - agree means high stigma/high val means high stigma
# ss_currenthighrisk - agree means high stigma/high val means high stigma

ss_6_past = ['ss_a_historywork_rev','ss_b_historymarry_rev']
ss_6_current = ['ss_c_currentwork_rev','ss_d_currentmarry_rev','ss_e_dangerous','ss_f_trust']

ss_6_full = ss_6_past + ss_6_current

ss_10_past = ['ss_historysteal', 'ss_historyhighrisk']
ss_10_current = ['ss_currentsteal', 'ss_currenthighrisk']

ss_10_full = ss_6_full + ss_10_past + ss_10_current

ss_past = ss_6_past + ss_10_past
ss_current = ss_6_current + ss_10_current
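
One way the past/current item lists could be used is to compare a respondent's mean score on "history" items with their mean on "current" items. The block below is a minimal sketch on hypothetical toy responses (not survey data); the column names `past_mean`, `current_mean` and `current_minus_past` are assumptions made for illustration, and the released `stigma_scale_score` may be computed differently.

```python
import pandas as pd

# Hypothetical toy responses on the 1-5 scale (high value = high stigma).
toy = pd.DataFrame({
    "ss_a_historywork_rev": [2.0, 4.0, 1.0],
    "ss_b_historymarry_rev": [2.0, 3.0, 1.0],
    "ss_c_currentwork_rev": [4.0, 5.0, 2.0],
    "ss_d_currentmarry_rev": [4.0, 4.0, 3.0],
})
past_items = ["ss_a_historywork_rev", "ss_b_historymarry_rev"]
current_items = ["ss_c_currentwork_rev", "ss_d_currentmarry_rev"]

# Row-wise means give one subscale score per respondent.
toy["past_mean"] = toy[past_items].mean(axis=1)
toy["current_mean"] = toy[current_items].mean(axis=1)

# Positive values mean more stigma toward current than past OUD.
toy["current_minus_past"] = toy["current_mean"] - toy["past_mean"]
print(toy[["past_mean", "current_mean", "current_minus_past"]])
```

Because every item is coded so that high values mean high stigma, the difference can be read directly without further recoding.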





In [143]:
# to enable parsing of stigma by political affiliation, views on race/ethnicity, and experience of racial/ethnic discrimination bring in variables assessing those items

# political = ['pid1','pida','pidb','pidi','partyid7','partyid5']
political = ['partyid5']

race = ['race_whiteadvantage','race_rich']

# race_whiteadvantage: [White people in the U.S. have certain advantages because of the color of their skin.] Do you disagree or agree with the following statements?
    # agree corresponds with recognition of white advantage; high vals = recognition of white advantage
    # reverse code this from likert vars so that high vals will now indicate lack of recognition of white advantage

# race_rich: [Everyone who works hard, no matter what race they are, has an equal chance to become rich.] Do you disagree or agree with the following statements?
    # agree corresponds with lack of recognition of white advantage; high vals = lack of recognition of white advantage
    # code this along with the likert vars where high vals = high stigma; in this case high vals = lack of recognition of white advantage

#discrimination_experience = ['times_atschool', 'times_hired', 'times_atwork', 'times_housing', 'times_medcare', 'times_restaurant', 'times_credit', 'times_street', 'times_police']

# possible approach: 
    # add count of discrimination experiences (times) across categories
    # higher numbers mean more discrimination experience



In [144]:
likert_replace_vars = ['ss_e_dangerous','ss_f_trust','ss_historysteal', 'ss_historyhighrisk','ss_currentsteal', 'ss_currenthighrisk','race_rich']
likert_reverse_replace_vars = ['race_whiteadvantage']

In [145]:

#additional_vars_of_interest = ss_10_full + political + race + discrimination_experience
additional_vars_of_interest = ss_10_full + political + race 
all_vars_of_interest = vars_of_interest + additional_vars_of_interest

In [146]:
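# narrow down the dataset to a few interesting (and relatively clean, straightforward) variables, then check for missingness and impute to fill in missing values
# .copy() makes the subsets independent of df, so the edits below do not act on a view of the original data
sub_df_1 = df[vars_of_interest].copy()
sub_df_2 = df[additional_vars_of_interest].copy()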

In [147]:
sub_df_2

Unnamed: 0,ss_a_historywork_rev,ss_b_historymarry_rev,ss_c_currentwork_rev,ss_d_currentmarry_rev,ss_e_dangerous,ss_f_trust,ss_historysteal,ss_historyhighrisk,ss_currentsteal,ss_currenthighrisk,partyid5,race_whiteadvantage,race_rich
0,3.0,4.0,4.0,5.0,Strongly agree,Somewhat agree,Somewhat disagree,Neither disagree nor agree,Strongly agree,Somewhat agree,Democrat,Somewhat agree,Neither disagree nor agree
1,4.0,3.0,5.0,4.0,Somewhat agree,Neither disagree nor agree,Somewhat disagree,Strongly agree,Somewhat agree,Strongly agree,Republican,Somewhat agree,Somewhat agree
2,2.0,2.0,4.0,4.0,Somewhat disagree,Neither disagree nor agree,Somewhat disagree,Somewhat disagree,Neither disagree nor agree,Somewhat agree,Republican,Somewhat disagree,Somewhat disagree
3,2.0,3.0,5.0,5.0,Somewhat agree,Somewhat agree,Neither disagree nor agree,Neither disagree nor agree,Strongly agree,Strongly agree,Republican,Somewhat disagree,Strongly agree
4,2.0,3.0,3.0,5.0,Neither disagree nor agree,Strongly disagree,Strongly disagree,Strongly disagree,Somewhat agree,Somewhat agree,Lean Democrat,Strongly disagree,Strongly agree
...,...,...,...,...,...,...,...,...,...,...,...,...,...
6510,2.0,2.0,4.0,4.0,Strongly agree,Strongly agree,Neither disagree nor agree,Somewhat agree,Strongly agree,Strongly agree,Republican,Strongly disagree,Strongly agree
6511,4.0,2.0,5.0,4.0,Strongly agree,Strongly agree,Somewhat agree,Somewhat agree,Strongly agree,Strongly agree,Democrat,Somewhat agree,Somewhat agree
6512,3.0,5.0,3.0,5.0,Somewhat agree,Somewhat agree,Somewhat agree,Somewhat agree,Somewhat agree,Somewhat agree,Democrat,Somewhat agree,Somewhat agree
6513,2.0,1.0,2.0,4.0,Somewhat disagree,Somewhat agree,Somewhat agree,Somewhat disagree,Neither disagree nor agree,Neither disagree nor agree,Democrat,Neither disagree nor agree,Somewhat disagree


In [148]:
# get all var types
print("var info: ")
print(sub_df_2.info())

# all new vars (except the reverse coded individual stigma scale questions) are categorical

var info: 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6515 entries, 0 to 6514
Data columns (total 13 columns):
 #   Column                 Non-Null Count  Dtype   
---  ------                 --------------  -----   
 0   ss_a_historywork_rev   6471 non-null   float64 
 1   ss_b_historymarry_rev  6459 non-null   float64 
 2   ss_c_currentwork_rev   6476 non-null   float64 
 3   ss_d_currentmarry_rev  6468 non-null   float64 
 4   ss_e_dangerous         6459 non-null   category
 5   ss_f_trust             6486 non-null   category
 6   ss_historysteal        6480 non-null   category
 7   ss_historyhighrisk     6485 non-null   category
 8   ss_currentsteal        6478 non-null   category
 9   ss_currenthighrisk     6481 non-null   category
 10  partyid5               6483 non-null   category
 11  race_whiteadvantage    6471 non-null   category
 12  race_rich              6481 non-null   category
dtypes: category(9), float64(4)
memory usage: 262.8 KB
None


In [149]:
# check for missing I

# check if missing values
print("missing values: ")
print(sub_df_2.isnull().sum())

# check if missing values
print("missing values: ")
print(sub_df_2.isna().sum())

# every var has at least some missing values, including partyid5 (32 missing)

missing values: 
ss_a_historywork_rev     44
ss_b_historymarry_rev    56
ss_c_currentwork_rev     39
ss_d_currentmarry_rev    47
ss_e_dangerous           56
ss_f_trust               29
ss_historysteal          35
ss_historyhighrisk       30
ss_currentsteal          37
ss_currenthighrisk       34
partyid5                 32
race_whiteadvantage      44
race_rich                34
dtype: int64
missing values: 
ss_a_historywork_rev     44
ss_b_historymarry_rev    56
ss_c_currentwork_rev     39
ss_d_currentmarry_rev    47
ss_e_dangerous           56
ss_f_trust               29
ss_historysteal          35
ss_historyhighrisk       30
ss_currentsteal          37
ss_currenthighrisk       34
partyid5                 32
race_whiteadvantage      44
race_rich                34
dtype: int64


In [150]:
#print("ss single questions, categories: ")
#print(sub_df_2.ss_a_historywork.value_counts(dropna=False))

print("ss single questions, categories: ")
print(sub_df_2.ss_a_historywork_rev.value_counts(dropna=False))

print("ss single questions, categories: ")
print(sub_df_2.ss_e_dangerous.value_counts(dropna=False))

#print("party id 7 composite question, categories: ")
#print(sub_df_2.partyid7.value_counts(dropna=False))

print("party id 5 composite question, categories: ")
print(sub_df_2.partyid5.value_counts(dropna=False))

#print("discrimination times single questions, categories: ")
#print(sub_df_2.times_atschool.value_counts(dropna=False))

ss single questions, categories: 
2.0    2832
3.0    1481
1.0    1420
4.0     517
5.0     221
NaN      44
Name: ss_a_historywork_rev, dtype: int64
ss single questions, categories: 
Somewhat agree                2179
Neither disagree nor agree    2078
Somewhat disagree              981
Strongly agree                 811
Strongly disagree              410
NaN                             56
Name: ss_e_dangerous, dtype: int64
party id 5 composite question, categories: 
Democrat                       2466
Republican                     1548
Don't Lean/Independent/None     984
Lean Democrat                   796
Lean Republican                 689
NaN                              32
Name: partyid5, dtype: int64


In [151]:
likert_replacer = {'Strongly disagree': 1, 
                   'Somewhat disagree': 2,
                   'Neither disagree nor agree': 3,
                   'Somewhat agree': 4, 
                   'Strongly agree': 5}

likert_reverse_replacer = {'Strongly disagree': 5, 
                           'Somewhat disagree': 4,
                           'Neither disagree nor agree': 3,
                           'Somewhat agree': 2, 
                           'Strongly agree': 1}

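# map the Likert labels to 1-5 and assign the result back;
# calling replace(..., inplace=True) on a column subset only edits a temporary copy
sub_df_2[likert_replace_vars] = sub_df_2[likert_replace_vars].apply(lambda s: s.astype("object").map(likert_replacer)).astype("float")

# same approach for the reverse-coded item (5 = strongly disagree)
sub_df_2[likert_reverse_replace_vars] = sub_df_2[likert_reverse_replace_vars].apply(lambda s: s.astype("object").map(likert_reverse_replacer)).astype("float")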

# note: respondents with missing partyid5 are coded 0 on both indicators
#sub_df_2['partyid5_any_d'] = np.where(sub_df_2['partyid5'].isin(["Democrat","Lean Democrat"]), 1,0)
sub_df_2['partyid5_strong_d'] = np.where(sub_df_2['partyid5'] == "Democrat", 1,0)
#sub_df_2['partyid5_any_r'] = np.where(sub_df_2['partyid5'].isin(["Republican","Lean Republican"]), 1,0)
sub_df_2['partyid5_strong_r'] = np.where(sub_df_2['partyid5'] == "Republican", 1,0)

sub_df_2.drop(['partyid5'], axis=1, inplace=True)

sub_df_2



Unnamed: 0,ss_a_historywork_rev,ss_b_historymarry_rev,ss_c_currentwork_rev,ss_d_currentmarry_rev,ss_e_dangerous,ss_f_trust,ss_historysteal,ss_historyhighrisk,ss_currentsteal,ss_currenthighrisk,race_whiteadvantage,race_rich,partyid5_strong_d,partyid5_strong_r
0,3.0,4.0,4.0,5.0,5.0,4.0,2.0,3.0,5.0,4.0,2.0,3.0,1,0
1,4.0,3.0,5.0,4.0,4.0,3.0,2.0,5.0,4.0,5.0,2.0,4.0,0,1
2,2.0,2.0,4.0,4.0,2.0,3.0,2.0,2.0,3.0,4.0,4.0,2.0,0,1
3,2.0,3.0,5.0,5.0,4.0,4.0,3.0,3.0,5.0,5.0,4.0,5.0,0,1
4,2.0,3.0,3.0,5.0,3.0,1.0,1.0,1.0,4.0,4.0,5.0,5.0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6510,2.0,2.0,4.0,4.0,5.0,5.0,3.0,4.0,5.0,5.0,5.0,5.0,0,1
6511,4.0,2.0,5.0,4.0,5.0,5.0,4.0,4.0,5.0,5.0,2.0,4.0,1,0
6512,3.0,5.0,3.0,5.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,4.0,1,0
6513,2.0,1.0,2.0,4.0,2.0,4.0,4.0,2.0,3.0,3.0,3.0,2.0,1,0
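
The two dictionaries above are mirror images of each other: on a 1-5 scale, reverse coding is the same as computing `6 - value`. The following small sketch checks this on a hypothetical toy Series (the responses are made up for illustration) and shows that missing answers stay missing after mapping.

```python
import pandas as pd

likert_replacer = {
    "Strongly disagree": 1,
    "Somewhat disagree": 2,
    "Neither disagree nor agree": 3,
    "Somewhat agree": 4,
    "Strongly agree": 5,
}

# Hypothetical responses, including one missing answer.
responses = pd.Series(
    ["Strongly agree", "Somewhat disagree", None, "Neither disagree nor agree"]
)

forward = responses.map(likert_replacer)  # labels -> 1..5, missing -> NaN
reverse = 6 - forward                     # reverse code on a 1-5 scale

# Building the reverse dictionary from the forward one avoids typing it twice.
likert_reverse_replacer = {label: 6 - value for label, value in likert_replacer.items()}
print(pd.DataFrame({"response": responses, "forward": forward, "reverse": reverse}))
```

Deriving the reverse mapping from the forward one keeps the two in sync if the response labels ever change.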


In [152]:
# check for missing I

# check if missing values
print("missing values: ")
print(sub_df_2.isnull().sum())

# check if missing values
print("missing values: ")
print(sub_df_2.isna().sum())

missing values: 
ss_a_historywork_rev     44
ss_b_historymarry_rev    56
ss_c_currentwork_rev     39
ss_d_currentmarry_rev    47
ss_e_dangerous           56
ss_f_trust               29
ss_historysteal          35
ss_historyhighrisk       30
ss_currentsteal          37
ss_currenthighrisk       34
race_whiteadvantage      44
race_rich                34
partyid5_strong_d         0
partyid5_strong_r         0
dtype: int64
missing values: 
ss_a_historywork_rev     44
ss_b_historymarry_rev    56
ss_c_currentwork_rev     39
ss_d_currentmarry_rev    47
ss_e_dangerous           56
ss_f_trust               29
ss_historysteal          35
ss_historyhighrisk       30
ss_currentsteal          37
ss_currenthighrisk       34
race_whiteadvantage      44
race_rich                34
partyid5_strong_d         0
partyid5_strong_r         0
dtype: int64


In [153]:
# get all var types
print("var info: ")
print(sub_df_2.info())

var info: 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6515 entries, 0 to 6514
Data columns (total 14 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   ss_a_historywork_rev   6471 non-null   float64
 1   ss_b_historymarry_rev  6459 non-null   float64
 2   ss_c_currentwork_rev   6476 non-null   float64
 3   ss_d_currentmarry_rev  6468 non-null   float64
 4   ss_e_dangerous         6459 non-null   float64
 5   ss_f_trust             6486 non-null   float64
 6   ss_historysteal        6480 non-null   float64
 7   ss_historyhighrisk     6485 non-null   float64
 8   ss_currentsteal        6478 non-null   float64
 9   ss_currenthighrisk     6481 non-null   float64
 10  race_whiteadvantage    6471 non-null   float64
 11  race_rich              6481 non-null   float64
 12  partyid5_strong_d      6515 non-null   int32  
 13  partyid5_strong_r      6515 non-null   int32  
dtypes: float64(12), int32(2)
memory usage: 661.8 

In [154]:
mode_impute_vars = ss_10_full + race
mode_impute_vars


['ss_a_historywork_rev',
 'ss_b_historymarry_rev',
 'ss_c_currentwork_rev',
 'ss_d_currentmarry_rev',
 'ss_e_dangerous',
 'ss_f_trust',
 'ss_historysteal',
 'ss_historyhighrisk',
 'ss_currentsteal',
 'ss_currenthighrisk',
 'race_whiteadvantage',
 'race_rich']

In [155]:


#df['salary'] = df['salary'].fillna(df['salary'].mode()[0])

sub_df_2[mode_impute_vars] = sub_df_2[mode_impute_vars].fillna(sub_df_2[mode_impute_vars].mode().iloc[0])

# check if missing values
print("missing values: ")
print(sub_df_2.isnull().sum())

missing values: 
ss_a_historywork_rev     0
ss_b_historymarry_rev    0
ss_c_currentwork_rev     0
ss_d_currentmarry_rev    0
ss_e_dangerous           0
ss_f_trust               0
ss_historysteal          0
ss_historyhighrisk       0
ss_currentsteal          0
ss_currenthighrisk       0
race_whiteadvantage      0
race_rich                0
partyid5_strong_d        0
partyid5_strong_r        0
dtype: int64
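
To make the mode-imputation step easier to follow, here is a minimal sketch on a hypothetical two-column frame (the column names `item_a` and `item_b` are made up). `DataFrame.mode()` ignores missing values by default and can return several rows when there are ties, so `.iloc[0]` takes the first (smallest) mode per column. Keeping a mask of which cells were imputed is an optional extra, not something the notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical toy items with one missing value each.
toy = pd.DataFrame({
    "item_a": [2.0, 2.0, np.nan, 4.0],
    "item_b": [1.0, np.nan, 5.0, 5.0],
})

was_imputed = toy.isna()        # remember which cells were missing
modes = toy.mode().iloc[0]      # first mode of each column
imputed = toy.fillna(modes)     # fill each column with its own mode

print(modes)
print(imputed)
```

With roughly 30 to 56 missing values per item out of 6515 rows, single-value imputation changes few cells, but the mask makes it possible to rerun analyses with the imputed cells excluded as a sensitivity check.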


In [156]:
# get all var types
print("var info: ")
print(sub_df_2.info())

var info: 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6515 entries, 0 to 6514
Data columns (total 14 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   ss_a_historywork_rev   6515 non-null   float64
 1   ss_b_historymarry_rev  6515 non-null   float64
 2   ss_c_currentwork_rev   6515 non-null   float64
 3   ss_d_currentmarry_rev  6515 non-null   float64
 4   ss_e_dangerous         6515 non-null   float64
 5   ss_f_trust             6515 non-null   float64
 6   ss_historysteal        6515 non-null   float64
 7   ss_historyhighrisk     6515 non-null   float64
 8   ss_currentsteal        6515 non-null   float64
 9   ss_currenthighrisk     6515 non-null   float64
 10  race_whiteadvantage    6515 non-null   float64
 11  race_rich              6515 non-null   float64
 12  partyid5_strong_d      6515 non-null   int32  
 13  partyid5_strong_r      6515 non-null   int32  
dtypes: float64(12), int32(2)
memory usage: 661.8 

In [None]:
# clean up some of the categoricals to be consistently coded
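# assign results back rather than using replace(..., inplace=True) on a column accessor,
# which newer pandas versions may apply to a temporary copy
sub_df_1["familycrimjust_ever"] = sub_df_1["familycrimjust_ever"].replace({0:"No",1:"Yes"})
sub_df_1["familyuse_ever"] = sub_df_1["familyuse_ever"].replace({" No":"No"})
sub_df_1["personalcrimjust_ever"] = sub_df_1["personalcrimjust_ever"].replace({"Yes, ever arrested or incarcerated":"Yes", "No, never arrested or incarcerated":"No"})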


In [None]:
# check for missing 

print(sub_df_1.isnull().sum())


In [None]:

# impute any missing - confirm missing eliminated

# impute missing stigma scale scores with the median; impute missing yes/no experience variables with "No"
# ("No" is the mode of personaluse_ever; the same fill value is used for the other three variables)

# assign results back rather than using fillna(..., inplace=True) on a column accessor,
# which newer pandas versions may apply to a temporary copy
for col in ['personaluse_ever','familyuse_ever','personalcrimjust_ever','familycrimjust_ever']:
    sub_df_1[col] = sub_df_1[col].fillna('No')

# impute missing stigma scale score values as the median score
for col in ['stigma_scale_score','expanded_10item_stigma']:
    sub_df_1[col] = sub_df_1[col].fillna(sub_df_1[col].median())

print(sub_df_1.isnull().sum())

In [None]:
# add df column with state 2 letter code
# https://pythonfix.com/code/us-states-abbrev.py/
# state name to two letter code dictionary
us_state_to_abbrev = json.loads(Path(STATE_ABBREVIATIONS).read_text())
state_cd = sub_df_1.state.replace(us_state_to_abbrev)
sub_df_1.insert(6,"state_cd",state_cd,True)

In [None]:
# Add jcoin information
jcoin_json = json.loads(Path("jcoin_states.json").read_text())

jcoin_df = (pd.DataFrame(jcoin_json)
    .assign(hub_types=lambda df:df["hub"]+"("+df["type"]+")")
    .groupby('states')
    # make a list of the name and type of hub/study and how many hubs are in that state
    .agg({"hub_types":lambda s:",".join(s),"hub":"count"})
    .reset_index()
    .rename(
        columns={"states":"state_cd",
        "hub":"jcoin_hub_count",
        "hub_types":"jcoin_hub_types"})
)

jcoin_df["jcoin_flag"] = 1
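
The aggregation above collapses hub records to one row per state. The sketch below reproduces the idea on hypothetical records; the real `jcoin_states.json` is only assumed to hold one record per hub/state pair with `hub`, `type` and `states` keys (the keys the notebook code reads), and the hub names here are invented.

```python
import pandas as pd

# Hypothetical hub records, not the contents of jcoin_states.json.
toy_json = [
    {"hub": "Hub A", "type": "clinical", "states": "IL"},
    {"hub": "Hub B", "type": "methods", "states": "IL"},
    {"hub": "Hub C", "type": "clinical", "states": "NY"},
]

toy_jcoin = (
    pd.DataFrame(toy_json)
    .assign(hub_types=lambda d: d["hub"] + "(" + d["type"] + ")")
    .groupby("states", as_index=False)
    # one row per state: joined hub labels plus a hub count
    .agg(
        jcoin_hub_types=("hub_types", lambda s: ",".join(s)),
        jcoin_hub_count=("hub", "count"),
    )
    .rename(columns={"states": "state_cd"})
)

# A left merge keeps states without a hub; their hub columns are NaN until filled.
respondents = pd.DataFrame({"state_cd": ["IL", "NY", "TX"]})
merged = respondents.merge(toy_jcoin, on="state_cd", how="left")
merged["jcoin_hub_count"] = merged["jcoin_hub_count"].fillna(0).astype(int)
merged["jcoin_hub_types"] = merged["jcoin_hub_types"].fillna("not JCOIN")
print(merged)
```

Named aggregation produces the final column names directly, which removes the need for a separate rename of the aggregated columns.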

In [None]:
jcoin_df.head()

In [None]:
sub_df_1 = sub_df_1.merge(jcoin_df,on="state_cd",how="left")
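# assign results back rather than using fillna(..., inplace=True) on a selected column
sub_df_1["jcoin_hub_types"] = sub_df_1["jcoin_hub_types"].fillna("not JCOIN")
sub_df_1["jcoin_hub_count"] = sub_df_1["jcoin_hub_count"].fillna(0)
sub_df_1["jcoin_flag"] = sub_df_1["jcoin_flag"].fillna(0)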
sub_df_1["is_jcoin_hub"] = np.where(sub_df_1["jcoin_hub_types"]=="not JCOIN","No","Yes")
sub_df_1.head()

In [None]:
# sample counts to report:
#   - all: n/weighted n in gen pop
#   - all: n/weighted n in AS oversample
#   - all: n/weighted n in AS oversample + gen pop (in oversampled state)
#   - per state: n/weighted n in AS oversample
#   - missingness; imputation procedures

In [None]:
pop_counts_by_sampletypexstate = (
    sub_df_1
    .convert_dtypes()
    .assign(jcoin_hub_count=lambda df: df.jcoin_hub_count.astype(str))
    .groupby(['state_cd','p_over'])
    ["stigma_scale_score"]
    .count()
    .unstack(['p_over'])
)
pop_counts_by_sampletypexstate["total"] = pop_counts_by_sampletypexstate.sum(axis=1)

In [None]:
# merge jcoin info
pop_counts_by_sampletypexstate = pop_counts_by_sampletypexstate\
    .merge(jcoin_df,on='state_cd',how='left')\
    .sort_values("total",ascending=False)\
    .assign(
        jcoin_hub_count=lambda df:df.jcoin_hub_count.fillna(0).astype(int),
        jcoin_flag=lambda df:df.jcoin_flag.fillna(0).astype(int),
        jcoin_hub_types=lambda df:(
            # parentheses are required: & binds more tightly than >
            np.where(df.jcoin_hub_types.isna() & (df["AS oversample"]>0),"non JCOIN comparison",
                np.where(df.jcoin_hub_types.isna(),"non JCOIN gen pop",df.jcoin_hub_types)
        )
        
        ))

In [None]:
print("N for Oversample, General and Total By State")
pop_counts_by_sampletypexstate.reset_index()

pop_counts_by_sampletypexstate.to_csv("state_counts.csv")


In [None]:
print("N All")
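# sum only the numeric count columns (the text columns would otherwise be concatenated)
pop_counts_by_sampletypexstate.sum(numeric_only=True).to_frame().T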

### National estimates

### Add and correct strata and PSUs necessary for variance estimation 

In [None]:
strata_df = pd.read_csv(STRATA_FILE)
strata_df.columns = strata_df.columns.str.lower()

# collapse strata containing only 1 PSU
onepsu = (
    strata_df[["vstrat32","vpsu32"]]
    .drop_duplicates()
    .groupby("vstrat32")
    .count()
    .squeeze()
    .loc[lambda s:s==1]
    .index
)
strata_df["vstrat32_corrected"] = strata_df["vstrat32"].where(cond=lambda s:~s.isin(onepsu),other=-1)
# rename PSUs so no duplicates
strata_df["vpsu32_corrected"] = strata_df.groupby(["vstrat32_corrected","vpsu32"]).ngroup()
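
The effect of the collapsing step can be seen on a small example. The sketch below uses a hypothetical six-row frame with the same column names as the strata file. It also illustrates one thing worth checking in the real file: if PSU labels repeat across strata (an assumption, not something verified here), grouping on the corrected stratum merges PSUs from different single-PSU strata that share a label, whereas grouping on the original stratum keeps them distinct. The column `vpsu32_distinct` is a hypothetical alternative, not part of the notebook.

```python
import pandas as pd

# Hypothetical design: stratum 10 has two PSUs, strata 20 and 30 have one each.
toy = pd.DataFrame({
    "caseid": [1, 2, 3, 4, 5, 6],
    "vstrat32": [10, 10, 10, 20, 30, 30],
    "vpsu32": [1, 1, 2, 1, 1, 1],
})

# Strata with a single PSU cannot contribute a within-stratum variance.
psu_per_stratum = toy.groupby("vstrat32")["vpsu32"].nunique()
single = psu_per_stratum[psu_per_stratum == 1].index

# Pool all single-PSU strata into one pseudo-stratum labelled -1.
toy["vstrat32_corrected"] = toy["vstrat32"].where(~toy["vstrat32"].isin(single), other=-1)

# Notebook approach: PSU ids from (corrected stratum, PSU label).
toy["vpsu32_corrected"] = toy.groupby(["vstrat32_corrected", "vpsu32"]).ngroup()

# Alternative: PSU ids from (original stratum, PSU label), so pooled PSUs stay distinct.
toy["vpsu32_distinct"] = toy.groupby(["vstrat32", "vpsu32"]).ngroup()

print(toy)
print(toy.groupby("vstrat32_corrected")[["vpsu32_corrected", "vpsu32_distinct"]].nunique())
```

In this toy case the pooled stratum still has only one PSU under the first scheme and two under the second, so it is worth confirming which situation applies to the actual strata file before relying on the variance estimates.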

In [None]:
# join strata into dataset
sub_df_1 = sub_df_1.set_index("caseid").join(strata_df.set_index('caseid'))

### Data overview (counts and missing values)

In [None]:
# planned outputs:
#   - table/bar: point estimate + bootstrap CI; based on gen pop and weight1
#   - table/bar: point estimate + bootstrap CI; based on AS oversample + gen pop and weight2
#   - histogram: distribution; based on AS oversample + gen pop and weight2
#   - histogram: distribution by personaluse_ever, familyuse_ever, personalcrimjust_ever, familycrimjust_ever; based on AS oversample + gen pop and weight2


In [None]:
sample_estimator = TaylorEstimator("mean")
sample_estimator.estimate(
    y=sub_df_1["stigma_scale_score"],
    samp_weight=sub_df_1["weight2"],
    stratum=sub_df_1["vstrat32_corrected"],
    psu=sub_df_1["vpsu32_corrected"],
)
sample_estimator.to_dataframe()
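
For readers unfamiliar with this kind of estimator, the following is a rough, self-contained sketch of a Taylor-linearized standard error for a weighted mean under a stratified, clustered design (with-replacement approximation, no finite population correction). It is not taken from `samplics` and may differ from that library in details; the function name `weighted_mean_taylor` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def weighted_mean_taylor(y, w, stratum, psu):
    """Weighted mean and linearized standard error (illustrative sketch)."""
    d = pd.DataFrame({"y": y, "w": w, "h": stratum, "c": psu})
    d = d.astype({"y": float, "w": float})
    total_w = d["w"].sum()
    mean = (d["w"] * d["y"]).sum() / total_w

    # Linearized score for the ratio sum(w*y) / sum(w).
    d["z"] = d["w"] * (d["y"] - mean) / total_w

    # Sum scores within each PSU, then take between-PSU variance per stratum.
    psu_totals = d.groupby(["h", "c"])["z"].sum()
    var = 0.0
    for _, z_h in psu_totals.groupby(level="h"):
        n_h = len(z_h)
        if n_h > 1:  # single-PSU strata contribute nothing, hence the collapsing step
            var += n_h / (n_h - 1) * ((z_h - z_h.mean()) ** 2).sum()
    return mean, float(np.sqrt(var))

# Toy design: two strata with two single-observation PSUs each, equal weights.
mean, se = weighted_mean_taylor(
    y=[1, 2, 3, 4], w=[1, 1, 1, 1], stratum=[1, 1, 2, 2], psu=[1, 2, 3, 4]
)
print(mean, se)
```

On this toy design the result matches the textbook stratified-sample formula (variance 0.125), which is a useful sanity check before comparing against the library output on the real data.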

In [None]:
state_estimator = TaylorEstimator("mean")
state_estimator.estimate(
    y=sub_df_1["stigma_scale_score"],
    samp_weight=sub_df_1["weight2"],
    stratum=sub_df_1["vstrat32_corrected"],
    psu=sub_df_1["vpsu32_corrected"],
    domain=sub_df_1["state_cd"]
)
state_mean_estimates = state_estimator.to_dataframe().rename(columns={"_domain":"state_cd","_estimate":"mean"})


In [None]:
state_mean_estimates

In [None]:
# quick map of state level estimates

fig = px.choropleth(state_mean_estimates,
    locations="state_cd",
    locationmode="USA-states",
    scope="usa",
    color="mean",
    color_continuous_scale="Viridis_r")

fig.show()


In [None]:
# merge jcoin info
state_mean_estimates = state_mean_estimates\
    .merge(jcoin_df,on='state_cd',how='left')\
    .sort_values("mean",ascending=False)\
    .assign(
        jcoin_hub_count=lambda df:df.jcoin_hub_count.fillna(0).astype(int),
        jcoin_flag=lambda df:df.jcoin_flag.fillna(0).astype(int))

# map the 0/1 JCOIN flag to a plotting color
state_mean_estimates["jcoin_color"] = state_mean_estimates.jcoin_flag.map({0:"blue",1:"red"})


state_mean_estimates.head()

```{margin} 
**To go to the data/study page on the HEAL Data Platform, follow this link:** my link
```

```{margin} 
**To go to an interactive analytic cloud workspace with the analysis code and data loaded, follow this link:** my link
```

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Sodales ut eu sem integer vitae justo eget. Pellentesque dignissim enim sit amet venenatis urna cursus. Sed faucibus turpis in eu mi bibendum. Scelerisque felis imperdiet proin fermentum leo. Volutpat est velit egestas dui id ornare arcu. Quis lectus nulla at volutpat diam ut venenatis tellus. Tellus pellentesque eu tincidunt tortor aliquam nulla facilisi cras. Pellentesque adipiscing commodo elit at imperdiet dui. 
<br>

In hac habitasse platea dictumst quisque sagittis purus. Libero volutpat sed cras ornare. Sit amet consectetur adipiscing elit pellentesque habitant morbi tristique senectus. Auctor augue mauris augue neque gravida in fermentum et. Amet mattis vulputate enim nulla aliquet porttitor. Proin sed libero enim sed faucibus turpis in eu. Morbi tristique senectus et netus et malesuada. Feugiat sed lectus vestibulum mattis ullamcorper.

**Data Citation** 
<br>
Harold Pollack, Johnathon Schneider, Bruce Taylor. JCOIN 026: Brief Stigma Survey. Chicago, IL: Center for Translational Data Science HEAL Data Platform (distributor) via Center for Translational Data Science JCOIN Data Commons (repository & distributor), 2022-04-08. (HEAL Data Platform branded doi goes here)
<br>
**Brief Article Citation** 
<br>
What format should this be? 