# BioData Catalyst Powered by PIC-SURE: Identify stigmatizing variables

The purpose of this notebook is to identify stigmatizing variables in [BioData Catalyst Powered by PIC-SURE](https://picsure.biodatacatalyst.nhlbi.nih.gov/). Specifically, stigmatizing variables are identified in PIC-SURE Authorized Access so that they can be removed from PIC-SURE Open Access.

For more information about stigmatizing variables, please view the [README.md](https://github.com/hms-dbmi/biodata_catalyst_stigmatizing_variables#biodata_catalyst_stigmatizing_variables).

### Prerequisites
This notebook assumes knowledge of the BioData Catalyst Powered by PIC-SURE platform and API. For more information about the API, please visit the [Access to Data using PIC-SURE GitHub repository](https://github.com/hms-dbmi/Access-to-Data-using-PIC-SURE-API).

Developer login credentials, or access to all data in PIC-SURE Authorized Access, are also required to ensure that every variable is reviewed.

### Install and import packages

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import re
from collections import Counter
from pprint import pprint
import json
from shutil import copyfile

In [2]:
import sys
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-python-client.git
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-python-adapter-hpds.git
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-biodatacatalyst-python-adapter-hpds.git

Collecting git+https://github.com/hms-dbmi/pic-sure-python-client.git
  Cloning https://github.com/hms-dbmi/pic-sure-python-client.git to /tmp/pip-req-build-2t8qlrf6
  Running command git clone -q https://github.com/hms-dbmi/pic-sure-python-client.git /tmp/pip-req-build-2t8qlrf6
  Resolved https://github.com/hms-dbmi/pic-sure-python-client.git to commit aabcc6574eede2dc3de410c6c75f7f77ea18d23c
Building wheels for collected packages: PicSureClient
  Building wheel for PicSureClient (setup.py) ... done
  Created wheel for PicSureClient: filename=PicSureClient-0.1.0-py2.py3-none-any.whl size=10326 sha256=e9ca87d5e4afa12f3ee6fab7fab1fd70f559d828f2b6d9db637ca926039697fb
  Stored in directory: /tmp/pip-ephem-wheel-cache-22uezk_u/wheels/31/ef/21/e362bba8de04e0072fafec9f77bd1abdf7e166213d27e98729
Successfully built PicSureClient
Installing collected packages: PicSureClient
Successfully installed PicSureClient-0.1.0
You should consider upgrading via the '/home/ec2-user/anaconda3/env

In [3]:
import PicSureClient
import PicSureBdcAdapter
from python_lib.utils import get_multiIndex_variablesDict, joining_variablesDict_onCol
from python_lib.stig_utils import check_simplified_name, regex_filter, manual_check

### Connect to PIC-SURE

In [7]:
#PICSURE_network_URL = "https://picsure.biodatacatalyst.nhlbi.nih.gov/picsure"
PICSURE_network_URL = "https://biodatacatalyst.integration.hms.harvard.edu/picsure"
resource_id = "02e23f52-f354-4e8b-992c-d37c8b9ba140" # Be sure to use Authorized Access resource ID
token_file = "token.txt" # Be sure to use developer token to get all variables

In [8]:
with open(token_file, "r") as f:
    my_token = f.read().strip()  # drop any trailing newline or whitespace saved in the token file

In [9]:
client = PicSureClient.Client()
connection = client.connect(PICSURE_network_URL, my_token, True)
adapter = PicSureBdcAdapter.Adapter(connection)
resource = adapter.useResource(resource_id)


|        certificates to be acceptable for connections.  This may be useful for           |
|        working in a development environment or on systems that host public              |
|        data.  BEST SECURITY PRACTICES ARE THAT IF YOU ARE WORKING WITH SENSITIVE        |
|        DATA THEN ALL SSL CERTS BY THOSE EVIRONMENTS SHOULD NOT BE SELF-SIGNED.          |
+--------------------------------------+------------------------------------------------------+
|  Resource UUID                       |  Resource Name                                       |
+--------------------------------------+------------------------------------------------------+
| 02e23f52-f354-4e8b-992c-d37c8b9ba140 |                                                      |
| 70c837be-5ffc-11eb-ae93-0242ac130002 |                                                      |
+--------------------------------------+------------------------------------------------------+


### Save all variables in PIC-SURE Authorized Access to DataFrame

In [10]:
fullVariableDict = resource.dictionary().find().DataFrame()
#fullVariableDict
multiindex = get_multiIndex_variablesDict(fullVariableDict)

In [11]:
fullVariableDict.head()

| KEY | min | categorical | patientCount | observationCount | max | HpdsDataType | categoryValues | description |
|-----|-----|-------------|--------------|------------------|-----|--------------|----------------|-------------|
| `\Multi-Ethnic Study of Atherosclerosis (MESA) SHARe ( phs000209 )\MESA Lung Ancillary Study Exam 3 Dataset: This dataset provides Lung CT scan data for MESA Classic participants enrolled in the MESA Lung Ancillary Study.\RIGHT LUNG, LOWER: THE INTERCEPT OF THE LINE AT THE ANKLE\` | 0.0 | False | 2347.0 | 2347.0 | 350.0 | phenotypes | | |
| `\Coronary Artery Risk Development in Young Adults (CARDIA) ( phs000285 )\CMP DATE EXACT/APPROXIMATE?\` | -316.0 | False | 302.0 | 302.0 | 5e-324 | phenotypes | | |
| `\Framingham Cohort ( phs000007 )\Tests\ECG\TREATMENT FOR VARICOSE VEINS (LEFT)\` | | True | 5518.0 | 6631.0 | | phenotypes | [MAYBE, NO, YES] | |
| `\Framingham Cohort ( phs000007 )\Lab Work\Blood\Hematologic\SYSTOLIC MURMUR: BASE GRADE\` | | True | 4051.0 | 4051.0 | | phenotypes | [GRADE 1, GRADE 2, GRADE 3, GRADE 4, NO SOUND ... | |
| `\Framingham Cohort ( phs000007 )\Lab Work\Blood\Hematologic\IF YES TO G3A143 OR G3A144: HOW MANY YEARS HAVE YOU BROUGHT PHLEGM UP FROM YOUR CHEST ON MOST DAYS?\` | | True | 3747.0 | 3747.0 | | phenotypes | [NONE] | |
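The `KEY` column is a backslash-delimited concept path whose segments run from the study down to the variable. The filters that follow operate on a `simplified_name`; the sketch below assumes, for illustration only, that this name is the last segment of the path. `split_concept_path` and `simplified_name_of` are hypothetical helpers, not part of `python_lib.utils`, and `get_multiIndex_variablesDict` may derive the name differently.

```python
from typing import List


def split_concept_path(key: str) -> List[str]:
    """Split a backslash-delimited PIC-SURE concept path into its segments.

    The leading and trailing backslashes yield empty strings, which are dropped.
    """
    return [segment for segment in key.split("\\") if segment]


def simplified_name_of(key: str) -> str:
    """Hypothetical rule: use the last path segment as the simplified name."""
    segments = split_concept_path(key)
    return segments[-1] if segments else ""


key = "\\Framingham Cohort ( phs000007 )\\Tests\\ECG\\TREATMENT FOR VARICOSE VEINS (LEFT)\\"
print(split_concept_path(key))
# ['Framingham Cohort ( phs000007 )', 'Tests', 'ECG', 'TREATMENT FOR VARICOSE VEINS (LEFT)']
print(simplified_name_of(key))
# TREATMENT FOR VARICOSE VEINS (LEFT)
```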


### Identify stigmatizing variables using `simplified_name`

There are two functions to identify stigmatizing variables: `check_simplified_name` and `regex_filter`.

`check_simplified_name` selects all variables from the `multiindex` dataframe whose `simplified_name` contains any of the terms in the given list. It also takes an optional argument `exclude_vars` that removes variables whose `simplified_name` matches one of the listed names exactly.

For example, 

`check_simplified_name(['bio', 'data', 'catalyst'], multiindex, ['biology variable'])`

would find all variables where the `simplified_name` contains 'bio', 'data', and/or 'catalyst' but excludes `simplified_name`s equal to 'biology variable' (ignoring capitalization).
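To make the matching rule concrete, here is a minimal pandas sketch of the behavior described above. It is not the `python_lib.stig_utils` implementation: `check_simplified_name_sketch` is a hypothetical stand-in, and it assumes the input has a `simplified_name` column, that search terms match as case-insensitive literal substrings, and that exclusions must equal the whole `simplified_name`, ignoring case.

```python
import pandas as pd


def check_simplified_name_sketch(terms, df, exclude_vars=None):
    """Hypothetical re-creation of check_simplified_name's matching rules.

    Returns (selected, excluded): rows whose simplified_name contains any
    search term, split by whether the full name appears in exclude_vars.
    """
    names = df["simplified_name"].astype(str)
    # Case-insensitive literal substring match against any search term.
    hit = pd.Series(False, index=df.index)
    for term in terms:
        hit |= names.str.contains(term, case=False, regex=False)
    # Exclusions must match the entire simplified_name, ignoring case.
    excluded_names = {v.lower() for v in (exclude_vars or [])}
    is_excluded = names.str.lower().isin(excluded_names)
    return df[hit & ~is_excluded], df[hit & is_excluded]


toy = pd.DataFrame({"simplified_name": [
    "Biology variable", "BioData Catalyst ID", "Catalyst dose", "Heart rate"]})
selected, excluded = check_simplified_name_sketch(
    ["bio", "data", "catalyst"], toy, ["biology variable"])
print(list(selected["simplified_name"]))  # ['BioData Catalyst ID', 'Catalyst dose']
print(list(excluded["simplified_name"]))  # ['Biology variable']
```

The toy names are made up; "Heart rate" contains none of the terms and is never selected.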

`regex_filter` applies two filters in order. First, a list of inclusion terms identifies variables that are automatically deemed stigmatizing; these do not require manual review. Second, a list of exclusion terms removes any remaining variable whose `simplified_name` contains one of those terms. Unlike `check_simplified_name`, where an excluded variable must match the `simplified_name` completely, this function excludes a variable if the term is *contained* in its `simplified_name`.

For example,

`regex_filter(['biodata catalyst', 'terra', 'helicobacter pylori'], ['helicobacter'], ['ter'])`

would first include the variables containing '*helicobacter*' and then exclude all remaining variables containing '*ter*'. In this case, 'helicobac*ter* pylori' would be identified as stigmatizing (even though it contains '*ter*', because inclusion is applied first), '*ter*ra' would be filtered out, and 'biodata catalyst' would be added to the list for manual review.
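The two passes can be sketched in plain Python as follows. `regex_filter_sketch` is a hypothetical stand-in rather than the `stig_utils` function; it works on a list of name strings, treats every term as a regular expression, and assumes matching ignores case. Running it on the example's three names reproduces the outcome described above.

```python
import re


def regex_filter_sketch(candidates, include_terms, exclude_terms):
    """Hypothetical re-creation of regex_filter's two passes.

    Returns (auto_stigmatizing, needs_review). Inclusion terms are checked
    first, so a name matching both lists is kept as stigmatizing.
    """
    def matches_any(name, patterns):
        return any(re.search(p, name, flags=re.IGNORECASE) for p in patterns)

    auto, review = [], []
    for name in candidates:
        if matches_any(name, include_terms):
            auto.append(name)    # pass 1: automatically stigmatizing
        elif not matches_any(name, exclude_terms):
            review.append(name)  # survived pass 2: needs manual review
        # otherwise the name was dropped by an exclusion term
    return auto, review


auto, review = regex_filter_sketch(
    ["biodata catalyst", "terra", "helicobacter pylori"], ["helicobacter"], ["ter"])
print(auto)    # ['helicobacter pylori']
print(review)  # ['biodata catalyst']
```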


***Note:*** `regex_filter` ***can use regular expressions as input while*** `check_simplified_name` ***input must match exactly.***
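The difference matters for an exclusion term such as `'minn(esota)? code'` from `terms_excluded.tsv`, which only behaves as intended when read as a regular expression. The quick check below uses Python's `re` module on made-up variable names; the case-insensitive comparison is an assumption about how the filters compare names.

```python
import re

pattern = "minn(esota)? code"
names = ["MINNESOTA CODE 1-1", "Minn code 4-1", "Minnesota heart survey"]

# Regex semantics: the optional group lets one term cover both spellings.
regex_hits = [n for n in names if re.search(pattern, n, flags=re.IGNORECASE)]
# Literal semantics: the parentheses and '?' are treated as ordinary characters.
literal_hits = [n for n in names if pattern.lower() in n.lower()]

print(regex_hits)    # ['MINNESOTA CODE 1-1', 'Minn code 4-1']
print(literal_hits)  # []
```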

| Function | Arguments / Input | Output|
|--------|-------------------|-------|
| `check_simplified_name()` | (1) list of search terms, (2) multiindex dataframe, (3) optional: variables to exclude | (1) list of potentially stigmatizing variables, (2) variables excluded using provided criteria|
| `regex_filter()` | (1) list of potentially stigmatizing variables, (2) list of terms used to automatically identify stigmatizing variables, (3) list of terms or regular expressions to filter | (1) list of variables automatically deemed stigmatizing, (2) filtered list of stigmatizing variables that still require manual review |

### Load stigmatizing terms, simplified variables to exclude, and terms to filter out

The following files provide information about terms used to select and filter stigmatizing variables. These files are located in the `stigmatizing_terms` directory.

| File | Information |
|--------|-------------------|
| `stigmatizing_keywords.tsv` | List of search keywords used to select potentially stigmatizing variables from PIC-SURE Authorized Access and the reason each keyword was chosen |
| `simplified_vars_excluded.tsv` | List of `simplified_name` variables that will be filtered out of the list of potentially stigmatizing variables and associated reasons for exclusion |
| `inclusion_terms.tsv` | List of terms used to identify variables that are automatically deemed stigmatizing; these are applied before the excluded terms |
| `terms_excluded.tsv` |  List of terms that will be used to filter out non-stigmatizing variables and the associated reasons for exclusion |

In [12]:
stigmatizing_df = pd.read_csv("stigmatizing_terms/stigmatizing_keywords.tsv", sep="\t")
exclude_vars_df = pd.read_csv("stigmatizing_terms/simplified_vars_excluded.tsv", sep="\t")
terms_excluded_df = pd.read_csv("stigmatizing_terms/terms_excluded.tsv", sep="\t")
terms_included_df = pd.read_csv("stigmatizing_terms/inclusion_terms.tsv", sep='\t')

In [13]:
stig_terms = list(stigmatizing_df["Search keyword"])
print("Search keywords:\n\n", stig_terms)

Search keywords:

 ['sex', 'sexual', 'intercourse', 'copulation', 'sex history', 'sexually', 'coitus', 'pareunia', 'venery', 'chlamydia', 'herpes', 'HIV', 'pubic lice', 'trichomoniasis', 'progesterone', 'genital', 'gonorrhea', 'AIDS', 'syphilis', 'vagina', 'estrogen', 'testosterone', 'androgens', 'depression', 'anxiety', 'phobia', 'mental', 'psycho', 'emotional health', 'depressive', 'panic', 'schizophrenia', 'mental health', 'psychological', 'suicide', 'illicit', 'abuse', 'fentanyl', 'ecstasy', 'methamphetamine', 'phencyclidine', 'mushroom', 'flakka', 'central nervous system depressant', 'khat', 'loperamide', 'stimulant', 'street drug', 'illegal', 'cocaine', 'LSD', 'heroin', 'angel dust', 'salvia', 'ayahuasca', 'hallucinogen', 'kratom', 'dextromethorphan', 'cannabinoid', 'rohypnol', 'roofies', 'ketamine hydrochloride', 'psilocybin', 'mushroom', 'krokodil', 'bath salts', 'DMT', 'inhalant', 'mescaline', 'opioid', 'gamma hydroxybutyrate', 'bachelor', 'phd', 'intellectual', 'acheivement',
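The keyword list printed above contains `'mushroom'` twice. A repeated keyword is redundant at best, and depending on how results are combined it could also yield duplicate matches. The sketch below drops repeats while keeping the first occurrence of each keyword in order. `dedupe_keywords` is a hypothetical helper, and comparing keywords case-insensitively is an assumption that only makes sense if matching ignores case.

```python
def dedupe_keywords(keywords):
    """Drop repeated keywords (compared case-insensitively), keeping first occurrences in order."""
    seen = set()
    unique = []
    for kw in keywords:
        if not isinstance(kw, str):
            continue  # skip blank TSV cells, which pandas reads as NaN
        key = kw.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(kw)
    return unique


print(dedupe_keywords(["psilocybin", "mushroom", "krokodil", "mushroom", "DMT"]))
# ['psilocybin', 'mushroom', 'krokodil', 'DMT']
```

In the notebook it could be applied as `stig_terms = dedupe_keywords(stig_terms)` before calling `check_simplified_name`.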

In [14]:
exclude_vars = list(exclude_vars_df["Variables to exclude"])
print("Variables to exclude:\n\n", exclude_vars)

Variables to exclude:

 ['sex', 'sex of participant', 'continence ability', 'dressing ability', 'feeding ability', 'grooming and bathing ability', 'heavy work ability', 'inability to walk on treadmill', 'physicians judgement of overall disability', 'physicians judgment of overall disability', 'toileting ability']


In [15]:
terms_excluded = list(terms_excluded_df["Terms to exclude"])
print("Terms to exclude:\n\n", terms_excluded)

Terms to exclude:

 ['race and sex adjusted', 'hives', 'nsaids', 'chlamydia pneumoniae', 'walking aid', 'shiver', 'health aids', 'herpes zoster', 'heart disease', 'archive', 'hispanic', 'ecg', 'instrumental', 'supplemental', 'segmental', 'electrocardiograph', 'minn(esota)? code', 'environmental', 'mini-mental state exam', 'coffee or tea', 'change in ability to', 'how ability to', 'variability', 'gradability', 'gradeability', 'reliability', 'acceptability', 'stability', 'leg ability', 'physical ability', 'probability', 'single tennis', 'single ventricular', 'single nodule', 'single chair', 'urinalysis: albumin', 'brace', 'thoracentesis', 'paracentral', 'extracellular', 'intracellular', 'contraceptive', 'traced', 'single sup', 'st depression ge', 'segment depression', 'tennis (singles)', 'inability to walk on treadmill', 'availability', 'shopping ability', 'money', 'grade ability', 'knudson', 'single item', 'maids', 'sample use', 'urine', 'albumin', 'creatinine', 'cortisol', 'saliva', 'r
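Many of these exclusion terms appear to exist because a short keyword occurs inside an unrelated word: with case-insensitive substring matching, 'HIV' occurs in 'hives' and 'archive', 'AIDS' in 'maids', 'panic' in 'hispanic', and 'mental' in 'environmental'. The helper below is a hypothetical aid, not part of `stig_utils`; it lists which keywords a given term contains, which can help when deciding whether a new exclusion term is needed.

```python
def triggering_keywords(term, keywords):
    """Return the keywords that occur inside `term`, ignoring case."""
    lowered = term.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


keywords = ["HIV", "AIDS", "panic", "mental", "sex"]
for term in ["hives", "archive", "maids", "hispanic", "environmental"]:
    print(term, "->", triggering_keywords(term, keywords))
# hives -> ['HIV']
# archive -> ['HIV']
# maids -> ['AIDS']
# hispanic -> ['panic']
# environmental -> ['mental']
```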

In [16]:
terms_included = list(terms_included_df['Terms to include'])
print("Terms to include:\n\n", terms_included)

Terms to include:

 ['estrogen', 'progesterone', 'testosterone', 'de-identified aric subject id', 'de-identified aric participant id']


### Run functions to find potentially stigmatizing variables

In [17]:
# Takes a while
stig_vars, ex_vars = check_simplified_name(stig_terms, multiindex, exclude_vars)

In [18]:
keep_vars, final_vars = regex_filter(stig_vars, terms_included, terms_excluded)

Found 1220 that are stigmatizing
3626 still need review


In [19]:
print("Total number of vars", len(stig_vars))
print("After filtering", len(final_vars))

Total number of vars 6292
After filtering 3626
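Assuming `regex_filter` places every candidate in exactly one of three groups (automatically stigmatizing, needs review, or removed by an exclusion term), the printed counts imply how many candidates the exclusion terms removed. The arithmetic below uses only the numbers reported above; `filtering_breakdown` is a hypothetical helper.

```python
def filtering_breakdown(n_candidates, n_auto, n_review):
    """Split candidate counts into three groups, assuming they partition the input."""
    n_removed = n_candidates - n_auto - n_review
    if n_removed < 0:
        raise ValueError("counts are inconsistent with a partition of the candidates")
    return {
        "auto_stigmatizing": n_auto,
        "needs_review": n_review,
        "removed_by_exclusions": n_removed,
    }


print(filtering_breakdown(6292, 1220, 3626))
# {'auto_stigmatizing': 1220, 'needs_review': 3626, 'removed_by_exclusions': 1446}
```

In the notebook the same check could be run as `filtering_breakdown(len(stig_vars), len(keep_vars), len(final_vars))`.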


### Manual review of potentially stigmatizing variables

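`manual_check` provides an interactive way to record whether the filtered variables are indeed stigmatizing. It takes the list of potentially stigmatizing variables and also accepts an optional argument `ex_vars` for manually reviewing the excluded variables. It returns a dataframe of the stigmatizing variables with the recorded responses and (if applicable) a dataframe of excluded variables with the recorded responses. In the call below, the variables in `keep_vars` are recorded as already identified as stigmatizing, and answers saved in `prev_file` from an earlier session are reused rather than asked again.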

To use this function, simply call it on the list of filtered variables (and excluded variables if needed) and follow the interactive instructions.

Please save results from this function to the `stigmatizing_variable_results` directory.
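The output below shows answers being reused from `prev_file` ("Using results from previous file") so that only new variables need a prompt. That resume pattern can be sketched as follows. The two-column, tab-separated `name<TAB>yes/no` layout and the names `load_previous_decisions` and `review_with_previous` are assumptions for illustration, since the actual decisions-file format is not shown here; `io.StringIO` stands in for an opened file.

```python
import csv
import io


def load_previous_decisions(handle):
    """Read hypothetical 'name<TAB>yes|no' lines into a dict."""
    decisions = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            decisions[row[0]] = row[1].strip().lower()
    return decisions


def review_with_previous(names, previous, ask=input):
    """Reuse earlier answers; call `ask` only for names without one."""
    results = []
    for name in names:
        if name in previous:
            answer = previous[name]
        else:
            answer = ""
            while answer not in ("yes", "no"):
                answer = ask(f"Is '{name}' stigmatizing? (yes/no): ").strip().lower()
        results.append((name, answer))
    return results


prev = load_previous_decisions(io.StringIO("Subject Identifier\tyes\nYear sample taken\tno\n"))
answers = review_with_previous(
    ["Subject Identifier", "Year sample taken", "GOT MARRIED"], prev, ask=lambda prompt: "yes")
print(answers)
# [('Subject Identifier', 'yes'), ('Year sample taken', 'no'), ('GOT MARRIED', 'yes')]
```

Passing `ask` as a parameter keeps the prompt replaceable, so the same loop can run interactively with `input` or non-interactively in a test.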

In [20]:
# Rename output_file to appropriate filename
output_file = 'stigmatizing_variable_results/stigmatizing_variable_decisions_10nov2021.txt'
prev_file = "stigmatizing_variable_results/stigmatizing_variable_decisions_8nov2021.txt"
stigmatizing_variables, excluded_stigmatizing_variables = manual_check(final_vars, output_file, keep_vars, prev_file=prev_file)

Continue to review of 3626 variables?
y/n: y
Stigmatizing: Estrogens, excluding vaginal creams <, recording result 1 of 1220 already identified as stigmatizing
Stigmatizing: HAVE YOU EVER TAKEN OTHER ESTROGENS <, recording result 2 of 1220 already identified as stigmatizing
Stigmatizing: Premarin (conjugated estrogens) <, recording result 3 of 1220 already identified as stigmatizing
Stigmatizing: Calculated estrogen use at bl <, recording result 4 of 1220 already identified as stigmatizing
Stigmatizing: Estrogen at baseline, no creams <, recording result 5 of 1220 already identified as stigmatizing
Stigmatizing: Estrogens, excluding vaginal creams <, recording result 6 of 1220 already identified as stigmatizing
Stigmatizing: Premarin (conjugated estrogens) <, recording result 7 of 1220 already identified as stigmatizing
Stigmatizing: Estrogens <, recording result 8 of 1220 already identified as stigmatizing
Stigmatizing: Estrogens, excluding vaginal creams <, recording result 9 of 1220

Stigmatizing: If progesterone use ever: The strength of progesterone <, recording result 191 of 1220 already identified as stigmatizing
Stigmatizing: If taken HRT: Estrogen use ever? <, recording result 192 of 1220 already identified as stigmatizing
Stigmatizing: If taken HRT: Progesterone use ever? <, recording result 193 of 1220 already identified as stigmatizing
Stigmatizing: MEDICATION USE - ORAL/PATCH ESTROGEN (FOR WOMEN USERS ALSO SEE ESTROGEN SECTION) <, recording result 194 of 1220 already identified as stigmatizing
Stigmatizing: MEDICATION USE: ORAL/PATCH ESTROGEN (FOR WOMEN USERS ALSO SEE ESTROGEN SECTION) <, recording result 195 of 1220 already identified as stigmatizing
Stigmatizing: MEDICATION USE: ORAL/PATCH ESTROGEN <, recording result 196 of 1220 already identified as stigmatizing
Stigmatizing: MEDICATIONS: ORAL/PATCH ESTROGEN (FOR WOMEN USERS ALSO SEE ESTROGEN SECTION) <, recording result 197 of 1220 already identified as stigmatizing
Stigmatizing: MEDICINE USE: ESTROG

Stigmatizing: FEMALE HORMONE REPLACEMENT: NAME OF MOST RECENT ESTROGEN PREPARATION - CHARACTER VARIABLE; (BLANK) = UNKNOWN, OR NO ESTROGEN PREPARATION USE (197) <, recording result 375 of 1220 already identified as stigmatizing
Stigmatizing: FEMALE HORMONE REPLACEMENT: NAME OF MOST RECENT PROGESTERONE PREPARATION - CHARACTER VARIABLE; (BLANK) = UNKNOWN, OR NO PROGESTERONE PREPARATION USE (207) <, recording result 376 of 1220 already identified as stigmatizing
Stigmatizing: FEMALE HORMONE REPLACEMENT: NUMBER OF DAYS A MONTH TAKING ESTROGENS <, recording result 377 of 1220 already identified as stigmatizing
Stigmatizing: FEMALE HORMONE REPLACEMENT: PATCH DOSE OF ESTROGEN <, recording result 378 of 1220 already identified as stigmatizing
Stigmatizing: FEMALE HORMONE REPLACEMENT: PROGESERONE PREPARATION: STRENGTH CHARACTER VARIABLE; (BLANK) = UNKNOWN, OR NO PROGESTERONE PREPARATION USE (207) <, recording result 379 of 1220 already identified as stigmatizing
Stigmatizing: FEMALE HORMONE REP

Stigmatizing: If estrogen use ever: Name of most recent estrogen preparation <, recording result 560 of 1220 already identified as stigmatizing
Stigmatizing: If estrogen use ever: Number of days per month taken <, recording result 561 of 1220 already identified as stigmatizing
Stigmatizing: If estrogen use ever: The strength of estrogen <, recording result 562 of 1220 already identified as stigmatizing
Stigmatizing: If periods stopped: Have you ever taken hormone replacement therapy (estrogen/progesterone)? <, recording result 563 of 1220 already identified as stigmatizing
Stigmatizing: If periods stopped: Have you used Evista (raloxifene) or Nolvadex (tamoxifen) or other selective estrogen receptor modulator (SERM)? <, recording result 564 of 1220 already identified as stigmatizing
Stigmatizing: If progesterone use ever: Name of most recent progesterone preparation <, recording result 565 of 1220 already identified as stigmatizing
Stigmatizing: If progesterone use ever: Number of days

Stigmatizing: De-identified ARIC subject ID [Heart Failure Diagnosis Form HDX] <, recording result 748 of 1220 already identified as stigmatizing
Stigmatizing: De-identified ARIC subject ID [Heart Failure Hospital Record Abstraction Form, HFA] <, recording result 749 of 1220 already identified as stigmatizing
Stigmatizing: De-identified ARIC subject ID [Informant Interview Form, IFI] <, recording result 750 of 1220 already identified as stigmatizing
Stigmatizing: De-identified ARIC subject ID [Medical Conditions Update Form, MCU] <, recording result 751 of 1220 already identified as stigmatizing
Stigmatizing: De-identified ARIC subject ID [Post-V4 ECG Data Management System - Form Display, CEBD] <, recording result 752 of 1220 already identified as stigmatizing
Stigmatizing: De-identified ARIC subject ID [Post-V4 ECG Data Management System -Form Display] <, recording result 753 of 1220 already identified as stigmatizing
Stigmatizing: De-identified ARIC subject ID [Six Item Screener (SI

Stigmatizing: De-identified ARIC subject ID [Subjective Memory Form, SMF] <, recording result 932 of 1220 already identified as stigmatizing
Stigmatizing: De-identified ARIC participant ID. [TIA/Stroke Form, Cohort Visit 4] <, recording result 933 of 1220 already identified as stigmatizing
Stigmatizing: De-identified ARIC subject ID [TIA/Stroke, exam 1] <, recording result 934 of 1220 already identified as stigmatizing
Stigmatizing: De-identified ARIC subject ID [TIA/Stroke, exam 2] <, recording result 935 of 1220 already identified as stigmatizing
Stigmatizing: De-identified ARIC participant ID [TIA/Stroke, exam 3] <, recording result 936 of 1220 already identified as stigmatizing
Stigmatizing: De-identified ARIC participant ID [TIA/Stroke Form, Cohort Visit 4] <, recording result 937 of 1220 already identified as stigmatizing
Stigmatizing: De-identified ARIC subject ID [Telephone Interview for Cognitive Status] <, recording result 938 of 1220 already identified as stigmatizing
Stigma

Stigmatizing: Months used estrogen pills <, recording result 1116 of 1220 already identified as stigmatizing
Stigmatizing: Months used progesterone cream <, recording result 1117 of 1220 already identified as stigmatizing
Stigmatizing: Months used progesterone shots <, recording result 1118 of 1220 already identified as stigmatizing
Stigmatizing: Natural/Phytoestrogen creams <, recording result 1119 of 1220 already identified as stigmatizing
Stigmatizing: Natural/Phytoestrogen foods <, recording result 1120 of 1220 already identified as stigmatizing
Stigmatizing: Natural/Progesterone cream <, recording result 1121 of 1220 already identified as stigmatizing
Stigmatizing: Non-estrogen rx/Don't know <, recording result 1122 of 1220 already identified as stigmatizing
Stigmatizing: Non-estrogen rx/Evista (Raloxifene) <, recording result 1123 of 1220 already identified as stigmatizing
Stigmatizing: Non-estrogen rx/Nolvadex (Tamoxifen) <, recording result 1124 of 1220 already identified as st

Using results from previous file, no for > Baseline D-Dimer - sample <, recording result 51 of 3626
Using results from previous file, no for > Prob of being sampled by subgroup <, recording result 52 of 3626
Using results from previous file, no for > Sample prob weight, excluding death grp <, recording result 53 of 3626
Using results from previous file, yes for > Subject Identifier <, recording result 54 of 3626
Using results from previous file, yes for > Subject Identifier <, recording result 55 of 3626
Using results from previous file, no for > Year sample taken <, recording result 56 of 3626
Using results from previous file, yes for > Subject Identifier <, recording result 57 of 3626
Using results from previous file, yes for > Subject Identifier <, recording result 58 of 3626
Using results from previous file, yes for > Subject Identifier <, recording result 59 of 3626
Using results from previous file, yes for > Subject Identifier <, recording result 60 of 3626
Using results from pre

Using results from previous file, yes for > GOT MARRIED. Q B <, recording result 251 of 3626
Using results from previous file, yes for > GOT MARRIED <, recording result 252 of 3626
Using results from previous file, yes for > GRADE OF SCHOOL COMPLETED. Q 1 <, recording result 253 of 3626
Using results from previous file, yes for > GRADUATED FROM SCHOOL - MON 12 <, recording result 254 of 3626
Using results from previous file, yes for > GRADUATED FROM SCHOOL. Q A <, recording result 255 of 3626
Using results from previous file, yes for > GRADUATED FROM SCHOOL <, recording result 256 of 3626
Using results from previous file, yes for > HAD DEPRESSION IN THE PAST YEAR? Q 19 <, recording result 257 of 3626
Using results from previous file, yes for > HAD HIV IN THE PAST YEAR? Q 25 <, recording result 258 of 3626
Using results from previous file, yes for > HAD MENTAL DISORDERS IN THE PAST YEAR. Q 16 <, recording result 259 of 3626
Using results from previous file, yes for > HAD MENTAL DISORDER

Using results from previous file, yes for > USED HEROIN IN LIFE. Q 10 <, recording result 448 of 3626
Using results from previous file, yes for > USED HEROIN IN LIFE. Q 9 <, recording result 449 of 3626
Using results from previous file, yes for > USED HEROIN IN LIFE <, recording result 450 of 3626
Using results from previous file, yes for > USED HEROIN, PAST 24 HOURS. Q 12 <, recording result 451 of 3626
Using results from previous file, yes for > USED HEROIN, PAST 24 HOURS <, recording result 452 of 3626
Using results from previous file, yes for > VAGINAL DRYNESS, PAST 3 MONTHS. Q 24 <, recording result 453 of 3626
Using results from previous file, no for > VOICE OF SUBJECT. Q 8 <, recording result 454 of 3626
Using results from previous file, yes for > WHICH OPIATE? HEROIN. Q 10 <, recording result 455 of 3626
Using results from previous file, yes for > WHICH OPIATE? HEROIN. Q 9b <, recording result 456 of 3626
Using results from previous file, yes for > WIDOWED. Q 12b <, recording r

Using results from previous file, yes for > WHAT IS THE HIGHEST DEGREE OR LEVEL OF SCHOOL YOU HAVE COMPLETED? (IF CURRENTLY ENROLLED, MARK THE HIGHEST GRADE COMPLETED, DEGREE RECEIVED) <, recording result 641 of 3626
Using results from previous file, yes for > WHAT IS YOUR CURRENT MARITAL STATUS? <, recording result 642 of 3626
Using results from previous file, no for > WHICH OF THE FOLLOWING BEST DESCRIBES YOU - RACE AFRICAN AMERICAN OR BLACK <, recording result 643 of 3626
Using results from previous file, no for > WHICH OF THE FOLLOWING BEST DESCRIBES YOU - RACE AMERICAN INDIAN OR ALASKA NATIVE <, recording result 644 of 3626
Using results from previous file, no for > WHICH OF THE FOLLOWING BEST DESCRIBES YOU - RACE ASIAN <, recording result 645 of 3626
Using results from previous file, no for > WHICH OF THE FOLLOWING BEST DESCRIBES YOU - RACE CAUCASIAN OR WHITE <, recording result 646 of 3626
Using results from previous file, no for > WHICH OF THE FOLLOWING BEST DESCRIBES YOU - RAC

Using results from previous file, no for > Total Cholesterol (sample type: EDTA plasma) <, recording result 832 of 3626
Using results from previous file, no for > Total Cholesterol (sample type: plasma) <, recording result 833 of 3626
Using results from previous file, no for > Total cholesterol (Sample type - EDTA plasma) <, recording result 834 of 3626
Using results from previous file, no for > Total cholesterol (sample type: plasma) <, recording result 835 of 3626
Using results from previous file, no for > Triglyceride (sample type: EDTA plasma) <, recording result 836 of 3626
Using results from previous file, no for > Triglycerides (Sample type - EDTA plasma) <, recording result 837 of 3626
Using results from previous file, no for > Triglycerides (sample type: EDTA plasma) <, recording result 838 of 3626
Using results from previous file, no for > Triglycerides (sample type: plasma) <, recording result 839 of 3626
Using results from previous file, yes for > UNDER MEDICAL CARE BUT NOT

Using results from previous file, no for > R IN V5 OR V6: S IN V1 OR V2: ST DEPRESSION <, recording result 1025 of 3626
Using results from previous file, no for > SECOND EXAMINER'S OPINION: SUBJECT HAS PULMONARY DISEASE <, recording result 1026 of 3626
Using results from previous file, no for > Total cholesterol (sample type: plasma) <, recording result 1027 of 3626
Using results from previous file, no for > Triglycerides (sample type: EDTA plasma) <, recording result 1028 of 3626
Using results from previous file, no for > Triglycerides (sample type: plasma) <, recording result 1029 of 3626
Using results from previous file, yes for > UNDER MEDICAL CARE BUT NOT HOSPITALIZED SINCE LAST EXAM FOR MENTAL OR EMOTIONAL DISEASE <, recording result 1030 of 3626
Using results from previous file, yes for > UNIQUE PARTICIPANT ID <, recording result 1031 of 3626
Using results from previous file, yes for > Unique study participant identification number <, recording result 1032 of 3626
Using results 

Using results from previous file, yes for > MEDICATION USE: ANTI-ANXIETY,SEDATIVE/HYPNOTICS ETC. (LIBRIUM, VALIUM ETC.) <, recording result 1221 of 3626
Using results from previous file, yes for > MEDICATION USE: ANTI-ANXIETY,SEDATIVE/HYPNOTICS ETC. (LIBRIUM, VALIUM, ETC.) <, recording result 1222 of 3626
Using results from previous file, yes for > MEDICATIONS: ANTI-ANXIETY, SEDATIVE/HYPNOTICS (LIBRIUM, VALIUM) <, recording result 1223 of 3626
Using results from previous file, yes for > MEDICINE USE: ANTI-ANXIETY, SEDATIVE/HYPNOTICS, ETC. (LIBRIUM, VALIUM, ETC.) <, recording result 1224 of 3626
Using results from previous file, yes for > MMSE - EXAMINER ASSESSES SUBJECT MENTAL STATUS <, recording result 1225 of 3626
Using results from previous file, yes for > MMSE - EXAMINER'S ASSESSMENT OF SUBJECT'S MENTAL STATUS <, recording result 1226 of 3626
Using results from previous file, yes for > MMSE: APHASIA (FACTORS POTENTIALLY AFFECTING MENTAL STATUS) <, recording result 1227 of 3626
Usin

Using results from previous file, yes for > DEPRESSION SCALE: MY SLEEP WAS RESTLESS <, recording result 1415 of 3626
Using results from previous file, yes for > DEPRESSION SCALE: PEOPLE WERE UNFRIENDLY <, recording result 1416 of 3626
Using results from previous file, no for > DIET: DIETING (SUBJECT'S OPINION) <, recording result 1417 of 3626
Using results from previous file, no for > ECHO: OTHER CONGENITAL ABNORMALITY <, recording result 1418 of 3626
Using results from previous file, yes for > EDUCATION <, recording result 1419 of 3626
Using results from previous file, no for > EXAMINER ASSESSES SUBJECT'S MENTAL STATUS <, recording result 1420 of 3626
Using results from previous file, yes for > EXAMINER'S ASSESSMENT OF SUBJECT'S MENTAL STATUS <, recording result 1421 of 3626
Using results from previous file, yes for > FACTORS POTENTIALLY AFFECTING MENTAL STATUS TESTING - APHASIA <, recording result 1422 of 3626
Using results from previous file, yes for > FACTORS POTENTIALLY AFFECTING 

Using results from previous file, yes for > Subject ID <, recording result 1606 of 3626
Using results from previous file, yes for > Sample ID <, recording result 1607 of 3626
Using results from previous file, yes for > Subject ID <, recording result 1608 of 3626
Using results from previous file, yes for > % 25+ with minimum High School education <, recording result 1609 of 3626
Using results from previous file, yes for > % 25+ with minimum bachelor degree <, recording result 1610 of 3626
Using results from previous file, yes for > 15a. What is the highest degree or years of school your father (or important male caretaker) completed, including trade or vocational school or college? [Visit 2] [First Year Questionnaire, AF1] <, recording result 1611 of 3626
Using results from previous file, yes for > 17a. What is the highest degree or years of school your mother (or important female caretaker) completed, including trade or vocational school or college? [Visit 2] [First Year Questionnaire,

Using results from previous file, no for > SCHOOLS, COLLEGES, OR COMMUNITY CENTERS WITH RECREATIONAL FACILITIES FREE AND OPEN TO PUBLIC <, recording result 1804 of 3626
Using results from previous file, yes for > TRICYCLIC ANTI-DEPRESSANTS PLUS ANTI-PSYCHOTICS COMBINATIONS <, recording result 1805 of 3626
Using results from previous file, no for > ESTIMATION OF HOW HARD SUBJECT ATTEMPTED TO DO A FORCED EXHALATION <, recording result 1806 of 3626
Using results from previous file, yes for > SHARE PARTICIPANT ID NUMBER <, recording result 1807 of 3626
Using results from previous file, yes for > RANDOM COHORT AND CASES NOT IN COHORT <, recording result 1808 of 3626
Using results from previous file, yes for > CASE-COHORT DESIGNATION <, recording result 1809 of 3626
Using results from previous file, yes for > GROUP 3 COHORT INDICATOR <, recording result 1810 of 3626
Using results from previous file, no for > SAMPLE REPLACEMENT <, recording result 1811 of 3626
Using results from previous file

Using results from previous file, no for > HRT5 [Carotid Distensibility, Cohort Visit 1] <, recording result 1991 of 3626
Using results from previous file, no for > HRT6 [Carotid Distensibility, Cohort Visit 1] <, recording result 1992 of 3626
Using results from previous file, no for > HRT7 [Carotid Distensibility, Cohort Visit 1] <, recording result 1993 of 3626
Using results from previous file, no for > HRT8 [Carotid Distensibility, Cohort Visit 1] <, recording result 1994 of 3626
Using results from previous file, no for > HRT9 [Carotid Distensibility, Cohort Visit 1] <, recording result 1995 of 3626
Using results from previous file, no for > Mean diastolic blood pressure [BP] [Carotid Distensibility, Cohort Visit 1] <, recording result 1996 of 3626
Using results from previous file, no for > Mean systolic blood pressure [BP] [Carotid Distensibility, Cohort Visit 1] <, recording result 1997 of 3626
Using results from previous file, no for > MeanHRT [Carotid Distensibility, Cohort Visi

Using results from previous file, no for > Language. Q55. Subject has difficulty speaking [Clinical Dementia Rating Informant Interview] <, recording result 2176 of 3626
Using results from previous file, no for > Language. Q56. Subject has difficulty understanding ordinary conversations [Clinical Dementia Rating Informant Interview] <, recording result 2177 of 3626
Using results from previous file, no for > Language. Q57. Subject has difficulty finding words or names [Clinical Dementia Rating Informant Interview] <, recording result 2178 of 3626
Using results from previous file, no for > Memory. Q10. Subject has been diagnosed with dementia, Alzheimer Disease or mild cognitive impairment? Mci [Clinical Dementia Rating Informant Interview] <, recording result 2179 of 3626
Using results from previous file, no for > Memory. Q11. Subject consistent changes In memory over the past year [Clinical Dementia Rating Informant Interview] <, recording result 2180 of 3626
Using results from previou

Using results from previous file, no for > Medications which secondarily affect cholesterol: using 2004 Med. Code, visit 2 [Cohort, Exam 2] <, recording result 2360 of 3626
Using results from previous file, yes for > Menopause status variable for visit 2 [Cohort, Exam 2] <, recording result 2361 of 3626
Using results from previous file, no for > Plaque (with or without shadowing) [Cohort, Exam 2] <, recording result 2362 of 3626
Using results from previous file, no for > Plaque in either carotid bifurcation [Cohort, Exam 2] <, recording result 2363 of 3626
Using results from previous file, no for > Plaque in either common carotid [Cohort, Exam 2] <, recording result 2364 of 3626
Using results from previous file, no for > Plaque in either internal carotid [Cohort, Exam 2] <, recording result 2365 of 3626
Using results from previous file, no for > Plaque(with and W/O shadowing)-Alt. Def. [Cohort, Exam 2] <, recording result 2366 of 3626
Using results from previous file, no for > Predicte

Using results from previous file, no for > Same as QWAVE47A but uses machine code [Cohort, Exam 4] <, recording result 2545 of 3626
Using results from previous file, no for > Same as QWAVE48B but uses machine code [Cohort, Exam 4] <, recording result 2546 of 3626
Using results from previous file, no for > Same leg used for ankle blood pressure at visit 1 and 4 [Cohort, Exam 4] <, recording result 2547 of 3626
Using results from previous file, no for > Same leg used for ankle blood pressure at visit 3 and 4 [Cohort, Exam 4] <, recording result 2548 of 3626
Using results from previous file, no for > Sex (From FTRA22) [Cohort, Exam 4] <, recording result 2549 of 3626
Using results from previous file, no for > Shadowing in either carotid bifurcation [Cohort, Exam 4] <, recording result 2550 of 3626
Using results from previous file, no for > Shadowing in either common carotid [Cohort, Exam 4] <, recording result 2551 of 3626
Using results from previous file, no for > Shadowing in either int

Using results from previous file, yes for > Combined education levels 2 and 3 <, recording result 2734 of 3626
Using results from previous file, yes for > Source subject ID <, recording result 2735 of 3626
Using results from previous file, no for > Subject source <, recording result 2736 of 3626
Using results from previous file, no for > Q0a. Year Of Completion Date [Subjective Memory Form, SMF] <, recording result 2737 of 3626
Using results from previous file, no for > Q1. Times in past month misplaced items [Subjective Memory Form, SMF] <, recording result 2738 of 3626
Using results from previous file, no for > Q2. Times in past month write reminder notes to self [Subjective Memory Form, SMF] <, recording result 2739 of 3626
Using results from previous file, no for > Q3. Times in past month trouble remembering recent conversations [Subjective Memory Form, SMF] <, recording result 2740 of 3626
Using results from previous file, no for > Q4. Family expressed concern about memory loss [S

Using results from previous file, no for > Carbohydrates gm [Vitamin and Nutrient Measurements, Cohort, Exam 3] <, recording result 2911 of 3626
Using results from previous file, no for > Carotene IU [Vitamin and Nutrient Measurements, Cohort, Exam 3] <, recording result 2912 of 3626
Using results from previous file, no for > Cholesterol mg [Vitamin and Nutrient Measurements, Cohort, Exam 3] <, recording result 2913 of 3626
Using results from previous file, no for > Copper mg [Vitamin and Nutrient Measurements, Cohort, Exam 3] <, recording result 2914 of 3626
Using results from previous file, no for > Crude fiber gm [Vitamin and Nutrient Measurements, Cohort, Exam 3] <, recording result 2915 of 3626
Using results from previous file, no for > Fat eat gm [Vitamin and Nutrient Measurements, Cohort, Exam 3] <, recording result 2916 of 3626
Using results from previous file, no for > Folate mcg [Vitamin and Nutrient Measurements, Cohort, Exam 3] <, recording result 2917 of 3626
Using results

Using results from previous file, no for > Ethnicity of participant <, recording result 3098 of 3626
Using results from previous file, yes for > Sample ID <, recording result 3099 of 3626
Using results from previous file, no for > Subject's height <, recording result 3100 of 3626
Using results from previous file, no for > Subject's weight <, recording result 3101 of 3626
Using results from previous file, no for > Case control status of the subject for atrial fibrillation (AF) <, recording result 3102 of 3626
Using results from previous file, yes for > Subject ID <, recording result 3103 of 3626
Using results from previous file, yes for > Sample ID <, recording result 3104 of 3626
Using results from previous file, yes for > Subject ID <, recording result 3105 of 3626
Using results from previous file, yes for > De-identified Subject ID <, recording result 3106 of 3626
Using results from previous file, no for > Race of participant <, recording result 3107 of 3626
Using results from previo

Using results from previous file, yes for > Subject ID <, recording result 3288 of 3626
Using results from previous file, yes for > Subject Identifier <, recording result 3289 of 3626
Using results from previous file, yes for > Unique Subject ID <, recording result 3290 of 3626
Using results from previous file, yes for > De-identified Subject ID <, recording result 3291 of 3626
Using results from previous file, no for > Ethnicity <, recording result 3292 of 3626
Using results from previous file, no for > Study identified sex <, recording result 3293 of 3626
Using results from previous file, no for > Subject's body mass index <, recording result 3294 of 3626
Using results from previous file, no for > Subject's height <, recording result 3295 of 3626
Using results from previous file, no for > Subject's waist circumference <, recording result 3296 of 3626
Using results from previous file, no for > Subject's weight <, recording result 3297 of 3626
Using results from previous file, yes for 

Using results from previous file, no for > Race amin <, recording result 3478 of 3626
Using results from previous file, no for > Race asn <, recording result 3479 of 3626
Using results from previous file, no for > Race blk <, recording result 3480 of 3626
Using results from previous file, no for > Race haw <, recording result 3481 of 3626
Using results from previous file, no for > Race oth <, recording result 3482 of 3626
Using results from previous file, no for > Race wht <, recording result 3483 of 3626
Using results from previous file, no for > Race <, recording result 3484 of 3626
Using results from previous file, yes for > De-identified Subject ID <, recording result 3485 of 3626
Using results from previous file, no for > Sex encoded value <, recording result 3486 of 3626
Using results from previous file, yes for > Subject ID <, recording result 3487 of 3626
Using results from previous file, yes for > De-identified Subject ID <, recording result 3488 of 3626
Using results from pre


 
STIGMATIZING VARIABLE RESULTS SAVED TO:	 stigmatizing_variable_results/stigmatizing_variable_decisions_10nov2021.txt
Clear cell output and display pandas dataframe?

Review the decisions recorded in the specified `output_file` to double-check the final results before exporting them.
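One quick way to review the file is to tally the decisions and list the flagged variables. The sketch below assumes the decision file has the `"full name"` and `"stigmatizing"` columns read by the export cell, and that non-stigmatizing rows are recorded as `"n"` (the export cell only confirms `"y"`). `summarize_decisions` and `flagged_variables` are hypothetical helpers, and the sample rows are placeholders rather than real decisions.

```python
import io
from typing import List

import pandas as pd


def summarize_decisions(decisions: pd.DataFrame) -> pd.Series:
    """Count how many variables received each decision value."""
    return decisions["stigmatizing"].value_counts()


def flagged_variables(decisions: pd.DataFrame) -> List[str]:
    """Return the full names of the variables marked as stigmatizing ("y")."""
    return decisions.loc[decisions["stigmatizing"] == "y", "full name"].tolist()


# Hypothetical sample rows standing in for the real `output_file`.
# The "n" value for non-stigmatizing rows is an assumption.
sample = io.StringIO(
    "full name\tstigmatizing\n"
    "Subject ID\ty\n"
    "Race of participant\tn\n"
    "Sample ID\ty\n"
)
decisions = pd.read_csv(sample, sep="\t")

print(summarize_decisions(decisions))
print(flagged_variables(decisions))
```

Against the real file, replace the sample with `decisions = pd.read_csv(output_file, sep="\t")`.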

### Export stigmatizing variables as a tab-delimited text file

After confirming that the correct decisions were made, run the following code to create a tab-delimited text file listing only the variables marked as stigmatizing.

In [None]:
# Load the decisions recorded during the review above
stig_vars_for_output = pd.read_csv(output_file, sep='\t')
# Keep the "full name" of every variable marked as stigmatizing ("y"),
# then renumber the remaining rows 0..n-1
stig_mask = stig_vars_for_output["stigmatizing"] == "y"
stig_vars_for_output = stig_vars_for_output.loc[stig_mask, "full name"].reset_index(drop=True)

In [None]:
# Write one variable name per line, with no header row and no index column
final_output = 'stigmatizing_variable_results/stigmatizing_variables.txt'
stig_vars_for_output.to_csv(final_output, sep='\t', header=False, index=False)
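Because the exported file is consumed downstream as a plain list of names, it can be worth reading it back to confirm each name was written verbatim. The sketch below repeats the same `to_csv` call on a hypothetical three-name Series in a temporary directory and compares the lines with the input; the names and the temporary path are placeholders, not the real export.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for `stig_vars_for_output` after filtering.
flagged = pd.Series(["Subject ID", "Sample ID", "De-identified Subject ID"])

with tempfile.TemporaryDirectory() as tmp_dir:
    exported_path = os.path.join(tmp_dir, "stigmatizing_variables.txt")
    # Same call as the export cell: no header row and no index column.
    flagged.to_csv(exported_path, sep="\t", header=False, index=False)
    # Read the file back as plain text, one entry per line.
    with open(exported_path) as handle:
        exported_lines = handle.read().splitlines()

# True when every name was written exactly as it appears in the Series.
round_trip_ok = exported_lines == flagged.tolist()
print(round_trip_ok)
```

With pandas' default minimal quoting, a name containing a tab or a double quote would be wrapped in quotes on write, so this comparison would flag it.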

In [None]:
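# Copy the exported list into the ETL project's data directory as conceptsToRemove.txt.
# Both paths are specific to this SageMaker environment; adjust them for your setup.
dst = '/home/ec2-user/SageMaker/studies/ALL-avillach-73-bdcatalyst-etl/general/data/conceptsToRemove.txt'
src = '/home/ec2-user/SageMaker/biodata_catalyst_stigmatizing_variables/'+final_output
copyfile(src, dst)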
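`copyfile` fails if the destination directory does not exist and reports nothing on success. The sketch below shows one way to make the copy step more defensive: the hypothetical helper `copy_and_verify` creates the destination's parent directory (an assumed convenience, not something the original cell does) and then compares the two files byte for byte with `filecmp.cmp`. The file contents and paths are placeholders in a temporary directory.

```python
import filecmp
import os
import shutil
import tempfile


def copy_and_verify(src: str, dst: str) -> str:
    """Copy src to dst (creating dst's parent directory) and confirm the contents match."""
    parent = os.path.dirname(dst)
    if parent:
        os.makedirs(parent, exist_ok=True)
    shutil.copyfile(src, dst)
    # shallow=False compares file contents, not just os.stat() signatures.
    if not filecmp.cmp(src, dst, shallow=False):
        raise OSError(f"Copy of {src} to {dst} does not match the source")
    return dst


with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder source file standing in for the exported list.
    src = os.path.join(tmp_dir, "stigmatizing_variables.txt")
    dst = os.path.join(tmp_dir, "etl", "conceptsToRemove.txt")
    with open(src, "w") as handle:
        handle.write("Subject ID\nSample ID\n")
    copied_to = copy_and_verify(src, dst)
    with open(copied_to) as handle:
        copied_lines = handle.read().splitlines()

print(copied_lines)
```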