Skip to content

Latest commit

 

History

History
146 lines (128 loc) · 19.6 KB

readme.md

File metadata and controls

146 lines (128 loc) · 19.6 KB

Patient Risk Profiles

The virtual R/Pharma Conference is happening this week! To celebrate, we're exploring Patient Risk Profiles. Thank you to Jenna Reps for preparing this week's data!

This dataset contains 100 simulated patient's medical history features and the predicted 1-year risk of 14 outcomes based on each patient's medical history features. The predictions used real logistic regression models developed on a large real world healthcare dataset.

Note: We did not clean the column names this week. This data looks more like the sort of data you're likely to encounter in the wild, so we thought it would be good practice to work with it as-is.

The Data

# Option 1: tidytuesdayR package 
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2023-10-24')
## OR
tuesdata <- tidytuesdayR::tt_load(2023, week = 43)

patient_risk_profiles <- tuesdata$patient_risk_profiles

# Option 2: Read directly from GitHub

patient_risk_profiles <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-10-24/patient_risk_profiles.csv')

How to Participate

  • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
  • Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
  • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.

Data Dictionary

patient_risk_profiles.csv

variable class description
personId integer A unique identifier for the simulated patient
age group: 10 - 14 integer A binary column where 1 means the patient is aged between 10-14 (inclusive) and 0 means the patient is not in that age group
age group: 15 - 19 integer A binary column where 1 means the patient is aged between 15-19 (inclusive) and 0 means the patient is not in that age group
age group: 20 - 24 integer A binary column where 1 means the patient is aged between 20-24 (inclusive) and 0 means the patient is not in that age group
age group: 65 - 69 integer A binary column where 1 means the patient is aged between 65-69 (inclusive) and 0 means the patient is not in that age group
age group: 40 - 44 integer A binary column where 1 means the patient is aged between 40-44 (inclusive) and 0 means the patient is not in that age group
age group: 45 - 49 integer A binary column where 1 means the patient is aged between 45-49 (inclusive) and 0 means the patient is not in that age group
age group: 55 - 59 integer A binary column where 1 means the patient is aged between 55-59 (inclusive) and 0 means the patient is not in that age group
age group: 85 - 89 integer A binary column where 1 means the patient is aged between 85-89 (inclusive) and 0 means the patient is not in that age group
age group: 75 - 79 integer A binary column where 1 means the patient is aged between 75-79 (inclusive) and 0 means the patient is not in that age group
age group: 5 - 9 integer A binary column where 1 means the patient is aged between 5-9 (inclusive) and 0 means the patient is not in that age group
age group: 25 - 29 integer A binary column where 1 means the patient is aged between 25-29 (inclusive) and 0 means the patient is not in that age group
age group: 0 - 4 integer A binary column where 1 means the patient is aged between 0-4 (inclusive) and 0 means the patient is not in that age group
age group: 70 - 74 integer A binary column where 1 means the patient is aged between 70-74 (inclusive) and 0 means the patient is not in that age group
age group: 50 - 54 integer A binary column where 1 means the patient is aged between 50-54 (inclusive) and 0 means the patient is not in that age group
age group: 60 - 64 integer A binary column where 1 means the patient is aged between 60-64 (inclusive) and 0 means the patient is not in that age group
age group: 35 - 39 integer A binary column where 1 means the patient is aged between 35-39 (inclusive) and 0 means the patient is not in that age group
age group: 30 - 34 integer A binary column where 1 means the patient is aged between 30-34 (inclusive) and 0 means the patient is not in that age group
age group: 80 - 84 integer A binary column where 1 means the patient is aged between 80-84 (inclusive) and 0 means the patient is not in that age group
age group: 90 - 94 integer A binary column where 1 means the patient is aged between 90-94 (inclusive) and 0 means the patient is not in that age group
Sex = FEMALE integer A binary column where 1 means the patient has a female sex
sex = MALE integer A binary column where 1 means the patient has a male sex
Acetaminophen exposures in prior year integer A binary column where 1 means the patient had a record for acetaminophen in the prior year and 0 means they did not
Occurrence of Alcoholism in prior year integer A binary column where 1 means the patient had a record for alcoholism in the prior year and 0 means they did not
Anemia in prior year integer A binary column where 1 means the patient had a record for anemia in the prior year and 0 means they did not
Angina events in prior year integer A binary column where 1 means the patient had a record for angina in the prior year and 0 means they did not
ANTIEPILEPTICS in prior year integer A binary column where 1 means the patient had a record for a drug in the category ANTIEPILEPTICS in the prior year and 0 means they did not
Occurrence of Anxiety in prior year integer A binary column where 1 means the patient had a record for anxiety in the prior year and 0 means they did not
Osteoarthritis in prior year integer A binary column where 1 means the patient had a record for osteoarthritis in the prior year and 0 means they did not
Aspirin exposures in prior year integer A binary column where 1 means the patient had a record for aspirin in the prior year and 0 means they did not
Occurrence of Asthma in prior year integer A binary column where 1 means the patient had a record for asthma in the prior year and 0 means they did not
Atrial Fibrillation, incident in prior year integer A binary column where 1 means the patient had a record for atrial fibrillation in the prior year and 0 means they did not
HORMONAL CONTRACEPTIVES in prior year integer A binary column where 1 means the patient had a record for a drug in the category hormonal contraceptives in the prior year and 0 means they did not
Any cancer (excl. prostate cancer and benign cancer) in prior year integer A binary column where 1 means the patient had a record for cancer excluding prostate and benign cancers in the prior year and 0 means they did not
Acute Kidney Injury (AKI) in prior year integer A binary column where 1 means the patient had a record for acute kidney injury in the prior year and 0 means they did not
Chronic kidney disease or end stage renal disease in prior year integer A binary column where 1 means the patient had a record for chronic kidney disease in the prior year and 0 means they did not
Heart failure in prior year integer A binary column where 1 means the patient had a record for heart failure in the prior year and 0 means they did not
Chronic obstructive pulmonary disease (COPD) in prior year integer A binary column where 1 means the patient had a record for chronic obstructive pulmonary disease in the prior year and 0 means they did not
Coronary artery disease (CAD) in prior year integer A binary column where 1 means the patient had a record for coronary artery disease in the prior year and 0 means they did not
Major depressive disorder, with NO occurrence of certain psychiatric disorder in prior year integer A binary column where 1 means the patient had a record for major depressive disorder and no certain psychiatric disorders in the prior year and 0 means they did not
Type 1 diabetes and no prior specific non-T1DM diabetes in prior year integer A binary column where 1 means the patient had a record for type 1 diabetes in the prior year and 0 means they did not
Type 2 Diabetes Mellitus (DM), with no type 1 or secondary DM in prior year integer A binary column where 1 means the patient had a record for type 2 diabetes in the prior year and 0 means they did not
Deep Vein Thrombosis (DVT) in prior year integer A binary column where 1 means the patient had a record for deep vein thrombosis in the prior year and 0 means they did not
Dyspnea in prior year integer A binary column where 1 means the patient had a record for dyspnea in the prior year and 0 means they did not
Edema in prior year integer A binary column where 1 means the patient had a record for edema in the prior year and 0 means they did not
Gastroesophageal reflux disease in prior year integer A binary column where 1 means the patient had a record for gastroesophageal reflux in the prior year and 0 means they did not
Acute gastrointestinal (GI) bleeding in prior year integer A binary column where 1 means the patient had a record for acute gastrointestinal bleeding in the prior year and 0 means they did not
Heart valve disorder in prior year integer A binary column where 1 means the patient had a record for heart valve disorder in the prior year and 0 means they did not
Chronic hepatitis in prior year integer A binary column where 1 means the patient had a record for chronic hepatitis in the prior year and 0 means they did not
Hyperlipidemia in prior year integer A binary column where 1 means the patient had a record for hyperlipidemia in the prior year and 0 means they did not
Hypertension in prior year integer A binary column where 1 means the patient had a record for hypertension in the prior year and 0 means they did not
Hypothyroidism in prior year integer A binary column where 1 means the patient had a record for hypothyroidism in the prior year and 0 means they did not
Inflammatory Bowel Disease in prior year integer A binary column where 1 means the patient had a record for inflammatory bowel disease in the prior year and 0 means they did not
Low back pain in prior year integer A binary column where 1 means the patient had a record for low back pain in the prior year and 0 means they did not
Occurrence of neuropathy in prior year integer A binary column where 1 means the patient had a record for neuropathy in the prior year and 0 means they did not
Obesity in prior year integer A binary column where 1 means the patient had a record for obesity in the prior year and 0 means they did not
Opioids in prior year integer A binary column where 1 means the patient had a record for an opioid in the prior year and 0 means they did not
Osteoporosis in prior year integer A binary column where 1 means the patient had a record for osteoporosis in the prior year and 0 means they did not
Peripheral vascular disease in prior year integer A binary column where 1 means the patient had a record for peripheral vascular disease in the prior year and 0 means they did not
Pneumonia in prior year integer A binary column where 1 means the patient had a record for pneumonia in the prior year and 0 means they did not
Psychotic disorder in prior year integer A binary column where 1 means the patient had a record for a psychotic disorder in the prior year and 0 means they did not
Acute Respiratory failure in prior year integer A binary column where 1 means the patient had a record for acute respiratory failure in the prior year and 0 means they did not
Rheumatoid Arthritis in prior year integer A binary column where 1 means the patient had a record for rheumatoid arthritis in the prior year and 0 means they did not
Seizure in prior year integer A binary column where 1 means the patient had a record for a seizure in the prior year and 0 means they did not
Sepsis in prior year integer A binary column where 1 means the patient had a record for sepsis in the prior year and 0 means they did not
Skin ulcer in prior year integer A binary column where 1 means the patient had a record for a skin ulcer in the prior year and 0 means they did not
Sleep apnea in prior year integer A binary column where 1 means the patient had a record for sleep apnea in the prior year and 0 means they did not
Smoking in prior year integer A binary column where 1 means the patient had a record for smoking in the prior year and 0 means they did not
STEROIDS in prior year integer A binary column where 1 means the patient had a record for any steroid in the prior year and 0 means they did not
Hemorrhagic stroke in an inpatient setting in prior year integer A binary column where 1 means the patient had a record for hemorrhagic stroke in the prior year and 0 means they did not
Non-hemorrhagic Stroke in an inpatient setting in prior year integer A binary column where 1 means the patient had a record for non-hemorrhagic stroke in the prior year and 0 means they did not
Urinary tract infectious disease in prior year integer A binary column where 1 means the patient had a record for urinary tract infection in the prior year and 0 means they did not
Antibiotics Carbapenems in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class carbapenems in the prior year and 0 means they did not
Antibiotics Aminoglycosides in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class aminoglycosides in the prior year and 0 means they did not
Antibiotics Cephalosporins in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class cephalosporins in the prior year and 0 means they did not
Antibiotics Fluoroquinolones in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class Fluoroquinolones in the prior year and 0 means they did not
Antibiotics Glycopeptides and lipoglycopeptides in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class glycopeptides and lipoglycopeptides in the prior year and 0 means they did not
Antibiotics Macrolides in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class macrolides in the prior year and 0 means they did not
Antibiotics Monobactams in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class monobactams in the prior year and 0 means they did not
Antibiotics Oxazolidinones in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class oxazolidinones in the prior year and 0 means they did not
Antibiotics Penicillins in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class penicillins in the prior year and 0 means they did not
Antibiotics Polypeptides in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class polypeptides in the prior year and 0 means they did not
Antibiotics Rifamycins in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class carbapenems in the prior year and 0 means they did not
Antibiotics Sulfonamides in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class sulfonamides in the prior year and 0 means they did not
Antibiotics Streptogramins in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class streptogramins in the prior year and 0 means they did not
Antibiotics Tetracyclines in prior year integer A binary column where 1 means the patient had a record for an antibiotic in the class tetracyclines in the prior year and 0 means they did not
predicted risk of Pulmonary Embolism numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having pulmonary embolism given their features (0 = 0% and 1= 100%)
predicted risk of Sudden Hearing Loss, No congenital anomaly or middle or inner ear conditions numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having sudden hearing loss given their features (0 = 0% and 1= 100%)
predicted risk of Restless Leg Syndrome numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having restless leg syndrome given their features (0 = 0% and 1= 100%)
predicted risk of Sudden Vision Loss, with no eye pathology causes numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having sudden vision loss given their features (0 = 0% and 1= 100%)
predicted risk of Muscle weakness or injury numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having muscle weakness or injury given their features (0 = 0% and 1= 100%)
predicted risk of Ankylosing Spondylitis numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having ankylosing spondylitis given their features (0 = 0% and 1= 100%)
predicted risk of Autoimmune hepatitis numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having autoimmune hepatitis given their features (0 = 0% and 1= 100%)
predicted risk of Multiple Sclerosis numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having multiple sclerosis given their features (0 = 0% and 1= 100%)
predicted risk of Acute pancreatitis, with No chronic or hereditary or common causes of pancreatitis numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having acute pancreatitis given their features (0 = 0% and 1= 100%)
predicted risk of Ulcerative colitis numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having ulcerative colitis given their features (0 = 0% and 1= 100%)
predicted risk of Migraine numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having a migraine given their features (0 = 0% and 1= 100%)
predicted risk of Dementia numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having dementia given their features (0 = 0% and 1= 100%)
predicted risk of Treatment resistant depression (TRD) numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having treatment resistant depression given their features (0 = 0% and 1= 100%)
predicted risk of Parkinson's disease, inpatient or with 2nd diagnosis numeric A value between 0 and 1 corresponding to the patient's predicted 1-year risk of having Parkinson's disease given their features (0 = 0% and 1= 100%)

Cleaning Script

Clean data provided by the R/Pharma team!