Drug consumption dataset from the UCI Machine Learning Repository. The goal is to predict whether an individual is a hard drug user.
The database contains records for 1885 respondents. For each respondent, 12 attributes are known, covering personality measurements and demographic information. All input attributes are originally categorical and have been quantified; after quantification, the values of all input features can be treated as real-valued. In addition, participants were asked about their use of 18 legal and illegal drugs, reporting when each was last used.
The dataset records the following attributes for the 1885 respondents:
- Personality measurements:
  - NEO-FFI-R (Neuroticism, Extraversion, Openness to experience, Agreeableness, and Conscientiousness)
  - BIS-11 (Impulsivity)
  - ImpSS (Sensation Seeking)
- Demographic information:
  - Level of education
  - Age
  - Gender
  - Country of residence
  - Ethnicity
- Self-reported drug consumption for 18 substances and one fictitious drug:
  - Alcohol, Amphetamines, Amyl nitrite, Benzodiazepine, Cannabis, Chocolate, Cocaine, Caffeine, Crack, Ecstasy, Heroin, Ketamine, Legal highs, LSD, Methadone, Mushrooms, Nicotine and Volatile substance
  - Semeron (fictitious drug)
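The prediction target has to be derived from the self-reported consumption columns. The snippet below is only a rough sketch of one way to do that, not the method used in the notebook. It assumes usage is coded as the strings `CL0` (never used) through `CL6` (used in the last day), as in the UCI release. The set `HARD_DRUGS`, the threshold `min_level`, and the dictionary keys are hypothetical choices made for illustration.

```python
# Assumption: each substance is coded "CL0" (never used) .. "CL6" (used in last day).
# Assumption: the set of "hard" drugs below is illustrative, not taken from the study.
HARD_DRUGS = {"Cocaine", "Crack", "Heroin", "Methadone"}


def usage_level(code):
    """Convert a usage code such as "CL3" to the integer 3."""
    return int(code[2:])


def is_hard_drug_user(record, min_level=3):
    """Return True if any hard drug was used at or above `min_level`.

    `record` maps substance names to usage codes; missing substances
    are treated as never used. `min_level=3` is a hypothetical cutoff.
    """
    return any(
        usage_level(record.get(drug, "CL0")) >= min_level
        for drug in HARD_DRUGS
    )


# Hypothetical respondent, for illustration only.
example = {"Alcohol": "CL5", "Cannabis": "CL4", "Cocaine": "CL3", "Heroin": "CL0"}
label = is_hard_drug_user(example)
```

Changing `HARD_DRUGS` or `min_level` changes the class balance of the target, so both would need to match whatever definition the study actually uses.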
The 1885 observations in the dataset are split into training and test sets at 70% / 30%.
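A 70% / 30% split can be sketched with the standard library alone. This is a minimal illustration rather than the notebook's actual code. The function name `split_indices`, the fixed `seed`, and the use of the 1885 respondents as the row count are assumptions made for the example.

```python
import random


def split_indices(n_rows, test_fraction=0.3, seed=0):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_rows))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_test = int(n_rows * test_fraction)  # truncates toward zero
    return indices[n_test:], indices[:n_test]


# Assumption: one row per respondent (1885 rows).
train_idx, test_idx = split_indices(1885)
print(len(train_idx), len(test_idx))
```

Splitting on shuffled indices rather than on the rows themselves keeps the features and the target aligned when both are indexed with the same lists.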
The Jupyter notebook contains the code used in the study.