(dataset from Kaggle)
The data comes from a survey conducted among secondary school students taking math and Portuguese language courses. The aim of the analysis is to assess whether students' private lives are related to their school performance.
The notebook is organized as follows:
- EDA (exploratory data analysis)
- data preprocessing (encoding, oversampling)
- model evaluation (classification problem where the variable to predict is the letter grade)
- feature importance
- outlier removal
- performance on the test set
- conclusions
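
To make the preprocessing and target definition above more concrete, here is a minimal sketch in plain Python of two of the steps: turning a numeric final grade into a letter grade, and balancing the classes by random oversampling. The grade cut-offs, the function names `to_letter_grade` and `random_oversample`, and the toy records are all assumptions made for illustration. The notebook may use different bins and a different oversampling method.

```python
import random
from collections import Counter

# Hypothetical cut-offs on a 0-20 grading scale; the notebook may use other bins.
GRADE_BINS = [(16, "A"), (14, "B"), (12, "C"), (10, "D")]


def to_letter_grade(score):
    """Map a numeric final grade (0-20) to a letter; anything below 10 is 'F'."""
    for lower_bound, letter in GRADE_BINS:
        if score >= lower_bound:
            return letter
    return "F"


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are the same size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # Rows of this class that are eligible for duplication.
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy records, made up for illustration (not taken from the survey).
scores = [18, 15, 11, 11, 10, 9, 11, 10]
rows = [{"studytime": i % 4 + 1} for i in range(len(scores))]
labels = [to_letter_grade(s) for s in scores]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In a setup like this, oversampling is normally applied to the training split only, so that duplicated rows do not leak into the test set.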
NOTE: GitHub renders notebooks statically, so the embedded HTML/JavaScript that makes up the Plotly graphs is not displayed. To view the notebook with all content rendered: https://nbviewer.jupyter.org/github/gae7/alcohol_consumption/blob/main/student_alcohol_consumption.ipynb
