Bloc n°3: Predictive analysis of structured data using artificial intelligence.
- Part 1: perform an EDA and all the preprocessing steps needed to prepare the data for machine learning
- Part 2: train a linear regression model (baseline)
- Part 3: avoid overfitting by training a regularized regression model
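The baseline-vs-regularized idea above can be sketched as follows. This is a minimal illustration on synthetic data (a stand-in for the Walmart features, not the project's actual dataset), comparing a plain linear regression with a Ridge model; the `alpha` value is an arbitrary assumption.

```python
# Minimal sketch: baseline linear regression vs. a regularized (Ridge) model.
# Synthetic data stands in for the real Walmart features.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling before regularization so the penalty treats all features equally.
baseline = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)
ridge = make_pipeline(StandardScaler(), Ridge(alpha=10.0)).fit(X_train, y_train)

print(f"baseline R2 on test: {baseline.score(X_test, y_test):.3f}")
print(f"ridge    R2 on test: {ridge.score(X_test, y_test):.3f}")
```

The penalty shrinks coefficients toward zero, which is what limits overfitting when many features are noisy.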
- Part 1: perform an EDA, run the preprocessing steps, and train a baseline model with the file data_train.csv
- Part 2: improve your model's f1-score on your test set (you can try feature engineering, feature selection, regularization, non-linear models, hyperparameter optimization by grid search, etc.)
- Part 3: once you're satisfied with your model's score, use it to make predictions with the file data_test.csv. You will have to dump the predictions into a .csv file that will be sent to Kaggle (actually, to your teacher/TA 🤓). You can make as many submissions as you want, so feel free to try different models!
- Part 4: take some time to analyze your best model's parameters. Are there any levers for action that would help improve the newsletter's conversion rate? What recommendations would you make to the team?
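Parts 2 and 3 can be sketched together: a grid search tuned for f1-score, followed by dumping predictions to a submission CSV. This uses synthetic data in place of data_train.csv/data_test.csv, and the column and file names are assumptions, not the project's real ones.

```python
# Sketch of Parts 2-3: hyperparameter optimization by grid search
# (scoring on f1), then dumping predictions to a .csv submission file.
# Synthetic data stands in for data_train.csv / data_test.csv.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid search over the regularization strength, optimizing f1 as in Part 2.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                    scoring="f1", cv=5)
grid.fit(X_train, y_train)
print("best C:", grid.best_params_["C"])

# Part 3: dump predictions into a .csv file (filename is hypothetical).
submission = pd.DataFrame({"conversion": grid.predict(X_test)})
submission.to_csv("conversion_predictions.csv", index=False)
```

`GridSearchCV` refits the best estimator on the full training set, so `grid.predict` already uses the winning hyperparameters.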
- Create an algorithm to find hot zones
- Visualize results on a nice dashboard
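The hot-zone algorithm above can be sketched with K-means: cluster the pickup coordinates and treat the cluster centers as candidate hot zones. The coordinates below are synthetic points around three made-up hotspots, not the real Uber data.

```python
# Minimal sketch of the hot-zone idea: cluster pickup coordinates with
# K-means; the cluster centers are the candidate hot zones.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic pickups scattered around three fake hotspots (lat, lon).
hotspots = np.array([[40.75, -73.99], [40.65, -73.78], [40.73, -74.00]])
pickups = np.vstack([h + rng.normal(scale=0.01, size=(100, 2)) for h in hotspots])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pickups)
print("hot zones (lat, lon):")
print(np.round(km.cluster_centers_, 3))
```

On real data the number of clusters would have to be chosen (e.g. with an elbow or silhouette analysis) rather than fixed at 3.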
- 01 EDA: this file converts the date into separate variables and casts integer-coded categorical columns to object type for later preprocessing
- 02 Linear regression: this linear regression finds the most explanatory features for predicting Weekly_Sales
- 03 Lasso Ridge: the regularized models give a slightly better prediction score, about 0.957 vs 0.943 (+1.4%)
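The date-to-variables step from 01 EDA can be sketched as follows. The column names (`Date`, `Holiday_Flag`, `Weekly_Sales`) are assumptions based on the public Walmart sales dataset, and the two rows are illustrative.

```python
# Sketch of the 01 EDA step: split the Date column into year/month/day
# features and cast an integer-coded categorical column to object dtype.
import pandas as pd

df = pd.DataFrame({
    "Date": ["05-02-2010", "12-02-2010"],
    "Holiday_Flag": [0, 1],
    "Weekly_Sales": [1643690.90, 1641957.44],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month
df["day"] = df["Date"].dt.day
# Object dtype keeps the flag out of numeric scaling and lets it be
# one-hot encoded like the other categoricals later.
df["Holiday_Flag"] = df["Holiday_Flag"].astype(object)
print(df.dtypes)
```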
- 01 EDA: the dataset is cleaned and each column is plotted against the converted users
- 02 Model Prediction: a random forest model is trained and one of its decision trees is displayed
- 03 Model Conclusion: the model is reused on the test set and a plot explains the values behind this conclusion
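The 02 Model Prediction step can be sketched like this: fit a random forest and display one of its decision trees. Synthetic data stands in for the conversion dataset, and the text rendering via `export_text` is one possible way to show a tree (the notebook may use a graphical plot instead).

```python
# Sketch: train a random forest on a stand-in for the conversion data,
# then display the first decision tree of the forest as text.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
rf = RandomForestClassifier(n_estimators=50, max_depth=3,
                            random_state=0).fit(X, y)

# rf.estimators_ holds the individual trees; render the first one.
tree_view = export_text(rf.estimators_[0],
                        feature_names=[f"feat_{i}" for i in range(4)])
print(tree_view)
```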
- 01 Uber Data: the datasets are merged and the date column is cleaned
- 02 Uber Kmeans: clusters are found with a K-means algorithm
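The merge-and-clean step from 01 Uber Data can be sketched as follows. Column names follow the public Uber NYC pickups dataset (`Date/Time`, `Lat`, `Lon`) but are assumptions here, and the two one-row frames stand in for the monthly files being merged.

```python
# Sketch of the 01 Uber Data step: concatenate monthly pickup files and
# parse the Date/Time column into usable time features.
import pandas as pd

apr = pd.DataFrame({"Date/Time": ["4/1/2014 0:11:00"],
                    "Lat": [40.7690], "Lon": [-73.9549]})
may = pd.DataFrame({"Date/Time": ["5/1/2014 0:02:00"],
                    "Lat": [40.7521], "Lon": [-73.9914]})

# Merge the monthly datasets into one frame.
uber = pd.concat([apr, may], ignore_index=True)

# Clean the date: parse it, then derive hour and weekday features.
uber["Date/Time"] = pd.to_datetime(uber["Date/Time"],
                                   format="%m/%d/%Y %H:%M:%S")
uber["hour"] = uber["Date/Time"].dt.hour
uber["weekday"] = uber["Date/Time"].dt.day_name()
print(uber[["hour", "weekday"]])
```

Hour and weekday are the natural inputs if the hot zones are later computed per time slot.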