Block 3: Predictive analysis of structured data with artificial intelligence.

Contact

voguant-cal0n@icloud.com

Video explanation

Block 3: Predictive analysis of structured data with artificial intelligence.

Goals

Walmart Sales

  • Part 1: perform an EDA and all the preprocessing needed to prepare the data for machine learning
  • Part 2: train a linear regression model (baseline)
  • Part 3: avoid overfitting by training a regularized regression model
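
The difference between Parts 2 and 3 can be illustrated with a minimal sketch, assuming a single feature: ordinary least squares gives the slope Sxy / Sxx, while ridge (L2) regularization adds a penalty `alpha` to the denominator and shrinks the slope towards zero. The function name `fit_slope`, the toy data, and the value of `alpha` are hypothetical; this is not the code used in the notebooks.

```python
def fit_slope(x, y, alpha=0.0):
    """Slope and intercept of a one-feature linear model with L2 penalty alpha.

    alpha=0.0 is ordinary least squares; alpha>0 is ridge regression
    (the intercept is not penalized).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / (sxx + alpha)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical toy data, not taken from the Walmart dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ols_slope, ols_intercept = fit_slope(x, y)                  # baseline
ridge_slope, ridge_intercept = fit_slope(x, y, alpha=5.0)   # regularized
print(ols_slope, ridge_slope)
```

The regularized slope is always smaller in magnitude than the baseline one; a larger `alpha` trades a little bias for lower variance, which is what limits overfitting.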

Conversion Rate Challenge

  • Part 1: perform an EDA and the preprocessing, then train a baseline model with the file data_train.csv
  • Part 2: improve your model's F1-score on your test set (you can try feature engineering, feature selection, regularization, non-linear models, hyperparameter optimization by grid search, etc.)
  • Part 3: once you're satisfied with your model's score, use it to make predictions on the file data_test.csv. Dump the predictions into a .csv file that will be sent to Kaggle (actually, to your teacher/TA 🤓). You can make as many submissions as you want, so feel free to try different models!
  • Part 4: take some time to analyze your best model's parameters. Are there any levers for action that would help improve the newsletter's conversion rate? What recommendations would you make to the team?
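
The F1-score targeted in Part 2 is the harmonic mean of precision and recall on the positive class. A minimal sketch follows, assuming binary labels where 1 means "converted"; the function and the sample labels are illustrative, not taken from the notebooks.

```python
def f1_score(y_true, y_pred):
    """F1-score of the positive class (label 1) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(round(score, 3))
```

Unlike accuracy, this metric ignores true negatives, so a model that never predicts a conversion scores 0 even if most users do not convert.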

Uber Pickups

  • Create an algorithm to find hot zones
  • Visualize results on a nice dashboard

Information about the files:

Walmart Sales

  • 01 EDA: converts the date into separate variables and casts integer-coded categorical columns to object for later preprocessing
  • 02 Linear regression: fits a linear regression and identifies the variables that best explain Weekly_Sales
  • 03 Lasso Ridge: the regularized model predicts slightly better, with a score of about 0.957 vs 0.943 (+1.4 %)
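
The date conversion described for 01 EDA can be sketched as follows. This is a minimal illustration, assuming dates stored as day-month-year strings; the format string, the function name `date_features`, and the feature names are assumptions rather than the notebook's actual code.

```python
from datetime import datetime


def date_features(date_str, fmt="%d-%m-%Y"):
    """Split a date string into numeric features usable by a model."""
    d = datetime.strptime(date_str, fmt)
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "day_of_week": d.weekday(),  # Monday = 0, Sunday = 6
    }


# Hypothetical example date (5 February 2010, a Friday).
features = date_features("05-02-2010")
print(features)
```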

Conversion Rate Challenge

  • 01 EDA: the dataset is cleaned and each column is plotted against converted users
  • 02 Model Prediction: a random forest is trained for prediction and a decision tree is displayed
  • 03 Model Conclusion: the model is reused on the test set, and plots show the explanatory variables that support the conclusion

Uber Pickups

  • 01 Uber Data: the datasets are merged and the dates are cleaned
  • 02 Uber Kmeans: clusters are found with a k-means algorithm
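
The clustering step can be sketched with a plain k-means (Lloyd's algorithm) on pickup coordinates. This is a minimal standard-library illustration, not the notebook's implementation: the coordinates are made up, and squared Euclidean distance on raw latitude/longitude is an approximation that is only reasonable at city scale.

```python
import random


def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means on 2D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


# Hypothetical (latitude, longitude) pickups forming two groups.
points = [
    (40.75, -73.99), (40.76, -73.98), (40.74, -73.99),
    (40.64, -73.78), (40.65, -73.79), (40.64, -73.77),
]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Each centroid is a candidate hot zone, and counting the points per label ranks the zones by pickup volume.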