Bloc n°3: Predictive analysis of structured data using artificial intelligence.
- Part 1: perform an EDA and all the preprocessing steps needed to prepare the data for machine learning
- Part 2: train a linear regression model (baseline)
- Part 3: avoid overfitting by training a regularized regression model
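The baseline-vs-regularized idea above can be sketched as follows. This is a minimal illustration on synthetic data (a stand-in for the Walmart features, not the project's actual dataset), comparing a plain linear regression with a Ridge model; the `alpha` value is an arbitrary assumption.

```python
# Minimal sketch: baseline linear regression vs. a regularized (Ridge) model.
# Synthetic data stands in for the real Walmart features.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling before regularization so the penalty treats all features equally.
baseline = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)
ridge = make_pipeline(StandardScaler(), Ridge(alpha=10.0)).fit(X_train, y_train)

print(f"baseline R2 on test: {baseline.score(X_test, y_test):.3f}")
print(f"ridge    R2 on test: {ridge.score(X_test, y_test):.3f}")
```

The penalty shrinks coefficients toward zero, which is what limits overfitting when many features are noisy.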
- Part 1: perform an EDA, run the preprocessing steps, and train a baseline model with the file data_train.csv
- Part 2: improve your model's f1-score on your test set (you can try feature engineering, feature selection, regularization, non-linear models, hyperparameter optimization by grid search, etc.)
- Part 3: once you're satisfied with your model's score, use it to make predictions with the file data_test.csv. You will have to dump the predictions into a .csv file that will be sent to Kaggle (actually, to your teacher/TA 🤓). You can make as many submissions as you want, so feel free to try different models!
- Part 4: take some time to analyze your best model's parameters. Are there any levers for action that would help improve the newsletter's conversion rate? What recommendations would you make to the team?
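Parts 2 and 3 can be sketched together: a grid search tuned for f1-score, followed by dumping predictions to a submission CSV. This uses synthetic data in place of data_train.csv/data_test.csv, and the column and file names are assumptions, not the project's real ones.

```python
# Sketch of Parts 2-3: hyperparameter optimization by grid search
# (scoring on f1), then dumping predictions to a .csv submission file.
# Synthetic data stands in for data_train.csv / data_test.csv.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid search over the regularization strength, optimizing f1 as in Part 2.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                    scoring="f1", cv=5)
grid.fit(X_train, y_train)
print("best C:", grid.best_params_["C"])

# Part 3: dump predictions into a .csv file (filename is hypothetical).
submission = pd.DataFrame({"conversion": grid.predict(X_test)})
submission.to_csv("conversion_predictions.csv", index=False)
```

`GridSearchCV` refits the best estimator on the full training set, so `grid.predict` already uses the winning hyperparameters.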
- Create an algorithm to find hot zones
- Visualize results on a nice dashboard
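The hot-zone algorithm above can be sketched with K-means: cluster the pickup coordinates and treat the cluster centers as candidate hot zones. The coordinates below are synthetic points around three made-up hotspots, not the real Uber data.

```python
# Minimal sketch of the hot-zone idea: cluster pickup coordinates with
# K-means; the cluster centers are the candidate hot zones.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic pickups scattered around three fake hotspots (lat, lon).
hotspots = np.array([[40.75, -73.99], [40.65, -73.78], [40.73, -74.00]])
pickups = np.vstack([h + rng.normal(scale=0.01, size=(100, 2)) for h in hotspots])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pickups)
print("hot zones (lat, lon):")
print(np.round(km.cluster_centers_, 3))
```

On real data the number of clusters would have to be chosen (e.g. with an elbow or silhouette analysis) rather than fixed at 3.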
- 01 EDA: this file converts the date into separate variables and casts integer-coded categorical columns to object type for later preprocessing
- 02 Linear regression: this linear regression finds the most explanatory features for predicting Weekly_Sales
- 03 Lasso Ridge: the regularized models give a slightly better prediction score, about 0.957 vs 0.943 (+1.4%)
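The date-to-variables step from 01 EDA can be sketched as follows. The column names (`Date`, `Holiday_Flag`, `Weekly_Sales`) are assumptions based on the public Walmart sales dataset, and the two rows are illustrative.

```python
# Sketch of the 01 EDA step: split the Date column into year/month/day
# features and cast an integer-coded categorical column to object dtype.
import pandas as pd

df = pd.DataFrame({
    "Date": ["05-02-2010", "12-02-2010"],
    "Holiday_Flag": [0, 1],
    "Weekly_Sales": [1643690.90, 1641957.44],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month
df["day"] = df["Date"].dt.day
# Object dtype keeps the flag out of numeric scaling and lets it be
# one-hot encoded like the other categoricals later.
df["Holiday_Flag"] = df["Holiday_Flag"].astype(object)
print(df.dtypes)
```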
- 01 EDA: the dataset is cleaned and each column is plotted against the converted users
- 02 Model Prediction: a random forest model is trained and one of its decision trees is displayed
- 03 Model Conclusion: the model is reused on the test set and a plot explains the values behind this conclusion
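The 02 Model Prediction step can be sketched like this: fit a random forest and display one of its decision trees. Synthetic data stands in for the conversion dataset, and the text rendering via `export_text` is one possible way to show a tree (the notebook may use a graphical plot instead).

```python
# Sketch: train a random forest on a stand-in for the conversion data,
# then display the first decision tree of the forest as text.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
rf = RandomForestClassifier(n_estimators=50, max_depth=3,
                            random_state=0).fit(X, y)

# rf.estimators_ holds the individual trees; render the first one.
tree_view = export_text(rf.estimators_[0],
                        feature_names=[f"feat_{i}" for i in range(4)])
print(tree_view)
```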
- 01 Uber Data: the datasets are merged and the date column is cleaned
- 02 Uber Kmeans: clusters are found with a K-means algorithm
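The merge-and-clean step from 01 Uber Data can be sketched as follows. Column names follow the public Uber NYC pickups dataset (`Date/Time`, `Lat`, `Lon`) but are assumptions here, and the two one-row frames stand in for the monthly files being merged.

```python
# Sketch of the 01 Uber Data step: concatenate monthly pickup files and
# parse the Date/Time column into usable time features.
import pandas as pd

apr = pd.DataFrame({"Date/Time": ["4/1/2014 0:11:00"],
                    "Lat": [40.7690], "Lon": [-73.9549]})
may = pd.DataFrame({"Date/Time": ["5/1/2014 0:02:00"],
                    "Lat": [40.7521], "Lon": [-73.9914]})

# Merge the monthly datasets into one frame.
uber = pd.concat([apr, may], ignore_index=True)

# Clean the date: parse it, then derive hour and weekday features.
uber["Date/Time"] = pd.to_datetime(uber["Date/Time"],
                                   format="%m/%d/%Y %H:%M:%S")
uber["hour"] = uber["Date/Time"].dt.hour
uber["weekday"] = uber["Date/Time"].dt.day_name()
print(uber[["hour", "weekday"]])
```

Hour and weekday are the natural inputs if the hot zones are later computed per time slot.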