Segmented sellers by sale/no_sale and by average LTV (RMSE ≈ BRL 289) to find which characteristics make a top seller, following the CRISP-DM process model.
- Business Understanding
- Data Understanding: Explored and wrangled data from 9 datasets with 100k+ observations (one of the datasets has 1M+ observations!) and at most 15 variables. Built word clouds of reviews with good, bad, and neutral scores.
- Data Preparation: Engineered Features and cleaned city names. Dealt with missing values and encoded categorical data.
- Data Modeling: Tuned a Random Forest regressor and classifier with grid search and cross-validation (sklearn's GridSearchCV).
- Results Evaluation: Extracted actionable insights from the models with Partial Dependence Plots.
- Python Version: 3.8.5
- Packages: pandas, numpy, sklearn, matplotlib, seaborn, plotly, fuzzywuzzy, wordcloud, graphviz, pdpbox
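
As an illustration of the modeling step, here is a minimal sketch of tuning a Random Forest classifier with sklearn's `GridSearchCV`. It is not the project's actual code: the data is synthetic (a stand-in for the sale/no_sale seller dataset), and the parameter grid, the F1 scoring choice, and all variable names are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the seller dataset (binary target: sale / no_sale).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical grid: the values used in the notebooks may differ.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

# Every combination in the grid is scored with 3-fold cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=3,
    scoring="f1",
)
search.fit(X_train, y_train)

best_params = search.best_params_
# score() applies the same scorer passed above, so this is a held-out F1.
test_f1 = search.score(X_test, y_test)
print(best_params, round(test_f1, 3))
```

The same pattern applies to the LTV model by swapping in `RandomForestRegressor` and a regression scorer such as `"neg_root_mean_squared_error"`.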
Please read this project's post for a business point of view.
- 00byMontse.py: A few basic auxiliary functions, e.g. a data summary function based on the describe() method.
- 01Olist_EDA: Explored data on each dataset individually to better understand the provided data. Looked at the distributions of the data and value counts.
- 02Olist_DataWrangling: Merged, cleaned, and prepared data to create a 'basic Seller Dataset' based on seller characteristics, acquisition journey, and sales to date. Also wrangled a second version that includes (cleaned) location data.
- 03Olist_ProductBusinessSegments: Answered the business question: is there any Business Segment/Product Category doing particularly well, or particularly badly? (Sellers acquired between Dec 2017 and Aug 2018.)
- 04Olist_SellerSegmentation_Binary: Seller Segmentation based on Sale/NoSale. What does it take to sell?
- 05Olist_SellerSegmentation_LTV: Seller Segmentation based on avg LTV (1 month). What does it take to be a top seller?
- 06Olist Channels: What channels bring in the most top sellers?
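
The cleaned location data mentioned for 02Olist_DataWrangling relies on matching messy city names against a clean reference list. The sketch below shows the general idea only. It uses the standard library's `difflib` instead of fuzzywuzzy so that it runs without extra installs, and the helper names (`normalize`, `clean_city`), the short reference list, and the 0.8 cutoff are all assumptions for illustration, not the notebook's implementation.

```python
import difflib
import unicodedata


def normalize(name):
    """Lowercase, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def clean_city(raw, reference, cutoff=0.8):
    """Return the closest reference city, or None if nothing is similar enough."""
    # Map normalized spellings back to the canonical (accented) names.
    lookup = {normalize(city): city for city in reference}
    matches = difflib.get_close_matches(
        normalize(raw), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


# Hypothetical reference list: the project used a full list of Brazilian cities.
reference_cities = ["São Paulo", "Rio de Janeiro", "Belo Horizonte", "Curitiba"]
raw_cities = ["sao paulo", "Sao Pualo ", "rio de janiero", "xyz"]

cleaned = [clean_city(city, reference_cities) for city in raw_cities]
print(cleaned)
# ['São Paulo', 'São Paulo', 'Rio de Janeiro', None]
```

Returning `None` for unmatched names keeps doubtful rows visible for manual review rather than forcing them onto the nearest city.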
- JungJoon Lee - on EDA with eCommerce Marketplace (Seller Side)
- Eduardo Cuducos - on getting a clean list of Brazilian cities
- Gene Diaz jr - on stopwords in Portuguese
- Sankarshana Kadambari - on how to clean the city column