This is my third project for the Udacity Machine Learning Engineer Nanodegree.
-
To view my work, open the creating_customer_segments.ipynb file. The rest are supporting files or less readable formats of the same work.
-
As the name suggests, this unsupervised learning project builds a machine learning model that a wholesale distributor can use to identify its different types of customers, so that it can offer a new customer the type of service that has been most useful for similar customers.
-
This Jupyter Notebook project consists of Python code blocks. My contributions are in the sections whose headers include the word 'Implementation', and each one follows a comment containing the 'TODO' keyword.
-
In addition, I have thoroughly answered 12 conceptual questions that demonstrate my understanding of the data and of the work I did with it.
-
The main machine learning concepts covered in this project include the following:
-
Data Exploration: Determined a basic statistical breakdown of the dataset, hypothesized what the different types of clients are based on their feature values, and then determined which features are most relevant for distinguishing the customer types from each other.
-
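As an illustration of the exploration step, here is a minimal sketch of a statistical breakdown plus one possible relevance check: predict each feature from the others with ordinary least squares and look at the R^2 score. A feature that the others predict almost perfectly adds little new information. The data, the feature names, and the helper names `summarize` and `r2_from_other_features` are hypothetical, and the notebook may use a different method.

```python
import numpy as np

# Hypothetical spending table: rows are customers, columns are product categories.
# The names and numbers are made up for illustration, not taken from the dataset.
feature_names = ["A", "B", "C", "D"]
data = np.array([
    [1.0, 2.0, 3.0, 5.0],
    [2.0, 1.0, 3.0, 1.0],
    [3.0, 4.0, 7.0, 4.0],
    [4.0, 3.0, 7.0, 2.0],
    [5.0, 6.0, 11.0, 6.0],
    [6.0, 5.0, 11.0, 3.0],
])


def summarize(data):
    """Basic statistical breakdown of every column."""
    return {
        "mean": data.mean(axis=0),
        "median": np.median(data, axis=0),
        "std": data.std(axis=0, ddof=1),
    }


def r2_from_other_features(data, target_index):
    """R^2 of a least-squares fit that predicts one column from all the others."""
    y = data[:, target_index]
    others = np.delete(data, target_index, axis=1)
    # Design matrix: intercept column followed by the remaining features.
    design = np.column_stack([np.ones(len(y)), others])
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coefficients
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)


summary = summarize(data)
r2 = {name: r2_from_other_features(data, i) for i, name in enumerate(feature_names)}
for name in feature_names:
    print(f"{name}: R^2 when predicted from the other features = {r2[name]:.3f}")
```

In this toy table, column C is exactly A + B, so A, B and C can each be recovered from the others, while D cannot be fully recovered and therefore carries information of its own.
-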
Data Preprocessing: Applied feature scaling to make the distribution of the data closer to normal, and decided how to treat outliers.
-
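The preprocessing step could look roughly like the sketch below. It assumes a natural-log transform for the scaling and Tukey's rule (1.5 times the interquartile range) for flagging outliers. Both are assumptions for illustration, as are the made-up numbers and the helper name `tukey_outlier_rows`.

```python
import numpy as np

# Hypothetical annual spending per customer (rows) in two product categories (columns).
spending = np.array([
    [1000.0, 500.0],
    [2000.0, 700.0],
    [4000.0, 600.0],
    [8000.0, 800.0],
    [16000.0, 650.0],
    [2000.0, 750.0],
    [4000.0, 550.0],
    [1.0, 620.0],  # deliberately extreme value in the first column
])


def tukey_outlier_rows(data, step_factor=1.5):
    """Indices of rows with at least one value outside Q1 - step or Q3 + step."""
    q1 = np.percentile(data, 25, axis=0)
    q3 = np.percentile(data, 75, axis=0)
    step = step_factor * (q3 - q1)
    outside = (data < q1 - step) | (data > q3 + step)
    return np.flatnonzero(outside.any(axis=1))


# Assumed scaling: the natural log compresses the long right tail of spending data.
log_spending = np.log(spending)

outlier_rows = tukey_outlier_rows(log_spending)
# One possible treatment: drop every flagged row. Keeping them is also defensible.
cleaned = np.delete(log_spending, outlier_rows, axis=0)
print("Flagged rows:", outlier_rows.tolist())
```

Whether to drop, keep, or cap flagged rows is a judgment call. Dropping is shown here only because it is the simplest option to demonstrate.
-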
Feature Transformation: Reduced the dimensionality of the data with Principal Component Analysis (PCA), generated clusters, and derived insights from those clusters.
-
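To make the transformation step concrete, the sketch below reduces a small table to two principal components and then clusters the reduced points. PCA is computed with NumPy's SVD, and the clustering is a plain k-means with a deterministic farthest-point start. The choice of k-means, the toy data, and the helper names `pca_reduce` and `kmeans` are assumptions, so the notebook's clustering algorithm and library calls may differ.

```python
import numpy as np

# Hypothetical log-scaled features for six customers (rows), three categories (columns).
features = np.array([
    [9.0, 7.0, 7.1],
    [9.2, 7.1, 6.9],
    [8.9, 6.8, 7.0],
    [7.0, 9.1, 9.0],
    [6.8, 9.0, 9.2],
    [7.1, 8.9, 9.1],
])


def pca_reduce(data, n_components=2):
    """Project centered data onto its first principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ vt[:n_components].T, explained[:n_components]


def kmeans(points, k=2, n_iter=20):
    """Plain k-means (Lloyd's algorithm) with a deterministic farthest-point start."""
    centers = [points[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(nearest)])
    centers = np.array(centers)
    for _ in range(n_iter):
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


reduced, explained = pca_reduce(features, n_components=2)
labels, centers = kmeans(reduced, k=2)
print("Explained variance ratio:", np.round(explained, 3))
print("Cluster labels:", labels.tolist())
```

Averaging the original features within each cluster is one way to derive insights, since it shows what a typical customer of that segment buys.
-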
Conclusion: Determined how the wholesale distributor can use the model to understand which services it should offer to different types of clients, for example through A/B testing.
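-
One way the segments could feed into an A/B test is sketched below: split each segment into a control group (A) and a treatment group (B), then compare the mean outcome per segment so that a change that helps one customer type but not another becomes visible. The segment labels, the outcome scores, and the helper names `assign_ab_groups` and `lift_by_segment` are all hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def assign_ab_groups(segment_labels, seed=0):
    """Randomly split the customers of each segment into halves 'A' and 'B'."""
    rng = random.Random(seed)
    by_segment = defaultdict(list)
    for customer_id, segment in enumerate(segment_labels):
        by_segment[segment].append(customer_id)
    assignment = {}
    for customers in by_segment.values():
        rng.shuffle(customers)
        half = len(customers) // 2
        for customer_id in customers[:half]:
            assignment[customer_id] = "A"
        for customer_id in customers[half:]:
            assignment[customer_id] = "B"
    return assignment


def lift_by_segment(segment_labels, assignment, outcome):
    """Mean outcome of group B minus mean outcome of group A, per segment."""
    groups = defaultdict(lambda: {"A": [], "B": []})
    for customer_id, segment in enumerate(segment_labels):
        groups[segment][assignment[customer_id]].append(outcome[customer_id])
    return {seg: mean(g["B"]) - mean(g["A"]) for seg, g in groups.items()}


# Hypothetical cluster labels for eight customers.
segment_labels = [0, 0, 0, 0, 1, 1, 1, 1]
assignment = assign_ab_groups(segment_labels, seed=0)

# Made-up satisfaction scores: the change (group B) adds one point in segment 0 only.
outcome = [
    5 + (1 if segment == 0 and assignment[customer_id] == "B" else 0)
    for customer_id, segment in enumerate(segment_labels)
]

lift = lift_by_segment(segment_labels, assignment, outcome)
print(lift)
```

With real data the per-segment difference would also need a significance test before acting on it. That step is left out here to keep the sketch short.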