#How to Best Visualize a Dataset Easily
This is the code for this video by Siraj Raval on YouTube. The human activities dataset contains 5 classes (sitting-down, standing-up, standing, walking, and sitting) collected over 8 hours of activity from 4 healthy subjects. The dataset is downloaded from here. This code downloads the dataset, cleans it, creates feature vectors, and then uses t-SNE to reduce the dimensionality of the feature vectors to just 2. Finally, it uses matplotlib to visualize the data.
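As a rough illustration of the pipeline described above (feature vectors, t-SNE down to 2 dimensions, then a matplotlib plot), here is a minimal sketch. It is not the repository's actual script: the data is a randomly generated placeholder standing in for the cleaned activity features, and the helper names (`make_placeholder_data`, `embed_2d`, `plot_embedding`), the feature count, the standardization step, and the output filename `tsne.png` are all assumptions made for this example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# The five activity classes named in this README.
CLASSES = ["sitting-down", "standing-up", "standing", "walking", "sitting"]


def make_placeholder_data(n_per_class=40, n_features=12, seed=0):
    """Generate synthetic feature vectors (an assumption, NOT the real dataset)."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for idx in range(len(CLASSES)):
        # Each class is a Gaussian blob around its own random center.
        center = rng.normal(scale=5.0, size=n_features)
        X_parts.append(center + rng.normal(size=(n_per_class, n_features)))
        y_parts.append(np.full(n_per_class, idx))
    return np.vstack(X_parts), np.concatenate(y_parts)


def embed_2d(X, perplexity=30.0, seed=0):
    """Standardize the features, then reduce them to 2 dimensions with t-SNE."""
    X_scaled = StandardScaler().fit_transform(X)
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=seed)
    return tsne.fit_transform(X_scaled)


def plot_embedding(points, labels, path="tsne.png"):
    """Scatter-plot the 2-D points, one color per class, and save to a file."""
    fig, ax = plt.subplots(figsize=(6, 5))
    for idx, name in enumerate(CLASSES):
        mask = labels == idx
        ax.scatter(points[mask, 0], points[mask, 1], s=10, label=name)
    ax.set_title("t-SNE embedding (placeholder data)")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


X, y = make_placeholder_data()
points = embed_2d(X)
plot_embedding(points, y)
```

To try this on real data, replace `make_placeholder_data` with code that loads the cleaned feature matrix and integer class labels. The `perplexity` value must be smaller than the number of samples, and the resulting layout can change noticeably with different perplexity values or random seeds.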
- numpy (http://www.numpy.org/)
- scikit-learn (http://scikit-learn.org/)
- matplotlib (http://matplotlib.org/)
Install dependencies via 'pip install' (e.g. pip install pandas).
**Note:** An updated dataset is available here if the other link is broken: http://rstudio-pubs-static.s3.amazonaws.com/19668_2a08e88c36ab4b47876a589bb1d61c37.html
To run this code, just run the following in terminal:
The challenge for this video is to visualize this Game of Thrones dataset. Use t-SNE to lower the dimensionality of the data and plot it using matplotlib. In your README, write 2-3 sentences about something you discovered in the data after visualizing it. This will be great practice in understanding why dimensionality reduction is so important and in analyzing data visually.
##Due Date is December 29th 2016
The credits for this code go to Yifeng-He. I've merely created a wrapper around the code to make it easy for people to get started.