A collection of small data science projects:
- 20-news-project: intro to spaCy and a basic neural network for news classification
- clustering-challenge-otto: unsupervised learning on the challenging Otto Online Dataset (61k rows, 93 features)
- crawling-yelp: scraping Yelp reviews of restaurants in Vienna's inner districts with a Python crawler
- dash-basics-macrodata: visualizing macroeconomic data for Latin America with a very basic Dash web app
- dimred-exploration: exploring dimensionality reduction methods in scikit-learn, based on Sebastian Raschka's tutorial and the book Hands-On Machine Learning with Scikit-Learn and TensorFlow
- mnist-deepdive: running different algorithms on the MNIST dataset (SVM, boosting, random forests, neural networks)
- point-patterns-python: examining the geographic distribution of Wi-Fi hotspots in Linz, Austria (written in German)
- quora-project: exploring and testing a few ideas for the Kaggle Insincere Questions Challenge, with the code running on a Google Cloud Deep Learning VM
- scikit-learn-trees-forests-exploration: exploring scikit-learn's decision trees and random forests using both a self-created dataset and a telemarketing dataset
- spark-hdinsight-exploration: using a Spark cluster on HDInsight to analyze the click dataset of Kengoo (15M rows) and exploring Spark's MLlib, including pipelines and logistic regression