Python scripts and notebook files
ml-uwash/course-2: Folder contains Python notebooks for exercises from course 2 (Multiple Regression) of the University of Washington's Machine Learning Specialization.
misc/wine: Folder contains an analysis of wine quality data from the UCI Machine Learning Repository. An analysis in R of the same data can be found here.
misc/bikeshare: Code for wrangling bikeshare data for Vancouver, Montreal and Toronto. The collected data is cleaned and inserted into a PostgreSQL database. The folder also includes code for downloading Canadian weather data, which is inserted into the same database. The database has three linked tables: bike stations, bicycle trips for all three cities, and weather data for the periods when the trips were taken. It holds more than twenty million records, mostly from Montreal.
spark: Runs a linear regression on a 4-node Apache Spark cluster on AWS using PySpark.
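To illustrate how the three linked tables in misc/bikeshare might fit together, here is a minimal sketch. It is not the repository's actual schema: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server, and every table name, column name, and row of data is a hypothetical placeholder. The assumption is that trips reference stations by ID and that weather is keyed by city and date.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three linked tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE stations (
            station_id INTEGER PRIMARY KEY,
            city       TEXT NOT NULL,
            name       TEXT NOT NULL
        );
        CREATE TABLE weather (
            city        TEXT NOT NULL,
            obs_date    TEXT NOT NULL,
            mean_temp_c REAL,
            PRIMARY KEY (city, obs_date)
        );
        CREATE TABLE trips (
            trip_id          INTEGER PRIMARY KEY,
            start_station_id INTEGER NOT NULL REFERENCES stations(station_id),
            start_date       TEXT NOT NULL,
            duration_s       INTEGER
        );
        """
    )
    # Made-up demo rows, only to show how the tables link together.
    conn.executemany(
        "INSERT INTO stations VALUES (?, ?, ?)",
        [(1, "Montreal", "Station A"), (2, "Toronto", "Station B")],
    )
    conn.executemany(
        "INSERT INTO weather VALUES (?, ?, ?)",
        [("Montreal", "2017-06-01", 18.5), ("Toronto", "2017-06-01", 20.0)],
    )
    conn.executemany(
        "INSERT INTO trips VALUES (?, ?, ?, ?)",
        [
            (1, 1, "2017-06-01", 600),
            (2, 1, "2017-06-01", 900),
            (3, 2, "2017-06-01", 300),
        ],
    )
    conn.commit()
    return conn


def trips_per_city_with_weather(conn):
    """Count trips per city and day, joined to that day's weather record."""
    return conn.execute(
        """
        SELECT s.city, t.start_date, COUNT(*) AS n_trips, w.mean_temp_c
        FROM trips AS t
        JOIN stations AS s ON s.station_id = t.start_station_id
        JOIN weather  AS w ON w.city = s.city AND w.obs_date = t.start_date
        GROUP BY s.city, t.start_date, w.mean_temp_c
        ORDER BY s.city
        """
    ).fetchall()


if __name__ == "__main__":
    for row in trips_per_city_with_weather(build_demo_db()):
        print(row)
```

With a real PostgreSQL database the same join would apply, but the connection would go through a PostgreSQL driver and the table and column names would need to match the actual schema.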