This repository contains the programming assignments for the "Introduction to Data Science" MOOC from the University of Washington on Coursera.
Course Syllabus
Chapter 0: Introduction
- Examples, data science articulated, history and context, technology landscape
Chapter 1: Data Manipulation, at Scale
- Databases and the relational algebra
- Parallel databases, parallel query processing, in-database analytics
- MapReduce, Hadoop, relationship to databases, algorithms, extensions, languages
- Key-value stores and NoSQL; tradeoffs of SQL and NoSQL
- Entity resolution, record linkage, data cleaning
Chapter 2: Analytics
- Basic statistical modeling, experiment design, introduction to machine learning, overfitting
- Supervised learning: overview, simple nearest neighbor, decision trees/forests, regression
- Unsupervised learning: k-means, multi-dimensional scaling
- Graph Analytics: PageRank, community detection, recursive queries, iterative processing
- Text Analytics: latent semantic analysis
- Collaborative Filtering: slope-one
Chapter 3: Communicating Results
- Visualization, data products, visual data analytics
- Provenance, privacy, ethics, governance
Chapter 4: Guest Lectures
- AMPLab, Datameer, SciDB, and more