hgasimov/Data_Science

Data_Science

This repository contains the programming assignments from the "Introduction to Data Science" MOOC offered by the University of Washington on Coursera.

Course Syllabus

Chapter 0: Introduction

  • Examples, data science articulated, history and context, technology landscape

Chapter 1: Data Manipulation, at Scale

  • Databases and the relational algebra
  • Parallel databases, parallel query processing, in-database analytics
  • MapReduce, Hadoop, relationship to databases, algorithms, extensions, languages
  • Key-value stores and NoSQL; tradeoffs of SQL and NoSQL
  • Entity resolution, record linkage, data cleaning
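As a rough illustration of the MapReduce model listed above, the following is a minimal single-process sketch of a word count. It is not taken from the course assignments; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework such as Hadoop would distribute these phases across machines.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a dict of documents."""
    pairs = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count({"d1": "the cat", "d2": "The dog"})` counts `the` twice and `cat` and `dog` once each, since the map step lowercases every word before emitting it.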

Chapter 2: Analytics

  • Basic statistical modeling, experiment design, introduction to machine learning, overfitting
  • Supervised learning: overview, simple nearest neighbor, decision trees/forests, regression
  • Unsupervised learning: k-means, multi-dimensional scaling
  • Graph Analytics: PageRank, community detection, recursive queries, iterative processing
  • Text Analytics: latent semantic analysis
  • Collaborative Filtering: slope-one
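To make the slope-one entry above more concrete, here is a minimal sketch of weighted slope-one prediction. It is an illustration only, not the course's reference solution; the function name `slope_one_predict` and the nested-dict layout of `ratings` (user to item to rating) are assumptions made for this example.

```python
def slope_one_predict(ratings, user, target):
    """Predict `user`'s rating of `target` using weighted slope-one.

    `ratings` maps each user to a dict of {item: rating}.
    Returns None when no co-rated item pairs are available.
    """
    numerator = 0.0
    denominator = 0
    for item, user_rating in ratings[user].items():
        if item == target:
            continue
        # Rating differences (target - item) over users who rated both.
        diffs = [
            r[target] - r[item]
            for r in ratings.values()
            if target in r and item in r
        ]
        if not diffs:
            continue
        avg_dev = sum(diffs) / len(diffs)
        # Weight each estimate by the number of users supporting it.
        numerator += (avg_dev + user_rating) * len(diffs)
        denominator += len(diffs)
    return numerator / denominator if denominator else None
```

For example, with `ratings = {"john": {"A": 5, "B": 3, "C": 2}, "mark": {"A": 3, "B": 4}, "lucy": {"B": 2, "C": 5}}`, the call `slope_one_predict(ratings, "lucy", "A")` returns 13/3, roughly 4.33: the estimate via item B (deviation 0.5, two supporting users) and the estimate via item C (deviation 3, one supporting user) are averaged with those weights.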

Chapter 3: Communicating Results

  • Visualization, data products, visual data analytics
  • Provenance, privacy, ethics, governance

Chapter 4: Guest Lectures

  • Guest Lectures: AMPLab, Datameer, SciDB, more
