kevcraig/iaa_2021

Distributed Analytics & Machine Learning - Dan Zaratsian, March 2021


IAA Module - Session 1 - Distributed Services and Platform Overview

Asset Directory

Slides


IAA Module - Session 2 - SQL and NoSQL Services

Asset Directory

Slides

  • Hadoop 101
  • Intro to Apache Hive
  • Apache Hive Syntax and Schema Design
  • Intro to Apache HBase and Apache Phoenix (NoSQL)
  • Apache HBase Schema Design & Best Practices
  • Apache Phoenix Syntax
  • Intro to Apache SparkSQL
  • Apache SparkSQL
  • BigQuery (Serverless SQL)
  • Google Cloud Firestore (NoSQL)

Assignment


IAA Module - Session 3 - Spark Data Processing & Machine Learning

Asset Directory

Slides

  • Apache Spark Overview
  • Spark Machine Learning (MLlib)
  • ML Pipelines
  • Building and deploying Spark machine learning models
  • Considerations for ML in distributed environments
  • Spark Best Practices and Tuning
  • Spark Code Walk-through (within Google Colab)

Assignment


IAA Module - Session 4 - SparkML & Scikit-learn Model Deployment

NOTE: This week's slides are a continuation of Session 3


IAA Module - Session 5 - Realtime, Streaming Systems

Asset Directory

Slides

  • Apache Kafka
  • Google PubSub
  • Demo of PubSub
  • Spark Streaming
  • Demo of Spark Streaming
  • Apache Beam (Google Dataflow)

IAA Module - Session 6 - CloudML & Serverless Deployments

Asset Directory

Slides

Assignment


About

Institute for Advanced Analytics 2021
