sarahocampo/iaa_2021


Distributed Analytics & Machine Learning - Dan Zaratsian, March 2021


IAA Module - Session 1 - Distributed Services and Platform Overview

Slides


IAA Module - Session 2 - SQL and NoSQL Services

Slides

  • Hadoop 101
  • Intro to Apache Hive
  • Apache Hive Syntax and Schema Design
  • Intro to Apache HBase and Apache Phoenix (NoSQL)
  • Apache HBase Schema Design & Best Practices
  • Apache Phoenix Syntax
  • Intro to Apache SparkSQL
  • Apache SparkSQL
  • BigQuery (Serverless SQL)
  • Google Cloud Firestore (NoSQL)

Assignment


IAA Module - Session 3 - Spark Data Processing & Machine Learning

Slides

  • Apache Spark Overview
  • Spark Machine Learning (MLlib)
  • ML Pipelines
  • Building and deploying Spark machine learning models
  • Considerations for ML in distributed environments
  • Spark Best Practices and Tuning
  • Spark Code Walk-through (within Google Colab)

Assignment


IAA Module - Session 4 - Realtime, Streaming Systems

Slides

  • Apache Kafka
  • Google Pub/Sub
  • Demo of Pub/Sub
  • Spark Streaming
  • Demo of Spark Streaming
  • Apache Beam (Google Dataflow)

IAA Module - Session 5 - Serverless Technology

Slides

  • Overview of Serverless
  • Serverless ML
  • BigQuery ML
  • Google Cloud Functions
  • Google Cloud AutoML

IAA Module - Session 6 - Cloud Machine Learning and Deployments

Slides

  • Overview of Google Cloud and general cloud services for ML Deployment
  • Google Cloud AI Platform
  • Demo ML Model Deployment for NFL Play Predictions (link to repo)
  • Cloud Deployments - App Engine
  • Demo App Engine Deployment
  • Cloud Deployments - Kubernetes
  • Demo Kubernetes Deployment

About

Institute for Advanced Analytics 2021
