Institute for Advanced Analytics

Distributed Analytics & Machine Learning - Dan Zaratsian, March 2021

IAA Module - Session 1 - Distributed Services and Platform Overview

Slides

Introduction and Module Agenda
Distributed Computing
Walk-through of Tools and Services for Big Data
Distributed Architectures and Use Cases
Google Colab Notebook Environment
Google BigQuery Sandbox

IAA Module - Session 2 - SQL and NoSQL Services

Slides

Hadoop 101
Intro to Apache Hive
Apache Hive Syntax and Schema Design
Intro to Apache HBase and Apache Phoenix (NoSQL)
Apache HBase Schema Design & Best Practices
Apache Phoenix Syntax
Intro to Apache SparkSQL
Apache SparkSQL
BigQuery (Serverless SQL)
Google Cloud Firestore (NoSQL)

Assignment

Assignment 1 SQL - Solution
- Due on Friday, March 26
- Please complete as an individual assignment
- Email your code and answers to d.zaratsian@gmail.com
Assignment 2 NoSQL - Solution
- Due on Friday, March 26
- Please complete as an individual assignment
- Email your code and answers to d.zaratsian@gmail.com

IAA Module - Session 3 - Spark Data Processing & Machine Learning

Slides

Apache Spark Overview
Spark Machine Learning (MLlib)
ML Pipelines
Building and deploying Spark machine learning models
Considerations for ML in distributed environments
Spark Best Practices and Tuning
Spark Code Walk-through (within Google Colab)

Assignment

Assignment 3 - Solution
- Due on Friday, April 2
- Please complete as an individual assignment
- Email your code to d.zaratsian@gmail.com

IAA Module - Session 4 - Realtime, Streaming Systems

Slides

Apache Kafka
Google Pub/Sub
Demo of Pub/Sub
Spark Streaming
Demo of Spark Streaming
Apache Beam (Google Dataflow)

IAA Module - Session 5 - Serverless Technology

Slides

Overview of Serverless
Serverless ML
BigQuery ML
Google Cloud Functions
Google Cloud AutoML

IAA Module - Session 6 - Cloud Machine Learning and Deployments

Slides

Overview of Google Cloud and general cloud services for ML Deployment
Google Cloud AI Platform
Demo ML Model Deployment for NFL Play Predictions (link to repo)
Cloud Deployments - App Engine
Demo App Engine Deployment
Cloud Deployments - Kubernetes
Demo Kubernetes Deployment
