
# Data AI Resources

Zoiner Tejada edited this page Jun 10, 2019 · 33 revisions

This page summarizes the available content and code artifacts from around the web that the Solliance Data + AI Practice has helped to build.

## Public Resources

### Microsoft Cloud Workshops

Six-hour content packages, each including a whiteboard design session and a hands-on hackathon.

| Title | Link | Description | Data Set |
| ----- | ---- | ----------- | -------- |
| Big Compute | | Using Azure Batch for distributed processing | Big Buck Bunny video |
| Big data and visualization | | Using Azure Databricks to train a model that predicts flight delays, and Azure Machine Learning service to deploy it as a web service | US DOT flight delay data |
| Cosmos DB Real-Time Advanced Analytics | | Design a data pipeline that leverages Cosmos DB for scalable ingest and global distribution, and uses Azure Databricks Delta with a modern data warehouse to reduce risk | Generator that simulates real-time transactions |
| Cognitive Services and Deep Learning | | Combining custom text analytics models built in Azure Databricks with pre-built Cognitive Services models to augment agent performance when processing insurance claims | Custom-authored insurance claim text and images from a Google search |
| Data Platform Upgrade & Migration | | Migrate from Oracle to SQL Server and upgrade from SQL Server to Azure SQL Database | Northwind database |
| Intelligent Analytics | | Real-time streaming analytics, shown with a chat application backed by Event Hubs and Stream Analytics | User-entered text |
| Intelligent Vending Machines | | IoT solution for vending machines showing real-time analytics, Cognitive Services, and SQL Database in-memory/columnar features | User-generated transactions |
| Internet of Things | | Real-time analytics using IoT Hub and Stream Analytics | Temperature telemetry generated by a simulated device |
| IoT and the Smart City | | Smart city solution using IoT Edge devices, IoT Hub, Stream Analytics, and Time Series Insights to process telemetry from a city bus | Telemetry generated by a simulator |
| Serverless Architecture | | License plate processing using scale-out serverless services, including Functions, Event Grid, Application Insights, Cosmos DB, and Logic Apps | Supplied vehicle photos |
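Several of the workshops above (Internet of Things, IoT and the Smart City) drive their pipelines from simulated device telemetry. A minimal sketch of what such a simulator might emit, in plain Python; the field names and value ranges here are illustrative assumptions, not the workshops' actual schema:

```python
import json
import random
from datetime import datetime, timezone

def make_telemetry(device_id):
    """Build one JSON telemetry message with random temperature/humidity readings."""
    message = {
        "deviceId": device_id,                                  # hypothetical field name
        "temperature": round(random.uniform(20.0, 35.0), 2),    # degrees Celsius
        "humidity": round(random.uniform(40.0, 70.0), 2),       # percent
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(message)

# A device loop would send messages like these to IoT Hub on an interval.
batch = [make_telemetry("sim-device-01") for _ in range(5)]
```

In the workshops, messages in this shape would be sent to IoT Hub and picked up by a Stream Analytics query; the simulator itself only needs to produce well-formed JSON.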

### Coming Soon

| Title | Link | Description | Data Set |
| ----- | ---- | ----------- | -------- |
| MLOps | | Azure DevOps and MLOps with Azure Machine Learning service | |
| Deep Learning with Azure Databricks | | TBD | TBD |
| Managed Open Source Databases on Azure | | TBD | |
| Migrating to Azure SQL Database Managed Instance | | TBD | |
| Modernizing Data Analytics with SQL Server 2019 | | TBD | TBD |
| Securing the IoT end-to-end | | TBD | |

### Dev Guides

Official guides (small books)

| Title | Link |
| ----- | ---- |
| Azure Data Architecture Guide | |
| Azure Machine Learning Developer Guide | |
| HDInsight Developer Guide | |
| SQL Server Developer Guide | |
| Databricks Developer Guide | |

### Labs & Quick Starts

| Title | Link |
| ----- | ---- |
| Azure Machine Learning Quickstarts | |
| Azure Machine Learning Service Labs | |

### Tech Immersion Labs

One-hour labs designed for a managed lab environment.

| Title | Link |
| ----- | ---- |
| Handling Big Data with SQL Server 2019 Big Data Clusters | |
| Leveraging Cosmos DB for near real-time analytics | |
| Unlocking new capabilities with friction-free migrations to Azure SQL Managed Instance | |
| Delivering the Modern Data Warehouse with Azure SQL Data Warehouse, Azure Databricks, Azure Data Factory, and Power BI | |
| Quickly build comprehensive Bot solutions with the Virtual Assistant Solution Accelerator | |
| Yield quick insights from unstructured data with Knowledge Mining and Cognitive Services | |
| Better models made easy with Automated Machine Learning | |
| Making deep learning portable with ONNX | |

### Microsoft Learn

Hands-on labs broken into 45-minute modules.

#### Modern Data Warehouse

| Title | Link | Description | Data Set | Notebooks |
| ----- | ---- | ----------- | -------- | --------- |
| Intro | | Spark and Azure Databricks intro | Databricks-supplied training data (10 M people, Databricks blog, city populations, small, airlines, flights, IP geocode, and bike sharing) | 01-Getting-Started, 02-Why-Spark (flights) |
| Azure SQL DW | | Understanding and using the SQL DW Connector with Azure Databricks | AdventureWorks sample database | Understanding and using the SQL DW Connector with Azure Databricks (AdventureWorks) |
| Data Ingestion via Azure Data Factory | | Import data from a public Azure Storage account and transform it using ADF and a notebook activity | Crime data from 2016 | 01-Getting-Started, 02-Data-Ingestion (crime data), 03-Data-Transformation (crime data), includes/Databricks-Data-Transformation (crime data) |
| Reading Writing Data | | Querying, joins/aggregates, mounting Azure Storage, and handling JSON using Spark and Databricks | Databricks-supplied training data (10 M people, Databricks blog, city populations, small, airlines, flights, IP geocode, and bike sharing) | 01-Getting-Started, 02-Querying-Files (10 M people), 03-Joins-Aggregations (10 M people, ssn), 04-Accessing-Data (IP geocode, bike sharing, state-income.csv, auto-mpg.csv), 05-Querying-JSON (Databricks blog), 06-Data-Lakes (crime data), 07-Key-Vault-backed-secret-scopes, 08-SQL-Database-Connect-Using-Key-Vault (AdventureWorks sample database), 09-Cosmos-DB-Connect-Using-Key-Vault (crime data), 10-Capstone-Project (crime data) |
| ETL with Databricks | | ETL, UDFs and libraries, connecting to JDBC stores, jobs | Databricks-supplied training data (Wikipedia, 10 M people, etc.) | ETL-Part-1/01-Course-Overview-and-Setup, ETL-Part-1/02-ETL-Process-Overview (EDGAR log), ETL-Part-1/03-Connecting-to-Azure-Blob-Storage (crime data, Wikipedia), ETL-Part-1/04-Connecting-to-JDBC (hosted Postgres server), ETL-Part-1/05-Applying-Schemas-to-JSON-Data (zip codes, smartphone data), ETL-Part-1/06-Corrupt-Record-Handling (smartphone data), ETL-Part-1/07-Loading-Data-and-Productionalizing (crime data), ETL-Part-1/08-Capstone-Project (twitter), ETL-Part-2/01-Course-Overview-and-Setup, ETL-Part-2/02-Common-Transformations (people with dups), ETL-Part-2/03-User-Defined-Functions, ETL-Part-2/04-Advanced-UDFs (weather), ETL-Part-2/05-Joins-and-Lookup-Tables (day-of-week, Wikipedia, countries, EDGAR log), ETL-Part-2/06-Database-Writes (Wikipedia), ETL-Part-2/07-Table-Management, ETL-Part-2/08-Capstone-Project (twitter) |
| Databricks Delta | | Create, append, and upsert; streaming with Delta; dealing with small files | Databricks-supplied training data (customer data, online retail data) | 01-Introducing-Delta, 02-Create (customer data, online retail data), 03-Append (customer data, online retail data, structured streaming), 04-Upsert (customer data (mini), online retail data, structured streaming), 05-Streaming (smartphone accelerometer samples (definitive-guide/activity-data)), 06-Optimization (online retail data), 07-Architecture (Wikipedia streaming via hosted Kafka servers), 08-Capstone-Project (gaming data) |
| Visualization | | Visualizing data in notebooks using the built-in visualizers and Power BI | Databricks-supplied data (10 M people, crime data) | 01-Querying-Files (10 M people), 02-Capstone-Project (crime data), 03-Power-BI (10 M people), 04-Matplotlib (crime data) |
| Streaming | | Using Spark Structured Streaming with Event Hubs and Databricks Delta | Generated streaming flights data | 01-Getting-Started, 02-Spark-Structured-Streaming (generated streaming flights), 03-Event-Hubs (generated messages), 04-Streaming-with-Databricks-Delta (smartphone accelerometer samples (definitive-guide/activity-data)) |
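The Databricks Delta module above centers on create/append/upsert semantics. As a language-agnostic illustration of what an upsert (Delta's MERGE) does, here is a sketch in plain Python over a dictionary keyed by row id; this is purely illustrative and not Delta's actual API:

```python
def upsert(table, updates, key="id"):
    """Merge updates into table: rows with a matching key are updated,
    rows with a new key are inserted. Returns a new merged table."""
    merged = dict(table)
    for row in updates:
        merged[row[key]] = row  # existing key -> update; new key -> insert
    return merged

# Hypothetical customer rows, keyed by id.
customers = {1: {"id": 1, "name": "Ada"}, 2: {"id": 2, "name": "Grace"}}
changes = [{"id": 2, "name": "Grace H."},  # updates existing row 2
           {"id": 3, "name": "Alan"}]      # inserts new row 3
result = upsert(customers, changes)
```

In the actual module, the same update-or-insert decision is expressed declaratively in a `MERGE INTO ... WHEN MATCHED ... WHEN NOT MATCHED` statement, with Delta handling the file rewrites transactionally.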

#### Data Science with Azure Databricks

| Title | Link | Description | Data Set | Notebooks |
| ----- | ---- | ----------- | -------- | --------- |
| Setup | | Creating a cluster and configuring libraries using the REST API | N/A | |
| Notebook Fundamentals | | Code vs. markdown cells, supported magics, running, navigating, and managing cells, intro to Spark and Pandas DataFrames | Locally created data | |
| Exploratory Data Analysis | | Creating tables with the UI, adjusting DataFrame data types, summarizing data, handling nulls, correlation, training a parsimonious model, visualizations, logistic regression, widgets, model evaluation, one-hot encoding, feature scaling, dimensionality reduction (PCA), random forest | UsedCars.csv | |
| Model Training, Selection and Evaluation | | Using Scikit-learn for regression/classification and pipelines, data splitting, scaling and regression, feature selection, measuring error (MAE, RMSE, R², precision, recall, ROC), cross-validation, model explanation (ELI5), visualizations with matplotlib | UsedCars.csv | |
| Deep Learning | | TensorFlow on a single node and distributed with Spark Deep Learning Pipelines, autoencoder neural network, Keras, dimensionality reduction (t-SNE) | Generated data, Fashion MNIST | |
| Text Analytics | | Classification for text (Gensim, Scikit-learn, RNN/Keras, DNN/TFLearn), n-grams, word embeddings (bag of words, TF-IDF, Word2Vec), dimensionality reduction (PCA, t-SNE) | IMDB movie data, custom insurance claim text | |
| Model Deployment | | Azure Machine Learning Python SDK, creating an AML workspace, DBFS vs. local storage, AML run history, model evaluation, model deployment to Azure Container Instances | AdultCensusIncome.csv | |
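The Model Training, Selection and Evaluation module above measures error with MAE, RMSE, and R². For reference, these three metrics computed from scratch in plain Python (a sketch for intuition; the labs themselves use Scikit-learn's implementations, and the sample values below are made up):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large residuals more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """R-squared: fraction of variance explained (1.0 is a perfect fit)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Illustrative predictions for four samples.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]
```

RMSE is never smaller than MAE on the same residuals, which is why the module tracks both: a large gap between them signals a few large errors rather than uniformly noisy predictions.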


### Presentations

Artifacts used in presentations delivered worldwide.

| Title | Link | Description | Data Set |
| ----- | ---- | ----------- | -------- |
| Deep Learning for Developers | | Presentation deck and notebooks used in the workshop | Component compliance text and Fashion MNIST |
| Deploy Classifier to Azure ML | | Single notebook deploying a pre-created claims classification model using Azure Machine Learning | N/A |
| Create and Deploy Flight Delays Model | | Series of short notebooks that create and deploy a flight delays model using Azure Databricks and Azure Machine Learning service | FlightDelaysWithAirportCodes.csv and FlightWeatherWithAirportCode.csv |
| Claim Classification in Azure Notebooks | | Using Azure Notebooks to show both Cognitive Services and deep learning | Custom insurance claim text and images from Google |