Data AI Resources

Zoiner Tejada edited this page Jun 10, 2019 · 33 revisions

This page summarizes the available content and code artifacts from around the web that the Solliance Data + AI Practice has helped to build.

Public Resources

Microsoft Cloud Workshops

6-hour content packages, each including a whiteboard design session and a hackathon.

| Title | Link | Description | Data Set |
| --- | --- | --- | --- |
| Big Compute | https://github.com/Microsoft/MCW-Big-Compute | Using Azure Batch for distributed processing | Big Buck Bunny video |
| Big data and visualization | https://github.com/Microsoft/MCW-Big-Data-and-Visualization | Using Azure Databricks to train a model that predicts flight delays, and using Azure Machine Learning Service to deploy it to a web service | US DOT Flight Delay data |
| Cosmos DB Real-Time Advanced Analytics | https://github.com/Microsoft/MCW-Cosmos-DB-Real-Time-Advanced-Analytics | Design a data pipeline solution leveraging Cosmos DB for scalable ingest and global distribution. Use Azure Databricks Delta with a modern data warehouse to reduce risk. | Generator that simulates real-time transactions |
| Cognitive Services and Deep Learning | https://github.com/Microsoft/MCW-Cognitive-Services-and-Deep-Learning | Combining custom text analytics models built in Azure Databricks with pre-built models from Cognitive Services to augment agent performance when processing insurance claims | Custom-authored insurance claim text and images from a Google search |
| Data Platform Upgrade & Migration | https://github.com/Microsoft/MCW-Data-Platform-upgrade-and-migration | Migrate from Oracle to SQL Server, upgrade from SQL Server to Azure SQL Database | Northwind database |
| Intelligent Analytics | https://github.com/Microsoft/MCW-Intelligent-Analytics | Real-time streaming analytics demonstrated through chat backed by Event Hubs and Stream Analytics | User-entered text |
| Intelligent Vending Machines | https://github.com/Microsoft/MCW-Intelligent-Vending-Machines | IoT solution for vending machines showing real-time analytics, Cognitive Services and SQL database in-memory/columnar | User-generated transactions |
| Internet of Things | https://github.com/Microsoft/MCW-Internet-of-Things | Real-time analytics using IoT Hub and Stream Analytics | Temperature telemetry generated via a simulated device |
| IoT and the Smart City | https://github.com/Microsoft/MCW-IoT-for-Business | Smart city solution using IoT Edge devices, IoT Hub, Stream Analytics and Time Series Insights to process telemetry from a city bus | Telemetry generated from a simulator |
| Serverless Architecture | https://github.com/Microsoft/MCW-Serverless-Architecture | License plate processing using scale-out serverless services including Functions, Event Grid, Application Insights, Cosmos DB and Logic Apps | Supplied vehicle photos |

Coming Soon

| Title | Link | Description | Data Set |
| --- | --- | --- | --- |
| MLOps | https://github.com/solliancenet/MCW-operationalizing-deep-learning | Azure DevOps and MLOps with Azure Machine Learning service | |
| Deep Learning with Azure Databricks | TBD | TBD | |
| Managed Open Source Databases on Azure | https://github.com/solliancenet/MCW-Managed-open-source-databases-on-Azure | TBD | |
| Migrating to Azure SQL Database Managed Instance | https://github.com/solliancenet/MCW-Migrating-to-Azure-SQL-Database-Managed-Instance | TBD | |
| Modernizing Data Analytics with SQL Server 2019 | https://github.com/solliancenet/MCW-Modernizing-data-analytics-with-SQL-Server-2019 | TBD | TBD |
| Securing the IoT end-to-end | https://github.com/solliancenet/MCW-Securing-the-IoT-end-to-end | TBD | |

Dev Guides

Official guides (small books)

| Title | Link |
| --- | --- |
| Azure Data Architecture Guide | https://docs.microsoft.com/en-us/azure/architecture/data-guide/ |
| Azure Machine Learning Developer Guide | https://github.com/solliancenet/Azure-Machine-Learning-Dev-Guide |
| HDInsight Developer Guide | https://github.com/hdinsight/hdinsight-dev-guide/blob/master/HDInsight%20Developer%20Guide.pdf |
| SQL Server Developer Guide | https://github.com/solliancenet/sql-dev-guide |
| Databricks Developer Guide | https://github.com/solliancenet/azure-databricks-dev-guide |

Labs & Quick Starts

| Title | Link |
| --- | --- |
| Azure Machine Learning Quickstarts | https://github.com/solliancenet/azure-machine-learning-quickstarts |
| Azure Machine Learning Service Labs | https://github.com/solliancenet/azure-machine-learning-service-labs |

Tech Immersion Labs

1-hour labs designed for a managed labs environment.

| Title | Link |
| --- | --- |
| Handling Big Data with SQL Server 2019 Big Data Clusters | https://github.com/solliancenet/tech-immersion-data-ai/blob/master/day1-exp1/README.md |
| Leveraging Cosmos DB for near real-time analytics | https://github.com/solliancenet/tech-immersion-data-ai/blob/master/day1-exp2/README.md |
| Unlocking new capabilities with friction-free migrations to Azure SQL Managed Instance | https://github.com/solliancenet/tech-immersion-data-ai/blob/master/day1-exp3/README.md |
| Delivering the Modern Data Warehouse with Azure SQL Data Warehouse, Azure Databricks, Azure Data Factory, and Power BI | https://github.com/solliancenet/tech-immersion-data-ai/blob/master/day1-exp4/README.md |
| Quickly build comprehensive Bot solutions with the Virtual Assistant Solution Accelerator | https://github.com/solliancenet/tech-immersion-data-ai/blob/master/day2-exp1/README.md |
| Yield quick insights from unstructured data with Knowledge Mining and Cognitive Services | https://github.com/solliancenet/tech-immersion-data-ai/blob/master/day2-exp2/README.md |
| Better models made easy with Automated Machine Learning | https://github.com/solliancenet/tech-immersion-data-ai/blob/master/day2-exp3/README.md |
| Making deep learning portable with ONNX | https://github.com/solliancenet/tech-immersion-data-ai/blob/master/day2-exp4/README.md |

Microsoft Learn

Hands-on labs broken into 45-minute modules.

Modern Data Warehouse

| Title | Link | Description | Data Set | Notebooks |
| --- | --- | --- | --- | --- |
| Intro | https://github.com/solliancenet/proj-learning-paths-public/blob/master/modern-data-warehouse/01-Intro.dbc | Spark and Azure Databricks intro | Databricks supplied training data (10 M people, Databricks blog, city populations, small, airlines, flights, IP geocode and bike sharing) | 01-Getting-Started, 02-Why-Spark (flights) |
| Azure SQL DW | https://github.com/solliancenet/proj-learning-paths-public/blob/master/modern-data-warehouse/02-Azure-SQL-DW.dbc | Understanding and using the SQL DW Connector with Azure Databricks | AdventureWorks sample database | Understanding and using the SQL DW Connector with Azure Databricks (AdventureWorks) |
| Data Ingestion via Azure Data Factory | https://github.com/solliancenet/proj-learning-paths-public/blob/master/modern-data-warehouse/03-Data-Ingestion-Via-ADF.dbc | Import data from a public Azure Storage account and transform it using ADF and a notebook activity | Crime Data from 2016 | 01-Getting-Started, 02-Data-Ingestion (crime data), 03-Data-Transformation (crime data), includes/Databricks-Data-Transformation (crime data) |
| Reading Writing Data | https://github.com/solliancenet/proj-learning-paths-public/blob/master/modern-data-warehouse/04-Reading-Writing-Data.dbc | Querying, Joins/Aggregates, Mounting Azure Storage, handling JSON using Spark and Databricks | Databricks supplied training data (10 M people, Databricks blog, city populations, small, airlines, flights, IP geocode and bike sharing) | 01-Getting-Started, 02-Querying-Files (10 M people), 03-Joins-Aggregations (10 M people, ssn), 04-Accessing-Data (IP geocode, bike sharing, state-income.csv, auto-mpg.csv), 05-Querying-JSON (Databricks blog), 06-Data-Lakes (crime data), 07-Key-Vault-backed-secret-scopes, 08-SQL-Database-Connect-Using-Key-Vault (AdventureWorks sample database), 09-Cosmos-DB-Connect-Using-Key-Vault (crime data), 10-Capstone-Project (crime data) |
| ETL with Databricks | https://github.com/solliancenet/proj-learning-paths-public/blob/master/modern-data-warehouse/05-UDFs-Libraries-Transforming-Data.dbc | ETL, UDFs and Libraries, connecting to JDBC stores, Jobs | Databricks supplied training data (Wikipedia, 10M People, etc) | ETL-Part-1/01-Course-Overview-and-Setup, ETL-Part-1/02-ETL-Process-Overview (EDGAR log), ETL-Part-1/03-Connecting-to-Azure-Blob-Storage (crime data, Wikipedia), ETL-Part-1/04-Connecting-to-JDBC (hosted Postgres server), ETL-Part-1/05-Applying-Schemas-to-JSON-Data (zip codes, smartphone data), ETL-Part-1/06-Corrupt-Record-Handling (smartphone data), ETL-Part-1/07-Loading-Data-and-Productionalizing (crime data), ETL-Part-1/08-Capstone-Project (twitter), ETL-Part-2/01-Course-Overview-and-Setup, ETL-Part-2/02-Common-Transformations (people with dups), ETL-Part-2/03-User-Defined-Functions, ETL-Part-2/04-Advanced-UDFs (weather), ETL-Part-2/05-Joins-and-Lookup-Tables (day-of-week, Wikipedia, countries, EDGAR log), ETL-Part-2/06-Database-Writes (Wikipedia), ETL-Part-2/07-Table-Management, ETL-Part-2/08-Capstone-Project (twitter) |
| Databricks Delta | https://github.com/solliancenet/proj-learning-paths-public/blob/master/modern-data-warehouse/06-Databricks-Delta.dbc | Create, Append and Upsert, Streaming with Delta, dealing with small files | Databricks supplied training data (customer data, online retail data) | 01-Introducing-Delta, 02-Create (customer data, online retail data), 03-Append (customer data, online retail data, structured streaming), 04-Upsert (customer data (mini), online retail data, structured streaming), 05-Streaming (smartphone accelerometer samples (definitive-guide/activity-data)), 06-Optimization (online retail data), 07-Architecture (Wikipedia streaming via hosted Kafka servers), 08-Capstone-Project (gaming data) |
| Visualization | https://github.com/solliancenet/proj-learning-paths-public/blob/master/modern-data-warehouse/07-Visualization.dbc | Visualizing data in notebooks using built-in visualizers and with Power BI | Databricks supplied data (10 M people, crime data) | 01-Querying-Files (10 M people), 02-Capstone-Project (crime data), 03-Power-BI (10 M people), 04-Matplotlib (crime data) |
| Streaming | https://github.com/solliancenet/proj-learning-paths-public/blob/master/modern-data-warehouse/08-Streaming.dbc | Using Spark Structured Streaming with Event Hubs and Databricks Delta | Generated streaming flights data | 01-Getting-Started, 02-Spark-Structured-Streaming (generated streaming flights), 03-Event-Hubs (generated messages), 04-Streaming-with-Databricks-Delta (smartphone accelerometer samples (definitive-guide/activity-data)) |

Data Science with Azure Databricks

| Title | Link | Description | Data Set | Notebooks |
| --- | --- | --- | --- | --- |
| Setup | https://github.com/solliancenet/proj-learning-paths-public/blob/master/data-science/00-setup.dbc | Creating cluster and configuring libraries using the REST API | N/A | |
| Notebook Fundamentals | https://github.com/solliancenet/proj-learning-paths-public/blob/master/data-science/01-notebook-fundamentals.dbc | Code vs markdown cells, supported magics, running, navigating and managing cells, intro to Spark and Pandas DataFrames | Locally created data | |
| Exploratory Data Analysis | https://github.com/solliancenet/proj-learning-paths-public/blob/master/data-science/02-exploratory-data-analysis.dbc | Create tables with UI, adjusting dataframe data types, summarizing data, handling nulls, correlation, train parsimonious model, visualizations, logistic regression, widgets, model evaluation, one hot encoding, feature scaling, dimensionality reduction (PCA), random forest | UsedCars.csv | |
| Model Training, Selection and Evaluation | https://github.com/solliancenet/proj-learning-paths-public/blob/master/data-science/03-model-training-selection-evaluation.dbc | Using Scikit-learn for regression/classification and pipelines, data splitting, scaling and regression, feature selection, measuring error (MAE, RMSE, R2, precision, recall, ROC), cross validation, model explanation (ELI5), visualizations with matplotlib | UsedCars.csv | |
| Deep Learning | https://github.com/solliancenet/proj-learning-paths-public/blob/master/data-science/04-deep-learning.dbc | TensorFlow in a single node and distributed with Spark Deep Learning Pipelines, autoencoder neural network, Keras, dimensionality reduction (t-SNE) | Generated data, Fashion MNIST | |
| Text Analytics | https://github.com/solliancenet/proj-learning-paths-public/blob/master/data-science/05-text-analytics.dbc | Classification for text (Gensim, Scikit-learn, RNN/Keras, DNN/TFLearn), n-grams, word embeddings (bag of words, TF-IDF, Word2Vec), dimensionality reduction (PCA, t-SNE) | IMDB movie data, custom insurance claim text | |
| Model Deployment | https://github.com/solliancenet/proj-learning-paths-public/blob/master/data-science/06-model-deployment.dbc | Azure Machine Learning Python SDK, creating AML Workspace, DBFS vs local storage, AML Run History, model evaluation, model deployment to Azure Container Instances | AdultCensusIncome.csv | |

Presentations

Artifacts used in presentations delivered worldwide.

| Title | Link | Description | Data Set |
| --- | --- | --- | --- |
| Deep Learning for Developers | http://bit.ly/deep-learning-for-devs | Presentation deck and notebooks used in workshop | Component compliance text and Fashion MNIST |
| Deploy Classifier to Azure ML | https://github.com/solliancenet/data-ai-practice-public-demos/blob/master/Deploy%20Classifier%20Web%20Service%20end%20to%20end.ipynb | Single notebook deploying a pre-created claims classification model using Azure Machine Learning | N/A |
| Create and Deploy Flight Delays Model | https://github.com/solliancenet/data-ai-practice-public-demos/tree/master/FlightDelaysSimple | Series of short notebooks that create and deploy a Flight Delays model using Azure Databricks and Azure Machine Learning Service | FlightDelaysWithAirportCodes.csv and FlightWeatherWithAirportCode.csv |
| Claim Classification in Azure Notebooks | https://notebooks.azure.com/Solliance/libraries/claims-clean | Using Azure Notebooks to show both Cognitive Services and Deep Learning | Custom insurance claim text and images from Google |