mrpaulandrewltd/Microsoft-Data-Integration-Pipeline-Training

Microsoft Data Integration Pipeline Training

The Fundamentals to Level 300

with Paul Andrew

Hey friends and welcome to my training workshop on Microsoft Data Integration Pipelines.

Overview

In this full day of training, we’ll start with the very basics and learn how to orchestrate your Azure data platform from start to finish. You will learn how to build out Azure control flow and data flow components as processing pipelines using Azure Data Factory and Azure Synapse Analytics. We’ll start by covering the fundamentals within the resources and together build out pipelines that ingest data from local source systems, transform and serve it to consumers. We’ll then continue taking an end-to-end look at our Azure integration pipeline tools within highly scalable cloud native architectures, dealing with triggering, monitoring, dynamic pipeline content as well as CI/CD practices. Start the day knowing nothing about Azure Data Integration pipelines and leave with the knowledge, slides, demos, and code to apply these resources in your role as a data engineering professional.


Objectives

  • How cloud native data integration resources have evolved over time.
  • What the basic data pipeline artifacts are.
  • What the common data movement deployment patterns are.
  • How to build complex, highly dynamic control flows.
  • How to massively scale out executions and handle parallel orchestration workloads.
  • Best practices for the deployment of orchestration resources.

Agenda

The following offers an insight into the complete agenda and module breakdown for this workshop.

  • Module 1: Pipeline Fundamentals - Slides PDF >>>
    • The History of Azure Orchestration
    • Synapse Analytics vs Data Factory vs Microsoft Fabric
    • Integration Components
    • Common Activities
    • Execution Dependencies

  • Module 2: Integration Runtime Design Patterns - Slides PDF >>>
    • Compute Types
      • Azure
      • Hosted
      • SSIS
    • Patterns & Configuration

  • Module 3: Data Transformation - Slides PDF >>>
    • Data Flows
    • Power Query Injection
    • Spark Configuration
    • Use Cases

  • Labs: Getting Hands On
    • Create Azure Resources
    • Build a Copy Pipeline
    • Create a Reusable Pipeline
    • Author a Data Flow
    • Monitor Factory Activities
    • Explore Synapse Pipelines
    • Explore Fabric Pipelines
    • Mini-project

  • Module 4: Dynamic Pipelines - Slides PDF >>>
    • Expressions & Interpolation
    • Simple Metadata Driven Execution
    • Dynamic Content Chains
    • Reference Names

  • Module 5: Pipeline Extensibility - Slides PDF >>>
    • Azure Batch Service
      • Tasks
      • Compute Pools
      • Scaling
    • Pipeline Custom Activities
    • Azure Management API
    • Azure Functions

  • Module 6: Execution Parallelism - Slides PDF >>>
    • Control Flow Scale Out
    • Concurrency Limitations
    • Internal vs External Activities
    • Orchestration Framework - See Cloud Formations: CF.Cumulus

  • Module 7: VNet Integration - Slides PDF >>>
    • Private Endpoints
    • Managed VNets
    • Firewall Bypass

  • Module 8: Security - Slides PDF >>>
    • Service Principals
    • Managed Identities
    • Azure Key Vault Integration
    • Customer Managed Keys
    • Pipeline Access & Permissions

  • Module 9: Monitoring & Alerting - Slides PDF >>>
    • Studio Monitoring
    • Log Analytics & Kusto Queries
    • Operational Dashboards
    • Advanced Alerting

  • Module 10: Solution Testing - Slides PDF >>>
    • Development Time Validation
    • Test Coverage
    • NUnit Tests

  • Module 11: CI/CD - Slides PDF >>>
    • Source Control vs Developer UI
    • Basic ARM Template Deployments
    • Advanced Deployment Patterns

  • Module 12: Final Thoughts - Slides PDF >>>
    • Running Costs
    • Conclusions
    • Best Practices

Suggested Prerequisites

If you participate in this training workshop, there will be labs to work through and demo code to optionally follow along with. These labs focus on the development of Azure data platform resources, so it is recommended that you bring the following ready to use. There will be little spare time for initial setup work.

  • Most importantly, access to a Microsoft Azure Tenant including a usable Azure Subscription.
    • A free trial account is sufficient, but please have this set up prior to the event to avoid delays.
    • This should include the ability to provision resources in an Azure Resource Group with owner level access.
  • A developer laptop with power and some form of WiFi connectivity (sorry if obvious).
  • Suggested software to be installed on your laptop to make the learning experience run smoothly:
    • A modern web browser, Microsoft Edge or similar as preferred.
    • A suitable IDE, VS Code or Visual Studio including Azure development extensions.
    • Database tools, SQL Server Management Studio or Azure Data Studio.
    • GitHub Desktop or similar for repository interaction.
    • Azure Storage Explorer.
    • A PDF file viewer.
  • Play the Azure Icon Game, it will help. See blog post for context: https://mrpaulandrew.com/2017/12/15/the-azure-icon-game

For software downloads, please complete these tasks prior to the event to avoid internet bandwidth contention for other attendees.

Many thanks


Speaker Biography

Paul (AKA @mrpaulandrew) is the Founder & CTO of Cloud Formations, a specialist data consultancy based in the UK. With nearly 20 years’ experience designing and delivering Microsoft data architectures, Paul leads a passionate team of engineers, supporting businesses small and large with scalable cloud platforms that deliver business value through data insights. Over the years, Paul has covered the breadth and depth of design patterns and industry-leading concepts, including Lambda, Kappa, Delta Lake, Data Mesh and Data Fabric.

Paul is also a Microsoft Data Platform MVP, organiser for the Data Relay community conference, East Midlands user group leader, book author and mentor. In addition to the day job(s), Paul is a father of three, husband, foodie, runner, blood donor, geek, Lego, and Star Wars fan! Lastly, Paul confesses to enjoying a Rammstein playlist when given half a chance to do some coding for a customer project.

Speaker Contact Details

mrpaulandrew.com/contact


About

Training workshop content on Azure Data Factory and Azure Synapse Analytics Data Integration Pipelines
