Azure Databricks and Spark

Feature Engineering, Spark ML Random Forest Model, MLflow Logging, Streaming Data Source

Lab Overview

Create DataFrames

As data engineers, we need to make data available to our marketing analysts and data scientists for reporting and modeling. The first step in that process is to read in the data and define schemas.

Read Mounted Data
Create DataFrames
View, Infer, and Define Schemas

Transform and Load Data

Learn how to prepare data and load the transformed data into Databricks Delta tables. We will:

Merge Data
Join Data
Change Data Types
Remove Duplicate Values
Resolve Data Discrepancies
Create Views using Delta Tables

Explore Data

Working as marketing analysts, we will explore our data and look for answers to a few questions:

How does customer spend compare across channels? When looking at discount amounts, do we see a dip in spend for higher discount amounts? Can we identify any instance in which a lower discount amount leads to higher spend or more conversions?

Read a Databricks Delta Table
Aggregate Data
Quickly Visualize Data

Machine Learning

Build a Pipeline for Feature Engineering
Train a Spark ML Random Forest Model
Evaluate the Model and Tune Parameters
Log Experiments with MLflow

Connect to Streaming Data

Connect to a Streaming Data Source
View and Interact with Streaming Data
Insert Streaming Data into Delta Table

Create and Run a Job

View Code for Constructing a Simple BI Report
Create a Job to Run this Notebook
Run the Job

View Job Output

Read the File Generated from the Job Run
View the DataFrame

Azure Databricks is a unified analytics platform for data engineers, data scientists, and analysts.

richiebachala/Databricks-and-Spark
