Feature Engineering, Spark ML Random Forest Model, MLflow Logging, Streaming Data Sources
As data engineers, we need to make data available to our marketing analysts and data scientists for reporting and modeling. The first step in that process is to read in data and define schemas.
- Read Mounted Data
- Create Dataframes
- View, Infer, and Define Schemas
Next, we learn how to prepare data and load the transformed data into Databricks Delta Tables. We will:
- Merge Data
- Join Data
- Change Data Types
- Remove Duplicate Values
- Resolve Data Discrepancies
- Create Views using Delta Tables
Working as marketing analysts, we will explore our data and look for answers to a few questions:
- How does customer spend compare across channels?
- When looking at discount amounts, do we see a dip in spend for higher discount amounts?
- Can we identify any instance in which a lower discount amount leads to higher spend or more conversions?
- Read a Databricks Delta Table
- Aggregate Data
- Quickly Visualize Data
- Build a Pipeline for Feature Engineering
- Train a Spark ML Random Forest Model
- Evaluate the Model and Tune Parameters
- Log Experiments with MLflow
- Connect to a Streaming Data Source
- View and Interact with Streaming Data
- Insert Streaming Data into Delta Table
- View Code for Constructing a Simple BI Report
- Create a Job to Run this Notebook
- Run the Job
- Read the File Generated from the Job Run
- View the DataFrame