# Spark Fuse Notebook Course

Use this learning path to move from notebook fundamentals to advanced PySpark testing techniques. Work through the modules in order; each later notebook builds on the concepts introduced earlier.

## How to Use This Course

1. Open each notebook inside `notebooks/tutorials/` using Jupyter, VS Code, or Databricks.
2. Run the cells sequentially, taking notes and adapting the examples to your environment.
3. Revisit earlier notebooks whenever you need a refresher; the summaries under Course Modules describe what each notebook covers.

## Course Modules

- [01_notebook_fundamentals.ipynb](01_notebook_fundamentals.ipynb) — Understand interactive notebook concepts, cell types, and typical workflows.
- [02_python_essentials_for_pyspark.ipynb](02_python_essentials_for_pyspark.ipynb) — Refresh core Python patterns (collections, functions, pathlib) that underpin PySpark code.
- [03_getting_started_with_pyspark.ipynb](03_getting_started_with_pyspark.ipynb) — Spin up a SparkSession, build DataFrames, and perform foundational transformations.
- [04_spark_sql_vs_pyspark.ipynb](04_spark_sql_vs_pyspark.ipynb) — Compare equivalent logic expressed with the DataFrame API and Spark SQL.
- [05_spark_dataframe_joins.ipynb](05_spark_dataframe_joins.ipynb) — Practice inner, left, and broadcast joins using a shared dataset.
- [06_spark_window_functions.ipynb](06_spark_window_functions.ipynb) — Apply running totals, rolling averages, and ranking with window specifications.
- [07_testing_pyspark_workflows.ipynb](07_testing_pyspark_workflows.ipynb) — Learn pragmatic strategies for validating PySpark logic with assertions and pytest.
- [08_advanced_testing_with_pyspark_testing.ipynb](08_advanced_testing_with_pyspark_testing.ipynb) — Leverage `pyspark.testing` utilities for robust DataFrame comparisons and schema checks.

## Shared Resources

All notebooks rely on the demo dataset stored at `notebooks/data/orders_demo.csv`. The examples infer schema on load, so you can extend the dataset or swap in your own sample data as you progress.
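
As a rough illustration of what "infer schema on load" can look like, here is a minimal sketch of a loader. The helper name `load_orders` and the `header=True` setting are assumptions made for this example and are not taken from the notebooks; only the file path comes from the text above.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported only for type hints
    from pyspark.sql import DataFrame, SparkSession

# Path taken from the course text; adjust it if you run from another directory.
ORDERS_PATH = "notebooks/data/orders_demo.csv"


def load_orders(spark: SparkSession, path: str = ORDERS_PATH) -> DataFrame:
    """Hypothetical helper: read the demo CSV and let Spark infer column types."""
    return spark.read.csv(
        path,
        header=True,       # assumption: the first row holds column names
        inferSchema=True,  # Spark makes an extra pass over the data to pick types
    )
```

In a notebook that already has a `spark` session, calling `load_orders(spark).printSchema()` should show which types Spark chose. Because the types are inferred from the data, columns you add to the CSV are picked up without code changes, although an explicit schema is usually the safer choice once the data shape is settled.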

## Exercises

- Sketch a learning schedule for the course and note which datasets or cluster resources you'll need for each module.
- Identify which existing projects could benefit from the concepts in each notebook.
- Share the overview with your team and gather additional topics they'd like to see covered.
