# Spark Streaming with PySpark
## Module 1: Course Introduction & Agenda

Welcome to this comprehensive series on **Spark Streaming with PySpark**. Real-time data processing is one of the most in-demand skills in the data engineering landscape today.

In this course, we will move beyond static batch processing and learn how to process data as it arrives, enabling real-time analytics, dashboards, and decision-making.

### What is this course about?
This series is designed to take you from the fundamental concepts of **Structured Streaming** to advanced, real-world implementations using industry-standard tools like Kafka, Redis, and Cosmos DB.

## Prerequisites

Before diving into the streaming concepts, make sure you have the following foundational knowledge. If any of these topics are new to you, we recommend reviewing them first.

1.  **Basic Python:** Understanding of functions, libraries, and data structures.
2.  **Apache Spark Core:** Knowledge of how Spark works (Drivers, Executors).
3.  **PySpark DataFrames:** Familiarity with transformations, actions, and the DataFrame API.

*Note: This course assumes you have a working PySpark environment set up (Local or Docker).*

## Course Agenda

Here is the roadmap of what we will cover in the upcoming notebooks:

1.  **Structured Streaming Basics:** Understanding the "What, When, How, and Where" of streaming.
2.  **Code Migration:** How to convert standard Spark **Batch** code into **Streaming** code.
3.  **Integrations:** Real-time coding examples connecting Spark with:
    *   **Apache Kafka** (Message Broker)
    *   **Redis** (In-memory Data Store)
    *   **Cosmos DB** (NoSQL Database)
4.  **File Formats:** Handling various file formats (JSON, CSV, Parquet) in a streaming context.
5.  **Time Handling:**
    *   Event-time processing.
    *   Handling late data and watermarking.
6.  **Optimization:** Performance tuning techniques specific to Spark Streaming.

```python
# Let's perform a quick sanity check to ensure your PySpark environment is ready.
# We will initialize a SparkSession and check the version.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("Streaming_Intro_Check")
    .getOrCreate()
)

print("Spark Session Created Successfully!")
print(f"Spark Version: {spark.version}")

# If this cell runs without errors, you are ready for the next module.
```

## Up Next

In the next notebook, we will cover the **Basics of Spark Streaming**. We will answer four fundamental questions to build our theoretical foundation:

1.  **What** is Structured Streaming?
2.  **When** should you use it?
3.  **How** does it work under the hood?
4.  **Where** does it fit in the data ecosystem?

See you in **Notebook 02!**