# Alternative: Run via Databricks CLI

If the Databricks kernel isn't working in VS Code, you can upload the notebook with the CLI and run it in the workspace instead:

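1. **Upload this notebook to the workspace**: `databricks workspace import interactive_demo.ipynb /Workspace/Users/jq22184@hotmail.com/interactive_demo.ipynb --language PYTHON --format JUPYTER` (argument order differs between CLI versions; run `databricks workspace import --help` to confirm the syntax for yours)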
2. **Run in workspace**: Open it in the Databricks UI
3. **Or use the .py version** below for individual cell execution

# Databricks Interactive Notebook Demo

This notebook demonstrates how to run PySpark code interactively against your Databricks cluster from VS Code.

## Prerequisites
- The Databricks extension is installed in VS Code
- You are authenticated with your workspace (using the `codespaces` profile)
- A running cluster is available

In [1]:
# Test connection to Spark
print("Testing Spark connection...")
print(f"Spark version: {spark.version}")
print(f"Spark context: {spark.sparkContext}")

Testing Spark connection...


NameError: name 'spark' is not defined
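The `NameError` above means the kernel did not predefine `spark`, which happens when the cell runs on a plain local Python kernel rather than a Databricks one. One possible workaround, sketched below, is to build the session yourself with Databricks Connect. This assumes the `databricks-connect` package is installed and that the `codespaces` profile from the prerequisites exists; `get_spark` is a hypothetical helper name, not part of any library.

```python
def get_spark(profile=None):
    """Return a Spark session, or None if one cannot be created here."""
    # Databricks notebooks (and some kernels) already define `spark`.
    existing = globals().get("spark")
    if existing is not None:
        return existing

    # Otherwise try Databricks Connect (assumption: databricks-connect is installed).
    try:
        from databricks.connect import DatabricksSession

        builder = DatabricksSession.builder
        if profile:
            # Use a named profile from the Databricks config file.
            builder = builder.profile(profile)
        return builder.getOrCreate()
    except Exception as exc:  # missing package, bad profile, no cluster, ...
        print(f"Could not create a Databricks session: {exc}")
        return None


# "codespaces" is the profile named in the prerequisites.
spark = get_spark(profile="codespaces")
if spark is not None:
    print(f"Spark version: {spark.version}")
```

If the helper returns `None`, the remaining cells will still fail, so fix the connection (or run the notebook in the workspace) before continuing.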

## Create Sample Data

Let's create a simple DataFrame with some sample data.

In [None]:
# Create sample data
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Create a sample DataFrame
data = [
    ("Alice", 25, "Engineer"),
    ("Bob", 30, "Manager"), 
    ("Charlie", 35, "Analyst"),
    ("Diana", 28, "Designer")
]

schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
    StructField("role", StringType(), True)
])

df = spark.createDataFrame(data, schema)
print("DataFrame created successfully!")
df.show()

## Data Transformations

Now let's perform some basic transformations and aggregations.

In [None]:
# Add an age_group column derived from age
df_transformed = df.withColumn("age_group", 
    F.when(F.col("age") < 30, "Young")
     .otherwise("Experienced")
)

print("DataFrame with age groups:")
df_transformed.show()

# Group by age group and count
print("Count by age group:")
df_transformed.groupBy("age_group").count().show()
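Because the cell above has no recorded output, a plain-Python version of the same rule can serve as a sanity check on what the Spark aggregation should return. This is only an illustrative sketch over the four sample rows; it does not use Spark, and `age_group` is a hypothetical helper mirroring the `F.when(...).otherwise(...)` expression.

```python
from collections import Counter

# Same sample rows as the DataFrame above.
rows = [
    ("Alice", 25, "Engineer"),
    ("Bob", 30, "Manager"),
    ("Charlie", 35, "Analyst"),
    ("Diana", 28, "Designer"),
]


def age_group(age):
    """Mirror of the Spark rule: under 30 is "Young", otherwise "Experienced"."""
    return "Young" if age < 30 else "Experienced"


# Equivalent of groupBy("age_group").count()
counts = Counter(age_group(age) for _, age, _ in rows)
print(dict(counts))  # {'Young': 2, 'Experienced': 2}
```

Note that Bob, at exactly 30, falls into "Experienced" because the comparison is strict.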

## File Operations

Let's write the DataFrame to DBFS (Databricks File System) and read it back.

In [None]:
# Write DataFrame to DBFS
output_path = "/tmp/sample_data.parquet"

df_transformed.write.mode("overwrite").parquet(output_path)
print(f"Data written to: {output_path}")

# Read it back
df_read = spark.read.parquet(output_path)
print("Data read back from DBFS:")
df_read.show()
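On a Databricks cluster, a scheme-less path such as `/tmp/sample_data.parquet` passed to Spark is normally resolved against DBFS rather than the driver's local disk. The small helper below makes that explicit by adding the `dbfs:/` scheme. It is a sketch only: `to_dbfs_uri` is a hypothetical name, and whether your workspace allows writes to this location depends on its configuration.

```python
def to_dbfs_uri(path: str) -> str:
    """Prefix a scheme-less path with dbfs:/; leave paths that already have a scheme."""
    if "://" in path or path.startswith("dbfs:"):
        return path
    return "dbfs:/" + path.lstrip("/")


print(to_dbfs_uri("/tmp/sample_data.parquet"))  # dbfs:/tmp/sample_data.parquet
```

Passing the explicit URI to `write.parquet(...)` and `read.parquet(...)` makes it clearer in logs and error messages which filesystem is being used.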

## Delta Lake Example

If your workspace supports Delta Lake, try this example:

In [None]:
# Delta Lake example (if available)
try:
    delta_path = "/tmp/sample_delta_table"
    
    # Write as Delta table
    df_transformed.write.format("delta").mode("overwrite").save(delta_path)
    print(f"Delta table written to: {delta_path}")
    
    # Read Delta table
    df_delta = spark.read.format("delta").load(delta_path)
    print("Delta table contents:")
    df_delta.show()
    
except Exception as e:
    print(f"Delta Lake not available or error occurred: {e}")

## Display Function

Use the `display()` function for rich visualizations (when available):

In [None]:
# Try using display function for better visualization
try:
    display(df_transformed)
except NameError:
    print("display() function not available in this context")
    print("Showing DataFrame with .show() instead:")
    df_transformed.show()