**Introduction to Python for Data Science and Data Engineering**

In [0]:
%python
class LowBalanceError(Exception):
    """Exception raised when an account balance is too low."""
    def __init__(self, current_balance, required_amount, message="Insufficient balance"):
        self.current_balance = current_balance
        self.required_amount = required_amount
        self.message = f"{message}. Current: {current_balance}, Required: {required_amount}"
        super().__init__(self.message)

def withdraw(account_balance, amount):
    if amount <= 0:
        raise ValueError("Withdrawal amount must be positive.")
    if account_balance < amount:
        raise LowBalanceError(account_balance, amount)
    if account_balance - amount < 100:  # a minimum balance of 100 must remain
        raise ValueError("Minimum balance should be maintained")
    return account_balance - amount

# --- Usage ---
account_bal = 700

try:
    new_bal = withdraw(account_bal, 700)
    print(f"Withdrawal successful. New balance: {new_bal}")
except LowBalanceError as e:
    print(f"Error: {e}")
    print(f"Current balance: {e.current_balance}, Attempted withdrawal: {e.required_amount}")
except ValueError as e:
    print(f"Input Error: {e}")
except Exception as e: # Catch any other unexpected errors
    print(f"An unexpected error occurred: {e}")

print("\n--- Another attempt ---")
try:
    new_bal = withdraw(account_bal, -100)
    print(f"Withdrawal successful. New balance: {new_bal}")
except LowBalanceError as e:
    print(f"Error: {e}")
except ValueError as e:
    print(f"Input Error: {e}")

print("\n--- Successful attempt ---")
try:
    new_bal = withdraw(account_bal, 200)
    print(f"Withdrawal successful. New balance: {new_bal}")
except LowBalanceError as e:
    print(f"Error: {e}")
except ValueError as e:
    print(f"Input Error: {e}")

In [0]:
%python
display(dbutils.fs.mounts())

# Spark Components

  ![](https://jmp.sh/s/YzccOZReG7jdljk57jox)

  Spark Core (RDD API) -> DataFrame API -> Spark SQL / SparkR API / MLlib / Structured Streaming API

# Spark Runtime Architecture 
Driver / Cluster Manager / Workers / Executors
###  Driver    
- Plans and coordinates execution.
- Creates the SparkSession, the entry point to every Spark application.
- Analyzes the Spark application and constructs the DAG.
- Schedules and distributes tasks to executors for execution.
- Monitors the progress of tasks and handles failures.
- Returns results to the client.
###  Cluster Manager/Master
- Manages cluster resources and allocates them to the driver.
###  Workers
- Nodes in the cluster that host executors.
###  Executors 
- Processes on worker nodes that execute tasks assigned by the driver.
- Run on worker nodes in a Spark cluster and host tasks.
- Store intermediate and final results in memory or on disk.
- Interact with the driver for task coordination and data transfer.
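
The division of labor between driver and executors can be sketched with a toy model in plain Python. This is not Spark itself: the thread pool stands in for executors, and the names `split_into_partitions`, `run_task`, and `run_job` are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_partitions(data, num_partitions):
    """Driver side: slice the input into roughly equal partitions."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def run_task(partition):
    """Executor side: one task processes one partition."""
    return sum(x * x for x in partition)

def run_job(data, num_executors=4):
    """Driver side: schedule one task per partition, then combine the results."""
    partitions = split_into_partitions(data, num_executors)
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_results = list(pool.map(run_task, partitions))
    return sum(partial_results)

total = run_job(list(range(10)))
print(total)  # 285
```

The driver never touches individual elements here; it only plans the partitions, hands them out, and merges the partial results, which mirrors the responsibilities listed above.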


# The Spark DAG 
 - Spark jobs are broken down into stages, i.e. groups of tasks that can run in parallel.
 - Computations flow in one direction through the stages.
 - Stages never loop back, which ensures the job terminates.
 - Stages are organized into a dependency graph (a directed acyclic graph, or DAG) that defines the execution flow.
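
The dependency-graph idea can be illustrated with Python's standard `graphlib` module. This is a sketch of the concept only, not Spark's scheduler; the stage names and the helper `execution_waves` are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical stages: each key depends on the stages in its value set.
stage_dependencies = {
    "read_orders": set(),
    "read_customers": set(),
    "join": {"read_orders", "read_customers"},
    "aggregate": {"join"},
}

def execution_waves(dependencies):
    """Group stages into waves; every stage in a wave has all its dependencies met."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph loops back on itself
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(stage_dependencies)
print(waves)  # [['read_customers', 'read_orders'], ['join'], ['aggregate']]

# A graph with a loop cannot be ordered, so it is rejected up front.
try:
    execution_waves({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle detected")
```

Because the graph is acyclic, every stage is eventually reached and the ordering always finishes, which is the termination guarantee described above.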

# The Spark UI
Visualizing Spark applications

Spark provides web user interfaces for monitoring and management, including:

###  Application UI 
  - One per application (SparkSession).
  - Tracks application progress and task execution.
  - Provides DAG visualization and stage details.
  - Shows resource usage and performance metrics.

### Master UI 
  - One per cluster.
  - Shows worker node status and health, and cluster-wide resource allocation.
  - Shows all running applications and available resources.

# Spark Clusters in Databricks

- **All-purpose clusters** - Interactive clusters that support notebooks, jobs, and dashboards, with auto-termination.
- **Job clusters** - Clusters that start when a job runs and terminate automatically upon completion; optimized for non-interactive workloads.
- **SQL warehouses** - Clusters optimized for SQL query performance, with instant startup and auto-scaling to balance cost and performance.

In [0]:
%python
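import math
def primecheck(myNum):
    """Return True if myNum is prime, using trial division by odd numbers."""
    if myNum == 2:
        return True
    # 0, 1, negative numbers, and all other even numbers are not prime.
    elif myNum < 2 or myNum % 2 == 0:
        return False
    # Only odd divisors up to the integer square root need to be checked.
    x = math.isqrt(myNum)
    if x % 2 == 0:
        strtno = x + 1
    else:
        strtno = x
    # Walk down through the odd candidates: strtno, strtno - 2, ..., 3.
    for i in range(strtno, 1, -2):
        if myNum % i == 0:
            return False
    return True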

In [0]:
%python
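from pyspark.sql.functions import *
from pyspark.sql.types import BooleanType
import time 

# call_udf looks up a UDF by its registered name, so register primecheck first.
spark.udf.register("primecheck", primecheck, BooleanType())

df = spark.range(0,100)
df = df.withColumn("isPrime", call_udf("primecheck", col("id")))
# isPrime is already boolean, so it can be used directly as the filter condition.
df = df.filter(col("isPrime"))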

In [0]:
display(df)
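
The filtered DataFrame should contain exactly the primes below 100. As a rough cross-check that needs no cluster, the expected ids can be computed in plain Python. The helper `is_prime` below is a hypothetical reference implementation written for this check, not the notebook's `primecheck`.

```python
import math

def is_prime(n):
    """Reference check: trial division by odd numbers up to the integer square root."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    return all(n % d for d in range(3, math.isqrt(n) + 1, 2))

expected_ids = [n for n in range(100) if is_prime(n)]
print(len(expected_ids))  # 25
print(expected_ids[:5])   # [2, 3, 5, 7, 11]
```

Comparing `expected_ids` with the `id` column of the filtered DataFrame is one simple way to confirm that the UDF behaves as intended.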