**Problem:**  
You have a DataFrame `df` with the following data:

| user_id | email                  |
|---------|------------------------|
| 1       | john.doe@example.com   |
| 2       | alice.smith@sample.org |
| 3       | john.doe@example.com   |
| 4       | bob_jones@mail.net     |

**Task:** Use a regular expression to extract the domain names from the email addresses, then find the distinct domain names in the DataFrame.


In [None]:
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2, 3, 4],
    'email': ['john.doe@example.com', 'alice.smith@sample.org', 'john.doe@example.com', 'bob_jones@mail.net']
})

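# Capture everything after the '@' (e.g. 'example.com'), not just the first label.
# expand=False returns a Series rather than a one-column DataFrame.
df['domain'] = df['email'].str.extract(r'@([\w.-]+)$', expand=False)

# unique() drops duplicates and keeps values in first-seen order.
distinct_domains = df['domain'].unique()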

print(distinct_domains)
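
How much of the domain is kept depends on where the capture group stops. The following is a minimal standard-library sketch (no pandas) that compares a pattern stopping at the first dot after `@` with one capturing everything after `@`. The names `FIRST_LABEL`, `FULL_DOMAIN`, `extract_all`, and `distinct` are hypothetical helpers introduced only for this illustration.

```python
import re

emails = [
    "john.doe@example.com",
    "alice.smith@sample.org",
    "john.doe@example.com",
    "bob_jones@mail.net",
]

# Stops at the first dot after '@', so only the first label is captured.
FIRST_LABEL = re.compile(r"@([^.]+)\.")
# Captures everything after '@' up to the end of the string.
FULL_DOMAIN = re.compile(r"@([\w.-]+)$")


def extract_all(pattern, values):
    """Return the first capture group for each value, or None if no match."""
    out = []
    for value in values:
        match = pattern.search(value)
        out.append(match.group(1) if match else None)
    return out


def distinct(values):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(values))


first_labels = distinct(extract_all(FIRST_LABEL, emails))
full_domains = distinct(extract_all(FULL_DOMAIN, emails))

print(first_labels)  # ['example', 'sample', 'mail']
print(full_domains)  # ['example.com', 'sample.org', 'mail.net']
```

Which pattern is appropriate depends on whether "domain" is meant to include the top-level part such as `.com`; the second pattern assumes it is.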


**Problem:**  
You have a PySpark DataFrame `df` with the following columns:

| transaction_id | customer_id | amount |
|----------------|-------------|--------|
| 1              | 101         | 500    |
| 2              | 102         | 300    |
| 3              | 101         | 700    |
| 4              | 103         | 200    |

**Task:** 
1. Add a new transaction for `customer_id` 104 with an amount of 400.
2. Update the amount for `customer_id` 102 to 350.
3. Calculate the total amount spent by each customer.


In [None]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import when, sum as spark_sum

spark = SparkSession.builder.appName("example").getOrCreate()

data = [
    (1, 101, 500),
    (2, 102, 300),
    (3, 101, 700),
    (4, 103, 200)
]
columns = ['transaction_id', 'customer_id', 'amount']
df = spark.createDataFrame(data, columns)

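# Step 1: append the new transaction (transaction_id 5 is the next unused ID).
# union() matches columns by position, so new_df reuses the same column list.
new_data = [(5, 104, 400)]
new_df = spark.createDataFrame(new_data, columns)
df = df.union(new_df)

# Step 2: set the amount to 350 for every row of customer 102;
# all other rows keep their existing amount.
df = df.withColumn(
    'amount',
    when(df.customer_id == 102, 350).otherwise(df.amount)
)

# Step 3: total amount spent per customer.
result = df.groupBy('customer_id').agg(spark_sum('amount').alias('total_amount'))
result.show()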
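
To check the totals without a Spark session, the same three steps can be traced in plain Python. This is only a sketch for verifying the arithmetic, not a substitute for the PySpark code; `rows`, `totals`, and `expected_totals` are names introduced here for illustration, and `transaction_id` 5 for the new row is an assumption carried over from the example above.

```python
from collections import defaultdict

# Same rows as the Spark example: (transaction_id, customer_id, amount).
rows = [
    (1, 101, 500),
    (2, 102, 300),
    (3, 101, 700),
    (4, 103, 200),
]

# Step 1: append the new transaction (transaction_id 5 is assumed).
rows = rows + [(5, 104, 400)]

# Step 2: replace the amount for customer 102, leave other rows unchanged.
rows = [(t, c, 350 if c == 102 else a) for (t, c, a) in rows]

# Step 3: sum the amounts per customer.
totals = defaultdict(int)
for _, customer_id, amount in rows:
    totals[customer_id] += amount

expected_totals = dict(totals)
print(expected_totals)  # {101: 1200, 102: 350, 103: 200, 104: 400}
```

The Spark result should contain the same four totals, although `show()` does not guarantee any particular row order after a `groupBy`.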
