# Uploading a sample e-commerce CSV file
### This file, "ecommerce_transactions.csv", is taken from the Kaggle datasets and was uploaded using Data Ingestion

#### Read the data into a DataFrame using the notebook's magic command "%python"

In [0]:
%python
events = spark.table("default.ecommerce_transactions")

In [0]:
%python
events.printSchema()

root
 |-- Transaction_ID: long (nullable = true)
 |-- User_Name: string (nullable = true)
 |-- Age: long (nullable = true)
 |-- Country: string (nullable = true)
 |-- Product_Category: string (nullable = true)
 |-- Purchase_Amount: double (nullable = true)
 |-- Payment_Method: string (nullable = true)
 |-- Transaction_Date: date (nullable = true)



#### Show the dataset in DataFrame table format

In [0]:
%python
display(events)

Transaction_ID,User_Name,Age,Country,Product_Category,Purchase_Amount,Payment_Method,Transaction_Date
1,Ava Hall,63,Mexico,Clothing,780.69,Debit Card,2023-04-14
2,Sophia Hall,59,India,Beauty,738.56,PayPal,2023-07-30
3,Elijah Thompson,26,France,Books,178.34,Credit Card,2023-09-17
4,Elijah White,43,Mexico,Sports,401.09,UPI,2023-06-21
5,Ava Harris,48,Germany,Beauty,594.83,Net Banking,2024-10-29
6,Elijah Harris,51,India,Toys,966.5,Cash on Delivery,2025-01-18
7,Oliver Clark,27,Germany,Home & Kitchen,341.73,Credit Card,2024-03-13
8,Olivia Allen,46,Canada,Home & Kitchen,11.33,Debit Card,2024-01-04
9,Liam Harris,54,France,Beauty,279.43,Cash on Delivery,2023-12-06
10,Liam Allen,60,Canada,Beauty,223.9,Cash on Delivery,2023-08-07


#### Performing basic operations: select, filter, groupBy, orderBy

Selecting specific columns

In [0]:
%python
events.select("Product_Category", "Payment_Method", "Purchase_Amount").show(10)

+----------------+----------------+---------------+
|Product_Category|  Payment_Method|Purchase_Amount|
+----------------+----------------+---------------+
|        Clothing|      Debit Card|         780.69|
|          Beauty|          PayPal|         738.56|
|           Books|     Credit Card|         178.34|
|          Sports|             UPI|         401.09|
|          Beauty|     Net Banking|         594.83|
|            Toys|Cash on Delivery|          966.5|
|  Home & Kitchen|     Credit Card|         341.73|
|  Home & Kitchen|      Debit Card|          11.33|
|          Beauty|Cash on Delivery|         279.43|
|          Beauty|Cash on Delivery|          223.9|
+----------------+----------------+---------------+
only showing top 10 rows



Filter by country

In [0]:
%python
events.filter("Country = 'India'").show(10)

+--------------+-----------------+---+-------+----------------+---------------+----------------+----------------+
|Transaction_ID|        User_Name|Age|Country|Product_Category|Purchase_Amount|  Payment_Method|Transaction_Date|
+--------------+-----------------+---+-------+----------------+---------------+----------------+----------------+
|             2|      Sophia Hall| 59|  India|          Beauty|         738.56|          PayPal|      2023-07-30|
|             6|    Elijah Harris| 51|  India|            Toys|          966.5|Cash on Delivery|      2025-01-18|
|            20|   James Thompson| 43|  India|            Toys|         849.67|Cash on Delivery|      2023-04-07|
|            39|    Elijah Harris| 50|  India|          Sports|         452.37|          PayPal|      2024-12-14|
|            51|      James Clark| 32|  India|         Grocery|          242.3|Cash on Delivery|      2023-06-01|
|            53|      James Lewis| 26|  India|        Clothing|         331.66|         

GROUP BY - Transactions per Payment Method

In [0]:
%python
events.groupBy("Payment_Method").count().show()

+----------------+-----+
|  Payment_Method|count|
+----------------+-----+
|      Debit Card| 8355|
|          PayPal| 8250|
|     Credit Card| 8310|
|             UPI| 8477|
|     Net Banking| 8174|
|Cash on Delivery| 8434|
+----------------+-----+



ORDER BY - Highest-spending countries

In [0]:
%python
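# Total purchase amount per country, largest first.
country_spend = (
    events.groupBy("Country")
    .sum("Purchase_Amount")
    # "sum(Purchase_Amount)" is the column name Spark generates for the aggregate.
    .orderBy("sum(Purchase_Amount)", ascending=False)
)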

country_spend.show()

+---------+--------------------+
|  Country|sum(Purchase_Amount)|
+---------+--------------------+
|   France|   2545739.189999998|
|   Canada|  2544335.1200000006|
|      USA|   2541220.219999997|
|   Mexico|  2534475.6700000037|
|Australia|   2514911.650000008|
|   Brazil|  2507287.5400000005|
|    India|   2503542.709999989|
|  Germany|   2502442.199999994|
|    Japan|  2492312.1999999983|
|       UK|   2471723.150000001|
+---------+--------------------+

