### Setup and initialization

In [None]:
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("Spendbycustomer")
sc = SparkContext(conf= conf)

##### 1. Importing the libraries:
* `SparkConf` and `SparkContext` from pyspark for Spark configuration and context creation.

##### 2. Configuring Spark
* `SparkConf().setMaster("local")`: Configures Spark to run locally with a single thread.
* `setAppName("Spendbycustomer")`: Names the Spark application.

##### 3. Create SparkContext
* `sc = SparkContext(conf= conf)`: Initializes the Spark context with the specified configuration.

### Function to extract Customer-Price pairs

In [None]:
def extractCustomerPricePairs(line):
    fields = line.split(',')
    return (int(fields[0]), float(fields[2]))

##### 4. Define `extractCustomerPricePairs` Function:
* This function takes a line of text as input, splits it on commas, and returns a tuple of the customer ID (first field, as an integer) and the price (third field, as a float).
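
To make the parsing step concrete, here is a minimal sketch that calls the same function on a single line, without Spark. The sample line and its column layout (customer ID, item ID, amount) are assumptions for illustration; the real file may differ.

```python
def extractCustomerPricePairs(line):
    # Split the CSV line and keep the first field (customer ID)
    # and the third field (price)
    fields = line.split(',')
    return (int(fields[0]), float(fields[2]))

# Hypothetical sample line: customer ID, item ID, amount spent
sample_line = "44,8602,37.19"
pair = extractCustomerPricePairs(sample_line)
print(pair)  # (44, 37.19)
```

Note that a header row or a malformed line would raise a `ValueError` or `IndexError` here, so this sketch assumes every line is well formed.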

### Reading Input Data and Transformations

In [None]:
input = sc.textFile("./source/customer-orders.csv")
mappedInput = input.map(extractCustomerPricePairs)
totalByCustomer = mappedInput.reduceByKey(lambda x, y: x + y)

##### 5. Loading data
* `sc.textFile("..csv")`: Reads the CSV file from the local file system into an RDD called `input`. Each element of the RDD is one line of the file.

##### 6. Mapping input
* `input.map(extractCustomerPricePairs)`: Applies `extractCustomerPricePairs` to each line of the input RDD, resulting in an RDD of (customerID, price) pairs.

##### 7. Reducing by key
* `mappedInput.reduceByKey(lambda x, y: x + y)`: Sums up the prices for each customer ID. This results in an RDD of (customerID, totalSpent) pairs.
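
The effect of `reduceByKey` can be sketched in plain Python, without Spark. This is only an illustration of the result it computes, not of how Spark executes it. The helper `reduce_by_key` and the sample pairs are hypothetical.

```python
# Hypothetical (customerID, price) pairs, as the mapping step would produce
pairs = [(44, 37.5), (35, 10.0), (44, 2.5), (35, 5.25)]

def reduce_by_key(pairs, func):
    """Combine the values that share a key by applying func pairwise."""
    totals = {}
    for key, value in pairs:
        if key in totals:
            totals[key] = func(totals[key], value)
        else:
            totals[key] = value
    return totals

total_by_customer = reduce_by_key(pairs, lambda x, y: x + y)
print(total_by_customer)  # {44: 40.0, 35: 15.25}
```

In Spark the combining happens within each partition first and the partial results are merged afterwards, which is why the function passed to `reduceByKey` should be associative and commutative, as addition is.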

### Flipping and Sorting 

In [None]:
flipped = totalByCustomer.map(lambda x: (x[1], x[0]))
totalByCustomerSorted = flipped.sortByKey()

##### 8. Flipping Key-Value Pairs:

* `totalByCustomer.map(lambda x: (x[1], x[0]))`: Switches the positions of the key and value in each pair. This results in an RDD of (totalSpent, customerID) pairs.

##### 9. Sorting by Key:

* `flipped.sortByKey()`: Sorts the RDD by the total spent (key) in ascending order.
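
The flip-then-sort pattern can be sketched in plain Python to show what the two steps produce. The sample totals below are made up for illustration.

```python
# Hypothetical (customerID, totalSpent) pairs
totals = [(44, 40.0), (35, 15.25), (2, 99.5)]

# Swap each pair so the total becomes the key
flipped = [(total, customer) for customer, total in totals]

# Sort by the key (total spent), smallest first
sorted_ascending = sorted(flipped, key=lambda pair: pair[0])
print(sorted_ascending)  # [(15.25, 35), (40.0, 44), (99.5, 2)]

# Largest spenders first
sorted_descending = sorted(flipped, key=lambda pair: pair[0], reverse=True)
print(sorted_descending)  # [(99.5, 2), (40.0, 44), (15.25, 35)]
```

On an RDD, the descending order should be available through `sortByKey(ascending=False)`.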

### Collecting and printing results

In [None]:
results = totalByCustomerSorted.collect()
for result in results:
    print(result)