### Import statements

In [None]:
from pyspark import SparkConf, SparkContext

### Spark Configuration and Context

In [None]:
conf = SparkConf().setMaster("local").setAppName("MinTemperatures")
sc = SparkContext(conf=conf)

### Function Definition

In [None]:
def parseLine(line):
    fields = line.split(',')
    stationID = fields[0]
    entryType = fields[2]
    temperature = float(fields[3]) * 0.1 * (9.0 / 5.0) + 32.0
    return (stationID, entryType, temperature)

* Input: `line` is a string representing a line from the input CSV file.
* Split the Line: `fields = line.split(',')` splits the line into a list of strings based on the comma delimiter.
* Extract Fields:
  * `stationID = fields[0]`: Extracts the station ID.
  * `entryType = fields[2]`: Extracts the entry type (for example, `TMIN` or `TMAX`).
  * `temperature = float(fields[3]) * 0.1 * (9.0 / 5.0) + 32.0`: Converts the temperature from tenths of a degree Celsius to degrees Fahrenheit.
* Return: the tuple `(stationID, entryType, temperature)`.
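As a quick check of the conversion, the same parsing logic can be run on a single line without Spark. This is a minimal sketch: `parse_line` is a local copy of the function above, and the sample line is taken from the example data shown later in this notebook.

```python
def parse_line(line):
    # Same logic as parseLine above, reproduced so this cell runs on its own.
    fields = line.split(',')
    station_id = fields[0]
    entry_type = fields[2]
    # Tenths of a degree Celsius -> degrees Celsius -> degrees Fahrenheit.
    temperature = float(fields[3]) * 0.1 * (9.0 / 5.0) + 32.0
    return (station_id, entry_type, temperature)

sample = "ITE00100554,18000101,TMIN,-148"
station, entry, temp_f = parse_line(sample)
print(station, entry, round(temp_f, 2))  # ITE00100554 TMIN 5.36
```

Here -148 tenths of a degree is -14.8 °C, which is about 5.36 °F.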

### Parsing Lines

In [None]:
lines = sc.textFile("./1800.csv")
parsedLines = lines.map(parseLine)

* Applies the `parseLine` function to each element (line) of the `lines` RDD.

Example:
* Before map: `["ITE00100554,18000101,TMIN,-148", "ITE00100554,18000101,TMAX,-75", "EZE00100082,18000101,TMIN,-135"]`

* After map: `[("ITE00100554", "TMIN", 5.36), ("ITE00100554", "TMAX", 18.5), ("EZE00100082", "TMIN", 7.7)]`


### Filter for Min or Max Temperatures

In [None]:
fil = "TMIN"  # entry type to keep: "TMIN" (daily minimum) or "TMAX" (daily maximum)
# Keep only the records whose entry type matches exactly.
minTemps = parsedLines.filter(lambda x: x[1] == fil)

### Mapping to (StationID, Temperature)

In [None]:
stationTemps = minTemps.map(lambda x: (x[0], x[2]))

* `minTemps.map(lambda x: (x[0], x[2]))`: Transforms each element of the `minTemps` RDD into a tuple containing only the station ID and the temperature. This results in a new RDD called `stationTemps`.

Example:
* Before map: `[("ITE00100554", "TMIN", 5.36), ("EZE00100082", "TMIN", 7.7)]`

* After map: `[("ITE00100554", 5.36), ("EZE00100082", 7.7)]`
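The filter and map steps can be mimicked on a plain Python list to see what each one does. This is only an illustrative sketch using the example records from this notebook; list comprehensions stand in for the RDD `filter` and `map` calls, and the entry type is compared with `==`.

```python
# Example parsed records (station ID, entry type, temperature in Fahrenheit).
parsed = [
    ("ITE00100554", "TMIN", 5.36),
    ("ITE00100554", "TMAX", 18.5),
    ("EZE00100082", "TMIN", 7.7),
]

fil = "TMIN"

# Filter step: keep only records of the chosen entry type.
min_temps = [rec for rec in parsed if rec[1] == fil]

# Map step: drop the entry type, keeping (station ID, temperature).
station_temps = [(rec[0], rec[2]) for rec in min_temps]

print(station_temps)  # [('ITE00100554', 5.36), ('EZE00100082', 7.7)]
```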

### Reducing to Minimum Temperature

In [None]:
minTemps = stationTemps.reduceByKey(lambda x, y: min(x, y))

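* `stationTemps.reduceByKey(lambda x, y: min(x, y))`: Combines the values of the `stationTemps` RDD that share a key (the station ID), keeping the smaller of each pair, so each station ends up with its minimum temperature. The result is assigned to `minTemps`, which replaces the filtered RDD of the same name.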
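To make the per-key reduction concrete, here is a rough pure-Python imitation of it. The helper `reduce_by_key` is hypothetical and is not part of PySpark; it only reproduces the idea on a small list. The third record (4.1) is a made-up value added so that one station has two readings to compare.

```python
def reduce_by_key(pairs, func):
    # Hypothetical helper: fold the values of each key together with func.
    result = {}
    for key, value in pairs:
        if key in result:
            result[key] = func(result[key], value)
        else:
            result[key] = value
    # Sorted only to make the output order predictable in this sketch.
    return sorted(result.items())

station_temps = [
    ("ITE00100554", 5.36),
    ("EZE00100082", 7.7),
    ("ITE00100554", 4.1),  # made-up second reading for illustration
]

min_temps = reduce_by_key(station_temps, lambda x, y: min(x, y))
print(min_temps)  # [('EZE00100082', 7.7), ('ITE00100554', 4.1)]
```

Unlike this sketch, Spark does not guarantee any particular ordering of the keys in the result.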

In [None]:
results = minTemps.collect()
for result in results:
    print(result[0] + "\t{:.2f}F".format(result[1]))