# Spark and Minimum/Maximum Temperatures
***
<a href='https://github.com/pick1'> <img src='sparkjupyter.png' /></a>
***

## This project explored Spark's .filter methods to filter minimum and maximum temperatures from a historic weather database.

## Importing Spark

In [1]:
from pyspark import SparkConf, SparkContext
import collections

## Setting up Spark

In [2]:
conf = SparkConf().setMaster("local").setAppName("RatingsHistogram")
sc = SparkContext(conf = conf)

## Loading the Dataset

In [3]:
lines = sc.textFile("1800.csv")

## Defining function for slicing and calculating temp.
**Converting to Fahrenheit from 10ths degrees Celsius.**

In [4]:
def lineParse(line):
    fields = line.split(',')
    stationID = fields[0]
    entryType = fields[2]
    temp = float(fields[3]) * 0.1 *(9.0 / 5.0) + 32.0
    return (stationID, entryType, temp)

In [5]:
parse = lines.map(lineParse)

## Filtering for MinTemps

In [6]:
minTemps = parse.filter(lambda x: 'TMIN' in x[1])

## Removing the TMIN feature; Keeping only StationId and Temp

In [7]:
stationTempsMin = minTemps.map(lambda x: (x[0], x[2]))

## Getting the lowest temperate with .reduceByKey

In [9]:
minTemps = stationTempsMin.reduceByKey(lambda x, y: min(x,y))

## Filtering for MaxTemps

In [10]:
maxTemps = parse.filter(lambda x: 'TMAX' in x[1])

## Removing the TMAX feature; Keeping only StationId and Temp

In [11]:
stationTempsMax = maxTemps.map(lambda x: (x[0], x[2]))

## Getting the highest temperate with .reduceByKey

In [12]:
maxTemps = stationTempsMax.reduceByKey(lambda x, y: min(x,y))

## Spark is Go!

In [14]:
print('Minimum Temperatures were: ', '\n')
resultMin = minTemps.collect();
for result in resultMin:
    print(result[0] + "\t{:.2f}F".format(result[1]))
print('\n'*2)    
print('Maximun Temperatures were: ', '\n')
resultMax = maxTemps.collect();
for result in resultMax:
    print(result[0] + "\t{:.2f}F".format(result[1]))

Minimum Temperatures were:  

ITE00100554	5.36F
EZE00100082	7.70F



Maximun Temperatures were:  

ITE00100554	18.50F
EZE00100082	16.52F


## Conclusion:
**ITE00100554 is the ID for a weather station in Milan and EZE00100082 is for Prague. This project used the .filter method to find both the minimum and maximum temperatures recorded at each weather station.**