## Travel Startup
After graduating from USF, you found a startup that aims to provide personalized travel itineraries using big data analysis. Given your own personal preferences, build a plan for a year of travel across 5 locations. Or, in other words: pick 5 regions. What is the best time of year to visit them based on the dataset?

Part of this involves determining the comfort index for a region. You could incorporate several features: not too hot, not too cold, dry, humid, windy, etc. There are several different ways of calculating this available online, and you could also analyze how well your own metrics do.

Another part of this involves presentation. You have to convince your potential customers that your travel itinerary is better than something they could come up with themselves with a little Googling. You can use pictures, information about local points of interest, etc.

According to [Britannica's report](https://www.britannica.com/science/temperature-humidity-index), the formula of calculating The discomprt index is 15 + 0.4(dry-bulb temperature + web-bulb temperature(F)). However, in the NAM dataset wo do not have the air temperature(equals to dry-bulb temperature), we took the surface temperature as the dry-bulb(well, they are similar enough). Most people are quite comfortable when the index is below 70 and very uncomfortable when the index is above 80 to 85.  

There are a lot of approaches to cacluate the wet-bulb temperature. The [How to calcylate the wet bulb temperature](https://www.omnicalculator.com/physics/wet-bulb#how-to-calculate-the-wet-bulb-temperature) gives us a way, by dry-bulb(Td) and relative humidity(rh), to simply obtain the wet-bulb-temperature(Tw).  
Tw = Td * arctan[0.151977 * (rh + 8.313659)^(1/2)] + arctan(Td + rh) - arctan(rh - 1.676331) + 0.00391838 *(rh)^(3/2) * arctan(0.023101 * rh) - 4.686035   

Within the wet/dry-bulb temperature, the discomprt index indicates 


In [1]:
import math
import geohash

def wetBuldCalculator(temp_celsius, rh):
    return temp_celsius \
    * math.atan(0.151977 * (rh + 8.313659)**(1/2)) \
    + math.atan(temp_celsius + rh) \
    - math.atan(rh - 1.676331) \
    + 0.00391838 * (rh)**(3/2) \
    * math.atan(0.023101 * rh) \
    - 4.686035

def confortIndex(dry_buld, rh):
    dry_buld_c = dry_buld - 273.15
    wet_buld_c = wetBuldCalculator(dry_buld_c, rh)
    ci = 15 + 0.4 * ((dry_buld_c + wet_buld_c) * (9.0 / 5.0) + 32)
    return ci



In [2]:
#geohash.encode(42.392469, -71.215895)

In [3]:
import datetime

def parseLine(line):
    variables = line.split("\t")
    try:
        milliseconds = int(variables[0])
        dt = datetime.datetime.fromtimestamp(milliseconds/1000.0)
        lat = float(variables[1])
        lon = float(variables[2])
        rh = float(variables[8])
        temperatureK = float(variables[10])
        ci = confortIndex(temperatureK, rh)
        gh = geohash.encode(lat, lon)
        return (gh, dt.month, ci)
    except:
        return ('', 0, 0)
    
text_file = sc.textFile("hdfs://orion11:12001/pj3/3hr_sample/sampled_2015/")


def month_ci_location(gh):
    parsed_data = text_file \
    .map(lambda line: parseLine(line)) \
    .filter(lambda data: data[0].startswith(gh)) 
    month_ci_goruped = parsed_data.map(lambda data: (data[1], data[2])).groupByKey()
    result = []
    for element in month_ci_goruped.collect():
        count = 0
        ci_sum = 0
        for ci in element[1]:
            count += 1
            ci_sum += ci
        result.append((element[0], ci_sum/count))
    return result


In [4]:
boston = month_ci_location("9q8y")

for data in boston:
    print(data)


(1, 41.94664656612848)
(2, 43.56754931646432)
(3, 41.400634301655266)
(4, 41.125913633577795)
(5, 40.30798040814995)
(6, 43.03409235851835)
(7, 48.04026539833832)
(8, 47.453832664177305)
(9, 47.36085540383962)
(10, 47.544040385811535)
(11, 43.14175256378289)
(12, 43.385680187793625)
