# Get 10 day Weather forecast 
- Jim Haskin

- GA-Data Science
- Dec 2015

- 2/17/2016

## Method
The 10-day forecast is run through the predictive model to get the projected crime level for the next 10 days.
- Use the Weather Underground API to get their forecast.
- Use the OpenWeatherMap API and the pyowm wrapper to get their prediction for fields that are not available from the Weather Underground API.
- Calculate the additional fields that were also calculated for the training data.
- Write the result to a file to be run with the model.

## Working notes
- TEMP : added both status fields. Need to see what works best with the historical 'events'
- additional features still to add (See Below)

## Sections

- [Weather Underground json](#Weather-Underground-json)
- [OWM-OpenWeatherMap](#OWM-OpenWeatherMap)
- [Combine both forecasts](#Combine-both-forecasts)
- [Add in Features](#Add-in-Features)
- [Write forecast data to file](#Write-forecast-data-to-file)

In [1]:
import pandas as pd
import numpy as np
import seaborn as sb
%matplotlib inline
import pyowm
from datetime import timedelta
from sf_sun_moon import sf_sun_moon
from pywws import conversions as cv

In [2]:
import requests
# use sleep to stop us from getting timed out
from time import sleep
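The `sleep` import above is meant to keep requests from timing out, but no later cell calls it. One way it could be used is a small retry wrapper. This is a minimal sketch: `call_with_retries` and its parameters are hypothetical names, not part of the notebook or of `requests`.

```python
from time import sleep


def call_with_retries(fetch, attempts=3, delay_seconds=1.0):
    """Call fetch() until it succeeds or the attempts run out.

    `fetch` is any zero-argument callable that raises on failure
    (hypothetical helper, not part of the notebook).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            # Wait a little longer after each failed attempt.
            sleep(delay_seconds * attempt)
```

In the notebook this could wrap the API call, for example `call_with_retries(lambda: requests.get(url, timeout=10))`, where `url` is the forecast URL.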

# Weather Underground json

[[back to top](#Sections)]

### Use json and requests to pull in a 10 day forecast from Weather Underground

In [3]:
api_key = '3bfa0f909a078622'
#api_key = 3bfa0f909a078622

In [4]:
# use requests library to interact with a URL; reuse api_key instead of repeating the key
url = 'http://api.wunderground.com/api/{}/forecast10day/q/CA/San_Francisco.json'.format(api_key)
r = requests.get(url)

In [5]:
# check the status: 200 means success, 4xx means error
r.status_code

200

In [6]:
data = r.json()['forecast']['simpleforecast']['forecastday']

In [7]:
#data[0]

## Create a data frame from the 10 day forecasts
Fields
- date
- day of week
- day
- month
- year
- max temp
- mean temperature
- min temp
- temperature change
- mean humidity
- max wind speed
- mean wind speed
- precipitation
- conditions/events
- mean wind direction
- Sun minutes
- Sun units
- Moon phase


In [8]:
rows = []
for day in data:
    ddd = []
    dayr = day['date']['day']
    monthr = day['date']['month']
    yearr = day['date']['year']
    ddd.append(str(yearr)+'-'+str(monthr)+'-'+str(dayr))
    ddd.append(day['date']['weekday'].lower())    
    ddd.append(dayr)    
    ddd.append(monthr)    
    ddd.append(yearr)
    temp_h = int(day['high']['fahrenheit'])
    temp_l = int(day['low']['fahrenheit'])
    temp_d = temp_h - temp_l
    temp_m = (temp_h + temp_l) // 2  # integer mean, same result under Python 2 and 3
    
    ddd.append(temp_h)
    ddd.append(temp_m)
    ddd.append(temp_l)
    ddd.append(temp_d)
# ALWAYS 0     ddd.append(day['maxhumidity'])
    ddd.append(day['avehumidity'])
# ALWAYS 0    ddd.append(day['minhumidity'])
    ddd.append(day['maxwind']['mph'])
    ddd.append(day['avewind']['mph'])
    ddd.append(day['qpf_allday']['in'])
    ddd.append(day['conditions'])
    ddd.append(day['avewind']['degrees'])
    rt = pd.Series(ddd)
    rows.append(rt)
colnames = ['date', 'dayofweek', 'day', 'month', 'year', 
            'temp_max', 'temp_mean', 'temp_min', 'temp_delta',
            'humidity_mean', 
            'wind_speed_max', 'wind_speed_mean',
            'precipitation', 'events', 'wind_direction']
   
wu_forecast = pd.DataFrame(rows)
wu_forecast.columns = colnames

# additional fields that can be used    
#    ddd.append(day['icon'])
#    ddd.append(day['avewind']['dir'])
#    ddd.append(day['maxwind']['degrees'])
#    ddd.append(day['maxwind']['dir'])
#    ddd.append(day['qpf_day']['in'])
#    ddd.append(day['qpf_night']['in'])
#    ddd.append(day['skyicon'])
#    ddd.append(day['period'])
#    ddd.append(day['pop'])
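The loop above builds each row as a positional list, so the column order has to match `colnames` by hand. A possible alternative is to flatten each record into a dict keyed by column name. The sketch below assumes the record shape implied by the field paths used in the loop; `flatten_forecast_day` and `sample_day` are hypothetical, and the sample values are copied from the first row of the output rather than from a live API response. It also zero-pads the date, which differs from the `2016-3-7` style the notebook produces.

```python
def flatten_forecast_day(day):
    """Turn one 'forecastday' record into a flat dict of model fields."""
    date = day['date']
    temp_max = int(day['high']['fahrenheit'])
    temp_min = int(day['low']['fahrenheit'])
    return {
        # Zero-padded ISO date (assumption: the notebook keeps '2016-3-7').
        'date': '{:04d}-{:02d}-{:02d}'.format(
            int(date['year']), int(date['month']), int(date['day'])),
        'dayofweek': date['weekday'].lower(),
        'temp_max': temp_max,
        'temp_mean': (temp_max + temp_min) // 2,
        'temp_min': temp_min,
        'temp_delta': temp_max - temp_min,
        'humidity_mean': day['avehumidity'],
        'wind_speed_max': day['maxwind']['mph'],
        'wind_speed_mean': day['avewind']['mph'],
        'precipitation': day['qpf_allday']['in'],
        'events': day['conditions'],
        'wind_direction': day['avewind']['degrees'],
    }


# Hand-built sample record, not real API output.
sample_day = {
    'date': {'year': 2016, 'month': 3, 'day': 7, 'weekday': 'Monday'},
    'high': {'fahrenheit': '58'},
    'low': {'fahrenheit': '48'},
    'avehumidity': 66,
    'maxwind': {'mph': 25},
    'avewind': {'mph': 18, 'degrees': 318},
    'qpf_allday': {'in': 0.01},
    'conditions': 'Chance of a Thunderstorm',
}
row = flatten_forecast_day(sample_day)
```

A list of such dicts can be passed straight to `pd.DataFrame(...)`, which takes the column names from the keys.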


    

### Clean features

In [9]:
#wu_forecast['temp_max'] = wu_forecast['temp_max'].astype(int)
#wu_forecast['temp_min'] = wu_forecast['temp_min'].astype(int)
#wu_forecast['dayofweek'] = wu_forecast['dayofweek'].str.lower()
wu_forecast.head(10)



,date,dayofweek,day,month,year,temp_max,temp_mean,temp_min,temp_delta,humidity_mean,wind_speed_max,wind_speed_mean,precipitation,events,wind_direction
0,2016-3-7,monday,7,3,2016,58,53,48,10,66,25,18,0.01,Chance of a Thunderstorm,318
1,2016-3-8,tuesday,8,3,2016,58,55,52,6,59,20,13,0.11,Partly Cloudy,290
2,2016-3-9,wednesday,9,3,2016,64,60,56,8,80,15,10,0.01,Chance of Rain,197
3,2016-3-10,thursday,10,3,2016,63,59,55,8,79,20,14,0.4,Chance of Rain,193
4,2016-3-11,friday,11,3,2016,59,55,51,8,74,20,14,0.31,Rain,213
5,2016-3-12,saturday,12,3,2016,59,56,54,5,67,25,17,0.57,Chance of Rain,190
6,2016-3-13,sunday,13,3,2016,59,56,54,5,75,20,15,0.73,Rain,198
7,2016-3-14,monday,14,3,2016,59,55,51,8,73,20,16,0.29,Rain,234
8,2016-3-15,tuesday,15,3,2016,58,53,49,9,65,25,18,0.0,Clear,290
9,2016-3-16,wednesday,16,3,2016,65,58,51,14,57,20,13,0.0,Clear,317


In [10]:
wu_forecast.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 15 columns):
date               10 non-null object
dayofweek          10 non-null object
day                10 non-null int64
month              10 non-null int64
year               10 non-null int64
temp_max           10 non-null int64
temp_mean          10 non-null int64
temp_min           10 non-null int64
temp_delta         10 non-null int64
humidity_mean      10 non-null int64
wind_speed_max     10 non-null int64
wind_speed_mean    10 non-null int64
precipitation      10 non-null float64
events             10 non-null object
wind_direction     10 non-null int64
dtypes: float64(1), int64(11), object(3)
memory usage: 1.2+ KB


In [11]:
wu_forecast.describe()

,day,month,year,temp_max,temp_mean,temp_min,temp_delta,humidity_mean,wind_speed_max,wind_speed_mean,precipitation,wind_direction
count,10.0,10,10,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0
mean,11.5,3,2016,60.2,56.0,52.1,8.1,69.5,21.0,14.8,0.243,244.0
std,3.02765,0,0,2.699794,2.357023,2.601282,2.643651,7.947746,3.162278,2.529822,0.262427,53.686332
min,7.0,3,2016,58.0,53.0,48.0,5.0,57.0,15.0,10.0,0.0,190.0
25%,9.25,3,2016,58.25,55.0,51.0,6.5,65.25,20.0,13.25,0.01,197.25
50%,11.5,3,2016,59.0,55.5,51.5,8.0,70.0,20.0,14.5,0.2,223.5
75%,13.75,3,2016,62.0,57.5,54.0,8.75,74.75,23.75,16.75,0.3775,290.0
max,16.0,3,2016,65.0,60.0,56.0,14.0,80.0,25.0,18.0,0.73,318.0


# OWM-OpenWeatherMap


[[back to top](#Sections)]

### Use pyowm wrapper to pull in a 10 day forecast from OWM


In [12]:
owm = pyowm.OWM('76ca531bdb080cdf04be66ddc063d249')
city = "SanFrancisco,usa"
fc = owm.daily_forecast(city, limit=10)
f = fc.get_forecast()
forecasts = f.get_weathers()

## Create a data frame from the 10 day forecasts
Fields
- mean pressure (in mb, convert to in.)
- cloud cover (in percent, convert to 1 - 8 scale)
- OWM Status (rain, cloudy, etc)
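The two conversions listed above can be sketched as small helpers. The pressure factor is the `0.02953` that the notebook's OWM cell multiplies by. The cloud cover mapping assumes the intended scale is oktas (eighths of sky covered, 0 to 8); that is an assumption, since the notebook stores the raw percentage and does not show the conversion. Both function names are hypothetical.

```python
HPA_TO_INHG = 0.02953  # millibars (hPa) to inches of mercury


def hpa_to_inhg(pressure_hpa):
    """Convert pressure from hPa/mb to inches of mercury."""
    return pressure_hpa * HPA_TO_INHG


def percent_to_oktas(cloud_percent):
    """Map 0-100 percent cloud cover to whole oktas (0-8).

    Assumes oktas are the target scale; each okta is 12.5 percent.
    """
    clamped = max(0.0, min(100.0, float(cloud_percent)))
    # Round half up to the nearest whole okta.
    return int(clamped / 12.5 + 0.5)
```

With this mapping an 88 percent cover becomes 7 oktas and a clear sky stays at 0.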

In [13]:
rows = []
for day in forecasts:
    ddd = []
    ddd.append(day.get_reference_time('iso'))
    ddd.append(day.get_pressure()['press']*0.02953)
    ddd.append(day.get_clouds())
    ddd.append(day.get_status())
#    ddd.append(day.get_heat_index())
#    ddd.append(day.get_dewpoint())   Always 0
#    ddd.append(day.get_sunrise_time())   Always 0
    rt = pd.Series(ddd)
    rows.append(rt)
colnames = ['date', 'sl_pressure_mean', 'cloud_cover', 'owm_status'] 
owm_forecast = pd.DataFrame(rows)
owm_forecast.columns = colnames
owm_forecast

# other fields available from the api

#    ddd.append(w.get_weather_icon_name())
#    ddd.append(w.get_temperature(unit='fahrenheit')['max'])
#    ddd.append(w.get_temperature(unit='fahrenheit')['min'])
#    ddd.append(w.get_wind()['speed'])
#    ddd.append(day['avewind']['degrees'])
#    ddd.append(day['avewind']['dir'])
#    ddd.append(day['maxwind']['mph'])
#    ddd.append(day['maxwind']['degrees'])
#    ddd.append(day['maxwind']['dir'])
#    ddd.append(day['maxhumidity'])
#    ddd.append(day['minhumidity'])
#    if w.get_rain() != {}:
#        ddd.append(w.get_rain()['all'])
#    else:
#        ddd.append(0.0)
#    ddd.append(day['qpf_day']['in'])
#    ddd.append(day['qpf_night']['in'])
#    ddd.append(w.get_visibility_distance())

#df['high_temp'] = df['high_temp'].astype(int)
#df['low_temp'] = df['low_temp'].astype(int)    

,date,sl_pressure_mean,cloud_cover,owm_status
0,2016-03-06 20:00:00+00,29.988306,88,Rain
1,2016-03-07 20:00:00+00,30.062426,0,Clear
2,2016-03-08 20:00:00+00,30.278585,32,Rain
3,2016-03-09 20:00:00+00,30.30221,92,Rain
4,2016-03-10 20:00:00+00,30.225136,76,Rain
5,2016-03-11 20:00:00+00,29.677945,75,Rain
6,2016-03-12 20:00:00+00,29.979447,19,Rain
7,2016-03-13 20:00:00+00,29.886722,78,Rain
8,2016-03-14 20:00:00+00,30.024037,41,Rain
9,2016-03-15 20:00:00+00,29.936333,37,Rain


## Combine both forecasts

[[back to top](#Sections)]

In [14]:
forecast = wu_forecast.copy()
forecast.insert(9, 'sl_pressure_mean', owm_forecast['sl_pressure_mean'])
forecast.insert(13, 'cloud_cover', owm_forecast['cloud_cover'])
forecast['owm_status'] = owm_forecast['owm_status']
forecast



,date,dayofweek,day,month,year,temp_max,temp_mean,temp_min,temp_delta,sl_pressure_mean,humidity_mean,wind_speed_max,wind_speed_mean,cloud_cover,precipitation,events,wind_direction,owm_status
0,2016-3-7,monday,7,3,2016,58,53,48,10,29.988306,66,25,18,88,0.01,Chance of a Thunderstorm,318,Rain
1,2016-3-8,tuesday,8,3,2016,58,55,52,6,30.062426,59,20,13,0,0.11,Partly Cloudy,290,Clear
2,2016-3-9,wednesday,9,3,2016,64,60,56,8,30.278585,80,15,10,32,0.01,Chance of Rain,197,Rain
3,2016-3-10,thursday,10,3,2016,63,59,55,8,30.30221,79,20,14,92,0.4,Chance of Rain,193,Rain
4,2016-3-11,friday,11,3,2016,59,55,51,8,30.225136,74,20,14,76,0.31,Rain,213,Rain
5,2016-3-12,saturday,12,3,2016,59,56,54,5,29.677945,67,25,17,75,0.57,Chance of Rain,190,Rain
6,2016-3-13,sunday,13,3,2016,59,56,54,5,29.979447,75,20,15,19,0.73,Rain,198,Rain
7,2016-3-14,monday,14,3,2016,59,55,51,8,29.886722,73,20,16,78,0.29,Rain,234,Rain
8,2016-3-15,tuesday,15,3,2016,58,53,49,9,30.024037,65,25,18,41,0.0,Clear,290,Rain
9,2016-3-16,wednesday,16,3,2016,65,58,51,14,29.936333,57,20,13,37,0.0,Clear,317,Rain
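The cell above combines the two forecasts by row position. In the sample output the first OWM timestamp is 2016-03-06 while the first Weather Underground row is 2016-3-7, so positional pairing may line up different days. A possible safeguard is to join on the calendar date instead. The sketch below uses plain Python so the logic is visible; the function names are hypothetical, and it takes the date part of the OWM timestamp as-is (UTC), ignoring time zone conversion.

```python
from datetime import datetime


def wu_date(text):
    """Parse a Weather Underground style date such as '2016-3-7'."""
    return datetime.strptime(text, '%Y-%m-%d').date()


def owm_date(text):
    """Take the calendar date from an OWM timestamp such as
    '2016-03-06 20:00:00+00' (left in UTC for simplicity)."""
    return datetime.strptime(text[:10], '%Y-%m-%d').date()


def join_by_date(wu_rows, owm_rows):
    """Inner-join two lists of dicts on their calendar date."""
    owm_by_date = {owm_date(r['date']): r for r in owm_rows}
    joined = []
    for wu in wu_rows:
        match = owm_by_date.get(wu_date(wu['date']))
        if match is not None:
            merged = dict(wu)
            merged['cloud_cover'] = match['cloud_cover']
            merged['owm_status'] = match['owm_status']
            joined.append(merged)
    return joined


# First two rows of each forecast, copied from the outputs above.
wu_rows = [
    {'date': '2016-3-7', 'temp_max': 58},
    {'date': '2016-3-8', 'temp_max': 58},
]
owm_rows = [
    {'date': '2016-03-06 20:00:00+00', 'cloud_cover': 88, 'owm_status': 'Rain'},
    {'date': '2016-03-07 20:00:00+00', 'cloud_cover': 0, 'owm_status': 'Clear'},
]
joined = join_by_date(wu_rows, owm_rows)
```

Here only March 7 appears in both samples, and it picks up the OWM values for March 7 rather than March 6. The same idea applies to `DataFrame.merge` on a normalized date column.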


In [15]:
forecast.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 18 columns):
date                10 non-null object
dayofweek           10 non-null object
day                 10 non-null int64
month               10 non-null int64
year                10 non-null int64
temp_max            10 non-null int64
temp_mean           10 non-null int64
temp_min            10 non-null int64
temp_delta          10 non-null int64
sl_pressure_mean    10 non-null float64
humidity_mean       10 non-null int64
wind_speed_max      10 non-null int64
wind_speed_mean     10 non-null int64
cloud_cover         10 non-null int64
precipitation       10 non-null float64
events              10 non-null object
wind_direction      10 non-null int64
owm_status          10 non-null object
dtypes: float64(2), int64(12), object(4)
memory usage: 1.5+ KB


## Add in Features

[[back to top](#Sections)]

Still to do

- deltas


### Create end of week from dayofweek
- Friday, Saturday, and Sunday are considered end_of_week; all other days are not.

In [16]:
list_eow = ['friday', 'saturday', 'sunday']
forecast.loc[:,'end_of_week'] = forecast['dayofweek'].apply(lambda x : x in list_eow)
forecast['end_of_week'].value_counts()

False    7
True     3
Name: end_of_week, dtype: int64

In [17]:
def eow(s):
    # Friday, Saturday and Sunday all count as end of week,
    # matching list_eow in the previous cell.
    return s in ('friday', 'saturday', 'sunday')

In [18]:
#forecast.loc[:,'end_of_week'] = forecast['dayofweek'].map(eow)


### Create 'winter' feature: True if the month is Nov, Dec, or Jan

In [19]:
forecast.loc[:,'winter'] = forecast['month'].apply(lambda x : x in [1, 11, 12])

### Create rain from status field
- Is there rain in the forecast? True when the OWM status is 'Rain'.

In [20]:
forecast.loc[:,'rain'] = forecast['owm_status'] == 'Rain'

### Sun time and Moon phase
- These features use the sf_sun_moon routine, which I wrote using the ephem package ('import ephem as ep').

In [21]:
forecast['sun_length'], forecast['moon_phase'] = zip(*forecast['date'].apply(sf_sun_moon))

### Sun Units

In [22]:
forecast['sun_units'] = forecast['sun_length'] * (1.0 - (forecast['cloud_cover']  * .01))
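The sun units formula scales the minutes of daylight by the share of sky that is not covered by cloud. A minimal sketch of the same arithmetic as a standalone function, checked against the first forecast row (698.983333 sun minutes, 88 percent cloud cover); the function name is hypothetical.

```python
def sun_units(sun_minutes, cloud_percent):
    """Daylight minutes discounted by percent cloud cover (0-100)."""
    return sun_minutes * (1.0 - cloud_percent * 0.01)


# First forecast row: 88 percent cloud leaves 12 percent of the daylight.
first_day = sun_units(698.983333, 88)
```

For that row the result is about 83.878, and a cloud cover of 0 leaves the sun length unchanged.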

### Dew points
The following features are calculated using the pywws conversions package.
- Rule-of-thumb formula (degrees Fahrenheit): DewpointTemp = Temp - (9/25) * (100 - Humidity)
- Weather Underground does not return values for humidity max and min

In [23]:
#pywws.conversions.dew_point(temp, hum)[source]
forecast.loc[:,'dew_point_mean'] = forecast.apply( lambda x : cv.dew_point(x['temp_mean'], x['humidity_mean']), axis=1)
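The dew point of about 44.69 reported for the first forecast day (mean temperature 53, humidity 66) is what a Magnus-type formula returns when 53 is read as degrees Celsius, which suggests the helper may expect metric input. That is an inference from the numbers, not something the notebook states, so it is worth checking against the pywws documentation. The sketch below is a standalone version that converts Fahrenheit to Celsius and back, next to the rule-of-thumb formula quoted above. The coefficients 17.27 and 237.7 are commonly quoted Magnus values and are an assumption here; all function names are hypothetical.

```python
import math


def f_to_c(temp_f):
    return (temp_f - 32.0) * 5.0 / 9.0


def c_to_f(temp_c):
    return temp_c * 9.0 / 5.0 + 32.0


def dew_point_c(temp_c, humidity):
    """Magnus-type dew point; temperature in Celsius, humidity in percent."""
    a, b = 17.27, 237.7  # assumed coefficients
    gamma = (a * temp_c) / (b + temp_c) + math.log(humidity / 100.0)
    return (b * gamma) / (a - gamma)


def dew_point_f(temp_f, humidity):
    """Same formula with Fahrenheit in and Fahrenheit out."""
    return c_to_f(dew_point_c(f_to_c(temp_f), humidity))


def dew_point_rule_of_thumb_f(temp_f, humidity):
    """The simple approximation quoted in the notebook text."""
    return temp_f - (9.0 / 25.0) * (100.0 - humidity)
```

For 53 degrees Fahrenheit and 66 percent humidity, `dew_point_f` gives roughly 42 and the rule of thumb gives 40.76, while feeding 53 straight into `dew_point_c` reproduces the 44.69 seen in the output. If the unit assumption holds, the same conversion would apply to the wind chill, heat index, and apparent temperature inputs.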

### Wind Chill
- Temperature and wind speed

In [24]:
#pywws.conversions.wind_chill(temp, wind)
forecast.loc[:,'wind_chill_mean'] = forecast.apply( lambda x : cv.wind_chill(x['temp_mean'], x['wind_speed_mean']), axis=1)
forecast.loc[:,'wind_chill_max'] = forecast.apply( lambda x : cv.wind_chill(x['temp_max'], x['wind_speed_max']), axis=1)

### Heat Index and Apparent Temperature
- Temperature and Humidity

In [25]:
#pywws.conversions.usaheatindex(temp, humidity, dew)
forecast.loc[:,'heat_index'] = forecast.apply( lambda x : cv.usaheatindex(x['temp_mean'], x['humidity_mean'], x['dew_point_mean']), axis=1)
#pywws.conversions.apparent_temp(temp, rh, wind)
forecast.loc[:,'temp_apparent'] = forecast.apply( lambda x : cv.apparent_temp(x['temp_mean'], x['humidity_mean'], x['wind_speed_mean']), axis=1)



In [26]:
forecast.head(10)

,date,dayofweek,day,month,year,temp_max,temp_mean,temp_min,temp_delta,sl_pressure_mean,...,winter,rain,sun_length,moon_phase,sun_units,dew_point_mean,wind_chill_mean,wind_chill_max,heat_index,temp_apparent
0,2016-3-7,monday,7,3,2016,58,53,48,10,29.988306,...,False,True,698.983333,0.94216,83.878,44.690763,53,58,147.427463,67.387108
1,2016-3-8,tuesday,8,3,2016,58,55,52,6,30.062426,...,False,False,701.366667,0.976093,701.366667,44.387522,55,58,145.176234,72.406637
2,2016-3-9,wednesday,9,3,2016,64,60,56,8,30.278585,...,False,True,703.783333,1.010027,478.572667,55.25923,60,64,257.223959,101.351845
3,2016-3-10,thursday,10,3,2016,63,59,55,8,30.30221,...,False,True,706.183333,0.044071,56.494667,54.029759,59,63,242.167759,94.549754
4,2016-3-11,friday,11,3,2016,59,55,51,8,30.225136,...,False,True,708.6,0.07809,170.064,48.847995,55,59,185.28271,79.462562
5,2016-3-12,saturday,12,3,2016,59,56,54,5,29.677945,...,False,True,711.016667,0.11211,177.754167,47.819197,56,59,174.234345,76.437556
6,2016-3-13,sunday,13,3,2016,59,56,54,5,29.979447,...,False,True,713.433333,0.146129,577.881,50.076872,56,59,197.702076,82.176369
7,2016-3-14,monday,14,3,2016,59,55,51,8,29.886722,...,False,True,715.85,0.180148,157.487,48.576113,55,59,182.431986,77.5455
8,2016-3-15,tuesday,15,3,2016,58,53,49,9,30.024037,...,False,True,718.266667,0.214167,423.777333,44.39449,53,58,145.053194,66.917606
9,2016-3-16,wednesday,16,3,2016,65,58,51,14,29.936333,...,False,True,720.7,0.248186,454.041,46.492764,58,65,161.797457,78.879121


In [27]:
forecast.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 29 columns):
date                10 non-null object
dayofweek           10 non-null object
day                 10 non-null int64
month               10 non-null int64
year                10 non-null int64
temp_max            10 non-null int64
temp_mean           10 non-null int64
temp_min            10 non-null int64
temp_delta          10 non-null int64
sl_pressure_mean    10 non-null float64
humidity_mean       10 non-null int64
wind_speed_max      10 non-null int64
wind_speed_mean     10 non-null int64
cloud_cover         10 non-null int64
precipitation       10 non-null float64
events              10 non-null object
wind_direction      10 non-null int64
owm_status          10 non-null object
end_of_week         10 non-null bool
winter              10 non-null bool
rain                10 non-null bool
sun_length          10 non-null float64
moon_phase          10 non-null float64
sun_units      

## Write forecast data to file

[[back to top](#Sections)]

In [28]:
forecast.to_csv('forecast_10day.csv')