# Get 10 day Weather forecast 
- Jim Haskin

- GA-Data Science
- Dec 2015

- 2/17/2016

## Method
This 10 day forecast is run through the predictive model to get the projected crime level for the next 10 days.
- Use the Weather Underground API to get their forecast.
- Use the OpenWeatherMap API and the pyowm wrapper to get their prediction for fields that are not in the Weather Underground API.
- Calculate the additional fields that were also calculated for the training data.
- Write the result to a file to be run with the model.

## Working notes
- TEMP: added both status fields (`events` and `owm_status`). Need to see which works best with the historical 'events'.
- Additional features still to add (see below).

## Sections

- [Weather Underground json](#Weather-Underground-json)
- [OWM-OpenWeatherMap](#OWM-OpenWeatherMap)
- [Combine both forecasts](#Combine-both-forecasts)
- [Add in Features](#Add-in-Features)
- [Write forecast data to file](#Write-forecast-data-to-file)

In [1]:
import pandas as pd
import numpy as np
import seaborn as sb
%matplotlib inline
import pyowm
from datetime import timedelta
from sf_sun_moon import sf_sun_moon
from pywws import conversions as cv

In [2]:
import requests
# use sleep to stop us from getting timed out
from time import sleep

# Weather Underground json

[[back to top](#Sections)]

### Use requests and JSON to pull in a 10 day forecast from Weather Underground

In [3]:
api_key = '3bfa0f909a078622'
#api_key = 3bfa0f909a078622

In [4]:
# use the requests library to fetch the forecast, building the URL from api_key
r = requests.get('http://api.wunderground.com/api/' + api_key + '/forecast10day/q/CA/San_Francisco.json')

In [5]:
# check the status: 200 means success, 4xx means error
r.status_code

200
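A minimal sketch of the same request wrapped in two helpers, one that builds the URL from the key and one that fails early on an error status instead of relying on a manual check. The names `wu_forecast_url` and `fetch_forecast_days` are illustrative and not part of the notebook; `get` is assumed to be `requests.get` or anything with the same call shape.

```python
WU_BASE = 'http://api.wunderground.com/api'


def wu_forecast_url(api_key, state='CA', city='San_Francisco'):
    """Build the 10 day forecast URL in the same form as the request above."""
    return '{}/{}/forecast10day/q/{}/{}.json'.format(WU_BASE, api_key, state, city)


def fetch_forecast_days(api_key, get):
    """Fetch the forecast with `get` (e.g. requests.get) and return the per-day list."""
    r = get(wu_forecast_url(api_key), timeout=30)
    # raise on 4xx/5xx here rather than hitting a KeyError on the JSON later
    r.raise_for_status()
    return r.json()['forecast']['simpleforecast']['forecastday']
```

In the notebook this could be called as `data = fetch_forecast_days(api_key, requests.get)`.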

In [6]:
data = r.json()['forecast']['simpleforecast']['forecastday']

In [7]:
#data[0]

## Create a data frame from the 10 day forecasts
Fields
- date
- day of week
- day
- month
- year
- max temp
- mean temperature
- min temp
- temperature change
- mean humidity
- max wind speed
- mean wind speed
- precipitation
- conditions/events
- mean wind direction
- Sun minutes
- Sun units
- Moon phase


In [8]:
rows = []
for day in data:
    ddd = []
    dayr = day['date']['day']
    monthr = day['date']['month']
    yearr = day['date']['year']
    ddd.append(str(yearr)+'-'+str(monthr)+'-'+str(dayr))
    ddd.append(day['date']['weekday'].lower())    
    ddd.append(dayr)    
    ddd.append(monthr)    
    ddd.append(yearr)
    temp_h = int(day['high']['fahrenheit'])
    temp_l = int(day['low']['fahrenheit'])
    temp_d = temp_h - temp_l
    temp_m = (temp_h + temp_l) // 2  # integer mean, matching the int64 temp_mean column in the output
    
    ddd.append(temp_h)
    ddd.append(temp_m)
    ddd.append(temp_l)
    ddd.append(temp_d)
    # day['maxhumidity'] is always 0 in the response, so it is skipped
    ddd.append(day['avehumidity'])
    # day['minhumidity'] is always 0 in the response, so it is skipped
    ddd.append(day['maxwind']['mph'])
    ddd.append(day['avewind']['mph'])
    ddd.append(day['qpf_allday']['in'])
    ddd.append(day['conditions'])
    ddd.append(day['avewind']['degrees'])
    rt = pd.Series(ddd)
    rows.append(rt)
colnames = ['date', 'dayofweek', 'day', 'month', 'year', 
            'temp_max', 'temp_mean', 'temp_min', 'temp_delta',
            'humidity_mean', 
            'wind_speed_max', 'wind_speed_mean',
            'precipitation', 'events', 'wind_direction']
   
wu_forecast = pd.DataFrame(rows)
wu_forecast.columns = colnames

# additional fields that can be used    
#    ddd.append(day['icon'])
#    ddd.append(day['avewind']['dir'])
#    ddd.append(day['maxwind']['degrees'])
#    ddd.append(day['maxwind']['dir'])
#    ddd.append(day['qpf_day']['in'])
#    ddd.append(day['qpf_night']['in'])
#    ddd.append(day['skyicon'])
#    ddd.append(day['period'])
#    ddd.append(day['pop'])


    

### Clean features

In [9]:
#wu_forecast['temp_max'] = wu_forecast['temp_max'].astype(int)
#wu_forecast['temp_min'] = wu_forecast['temp_min'].astype(int)
#wu_forecast['dayofweek'] = wu_forecast['dayofweek'].str.lower()
wu_forecast.head(10)



Unnamed: 0,date,dayofweek,day,month,year,temp_max,temp_mean,temp_min,temp_delta,humidity_mean,wind_speed_max,wind_speed_mean,precipitation,events,wind_direction
0,2016-3-1,tuesday,1,3,2016,65,59,53,12,57,15,10,0.0,Clear,259
1,2016-3-2,wednesday,2,3,2016,64,60,56,8,77,20,13,0.22,Overcast,217
2,2016-3-3,thursday,3,3,2016,62,58,55,7,79,10,5,0.04,Chance of Rain,193
3,2016-3-4,friday,4,3,2016,63,60,58,5,82,20,13,0.41,Rain,185
4,2016-3-5,saturday,5,3,2016,64,59,54,10,76,35,25,0.65,Rain,173
5,2016-3-6,sunday,6,3,2016,59,55,51,8,71,30,22,0.86,Rain,226
6,2016-3-7,monday,7,3,2016,58,54,50,8,68,25,18,0.12,Rain,285
7,2016-3-8,tuesday,8,3,2016,60,56,53,7,65,20,15,0.11,Partly Cloudy,290
8,2016-3-9,wednesday,9,3,2016,60,57,54,6,73,15,12,0.29,Chance of Rain,206
9,2016-3-10,thursday,10,3,2016,60,56,53,7,79,20,16,0.72,Rain,187


In [10]:
wu_forecast.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 15 columns):
date               10 non-null object
dayofweek          10 non-null object
day                10 non-null int64
month              10 non-null int64
year               10 non-null int64
temp_max           10 non-null int64
temp_mean          10 non-null int64
temp_min           10 non-null int64
temp_delta         10 non-null int64
humidity_mean      10 non-null int64
wind_speed_max     10 non-null int64
wind_speed_mean    10 non-null int64
precipitation      10 non-null float64
events             10 non-null object
wind_direction     10 non-null int64
dtypes: float64(1), int64(11), object(3)
memory usage: 1.2+ KB


In [11]:
wu_forecast.describe()

Unnamed: 0,day,month,year,temp_max,temp_mean,temp_min,temp_delta,humidity_mean,wind_speed_max,wind_speed_mean,precipitation,wind_direction
count,10.0,10,10,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0
mean,5.5,3,2016,61.5,57.4,53.7,7.8,72.7,21.0,14.9,0.342,222.1
std,3.02765,0,0,2.415229,2.1187,2.311805,1.988858,7.645623,7.378648,5.782156,0.305425,42.288559
min,1.0,3,2016,58.0,54.0,50.0,5.0,57.0,10.0,5.0,0.0,173.0
25%,3.25,3,2016,60.0,56.0,53.0,7.0,68.75,16.25,12.25,0.1125,188.5
50%,5.5,3,2016,61.0,57.5,53.5,7.5,74.5,20.0,14.0,0.255,211.5
75%,7.75,3,2016,63.75,59.0,54.75,8.0,78.5,23.75,17.5,0.59,250.75
max,10.0,3,2016,65.0,60.0,58.0,12.0,82.0,35.0,25.0,0.86,290.0


# OWM-OpenWeatherMap


[[back to top](#Sections)]

### Use the pyowm wrapper to pull in a 10 day forecast from OWM


In [12]:
owm = pyowm.OWM('76ca531bdb080cdf04be66ddc063d249')
city = "SanFrancisco,usa"
fc = owm.daily_forecast(city, limit=10)
f = fc.get_forecast()
forecasts = f.get_weathers()

## Create a data frame from the 10 day forecasts
Fields
- mean pressure (reported in mb, converted to inches of mercury)
- cloud cover (reported in percent; the conversion to a 1 - 8 scale is not applied in the cell below)
- OWM status (rain, cloudy, etc.)
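The two unit conversions named in this list can be sketched as small helpers. The pressure factor is the one used in the next cell. The cloud cover mapping is an assumption: it treats the 1 - 8 scale as oktas (eighths of the sky covered, which run from 0 to 8). Both function names are illustrative.

```python
HPA_TO_INHG = 0.02953  # millibars (hPa) to inches of mercury


def hpa_to_inhg(pressure_hpa):
    """Convert pressure from millibars to inches of mercury."""
    return pressure_hpa * HPA_TO_INHG


def cloud_pct_to_oktas(cloud_pct):
    """Map 0-100 percent cloud cover to eighths of sky covered (0-8)."""
    return int(round(cloud_pct / 100.0 * 8))


standard_pressure = hpa_to_inhg(1013.25)
mostly_cloudy = cloud_pct_to_oktas(88)
```

Applied to the data frame, the second helper would be something like `owm_forecast['cloud_cover'].map(cloud_pct_to_oktas)`.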

In [13]:
rows = []
for day in forecasts:
    ddd = []
    ddd.append(day.get_reference_time('iso'))
    ddd.append(day.get_pressure()['press']*0.02953)
    ddd.append(day.get_clouds())
    ddd.append(day.get_status())
#    ddd.append(day.get_heat_index())
#    ddd.append(day.get_dewpoint())   Always 0
#    ddd.append(day.get_sunrise_time())   Always 0
    rt = pd.Series(ddd)
    rows.append(rt)
colnames = ['date', 'sl_pressure_mean', 'cloud_cover', 'owm_status'] 
owm_forecast = pd.DataFrame(rows)
owm_forecast.columns = colnames
owm_forecast

# other fields available from the api

#    ddd.append(w.get_weather_icon_name())
#    ddd.append(w.get_temperature(unit='fahrenheit')['max'])
#    ddd.append(w.get_temperature(unit='fahrenheit')['min'])
#    ddd.append(w.get_wind()['speed'])
#    ddd.append(day['avewind']['degrees'])
#    ddd.append(day['avewind']['dir'])
#    ddd.append(day['maxwind']['mph'])
#    ddd.append(day['maxwind']['degrees'])
#    ddd.append(day['maxwind']['dir'])
#    ddd.append(day['maxhumidity'])
#    ddd.append(day['minhumidity'])
#    if w.get_rain() != {}:
#        ddd.append(w.get_rain()['all'])
#    else:
#        ddd.append(0.0)
#    ddd.append(day['qpf_day']['in'])
#    ddd.append(day['qpf_night']['in'])
#    ddd.append(w.get_visibility_distance())

#df['high_temp'] = df['high_temp'].astype(int)
#df['low_temp'] = df['low_temp'].astype(int)    

Unnamed: 0,date,sl_pressure_mean,cloud_cover,owm_status
0,2016-03-01 20:00:00+00,29.908279,0,Clear
1,2016-03-02 20:00:00+00,29.831501,20,Clouds
2,2016-03-03 20:00:00+00,29.86428,88,Rain
3,2016-03-04 20:00:00+00,29.911232,90,Rain
4,2016-03-05 20:00:00+00,29.784844,90,Rain
5,2016-03-06 20:00:00+00,29.748522,54,Rain
6,2016-03-07 20:00:00+00,29.584631,79,Rain
7,2016-03-08 20:00:00+00,29.885836,98,Rain
8,2016-03-09 20:00:00+00,29.84538,100,Rain
9,2016-03-10 20:00:00+00,29.894105,31,Rain


## Combine both forecasts

[[back to top](#Sections)]

In [14]:
forecast = wu_forecast.copy()
forecast.insert(9, 'sl_pressure_mean', owm_forecast['sl_pressure_mean'])
forecast.insert(13, 'cloud_cover', owm_forecast['cloud_cover'])
forecast['owm_status'] = owm_forecast['owm_status']
forecast



Unnamed: 0,date,dayofweek,day,month,year,temp_max,temp_mean,temp_min,temp_delta,sl_pressure_mean,humidity_mean,wind_speed_max,wind_speed_mean,cloud_cover,precipitation,events,wind_direction,owm_status
0,2016-3-1,tuesday,1,3,2016,65,59,53,12,29.908279,57,15,10,0,0.0,Clear,259,Clear
1,2016-3-2,wednesday,2,3,2016,64,60,56,8,29.831501,77,20,13,20,0.22,Overcast,217,Clouds
2,2016-3-3,thursday,3,3,2016,62,58,55,7,29.86428,79,10,5,88,0.04,Chance of Rain,193,Rain
3,2016-3-4,friday,4,3,2016,63,60,58,5,29.911232,82,20,13,90,0.41,Rain,185,Rain
4,2016-3-5,saturday,5,3,2016,64,59,54,10,29.784844,76,35,25,90,0.65,Rain,173,Rain
5,2016-3-6,sunday,6,3,2016,59,55,51,8,29.748522,71,30,22,54,0.86,Rain,226,Rain
6,2016-3-7,monday,7,3,2016,58,54,50,8,29.584631,68,25,18,79,0.12,Rain,285,Rain
7,2016-3-8,tuesday,8,3,2016,60,56,53,7,29.885836,65,20,15,98,0.11,Partly Cloudy,290,Rain
8,2016-3-9,wednesday,9,3,2016,60,57,54,6,29.84538,73,15,12,100,0.29,Chance of Rain,206,Rain
9,2016-3-10,thursday,10,3,2016,60,56,53,7,29.894105,79,20,16,31,0.72,Rain,187,Rain
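The combine step lines the two forecasts up by row position, so it relies on both APIs returning the same ten days in the same order. The two `date` columns use different formats ('2016-3-1' and '2016-03-01 20:00:00+00'), so a direct comparison will not work. A minimal sketch of a shared key for checking the alignment follows; `to_date_key` is an illustrative helper, not part of the notebook.

```python
from datetime import date


def to_date_key(stamp):
    """Reduce '2016-3-1' or '2016-03-01 20:00:00+00' to a datetime.date."""
    year, month, day = stamp.split(' ')[0].split('-')
    return date(int(year), int(month), int(day))


# Values copied from the first two rows of each forecast.
wu_dates = ['2016-3-1', '2016-3-2']
owm_dates = ['2016-03-01 20:00:00+00', '2016-03-02 20:00:00+00']
aligned = [to_date_key(a) == to_date_key(b) for a, b in zip(wu_dates, owm_dates)]
```

If any entry of `aligned` is False, the positional inserts would pair values from different days; joining the two frames on the normalized key avoids that.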


In [15]:
forecast.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 18 columns):
date                10 non-null object
dayofweek           10 non-null object
day                 10 non-null int64
month               10 non-null int64
year                10 non-null int64
temp_max            10 non-null int64
temp_mean           10 non-null int64
temp_min            10 non-null int64
temp_delta          10 non-null int64
sl_pressure_mean    10 non-null float64
humidity_mean       10 non-null int64
wind_speed_max      10 non-null int64
wind_speed_mean     10 non-null int64
cloud_cover         10 non-null int64
precipitation       10 non-null float64
events              10 non-null object
wind_direction      10 non-null int64
owm_status          10 non-null object
dtypes: float64(2), int64(12), object(4)
memory usage: 1.5+ KB


## Add in Features

[[back to top](#Sections)]

Still to do

- deltas


### Create end of week from dayofweek
- Friday, Saturday and Sunday count as end_of_week; all other days do not.

In [16]:
def eow(s):
    if s =='friday' or s == 'saturday' or s == 'sunday':
        return True
    else:
        return False

In [17]:
forecast.loc[:,'end_of_week'] = forecast['dayofweek'].map(eow)
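The same rule can be expressed as a set membership test, which is shorter and easier to extend. This is a sketch of an equivalent form, assuming the day names are already lowercase as they are in `dayofweek`.

```python
END_OF_WEEK = {'friday', 'saturday', 'sunday'}


def eow(s):
    """Return True for the lowercase day names treated as end of week."""
    return s in END_OF_WEEK


flags = [eow(d) for d in ['thursday', 'friday', 'saturday', 'sunday', 'monday']]
```

With pandas, `forecast['dayofweek'].isin(['friday', 'saturday', 'sunday'])` should produce the same boolean column without a helper function.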


### Create rain from status field
- True when the OWM status for the day is 'Rain'

In [18]:
forecast.loc[:,'rain'] = forecast['owm_status'] == 'Rain'

### Sun time and Moon phase
- These features use the sf_sun_moon routine, which I wrote on top of the ephem package (`import ephem as ep`).

In [19]:
forecast['sun_length'], forecast['moon_phase'] = zip(*forecast['date'].apply(sf_sun_moon))

### Sun Units

In [20]:
forecast['sun_units'] = forecast['sun_length'] * (1.0 - (forecast['cloud_cover']  * .01))
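As a worked example of this formula, sun units are the minutes of daylight scaled by the fraction of sky that is not cloud covered. The inputs below are taken from the forecast rows shown in the output; the function name is illustrative.

```python
def sun_units(sun_length_min, cloud_cover_pct):
    """Daylight minutes scaled by the cloud-free fraction of the sky."""
    return sun_length_min * (1.0 - cloud_cover_pct * 0.01)


clear_day = sun_units(684.65, 0)          # no cloud: all daylight minutes count
partly_cloudy = sun_units(687.016667, 20)  # 20 percent cloud: 80 percent of the minutes
overcast = sun_units(703.783333, 100)      # full cloud cover: nothing left
```

A day with 687 minutes of daylight and 20 percent cloud cover therefore scores about 549.6 sun units.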

### Dew points
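The following features are calculated using the pywws conversions package.
- Approximate formula (temperatures in °F, humidity in percent): DewpointTemp = Temp - (9/25) * (100 - Humidity)
- Weather Underground does not return values for humidity max and min, so only the mean dew point is calculated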

In [21]:
#pywws.conversions.dew_point(temp, hum)[source]
forecast.loc[:,'dew_point_mean'] = forecast.apply( lambda x : cv.dew_point(x['temp_mean'], x['humidity_mean']), axis=1)
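A worked check of the units may help here. The sketch below is not the pywws code; it assumes the Magnus-form dew point with the commonly cited constants a = 17.27 and b = 237.7, which takes the temperature in °C. Fed 59 and 57 directly it gives about 47.4, the same as the first `dew_point_mean` value in the output further down, which suggests the Fahrenheit temperatures are being interpreted as Celsius. Converting to °C first gives roughly 43.8 °F, close to the 9/25 approximation listed above. All function names are illustrative.

```python
import math


def dew_point_c(temp_c, humidity_pct):
    """Magnus-form dew point in deg C (assumed constants a=17.27, b=237.7)."""
    a, b = 17.27, 237.7
    gamma = (a * temp_c) / (b + temp_c) + math.log(humidity_pct / 100.0)
    return (b * gamma) / (a - gamma)


def f_to_c(temp_f):
    return (temp_f - 32.0) * 5.0 / 9.0


def c_to_f(temp_c):
    return temp_c * 9.0 / 5.0 + 32.0


def dew_point_f(temp_f, humidity_pct):
    """Dew point in deg F, converting to deg C for the Magnus formula and back."""
    return c_to_f(dew_point_c(f_to_c(temp_f), humidity_pct))


def dew_point_f_approx(temp_f, humidity_pct):
    """The simple 9/25 approximation, in deg F."""
    return temp_f - (9.0 / 25.0) * (100.0 - humidity_pct)


raw = dew_point_c(59, 57)            # 59 treated as deg C
converted = dew_point_f(59, 57)      # 59 treated as deg F
approx = dew_point_f_approx(59, 57)  # 59 treated as deg F
```

Whether to change the cell depends on how the training features were computed, since the model needs the forecast features to be calculated the same way.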

### Wind Chill
- Calculated from temperature and wind speed

In [22]:
#pywws.conversions.wind_chill(temp, wind)
forecast.loc[:,'wind_chill_mean'] = forecast.apply( lambda x : cv.wind_chill(x['temp_mean'], x['wind_speed_mean']), axis=1)
forecast.loc[:,'wind_chill_max'] = forecast.apply( lambda x : cv.wind_chill(x['temp_max'], x['wind_speed_max']), axis=1)
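In the output further down, `wind_chill_mean` and `wind_chill_max` are equal to the input temperatures, which is what a wind chill formula returns when the temperature is above its cutoff. For reference, here is a sketch of the US National Weather Service formula in °F and mph, the units of the forecast columns. It is not the pywws implementation, and the function name is illustrative.

```python
def wind_chill_f(temp_f, wind_mph):
    """NWS wind chill in deg F; returns temp_f unchanged outside the valid range."""
    # The formula is defined for temperatures at or below 50 F and wind above 3 mph.
    if temp_f > 50.0 or wind_mph <= 3.0:
        return temp_f
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v


mild = wind_chill_f(59, 10)  # above the cutoff, so the temperature comes back as is
cold = wind_chill_f(40, 10)  # inside the valid range, so the formula applies
```

With forecast temperatures in the 50s and 60s °F, this version would also return the temperature unchanged for every row.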

### Heat Index and Apparent Temperature
- Calculated from temperature and humidity

In [23]:
#pywws.conversions.usaheatindex(temp, humidity, dew)
forecast.loc[:,'heat_index'] = forecast.apply( lambda x : cv.usaheatindex(x['temp_mean'], x['humidity_mean'], x['dew_point_mean']), axis=1)
#pywws.conversions.apparent_temp(temp, rh, wind)
forecast.loc[:,'temp_apparent'] = forecast.apply( lambda x : cv.apparent_temp(x['temp_mean'], x['humidity_mean'], x['wind_speed_mean']), axis=1)



In [24]:
forecast.head(10)

Unnamed: 0,date,dayofweek,day,month,year,temp_max,temp_mean,temp_min,temp_delta,sl_pressure_mean,...,end_of_week,rain,sun_length,moon_phase,sun_units,dew_point_mean,wind_chill_mean,wind_chill_max,heat_index,temp_apparent
0,2016-3-1,tuesday,1,3,2016,65,59,53,12,29.908279,...,False,False,684.65,0.738557,684.65,47.416326,59,65,169.385613,83.606785
1,2016-3-2,wednesday,2,3,2016,64,60,56,8,29.831501,...,False,False,687.016667,0.772491,549.613333,54.462311,60,64,245.930729,97.288651
2,2016-3-3,thursday,3,3,2016,62,58,55,7,29.86428,...,False,True,689.416667,0.806424,82.73,53.062927,58,62,231.18451,97.593869
3,2016-3-4,friday,4,3,2016,63,60,58,5,29.911232,...,True,True,691.783333,0.840358,69.178333,55.776392,60,63,264.894854,100.560641
4,2016-3-5,saturday,5,3,2016,64,59,54,10,29.784844,...,True,True,694.183333,0.874292,69.418333,53.229334,59,64,231.450143,84.975713
5,2016-3-6,sunday,6,3,2016,59,55,51,8,29.748522,...,True,True,696.566667,0.908226,320.420667,48.022593,55,59,176.806377,72.311377
6,2016-3-7,monday,7,3,2016,58,54,50,8,29.584631,...,False,True,698.983333,0.94216,146.7865,46.219325,54,58,160.281144,70.909694
7,2016-3-8,tuesday,8,3,2016,60,56,53,7,29.885836,...,False,True,701.366667,0.976093,14.027333,47.21864,56,60,168.626767,76.752853
8,2016-3-9,wednesday,9,3,2016,60,57,54,6,29.84538,...,False,True,703.783333,1.010027,0.0,50.489002,57,60,201.187193,86.114673
9,2016-3-10,thursday,10,3,2016,60,56,53,7,29.894105,...,False,True,706.183333,0.044071,487.2665,51.128936,56,60,210.058395,83.645775


In [25]:
forecast.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 28 columns):
date                10 non-null object
dayofweek           10 non-null object
day                 10 non-null int64
month               10 non-null int64
year                10 non-null int64
temp_max            10 non-null int64
temp_mean           10 non-null int64
temp_min            10 non-null int64
temp_delta          10 non-null int64
sl_pressure_mean    10 non-null float64
humidity_mean       10 non-null int64
wind_speed_max      10 non-null int64
wind_speed_mean     10 non-null int64
cloud_cover         10 non-null int64
precipitation       10 non-null float64
events              10 non-null object
wind_direction      10 non-null int64
owm_status          10 non-null object
end_of_week         10 non-null bool
rain                10 non-null bool
sun_length          10 non-null float64
moon_phase          10 non-null float64
sun_units           10 non-null float64
dew_point_me

## Write forecast data to file

[[back to top](#Sections)]

In [26]:
forecast.to_csv('forecast_10day.csv')