# Get 10-Day Weather Forecast
- Jim Haskin

- GA-Data Science
- Dec 2015

- 2/17/2016

## Method
This 10-day forecast will be run through the predictive model to get the projected crime level for each of the next 10 days.
- Use the Weather Underground API to get their forecast.
- Use the OpenWeatherMap (OWM) API, through the pyowm wrapper, to get their forecast for fields that are not in the Weather Underground API.
- Calculate the same additional fields that were calculated for the training data.
- Write the result to a file to be run through the model.

## Working notes
- TEMP: both status fields are kept for now ('events' from Weather Underground and 'owm_status' from OWM). Need to see which works best with the historical 'events'.
- Additional features are still to be added (see below).

## Sections

- [Weather Underground json](#Weather-Underground-json)
- [OWM-OpenWeatherMap](#OWM-OpenWeatherMap)
- [Combine both forecasts](#Combine-both-forecasts)
- [Add in Features](#Add-in-Features)
- [Write forecast data to file](#Write-forecast-data-to-file)

In [1]:
import pandas as pd
import numpy as np
import seaborn as sb
%matplotlib inline
import pyowm
from datetime import timedelta
from sf_sun_moon import sf_sun_moon
from pywws import conversions as cv

In [2]:
import requests
# sleep() can pause between requests so we don't get rate-limited (not called in this notebook)
from time import sleep

# Weather Underground json

[[back to top](#Sections)]

### Use requests to pull in a 10-day forecast (JSON) from Weather Underground

In [3]:
api_key = '3bfa0f909a078622'

In [4]:
# use the requests library to call the forecast10day endpoint, building the URL from api_key
r = requests.get('http://api.wunderground.com/api/{}/forecast10day/q/CA/San_Francisco.json'.format(api_key))

In [5]:
# check the status: 200 means success, 4xx means error
r.status_code

200
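The `sleep` import above hints at pausing between calls to avoid being throttled. A minimal sketch of a fetch helper that checks the status code and retries with a pause, assuming the caller passes in a `get` function such as `requests.get`; `fetch_json`, `retries` and `wait` are hypothetical names, not part of this notebook.

```python
import time


def fetch_json(url, get, retries=3, wait=1.0):
    """Call get(url) until it returns HTTP 200, pausing `wait` seconds between tries.

    `get` is any callable returning an object with `status_code` and `json()`,
    for example requests.get. Raises RuntimeError if every attempt fails.
    """
    last_status = None
    for attempt in range(retries):
        response = get(url)
        last_status = response.status_code
        if last_status == 200:
            return response.json()
        if attempt < retries - 1:
            time.sleep(wait)  # back off before the next attempt
    raise RuntimeError('request failed with status {}'.format(last_status))
```

Usage would look like `fetch_json(url, requests.get)['forecast']['simpleforecast']['forecastday']`, where `url` is the forecast10day URL from In [4]. Passing `get` in keeps the helper testable without network access.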

In [6]:
data = r.json()['forecast']['simpleforecast']['forecastday']

In [7]:
#data[0]

## Create a data frame from the 10-day forecasts
Fields
- date
- day of week
- day
- month
- year
- max temp
- mean temperature
- min temp
- temperature change
- mean humidity
- max wind speed
- mean wind speed
- precipitation
- conditions/events
- mean wind direction
- Sun minutes (calculated later, in Add in Features)
- Sun units (calculated later)
- Moon phase (calculated later)


In [8]:
rows = []
for day in data:
    ddd = []
    dayr = day['date']['day']
    monthr = day['date']['month']
    yearr = day['date']['year']
    ddd.append(str(yearr)+'-'+str(monthr)+'-'+str(dayr))
    ddd.append(day['date']['weekday'].lower())    
    ddd.append(dayr)    
    ddd.append(monthr)    
    ddd.append(yearr)
    temp_h = int(day['high']['fahrenheit'])
    temp_l = int(day['low']['fahrenheit'])
    temp_d = temp_h - temp_l
    temp_m = (temp_h + temp_l) // 2  # integer mean (58 for 61 and 56), same result in Python 2 and 3
    
    ddd.append(temp_h)
    ddd.append(temp_m)
    ddd.append(temp_l)
    ddd.append(temp_d)
    # ddd.append(day['maxhumidity'])  # always 0
    ddd.append(day['avehumidity'])
    # ddd.append(day['minhumidity'])  # always 0
    ddd.append(day['maxwind']['mph'])
    ddd.append(day['avewind']['mph'])
    ddd.append(day['qpf_allday']['in'])
    ddd.append(day['conditions'])
    ddd.append(day['avewind']['degrees'])
    rt = pd.Series(ddd)
    rows.append(rt)
colnames = ['date', 'dayofweek', 'day', 'month', 'year', 
            'temp_max', 'temp_mean', 'temp_min', 'temp_delta',
            'humidity_mean', 
            'wind_speed_max', 'wind_speed_mean',
            'precipitation', 'events', 'wind_direction']
   
wu_forecast = pd.DataFrame(rows)
wu_forecast.columns = colnames

# additional fields available in each forecastday (not used):
#    ddd.append(day['icon'])
#    ddd.append(day['avewind']['dir'])
#    ddd.append(day['maxwind']['degrees'])
#    ddd.append(day['maxwind']['dir'])
#    ddd.append(day['qpf_day']['in'])
#    ddd.append(day['qpf_night']['in'])
#    ddd.append(day['skyicon'])
#    ddd.append(day['period'])
#    ddd.append(day['pop'])


    

### Clean features

In [9]:
#wu_forecast['temp_max'] = wu_forecast['temp_max'].astype(int)
#wu_forecast['temp_min'] = wu_forecast['temp_min'].astype(int)
#wu_forecast['dayofweek'] = wu_forecast['dayofweek'].str.lower()
wu_forecast.head(10)



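|  | date | dayofweek | day | month | year | temp_max | temp_mean | temp_min | temp_delta | humidity_mean | wind_speed_max | wind_speed_mean | precipitation | events | wind_direction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016-3-10 | thursday | 10 | 3 | 2016 | 61 | 58 | 56 | 5 | 85 | 15 | 10 | 0.67 | Rain | 170 |
| 1 | 2016-3-11 | friday | 11 | 3 | 2016 | 58 | 55 | 53 | 5 | 76 | 30 | 20 | 0.48 | Rain | 201 |
| 2 | 2016-3-12 | saturday | 12 | 3 | 2016 | 58 | 56 | 55 | 3 | 74 | 30 | 21 | 0.42 | Rain | 196 |
| 3 | 2016-3-13 | sunday | 13 | 3 | 2016 | 59 | 56 | 54 | 5 | 84 | 25 | 18 | 0.54 | Rain | 200 |
| 4 | 2016-3-14 | monday | 14 | 3 | 2016 | 58 | 54 | 50 | 8 | 66 | 20 | 13 | 0.0 | Partly Cloudy | 276 |
| 5 | 2016-3-15 | tuesday | 15 | 3 | 2016 | 60 | 55 | 51 | 9 | 58 | 15 | 11 | 0.0 | Clear | 325 |
| 6 | 2016-3-16 | wednesday | 16 | 3 | 2016 | 65 | 59 | 54 | 11 | 58 | 15 | 11 | 0.0 | Clear | 333 |
| 7 | 2016-3-17 | thursday | 17 | 3 | 2016 | 66 | 60 | 55 | 11 | 56 | 10 | 7 | 0.0 | Clear | 26 |
| 8 | 2016-3-18 | friday | 18 | 3 | 2016 | 66 | 60 | 55 | 11 | 58 | 10 | 7 | 0.0 | Clear | 337 |
| 9 | 2016-3-19 | saturday | 19 | 3 | 2016 | 66 | 60 | 54 | 12 | 62 | 10 | 9 | 0.08 | Partly Cloudy | 211 |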


In [10]:
wu_forecast.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 15 columns):
date               10 non-null object
dayofweek          10 non-null object
day                10 non-null int64
month              10 non-null int64
year               10 non-null int64
temp_max           10 non-null int64
temp_mean          10 non-null int64
temp_min           10 non-null int64
temp_delta         10 non-null int64
humidity_mean      10 non-null int64
wind_speed_max     10 non-null int64
wind_speed_mean    10 non-null int64
precipitation      10 non-null float64
events             10 non-null object
wind_direction     10 non-null int64
dtypes: float64(1), int64(11), object(3)
memory usage: 1.2+ KB


In [11]:
wu_forecast.describe()

|  | day | month | year | temp_max | temp_mean | temp_min | temp_delta | humidity_mean | wind_speed_max | wind_speed_mean | precipitation | wind_direction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 10.0 | 10 | 10 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 |
| mean | 14.5 | 3 | 2016 | 61.7 | 57.3 | 53.7 | 8.0 | 67.7 | 18.0 | 12.7 | 0.219 | 227.5 |
| std | 3.02765 | 0 | 0 | 3.622461 | 2.359378 | 1.888562 | 3.265986 | 11.195733 | 7.888106 | 5.186521 | 0.273677 | 95.225172 |
| min | 10.0 | 3 | 2016 | 58.0 | 54.0 | 50.0 | 3.0 | 56.0 | 10.0 | 7.0 | 0.0 | 26.0 |
| 25% | 12.25 | 3 | 2016 | 58.25 | 55.25 | 53.25 | 5.0 | 58.0 | 11.25 | 9.25 | 0.0 | 197.0 |
| 50% | 14.5 | 3 | 2016 | 60.5 | 57.0 | 54.0 | 8.5 | 64.0 | 15.0 | 11.0 | 0.04 | 206.0 |
| 75% | 16.75 | 3 | 2016 | 65.75 | 59.75 | 55.0 | 11.0 | 75.5 | 23.75 | 16.75 | 0.465 | 312.75 |
| max | 19.0 | 3 | 2016 | 66.0 | 60.0 | 56.0 | 12.0 | 85.0 | 30.0 | 21.0 | 0.67 | 337.0 |


# OWM-OpenWeatherMap


[[back to top](#Sections)]

### Use the pyowm wrapper to pull in a 10-day forecast from OWM


In [12]:
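# pyowm 2.x API (removed in pyowm 3): daily_forecast() returns a Forecaster,
# get_forecast() returns its Forecast, and get_weathers() gives one Weather object per day
owm = pyowm.OWM('76ca531bdb080cdf04be66ddc063d249')
city = "SanFrancisco,usa"
fc = owm.daily_forecast(city, limit=10)
f = fc.get_forecast()
forecasts = f.get_weathers()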

## Create a data frame from the 10 day forecasts
Fields
- mean pressure (in mb, converted to inches of mercury)
- cloud cover (in percent; not yet converted to a 1 - 8 scale)
- OWM Status (rain, cloudy, etc.)
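A minimal sketch of the two unit conversions listed above, assuming the "1 - 8 scale" means eighths of the sky covered (as in oktas, which run 0 - 8), rounded to the nearest eighth. That gives 0 for a clear sky, so whether 0% should map to 1 is left open. `mb_to_inhg` and `percent_to_eighths` are hypothetical helpers; the pressure factor is the same 0.02953 used in the cell below.

```python
MB_TO_INHG = 0.02953  # inches of mercury per millibar (hPa)


def mb_to_inhg(pressure_mb):
    """Convert pressure from millibars (hPa) to inches of mercury."""
    return pressure_mb * MB_TO_INHG


def percent_to_eighths(cloud_pct):
    """Convert cloud cover in percent (0-100) to eighths of sky covered (0-8).

    Rounds to the nearest eighth with halves rounding up, so 92% -> 7 and 100% -> 8.
    """
    if not 0 <= cloud_pct <= 100:
        raise ValueError('cloud cover must be between 0 and 100 percent')
    return int(cloud_pct * 8 / 100.0 + 0.5)
```

Applied to the data frame, this would be `owm_forecast['cloud_cover'].apply(percent_to_eighths)`.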

In [13]:
rows = []
for day in forecasts:
    ddd = []
    ddd.append(day.get_reference_time('iso'))
    ddd.append(day.get_pressure()['press']*0.02953)
    ddd.append(day.get_clouds())
    ddd.append(day.get_status())
#    ddd.append(day.get_heat_index())
#    ddd.append(day.get_dewpoint())   Always 0
#    ddd.append(day.get_sunrise_time())   Always 0
    rt = pd.Series(ddd)
    rows.append(rt)
colnames = ['date', 'sl_pressure_mean', 'cloud_cover', 'owm_status'] 
owm_forecast = pd.DataFrame(rows)
owm_forecast.columns = colnames
owm_forecast

# other fields available from the OWM api; the day['...'] lines are Weather Underground fields copied from the earlier cell

#    ddd.append(w.get_weather_icon_name())
#    ddd.append(w.get_temperature(unit='fahrenheit')['max'])
#    ddd.append(w.get_temperature(unit='fahrenheit')['min'])
#    ddd.append(w.get_wind()['speed'])
#    ddd.append(day['avewind']['degrees'])
#    ddd.append(day['avewind']['dir'])
#    ddd.append(day['maxwind']['mph'])
#    ddd.append(day['maxwind']['degrees'])
#    ddd.append(day['maxwind']['dir'])
#    ddd.append(day['maxhumidity'])
#    ddd.append(day['minhumidity'])
#    if w.get_rain() != {}:
#        ddd.append(w.get_rain()['all'])
#    else:
#        ddd.append(0.0)
#    ddd.append(day['qpf_day']['in'])
#    ddd.append(day['qpf_night']['in'])
#    ddd.append(w.get_visibility_distance())

#df['high_temp'] = df['high_temp'].astype(int)
#df['low_temp'] = df['low_temp'].astype(int)    

|  | date | sl_pressure_mean | cloud_cover | owm_status |
|---|---|---|---|---|
| 0 | 2016-03-10 20:00:00+00 | 30.246102 | 92 | Rain |
| 1 | 2016-03-11 20:00:00+00 | 29.914481 | 92 | Rain |
| 2 | 2016-03-12 20:00:00+00 | 30.274451 | 80 | Rain |
| 3 | 2016-03-13 20:00:00+00 | 30.214505 | 92 | Rain |
| 4 | 2016-03-14 20:00:00+00 | 29.982104 | 71 | Rain |
| 5 | 2016-03-15 20:00:00+00 | 30.093137 | 11 | Rain |
| 6 | 2016-03-16 20:00:00+00 | 29.964386 | 12 | Clear |
| 7 | 2016-03-17 20:00:00+00 | 29.862508 | 0 | Clear |
| 8 | 2016-03-18 20:00:00+00 | 29.642509 | 0 | Clear |
| 9 | 2016-03-19 20:00:00+00 | 29.885836 | 0 | Clear |


## Combine both forecasts

[[back to top](#Sections)]

In [14]:
forecast = wu_forecast.copy()
forecast.insert(9, 'sl_pressure_mean', owm_forecast['sl_pressure_mean'])
forecast.insert(13, 'cloud_cover', owm_forecast['cloud_cover'])
forecast['owm_status'] = owm_forecast['owm_status']
forecast



|  | date | dayofweek | day | month | year | temp_max | temp_mean | temp_min | temp_delta | sl_pressure_mean | humidity_mean | wind_speed_max | wind_speed_mean | cloud_cover | precipitation | events | wind_direction | owm_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016-3-10 | thursday | 10 | 3 | 2016 | 61 | 58 | 56 | 5 | 30.246102 | 85 | 15 | 10 | 92 | 0.67 | Rain | 170 | Rain |
| 1 | 2016-3-11 | friday | 11 | 3 | 2016 | 58 | 55 | 53 | 5 | 29.914481 | 76 | 30 | 20 | 92 | 0.48 | Rain | 201 | Rain |
| 2 | 2016-3-12 | saturday | 12 | 3 | 2016 | 58 | 56 | 55 | 3 | 30.274451 | 74 | 30 | 21 | 80 | 0.42 | Rain | 196 | Rain |
| 3 | 2016-3-13 | sunday | 13 | 3 | 2016 | 59 | 56 | 54 | 5 | 30.214505 | 84 | 25 | 18 | 92 | 0.54 | Rain | 200 | Rain |
| 4 | 2016-3-14 | monday | 14 | 3 | 2016 | 58 | 54 | 50 | 8 | 29.982104 | 66 | 20 | 13 | 71 | 0.0 | Partly Cloudy | 276 | Rain |
| 5 | 2016-3-15 | tuesday | 15 | 3 | 2016 | 60 | 55 | 51 | 9 | 30.093137 | 58 | 15 | 11 | 11 | 0.0 | Clear | 325 | Rain |
| 6 | 2016-3-16 | wednesday | 16 | 3 | 2016 | 65 | 59 | 54 | 11 | 29.964386 | 58 | 15 | 11 | 12 | 0.0 | Clear | 333 | Clear |
| 7 | 2016-3-17 | thursday | 17 | 3 | 2016 | 66 | 60 | 55 | 11 | 29.862508 | 56 | 10 | 7 | 0 | 0.0 | Clear | 26 | Clear |
| 8 | 2016-3-18 | friday | 18 | 3 | 2016 | 66 | 60 | 55 | 11 | 29.642509 | 58 | 10 | 7 | 0 | 0.0 | Clear | 337 | Clear |
| 9 | 2016-3-19 | saturday | 19 | 3 | 2016 | 66 | 60 | 54 | 12 | 29.885836 | 62 | 10 | 9 | 0 | 0.08 | Partly Cloudy | 211 | Clear |


In [15]:
forecast.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 18 columns):
date                10 non-null object
dayofweek           10 non-null object
day                 10 non-null int64
month               10 non-null int64
year                10 non-null int64
temp_max            10 non-null int64
temp_mean           10 non-null int64
temp_min            10 non-null int64
temp_delta          10 non-null int64
sl_pressure_mean    10 non-null float64
humidity_mean       10 non-null int64
wind_speed_max      10 non-null int64
wind_speed_mean     10 non-null int64
cloud_cover         10 non-null int64
precipitation       10 non-null float64
events              10 non-null object
wind_direction      10 non-null int64
owm_status          10 non-null object
dtypes: float64(2), int64(12), object(4)
memory usage: 1.5+ KB
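The combine step joins the two forecasts by row position, which assumes both APIs return the same ten days in the same order. A stdlib sketch of a check to run on the two `date` columns first, assuming the formats shown in the outputs above ('2016-3-10' from Weather Underground, '2016-03-10 20:00:00+00' from OWM); `dates_aligned` is a hypothetical helper. The comparison uses OWM's UTC calendar date, which works here because 20:00 UTC falls around midday in San Francisco.

```python
from datetime import datetime


def dates_aligned(wu_dates, owm_dates):
    """Return True if both date lists cover the same calendar days in the same order.

    wu_dates look like '2016-3-10' (no zero padding); owm_dates look like
    '2016-03-10 20:00:00+00', so only their first 10 characters are parsed.
    """
    if len(wu_dates) != len(owm_dates):
        return False
    wu_days = [datetime.strptime(s, '%Y-%m-%d').date() for s in wu_dates]
    owm_days = [datetime.strptime(s[:10], '%Y-%m-%d').date() for s in owm_dates]
    return wu_days == owm_days
```

In the notebook this would be `assert dates_aligned(list(wu_forecast['date']), list(owm_forecast['date']))` before the `insert` calls.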


## Add in Features

[[back to top](#Sections)]

Still to do

- deltas


### Create end of week from dayofweek
- Friday, Saturday and Sunday are considered end_of_week; other days are not.

In [16]:
list_eow = ['friday', 'saturday', 'sunday']
forecast.loc[:,'end_of_week'] = forecast['dayofweek'].apply(lambda x : x in list_eow)
forecast['end_of_week'].value_counts()

True     5
False    5
Name: end_of_week, dtype: int64

In [17]:
# Alternative definition (Friday and Saturday only, no Sunday); not used, see the commented-out line in the next cell
def eow(s):
    return s in ('friday', 'saturday')

In [18]:
#forecast.loc[:,'end_of_week'] = forecast['dayofweek'].map(eow)


### Create 'winter' feature: True if the month is Nov, Dec or Jan

In [19]:
forecast.loc[:,'winter'] = forecast['month'].apply(lambda x : x in [1, 11, 12])

### Create rain from status field
- True if OWM's status for the day is 'Rain' (the Weather Underground 'events' field is not used here)

In [20]:
forecast.loc[:,'rain'] = forecast['owm_status'] == 'Rain'

### Sun time and Moon phase
- These features use the sf_sun_moon routine that I wrote with the ephem package (import ephem as ep).

In [21]:
forecast['sun_length'], forecast['moon_phase'] = zip(*forecast['date'].apply(sf_sun_moon))
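`sf_sun_moon` itself is not shown in this notebook. For readers without `ephem`, a rough stdlib sketch of the same two quantities, assuming (from the outputs below) that `sun_length` is day length in minutes and that `moon_phase` is the fraction of the lunar cycle since new moon, which fits its steady rise of about 0.034 per day. It swaps simple approximations in for ephem's ephemeris: the sunrise equation with a cosine fit for solar declination, and a mean synodic month counted from a reference new moon (2000-01-06 18:14 UTC). The latitude value and the names `sf_day_length_minutes` and `moon_phase_fraction` are hypothetical. Expect differences of several minutes (roughly 700 minutes for March 10, against 706 from sf_sun_moon).

```python
import math
from datetime import date, datetime

SF_LATITUDE = 37.7749         # degrees north (assumed value for San Francisco)
SYNODIC_MONTH = 29.530588853  # mean length of a lunar cycle, in days
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14)  # a known new moon, UTC


def sf_day_length_minutes(day, latitude=SF_LATITUDE):
    """Approximate minutes from sunrise to sunset for a datetime.date.

    Uses a cosine fit for the sun's declination and the sunrise equation
    with a -0.833 degree altitude (refraction plus the sun's radius).
    """
    n = day.timetuple().tm_yday
    decl = math.radians(-23.44 * math.cos(math.radians(360.0 / 365.0 * (n + 10))))
    lat = math.radians(latitude)
    cos_h = ((math.sin(math.radians(-0.833)) - math.sin(lat) * math.sin(decl))
             / (math.cos(lat) * math.cos(decl)))
    cos_h = max(-1.0, min(1.0, cos_h))  # clamp for polar day/night
    hour_angle = math.degrees(math.acos(cos_h))
    return 2.0 * hour_angle / 15.0 * 60.0  # degrees -> hours -> minutes


def moon_phase_fraction(when):
    """Fraction of the mean lunar cycle since new moon (0 = new, 0.5 = full).

    `when` is a naive datetime in UTC.
    """
    days = (when - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    return (days / SYNODIC_MONTH) % 1.0
```

For the notebook's date strings, parse first: `sf_day_length_minutes(datetime.strptime('2016-3-10', '%Y-%m-%d').date())`.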

### Sun Units

In [22]:
forecast['sun_units'] = forecast['sun_length'] * (1.0 - (forecast['cloud_cover']  * .01))
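Written out, `sun_units` scales the day length (in minutes) by the fraction of sky not covered by cloud. Using the first row of the output below (706.18 minutes of daylight, 92% cloud cover) as a worked example:

```latex
\text{sun\_units} = \text{sun\_length} \times \left(1 - \frac{\text{cloud\_cover}}{100}\right)
\qquad 706.18 \times (1 - 0.92) \approx 56.49
```

Days with 0% cloud cover keep the full `sun_length`, as March 17 to 19 do in the output below.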

### Dew points
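The following features are calculated with the pywws conversions package.
- Rule of thumb for reference (°F): DewpointTemp ≈ Temp - (9/25)(100 - Humidity); pywws itself uses the Magnus formula.
- Weather Underground does not return values for humidity max and min, so only the mean dew point is calculated.
- The pywws conversion functions expect temperature in °C and wind speed in m/s, but these columns are in °F and mph. The resulting values (e.g. a heat index of about 252 on a 58 °F day) are only meaningful as features if the training data was computed the same way.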

In [23]:
#pywws.conversions.dew_point(temp, hum)[source]
forecast.loc[:,'dew_point_mean'] = forecast.apply( lambda x : cv.dew_point(x['temp_mean'], x['humidity_mean']), axis=1)
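pywws's `dew_point` works in °C, so passing °F temperatures straight in does not give a dew point in °F. A minimal sketch of a Fahrenheit wrapper, assuming the Magnus-formula constants a = 17.27 and b = 237.7 °C (given 58 and 85 as if 58 were °C, they reproduce the 54.58 in the output below). `dew_point_c`, `dew_point_f` and the conversion helpers are hypothetical and avoid pywws so the block runs on its own. Whether to switch depends on how the training data's dew points were computed, since the two must match.

```python
import math


def f_to_c(temp_f):
    return (temp_f - 32.0) * 5.0 / 9.0


def c_to_f(temp_c):
    return temp_c * 9.0 / 5.0 + 32.0


def dew_point_c(temp_c, humidity):
    """Dew point in deg C from temperature (deg C) and relative humidity (%),
    via the Magnus formula with a = 17.27 and b = 237.7 deg C."""
    a, b = 17.27, 237.7
    gamma = (a * temp_c) / (b + temp_c) + math.log(humidity / 100.0)
    return (b * gamma) / (a - gamma)


def dew_point_f(temp_f, humidity):
    """Dew point in deg F: convert to deg C, apply Magnus, convert back."""
    return c_to_f(dew_point_c(f_to_c(temp_f), humidity))
```

Applied row by row as `forecast.apply(lambda x: dew_point_f(x['temp_mean'], x['humidity_mean']), axis=1)`, the first row (58 °F, 85%) comes out near 53.5 °F, close to the rule of thumb's 52.6 °F.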

### Wind Chill
- Temperature and wind speed

In [24]:
#pywws.conversions.wind_chill(temp, wind)
forecast.loc[:,'wind_chill_mean'] = forecast.apply( lambda x : cv.wind_chill(x['temp_mean'], x['wind_speed_mean']), axis=1)
forecast.loc[:,'wind_chill_max'] = forecast.apply( lambda x : cv.wind_chill(x['temp_max'], x['wind_speed_max']), axis=1)
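pywws's `wind_chill` also expects °C and m/s, and it returns the air temperature unchanged when it is too warm for wind chill to apply (58 read as °C certainly is), which is why `wind_chill_mean` equals `temp_mean` in the output below. A sketch of the US National Weather Service (2001) wind chill formula, which works directly in °F and mph, shown as a swapped-in alternative; `wind_chill_f` is a hypothetical helper, and its validity limits (temperature at most 50 °F, wind above 3 mph) follow the NWS definition.

```python
def wind_chill_f(temp_f, wind_mph):
    """Wind chill in deg F using the NWS 2001 formula.

    Defined only for temp_f <= 50 and wind_mph > 3; otherwise the air
    temperature is returned unchanged.
    """
    if temp_f > 50 or wind_mph <= 3:
        return temp_f
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v
```

For this forecast (54 to 66 °F) it also returns the air temperature, so wind chill carries little extra information for San Francisco in March either way.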

### Heat Index and Apparent Temperature
- Temperature and Humidity

In [25]:
#pywws.conversions.usaheatindex(temp, humidity, dew)
forecast.loc[:,'heat_index'] = forecast.apply( lambda x : cv.usaheatindex(x['temp_mean'], x['humidity_mean'], x['dew_point_mean']), axis=1)
#pywws.conversions.apparent_temp(temp, rh, wind)
forecast.loc[:,'temp_apparent'] = forecast.apply( lambda x : cv.apparent_temp(x['temp_mean'], x['humidity_mean'], x['wind_speed_mean']), axis=1)
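The same unit issue explains the heat index of about 252 and apparent temperature of about 98 for a 58 °F day in the output below: pywws treats 58 as °C and 10 as m/s. A sketch of both in Fahrenheit, assuming the NWS Rothfusz regression for the heat index (simplified: below 80 °F the air temperature is returned, and the NWS low/high-humidity adjustments are skipped) and the Australian Bureau of Meteorology apparent temperature AT = Ta + 0.33e - 0.70ws - 4.00, with Ta in °C, e the water vapour pressure in hPa and ws in m/s. `heat_index_f` and `apparent_temp_f` are hypothetical helpers.

```python
import math


def heat_index_f(temp_f, humidity):
    """Heat index in deg F via the NWS Rothfusz regression.

    Simplified: below 80 deg F the air temperature is returned, and the
    NWS low/high-humidity adjustments are not applied.
    """
    if temp_f < 80:
        return float(temp_f)
    t, r = float(temp_f), float(humidity)
    return (-42.379 + 2.04901523 * t + 10.14333127 * r
            - 0.22475541 * t * r - 0.00683783 * t * t
            - 0.05481717 * r * r + 0.00122874 * t * t * r
            + 0.00085282 * t * r * r - 0.00000199 * t * t * r * r)


def apparent_temp_f(temp_f, humidity, wind_mph):
    """Apparent temperature in deg F from the Australian BoM formula
    AT = Ta + 0.33e - 0.70ws - 4.00 (Ta in deg C, e in hPa, ws in m/s)."""
    ta = (temp_f - 32.0) * 5.0 / 9.0
    ws = wind_mph * 0.44704  # mph -> m/s
    e = humidity / 100.0 * 6.105 * math.exp(17.27 * ta / (237.7 + ta))
    at_c = ta + 0.33 * e - 0.70 * ws - 4.00
    return at_c * 9.0 / 5.0 + 32.0
```

For the first row (58 °F, 85% humidity, 10 mph) the apparent temperature comes out near 53.5 °F, and the heat index is simply 58 °F.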



In [26]:
forecast.head(10)

|  | date | dayofweek | day | month | year | temp_max | temp_mean | temp_min | temp_delta | sl_pressure_mean | ... | winter | rain | sun_length | moon_phase | sun_units | dew_point_mean | wind_chill_mean | wind_chill_max | heat_index | temp_apparent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016-3-10 | thursday | 10 | 3 | 2016 | 61 | 58 | 56 | 5 | 30.246102 | ... | False | True | 706.183333 | 0.044071 | 56.494667 | 54.57839 | 58 | 61 | 252.393389 | 97.670619 |
| 1 | 2016-3-11 | friday | 11 | 3 | 2016 | 58 | 55 | 53 | 5 | 29.914481 | ... | False | True | 708.6 | 0.07809 | 56.688 | 49.382408 | 55 | 58 | 191.059997 | 76.296685 |
| 2 | 2016-3-12 | saturday | 12 | 3 | 2016 | 58 | 56 | 55 | 3 | 30.274451 | ... | False | True | 711.016667 | 0.11211 | 142.203333 | 49.806332 | 56 | 58 | 194.677835 | 77.434017 |
| 3 | 2016-3-13 | sunday | 13 | 3 | 2016 | 59 | 56 | 54 | 5 | 30.214505 | ... | False | True | 713.433333 | 0.146129 | 57.074667 | 52.381465 | 56 | 59 | 226.087344 | 84.957533 |
| 4 | 2016-3-14 | monday | 14 | 3 | 2016 | 58 | 54 | 50 | 8 | 29.982104 | ... | False | True | 715.85 | 0.180148 | 207.5965 | 45.63432 | 54 | 58 | 155.188045 | 73.424115 |
| 5 | 2016-3-15 | tuesday | 15 | 3 | 2016 | 60 | 55 | 51 | 9 | 30.093137 | ... | False | True | 718.266667 | 0.214167 | 639.257333 | 44.05655 | 55 | 60 | 142.704708 | 73.289576 |
| 6 | 2016-3-16 | wednesday | 16 | 3 | 2016 | 65 | 59 | 54 | 11 | 29.964386 | ... | False | False | 720.7 | 0.248186 | 634.216 | 47.761145 | 59 | 65 | 172.401817 | 83.531465 |
| 7 | 2016-3-17 | thursday | 17 | 3 | 2016 | 66 | 60 | 55 | 11 | 29.862508 | ... | False | False | 723.116667 | 0.282205 | 723.116667 | 47.987308 | 60 | 66 | 174.038738 | 87.746292 |
| 8 | 2016-3-18 | friday | 18 | 3 | 2016 | 66 | 60 | 55 | 11 | 29.642509 | ... | False | False | 725.533333 | 0.316224 | 725.533333 | 48.686703 | 60 | 66 | 180.345707 | 89.055088 |
| 9 | 2016-3-19 | saturday | 19 | 3 | 2016 | 66 | 60 | 54 | 12 | 29.885836 | ... | False | False | 727.95 | 0.350243 | 727.95 | 50.02539 | 60 | 66 | 193.300625 | 90.27268 |


In [27]:
forecast.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 29 columns):
date                10 non-null object
dayofweek           10 non-null object
day                 10 non-null int64
month               10 non-null int64
year                10 non-null int64
temp_max            10 non-null int64
temp_mean           10 non-null int64
temp_min            10 non-null int64
temp_delta          10 non-null int64
sl_pressure_mean    10 non-null float64
humidity_mean       10 non-null int64
wind_speed_max      10 non-null int64
wind_speed_mean     10 non-null int64
cloud_cover         10 non-null int64
precipitation       10 non-null float64
events              10 non-null object
wind_direction      10 non-null int64
owm_status          10 non-null object
end_of_week         10 non-null bool
winter              10 non-null bool
rain                10 non-null bool
sun_length          10 non-null float64
moon_phase          10 non-null float64
sun_units      

## Write forecast data to file

[[back to top](#Sections)]

In [28]:
forecast.to_csv('forecast_10day.csv')