## Step 1 - `Weather` class

Begin by importing the `Weather` class. Weather data is accessed through an instance of the provided `Weather` class.

In [1]:
from Weather import *

## Step 2 - Load Weather data

Create a new object (e.g. `weather`) of type `Weather`. As part of the instantiation provide:

- `weatherFile`: The filename (either `basic.txt` or `advanced.txt`)
- `fileSlice`: The number of lines to read from the chosen input file (0 reads all lines)

Use `fileSlice` to limit the sample size for early evaluation.

In [2]:
weatherFile = 'data/basic.txt'
fileSlice = 0
weather = Weather(weatherFile, fileSlice)

## Step 3 - Inspect the Data

In [3]:
print '#'*50
print '# Step 3: Inspect the Data'
print '#'*50
print '\n'

# print data
print 'Weather Data:'
print weather.data

# print number of entries
print 'Number of entries: %s' % (weather.getNrEntries())

# print target names
print 'Number of targets: %s' % (weather.getNrTargets())

print 'Target names: %s' % (weather.getTargetNames())

# print features
print 'Number of features: %s' % (weather.getNrFeatures())

print 'Feature names: %s' % (weather.getFeatures())

# uncomment below to print station data
# print 'Number of weather stations: %s' % (weather.getNrStations())
# print 'Stations (ID, Name, Latitude, Longitude)'
# print weather.getStationData('all')

# Edinburgh and Shap station details
print 'Station data for EDINBURGH/GOGARBANK: %s' % (weather.getStationData('EDINBURGH/GOGARBANK'))
print 'Station data for ID 3225: %s' % (weather.getStationData('3225'))

# get data from one feature
print 'Temperature data: %s' % (weather.getFeatureData('Temperature'))

##################################################
# Step 3: Inspect the Data
##################################################


Weather Data:
[['3002' 'BALTASOUND' '15.0' ..., '9.9' '97.4' '2']
 ['3002' 'BALTASOUND' '15.0' ..., '10.1' '97.4' '2']
 ['3002' 'BALTASOUND' '15.0' ..., '10.3' '97.4' '1']
 ..., 
 ['99081' 'NORTH_WYKE' '177.0' ..., '6.1' '88.2' '1']
 ['99081' 'NORTH_WYKE' '177.0' ..., '3.5' '79.3' '0']
 ['99081' 'NORTH_WYKE' '177.0' ..., '3.8' '86.9' '1']]
Number of entries: 46706
Number of targets: 3
Target names: ['Clear' 'Cloudy' 'Precipitation']
Number of features: 17
Feature names: ['Station ID' 'Station Name' 'Elevation' 'Latitude' 'Longitude' 'Date'
 'Time since midnight' 'Gust' 'Temperature' 'Visibilty' 'Wind Direction'
 'Wind Speed' 'Pressure' 'Pressure Trend' 'Dew Point' 'Humidity'
 'Weather Type']
Station data for EDINBURGH/GOGARBANK: ['3166' 'EDINBURGH/GOGARBANK' '55.928' '-3.343']
Station data for ID 3225: ['3225' 'SHAP' '54.501' '-2.684']
Temperature data: ['1

## Step 4 - Recovering Incomplete Data

Some of the observation values have a value of `-99999`. This is a default value I inserted to indicate that the feature data was either not collected at the time of the observation or had a null value. 

Any data points that contain null observations need to be corrected to avoid problems with subsequent filtering and modifications.

In some cases null values can either be interpolated or set to a default value.

The large majority of null data comes from the `Gust` measurement. Here I assume that no observation is the same as a zero value.

In [4]:
# zero any null gust measurements
newG = ['0' if g == '-99999' else g for g in weather.getFeatureData('Gust')]
weather.modify('Gust', newG)
#print weather.getFeatureData('Gust')

0
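
For features where a zero default makes no sense (e.g. `Temperature`), interpolation is the other option mentioned above. The following is a minimal sketch of linear interpolation over a plain list of string values. The helper `interpolate_nulls` is hypothetical and not part of the `Weather` class, and it assumes the values belong to a single station and are already in time order.

```python
NULL = '-99999'  # sentinel used for missing observations in this walkthrough


def interpolate_nulls(values, null=NULL):
    """Linearly interpolate runs of null entries between known neighbours.

    Leading or trailing nulls have only one known neighbour, so they are
    left untouched for a later discard step. Values stay strings throughout.
    """
    result = list(values)
    known = [i for i, v in enumerate(result) if v != null]
    for left, right in zip(known, known[1:]):
        gap = right - left
        if gap < 2:
            continue  # adjacent known values, nothing to fill
        start, end = float(result[left]), float(result[right])
        for k in range(1, gap):
            result[left + k] = str(round(start + (end - start) * k / gap, 1))
    return result


# toy data, not taken from the weather files
temps = ['10.0', '-99999', '-99999', '13.0', '-99999']
filled = interpolate_nulls(temps)
```

The result could then be written back with `weather.modify('Temperature', filled)`, in the same way as the `Gust` example.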

## Step 5 - Removing Incomplete Data

After recovering any data, ensure you run the `discard()` method to remove any data with remaining null observations.

In [5]:
weather.discard()
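
The implementation of `discard()` is not shown here. As a rough illustration of the idea only, the sketch below drops every row that still contains the `-99999` sentinel. The function name `discard_incomplete` and the toy rows are assumptions made for this example.

```python
NULL = '-99999'  # sentinel used for missing observations in this walkthrough


def discard_incomplete(rows, null=NULL):
    """Keep only the rows in which no observation equals the null sentinel."""
    return [row for row in rows if null not in row]


# toy rows: (Station ID, Temperature, Humidity)
rows = [
    ['3002', '9.9', '97.4'],
    ['3002', '-99999', '97.4'],
    ['3166', '10.5', '-99999'],
    ['3166', '9.8', '88.0'],
]
kept = discard_incomplete(rows)
```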

## Step 6 - Data Conversion

Some of the features have observation values that will be difficult for a machine learning estimator to interpret correctly (e.g. Wind Direction).

You should ensure that all the features selected for a machine learning classification have a numeric value.

- In example 1 the Pressure Trend is changed from Falling, Static, Rising to -1, 0, 1.
- In example 2 the Wind Direction is changed to a 16 point index starting from direction NNE.

**Important**: Due to limitations of the `Weather` class, ensure that any observation data remains type `string` (e.g. store '1' **not** 1). The `export()` method will convert all the values from `string` to `float` just before the export.

In [6]:
# Example 1 - Enumerate Pressure Trend (-1 falling, 0 static, 1 rising)

# define types
pTType = ['F', 'S', 'R']

# generate new pressure trend values
newPT = [str(pTType.index(p) - 1) for p in weather.getFeatureData('Pressure Trend') ]

# modify dataset
weather.modify('Pressure Trend', newPT)

#print 'Pressure Trend: %s' % (weather.getFeatureData('Pressure Trend'))

# Example 2 - Enumerate Wind direction (use 16 point compass index)

# define types
compassRose = ['NNE', 'NE', 'ENE', 'E', 'ESE', 'SE', 'SSE', 'S', 'SSW', 'SW', 'WSW', 'W', 'WNW', 'NW', 'NNW', 'N']

# generate and modify Wind direction
# keep the index as a string, as required by the Weather class
weather.modify('Wind Direction', [str(compassRose.index(w)) for w in weather.getFeatureData('Wind Direction')])

#print 'Wind Direction: %s' % (weather.getFeatureData('Wind Direction'))


0
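
Both examples follow the same pattern: look up each category in an ordered list and store the resulting index as a string. A small helper can capture this. The sketch below is standalone, and `enumerate_feature` is a hypothetical name rather than a method of the `Weather` class.

```python
def enumerate_feature(values, categories, offset=0):
    """Map each categorical value to its index in `categories`, as a string.

    `offset` shifts the index (e.g. -1 to centre three categories on zero).
    An unknown category raises ValueError instead of being mapped silently.
    """
    lookup = dict((name, str(i + offset)) for i, name in enumerate(categories))
    try:
        return [lookup[v] for v in values]
    except KeyError as err:
        raise ValueError('unexpected category: %s' % err)


# Pressure Trend: F -> '-1', S -> '0', R -> '1'
trend = enumerate_feature(['F', 'S', 'R', 'F'], ['F', 'S', 'R'], offset=-1)
```

Using a dictionary avoids a linear `index()` search for every observation, and the explicit error makes an unexpected category in the input file easier to spot.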

## Step 7 - Data Extraction

The `getObservations()` method will enable you to filter the available data by Station ID, date, time and a selected feature.

This may be helpful if you want to build an additional input feature for classification based on contextual information.

The example below retrieves the temperature and dew point for Edinburgh for 24th October.

In [7]:
print '\n'
print '#'*50
print '# Step 7: Data Extraction'
print '#'*50
print '\n'

stationId = weather.getStationData('EDINBURGH/GOGARBANK')
features = ['Time since midnight', 'Temperature', 'Dew Point']
print 'Temperature and Dew Point measurements for Edinburgh 24th October'
print '(Time since midnight (min), Temperature, Dew Point)'
# stationId[0] holds the station ID ('3166' for EDINBURGH/GOGARBANK)
print weather.getObservations(stationId[0], obsDate='2017-10-24', features=features)



##################################################
# Step 7: Data Extraction
##################################################


Temperature and Dew Point measurements for Edinburgh 24th October
(Time since midnight (min), Temperature, Dew Point)
[['0' '10.5' '8.0']
 ['60' '9.8' '7.2']
 ['120' '9.9' '6.8']
 ['180' '9.7' '6.3']
 ['240' '10.9' '6.0']
 ['300' '11.8' '6.7']
 ['360' '11.1' '8.2']
 ['420' '10.8' '8.6']
 ['480' '11.7' '9.4']
 ['540' '12.1' '10.0']
 ['600' '12.9' '10.1']
 ['660' '13.5' '11.1']
 ['720' '13.9' '11.3']
 ['780' '14.1' '10.2']
 ['840' '13.6' '9.5']
 ['900' '13.3' '9.2']
 ['960' '12.7' '9.2']
 ['1020' '12.3' '8.8']
 ['1020' '12.3' '8.8']
 ['1080' '12.0' '8.6']
 ['1140' '12.1' '8.1']
 ['1200' '11.5' '6.5']
 ['1260' '11.0' '7.3']
 ['1320' '10.5' '7.6']
 ['1380' '10.2' '8.1']]


This can then be combined with location data. Here, the Pressure, Pressure Trend and Wind Direction from the nearest weather station 100km NW of Edinburgh for 24th October are shown.

In [8]:
stationId = weather.getStationData('EDINBURGH/GOGARBANK')

# get nearest stations to the point 100km NW (bearing -45) of Edinburgh station, within a 75km threshold
nearestStations = weather.findStations([stationId[2], stationId[3]], ['100', '-45'], maxThreshold=75)

print '\n'
print 'Nearest stations 100km NW of EDINBURGH/GOGARBANK'
for s in nearestStations:
    print s

# use nearest station (index 0)
nearStationId = nearestStations[0]

# get observations from nearest station on 24/10
obsDate='2017-10-24'
print '\n'
print 'Using station %s on %s' % (nearStationId[1], obsDate)
features = ['Time since midnight', 'Pressure', 'Pressure Trend', 'Wind Direction']
print '(Time since midnight (min), Pressure, Pressure Trend, Wind Direction)'
print weather.getObservations(nearStationId[0], obsDate=obsDate, features=features)



Nearest stations 100km NW of EDINBURGH/GOGARBANK
['3047', 'TULLOCH_BRIDGE', '56.867', '-4.708', 36.611322084864284]
['3144', 'STRATHALLAN', '56.326', '-3.729', 53.831339422981124]
['3134', 'GLASGOW/BISHOPTON', '55.907', '-4.533', 72.4878425483826]


Using station TULLOCH_BRIDGE on 2017-10-24
(Time since midnight (min), Pressure, Pressure Trend, Wind Direction)
[['0' '1009' '-1' '9']
 ['60' '1009' '-1' '7']
 ['120' '1008' '-1' '8']
 ['180' '1007' '-1' '7']
 ['240' '1006' '-1' '8']
 ['300' '1005' '-1' '1']
 ['360' '1004' '-1' '8']
 ['420' '1003' '-1' '7']
 ['480' '1003' '-1' '7']
 ['540' '1003' '-1' '10']
 ['600' '1004' '1' '12']
 ['660' '1004' '1' '9']
 ['720' '1005' '1' '9']
 ['780' '1005' '1' '9']
 ['840' '1006' '1' '8']
 ['900' '1006' '1' '8']
 ['960' '1006' '1' '8']
 ['1020' '1006' '1' '8']
 ['1020' '1006' '1' '8']
 ['1080' '1006' '1' '8']
 ['1140' '1006' '-1' '8']
 ['1200' '1005' '-1' '8']
 ['1260' '1005' '-1' '8']
 ['1320' '1005' '-1' '8']
 ['1380' '1005' '-1' '8']]
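
The internals of `findStations()` are not shown in this walkthrough. The sketch below is one way such a lookup could work, assuming a spherical Earth of radius 6371 km and a bearing measured in degrees clockwise from north (so -45 is NW). The functions `destination_point` and `haversine_km` are hypothetical helpers, and their results are not guaranteed to match the distances printed above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def destination_point(lat, lon, distance_km, bearing_deg):
    """Return (lat, lon) reached by travelling distance_km along bearing_deg."""
    lat1, lon1 = math.radians(lat), math.radians(lon)
    brg = math.radians(bearing_deg)
    ang = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    lat2 = math.asin(math.sin(lat1) * math.cos(ang) +
                     math.cos(lat1) * math.sin(ang) * math.cos(brg))
    lon2 = lon1 + math.atan2(math.sin(brg) * math.sin(ang) * math.cos(lat1),
                             math.cos(ang) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# station coordinates are stored as strings, so convert before use
edinburgh = (float('55.928'), float('-3.343'))
target = destination_point(edinburgh[0], edinburgh[1], 100.0, -45.0)
check = haversine_km(edinburgh[0], edinburgh[1], target[0], target[1])
```

Candidate stations could then be ranked by `haversine_km` from `target`, keeping those closer than the chosen threshold.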


## Step 8 - Add new features

You may get better insights into underlying patterns in the observations by using the provided data to generate new features.

An example using the wind direction is shown below. The direction *relative* to the North and to the West is generated and appended to the dataset.

In [9]:
# set relative direction (assume 16 points)
points = 16
# integer division keeps the indices whole; results are stored as strings
north = [str(abs((points // 2) - int(w)) % (points // 2)) for w in weather.getFeatureData('Wind Direction')]
west = [str(abs((points // 2) - int(w) - (points // 4)) % (points // 2)) for w in weather.getFeatureData('Wind Direction')]

# append to dataset
weather.append('Wind Relative North', north)
weather.append('Wind Relative West', west)

0
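
As an alternative way to express relative direction, the sketch below measures the shortest distance around the compass, in points, between an observation and a reference direction. This is a different formula from the one used in the cell above, not a reimplementation of it. It assumes the 16 point index from Step 6 (NNE is 0, N is 15), and `compass_distance` is a hypothetical helper.

```python
compass_rose = ['NNE', 'NE', 'ENE', 'E', 'ESE', 'SE', 'SSE', 'S',
                'SSW', 'SW', 'WSW', 'W', 'WNW', 'NW', 'NNW', 'N']


def compass_distance(index, reference, points=16):
    """Shortest number of compass points between two indices (0 to points/2)."""
    diff = abs(int(index) - int(reference)) % points
    return min(diff, points - diff)


north = compass_rose.index('N')
west = compass_rose.index('W')

# sample wind directions: N, NNE, S, W (stored as strings, as in the dataset)
sample = ['15', '0', '7', '11']
rel_north = [str(compass_distance(w, north)) for w in sample]
rel_west = [str(compass_distance(w, west)) for w in sample]
```

With this measure a value of 0 means the wind blows from the reference direction and 8 means it blows from the opposite one.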

## Step 9 - Select features

To finish, create an array of strings containing the subset of features you feel will perform best in the classification. Call the `select()` method to filter the data.

In [10]:
features = ['Temperature', 'Visibilty', 'Pressure', 'Pressure Trend', 'Humidity']
weather.select(features)

0

## Step 10 - Export data

Run the `export()` method to write the data of your selected features to file as a `pickle` object.

This will move the target data ('Weather Type') into a new variable (`target`).

**Note**: It is assumed that the *Station ID* and *Station Name* will not be used as features for classification, so they are automatically stripped. The `export()` method will also strip out incomplete data before exporting to file.

In [11]:
weather.export('data/mldata.p')

0
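
The layout of the exported file is not documented in this walkthrough. The sketch below only illustrates the two operations described above, converting strings to floats and moving the target column into its own variable, followed by a `pickle` round trip. The function `split_and_convert`, the dictionary keys `data` and `target`, and the temporary file name are all assumptions made for this example.

```python
import os
import pickle
import tempfile


def split_and_convert(rows, feature_names, target_name='Weather Type'):
    """Convert string rows to floats and separate the target column."""
    t = feature_names.index(target_name)
    data = [[float(v) for i, v in enumerate(row) if i != t] for row in rows]
    target = [float(row[t]) for row in rows]
    return data, target


names = ['Dew Point', 'Humidity', 'Weather Type']
rows = [['9.9', '97.4', '2'], ['6.1', '88.2', '1']]
data, target = split_and_convert(rows, names)

# write to a temporary file and read it back (hypothetical file layout)
path = os.path.join(tempfile.mkdtemp(), 'mldata_sketch.p')
with open(path, 'wb') as handle:
    pickle.dump({'data': data, 'target': target}, handle)
with open(path, 'rb') as handle:
    restored = pickle.load(handle)
```

Before loading the real `data/mldata.p`, inspect the object returned by `pickle.load()` to confirm its actual structure.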