Handle diffuse environmental sensor data #41
Playing around with ideas for an Environment class:

class Environment(object):
    """Store and process environmental data from multiple SensorStations.

    Attributes
    ----------
    sensor_stations : dict of SensorStations
        Local store of environmental sensor data.
        Each key is a (latitude, longitude) tuple representing the location of
        the sensor station, where a `sensor station` is a set of sensors all
        installed at the same physical location, either by a national
        meteorological office or by interested individuals.
        Each value is a SensorStation storing environmental data using
        standard physical units and standard column names, e.g.
        'temperature' | 'rainfall' | 'sunshine'

    targets : list
        A list of tuples representing target locations and measurements in the
        form (<latitude>, <longitude>, <measurements>).

    Usage
    -----
    > london = (51.51125, -0.10849)  # latitude, longitude
    > enviro = Environment()
    > enviro.register_target(london, ['temperature'])

    enviro will now asynchronously find suitable data. We can help by
    loading some data from disk:

    > enviro.load_sensor_station_from_disk('heathrow_metoffice.h5')

    Now we can get data:

    > data, distance = enviro.get_estimate_measurement_at(london)
    """

    def register_target(self, target_location, measurements,
                        start=None, end=None, k=3):
        """Register a target of interest.

        This function returns immediately. Asynchronously, `Environment` will
        search thingful.net for sensors near the target's geo location.

        Arguments
        ---------
        target_location : (latitude, longitude) pair
        measurements : list of strings
            'temperature' | 'rainfall' | 'sunshine' | etc...
        start, end : DateTime, optional
            Specify the start and end time of interest.
        k : int, optional
            The maximum number of nearest `sensor_stations` to
            use in the interpolation.
        """
        return

    def get_estimate_measurement_at(self, target_location, measurement=None,
                                    start=None, end=None):
        """Get a timeseries of sensor data nearest target_location.

        This function finds the `k` nearest `sensor_stations` to
        `target_location` and interpolates readings for a specific
        `measurement` over the period between `start` and `end`. It searches
        local data stored in `sensor_stations` first; if it can't find suitable
        data then it queries known sources of data from the network and tries
        to find new sources using thingful.net (an index of public IoT sensors).

        Arguments
        ---------
        target_location : (latitude, longitude) pair
        measurement : string
            'temperature' | 'rainfall' | 'sunshine' | etc...
        start, end : DateTime, optional
            Specify the start and end time of interest.

        Returns
        -------
        data, distance
        data : pandas.Series
            The interpolated data.
        distance : float
            The distance (in km) from `target_location` to the
            nearest `sensor_station`.
        """
        # Could use geopy to measure distances
        # https://github.com/geopy/geopy

        # To start with, let's just use simple linear interpolation.
        # Later down the line (perhaps as an MSc project?) we could
        # explore more sophisticated interpolation, e.g.:
        #
        # Osborne, Michael A., Roberts, Stephen J., Rogers, Alex and
        # Jennings, Nicholas R. (2012) Real-time information
        # processing of environmental sensor network data using
        # Bayesian Gaussian processes. ACM Transactions on Sensor
        # Networks, 9, (1), 1:1-1:32. (doi:10.1145/2379799.2379800)
        #
        # See this twitter conversation:
        # https://twitter.com/acr_ecs/status/410891244983058433
        return

    def load_sensor_station_from_disk(self, filename):
        return

    def plot_sensor_stations_on_map(self):
        # Could use basemap:
        # http://matplotlib.org/basemap/
        return

    def update_sensor_data_from_network(self):
        return

class SensorStation(object):
    """Store and process environmental data from a sensor station.

    Attributes
    ----------
    data : DataFrame
        Store environmental data using
        standard physical units and standard column names, e.g.
        'temperature' | 'rainfall' | 'sunshine'
    source_url : string
        Local or network uniform resource locator for source of data.
    last_updated : DateTime
        The date and time we last pulled data from `source_url`.
    measurements_available_from_source : list of strings
    """

    def update_from_source(self):
        return

    def save_to_disk(self):
        return

    def load_from_disk(self, filename):
        return

class UKMetOffice(SensorStation):
    def load_native_from_disk(self, filename):
        """Load and convert UK Met Office CSV files."""

For reference, here's my (ugly) code from PDA for importing data from a UK Met Office .xls file. For converting between

Thoughts?
I've just updated the code sketch above. Some new features:
I've spoken to some folks about the idea of having a web service which aggregates live, public, environmental sensor data (e.g. from Xively, national met offices, smartphones etc). Users would be able to query the service to ask “give me <measurement> for <location> over <period> at <resolution>”, e.g. “give me temperature for SE15 over the last month at hourly resolution”.
The service would find the K nearest measurement stations (using thingful.net) to the target location and then interpolate spatially and temporally to produce the output the user wants. The service would take data from as many heterogeneous sources as possible and handle dodgy input data. It might also use simple models to make the spatial and temporal interpolation vaguely smart.
I might propose this project as an MSc individual project over summer.
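To make the "K nearest stations, then interpolate spatially" step concrete, here is a rough sketch. It is only an illustration under my own assumptions: `haversine_km` and `estimate_at` are hypothetical helper names, each station holds a single reading rather than a timeseries, the station coordinates and temperatures are made up, and the spatial blend is plain inverse-distance weighting (a stand-in I chose, not something settled in this thread). Temporal interpolation is left out entirely.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def estimate_at(target, stations, k=3):
    """Blend readings from the k stations nearest to `target`.

    `stations` maps (latitude, longitude) -> a single reading (float).
    Returns (estimate, distance in km to the nearest station).
    """
    if not stations:
        raise ValueError("no stations to interpolate from")
    # Sort by distance and keep at most k stations.
    nearest = sorted(
        (haversine_km(target, loc), value) for loc, value in stations.items()
    )[:k]
    nearest_km = nearest[0][0]
    if nearest_km == 0.0:
        # A station sits exactly on the target: use its reading directly.
        return nearest[0][1], 0.0
    # Inverse-distance weighting: closer stations count for more.
    weights = [1.0 / d for d, _ in nearest]
    estimate = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return estimate, nearest_km


# Made-up readings (degrees C) at illustrative station locations.
london = (51.51125, -0.10849)
stations = {
    (51.4700, -0.4543): 11.2,
    (51.4800, 0.0000): 12.1,
    (51.5600, -0.2800): 11.6,
}
temperature, distance = estimate_at(london, stations, k=3)
```

Returning the distance to the nearest station alongside the estimate mirrors the `data, distance` pair in the class sketch, so a caller can decide whether the estimate is close enough to trust.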
In my own work, I'm interested in using, for example, weather data recorded from the local metoffice weather station to improve disaggregation performance.
The question is: how do we store this diffuse environmental sensor data? It feels like this data doesn't belong in a `Building`. And, of course, we were scratching our heads a little over how to represent external data in buildings (#12). I wonder if we should have a new class for environment data (would we call this class 'Environment' or 'External' or 'ExternalSensors' or 'Weather' or something else?).
Some use cases:
Some data sources:
For the third option: I'd propose that we don't store any external environmental data inside `Building`. Instead, if a dataset happens to provide external environmental data recorded at the same geo location as a building, then I'd propose that we put that environmental data into our `Environment` object (tagged with the geo location of the building) and then provide a reference from the `building` to the environment object.
I guess our Environment class would need to store:
Then we could pass this environment object into our `disaggregator.train()` and `disaggregator.disaggregate()` methods, as well as various `nilmtk.stats` functions.
What do you think?!
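To illustrate the "store it in the environment, reference it from the building" idea, here is a minimal sketch. Everything in it is hypothetical: `BuildingStub` is a stand-in and not nilmtk's real `Building`, `add_station`, `station_at` and `external_data` are names I made up, and the temperature readings are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    # Maps (latitude, longitude) -> {measurement name: list of readings}.
    sensor_stations: dict = field(default_factory=dict)

    def add_station(self, location, data):
        """Store data tagged with the geo location it was recorded at."""
        self.sensor_stations[location] = data

    def station_at(self, location):
        """Return the data recorded exactly at `location`, or None."""
        return self.sensor_stations.get(location)


@dataclass
class BuildingStub:
    """Hypothetical stand-in for a building; holds no environmental data itself."""
    location: tuple
    environment: Environment  # a reference to the shared object, not a copy

    def external_data(self):
        # Look the data up in the shared environment by geo location.
        return self.environment.station_at(self.location)


env = Environment()
home = (51.51125, -0.10849)
env.add_station(home, {"temperature": [11.5, 12.0]})  # made-up readings
building = BuildingStub(location=home, environment=env)
```

Because the building only holds a reference, the same `env` object could be handed to several buildings, or passed on its own to whatever consumes it.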