
Handle diffuse environmental sensor data #41

Closed
JackKelly opened this issue Dec 10, 2013 · 2 comments

@JackKelly (Contributor)

In my own work, I'm interested in using, for example, weather data recorded from the local metoffice weather station to improve disaggregation performance.

The question is: how do we store this diffuse environmental sensor data? It feels like this data doesn't belong in a Building. And, of course, we were already scratching our heads a little over how to represent external data in buildings (#12).

I wonder if we should have a new class for Environment data (would we call this class 'Environment' or 'External' or 'ExternalSensors' or 'Weather' or something else?).

Some use cases:

  1. Generate correlations of appliance activity against weather variables (e.g. see Jack's UKERC 2013 poster for some examples)
  2. Improve disaggregation performance by using weather data

Some data sources:

  1. weather data from national weather service
  2. crowd-sourced environmental data
  3. external weather data which happens to be recorded alongside a building's power dataset

For the third option: I'd propose that we don't store any external environmental data inside Building. Instead, if a dataset happens to provide external environmental data recorded at the same geo location as a building, then I'd propose that we put that environmental data into our Environment object (tagged with the geo location of the building) and then provide a reference from the building to the environmental object.

I guess our Environment class would need to store:

  • a collection (list? dict?) of sub-objects. Each sub-object represents a sensing installation and would need to store the geo-location of the sensing station and timeseries for the sensor data.

Then we could pass this environment object into our disaggregator.train() and disaggregator.disaggregate() methods, as well as various nilmtk.stats functions.
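As a rough illustration of that structure (all class and method names here are made up for the sketch, not an actual nilmtk API), the container could simply map (latitude, longitude) tuples to per-station data, with a nearest-station lookup based on great-circle distance:

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))


class Environment:
    """Hold sensor stations keyed by (latitude, longitude)."""

    def __init__(self):
        # (lat, lon) tuple -> {measurement name: timeseries}
        self.sensor_stations = {}

    def add_station(self, location, data):
        self.sensor_stations[location] = data

    def nearest_station(self, target_location):
        """Return (location, distance_km) of the closest station."""
        loc = min(self.sensor_stations,
                  key=lambda s: haversine_km(s, target_location))
        return loc, haversine_km(loc, target_location)


london = (51.51125, -0.10849)
heathrow = (51.4700, -0.4543)
enviro = Environment()
enviro.add_station(heathrow, {'temperature': [10.1, 9.8]})
loc, dist = enviro.nearest_station(london)  # dist is roughly 24 km
```

The hand-rolled haversine is just a stand-in; in a real implementation a library like geopy could handle the distance calculations.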

What do you think?!

@JackKelly (Contributor, Author)

Playing around with ideas for an Environment class:

class Environment(object):
    """Store and process environmental data from multiple SensorStations.

    Attributes
    ----------
    sensor_stations : dict of SensorStations
        Local store of environmental sensor data.
        Each key is a (latitude, longitude) tuple representing the location of
        the sensor station, where a `sensor station` is a set of sensors all
        installed at the same physical location, either by a national
        meteorological office or by interested individuals.
        Each value is a SensorStation storing environmental data using
        standard physical units and standard column names.  e.g.
        'temperature' | 'rainfall' | 'sunshine'

    targets : list
        A list of tuples representing target locations and measurements in the
        form (<latitude>, <longitude>, <measurements>).


    Usage
    -----

    >>> london = (51.51125, -0.10849)  # latitude, longitude
    >>> enviro = Environment()
    >>> enviro.register_target(london, ['temperature'])

    enviro will now asynchronously find suitable data.  We can help by
    loading some data from disk:

    >>> enviro.load_sensor_station_from_disk('heathrow_metoffice.h5')

    Now we can get data:

    >>> data, distance = enviro.estimate_measurement_at(london)

    """

    def register_target(self, target_location, measurements,
                        start=None, end=None, k=3):
        """Register a target of interest.

        This function returns immediately.  Asynchronously, `Environment` will
        search thingful.net for sensors near the target's geo location.

        Arguments
        ---------
        target_location : (latitude, longitude) pair

        measurements : list of strings
           'temperature' | 'rainfall' | 'sunshine' | etc...

        start, end : DateTime, optional
            Specify the start and end time of interest

        k : int, optional
            The maximum number of nearest `sensor_stations` to 
            use in the interpolation.

        """
        return

    def estimate_measurement_at(self, target_location, measurement=None,
                                start=None, end=None):
        """Get a timeseries of sensor data nearest target_location.

        This function finds the `k` nearest `sensor_stations` to
        `target_location` and interpolates readings for a
        specific `measurement` and a specific time `period`.  It searches local
        data stored in `sensor_stations` first; if it can't find suitable data
        then it queries known sources of data from the network and tries to
        find new sources using thingful.net (an index of public IoT sensors).

        Arguments
        ---------
        target_location: (latitude, longitude) pair

        measurement : string
           'temperature' | 'rainfall' | 'sunshine' | etc...

        start, end : DateTime, optional
            Specify the start and end time of interest

        Returns
        -------
        data, distance

        data : pandas.Series
            The interpolated data.

        distance : float
            The distance (in km) from `target_location` to the
            nearest `sensor_station`.
        """

        # Could use geopy to measure distances
        # https://github.com/geopy/geopy

        # To start with, let's just use simple linear interpolation.
        # Later down the line (perhaps as an MSc project?) we could
        # explore more sophisticated interpolation, e.g.:
        #
        # Osborne, Michael A., Roberts, Stephen J., Rogers, Alex and
        # Jennings, Nicholas R. (2012) Real-time information
        # processing of environmental sensor network data using
        # Bayesian Gaussian processes. ACM Transactions on Sensor
        # Networks, 9, (1), 1:1-1:32. (doi:10.1145/2379799.2379800)
        #
        # See this twitter conversation:
        # https://twitter.com/acr_ecs/status/410891244983058433

        return

    def load_sensor_station_from_disk(self, filename):
        return

    def plot_sensor_stations_on_map(self):
        # Could use basemap:
        # http://matplotlib.org/basemap/
        return

    def update_sensor_data_from_network(self):
        return


class SensorStation(object):
    """Store and process environmental data from a sensor station

    Attributes
    ----------
    data : DataFrame
        Store environmental data using
        standard physical units and standard column names.  e.g.
        'temperature' | 'rainfall' | 'sunshine'

    source_url : string
        Local or network uniform resource locator for source of data

    last_updated: DateTime
        The date and time we last pulled data from `source_url`

    measurements_available_from_source : list of strings
    """

    def update_from_source(self):
        return

    def save_to_disk(self):
        return

    def load_from_disk(self, filename):
        return


class UKMetOffice(SensorStation):
    def load_native_from_disk(self, filename):
        """Loading and convert UK Met Office CSV files."""

For reference, here's my (ugly) code from PDA for importing data from a UK metoffice .xls file.

For converting between (latitude, longitude) pairs and human-readable addresses, and for calculating distances between two points, we could use geopy: "geopy makes it easy for Python developers to locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources." It can also do distance calculations, is MIT licensed, and is on PyPI.

Thoughts?

@JackKelly (Contributor, Author)

I've just updated the code sketch above. Some new features:

  • usage example!
  • users can now register targets of interest and the system will asynchronously find nearby sensor stations from thingful.net
  • new SensorStation class

I've spoken to some folks about the idea of having a web service which aggregates live, public, environmental sensor data (e.g. from Xively, national met offices, smart phones etc). Users would be able to query the service to ask "give me <measurement> for <location> over <period> at <resolution>", e.g. "give me temperature for SE15 over the last month at hourly resolution". The service would find the k nearest measurement stations (using thingful.net) to the target location and then interpolate spatially and temporally to produce the output the user wants. The service would take data from as many heterogeneous sources as possible and handle dodgy input data. It might also use simple models to make the spatial and temporal interpolation vaguely smart.
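On the temporal side of that interpolation, the simplest possible resampling is bucketing raw readings into hourly means. A pure-Python sketch (in practice pandas' `resample` would do this properly):

```python
from datetime import datetime


def hourly_means(readings):
    """Average (timestamp, value) readings into hourly buckets."""
    buckets = {}
    for ts, value in readings:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets.setdefault(hour, []).append(value)
    return {hour: sum(vals) / len(vals)
            for hour, vals in sorted(buckets.items())}


readings = [
    (datetime(2013, 12, 10, 9, 5), 8.0),
    (datetime(2013, 12, 10, 9, 35), 10.0),
    (datetime(2013, 12, 10, 10, 15), 7.0),
]
means = hourly_means(readings)
# Two hourly buckets: 09:00 -> 9.0, 10:00 -> 7.0
```

Real sources would need much more than this (gap handling, outlier rejection, timezone care), but the same bucket-then-aggregate shape underlies whatever the service ends up doing.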

I might propose this project as an MSc individual project over summer.
