Marco Bonvini edited this page Feb 27, 2016 · 5 revisions

Analyze energy data with Pandas

In this example I'd like to show something that can help with a common problem: making sense of a dataset that contains multiple time series of the same type. I'll show how to solve it using a data structure provided by Pandas, the Panel.

Pandas data structures

Before talking about the Panel I'd like to take a few steps back and briefly describe the other common Pandas data structures on which the Panel builds.

First things first: since we deal with time series, we have the Series. A Series is an object that contains an array of values with an index associated with them. For a time series the index contains date and time values, but in general it can hold other labels too, such as integers or strings.
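A minimal sketch of a time series Series, using made-up power readings at the two-minute resolution used later in this page. It looks values up by timestamp and, for contrast, by plain string labels:

```python
import pandas as pd

# Made-up power readings (W), one every two minutes.
index = pd.date_range("2016-01-09 00:00", periods=4, freq="2min")
power = pd.Series([92.2, 95.0, 110.5, 88.1], index=index, name="Power")

print(power)

# Lookups go through the index labels (timestamps), not positions.
print(power.loc[pd.Timestamp("2016-01-09 00:02")])

# The index does not have to hold dates: other hashable labels work too.
by_house = pd.Series([92.2, 310.0], index=["house1", "house2"])
print(by_house["house2"])
```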

Second, we have the DataFrame. If you think of the Series as a single column, the DataFrame is a table: an object that groups multiple Series sharing the same index. The DataFrame is one of the most common data structures in Pandas. It lets you work as if you had a very fast in-memory database, which makes it a great fit for analyzing data sets of up to a few GB.
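The sketch below, again with made-up values, builds a DataFrame from three Series that share the same time index; each Series becomes a column named after its dictionary key:

```python
import pandas as pd

index = pd.date_range("2016-01-09 00:00", periods=3, freq="2min")

# Three Series sharing the same time index (made-up values).
vrms = pd.Series([122.9, 122.7, 123.1], index=index)
irms = pd.Series([0.75, 0.78, 0.90], index=index)
power = pd.Series([92.2, 95.7, 110.8], index=index)

# The DataFrame aligns them on the shared index as columns of one table.
house = pd.DataFrame({"Vrms": vrms, "Irms": irms, "Power": power})

print(house)
print(house["Power"].max())  # a whole column is a Series again
```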

Now we're ready for the Panel. The Panel is the natural extension of the DataFrame and can be seen as a 3D table, or a collection of multiple DataFrames.
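The Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so a snippet for current pandas cannot use it directly. A minimal sketch of the same "collection of DataFrames" idea, assuming made-up data for two hypothetical houses, keeps the frames in a dict and stacks them into one DataFrame whose columns carry an extra level:

```python
import pandas as pd

index = pd.date_range("2016-01-09 00:00", periods=3, freq="2min")

# One DataFrame per house (made-up values), all with the same index and columns.
frames = {
    "house1": pd.DataFrame({"Vrms": [122.9, 122.8, 122.7],
                            "Power": [92.2, 95.7, 99.1]}, index=index),
    "house2": pd.DataFrame({"Vrms": [121.5, 121.6, 121.4],
                            "Power": [310.0, 305.2, 298.4]}, index=index),
}

# Concatenating along the columns turns the collection into one table whose
# columns form a two-level MultiIndex: (house, variable).
panel_like = pd.concat(frames, axis=1, names=["house", "variable"])

print(panel_like)
print(panel_like.shape)  # (3, 4): 3 timestamps x (2 houses * 2 variables)
```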

[Figure: Power consumption by the houses]

Using Pandas' Panel

I'd like to show how to use a Panel to quickly visualize and explore a data set. Given my recent involvement with energy metering solutions, I'll take an example from the electrical domain.

Imagine we're monitoring five different houses, and for every house we have a data set that contains voltage, current, and power with a time resolution of two minutes. We're given five CSV files containing the data, and every file looks like this:

date,Vrms,Irms,Power
Sat Jan 09 2016 00:00:00 GMT-0800 (PST),122.89474233,0.751786349902,92.2478478392
Sat Jan 09 2016 00:02:00 GMT-0800 (PST),122.89474233,0.751786349902,92.2478478392
Sat Jan 09 2016 00:04:00 GMT-0800 (PST),122.89474233,0.751786349902,92.2478478392
Sat Jan 09 2016 00:06:00 GMT-0800 (PST),122.89474233,0.751786349902,92.2478478392
...

and is named house<#house>.csv, where <#house> is the number of the house.
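The date column appears to use the format of JavaScript's Date.toString(). One way to parse it with pandas, sketched here on two rows of the sample held in an in-memory buffer (in practice you would pass the file path to read_csv), is to drop the trailing time zone label in parentheses and let to_datetime read the numeric UTC offset with %z:

```python
import io

import pandas as pd

# Two rows copied from the sample above, standing in for a house<#house>.csv file.
sample = io.StringIO(
    "date,Vrms,Irms,Power\n"
    "Sat Jan 09 2016 00:00:00 GMT-0800 (PST),122.89474233,0.751786349902,92.2478478392\n"
    "Sat Jan 09 2016 00:02:00 GMT-0800 (PST),122.89474233,0.751786349902,92.2478478392\n"
)

df = pd.read_csv(sample)

# Strip the " (PST)" suffix, then parse the rest; %z keeps the -0800 offset,
# so the resulting timestamps are time zone aware.
stamps = df["date"].str.replace(r"\s*\(.*\)$", "", regex=True)
df["date"] = pd.to_datetime(stamps, format="%a %b %d %Y %H:%M:%S GMT%z")
df = df.set_index("date")

print(df)
```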

This is quite a common situation: we have N data sets containing time series with a homogeneous structure. If you think about it, we have to deal with three dimensions, and the Panel seems just the right data structure for this kind of job. The dimensions we'll consider are:

  1. the different houses (items in Panel lingo)
  2. the time index (major_axis in Panel lingo)
  3. the measured values: voltage, current, and power (minor_axis in Panel lingo)

Organizing the data in a Panel makes it easy to look at different variables for an individual house, or to compare the same variable across different houses. Notably, all of this takes just a few lines of code. This is one of the advantages of using Pandas!
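Because the Panel has since been removed from pandas (in version 0.25), a common replacement is a single DataFrame with two column levels. The sketch below maps the three Panel axes onto it, using randomly generated values as a hypothetical stand-in for the five CSV files, and then takes the two views described above: all variables for one house, and one variable across all houses.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for house1.csv ... house5.csv: one hour of
# 2-minute samples per house, filled with random values.
rng = np.random.default_rng(0)
index = pd.date_range("2016-01-09 00:00", periods=30, freq="2min", name="time")
houses = {
    f"house{i}": pd.DataFrame(
        {
            "Vrms": rng.normal(122.0, 1.0, len(index)),
            "Irms": rng.uniform(0.5, 10.0, len(index)),
            "Power": rng.uniform(50.0, 1200.0, len(index)),
        },
        index=index,
    )
    for i in range(1, 6)
}

# Panel items -> first column level, major_axis -> row index,
# minor_axis -> second column level.
data = pd.concat(houses, axis=1, names=["house", "variable"])

# Different variables for an individual house (like panel["house3"]).
house3 = data["house3"]

# The same variable compared across houses: one column per house.
power = data.xs("Power", axis=1, level="variable")

print(house3.head())
print(power.mean())  # average power of each house over the hour
```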

About lambda and map

A few "Pythonic" notes before diving into the script. I'll make use of two concepts that come in handy: lambda functions and the built-in map function.

Let's start with an example of a lambda function:

f = lambda x: x+1

f is a lambda function that is equivalent to

def f(x):
  return x+1

So lambda functions are just functions, but they are more convenient to declare: they fit on a single line and don't need a def statement. The trade-off is that a lambda body is limited to a single expression.

Now let's look at map. The map function applies a function to every element of an iterable object (e.g., an array or a list) and returns the results. Imagine you have a list of power measurements in W and you want to convert them to kW; you can do it with map:

values_W = [1000.0, 1200.0, 3050.0]
# In Python 3, map returns an iterator, so wrap it in list() to get a list
values_kW = list(map(lambda x: x/1000.0, values_W))

the result is

values_kW = [1.0, 1.2, 3.05]

Every value of the original list has been mapped through the lambda function to a new value (watts to kilowatts).
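When the values live in a pandas Series instead of a list, the same conversion can be written with Series.map, or more idiomatically with vectorized arithmetic, which divides the whole array in one operation:

```python
import pandas as pd

values_W = pd.Series([1000.0, 1200.0, 3050.0])

# Element-wise, like the built-in map: the lambda is called once per value.
values_kW_map = values_W.map(lambda x: x / 1000.0)

# Vectorized arithmetic gives the same result without a Python-level loop.
values_kW = values_W / 1000.0

print(values_kW.tolist())  # [1.0, 1.2, 3.05]
```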

Code and results

Now I suggest looking at the script to see how the data are processed. Below are a few of the images created by the script.

Power consumption of the houses

This plot shows the power consumption of the different houses over the time period being analyzed.

[Figure: Power consumption by the houses]

Energy used by the houses

This plot shows the energy used by the different houses over the time period being analyzed.

[Figure: Energy used by the houses]
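This page does not spell out how the script computes energy; one simple approach, sketched here with made-up readings, assumes each power sample holds constant over its two-minute interval, so each sample contributes P * (2/60) h of energy:

```python
import pandas as pd

# Hypothetical power readings (W) for two houses, one sample every 2 minutes.
index = pd.date_range("2016-01-09 00:00", periods=4, freq="2min")
power = pd.DataFrame(
    {"house1": [600.0, 600.0, 1200.0, 1200.0],
     "house2": [300.0, 300.0, 300.0, 300.0]},
    index=index,
)

# Assume each reading holds for its whole 2-minute interval (rectangle rule):
# energy per sample [kWh] = P [W] * dt [h] / 1000.
dt_hours = pd.Timedelta(minutes=2) / pd.Timedelta(hours=1)
energy_kWh = power * dt_hours / 1000.0

# Running total over time, and the total for the whole period.
cumulative_kWh = energy_kWh.cumsum()
total_kWh = energy_kWh.sum()

print(cumulative_kWh)
print(total_kWh)  # house1 ~0.12 kWh, house2 ~0.04 kWh
```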

Power consumption per hour of the day

This plot shows the power consumption grouped by hour of the day. Such a plot can help identify usage patterns, or the hours when the loads are generally higher.

[Figure: Power consumption per hour]
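A minimal sketch of the grouping behind such a plot, using two days of synthetic readings with an artificial evening peak: every sample is grouped by the hour of its timestamp and each group is averaged.

```python
import numpy as np
import pandas as pd

# Hypothetical: two days of 2-minute power readings (W) for one house,
# 1500 W between 18:00 and 22:00 and 200 W otherwise.
index = pd.date_range("2016-01-09 00:00", periods=2 * 24 * 30, freq="2min")
evening = (index.hour >= 18) & (index.hour < 22)
power = pd.Series(np.where(evening, 1500.0, 200.0), index=index, name="Power")

# Group every reading by its hour of the day (0-23) and average each group.
by_hour = power.groupby(power.index.hour).mean()

print(by_hour)
print(by_hour.idxmax())  # first hour with the highest average load
```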

Load duration curve

The load duration curve shows the power readings sorted from highest to lowest, which reveals for how much of the time the power consumption stays at or above a given level. Such a plot helps identify which operating conditions should be improved to obtain substantial savings.

[Figure: Load duration curve]
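A load duration curve can be computed by sorting the readings in descending order and pairing each value with the share of time the load is at or above it. The sketch below assumes evenly spaced samples and uses made-up values; multiplying the fractions by the total duration (number of samples times 2 minutes) would give hours instead.

```python
import numpy as np
import pandas as pd

# Hypothetical 2-minute power readings (W) for one house.
power = pd.Series([200.0, 1500.0, 300.0, 200.0, 800.0, 200.0, 1500.0, 300.0])

# Sort from the highest to the lowest reading, discarding the time order.
sorted_power = power.sort_values(ascending=False).reset_index(drop=True)

# Position k (0-based) means at least k+1 of the n readings are at or above
# that value; with evenly spaced samples, (k+1)/n is that share of time.
n = len(sorted_power)
fraction_of_time = np.arange(1, n + 1) / n
ldc = pd.Series(sorted_power.to_numpy(), index=fraction_of_time, name="Power")

print(ldc)
print((power >= 800.0).mean())  # share of time the load was at least 800 W
```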