# Introduction

This notebook provides the minimum code necessary to access data through the eGauge API, the software currently used in both DHHL and the Frogs Project. For an overview of the architecture, see the readme.md file of the project where this notebook is hosted.



## 1. Required Input Parameters

### 1.1 Host

Currently, the eGauge API is [hosted by the eGauge company](https://www.egauge.net/eguard/). Contrary to the usual setup, where the data resides in a database and an API server exposes access to it, the eGauge API does **not** store any eGauge data. Instead, the data is stored in the sensor itself. The eGauge API serves only as an intermediate layer between this Python notebook and the sensor, abstracting away how to communicate with the sensor directly. Because of that, **the sensors must be connected to the internet, regardless of where they are deployed, for the data to be available via the API at any point in time**.

While the eGauge website requires authentication to view the sensor IDs (required to request data from the API) and their physical locations, access to the data itself does not require authentication once the sensor ID is known. As such, any user in the world who knows the sensor ID can request data from the sensor via the API.

The API website where the sensor IDs are available is shown below for reproducibility purposes:


![egauge_portfolio](img/egauge_portfolio.png)

As the **Status** and **Available** columns show, because the data is available only in the sensor, automated requests to a particular sensor may need several attempts. Accordingly, the time range specified in the request must be updated after every failed attempt.
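
The retry behaviour described above could be sketched as follows. This is a minimal illustration and not part of the original notebook: `fetch_with_retry`, `fetch`, and `make_window` are hypothetical names, and `fetch` stands in for whatever function performs the actual HTTP request. The time window is recomputed before each attempt, so a retry never reuses a stale range.

```python
import time

def fetch_with_retry(fetch, make_window, max_attempts=5, wait_seconds=1.0):
    """Call fetch(window) until it succeeds or max_attempts is reached.

    fetch       -- hypothetical callable that performs the request
    make_window -- hypothetical callable returning a fresh {'t': ..., 'f': ...}
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        window = make_window()  # recompute the time range on every attempt
        try:
            return fetch(window)
        except Exception as error:  # broad on purpose in this sketch
            last_error = error
            if attempt < max_attempts:
                time.sleep(wait_seconds)
    raise RuntimeError(f'no response after {max_attempts} attempts') from last_error
```

With `requests`, `fetch` would typically wrap `requests.get`, and the `except` clause would be narrowed to `requests.RequestException`.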

Stored data is updated once a minute, and it can be fetched using the following host template [1]:

``` 
http://egauge{}.egaug.es/cgi-bin/egauge-show?
```

In [220]:
host = 'http://egauge{}.egaug.es/cgi-bin/egauge-show?'

### 1.2 Sensor 

Contact a member of ERDL if you are interested in data from any specific sensor managed by our group. In this example notebook, we use the eGauge with sensor ID `34111`.

In [221]:
sensor_id = 34111 #ACC House 1 ; Purpose ID 136

The list of available parameters [1] at the time this Python notebook was made is shown below for reproducibility. 

![eguard_api_parameters](img/egauge_parameters.png)

We only use 5 of the ones available above.

### 1.3 Start and End Date

The start and end dates are specified by parameters **t** (14th row top-bottom) and **f** (13th row top-bottom) respectively, and must be given as [Unix timestamps](https://en.wikipedia.org/wiki/Unix_time).

The following code converts a human-readable timestamp to a Unix timestamp, using the format `<date>T<time><timezone>`, where:

 * `<date>`: YYYY-MM-DD
 * `<time>`: hh:mm:ss
 * `<timezone>`: (-)zz
 
Hawaii Timezone is -10. 

In [222]:
import arrow
iso_start_datetimez = '2017-09-13T15:19:00-10'
iso_end_datetimez = '2017-09-13T15:25:00-10'

start_timestamp = arrow.get(iso_start_datetimez).timestamp
end_timestamp = arrow.get(iso_end_datetimez).timestamp
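
As an alternative sketch, the same conversion can be done with the standard library only. The helper name `to_unix_timestamp` is hypothetical and not part of the notebook. This may be useful because recent `arrow` releases (1.0 and later, as far as we know) expose `timestamp` as a method rather than a property, in which case the cell above would need `.timestamp()`.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-10 offset; Hawaii does not observe daylight saving time.
HAWAII = timezone(timedelta(hours=-10))

def to_unix_timestamp(year, month, day, hour, minute, second=0, tz=HAWAII):
    """Return the Unix timestamp (whole seconds) of a local date and time."""
    local = datetime(year, month, day, hour, minute, second, tzinfo=tz)
    return int(local.timestamp())

start_timestamp = to_unix_timestamp(2017, 9, 13, 15, 19)
end_timestamp = to_unix_timestamp(2017, 9, 13, 15, 25)
print(start_timestamp, end_timestamp)  # 1505351940 1505352300
```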

### 1.4 Time Resolution

The time resolution is set by parameter **m** (6th row top-bottom), which requests output at **minute** resolution.

In [223]:
minutes = 'm'

### 1.5 Export Format

The export format is set by parameter **c** (4th row top-bottom), which requests output in **CSV**.

In [224]:
output_csv = 'c'

### 1.6 Compression

The compression method is set by parameter `C` (10th row top-bottom), which requests **delta-compression**. In practice, using the compression option returns the readings in **Power (kW)** units, whereas omitting this parameter returns the table in **Energy (kWh)**.

Retrieving the data as **Power** is preferable, since we can simply sum **Power** over the time window of interest to derive **Energy (kWh)**, but the inverse is not true unless **Power** was constant, which is not the case for the usage of eGauge in ERDL.

**However**, using **Power (kW)** requires, contrary to intuition, some caution. The details are discussed in section 3.

In [225]:
delta_compression = 'C'

## 2. Request Readings 

In [226]:
import requests

host = host.format(sensor_id) + '&' + minutes + '&' + output_csv + '&' + delta_compression 
time_window = {'t': start_timestamp, 'f': end_timestamp}

print(host)

http://egauge34111.egaug.es/cgi-bin/egauge-show?&m&c&C
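
The URL printed above can also be assembled by a small helper. This is only a sketch: `HOST_TEMPLATE` and `build_url` are hypothetical names introduced here for illustration. Because the helper starts from the template on every call, flags from an earlier request cannot leak into a later one.

```python
HOST_TEMPLATE = 'http://egauge{}.egaug.es/cgi-bin/egauge-show?'

def build_url(sensor_id, *flags):
    """Return the request URL for a sensor, appending each flag as '&flag'."""
    return HOST_TEMPLATE.format(sensor_id) + ''.join('&' + flag for flag in flags)

print(build_url(34111, 'm', 'c', 'C'))
# http://egauge34111.egaug.es/cgi-bin/egauge-show?&m&c&C
```

The time window can still be passed separately through the `params` argument of `requests.get`, as in the next cell.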


In [233]:
r = requests.get(host,params=time_window)
print(r)

<Response [200]>


The returned .csv arrives as a string (including `\n`), so we parse it accordingly.

In [229]:
import pandas as pd
from io import StringIO #To parse the String as a .csv file
df = pd.read_csv(StringIO(r.text))
df

Unnamed: 0,Date & Time,Usage [kW],Generation [kW],Other [kW],Water heater [kW],Whole house [kW],ACC [kW],AHU [kW],Dryer [kW],Range [kW],Kitchen refridgerator [kW]
0,1505352240,-0.0,-0.0,0.47775,0.069967,1.95325,1.252933,0.138517,0.012717,0.000867,0.0005
1,1505352180,-0.0,-0.0,0.477533,0.100517,1.976467,1.25335,0.139433,0.004317,0.000833,0.000483
2,1505352120,-0.0,-0.0,0.477367,0.063583,1.978717,1.243667,0.1396,0.001483,0.0009,0.052117
3,1505352060,-0.0,-0.0,0.534883,0.003533,2.026917,1.24665,0.139833,0.0015,0.000917,0.0996
4,1505352000,-0.0,-0.0,0.54565,0.003567,2.0377,1.246667,0.139633,0.0015,0.00085,0.099833
5,1505351940,-0.0,-0.0,0.549133,0.00355,2.042467,1.24755,0.139583,0.001483,0.000917,0.10025
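
The **Date & Time** column holds Unix timestamps, and the newest row comes first. The sketch below, which uses only the standard library, shows one possible way to turn them into Hawaii local time and sort them oldest first. The name `read_rows` and the two-column `sample_csv` are illustrative assumptions; the values are copied from the table above.

```python
import csv
from datetime import datetime, timedelta, timezone
from io import StringIO

HAWAII = timezone(timedelta(hours=-10))  # fixed UTC-10 offset

# Two rows and two columns copied from the table above, for illustration only.
sample_csv = (
    'Date & Time,ACC [kW]\n'
    '1505352240,1.252933\n'
    '1505351940,1.24755\n'
)

def read_rows(csv_text, tz=HAWAII):
    """Parse the CSV text and add a timezone-aware 'local_time' to each row."""
    rows = []
    for row in csv.DictReader(StringIO(csv_text)):
        row['local_time'] = datetime.fromtimestamp(int(row['Date & Time']), tz)
        rows.append(row)
    # Sort oldest first, since the response lists the newest row first.
    return sorted(rows, key=lambda r: r['local_time'])

for row in read_rows(sample_csv):
    print(row['local_time'].isoformat(), row['ACC [kW]'])
# 2017-09-13T15:19:00-10:00 1.24755
# 2017-09-13T15:24:00-10:00 1.252933
```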


At this point, this notebook has shown how to obtain data from any eGauge sensor in ERDL. However, the interpretation of **Power** above, when the chosen granularity is **minute**, leads to some complications that may compromise the interpretation of the data. The following section discusses them in more detail.

## 3. Power (kW) Interpretation - An Example

Introducing the complications behind Power (kW) at **minute** granularity (parameter 1.4) is easier by first explaining how **delta compression** (parameter 1.6) works at **hourly** granularity, and then demonstrating how the mechanism, contrary to intuition, **does not** work at **minute** level. The section concludes with the necessary workaround to use **Power** at **minute** level.

Let's first extract some data from the same sensor, but in a **6 hour time window** (parameter 1.3) at an **hour** resolution, in contrast to above, where our time window was **6 minutes** at a **minute resolution**.

The other parameters remain as previously specified. 

### 3.1 Power at Hourly Resolution

We modify the time window from 6 minutes to 6 hours, and change the resolution to hour:

In [242]:
iso_start_datetimez = '2017-09-13T15:25:00-10' 
iso_end_datetimez = '2017-09-13T21:25:00-10' #hour (after T) was adjusted from 15 to 21. 

start_timestamp = arrow.get(iso_start_datetimez).timestamp
end_timestamp = arrow.get(iso_end_datetimez).timestamp

In [243]:
hour = 'h' #previously minutes = 'm'

With the modified parameters, we again request the Power information, now at hour granularity and for a different time window:

In [244]:
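host = 'http://egauge{}.egaug.es/cgi-bin/egauge-show?' # rebuild from the template: the previous host already ends in '&m&c&C'
host = host.format(sensor_id) + '&' + hour + '&' + output_csv + '&' + delta_compression 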
time_window = {'t': start_timestamp, 'f': end_timestamp}
r = requests.get(host,params=time_window)
df = pd.read_csv(StringIO(r.text))
df

Unnamed: 0,Date & Time,Usage [kW],Generation [kW],Other [kW],Water heater [kW],Whole house [kW],ACC [kW],AHU [kW],Dryer [kW],Range [kW],Kitchen refridgerator [kW]
0,1505368800,-0.0,-0.0,0.591079,0.003283,1.833674,1.110716,0.087945,0.002367,0.001001,0.037283
1,1505365200,-0.0,-0.0,0.540458,0.003281,1.841766,1.130589,0.137627,0.001687,0.000921,0.027204
2,1505361600,-0.0,-0.0,0.527071,0.003948,1.852676,1.162414,0.137493,0.001637,0.000946,0.019168
3,1505358000,-0.0,-0.0,0.443819,0.007247,1.823622,1.198753,0.13818,0.0016,0.000939,0.033084
4,1505354400,-0.0,-0.0,0.424791,0.020889,2.204102,1.232093,0.138607,0.338874,0.000944,0.047904


Next, let's do the same request but **without delta compression**, so we obtain **Energy (kWh)** for the same time window. 

### 3.2 Energy at Hourly Resolution

In [246]:
host = 'http://egauge{}.egaug.es/cgi-bin/egauge-show?'
host = host.format(sensor_id) + '&' + hour + '&' + output_csv ## delta_compression variable removed
r = requests.get(host,params=time_window)
string_csv_file = r.text    
df = pd.read_csv(StringIO(r.text))
df

Unnamed: 0,Date & Time,Usage [kWh],Generation [kWh],Other [kWh],Water heater [kWh],Whole house [kWh],ACC [kWh],AHU [kWh],Dryer [kWh],Range [kWh],Kitchen refridgerator [kWh]
0,1505372400,0.0,0.0,914.677966,40.66743,3374.120953,1631.121273,179.117595,326.280238,207.852151,74.4043
1,1505368800,0.0,0.0,914.086887,40.664147,3372.287279,1630.010558,179.02965,326.277871,207.85115,74.367017
2,1505365200,0.0,0.0,913.546429,40.660866,3370.445513,1628.879968,178.892024,326.276184,207.850229,74.339813
3,1505361600,0.0,0.0,913.019358,40.656918,3368.592838,1627.717554,178.75453,326.274547,207.849284,74.320645
4,1505358000,0.0,0.0,912.575539,40.649671,3366.769215,1626.518802,178.61635,326.272947,207.848344,74.287562
5,1505354400,0.0,0.0,912.150748,40.628782,3364.565113,1625.286709,178.477743,325.934073,207.8474,74.239658


First, note the column names are now expressed in Energy (kWh). The API does not explicitly specify the time window over which Energy was calculated from **Power**, or how it handles missing readings. However, we can observe how **delta-compression** works by comparing the Power and Energy tables of this section.

Consider the following image that illustrates how **delta-compression** works:

![delta_compression](img/delta_compression.png)

Considering the image above, we can consider the **original data stream** as the equivalent of any **column** of the **Energy Table** (let's consider for example **ACC [kWh]**).

Accordingly, we can consider **delta encoded** as any **column** of the **Power Table** (consistently, let's consider the column **ACC [kW]**).

Since parameter `C` expresses **delta compression**, we should expect the mechanism illustrated above to hold true. Indeed, at **hour granularity** we can observe this happens (but we will soon see that, contrary to intuition, it **does not** at minute level).

Because time is ordered bottom-up in the table (the bottom row is the oldest and the top row the most recent), the figure above, applied to the **ACC** column, reads as follows:

**Energy (original data stream)**:

```
1625.286709 | 1626.518802 | 1627.717554 ....
```

**Power (delta compression)**:

```
1.232093 | 1.198753 | 1.162414
```

By applying the reverse of the mechanism in the figure, we can see that **the next position value in Energy** (from left to right) is **the current position value in Energy** + **the current position value in Power**, i.e.

$Energy_{2} = Energy_{1} + Power_{1}$, 

e.g.:

1626.518802 = 1625.286709 + 1.232093
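
The same check can be run over the whole column. The sketch below decodes the deltas by accumulating the hourly **ACC [kW]** values on top of the oldest **ACC [kWh]** reading; the variable names are illustrative, and the numbers are copied from the two hourly tables above.

```python
from itertools import accumulate

# ACC values copied from the hourly tables above, ordered oldest to newest.
energy_kwh = [1625.286709, 1626.518802, 1627.717554,
              1628.879968, 1630.010558, 1631.121273]
power_kw = [1.232093, 1.198753, 1.162414, 1.130589, 1.110716]

# Delta decoding: start from the first energy reading and add each delta in turn.
reconstructed = list(accumulate(power_kw, initial=energy_kwh[0]))

for expected, rebuilt in zip(energy_kwh, reconstructed):
    print(f'{expected:.6f}  {rebuilt:.6f}  diff={abs(expected - rebuilt):.6f}')
```

The reconstructed stream should agree with the Energy column up to the rounding of the six printed decimals.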

### 3.3 Power at -minute- Resolution

Having observed how delta-compression works at **hourly** granularity, we can now turn our attention to delta compression at minute granularity. Although we would expect the mechanism to be the same, unfortunately it is not. To see what happens, let's repeat the same example at **minute** granularity: we consider the **original time window** used earlier in the notebook, and also show the equivalent **Energy** table at the minute level.

First, let's repeat the table from the original request, for ease of comparison.

In [251]:
# As originally used in the first request by the notebook

iso_start_datetimez = '2017-09-13T15:19:00-10'
iso_end_datetimez = '2017-09-13T15:25:00-10'

start_timestamp = arrow.get(iso_start_datetimez).timestamp
end_timestamp = arrow.get(iso_end_datetimez).timestamp

time_window = {'t': start_timestamp, 'f': end_timestamp}



host = 'http://egauge{}.egaug.es/cgi-bin/egauge-show?'
host = host.format(sensor_id) + '&' + minutes + '&' + output_csv + '&' + delta_compression 
r = requests.get(host,params=time_window)
string_csv_file = r.text    
df = pd.read_csv(StringIO(r.text))
df

Unnamed: 0,Date & Time,Usage [kW],Generation [kW],Other [kW],Water heater [kW],Whole house [kW],ACC [kW],AHU [kW],Dryer [kW],Range [kW],Kitchen refridgerator [kW]
0,1505352240,-0.0,-0.0,0.47775,0.069967,1.95325,1.252933,0.138517,0.012717,0.000867,0.0005
1,1505352180,-0.0,-0.0,0.477533,0.100517,1.976467,1.25335,0.139433,0.004317,0.000833,0.000483
2,1505352120,-0.0,-0.0,0.477367,0.063583,1.978717,1.243667,0.1396,0.001483,0.0009,0.052117
3,1505352060,-0.0,-0.0,0.534883,0.003533,2.026917,1.24665,0.139833,0.0015,0.000917,0.0996
4,1505352000,-0.0,-0.0,0.54565,0.003567,2.0377,1.246667,0.139633,0.0015,0.00085,0.099833
5,1505351940,-0.0,-0.0,0.549133,0.00355,2.042467,1.24755,0.139583,0.001483,0.000917,0.10025


### 3.4 Energy at -minute- Resolution

In [252]:
host = 'http://egauge{}.egaug.es/cgi-bin/egauge-show?'
host = host.format(sensor_id) + '&' + minutes + '&' + output_csv # No delta-compression
r = requests.get(host,params=time_window)
string_csv_file = r.text    
df = pd.read_csv(StringIO(r.text))
df

Unnamed: 0,Date & Time,Usage [kWh],Generation [kWh],Other [kWh],Water heater [kWh],Whole house [kWh],ACC [kWh],AHU [kWh],Dryer [kWh],Range [kWh],Kitchen refridgerator [kWh]
0,1505352300,0.0,0.0,911.864649,40.613186,3361.452666,1624.559923,178.396924,323.931772,207.846867,74.239346
1,1505352240,0.0,0.0,911.856686,40.61202,3361.420112,1624.539041,178.394615,323.93156,207.846852,74.239338
2,1505352180,0.0,0.0,911.848727,40.610344,3361.387171,1624.518152,178.392291,323.931488,207.846838,74.23933
3,1505352120,0.0,0.0,911.840771,40.609285,3361.354192,1624.497424,178.389965,323.931463,207.846823,74.238461
4,1505352060,0.0,0.0,911.831856,40.609226,3361.320411,1624.476647,178.387634,323.931438,207.846808,74.236801
5,1505352000,0.0,0.0,911.822762,40.609166,3361.286449,1624.455869,178.385307,323.931413,207.846794,74.235137
6,1505351940,0.0,0.0,911.81361,40.609107,3361.252408,1624.435076,178.382981,323.931388,207.846779,74.233467


Let's consider again the same delta-compression example as before, for the **ACC** column of both tables. As such, we would expect:

$Energy_{2} = Energy_{1} + Power_{1}$.

However:

$1624.455869 \neq 1624.435076 + 1.247550 = 1625.682626$


What is going on? 

The correct equation is now:

$Energy_{2} = Energy_{1} + Power_{1}/60$.

For the example above, $1624.435076 + 1.247550/60 \approx 1624.455869$, which matches the Energy table.

While the reason is not written in the documentation, the current belief is that the parameter `m` (which specifies resolution in **minutes**) does **not** change the unit of **Power**: the values remain in kW, i.e. energy per **hour**. As such, dividing the power by **60** properly converts it (under the assumption that power is held constant within the minute) into the energy increment for one minute. This is consistent with the hourly case, where each row spans exactly one hour and no division is needed.

In conclusion, if you are using the API data at minute level, and intend to use it to calculate **Energy (kWh)** over various time windows and/or translate it to **cost**:

**Remember to divide by 60 before summing it over different time windows!**
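
The rule can be generalized to any resolution by converting the row interval to hours. The sketch below assumes each reading is the average power over one interval; `energy_kwh_from_power` is a hypothetical helper name, and the values are copied from the **ACC** columns of sections 3.3 and 3.4.

```python
def energy_kwh_from_power(power_kw, interval_seconds):
    """Sum average-power readings (kW) into energy (kWh).

    Each reading is assumed to be the average power over one interval,
    so it contributes power * (interval_seconds / 3600) kWh.
    """
    hours_per_reading = interval_seconds / 3600
    return sum(p * hours_per_reading for p in power_kw)

# ACC [kW] at minute resolution, copied from the table in section 3.3.
acc_power_kw = [1.24755, 1.246667, 1.24665, 1.243667, 1.25335, 1.252933]

from_power = energy_kwh_from_power(acc_power_kw, interval_seconds=60)
# Difference between the newest and oldest ACC [kWh] rows in section 3.4.
from_energy = 1624.559923 - 1624.435076

print(f'{from_power:.6f} kWh vs {from_energy:.6f} kWh')
```

With `interval_seconds=3600` the factor becomes 1, which reproduces the hourly behaviour of section 3.2.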

## 4. XML

As an addendum, recall that .csv is not the default export format; the default is **XML**. An example of the output is shown below. The output format is explained in the associated documentation and is not discussed here, as it is not used by ERDL.

In [260]:
import xml.dom.minidom

iso_start_datetimez = '2017-09-13T15:19:00-10'
iso_end_datetimez = '2017-09-13T15:20:00-10'

start_timestamp = arrow.get(iso_start_datetimez).timestamp
end_timestamp = arrow.get(iso_end_datetimez).timestamp

time_window = {'t': start_timestamp, 'f': end_timestamp}


host = 'http://egauge{}.egaug.es/cgi-bin/egauge-show?'
host = host.format(sensor_id) + '&' + minutes
r = requests.get(host,params=time_window)
string_xml_file = r.text    

In [261]:
import xml.etree.ElementTree as etree
from xml.dom import minidom

x = etree.fromstring(string_xml_file)
def prettify(elem):
    """Return a pretty-printed XML string for the Element.
    """
    rough_string = etree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="  ")
print(prettify(x))

<?xml version="1.0" ?>
<group serial="0x464feb35">
  

  <data columns="7" epoch="0x58b8529c" time_delta="60" time_stamp="0x59b9d940">
    
 
    <cname t="P">Water heater</cname>
    
 
    <cname t="P">Whole house</cname>
    
 
    <cname t="P">ACC</cname>
    
 
    <cname t="P">AHU</cname>
    
 
    <cname t="P">Dryer</cname>
    
 
    <cname t="P">Range</cname>
    
 
    <cname t="P">Kitchen refridgerator</cname>
    
 
    <r>
      <c>146192999</c>
      <c>12100631216</c>
      <c>5848041128</c>
      <c>642187105</c>
      <c>1166153087</c>
      <c>748248458</c>
      <c>267246495</c>
    </r>
    
 
    <r>
      <c>146192786</c>
      <c>12100508668</c>
      <c>5847966275</c>
      <c>642178730</c>
      <c>1166152998</c>
      <c>748248403</c>
      <c>267240480</c>
    </r>
    

  </data>
  

</group>
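
For readers who do need the XML, the sketch below shows one possible way to turn a response into rows. It rests on assumptions inferred from comparing this output with the CSV table in section 3.4, not on the documentation: `time_stamp` is read as a hexadecimal Unix timestamp for the first row, each following row is taken to be `time_delta` seconds older, and dividing a value by 3.6e6 is assumed to give kWh. `parse_rows` and `sample_xml` are illustrative names; the sample is a trimmed copy of the output above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response above (three of the seven columns).
sample_xml = """
<group serial="0x464feb35">
  <data columns="3" epoch="0x58b8529c" time_delta="60" time_stamp="0x59b9d940">
    <cname t="P">Water heater</cname>
    <cname t="P">Whole house</cname>
    <cname t="P">ACC</cname>
    <r><c>146192999</c><c>12100631216</c><c>5848041128</c></r>
    <r><c>146192786</c><c>12100508668</c><c>5847966275</c></r>
  </data>
</group>
"""

def parse_rows(xml_text):
    """Return one dict per <r> row, keyed by column name plus 'timestamp'."""
    rows = []
    for data in ET.fromstring(xml_text.strip()).iter('data'):
        names = [cname.text for cname in data.findall('cname')]
        newest = int(data.get('time_stamp'), 16)  # assumed: hex Unix timestamp
        step = int(data.get('time_delta'))        # assumed: seconds between rows
        for index, r in enumerate(data.findall('r')):
            row = {'timestamp': newest - index * step}
            row.update(zip(names, (int(c.text) for c in r.findall('c'))))
            rows.append(row)
    return rows

for row in parse_rows(sample_xml):
    # Assumption: dividing by 3.6e6 (watt-seconds per kWh) yields kWh.
    print(row['timestamp'], round(row['ACC'] / 3.6e6, 6))
```

Under these assumptions, the two rows correspond to the **ACC [kWh]** values at timestamps 1505352000 and 1505351940 in section 3.4.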



## 5. References

 * [1] See [eGauge XML API](https://www.egauge.net/docs/egauge-xml-api.pdf) for an exhaustive list of available parameters for the eGauge API. Section 3, stored data, was used to create this Python Notebook.