# Retrieve domestic wind turbine generation data 

<img src='../images/Evance_9000_WindTurbine.JPG' align='right' alt='Evance R9000 wind turbine and rainbow'>
Orkney Renewable Energy Forum (OREF) http://www.oref.co.uk/ owns and runs an Evance R9000 5kW domestic wind turbine.

The production data for this turbine has been recorded since 2012 and is publicly available on [sunnyportal.com](https://www.sunnyportal.com/Templates/PublicPageOverview.aspx?plant=7c8677c0-4e37-42d6-a115-d356f13a3120).

There is no API for the data, so it must be screen-scraped from the web page that displays it in tabular form:

https://www.sunnyportal.com/Templates/PublicChartValues.aspx?ID=aa6b2b6e-f836-4ff5-b90c-91754955b988&endTime=21/05/2019%2010:59:59&splang=en-GB&plantTimezoneBias=60&name=

The URL contains a unit ID, `ID=aa6b2b6e-f836-4ff5-b90c-91754955b988`, and a datetime, `endTime=21/05/2019%2023:59:59`, marking the end of the dataset required.

* The returned data covers two days of hourly values, from midnight at the start of the day before the requested day until midnight at the end of the requested day. For example, a request for 21/05/2019 returns records labelled 01:00 on the 20th through 00:00 on the 22nd.
* Note that the final value is labelled 00:00 of the following day rather than 24:00 of the requested day.
* Note also that only the date part of the submitted datetime is used by the website; the time has no effect. The URL also works when only a date is submitted.
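The request window described in these notes can be computed directly. This is a minimal sketch based only on the behaviour observed above (it is not a documented Sunny Portal API); `request_window` is a hypothetical helper name. It returns the first and last record labels expected for a requested date and counts the hourly records in between.

```python
import datetime


def request_window(requested):
    """Return (first, last) record timestamps expected for a requested date.

    Assumes the observed behaviour: the first record is labelled 01:00 on the
    day before `requested` and the last is labelled 00:00 on the day after it.
    """
    midnight = datetime.datetime.combine(requested, datetime.time())
    first = midnight - datetime.timedelta(days=1) + datetime.timedelta(hours=1)
    last = midnight + datetime.timedelta(days=1)
    return first, last


first, last = request_window(datetime.date(2019, 5, 21))
print(first, last)  # 2019-05-20 01:00:00 2019-05-22 00:00:00

# One record per hour, inclusive of both ends.
n_records = (last - first) // datetime.timedelta(hours=1) + 1
print(n_records)  # 48
```

The count of 48 matches the two days of hourly rows in the example table.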

An example of the returned data is shown below:

<table style="border-collapse:collapse;" cellspacing="0" cellpadding="0">
	<tbody>
        <tr>
            <td></td>
            <td>Quarry House<br>Power<br>Mean values  [kW]</td>
            <td>Quarry House<br>Total yield<br>Meter change  [kWh]</td>
        </tr><tr>
		<td>01:00/ 20</td><td>0.014</td><td>0.009</td></tr><tr>
		<td>02:00/ 20</td><td>0.072</td><td>0.064</td></tr><tr>
		<td>03:00/ 20</td><td>0.005</td><td>0.010</td></tr><tr>
        <td>...</td><td>...</td><td>...</td></tr><tr>
		<td>21:00/ 20</td><td>0.000</td><td>0.000</td></tr><tr>
		<td>22:00/ 20</td><td>0.000</td><td>0.000</td></tr><tr>
		<td>23:00/ 20</td><td>0.003</td><td>0.001</td></tr><tr>
		<td>00:00/ 21</td><td>0.058</td><td>0.062</td></tr><tr>
		<td>01:00/ 21</td><td>0.159</td><td>0.203</td></tr><tr>		
        <td>...</td><td>...</td><td>...</td></tr><tr>
		<td>20:00/ 21</td><td>2.378</td><td>2.366</td></tr><tr>
		<td>21:00/ 21</td><td>2.425</td><td>2.322</td></tr><tr>
		<td>22:00/ 21</td><td>2.025</td><td>0.024</td></tr><tr>
		<td>23:00/ 21</td><td></td><td></td></tr><tr>
		<td>00:00/ 22</td><td></td><td></td></tr>
</tbody></table>

```python
# Set up the kernel with the required libraries
# and define some constants

import datetime
import urllib.request

from bs4 import BeautifulSoup

URL = ("https://www.sunnyportal.com/Templates/PublicChartValues.aspx"
       "?ID=aa6b2b6e-f836-4ff5-b90c-91754955b988"
       "&splang=en-GB"
       "&plantTimezoneBias=0"
       "&name="
       "&endTime=")

recDate = datetime.datetime(2019, 5, 12, 23, 59, 59)
```

```python
# Get the page
callurl = URL + recDate.strftime('%Y/%m/%d%%20%H:%M:%S')
with urllib.request.urlopen(callurl) as response:
    page = response.read()
```

```python
# Display an example of the page data
page[3000:5000]
```

```
b' = function onKeyPress(e) {\n   if (e.which == 13) {\n\tif (!isMultiLineTextElement(e.target) && !isAnchor(e.target)) {\n\t\te.cancelBubble = true;\n\t\te.returnValue = false;\n\t}\n\t}\n}\n\tfunction isMultiLineTextElement(elem) {\n\t\treturn (elem.tagName == "TEXTAREA");}\n\tfunction isAnchor(elem) {\n\t\treturn (elem.tagName == "A");}\n// -->\n</script>\n\r\n            \r\n    <div id="content" style="border:0;padding:0;margin:5px">\r\n        <div style="float:left">\r\n            <h2><span id="ctl00_ContentPlaceHolder1_title">Diagram values</span></h2>\r\n        </div>\r\n        <div style="float:right">\r\n            <div id="ctl00_ContentPlaceHolder1_MyDivClose" style="FLOAT: right; MARGIN-RIGHT: 15px; text-align:right;"><a href="javascript:window.close()"><img src="../Images/window-close16.png" width="16" height="16" alt="" title="Close"/></a></div>\r\n            <div id="ctl00_ContentPlaceHolder1_MyDivPrint" style="FLOAT: right; MARGIN-RIGHT: 15px; text-align:right;"><
```

The page contains a single table, easily identified by its `<table> </table>` tags. Each row of the table, within `<tr> </tr>` tags, represents a single record. The BeautifulSoup library is used to extract this information.

```python
soup = BeautifulSoup(page, 'html.parser')
DataTable = soup.find('table')
rows = DataTable.find_all('tr')
```

```python
# Display an example of row data
rows[:5]
```

```
[<tr>
 <td align="right" class="base-grid-header-cell"></td><td align="right" class="base-grid-header-cell">Quarry House<br/>Power<br/>Mean values  [kW]</td><td align="right" class="base-grid-header-cell">Quarry House<br/>Total yield<br/>Meter change  [kWh]</td>
 </tr>, <tr class="base-grid-item">
 <td class="base-grid-item-cell">01:00/ 11</td><td align="right" class="base-grid-item-cell"></td><td align="right" class="base-grid-item-cell"></td>
 </tr>, <tr class="base-grid-item-alternate">
 <td class="base-grid-item-cell">02:00/ 11</td><td align="right" class="base-grid-item-cell">0.307</td><td align="right" class="base-grid-item-cell">0.078</td>
 </tr>, <tr class="base-grid-item">
 <td class="base-grid-item-cell">03:00/ 11</td><td align="right" class="base-grid-item-cell">0.029</td><td align="right" class="base-grid-item-cell">0.001</td>
 </tr>, <tr class="base-grid-item-alternate">
 <td class="base-grid-item-cell">04:00/ 11</td><td align="right" class="base-grid-item-cell">0.000</td>
```

The text values in each row can then be retrieved:

```python
for row in rows[1:10]:
    entry = row.find_all('td')
    print(entry[0].text, entry[1].text, entry[2].text)
```

```
01:00/ 11  
02:00/ 11 0.307 0.078
03:00/ 11 0.029 0.001
04:00/ 11 0.000 0.028
05:00/ 11  
06:00/ 11  
07:00/ 11 0.257 0.025
08:00/ 11 0.308 0.511
09:00/ 11 0.593 0.479
```
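The same row-and-cell extraction can be done without the BeautifulSoup dependency, using Python's standard-library `html.parser`. This is an alternative sketch, not the approach used in this notebook; `TableRowParser` is a hypothetical class name, and it is exercised here on a small inline sample copied from the rows shown above rather than on the live page.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._cell = None   # text fragments of the <td> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []
        elif tag == "br" and self._cell is not None:
            self._cell.append(" ")  # keep header words such as "Quarry House Power" apart

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


sample = ("<table><tr><td>01:00/ 11</td><td></td><td></td></tr>"
          "<tr><td>02:00/ 11</td><td>0.307</td><td>0.078</td></tr></table>")
parser = TableRowParser()
parser.feed(sample)
print(parser.rows)  # [['01:00/ 11', '', ''], ['02:00/ 11', '0.307', '0.078']]
```

On the real page, the downloaded bytes would need decoding first, e.g. `parser.feed(page.decode('utf-8'))`, assuming the page is UTF-8 encoded.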


There are a few points to note:
* The text data needs to be converted to date and numeric types.
* The first column, holding the date and time, is in an unusual format consisting of the time followed by the day of the month. The month and year are not present anywhere on the page, so they must be inferred from the date used in the query.
* Periods of inactivity result in missing values; these need to be converted to zero as a float (`0.0`).
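The label in the first column can be split into its time and day-of-month parts with a regular expression, which is a little more tolerant than fixed string slicing. This is a sketch under an assumption: the sample output only shows two-digit days, so allowing a one-digit day and variable spacing is a guess about how the site formats days 1 to 9. `parse_label` is a hypothetical helper.

```python
import re

# Assumed label shape: "HH:MM/ DD", e.g. "01:00/ 11". One or two day digits and
# optional spaces are accepted because single-digit days are not shown in the sample.
LABEL_RE = re.compile(r"^\s*(\d{1,2}):(\d{2})/\s*(\d{1,2})\s*$")


def parse_label(text):
    """Split a table label into ('HH:MM', day_of_month)."""
    match = LABEL_RE.match(text)
    if match is None:
        raise ValueError(f"unexpected label format: {text!r}")
    hour, minute, day = match.groups()
    return f"{int(hour):02d}:{minute}", int(day)


print(parse_label("01:00/ 11"))  # ('01:00', 11)
print(parse_label("00:00/ 1"))   # ('00:00', 1)
```

Raising on an unexpected label makes a change in the page's format visible immediately instead of producing silently wrong timestamps.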

It might seem that the full date of each record in the returned two days of data could be generated with a simple formula such as:

```python
# This simple formula does not allow for date ranges that span the beginning/end of a month,
# e.g. when recDate is 2019-04-01 the previous date is 2019-03-31
print(datetime.date(recDate.year, recDate.month, int('09')))
```

```
2019-05-09
```


However, this formula gives wrong results when the requested date is the first or last day of a month. Because the returned timestamps span three calendar dates (the day before, the requested day, and 00:00 of the following day), the preceding or following day then falls in a different month from the requested date, which the formula does not allow for. For example, a request for 2019-05-01 returns records labelled `/ 30`, which the formula would place on 2019-05-30 rather than 2019-04-30.
Hence a more thorough algorithm is used:

```python
# Function to convert the 2-digit day of month (datestring) scraped from the table
# to a datetime, based on the supplied date (inputDate).
# timedelta is used to ensure the correct month (and year) is rendered.

def datestamp(inputDate, datestring):
    prevdate = inputDate - datetime.timedelta(days=1)
    nextdate = inputDate + datetime.timedelta(days=1)

    if int(datestring) == prevdate.day:
        returndate = prevdate
    elif int(datestring) == nextdate.day:
        returndate = nextdate
    else:
        returndate = inputDate

    return returndate
```
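The month-boundary cases can be checked in isolation. The hypothetical `resolve_day` helper below re-implements the same previous/current/next comparison as `datestamp`, with one deliberate difference: it raises an error for a day that is not within one day of the request instead of falling back to the requested date.

```python
import datetime


def resolve_day(requested, day_of_month):
    """Return the date within one day of `requested` whose day of month matches."""
    for offset in (-1, 0, 1):
        candidate = requested + datetime.timedelta(days=offset)
        if candidate.day == day_of_month:
            return candidate
    raise ValueError(f"day {day_of_month} is not within one day of {requested}")


# Month- and year-boundary cases where the simple formula goes wrong:
print(resolve_day(datetime.date(2019, 5, 1), 30))   # 2019-04-30 (simple formula: 2019-05-30)
print(resolve_day(datetime.date(2019, 4, 30), 1))   # 2019-05-01 (simple formula: 2019-04-01)
print(resolve_day(datetime.date(2019, 12, 31), 1))  # 2020-01-01
```

Because `timedelta` arithmetic rolls over months and years, the year boundary is handled with no extra code.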

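```python
# Function to convert the unusual-format datetime cell (datestring) scraped from the table,
# e.g. "01:00/ 11", to a conventional 'YYYY-MM-DD HH:MM' string
# based on the supplied date (inputDate).
# Note: datestring is a BeautifulSoup <td> tag, so its .text is sliced.

def datestamp_convert(inputDate, datestring):
    day = datestring.text[7:9]   # day of month, e.g. "11"
    time = datestring.text[:5]   # time of day, e.g. "01:00"

    return str(datestamp(inputDate, day).date()) + ' ' + time
```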

```python
# Convert power readings to float and fill any gaps (empty cells) with 0.0

def textToFloat(string):
    if string.strip():
        return float(string)
    else:
        return 0.0
```

```python
data = []

for row in rows[1:]:  # skip the header row
    entry = row.find_all('td')
    timestamp = datestamp_convert(recDate, entry[0])
    data.append((timestamp, textToFloat(entry[1].text), textToFloat(entry[2].text)))
```

```python
# Display an example of the resultant data
data[20:25], data[45:]
```

```
([('2019-05-11 21:00', 0.077, 0.071),
  ('2019-05-11 22:00', 0.001, 0.001),
  ('2019-05-11 23:00', 0.016, 0.018),
  ('2019-05-12 00:00', 0.002, 0.0),
  ('2019-05-12 01:00', 0.0, 0.0)],
 [('2019-05-12 22:00', 0.0, 0.0),
  ('2019-05-12 23:00', 0.165, 0.21),
  ('2019-05-13 00:00', 0.531, 0.625)])
```
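The timestamps in `data` are still strings. If true `datetime` values are wanted, they can be parsed with `strptime`, which also makes it easy to check that consecutive records are exactly one hour apart. This is a minimal sketch using the last three records shown above as sample input; the gap check is an optional extra, not part of the original processing.

```python
import datetime

# Sample records copied from the output above: (timestamp, power kW, energy kWh).
records = [
    ("2019-05-12 22:00", 0.0, 0.0),
    ("2019-05-12 23:00", 0.165, 0.21),
    ("2019-05-13 00:00", 0.531, 0.625),
]

# Parse the 'YYYY-MM-DD HH:MM' strings into datetime objects.
parsed = [(datetime.datetime.strptime(ts, "%Y-%m-%d %H:%M"), power, energy)
          for ts, power, energy in records]

# Consecutive records should be exactly one hour apart; collect any pairs that are not.
one_hour = datetime.timedelta(hours=1)
gaps = [(a[0], b[0]) for a, b in zip(parsed, parsed[1:]) if b[0] - a[0] != one_hour]
print(parsed[-1][0], gaps)  # 2019-05-13 00:00:00 []
```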

## Conclusion
The data is now fully processed and could, for example, be analysed further or stored in a database.
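As one example of storage, the `(timestamp, power, energy)` tuples map directly onto a SQLite table. This sketch uses Python's built-in `sqlite3` module with an in-memory database; the table name `turbine_hourly` and its column names are hypothetical, and the sample rows are the last three records shown above.

```python
import sqlite3

# Sample records in the (timestamp, power_kW, energy_kWh) shape produced above.
records = [
    ("2019-05-12 22:00", 0.0, 0.0),
    ("2019-05-12 23:00", 0.165, 0.21),
    ("2019-05-13 00:00", 0.531, 0.625),
]

conn = sqlite3.connect(":memory:")  # use a file path for persistent storage
conn.execute(
    """CREATE TABLE IF NOT EXISTS turbine_hourly (
           timestamp  TEXT PRIMARY KEY,  -- 'YYYY-MM-DD HH:MM'
           power_kw   REAL,              -- mean power over the hour
           energy_kwh REAL               -- meter change over the hour
       )"""
)
# INSERT OR REPLACE makes re-downloading the same dates harmless.
conn.executemany("INSERT OR REPLACE INTO turbine_hourly VALUES (?, ?, ?)", records)
conn.commit()

total = conn.execute("SELECT SUM(energy_kwh) FROM turbine_hourly").fetchone()[0]
print(round(total, 3))  # 0.835
```

Using the timestamp as the primary key means overlapping downloads update existing rows rather than duplicating them.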

Each request returns two days of data, so by repeatedly calling the URL with requested dates two days apart, as much data as required can be obtained.
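Choosing the requested dates can be sketched as a small generator. It assumes the window behaviour observed earlier (a request for date D returns records from 01:00 on D-1 to 00:00 on D+1), so stepping the requested date by two days gives back-to-back windows with no overlap. `request_dates` is a hypothetical helper name.

```python
import datetime


def request_dates(first_day, last_day):
    """Yield dates to request so the two-day windows cover first_day..last_day.

    Assumes a request for date D returns 01:00 on D-1 through 00:00 on D+1,
    so the first request is for first_day + 1 and each later one is two days on.
    """
    requested = first_day + datetime.timedelta(days=1)
    while requested - datetime.timedelta(days=1) <= last_day:
        yield requested
        requested += datetime.timedelta(days=2)


dates = list(request_dates(datetime.date(2019, 5, 1), datetime.date(2019, 5, 6)))
print(dates)
# [datetime.date(2019, 5, 2), datetime.date(2019, 5, 4), datetime.date(2019, 5, 6)]
```

Each yielded date would take the place of `recDate` in the download and parsing cells; pausing briefly between requests is courteous to the server.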