SG: don't fail when unable to OCR #1281
Changes from 2 commits
de72d36
2469f63
d6e9be1
3e39d5f
42bb016
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,12 @@ | ||
#!/usr/bin/env python3 | ||
|
||
from collections import defaultdict | ||
import logging | ||
import re | ||
|
||
import arrow | ||
from PIL import Image | ||
from pytesseract import image_to_string | ||
import re | ||
import requests | ||
|
||
TIMEZONE = 'Asia/Singapore' | ||
|
@@ -40,18 +41,10 @@ | |
|
||
For Electricity Map, we map CCGT and GT to gas, and ST to "unknown". | ||
|
||
There appears to be no real-time data for solar production. | ||
Installed solar has been rising rapidly, but from a very low base. | ||
Per Singapore Energy Statistics 2016 pg 96, total installed solar PV capacity at end of 2015 was 45.8 MWac. | ||
Per https://www.ema.gov.sg/cmsmedia/Publications_and_Statistics/Statistics/47RSU.pdf | ||
total installed solar capacity in 2017Q1 was 129.8 MWp / 99.9 MWac, so capacity doubled during 2016. | ||
However, when producing at max capacity this would only be about 2% of summer mid-night demand of around 5 GW. | ||
So for now this won't introduce a big inaccuracy. | ||
|
||
There exists an interconnection to Malaysia. Its capacity is apparently 2x 200 MW, | ||
which is potentially 5-10% of normal use. I was unable to find data on its use | ||
(not even historical, let along real-time). The Singapore Energy Statistics 2016 document | ||
does not note any electricity exports or imports. | ||
The Energy Market Authority estimates current solar production and publishes it at | ||
https://www.ema.gov.sg/solarmap.aspx | ||
|
||
There exists an interconnection to Malaysia, it is implemented in MY_WM.py. | ||
""" | ||
|
||
TYPE_MAPPINGS = { | ||
|
@@ -61,52 +54,61 @@ | |
} | ||
|
||
|
||
def get_solar(session=None): | ||
def get_solar(session, logger): | ||
""" | ||
Fetches a graphic showing estimated solar production data. | ||
Uses OCR (tesseract) to extract MW value. | ||
Returns a float or None. | ||
""" | ||
|
||
s = session or requests.Session() | ||
url = 'https://www.ema.gov.sg/cmsmedia/irradiance/plot.png' | ||
solar_image = Image.open(s.get(url, stream=True).raw) | ||
solar_image = Image.open(session.get(url, stream=True).raw) | ||
|
||
gray = solar_image.convert('L') | ||
threshold_filter = lambda x: 0 if x<77 else 255 | ||
threshold_filter = lambda x: 0 if x < 77 else 255 | ||
black_white = gray.point(threshold_filter, '1') | ||
|
||
text = image_to_string(black_white, lang='eng') | ||
|
||
pattern = r'Est. PV Output: (.*)MWac' | ||
val = re.search(pattern, text, re.MULTILINE).group(1) | ||
try: | ||
Review comment: Much better handling, we shouldn't throw away data just because solar is not working. |
||
pattern = r'Est. PV Output: (.*)MWac' | ||
val = re.search(pattern, text, re.MULTILINE).group(1) | ||
|
||
time_pattern = r'\d+-\d+-\d+\s+\d+:\d+' | ||
time_string = re.search(time_pattern, text, re.MULTILINE).group(0) | ||
except AttributeError: | ||
msg = 'Unable to get values for SG solar from OCR text: {}'.format(text) | ||
logger.warning(msg, extra={'key': 'SG'}) | ||
return None | ||
|
||
time_pattern = r'\d+-\d+-\d+\s+\d+:\d+' | ||
time_string = re.search(time_pattern, text, re.MULTILINE).group(0) | ||
solar_dt = arrow.get(time_string).replace(tzinfo='Asia/Singapore') | ||
singapore_dt = arrow.now('Asia/Singapore') | ||
diff = singapore_dt - solar_dt | ||
|
||
# Need to be sure we don't get old data if image stops updating. | ||
if diff.seconds > 3600: | ||
print('Singapore solar data is too old to use.') | ||
msg = ('Singapore solar data is too old to use, ' | ||
'parsed data timestamp was {}.').format(solar_dt) | ||
logger.warning(msg, extra={'key': 'SG'}) | ||
return None | ||
|
||
# At night format changes from 0.00 to 0 | ||
# tesseract cannot distinguish singular 0 and O in font provided by image. | ||
# This try/except will make sure no invalid data is returned. | ||
try: | ||
solar = float(val) | ||
except ValueError as err: | ||
except ValueError: | ||
if len(val) == 1 and 'O' in val: | ||
solar = 0.0 | ||
else: | ||
print("Singapore solar data is unreadable - got {}.".format(val)) | ||
solar = None | ||
msg = "Singapore solar data is unreadable - got {}.".format(val) | ||
logger.warning(msg, extra={'key': 'SG'}) | ||
return None | ||
else: | ||
if solar > 200.0: | ||
print("Singapore solar generation is way over capacity - got {}".format(val)) | ||
solar = None | ||
msg = "Solar generation is way over capacity - got {}".format(val) | ||
logger.warning(msg, extra={'key': 'SG'}) | ||
return None | ||
|
||
return solar | ||
|
||
|
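Review note: the failure handling in `get_solar` can be checked without network access or tesseract by isolating the text-parsing step. The following is a minimal sketch, not the PR's code: `parse_solar_text` is a hypothetical helper, it uses the standard library `datetime` instead of `arrow`, and it assumes the OCR timestamp looks like `2018-04-05 13:30`. It compares the age with `total_seconds()`, because `timedelta.seconds` holds only the sub-day remainder and would accept an image that is several days old.

```python
import logging
import re
from datetime import datetime

# Same patterns as in the diff, except that the dot after "Est" is escaped.
VALUE_PATTERN = r'Est\. PV Output: (.*)MWac'
TIME_PATTERN = r'\d+-\d+-\d+\s+\d+:\d+'
# Assumed timestamp layout; the real image may use a different one.
TIME_FORMAT = '%Y-%m-%d %H:%M'


def parse_solar_text(text, now, logger, max_mw=200.0):
    """Return solar output in MW parsed from OCR text, or None on any failure.

    `now` is a naive datetime in the same timezone as the image timestamp.
    """
    extra = {'key': 'SG'}
    value_match = re.search(VALUE_PATTERN, text)
    time_match = re.search(TIME_PATTERN, text)
    if value_match is None or time_match is None:
        logger.warning('Unable to get values for SG solar from OCR text: %r',
                       text, extra=extra)
        return None

    try:
        # Collapse runs of whitespace before parsing the timestamp.
        stamp = datetime.strptime(' '.join(time_match.group(0).split()),
                                  TIME_FORMAT)
    except ValueError:
        logger.warning('Unparseable SG solar timestamp: %r',
                       time_match.group(0), extra=extra)
        return None

    # total_seconds() includes whole days; .seconds alone would not.
    if (now - stamp).total_seconds() > 3600:
        logger.warning('Singapore solar data is too old to use, '
                       'parsed data timestamp was %s.', stamp, extra=extra)
        return None

    raw = value_match.group(1).strip()
    try:
        solar = float(raw)
    except ValueError:
        # At night the image shows a single 0, which OCR may read as "O".
        if raw == 'O':
            return 0.0
        logger.warning('Singapore solar data is unreadable - got %r.',
                       raw, extra=extra)
        return None

    if solar > max_mw:
        logger.warning('Solar generation is way over capacity - got %s',
                       raw, extra=extra)
        return None
    return solar
```

Every failure path logs a warning and returns `None`, so a caller such as `fetch_production` can still report the rest of the production mix.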
@@ -159,7 +161,8 @@ def sg_data_to_datetime(data): | |
return data_datetime | ||
|
||
|
||
def fetch_production(zone_key='SG', session=None, target_datetime=None, logger=None): | ||
def fetch_production(zone_key='SG', session=None, target_datetime=None, | ||
logger=logging.getLogger(__name__)): | ||
"""Requests the last known production mix (in MW) of Singapore. | ||
|
||
Arguments: | ||
|
@@ -202,12 +205,12 @@ def fetch_production(zone_key='SG', session=None, target_datetime=None, logger=N | |
|
||
else: | ||
# unrecognized - log it, then add into unknown | ||
print( | ||
'Singapore has unrecognized generation type "{}" with production share {}%'.format( | ||
gen_type, gen_percent)) | ||
msg = ('Singapore has unrecognized generation type "{}" ' | ||
'with production share {}%').format(gen_type, gen_percent) | ||
logger.warning(msg) | ||
generation_by_type['unknown'] += gen_mw | ||
|
||
generation_by_type['solar'] = get_solar(session=None) | ||
generation_by_type['solar'] = get_solar(requests_obj, logger) | ||
|
||
# some generation methods that are not used in Singapore | ||
generation_by_type.update({ | ||
|
@@ -216,12 +219,16 @@ def fetch_production(zone_key='SG', session=None, target_datetime=None, logger=N | |
'hydro': 0 | ||
}) | ||
|
||
source = 'emcsg.com' | ||
if generation_by_type['solar']: | ||
Review comment: Nice idea.
Review comment: I'm wondering if it's too clever. When there is no solar production, the data that solar is 0 also comes from EMA...
Review comment: Maybe both should be included by default? |
||
source += ', ema.gov.sg' | ||
|
||
return { | ||
'datetime': sg_data_to_datetime(data), | ||
'zoneKey': zone_key, | ||
'production': generation_by_type, | ||
'storage': {}, # there is no known electricity storage in Singapore | ||
'source': 'emcsg.com' | ||
'source': source | ||
} | ||
|
||
|
||
|
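Review note: the inline thread on the `source` field points out that `if generation_by_type['solar']:` is also false for a valid reading of `0`, which still comes from EMA. One possible way to separate "no reading" from "zero reading" is to test for `None` explicitly. This is only a sketch: `build_source` is a hypothetical helper name and is not part of the PR.

```python
def build_source(production, base='emcsg.com', solar_source='ema.gov.sg'):
    """Credit the solar source whenever a solar value was obtained, including 0."""
    # `is not None` keeps a 0.0 reading (night time) distinct from a failed OCR.
    if production.get('solar') is not None:
        return '{}, {}'.format(base, solar_source)
    return base
```

With this check, a night-time value of `0.0` still credits `ema.gov.sg`, while a failed OCR (`None`) credits only `emcsg.com`.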
Review comment: Minor comment, since this is only used once maybe it should be `from logging import getLogger`.

Review comment: Hm, this is a style thing (there is no perf difference). My very personal reason for liking `logging.getLogger()` is because I don't have to scroll up to see if `getLogger` was imported and from which module, or maybe a function somewhere within this file. To give an extreme example, if we were to do `from re import search`, it's impossible for someone diving straight into the code to guess what `search` is without reading all imports.

I don't think we have particular consistency in the codebase on this right now. At the same time I myself did the `from collections import defaultdict` right above :D

Review comment: Current codebase: `import logging` 9 times including SG (and in all cases it is only used for `getLogger`), `from logging import getLogger` 7 times. So if I change it, it'll be a tie. Do we have a style guide? :D

Review comment: Oh I thought there was a difference, never mind then, it doesn't matter. If there was an EM style guide it would simply read "Chaos is good". 😈

Review comment: "Carbon is bad, chaos is good"