From Emporia: Users of PyEmVue need to avoid too many fetches or repeatedly fetching the same data #19

Open
TedGrahamEmporia opened this issue Jun 15, 2021 · 47 comments

Comments

@TedGrahamEmporia

Dear Customers:

We appreciate the enthusiasm for Emporia devices but a minority of customers are causing a majority of the load on our servers. Some users of the PyEmVue library are requesting years of Monthly usage data every 2 seconds. That is unnecessary since previous month data won’t change and current month data only changes every hour.

We aren’t cutting off PyEmVue completely, but we plan to introduce limits in the cloud to prevent a few users from overloading the cloud. In the meantime, please be respectful by fetching data less frequently and avoiding re-fetching unchanging data.

best,
Ted
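
One way a client could follow the request above is to cache results locally, so that past months are fetched only once and the current month is re-fetched at most once an hour. The following is a minimal sketch under stated assumptions: `fetch_month` is a hypothetical callable standing in for whatever API call a client makes, and none of these names come from PyEmVue.

```python
import time
from typing import Callable, Dict, Tuple


class MonthlyUsageCache:
    """Caches monthly usage so unchanging data is fetched only once."""

    def __init__(self,
                 fetch_month: Callable[[int, int], float],
                 current_ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch_month        # hypothetical wrapper around the API call
        self._ttl = current_ttl_seconds  # current month only changes hourly
        self._clock = clock
        self._cache: Dict[Tuple[int, int], Tuple[float, float]] = {}

    def get(self, year: int, month: int, is_current: bool) -> float:
        key = (year, month)
        now = self._clock()
        if key in self._cache:
            value, fetched_at = self._cache[key]
            # Past months never change; the current month is reused for one TTL.
            if not is_current or now - fetched_at < self._ttl:
                return value
        value = self._fetch(year, month)
        self._cache[key] = (value, now)
        return value
```

With a cache like this, a dashboard that refreshes every few seconds would still reach the server only about once an hour for the current month.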

@magico13
Owner

Good afternoon Ted,

Thank you for your understanding, and I'm sorry that people have been abusing the API. I would love to move this library over to an official API with support for API tokens and associated rate limits, so I'd like to hear from Emporia whether that's in the works and, if so, the estimated timeline. If you notice any other issues as a result of this library, please let me know so I can address them where possible. The community definitely appreciates the work Emporia has done and the products you make.

Thank you,
Mike

@jertel

jertel commented Jun 15, 2021

Vuegraf uses this library to populate InfluxDB metrics. There are two API calls that are executed every cycle. By default each cycle occurs on 60 second intervals.

  1. get_devices_usage() - Called to fetch the list of channels in the account. The reason it's called on each cycle is to ensure that Vuegraf is aware of the latest list of devices. This could be loosened to only call once a day if it's contributing to the problem. However, the docs show that since I'm specifying a scale=DAY that it should only be going back for the past DAY and collecting the total usage over that period, not for years.

  2. get_chart_usage() - Called for each device channel, to collect the metric for each second since the last invocation. Theoretically this yields 60 data points per call since it's called once every 60 seconds.

One Vuegraf user, @andygodber, started a discussion a little over a week ago asking for the ability to update more frequently: jertel/vuegraf#53. Specifically, the request was for a 5 second update interval, and I advised against it for the same reasons that Emporia is now reporting. Andy, can you comment on what interval you are using, and why you needed it more frequently than the default?

Thanks from me as well to Emporia for delivering this energy monitoring system. With it, PyEmVue, and Vuegraf, I've been able to detect freezer failures, pump failures, and other issues as they happen, instead of when it's too late.

@magico13
Owner

magico13 commented Jun 15, 2021

@jertel you bring up a good question: which API call(s) are seeing this high usage, @TedGrahamEmporia? I also maintain the Home Assistant integration, which only makes calls to getDevicesUsage, usually once a minute for the minute, day, and month scales but optionally once a second for the 1 second scale. Those should only be requesting the single most recent interval, definitely not years' worth of historical data, unless there's a bug in the code that's causing it to ask for more data.

I do plan on making the optional 1 second data polling less frequent within that integration (magico13/ha-emporia-vue#39) so if it looks like that's contributing to the problem I can move that feature up in priority.

@TedGrahamEmporia
Author

TedGrahamEmporia commented Jun 15, 2021 via email

@TedGrahamEmporia
Author

TedGrahamEmporia commented Jun 15, 2021 via email

@magico13
Owner

Ted, I will certainly adjust those polling periods. Minor correction: I said hours but meant the day usage. It's very useful to be able to build dashboards like this (picture below), which requires polling data more than once per interval, but I can definitely reduce it to only a few times per interval. The way Home Assistant works, you can't just make one request for an entire previous interval's usage and load that data in; it's gathered over time by polling, but the polling frequency can be changed. It just means the plots will be less smooth and more of a step function. The biggest issue is that reducing the polling frequency will impact the triggers for automations. For example, I have an automation that alerts me if my garage door opened based on the usage of my garage circuit.

I'll prioritize reducing the "instant" usage sensor especially. That one is disabled by default to discourage its use unless necessary but it is incredibly useful for triggering automations based on devices turning on/off. It's also roughly the same as leaving the app open on a wall mounted tablet in terms of API calls.

As a rough guide, do you have a target number of calls per customer you'd like us to get down to? I realize that "none" is the ideal answer and "as few as possible" is the next ideal answer but those aren't quite as actionable. Anyone using the API is going to be noticeably higher than a "typical" customer just occasionally checking the app but we definitely don't want to cause you guys issues, it's no good for any of us if we can't get the data anymore because we're cut off or something happens to you guys.

image

@TedGrahamEmporia
Author

TedGrahamEmporia commented Jun 15, 2021 via email

@magico13
Owner

do you think you can operate with 2k requests per day?

I'll definitely have to cut stuff back but it's better than nothing. That's basically one a minute with some additional overhead, so that's what I'll target for the Home Assistant integration. When the rate limit is hit will we get back a 429 and will there be a Retry-After header? If so I can see about adding something into the library itself to not allow any requests going out until the limit expires, so anything using an updated version of the library will behave better.
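
The idea of holding back requests until a rate limit expires could be sketched roughly as follows. This is an illustration only: the thread does not confirm that the Emporia API returns a 429 or a Retry-After header, the `RateLimitGate` class and the 60 second fallback are hypothetical, and the headers are assumed to arrive as a plain mapping with the exact key `Retry-After`.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Mapping, Optional


def retry_after_seconds(headers: Mapping[str, str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Return the wait requested by a Retry-After header, or None if absent/invalid."""
    raw = headers.get("Retry-After")
    if raw is None:
        return None
    raw = raw.strip()
    if raw.isdigit():
        return float(raw)                      # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(raw)      # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


class RateLimitGate:
    """Refuses outgoing requests until a server-imposed limit has expired."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._blocked_until = 0.0

    def note_response(self, status_code: int, headers: Mapping[str, str]) -> None:
        if status_code == 429:
            wait = retry_after_seconds(headers)
            # Fallback of 60 s when no usable header is present (an assumption).
            self._blocked_until = self._clock() + (60.0 if wait is None else wait)

    def may_send(self) -> bool:
        return self._clock() >= self._blocked_until
```

A library could call `may_send()` before each request and `note_response()` after each one, so every application built on it backs off in the same way.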

magico13 referenced this issue in magico13/ha-emporia-vue Jun 15, 2021
jertel added a commit to jertel/vuegraf that referenced this issue Jun 16, 2021
…ics due to request from Emporia to stop using the chart usage API. See magico13/PyEmVue#19 for more information.
@jertel

jertel commented Jun 16, 2021

Based on this thread, and specifically the direct request from Ted at Emporia to cease usage of the chart usage API, I have refactored Vuegraf to stop calling the PyEmVue.get_chart_usage() method. All data in the latest version of Vuegraf will be derived from the PyEmVue.get_devices_usage() method, which will be invoked, by default, every 60 seconds, resulting in 1440 API calls per day, which falls under the proposed "2k requests per day". This is unfortunate for users that need the finer per-second metrics to detect short-lived electrical anomalies that are tell-tale signs of equipment about to fail. However, given the above background from Ted, I see no alternative at this time.

Regarding the MainsFromGrid and MainsToGrid errors, @magico13 already handled this last month with logic that skips chart usage lookups for MainsToGrid/MainsFromGrid. But users that are still using an old copy of these projects will continue to issue those bad requests until they upgrade. See

if channel.channel_num in ['MainsFromGrid', 'MainsToGrid']:

Ted, thanks for reaching out to us in advance about the issues you are facing with the Emporia servers, and the upcoming mitigation changes.
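
The skip logic referred to above can be pictured roughly like this. It is a sketch, not the actual PyEmVue or Vuegraf code: the `Channel` class is a hypothetical stand-in for the real channel object, and only the `channel_num` attribute and the two channel names come from the snippet quoted in the comment.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Channels that, per the thread, should not be sent to the chart usage API.
SKIPPED_CHANNELS = ('MainsFromGrid', 'MainsToGrid')


@dataclass
class Channel:
    """Hypothetical stand-in for a device channel object."""
    device_gid: int
    channel_num: str


def chart_eligible(channels: Iterable[Channel]) -> List[Channel]:
    """Return only the channels for which a chart usage lookup makes sense."""
    return [c for c in channels if c.channel_num not in SKIPPED_CHANNELS]
```

Filtering before the loop that issues requests means the bad calls are never made, rather than being made and then ignored.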

@mabrowning

@jertel does this still yield second-level historical granularity at minute-level intervals? Or is all data more frequent than 1 minute now inaccessible?

@jertel

jertel commented Jun 16, 2021

Per-second data is now inaccessible.

@andygodber

andygodber commented Jun 16, 2021

I confirm I'm using 10s polling, but I am only interested in real-time or near real-time data, not the history, and do/did not intend to pull months back.

@TedGrahamEmporia and others - would it be possible to make real-time available locally?
This could potentially eliminate any calls to your cloud infrastructure for this use case, as people would presumably store historic data on their own ‘infrastructure’ if they wanted to.
I'm not sure what spare memory capacity is available on the Vue, but real-time data would not need to be persistent.

@jertel

jertel commented Jun 16, 2021

@andygodber I suggest upgrading to the latest version of Vuegraf, since this change shouldn't affect your use case but will help reduce the requests to the Emporia servers. If you continue to override the default update interval as you are doing now, you will continue receiving data at the custom 10s interval you've set. The only difference is that the per-second data in between those updates will be unavailable; instead, you will receive data every 10 seconds representing the watts used over the past minute. So you'll still have near real-time data. Yet instead of making 50k API requests per day (assuming multiple devices) you'll be making 8.6k. That is still higher than the proposed 2k limit, so you might still face throttling down the road, but in the meantime you will be helping the Emporia team out.
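
The per-day figures quoted in this thread follow from simple arithmetic on the polling interval. A small sketch, assuming one API call per polling cycle unless stated otherwise (the function names are illustrative only):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_day(interval_seconds: float, calls_per_cycle: int = 1) -> int:
    """Number of API calls made in a day when polling at a fixed interval."""
    return int(SECONDS_PER_DAY // interval_seconds) * calls_per_cycle


def min_interval_for_budget(daily_budget: int, calls_per_cycle: int = 1) -> float:
    """Shortest polling interval (seconds) that stays within a daily request budget."""
    return SECONDS_PER_DAY * calls_per_cycle / daily_budget


print(requests_per_day(60))            # 1440, the default 60 second cycle
print(requests_per_day(10))            # 8640, the "8.6k" for 10 second polling
print(min_interval_for_budget(2000))   # about 43.2 seconds for a 2k/day budget
```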

@kwdavidson

Huh. Perhaps if we had local access to the Vue data, we wouldn't need to access the servers at all. I know Emporia has said that will never happen, but would it be worth reducing their hosting costs to allow those of us who know what we're doing to talk directly to the unit? Novel concept, I know...

@kwdavidson

kwdavidson commented Jun 16, 2021

I just updated my Home Assistant to the latest version of the integration and the Vue is now essentially worthless to me. One-day resolution on individual circuits? I can't do anything with that. I was using the power draw on the washer and dryer circuits to sound alerts throughout the house when they finished. The whole reason I bought the unit was because of the Home Assistant integration. I'm ready to send the whole thing back now. Let us have local access to the data and get your servers out of my house.

@magico13
Owner

magico13 commented Jun 16, 2021

@kwdavidson

I was using the power draw on the washer and dryer circuits to sound alerts throughout the house when they finished.

You can still do that with the 1 minute sensors, they are updated once a minute with the average draw for the last minute. So the dryer should go from fairly high usage one minute to basically zero usage the next minute when it's complete. The daily and monthly sensors are still there and update hourly but I don't think there are as many time-sensitive automations built around those.

Jertel runs a different application (Vuegraf) so his comments are about the changes he's made in his app, not with the Home Assistant integration, just in case there's any confusion there.
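
The "fairly high one minute, basically zero the next" pattern described above can be detected with a small state machine over the 1 minute averages. This is only a sketch outside of Home Assistant; the two thresholds are made-up values that would need tuning per appliance.

```python
from typing import List, Sequence


def finished_indices(minute_watts: Sequence[float],
                     running_above: float = 500.0,
                     idle_below: float = 10.0) -> List[int]:
    """Return the indices where the appliance drops from running to idle."""
    events: List[int] = []
    was_running = False
    for i, watts in enumerate(minute_watts):
        if watts >= running_above:
            was_running = True
        elif watts <= idle_below and was_running:
            events.append(i)        # first idle minute after a run
            was_running = False
        # Readings between the two thresholds leave the state unchanged.
    return events
```

Using two thresholds rather than one avoids a false "finished" alert when the average dips briefly during a cycle.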

@kwdavidson

You can still do that with the 1 minute sensors, they are updated once a minute with the average draw for the last minute.

I would be happy to do that. However, upon reinstalling the integration, the only entities I get are one day and one month usage for each channel. Am I missing something?

@magico13
Owner

I would be happy to do that. However, upon reinstalling the integration, the only entities I get are one day and one month usage for each channel. Am I missing something?

If you're still not seeing them after you run through what I'm about to suggest then please open an issue in the integration's github to reduce cross chatter here (https://github.com/magico13/ha-emporia-vue/issues).

Delete and re-add the integration (doesn't have to be removed from HACS). When you go to log in again make sure you have the 1 minute sensors selected. You should see them now, if not try removing the integration, reinstall in HACS to the master version instead of 0.5.0 (minor bugfix merged in today), reboot Home Assistant, and try again. If still having issues, open up a new issue and I'll work with you on it there.

@kwdavidson

Well, that was weird. I had already deleted and re-added the integration and it only gave me the option of day and month sensors. When I did it this time, I had the option of minute sensors as well, plus it let me set areas for each sensor, which it didn't do the first time. Now it's working and I've updated my automations. Thanks.

I still assert that we need local access to the data, so we can dispense with going to a server hundreds or thousands of miles away to talk to something a few feet away.

@TedGrahamEmporia
Author

I think you can satisfy both kinds of users.
a) for customers that want quick alerts, you can call getDevicesUsage once a minute
b) for customers that want charts, you can call getChartUsage once an hour and fetch an hour of data each time.

I was asking you to stop calling getChartUsage with MainsFromGrid, not to stop calling it completely.
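
The two-tier schedule suggested above could be sketched as a function that decides which calls are due on each one second tick. The strings are just labels for the two endpoints named in the comment, and the sketch assumes one chart request per channel, as described earlier in the thread for get_chart_usage().

```python
from typing import List

USAGE_INTERVAL_S = 60      # quick alerts: getDevicesUsage once a minute
CHART_INTERVAL_S = 3600    # charts: getChartUsage once an hour


def due_calls(elapsed_seconds: int, channel_count: int) -> List[str]:
    """Return the API calls due at this whole-second tick since start-up."""
    calls: List[str] = []
    if elapsed_seconds % USAGE_INTERVAL_S == 0:
        calls.append("getDevicesUsage")
    if elapsed_seconds % CHART_INTERVAL_S == 0:
        # One chart request per channel, each covering the previous hour.
        calls.extend(["getChartUsage"] * channel_count)
    return calls


def calls_in_one_day(channel_count: int) -> int:
    """Total calls over 24 hours under this schedule."""
    return sum(len(due_calls(t, channel_count)) for t in range(24 * 60 * 60))
```

Under these assumptions a 16 channel device makes 1440 + 24 × 16 = 1824 calls per day, which would stay below the 2k figure mentioned earlier.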

@kpnobvious

@jertel @magico13 Thank you both for making these changes so quickly. I've updated and seen the number of new data points drop significantly.

@efficiencynerd

@TedGrahamEmporia For what it's worth, this thread has changed my view of Emporia as a company and I probably won't buy your products again if this pattern continues. Of course I knew this was a potential problem buying your product from the beginning but I liked the price and the quality hardware, so I picked up a Vue 2 - but only because of the ability to get data into Home Assistant, including 1 second data. And 1 hour updates for daily usage is not enough for me, I need it more frequently to accurately track my time of use plan (which you still don't have in your app).

I get that the price of your hardware is reduced because (to my knowledge) you monetize my data, which I'm totally fine with, but why not also allow local access? Forgive me if there's been a previous discussion of this somewhere.

@kwdavidson - is it documented somewhere that Emporia said they would never allow local access, or is that just hearsay?

I suppose I may need to wait for someone to hack the hardware to get data locally...

@sstratoti

I’m the opposite of @efficiencynerd - I think that the company reaching out and working with an outside developer is a GREAT sign that they’re willing to work to find a solution.

Other companies have just shut down access to unsanctioned APIs in the past without warning. At least they’re trying to work with us to figure it out.

As for a local API, could we maybe access the data a different way? Instead of them building an API into the software and exposing endpoints, what about pushing the data out through an MQTT server? Basically, offload the data requests to a local MQTT server? I think that might be a much simpler solution than trying to build and test a local API.

@kpnobvious

I also agree with @sstratoti. I think it's great that Ted reached out to @magico13, who quickly addressed it and worked with @jertel to update PyEmVue.

I would be interested to hear what people need per-second readings for that can't be accomplished with per-minute readings. While the default per-second data was nice, as these tools gained traction it became a lot of requests to AWS, and I'm sure Amazon gets its piece of the transactional costs.

I do wish they would add MQTT support. That has been talked about a lot in the Emporia forums, but there have been very few replies from Emporia.

@efficiencynerd

@sstratoti this is absolutely a great point, and why I would still say I'm impressed with Emporia and will be keeping my Vue for now. Just not happy about the direction it's going, though I absolutely do understand why. I'd just like to see a better solution. But I wholeheartedly agree that this is way better than most other companies would do.

As far as actually using the second data, I had an automation that keeps a running counter of the number of times my sump pump runs in a day. The sump pump spikes to a few hundred watts for 3-5 seconds, then shuts off again. It's on a circuit with a few other things in my basement, so I can't easily pick it out of the minute data, but I can from the second data.

As far as actually using the second data in Home Assistant to run automations, that's it for now. Admittedly not much, but there are some use cases for it (not ones that really warrant 80k/day server requests though).
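
A counter like the one described above amounts to counting short bursts above the circuit's baseline in the per-second series. A minimal sketch, assuming the baseline draw of the other loads on the circuit is known and roughly constant; the 200 W and 3 second defaults are illustrative guesses based on the "few hundred watts for 3-5 seconds" description.

```python
from typing import Sequence


def count_cycles(second_watts: Sequence[float],
                 baseline_watts: float,
                 spike_watts: float = 200.0,
                 min_seconds: int = 3) -> int:
    """Count runs of at least min_seconds where draw exceeds baseline + spike."""
    cycles = 0
    run = 0
    for watts in second_watts:
        if watts - baseline_watts >= spike_watts:
            run += 1
        else:
            if run >= min_seconds:
                cycles += 1
            run = 0
    if run >= min_seconds:      # a burst still in progress at the end of the data
        cycles += 1
    return cycles
```

The same function works on an hour of backfilled per-second data, since it only needs the samples in order, not in real time.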

@kb8nh

kb8nh commented Jun 19, 2021

First off, thanks to Emporia for providing and supporting an API to allow pulling data, and thanks for the PyEmVue Python routines that make it really easy. I have been developing a Python application which pulls data hourly (by second) and stores it in a private MySQL database, and a companion Python app which pulls data from this MySQL database and charts selected circuits over various timeframes (e.g. day, week, month, forever).

I recently added a "live" mode which uses the get_devices_usage() routine to generate a bar chart displaying all of the circuits for a device, updating once a second. It is not intended to run for a long period of time, just long enough to quickly see which circuits are drawing power.

When I run this on my system, which has both a Vue Gen1 with 8 circuits and a Vue Gen2 with 16 circuits, everything works fine. When I run the same program on my brother's system, which has a single Vue Gen2 with 16 circuits, get_devices_usage() only returns 15 of the 16 circuits. Either the 8th or 9th circuit is not returning data. The second-by-second data which I pull hourly, using get_chart_usage() for each channel, works fine and retrieves all of the data. [Except I'm seeing fairly frequent error 500 responses when I try to retrieve data, so I wait 30 seconds and try again.] Any ideas how to troubleshoot this problem?

@derekyle

@TedGrahamEmporia Save yourselves and your customers countless bandwidth and open your devices for local access. It's pretty ridiculous that we are having to call remote servers for our local data anyway. Not to mention how we can't access our own local data when either your servers go down or our internet connection is out.

@wz2b

wz2b commented Jun 21, 2021

As I said in another thread, local access doesn't have to be complicated - a once-a-second UDP broadcast to the local subnet with some sane format (key-value pairs, csv, something brief so it all fits in a single, short-ish packet and is human-readable so you can decode it with netcat). Then people who need high-rate data can just collect it and store it themselves.
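
To make the proposal concrete, here is a rough sketch of what such a broadcast might look like. Nothing here exists on the Vue: the `key=value` line format, the channel names, and port 50000 are all hypothetical choices, and channel names are assumed to contain no spaces or `=` characters.

```python
import socket
from typing import Dict


def encode_readings(readings: Dict[str, float]) -> bytes:
    """Encode channel readings as one human-readable 'key=value' line."""
    line = " ".join(f"{name}={watts:.1f}" for name, watts in readings.items())
    return (line + "\n").encode("ascii")


def decode_readings(packet: bytes) -> Dict[str, float]:
    """Parse a packet produced by encode_readings back into a dict."""
    fields = packet.decode("ascii").split()
    return {key: float(value) for key, value in (f.split("=", 1) for f in fields)}


def broadcast(packet: bytes, port: int = 50000) -> None:
    """Send one datagram to the local broadcast address (hypothetical port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, ("255.255.255.255", port))
```

Because each packet is a single text line, a listener as simple as netcat in UDP mode could display it directly, which is the readability property the comment asks for.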

@Blizaine

I was using the 1-sec interval to capture how many times a low-power device (3W for 10 sec) was used, via the smart plug. I set up a counter in HA. The use case was family SodaStream usage. I had it set to notify me when the CO2 tank was getting empty (based on X number of uses). I've changed the config to use the 1 min interval. However, 3W for 10 sec doesn't appear to be enough power usage to register any change in the 1 min pull. Any ideas? Or am I SOL? Also, I completely understand the load this would put on the servers, but as @derekyle mentioned, open these devices up so we can pull from them directly. Thanks!
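
The reason a short burst nearly disappears in the 1 minute data is that the reading is an average over the whole window. A quick worked example, assuming the burst falls entirely inside one window and the device draws nothing otherwise:

```python
def window_average_watts(extra_watts: float,
                         on_seconds: float,
                         window_seconds: float = 60.0) -> float:
    """Average extra power over a window for a burst lasting on_seconds."""
    return extra_watts * min(on_seconds, window_seconds) / window_seconds


print(window_average_watts(3, 10))    # 0.5 W for the 3 W, 10 second case above
print(window_average_watts(300, 4))   # 20.0 W for a 300 W, 4 second burst
```

A 0.5 W change is easily lost in rounding or in the noise of other loads, whereas the same burst shows up as a clear 3 W step in per-second data.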

@sstratoti

@Blizaine so you’re using their smart plug? If you got a WiFi, ZWave or Zigbee plug that connects to HA - that’s the only thing I can think of to replace it. :/

@TedGrahamEmporia
Author

TedGrahamEmporia commented Jun 22, 2021 via email

@magico13
Owner

magico13 commented Jun 22, 2021

@Blizaine I can look into if there's a way to load that data once an hour in Home Assistant in a way that's useful. I'm not sure if you can trigger on past data.

I have also seen people mention Tasmota with respect to the smart plugs.

@kpnobvious

@efficiencynerd I too have a sump pump and may have the same issue, but it doesn't run that often.
I think it would be fairly simple to add a loop over the deviceGIDs, so that some channels for get_devices_usage could be on the per-minute scale while others, like the SodaStreams and sump pumps, could be on a per-second scale.

However, it appears that most of the issues have been resolved and people just need to update, which unfortunately I think is usually a manual effort.

Maybe @TedGrahamEmporia could comment: was the AWS issue based more on the volume of get_devices_usage calls for per-second data, or is there also an issue with calling for devices that have no data? For example, I have a Vue2 with 16 channels, but only 9 are monitoring a circuit. I make the get_devices_usage call for data on all 16 channels (regardless of the scale), and many of those calls will return no data.

@TedGrahamEmporia
Author

TedGrahamEmporia commented Jun 22, 2021 via email

@wz2b

wz2b commented Jun 22, 2021

Do we know how this data is stored? Is it blocks (in the spirit of prometheus) in S3, or DynamoDB, or RDS, or something like that?

@kb8nh

kb8nh commented Jun 22, 2021

Regarding my earlier post about missing a channel of data when using get_devices_usage(), it turned out to be a loose connection on the plug into the Vue. Strange that there were no problems pulling the data by channel, but the get_devices_usage() only returned 15 of the 16 channels. Difficult to figure out which channel was missing from that data alone. Going back and looking at the longer term history I had collected, I was able to figure out which channel was missing.
Thanks again.

@pelgv

pelgv commented Jun 27, 2021

@magico13 @jertel

I hope this gets implemented. Although real-time data is useful for triggers (which I think can be accomplished in some cases with per-minute average data), having 1-second data, even if only on an hourly basis (with the chart call), is also helpful for seeing trends in equipment at home.

@TedGrahamEmporia it would be so nice if there was a way of getting the data locally even if it was only the per second data with little formatting and or little options. (like just getting all channels at the same time with a dump into a port or something).

Thank you all for all your work!

I think you can satisfy both kinds of users.
a) for customers that want quick alerts, you can call getDevicesUsage once a minute
b) for customers that want charts, you can call getChartUsage once an hour and fetch an hour of data each time.

I was asking you to stop calling getChartUsage with MainsFromGrid, not to stop calling it completely.

@jertel

jertel commented Jun 28, 2021

@TedGrahamEmporia's comment below is the reason that I removed the second data from Vuegraf:

We would appreciate you making the second data polling less frequent or even removing that from your library.

However, Ted has since stated that this is not the case, and polling for second data is ok, provided it's at infrequent intervals.

b) for customers that want charts, you can call getChartUsage once an hour and fetch an hour of data each time.
I was asking you to stop calling getChartUsage with MainsFromGrid, not to stop calling it completely.

Because of this, and because of my own and others' need for second data, I have re-added this data collection to Vuegraf in the latest build. Vuegraf will now collect the previous hour's second data (roughly 60k data points) once per hour, and backfill those data points into the database at their historic times.
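
Backfilling at historic times means giving each per-second sample the timestamp it was measured at, not the time it was downloaded. A minimal sketch of that step, not Vuegraf's actual code: it assumes the hour arrives as a list of one value per second starting at a known instant, with `None` marking a missing sample.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Sequence, Tuple


def backfill_points(start: datetime,
                    per_second_values: Sequence[Optional[float]]
                    ) -> List[Tuple[datetime, float]]:
    """Pair each per-second sample with its historic timestamp, skipping gaps."""
    points: List[Tuple[datetime, float]] = []
    for offset, value in enumerate(per_second_values):
        if value is None:       # assumed marker for a missing sample
            continue
        points.append((start + timedelta(seconds=offset), value))
    return points


hour_start = datetime(2021, 6, 28, 12, 0, 0, tzinfo=timezone.utc)
points = backfill_points(hour_start, [120.0, None, 118.5])
```

An hour holds 3,600 samples per channel, so the "roughly 60k" figure would be consistent with about 16 monitored channels.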

@billtown

Personally, I wonder: why not allow us to receive and/or process the data ourselves? I was considering one of these devices and landed here out of curiosity, only to see a company complaining about its API calls being used, and a bunch of people who want to get their data out of a cloud that the company assumed was needed/wanted.

If I got one of these, I'd enjoy the hardware, but this data model just seems wrong. That is rather apparent if your AWS bills are high only because of people who don't want their data there.

Is there a method to poll the device directly, or receive the data directly, instead of this assumed internets, always on, needless cloud (for many users), and imposed control of data? I'm 5 minutes in, but would this be a good time to ask, can this product be used with local network only? Or is this an expensive afterthought?

@cumanzor

Ted, I will certainly adjust those polling periods. Minor correction, I said hours but meant the day usage. [...]
image
@magico13 that's a really nice dashboard. Do you have a guide on how to set up something similar? I also want to commend @TedGrahamEmporia here: most companies would just go ahead and disallow access to the API when something like this happens. Thanks for trying to help the devs come up with sensible solutions.

@matfra

matfra commented Aug 12, 2021

@TedGrahamEmporia Thank you for engaging with @magico13 and the community to find a solution to the API overload. One alternative solution to this problem would be to offer a local interface on the device itself so that the data could be polled directly on the LAN in near real time. A good way of exposing metrics nowadays would be via the Prometheus protocol, but anything else, plain text or JSON, would work. It wouldn't be long until the Home Assistant integration and various other projects integrate this.
Happy to help test any new beta firmware, etc...

@ChrisRomp

Lack of a local interface is what's keeping me from buying a bunch of their stuff. I'm probably not alone there.

jeremy-compostella added a commit to jeremy-compostella/home-manager that referenced this issue Aug 24, 2021
This is related: magico13/PyEmVue#19

Signed-off-by: Jeremy Compostella <jeremy.compostella@gmail.com>
jeremy-compostella added a commit to jeremy-compostella/home-manager that referenced this issue Aug 24, 2021
In order to comply with magico13/PyEmVue#19,
the service will only poll the by second data if it is actively
charging.

Signed-off-by: Jeremy Compostella <jeremy.compostella@gmail.com>
@JuliensLab

@TedGrahamEmporia Thanks so much for engaging with the community. The app and device are really fantastic.

I have 2 needs:

  1. to balance our 3 phases (for this, the app + data export are sufficient), and
  2. to control the charging current of our EV dynamically, following available power. For this I need 1-second API access to all 16 sensors + 3 phase mains.

I would prefer to get data locally to limit the time lag and to be insensitive to any internet connection failure. But there is also value for us when using the app remotely.
Also, we're all aware that you need our data in the cloud for monetization. You could activate a local data broadcasting only when data is successfully pushed to the cloud, ensuring that customers don't bypass your cloud-based business plan.
Alternatively, I'm ready to pay for a premium subscription.

Thanks.

@cybernard

https://flaviutamas.com/2021/reversing-emporia-vue-2

HOWEVER, this applies when your device is NEW and not yet configured.

This documents that if you have a Wi-Fi network named emporia with the password emporia123, the device will connect to it and send power data to 192.168.1.101, so it will broadcast power data locally.
This could be emulated on a Raspberry Pi.

@derekyle

Emporia didn't want to offer local API access, and then when everyone started downloading from their cloud, they couldn't handle the load. So someone reverse engineered it, and now you can flash your device and cut Emporia out completely. Works like a charm. https://gist.github.com/flaviut/93a1212c7b165c7674693a45ad52c512

@cybernard

Yup, but I am really trying to avoid opening it up to flash it, also because it involves soldering.

@beamfarms

Sorry if this has been brought up, but has anyone noticed that they are now doing some kind of averaging of the second data? (They started this months ago.) If I'm understanding this right, when I pull second data once an hour it is going to be this averaged data, which will not be useful to me for seeing the spikes from motors starting. I guess I will have to open it up one day and do the firmware thing, but I currently have 4 and that is a lot of time. I guess after I figure out the first one, it shouldn't be that bad to do the rest.
