Regular reboots and failing auto updates #134
Comments
Very comprehensive report indeed! The log shows it trying S21, which will fail, but it should cycle through some options within a few seconds and find the right one, so it is odd that it takes a long time. I suspect that if the rebooting is solved it won't be an issue, though. Even so, it should have saved the protocol to use when it next boots, so that part is unexplained. I suspect the reboot is the main problem. Can you connect a serial console and get a log of the reboot / crash when it happens? That will be the big clue as to what is wrong. There is both serial and USB on the back of the board. |
Oh. And yes. The upgrade is signed. You can set otahost to point to your server to upgrade to your build. |
Thanks for the pointers. When the unit is working it reports the protocol as S21 in the status message, which I believe is expected. Apparently the only difference between the S403 pinout and the S21 pinout is that the S403 is not isolated. There is an adaptor board available from the manufacturer to provide the S21 signals in the usual isolated fashion. On this point - I'll have to work out the best way to safely capture the data from the unit, no isolation means that I can't directly connect to a laptop. I'll see what I can work out. Just quickly, is there an explanation in the docs somewhere on what the reported "spi" value is that is broadcast as part of the state? When the unit goes through an "unstable patch" I do see this value jump around a bit. What is the best method for setting up a PCB that I have programmed myself to support the auto update features, or is this not really a supported configuration. |
Ah, OK, S403 must be S21 then, sorry, my mistake. So yes, good. The errors could be down to where it crashes, if part way through sending it could be "out of step" in the middle of some message exchange, and take a while to get sorted. And yes, I see the issue, if not isolated, ouch, OK. "spi" is the amount of external RAM (psram connected via SPI). There is internal RAM and external - either getting low is bad. You can set up the auto upgrade using your own host, the URL is just the name of the binary. But it won't load from my updated binaries as they are signed with my key. So you'd have to build whenever I issue new code. Not sure of a good way around that. |
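For anyone logging the state messages, here is a minimal sketch of watching the two memory pools described above. The `spi` key is the one discussed in this thread; the `mem` key for internal RAM and both thresholds are assumptions for illustration, so they are left as parameters and constants to adjust.

```python
import json

# Hypothetical thresholds in bytes; pick values that suit your own build.
MIN_INTERNAL = 20_000
MIN_EXTERNAL = 100_000

def low_memory_warnings(state_json, internal_key="mem", external_key="spi"):
    """Return a list of warnings for a state message given as a JSON string.

    "spi" is the external (PSRAM) figure discussed in this thread; "mem" as
    the internal RAM key is an assumption, so both names are parameters.
    """
    state = json.loads(state_json)
    warnings = []
    for key, floor in ((internal_key, MIN_INTERNAL), (external_key, MIN_EXTERNAL)):
        value = state.get(key)
        if value is None:
            continue  # field not present in this message
        if value < floor:
            warnings.append(f"{key} low: {value} < {floor}")
    return warnings
```

Since either pool getting low is a problem, the function checks both and reports each one separately.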
I left a script running last night to acquire logs via MQTT from the host, and it seems the periodic reboots occur when the OTA update fails. The timestamps where the uptime resets line up with an attempt to apply an update, which fails and then reboots the device. I've attempted to turn this off this morning by setting |
Ah, bingo, yes, there is a catch for failing to auto update. If there is no update that is different. But any failure causes a reboot. Wow. Well spotted. |
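The correlation described above (uptime resets lining up with update attempts) can be checked with a short script. This is only a sketch: it assumes you have already extracted `(timestamp, uptime)` pairs and a list of update-attempt timestamps from your own MQTT log, and the function names and the 120 second window are hypothetical.

```python
def find_reboots(samples):
    """Return timestamps at which uptime dropped, given (timestamp, uptime) pairs.

    samples must be sorted by timestamp; both values are in seconds.
    """
    reboots = []
    previous_uptime = None
    for timestamp, uptime in samples:
        # An uptime lower than the previous sample means the device restarted.
        if previous_uptime is not None and uptime < previous_uptime:
            reboots.append(timestamp)
        previous_uptime = uptime
    return reboots

def reboots_near_events(reboots, events, window=120):
    """Return the reboots that happened within `window` seconds after an event."""
    return [r for r in reboots if any(0 <= r - e <= window for e in events)]
```

If most entries from `find_reboots` also appear in `reboots_near_events` when `events` holds the update attempts, that supports the failed-update explanation.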
Pull request is fine. Thanks. As for comms, that is odd. An unsupported message has a specific response and my code stops sending unsupported messages. That seems more like some sort of comms problem. It could be interference or more likely a marginal timing of some sort. It may be that some tweaks to the protocol handling are needed. The fact it takes long to recover is also a concern. It may be worth my adding a pause when there is an issue to help things that have got out of step somehow. |
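To make the comms error from the original report easier to read, the payload can be decoded offline. The sketch below assumes the `data` field is hex-encoded bytes, which matches the example (`"badlength":"3"` with six hex digits) but is not confirmed anywhere in this thread; `decode_comms_error` is a hypothetical helper name.

```python
import json

def decode_comms_error(payload):
    """Decode the hex `data` field of a comms error payload into bytes."""
    message = json.loads(payload)
    raw = bytes.fromhex(message["data"])  # assumption: data is hex-encoded
    return {
        "command": message["command"],
        "reported_length": int(message["badlength"]),
        "raw": raw,
        "text": raw.decode("ascii", errors="replace"),
    }

payload = '{"protocol":"S21","badlength":"3","command":"SD","data":"303030"}'
info = decode_comms_error(payload)
print(info["text"], info["reported_length"], len(info["raw"]))
```

Under that assumption the reply to `SD` is the three ASCII characters `000`, and the reported length equals the number of data bytes.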
I have issued code with an extra part flush and pause on any error in S21. |
Thanks very much for that, I've just built and updated the firmware, will see how it goes over the next 12 or 24 hours. FWIW, I'm still seeing the regular |
That is odd, I hope it recovers faster. There is a debug setting and I think a dump setting which may offer more clues. |
Yep, one of the posts above includes the dump output that goes with that error. I'll log all the MQTT data overnight again and see if it yields any differences. I was also reflecting on your comment about timing. As I mentioned, I'm using the non-isolated S403 port to get this data, and I'm wondering if the few microseconds of difference in timing that may arise from not having optoisolators in the signal path may be a factor here. Is the timing of the protocol analysis likely to be that sensitive? |
The timeouts are quite long, having checked, so that seems less likely now. Other than playing with an oscilloscope it may be hard to tell. |
The bad length looks like a red herring, seems |
Ahh okay, thanks for that! I've not had a chance to go through my device log in detail yet, but the slightly modified code is definitely more reliable at reporting data. The longest period that I've seen is now 15 minutes between data points, which is substantially improved! I do think there may be something in the interference idea too, I'm slowly gaining confidence to deploy a second one of these so will be intrigued if the behaviour is the same. At the moment I'm a bit worried about the very high input impedance of the unprotected FET gate, wondering if some parallel capacitance and a board level pull up may help - have you tested anything like this? |
The input should be driven both ways by the air-con, so I was not expecting any problems on that. The problem is that the air-cons seem to vary from model to model, and the only issue I had was actually with Tx to the Daikin, which needed the current circuit. |
I've been following this thread with interest as I have a second AC with only the S403 port and I don't have the S21 adapter board (KRP413AB1S). From what I read you have connected via the S403 and it mostly works? I understand somewhat what non isolated means in electronics but I'm not clear what risks or additional precautions I should take if I wanted to try this. Have you just connected the Faikin / ESP32 directly in the same way as with the S21? |
Everyone please be careful not to electrocute yourself. And don't blame me if you do! |
Haha, 😊 thanks. I'm not going to electrocute myself. It's more that I wondered whether non-isolated means mains voltage flows nearby or through it, so I want to avoid a short circuit or damage to the AC or Faikin. |
Sorry for the delay in getting back to this - I think I have finally worked out a few more things to be useful! I think the periods where I am now missing a few minutes of data at a time are due to a bit of a quirk in the way I was logging MQTT data. I was only graphing data that was provided by the
Also @matt-nz, I'll try and write up some more notes tomorrow on how the connection has worked with these units using the S403 port, but yes, it appears to work as well as the S21 port from what I can tell. I now have two different models connected with only the S403 port. The big difference on this port is that you don't have a direct +5V pin and it is NOT isolated. What this means is:
If you have any doubts at all please don't touch it - the "other" end of the S403 has a pin that is at +327VDC, so it is all very close to some very dangerous pins, and just to reiterate what @revk said - don't electrocute yourself! |
It would be interesting to know if it can work without the pull up, i.e. it was just a loose connection. My experience so far is that the Daikin has an internal pull up. The 5V pull up was done "just in case" there is a model without. But good write up. I may update the manuals to explain the non-isolated port. Do you have a pinout and picture? |
I'll grab some photos tomorrow from the two units that I have - they have slightly different connectors (because who needs standards). I'll also grab the exact model numbers to add to the wiki page too. I do also have some photos of how I have done the install too, one of the S403 connectors I had was a 2.0mm pitch connector that was particularly narrow, but I managed to get it working with a Dremel and a 2mm JST connector to start with. My own adventures have basically entirely been based on this thread: https://community.openenergymonitor.org/t/hack-my-heat-pump-and-publish-data-onto-emoncms/2551/99. Someone has documented the S403 pinout and provided high resolution images of the Daikin conversion board in it. |
Ah, nice. |
Good Day,
Just wondering if you were able to provide some pointers on how to diagnose this one.
FWIW the unit is connected to a CTXM25RVMA unit via the S403 connector, powered from the +14V pin.
I've got a few behaviours that aren't quite right:
The unit sometimes just fails to connect, but if left alone will apparently come good after a few hours. During this time the Web UI is accessible and MQTT is up, it just polls awaiting a response. Even a complete power cycle of the unit doesn't seem to convince it to connect.
Frequent logging of `error/spareroomac/comms {"protocol":"S21","badlength":"3","command":"SD","data":"303030"}`. I assume this is just an unsupported command that hasn't timed out yet, possibly related to the frequent reboots?
Frequent Reboots:
![image](https://private-user-images.githubusercontent.com/5158705/278852726-79608b35-0603-4340-b0dc-2bf43b2dcbcd.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk4MjM2MzcsIm5iZiI6MTcxOTgyMzMzNywicGF0aCI6Ii81MTU4NzA1LzI3ODg1MjcyNi03OTYwOGIzNS0wNjAzLTQzNDAtYjBkYy0yYmY0M2IyZGNiY2QucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDcwMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA3MDFUMDg0MjE3WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MTA4NjczM2UyNjhlOWRmOGNhZGM5ZGJjMzRmMTg1YTg2YTJjNDY5Njk1NWQ4YzE1ZjFlMGM3M2RjNWM5ZGRlNCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.y71jecCwNSNOQVwibv07WB03oD8XDzbBRrZ7JN-N1Gg)
This is a plot of the reported uptime values from the unit. The unit only seems to report them very infrequently, though it appears to report the initial near-zero value reliably?
This includes the reported memory usage, so it doesn't appear to be running out of memory?
Hardware Version
![image](https://private-user-images.githubusercontent.com/5158705/278852811-09055858-ae46-4a58-921c-37983303f8ec.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk4MjM2MzcsIm5iZiI6MTcxOTgyMzMzNywicGF0aCI6Ii81MTU4NzA1LzI3ODg1MjgxMS0wOTA1NTg1OC1hZTQ2LTRhNTgtOTIxYy0zNzk4MzMwM2Y4ZWMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDcwMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA3MDFUMDg0MjE3WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZDE1MmFhZDU3OTFlMGQ1YTA3YmFmZTg0MWY5MGM1ZTNhZWM1NzA2ZTA1MmE4MWQ5ZWI2ZGUyNTkwNDJjNTcwOCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.Z5jMI7mVGKvrzHPN3L3YTAeY7izVqEGPISi43JsTGRc)
Relatively recent files from git ordered from JLC directly:
Software Version
Built using `make s3` from commit 1272a2fdd7793b5dcdab628740e00f53368a868c. I've tried issuing an update command to force an upgrade to the current version on the cloud but I get an error `ESP_ERR_OTA_VALIDATE_FAILED` - is this related to having to generate a self-signed certificate as part of setting up the environment for the `make` script to run? Or possibly due to using a newer version of the firmware than is currently published?
Thanks for any help you are able to provide, and additionally thank you for what appears to be a most excellent project!