Skip to content

Conversation

heisenberg2980
Copy link

Commenting line 283 out to avoid the issue where switchbot stops responding until HA restart.
If the switchbot is in disc state, it tries to read a response but it will never get one. By removing disc from the while loop, it will fall into BTLEDisconnectError exception and retry properly the next time around, without any deadlocks.

Commenting line 283 out to avoid the issue where switchbot stops responding until HA restart. 
If the switchbot is in disc state, it tries to read a response but it will never get one. By removing disc from the while loop, it will fall into BTLEDisconnectError exception and retry properly the next time around, without any deadlocks.
@heisenberg2980 heisenberg2980 changed the title Update __init__.py Fix issue where switchbot stops responding due to "disc" status Feb 18, 2022
@Danielhiversen
Copy link
Collaborator

@RenierM26 any comment?

@RenierM26
Copy link
Contributor

Hi There,

This will cause a lot of retries if the user has multiple devices. The bluepy helper returns disconnect after any operation finishes...usually an issue if you have 3+ devices. A disconnect could also happen before the scan finishes. (The bluepy library does a lot of connect/disconnects...)

The connect function also has a built in timeout so it won't be stuck waiting for a disconnect (The default is close to a min, can't remember the exact time). It would probably be better just to call the connect function with a shorter timeout. (code is already there)

The actual problem is most likely related to something else and this would just be a workaround.

Just on a side note: I have a raspberry pi 4 with latest firmware (manually updated) running for 2.5 months without any issues.

@heisenberg2980
Copy link
Author

heisenberg2980 commented Feb 19, 2022

I agree this is a workaround, but for all the users having issues from December (i.e. home-assistant/core#61535) this workaround will be really useful as it is really frustrating having to restart HA every time switchbot stops working. In my case that happens just every 3-4 days (I think that is because my switchbot and my HA server are really close), but some people mention they have to restart every day, which is painful.
Happy to test any other fixes or solutions, but unless this PR causes extra issues to people with multiple devices, in the meantime I really think this PR should be merged.

@pascalwinters
Copy link

pascalwinters commented Feb 19, 2022

@RenierM26 I also agree this is a workaround. But for me, Im using Switchbot since this week, and it hasn't been stable for a day.
I seems that after a command (sometimes 2 commands) the integration is broken and a HA reboot is necessary. I have at this moment 2 curtains combined to 1 group. So in HA it's one device.

I'm also happy to test any other fixes, but what @heisenberg2980 says. Can we in the meantime merge this PR?

Just on a side note: I have a raspberry pi 4 with latest firmware (manually updated) running for 2.5 months without any issues.

Which firmware do you mean? The Switchbot curtain firmware? My curtains are running on v3.1.

@RenierM26
Copy link
Contributor

RenierM26 commented Feb 19, 2022

Hi @pascalwinters,

The raspberry itself. There where a few Bluetooth fixes in the firmware and wasn't included in Hass OS at that time. (October/November I think)

My DEV and Main PI both only have the Switchbot bluetooth integration running. (no other bluetooth integrations)

Are you running any other bluetooth integrations in addition to Switchbot? Might explain the lockups.

@pascalwinters
Copy link

Hi @RenierM26 ,

No, I have special bought for this integration a bluetooth dongle. So it's only used for the Switchbot integration.

@RenierM26
Copy link
Contributor

Thank you for the feedback.

I have no objections to this pull request, just trying to find the root cause.

@RenierM26
Copy link
Contributor

Do you perhaps have the make and model of the dongle?

@pascalwinters
Copy link

pascalwinters commented Feb 19, 2022

I agree, the root cause is important to find. Will it help if there is more debug logging added, so we can enable it and provide you with more info?

I have no objections to this pull request, just trying to find the root cause.

@Danielhiversen Can you merge this PR please?

@pascalwinters
Copy link

pascalwinters commented Feb 19, 2022

Do you perhaps have the make and model of the dongle?

TP LINK UB400 in combination with an Ondroid N2+

@heisenberg2980
Copy link
Author

@RenierM26 I also have a Raspberry Pi 4 with the latest version of HA Core and Supervisor and this is the only integration using Bluetooth, but I still had this issue with my switchbot every 3-4 days until I removed the line proposed in this PR. Not sure why yours is not failing, maybe you could try moving your switchbot far away from your Raspberry to see if that force the issue to occur.

@RenierM26
Copy link
Contributor

Hi @heisenberg2980,

Thanks for the info! I'm able to replicate the issue by moving it out of range.

I'm going to test a timeout on the connect function and see if this solves the problem.

@pascalwinters
Copy link

@RenierM26

I'm going to test a timeout on the connect function and see if this solves the problem.

It’s very good news you can reproduce it.

When do you expect you can test this and creating a PR?

@RenierM26
Copy link
Contributor

Looks like timeout won't be a practical fix. There's more methods that need to be overridden before this will work.

Looks like this pull request is best fix for now.

@heisenberg2980
Copy link
Author

Perfect, let´s merge it and if we find a better solution in the future we will implement it.

@Danielhiversen can you merge this PR?

@pascalwinters
Copy link

Looks like timeout won't be a practical fix. There's more methods that need to be overridden before this will work.

Looks like this pull request is best fix for now.

Ok. Thanks for your time to test it. I appreciate it.
@Danielhiversen regarding the info from @RenierM26, is it possible to merge this PR and create a release?

@RenierM26
Copy link
Contributor

RenierM26 commented Feb 19, 2022

Hi All,

Found the reason why the timeout isn't working. I have also added a 40 seconds timeout if the device is not reachable for some or other reason - should prevent the lockup issue.

Mind testing? #40

Nevermind....Looks like PR 40 needs some additional bug fixing.

@RenierM26
Copy link
Contributor

Hi @pascalwinters and @heisenberg2980,

Made a couple of improvements to the handling of disc states. Mind testing #40 ?

@pascalwinters
Copy link

pascalwinters commented Feb 20, 2022

Hi @pascalwinters and @heisenberg2980,

Made a couple of improvements to the handling of disc states. Mind testing #40 ?

HI @RenierM26 , I’m happy to test. But I don’t know how to test this? Because I’m using HA OS. So I can add it as custom component, but I can’t replace the python package. Any tips?
Is it possible to create an pre-release? Because when I added as custom component, I can bump the python package myself.

I'm running the integration on your branch. First impression looks good, sometimes the delay is bigger because of this warning:
2022-02-20 11:12:40 WARNING (SyncWorker_2) [switchbot] Bluepy busy, waiting before retry

I see this warning frequently, I'm not sure if it's a good sign.

@heisenberg2980
Copy link
Author

Happy to test it but haven´t tested a PR before, are there any instructions/tutorial I can follow? do I need to disable the original integration and download the branch to the custom component folder?

@pascalwinters
Copy link

pascalwinters commented Feb 20, 2022

@heisenberg2980

Yes, you can copy the switchbot integration from the core repository to the custom component directory and replace the manifest content with this:

{
  "domain": "switchbot",
  "name": "SwitchBot",
  "documentation": "https://www.home-assistant.io/integrations/switchbot",
  "requirements": ["git+https://github.com/RenierM26/pySwitchbot.git@pyswitchbot_timeout#PySwitchbot==0.13.3"],
  "config_flow": true,
  "codeowners": ["@danielhiversen", "@RenierM26"],
  "iot_class": "local_polling",
  "loggers": ["switchbot"],
  "version": "1.0.0"
}

This all will automatically overwrite the core integration until you delete the custom switchbot component.

@heisenberg2980
Copy link
Author

@pascalwinters when you say "copy the switchbot integration from the core repository" do you mean the original version of the integration (master branch) or the PR (pyswitchbot_timeout branch)? Also where is the manifest file?

@pascalwinters
Copy link

@pascalwinters when you say "copy the switchbot integration from the core repository" do you mean the original version of the integration (master branch) or the PR (pyswitchbot_timeout branch)? Also where is the manifest file?

I mean this repository:
https://github.com/home-assistant/core/tree/dev/homeassistant/components/switchbot

@heisenberg2980
Copy link
Author

Thanks @pascalwinters, testing it now

@pascalwinters
Copy link

Thanks @pascalwinters, testing it now

Nice. Good luck.

@RenierM26
Copy link
Contributor

I see this warning frequently, I'm not sure if it's a good sign.

Bluepy doesn't handle the "disc" status all that well. It captures any disconnect from any device (disconnect is the last stage of any connection to any device) I have added warning logging when this happens to aid in troubleshooting. Could change it to a lower level later on to prevent log spamming (Debugging perhaps)

Connection failures should work a lot better.

I think the btle scanning function, in the bluepy library, might ultimately be the root cause (it has aggressive retries if the helper drops a connection before the scan completes....which kind of leads to a controlled race condition.) You can connect to more than one device at a time but the last bluepy release doesn't handle it all that well.

Sorry for the long explanation.

@pascalwinters
Copy link

@RenierM26 Thank you for explaining. For me it's stable since the beginning of using this fix. Two remarks/questions:

  1. Maybe it's good (before merge this PR), to mark this line as DEBUG:
    2022-02-20 11:12:40 WARNING (SyncWorker_2) [switchbot] Bluepy busy, waiting before retry

  2. I'm missing the log of reading each (I mean) 60 seconds the current status. Is this log event deleted?

@heisenberg2980
Copy link
Author

heisenberg2980 commented Feb 21, 2022

After 24 hours of testing, the fix looks promising, with no lockups yet, but there is something weird I would like to share, my switchbot seems to take in average much longer to respond than before, taking sometimes up to 30 seconds to switch, which very rarely happened before. I don´t have exact numbers, but I would say now it takes 20-30 seconds 50% of the times (taking 5-10 seconds the other 50%), and before it usually took less than 10 seconds most of the times (taking over 20 seconds just 10-15% of the times).

Obviously this could be due to external factors or interferences and not related with this fix, but wanted to mention it just in case @pascalwinters or someone else is also experiencing it.

BTW, even with this performance issue, if I have to decide between now and before the fix I would choose the fix, it is always better to have a response (even if the response is a bit slow) rather than a lockup that requires a HA restart.

@pascalwinters
Copy link

@heisenberg2980 I have the same experience. Sometimes it takes more time to respond. I think it's waiting for a timeout or so?
But for me, it's now priority having a working environment and after that looking for a solution with better performance.

Maybe @RenierM26 can explain the delays?

@RenierM26 Is it an idea to merge this and make a PR for HA (I can do that if you want) so it has been fixed for all the users in HA with the next release. But we must make progress, because the beta is almost there.

@heisenberg2980
Copy link
Author

@heisenberg2980

Yes, you can copy the switchbot integration from the core repository to the custom component directory and replace the manifest content with this:

{
  "domain": "switchbot",
  "name": "SwitchBot",
  "documentation": "https://www.home-assistant.io/integrations/switchbot",
  "requirements": ["git+https://github.com/RenierM26/pySwitchbot.git@pyswitchbot_timeout#PySwitchbot==0.13.3"],
  "config_flow": true,
  "codeowners": ["@danielhiversen", "@RenierM26"],
  "iot_class": "local_polling",
  "loggers": ["switchbot"],
  "version": "1.0.0"
}

This all will automatically overwrite the core integration until you delete the custom switchbot component.

@pascalwinters Out of curiosity, how did you update the requirements line of the manifest to point to the PR? is there any link/document to understand the logic?

@pascalwinters
Copy link

@heisenberg2980
Yes, you can copy the switchbot integration from the core repository to the custom component directory and replace the manifest content with this:

{
  "domain": "switchbot",
  "name": "SwitchBot",
  "documentation": "https://www.home-assistant.io/integrations/switchbot",
  "requirements": ["git+https://github.com/RenierM26/pySwitchbot.git@pyswitchbot_timeout#PySwitchbot==0.13.3"],
  "config_flow": true,
  "codeowners": ["@danielhiversen", "@RenierM26"],
  "iot_class": "local_polling",
  "loggers": ["switchbot"],
  "version": "1.0.0"
}

This all will automatically overwrite the core integration until you delete the custom switchbot component.

@pascalwinters Out of curiosity, how did you update the requirements line of the manifest to point to the PR? is there any link/document to understand the logic?

https://developers.home-assistant.io/docs/creating_integration_manifest/#custom-requirements-during-development--testing

@heisenberg2980
Copy link
Author

@RenierM26 @pascalwinters considering #40 looks promising, unless you have any objection I will close this PR

@pascalwinters
Copy link

@RenierM26 @pascalwinters considering #40 looks promising, unless you have any objection I will close this PR

Seems good to me.

@heisenberg2980
Copy link
Author

Closing this PR as #40 seems to be a better solution

@heisenberg2980 heisenberg2980 deleted the patch-1 branch February 22, 2022 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants