"Cannot reach homeserver" after homeserver restart #15241

Open
Ezwen opened this issue Sep 19, 2020 · 29 comments
Labels
A-Electron T-Other Questions, user support, anything else

Comments

@Ezwen

Ezwen commented Sep 19, 2020

(Disclaimer: I am a Synapse admin and I am here describing a problem encountered by many of my users.)

Description

When the homeserver in use restarts (e.g. during an upgrade), element-desktop says that it cannot reach the homeserver (as expected), but keeps saying so even after the homeserver has finished restarting (not expected).

Then, if the element-desktop user logs out completely and tries to log in again, she now repeatedly gets the error "cannot reach homeserver". Restarting element-desktop or the computer does not solve the problem.

If another client is used on the same computer (e.g. element-web or Nheko), it works perfectly. In fact, none of the homeserver's users who were not using Element even noticed a problem.

Workaround: deleting .config/Element solves the problem.

Steps to reproduce

  • Connect to a homeserver with Element-desktop
  • Have the homeserver restart
  • Try to log back in to the homeserver with Element-desktop

The error "cannot reach homeserver" appears.

Logs being sent: no (I have to ask)

Version information

  • Platform: desktop
  • OS: Ubuntu
  • Version: 1.7.7
@Ezwen Ezwen added the T-Defect label Sep 19, 2020
@cmasdf

cmasdf commented Sep 21, 2020

We get the same error with Ubuntu clients for our custom homeserver all the time, even without a prior server restart. The provided workaround does not work for us. Ubuntu desktop clients simply can't connect. The server does not show any activity when these clients try to connect. I could not find any debug/log information on the client; if anyone can tell me where to get the logs, I can provide some. All other clients (web, macOS, iOS, Android) are working as expected.

//edit:

server version: 1.19.3
ubuntu client: 1.7.7 (precompiled from https://packages.riot.im/debian/ default main)

@jryans jryans added defect T-Other Questions, user support, anything else A-Electron and removed T-Defect defect labels Sep 21, 2020
@jryans
Collaborator

jryans commented Sep 21, 2020

Hmm, very interesting, I haven't heard of this happening before. Logs are available in the console within the app, which can be accessed via Ctrl-Shift-I.

Please submit debug logs if possible (perhaps @cmasdf can do so), as that will greatly aid debugging the issue.

@cmasdf

cmasdf commented Sep 21, 2020

Thanks for the hint about the dev tools, this pointed me in the right direction. My problem seems to be element-hq/element-desktop#798. I just didn't notice it because I expected adding the root CA to the OS store to be sufficient. Looks like I cannot be of much help with this problem. @Ezwen please provide further logs if possible. Thanks @jryans for your quick response.

@Ezwen
Author

Ezwen commented Sep 21, 2020

Hmm, very interesting, I haven't heard of this happening before. Logs are available in the console within the app, which can be accessed via Ctrl-Shift-I.

I've asked my users to do that next time they encounter the problem (probably with the v1.20.0 synapse release), let's hope they remember to do it :)

Thank you both for responding that quickly!

@jryans
Collaborator

jryans commented Sep 23, 2020

@Ezwen If possible, please ask users to submit debug logs by going to Settings -> Help -> Submit debug logs and link this issue. That will send a copy of the logs to our private logs server for easier analysis.

@Ezwen
Author

Ezwen commented Sep 28, 2020

One of my users (using Element 1.7.7) got me some logs! 🎉

  1. Our homeserver was restarted yesterday (27/09/2020) after updating synapse to 1.20.1.
  2. After this restart, this user opened Element, and Element got stuck forever trying to re-sync. Logs when this happens.
  3. After waiting a while and seeing that the resync would never complete, this user logged out and tried to log in again; the message "cannot reach homeserver" popped up. Logs when this happens.
  4. Finally the user closed Element, then deleted the configuration folder of Element (.config/Element), then tried again to login, which worked. Logs when this happens.

Lots of CORS errors in there, which is strange since AFAIK synapse is always adding correct CORS headers to responses?

@t3chguy
Member

t3chguy commented Sep 28, 2020

Logs are not helpful to dig into CORS errors as they are basically an umbrella error, the Network tab would have more information.
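For anyone who wants to check the headers outside the browser, here is a minimal sketch of the kind of information the Network tab would show. It assumes the homeserver exposes the standard unauthenticated `/_matrix/client/versions` endpoint; the helper names `summarize_cors` and `probe`, and the default `Origin` value, are illustrative and not part of Element or Synapse.

```python
import urllib.error
import urllib.request

# CORS response headers a browser looks at for a cross-origin request.
CORS_HEADERS = (
    "access-control-allow-origin",
    "access-control-allow-methods",
    "access-control-allow-headers",
)


def summarize_cors(headers):
    """Pick the CORS headers out of a header mapping (names are case-insensitive)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered.get(name) for name in CORS_HEADERS}


def probe(base_url, origin="https://app.element.io"):
    """Request /_matrix/client/versions and return (HTTP status, CORS headers).

    The Origin header is an assumption meant to mimic a browser-based client.
    A connection-level failure (DNS, TLS, refused) raises urllib.error.URLError.
    """
    url = base_url.rstrip("/") + "/_matrix/client/versions"
    request = urllib.request.Request(url, headers={"Origin": origin})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, summarize_cors(dict(response.headers.items()))
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status; headers are still useful.
        return error.code, summarize_cors(dict(error.headers.items()))
```

Calling `probe("https://your.homeserver")` while the server is restarting and again afterwards would show whether an error page without CORS headers is being served in between. This only approximates what the browser sees, since it bypasses the browser cache entirely.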

@Ezwen
Author

Ezwen commented Sep 28, 2020

Logs are not helpful to dig into CORS errors as they are basically an umbrella error, the Network tab would have more information.

Oh, good to know. I'll let my users know, and try to gather better data at the next synapse upgrade.

@Ezwen
Author

Ezwen commented Sep 28, 2020

Additional information: I have more and more reports suggesting that the problem does not actually happen after every synapse restart… but after each synapse upgrade from one version to another. This is getting even more mysterious.

@t3chguy
Member

t3chguy commented Sep 29, 2020

Definitely sounds like a Synapse issue, probably caused by it running internal database migrations during upgrades.

@Ezwen
Author

Ezwen commented Sep 29, 2020

Even if that is the case, isn't it worrying that when this happens Element-desktop requires a wipe of .config/Element before being able to connect to the homeserver once more?

@Ezwen
Author

Ezwen commented Sep 30, 2020

Update: another of my users encountered a very similar problem, and managed to capture a network log.

The story:

  • The user was logged in using Element-web on Brave (i.e. Chrome).
  • The homeserver was routinely restarted for a maintenance task (but not an upgrade of synapse this time).
  • As expected, element-web stated in red letters that the connection was lost. But, not as expected, this message did not disappear even long after the server finished its restart.
  • The user then stopped waiting and pressed F5, which reloaded element-web. This time element-web showed the loading circle animation. But again, this reloading went on forever (he waited as long as 40 min), same as in all the other reports I got.
  • At this point, when the user pressed F5 again, or closed and reopened the tab, he faced the same situation: a never-ending loading circle.

Workaround: The user only managed to reconnect after clearing the Brave browser cache.

Logs: If this can help, the user provided network logs captured when the bug occurred using Brave's development tools:
app.element.io.zip

@t3chguy
Member

t3chguy commented Sep 30, 2020

@Ezwen inform the user to evict that device immediately, the network logs contain their access token.

They also contain 0 failed Matrix requests.
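To avoid leaking tokens in future uploads, a capture can be scrubbed before it is shared. The following is a minimal sketch, not an official tool: it assumes tokens appear in one of three common forms (an `Authorization: Bearer` header, an `access_token` query parameter, or an `access_token` JSON field), and `redact_tokens` is a hypothetical helper name. Other secrets in a capture (cookies, passwords, message content) are not covered.

```python
import re

# Each pattern keeps the prefix (group 1) and replaces only the secret part.
_PATTERNS = (
    # "Authorization: Bearer <token>" request headers.
    re.compile(r'(Bearer\s+)[^\s"]+'),
    # "?access_token=<token>" query parameters.
    re.compile(r'(access_token=)[^&\s"]+'),
    # "access_token": "<token>" JSON fields, also when the quotes are
    # backslash-escaped because the JSON is embedded in another JSON string.
    re.compile(r'(access_token\\?"\s*:\s*\\?")[^"\\]+'),
)


def redact_tokens(text):
    """Return text with recognizable Matrix access tokens replaced by REDACTED."""
    for pattern in _PATTERNS:
        text = pattern.sub(r"\1REDACTED", text)
    return text
```

It could be applied to the raw text of an exported network log before zipping it. Redaction is only a precaution: if a token has already been published, the session still needs to be signed out.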

@dbkr
Member

dbkr commented Oct 20, 2020

#15509 could possibly be causing this?

@t3chguy
Member

t3chguy commented Oct 21, 2020

Doubtful given that the thing which caused #15509 (regression) was after this report

@Ezwen
Author

Ezwen commented Nov 1, 2020

@Ezwen inform the user to evict that device immediately, the network logs contain their access token.

I forgot to answer, but I did see your warning and instructed the user to evict his device ASAP. Thank you!

They also contain 0 failed Matrix requests.

The mystery thus remains… unfortunately, new users keep running into this problem, and my investigations still don't give me any clue.

I will try to get logs from a user that uses element desktop, just in case Electron is more talkative than Brave.

#15509 could possibly be causing this?

Unfortunately this bug was fixed and AFAIK my users that have the latest Element version still experience this problem.

@Ezwen
Author

Ezwen commented Nov 1, 2020

As I mentioned in my last message, I had access to the computer of someone who encountered said problem with our homeserver. Here is what I could observe:

  • The user is using Element-desktop 1.7.12, installed with Flatpak under Linux Mint 20.
  • The user got disconnected after one recent routine restart of the synapse server (probably for an upgrade). The user did not have to manually disconnect in that case, and was simply given the Element-desktop initial startup page one morning after starting Element.
  • In this situation, when on the Element-desktop initial startup page, if the user enters a custom homeserver address (in that case, the address of the problematic homeserver), then presses Enter, a message in red says that the homeserver cannot be joined. However Element-desktop does not complain when https://matrix.org is entered.
  • From there, the same workaround as the one described before can be applied: close Element, delete the cache (with flatpak, the folder is ~/.var/app/im.riot.Riot/config/Element/Cache/), restart Element, and then the custom homeserver can once again be chosen and used.
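The workaround in the last step can be sketched as a small script that renames the cache folder instead of deleting it, so a "bad" cache stays available for later inspection. This is only a sketch: `set_aside_cache` is a hypothetical helper, Element must be closed first, and it assumes that a non-Flatpak install keeps the same `Cache` subfolder under `~/.config/Element`.

```python
import time
from pathlib import Path


def set_aside_cache(profile_dir):
    """Rename <profile_dir>/Cache to a timestamped backup name.

    Returns the backup path, or None if there is no Cache folder to move.
    Element is expected to recreate the folder on its next start.
    """
    cache = Path(profile_dir).expanduser() / "Cache"
    if not cache.is_dir():
        return None
    backup = cache.with_name("Cache.bak-" + time.strftime("%Y%m%d-%H%M%S"))
    cache.rename(backup)
    return backup
```

For the Flatpak install described above, the call would be `set_aside_cache("~/.var/app/im.riot.Riot/config/Element")`. Keeping the renamed folder also makes it possible to hand the faulty cache to someone who wants to debug it.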

And here is what I gathered:

  • A screenshot showing the error and the logs in Element-desktop (I've redacted all mentions of the homeserver, as I prefer to keep it private for now, but I can share the link if needed):
    Capture du 2020-11-01 12-37-07_redacted
  • Network logs when this occurs (also redacted):
    vector.zip

@Ezwen
Author

Ezwen commented Mar 12, 2021

Quick update: I'm still getting new users encountering this issue, with a synapse homeserver recently upgraded to 1.29.0. The last user was using Element desktop 1.7.22.

@Ezwen
Author

Ezwen commented Jun 17, 2021

Another quick update: issue still present, Synapse 1.36.0, Element 1.7.30.
(I hope it does not sound too pushy − it's really not the intent, I only want to document)

@Ezwen
Author

Ezwen commented Oct 23, 2021

Problem still present as of today, encountered by some of my users after the homeserver restarted to upgrade to synapse 1.45.1.

Since no one else is joining this discussion, I suppose this must be a problem specific to my homeserver somehow. Yet, client-wise, the problem only happens with Element-web, and not with other clients such as Element-android, Fractal, Nheko, etc. Therefore I cannot help but think that somehow there must be a small problem in Element-web.

I've never done any Element-web dev. If I were to investigate this problem (e.g. using a debugger), any suggestions on how/where to start?

@tcbutler320

tcbutler320 commented Dec 7, 2021

I encountered this error when I was initially standing up my element/matrix server; the issue [I think] was that I failed to install an SSL cert for the element subdomain. I initially installed using old riot instructions, so the initial cert was for riot.domain.com. After troubleshooting, I made a new cert for element.domain.com and was able to get past this. Hope this helps!

@Ezwen
Author

Ezwen commented Dec 7, 2021

@tcbutler320 Thanks for the suggestion! Unfortunately, no self-hosted Element in my case; everything I described also happens with element-desktop :/

@Ezwen
Author

Ezwen commented Mar 30, 2022

Update: still happening as of Element web/desktop 1.10.8, and synapse 1.55.2. I might try in the near future to run Element in debug mode using a cache folder provided by a user.

@xuhdev

xuhdev commented May 22, 2022

Some info from me: Unchecking "Query OCSP responder servers to confirm the current validity of certificates" in Firefox settings can work around the issue.

@Ezwen
Author

Ezwen commented May 23, 2022

Some info from me: Unchecking "Query OCSP responder servers to confirm the current validity of certificates" in Firefox settings can work around the issue.

Interesting. The user I have that helps me the most with this issue is using Brave though, which I believe has no option to disable OCSP. I'll see whether I can try this somehow though.

@Ezwen
Author

Ezwen commented Jul 4, 2022

A new interesting piece of information: when element-web reaches the described "bad state", it only shows the error "Cannot reach homeserver" for a single homeserver, namely the one that I administer.

In other words, if, in the described situation, a user enters a different valid homeserver URL, then no error is shown.

This means there is in fact clearly something different with our particular homeserver, but only when element-web reaches the described situation, with a (seemingly) faulty cache.

@FrancescoSaverioZuppichini

same

@blaine07

I'm having this issue using Element in the Chrome browser. The problems are: 1) I don't know how to delete the file referenced here, and 2) I have no idea how to fix it. The issue evidently still persists, though.

@je-s

je-s commented Jan 2, 2023

I can't speak for every case encountered in this thread, but I think I've at least found a simple workaround for my case: adding the port in the homeserver field lets me connect instantly (example.com:8448).

I'm having that problem with Element only, even though other clients and https://federationtester.matrix.org/ are working perfectly fine and without any issues.
I could only reproduce the problem by logging out and then logging back in again. After a fresh installation of Element, without any configs present, it works without adding the port.
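To compare the two forms of the address outside Element, a small helper can build both URLs to test. This is only a sketch: `candidate_versions_urls` is a hypothetical name, 8448 is simply the port from my example, and it assumes the homeserver answers the standard `/_matrix/client/versions` endpoint over HTTPS.

```python
def candidate_versions_urls(server_name, port=8448):
    """Return the versions URL without and with an explicit port."""
    host = server_name.strip().rstrip("/")
    # Accept input with or without a scheme, as typed into the homeserver field.
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    return [
        f"https://{host}/_matrix/client/versions",
        f"https://{host}:{port}/_matrix/client/versions",
    ]
```

Opening both URLs in a browser or with `curl` shows whether the server is reachable on each; if both respond but Element only accepts the one with the port, the difference is more likely in what Element has stored locally than on the server.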


10 participants