
detecting state #6778

Closed
konne opened this issue Nov 18, 2021 · 23 comments

Comments

@konne

konne commented Nov 18, 2021

Bug description

As you can see in this picture, the service runs perfectly and can be accessed with the browser, but it is marked with a "detecting" state.

[screenshot: the service responds in the browser while the port is shown as "detecting"]

Steps to reproduce

I don't know if you can reproduce it.

In our setup, port detection works perfectly for Node services running in Docker containers, but not for pgAdmin, which also runs inside a Docker container.

Workspace affected

workspace: apricot-scorpion-yeo25j03

Expected behavior

No response

Example repository

No response

Anything else?

No response

@mahmoudajawad

This exact issue now occurs for me randomly. I'm working around it by stopping the workspace and starting it again.

@akosyakov
Member

Just to clarify: does it happen only when you run Docker in Gitpod?

@konne
Author

konne commented Nov 22, 2021

@akosyakov For us, everything runs inside Docker, so I can confirm that it happened with something running inside Docker, but I can't make any further assumptions beyond that.

@akosyakov
Member

@csweichel I looked at supervisor logs when it happens and

// blocks until the auto-exposure attempt reports back on the channel
err := <-exposing
if err != nil {
	if err != context.Canceled {
		autoExpose.state = api.PortAutoExposure_failed
		log.WithError(err).WithField("localPort", localPort).Warn("cannot auto-expose port")
	}
	return
}
// only reached when the channel delivered no error
autoExpose.state = api.PortAutoExposure_succeeded
log.WithField("localPort", localPort).Info("auto-exposed port")
is never executed for such a port. The last log entry is auto-exposing port. There were no changes in the exposed-ports service besides removing the global port, so it seems it blocks on the control-port request to the server. I am not sure. Can we see traces?

@csweichel
Contributor

I am not sure. Can we see traces?

Unfortunately we don't trace this interface 🤦

That said, auto-exposure is only necessary in the "public exposure" case since we've moved away from Kubernetes services. Hence we could make this part a bit more "tolerant" towards auto-exposure failure.

@akosyakov
Member

It actually happened to me very often today, and it seems the first hanging port blocks all subsequent ones:
[screenshot, 2021-11-25 14:30: multiple ports stuck in the detecting state]

@akosyakov added the priority: highest (user impact) label Nov 25, 2021
@akosyakov
Member

Hence we could make this part a bit more "tolerant" towards auto-exposure failure.

I am not sure what you mean. It does not fail; it hangs forever.
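
One way to reconcile the two comments above is to make the hang itself count as a failure by bounding the wait. Below is a minimal, self-contained sketch of that idea; the waitForExposure helper and the two-second timeout are made up for this illustration and are not part of the supervisor:

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitForExposure waits for the result of an auto-exposure attempt, but gives
// up after the given timeout so a hanging request turns into an error instead
// of blocking forever. (hypothetical helper, not part of the Gitpod code base)
func waitForExposure(ctx context.Context, exposing <-chan error, timeout time.Duration) error {
	select {
	case err := <-exposing:
		if err != nil && !errors.Is(err, context.Canceled) {
			return fmt.Errorf("cannot auto-expose port: %w", err)
		}
		return nil
	case <-time.After(timeout):
		return errors.New("auto-exposure timed out")
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	exposing := make(chan error) // never written to: simulates a server call that hangs
	err := waitForExposure(context.Background(), exposing, 2*time.Second)
	fmt.Println(err) // prints "auto-exposure timed out" after about two seconds
}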

@corneliusludmann
Contributor

/assign

@corneliusludmann
Contributor

@akosyakov Do you have a workspace ID or a link to the logs for your case?

I had a look at the logs of the issue opener's workspace apricot-scorpion-yeo25j03 and I cannot find the issue you observed there (a missing auto-exposed port log). For the port that is marked as detecting in the screenshot (3980), there is an auto-exposed port message (log entry).

@akosyakov
Member

@akosyakov Do you have a workspace ID or a link to the logs for your case?

I was actually able to reproduce it with new workspaces today. I think the workspace ID was ivory-swift-u1uxar3a and the port was 24678. It happened about 4–6 hours ago.

It was opened for this context: trusktr/solid-lume-starter@35c3426, starting a server with the npm i && npm run dev command.

@corneliusludmann
Contributor

Here is an update of my findings so far:

  • It's pretty hard to reproduce. I get this in production from time to time but never in a preview env on core-dev. That makes it hard to debug.
  • I don't see the behavior @akosyakov described above. I always see both logs auto-exposing port and auto-exposed port, even in the workspaces of Anton. Either I missed something or Anton skipped the entry by mistake.
  • My current hypothesis is that it is related to the server memory leak, which makes the server unreliably accessible in prod from time to time. However, I don't fully understand the role of the server in this process (yet).
  • If we want to improve the situation from our side, I think we should address @csweichel's comment about being a bit more “tolerant” towards auto-exposure failure. However, I haven't found an actual error in the logs yet and don't currently see the right leverage point to address this. Could someone give me some pointers?

@akosyakov
Member

I don't see the behavior @akosyakov described above. I always see both logs auto-exposing port and auto-exposed port, even in the workspaces of Anton. Either I missed something or Anton skipped the entry by mistake.

I think you are right; I overlooked something by mistake. I cannot see them hanging right now. Actually, all of them return successfully.

@akosyakov
Member

@iQQBot you can follow internal investigation here: https://gitpod.slack.com/archives/C01KGM9BH54/p1638873457412200

@geropl
Member

geropl commented Dec 8, 2021

Relevant update: #7054 (comment)

@akosyakov
Member

@iQQBot The supervisor reconnects to the server every 10 minutes, and during such a reconnect we can miss some events, so let's add a mitigation which syncs with the server on reconnect.

Do you need some code pointers?
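
A minimal, self-contained sketch of the mitigation described above; the Server interface, the type names, and the timing values are made up for this illustration rather than taken from the supervisor. The key point is that a full snapshot of the exposed ports is fetched right after every (re)connect, so events missed while disconnected cannot leave a port stuck in "detecting".

package main

import (
	"context"
	"fmt"
	"time"
)

// PortState is a simplified stand-in for the exposed-port information the
// supervisor receives from the server.
type PortState struct {
	Port    int
	Exposed bool
}

// Server is a hypothetical interface with the two calls the mitigation needs:
// a one-shot snapshot of all exposed ports and a streaming subscription.
type Server interface {
	GetExposedPorts(ctx context.Context) ([]PortState, error)
	Updates(ctx context.Context) (<-chan []PortState, error)
}

// watchPorts reconnects in a loop and, crucially, re-syncs the full port state
// after every (re)connect, so events missed while disconnected are not lost.
func watchPorts(ctx context.Context, srv Server, apply func([]PortState)) {
	for ctx.Err() == nil {
		// sync first: catch up on anything missed while disconnected
		if snapshot, err := srv.GetExposedPorts(ctx); err == nil {
			apply(snapshot)
		}
		updates, err := srv.Updates(ctx)
		if err != nil {
			time.Sleep(time.Second) // simple backoff before retrying
			continue
		}
		for update := range updates {
			apply(update)
		}
		// update channel closed: connection dropped, loop around and re-sync
	}
}

// fakeServer lets the sketch run on its own: the snapshot always reports port
// 3000 as exposed, and the update stream drops after a second to force a re-sync.
type fakeServer struct{}

func (fakeServer) GetExposedPorts(ctx context.Context) ([]PortState, error) {
	return []PortState{{Port: 3000, Exposed: true}}, nil
}

func (fakeServer) Updates(ctx context.Context) (<-chan []PortState, error) {
	ch := make(chan []PortState)
	go func() {
		time.Sleep(time.Second) // simulate a connection that drops after a second
		close(ch)
	}()
	return ch, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	watchPorts(ctx, fakeServer{}, func(ports []PortState) {
		fmt.Println("synced port state:", ports)
	})
}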

@iQQBot
Collaborator

iQQBot commented Dec 20, 2021

Why does it reconnect to the server every 10 minutes?

@marcoaleixo

I'm having this "detecting..." problem.
By accident I discovered that if we run "gp forward-port any_port", Gitpod is automatically able to detect the state of the ports.
I'm posting this in case anyone runs into the same bug in the future, until the team fixes it.

@akosyakov
Member

By accident I discovered that if we run "gp forward-port any_port", Gitpod is automatically able to detect the state of the ports.

Yeah, it sounds exactly like a problem of missing events. If you forward another port, a new event is issued which delivers the proper state for the first port too.
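
A minimal, self-contained sketch of why a later event can repair an earlier missed one, assuming (as the comment above suggests) that each status event carries the state of all ports; the types and names here are made up for illustration and are not the supervisor's actual API. Applying any single event overwrites the state of every listed port, including one that was previously stuck in "detecting".

package main

import "fmt"

// portStatus is a simplified, made-up stand-in for a single port's state.
type portStatus struct {
	Port    int
	Exposed bool // false renders as "detecting" in this toy model
}

// applyEvent applies one full status event to the local view. Because the
// event lists every port, it also fixes ports whose earlier event was missed.
func applyEvent(local map[int]*portStatus, event []portStatus) {
	for _, p := range event {
		local[p.Port] = &portStatus{Port: p.Port, Exposed: p.Exposed}
	}
}

func main() {
	// port 3000's "exposed" event was missed, so it is stuck as detecting
	local := map[int]*portStatus{3000: {Port: 3000, Exposed: false}}

	// forwarding another port (e.g. via gp forward-port) triggers a new
	// event that carries the state of all ports, including 3000
	applyEvent(local, []portStatus{
		{Port: 3000, Exposed: true},
		{Port: 8080, Exposed: true},
	})

	for _, p := range local {
		fmt.Printf("port %d exposed=%v\n", p.Port, p.Exposed)
	}
}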

@akosyakov
Member

should be fixed by #7621

@olleyt

olleyt commented Apr 29, 2023

@akosyakov The issue is happening again after my account was migrated to the new Gitpod version. Port 300 is not opening when I run a Docker container on it.

Could this issue be reproduced in the new release?

@axonasif
Member

Hi @olleyt, is this still happening?
Also, could you please share an example workspace ID and a timestamp of the event?

@olleyt

olleyt commented May 1, 2023

Yes, it is still happening, for example in this workspace: xcymr2gwvy0.ws-us96b.gitpod.io, or this one: ny59rvseewn.ws-us96b.gitpod.io

timestamp: Mon 01 May 2023 10:11:20 AM UTC

As a workaround, we used the gp preview command and then gp url, prepending the port to that URL.

@laushinka
Contributor

This was resolved yesterday. See https://www.gitpodstatus.com/incidents/6h1gd447lk7b.
