Website down #160

Closed
pbronka opened this issue Jan 23, 2023 · 7 comments

@pbronka
Contributor

pbronka commented Jan 23, 2023

Hi @BlueReZZ

The journal's website seems to have been down since Friday with a 504 Bad Gateway error. I don't know how to check the status of the server, and I don't think I have access to the server hosting the website. Could you help us investigate this?

Thank you,
Patryk

@pbronka
Contributor Author

pbronka commented Jan 23, 2023

I found the EC2 instance in the AWS dashboard. The outage seems to be due to a problem with AWS itself, as both status checks are failing. The instance is also scheduled for termination in 13 days due to degraded hardware.

I tried rebooting the instance from the AWS console, but it didn't help. This link (https://aws.amazon.com/premiumsupport/knowledge-center/ec2-windows-system-status-check-fail/) suggests that stopping and starting the instance would move it to a different server and possibly fix both of these issues, but I'm not sure whether that would cause problems elsewhere, e.g. because it changes the IP address. Do you know if it would be OK to stop and start this EC2 instance?

Edit: I tried stopping the instance and was able to start it again, but the website still doesn't respond.
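
For anyone retracing these steps, the two status checks and the scheduled event mentioned above can also be read programmatically. What follows is a minimal Python sketch, not this project's tooling: it assumes boto3 is installed and AWS credentials are configured, and the function names are hypothetical. The parsing helpers only rely on the response shape of EC2's `describe_instance_status` call.

```python
from typing import Any


def fetch_status(instance_id: str) -> dict[str, Any]:
    """Ask EC2 for one instance's status (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the helpers below work without it

    ec2 = boto3.client("ec2")
    # IncludeAllInstances=True also returns instances that are not running.
    return ec2.describe_instance_status(
        InstanceIds=[instance_id], IncludeAllInstances=True
    )


def failing_checks(status_response: dict[str, Any]) -> dict[str, list[str]]:
    """Map each instance ID to the status checks that are not reporting 'ok'."""
    failing: dict[str, list[str]] = {}
    for item in status_response.get("InstanceStatuses", []):
        bad = [
            check
            for check in ("SystemStatus", "InstanceStatus")
            if item.get(check, {}).get("Status") != "ok"
        ]
        if bad:
            failing[item["InstanceId"]] = bad
    return failing


def scheduled_events(status_response: dict[str, Any]) -> list[tuple[str, str]]:
    """List (instance ID, event code) pairs, e.g. a scheduled retirement."""
    return [
        (item["InstanceId"], event.get("Code", "unknown"))
        for item in status_response.get("InstanceStatuses", [])
        for event in item.get("Events", [])
    ]
```

Calling `failing_checks(fetch_status(instance_id))` with the real instance ID would list which of the two checks is failing. Note that a reboot keeps the instance on the same host, whereas a stop followed by a start typically moves an EBS-backed instance to different hardware, and an auto-assigned public IPv4 address is released on stop unless an Elastic IP is attached.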

@BlueReZZ
Contributor

Hi Patryk,

I'll speak to @gnott about this when he's online and will take a look myself to see if there's anything obvious.

Paul

@BlueReZZ
Contributor

Hi @pbronka,

We've spent time this afternoon looking into the problem and found that it's deeper than we thought. We tried to look at the logs on the server itself but cannot connect. Thinking that a new deployment might help, we attempted that too, but the deployment itself cannot connect to the server. This suggests we may need to completely recreate the infrastructure, which isn't something that @thewilkybarkid and I have done before. Helpfully, @erkannt is back in the office tomorrow and may be able to help with this.

At the moment there's not much more we can do, so we will revisit this tomorrow with @erkannt's assistance.

Paul

@pbronka
Contributor Author

pbronka commented Jan 23, 2023

Thank you very much for looking into this, please let me know if we can be useful in any way.

@BlueReZZ
Contributor

Hi @pbronka,

The website is back up and running as of 11:10am UTC this morning.

The problem seemed to be with the application on the server, but we were not able to connect to the server to ascertain the exact cause. When the machines were restarted, a new IP address was assigned, as you'd mentioned, so our initial redeployment attempts failed because the IP address had changed. The infrastructure code suggests that there is a static IP address, so we didn't expect the address to change. Once we found the correct IP address and added it to the GitHub secrets that the deployment Action uses to find the right server, the deployment was able to upload a clean version of the website, and this solved the problem.
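
The address mismatch described above can be checked before redeploying. The following is a minimal Python sketch under stated assumptions, not this project's deployment code: the function names are hypothetical, and the inputs are assumed to be the dictionaries returned by boto3's `describe_instances` and `describe_addresses` calls on an EC2 client.

```python
from typing import Any, Optional


def current_public_ip(
    instances_response: dict[str, Any], instance_id: str
) -> Optional[str]:
    """Return the instance's current public IPv4 address, if it has one."""
    for reservation in instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("InstanceId") == instance_id:
                # The key is absent while the instance has no public address.
                return instance.get("PublicIpAddress")
    return None


def elastic_ip(
    addresses_response: dict[str, Any], instance_id: str
) -> Optional[str]:
    """Return the Elastic IP attached to the instance, or None if there is none."""
    for address in addresses_response.get("Addresses", []):
        if address.get("InstanceId") == instance_id:
            return address.get("PublicIp")
    return None


def target_is_stale(
    configured_ip: str, instances_response: dict[str, Any], instance_id: str
) -> bool:
    """True when the address the deployment uses no longer matches the instance."""
    return current_public_ip(instances_response, instance_id) != configured_ip
```

If `elastic_ip` returns `None`, the instance's public address is auto-assigned and will change on every stop and start, which would be consistent with the behaviour seen here. In that case the address stored for the deployment needs updating after each such restart, or an Elastic IP needs to be attached.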

Thanks to @erkannt and @thewilkybarkid for their help. The website appears to be working normally.

We suspect the cause may have been the degraded machine that you were emailed about. The remediation for that (rebooting the machine) didn't bring the website back up as expected, so it was the redeployment that fixed it, once we were actually able to deploy.

Paul

@pbronka
Contributor Author

pbronka commented Jan 25, 2023

Thank you very much. If you outlined the steps you followed to fix this, do you think we could handle it ourselves if it happened again?

@BlueReZZ
Contributor

BlueReZZ commented Jan 25, 2023 via email
