Investigate cms performance #2989

Closed
3 tasks
Tracked by #137
jason-upchurch opened this issue Jun 21, 2019 · 3 comments

Comments

@jason-upchurch
Contributor

jason-upchurch commented Jun 21, 2019

Summary

Tenable scans have been correlated with website outages (related issue: #2910). We need to determine whether current performance is acceptable, or whether Tenable scans or other traffic at the observed volumes can make the website unavailable.

Expected Behavior

Website availability.

Actual Behavior

Website unavailability, i.e., 504.

Frequency

The website was observed to be down on two occasions that coincided with Tenable scans.

Completion Criteria

  • Determine whether the Tenable scans are in fact the cause of the CMS performance issues.
  • Identify which parts of the CMS were affected (database, static pages, CMS application in general).
  • Create an implementation ticket for whatever the next step is.
@patphongs
Member

On July 2nd, www.fec.gov had a brief outage between 10pm and 11pm, with out-of-memory errors, and needed a reboot. We need to determine what caused the downtime and decide what to do about it.

@jason-upchurch
Contributor Author

The Tenable scan over the evening of July 18/19 was correlated with the following New Relic incidents (these have not been verified as outages). One approach may be to look at each incident and find one (if any) that corresponds to a website outage and matches the IP address of the Tenable scan that evening (see @jason-upchurch for this IP address). The particular activity and volumes associated with that scan may inform the performance investigation and help determine 1) whether it is acceptable for the observed traffic to be correlated with any website outage, and if not, 2) a possible solution path:

Incident 76477192
Incident 76477181
Incident 76477150
Incident 76477033
Incident 76476903
Incident 76476403
Incident 76476336
Incident 76476324
Incident 76476321
Incident 76476295
Incident 76476235
Incident 76476224
Incident 76476164
Incident 76476026
Incident 76476017
Incident 76475924
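
A rough sketch of how that cross-check could be scripted, in case it helps. Everything here is an assumption for illustration: the `Incident` and `Request` shapes, the five-minute margin, the sample timestamps, and the placeholder IP `192.0.2.10` (a documentation-only address, not the real scan IP). It does not call the New Relic API or parse any real log format; incident windows and access-log rows would need to be exported separately and loaded into these structures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    """Hypothetical shape for an exported incident: an id and a time window."""
    incident_id: str
    opened: datetime
    closed: datetime


@dataclass(frozen=True)
class Request:
    """Hypothetical shape for one parsed access-log row."""
    timestamp: datetime
    client_ip: str
    status: int


def summarize_incident(incident, requests, scan_ip, margin=timedelta(minutes=5)):
    """Count scan-IP requests and 5xx responses inside the incident window.

    The window is widened by `margin` on both sides (an assumed value) to
    tolerate clock skew between the monitoring tool and the logs.
    """
    start = incident.opened - margin
    end = incident.closed + margin
    in_window = [r for r in requests if start <= r.timestamp <= end]
    return {
        "incident_id": incident.incident_id,
        "scan_requests": sum(1 for r in in_window if r.client_ip == scan_ip),
        "server_errors": sum(1 for r in in_window if 500 <= r.status <= 599),
    }


def candidate_incidents(incidents, requests, scan_ip):
    """Return summaries for incidents with both scan traffic and a 5xx response."""
    summaries = [summarize_incident(i, requests, scan_ip) for i in incidents]
    return [s for s in summaries if s["scan_requests"] > 0 and s["server_errors"] > 0]


# Made-up sample data, only to show the call pattern.
SCAN_IP = "192.0.2.10"  # placeholder, not the actual Tenable scanner address
t0 = datetime(2000, 1, 1, 22, 0)  # arbitrary placeholder time
sample_incidents = [
    Incident("incident-A", t0, t0 + timedelta(minutes=10)),
    Incident("incident-B", t0 + timedelta(minutes=60), t0 + timedelta(minutes=70)),
]
sample_requests = [
    Request(t0 + timedelta(minutes=2), SCAN_IP, 200),
    Request(t0 + timedelta(minutes=3), SCAN_IP, 504),
    Request(t0 + timedelta(minutes=65), "198.51.100.7", 200),
]

for summary in candidate_incidents(sample_incidents, sample_requests, SCAN_IP):
    print(summary)
```

An incident that shows up in the result is only a candidate: overlap in time between scan traffic and 5xx responses is still correlation, so each one would need a manual look before being called an outage caused by the scan.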

@PaulClark2
Contributor

Closing in favor of #3116
