Meet with devops/ Deep dive into AWS setup #20

Closed
davidschober opened this issue Apr 28, 2017 · 2 comments

Comments

@davidschober

Done looks like:

@mbklein sets up a meeting with RDC/DevOps to go over the infrastructure, the published CloudFormation scripts, and the next steps needed (represented in other tickets).

The goal of the meeting is for RDC/DevOps to finalize the plan for the pilot and to create any new issues or carve out time for training if necessary.

Topics to be covered:

  • Infrastructure
  • CloudFormation
  • Needs from monitoring

@davidschober
Author

Notes from Avalon Meeting

One big VPC with private 10.x subnets that are not accessible from outside
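
A minimal sketch of planning such a layout, assuming a hypothetical 10.0.0.0/16 VPC CIDR split into /24 subnets (neither value was recorded at the meeting), using Python's standard `ipaddress` module:

```python
import ipaddress
from itertools import islice

# Hypothetical VPC CIDR; the real range was not recorded in the notes.
VPC_CIDR = "10.0.0.0/16"


def plan_subnets(vpc_cidr: str, count: int, new_prefix: int = 24) -> list:
    """Return the first `count` subnets of size /new_prefix inside the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = list(islice(vpc.subnets(new_prefix=new_prefix), count))
    if len(subnets) < count:
        raise ValueError(f"{vpc_cidr} cannot hold {count} /{new_prefix} subnets")
    return subnets


if __name__ == "__main__":
    for net in plan_subnets(VPC_CIDR, count=4):
        # is_private is True for anything inside 10.0.0.0/8 (RFC 1918 space)
        print(net, "private" if net.is_private else "public")
```

Whether a subnet is actually unreachable from outside is decided by its route table (no route to an internet gateway) and security groups, not by the address range; `is_private` only confirms the range is RFC 1918 space.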

We are using Redis for caching.
Major components: ZooKeeper (the Solr configuration manager), Solr, Avalon, and Fedora.

  • Overview of the diagram
    • Beanstalks == load balancer + EC2 instances

Inside the VPC

  • ZooKeeper

    • With Solr, you have to load configs, create cores, back up, etc.
    • Avalon connects to ZooKeeper and hands over its configs
    • ZooKeeper tells Solr to share the Avalon core, handles redundancy, etc.; ZooKeeper will alert
  • Fedora

    • Stores metadata in Postgres
    • Stores binaries in an S3 bucket
    • Not using spinning-disk storage (feature built into Fedora 4)
  • Avalon

    • Each Beanstalk
    • We have the web application
    • The web app pushes jobs into the queue and the worker picks them up, so that things don't slow down
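
The ZooKeeper notes describe the SolrCloud pattern: a named config set lives in ZooKeeper, and Solr creates a collection (the "Avalon core") that references it, keeping replicas for redundancy. A minimal sketch of the second step, assuming a hypothetical Solr host and a config set already uploaded under the name `avalon` (for example with `bin/solr zk upconfig`), builds a Collections API `CREATE` request URL:

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; the real host inside the VPC is not in the notes.
SOLR_BASE = "http://solr.internal:8983/solr"


def create_collection_url(base: str, name: str, config_name: str,
                          num_shards: int = 1, replication_factor: int = 2) -> str:
    """Build a SolrCloud Collections API CREATE URL.

    collection.configName must match a config set already stored in ZooKeeper;
    replicationFactor > 1 is what gives Solr redundant copies of the index.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    }
    return f"{base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Send this with urllib.request.urlopen (or curl) from inside the VPC.
    print(create_collection_url(SOLR_BASE, "avalon", "avalon"))
```

The shard and replica counts here are placeholders; the right values depend on how many Solr nodes the pilot runs.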

Outside the VPC

  • S3 buckets

    • Masterfiles (stores originals and uploaded "dropbox" items)
      • When a user puts an item in the bucket, it kicks off a Lambda function
      • The Lambda takes the JSON and pushes it into the notification service
      • For batches, the worker picks the job up from the queue and processes it
      • Batch ingest runs on demand rather than as a cron job
    • Derivatives (stores transcoded items for streaming)
    • Fedora Binary Storage (stores Fedora items)
  • Elastic Transcoder

    • Nothing to maintain
    • Input bucket, output bucket
    • Sets off jobs based on input and output
    • Currently, things are sitting in the buckets. We may want to do something different with them
    • Currently we have 6 derivatives
    • Moving forward, should we just do MPEG-DASH and auto, cutting down to low/high?
  • Streaming

    • CloudFront
    • Presigned URLs so that CloudFront knows they
  • Code Pipeline

    • Right now it is watching an S3 bucket
    • When avalon.zip changes, it updates the code
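
The Masterfiles flow (upload to S3, a Lambda fires, JSON goes to the notification service, a worker picks the job off the queue) can be sketched as a Lambda handler. This is a minimal sketch, not the Avalon implementation: it parses the standard S3 event notification shape and hands one message per uploaded object to a `publish` callable, which in a real deployment would wrap an SNS call such as boto3's `sns.publish(TopicArn=..., Message=...)`. The message fields are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def build_ingest_messages(event: dict) -> list:
    """Turn an S3 event notification into one ingest message per new object."""
    messages = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore deletes and other event types
        s3 = record["s3"]
        messages.append({
            "bucket": s3["bucket"]["name"],
            # S3 URL-encodes object keys in notifications (spaces arrive as '+')
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size"),
        })
    return messages


def handler(event: dict, context=None, publish=print) -> dict:
    """Lambda entry point; `publish` stands in for an SNS publish call."""
    messages = build_ingest_messages(event)
    for message in messages:
        publish(json.dumps(message))
    return {"published": len(messages)}
```

Injecting `publish` keeps the parsing testable without AWS credentials; the queue side (SNS fanning out to SQS, with the worker polling it) is configured in AWS rather than in this function.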

Policies

  • Current updates/changes
    • Our current settings are the most conservative
    • It takes ~15 minutes

CloudFormation

  • We can bring up a fully functioning system in CloudFormation
  • If we decide to change configurations, things get different
    • e.g., if we changed the database

Things we haven't dealt with

  • Log streaming, funneling, and consolidation
  • AWS has a CloudWatch feature for streaming logs (TODO: create issue)
  • We could have Avalon push its logs to CloudWatch

CloudWatch

  • Currently, we send alarms when Avalon returns a status other than 200
  • We should subscribe healthgenie emails to
  • We should monitor:
    • Avalon
    • Fedora
    • Solr
    • ZooKeeper
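
CloudWatch does the alarming in the current setup; the check logic behind "alarm when not 200" for the other components can be sketched as below. This is a minimal sketch assuming hypothetical internal hostnames and paths (none appear in the notes). ZooKeeper is not an HTTP service, so it would need a separate TCP-level probe and is left out.

```python
import urllib.request

# Hypothetical health endpoints; real hostnames and paths are not in the notes.
ENDPOINTS = {
    "avalon": "http://avalon.internal/",
    "fedora": "http://fedora.internal:8080/fcrepo/rest",
    "solr": "http://solr.internal:8983/solr/admin/info/system",
}


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True only when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (4xx/5xx) and socket timeouts are all OSError subclasses
        return False


def failing_components(endpoints: dict) -> list:
    """Names of components whose check is not a 200, i.e. what should alarm."""
    return [name for name, url in endpoints.items() if not is_healthy(url)]
```

A scheduled job could publish the result as a custom CloudWatch metric so the existing alarm and email subscriptions cover every component, not just Avalon.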

What to do when things are bad

  • Restart the app server (there is a big button for this)
  • Rebuild the environment
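
The two recovery steps map onto Elastic Beanstalk actions that also exist in boto3's `elasticbeanstalk` client as `restart_app_server` and `rebuild_environment`. A minimal sketch of scripting the runbook, assuming a hypothetical environment name; the client is passed in so the logic can be exercised without AWS access:

```python
from typing import Protocol


class BeanstalkClient(Protocol):
    """The two boto3 Elastic Beanstalk client methods this sketch relies on."""

    def restart_app_server(self, **kwargs) -> dict: ...

    def rebuild_environment(self, **kwargs) -> dict: ...


def recover(client: BeanstalkClient, environment_name: str, rebuild: bool = False) -> str:
    """Run the first runbook step (restart) or, if asked, the heavier second step (rebuild)."""
    if rebuild:
        # Rebuild deletes and recreates the environment's AWS resources
        client.rebuild_environment(EnvironmentName=environment_name)
        return "rebuild"
    # Restart only restarts the application server on each instance
    client.restart_app_server(EnvironmentName=environment_name)
    return "restart"
```

Real usage would look like `recover(boto3.client("elasticbeanstalk"), "avalon-web")`, where `"avalon-web"` is a placeholder name. Restarting first and rebuilding only if that fails mirrors the order in the notes.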

Major Avalon issues we've had

  • Delayed jobs (with SQS, this shouldn't be an issue)
  • Matterhorn
  • Running out of space for derivatives
  • Log files growing out of control

TODO

  • MBK will send out a link to the CloudFormation scripts

  • MBK and phoung to talk about where the CloudFormation script will live

  • Decide where we want to be geographically

  • Write up how to deal with outages

    • Restart server
  • Create a fire drill

    • Carrick plays chaos monkey
    • Team responds

@davidschober
Author

@carrickr @mbklein @d-venckus @Toputnal Can you edit and add to the minutes? Also, please write up any tasks not already represented.

FYI:
@ccaizzi @rtrautve1
