Remove redis #1208

Closed
dwradcliffe opened this Issue Feb 27, 2016 · 7 comments

6 participants
@dwradcliffe
Member

dwradcliffe commented Feb 27, 2016

We would like to remove Redis from the RubyGems.org infrastructure. It has been painful to run, and it's a dependency that we don't need. We want to store the gem download stats in Postgres, which will be much easier and faster. This is related to rubygems/rubygems-infrastructure#35, but it is a separate task.

Here's what I think we need to do:

  • add any database columns necessary to store number of downloads per gem and per version (not sure if they are there yet or not)
  • add methods/rake tasks that sync the gem and version download counts from Redis to Postgres (these should be safe to run many times, so we can keep things in sync as we work on this)
  • update app views to pull stats from the model instead of Redis
  • update FastlyLogProcessor to push to Postgres instead of Redis
  • after that, we should be able to remove Redis entirely?
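
As a rough illustration of the "able to run many times" requirement for the sync step, here is a minimal sketch in Python. It is not the RubyGems.org implementation: `sqlite3` stands in for Postgres so the example runs on its own, the dicts stand in for counts read from Redis, and the table and column names (`gem_downloads`, `version_downloads`, `gem_name`, `full_name`) are hypothetical. The point is that the upsert writes absolute totals rather than increments, so repeating a sync cannot double count.

```python
import sqlite3


def sync_downloads(conn, gem_counts, version_counts):
    """Overwrite stored totals with the latest counts; safe to re-run."""
    # The upsert writes absolute values, so repeating a sync never double counts.
    conn.executemany(
        "INSERT INTO gem_downloads (gem_name, downloads) VALUES (?, ?) "
        "ON CONFLICT(gem_name) DO UPDATE SET downloads = excluded.downloads",
        list(gem_counts.items()),
    )
    conn.executemany(
        "INSERT INTO version_downloads (full_name, downloads) VALUES (?, ?) "
        "ON CONFLICT(full_name) DO UPDATE SET downloads = excluded.downloads",
        list(version_counts.items()),
    )
    conn.commit()


# sqlite3 stands in for Postgres; the dicts stand in for counts read from Redis.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gem_downloads (gem_name TEXT PRIMARY KEY, downloads INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE version_downloads (full_name TEXT PRIMARY KEY, downloads INTEGER NOT NULL)"
)

gem_counts = {"rails": 120, "rake": 300}  # made-up numbers
version_counts = {"rails-4.2.5": 45, "rake-10.5.0": 80}  # made-up numbers

sync_downloads(conn, gem_counts, version_counts)
sync_downloads(conn, gem_counts, version_counts)  # second run changes nothing
```

The same `INSERT ... ON CONFLICT ... DO UPDATE` form is available in Postgres 9.5 and later.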

@arthurnn @evanphx @skottler @indirect @ktheory

@arthurnn arthurnn self-assigned this Feb 27, 2016

@simi

Contributor

simi commented Feb 27, 2016

> it has been painful to run

Is there more info available related to this part?

@dwradcliffe

Member

dwradcliffe commented Feb 27, 2016

Our data set was* about 26 GB, which takes a lot of memory to run, and Redis doubles its memory usage while it saves. It takes about 10-15 minutes to load the data, so restarts are really painful. Sometimes when it does start, it OOMs right away and kills the box. We had an hour-long outage a few weeks back because Redis would not boot up properly. Also, we don't want the website to depend on two data stores. Removing Redis will improve resiliency and page speed, because we're already loading the records we need from Postgres.
* We removed some old per-day data to get Redis to start after that outage.

@arthurnn

Member

arthurnn commented Feb 27, 2016

I started working on this for the stats pipeline. I will post more info on why I think we can remove Redis from there in my PR.

@nateberkopec

Contributor

nateberkopec commented Feb 27, 2016

> add any database columns necessary to store number of downloads per gem and per version (not sure if they are there yet or not)

May be worth doing this in a separate table to minimize locking.
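
To make the separate-table idea concrete, here is a minimal sketch in Python. The schema is hypothetical (`rubygems`, `gem_downloads`, `rubygem_id` are illustrative names, not the actual RubyGems.org tables), and `sqlite3` is used only so the example runs on its own; SQLite locks the whole database, so the locking benefit itself only shows up in a row-locking database such as Postgres. The shape is what matters: increments touch only a narrow counter table, while the main gem row is left alone and joined in on read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rubygems (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Counters live in their own narrow table, keyed by the gem id.
    CREATE TABLE gem_downloads (
        rubygem_id INTEGER PRIMARY KEY REFERENCES rubygems(id),
        downloads  INTEGER NOT NULL DEFAULT 0
    );
    """
)


def record_downloads(conn, rubygem_id, n=1):
    """Add n downloads for a gem without writing to the rubygems table."""
    conn.execute(
        "INSERT INTO gem_downloads (rubygem_id, downloads) VALUES (?, ?) "
        "ON CONFLICT(rubygem_id) DO UPDATE "
        "SET downloads = gem_downloads.downloads + excluded.downloads",
        (rubygem_id, n),
    )
    conn.commit()


def downloads_for(conn, name):
    """Return the download total for a gem name, or None if the gem is unknown."""
    row = conn.execute(
        "SELECT COALESCE(d.downloads, 0) FROM rubygems g "
        "LEFT JOIN gem_downloads d ON d.rubygem_id = g.id WHERE g.name = ?",
        (name,),
    ).fetchone()
    return row[0] if row else None


conn.execute("INSERT INTO rubygems (id, name) VALUES (1, 'rails'), (2, 'rake')")
record_downloads(conn, 1)
record_downloads(conn, 1, n=4)
```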

@technoweenie

technoweenie commented Feb 27, 2016

You can use the slotted counter pattern to reduce the locking even further. Works well for us.

http://samlambert.com/posts/mysql-slotted-counter
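
For readers unfamiliar with the pattern, here is a minimal sketch of a slotted counter in Python. It is an adaptation for illustration, not code from the linked post: `sqlite3` stands in for the real database so the example runs on its own, and the table name `slotted_downloads`, its columns, and the slot count of 16 are assumptions. Each increment goes to one of several rows chosen at random, so concurrent writers in a row-locking database rarely contend for the same row, and the total is the sum over the slots.

```python
import random
import sqlite3

SLOTS = 16  # assumed slot count; more slots mean less contention but more rows to sum

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE slotted_downloads ("
    "  gem_id    INTEGER NOT NULL,"
    "  slot      INTEGER NOT NULL,"
    "  downloads INTEGER NOT NULL DEFAULT 0,"
    "  PRIMARY KEY (gem_id, slot))"
)


def increment(conn, gem_id, n=1):
    """Add n to a randomly chosen slot row for this gem."""
    slot = random.randrange(SLOTS)
    conn.execute(
        "INSERT INTO slotted_downloads (gem_id, slot, downloads) VALUES (?, ?, ?) "
        "ON CONFLICT(gem_id, slot) DO UPDATE "
        "SET downloads = slotted_downloads.downloads + excluded.downloads",
        (gem_id, slot, n),
    )


def total(conn, gem_id):
    """Sum all slot rows to get the gem's download count."""
    return conn.execute(
        "SELECT COALESCE(SUM(downloads), 0) FROM slotted_downloads WHERE gem_id = ?",
        (gem_id,),
    ).fetchone()[0]


for _ in range(1000):
    increment(conn, gem_id=1)
conn.commit()

print(total(conn, 1))  # prints 1000
```

The trade-off is that reads become a small aggregate instead of a single-row lookup, which may or may not matter depending on how often totals are displayed.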

@skottler

Member

skottler commented Feb 28, 2016

First off, let me just say this: yes. Getting rid of Redis is something I've wanted to do for a long time; it's been a significant burden on the infrastructure and it's finally time for it to go.

Does this necessarily have to be done via Postgres? I'd love to see us map out some of the data storage requirements so we can figure out if there might be a better fit.

@dwradcliffe

Member

dwradcliffe commented Feb 28, 2016

@skottler here's our previous discussion about that: rubygems/rubygems-infrastructure#35
