Migrate from MongoDB to Postgres (ready!) #623

Closed. Wanted to merge 4 commits into base: master.

Contributor

boblail commented Dec 8, 2013

I've rebased Undev/errbit@d51612f onto master (cf. #614).

I squashed @realmyst's commits for two reasons:

  1. many commits left the codebase in a transitional state
  2. to expedite rebasing

bf4 commented Dec 8, 2013

Is there any possibility of allowing either db as a backend?

Member

ndbroadbent commented Dec 8, 2013

RE: allowing either db as a backend - I don't think that's a good idea. But we could offer a choice between MySQL, Postgres and SQLite.

bf4 commented Dec 9, 2013

We use MariaDB (an API-compatible fork of MySQL) in our office, so it would be great if we could use that. Let me know if you could use any help.

brblck commented Dec 9, 2013

@ndbroadbent I think no matter what direction you take, you're always going to run into "I use [name of data backend] and it annoys me that I have to set up and scale [name of other data backend] for errbit."

In reality, we have yet to see anything that shows the move to Postgres actually solves a real, measurable problem, except that it makes people already running Postgres happier. However, that in itself is a legit issue in either direction. People running MongoDB don't want to have to set up Postgres, MariaDB, MySQL or whatever either.

I think the most awesome, best solution to this issue that appeases all the fan boys and girls on all sides would be to abstract errbit's models into some pluggable rails engines for different data backends. It would be more code to support but it would be more modular and allow everyone to get what they want and need.

To address the code maintenance issue, you could lean on the community members requesting support for data backend X to create and maintain that pluggable rails engine. All you'd need to do is define a design and an API that they need to hook into for everything to work.

Contributor

boblail commented Dec 9, 2013

@brandonblack, fair points.

One fact that favors the move (independent of the popularity of either database 😀) is that Errbit has seen a number of commits (e.g. 40453cc, 24e4fe6, 5d02ba7) that normalize its data.

For better or worse, it's been treating its database more and more like a relational db than a document store.

brblck commented Dec 12, 2013

@boblail well, in 2/3 of those commits data is still being embedded and de-normalized. And if you have something that grows fast (like notices, for example), it's considered wise, not a violation of any unwritten rule, to break that out into its own collection to avoid DB file fragmentation as documents grow in size on disk and are moved around.

I wouldn't consider any of those commits an argument for either a relational or non-relational data storage solution. I really think this whole thing still boils down to simple preference; there's little to no evidence otherwise.

The fact is, something with a flexible schema like MongoDB is way better suited for this kind of use case, but at the same time I completely understand the pain point for people who already have a scaled-up cluster of something else and don't want to learn or deal with doing the same on a different data store.

I still think delivering a choice here will be best for everyone and best for errbit in the long run.

coveralls commented Dec 15, 2013

Coverage Status

Coverage increased (+0.33%) when pulling 66770ac on concordia-publishing-house:pg into 8528b63 on errbit:master.

Contributor

boblail commented Dec 15, 2013

I've finished! The specs pass and the data migration worked smoothly on my production environment! (10k notices)

I moved the logic for transferring data from a Rake task to a migration. The migration will fail and roll everything back if it's unable to save any copied record. This way, if you're deploying with Capistrano, it's all-or-nothing: you don't start running the ActiveRecord code until everything has been copied to Postgres.
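
The all-or-nothing behavior described here can be pictured with a minimal plain-Ruby sketch. This is not the migration from this PR: `copy_all_or_nothing`, `InvalidRecord`, and the hash-based records are hypothetical, and the validity rule (a non-empty environment) is only an illustration borrowed from the cleanup commands in this comment.

```ruby
# Hypothetical sketch, not the migration from this PR.
class InvalidRecord < StandardError; end

# Stage every record first; touch `target` only if all of them are valid.
def copy_all_or_nothing(source, target)
  staged = source.map do |doc|
    env = doc[:environment]
    if env.nil? || env == {}
      raise InvalidRecord, "record #{doc[:id]} has no environment"
    end
    doc.dup
  end
  target.concat(staged) # reached only when nothing raised
end

target = []
copy_all_or_nothing([{ id: 1, environment: "production" }], target)

begin
  # The second batch contains one invalid record, so none of it is copied.
  copy_all_or_nothing(
    [{ id: 2, environment: "staging" }, { id: 3, environment: nil }],
    target
  )
rescue InvalidRecord => e
  puts "rolled back: #{e.message}"
end

puts target.map { |doc| doc[:id] }.inspect
```

A real migration would presumably get the same guarantee from a database transaction rather than an in-memory staging array; the sketch only shows the control flow.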

I did have to clean up a few invalid records in my own instance of Errbit first. I ran these commands in the rails console:

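# Delete problems whose environment is empty or missing.
Problem.where(environment: {}).delete_all
Problem.where(environment: nil).delete_all
# Delete errs whose problem no longer exists (ids are compared as strings).
Err.where(:problem_id => { "$nin" => Problem.pluck(:id).map(&:to_s) }).delete_all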

We could make house-cleaning of this sort a part of the Mongo -> Postgres migration to make the upgrade smoother; but I wondered if developers might want manual control over how invalid records were handled (it might be worth it to some to repair them).
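
One way to keep that manual control is a dry run that reports invalid records instead of deleting them. The following is a rough sketch in plain Ruby, with hashes standing in for `Problem` and `Err` documents; `invalid_records` and the sample data are hypothetical and not part of Errbit.

```ruby
# Hypothetical dry run: report invalid records instead of deleting them.
def invalid_records(problems, errs)
  # Problems with an empty or missing environment, as in the commands above.
  blank_env, valid = problems.partition do |problem|
    problem[:environment].nil? || problem[:environment] == {}
  end
  # Errs are orphaned when their problem is missing or about to be removed.
  known_ids = valid.map { |problem| problem[:id].to_s }
  orphaned = errs.reject { |err| known_ids.include?(err[:problem_id].to_s) }
  { problems_without_environment: blank_env, orphaned_errs: orphaned }
end

problems = [
  { id: "p1", environment: "production" },
  { id: "p2", environment: nil },
  { id: "p3", environment: {} }
]
errs = [
  { id: "e1", problem_id: "p1" },
  { id: "e2", problem_id: "gone" },
  { id: "e3", problem_id: "p2" }
]

report = invalid_records(problems, errs)
report.each do |kind, records|
  puts "#{kind}: #{records.map { |record| record[:id] }.join(', ')}"
end
```

With a report like this, an operator could decide per record whether to repair or delete before running the migration.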

I also had to reset my secret token after completing the deploy.

bf4 commented Dec 15, 2013

Sounds like, the way it's written, we'd want to put the site in maintenance mode when deploying... How long does the db transfer take? Is there any way to run the general migration before the code change, deploy, and then update Postgres with any changes since the last export?

Member

arthurnn commented Dec 16, 2013

👎 on my side. I think we should join efforts to make errbit better and faster, and this is totally possible using MongoDB. The code base is super stable with MongoDB; I don't think it would be beneficial for us to spend time migrating the project.

Member

shingara commented Dec 16, 2013

I agree with @arthurnn. I don't think it's a good idea to change Errbit's database from MongoDB to PostgreSQL. Nothing really shows that PostgreSQL is better than MongoDB for Errbit's use case.

Errbit started as a MongoDB application, so I think it's better to continue this way as long as there is no BIG limitation in this usage.

I encourage you to create your own fork and maintain it with a PostgreSQL database. Good luck.

@ndbroadbent, you are the last big Errbit committer who has not entered this debate. What do you think? Should we close this issue?

Member

ndbroadbent commented Dec 16, 2013

Hey, well, I love Postgres, and think it is a much better choice of database for this project. But on the other hand, I don't want to force everyone to go through a migration. I think the change is great, but is pretty unnecessary, since everything seems to work fine for most people. However, if you're running Errbit with a huge volume of errors from many different apps, then I have heard about a lot of problems with Mongo, and agree that Postgres seems to cope far better with this kind of use. I know MongoDB can be tweaked and tuned and whatever else, but Postgres seems to just work without much hassle.

If we move to Postgres, it will be very easy to support any SQL database. We could easily drop in support for MySQL or SQLite. So that would be great. Maybe that would increase Errbit's adoption and contributions.

Another factor is the Undev company who originally did the work. If we move to Postgres, maybe they would be more likely to contribute bug fixes, or continue improving Errbit?

Thanks @shingara for picking up Errbit maintenance recently, you've been doing a great job! So since you've been leading the project recently, I'll leave the final call up to you. If the move to Postgres will just leave us with a big mess, and you stop helping Errbit, and there are no more contributors, then of course it's totally not worth it. But if you are happy with the change, and it attracts the support of Undev and other companies, and more people are excited about contributing, then maybe it's worth a shot.

Contributor

boblail commented Dec 17, 2013

I don't know what's best for the Errbit community, but—for whatever it's worth—I had three motives for pursuing this change:

  1. At my company, tools like Errbit run on VMs in-house. When those VMs are upgraded or restarted, I've found I have the most problems with Mongo and the fewest with Postgres. (This is only my experience; but it seems a common sentiment that Postgres is a lower-maintenance choice.)
  2. I pull metrics out of Errbit using its API. When we present a notice from the API, I'd like to present a complete picture (despite the details being stored in different collections). Pull Request #509, for example, required a multi-step manual join on Mongo.
  3. I am more familiar with ActiveRecord than with Mongoid. (Maybe Errbit will attract more contributors if it uses a more familiar ORM... but... who can say? I wouldn't submit a Pull Request for this reason.)

I regret the idea of putting Errbit users through a migration. I think Postgres is a better fit—technologically—for Errbit than Mongo. How much better? Is the change now worth it down the road? I don't know. But if you choose to go ahead with Postgres, I'm willing to help smooth out the migration-experience for other users.
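
To picture the "multi-step manual join" from point 2: without joins in the data store, each hop between collections becomes a separate lookup stitched together in application code. The sketch below is plain Ruby with arrays of hashes standing in for collections; the field names and `notices_with_context` are illustrative assumptions, not Errbit's schema or the code from Pull Request #509.

```ruby
# Hypothetical data: three "collections" linked by id fields.
problems = [{ id: 1, app: "shop" }]
errs     = [{ id: 10, problem_id: 1 }]
notices  = [
  { id: 100, err_id: 10, message: "NoMethodError" },
  { id: 101, err_id: 10, message: "NoMethodError" }
]

# Resolve notice -> err -> problem by hand, one lookup table per hop.
def notices_with_context(problems, errs, notices)
  errs_by_id     = errs.map { |err| [err[:id], err] }.to_h
  problems_by_id = problems.map { |problem| [problem[:id], problem] }.to_h

  notices.map do |notice|
    err     = errs_by_id.fetch(notice[:err_id])
    problem = problems_by_id.fetch(err[:problem_id])
    notice.merge(problem_id: problem[:id], app: problem[:app])
  end
end

result = notices_with_context(problems, errs, notices)
result.each { |row| puts row.inspect }
```

In a relational store the same result can be expressed as a single query that joins the three tables, which is the convenience this point is about.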

lowang commented Dec 17, 2013

AFAIK there are no adapters that support both, but switching to ActiveRecord could expand the available databases beyond PostgreSQL to MSSQL, MySQL, SQLite, ...
I've personally run into problems with the underlying Mongo when too many errors were collected, so I added a TTL index and documents are now deleted after some time. This helped maintain speed, but it's not straightforward.
For me Mongo is OK since I'm already running a cluster anyway, so another instance is not a problem.
I guess that supporting both adapters (Mongoid & AR) would be too time-consuming and complicated, though.
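
For readers unfamiliar with the idea: a TTL index tells MongoDB to delete documents once a date field is older than a configured number of seconds. The sketch below only imitates that rule in plain Ruby and does not create an index; `purge_expired`, the 30-day window, and the field names are assumptions, not the settings used above.

```ruby
# Hypothetical sketch of TTL-style expiry, done in application code.
# A MongoDB TTL index performs the equivalent deletion inside the database.
def purge_expired(notices, ttl_seconds, now)
  cutoff = now - ttl_seconds
  notices.reject { |notice| notice[:created_at] < cutoff }
end

now = Time.utc(2013, 12, 17)
day = 24 * 60 * 60
notices = [
  { id: 1, created_at: now - 40 * day }, # older than the window
  { id: 2, created_at: now - 5 * day }   # still inside the window
]

kept = purge_expired(notices, 30 * day, now)
puts kept.map { |notice| notice[:id] }.inspect
```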

kirs commented Dec 20, 2013

I've been looking forward to Errbit with psql support since 2011; this is really big progress by the Undev team.
So we should drop and burn Mongo from this project.

PlugIN73 commented Dec 20, 2013

It's awesome!!! Deal it! 👍

Contributor

phallstrom commented Dec 20, 2013

👍 The only thing I use Mongo for is errbit. Being able to use PostgreSQL would be great.

tamird commented Jan 12, 2014

this is heroic. 👍

emq commented Jan 14, 2014

Really awesome PR, but will this ever be merged?

Contributor

jozefvaclavik commented Jan 15, 2014

We run several Rails apps with a MongoDB backend, and only some with PostgreSQL and MariaDB.

@ndbroadbent I wouldn't say that once you support PostgreSQL you can use any other AR DB backend. This is true if you use only basic AR features. Once you start using advanced SQL commands (even if you just think about using GROUP or BLOB), there are differences between MySQL/MariaDB and PostgreSQL. You need to be really careful with your code then.

I agree that Errbit should stick with MongoDB as it started, but I wouldn't get mad if it moved to an AR backend.

realmyst commented Feb 3, 2014

Hello guys, what about merging this PR?

Member

arthurnn commented Feb 3, 2014

@realmyst, thanks for the amazing work on this. However, we are not merging this. As discussed above, people have been using errbit in production for a while already, and I don't feel that changing it so drastically would bring that many benefits.
Anyhow, thanks for your help/support.

arthurnn closed this Feb 3, 2014

kirs commented Feb 8, 2014

@arthurnn that's too sad 😕

bf4 commented Feb 9, 2014

@arthurnn Would it be possible to have a postgresql branch that @boblail or others could maintain within the repo as a compromise for including work that's probably going to be done anyway in a way that reduces friction or unnecessary forking?

bf4 commented Feb 9, 2014

I should add that the sooner some compromise is reached, the better, so as to reduce divergence of the work in this PR from master, else, a reasonable goal of being able to switch out the backend would be lost, aka No 12-factor for you.

Contributor

boblail commented Feb 28, 2014

FWIW, I'm willing to maintain a Postgres fork of Errbit (concordia-publishing-house/errbit).

darkleaf commented Feb 28, 2014

@boblail, I'm a full-time errbit developer at Undev.
I push our features to https://github.com/Undev/errbit.
Please see https://github.com/Undev/errbit/blob/master/CHANGELOG.md

Member

shingara commented Feb 28, 2014

Please don't use the name errbit alone in your fork.
