
Consider creating tables with replication by default #4608

Open
danielmewes opened this issue Jul 31, 2015 · 9 comments

5 participants
@danielmewes
Member

commented Jul 31, 2015

The idea is to have a replication factor of 2 or even 3 enabled by default when creating a new table through table_create, assuming there are enough servers in the cluster.

I think most people who run on a cluster will want to use replication, and changing the default makes it harder to forget to set it up and then lose data when a server dies.

Putting into 2.2 for discussion.

@danielmewes danielmewes added this to the 2.2 milestone Jul 31, 2015

@RubenKelevra


commented Aug 12, 2015

I'm very new to RethinkDB, but I would like to set a minimum replication level for my cluster so that all new tables get at least that level on creation.

Alternatively, it would be great to be able to set defaults per database or per cluster for sharding and replication.

@danielmewes danielmewes modified the milestones: subsequent, 2.2 Aug 27, 2015

@j-c-m


commented Oct 16, 2015

I would like to see this with a default of min(3,servers) so it could handle automatic failover by default.

Maybe a separate issue, but also worth considering is whether shards should default to min(32,servers).

It would also be good to be able to store more complex cluster-wide defaults for table_create options in a system table. I would be much happier storing a default such as

{ shards: 12, 
replicas: { us1: 3, us2: 3, eu1: 3 }, 
primary_replica_tag: 'us1' }

than having to rely solely on operational process around table and database creation.
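As a rough sketch of how such stored defaults might combine with the options passed to an individual table_create call: the code below is illustrative only. `CLUSTER_DEFAULTS` and `resolve_table_options` are hypothetical names, RethinkDB has no such system table today, and the rule for dropping `primary_replica_tag` when replicas is given as a plain number is an assumption.

```python
import copy
from typing import Any, Dict

# Hypothetical cluster-wide defaults, mirroring the example above.
CLUSTER_DEFAULTS: Dict[str, Any] = {
    "shards": 12,
    "replicas": {"us1": 3, "us2": 3, "eu1": 3},
    "primary_replica_tag": "us1",
}


def resolve_table_options(defaults: Dict[str, Any],
                          explicit: Dict[str, Any]) -> Dict[str, Any]:
    """Return the options for a new table: explicit options win over defaults."""
    merged = copy.deepcopy(defaults)  # never mutate the stored defaults
    merged.update(explicit)
    # Assumption: a primary tag inherited from the defaults only makes sense
    # when replicas is given per server tag, so drop it otherwise.
    if (not isinstance(merged.get("replicas"), dict)
            and "primary_replica_tag" not in explicit):
        merged.pop("primary_replica_tag", None)
    return merged


# No explicit options: the stored defaults apply unchanged.
print(resolve_table_options(CLUSTER_DEFAULTS, {}))
# Explicit options override only the keys they name.
print(resolve_table_options(CLUSTER_DEFAULTS, {"shards": 1}))
```

The resolved dictionary could then be passed as keyword arguments to the driver's table_create call.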

@WesleyDavid


commented Jan 13, 2016

Just a note to add a voice to this conversation:

In my scenario I keep track of many RethinkDB clusters for customers, and each cluster is spread across at least three physical hosts. Customers are responsible for their management of the actual data, so I don't cross the threshold into the running Rethink process.

In this case, customers unfamiliar with RethinkDB defaults had created tables without explicitly configuring sharding and replication, so the default of 1 shard and 1 replica was used for most, if not all, tables. One physical host had an outage that resulted in all data on that host being lost. All customers that had tables residing only on that host lost data. In discussing this with customers, it appears that the expectation of most people who aren't deeply familiar with RethinkDB is: "Well, it's a cluster so it must replicate data, right? Right!"

A default of 2 replicas, if there are enough nodes, would be a positive thing IMO.

@danielmewes

Member Author

commented Jan 13, 2016

Thanks for adding your experience, @WesleyDavid.

Unfortunately the solution for this issue got delayed a bit because we have been working on a bunch of other important features/improvements. I still think that we should change the default though.

@danielmewes

Member Author

commented Jan 13, 2016

I'd probably prefer a default of 3 rather than 2 to make automatic failover work, like @j-c-m mentioned.

@WesleyDavid


commented Jan 13, 2016

That's an interesting note about 3 rather than 2 as a default for table replication (where there are at least 3 nodes available, of course). Here's another thing we saw during the outage of one shared host: several deployments that had 3 nodes had replicated a table across only 2 of them. As a result, the table wasn't writable. Data was safe, but it took some manual intervention to resolve. Again, I think the ultimate solution is for folks to simply be more familiar with their data, database engine, and infrastructure as a whole, but... alas. 😄
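The arithmetic behind the 2-versus-3 point can be sketched as follows. This is a simplified model, assuming that a table stays writable only while a strict majority of its replicas is reachable. `has_majority` is a hypothetical helper, not part of any RethinkDB driver.

```python
def has_majority(total_replicas: int, available_replicas: int) -> bool:
    """True if strictly more than half of the replicas are still reachable."""
    if not 0 <= available_replicas <= total_replicas:
        raise ValueError("available_replicas must be between 0 and total_replicas")
    return available_replicas > total_replicas // 2


# Lose one server in each configuration and check whether a majority remains.
for total in (1, 2, 3):
    survivors = total - 1
    print(f"{total} replica(s), 1 lost -> majority: {has_majority(total, survivors)}")
```

Under this model, 2 replicas protect the data but not write availability: losing either one leaves exactly half, which is not a majority. With 3 replicas, losing one still leaves 2 of 3.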

@danielmewes danielmewes modified the milestones: 2.4-polish, subsequent Apr 29, 2016

@VeXocide

Member

commented Jun 10, 2016

I agree that min(3, number_of_servers) is the way to go.

@danielmewes

Member Author

commented Jun 22, 2016

Marking settled with min(3, number_of_servers) if no other number of replicas is specified during table creation. Sharding will still be manual.

We should keep in mind that this will likely break a bunch of tests which will need to be updated.
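For reference, the settled rule amounts to something like the sketch below. It only illustrates the decision and is not the server's implementation. `default_replicas` and its parameters are hypothetical names.

```python
from typing import Optional


def default_replicas(num_servers: int, requested: Optional[int] = None) -> int:
    """Pick the replica count for a new table.

    An explicitly requested count always wins; otherwise use
    min(3, number_of_servers).
    """
    if num_servers < 1:
        raise ValueError("a cluster needs at least one server")
    if requested is not None:
        return requested
    return min(3, num_servers)


# Default replica count for a few cluster sizes.
for servers in (1, 2, 3, 5):
    print(f"{servers} server(s) -> {default_replicas(servers)} replica(s)")
```

Tests that create a table on a multi-server cluster and assume a single replica would need to pass the count explicitly, for example `default_replicas(5, requested=1)`.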

@RubenKelevra


commented Jun 22, 2016

I still think it would be nice to be able to set a default value at the cluster level and at the database level for every table that gets created. Especially if you create tables often, being able to set sharding/replication defaults would be a neat feature.
