DB Sharding #1522

Closed
lvca opened this issue Jun 13, 2013 · 17 comments

@lvca (Member) commented Jun 13, 2013

Even though auto-sharding is not available yet (it will arrive in 2.0), we could support sharding by using a cross-database RID. The classic RID is:

cluster-id:cluster-position, example #12:433

And the cross-database RID will be:

db:cluster-id:cluster-position, example #demo3:12:433

Now, what should we store as the db part? The database name? That could be a waste of space. It could be a 1-byte id (0-127 databases, or even 0-255 by treating the byte as unsigned), but where should the database list be mapped?
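
A minimal sketch, purely illustrative and not OrientDB API, of how such a cross-database RID could be modeled with a 1-byte database id resolved against a shared database list (all names below are hypothetical):

// Illustrative model of the proposed cross-database RID.
// The 1-byte dbId indexes a shared database list (0-255 entries).
public final class CrossDbRid {
    public final int dbId;              // 1 byte on disk: index into the database list
    public final int clusterId;
    public final long clusterPosition;

    public CrossDbRid(int dbId, int clusterId, long clusterPosition) {
        this.dbId = dbId;
        this.clusterId = clusterId;
        this.clusterPosition = clusterPosition;
    }

    // Renders the RID through the database-id-to-name mapping, e.g. "#demo3:12:433".
    public String toText(String[] dbNames) {
        return "#" + dbNames[dbId] + ":" + clusterId + ":" + clusterPosition;
    }

    public static void main(String[] args) {
        String[] dbNames = { "demo3" };   // the shared database list
        System.out.println(new CrossDbRid(0, 12, 433).toText(dbNames)); // #demo3:12:433
    }
}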

@ghost assigned lvca Jun 13, 2013
@kowalot (Contributor) commented Jun 13, 2013

Do you consider #db to be a database on the same server, or a remote database as well?

@lvca (Member, Author) commented Jun 13, 2013

Both: you provide a URL and OrientDB will do the rest.

@kowalot (Contributor) commented Jun 13, 2013

Cool! Questions in bulk :)
1. Are you going to support cross-database queries, like Gremlin/SELECT/TRAVERSE traversing partially through remote database records?
2. Will I be able to provide the db# at creation time (my custom sharding key)?
3. Will I be able to send a TX_COMMIT containing objects from mixed #dbs to one database?

@lvca (Member, Author) commented Jun 13, 2013

  1. Yes, it will be transparent.
  2. This is an open point. We could use the orientdb-server-config.xml.
  3. No. This would mean implementing a 2-phase commit. This will be supported in the future, but not right now.

@kowalot (Contributor) commented Jun 13, 2013

Great, thanks for the info. I look forward to testing the first workable commits.
Re 2: it would be nice to have it for cases like CustomersA on ServerA or UsersID < 1000000 on ServerA.

@lvca (Member, Author) commented Jun 13, 2013

Exactly. It will be the application's responsibility to find the best way to cluster the data.

@laa (Member) commented Jun 13, 2013

Hi, I will add my five cents.
Two-phase commit is not durable and is slow; I think it is better to use Paxos for this kind of functionality: http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F/notes/paxos-simple.pdf

@lvca (Member, Author) commented Jul 8, 2013

Hey guys, I don't like this solution anymore, because it obliges us to migrate (= update) all the RIDs when records move between nodes. I prefer manual sharding support in the current configuration, by specifying that some record clusters are not replicated but rather have a single node that owns the data.

@kowalot (Contributor) commented Jul 9, 2013

Hi Luca, in what case do you expect to move records between nodes? Rebalancing nodes when sharding?

@lvca (Member, Author) commented Jul 9, 2013

Yes, rebalancing. This will be done manually through a new command, but there could be pluggable strategies.

@kowalot (Contributor) commented Jul 9, 2013

This is an interesting view on sharding and repartitioning in SQL Azure.
I started to build my custom sharding based on that pattern, but I decided to wait when I noticed the cross-database feature on your roadmap. Maybe it will inspire you somehow. Cheers.

@lvca (Member, Author) commented Jul 25, 2013

Postponed.

@lvca (Member, Author) commented Aug 5, 2013

Renamed to "DB Sharding"; it has been planned for 1.6.

@lvca (Member, Author) commented Aug 10, 2013

For cross-links I've just created a new issue, "federated databases" (#1597), planned for 2.0.

@lvca (Member, Author) commented Aug 10, 2013

This will allow spanning portions of a database across multiple servers at record-cluster granularity. This works pretty well with OrientDB's object-orientation concepts. Example: this is the default:

Class Customer -> Cluster Customer

But if you assign:

Class Customer -> Clusters [ Customer_USA, Customer_Europe ]

OrientDB will use both clusters when you refer to the class "Customer". With the new sharding feature, if you assign Customer_USA to server 1 and Customer_Europe to server 2, you can work against the generic "Customer" class or against single clusters.

You could also work only at the class level by creating subclasses:

Customer <|--- Customer_USA, Customer_Europe

In this case OrientDB automatically creates 3 different clusters. So these queries:

select from customer -> will retrieve customers from both servers
select from customer_usa -> will retrieve only the customers from the USA server

So this isn't auto-sharding, because you have to design how to split the database efficiently, but it will remain the best solution even when auto-sharding is ready, because it's very hard to create a generic algorithm that shards the database well across all the different use cases.
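
For example, a rough sketch of the subclass approach using the OrientDB Java document API (the server URL, database name and credentials below are assumptions for illustration, not part of this proposal):

import java.util.List;

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.OCommandSQL;
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery;

public class CustomerShardingSketch {
    public static void main(String[] args) {
        // Hypothetical connection details.
        ODatabaseDocumentTx db =
                new ODatabaseDocumentTx("remote:localhost/demo").open("admin", "admin");
        try {
            // Base class plus one subclass (and therefore one cluster) per shard.
            db.command(new OCommandSQL("CREATE CLASS Customer")).execute();
            db.command(new OCommandSQL("CREATE CLASS Customer_USA EXTENDS Customer")).execute();
            db.command(new OCommandSQL("CREATE CLASS Customer_Europe EXTENDS Customer")).execute();

            // Write into a specific shard by targeting its subclass.
            db.command(new OCommandSQL("INSERT INTO Customer_USA SET name = 'Jay'")).execute();

            // Polymorphic query: spans every cluster of Customer, i.e. every shard.
            List<ODocument> all =
                    db.query(new OSQLSynchQuery<ODocument>("SELECT FROM Customer"));

            // Restricted query: hits only the USA cluster.
            List<ODocument> usaOnly =
                    db.query(new OSQLSynchQuery<ODocument>("SELECT FROM Customer_USA"));

            System.out.println("all: " + all.size() + ", usa: " + usaOnly.size());
        } finally {
            db.close();
        }
    }
}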

Take a look at this tutorial: https://github.com/orientechnologies/orientdb/wiki/Tutorial%3A-Clusters, but imagine that each cluster can reside only on certain servers.

@lvca (Member, Author) commented Aug 14, 2013

OK, we probably need to review replication to offer a simple configuration while keeping a lot of power. I was thinking of changing the default cfg in this way:

{
    "synchronization" : true,
    "clusters" : {
        "internal" : { "synchronization" : false },
        "index" : { "synchronization" : false },
        "ODistributedConflict" : { "synchronization" : false },
        "*" : {
            "synchronization" : true,
            "master" : "$auto",
            "synchReplicas" : { "total" : 2, "nodes" : [] },
            "asynchReplicas" : { "total" : "$all", "nodes" : [] }
        }
    }
}

synchReplicas.total means that 2 nodes must keep the database in synch. The master will be chosen from the list in synchReplicas.nodes, which is the list of node names populated as soon as the server nodes connect. So after the first node, "europe1", starts up, the cfg will change to:

{
    "synchronization" : true,
    "clusters" : {
        "internal" : { "synchronization" : false },
        "index" : { "synchronization" : false },
        "ODistributedConflict" : { "synchronization" : false },
        "*" : {
            "synchronization" : true,
            "master" : "$auto",
            "synchReplicas" : { "total" : 2, "nodes" : ["europe1"] },
            "asynchReplicas" : { "total" : "$all", "nodes" : [] }
        }
    }
}

When the second node, "usa1", joins:

{
    "synchronization" : true,
    "clusters" : {
        "internal" : { "synchronization" : false },
        "index" : { "synchronization" : false },
        "ODistributedConflict" : { "synchronization" : false },
        "*" : {
            "synchronization" : true,
            "master" : "$auto",
            "synchReplicas" : { "total" : 2, "nodes" : ["europe1", "usa1"] },
            "asynchReplicas" : { "total" : "$all", "nodes" : [] }
        }
    }
}

This is because, until the synch node list is full, each joining server takes the synch role. The third server that joins, "europe2", changes the cfg to:

{
    "synchronization" : true,
    "clusters" : {
        "internal" : { "synchronization" : false },
        "index" : { "synchronization" : false },
        "ODistributedConflict" : { "synchronization" : false },
        "*" : {
            "synchronization" : true,
            "master" : "$auto",
            "synchReplicas" : { "total" : 2, "nodes" : ["europe1", "usa1"] },
            "asynchReplicas" : { "total" : "$all", "nodes" : ["europe2"] }
        }
    }
}

This is because the synch node list is already full and asynch is "$all", which means all the remaining nodes.
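
In other words, the rule for a joining node could be modeled roughly like this (a hypothetical sketch of the behaviour described above, not OrientDB code):

import java.util.ArrayList;
import java.util.List;

public class ReplicaAssignmentSketch {
    static final int SYNCH_TOTAL = 2;                      // "synchReplicas.total"
    static List<String> synchNodes = new ArrayList<>();    // "synchReplicas.nodes"
    static List<String> asynchNodes = new ArrayList<>();   // "asynchReplicas.nodes"

    static void join(String nodeName) {
        if (synchNodes.size() < SYNCH_TOTAL)
            synchNodes.add(nodeName);   // node keeps the database in synch
        else
            asynchNodes.add(nodeName);  // replicated asynchronously ("$all" rule)
    }

    public static void main(String[] args) {
        join("europe1");
        join("usa1");
        join("europe2");
        System.out.println("synch:  " + synchNodes);   // [europe1, usa1]
        System.out.println("asynch: " + asynchNodes);  // [europe2]
    }
}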

@lvca (Member, Author) commented Sep 20, 2013

This is the definitive format:

{
    "replication": true,
    "clusters": {
        "internal": {
            "replication": false
        },
        "index": {
            "replication": false
        },
        "ODistributedConflict": {
            "replication": false
        },
        "*": {
            "replication": true,
            "writeQuorum": 2,
            "partitioning": {
                "strategy": "round-robin",
                "default": 0,
                "partitions": [
                    [ "<NEW_NODE>" ]
                ]
            }
        }
    }
}
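
As a side note on "writeQuorum": 2, assuming it has the usual quorum meaning, a write would be acknowledged to the client only once at least 2 replicas have applied it. A trivial illustrative check (not OrientDB code):

public class WriteQuorumSketch {
    // A write succeeds only when the number of acknowledging replicas
    // reaches the configured writeQuorum.
    static boolean isWriteSuccessful(int acks, int writeQuorum) {
        return acks >= writeQuorum;
    }

    public static void main(String[] args) {
        System.out.println(isWriteSuccessful(1, 2)); // false: quorum not reached
        System.out.println(isWriteSuccessful(2, 2)); // true: quorum reached
    }
}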

lvca added a commit that referenced this issue Sep 21, 2013
…eouts based on number of nodes and command types. working to issue #1522 about sharding. Fixed problem on triggers: now are executed on master and remote nodes depending by trigger
lvca closed this as completed Oct 14, 2013