
Shardon

! Under development; please do not use this code yet.

TODOS:

  1. event-driven API
  2. ordering options
  3. table creation scripts
  4. mock testing
  5. API documentation
  6. add async support
  7. populate user
  8. pooling weight support

Scenario

  1. You are familiar with and believe in the mature MySQL stack and its community, so you don't want MongoDB or another NoSQL stack.
  2. Your data is user oriented; almost all of it belongs to your users, like pins, todos, feeds, notifications, clips, follows, and so on.
  3. You could have millions of users and trillions of user objects, like Pinterest, who knows.
  4. You are iterating on your product, so you need schema-free storage.

What?

Shardon is a schema-less MySQL storage solution with sharding support, used for storing object entities like feeds, comments, lists, users, moods, todos, logs, and activities. (Watch out: sharded data normally loses convenient features like joins, which may limit product functionality, and stats must run across many tables and dbs.)

Scale Method

Shardon is a database- and table-level sharding solution, sharding objects into 1000 tables (fixed) and many dbs.

Details:

1. split objects and mappings into 1000 tables by user_id;
2. plan your db pool size and ranges; each db takes care of a contiguous range of the 1000 tables;
3. create the 1000 tables in each db (or create the tables separately).

Your dbs may be:

- User DB and other simple tables
- ShardonDB1[0,199]     -slave
- ShardonDB2[200,399]   -slave
- ShardonDB3[400,599]   -slave
- ShardonDB4[600,799]   -slave
- ShardonDB5[800,999]   -slave
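The routing rule above can be sketched in a few lines (an illustrative helper, not Shardon's actual code): the table suffix comes from user_id modulo 1000, and the db is whichever pool owns that shard range.

```javascript
// Illustrative sketch of Shardon's routing rule (assumed; the real
// implementation may differ): a user's objects always land in
// <name>_<user_id % 1000>, zero-padded to four digits.
function tableFor(name, userId) {
  var shard = userId % 1000;               // fixed 1000 shards
  var suffix = ('0000' + shard).slice(-4); // zero-pad, e.g. feed_0007
  return name + '_' + suffix;
}
```

For example, user 13701 always hits feed_0701, which ShardonDB4 above would own.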

Shardon Details

Shardon defines its data model in terms of objects, mappings, indexes, and entities.

UUID

Since tables are sharded, the db cannot use a constraint to generate a unique id. Twitter built Snowflake to generate id numbers, but since generating ids is easy and we don't want extra dependencies, we wrote a simple one.

uuid() produces a bigint that contains a timestamp (for ordering), shard info (as a table hint), and some random bits.

ID composition: 10 digits [second-precision timestamp] + 4 digits [shard_id] + 5 digits [random noise].

Id collisions should be very rare.
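A minimal sketch of this composition (an assumed helper, not Shardon's shipped uuid()): the 19-digit id is built as a decimal string, since it exceeds JavaScript's safe integer range.

```javascript
// Zero-pad a number to a fixed width, e.g. pad(7, 4) -> "0007".
function pad(num, width) {
  var s = String(num);
  while (s.length < width) s = '0' + s;
  return s;
}

// Assumed sketch of the uuid() scheme described above:
// 10-digit second timestamp + 4-digit shard id + 5-digit random noise.
// Returned as a string because 19 digits exceed Number.MAX_SAFE_INTEGER.
function uuid(shardId) {
  var ts = Math.floor(Date.now() / 1000);         // 10 digits until ~2286
  var noise = Math.floor(Math.random() * 100000); // 5 random digits
  return String(ts) + pad(shardId, 4) + pad(noise, 5);
}

// The shard id can be recovered from an id for table routing:
function shardOf(id) {
  return parseInt(String(id).slice(10, 14), 10);
}
```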

Objects

The objects have some things in common:

  1. they have a user-id, and objects should always belong to a user;
  2. they come in huge quantities, maybe billions;
  3. they have an object-id (uuid), for indexing and lookup;
  4. they need to be extended freely, for iterative development;
  5. they can be put into collections, for data operations (mapping, reducing, etc.);
  6. they have several indexes, for fetching collections [see Mappings];
  7. they have a time ordering.

All objects share the same table structure except the table name, like below:

CREATE TABLE todo_0243 (
      local_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,/*-- for mysql ordering --*/
      object_id BIGINT(23) NOT NULL,/*-- uuid() --*/
      user_id INT NOT NULL,/*-- objects must have user_id --*/
      updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,/*-- timestamp  --*/
      body VARCHAR(4000),/*-- data payload, we use a json varchar to persit data  --*/
      UNIQUE KEY (object_id),/*--index for finding by object-id --*/
      KEY (user_id),/*--index for finding by user-id --*/
      KEY (updated)/*--index for time limit and ordering --*/
) ENGINE=InnoDB;

Shardon uses a fixed table structure, so it loses some indexing facilities:

  1. You can find objects by user-id or object-id.
  2. You cannot find objects by another object's id (like finding comments by post-id).
  3. You cannot find objects by property (like finding posts by slug).

So we provide Indexings and Mappings to make 2 and 3 possible, and to add indexes with zero downtime.

Mappings

When you need to set up a mapping between two objects, you define a Mapping. A Mapping is a one-to-one or one-to-many single-direction projection between two objects: a user-id mapping to a user-id for a following relationship, a user-id mapping to a list-id for a liking relationship, a post-id mapping to a comment-id for a has-comments relationship.

mapping table structure:

  CREATE TABLE posts_have_comments_0241 (
      from_id BIGINT(23) NOT NULL,/*-- e.g. the post object_id --*/
      to_id BIGINT(23) NOT NULL UNIQUE,/*-- e.g. the comment object_id --*/
      updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,/*-- timestamp  --*/
      KEY (updated),/*--index for time limit and ordering --*/
      PRIMARY KEY (from_id, to_id)
  ) ENGINE=InnoDB;

When a user loads a blog page, the SQL runs roughly like:

  1. select body from user where id = %user-id%;
  2. select to_id from user_has_posts_1111 where from_id = %user-id%;
  3. select body from post_1111 where object_id in (1367191739799000190, 1367191739799000191);
  4. select to_id from posts_have_comments_1111 where from_id = 1367191739799000190;
  5. select to_id from posts_have_comments_1111 where from_id = 1367191739799000191;
  6. select body from comments_1111 where object_id in (results of 4);
  7. select body from comments_1111 where object_id in (results of 5);
  8. cache it!

(Tip: a user's data is always bound to the same shard.)

Suppose you release a feature adding a post property read_count, so you can find a user's hot posts. Normally you would change the table structure, add a field, and index it, which can bring production downtime. With Shardon, you just create a mapping table between post-id and read_count, and then enable the code.

Shortcomings:

  1. when incrementing a post's read_count, you must update both the post object and the posts_have_readcounts mapping;
  2. removing objects requires deleting on both sides.

However, you can find by user-id easily, because Shardon embeds user_id into the object table for convenience.
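The dual write in shortcoming 1 might look like this (table and column names here are assumptions for illustration, not Shardon's real schema helpers); both statements should run in one transaction so the object body and the mapping row cannot drift apart:

```javascript
// Sketch of the dual write described above (names assumed): bump
// read_count on the post object AND in its mapping table.
function bumpReadCountSql(shardId, postId) {
  var t = ('0000' + shardId).slice(-4); // zero-pad shard to 4 digits
  return [
    // 1. rewrite the JSON body with the incremented read_count
    "UPDATE posts_" + t + " SET body = ? WHERE object_id = " + postId,
    // 2. keep the read_count mapping in sync for "hot posts" queries
    "UPDATE posts_have_readcounts_" + t +
      " SET to_id = to_id + 1 WHERE from_id = " + postId
  ];
}
```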

Indexings

Objects should be findable by their properties.

Entities

Entities are like objects, but are not sharded on user_id; they are not user oriented, like lyrics and other big resources. So the developer should define a specific function to calculate the shard id.

CREATE TABLE lyric_0243 (
      local_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,/*-- for mysql ordering --*/
      object_id BIGINT(23) NOT NULL,/*-- uuid() --*/
      updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,/*-- timestamp  --*/
      body VARCHAR(4000),/*-- data payload, we use a json varchar to persit data  --*/
      UNIQUE KEY (object_id),/*--index for finding by object-id --*/
      KEY (updated)/*--index for time limit and ordering --*/
) ENGINE=InnoDB;
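A hypothetical shard function for such entities (the real Shardon hook for custom shard functions is not documented here) could derive the shard from the entity's own id instead of a user id:

```javascript
// Hypothetical entity shard function (assumed, for illustration):
// fold the random tail of the entity's uuid into a shard in 0..999.
function lyricShardId(objectId) {
  return Number(String(objectId).slice(-5)) % 1000;
}
```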

Simple Table Management

Sharding tables makes managing them annoying work. Shardon provides:

  1. the same table structure for all tables of the same type, for abstraction and maintenance;
  2. object and mapping table generators.

! But when moving data between two dbs (changing the pool size), users still have to copy tables manually.

Scaling Up

Shardon is a simple plan, starting with a single MySQL instance. If you cache data properly, there is not much pressure on the db side. However, as user data grows, slaves will develop large sync latency, and keeping the full data set in every db becomes a bottleneck.

Solution: separate different types of tables into different MySQL databases, e.g. posts stored in a post database and comment objects stored in a comment database.

(Watch out: isolating data will make the system less stable.)

Code:

var posts_shardon    = new Shardon({ mysql : "username:password@host/posts_database" });
var comments_shardon = new Shardon({ mysql : "username:password@host/comments_database" });

var Posts = posts_shardon.objects('post',{
      "user_id":{check:"isNumber",notNull:true},
      "username":{check:"isString",notNull:true},
      "user":{check:"isObject",notNull:true},
      "text":{check:"isString",defaults:""}
},{ shard_size : 100 });

var Comments = comments_shardon.objects('comment',{
      "user_id":{check:"isNumber",notNull:true},
      "username":{check:"isString",notNull:true},
      "user":{check:"isObject",notNull:true},
      "text":{check:"isString",defaults:""}
},{ shard_size : 100 });




Notes:

  1. The user table is better left unsharded:

    1. you don't have that many users;
    2. it keeps name and email under a unique constraint;
    3. caching works better.
  2. Re-sharding data is difficult, so make a good long-term plan:

    1. Shardon shards your data into 1000 tables, so if you create a feed table there will be feed_0000 - feed_0999, 1000 tables out there;
    2. you can change the shard size by creating the Shardon model with a size parameter;
    3. a table with up to 10 million rows has acceptable performance;
    4. too many shards will make your table scripts and stats less efficient.

Install Shardon

APIs

var shardon = new Shardon([{
      type     : 'shard01',
      shard    : [0,300],
      host     : 'localhost',
      user     : 'root',
      password : 'root',
      database : 'shardon'
},{
      type     : 'shard02',
      shard    : [301,600],
      host     : 'localhost',
      user     : 'root',
      password : 'root',
      database : 'shardon'
},{
      type     : 'shard03',
      shard    : [601,999],
      host     : 'localhost',
      user     : 'root',
      password : 'root',
      database : 'shardon'
}]);
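Given a pool list like the one above, shard-to-pool resolution can be sketched as follows (assumed, not Shardon's real internals): scan the configured ranges for the one covering the shard id.

```javascript
// Sketch (assumed) of mapping a shard id to one of the pool configs
// above: each config owns an inclusive [low, high] range of shards.
function pickPool(pools, shardId) {
  for (var i = 0; i < pools.length; i++) {
    var range = pools[i].shard;
    if (shardId >= range[0] && shardId <= range[1]) return pools[i];
  }
  throw new Error('no pool covers shard ' + shardId);
}
```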


//  profiles_0000 - profiles_0999
var Profiles = shardon.objects('profile',{
    "user_id":{check:"isNumber",notNull:true},
    "username":{check:"isString",notNull:true},
    "created_on":{check:"isDate",notNull:true},
    "user":{check:"isObject",notNull:true},

    "email":{check:"isString",defaults:""},
    "commits":{check:"isArray",defaults:[]},
    "is_display":{check:"isBoolean",defaults:true},
    "hobby":{check:"isString",defaults:""}
},{ shard_size : 1000 });

// create an object (done() here is assumed to be a surrounding test callback)
Profiles.save({
    email:"navy9wf@gmail.com",
    username:'Frank',
    user:{username:'Feng'},
    user_id:13701,
    created_on:new Date()
}, function(err, profile, result){

    if(err){
      done(err);
      return;
    }

    if(result.affectedRows === 1){
      done();
    }else{
      done(new Error('mysql error'));
    }
});



Inspired by

This idea is inspired by FriendFeed & Pinterest, so it is battle tested.

Winter is coming

Even if you have large database pools, databases of different types, multiple API endpoints, and Redis caches everywhere, your load's winter is coming. Maybe you should re-shard your data onto a heavier setup.
