Split Shard? #59

Closed
rnavarro opened this issue Feb 25, 2014 · 3 comments

Comments

@rnavarro

Hello,

Are there any tools or is there support for splitting a shard?

Say I have a single MySQL server to start and I want to split that up into two smaller shards of data, how would I go about that?

@greenlion
Owner

You can probably write a simple script to do the following.
Note: this only works if you are using the directory mapper.

Identify how you want to split the shard. For example, if you want to split a shard into even and odd components, select which half you want to move. Arbitrarily, I'll pick moving the odd-numbered keys to another shard. You could even move just a single key.

On the directory, get a list of the shard keys to move:
[pseudo-code]
FOR EACH $key_value IN (SELECT key_value FROM shard_map WHERE MOD(key_value, 2) = 1)
    START XA TRANSACTION on source, dest and directory
    FOR EACH sharded_table AS $table_name
        SOURCE: SELECT * FROM $table_name WHERE shard_key = $key_value INTO OUTFILE ...
        DEST: LOAD DATA INFILE ...
        SOURCE: DELETE FROM $table_name WHERE shard_key = $key_value
    DIRECTORY: UPDATE shard_map SET shard_id = $dest_shard_id WHERE key_value = $key_value
    XA COMMIT

Let me know if you have questions, or if you want to sponsor development of a tool to split shards.

The XA transaction ensures that other queries run while the data is being moved, such as a COUNT(*), still return correct results.
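
As a rough illustration of the pseudo-code above, here is a minimal Python sketch that only builds the ordered list of statements for moving one key; it does not connect to any server. The function name `plan_key_move`, the `map_table` and `outfile_dir` parameters, the transaction id format, and the integer `shard_key` are all assumptions made for this example, not part of Shard-Query. The XA START / XA END / XA PREPARE / XA COMMIT sequence is the standard MySQL form of the "START XA TRANSACTION ... XA COMMIT" shorthand.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (connection name, SQL statement)


def plan_key_move(key_value: int, tables: List[str], dest_shard_id: int,
                  map_table: str = "shard_map",      # assumed directory table name
                  outfile_dir: str = "/tmp") -> List[Step]:  # assumed dump location
    """Return the ordered (connection, SQL) steps to move one shard key."""
    key_value = int(key_value)          # integer keys only, so interpolation is safe
    dest_shard_id = int(dest_shard_id)
    xid = f"'move_key_{key_value}'"     # hypothetical XA transaction id
    conns = ("source", "dest", "directory")

    # Open an XA transaction on all three connections.
    steps: List[Step] = [(c, f"XA START {xid}") for c in conns]

    for table in tables:
        # INTO OUTFILE writes on the source host and LOAD DATA INFILE reads on
        # the dest host, so the path must be visible to both servers.
        path = f"{outfile_dir}/{table}_{key_value}.txt"
        steps.append(("source",
                      f"SELECT * FROM {table} WHERE shard_key = {key_value} "
                      f"INTO OUTFILE '{path}'"))
        steps.append(("dest", f"LOAD DATA INFILE '{path}' INTO TABLE {table}"))
        steps.append(("source",
                      f"DELETE FROM {table} WHERE shard_key = {key_value}"))

    # Repoint the key in the directory once every table has been copied.
    steps.append(("directory",
                  f"UPDATE {map_table} SET shard_id = {dest_shard_id} "
                  f"WHERE key_value = {key_value}"))

    # Two-phase commit: end and prepare everywhere before committing anywhere.
    for phase in ("XA END", "XA PREPARE", "XA COMMIT"):
        steps.extend((c, f"{phase} {xid}") for c in conns)
    return steps
```

A script would call this once per key returned by the directory query and send each statement to the named connection in order, rolling back all three if any step fails.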

@greenlion
Owner

To split an existing server that is not yet running Shard-Query, use 'mysqldump' to dump the data into flat files, then use the Shard-Query loader to reload it. The loader will do the splitting for you.

Basically, load the data into Shard-Query as if it were just a dump from an online data source.
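
One way to get delimited flat files out of mysqldump is its `--tab` option, which writes one `.txt` data file per table (plus a `.sql` file with the table definition) into a directory on the server host. The sketch below only assembles such a command line; the helper name `mysqldump_tab_command`, the database name, and the output directory are placeholders, and whether the resulting files suit your loader setup is an assumption to verify.

```python
import shlex
from typing import List


def mysqldump_tab_command(database: str, out_dir: str,
                          delimiter: str = ",") -> List[str]:
    """Build a mysqldump argv that writes one delimited text file per table."""
    return [
        "mysqldump",
        f"--tab={out_dir}",                       # one .txt (data) + .sql (DDL) per table
        f"--fields-terminated-by={delimiter}",    # match the delimiter in loader.spec
        database,
    ]


if __name__ == "__main__":
    # Placeholder database name and directory; print the command for review.
    print(shlex.join(mysqldump_tab_command("mydb", "/path/to/dump")))
```

The list form can be passed directly to `subprocess.run` once the paths and credentials are filled in.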

The loader is in the bin/ folder.

You need a loader.spec file. It looks like:
[default]
delimiter=","

[table_name]
file=/path/to/file.txt

You can use globs for the file name if you have multiple files for the same table. You can specify the same table multiple times if you have different paths to load.
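
If you have many tables, the spec file can be generated rather than written by hand. This is a minimal sketch that reproduces only the layout shown above (a `[default]` section with `delimiter`, then one `[table_name]` section per path); the function name `render_loader_spec` and the example paths are made up for illustration, and any other keys the loader may accept are not covered.

```python
from typing import Dict, List


def render_loader_spec(tables: Dict[str, List[str]], delimiter: str = ",") -> str:
    """Render loader.spec text: a [default] section, then one section per path."""
    lines = ["[default]", f'delimiter="{delimiter}"', ""]
    for table, paths in tables.items():
        for path in paths:
            # The same table is repeated once for each path (globs are allowed).
            lines += [f"[{table}]", f"file={path}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    spec = render_loader_spec({
        "orders": ["/path/to/orders_*.txt"],            # hypothetical paths
        "customers": ["/path/a/customers.txt", "/path/b/customers.txt"],
    })
    with open("loader.spec", "w") as fh:
        fh.write(spec)
```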

cd bin
php loader --spec=loader.spec

That will fire off a bunch of loader jobs. You should only run loader workers on a single node unless you have a shared filesystem. Run them on the same node you invoke the bin/loader script from. If you have a shared filesystem, place the files there, and make sure the path to the filesystem is the same on all nodes running loader workers.

Run bin/update_jobs_table to check on the status of the jobs. It will stop producing output when the jobs are completed.

Please let me know if you have problems.

@greenlion greenlion reopened this Feb 26, 2014
@greenlion
Owner

I noticed that you are in Santa Clara. If you want to meet up sometime for coffee and talk about your data and Shard-Query, I'd be happy to do so.
