filestore

Emil Sit edited this page Aug 24, 2012 · 1 revision

Filestore tutorial

"Filestore" is a basic Chord application that can store and retrieve files from a Chord ring. It demonstrates how to use DHash to store blocks, and how to write programs with libasync. The source code for filestore can be found in the Chord source code in sfsnet/devel/filestore.C

libasync

libasync helps you write programs that perform actions asynchronously. Actions that might take some time, such as waiting for a network packet, use a callback to notify the program.
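
The callback style described above can be sketched in a few lines. This is a minimal illustration, not libasync's API: `TinyLoop`, `call_later`, and `read_packet` are hypothetical names, and the "packet" is simulated. It only shows that the caller continues immediately and is notified later through its callback.

```python
from collections import deque


class TinyLoop:
    """Hypothetical stand-in for an event loop; not libasync's API."""

    def __init__(self):
        self._pending = deque()

    def call_later(self, callback, *args):
        # Queue the callback instead of running it now.
        self._pending.append((callback, args))

    def run(self):
        # Deliver queued callbacks until none remain.
        while self._pending:
            callback, args = self._pending.popleft()
            callback(*args)


def read_packet(loop, on_packet):
    # Pretend a packet arrives later; this function returns immediately.
    loop.call_later(on_packet, b"hello")


log = []
loop = TinyLoop()
read_packet(loop, lambda data: log.append(("packet", data)))
log.append(("request issued", None))  # runs before the callback fires
loop.run()
```

After `loop.run()` returns, `log` records the request first and the packet second, which is the ordering a callback-driven program has to be written around.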

DHash

You can use DHash to store and retrieve blocks up to one megabyte in size. Using the libasync interface to DHash requires you to use callbacks when storing and retrieving blocks. To store a block, you must provide the data along with a callback. The callback is called with status information once the store has completed. To retrieve a block, you must provide the block's identifier along with another callback. This callback is called with the contents of the block.

DHash has several different types of blocks. For this filestore application, we are using "content-hash" blocks. This type of block uses the SHA1 hash of the block's contents as an identifier for the block. You can use this identifier to retrieve the block from DHash.
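
As a rough sketch of how content-hash blocks and the two callbacks fit together, the following uses an in-memory dictionary in place of a Chord ring. `MockDHash`, `insert`, and `retrieve` are hypothetical names chosen for illustration and are not the real DHash client interface; the only detail taken from the text is that the identifier is the SHA1 hash of the block's contents.

```python
import hashlib


class MockDHash:
    """In-memory stand-in for DHash; hypothetical, not the real client API."""

    def __init__(self):
        self._blocks = {}

    def insert(self, data, on_stored):
        # A content-hash block is named by the SHA1 hash of its contents.
        key = hashlib.sha1(data).hexdigest()
        self._blocks[key] = data
        # A real store completes later; this mock calls back immediately.
        on_stored(True, key)

    def retrieve(self, key, on_retrieved):
        # Delivers the block contents, or None if the key is unknown.
        on_retrieved(self._blocks.get(key))


results = {}
dhash = MockDHash()
dhash.insert(b"hello world", lambda ok, key: results.update(ok=ok, key=key))
dhash.retrieve(results["key"], lambda data: results.update(data=data))
```

Because the identifier is derived from the contents, anyone who retrieves a block can recompute the hash and check that the data matches the identifier they asked for.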

File storage format

The file storage format is fairly simple. Files are split into 16,384-byte "data blocks," which are stored in DHash as content-hash blocks. Each file has one "inode block," which contains the file's name and length and a list of hashes of "indirect blocks." Each indirect block contains a list of hashes of the file's data blocks. The file is identified by the hash of its inode block.
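
The three layers can be sketched as follows, with a plain dictionary standing in for DHash. The exact byte layout used by filestore.C is not given here, so several details are assumptions made for illustration: the inode header is taken to be 260 bytes (split, hypothetically, into a 252-byte name field and an 8-byte length), hashes are stored as raw 20-byte digests, and indirect blocks are capped at 16,384 bytes. `put`, `store_file`, and `fetch_file` are hypothetical helper names.

```python
import hashlib
import struct

DATA_BLOCK_SIZE = 16384  # from the text
HASH_SIZE = 20           # length of a raw SHA1 digest in bytes
# Assumption: an indirect block is capped at the data-block size.
HASHES_PER_INDIRECT = DATA_BLOCK_SIZE // HASH_SIZE
HEADER = ">252sQ"        # hypothetical header: 252-byte name, 8-byte length
HEADER_SIZE = struct.calcsize(HEADER)  # 260 bytes


def put(store, block):
    """Store a content-hash block in a dict and return its 20-byte key."""
    key = hashlib.sha1(block).digest()
    store[key] = block
    return key


def store_file(store, name, data):
    # 1. Split the file into data blocks and store each one.
    data_keys = [put(store, data[i:i + DATA_BLOCK_SIZE])
                 for i in range(0, len(data), DATA_BLOCK_SIZE)]
    # 2. Pack the data-block hashes into indirect blocks.
    indirect_keys = [put(store, b"".join(data_keys[i:i + HASHES_PER_INDIRECT]))
                     for i in range(0, len(data_keys), HASHES_PER_INDIRECT)]
    # 3. The inode holds the header followed by the indirect-block hashes.
    header = struct.pack(HEADER, name.encode(), len(data))
    return put(store, header + b"".join(indirect_keys))


def fetch_file(store, inode_key):
    inode = store[inode_key]
    raw_name, length = struct.unpack(HEADER, inode[:HEADER_SIZE])
    data = bytearray()
    # Walk inode -> indirect blocks -> data blocks, in order.
    for i in range(HEADER_SIZE, len(inode), HASH_SIZE):
        indirect = store[inode[i:i + HASH_SIZE]]
        for j in range(0, len(indirect), HASH_SIZE):
            data += store[indirect[j:j + HASH_SIZE]]
    return raw_name.rstrip(b"\0").decode(), bytes(data[:length])


payload = b"x" * 40000
store = {}
key = store_file(store, "some_file", payload)
name, data = fetch_file(store, key)
```

Reading only the inode is enough to recover the name and length, which is what a listing operation needs; the indirect and data blocks are fetched only when the file itself is wanted.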

This simple, fixed, three-layer scheme permits files of up to 10,815,307,776 bytes. For files smaller than 16,384 bytes, the overhead is one inode block of 280 bytes and one indirect block of 20 bytes. Users can view the filename and file length before attempting to retrieve a file, so they can avoid retrieving files that are larger than they want.
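
The size limit can be reproduced with a short calculation. Two inputs are inferred rather than stated in the text: a 260-byte inode header (the 280-byte inode of a small file minus its single 20-byte hash), and a 16,384-byte cap on inode and indirect blocks. Under those assumptions the arithmetic lands exactly on the figure above.

```python
DATA_BLOCK_SIZE = 16384  # from the text
HASH_SIZE = 20           # SHA1 digest length in bytes
INODE_HEADER = 260       # inferred: 280-byte inode minus one 20-byte hash
MAX_BLOCK = 16384        # assumption: inode and indirect blocks share this cap

# Number of data-block hashes that fit in one indirect block.
hashes_per_indirect = MAX_BLOCK // HASH_SIZE
# Number of indirect-block hashes that fit in the inode after its header.
hashes_per_inode = (MAX_BLOCK - INODE_HEADER) // HASH_SIZE

max_file_size = hashes_per_inode * hashes_per_indirect * DATA_BLOCK_SIZE
print(hashes_per_inode, hashes_per_indirect, max_file_size)
# prints: 806 819 10815307776
```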

Examples of use

storing a file:
> ./filestore ../lsd/csock -s some_file
e34cb4fa86a073f390b224f637b90db88f82db06

listing inode info:
> ./filestore ../lsd/csock -l e34cb4fa86a073f390b224f637b90db88f82db06
some_file: 19463682 bytes

fetching a file:
> ./filestore ../lsd/csock -f e34cb4fa86a073f390b224f637b90db88f82db06

Throttling

This version of filestore stores blocks in DHash without any delay between stores. Because filestore can generate stores faster than DHash can complete them, DHash buffers the pending stores. These buffered stores can take up a lot of memory: an amount proportional to the size of the file.

Buffering this data is unnecessary, because the filestore program can generate it easily on demand. To avoid buffering, filestore should "throttle" its storage requests. It could count the number of times it has called "store" and subtract the number of times the "store completed" callback has been called; the difference is the number of blocks in flight. Filestore could bound this number by not calling "store" while it is at or above a chosen limit.
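
One way the throttling idea might look is sketched below. It is not taken from filestore.C: `ThrottledStorer`, `pump`, and `slow_store` are hypothetical names, and the store function is a fake that delivers its completion callbacks later, as an asynchronous store would. The sketch assumes callbacks are never invoked from inside the store call itself.

```python
from collections import deque


class ThrottledStorer:
    """Keeps at most max_in_flight stores outstanding (illustrative only)."""

    def __init__(self, store_fn, blocks, max_in_flight):
        self._store_fn = store_fn    # hypothetical: store_fn(block, callback)
        self._blocks = iter(blocks)  # blocks are produced lazily, on demand
        self._max = max_in_flight
        self.in_flight = 0           # stores issued minus stores completed
        self.peak = 0                # highest in_flight value observed

    def pump(self):
        # Issue stores until the window is full or the blocks run out.
        while self.in_flight < self._max:
            block = next(self._blocks, None)
            if block is None:
                return
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            self._store_fn(block, self._on_stored)

    def _on_stored(self, ok):
        # A store completed: free a slot and try to issue the next block.
        self.in_flight -= 1
        self.pump()


pending = deque()


def slow_store(block, on_stored):
    # Fake asynchronous store: remember the callback and complete it later.
    pending.append(on_stored)


blocks = (bytes([i]) * 4 for i in range(10))
storer = ThrottledStorer(slow_store, blocks, max_in_flight=3)
storer.pump()

completed = 0
while pending:
    pending.popleft()(True)  # simulate one "store completed" callback
    completed += 1
```

With a limit of 3, memory use is bounded by three blocks regardless of file size, because the next block is only read from the generator when a slot frees up.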

James Robertson