Mongoose's HDFS storage driver
The storage driver extends Mongoose's Abstract NIO Storage Driver and uses the following libraries:
- hadoop-common
- hadoop-hdfs-client
Features:
- Authentication: simple
- SSL/TLS - TODO
- Item types:
  - data
  - path (TODO)
- Path listing input
- Automatic destination path creation on demand
- Data item operation types:
  - create, additional modes:
    - copy
  - read
    - full
    - random byte ranges
    - fixed byte ranges
    - content verification
  - update
    - full (overwrite)
    - fixed byte ranges: append mode only
  - delete
  - noop
- Path item operation types (TODO):
  - create, additional modes:
    - copy - ?
    - concatenation - ?
  - read
  - delete
  - noop
The latest pre-built jar file is available at:
http://repo.maven.apache.org/maven2/com/github/emc-mongoose/mongoose-storage-driver-hdfs/
Download the jar file manually and place it into the <USER_HOME_DIR>/.mongoose/<VERSION>/ext directory of Mongoose. It is then loaded into the runtime automatically.
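The manual installation step can be sketched as a small shell function. This is only an illustration: the function name `install_hdfs_driver` and its arguments are hypothetical, and the jar is assumed to have been downloaded already.

```shell
# Hypothetical helper: copy a downloaded driver jar into Mongoose's ext directory.
# $1: path to the downloaded jar file
# $2: Mongoose version, used as the directory name under ~/.mongoose
install_hdfs_driver() {
    jar_path="$1"
    version="$2"
    ext_dir="${HOME}/.mongoose/${version}/ext"
    # Create the ext directory if it does not exist yet
    mkdir -p "${ext_dir}" || return 1
    # Copy the jar so that Mongoose can load it on the next start
    cp "${jar_path}" "${ext_dir}/"
}
```

For example, `install_hdfs_driver ./mongoose-storage-driver-hdfs.jar "<VERSION>"`, where the jar file name is a placeholder for whatever file was actually downloaded.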
Run Mongoose with the HDFS storage driver:
java -jar mongoose-<VERSION>.jar \
--storage-driver-type=hdfs \
--storage-net-node-addrs=<NODE_IP_ADDRS> \
--storage-net-node-port=<NODE_PORT> \
--storage-auth-uid=<USER_ID> \
...
Run the Docker image in standalone mode:
docker run \
--network host \
emcmongoose/mongoose-storage-driver-hdfs \
--storage-net-node-addrs=<NODE_IP_ADDRS> \
--storage-net-node-port=<NODE_PORT> \
--storage-auth-uid=<USER_ID> \
...
Distributed mode, additional node (started with --run-node):
docker run \
--network host \
--expose 1099 \
emcmongoose/mongoose-storage-driver-hdfs \
--run-node
Distributed mode, node that drives the additional nodes listed in --load-step-node-addrs:
docker run \
--network host \
emcmongoose/mongoose-storage-driver-hdfs \
--load-step-node-addrs=<ADDR1,ADDR2,...> \
--storage-net-node-addrs=<NODE_IP_ADDRS> \
--storage-net-node-port=<NODE_PORT> \
--storage-auth-uid=<USER_ID> \
...
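The `--load-step-node-addrs` value is a comma-separated list of node addresses. A minimal sketch of building it from separate addresses follows; the function name `join_node_addrs` and the sample IP addresses are hypothetical.

```shell
# Hypothetical helper: join node addresses with commas,
# producing a value in the <ADDR1,ADDR2,...> form.
join_node_addrs() {
    # Run in a subshell so that the IFS change does not leak to the caller
    (IFS=','; printf '%s\n' "$*")
}
```

It could then be used as `--load-step-node-addrs="$(join_node_addrs 10.0.0.1 10.0.0.2)"`.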
Get the sources:
git clone https://github.com/emc-mongoose/mongoose-storage-driver-hdfs.git
cd mongoose-storage-driver-hdfs
Run the tests:
./gradlew clean test
Build the jar:
./gradlew clean jar
Embed the driver as a Gradle dependency:
compile group: 'com.github.emc-mongoose', name: 'mongoose-storage-driver-hdfs', version: '<VERSION>'
The name node's filesystem browser is available at the default port 50070.
The default HDFS port is 8020 (some setups configure 9000 instead).
- Run the pseudo-distributed HDFS cluster:
docker run -p 22022:22 -p 8020:8020 -p 50010:50010 -p 50020:50020 -p 50070:50070 -p 50075:50075 -d pravega/hdfs
- Open the browser and check the HDFS share at http://127.0.0.1:50070/explorer.html to observe the filesystem.
- Build the Mongoose HDFS storage driver jar or use the Docker image.
- Put the HDFS storage driver jar into Mongoose's ext directory or use the Docker image with HDFS support.
- Run some Mongoose test, for example:
java -jar mongoose-<VERSION>.jar \
--item-data-size=64MB \
--item-output-file=hdfs.files.csv \
--item-output-path=/test \
--load-op-limit-count=100 \
--storage-auth-uid=root \
--storage-driver-limit-concurrency=10 \
--storage-driver-type=hdfs \
--storage-net-node-addrs=<HADOOP_NAME_NODE_IP_ADDR> \
--storage-net-node-port=8020
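After a test run like the one above, the result can be checked from the shell as well as in the browser. The sketch below rests on two assumptions that are not stated in this document: that the item output file holds one item per non-empty line, and that WebHDFS is enabled on the name node's HTTP port. Both function names are hypothetical.

```shell
# Hypothetical helper: count the items recorded in the item output file.
# Assumes one item per non-empty line.
count_items() {
    grep -c . "$1"
}

# Hypothetical helper: build a WebHDFS LISTSTATUS URL for a directory.
# $1: name node address, $2: name node HTTP port, $3: absolute HDFS path
webhdfs_list_url() {
    printf 'http://%s:%s/webhdfs/v1%s?op=LISTSTATUS\n' "$1" "$2" "$3"
}
```

For the run above, `count_items hdfs.files.csv` should report as many items as were created successfully, and `curl -s "$(webhdfs_list_url 127.0.0.1 50070 /test)"` should return a JSON listing of the /test directory if WebHDFS is available.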
The information below describes which particular methods are invoked on the endpoint in each case. The endpoint hereafter is a Hadoop FileSystem instance.
The item types data and path are supported. The token type is not supported.
Operations on data items are implemented as file operations.
Noop: nothing is invoked on the endpoint.
Create: the method create(Path, FsPermission, boolean, int, short, long, null) is invoked with a calculated output buffer size. The returned FSDataOutputStream is used to write the data.
Create, copy mode: both the create and open methods are used to obtain the output and input streams.
Create, concatenation mode:
Note: not supported, because HDFS doesn't allow concatenating into a new, empty destination object.
concat(Path dst, Path[] srcs) is invoked (it doesn't return anything).
Note: concatenation of source file ranges is not supported.
Read: open(Path f, int bufferSize) is invoked. The returned FSDataInputStream instance is then used to read the data.
Read, partial: the same method is used as above, because FSDataInputStream supports the positioning needed for a partial read.
Read, random ranges: supported.
Read, fixed ranges: supported.
Update, overwrite: supported.
Update, random ranges: not supported, because FSDataOutputStream doesn't allow positioning.
Update, fixed ranges: not supported except for the append case.
Update, append: supported.
Delete: delete(Path f, false) is invoked.
Operations on path items are implemented as directory operations.
Create: mkdirs(Path) is invoked.
Create, copy mode: TODO.
Create, concatenation mode: TODO.
Read: listFiles(Path f, false) is invoked, returning a RemoteIterator instance which is used to iterate over the directory contents.
Delete: delete(Path f, true) is invoked.
Not supported