
Getting Started with the WDT command line

Laurent Demailly edited this page Jun 14, 2017 · 10 revisions

This short guide assumes you have already built and installed the software and that the "wdt" command is available on your PATH on both the source and destination machines.

You can always run wdt --help | less to see all the options. The "Key options" section below highlights the most important ones, but let's cover some simple examples first.

Everything in 1 command:

To securely start the receiver and the sender, transmit the encryption key between them, and use the new encryption feature (on by default), all from the command line:

# For a third host:
ssh dsthost wdt -directory destdir | ssh srchost wdt -directory srcdir -
# Or from src host:
ssh dsthost wdt -directory destdir | wdt -directory srcdir -
# Or from dst host:
wdt -directory destdir | ssh srchost wdt -directory srcdir -
# if the hostname of the receiver (dst) isn't resolvable on the sender,
# set the ip where the dst is reachable directly - e.g.
wdt -hostname fe80::202:c9ff:fe51:dd50 -directory destdir | ssh srchost wdt -directory srcdir -

When a sender is started with - as its argument, it reads the connection url from stdin (which the receiver produces on its stdout). This is a safe way to transmit the encryption key, since the key doesn't show up when doing "ps", for instance.

The first 3 examples assume that, when running on the destination (dst) host, the receiver's call to gethostname() returns a usable hostname that the sender on the source (src) host can resolve in order to connect to the receiver. If that's not the case, the hostname in the url should be set to a working, reachable IP of the receiver. This can be done with the new -hostname option, which overrides the default behavior by setting the hostname in the WdtTransferRequest.
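The shell pipelines above can also be driven from a script. The following is a minimal Python sketch, not something shipped with wdt: the helper names build_commands and run_piped are made up for illustration, and the only wdt arguments used are the ones already shown above (-directory and the trailing -).

```python
import subprocess


def build_commands(dst_host, src_host, dest_dir, src_dir):
    """Build argv lists for the "third host" form shown above."""
    receiver = ["ssh", dst_host, "wdt", "-directory", dest_dir]
    # The trailing "-" tells the sender to read the connection url from stdin.
    sender = ["ssh", src_host, "wdt", "-directory", src_dir, "-"]
    return receiver, sender


def run_piped(receiver, sender):
    """Run the equivalent of `receiver | sender` and return both exit codes."""
    recv = subprocess.Popen(receiver, stdout=subprocess.PIPE)
    send = subprocess.Popen(sender, stdin=recv.stdout)
    # Close the parent's copy of the pipe so the receiver gets SIGPIPE
    # if the sender exits early.
    recv.stdout.close()
    send_rc = send.wait()
    recv_rc = recv.wait()
    return recv_rc, send_rc
```

Calling run_piped(*build_commands("dsthost", "srchost", "destdir", "srcdir")) would correspond to the first example. Because the url travels through a pipe, it never appears in any process's argument list.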

Separately/step by step

Instead of passing the URL, you can run both sides with --transfer_id=none --encryption_type=none.

Receiver

Start a receiver (destination for your transfer) - if you don't specify the directory it will write in the current directory

dst$ wdt -directory /tmp/dest1 -start_port 0 # this tells wdt to use system available ports

I1017 14:19:57.514104 3435574 WdtFlags.cpp:64] Running WDT 1.21.1510120 p 21
I1017 14:19:57.514441 3435574 Receiver.cpp:121] WDT Receiver 1.21.1510120 p 21
I1017 14:19:57.514746 3435574 WdtBase.cpp:450] Generated a transfer id 143585550
I1017 14:19:57.514755 3435574 WdtBase.cpp:423] using wdt protocol version 21
I1017 14:19:57.515416 3435574 Receiver.cpp:171] Registered 8 sockets
I1017 14:19:57.515432 3435574 Receiver.cpp:184] Transfer id 143585550
I1017 14:19:57.515440 3435574 wdtCmdLine.cpp:214] Starting receiver with connection url
wdt://desthost1.facebook.com?ports=36062,36668,41666,45982,53835,55727,57051,60107&recpv=21&id=802755190
I1017 14:19:57.515494 3435574 Receiver.cpp:437] Starting (receiving) server on ports [ 43133 43137 32838 48109 53941 43812 45998 42775 ] Target dir : /tmp/dest1
I1017 14:19:57.515511 3435574 FileCreator.cpp:283] dir already exists /
I1017 14:19:57.515522 3435574 FileCreator.cpp:283] dir already exists /tmp/
I1017 14:19:57.515604 3435574 FileCreator.cpp:285] made dir /tmp/dest1/
I1017 14:19:57.515626 3435574 WdtBase.cpp:438] Throttling not enabled
I1017 14:19:57.516073 3435583 Receiver.cpp:390] Progress reporter updating every 20 ms

The lines starting with I (or W/E if there were warnings/errors) are Google log (glog) output on stderr. If they are a bother, or you don't want to track what wdt is doing, you can send them to a file instead using -logtostderr=false
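If you collect these log lines in a script, it can help to split them into fields. The sketch below is an illustration only: parse_glog_line is a hypothetical helper, and the pattern matches the glog prefix as it appears in the logs on this page (severity letter, mmdd, time, thread id, file:line]).

```python
import re

# glog prefix as seen on this page: severity, mmdd, time, thread id, file:line]
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_glog_line(line):
    """Return a dict of fields for a glog line, or None for anything else."""
    match = GLOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

Lines that are not glog output, such as the connection url or the progress bar, yield None, which makes it easy to separate them from the log.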

The important line is wdt://desthost1.facebook.com?ports=36062,36668,41666,45982,53835,55727,57051,60107&recpv=21&id=802755190. It is the first line on stdout and can be captured/read from Python, for instance, using:

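import subprocess
# receiver_cmd is the wdt receiver command as an argument list, defined by the caller
receiver_process = subprocess.Popen(receiver_cmd, stdout=subprocess.PIPE)
connection_url = receiver_process.stdout.readline().strip()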
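Once captured, the url can be taken apart if you need the host or the ports separately. This is a rough sketch that assumes the url always has the shape of the example above (wdt://host?key=value&...); that shape is inferred from this page rather than from a specification, and parse_wdt_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs


def parse_wdt_url(url):
    """Split a wdt:// connection url into host, ports and other parameters."""
    if isinstance(url, bytes):  # Popen pipes yield bytes by default in Python 3
        url = url.decode("ascii")
    parts = urlsplit(url.strip())
    if parts.scheme != "wdt":
        raise ValueError("not a wdt url: %r" % url)
    # Keep the first value of each query parameter.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    ports = [int(p) for p in params.pop("ports", "").split(",") if p]
    return parts.hostname, ports, params
```

Any parameters other than ports are returned untouched in the dict, so the sketch does not need to know what they mean.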

Sender

Start a matching sender (with the URL produced by the receiver). If you don't specify a directory, patterns, or a manifest, it will recursively send all the files in the current directory.

It's important to quote/escape the url, as it contains & characters which would otherwise be interpreted by the shell:

src$ wdt -directory to_send/ -connection_url "wdt://desthost1.facebook.com?ports=36062,36668,41666,45982,53835,55727,57051,60107&recpv=21&id=802755190"

I1017 14:40:08.939339 308838 WdtFlags.cpp:64] Running WDT 1.21.1510050 p 21
I1017 14:40:08.939723 308838 wdtCmdLine.cpp:189] Input url: wdt://desthost1.facebook.com?ports=36062,36668,41666,45982,53835,55727,57051,60107&recpv=21&id=802755190
I1017 14:40:08.939770 308838 Sender.cpp:230] WDT Sender 1.21.1510050 p 21
I1017 14:40:08.939785 308838 DirectorySourceQueue.cpp:126] Root dir now to_send/
I1017 14:40:08.939795 308838 WdtBase.cpp:423] using wdt protocol version 21
I1017 14:40:08.939818 308838 wdtCmdLine.cpp:246] Starting sender with details wdt://desthost1.facebook.com?ports=36062,36668,41666,45982,53835,55727,57051,60107&protocol=16&dir=to_send/&recpv=21&id=802755190
I1017 14:40:08.939842 308838 Sender.cpp:450] Client (sending) to desthost1.facebook.com, Using ports [ 36062 36668 41666 45982 53835 55727 57051 60107 ]
I1017 14:40:08.939894 308838 WdtBase.cpp:438] Throttling not enabled
I1017 14:40:08.940990 308839 DirectorySourceQueue.cpp:231] Exploring root dir to_send/ include_pattern :  exclude_pattern :  prune_dir_pattern :
I1017 14:40:08.942677 308839 DirectorySourceQueue.cpp:379] Number of files explored: 2, errors: false
I1017 14:40:08.942739 308848 Sender.cpp:1424] Progress reporter tracking every 20 ms
I1017 14:40:08.944162 308840 Sender.cpp:581] Connection took 1 attempt(s) and 0.00283575 seconds. port 36062
I1017 14:40:08.944185 308844 Sender.cpp:581] Connection took 1 attempt(s) and 0.00143332 seconds. port 53835
I1017 14:40:08.944167 308842 Sender.cpp:581] Connection took 1 attempt(s) and 0.0023244 seconds. port 41666
I1017 14:40:08.944172 308843 Sender.cpp:581] Connection took 1 attempt(s) and 0.00151401 seconds. port 45982
I1017 14:40:08.944161 308841 Sender.cpp:581] Connection took 1 attempt(s) and 0.0028604 seconds. port 36668
I1017 14:40:08.944183 308845 Sender.cpp:581] Connection took 1 attempt(s) and 0.00150234 seconds. port 55727
I1017 14:40:08.944170 308846 Sender.cpp:581] Connection took 1 attempt(s) and 0.00120563 seconds. port 57051
I1017 14:40:08.944166 308847 Sender.cpp:581] Connection took 1 attempt(s) and 0.00150423 seconds. port 60107
I1017 14:40:20.807262 308844 Sender.cpp:1251] Port 53835 done. Transfer status = OK. Number of blocks transferred = 6. Data Mbytes = 96. Header kBytes = 0.424805 (0.000432134% overhead). Total bytes = 100663731. Wasted bytes due to failure = 0 (0% overhead). Total throughput = 8.09137 Mbytes/sec
I1017 14:40:20.807298 308840 Sender.cpp:1251] Port 36062 done. Transfer status = OK. Number of blocks transferred = 6. Data Mbytes = 90. Header kBytes = 0.421875 (0.000457764% overhead). Total bytes = 94372272. Wasted bytes due to failure = 0 (0% overhead). Total throughput = 7.58473 Mbytes/sec
I1017 14:40:20.807297 308841 Sender.cpp:1251] Port 36668 done. Transfer status = OK. Number of blocks transferred = 6. Data Mbytes = 96. Header kBytes = 0.424805 (0.000432134% overhead). Total bytes = 100663731. Wasted bytes due to failure = 0 (0% overhead). Total throughput = 8.09036 Mbytes/sec
I1017 14:40:20.807359 308846 Sender.cpp:1251] Port 57051 done. Transfer status = OK. Number of blocks transferred = 6. Data Mbytes = 96. Header kBytes = 0.424805 (0.000432134% overhead). Total bytes = 100663731. Wasted bytes due to failure = 0 (0% overhead). Total throughput = 8.09145 Mbytes/sec
I1017 14:40:20.807400 308843 Sender.cpp:1251] Port 45982 done. Transfer status = OK. Number of blocks transferred = 6. Data Mbytes = 96. Header kBytes = 0.422852 (0.000430147% overhead). Total bytes = 100663729. Wasted bytes due to failure = 0 (0% overhead). Total throughput = 8.09121 Mbytes/sec
I1017 14:40:20.807402 308845 Sender.cpp:1251] Port 55727 done. Transfer status = OK. Number of blocks transferred = 6. Data Mbytes = 96. Header kBytes = 0.426758 (0.00043412% overhead). Total bytes = 100663733. Wasted bytes due to failure = 0 (0% overhead). Total throughput = 8.09122 Mbytes/sec
I1017 14:40:20.807427 308842 Sender.cpp:1251] Port 41666 done. Transfer status = OK. Number of blocks transferred = 6. Data Mbytes = 96. Header kBytes = 0.420898 (0.00042816% overhead). Total bytes = 100663727. Wasted bytes due to failure = 0 (0% overhead). Total throughput = 8.09063 Mbytes/sec
I1017 14:40:20.807446 308847 Sender.cpp:1251] Port 60107 done. Transfer status = OK. Number of blocks transferred = 6. Data Mbytes = 96. Header kBytes = 0.425781 (0.000433127% overhead). Total bytes = 100663732. Wasted bytes due to failure = 0 (0% overhead). Total throughput = 8.09117 Mbytes/sec
I1017 14:40:20.807512 308847 Sender.cpp:1220] Last thread finished 11.8677
[=================================================] 100% 64.2 Mbytes/s
I1017 14:40:20.820672 308838 Sender.cpp:422] Total sender time = 11.8677 seconds (0.00266432 dirTime). Transfer summary : Transfer status = OK. Number of files transferred = 2. Data Mbytes = 762. Header kBytes = 3.39258 (0.000434785% overhead). Total bytes = 799018386. Wasted bytes due to failure = 0 (0% overhead).
Total sender throughput = 64.2082 Mbytes/sec (64.2226 Mbytes/sec pure transfer rate)
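The quoting requirement for the sender command matters most when the command line is assembled by a program and handed to a shell (for example through ssh). A small sketch using Python's standard shlex module is shown here; sender_shell_command is a hypothetical helper, and only the -directory and -connection_url arguments from the example above are used.

```python
import shlex


def sender_shell_command(directory, connection_url):
    """Return a single shell-safe command line for the sender."""
    argv = ["wdt", "-directory", directory, "-connection_url", connection_url]
    # shlex.quote wraps any argument containing shell metacharacters
    # (such as & and ?) in single quotes.
    return " ".join(shlex.quote(arg) for arg in argv)
```

When the sender is started directly with an argument list (for instance via subprocess without a shell), no quoting is needed at all, since no shell parses the url.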

If you would rather not use the url (which has to be transferred from the receiver to the sender using ssh, thrift, or some other rpc mechanism), you can set the transfer_id yourself on both sides, as well as the ports (the default ports are 22356-22363, i.e. from start_port to start_port+num_ports-1), and use the -destination flag on the sender side instead of the "-" argument or -connection_url. You will probably need to disable encryption as well, using --encryption_type=none
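As a worked example of the port arithmetic and the url-less setup, here is a short sketch. The helper names and the transfer id value passed in are hypothetical; the flags (-directory, -destination, -transfer_id, -encryption_type) and the defaults (start_port 22356, num_ports 8) are the ones listed on this page.

```python
def port_range(start_port=22356, num_ports=8):
    """Ports used by wdt: start_port .. start_port + num_ports - 1."""
    return list(range(start_port, start_port + num_ports))


def manual_commands(dest_host, dest_dir, src_dir, transfer_id):
    """Build receiver and sender argv lists that do not rely on the url."""
    # Both sides must agree on the transfer id; encryption is disabled
    # because no key is exchanged in this mode.
    common = ["-transfer_id", transfer_id, "-encryption_type", "none"]
    receiver = ["wdt", "-directory", dest_dir] + common
    sender = ["wdt", "-directory", src_dir, "-destination", dest_host] + common
    return receiver, sender
```

With the defaults, port_range() covers 22356 through 22363, which is the set of ports that has to be reachable on the receiver.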

For the transfer shown above, the receiver logs look like this:

I1017 14:40:08.943197 3484964 Receiver.cpp:560] New transfer started 1
[===============================================> ] 97% 63.0 0.0 Mbytes/s  I1017 14:40:20.805966 3484962 Receiver.cpp:508] Received done for all threads. Transfer session 1 finished
I1017 14:40:20.806210 3484963 Receiver.cpp:1407] Thread[2, port: 53835]  got ack for DONE. Transfer finished
I1017 14:40:20.806226 3484962 Receiver.cpp:1407] Thread[1, port: 41666]  got ack for DONE. Transfer finished
I1017 14:40:20.806232 3484967 Receiver.cpp:1407] Thread[6, port: 36668]  got ack for DONE. Transfer finished
I1017 14:40:20.806237 3484965 Receiver.cpp:1407] Thread[4, port: 36062]  got ack for DONE. Transfer finished
I1017 14:40:20.806301 3484964 Receiver.cpp:1407] Thread[3, port: 60107]  got ack for DONE. Transfer finished
I1017 14:40:20.806434 3484966 Receiver.cpp:1407] Thread[5, port: 45982]  got ack for DONE. Transfer finished
I1017 14:40:20.806440 3484961 Receiver.cpp:1407] Thread[0, port: 57051]  got ack for DONE. Transfer finished
I1017 14:40:20.806442 3484968 Receiver.cpp:1407] Thread[7, port: 55727]  got ack for DONE. Transfer finished
W1017 14:40:20.806602 3484968 Receiver.cpp:1522] Last thread finished. Duration of the transfer 11.8634
[=================================================] 100% 64.2 Mbytes/s
W1017 14:40:20.806731 3484960 Receiver.cpp:300] WDT receiver's transfer has been finished
I1017 14:40:20.806743 3484960 Receiver.cpp:301] Transfer status = OK. Number of blocks transferred = 48. Data Mbytes = 762. Header kBytes = 1.12695 (0.000144428% overhead). Total bytes = 799016066. Wasted bytes due to failure = 0 (0% overhead).

Key options

Extracted from --help, here are some of the most important options:

  • -avg_mbytes_per_sec (Target transfer rate in mbytes/sec that should be maintained, specify negative for unlimited) type: double default: -1

  • -enable_download_resumption (If true, wdt supports download resumption for append-only files) type: bool default: false

  • -include_regex (Regular expression representing files to include for transfer empty/default is to include all files in directory. If exclude_regex is also specified, then files matching exclude_regex are excluded.) type: string default: ""

  • -exclude_regex (Regular expression representing files to exclude for transfer, empty/default is to not exclude any file.) type: string default: ""

  • -follow_symlinks (If true, follow symlinks and copy them as well) type: bool default: false

  • -ignore_open_errors (will continue despite open errors) type: bool default: false

  • -ipv4 (use ipv4 only, takes precedence over -ipv6) type: bool default: false

  • -ipv6 (prefers ipv6) type: bool default: false

  • -num_ports (Number of sockets) type: int32 default: 8

  • -odirect_reads (Wdt can read files in O_DIRECT mode, set this flag to true to make sender read all files in O_DIRECT) type: bool default: false

  • -open_files_during_discovery (If >0 up to that many files are opened when they are discovered.0 for none. -1 for trying to open all the files during discovery) type: int32 default: 0

  • -overwrite (Allow the receiver to overwrite existing files) type: bool default: false

  • -prune_dir_regex (Regular expression representing directories to exclude for transfer, default/empty is to recurse in all directories) type: string default: ""

  • -resume_using_dir_tree (If true, destination directory tree is trusted during resumption. So, only the remaining portion of the files are transferred. This is only supported if preallocation and block mode are disabled) type: bool default: false

  • -start_port (Starting port number for wdt) type: int32 default: 22356 - using 0 will make wdt use available system ports

  • -abort_after_seconds (Abort transfer after given seconds. 0 means don't abort.) type: int32 default: 0

  • -connection_url (Provide the connection string to connect to receiver (incl. transfer_id and other parameters, preferably use the new "-" argument instead)) type: string default: ""

  • -destination (empty is server (destination) mode, non empty is destination host) type: string default: ""

  • -directory (Source/Destination directory) type: string default: "."

  • -encryption_type (Encryption type to use. WDT currently only supports aes128ctr and aes128ofb. A value of none disables encryption) type: string default: "aes128ctr"

  • -manifest (If specified, then we will read a list of files and optional sizes from this file, use - for stdin) - if you need to use this together with the "-" argument to read the url, use "-fork" so the sender doesn't wait for the receiver to end before being able to read the file info from stdin after the url. See wdt_stdin_test.sh for an example. type: string default: ""

  • -recovery_id (Recovery-id to use for download resumption) type: string default: ""

  • -run_as_daemon (If true, run the receiver as never ending process) type: bool default: false

  • -transfer_id (Transfer id. Receiver will generate one to be used (via URL) on the sender if not set explicitly) type: string default: ""
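When wdt is launched from a script, the options above can be kept in a dictionary and turned into arguments. This is only a sketch: to_flags is a hypothetical helper, the option values are arbitrary examples, and it relies on the -name=value form that this page already uses for -logtostderr=false and --encryption_type=none.

```python
def to_flags(options):
    """Turn {"name": value} into ["-name=value", ...] (bools become true/false)."""
    flags = []
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        flags.append("-%s=%s" % (name, value))
    return flags


# Example sender command line built from options listed above.
sender_argv = ["wdt"] + to_flags({
    "directory": "to_send/",
    "destination": "desthost1.facebook.com",
    "encryption_type": "none",
    "avg_mbytes_per_sec": 50,
    "follow_symlinks": True,
})
```

Keeping the options in one place makes it easier to apply matching settings (such as transfer_id or encryption_type) to both the receiver and the sender.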
