[New] Comparing Data Transfer Utilities #7116
✅ Deploy Preview for nostalgic-ptolemy-b01ab8 ready!
| Along with how much data is being transferred, the kind of data and the destination are equally important (e.g., your data’s source and target destination, their respective file systems, and how much storage is required once the data is at rest). | ||
| For example, most end users only deal with traditionally structured file systems when using their local environment, with data organized hierarchically into folders. In contrast, block storage breaks up files into separate blocks and stores them separately, allowing for significant performance benefits. Object storage saves files in a non-hierarchical, self-contained “flat” format. This allows for increased scalability and sizable reductions in cost and complexity. |
I don't think block storage/block devices stand in contrast with file systems; file systems are implemented on block devices. So, I think this paragraph could use a bit of a rewrite, which I'll take a swing at in a bit.
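To illustrate the layering this comment describes, here is a minimal sketch of a toy file system built on top of a toy block device. Every name in it (`BlockDevice`, `ToyFileSystem`, `BLOCK_SIZE`) is hypothetical and invented for illustration; real file systems are far more involved. The point is only that the block layer stores anonymous fixed-size blocks, while names, sizes, and timestamps live in the file system layer above it.

```python
import time

BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the layout is easy to see


class BlockDevice:
    """A fixed array of equal-sized blocks; it knows nothing about files."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def write_block(self, index, data):
        # Pad short chunks with zero bytes so every block has the same size.
        self.blocks[index] = data.ljust(BLOCK_SIZE, b"\0")

    def read_block(self, index):
        return self.blocks[index]


class ToyFileSystem:
    """Maps file names to block lists and keeps the metadata the device lacks."""

    def __init__(self, device):
        self.device = device
        self.free = list(range(len(device.blocks)))
        self.inodes = {}  # name -> {"blocks": [...], "size": int, "mtime": float}

    def write_file(self, name, data):
        # Simplification: overwriting an existing name does not reclaim its blocks.
        chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        blocks = [self.free.pop(0) for _ in chunks]
        for index, chunk in zip(blocks, chunks):
            self.device.write_block(index, chunk)
        self.inodes[name] = {"blocks": blocks, "size": len(data), "mtime": time.time()}

    def read_file(self, name):
        inode = self.inodes[name]
        raw = b"".join(self.device.read_block(i) for i in inode["blocks"])
        return raw[:inode["size"]]  # drop the padding added by the block layer


fs = ToyFileSystem(BlockDevice(num_blocks=8))
fs.write_file("notes.txt", b"hello world")
print(fs.read_file("notes.txt"))
print(fs.inodes["notes.txt"]["blocks"])
```

In this sketch the device never sees the name `notes.txt` or its modification time, which is the sense in which block storage and file systems work together rather than compete.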
docs/guides/tools-reference/file-transfer/comparing-data-transfer-utilities/index.md
| Different file types and sizes also call for different data transfer utilities. For example, server log files grow in size quickly but can be easily compressed, whereas media files (e.g., MP4 video files, WAV sound files) are usually quite large, aren’t easily compressed, and typically do not change once saved. Your choice of data transfer utility will depend on what kind of files are being transferred. | ||
| Different file types also involve varying metadata considerations. Object storage is self-contained, so it includes both data and metadata for easy search, access, and retrieval. Structured file system storage also includes extensive metadata; when right-clicking a file in your desktop operating system, you can view its create/modify time, file type, version number, and more. Alternatively, block storage is highly performant but provides no metadata capabilities. |
Similar to the last comment, I don't think block storage is an alternative to file systems--they go together. Will also take a swing at editing this in a bit
| #### Pros and Cons | ||
| Rsync excels in use cases that require a proven method for transferring files and situations that call for extended scheduling and automation capabilities, such as using rsync with Linux cron jobs. However, it also requires familiarity with the command line, and it can be more system resource-intensive or slower as the number of files copied increases. |
I'm not familiar with how rsync is any slower or faster than other utilities for larger numbers of files, so I'm wary of making that assertion. Is there any documentation elsewhere that describes this?
| #### Use Cases | ||
| Rsync is ideal for syncing files between local and remote Linux machines in incremental backup and transfer use cases, as well as customizing large and complicated sync jobs with specific options and settings, like selecting a particular Linux shell to use in the transfer, or specifying files to analyze and exclude. Its delta-transfer algorithm reduces network traffic during syncs by only sending parts of a file that differ from the files on the recipient machine. |
How do Linux shells interact with how rsync is invoked? I'm not familiar with that
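For the delta-transfer sentence in the paragraph above, a small sketch may help show what "only sending parts of a file that differ" means. This is a deliberately simplified illustration, not rsync's actual algorithm: it compares fixed-size blocks at fixed offsets, whereas rsync uses rolling checksums so it can match blocks that have shifted position. The function names and the 4-byte block size are assumptions made up for the example.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small so the example is easy to follow


def block_digests(data):
    """Return one digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def compute_delta(source, dest_digests):
    """Return {block_index: bytes} for source blocks the destination lacks."""
    delta = {}
    for index, digest in enumerate(block_digests(source)):
        if index >= len(dest_digests) or dest_digests[index] != digest:
            delta[index] = source[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]
    return delta


def apply_delta(dest, delta, new_length):
    """Rebuild the source on the receiving side from its old copy plus the delta."""
    num_blocks = -(-new_length // BLOCK_SIZE)  # ceiling division
    blocks = [dest[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] for i in range(num_blocks)]
    for index, chunk in delta.items():
        blocks[index] = chunk
    return b"".join(blocks)[:new_length]


old_copy = b"aaaabbbbcccc"   # what the recipient already has
new_copy = b"aaaaXXXXcccc"   # what the sender has now

# The recipient shares digests of its blocks; the sender replies with the delta.
delta = compute_delta(new_copy, block_digests(old_copy))
print(delta)
print(apply_delta(old_copy, delta, len(new_copy)) == new_copy)
```

Only the middle block crosses the "network" here, which is the traffic saving the paragraph refers to. Note that data still flows in one direction, from sender to recipient.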
| ### Rclone | ||
| [Rclone](https://rclone.org/) is a command-line utility for syncing files and directories to and from different cloud storage providers, servers, and workstations. As its name implies, the utility shares many similarities with rsync, including the same Linux commands (e.g., cp, mv, mount, ls). However, rclone was specially designed for rudimentary file transfers between cloud servers and on-premises servers and workstations, whereas rsync can be configured for more sophisticated file synchronization capabilities. |
This says that rclone shares commands with rsync--but I don't think rsync has commands? It has options you can specify, but that's not really the same syntax as a command/subcommand
| #### Pros and Cons | ||
| Rclone supports leading cloud services like Akamai, Amazon S3, Microsoft OneDrive, Google Drive/Cloud Storage, Microsoft Azure Blob/File Storage, Dropbox, and more. Rclone can also recover from interrupted connections during data transfers; however, it only supports unidirectional file synchronization. This means it can only copy files from source to destination in comparison to rsync’s delta validation feature for bidirectional file synchronization. |
I don't think the delta algorithm for rsync means it supports bidirectional file syncing--it just means that it won't attempt to sync files that already exist on the destination. I found some discussions about how to invoke rsync on both the source and the destination to emulate a bidirectional sync, but that's not really the same as a built-in bidirectional sync, and people in these discussions often recommend using another tool like unison:
https://www.resilio.com/blog/rsync-two-way
https://stackoverflow.com/questions/2936627/two-way-sync-with-rsync
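The distinction drawn in this comment can be sketched in a few lines. The example below models two directories as plain dictionaries (file name to contents) and implements a one-way sync that skips entries already present and identical at the destination. It is a hypothetical illustration only, with an invented function name, and does not reflect how rsync or rclone are implemented.

```python
def one_way_sync(source, dest):
    """Copy new or changed entries from source into dest; never modify source."""
    copied = []
    for name, content in source.items():
        if dest.get(name) != content:
            dest[name] = content
            copied.append(name)
    return copied


source = {"a.txt": b"1", "b.txt": b"2"}
dest = {"a.txt": b"1", "c.txt": b"3"}

print(one_way_sync(source, dest))  # only the missing file is copied
print(sorted(source))              # source is untouched; c.txt never flows back
```

Skipping `a.txt` is the "don't resend what already exists" behavior, but `c.txt`, which exists only at the destination, is never propagated back to the source. A bidirectional sync would need to reconcile changes from both sides, which is a separate feature from skipping unchanged files.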
| #### Pros and Cons | ||
| Since it leverages the same authentication and security as SSH, SCP is widely regarded as a secure replacement for RCP. Aside from its security benefits and ubiquitous support across Linux systems, SCP also provides more advanced capabilities like file permission and timestamp preservation, among others. And because it uses TCP, SCP provides error detection and recovery capabilities for resuming or restarting data transfers in the event of network problems. However, due to its use of SSH keys and software, SCP is more complicated to manage and maintain than RCP and can be slow when transferring large volumes over the encrypted connection. |
I think we can remove some of the references/comparisons to RCP; it seems to me like no one really uses it for transferring data, but I could be wrong
Does SCP actually provide the ability to resume or restart a data transfer where it left off? I think it just provides a very simple copy mechanic, vs rsync's delta algorithm allowing it to pick up where it left off
Maybe SCP is slower than RCP because it uses SSH encryption (though I think modern processors are very good at encryption?)--but I don't think that comparing it to RCP's speed would be useful to the reader if they were never considering RCP. In my mind the better speed comparison is to the other utilities in this guide, which we don't seem to be making here.
I don't think we need to say that SSH keys make things more complicated--we should be recommending that the reader use keys or some other auth for all of these tools
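As a rough illustration of the "simple copy" versus "pick up where it left off" distinction raised above, the sketch below compares how many bytes each approach would send after an interrupted transfer. Both functions are hypothetical and operate on in-memory bytes; they are not a model of how SCP or rsync behave internally, and the prefix check assumes both sides can be compared directly, which a real tool would do with checksums.

```python
def plain_copy(source, partial_dest):
    """Ignore whatever already arrived and send the whole file again."""
    return bytes(source), len(source)  # (resulting file, bytes sent)


def resuming_copy(source, partial_dest):
    """Send only the bytes past what the destination already holds."""
    offset = len(partial_dest)
    if source[:offset] != partial_dest:
        # The partial copy does not match the source prefix, so start over.
        return bytes(source), len(source)
    return partial_dest + source[offset:], len(source) - offset


source = b"0123456789"
partial = b"0123"  # the transfer was interrupted after four bytes

print(plain_copy(source, partial))
print(resuming_copy(source, partial))
```

Both approaches end with a complete file; the difference is the amount of data re-sent, which matters most for large files on unreliable links.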
| SFTP works like traditional FTP, but over an encrypted, secure connection. SFTP supports both username and password and SSH key authentication, with some SFTP clients also supporting MFA and role-defined access. When using an SFTP client, it’s important to make sure it's not configured to use outdated encryption protocols like MD5 or DES (versus AES-128 or AES-256). This could result in a false sense of regulatory compliance and security. | ||
| SFTP is considered more complex compared to FTP, and it poses a potentially higher learning curve for non-technical users. And like SCP, SFTP performance and speed can degrade when transferring large amounts of data. |
I don't understand the basis of the assertion for slower speeds with SFTP. Is that in comparison to FTP, because of encryption?
nmelehan left a comment
Hey Leon and John--I'm going to leave some comments on this guide. Let me know if you disagree with any of the comments; hopefully I'm not off-base.
New guide written by Leon Yen (contributor) comparing various data transfer utilities.
Also updated: