
Terabyte file support #2791
Closed · wants to merge 2 commits

Conversation

@letiemble (Contributor)

Hi,

This PR is about the file size limit in Syncthing.

From what I understand of the specification and the code, the current size limit is 1000000 blocks of 131072 bytes, which means that any file larger than about 130 GB cannot be replicated. Am I right?

I dived into the code and patched it to support files up to 1 terabyte. My tests with 300 GB and 450 GB files were OK and they were replicated successfully.
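For reference, here is a minimal sketch of the arithmetic behind that limit. The constant names are illustrative only and do not correspond to Syncthing's actual identifiers.

```go
package main

import "fmt"

func main() {
	// Values taken from the discussion above; the names are made up for illustration.
	const blockSize = 131072         // 128 KiB per block
	const maxBlocksPerFile = 1000000 // per-file block limit discussed here

	maxBytes := int64(blockSize) * int64(maxBlocksPerFile)
	fmt.Printf("implied limit: %d bytes (~%.0f GB)\n", maxBytes, float64(maxBytes)/1e9)

	// Roughly how many 128 KiB blocks a 1 TB file would need:
	blocksForTB := (int64(1e12) + blockSize - 1) / blockSize
	fmt.Printf("blocks needed for 1 TB: %d\n", blocksForTB)
}
```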

Here are some points left:

  • The mix of go generate (for the model part) and hardcoded values (for the DB/protocol part) makes it difficult to know what the actual limits are. For example, the number of blocks in a file has a direct influence on message size during index exchange, which is not obvious when reading the code. Is it planned to have a specific doc on the subject? A plus would be to have this information available on the command line via a specific switch.
  • Is it planned to log/store the cases when limits are crossed, so that feedback can be provided to the user? When I tried to replicate a 300 GB file, I only knew that something went wrong by using STTRACE.
  • Are there any side effects to increasing the values to support very large files? If not, what is the rationale behind the current values?

Regards.

@AudriusButkevicius (Member)

I think it should error out if the file is bigger. I'll leave this for calmh to merge, as he was the one who imposed limits in the first place.

@calmh (Member) commented Feb 23, 2016

A limit of some sort needs to be in place to prevent us from attempting to allocate a gazillion bytes of RAM in the face of corruption or protocol changes. The long-term solution to this issue is to use a variable block size, which is sort of planned. In the meantime we should probably increase the size. The limits are supposed to all be handled by the go-generated parts, but the "truncated" code is manually updated, as we want to avoid the cost of loading the (potentially huge, now) list of blocks when we don't need that info.

I haven't looked at the code yet (limited device, will later), but as a first guess this is probably fine and should be merged if there's nothing technically wrong with it.
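(As an illustration of the kind of guard described above, a length-prefixed decoder typically caps the announced element count before allocating. The sketch below is generic Go and does not mirror Syncthing's real XDR code; the names and the limit value are assumptions.)

```go
// Generic sketch of a length guard when decoding a length-prefixed list.
package xdrsketch

import (
	"encoding/binary"
	"errors"
	"io"
)

// maxElements is a hypothetical cap; without it, a corrupted or malicious
// length prefix could cause an arbitrarily large allocation.
const maxElements = 10_000_000

// readElementCount reads a 32-bit big-endian count and rejects absurd values
// instead of blindly allocating that many elements.
func readElementCount(r io.Reader) (uint32, error) {
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return 0, err
	}
	if n > maxElements {
		return 0, errors.New("announced element count exceeds limit; refusing to allocate")
	}
	return n, nil
}
```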

@calmh (Member) commented Feb 23, 2016

Code looks good to me, except it should not update the man page (that's auto-generated from the actual protocol spec), it should be squashed to a single commit, and there should be a companion PR to increase the recommended limits in the actual spec.

Commits:
  • Increase limit when unmarshaling XDR.
  • Increase the size of message.
@letiemble (Contributor, Author)

I have squashed the commits (minus the man page one). Where should I submit the companion PR for the limit in the actual spec?

@calmh (Member) commented Feb 27, 2016

LGTM, just waiting for the build server to come online again for verification.

https://github.com/syncthing/docs/blob/master/specs/bep-v1.rst for the spec, one of the final sections, "message limits".

@letiemble (Contributor, Author)

Companion PR created (syncthing/docs#127).

calmh added a commit to syncthing/docs that referenced this pull request Feb 28, 2016
@ashleycawley

From a user perspective: I have been trying to sync a 1.7 TB directory where a few files are 488 GB and some are around 100-200 GB. All I experienced was a stuck initial scan, with no error messages or evidence of problems in the log as far as I could spot. A bit frustrating, but thanks to the prompt and helpful Syncthing community I was directed here to see this thread. I would dearly have loved to see an error message, or just a warning/notice, if it noticed files over 130 GB.

Now that I know about this limitation I can easily split the files down to be smaller, so that is not a problem for my use case.

I would just like to say a huge thank you to the developers who spend their time on this very worthwhile project. I love the independence and flexibility that the settings provide the user. I will continue to recommend Syncthing to others wherever I can. Thank you for your time and efforts.

@calmh (Member) commented Mar 2, 2016

LGTM, I'm just going to get back from vacation and fix the build server so we can run some builds and tests on this before merging.

calmh self-assigned this Mar 2, 2016
@calmh (Member) commented Mar 4, 2016

@st-jenkins retest this please

@calmh (Member) commented Mar 4, 2016

Merged as c8b6e6f, thanks.

calmh closed this Mar 4, 2016
@romprod commented Feb 24, 2017

Hi,

Can this limit be increased any further? Or is it possible to set it manually by editing something?

I have a 1.3TB file which I guess will hit this limit.

@AudriusButkevicius (Member)

I think this is no longer relevant; I don't think we have limits anymore, but let us know if you hit issues.

@calmh (Member) commented Feb 24, 2017 via email

@romprod commented Feb 24, 2017

Are there better alternatives for large files above 1 TB? The files never change once they've been created.

@Ferroin commented Feb 24, 2017

If they never change once created, you're much better off using some generic file transfer tool to throw them over the network than using a tool like Syncthing, and that actually applies to files of any size, not just TB+ ones.

@romprod commented Feb 24, 2017

Well, they need to be sent over a 100 Mbit line across the internet. I was hoping to use Syncthing to split the files up into blocks and send them that way.

@a8ksh4 commented Feb 24, 2017 via email

@AudriusButkevicius (Member)

Why would you want to use Syncthing for static files if rsync/scp does it better and faster?

@a8ksh4 commented Feb 24, 2017

Because with rsync/scp, I need other automation and supporting configuration to make them work: DNS names registered to find the hosts I want to sync data to, and automation to call rsync/scp and handle failures. With Syncthing, I can just drop a file into the shared area (be it a static file or a dynamic, changing one) and it magically appears on all of my systems. :)

@Ferroin commented Feb 24, 2017

OK, let's frame this a different way:
Using Syncthing for stuff like this is trading a pretty significant amount of efficiency for some convenience.

As a point of comparison, over a direct gigabit link, copying data between two reasonably high-end systems, I get about 20% better throughput using rsync than I do with Syncthing, roughly another 5% using SCP, and another 4-7% on top of that if I just use netcat. Note that most of the difference between Syncthing, rsync, and SCP is processing overhead, while the difference for netcat is protocol overhead: netcat uses nothing on top of TCP (or UDP, SCTP, or DCCP, depending on what switches you pass), so there's zero protocol overhead compared to the others.

Now, I do get the DNS issue, but that's not hard to handle sanely as long as you have some system somewhere that has a fixed IP.

@calmh (Member) commented Feb 25, 2017 via email

st-review added the frozen-due-to-age label (Issues closed and untouched for a long time, together with being locked for discussion) Aug 24, 2017
syncthing locked and limited the conversation to collaborators Aug 24, 2017