New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch: Allow uploading files > 2G https://bugs.php.net/bug.php?id=44522 #372

Merged
merged 5 commits into from Sep 17, 2013

Conversation

6 participants
@ralflang
Contributor

ralflang commented Jun 28, 2013

Patch: Allow uploading files > 2G https://bugs.php.net/bug.php?id=44522

This is essentially the same as the patch
"uploads_larger_than_2g_HEAD_v2 (last revision 2012-03-26 03:59 UTC) by
jason at infininull dot com)" but using off_t instead of signed long
(originally: uint)

I tested this on 64bit linux and succeeded uploading a file of 4.8 G.
The File did not get corrupted or truncated in any way.

I did not yet test this under windows or 32 bit linux

Note that there are still limitations:

  • Did not test for files > 8 G
  • php does not yet reject absurdly high values
  • Still limited by underlying file system specific limits and free space
  • in upload
  • tmp dir and destination dir

ralflang added some commits Jun 28, 2013

Patch for https://bugs.php.net/bug.php?id=44522 to allow uploading files
above 2G.

This is essentially the same as the patch
"uploads_larger_than_2g_HEAD_v2 (last revision 2012-03-26 03:59 UTC) by
jason at infininull dot com)" but using off_t instead of signed long
(originally: uint)

I tested this on 64bit linux and succeeded uploading a file of 4.8 G.
The File did not get corrupted or truncated in any way.

I did not yet test this under windows or 32 bit linux

Note that there are still limitations:

* Did not test for files > 8 G
* php does not yet reject absurdly high values
* Still limited by underlying file system specific limits and free space
* in upload
* tmp dir and destination dir
@johannes

View changes

Show outdated Hide outdated main/rfc1867.c
int boundary_len = 0, total_bytes = 0, cancel_upload = 0, is_arr_upload = 0, array_len = 0;
int max_file_size = 0, skip_upload = 0, anonindex = 0, is_anonymous;
int boundary_len = 0, cancel_upload = 0, is_arr_upload = 0, array_len = 0;
off_t total_bytes = 0, max_file_size = 0;

This comment has been minimized.

@johannes

johannes Jul 3, 2013

Member

max_file_size is later (around line 900) set using

                if (!strcasecmp(param, "MAX_FILE_SIZE")) {
                    max_file_size = atol(value);
                }

this might cause issues on windows where sizeof(long) always equals 4, even on 64 bit systems. In case off_t is a 64 bit integer (or unsized 32 bit) this might lead to unexpected behavior.

Also later, around line 1026 I see

if (PG(upload_max_filesize) > 0 && (long)(total_bytes+blen) > PG(upload_max_filesize)) {
#if DEBUG_FILE_UPLOAD
                    sapi_module.sapi_error(E_NOTICE, "upload_max_filesize of %ld bytes exceeded - file [%s=%s] not saved", PG(upload_max_filesize), param, filename);
#endif
                    cancel_upload = UPLOAD_ERROR_A;
                } else if (max_file_size && ((long)(total_bytes+blen) > max_file_size)) {
#if DEBUG_FILE_UPLOAD
                    sapi_module.sapi_error(E_NOTICE, "MAX_FILE_SIZE of %ld bytes exceeded - file [%s=%s] not saved", max_file_size, param, filename);
#endif
                    cancel_upload = UPLOAD_ERROR_B;
                } else if (blen > 0) {

those casts to long seem to be wrong after this patch, probably casting to off_t is better %ld should be double checked (while that's debug code only)

Again for Windows: Line 1222 or so reads

                  file_size.value.lval = total_bytes;

lval is a long. long on Windows always is 32bit. Probably we need (unprecise) double or string representation for files with total_bytes > maxint

This is simpler if off_t on Win64 is 32bits, too, which I don't know :-)

@johannes

johannes Jul 3, 2013

Member

max_file_size is later (around line 900) set using

                if (!strcasecmp(param, "MAX_FILE_SIZE")) {
                    max_file_size = atol(value);
                }

this might cause issues on windows where sizeof(long) always equals 4, even on 64 bit systems. In case off_t is a 64 bit integer (or unsized 32 bit) this might lead to unexpected behavior.

Also later, around line 1026 I see

if (PG(upload_max_filesize) > 0 && (long)(total_bytes+blen) > PG(upload_max_filesize)) {
#if DEBUG_FILE_UPLOAD
                    sapi_module.sapi_error(E_NOTICE, "upload_max_filesize of %ld bytes exceeded - file [%s=%s] not saved", PG(upload_max_filesize), param, filename);
#endif
                    cancel_upload = UPLOAD_ERROR_A;
                } else if (max_file_size && ((long)(total_bytes+blen) > max_file_size)) {
#if DEBUG_FILE_UPLOAD
                    sapi_module.sapi_error(E_NOTICE, "MAX_FILE_SIZE of %ld bytes exceeded - file [%s=%s] not saved", max_file_size, param, filename);
#endif
                    cancel_upload = UPLOAD_ERROR_B;
                } else if (blen > 0) {

those casts to long seem to be wrong after this patch, probably casting to off_t is better %ld should be double checked (while that's debug code only)

Again for Windows: Line 1222 or so reads

                  file_size.value.lval = total_bytes;

lval is a long. long on Windows always is 32bit. Probably we need (unprecise) double or string representation for files with total_bytes > maxint

This is simpler if off_t on Win64 is 32bits, too, which I don't know :-)

@weltling

This comment has been minimized.

Show comment
Hide comment
@weltling

weltling Jul 8, 2013

Contributor

off_t is long, so 4 bytes in both x64 and x86 windows. On 32 bit linux as they say it's 'signed integer'. That all means on windows and 32 bit linux there's no benefit from this patch, true 64 bit only systems would benefit. I think double instead of off_t could be used on non 64 bit systems to make it profitable even there.

Contributor

weltling commented Jul 8, 2013

off_t is long, so 4 bytes in both x64 and x86 windows. On 32 bit linux as they say it's 'signed integer'. That all means on windows and 32 bit linux there's no benefit from this patch, true 64 bit only systems would benefit. I think double instead of off_t could be used on non 64 bit systems to make it profitable even there.

@ralflang

This comment has been minimized.

Show comment
Hide comment
@ralflang

ralflang Jul 8, 2013

Contributor

The off_t was advised by yohgaki@ohgaki.net on the php-dev internals list - to be used together with #define _FILE_OFFSET_BITS 64 which I haven't done yet. And I need to address Johannes' points..

Double has a wide range on all platforms but it's not a precise, discrete integer type. I'm not sure a floating point type is appropriate here and like to hear other opinions. I think I'm best off with an integer type that is 64 bit on all platforms.

Contributor

ralflang commented Jul 8, 2013

The off_t was advised by yohgaki@ohgaki.net on the php-dev internals list - to be used together with #define _FILE_OFFSET_BITS 64 which I haven't done yet. And I need to address Johannes' points..

Double has a wide range on all platforms but it's not a precise, discrete integer type. I'm not sure a floating point type is appropriate here and like to hear other opinions. I think I'm best off with an integer type that is 64 bit on all platforms.

@johannes

This comment has been minimized.

Show comment
Hide comment
@johannes

johannes Jul 8, 2013

Member

Be careful with _FILE_OFFSET_BITS 64 - as there is an off_t in a global header this would have to affect all of PHP, which might cause trouble with external libs we're linking too. That can of worms made us not add the large file support.

Member

johannes commented Jul 8, 2013

Be careful with _FILE_OFFSET_BITS 64 - as there is an off_t in a global header this would have to affect all of PHP, which might cause trouble with external libs we're linking too. That can of worms made us not add the large file support.

@ralflang

This comment has been minimized.

Show comment
Hide comment
@ralflang

ralflang Jul 8, 2013

Contributor

I've read a bit about windows type lengths and it's probably best to define an own type filesize_t which equals unsigned long long in linux and whatever equivalent we can get on both 32 bit and 64 bit windows (i'm not a windows dev but http://msdn.microsoft.com/en-us/library/s3f49ktz%28v=vs.80%29.aspx says their long long is also 64 bit). This could avoid _File_OFFSET_BITS.

Contributor

ralflang commented Jul 8, 2013

I've read a bit about windows type lengths and it's probably best to define an own type filesize_t which equals unsigned long long in linux and whatever equivalent we can get on both 32 bit and 64 bit windows (i'm not a windows dev but http://msdn.microsoft.com/en-us/library/s3f49ktz%28v=vs.80%29.aspx says their long long is also 64 bit). This could avoid _File_OFFSET_BITS.

@weltling

This comment has been minimized.

Show comment
Hide comment
@weltling

weltling Jul 8, 2013

Contributor

With 64 bit integer on 32 bit platform there were possible overflow in returning the file size to PHP, even returning it as string. double is good enough as soon as it's ieee 754 complaint, so has 52 bit integer part. And it can be returned to php without overflow. Or how would you do it?

Contributor

weltling commented Jul 8, 2013

With 64 bit integer on 32 bit platform there were possible overflow in returning the file size to PHP, even returning it as string. double is good enough as soon as it's ieee 754 complaint, so has 52 bit integer part. And it can be returned to php without overflow. Or how would you do it?

@johannes

This comment has been minimized.

Show comment
Hide comment
@johannes

johannes Jul 8, 2013

Member

For exporting a large integer to PHP use a string. Using double's
imprecision might cause issues with further file processing. For fixed
64bit types look at int64_t from C99 which we provide via
win32/php_stdint.h and the PHP_CHECKTYPES config.m4 macro.

Member

johannes commented Jul 8, 2013

For exporting a large integer to PHP use a string. Using double's
imprecision might cause issues with further file processing. For fixed
64bit types look at int64_t from C99 which we provide via
win32/php_stdint.h and the PHP_CHECKTYPES config.m4 macro.

@weltling

This comment has been minimized.

Show comment
Hide comment
@weltling

weltling Jul 8, 2013

Contributor

That's true. Plus minus one byte would corrupt a file. But what's the issue using just the integer part, like double d = .0; int i = 42; d += (double)i; ... 42 were converted to 42.0 even without explicit cast, or where is the pitfall?

Contributor

weltling commented Jul 8, 2013

That's true. Plus minus one byte would corrupt a file. But what's the issue using just the integer part, like double d = .0; int i = 42; d += (double)i; ... 42 were converted to 42.0 even without explicit cast, or where is the pitfall?

@ralflang

This comment has been minimized.

Show comment
Hide comment
@ralflang

ralflang Jul 22, 2013

Contributor

I settled for int64_t - sorry for the delay but I was a little alien to windows build at first.
You seem to define an uint64_t too, but only for windows? Actually signed file sizes don't make too much sense but it seems like glibc does not have an atoull either. strtoull seems an option but I did not want to introduce too much change at once. 63bits + signed is probably enough for the next decade or so.

Contributor

ralflang commented Jul 22, 2013

I settled for int64_t - sorry for the delay but I was a little alien to windows build at first.
You seem to define an uint64_t too, but only for windows? Actually signed file sizes don't make too much sense but it seems like glibc does not have an atoull either. strtoull seems an option but I did not want to introduce too much change at once. 63bits + signed is probably enough for the next decade or so.

@weltling

This comment has been minimized.

Show comment
Hide comment
@weltling

weltling Jul 22, 2013

Contributor

Yep, that should already compile with VC. I'd however rewrite the check a bit more precise

#if defined(PHP_WIN32) && !defined(HAVE_ATOLL)

like that, to prevent possible future conflicts. You can ping me again when i can test on windows.

Contributor

weltling commented Jul 22, 2013

Yep, that should already compile with VC. I'd however rewrite the check a bit more precise

#if defined(PHP_WIN32) && !defined(HAVE_ATOLL)

like that, to prevent possible future conflicts. You can ping me again when i can test on windows.

@weltling

This comment has been minimized.

Show comment
Hide comment
@weltling

weltling Jul 22, 2013

Contributor

or, as that's an isolated case, just do that in place, like

#ifdef PHP_WIN32
.... do __atoi64
#else
.... do atoll
#endif

Contributor

weltling commented Jul 22, 2013

or, as that's an isolated case, just do that in place, like

#ifdef PHP_WIN32
.... do __atoi64
#else
.... do atoll
#endif

@ralflang

This comment has been minimized.

Show comment
Hide comment
@ralflang

ralflang Jul 23, 2013

Contributor

I've setup a win7/VC11 build and it built. I'm currently reading/searching how to set up a windows test case with this build.

Contributor

ralflang commented Jul 23, 2013

I've setup a win7/VC11 build and it built. I'm currently reading/searching how to set up a windows test case with this build.

m6w6 added a commit to m6w6/php-src that referenced this pull request Aug 5, 2013

Merge branch 'master' of github.com:/ralflang/php-src
merged pull request #372:
>2G uploads by Ralf Lang
php#372
@pierrejoye

This comment has been minimized.

Show comment
Hide comment
@pierrejoye

pierrejoye Aug 15, 2013

Contributor

I do not really understand why this patch gets merged already. Not that it is not necessary :) but I do not think it is ready enough to get into master. There are a couple of areas that needs more work (userland side, handling for large filesize, only meta infos f.e.). As we discussed on IRC, it could way easier to get it via the 64bit support patch/RFC.

What was the reasoning

Contributor

pierrejoye commented Aug 15, 2013

I do not really understand why this patch gets merged already. Not that it is not necessary :) but I do not think it is ready enough to get into master. There are a couple of areas that needs more work (userland side, handling for large filesize, only meta infos f.e.). As we discussed on IRC, it could way easier to get it via the 64bit support patch/RFC.

What was the reasoning

@m6w6

This comment has been minimized.

Show comment
Hide comment
@m6w6

m6w6 Aug 16, 2013

Contributor

It was ridiculous to not support 64bit upload size on platforms where long is large enough. The int64 change is a lot more work. I see that it is not perfect yet, but small steps are better than no progress at all. Actually, using int64_t instead of long would not be necessary, so I can change it back if you don't like it!

Contributor

m6w6 commented Aug 16, 2013

It was ridiculous to not support 64bit upload size on platforms where long is large enough. The int64 change is a lot more work. I see that it is not perfect yet, but small steps are better than no progress at all. Actually, using int64_t instead of long would not be necessary, so I can change it back if you don't like it!

@ralflang

This comment has been minimized.

Show comment
Hide comment
@ralflang

ralflang Aug 16, 2013

Contributor

We had all this type changing back and forth (long, size_t, int64_t) in the patch to make sure it behaves consistently in linux and windows, 32 and 64 bit platforms.

signed long in 32 bit systems is only 32 bit, roughly 2G - not enough.

Contributor

ralflang commented Aug 16, 2013

We had all this type changing back and forth (long, size_t, int64_t) in the patch to make sure it behaves consistently in linux and windows, 32 and 64 bit platforms.

signed long in 32 bit systems is only 32 bit, roughly 2G - not enough.

@php-pulls php-pulls merged commit d80a910 into php:master Sep 17, 2013

1 check failed

default The Travis CI build could not complete due to an error
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment