Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix calculate optimal part size bug #608

Merged
merged 1 commit into from Jan 8, 2018

Conversation

liuzxc
Copy link
Contributor

@liuzxc liuzxc commented Jan 5, 2018

In [4]: optimal_part_info(549755813)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-4-9348e40232c0> in <module>()
----> 1 optimal_part_info(549755813)

/data/apps/ecoboost-backend/ecoboost/eggs/minio-2.2.6-py2.7.egg/minio/helpers.py in optimal_part_info(length)
    592                        * MIN_PART_SIZE)
    593     # Total parts count.
--> 594     total_parts_count = int(math.ceil(length/part_size_float))
    595     # Part size.
    596     part_size = int(part_size_float)

ZeroDivisionError: integer division or modulo by zero

Uploaded failed when length larger than 5MB put_object.

env:

python version: 2.7.5
minio version: 2.2.6

@kannappanr
Copy link
Collaborator

kannappanr commented Jan 5, 2018

Hi,
Thanks for the PR.
But, I could not reproduce the issue that you have described in the PR I used Python 2.7.14 and minio-py version 3.0.0. Here is the sample code that I tested.


from minio import Minio
from minio.error import ResponseError

client = Minio('192.168.1.60:9000',
               access_key='minio',
               secret_key='minio123',
               secure=False)

# Put a file with default content-type.
try:
    file_stat = os.stat('/home/admin/Downloads/minio_test')
    file_data = open('/home/admin/Downloads/minio_test', 'rb')
    client.put_object('test', 'minio_test', file_data, file_stat.st_size)
except ResponseError as err:
    print(err)

The file that I used was about 22 MB, which is bigger than 5MB as you have mentioned.
Can you provide us with more information about your setup and environment, so that we can reproduce the issue? If you have a sample code to reproduce the issue, it will be much easier for us to reproduce the issue.

@liuzxc
Copy link
Contributor Author

liuzxc commented Jan 6, 2018

HI,

In [6]: import math

In [7]: math.ceil(4.2)
Out[7]: 5

I am using minio-py for my flask app, Although I used python 2.7, the math.ceil return an integer not a float(I didn't find the root cause). And For python3, the math.ceil also return an integer. For compatibility with python3, the code should add from __future__ import division.

@harshavardhana
Copy link
Member

I am using minio-py for my flask app, Although I used python 2.7, the math.ceil return an integer not a float(I didn't find the root cause). And For python3, the math.ceil also return an integer. For compatibility with python3, the code should add from future import division.

Yes but you are missing that we do int() casting

Here are the results for python2.7 and python3.6

python -c "import math; import platform; print(platform.python_version()); print(int(math.ceil(4.2)))"
2.7.14
5
python3 -c "import math; import platform; print(platform.python_version()); print(int(math.ceil(4.2)))"
3.6.3
5

So i am not sure what this PR is really doing, the title doesn't indicate what it is doing. The PR only adds tests but not any bug in the actual code.

Can you restate what is the real issue here?

@liuzxc
Copy link
Contributor Author

liuzxc commented Jan 6, 2018

part_size_float = math.ceil(length/MAX_MULTIPART_COUNT)
part_size_float = (math.ceil(part_size_float/MIN_PART_SIZE)* MIN_PART_SIZE)
total_parts_count = int(math.ceil(length/part_size_float))

if the first part_size_float return an integer, and it less than MIN_PART_SIZE, the second part_size_float could be zero, the error "ZeroDivisionError" will be raised.

@harshavardhana
Copy link
Member

part_size_float = math.ceil(length/MAX_MULTIPART_COUNT)
part_size_float = (math.ceil(part_size_float/MIN_PART_SIZE)* MIN_PART_SIZE)
total_parts_count = int(math.ceil(length/part_size_float))
if the first part_size_float return an integer, the second could be zero, the error "ZeroDivisionError" will be raised.

Can you provide a test case?

@harshavardhana
Copy link
Member

part_size_float = math.ceil(length/MAX_MULTIPART_COUNT)
part_size_float = (math.ceil(part_size_float/MIN_PART_SIZE)* MIN_PART_SIZE)
total_parts_count = int(math.ceil(length/part_size_float))
if the first part_size_float return an integer, the second could be zero, the error "ZeroDivisionError" will be raised.

Can you provide a test case?

Ah I see now it won't happen in the code as is while using put_object because we never send optimal_part_info a value which is < MIN_PART_SIZE

optimal_part_info()

The problem is when we use optimal_part_info() elsewhere it is possible that someone might pass in size < MIN_PART_SIZE

I would suggest a better fix to be following code with from __future__ import division

diff --git a/minio/helpers.py b/minio/helpers.py
index 304b0e7..015dd65 100644
--- a/minio/helpers.py
+++ b/minio/helpers.py
@@ -584,6 +584,8 @@ def optimal_part_info(length):
     if length > MAX_MULTIPART_OBJECT_SIZE:
         raise InvalidArgumentError('Input content size is bigger '
                                    ' than allowed maximum of 5TiB.')
+    if length <= MIN_PART_SIZE:
+        return 1, length, length

@liuzxc
Copy link
Contributor Author

liuzxc commented Jan 6, 2018

I don't think it is better solution. For my environment, the error as below:

In [7]: from minio.helpers import optimal_part_info

In [8]: optimal_part_info(50000 * 1024 * 1024)
Out[8]: (10000, 5242880, 5242880)

In [9]: optimal_part_info(5000 * 1024 * 1024)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-9-0c8b3bb80f4d> in <module>()
----> 1 optimal_part_info(5000 * 1024 * 1024)

/data/apps/ecoboost-backend/ecoboost/eggs/minio-2.2.6-py2.7.egg/minio/helpers.py in optimal_part_info(length)
    591                        * MIN_PART_SIZE)
    592     # Total parts count.
--> 593     total_parts_count = int(math.ceil(length/part_size_float))
    594     # Part size.
    595     part_size = int(part_size_float)

ZeroDivisionError: integer division or modulo by zero

if length less than MAX_MULTIPART_COUNT * MIN_PART_SIZE, part_size_float could be zero. so math.ceil(length/MAX_MULTIPART_COUNT) must be float. But for python3, it could be integer.

@harshavardhana
Copy link
Member

In [9]: optimal_part_info(5000 * 1024 * 1024)
ZeroDivisionError: integer division or modulo by zero

FWIW - I don't see this error on python3.6.3

>>> from minio.helpers import optimal_part_info
>>> optimal_part_info(5000 * 1024 * 1024)
(1000, 5242880, 5242880)

Choosing 42428800000 is less than MAX_MULTIPART_COUNT * MIN_PART_SIZE

>>> optimal_part_info(42428800000)
4242880
5242880
(8093, 5242880, 3415040)

I am not sure how it is possible on your system though.

@liuzxc
Copy link
Contributor Author

liuzxc commented Jan 8, 2018

HI,

I found the root cause, for our flask app, we use the future package, it rewrite the math.ceil method.

future is the missing compatibility layer between Python 2 and Python 3. It allows you to use a single, clean Python 3.x-compatible codebase to support both Python 2 and Python 3 with minimal overhead.

In [5]: math.ceil.func_code
Out[5]: <code object ceil at 0x7fd74853cab0, file "/data/apps/ecoboost-backend/ecoboost/eggs/future-0.16.0-py2.7.egg/future/backports/misc.py", line 31>

the source code as below:

def ceil(x):
    """
    Return the ceiling of x as an int.
    This is the smallest integral value >= x.
    """
    return int(oldceil(x))

so for python2.7, the function optimal_part_info doesn't work.

part_size_float = math.ceil(length/MAX_MULTIPART_COUNT)
part_size_float = (math.ceil(part_size_float/MIN_PART_SIZE)
                       * MIN_PART_SIZE)

the first part_size_float return an integer right now, and if it less than MIN_PART_SIZE, for python2.7, divide part_size_float by MIN_PART_SIZE is zero.

So if use python2.7 and future package, the error will be found. So I hope import division from __future__ module, let part_size_float/MIN_PART_SIZE will be a float.

@harshavardhana
Copy link
Member

I found the root cause, for our flask app, we use the future package, it rewrite the math.ceil method.

future is the missing compatibility layer between Python 2 and Python 3. It allows you to use a single, clean Python 3.x-compatible codebase to support both Python 2 and Python 3 with minimal overhead.

This is one of the reasons always unhappy with __future__ it can silently change behaviors underneath. Now in this case i would suggest that your fix is acceptable but please add a comment indicating why we had to add import __future__ so that it is not removed in future by mistake.

Also would be nice to add test cases for other situations as well where we expect the result to be float.

for some situations(for example, use future package), math.ceil will return an integer for python2.x,
it will cause errors for calculate optimal part size for multipart uploads. So, add __future__ module
to make sure the integer division return a float.
@liuzxc
Copy link
Contributor Author

liuzxc commented Jan 8, 2018

@harshavardhana Thanks for your support, add test cases for this maybe is not needed, because this is a special case, just make sure the result of integers devision is a float, import __future__can make it.

@kannappanr kannappanr merged commit 2bf812f into minio:master Jan 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants