Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error checking of Storage._writeFile and fix serialization flakiness #46036

Closed
wants to merge 4 commits into from

Conversation

gchanan
Copy link
Contributor

@gchanan gchanan commented Oct 8, 2020

Stack from ghstack:

Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which led to issues like #46020.

A better solution would be to use pybind, but given writing the file is going to dominate bounds checking, this is strictly better.

Differential Revision: D24228370

Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which lead to issues like #46020.

A better solution would be to use pybind, but given writing the file is going to dominate bounds checking, this is strictly better.

[ghstack-poisoned]
gchanan added a commit that referenced this pull request Oct 8, 2020
Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which lead to issues like #46020.

A better solution would be to use pybind, but given writing the file is going to dominate bounds checking, this is strictly better.

ghstack-source-id: 875a35d00de95740a96b6380cf6bfedf4b53137e
Pull Request resolved: #46036
@t-vi
Copy link
Collaborator

t-vi commented Oct 8, 2020

If you want CI to pass on this, add a False third parameter to the call to _write_file in _save in serialization.py. This could then be cherry-picked on the release branch.

Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which lead to issues like #46020.

A better solution would be to use pybind, but given writing the file is going to dominate bounds checking, this is strictly better.

[ghstack-poisoned]
@gchanan gchanan requested a review from mruberry October 8, 2020 20:02
Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which lead to issues like #46020.

A better solution would be to use pybind, but given writing the file is going to dominate bounds checking, this is strictly better.

[ghstack-poisoned]
gchanan added a commit that referenced this pull request Oct 8, 2020
Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which lead to issues like #46020.

A better solution would be to use pybind, but given writing the file is going to dominate bounds checking, this is strictly better.

ghstack-source-id: 92e635634da4309a2323383e8efecec5e964c180
Pull Request resolved: #46036
@dr-ci
Copy link

dr-ci bot commented Oct 8, 2020

💊 CI failures summary and remediations

As of commit 28baf90 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

Extra GitHub checks: 1 failed


codecov.io: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 5 times.

@@ -747,6 +748,7 @@ def run(self, *args, **kwargs):
with serialization_method(use_zip=True):
return super(TestSerialization, self).run(*args, **kwargs)

instantiate_device_type_tests(TestBothSerialization, globals())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this didn't cause failures with the inherited tests from the mixin.

PyObject *file = PyTuple_GET_ITEM(args, 0);
bool is_real_file = PyTuple_GET_ITEM(args, 1) == Py_True;
bool save_size = PyTuple_GET_ITEM(args, 2) == Py_True;
PyObject *file = PyTuple_GetItem(args, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bounds checking is cool.

Reviewer's note: https://docs.python.org/3/c-api/tuple.html#c.PyTuple_GetItem

@@ -488,7 +488,7 @@ def persistent_id(obj):
else:
# Copy to a buffer, then serialize that
buf = io.BytesIO()
storage._write_file(buf, _should_read_directly(buf))
storage._write_file(buf, _should_read_directly(buf), False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's hope no one else is using _write_file.

@mruberry
Copy link
Collaborator

mruberry commented Oct 9, 2020

Nit: a test could be added for _write_file throwing an indexerror.

@@ -585,8 +586,8 @@ def __exit__(self, *args, **kwargs):

class TestBothSerialization(TestCase, SerializationMixin):
Copy link
Collaborator

@mruberry mruberry Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will run the tests in the mixin an additional time with the same arguments since they're now instantiated on CUDA and CPU:

test_load_error_msg (__main__.TestBothSerializationCPU) ... ok (0.102s)
test_load_error_msg (__main__.TestBothSerializationCUDA) ... ok (0.045s)

These tests run so quickly it's OK, but I think it'd also be fine to remove the mixin here. The next two test classes run the mixin tests with _use_new_zipfile_serialization=False/True, anyway, so this is just testing its default value (redundantly, since the default value is True).

Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which led to issues like #46020.

A better solution would be to use pybind, but given writing the file is going to dominate bounds checking, this is strictly better.

[ghstack-poisoned]
gchanan added a commit that referenced this pull request Oct 9, 2020
Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which lead to issues like #46020.

A better solution would be to use pybind, but given writing the file is going to dominate bounds checking, this is strictly better.

ghstack-source-id: 2c61ab79276763ad3b529176d0c1733c7890d98b
Pull Request resolved: #46036
gchanan added a commit to gchanan/pytorch that referenced this pull request Oct 9, 2020
Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which lead to issues like pytorch#46020.

A better solution would be to use pybind, but given writing the file is going to dominate bounds checking, this is strictly better.

ghstack-source-id: 2c61ab79276763ad3b529176d0c1733c7890d98b
Pull Request resolved: pytorch#46036
@gchanan gchanan changed the title Improve error checking of Storage._writeFile. Improve error checking of Storage._writeFile and fix serialization flakiness Oct 9, 2020
@codecov
Copy link

codecov bot commented Oct 10, 2020

Codecov Report

Merging #46036 into gh/gchanan/337/base will decrease coverage by 0.00%.
The diff coverage is 0.00%.

Impacted file tree graph

@@                   Coverage Diff                   @@
##           gh/gchanan/337/base   #46036      +/-   ##
=======================================================
- Coverage                68.28%   68.27%   -0.01%     
=======================================================
  Files                      410      410              
  Lines                    53306    53306              
=======================================================
- Hits                     36399    36397       -2     
- Misses                   16907    16909       +2     
Impacted Files Coverage Δ
torch/serialization.py 86.30% <0.00%> (ø)
torch/testing/_internal/expecttest.py 77.55% <0.00%> (-1.03%) ⬇️
torch/utils/benchmark/utils/common.py 96.40% <0.00%> (-0.72%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 52f2db7...28baf90. Read the comment docs.

@facebook-github-bot
Copy link
Contributor

@gchanan merged this pull request in 2070834.

malfet pushed a commit to malfet/pytorch that referenced this pull request Oct 12, 2020
Summary:
Pull Request resolved: pytorch#46036

Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which led to issues like pytorch#46020.

A better solution would be to use pybind, but given writing the file is going to dominate bounds checking, this is strictly better.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24228370

Pulled By: gchanan

fbshipit-source-id: f5d0a3d21ff12b4380beefe1e9954fa81ea2f567
@malfet malfet added this to the 1.7.0 milestone Oct 12, 2020
malfet added a commit that referenced this pull request Oct 12, 2020
Summary:
Pull Request resolved: #46036

Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which led to issues like #46020.

A better solution would be to use pybind, but given writing the file is going to dominate bounds checking, this is strictly better.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24228370

Pulled By: gchanan

fbshipit-source-id: f5d0a3d21ff12b4380beefe1e9954fa81ea2f567

Co-authored-by: Gregory Chanan <gchanan@fb.com>
@facebook-github-bot facebook-github-bot deleted the gh/gchanan/337/head branch October 16, 2020 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants