
[quant][graphmode][fx] Add support for dynamic quant for RNN and RNNCell #49126

Status: Closed (wants to merge 3 commits)

Conversation

jerryzh168 (Contributor) commented Dec 9, 2020

Stack from ghstack:

Summary:
Add support for dynamic quantization of RNN and RNNCell modules in FX graph mode quantization.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_rnn
python test/test_quantization.py TestQuantizeFxOps.test_rnn_cell

Differential Revision: [D25449047](https://our.internmc.facebook.com/intern/diff/D25449047)
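
For readers unfamiliar with the flow this PR extends, here is a minimal sketch of dynamic quantization of a recurrent module through the FX graph mode API. It is not code from this PR: nn.LSTM, the shapes, and the qconfig_dict layout are illustrative assumptions, and it targets the PyTorch version contemporaneous with this PR (newer releases also require an example_inputs argument to prepare_fx).

# Minimal sketch (not from this PR): dynamic quantization of a recurrent
# module with the FX graph mode API. This PR extends the same flow to the
# RNN and RNNCell variants; nn.LSTM is used here only as an illustration.
import torch
import torch.nn as nn
from torch.quantization import default_dynamic_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=8, hidden_size=8, num_layers=1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return out

model = Model().eval()
# Apply the dynamic qconfig only to the recurrent module type.
qconfig_dict = {"object_type": [(nn.LSTM, default_dynamic_qconfig)]}
prepared = prepare_fx(model, qconfig_dict)   # trace the model and mark the LSTM for dynamic quant
quantized = convert_fx(prepared)             # swap in the dynamically quantized module
print(quantized(torch.randn(5, 3, 8)).shape)

The reviewed test excerpt further down in this conversation follows the same prepare_fx/convert_fx pattern.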

jerryzh168 added a commit that referenced this pull request Dec 9, 2020
ghstack-source-id: 810a87cfb19308fa036220936e456df46c5714f5
Pull Request resolved: #49126
dr-ci bot commented Dec 9, 2020

💊 CI failures summary and remediations

As of commit ff4353e (more details on the Dr. CI page):


  • 2/4 failures possibly introduced in this PR
    • 1/2 non-CircleCI failure(s)
  • 2/4 broken upstream at merge base e5a98c5 on Dec 09 from 8:37am to 6:41pm

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1 (1/3)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Dec 10 22:07:54 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 10 22:07:54 At:
Dec 10 22:07:54   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize
Dec 10 22:07:54   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize
(the same error is logged twice more)
Dec 10 22:07:54 [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown)
Dec 10 22:07:54 [W tensorpipe_agent.cpp:547] RPC agent for worker0 encountered error when reading incoming request from worker3: EOF: end of file (this is expected to happen during shutdown)
Dec 10 22:07:54 [W tensorpipe_agent.cpp:547] RPC agent for worker0 encountered error when reading incoming request from worker2: EOF: end of file (this is expected to happen during shutdown)
Dec 10 22:07:55 ok (1.227s)
Dec 10 22:07:56   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:547] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown)

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (2/3)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Dec 10 21:38:14 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 10 21:38:14 At:
Dec 10 21:38:14   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize
Dec 10 21:38:14   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize
(the same error is logged twice more)
Dec 10 21:38:14 ok (1.026s)
Dec 10 21:38:15   test_return_future_remote (__main__.ProcessGroupRpcTestWithSpawn) ... RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend.
(the same deprecation warning is logged three more times)

See CircleCI build pytorch_linux_xenial_py3_clang7_onnx_build (3/3)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Dec 10 19:31:13 sccache: error: couldn't connect to server
Dec 10 19:31:13 +++ eval 'extract_trap_cmd '
Dec 10 19:31:13 ++++ extract_trap_cmd
Dec 10 19:31:13 ++++ printf '%s\n' ''
Dec 10 19:31:13 +++ printf '%s\n' cleanup
Dec 10 19:31:13 ++ trap -- '
Dec 10 19:31:13 cleanup' EXIT
Dec 10 19:31:13 ++ [[ pytorch-linux-xenial-py3-clang7-onnx-build != *pytorch-win-* ]]
Dec 10 19:31:13 ++ which sccache
Dec 10 19:31:13 ++ sccache --stop-server
Dec 10 19:31:13 Stopping sccache server...
Dec 10 19:31:13 sccache: error: couldn't connect to server
Dec 10 19:31:13 sccache: caused by: Connection refused (os error 111)
Dec 10 19:31:13 ++ true
Dec 10 19:31:13 ++ rm /var/lib/jenkins/sccache_error.log
Dec 10 19:31:13 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
Dec 10 19:31:13 ++ true
Dec 10 19:31:13 ++ [[ pytorch-linux-xenial-py3-clang7-onnx-build == *rocm* ]]
Dec 10 19:31:13 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
Dec 10 19:31:13 ++ SCCACHE_IDLE_TIMEOUT=1200
Dec 10 19:31:13 ++ RUST_LOG=sccache::server=error
🚧 2 fixed upstream failures: These were probably caused by upstream breakages that were already fixed.
Please rebase on the viable/strict branch.

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.



Review thread on the new test code (excerpt):

}
model_graph = prepare_fx(model_graph, graph_qconfig_dict)
model_graph = convert_fx(model_graph)
self.assertEqual(model_eager(sample_input), model_graph(sample_input))
Contributor:

Do we need to test for serialization here?

jerryzh168 (author):

This should be the same as the eager mode module. I'm not very familiar with it; are we using state_dict?

jerryzh168 (author):

Or are you referring to checkScriptable?

jerryzh168 (author):

Added checkScriptable here, but in general we'll do the end-to-end tests in TestQuantizeFxModels.
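
For context on the check being discussed, here is a small self-contained sketch of scripting a converted FX model. checkScriptable is PyTorch's test-utility helper and its exact signature is not shown in this thread, so torch.jit.script is used directly, with a plain dynamically quantized nn.Linear standing in for the RNN case (and the era's two-argument prepare_fx).

# Rough sketch of the scriptability check discussed above (checkScriptable is
# a test-utility helper; torch.jit.script is used directly here instead).
import torch
import torch.nn as nn
from torch.quantization import default_dynamic_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

m = nn.Sequential(nn.Linear(4, 4)).eval()
qm = convert_fx(prepare_fx(m, {"": default_dynamic_qconfig}))
scripted = torch.jit.script(qm)   # raises if the converted GraphModule is not scriptable
x = torch.randn(2, 4)
print(torch.allclose(scripted(x), qm(x)))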

@@ -124,6 +124,19 @@ def get_static_quant_module_class(float_module_class, additional_static_quant_ma
" does not have a corresponding quantized module class"
return static_quant_module_class

def get_dynamic_quant_module_class(float_module_class, additional_dynamic_quant_mapping=None):
Contributor:

It would be great to add types to the function I/O.

jerryzh168 (author):

I think we can add it in a separate PR; none of the other functions in this file are typed yet.

Contributor:

It's not blocking this PR, but it would be great if we started adding these as we go, at least for function I/O. We don't have to wait for a file to already have type annotations before adding more. This also distributes the cost of adding them across everyone, rather than putting it on one person.

jerryzh168 (author):

Yeah, I fully agree that we should add types as we change code. I'm saying I plan to add them in a separate PR; or are you suggesting adding the type annotations for the functions in this file in this PR?
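
To make the suggestion concrete, here is a hypothetical illustration of typed function I/O for the helper shown in the hunk above; the specific annotation choices are assumptions, not the PR's actual code.

# Hypothetical typed signature for illustration only; the real function body
# and exact types live in torch/quantization and are not reproduced here.
from typing import Any, Callable, Dict, Optional

def get_dynamic_quant_module_class(
    float_module_class: Callable[..., Any],
    additional_dynamic_quant_mapping: Optional[Dict[Callable[..., Any], Any]] = None,
) -> Any:
    ...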

facebook-github-bot (Contributor):

This pull request has been merged in 882eb0f.

facebook-github-bot deleted the gh/jerryzh168/518/head branch December 14, 2020 15:17