
Fallback to native batchnorm implementation if input shape unsupported by cudnn #31976

Closed
wants to merge 323 commits

Conversation

@ptrblck (Collaborator) commented Jan 9, 2020

Should fix #29744 by falling back to the native batch norm implementation if cudnn cannot execute the provided shape.

Shape numbers were verified for cudnn 7.6.5.32 with tensor shapes:

# for spatial bn
x = torch.Size([880801, 256, 5])
x = torch.Size([65535, 256, 5])
x = torch.Size([880801, 64, 4, 4])
x = torch.Size([65535, 64, 4, 4])

# for per-act bn
x = torch.Size([131070, 2048])
x = torch.Size([262136, 2048])

in train() and eval() mode, using torch.float32 and torch.float16.

I've increased the shapes used in our current smoke test, but I can also add all use cases of the support matrix, if wanted.
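
A minimal sketch of such a smoke test (not the exact test added in this PR); it assumes a CUDA build with cudnn and enough GPU memory, and uses one of the verified shapes from the list above:

```python
# Hedged smoke-test sketch: with this PR, shapes that cudnn rejects should
# fall back to the native batch norm kernels instead of raising an error.
import torch

bn = torch.nn.BatchNorm2d(64).cuda().half()
x = torch.randn(65535, 64, 4, 4, device="cuda", dtype=torch.float16)

bn.train()
out = bn(x)        # training mode

bn.eval()
with torch.no_grad():
    out = bn(x)    # eval mode
print(out.shape)
```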

@kostmo (Member) commented Jan 9, 2020

💊 CircleCI build failures summary and remediations

As of commit 3692fd2:

None of the build failures appear to be your fault.

  • 1/1 broken upstream at merge base c729614 since Jan 29

Please rebase on the viable/strict branch:

    If your commit is newer than viable/strict, you can try basing on an older, stable commit:

    git fetch origin viable/strict
    git rebase --onto viable/strict $(git merge-base origin/master HEAD)
    

    If your commit is older than viable/strict:

    git fetch origin viable/strict
    git rebase viable/strict
    

    Check out the recency history of this "viable master" tracking branch.

Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

🚧 1 upstream failure recognized by patterns:

These builds matched patterns, but were probably caused by upstream breakages:


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions on the GitHub issue tracker.


EscapeZero and others added 28 commits January 14, 2020 11:49
Summary:
Pull Request resolved: pytorch#32149

This is an attempt at clarifying some of the preprocessor boolean logic that was getting more and more complicated. The previous logic used constexpr with nvcc on clang, which caused compiler failures in ovrsource with mode/linux/* (based on platform007).

Test Plan:
ovrsource xplat/caffe2 compiles
fbsource sandcastle green

Differential Revision: D19385409

fbshipit-source-id: 60a02bae9854388b87510afdd927709673a6c313
Summary:
Pull Request resolved: pytorch#31912

### Summary

Clean up the logs from pip-install.

### Test Plan

- Don't break the iOS simulator build

Test Plan: Imported from OSS

Differential Revision: D19395526

Pulled By: xta0

fbshipit-source-id: a638a209cab801ce90c8615e7ea030b1ab0939f3
Test Plan: revert-hammer

Differential Revision:
D18482934

Original commit changeset: bd82a0d820c4

fbshipit-source-id: ca5e50fb0a883ee311aeb310198d84ad28062158
Summary:
Pull Request resolved: pytorch#32170

Stack from [ghstack](https://github.com/ezyang/ghstack):
Change the OperatorName constructor from passing by const ref to passing by value and move.
* **pytorch#32170 Fix the passing-by-ref constructor of OperatorName.**

Test Plan: Imported from OSS

Differential Revision: D19396225

Pulled By: iseeyuan

fbshipit-source-id: e946c47647e1f8d23d7565cfe93f487845e7f24c
Summary:
Pull Request resolved: pytorch#32147

### Summary

Got some security warnings regarding the ruby dependencies. This diff updates the packages in Gemfile.

```
GitHub has detected that a package defined in the ios/TestApp/Gemfile.lock file of the pytorch/pytorch repository contains a security vulnerability.

Package name: excon
Affected versions: < 0.71.0
Fixed in version: 0.71.0
Severity: LOW

Identifier(s):
GHSA-q58g-455p-8vw9
CVE-2019-16779
```

### Test Plan

- Won't affect the existing iOS CI jobs

Test Plan: Imported from OSS

Differential Revision: D19400087

Pulled By: xta0

fbshipit-source-id: 34b548d136cfd6b68fcc53bf0b243461bd7afd64
Summary:
This was not tested before. Fixes pytorch#32139 (which was actually a false positive: functions with kwargs but without defaults on those kwargs are supported). This PR adds testing for both cases and cleans up the error reporting.
Pull Request resolved: pytorch#32146

Pulled By: driazati

Differential Revision: D19385828

fbshipit-source-id: 5eab74df6d02f8e1d7ec054cafb44f909f9d637e
…740f8f (pytorch#32125)

Summary:
Pull Request resolved: pytorch#32125

Previous import was 57ebc587fcf3913b4be93653b0dd58c686447298

Included changes:
- **[65020daa](onnx/onnx@65020daa)**: better error message for undefined inputs (pytorch#2540) <Yuxin Wu>
- **[8afff0e9](onnx/onnx@8afff0e9)**: bump ORT version (pytorch#2538) <Lu Fang>
- **[3d9ca57e](onnx/onnx@3d9ca57e)**: fix name of directory (pytorch#2537) <Prasanth Pulavarthi>
- **[df8fa2c9](onnx/onnx@df8fa2c9)**: Repository guidelines (pytorch#2539) <Prasanth Pulavarthi>
- **[49cc2f02](onnx/onnx@49cc2f02)**: Update CircleCI job to use Python3.6 (pytorch#2527) <bddppq>
- **[25ff79a4](onnx/onnx@25ff79a4)**: Fix wrong model version, it's not 12 (the onnx_opset_version()), not 11 (the opset version of the latest stable), but 10 (pytorch#2478) <daquexian>
- **[7cebaed5](onnx/onnx@7cebaed5)**: Fix Windows py3.5 CI (pytorch#2529) <bddppq>
- **[eddae00e](onnx/onnx@eddae00e)**: Correct the order of arguments of InferShapes (pytorch#2500) <Shinichiro Hamaji>
- **[41b5afe6](onnx/onnx@41b5afe6)**: Include <ostream> in common/status.h (pytorch#2519) <Casey Carter>
- **[423f1977](onnx/onnx@423f1977)**: add 8 bit support to maxpool op (pytorch#2510) <Ashwini Khade>
- **[78593c2f](onnx/onnx@78593c2f)**: add 8 bit support to reducemin and reducemax ops (pytorch#2516) <Ashwini Khade>

Test Plan: cont build

Reviewed By: benoitsteiner

Differential Revision: D19380034

fbshipit-source-id: ddce8450864a611773b2a32e2f0254c9bb6b6906
…OS CI (pytorch#32072)

Summary:
Currently, libtorch build and test are not running in macOS CI. This PR fixes the issue.

**Test Plan:**
Check that libtorch build and test are running again in macOS CI.
Pull Request resolved: pytorch#32072

Differential Revision: D19391909

Pulled By: yf225

fbshipit-source-id: 1ab345b099869f78e1124f1a8bd185fa51371b6a
…tten to (pytorch#28455)

Summary:
fixes pytorch#28360
Pull Request resolved: pytorch#28455

Differential Revision: D19374601

Pulled By: Krovatkin

fbshipit-source-id: 622f24b40aba03e79e55a6b8d25d88417f7d8bad
Summary: Pull Request resolved: pytorch#32169

Differential Revision: D19393236

Pulled By: anjali411

fbshipit-source-id: 5dac6b0a4038eb48458d4a0b253418daeccbb6bc
Summary:
Pull Request resolved: pytorch#32133

We should do this to better debug the test.

Differential Revision: D19375479

fbshipit-source-id: 8c2bf61bae605a38252bb793b091ade479bea11a
…2190)

Summary:
Pull Request resolved: pytorch#32190

We need a backend-agnostic mechanism to perform a barrier-like operation before locally destroying the RRef context and shutting down the RPC agent:

- Sort the worker names.
- Elect the first name in the ordered worker names as the leader.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- Once all workers have reported their intent, the leader sends the command to everyone to proceed (see the sketch below).
ghstack-source-id: 96693296
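
A hedged sketch of the protocol above; the helper callables (`report_intent`, `recv_intents`, `send_proceed`) are hypothetical stand-ins for RPC calls, not the actual torch.distributed.rpc internals.

```python
# Leader election plus intent-counting barrier, as described in the bullets.
def wait_all_workers(worker_names, self_name,
                     report_intent, recv_intents, send_proceed):
    leader = sorted(worker_names)[0]     # elect the first sorted name as leader
    report_intent(leader, self_name)     # every worker, leader included,
                                         # reports its intent to synchronize
    if self_name == leader:
        recv_intents(len(worker_names))  # block until all intents have arrived
        for name in worker_names:        # then command everyone to proceed
            send_proceed(name)
```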

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn

buck-out/gen/caffe2/test/rpc_spawn\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_spawn\#binary.par -r test_rref_leak
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_worker_id
```

# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```

```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```

Differential Revision: D19399908

fbshipit-source-id: 1dee607cd49adafe88534621a1c85e2736e2f595
…2104)

Summary:
Pull Request resolved: pytorch#32104

Fixes these warnings:
```
xplat\caffe2\caffe2Windows#header-mode-symlink-tree-only,headers\caffe2\operators\quantized\int8_conv_op.h(96,17): warning: use 'template' keyword to treat 'data' as a dependent template name
            W.t.data<uint8_t>(),
                ^
                template
xplat\caffe2\caffe2Windows#header-mode-symlink-tree-only,headers\caffe2\operators\quantized\int8_conv_op.h(97,17): warning: use 'template' keyword to treat 'data' as a dependent template name
            B.t.data<int32_t>(),
                ^
                template
```

Test Plan: Tested locally with clang-cl and CI for other toolchains

Reviewed By: boguscoder

Differential Revision: D19353563

fbshipit-source-id: c28afb8c1ad72fd77ef82556ba89fcf09100d1f9
…se_ops_test (pytorch#32086)

Summary:
Pull Request resolved: pytorch#32086

np.clip(1, num_indices // 2, 10) -> np.clip(num_indices // 2, 1, 10)
Also rename batchsize -> num_rows to match what the variable actually holds
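
For reference, `np.clip`'s signature is `np.clip(a, a_min, a_max)`, so the corrected call clips `num_indices // 2` into [1, 10] as intended (`num_indices` here is just a stand-in value for illustration):

```python
import numpy as np

num_indices = 50
print(np.clip(num_indices // 2, 1, 10))  # clips 25 into [1, 10] -> 10
```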

Test Plan: CI

Reviewed By: hx89

Differential Revision: D19361521

fbshipit-source-id: 9ce864c7d7da046dc606afa5207da677ccf80f52
Summary:
Pull Request resolved: pytorch#32179

Tensors are used as keys in dictionaries, so we need to annotate that key insertion into a dictionary inserts the key into the wildcard set. Also fixes a bug where `listCopyAndSort` did not copy the input list.
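
A hedged TorchScript sketch of the pattern at issue (the function and names are illustrative, not taken from this PR):

```python
import torch
from typing import Dict

@torch.jit.script
def insert_key(d: Dict[torch.Tensor, int], k: torch.Tensor) -> None:
    # After this insertion, k may alias any tensor reachable through d's
    # keys, so alias analysis must add k to the wildcard set.
    d[k] = 1
```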

Test Plan: Imported from OSS

Differential Revision: D19397555

Pulled By: eellison

fbshipit-source-id: 17acdc22ff5e2dda44fd25c80450396f5592095e
Summary:
Pull Request resolved: pytorch#32185

Previously we would unify the contained types of dictionaries, however this breaks type safety.
```
@torch.jit.script
def test(input: Dict[str, None], cond: bool):
    if cond:
        out = input
    else:
        out = {"1": 1}
    out["hi"] = 3
```

This would only occur if a dictionary is being re-assigned across an if condition with different contained types, which is pretty unlikely. I tested `model_backward_compatibility` for all fb models and this didn't break anything. This PR is a precursor to alias analysis changes.

Also fixes `Future` type unification. Because `Future` is an immutable type, it is okay to unify the contained type.

Test Plan: Imported from OSS

Differential Revision: D19398585

Pulled By: eellison

fbshipit-source-id: ebc8812cdf5b6dba37b1cfbc2edc7d8c467b258c
…rch#32187)

Summary:
Pull Request resolved: pytorch#32187

Fixes pytorch#32058. Previously we would build documentation during the PyTorch
Linux CUDA build. We don't actually need to do this because we have a
dedicated python_doc_build job that builds the docs. With this change,
the CUDA build should run ~10 minutes faster, giving devs faster signal.

Test Plan: - Check the CUDA (10.1) build on this PR, make sure it doesn't build the docs.

Differential Revision: D19400417

Pulled By: zou3519

fbshipit-source-id: e8fb2b818146f33330e06760377a9afbc18a71ed
Summary:
Just update the comment to make it accurate.
Pull Request resolved: pytorch#32222

Differential Revision: D19410428

Pulled By: albanD

fbshipit-source-id: ad13596382613c2728e674a47049ea4f563964b9
)

Summary:
"in_features" and "out_features" are not defined. Possibly a typo. They should be "input_features" and "output_features" instead
Pull Request resolved: pytorch#31682

Differential Revision: D19251685

Pulled By: zou3519

fbshipit-source-id: ac9e524e792a1853a16e8876d76b908495d8f35e
…ytorch#32209)

Summary:
Pull Request resolved: pytorch#32209

* Deprecate use of scipy.misc.logsumexp and scipy.misc.comb.
* Removed in 1.0.0 https://docs.scipy.org/doc/scipy-1.1.0/reference/generated/scipy.misc.logsumexp.html and https://docs.scipy.org/doc/scipy-1.2.1/reference/generated/scipy.misc.comb.html
* Use scipy.special.logsumexp and scipy.special.comb instead.
* This diff updates most usages, except those in experimental folders.
* This diff does NOT fix existing lint/code/TARGETS issues.
* This diff does NOT autoformat codes.
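
A minimal sketch of the replacement imports described above:

```python
# scipy.misc.logsumexp/comb were removed; the scipy.special versions are the
# drop-in replacements.
import numpy as np
from scipy.special import logsumexp, comb

print(logsumexp(np.array([1.0, 2.0, 3.0])))  # log(e^1 + e^2 + e^3)
print(comb(10, 3))                           # 120.0
```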

Test Plan: sandcastle auto unittests

Differential Revision: D19406460

fbshipit-source-id: 2103fa0d674d9671a0175f4ce54b3c887d22f04e
Summary:
Pull Request resolved: pytorch#32154

TensorTypeId -> DispatchKey
	c10/core/TensorTypeId.h -> c10/core/DispatchKey.h
	c10/core/TensorTypeId.cpp -> c10/core/DispatchKey.cpp
	TensorTypeId::* -> DispatchKey::*
	TensorTypeId type_id -> DispatchKey dispatch_key
		type_id -> dispatch_key
	TensorTypeId::NumTensorIds -> DispatchKey::NumDispatchKeys
	RealTensorTypeId -> RealDispatchKey
TensorTypeSet -> DispatchKeySet
	TensorTypeIds -> DispatchKeys
	c10/core/TensorTypeSet.h -> c10/core/DispatchKeySet.h
	c10/core/TensorTypeSet.cpp -> c10/core/DispatchKeySet.cpp
	type_set() -> key_set()
	type_set_ -> key_set_
	typeSet -> keySet
ExcludeTensorTypeIdGuard -> ExcludeDispatchKeyGuard
IncludeTensorTypeIdGuard -> IncludeDispatchKeyGuard
LocalTensorTypeSet -> LocalDispatchKeySet
	c10/core/impl/LocalTensorTypeSet.h -> c10/core/impl/LocalDispatchKeySet.h
	c10/core/impl/LocalTensorTypeSet.cpp -> c10/core/impl/LocalDispatchKeySet.cpp
	tls_local_tensor_type_set -> tls_local_dispatch_key_set
	tls_is_tensor_type_id_excluded -> tls_is_dispatch_key_excluded
	tls_set_tensor_type_id_excluded -> tls_set_dispatch_key_excluded
	tls_is_tensor_type_id_included -> tls_is_dispatch_key_included
	tls_set_tensor_type_id_included -> tls_set_dispatch_key_included
MultiDispatchTensorTypeSet -> MultiDispatchKeySet
	multi_dispatch_tensor_type_set -> multi_dispatch_key_set
tensorTypeIdToBackend -> dispatchKeyToBackend
backendToTensorTypeId -> backendToDispatchKey
initForTensorTypeSet -> initForDispatchKeySet
inferred_type_set -> inferred_key_set
computeTensorTypeId -> computeDispatchKey
PODLocalTensorTypeSet raw_local_tensor_type_set -> PODLocalDispatchKeySet raw_local_dispatch_key_set
get_default_tensor_type_id -> get_default_dispatch_key
inferred_type_id -> inferred_dispatch_key
actual_type_id -> actual_dispatch_key
typeSetToDispatchKey_ -> dispatchKeySetToDispatchKey_
get_type_id() -> get_dispatch_key()
legacyExtractTypeId -> legacyExtractDispatchKey
extractTypeId -> extractDispatchKey

Test Plan: Imported from OSS

Differential Revision: D19398900

Pulled By: pbelevich

fbshipit-source-id: 234ad19f93d33e00201b61e153b740a339035776
Summary:
Pull Request resolved: pytorch#31970

Now that a ClassType can be shared among different module instances, we preserve
that sharing in clone as well: if the original module has a ClassType that is
shared, we clone this ClassType once and share it between the cloned module
instances as well.

Test Plan:
build/test/test_jit

Imported from OSS

Differential Revision: D19406251

fbshipit-source-id: 2881c695f6e718e5432040a3817cf187a62017bf
Summary:
Closes pytorch#31287
Pull Request resolved: pytorch#31288

Differential Revision: D19166753

Pulled By: zou3519

fbshipit-source-id: da31ad323b8fafa7cbc502fda4e2eb6e02facfb6
Summary:
Pull Request resolved: pytorch#31713

- In case the callbacks are heavy/slow, the other threads should be able to start work on the value of the future after the current thread moves the value and unlocks the mutex.
- `completed()` was not inlined; inline it to avoid function call overhead.
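
A hedged Python sketch of the locking pattern in the first bullet (the actual change is in C++; this is a threading analogue, not the real Future implementation):

```python
import threading

class Future:
    def __init__(self):
        self._cv = threading.Condition()
        self._callbacks = []
        self._value = None
        self._completed = False

    def wait(self):
        with self._cv:
            self._cv.wait_for(lambda: self._completed)
            return self._value

    def set_value(self, value):
        with self._cv:
            self._value = value
            self._completed = True
            callbacks = self._callbacks  # take the callbacks out under the lock
            self._callbacks = []
            self._cv.notify_all()
        # Lock released: waiting threads can consume the value while the
        # potentially slow callbacks run below.
        for cb in callbacks:
            cb(value)
```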

ghstack-source-id: 96694593

Test Plan: tbd

Differential Revision: D5624371

fbshipit-source-id: 5762e6e894d20108ec9afedd1a6e64bcd97ee3fe
Summary:
torch.onnx.export docs contained two descriptions of the 'example_outputs' arg,
so the information was combined into a single description under the parameters.
Pull Request resolved: pytorch#31826

Differential Revision: D19274928

Pulled By: zou3519

fbshipit-source-id: cbcce0a79c51784c1d7aa8981aab8aac118ca9b4
Summary: Pull Request resolved: pytorch#31482

Test Plan: Imported from OSS

Differential Revision: D19303243

Pulled By: albanD

fbshipit-source-id: 5afdfeb4b8382c09b9ec65acd545148ed76d4285
…ytorch#31892)

Summary:
Introduce ProcessGroup::allgather_base. No implementation yet: plan to add it one PG backend at a time in a follow up.
Pull Request resolved: pytorch#31892

Test Plan: No functional changes, no tests yet.

Differential Revision: D19290739

Pulled By: agolynski

fbshipit-source-id: c2f4947d2980995724c539de7c6d97618e1ba11a
Summary: Pull Request resolved: pytorch#32158

Test Plan: Imported from OSS

Differential Revision: D19405980

Pulled By: mrshenli

fbshipit-source-id: 808ef1c71b637546f8872375bf1828967b1a5a60
jspark1105 and others added 16 commits January 28, 2020 14:07
Summary:
Pull Request resolved: pytorch#32711

Original commit changeset: 4f29d34523ef

Test Plan: CI

Differential Revision: D19603967

fbshipit-source-id: af3f647fff416a84290a42217747948bac4d73c6
Summary:
Stacked PRs
 * pytorch#32244 - Make zip serialization the default
 * **pytorch#32241 - Split serialization tests to their own file**

This makes them all easier to run as a batch. This PR is just a code move / fixing up imports. There are still some serialization tests in `test_torch.py` as part of `TestDeviceType`.
Pull Request resolved: pytorch#32241

Pulled By: driazati

Differential Revision: D19415826

fbshipit-source-id: a3f6cfe1626ff2f9b9631c409bf525bd32e4639b
Summary:
Pull Request resolved: pytorch#32720

Original commit changeset: 5ebc4c978af5

Test Plan: existing tests

Reviewed By: chenyangyu1988

Differential Revision: D19603336

fbshipit-source-id: 56051a716c4eedf49cfe7367ff447b4b9c5429ea
…orch#31627)

Summary:
This method is pretty hot.  In an internal workload, this single
call to at() accounted for ~2% of overall cycles.
Pull Request resolved: pytorch#31627

Reviewed By: yinghai

Differential Revision: D19607779

Pulled By: qizzzh

fbshipit-source-id: 1684919049a35fdad686d8396c7dce7243ab92d4
Summary: Pull Request resolved: pytorch#32723

Differential Revision: D19607256

Pulled By: mlacayo

fbshipit-source-id: 2993014d4d90fa26acd5bc01ed7494cc43a29a62
Summary:
Included the ONNX model checker code in the ONNX export; this forces the onnx
checker to run for all models that get exported and should help with validating
them.
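
The equivalent manual validation, which export now performs internally, looks roughly like this ("model.onnx" is a placeholder path):

```python
import onnx

model = onnx.load("model.onnx")
onnx.checker.check_model(model)  # raises onnx.checker.ValidationError if invalid
```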
Pull Request resolved: pytorch#32298

Reviewed By: hl475

Differential Revision: D19538251

Pulled By: houseroad

fbshipit-source-id: eb20b124fe59200048f862ddaf20f6c59a0174d5
Summary:
Pull Request resolved: pytorch#32742

As Title says (Check pytorch#32644).
ghstack-source-id: 97352793

Test Plan: CI

Differential Revision: D19611029

fbshipit-source-id: 9f4a155c909f419e41c1d7078eb2796dd17cedd2
Summary: Pull Request resolved: pytorch#32717

Reviewed By: xianjiec

Differential Revision: D19604954

fbshipit-source-id: c02eccf048c0dba3f66d729ab1fda50f3cacef63
Summary:
Pull Request resolved: pytorch#32501

This diff will address pytorch#24699

We require the input `lambda` to be >= 0, matching https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.exponential.html#numpy-random-exponential. This check did not exist in the previous implementation.
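
For illustration, the in-place sampler this touches is used like so (a sketch matching the benchmark shape below; after this change the rate must be >= 0):

```python
import torch

x = torch.empty(512, 512)
x.exponential_(lambd=1.0)   # in-place sampling from Exponential(rate=1.0)
```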

Benchmark: using the PT operator microbenchmark
```
================================================================================
Before the change, Program Output:
================================================================================
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: exponential_
# Mode: Eager
# Name: exponential__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 21311.746

================================================================================
After the change, Program Output:
================================================================================
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: exponential_
# Mode: Eager
# Name: exponential__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 20919.914

================================================================================
```

Test Plan: Sandcastle and Github tests

Reviewed By: BIT-silence

Differential Revision: D19518700

fbshipit-source-id: 0e79cb6a999c1278eb08b0d94cf61b119c85a36c
Summary: ATT. Since the infra is there.

Test Plan: run it

Reviewed By: amylittleyang

Differential Revision: D19605250

fbshipit-source-id: c68be4d7963afa4fa5f8f60c90f1913605eae516
…#32251)

Summary:
Pull Request resolved: pytorch#32251

Previously wildcard sets were associated by TypeKind, meaning all Lists were in one alias set, all Classes were in one alias set, etc. We can improve analysis by bucketing wildcard sets by TypePtr instead. Any two mutable types which can unify should be in the same wildcard set bucket.

This also allows us to do much simpler `mayContainAlias` analysis, and also improves `analyzeConservative` analysis because now we can recurse through all contained memory locations and mark writes, instead of recursing only one level deep in contained elements.

Test Plan: Imported from OSS

Differential Revision: D19563263

Pulled By: eellison

fbshipit-source-id: 371a37d1a8596abc6c53f41c09840b6c140ea362
Summary:
Pull Request resolved: pytorch#32326

Now that we have type-level granularity we can improve `mayContainAlias` queries. Each new value is initialized as containing the wildcard set of each contained mutable type. Whenever a value is added to a container it is set to the wildcard set. Now, to check if any two values contain overlapping values, we can just check if the `containedMemoryLocations` of two sets overlap.

Test Plan: Imported from OSS

Differential Revision: D19563262

Pulled By: eellison

fbshipit-source-id: c6d7489749c14b2054a6d50ef75baca699ada471
@ptrblck (Collaborator, Author) commented Jan 29, 2020

Ignore this, I'll rebase it properly.

Development

Successfully merging this pull request may close these issues.

CuDNN batchnorm has batch size limit for eval with channel