Add SLEEF for libm float and double #6725

Merged
merged 1 commit into from May 2, 2018

@cpuhrsch
Contributor

cpuhrsch commented Apr 18, 2018

Speeds up vectorized operations on contiguous tensors while preserving the ULP required by libc's math library.

In particular, the "_u10" suffix in a SLEEF function name means that the function has a maximum error of 1.0 ULP.
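To make the ULP figure concrete, here is a minimal sketch (not part of this PR) that measures an error in units of the reference value's ULP. It works on Python doubles and relies on `math.ulp` and `math.nextafter` (Python 3.9+); `ulp_error` is a hypothetical helper name chosen for illustration.

```python
import math

def ulp_error(approx: float, reference: float) -> float:
    """Return |approx - reference| measured in ULPs of the reference (double precision)."""
    return abs(approx - reference) / math.ulp(reference)

# Perturb exp(1.0) by exactly one representable step and measure the error.
reference = math.exp(1.0)
one_step_up = math.nextafter(reference, math.inf)

print(ulp_error(reference, reference))    # 0.0
print(ulp_error(one_step_up, reference))  # 1.0
```

A function with a 1.0 ULP bound may return at worst a value about this far from the exact result.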

SLEEF functions have different valid input ranges. For example, sinf has a valid range of [-5e+9, 5e+9]. These ranges need to be accounted for manually (work in progress).

EDIT: It was decided that we'll add a flag to functions whose vectorized implementation does not cover the entire range of floating point values, and that those implementations will be disabled by default.
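As a rough illustration of the opt-in idea, the sketch below guards a limited-range fast path behind a flag that is off by default. This is an assumption-laden sketch, not the mechanism used in this PR: `guarded_sin`, `fast_sin`, `allow_limited_range` and the extra per-call range check are all hypothetical, and the limit is simply the sinf range quoted above.

```python
import math

# Assumed valid range for the fast path, taken from the sinf example above.
FAST_SIN_LIMIT = 5e9

def fast_sin(x: float) -> float:
    """Stand-in for a vectorized kernel that is only valid for |x| <= FAST_SIN_LIMIT."""
    return math.sin(x)

def guarded_sin(values, allow_limited_range=False):
    """Use the fast path only if the caller opted in and every input is in range.

    Returns the results together with the name of the path that was taken.
    """
    in_range = all(abs(v) <= FAST_SIN_LIMIT for v in values)
    if allow_limited_range and in_range:
        return [fast_sin(v) for v in values], "fast"
    # Default: full-range reference implementation.
    return [math.sin(v) for v in values], "fallback"

print(guarded_sin([0.0, 1.0])[1])                              # fallback
print(guarded_sin([0.0, 1.0], allow_limited_range=True)[1])    # fast
print(guarded_sin([1e10], allow_limited_range=True)[1])        # fallback
```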

@o8ht88z00f o8ht88z00f referenced this pull request Apr 18, 2018

Closed

[auto] pytorch-pr-6725 #1649

@cpuhrsch cpuhrsch changed the title from Add SLEEF for float and double to Add SLEEF for libm float and double Apr 18, 2018

.gitmodules Outdated
@@ -77,3 +77,6 @@
[submodule "third_party/onnx"]
path = third_party/onnx
url = https://github.com/onnx/onnx.git
[submodule "aten/src/ATen/cpu/sleef"]

@ezyang

ezyang Apr 22, 2018

Contributor

Can we start adding these libraries to top-level third_party instead of inside of ATen?

@cpuhrsch

cpuhrsch Apr 26, 2018

Contributor

Moved it.

@cpuhrsch

Contributor

cpuhrsch commented May 2, 2018

Timings for newly vectorized functions.

Master
$ taskset -c 0 perf stat /private/home/cpuhrsch/miniconda2/bin/python unary_comp.py
erf_       size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.8720 type: torch.FloatTensor    dim: 3
exp_       size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   1.3596 type: torch.FloatTensor    dim: 3
expm1_     size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   1.6120 type: torch.FloatTensor    dim: 3
log_       size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   1.6009 type: torch.FloatTensor    dim: 3
log1p_     size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   1.6070 type: torch.FloatTensor    dim: 3
rsqrt_     size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.6829 type: torch.FloatTensor    dim: 3
erf        size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.9626 type: torch.FloatTensor    dim: 3
exp        size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   1.5030 type: torch.FloatTensor    dim: 3
expm1      size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   1.6780 type: torch.FloatTensor    dim: 3
log        size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   1.7591 type: torch.FloatTensor    dim: 3
log1p      size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   1.6743 type: torch.FloatTensor    dim: 3
rsqrt      size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.7634 type: torch.FloatTensor    dim: 3

Branch
$ taskset -c 0 perf stat /private/home/cpuhrsch/miniconda2/bin/python unary_comp.py
erf_       size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.9195 type: torch.FloatTensor    dim: 3
exp_       size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.1244 type: torch.FloatTensor    dim: 3
expm1_     size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.4478 type: torch.FloatTensor    dim: 3
log_       size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.2981 type: torch.FloatTensor    dim: 3
log1p_     size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.3259 type: torch.FloatTensor    dim: 3
rsqrt_     size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.1390 type: torch.FloatTensor    dim: 3
erf        size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.9952 type: torch.FloatTensor    dim: 3
exp        size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.2064 type: torch.FloatTensor    dim: 3
expm1      size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.5290 type: torch.FloatTensor    dim: 3
log        size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.3769 type: torch.FloatTensor    dim: 3
log1p      size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.3893 type: torch.FloatTensor    dim: 3
rsqrt      size: 10^5   count: 1      size: [464, 464, 464]      stride: [' 215296', '    464', '      1']                            numel: 99897344  elapsed:   0.2194 type: torch.FloatTensor    dim: 3

@MlWoo

Contributor

MlWoo commented May 2, 2018

@cpuhrsch Could you provide me with the test scripts? I think it would be worthwhile to compare the perf of this version with that of VML.

@apaszke

Haven't reviewed CMake. Mostly LGTM

backends:
- CPU
- CUDA
name: _tan

@apaszke

apaszke May 2, 2018

Member

Can we call all of them _th_*? It's unclear what's different in a _tan() call compared to tan()

@cpuhrsch

cpuhrsch May 2, 2018

Contributor

_th_ indicates that it'd otherwise conflict with something from TH. For example tanh.

@apaszke

apaszke May 2, 2018

Member

Yeah, but right now it's super inconsistent. Some functions will use the _th_ form, others will only have _. Prefixing all of them with _th would just seem easier to reason about (and would be clear what happens when you use them in your code). Not a big deal though.

Tensor& fill_(Tensor& self, const Tensor& value) {
  return self._fill_(value);
}

@apaszke

apaszke May 2, 2018

Member

Lol why can't we leave fill_ as it was then?

@cpuhrsch

cpuhrsch May 2, 2018

Contributor

I created this stub to figure out how to replace it correctly (since it takes a scalar value). I'll add the vectorized implementation soon. I'd prefer to keep this and then add all the other stuff quickly in another PR (see the TODO section).

@@ -396,7 +399,8 @@ def cosh(x):
    try:
        return math.cosh(x)
    except OverflowError:
        return float('inf') if x > 0 else float('-inf')
        # http://en.cppreference.com/w/cpp/numeric/math/cosh

@apaszke

apaszke May 2, 2018

Member

Can you add an extra comment here? It's unclear why someone should go there and what they should look for.

@cpuhrsch

cpuhrsch May 2, 2018

Contributor

This reference shows that on overflow std::cosh returns +inf regardless of the sign of the argument. Without this change the test fails, and there's no reason not to stick to the standard.
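A minimal sketch of a reference function with that behaviour, assuming only Python's `math` module (the name `reference_cosh` is hypothetical and this is not necessarily the final test code):

```python
import math

def reference_cosh(x: float) -> float:
    """cosh that returns +inf on overflow for either sign of x."""
    try:
        return math.cosh(x)
    except OverflowError:
        # cosh is an even function, so overflow means +inf for both signs.
        return float('inf')

print(reference_cosh(1000.0), reference_cosh(-1000.0))  # inf inf
```

The earlier `float('-inf')` branch would only be appropriate for an odd function such as sinh.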

@@ -260,6 +260,9 @@ def _test_math(self, torchfn, mathfn, input=None):
input = []
input.append(list(range(-5, 5)))
input.append([x + 1e-6 for x in range(-5, 5)])
# Some vectorized implementations don't support large ranges
input.append([x + 1e10 for x in range(-5, 5)])
input.append([x - 1e10 for x in range(-5, 5)])
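For a sense of scale of these large test inputs: if they end up stored as float32, neighbouring values near 1e10 are 1024 apart, so the ten offsets collapse to a single value. The sketch below shows this with the standard library only; `to_float32` is a hypothetical helper that round-trips a Python float through IEEE single precision.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 and back."""
    return struct.unpack('f', struct.pack('f', x))[0]

inputs = [x + 1e10 for x in range(-5, 5)]
as_float32 = {to_float32(v) for v in inputs}

# 10 distinct doubles, but only 1 distinct float32 value (1e10 itself).
print(len(set(inputs)), len(as_float32))  # 10 1
```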

@apaszke

apaszke May 2, 2018

Member

Aren't we going with SLEEF for large inputs anyway, since the precision at this scale is too low?

@cpuhrsch

cpuhrsch May 2, 2018

Contributor

Hm, what do you mean by that? SLEEF is just as precise as libm (1ULP) for all the vectorized functions in this PR.

@apaszke

apaszke May 2, 2018

Member

Oh ok. I thought SLEEF has those guarantees only when abs(x) < 1e10, so this wouldn't be the case, but maybe I misunderstood something.

@cpuhrsch

cpuhrsch May 2, 2018

Contributor

That's the case for the trigonometric functions (sin, cos, etc.) and they all have various regions. We'll need to deal with that separately.

@cpuhrsch

Contributor

cpuhrsch commented May 2, 2018

@MlWoo Thank you for offering to look into this. I'm fully convinced Intel MKL VML is faster than this. But it doesn't provide us with short vector implementations. Intel SVML does, but it's specific to icc as far as I know. Also, VML doesn't provide us with an implementation of softmax or other more ML-specific function approximations as far as I know. So, we'll need to add SLEEF in any case. Also, as far as I know, MKL VML is not open source and comes with its own threading library. The plan is to integrate VML in any case, since many users do have access to MKL, but I'd prefer to do that in another PR. I'm adding a lot more timings soon to get a wider understanding of the perf here, and I'll add the benchmark script alongside that.

Christian Puhrsch
@cpuhrsch

Contributor

cpuhrsch commented May 2, 2018

Command: 
$ taskset -c 0 perf stat python unary_comp.py
                                                                                                                                                                                                            Branch   Master
acos_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   1.1517   1.9183
asin_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   0.8858   2.6810
atan_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   1.0440   2.3369
erf_       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   2.3798   1.7685
exp_       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   0.3553   2.5604
expm1_     memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   1.2715   3.0089
log_       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   0.7799   1.6440
log10_     memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   0.8359   1.5597
log1p_     memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   0.8101   2.6163
log2_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   0.8255   1.2799
rsqrt_     memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   0.3447   1.6849
acos       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   1.4053   3.6224
asin       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   1.1181   3.3192
atan       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   1.2559   4.2657
erf        memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   2.5989   2.3763
exp        memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   0.5444   3.5639
expm1      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   1.4211   4.1880
log        memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   0.9978   4.1999
log10      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   1.0550   5.3410
log1p      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   1.0254   4.1792
log2       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   1.0463   3.5762
rsqrt      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   0.5622   1.9105
acos_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   2.4928   2.8882
asin_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   2.0030   4.6732
atan_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   2.4583   9.0569
erf_       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   7.0484   1.8329
exp_       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   0.9457  22.9256
expm1_     memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   3.2400   3.1798
log_       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   1.9607  20.5083
log10_     memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   2.0516   2.4911
log1p_     memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   2.1793   2.5732
log2_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   2.2580   1.3894
rsqrt_     memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   1.2637   2.4888
acos       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   3.3793   6.5379
asin       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   2.5023   6.7188
atan       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   2.8744   8.5583
erf        memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   7.4680   2.7230
exp        memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   1.3532  29.3864
expm1      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   3.6106   4.5248
log        memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   2.3265   9.4931
log10      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   2.4229   9.3373
log1p      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   2.5026   4.6397
log2       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   2.6288   4.1076
rsqrt      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   1.6803   2.9060

@MlWoo I used this script to get these timings. I'll still need to check for regressions for functions I touched without vectorizing them: cosh, sinh, logh, tanh.

Except for erf we see an average speedup of 2x, which is skewed by some of the 10x+ speedups.
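To see how a few large speedups skew the average, the sketch below computes Master/Branch ratios for a handful of in-place rows copied from the table above and compares the arithmetic mean with the median and geometric mean. Which rows to include and which summary statistics to use are choices made for this illustration only.

```python
import statistics

# (branch_seconds, master_seconds), copied from the in-place rows of the table above.
timings = {
    "erf_ (float)": (2.3798, 1.7685),
    "exp_ (float)": (0.3553, 2.5604),
    "log_ (float)": (0.7799, 1.6440),
    "rsqrt_ (float)": (0.3447, 1.6849),
    "exp_ (double)": (0.9457, 22.9256),
}

# A speedup above 1 means the branch is faster than master.
speedups = {name: master / branch for name, (branch, master) in timings.items()}

for name, s in speedups.items():
    print(f"{name:15s} {s:6.2f}x")

values = list(speedups.values())
print("arithmetic mean:", round(statistics.mean(values), 2))
print("median:         ", round(statistics.median(values), 2))
print("geometric mean: ", round(statistics.geometric_mean(values), 2))
```

The arithmetic mean is pulled up by the double-precision exp_ row, while the median and geometric mean are less sensitive to that outlier.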

@cpuhrsch

Contributor

cpuhrsch commented May 2, 2018

Command:
$ taskset -c 0 perf stat /private/home/cpuhrsch/miniconda2/bin/python unary_comp.py
                                                                                                                                                                                                            Branch   Master
cosh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '      1', '     25']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   2.5891   2.7025
cosh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '     25', '      1']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   2.5878   2.3791
cosh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '     18', '    450']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   3.1100   5.4281
cosh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '    450', '     18']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   3.1073   3.0097
sinh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '      1', '     25']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   6.5547   6.4665
sinh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '     25', '      1']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   6.5550   5.7426
sinh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '     18', '    450']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   6.8367  10.3669
sinh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '    450', '     18']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   6.8381   6.6344
tanh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '      1', '     25']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   5.3930   5.8535
tanh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '     25', '      1']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   5.3996   5.3044
tanh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '     18', '    450']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   5.7780   9.7723
tanh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '    450', '     18']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   5.7788   5.9249
cosh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '      1', '     25']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   6.3247   5.6126
cosh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '     25', '      1']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   6.1694   4.9121
cosh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '     18', '    450']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:  10.3552   9.0592
cosh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '    450', '     18']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   6.7299   5.6690
sinh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '      1', '     25']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   8.2897   7.4305
sinh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '     25', '      1']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   8.1219   6.6375
sinh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '     18', '    450']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:  12.2621  11.0027
sinh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '    450', '     18']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   8.3045   7.4596
tanh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '      1', '     25']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   7.1711   7.1856
tanh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '     25', '      1']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   6.9551   6.5375
tanh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '     18', '    450']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:  11.3500  11.0331
tanh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '    450', '     18']      numel: 9765625   type: torch.FloatTensor  dim: 5     elapsed:   7.1671   7.1707
cosh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '      1', '     10']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   2.2582   2.2141
cosh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '     10', '      1']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   2.2571   1.9871
cosh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '     18', '    180']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   2.2734   2.4468
cosh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '    180', '     18']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   2.2768   2.0613
sinh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '      1', '     10']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   2.5985   2.5966
sinh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '     10', '      1']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   2.6005   2.1778
sinh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '     18', '    180']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   2.6802   2.7173
sinh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '    180', '     18']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   2.6848   2.3949
tanh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '      1', '     10']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   4.3644   4.8065
tanh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '     10', '      1']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   4.3643   4.0399
tanh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '     18', '    180']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   4.4168   5.0669
tanh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '    180', '     18']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   4.4142   4.4207
cosh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '      1', '     10']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   6.7849   5.5861
cosh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '     10', '      1']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   6.4834   4.8040
cosh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '     18', '    180']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   7.7771   5.9853
cosh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '    180', '     18']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   6.8844   5.3808
sinh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '      1', '     10']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   8.8616   7.5683
sinh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '     10', '      1']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   8.4936   6.5620
sinh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '     18', '    180']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   9.1968   7.7312
sinh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '    180', '     18']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   8.5136   7.2452
tanh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '      1', '     10']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   7.7990   7.3576
tanh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '     10', '      1']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   7.2996   6.4564
tanh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '     18', '    180']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   8.1363   7.6300
tanh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '    180', '     18']      numel: 100000    type: torch.FloatTensor  dim: 5     elapsed:   7.3228   6.9526
cosh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      1', '      6']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   1.6452   1.8650
cosh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      6', '      1']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   1.6438   1.5568
cosh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '     18', '    108']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   1.6715   2.0004
cosh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '    108', '     18']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   1.6705   1.6570
sinh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      1', '      6']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   1.5971   1.8163
sinh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      6', '      1']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   1.6140   1.4658
sinh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '     18', '    108']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   1.6563   2.0001
sinh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '    108', '     18']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   1.6514   1.6402
tanh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      1', '      6']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   3.2782   4.0267
tanh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      6', '      1']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   3.2761   3.1363
tanh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '     18', '    108']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   3.3404   4.2887
tanh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '    108', '     18']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   3.3438   3.4964
cosh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      1', '      6']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   5.3278   4.5098
cosh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      6', '      1']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   4.7760   3.7878
cosh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '     18', '    108']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   5.8208   4.8728
cosh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '    108', '     18']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   4.9010   4.2955
sinh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      1', '      6']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   6.7591   5.9875
sinh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      6', '      1']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   6.2615   5.1188
sinh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '     18', '    108']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   7.7023   6.2119
sinh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '    108', '     18']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   6.5316   5.7322
tanh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      1', '      6']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   6.1111   5.9245
tanh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      6', '      1']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   5.3783   5.0613
tanh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '     18', '    108']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   6.4011   6.1820
tanh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '    108', '     18']      numel: 7776      type: torch.FloatTensor  dim: 5     elapsed:   5.4206   5.5031
cosh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '      1', '     25']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:   8.1336   4.3476
cosh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '     25', '      1']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:   8.1335   3.1824
cosh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '     18', '    450']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  11.3922   6.4561
cosh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '    450', '     18']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  11.3907   6.9254
sinh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '      1', '     25']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:   7.4790   7.1503
sinh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '     25', '      1']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:   7.4795   6.2661
sinh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '     18', '    450']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  14.3201  11.0709
sinh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '    450', '     18']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  14.3387  10.7166
tanh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '      1', '     25']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:   5.3856   6.3043
tanh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '     25', '      1']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:   5.3867   5.3274
tanh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '     18', '    450']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  11.3527  10.7641
tanh_      memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '    450', '     18']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  11.3501  10.6167
cosh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '      1', '     25']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  24.4912   7.6408
cosh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '     25', '      1']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  24.2500   6.6494
cosh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '     18', '    450']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  27.4797  11.8339
cosh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '    450', '     18']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  27.1596  11.5234
sinh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '      1', '     25']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:   8.9446   8.1183
sinh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '     25', '      1']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:   8.4417   7.4450
sinh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '     18', '    450']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  13.5665  12.2714
sinh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '    450', '     18']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  12.7447  11.7223
tanh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '      1', '     25']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:   8.3958   8.0694
tanh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: [' 390625', '  15625', '    625', '     25', '      1']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:   7.6539   7.1490
tanh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '     18', '    450']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  14.9351  12.2609
tanh       memory: O(10^4)KB  count: 25     size: [25, 25, 25, 25, 25] stride: ['7031250', ' 281250', '  11250', '    450', '     18']      numel: 9765625   type: torch.DoubleTensor dim: 5     elapsed:  15.7175  11.7494
cosh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '      1', '     10']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   2.4915   2.5583
cosh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '     10', '      1']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   2.4943   2.3809
cosh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '     18', '    180']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   2.7291   2.7028
cosh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '    180', '     18']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   2.7293   2.5105
sinh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '      1', '     10']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   2.7461   2.7223
sinh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '     10', '      1']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   2.7347   2.2681
sinh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '     18', '    180']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   3.3288   3.1060
sinh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '    180', '     18']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   3.3204   2.8195
tanh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '      1', '     10']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   4.2483   4.7579
tanh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '     10', '      1']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   4.2480   4.0077
tanh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '     18', '    180']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   4.8053   5.9527
tanh_      memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '    180', '     18']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   4.8021   4.6924
cosh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '      1', '     10']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   8.5136   7.2887
cosh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '     10', '      1']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   8.1550   6.3952
cosh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '     18', '    180']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:  10.8530   8.5458
cosh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '    180', '     18']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   9.1037   7.2073
sinh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '      1', '     10']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   9.1390   7.8359
sinh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '     10', '      1']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   8.6532   7.1613
sinh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '     18', '    180']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:  10.6318   8.8416
sinh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '    180', '     18']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   8.8870   7.6117
tanh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '      1', '     10']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   8.2677   7.7980
tanh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: ['  10000', '   1000', '    100', '     10', '      1']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   7.7801   6.8986
tanh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '     18', '    180']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:  10.4059   8.9055
tanh       memory: O(10^2)KB  count: 2500   size: [10, 10, 10, 10, 10] stride: [' 180000', '  18000', '   1800', '    180', '     18']      numel: 100000    type: torch.DoubleTensor dim: 5     elapsed:   8.6071   7.5443
cosh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      1', '      6']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   1.8237   2.1299
cosh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      6', '      1']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   1.8203   1.8734
cosh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '     18', '    108']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   1.9200   2.1946
cosh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '    108', '     18']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   1.9199   2.0017
sinh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      1', '      6']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   1.6658   2.1593
sinh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      6', '      1']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   1.6725   1.5419
sinh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '     18', '    108']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   1.9877   2.1883
sinh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '    108', '     18']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   1.9877   1.9618
tanh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      1', '      6']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   3.2649   4.0071
tanh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      6', '      1']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   3.2627   3.1124
tanh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '     18', '    108']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   4.2319   4.6465
tanh_      memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '    108', '     18']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   4.2199   3.7329
cosh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      1', '      6']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   6.4991   5.7979
cosh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      6', '      1']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   5.9756   5.0264
cosh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '     18', '    108']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   7.3297   6.5532
cosh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '    108', '     18']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   6.2160   5.7050
sinh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      1', '      6']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   7.0411   6.2900
sinh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      6', '      1']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   6.4476   5.6171
sinh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '     18', '    108']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   8.4923   6.8943
sinh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '    108', '     18']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   7.3673   6.0459
tanh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      1', '      6']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   6.4698   6.2874
tanh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['   1296', '    216', '     36', '      6', '      1']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   5.8012   5.3956
tanh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '     18', '    108']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   7.8144   6.9672
tanh       memory: O(10^1)KB  count: 25000  size: [6, 6, 6, 6, 6]      stride: ['  23328', '   3888', '    648', '    108', '     18']      numel: 7776      type: torch.DoubleTensor dim: 5     elapsed:   6.5572   5.9968
cosh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '      1', '    215']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   2.6316   2.6982
cosh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   2.6322   2.4203
cosh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '     18', '   3870']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   3.1628  10.0484
cosh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '   3870', '     18']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   3.1616   3.0675
sinh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '      1', '    215']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   6.6711   6.6598
sinh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   6.6679   5.8477
sinh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '     18', '   3870']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   6.9588  20.3463
sinh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '   3870', '     18']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   6.9553   6.7508
tanh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '      1', '    215']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   5.4915   6.0613
tanh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   5.4891   5.3987
tanh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '     18', '   3870']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   5.8800  25.4129
tanh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '   3870', '     18']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   5.8808   6.0298
cosh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '      1', '    215']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   7.4229   6.6017
cosh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   6.2785   4.9935
cosh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '     18', '   3870']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:  20.5386  18.2167
cosh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '   3870', '     18']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   6.8371   5.7653
sinh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '      1', '    215']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   9.3778   8.4048
sinh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   8.2697   6.7505
sinh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '     18', '   3870']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:  25.6932  25.6437
sinh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '   3870', '     18']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   8.4460   7.5845
tanh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '      1', '    215']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   8.1796   8.2093
tanh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   7.0822   6.6501
tanh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '     18', '   3870']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:  26.0725  26.8499
tanh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '   3870', '     18']                            numel: 9938375   type: torch.FloatTensor  dim: 3     elapsed:   7.2882   7.2953
cosh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '      1', '     46']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   2.1389   2.1041
cosh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '     46', '      1']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   2.1374   1.9322
cosh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '     18', '    828']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   2.1551   2.5081
cosh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '    828', '     18']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   2.1545   2.0065
sinh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '      1', '     46']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   2.4778   2.3799
sinh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '     46', '      1']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   2.4804   2.1185
sinh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '     18', '    828']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   2.5597   2.9544
sinh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '    828', '     18']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   2.5481   2.3374
tanh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '      1', '     46']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   4.1881   4.3850
tanh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '     46', '      1']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   4.1852   3.9340
tanh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '     18', '    828']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   4.2356   6.8893
tanh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '    828', '     18']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   4.2355   4.3029
cosh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '      1', '     46']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   6.2951   5.3045
cosh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '     46', '      1']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   6.2024   4.6680
cosh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '     18', '    828']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   9.6152   8.3201
cosh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '    828', '     18']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   6.5858   5.2407
sinh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '      1', '     46']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   8.2371   7.1096
sinh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '     46', '      1']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   8.1660   6.3968
sinh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '     18', '    828']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:  10.7004   9.4785
sinh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '    828', '     18']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   8.1768   7.0538
tanh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '      1', '     46']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   7.1022   6.8432
tanh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '     46', '      1']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   6.9999   6.2873
tanh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '     18', '    828']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:  10.1710   9.8194
tanh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '    828', '     18']                            numel: 97336     type: torch.FloatTensor  dim: 3     elapsed:   7.0165   6.7715
cosh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '      1', '     21']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   1.9436   2.1225
cosh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '     21', '      1']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   1.9419   1.8478
cosh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '     18', '    378']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   1.9688   2.2290
cosh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '    378', '     18']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   1.9684   1.9617
sinh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '      1', '     21']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   1.8928   2.0595
sinh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '     21', '      1']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   1.8919   1.7387
sinh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '     18', '    378']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   1.9192   2.2607
sinh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '    378', '     18']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   1.9405   1.9497
tanh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '      1', '     21']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   3.8880   4.2626
tanh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '     21', '      1']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   3.8854   3.7324
tanh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '     18', '    378']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   3.9541   4.7691
tanh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '    378', '     18']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   3.9545   4.1492
cosh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '      1', '     21']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   5.8815   5.1240
cosh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '     21', '      1']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   5.6350   4.4871
cosh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '     18', '    378']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   6.7861   5.8348
cosh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '    378', '     18']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   5.7830   5.0987
sinh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '      1', '     21']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   7.6256   6.8703
sinh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '     21', '      1']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   7.4766   6.1018
sinh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '     18', '    378']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   8.5821   7.3244
sinh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '    378', '     18']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   7.7266   6.8047
tanh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '      1', '     21']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   6.6479   6.6245
tanh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '     21', '      1']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   6.3983   6.0166
tanh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '     18', '    378']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   7.2256   7.1994
tanh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '    378', '     18']                            numel: 9261      type: torch.FloatTensor  dim: 3     elapsed:   6.4443   6.5346
cosh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '      1', '    215']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   3.3894   5.0686
cosh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   3.3914   3.2485
cosh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '     18', '   3870']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   7.9528  12.5422
cosh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '   3870', '     18']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   7.9666   7.0465
sinh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '      1', '    215']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   7.2901   8.8959
sinh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   7.2964   6.3801
sinh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '     18', '   3870']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:  14.2151  22.5845
sinh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '   3870', '     18']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:  14.3469  10.9077
tanh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '      1', '    215']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   5.4816   8.1756
tanh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   5.4809   5.4155
tanh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '     18', '   3870']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:  11.5542  26.4650
tanh_      memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '   3870', '     18']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:  11.5512  10.7602
cosh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '      1', '    215']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:  10.7706   9.5557
cosh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   8.1095   6.7619
cosh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '     18', '   3870']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:  27.4127  26.9676
cosh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '   3870', '     18']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:  15.7539  11.7278
sinh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '      1', '    215']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:  10.9171  10.0364
sinh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   8.5955   7.5669
sinh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '     18', '   3870']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:  28.6285  27.5927
sinh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '   3870', '     18']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:  12.9789  11.9055
tanh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '      1', '    215']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:  10.4209  10.0117
tanh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: ['  46225', '    215', '      1']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:   7.7893   7.2708
tanh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '     18', '   3870']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:  27.5052  27.8797
tanh       memory: O(10^4)KB  count: 25     size: [215, 215, 215]      stride: [' 832050', '   3870', '     18']                            numel: 9938375   type: torch.DoubleTensor dim: 3     elapsed:  16.1032  11.9118
cosh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '      1', '     46']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   2.3655   2.4377
cosh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '     46', '      1']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   2.3652   2.3160
cosh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '     18', '    828']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   2.6014   2.7968
cosh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '    828', '     18']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   2.5958   2.4414
sinh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '      1', '     46']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   2.6096   2.5196
sinh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '     46', '      1']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   2.6041   2.2114
sinh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '     18', '    828']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   3.1692   3.1528
sinh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '    828', '     18']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   3.1674   2.7426
tanh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '      1', '     46']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   4.0765   4.3512
tanh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '     46', '      1']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   4.0765   3.9011
tanh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '     18', '    828']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   4.6170   7.2132
tanh_      memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '    828', '     18']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   4.6186   4.5650
cosh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '      1', '     46']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   8.0401   6.9191
cosh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '     46', '      1']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   7.8147   6.2279
cosh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '     18', '    828']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:  11.2889  10.1851
cosh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '    828', '     18']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   8.7351   6.9922
sinh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '      1', '     46']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   8.4244   7.3784
sinh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '     46', '      1']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   8.3202   6.9752
sinh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '     18', '    828']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:  11.5492  10.2462
sinh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '    828', '     18']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   8.5306   7.3983
tanh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '      1', '     46']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   7.6743   7.3177
tanh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['   2116', '     46', '      1']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   7.4583   6.7149
tanh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '     18', '    828']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:  10.7661  10.4294
tanh       memory: O(10^2)KB  count: 2500   size: [46, 46, 46]         stride: ['  38088', '    828', '     18']                            numel: 97336     type: torch.DoubleTensor dim: 3     elapsed:   8.2653   7.3431
cosh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '      1', '     21']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   2.1556   2.4317
cosh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '     21', '      1']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   2.1524   2.2282
cosh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '     18', '    378']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   2.2635   2.5418
cosh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '    378', '     18']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   2.2690   2.3682
sinh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '      1', '     21']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   1.9699   2.2380
sinh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '     21', '      1']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   1.9675   1.8417
sinh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '     18', '    378']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   2.3319   2.4807
sinh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '    378', '     18']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   2.3493   2.3328
tanh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '      1', '     21']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   3.8733   4.2097
tanh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '     21', '      1']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   3.8695   3.7032
tanh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '     18', '    378']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   4.9999   5.4632
tanh_      memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '    378', '     18']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   4.9998   4.4374
cosh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '      1', '     21']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   7.3321   6.6758
cosh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '     21', '      1']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   7.0973   5.9711
cosh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '     18', '    378']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   8.8322   7.8929
cosh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '    378', '     18']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   7.3928   6.7995
sinh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '      1', '     21']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   7.9359   7.1511
sinh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '     21', '      1']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   7.6638   6.6996
sinh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '     18', '    378']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   9.6429   8.2578
sinh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '    378', '     18']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   8.7109   7.1752
tanh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '      1', '     21']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   7.1205   7.0776
tanh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['    441', '     21', '      1']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   6.8990   6.4518
tanh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '     18', '    378']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   8.6332   8.2295
tanh       memory: O(10^1)KB  count: 25000  size: [21, 21, 21]         stride: ['   7938', '    378', '     18']                            numel: 9261      type: torch.DoubleTensor dim: 3     elapsed:   7.7501   7.1375

This branch is usually a bit slower on average than master (~5%) on these non-vectorized functions, but much faster in some cases, presumably because it knows how to sort strides. Note that all of this is restricted to sinh/cosh/tanh.
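
For readers unfamiliar with the idea, here is a minimal Python sketch of what sorting strides could look like. It is not the ATen implementation: the helper name `sort_dims_by_stride` is hypothetical, and the sizes and strides are taken from one of the benchmark rows above. The dimensions are reordered so that the largest stride comes first and the innermost loop walks the smallest stride.

```python
def sort_dims_by_stride(sizes, strides):
    """Reorder dimensions so that strides are in decreasing order.

    Hypothetical helper for illustration only; not the ATen code.
    """
    # Stable sort of the dimension indices by decreasing stride.
    order = sorted(range(len(strides)), key=lambda d: -strides[d])
    return [sizes[d] for d in order], [strides[d] for d in order], order


# Layout from the benchmark rows: size [46, 46, 46], stride [2116, 1, 46].
sizes, strides, order = sort_dims_by_stride([46, 46, 46], [2116, 1, 46])
print(sizes, strides, order)
```

After the reordering, the last dimension has stride 1, so an elementwise kernel can traverse memory sequentially even though the tensor was not contiguous in its original dimension order.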

shibatch commented May 2, 2018

Why did you modify CMakeLists.txt in sleef?
I think it would be better if you just added sleef as a submodule, from a maintainability perspective.
I can make building DFT optional.

Contributor

cpuhrsch commented May 2, 2018

@shibatch Thanks for looking at this! Sleef is a fantastic library, thank you very much!

I made many changes because it wouldn't build with the standard CMake setup on all of our platforms. I'm happy to remove them again, and it is straightforward to drop the original build setup back in. I'd want to do this as part of a separate PR. I'll send one out with all of this removed and tag you in it, if you want. Then we can resolve the issues that the CI shows.

Contributor

ezyang commented May 2, 2018

CC @orionr on cmake stuff

apaszke approved these changes May 2, 2018

LGTM, provided @ezyang or @orionr are happy with our CMake strategy

shibatch commented May 2, 2018

@cpuhrsch Okay. And I will add VML-style API to the development plan.

@@ -0,0 +1,71 @@
IF(MSVC)
option(BUILD_SHARED_LIBS "Build shared libs" ON)
ELSE(MSVC)

ezyang May 2, 2018

Contributor

You don't have to fix this, but FYI modern cmake style is to not repeat the conditional in else/elseif/endif. So:

if(MSVC)
  ...
else()
  ...
endif()

option(SLEEF_SHOW_ERROR_LOG "Show cmake error log." OFF)
set(SLEEF_VERSION_MAJOR 3)
set(SLEEF_VERSION_MINOR 2)

ezyang May 2, 2018

Contributor

Picture in my head: Someone's gonna upgrade the sleef submodule and forget to update these version numbers. ;)

cpuhrsch May 2, 2018

Contributor

Yes :(

endif()
# Function used to generate safe command arguments for add_custom_command
function(command_arguments PROPNAME)

ezyang May 2, 2018

Contributor

This utility is from sleef proper?

cpuhrsch May 2, 2018

Contributor

Indeed, I didn't write this.

Contributor

ezyang commented May 2, 2018

I didn't read the rest of the diff, but I'm signing LGTM for the cmake stuff.

It would be nice to have some description of what the changes are, so it's easier to upstream the necessary changes (since you don't have to go through a diff with a fine-toothed comb to find out what was changed). Also, so you don't forget if you have to put this aside for a bit :)

@cpuhrsch cpuhrsch merged commit 88a7055 into pytorch:master May 2, 2018

35 of 36 checks passed

pr/caffe2-conda2-ubuntu16.04-test Build failed
continuous-integration/travis-ci/pr The Travis CI build passed
onnx-fb-universe
pr/caffe2-conda2-macos10.13-build Build successful
pr/caffe2-conda3-cuda9.0-cudnn7-ubuntu16.04-build Build successful
pr/caffe2-conda3-ubuntu16.04-build Build successful
pr/caffe2-py2-android-ubuntu16.04-build Build successful
pr/caffe2-py2-clang3.8-ubuntu16.04-build Build successful
pr/caffe2-py2-clang3.9-ubuntu16.04-build Build successful
pr/caffe2-py2-cuda8.0-cudnn5-ubuntu16.04-build Build successful
pr/caffe2-py2-cuda8.0-cudnn6-ubuntu16.04-test Build successful
pr/caffe2-py2-cuda8.0-cudnn7-ubuntu16.04-build Build successful
pr/caffe2-py2-cuda9.0-cudnn7-centos7-build Build successful
pr/caffe2-py2-cuda9.0-cudnn7-ubuntu16.04-test Build successful
pr/caffe2-py2-cuda9.0-cudnn7-windows-build Build successful
pr/caffe2-py2-gcc4.8-ubuntu14.04-test Build successful
pr/caffe2-py2-gcc4.9-ubuntu14.04-build Build successful
pr/caffe2-py2-gcc5-ubuntu16.04-test Build successful
pr/caffe2-py2-gcc6-ubuntu16.04-build Build successful
pr/caffe2-py2-gcc7-ubuntu16.04-build Build successful
pr/caffe2-py2-ios-macos10.13-build Build successful
pr/caffe2-py2-mkl-ubuntu16.04-test Build successful
pr/caffe2-py2-system-macos10.13-build Build successful
pr/pytorch-linux-trusty-py2.7 Build successful
pr/pytorch-linux-trusty-py2.7.9 Build successful
pr/pytorch-linux-trusty-py3.5 Build successful
pr/pytorch-linux-trusty-py3.6-gcc4.8 Build successful
pr/pytorch-linux-trusty-py3.6-gcc5.4 Build successful
pr/pytorch-linux-trusty-py3.6-gcc7.2 Build successful
pr/pytorch-linux-trusty-pynightly Build successful
pr/pytorch-linux-xenial-cuda8-cudnn6-py3 Build successful
pr/pytorch-linux-xenial-cuda9-cudnn7-py2 Build successful
pr/pytorch-linux-xenial-cuda9-cudnn7-py3 Build successful
pr/pytorch-linux-xenial-py3-clang5-asan Build successful
pr/pytorch-macos-10.13-py3 Build successful
pr/pytorch-win-ws2016-cuda9-cudnn7-py3 Build successful

Jorghi12 added a commit to wsttiger/pytorch that referenced this pull request May 10, 2018

weiyangfb added a commit to weiyangfb/pytorch that referenced this pull request Jun 11, 2018
