
torch.arange numerics are different after 1.7 update on CPU #47043

Closed
coquelin77 opened this issue Oct 29, 2020 · 30 comments
Labels: high priority, module: numerical-stability, module: regression, triaged


coquelin77 commented Oct 29, 2020

🐛 Bug

The floor operation is not working properly for the float64 dtype. Other dtypes were not tested.

To Reproduce

Steps to reproduce the behavior:

(base) python
Python 3.7.9 (default, Aug 31 2020, 12:42:55) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.arange(-5.0, 5.0, 1.4)
>>> x
tensor([-5.0000, -3.6000, -2.2000, -0.8000,  0.6000,  2.0000,  3.4000,  4.8000])
>>> x.floor()
tensor([-5., -4., -3., -1.,  0.,  2.,  3.,  4.])
>>> x = torch.arange(-5.0, 5.0, 1.4, dtype=torch.float64)
>>> x
tensor([-5.0000, -3.6000, -2.2000, -0.8000,  0.6000,  2.0000,  3.4000,  4.8000],
       dtype=torch.float64)
>>> x.floor()
tensor([-5., -4., -3., -1.,  0.,  1.,  3.,  4.], dtype=torch.float64)

Expected behavior

See the code snippet above: floor() of the float64 arange should match the float32 result, i.e. the element near 2 should floor to 2, not 1.

Environment

PyTorch version: 1.7.0
Is debug build: True
CUDA used to build PyTorch: 9.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.10 (x86_64)
GCC version: (Ubuntu 10.2.0-13ubuntu1) 10.2.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.7 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.7.0
[pip3] torchvision==0.8.1
[conda] _tflow_select             2.3.0                       mkl  
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               9.2                           0  
[conda] mkl                       2020.2                      256  
[conda] mkl-service               2.3.0            py37he904b0f_0  
[conda] mkl_fft                   1.2.0            py37h23d657b_0  
[conda] mkl_random                1.1.1            py37h0573a6f_0  
[conda] numpy                     1.19.2           py37h54aff64_0  
[conda] numpy-base                1.19.2           py37hfa32c7d_0  
[conda] pytorch                   1.7.0           py3.7_cuda9.2.148_cudnn7.6.3_0    pytorch
[conda] tensorflow                2.2.0           mkl_py37h6e9ce2d_0  
[conda] tensorflow-base           2.2.0           mkl_py37hd506778_0  
[conda] torchvision               0.8.1                 py37_cu92    pytorch

Additional context

Only tested on CPU.

cc @ezyang @gchanan @zou3519 @bdhirsh @heitorschueroff

coquelin77 (Author)

This is a machine epsilon error, I believe. Not sure why it ran clean on previous versions.


albanD commented Oct 29, 2020

You mean that the 2. returned by the arange is actually 2-eps?

Hi-pri to find the root cause.

albanD added the high priority, module: numerical-stability, module: regression, and triaged labels on Oct 29, 2020
coquelin77 (Author)

Yes, but only for this specific case. 1.4 in hex is 0x1.6666666666666p+0, which isn't exactly representable as a binary float, so the rounded step is slightly smaller than 1.4. By the time the arange reaches the 2. value it is just below 2.0 (within a couple of machine epsilons).
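
A quick way to see the effect in plain Python (just an illustration of the accumulated rounding; PyTorch's evaluation order differs slightly, but it lands on the same value):

>>> (1.4).hex()                # 1.4 rounds down slightly in binary64
'0x1.6666666666666p+0'
>>> acc = -5.0
>>> for _ in range(5):         # add the rounded step five times
...     acc += 1.4
...
>>> acc                        # just below 2, so floor() returns 1
1.9999999999999996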


albanD commented Oct 29, 2020

Ok that sounds good. I am not sure if there is anything we can do on the pytorch side here?


gchanan commented Oct 30, 2020

this doesn't explain why the behavior changes in 1.7 (I confirmed it does).

coquelin77 (Author)

Does it happen in the nightly build as well? Perhaps the release build for 1.7 was built with an updated compiler and something changed there.


gchanan commented Nov 2, 2020

I confirmed it's a problem on a recent version of master.


ngimel commented Nov 2, 2020

Leaving high priority to figure out why vectorized path gives a different answer.


mruberry commented Nov 2, 2020

Possibly changed by #38697.


gchanan commented Nov 2, 2020

Confirmed it is caused by #38697.

gchanan changed the title from "floor with float64 not working properly after 1.7 update on CPU" to "torch.arange numerics are different after 1.7 update on CPU" on Nov 3, 2020

gchanan commented Nov 3, 2020

Ok so #38697 doesn't really vectorize the computation, it unrolls it.

As pointed out by @ngimel, the difference is basically:

>>> a=torch.tensor(-5., dtype = torch.double)
>>> b=torch.tensor(1.4, dtype=torch.double)

>>> a+5*b - 2  # this was computed before; produces exactly 2
tensor(0., dtype=torch.float64)

>>> a+4*b+b-2  # this is what's computed now
tensor(-4.4409e-16, dtype=torch.float64)


gchanan commented Nov 3, 2020

Also, numpy produces the "exact" result (that doesn't mean it does for all cases, I didn't check the implementation):

>>> import numpy as np
>>> x = np.arange(-5.0, 5.0, 1.4, dtype=np.float64)
>>> np.floor(x)
array([-5., -4., -3., -1.,  0.,  2.,  3.,  4.])

kurtamohler self-assigned this on Nov 5, 2020
kurtamohler (Collaborator)

I noticed that CUDA has a similar problem, but in that case it is not a new issue.

import torch
print('device dtype arange-floor-result')
for device in ['cpu', 'cuda']:
    for dtype in [torch.float32, torch.float64]:
        result = torch.arange(-5, 5, 1.4, device=device, dtype=dtype)
        print(f'{device} {dtype} {result.floor()}')
print()
print('device dtype arange-item-5')
for device in ['cpu', 'cuda']:
    for dtype in [torch.float32, torch.float64]:
        result = torch.arange(-5, 5, 1.4, device=device, dtype=dtype)[5].item()
        print(f'{device} {dtype} {result}')

Running the above script in PyTorch 1.6 shows this:

device dtype arange-floor-result
cpu torch.float32 tensor([-5., -4., -3., -1.,  0.,  2.,  3.,  4.])
cpu torch.float64 tensor([-5., -4., -3., -1.,  0.,  2.,  3.,  4.], dtype=torch.float64)
cuda torch.float32 tensor([-5., -4., -3., -1.,  0.,  1.,  3.,  4.], device='cuda:0')
cuda torch.float64 tensor([-5., -4., -3., -1.,  0.,  1.,  3.,  4.], device='cuda:0', dtype=torch.float64)

device dtype arange-item-5
cpu torch.float32 2.0
cpu torch.float64 2.0
cuda torch.float32 1.9999998807907104
cuda torch.float64 1.9999999999999996

Running on the current main branch shows this:

device dtype arange-floor-result
cpu torch.float32 tensor([-5., -4., -3., -1.,  0.,  2.,  3.,  4.])
cpu torch.float64 tensor([-5., -4., -3., -1.,  0.,  1.,  3.,  4.], dtype=torch.float64)
cuda torch.float32 tensor([-5., -4., -3., -1.,  0.,  1.,  3.,  4.], device='cuda:0')
cuda torch.float64 tensor([-5., -4., -3., -1.,  0.,  1.,  3.,  4.], device='cuda:0', dtype=torch.float64)

device dtype arange-item-5
cpu torch.float32 2.0
cpu torch.float64 1.9999999999999996
cuda torch.float32 1.9999998807907104
cuda torch.float64 1.9999999999999996

Should a separate issue be opened for the difference between CUDA and CPU as well?


kurtamohler commented Nov 5, 2020

It seems to me that we can solve this issue by creating a Vec256::arange_index_offset() function to use instead of the Vec256::arange() function. It would look something like this:

  template<typename step_t>
  static Vec256<T> arange_index_offset(T base = static_cast<T>(0), step_t step = static_cast<step_t>(1), int64_t index_offset = 0) {
    Vec256<T> vec;
    int64_t index_end = index_offset + size();
    for (int64_t i = index_offset; i < index_end; i++) {
      // lane (i - index_offset) gets base + (global index i) * step, so every
      // element is computed with a single multiply from the original start value
      vec.values[i - index_offset] = base + i * step;
    }
    return vec;
  }

Currently, each call to Vec256::arange() underneath the torch.arange() call is given a different base argument, so the element-wise result is like start + (base_offset * step) + (vec256_offset * step). With the introduction of Vec256::arange_index_offset(), the element-wise result would be start + (index_offset + vec256_offset) * step, which should solve this issue as far as I understand.
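
For a concrete sense of the difference between the two groupings, here is the element at index 5 in plain Python doubles (no FMA involved; with 4 doubles per Vec256, element 5 is lane 1 of the second chunk, so base_offset = 4 and vec256_offset = 1):

>>> start, step = -5.0, 1.4
>>> (start + 4 * step) + 1 * step      # current grouping: chunk base first, then lane offset
1.9999999999999996
>>> start + (4 + 1) * step             # proposed grouping: one multiply from the global index
2.0

(As the later comments show, the compiled AVX2 build still differs even with the second grouping, because the compiler fuses the multiply and add into an FMA.)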


kurtamohler commented Nov 6, 2020

Well I implemented arange_index_offset and it did not solve the problem on its own. I tried disabling vectorization, just to see if the simplified formula would fix the numerical issue, but the issue still persists. This is the kernel without vectorization:

      cpu_serial_kernel(
          it,
          [start, step, &idx]() -> scalar_t {
            return start + step * (idx++);
          },
          {p_begin, p_end});

Since that formula doesn't fix the issue, I think the problem is slightly more nuanced. There must be an additional numerical problem somewhere. I'm looking into what it might be.


ngimel commented Nov 6, 2020

It's baffling, because the GPU kernel is doing pretty much what the formula above suggests, and yet it still has numerical issues.


kurtamohler commented Nov 6, 2020

I believe the issue is AVX2. Consider the following program, which essentially performs a simplified torch.arange(-5, 5, 1.4, dtype=torch.double) and prints the result:

arange_test.cpp

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int main() {
  double start = -5;
  double step = 1.4;
  int64_t num_steps = 8;

  printf("step: %.20e\n", step);
  printf("start: %.20e\n", start);

  // same per-element formula as the arange kernel: start + step * index
  for (int64_t ind = 0; ind < num_steps; ind++) {
    printf("%ld: %.20e\n", ind, (start + step * ind));
  }
}

Build it with and without AVX2:

Makefile

all: arange_test arange_test_avx2

arange_test: arange_test.cpp
	g++ -o $@ $^ -std=gnu++14 -O3

arange_test_avx2: arange_test.cpp
	g++ -o $@ $^ -std=gnu++14 -O3  -mavx2 -mfma -mno-avx256-split-unaligned-load -mno-avx256-split-unaligned-store -DCPU_CAPABILITY=AVX2 -DCPU_CAPABILITY_AVX2 -MD

We get different results with each binary:

$ ./arange_test
step: 1.39999999999999991118e+00
start: -5.00000000000000000000e+00
0: -5.00000000000000000000e+00
1: -3.60000000000000008882e+00
2: -2.20000000000000017764e+00
3: -8.00000000000000710543e-01
4: 5.99999999999999644729e-01
5: 2.00000000000000000000e+00
6: 3.39999999999999857891e+00
7: 4.79999999999999893419e+00

$ ./arange_test_avx2 
step: 1.39999999999999991118e+00
start: -5.00000000000000000000e+00
0: -5.00000000000000000000e+00
1: -3.60000000000000008882e+00
2: -2.20000000000000017764e+00
3: -8.00000000000000266454e-01
4: 5.99999999999999644729e-01
5: 1.99999999999999955591e+00
6: 3.39999999999999946709e+00
7: 4.79999999999999893419e+00

The AVX2 version matches today's PyTorch result if vectorization is turned off, and I suspect the non-AVX2 version matches what PyTorch used to give. The most notable difference is element 5: exactly 2 without AVX2, and slightly below 2 with AVX2.


kurtamohler commented Nov 6, 2020

To summarize, there are two CPU numerical problems:

  • Vec256::arange's formula introduces extra numerical error
  • AVX2 gives a different result than non-AVX2 code

To confirm, I ran torch.arange(-5, 5, 1.4, device='cpu', dtype=torch.float64).floor() in the following ways to get the following results:

Using the current PyTorch 1.7 release:

$ ATEN_CPU_CAPABILITY=avx2 python
>>> import torch
>>> torch.arange(-5, 5, 1.4, device='cpu', dtype=torch.float64).floor()
tensor([-5., -4., -3., -1.,  0.,  1.,  3.,  4.], dtype=torch.float64)

$ ATEN_CPU_CAPABILITY=avx python
>>> import torch
>>> torch.arange(-5, 5, 1.4, device='cpu', dtype=torch.float64).floor()
tensor([-5., -4., -3., -1.,  0.,  1.,  3.,  4.], dtype=torch.float64)

Using my Vec256::arange_index_offset instead of Vec256::arange:

$ ATEN_CPU_CAPABILITY=avx2 python
>>> import torch
>>> torch.arange(-5, 5, 1.4, device='cpu', dtype=torch.float64).floor()
tensor([-5., -4., -3., -1.,  0.,  1.,  3.,  4.], dtype=torch.float64)


$ ATEN_CPU_CAPABILITY=avx python
>>> import torch
>>> torch.arange(-5, 5, 1.4, device='cpu', dtype=torch.float64).floor()
tensor([-5., -4., -3., -1.,  0.,  2.,  3.,  4.], dtype=torch.float64)

So we only get the correct result in one of the above four cases: if we use AVX instead of AVX2 and we use Vec256::arange_index_offset instead of Vec256::arange.

I can create a PR to introduce Vec256::arange_index_offset. But I think we still need to discuss what (if anything) we can do about AVX2. Is there a way to disable AVX2 support for just one operation?


ezyang commented Nov 9, 2020

We can certainly disable AVX2 support for one operation; we already compile kernels multiple times per AVX version, so simplest would be just to wrap the operation in a preprocessor macro that omits the DispatchStub registration when AVX2 is detected.

Fixing the problem by disabling AVX2 does seem a bit goofy though...


mruberry commented Nov 10, 2020

I propose we could just revert the PR (or the part of it) that "vectorized" arange. This would leave the CUDA issue, but people don't seem as sensitive to that. We should also add a test case for this behavior, so that if we try to improve the performance of arange again, we detect the change.
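
Something along these lines, as a sketch of the kind of test (the test name is made up, and it pins the pre-1.7 values):

import torch

def test_arange_float64_step_rounding():
    # element 5 of this arange should be exactly 2.0, so floor() must give 2, not 1
    x = torch.arange(-5, 5, 1.4, dtype=torch.float64)
    assert x[5].item() == 2.0
    expected = torch.tensor([-5., -4., -3., -1., 0., 2., 3., 4.], dtype=torch.float64)
    assert torch.equal(x.floor(), expected)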


kurtamohler commented Nov 11, 2020

The problem appears to actually be the -mfma compiler flag. On my machine, PyTorch builds the AVX2 kernels with these flags:

-std=gnu++14 -O3 -mavx2 -mfma -mno-avx256-split-unaligned-load -mno-avx256-split-unaligned-store -DCPU_CAPABILITY=AVX2 -DCPU_CAPABILITY_AVX2 -MD

But AVX is built with these flags:

-std=gnu++14 -O3 -mavx -DCPU_CAPABILITY=AVX -DCPU_CAPABILITY_AVX -MD

Indeed, if I remove the -mfma flag from my small test program above, the AVX2 binary produces the correct result (meaning that element 5 of the result is exactly 2, not 1.999999...). And if I use our AVX flags, the result is correct as well.

Here's the code for this experiment: https://github.com/kurtamohler/pytorch-perf-test-scripts/tree/master/arange_compiler_flags

Looking at these in godbolt shows that an FMA instruction is in fact used only if -mfma is included.

I think it may be possible to solve this issue by using AVX2 intrinsics explicitly, to prevent the compiler from substituting an FMA. Or perhaps there is some macro that would tell the compiler not to use FMA for a specific section of code. I'll look into whether a macro like that exists.
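
For reference, the rounding difference can be reproduced in plain Python by emulating the exact (unrounded) intermediate product that an FMA keeps; this is only an arithmetic illustration, not how the kernel computes it:

>>> from fractions import Fraction
>>> start, step = -5.0, 1.4
>>> start + step * 5                             # two roundings: 5*step rounds up to exactly 7.0 first
2.0
>>> float(Fraction(start) + 5 * Fraction(step))  # one rounding at the end, which is what an FMA does
1.9999999999999996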

kurtamohler (Collaborator)

Another possible solution is to move the code that we don't want compiled with FMA into a separate file, and avoid passing the -mfma flag for that file.


kurtamohler commented Nov 11, 2020

I tried using intrinsics to see if that would avoid the FMA, and it didn't work at first, but then I found a way to get it working. First, I changed from this:

  template<typename step_t>
  static Vec256<double> arange_index_offset(double base = 0., step_t step = static_cast<step_t>(1), int64_t index_offset = 0) {
    return Vec256<double>(
        base + step * index_offset,
        base + step * (index_offset + 1),
        base + step * (index_offset + 2),
        base + step * (index_offset + 3));
  }

to this:

  template<typename step_t>
  static Vec256<double> arange_index_offset(double base = 0., step_t step = static_cast<step_t>(1), int64_t index_offset = 0) {
    Vec256<double> base_v(base, base, base, base);
    Vec256<double> step_v(step, step, step, step);
    Vec256<double> index_v(index_offset, index_offset + 1, index_offset + 2, index_offset + 3); 
    return _mm256_add_pd(base_v, _mm256_mul_pd(step_v, index_v));
  }

Evidently the _mm256_add_pd and _mm256_mul_pd calls get combined such that I'm effectively calling _mm256_fmadd_pd, so this didn't fix the problem. But then I wondered, what would happen if I just used the multiply intrinsic and then used regular add operations? Like this:

  template<typename step_t>
  static Vec256<double> arange_index_offset(double base = 0., step_t step = static_cast<step_t>(1), int64_t index_offset = 0) {
    Vec256<double> index_v(index_offset, index_offset + 1, index_offset + 2, index_offset + 3); 
    Vec256<double> step_v(step, step, step, step);
    Vec256<double> tmp = _mm256_mul_pd(step_v, index_v);
    return Vec256<double>(
      base + tmp.values[0],
      base + tmp.values[1],
      base + tmp.values[2],
      base + tmp.values[3]);
  }

This worked! Since I'm using an intrinsic only for the multiply and not the add, the two operations don't get combined, and now we get the proper behavior for the example we've been looking at:

$ ATEN_CPU_CAPABILITY=avx2 python
>>> import torch
>>> torch.arange(-5, 5, 1.4, device='cpu', dtype=torch.float64).floor()
tensor([-5., -4., -3., -1.,  0.,  2.,  3.,  4.], dtype=torch.float64)
>>> torch.arange(-5, 5, 1.4, device='cpu', dtype=torch.float64).storage()
 -5.0
 -3.6
 -2.2
 -0.8000000000000007
 0.5999999999999996
 2.0      <---- NOTE: this is exactly 2 now, and not 1.99999... like before
 3.3999999999999986
 4.799999999999999
[torch.DoubleStorage of size 8]

This result is exactly the same for ATEN_CPU_CAPABILITY=avx2, ATEN_CPU_CAPABILITY=avx, and ATEN_CPU_CAPABILITY=default. And it agrees with the Pytorch 1.6 result as well.

I'll admit that it's a bit of an odd solution, but it does work. Is it alright if we go ahead with this?

mruberry (Collaborator)

cc @VitalyFedyunin for the AVX build issues


ezyang commented Nov 13, 2020

@kurtamohler Thank you for the detailed analysis; in particular, narrowing it down to FMA is really useful. You didn't say so explicitly, but it makes sense that FMA would cause this problem, as it changes the rounding behavior of the operation: multiply-add does two rounds, whereas fused multiply-add only does one round. I'm guessing that in this particular case, that's enough extra precision to "witness" the fact that 1.4 isn't actually 1.4 in floating point, and give you the odd rounding behavior. In fact, arguably, the FMA result is "more correct" (because 1.4 is not actually 1.4).

All told, I'm not sure we should fix this. We compile with -mfma because performance matters to us, and so we don't mind slight changes in precision (after all, this is deep learning, cough cough). If the tradeoff is between "results are bit-for-bit identical no matter what level of vectorization you have" and "things might be faster at higher vectorization levels, at the cost of some slight numeric differences", I'd lean toward the latter (note that we are still bit-for-bit reproducible; you just have to stay at a consistent vectorization level). And keeping the FMA also helps us match CUDA behavior.

I'm open to other opinions though. Maybe @ngimel has more thoughts.


kurtamohler commented Nov 13, 2020

Yeah I think that's a reasonable argument @ezyang .

On the topic of better performance, I think using my Vec256::arange_index_offset instead of Vec256::arange should improve performance. I haven't actually measured it, but it does cut out one add and one multiply per call. Furthermore, we could make the operation actually vectorized for AVX, rather than just unrolled. I'd be more than happy to do a bit of benchmarking and update my PR to introduce these performance improvements (and keep using the FMA so results won't change); just let me know if that's something you'd like me to do.
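
A minimal way to compare, using nothing but the standard library (the step and size here are chosen arbitrarily, just to make the runtime measurable):

import timeit
import torch

for dtype in (torch.float32, torch.float64):
    t = timeit.timeit(
        lambda: torch.arange(-5, 5, 1.4e-6, dtype=dtype),  # ~7 million elements
        number=100)
    print(dtype, t)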


ezyang commented Nov 13, 2020

I think I agree arange_index_offset may indeed still be a good thing to do anyway, but it's mostly orthogonal to this issue report :)

@coquelin77 Coming back to you, we think this might just be "expected" behavior. Does this leave you in a bad spot, e.g., it is now not easy for you to get the right behavior one way or another?


ezyang commented Nov 13, 2020

For posterity, it IS possible to selectively disable fma: https://twitter.com/colesbury/status/1327331440992849923

coquelin77 (Author)

It makes things slightly more difficult in a small subset of functions, but I think it is in torch's best interest to target speed. Getting exact numeric consistency while making optimization passes is very difficult, if not impossible. I posted the issue primarily so that you would be aware of the changes brought on by the update.

Thanks for the good work!


ezyang commented Nov 16, 2020

Thanks, I'll close this as not a bug.

BTW, for the public record, another way to implement the arange as originally quoted is to do arange on integers, and then divide by 10 after the fact; you should be able to avoid precision issues entirely that way.
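
For the example in this issue, that would look something like this (the integer arange is exact, and the only rounding is the final division, so the element that should be 2 is exactly 20 / 10 = 2):

>>> import torch
>>> x = torch.arange(-50, 50, 14, dtype=torch.float64) / 10
>>> x
tensor([-5.0000, -3.6000, -2.2000, -0.8000,  0.6000,  2.0000,  3.4000,  4.8000],
       dtype=torch.float64)
>>> x.floor()
tensor([-5., -4., -3., -1.,  0.,  2.,  3.,  4.], dtype=torch.float64)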
