
Comparison between k2 CTC loss and PyTorch CTC loss #575

Closed
zhu-han opened this issue Jan 8, 2021 · 57 comments
@zhu-han

zhu-han commented Jan 8, 2021

Has anyone compared the performance of the k2 CTC loss implementation and the CTCLoss in PyTorch?

I wrote a K2CTCLoss based on k2 to replace torch.nn.CTCLoss and ran some experiments in ESPnet. They show a gap between K2CTCLoss and torch.nn.CTCLoss.

The experiments are conducted on LibriSpeech 100h with CTC as the only training criterion. The acoustic model is a BLSTM- or Transformer-based encoder. For the CTC modeling unit, I tried char and bpe 5000. Here are the main conclusions from my experiments:

  • K2CTCLoss works with the BLSTM-based acoustic model, though torch.nn.CTCLoss reduces the loss faster;

  • K2CTCLoss didn't work with the Transformer. When using bpe 5000 as the CTC modeling unit, the loss curve of K2CTCLoss looks like this:
    [figure: k2_ctc_loss curve]
    In comparison, torch.nn.CTCLoss with the Transformer looks like this:
    [figure: torch_ctc_loss curve]

  • The above conclusions hold whether the CTC modeling unit is char or bpe 5000.

  • In snowfall, the CTC implementation is (1) acoustic feature -> phone -> word. I ran an experiment using K2CTCLoss with a (2) acoustic feature -> char structure. The WERs are (1) 12.84% and (2) 15.99%, respectively. So I think the K2CTCLoss implementation itself should be fine.

Could anyone give me some advice on how to make it work better? And does anyone know why it doesn't work well with the Transformer? Thanks!

@danpovey
Collaborator

danpovey commented Jan 8, 2021

That's interesting.
It's possible that it could be a bug in k2, but there are many places it could be.
I checked the documentation for torch.nn.CTCLoss but it is a little vague so it's hard to know whether they are attempting to implement the same thing as us.
One thing you could do which would be helpful to us is to try to evaluate k2's version of the loss given the model trained with PyTorch. If it looks similar to PyTorch's loss, it would likely indicate a bug in computing derivatives.
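A minimal sketch of that check, assuming the K2CTCLoss wrapper linked later in this thread takes the same (log_probs, targets, input_lengths, target_lengths) arguments as torch.nn.CTCLoss (that signature, the constructor arguments, and the toy tensors below are assumptions, not the actual ESPnet code):

```python
import torch

# Hypothetical import: the k2-based drop-in replacement discussed in this thread;
# its constructor/call signature is assumed to mirror torch.nn.CTCLoss.
from espnet.nets.pytorch_backend.ctc_graph import K2CTCLoss

T, N, C, S = 100, 4, 50, 20                       # frames, batch, classes, target length
log_probs = torch.randn(T, N, C).log_softmax(2)   # stand-in for the trained model's output
targets = torch.randint(1, C, (N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

torch_ctc = torch.nn.CTCLoss(blank=0, reduction='sum')
k2_ctc = K2CTCLoss(blank=0, reduction='sum')      # constructor arguments are an assumption

with torch.no_grad():
    print('torch loss:', torch_ctc(log_probs, targets, input_lengths, target_lengths).item())
    print('k2 loss   :', k2_ctc(log_probs, targets, input_lengths, target_lengths).item())
```

If the two forward values agree on a model trained with the PyTorch loss while training with K2CTCLoss still behaves differently, the problem is more likely in the backward pass.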

@danpovey
Collaborator

danpovey commented Jan 8, 2021

Also it would be nice if someone could compute the sum of the derivative (.grad) of our CTC loss and make sure the sum on each frame of each sequence is close to 1.0. [if we can somehow access the .grad w.r.t. the nnet output].
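A self-contained sketch of that per-frame check, shown here with torch.nn.functional.ctc_loss for concreteness (the same few lines apply to the k2-based loss once the .grad of its input tensor is accessible; the shapes and random inputs are placeholders):

```python
import torch
import torch.nn.functional as F

T, N, C, S = 50, 2, 10, 8
nnet_output = torch.randn(T, N, C, requires_grad=True)   # pre-softmax nnet output
log_probs = nnet_output.log_softmax(2)
log_probs.retain_grad()                                  # keep .grad w.r.t. the log-probs

targets = torch.randint(1, C, (N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                  blank=0, reduction='sum')
loss.backward()

# One value per (frame, sequence): the gradient summed over the class dimension.
per_frame_sum = log_probs.grad.sum(dim=2)   # shape (T, N)
print(per_frame_sum)
```

As the rest of the thread works out, the k2 loss taken directly on log-probabilities gives per-frame sums of magnitude about 1, while PyTorch folds the softmax normalization into its backward and gives sums near 0.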

@zhu-han
Author

zhu-han commented Jan 8, 2021

Using a randomly initialized Transformer model, the losses for the first 10 iterations computed with K2CTCLoss and torch.nn.CTCLoss are:

| iteration | K2CTCLoss | torch.nn.CTCLoss |
| --- | --- | --- |
| 1 | 3379.74 | 3385.51 |
| 2 | 2644.70 | 2643.39 |
| 3 | 2760.64 | 2765.41 |
| 4 | 2593.41 | 2595.71 |
| 5 | 2360.99 | 2363.25 |
| 6 | 2351.16 | 2346.32 |
| 7 | 3471.35 | 3478.80 |
| 8 | 2540.69 | 2540.17 |
| 9 | 2953.67 | 2955.29 |
| 10 | 2190.49 | 2189.26 |

Using a Transformer model trained with torch.nn.CTCLoss for 10 epochs, the losses for the first 10 iterations computed with K2CTCLoss and torch.nn.CTCLoss are:

| iteration | K2CTCLoss | torch.nn.CTCLoss |
| --- | --- | --- |
| 1 | 862.52 | 35.42 |
| 2 | 782.07 | 35.21 |
| 3 | 870.66 | 39.65 |
| 4 | 804.01 | 27.49 |
| 5 | 821.23 | 37.25 |
| 6 | 806.56 | 36.18 |
| 7 | 717.92 | 31.09 |
| 8 | 705.32 | 32.66 |
| 9 | 829.99 | 28.65 |
| 10 | 749.98 | 27.37 |

It seems that we get similar loss values with a randomly initialized model but not with a pretrained model.

@danpovey
Collaborator

danpovey commented Jan 8, 2021

Thanks a lot!! For the transformer model, can you clarify how you were training it? Was it with one of the two CTC losses?

@danpovey
Collaborator

danpovey commented Jan 8, 2021

And in the 2nd table, can you clarify whether you were training with the same loss function you were evaluating?
What I want is for you to train with one loss and also evaluate the objective with the other, to see whether the actual loss calculation is the same (there might be a bug in the derivative computation).

@zhu-han
Author

zhu-han commented Jan 8, 2021

In the 2nd table, the pretrained Transformer model was trained with torch.nn.CTCLoss only. After that, training and loss calculation used the same loss function.

@danpovey
Collaborator

danpovey commented Jan 8, 2021

OK, but what I want is for you to train with the torch loss and evaluate with the k2 CTC loss, with the same model. So the same code will evaluate the 2 objectives.

With the random Transformer model, what are the iterations? That is, what objective are you training with?

@zhu-han
Author

zhu-han commented Jan 8, 2021

Sorry for the misunderstanding. When training with torch.nn.CTCLoss and also evaluating K2CTCLoss, whether with a randomly initialized or a pretrained model, the two loss values are the same.

In the 1st table above (random Transformer model results), the two columns come from training with K2CTCLoss and torch.nn.CTCLoss as the objective, respectively.

@danpovey
Collaborator

danpovey commented Jan 8, 2021

So if you train with the PyTorch loss and evaluate also with the k2 one, you'll get the same value? Because in iteration 1 of your 2nd table, they're very different... if you showed iteration 0, would the k2 one be the same?

@zhu-han
Author

zhu-han commented Jan 8, 2021

I checked the code and found a bug which accidentally made the two loss functions the same.

The real result is: with a randomly initialized model, the two losses are similar. With a pretrained model, the two losses are very different. I will paste the results below.

@zhu-han
Author

zhu-han commented Jan 8, 2021

The training objective is torch.nn.CTCLoss, and the evaluation is performed with both K2CTCLoss and torch.nn.CTCLoss.

  • Using a randomly initialized Transformer model:

| iteration | K2CTCLoss | torch.nn.CTCLoss |
| --- | --- | --- |
| 1 | 3379.74 | 3385.51 |
| 2 | 2644.70 | 2643.39 |
| 3 | 2760.65 | 2765.41 |
| 4 | 2593.45 | 2595.71 |
| 5 | 2361.09 | 2363.25 |
| 6 | 2351.38 | 2346.32 |
| 7 | 3471.62 | 3478.80 |
| 8 | 2540.69 | 2540.17 |
| 9 | 2953.91 | 2955.29 |
| 10 | 2191.66 | 2189.26 |

  • Using a pretrained Transformer model trained with torch.nn.CTCLoss as the objective for 10 epochs:

| iteration | K2CTCLoss | torch.nn.CTCLoss |
| --- | --- | --- |
| 1 | 862.52 | 35.42 |
| 2 | 782.09 | 35.21 |
| 3 | 870.74 | 39.65 |
| 4 | 804.16 | 27.49 |
| 5 | 821.47 | 37.25 |
| 6 | 806.93 | 36.18 |
| 7 | 718.35 | 31.09 |
| 8 | 705.85 | 32.66 |
| 9 | 830.93 | 28.65 |
| 10 | 750.99 | 27.37 |

@danpovey
Collaborator

danpovey commented Jan 8, 2021

OK. Without seeing the code it will be hard to comment much further or help debug.
Fanjun says he will try to debug the derivatives of the k2 loss over the weekend.

@zhu-han
Author

zhu-han commented Jan 8, 2021

Thanks a lot for your help! If anyone is interested, my K2CTCLoss implementation is in https://github.com/zhu-han/espnet-k2/blob/main/espnet/nets/pytorch_backend/ctc_graph.py.

@danpovey
Collaborator

danpovey commented Jan 8, 2021

[re-posting directly, mail is unreliable.]
You are not using 'indices' to sort the FSAs in the graphs.
I'm not sure if our Fsa object has an operator [] that can take a Tensor, but it might.

Basically, your graphs are in the wrong order.
You could also possibly reorder targets and target_lengths before compiling the graph.

@danpovey
Collaborator

danpovey commented Jan 8, 2021 via email

@danpovey
Collaborator

danpovey commented Jan 8, 2021

possibly

decoding_graph = k2.index(decoding_graph, indices)

would work (not sure though)

@zhu-han
Author

zhu-han commented Jan 8, 2021

Thanks for your help! I will change my code accordingly and do the experiments.

@sw005320

sw005320 commented Jan 8, 2021

@zhu-han, thanks for sharing your interesting report.
I will also take a look at this.
We (@brianyan918) are also working on comparing pytorch CTC, warpCTC, k2 CTC, and gtn CTC.

@zhu-han
Author

zhu-han commented Jan 9, 2021

After fixing the graph order issue, K2CTCLoss works with the Transformer now. With bpe 500 as the CTC modeling unit, the loss curve looks like this:
[figure: modified_k2_ctc_loss curve]

And the previous results make sense now. Before batching, the training samples are sorted by input length, so with a smaller batch size all samples in a batch are more likely to have the same length. When the lengths are the same, the sorted text happens to match the order of the unsorted graphs. In my experiments, the BLSTM uses a smaller batch size than the Transformer (20 vs 256), so the BLSTM suffers less from this bug than the Transformer. That's why the BLSTM could work and the Transformer could not in the previous results.

Thanks a lot!

@zhu-han
Author

zhu-han commented Jan 9, 2021

@sw005320 My revised K2CTCLoss is in https://github.com/zhu-han/espnet-k2/blob/main/espnet/nets/pytorch_backend/ctc_graph.py. I will be glad to help on this.

@csukuangfj
Collaborator

I just added a gradient test for the k2 CTC loss. Please see #577

It shows that k2 CTC loss is identical to PyTorch CTC loss and warp-ctc when they are given the same input.

The gradients of k2 and PyTorch are also the same.

@zhu-han
Author

zhu-han commented Jan 10, 2021

Thanks! But since I found that models trained with the k2 CTC loss and the PyTorch CTC loss did show some differences, I added additional test cases based on test_random_case1 in ctc_gradients_test.py to check it. Here are some results:

  • When I run this test case directly, it passes;
  • When I change the parameters T and C to match my experiment's setup, i.e., T = 400 (a 16 s training sample with a 4× subsampling factor) and C = 5000 (bpe 5000 as the CTC modeling unit), the test case fails. Specifically, the gradient check assert torch.allclose(torch_activation.grad, k2_activation.grad, atol=1e-2) fails.
  • When I keep T as in the original and only change C to 5000, the gradient check passes. But when I keep C and change the sample length T to 400, the gradient check fails.

It seems that with longer samples, the difference is larger.

@zhu-han
Author

zhu-han commented Jan 10, 2021

And these are the results I got on LibriSpeech 100h using the PyTorch CTC loss and the k2 CTC loss:

  • PyTorch CTC loss:

| Criterion | Test clean | Test other |
| --- | --- | --- |
| CTC | 17.1 | 35.9 |
| Hybrid CTC/Attention | 10.3 | 27.1 |

  • k2 CTC loss:

| Criterion | Test clean | Test other |
| --- | --- | --- |
| CTC | 17.3 | 36.4 |
| Hybrid CTC/Attention | 10.6 | 27.5 |

Detailed setup:

  • k2 CTC loss: in k2.intersect_dense(), output_beam = 10.0.
  • Training: for both criteria, SpecAugment is not used. For CTC, epochs = 30 and batch size = 256. For hybrid CTC/Attention, epochs = 80 and CTC weight = 0.3.
  • Decoding: for both criteria, the best 5 models based on validation performance are averaged to get the final model, beam size = 10, and no language model is used. For hybrid CTC/Attention, CTC weight = 0.4.
  • Model: for hybrid CTC/Attention, a Transformer with 12 encoder layers and 6 decoder layers; attention heads = 4, attention dimension = 256, feed-forward dimension = 2048. For CTC, the same encoder structure is used.

@danpovey
Collaborator

Cool!
Regarding the gradient-check: sometimes there can be roundoff error that causes the posteriors on some frames to sum to a number different than 1. Can you compute those sums? I.e. the sum of the grad, per frame...

@zhu-han
Author

zhu-han commented Jan 10, 2021

Given the same input, the PyTorch CTC gradient sum per frame is:
[ 0.0000e+00, 2.3842e-07, -3.5763e-07, -2.3842e-07, -3.5763e-07,...]
and the k2 CTC gradient sum per frame is:
[-1.1921e-06, -2.3842e-07, 1.0729e-06, 8.3447e-07, 4.7684e-07,...]

@danpovey
Collaborator

danpovey commented Jan 10, 2021 via email

@zhu-han
Author

zhu-han commented Jan 10, 2021

Those were already the results after the softmax.
For example, the torch gradient for one frame is:

[ -9.4860,   2.4738,   5.9179,   4.7736,   5.5900,   2.8961,   6.4206,
            4.4688,   2.8942,   4.0882, -74.9657,   4.3691,   5.7488,   6.3485,
            6.4876,   2.9647,   3.2492,   4.7775,   3.5132,   2.7532,   4.7165]

Its sum is 5.2452e-06.

k2 gradient of this same frame is:

[ -9.4859,   2.4738,   5.9179,   4.7736,   5.5900,   2.8961,   6.4206,
            4.4688,   2.8942,   4.0882, -74.9657,   4.3691,   5.7488,   6.3485,
            6.4876,   2.9647,   3.2492,   4.7775,   3.5132,   2.7532,   4.7165]

And its sum is -8.5831e-06.

The two gradients differ in only one value: -9.4860 vs -9.4859 in the first dimension.

@danpovey
Collaborator

danpovey commented Jan 10, 2021 via email

@zhu-han
Author

zhu-han commented Jan 10, 2021

Oh, I misunderstood that. I thought you meant the loss was computed prior to the softmax. I will update the results.

@zhu-han
Author

zhu-han commented Jan 10, 2021

When I set the learning rate to 1 and use the k2 CTC loss, the gradient sum per frame of the tensor after log_softmax is -1. I'm not sure whether that is what you want to check.

@danpovey
Collaborator

Yes that sounds right. See if the same is true of PyTorch's one; the error could be there.

@zhu-han
Author

zhu-han commented Jan 10, 2021

For PyTorch, these values are close to 0, i.e., [-4.7088e-6, -4.6492e-6, ...]

@danpovey
Collaborator

Ah, I guess it does the normalization internally.
It's unlikely, IMO, that there is a roundoff problem in k2, given what you say. More likely it is in PyTorch itself, and the WER differences are most likely tuning-dependent.

@csukuangfj
Collaborator

csukuangfj commented Jan 10, 2021

For the simplest case,

#                          blk   a    b    c    d
activation = torch.tensor([0.2, 0.2, 0.2, 0.2, 0.2], requires_grad=True)
log_probs = activation.log_softmax(-1)
log_probs.retain_grad()

And if the target label is a,

  • for k2, log_probs.grad is [0, -1, 0, 0, 0]. log_probs.grad.sum() is -1
  • for PyTorch, log_probs.grad is [0.2, -0.8, 0.2, 0.2, 0.2]. log_probs.grad.sum() is 0
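A runnable version of the PyTorch side of this toy case (the k2 numbers above are taken from the comment; the shapes below assume T = 1, a batch of 1, blank = 0, and a single target label a = 1):

```python
import torch
import torch.nn.functional as F

#                            blk   a    b    c    d
activation = torch.tensor([[[0.2, 0.2, 0.2, 0.2, 0.2]]], requires_grad=True)  # (T=1, N=1, C=5)
log_probs = activation.log_softmax(2)
log_probs.retain_grad()

targets = torch.tensor([1])            # the single label "a"
input_lengths = torch.tensor([1])
target_lengths = torch.tensor([1])

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                  blank=0, reduction='sum')
loss.backward()

print(log_probs.grad)   # tensor([[[ 0.2, -0.8, 0.2, 0.2, 0.2]]]), per-frame sum 0
```

This reproduces the [0.2, -0.8, 0.2, 0.2, 0.2] reported above; the corresponding k2 gradient with respect to the same log-probs is the pure occupation term [0, -1, 0, 0, 0].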

@danpovey
Collaborator

PyTorch is obviously doing the log-softmax normalization as part of the CTC computation; in k2 those things are separate.

@danpovey
Collaborator

Do we know of any difference in speed?

@csukuangfj
Collaborator

> We (@brianyan918) are also working on comparing pytorch CTC, warpCTC, k2 CTC, and gtn CTC.

@sw005320 Could you share the progress with us? Does the comparison include speed differences?

@brianyan918

I tested these different CTC modes in ESPnet with these results on the VoxForge Italian eval set:

| Model | CER | WER |
| --- | --- | --- |
| Conformer (warpctc) | 8.5 | 30.0 |
| Conformer (pytorch) | 8.6 | 30.6 |
| Conformer (gtnctc) | 8.5 | 30.0 |
| Conformer (k2) | 8.7 | 30.8 |

Previously I was able to compare the speeds of pytorch vs warp vs gtn, but for k2 I used a different device. I'll provide an update with speed comparisons shortly.

@zhu-han
Author

zhu-han commented Jan 14, 2021

When training on LibriSpeech 100h for one epoch, the results are:

| Method | Time |
| --- | --- |
| PyTorch | 15.69 min |
| k2 | 17.78 min |

@danpovey
Collaborator

danpovey commented Jan 14, 2021 via email

@zhu-han
Author

zhu-han commented Jan 14, 2021

I followed https://k2.readthedocs.io/en/latest/installation.html#install-k2-from-source to install k2. Is this in release mode by default?

@csukuangfj
Collaborator

cmake -DCMAKE_BUILD_TYPE=Release ..

If you followed it step by step, then it is a Release build.

@zhu-han
Author

zhu-han commented Jan 14, 2021

Yes, it is in release mode then.

@csukuangfj
Collaborator

python3 -m k2.version

should tell you whether k2 was built in Release mode or in Debug mode.

@zhu-han
Author

zhu-han commented Jan 14, 2021

It shows Build type: Release.

@danpovey
Collaborator

danpovey commented Jan 14, 2021 via email

@zhu-han
Author

zhu-han commented Jan 14, 2021

Pulled on 2021/01/06.

@danpovey
Collaborator

danpovey commented Jan 14, 2021 via email

@csukuangfj
Collaborator

This pull request, #571 (comment), merged on Jan 8, made GetTransposeReordering 2-3x faster than before. I'm not sure how it would affect the training speed.

@zhu-han
Author

zhu-han commented Jan 14, 2021

I tried with the latest k2, and the training time is similar. The previous training time was 17.78 min and the latest one is 17.68 min.

@csukuangfj
Collaborator

Which version of CUDA Toolkit are you using? The change is enabled only for NVCC version > 10.1.105.

@zhu-han
Author

zhu-han commented Jan 14, 2021

I'm using CUDA 10.1, NVCC version 10.1.243

@danpovey
Collaborator

danpovey commented Jan 14, 2021 via email

@yaguanghu
Contributor

> I just added a gradient test for the k2 CTC loss. Please see #577
>
> It shows that k2 CTC loss is identical to PyTorch CTC loss and warp-ctc when they are given the same input.
>
> The gradients of k2 and PyTorch are also the same.

In test_case3, when I change the input to the torch_activation after the softmax and remove the softmax call below, the gradient of k2 does not seem identical to the PyTorch built-in CTC loss, e.g.
torch_activation = torch.tensor([[ [-5, -4, -3, -2, -1], [-10, -9, -8, -7, -6], [-15, -14, -13, -12, -11.], ]]).permute(1, 0, 2).detach().log_softmax(2).requires_grad_(True)
It seems strange; what might be the reason?

@danpovey
Collaborator

danpovey commented Feb 4, 2021

How different are they?
I'm not convinced that what we implemented is 100% the same as the standard CTC loss.
We may not be treating repeats of the same symbol quite the same way, e.g. "aa" at the nnet output
could represent either the single symbol "a" or "a" followed by "a".

@yaguanghu
Contributor

yaguanghu commented Feb 4, 2021

Repeated symbols are already handled by the current CTC topology just as the standard CTC loss does, and I don't think there's a difference between them.
I'm just curious why log_softmax affects the gradient.

@danpovey
Collaborator

danpovey commented Feb 5, 2021

It is definitely expected to affect the gradient. In our implementation we don't do that as part of the FSA computation; it is a separate component, so our CTC loss needs the log-softmax as its input.
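A small sketch of that point, reusing the toy case from earlier in the thread: back-propagating the k2-style gradient with respect to the log-probs ([0, -1, 0, 0, 0], i.e. minus the occupation probabilities) through an explicit log_softmax reproduces the gradient that PyTorch's CTC loss produces with its built-in normalization:

```python
import torch

activation = torch.tensor([0.2, 0.2, 0.2, 0.2, 0.2], requires_grad=True)
log_probs = activation.log_softmax(0)

# Gradient of a CTC loss taken directly on log-probs, for the toy case above:
# minus the occupation probabilities; its sum per frame is -1.
grad_wrt_log_probs = torch.tensor([0., -1., 0., 0., 0.])

# Chain it through the explicit log_softmax component.
log_probs.backward(grad_wrt_log_probs)
print(activation.grad)   # tensor([ 0.2, -0.8, 0.2, 0.2, 0.2]) -> sums to 0
```

Dropping the explicit log_softmax in front of the k2 loss, as in the modified test_case3 above, removes exactly this normalization term from the gradient, which is why the gradients no longer match PyTorch's.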

zhu-han closed this as completed Mar 8, 2021