Support torchscript. #839

Merged: 23 commits into k2-fsa:v2.0-pre on Oct 29, 2021

Conversation

csukuangfj
Collaborator

No description provided.

I have compared the output from C++ with that from Python.
The sums of the tensors are equal.
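For anyone reproducing this check: a minimal libtorch sketch (not part of this PR; the tensor name nnet_output is just a placeholder) of printing the sum on the C++ side, so it can be compared with print(nnet_output.sum().item()) on the Python side:

// Sketch only: print the sum of a tensor so it can be compared with the
// value printed by the Python script.
#include <torch/script.h>

#include <iostream>

void PrintTensorSum(const torch::Tensor &nnet_output) {
  std::cout << "sum: " << nnet_output.sum().item<float>() << "\n";
}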
@@ -0,0 +1,89 @@
#include <cassert>
#include <cstdio>
Collaborator

It would be nice if it were more obvious from the binary itself what it does and how to use it, or at least a pointer to where to find this info.

Collaborator

mm, I see it has a usage message; is it accurate? (./bin/decode.py)... and what does this binary do?
I'm just wondering what kind of thing a jit file can contain. Is it an nnet evaluation? With a fixed size?

Collaborator Author

This is just a prototype. I will add more documentation later.

The intention of this file is to replicate what we are doing in decode.py in icefall.

Collaborator Author

I'm just wondering what kind of thing a jit file can contain. Is it an nnet evaluation? With a fixed size?

As far as I know, a torch-scripted model behaves similarly to a torch.nn.Module. You can vary the input shape, i.e.,
the N and T in (N, T, C).

Checkpoints just save the parameters of a model, without topology information.
If you want to recover from a checkpoint, you first have to build an nn.Module through some Python code
that contains the definition of the model.

With a torch-scripted model, all you need is a single file (I don't know exactly what is saved in it): load
it in C++, feed some input into it, and get the output for later processing.
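As a rough sketch of what that looks like with libtorch (not the actual code in this PR; the file name and the input shape below are placeholders):

#include <torch/script.h>

#include <iostream>
#include <vector>

int main() {
  // Load a model saved from Python with torch.jit.script(model).save("cpu_jit.pt")
  torch::jit::script::Module module = torch::jit::load("cpu_jit.pt");
  module.eval();

  // A dummy (N, T, C) input; N and T can vary from call to call.
  torch::Tensor features = torch::rand({1, 100, 80});

  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(features);

  // Assumes the model's forward() returns a single tensor.
  torch::Tensor nnet_output = module.forward(inputs).toTensor();
  std::cout << nnet_output.sizes() << "\n";
  return 0;
}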

Contributor

the jit file has model weights and the model code translated into TorchScript intermediate representation (IR). AFAIK it doesn’t speed up the inference in any way unless the IR code is further optimized; e.g. it can be translated/exported to ONNX.

Also, I think it is possible to extend both TorchScript and ONNX with custom ops (by building shared library extensions: https://pytorch.org/tutorials/advanced/torch_script_custom_ops). Maybe it makes sense to build k2 custom ops and involve LM + decoding as part of the JIT model for deployment packaging; I guess a lot of people would find it very convenient (the same could be done for kaldifeat, for a single-file model that outputs transcripts directly). Some food for thought.
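For context, the custom-op mechanism in that tutorial boils down to registering a plain C++ function and building it into a shared library that TorchScript can load. A minimal sketch (the op name k2_demo::scale is made up for illustration):

#include <torch/script.h>

// Arguments are restricted to the types the TorchScript compiler understands
// (torch::Tensor, double, int64_t, ...); see the quote in the reply below.
torch::Tensor scale(torch::Tensor x, double factor) {
  return x * factor;
}

// After loading the shared library, the op is callable from TorchScript
// as torch.ops.k2_demo.scale(x, factor).
TORCH_LIBRARY(k2_demo, m) {
  m.def("scale", scale);
}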

Collaborator Author
csukuangfj commented Oct 7, 2021

maybe it makes sense to build k2 custom ops and involve LM + decoding as part of the JIT model for deployment packaging,

It turns out that the custom operators supported by PyTorch have a few restrictions, i.e.,

The TorchScript compiler understands a fixed number of types. Only these types can be used as arguments to your custom operator. Currently these types are: torch::Tensor, torch::Scalar, double, int64_t and std::vector s of these types. Note that only double and not float, and only int64_t and not other integral types such as int, short or long are supported.

We need to pass FSAs to decoding functions, which I think is not supported by custom ops.


[EDITED]: See https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html

Contributor

Oh, that’s too bad. I guess it’s possible to convert an FSA to a bunch of tensors and reconstruct it inside but it’s probably too much hassle…

at::set_num_interop_threads(1);

std::string usage = R"(
./bin/decode.py \
Collaborator Author

Oh, a typo here. It should be ./bin/decode

If you clone k2 and do

mkdir build
cd build
cmake ..
make decode

you will find a binary in ./bin/decode

@csukuangfj
Collaborator Author

Now it supports CTC decoding in C++ with a pre-trained model from icefall.
(Taking as inputs a file containing features, a file containing the scripted model, and a file containing a pre-trained BPE model,
it outputs the decoded transcripts).

$ ./bin/decode  --jit_pt ./cpu_jit.pt --feature_pt ./feature.pt  --bpe_model ./bpe.model
[I] /ceph-fj/fangjun/open-source/k2-torchscript/k2/torch/bin/decode.cc:200:int main(int, char**) THE GOOD NATURED AUDIENCE IN PITY TO FALLEN MAJESTY SHOWED FOR ONCE GREATER DEFERENCE TO THE KING THAN TO THE MINISTER AND SUNG THE PSALM WHICH THE FORMER HAD CALLED FOR THE OLD SERVANT TOLD HIM QUIETLY
AS THEY CREPT BACK TO GAMEWELL THAT THIS PASSAGEWAY LED FROM THE HUT AND THE PLEASANTS TO SHERWOOD AND THAT JEEOFFREY FOR THE TIME WAS HIDING WITH THE OUTLAWS IN THE FOREST BUT THE ESSENCE OF LUTHER'S LECTURES IS THERE

It will eventually take sound files as input and support n-gram LMs in decoding.

@danpovey
Collaborator

danpovey commented Oct 6, 2021 via email

Linking sentencepiece dynamically causes a segmentation fault at the end
of the process.
@danpovey
Collaborator

danpovey commented Oct 7, 2021

OK, so if I understand correctly, what is happening here is that you are using TorchScript to encode the model itself, then just coding the FSA-related parts in C++ directly?

And I believe the current status of this is that, since we haven't finished creating a C++ version of the Fsa object that we use in Python, things like automatic handling of attributes will not be happening right now, and these have to be coded directly?

BTW, it still does worry me a little that the attribute handling will become much more opaque once we convert it to C++. But I also don't see another way to easily enable C++ deployment.

@csukuangfj
Collaborator Author

csukuangfj commented Oct 7, 2021

The purpose of this pull request is to show that we can import a pre-trained neural network model from PyTorch and
use it with k2 in C++.
(You can see that the code is not well organized and not well documented.)

As for the attributes, this pull request only handles aux_labels.


We will use k2::RaggedArc later, which handles attribute propagation automatically.
See https://github.com/k2-fsa/k2/blob/v2.0-pre/k2/python/csrc/torch/v2/ragged_arc.h

the attribute handling will become much more opaque once we convert it to C++

I don't think it is opaque. We can use it just like we do in Python; e.g., c_fsa = a_fsa.IntersectDense(b_fsa)
will handle the attributes of c_fsa automatically.

k2::RaggedArc currently has Python dependencies, so I am not using it in this pull request.

Will remove Python dependencies from k2::RaggedArc later.

@danpovey
Collaborator

danpovey commented Oct 7, 2021

Ah I see, cool.
I am wondering whether a different name might be better than RaggedArc. Perhaps FsaClass?
Another possibility would be to rename Fsa, FsaVec, FsaOrVec to RawFsa, RawFsaVec, RawFsaOrVec, and rename RaggedArc to Fsa. That would be a better long-term solution, IMO, because the names would match those in Python, but it would likely require more messing about with documentation and so on.

@pkufool
Collaborator

pkufool commented Oct 8, 2021

Will remove Python dependencies from k2::RaggedArc later.

@csukuangfj Have you started on this yet? If not, I want to have a try. The plan is to replace the py::object with torch::IValue.

@csukuangfj
Collaborator Author

Will remove Python dependencies from k2::RaggedArc later.

@csukuangfj Have you started on this yet? If not, I want to have a try. The plan is to replace the py::object with torch::IValue.

Sure, you can go ahead.
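For reference, a minimal sketch of the idea (my understanding only, not the eventual implementation) of storing an attribute in an IValue instead of a py::object; torch::jit::IValue is the same c10::IValue type referred to above as torch::IValue:

#include <torch/script.h>

#include <cassert>

void IValueAttributeDemo() {
  // Store a tensor-valued attribute in a type-erased IValue ...
  torch::Tensor scores = torch::zeros({3});
  torch::jit::IValue value(scores);

  // ... and recover it later with a runtime type check; no py::object needed.
  assert(value.isTensor());
  torch::Tensor t = value.toTensor();
  (void)t;  // silence unused-variable warnings in this sketch
}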

@csukuangfj
Collaborator Author

Now we can load HLG.pt in C++ to reconstruct a k2::Fsa. Will use it to decode.
And later we can load G.pt in C++ and use it for rescoring.

@csukuangfj
Collaborator Author

Now it supports both CTC decoding and HLG decoding using pre-computed features.

Will implement decoding directly by reading sound files using https://github.com/csukuangfj/kaldifeat

@csukuangfj
Collaborator Author

Now it takes sound files as input and outputs the recognition results.

Examples

CTC decoding

$ ./bin/decode \
--use_ctc_decoding 1  \
--jit_pt ./cpu_jit.pt \
--bpe_model ./bpe.model \
./1089-134686-0000.wav  \
./1089-134686-0001.wav

[I] /ceph-fj/fangjun/open-source/k2-torchscript/build-cpu/k2/torch/bin/decode.cc:358:int main(int, char**)
Decoding result:

./1089-134686-0000.wav
HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOUR FAT AND SAUCE

./1089-134686-0001.wav
STUFFOR IT INTO YOU HIS BELLY COUNSELLED HIM

HLG decoding

$ ./bin/decode \
--use_ctc_decoding 0  \
--jit_pt ./cpu_jit.pt \
--hlg ./HLG.pt \
--word_table ./words.txt \
./1089-134686-0000.wav  \
./1089-134686-0001.wav

[I] /ceph-fj/fangjun/open-source/k2-torchscript/build-cpu/k2/torch/bin/decode.cc:358:int main(int, char**)
Decoding result:

./1089-134686-0000.wav
HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOUR FATTEN SAUCE

./1089-134686-0001.wav
STUFFED INTO YOU HIS BELLY COUNSELLED HIM

The ground truth corresponding to the sound files is:
[screenshot: ground-truth transcripts for the test waves]

@csukuangfj
Collaborator Author

Will add GPU support, refactor the code, and add more documentation.

@csukuangfj
Collaborator Author

Because of the issue mentioned in pytorch/pytorch#47493,
k2/torch/csrc/deserialization.cu causes compilation errors with NVCC when using torch 1.7.1.

That issue has been fixed in https://github.com/pytorch/pytorch/pull/47492/files, which is
only available from torch 1.8.0.

I am afraid we have to use at least torch 1.8.0 for deployment.

csukuangfj changed the title from "WIP: Support torchscript." to "Support torchscript." on Oct 11, 2021
@csukuangfj
Collaborator Author

csukuangfj commented Oct 11, 2021

Here are some decoding logs of this pull request.

You can get an impression of the decoding speed by looking at the timestamps of the logs.

GPU + CTC decoding

$ CUDA_VISIBLE_DEVICES=2 ./bin/decode --use_gpu 1 --use_ctc_decoding 1 --jit_pt ./cpu_jit.pt --bpe_model ./bpe.model ./1089-134686-0000.wav ./1089-134686-0001.wav ./1089-134686-0002.wav ./1089-134686-0003.wav ./1089-134686-0004.wav ./1089-134686-0005.wav ./1089-134686-0006.wav ./1089-134686-0007.wav ./1089-134686-0008.wav ./1089-134686-0009.wav
2021-10-12 00:04:18.597 [I] k2/torch/bin/decode.cu:121:int main(int, char**) Device: cuda:0
2021-10-12 00:04:23.648 [I] k2/torch/bin/decode.cu:140:int main(int, char**) Load wave files
2021-10-12 00:04:23.656 [I] k2/torch/bin/decode.cu:147:int main(int, char**) Compute features
2021-10-12 00:04:23.836 [I] k2/torch/bin/decode.cu:155:int main(int, char**) Load neural network model
2021-10-12 00:04:24.647 [I] k2/torch/bin/decode.cu:174:int main(int, char**) Compute nnet_output
2021-10-12 00:04:25.486 [I] k2/torch/bin/decode.cu:196:int main(int, char**) Build CTC topo
2021-10-12 00:04:25.488 [I] k2/torch/bin/decode.cu:208:int main(int, char**) Decoding
2021-10-12 00:04:25.629 [I] k2/torch/bin/decode.cu:254:int main(int, char**)
Decoding result:

./1089-134686-0000.wav
HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOUR FAT AND SAUCE

./1089-134686-0001.wav
STUFFOR IT INTO YOU HIS BELLY COUNSELLED HIM

./1089-134686-0002.wav
AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

./1089-134686-0003.wav
HELLO BERTI ANY GOOD IN YOUR MIND

./1089-134686-0004.wav
NUMBER TEN FRESH NELLIE IS WAITING ON YOU GOOD NIGHT HUSBAND

./1089-134686-0005.wav
THE MUSIC CAME NEARER AND HE RECALLED THE WORDS THE WORDS OF SHELLEY'S FRAGMENT UPON THE MOON WANDERING COMPANIONLESS PALE FOR WEARINESS

./1089-134686-0006.wav
THE DULL LIGHT FELL MORE FAINTLY UPON THE PAGE WHEREON ANOTHER EQUATION BEGAN TO UNFOLD ITSELF SLOWLY AND TO SPREAD ABROAD ITS WIDENING TAIL

./1089-134686-0007.wav
A COLD LUCID INDIFFERENCE REIGNED IN HIS SOUL

./1089-134686-0008.wav
THE CHAOS IN WHICH HIS ARDOUR EXTINGUISHED ITSELF WAS A COLD INDIFFERENT KNOWLEDGE OF HIMSELF

./1089-134686-0009.wav
AT MOST BY AN ALMS GIVEN TO A BEGGAR WHOSE BLESSING HE FLED FROM HE MIGHT HOPE WEARILY TO WIN FOR HIMSELF SOME MEASURE OF ACTUAL GRACE

CPU + CTC decoding

$ ./bin/decode --use_gpu 0 --use_ctc_decoding 1 --jit_pt ./cpu_jit.pt --bpe_model ./bpe.model ./1089-134686-0000.wav ./1089-134686-0001.wav ./1089-134686-0002.wav ./1089-134686-0003.wav ./1089-134686-0004.wav ./1089-134686-0005.wav ./1089-134686-0006.wav ./1089-134686-0007.wav ./1089-134686-0008.wav ./1089-134686-0009.wav
2021-10-12 00:11:01.817 [I] k2/torch/bin/decode.cu:121:int main(int, char**) Device: cpu
2021-10-12 00:11:01.818 [I] k2/torch/bin/decode.cu:140:int main(int, char**) Load wave files
2021-10-12 00:11:01.825 [I] k2/torch/bin/decode.cu:147:int main(int, char**) Compute features
2021-10-12 00:11:01.946 [I] k2/torch/bin/decode.cu:155:int main(int, char**) Load neural network model
2021-10-12 00:11:02.449 [I] k2/torch/bin/decode.cu:174:int main(int, char**) Compute nnet_output
2021-10-12 00:11:17.42 [I] k2/torch/bin/decode.cu:196:int main(int, char**) Build CTC topo
2021-10-12 00:11:17.44 [I] k2/torch/bin/decode.cu:208:int main(int, char**) Decoding
2021-10-12 00:11:21.848 [I] k2/torch/bin/decode.cu:254:int main(int, char**)
Decoding result:

./1089-134686-0000.wav
HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOUR FAT AND SAUCE

./1089-134686-0001.wav
STUFFOR IT INTO YOU HIS BELLY COUNSELLED HIM

./1089-134686-0002.wav
AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

./1089-134686-0003.wav
HELLO BERTI ANY GOOD IN YOUR MIND

./1089-134686-0004.wav
NUMBER TEN FRESH NELLIE IS WAITING ON YOU GOOD NIGHT HUSBAND

./1089-134686-0005.wav
THE MUSIC CAME NEARER AND HE RECALLED THE WORDS THE WORDS OF SHELLEY'S FRAGMENT UPON THE MOON WANDERING COMPANIONLESS PALE FOR WEARINESS

./1089-134686-0006.wav
THE DULL LIGHT FELL MORE FAINTLY UPON THE PAGE WHEREON ANOTHER EQUATION BEGAN TO UNFOLD ITSELF SLOWLY AND TO SPREAD ABROAD ITS WIDENING TAIL

./1089-134686-0007.wav
A COLD LUCID INDIFFERENCE REIGNED IN HIS SOUL

./1089-134686-0008.wav
THE CHAOS IN WHICH HIS ARDOUR EXTINGUISHED ITSELF WAS A COLD INDIFFERENT KNOWLEDGE OF HIMSELF

./1089-134686-0009.wav
AT MOST BY AN ALMS GIVEN TO A BEGGAR WHOSE BLESSING HE FLED FROM HE MIGHT HOPE WEARILY TO WIN FOR HIMSELF SOME MEASURE OF ACTUAL GRACE

GPU + HLG decoding

$ CUDA_VISIBLE_DEVICES=2 ./bin/decode --use_gpu 1 --use_ctc_decoding 0 --jit_pt ./cpu_jit.pt --hlg ./HLG.pt --word_table ./words.txt ./1089-134686-0000.wav ./1089-134686-0001.wav ./1089-134686-0002.wav ./1089-134686-0003.wav ./1089-134686-0004.wav ./1089-134686-0005.wav ./1089-134686-0006.wav ./1089-134686-0007.wav ./1089-134686-0008.wav ./1089-134686-0009.wav
2021-10-12 00:16:36.776 [I] k2/torch/bin/decode.cu:124:int main(int, char**) Device: cuda:0
2021-10-12 00:16:41.765 [I] k2/torch/bin/decode.cu:143:int main(int, char**) Load wave files
2021-10-12 00:16:41.771 [I] k2/torch/bin/decode.cu:150:int main(int, char**) Compute features
2021-10-12 00:16:41.778 [I] k2/torch/bin/decode.cu:158:int main(int, char**) Load neural network model
2021-10-12 00:16:42.424 [I] k2/torch/bin/decode.cu:177:int main(int, char**) Compute nnet_output
2021-10-12 00:16:43.77 [I] k2/torch/bin/decode.cu:204:int main(int, char**) Load HLG.pt
2021-10-12 00:16:46.867 [I] k2/torch/bin/decode.cu:212:int main(int, char**) Decoding
2021-10-12 00:16:47.194 [I] k2/torch/bin/decode.cu:258:int main(int, char**)
Decoding result:

./1089-134686-0000.wav
HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOUR FATTEN SAUCE

./1089-134686-0001.wav
STUFFED INTO YOU HIS BELLY COUNSELLED HIM

./1089-134686-0002.wav
AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

./1089-134686-0003.wav
HELLO BERTIE ANY GOOD IN YOUR MIND

./1089-134686-0004.wav
NUMBER TEN FRESH NELLIE IS WAITING ON YOU GOOD NIGHT HUSBAND

./1089-134686-0005.wav
THE MUSIC CAME NEARER AND HE RECALLED THE WORDS THE WORDS OF SHELLEY'S FRAGMENT UPON THE MOON WANDERING COMPANIONLESS PALE FOR WEARINESS

./1089-134686-0006.wav
THE DULL LIGHT FELL MORE FAINTLY UPON THE PAGE WHEREON ANOTHER EQUATION BEGAN TO UNFOLD ITSELF SLOWLY AND TO SPREAD ABROAD ITS WIDENING TAIL

./1089-134686-0007.wav
A COLD LUCID INDIFFERENCE REIGNED IN HIS SOUL

./1089-134686-0008.wav
THE CHAOS IN WHICH HIS ARDOUR EXTINGUISHED ITSELF WAS A COLD INDIFFERENT KNOWLEDGE OF HIMSELF

./1089-134686-0009.wav
AT MOST BY AN ALMS GIVEN TO A BEGGAR WHOSE BLESSING HE FLED FROM HE MIGHT HOPE WEARILY TO WIN FOR HIMSELF SOME MEASURE OF ACTUAL GRACE

CPU + HLG decoding

$ ./bin/decode --use_gpu 0 --use_ctc_decoding 0 --jit_pt ./cpu_jit.pt --hlg ./HLG.pt --word_table ./words.txt ./1089-134686-0000.wav ./1089-134686-0001.wav ./1089-134686-0002.wav ./1089-134686-0003.wav ./1089-134686-0004.wav ./1089-134686-0005.wav ./1089-134686-0006.wav ./1089-134686-0007.wav ./1089-134686-0008.wav ./1089-134686-0009.wav
2021-10-12 00:19:19.795 [I] k2/torch/bin/decode.cu:124:int main(int, char**) Device: cpu
2021-10-12 00:19:19.796 [I] k2/torch/bin/decode.cu:143:int main(int, char**) Load wave files
2021-10-12 00:19:19.802 [I] k2/torch/bin/decode.cu:150:int main(int, char**) Compute features
2021-10-12 00:19:19.934 [I] k2/torch/bin/decode.cu:158:int main(int, char**) Load neural network model
2021-10-12 00:19:20.450 [I] k2/torch/bin/decode.cu:177:int main(int, char**) Compute nnet_output
2021-10-12 00:19:34.963 [I] k2/torch/bin/decode.cu:204:int main(int, char**) Load HLG.pt
2021-10-12 00:19:37.507 [I] k2/torch/bin/decode.cu:212:int main(int, char**) Decoding
2021-10-12 00:19:39.265 [I] k2/torch/bin/decode.cu:258:int main(int, char**)
Decoding result:

./1089-134686-0000.wav
HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOUR FATTEN SAUCE

./1089-134686-0001.wav
STUFFED INTO YOU HIS BELLY COUNSELLED HIM

./1089-134686-0002.wav
AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

./1089-134686-0003.wav
HELLO BERTIE ANY GOOD IN YOUR MIND

./1089-134686-0004.wav
NUMBER TEN FRESH NELLIE IS WAITING ON YOU GOOD NIGHT HUSBAND

./1089-134686-0005.wav
THE MUSIC CAME NEARER AND HE RECALLED THE WORDS THE WORDS OF SHELLEY'S FRAGMENT UPON THE MOON WANDERING COMPANIONLESS PALE FOR WEARINESS

./1089-134686-0006.wav
THE DULL LIGHT FELL MORE FAINTLY UPON THE PAGE WHEREON ANOTHER EQUATION BEGAN TO UNFOLD ITSELF SLOWLY AND TO SPREAD ABROAD ITS WIDENING TAIL

./1089-134686-0007.wav
A COLD LUCID INDIFFERENCE REIGNED IN HIS SOUL

./1089-134686-0008.wav
THE CHAOS IN WHICH HIS ARDOUR EXTINGUISHED ITSELF WAS A COLD INDIFFERENT KNOWLEDGE OF HIMSELF

./1089-134686-0009.wav
AT MOST BY AN ALMS GIVEN TO A BEGGAR WHOSE BLESSING HE FLED FROM HE MIGHT HOPE WEARILY TO WIN FOR HIMSELF SOME MEASURE OF ACTUAL GRACE

There are 10 test waves:

$ soxi *.wav

Input File     : '1089-134686-0000.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:10.44 = 166960 samples ~ 782.625 CDDA sectors
File Size      : 334k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

Input File     : '1089-134686-0001.wav'                                                                                                     
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:03.27 = 52400 samples ~ 245.625 CDDA sectors
File Size      : 105k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

Input File     : '1089-134686-0002.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:06.62 = 106000 samples ~ 496.875 CDDA sectors
File Size      : 212k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

Input File     : '1089-134686-0003.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:02.68 = 42880 samples ~ 201 CDDA sectors
File Size      : 85.8k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

Input File     : '1089-134686-0004.wav'                                                                                                     
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:05.22 = 83441 samples ~ 391.13 CDDA sectors
File Size      : 167k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

Input File     : '1089-134686-0005.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:09.63 = 154160 samples ~ 722.625 CDDA sectors
File Size      : 308k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

Input File     : '1089-134686-0006.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:10.55 = 168880 samples ~ 791.625 CDDA sectors
File Size      : 338k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

Input File     : '1089-134686-0007.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:04.28 = 68400 samples ~ 320.625 CDDA sectors
File Size      : 137k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

Input File     : '1089-134686-0008.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:06.73 = 107680 samples ~ 504.75 CDDA sectors
File Size      : 215k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

Input File     : '1089-134686-0009.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:10.57 = 169200 samples ~ 793.125 CDDA sectors
File Size      : 338k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

Total Duration of 10 files: 00:01:10.00

csukuangfj added the "ready" label (Ready for review and trigger GitHub actions to run) on Oct 11, 2021
@csukuangfj
Collaborator Author

Ready for review.

@csukuangfj
Collaborator Author

I just wrote a Colab notebook to demonstrate how to use this pull request with a torch-scripted model from icefall.

Please see
https://colab.research.google.com/drive/1BIGLWzS36isskMXHKcqC9ysN6pspYXs_?usp=sharing

[screenshot of the Colab notebook]

Refer to the documentation in icefall (https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html#deployment-with-c)
for how to generate the required files. The above Colab notebook downloads some pre-trained models
from Hugging Face.

@danpovey
Collaborator

Cool!!

csukuangfj added and removed the "ready" label (Ready for review and trigger GitHub actions to run) on Oct 12, 2021
@csukuangfj
Collaborator Author

Will merge it first so that @pkufool can continue his work.

csukuangfj merged commit bc756af into k2-fsa:v2.0-pre on Oct 29, 2021
csukuangfj added a commit that referenced this pull request Nov 4, 2022
* [WIP]: Move k2.Fsa to C++ (#814)

* Make k2 ragged tensor more PyTorch-y like.

* Refactoring: Start to add the wrapper class AnyTensor.

* Refactoring.

* initial attempt to support autograd.

* First working version with autograd for Sum().

* Fix comments.

* Support __getitem__ and pickling.

* Add more docs for k2.ragged.Tensor

* Put documentation in header files.

* Minor fixes.

* Fix a typo.

* Fix an error.

* Add more doc.

* Wrap RaggedShape.

* [Not for Merge]: Move k2.Fsa related code to C++.

* Remove extra files.

* Update doc URL. (#821)

* Support manipulating attributes of k2.ragged.Fsa.

* Support indexing 2-axes RaggedTensor, Support slicing for RaggedTensor (#825)

* Support index 2-axes RaggedTensor, Support slicing for RaggedTensor

* Fix compiling errors

* Fix unit test

* Change RaggedTensor.data to RaggedTensor.values

* Fix style

* Add docs

* Run nightly-cpu when pushing code to nightly-cpu branch

* Prune with max_arcs in IntersectDense (#820)

* Add checking for array constructor

* Prune with max arcs

* Minor fix

* Fix typo

* Fix review comments

* Fix typo

* Release v1.8

* Create a ragged tensor from a regular tensor. (#827)

* Create a ragged tensor from a regular tensor.

* Add tests for creating ragged tensors from regular tensors.

* Add more tests.

* Print ragged tensors in a way like what PyTorch is doing.

* Fix test cases.

* Trigger GitHub actions manually. (#829)

* Run GitHub actions on merging. (#830)

* Support printing ragged tensors in a more compact way. (#831)

* Support printing ragged tensors in a more compact way.

* Disable support for torch 1.3.1

* Fix test failures.

* Add levenshtein alignment (#828)

* Add levenshtein graph

* Construct k2.RaggedTensor in python part

* Fix review comments, return aux_labels in ctc_graph

* Fix tests

* Fix bug of accessing symbols

* Fix bug of accessing symbols

* Change argument name, add levenshtein_distance interface

* Fix test error, add tests for levenshtein_distance

* Fix review comments and add unit test for c++ side

* update the interface of levenshtein alignment

* Fix review comments

* Release v1.9

* Add Fsa.get_forward_scores.

* Implement backprop for Fsa.get_forward_scores()

* Construct RaggedArc from unary function tensor (#30)

* Construct RaggedArc from unary function tensor

* Move fsa_from_unary_ragged and fsa_from_binary_tensor to C++

* add unit test to from unary function; add more functions to fsa

* Remove some rubbish code

* Add more unit tests and docs

* Remove the unused code

* Fix review comments, propagate attributes in To()

* Change the argument type from RaggedAny to Ragged<int32_t> in autograd function

* Delete declaration for template function

* Apply suggestions from code review

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Fix documentation errors

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

Co-authored-by: Wei Kang <wkang@pku.org.cn>

* Remove pybind dependencies from RaggedArc. (#842)

* Convert py::object and torch::IValue to each other

* Remove py::object from RaggedAny

* Remove py::object from RaggedArc

* Move files to torch directory

* remove unused files

* Add unit tests

* Remove v2 folder

* Remove unused code

* Remove unused files

* Fix review comments & fix github actions

* Check Ivalue contains RaggedAny

* Minor fixes

* Add attributes related unit test for FsaClass

* Fix mutable_grad in older pytorch version

* Fix github actions

* Fix github action PYTHONPATH

* Fix github action PYTHONPATH

* Link pybind11::embed

* import torch first (to fix macos github actions)

* try to fix macos ci

* Revert "Remove pybind dependencies from RaggedArc. (#842)" (#855)

This reverts commit daa98e7.

* Support torchscript. (#839)

* WIP: Support torchscript.

* Test jit module with faked data.

I have compared the output from C++ with that from Python.
The sums of the tensors are equal.

* Use precomputed features to test the correctness.

* Build DenseFsaVec from a torch tensor.

* Get lattice for CTC decoding.

* Support CTC decoding.

* Link sentencepiece statically.

Linking sentencepiece dynamically causes a segmentation fault at the end
of the process.

* Support loading HLG.pt

* Refactoring.

* Implement HLG decoding.

* Add WaveReader to read wave sound files.

* Take soundfiles as inputs.

* Refactoring.

* Support GPU.

* Minor fixes.

* Fix typos.

* Use kaldifeat v1.7

* Add copyright info.

* Fix compilation for torch >= 1.9.0

* Minor fixes.

* Fix comments.

* Fix style issues.

* Fix compiler warnings.

* Use `torch::class_` to register custom classes. (#856)

* Remove unused code (#857)

* Update doc URL. (#821)

* Support indexing 2-axes RaggedTensor, Support slicing for RaggedTensor (#825)

* Support index 2-axes RaggedTensor, Support slicing for RaggedTensor

* Fix compiling errors

* Fix unit test

* Change RaggedTensor.data to RaggedTensor.values

* Fix style

* Add docs

* Run nightly-cpu when pushing code to nightly-cpu branch

* Prune with max_arcs in IntersectDense (#820)

* Add checking for array constructor

* Prune with max arcs

* Minor fix

* Fix typo

* Fix review comments

* Fix typo

* Release v1.8

* Create a ragged tensor from a regular tensor. (#827)

* Create a ragged tensor from a regular tensor.

* Add tests for creating ragged tensors from regular tensors.

* Add more tests.

* Print ragged tensors in a way like what PyTorch is doing.

* Fix test cases.

* Trigger GitHub actions manually. (#829)

* Run GitHub actions on merging. (#830)

* Support printing ragged tensors in a more compact way. (#831)

* Support printing ragged tensors in a more compact way.

* Disable support for torch 1.3.1

* Fix test failures.

* Add levenshtein alignment (#828)

* Add levenshtein graph

* Construct k2.RaggedTensor in python part

* Fix review comments, return aux_labels in ctc_graph

* Fix tests

* Fix bug of accessing symbols

* Fix bug of accessing symbols

* Change argument name, add levenshtein_distance interface

* Fix test error, add tests for levenshtein_distance

* Fix review comments and add unit test for c++ side

* update the interface of levenshtein alignment

* Fix review comments

* Release v1.9

* Support a[b[i]] where both a and b are ragged tensors. (#833)

* Display import error solution message on MacOS (#837)

* Fix installation doc. (#841)

* Fix installation doc.

Remove Windows support. Will fix it later.

* Fix style issues.

* fix typos in the install instructions (#844)

* make cmake adhere to the modernized way of finding packages outside default dirs (#845)

* import torch first in the smoke tests to prevent SEGFAULT (#846)

* Add doc about how to install a CPU version of k2. (#850)

* Add doc about how to install a CPU version of k2.

* Remove property setter of Fsa.labels

* Update Ubuntu version in GitHub CI since 16.04 reaches end-of-life.

* Support PyTorch 1.10. (#851)

* Fix test cases for k2.union() (#853)

* Revert "Construct RaggedArc from unary function tensor (#30)" (#31)

This reverts commit cca7a54.

* Remove unused code.

* Fix github actions.

Avoid downloading all git LFS files.

* Enable github actions for v2.0-pre branch.

Co-authored-by: Wei Kang <wkang@pku.org.cn>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Jan "yenda" Trmal <jtrmal@gmail.com>

* Implements Cpp version FsaClass (#858)

* Add C++ version FsaClass

* Propagates attributes for CreateFsaVec

* Add more docs

* Remove the code that unnecessary needed currently

* Remove the code unnecessary for ctc decoding & HLG decoding

* Update k2/torch/csrc/deserialization.h

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Fix Comments

* Fix code style

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Using FsaClass for ctc decoding & HLG decoding (#862)

* Using FsaClass for ctc decoding & HLG decoding

* Update docs

* fix evaluating kFsaPropertiesValid (#866)

* Refactor deserialization code (#863)

* Fix compiler warnings about the usage of `tmpnam`.

* Refactor deserialization code.

* Minor fixes.

* Support rescoring with an n-gram LM during decoding (#867)

* Fix compiler warnings about the usage of `tmpnam`.

* Refactor deserialization code.

* Minor fixes.

* Add n-gram LM rescoring.

* Minor fixes.

* Clear cached FSA properties when its labels are changed.

* Fix typos.

* Refactor FsaClass. (#868)

Since FSAs in decoding contain only one or two attributes, we
don't need to use an IValue to add one more indirection. Just
check the type of the attribute and process it correspondingly.

* Refactor bin/decode.cu (#869)

* Add CTC decode.

* Add HLG decoding.

* Add n-gram LM rescoring.

* Remove unused files.

* Fix style issues.

* Add missing files.

* Add attention rescoring. (#870)

* WIP: Add attention rescoring.

* Finish attention rescoring.

* Fix style issues.

* Resolve comments. (#871)

* Resolve comments.

* Minor fixes.

* update v2.0-pre (#922)

* Update doc URL. (#821)

* Support indexing 2-axes RaggedTensor, Support slicing for RaggedTensor (#825)

* Support index 2-axes RaggedTensor, Support slicing for RaggedTensor

* Fix compiling errors

* Fix unit test

* Change RaggedTensor.data to RaggedTensor.values

* Fix style

* Add docs

* Run nightly-cpu when pushing code to nightly-cpu branch

* Prune with max_arcs in IntersectDense (#820)

* Add checking for array constructor

* Prune with max arcs

* Minor fix

* Fix typo

* Fix review comments

* Fix typo

* Release v1.8

* Create a ragged tensor from a regular tensor. (#827)

* Create a ragged tensor from a regular tensor.

* Add tests for creating ragged tensors from regular tensors.

* Add more tests.

* Print ragged tensors in a way like what PyTorch is doing.

* Fix test cases.

* Trigger GitHub actions manually. (#829)

* Run GitHub actions on merging. (#830)

* Support printing ragged tensors in a more compact way. (#831)

* Support printing ragged tensors in a more compact way.

* Disable support for torch 1.3.1

* Fix test failures.

* Add levenshtein alignment (#828)

* Add levenshtein graph

* Construct k2.RaggedTensor in python part

* Fix review comments, return aux_labels in ctc_graph

* Fix tests

* Fix bug of accessing symbols

* Fix bug of accessing symbols

* Change argument name, add levenshtein_distance interface

* Fix test error, add tests for levenshtein_distance

* Fix review comments and add unit test for c++ side

* update the interface of levenshtein alignment

* Fix review comments

* Release v1.9

* Support a[b[i]] where both a and b are ragged tensors. (#833)

* Display import error solution message on MacOS (#837)

* Fix installation doc. (#841)

* Fix installation doc.

Remove Windows support. Will fix it later.

* Fix style issues.

* fix typos in the install instructions (#844)

* make cmake adhere to the modernized way of finding packages outside default dirs (#845)

* import torch first in the smoke tests to prevent SEGFAULT (#846)

* Add doc about how to install a CPU version of k2. (#850)

* Add doc about how to install a CPU version of k2.

* Remove property setter of Fsa.labels

* Update Ubuntu version in GitHub CI since 16.04 reaches end-of-life.

* Support PyTorch 1.10. (#851)

* Fix test cases for k2.union() (#853)

* Fix out-of-boundary access (read). (#859)

* Update all the example codes in the docs (#861)

* Update all the example codes in the docs

I have run all the modified code with the newest version of k2.

* do some changes

* Fix compilation errors with CUB 1.15. (#865)

* Update README. (#873)

* Update README.

* Fix typos.

* Fix ctc graph (make aux_labels of final arcs -1) (#877)

* Fix LICENSE location to k2 folder (#880)

* Release v1.11. (#881)

It contains bugfixes.

* Update documentation for hash.h (#887)

* Update documentation for hash.h

* Typo fix

* Wrap MonotonicLowerBound (#883)

* Wrap MonotonicLowerBound

* Add unit tests

* Support int64; update documents

* Remove extra commas after 'TOPSORTED' property and fix RaggedTensor constructor parameter 'byte_offset' out-of-range bug. (#892)

Co-authored-by: gzchenduisheng <gzchenduisheng@corp.netease.com>

* Fix small typos (#896)

* Fix k2.ragged.create_ragged_shape2 (#901)

Before the fix, we have to specify both `row_splits` and `row_ids`
while calling `k2.create_ragged_shape2` even if one of them is `None`.

After this fix, we only need to specify one of them.

* Add rnnt loss (#891)

* Add cpp code of mutual information

* mutual information working

* Add rnnt loss

* Add pruned rnnt loss

* Minor Fixes

* Minor fixes & fix code style

* Fix cpp style

* Fix code style

* Fix s_begin values in padding positions

* Fix bugs related to boundary; Fix s_begin padding value; Add more tests

* Minor fixes

* Fix comments

* Add boundary to pruned loss tests

* Use more efficient way to fix boundaries (#906)

* Release v1.12 (#907)

* Change the sign of the rnnt_loss and add reduction argument (#911)

* Add right boundary constraints for s_begin

* Minor fixes to the interface of rnnt_loss to make it return positive value

* Fix comments

* Release a new version

* Minor fixes

* Minor fixes to the docs

* Fix building doc. (#908)

* Fix building doc.

* Minor fixes.

* Minor fixes.

* Fix building doc (#912)

* Fix building doc

* Fix flake8

* Support torch 1.10.x (#914)

* Support torch 1.10.x

* Fix installing PyTorch.

* Update INSTALL.rst (#915)

* Update INSTALL.rst

Setting a few additional env variables to enable compilation from source *with CUDA GPU computation support enabled*

* Fix torch/cuda/python versions in the doc. (#918)

* Fix torch/cuda/python versions in the doc.

* Minor fixes.

* Fix building for CUDA 11.6 (#917)

* Fix building for CUDA 11.6

* Minor fixes.

* Implement Unstack (#920)

* Implement unstack

* Remove code that does not relate to this PR

* Remove for loop on output dim; add Unstack ragged

* Add more docs

* Fix comments

* Fix docs & unit tests

* SubsetRagged & PruneRagged (#919)

* Extend interface of SubsampleRagged.

* Add interface for pruning ragged tensor.

* Draft of new RNN-T decoding method

* Implements SubsampleRaggedShape

* Implements PruneRagged

* Rename subsample-> subset

* Minor fixes

* Fix comments

Co-authored-by: Daniel Povey <dpovey@gmail.com>

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Jan "yenda" Trmal <jtrmal@gmail.com>
Co-authored-by: Mingshuang Luo <37799481+luomingshuang@users.noreply.github.com>
Co-authored-by: Ludwig Kürzinger <lumaku@users.noreply.github.com>
Co-authored-by: Daniel Povey <dpovey@gmail.com>
Co-authored-by: drawfish <duisheng.chen@gmail.com>
Co-authored-by: gzchenduisheng <gzchenduisheng@corp.netease.com>
Co-authored-by: alexei-v-ivanov <alexei_v_ivanov@ieee.org>

* Online decoding (#876)

* Add OnlineIntersectDensePruned

* Fix get partial results

* Support online decoding on intersect_dense_pruned

* Update documents

* Update v2.0-pre (#942)

* Update doc URL. (#821)

* Support indexing 2-axes RaggedTensor, Support slicing for RaggedTensor (#825)

* Support index 2-axes RaggedTensor, Support slicing for RaggedTensor

* Fix compiling errors

* Fix unit test

* Change RaggedTensor.data to RaggedTensor.values

* Fix style

* Add docs

* Run nightly-cpu when pushing code to nightly-cpu branch

* Prune with max_arcs in IntersectDense (#820)

* Add checking for array constructor

* Prune with max arcs

* Minor fix

* Fix typo

* Fix review comments

* Fix typo

* Release v1.8

* Create a ragged tensor from a regular tensor. (#827)

* Create a ragged tensor from a regular tensor.

* Add tests for creating ragged tensors from regular tensors.

* Add more tests.

* Print ragged tensors in a way like what PyTorch is doing.

* Fix test cases.

* Trigger GitHub actions manually. (#829)

* Run GitHub actions on merging. (#830)

* Support printing ragged tensors in a more compact way. (#831)

* Support printing ragged tensors in a more compact way.

* Disable support for torch 1.3.1

* Fix test failures.

* Add levenshtein alignment (#828)

* Add levenshtein graph

* Construct k2.RaggedTensor in python part

* Fix review comments, return aux_labels in ctc_graph

* Fix tests

* Fix bug of accessing symbols

* Fix bug of accessing symbols

* Change argument name, add levenshtein_distance interface

* Fix test error, add tests for levenshtein_distance

* Fix review comments and add unit test for c++ side

* update the interface of levenshtein alignment

* Fix review comments

* Release v1.9

* Support a[b[i]] where both a and b are ragged tensors. (#833)

* Display import error solution message on MacOS (#837)

* Fix installation doc. (#841)

* Fix installation doc.

Remove Windows support. Will fix it later.

* Fix style issues.

* fix typos in the install instructions (#844)

* make cmake adhere to the modernized way of finding packages outside default dirs (#845)

* import torch first in the smoke tests to prevent SEGFAULT (#846)

* Add doc about how to install a CPU version of k2. (#850)

* Add doc about how to install a CPU version of k2.

* Remove property setter of Fsa.labels

* Update Ubuntu version in GitHub CI since 16.04 reaches end-of-life.

* Support PyTorch 1.10. (#851)

* Fix test cases for k2.union() (#853)

* Fix out-of-boundary access (read). (#859)

* Update all the example codes in the docs (#861)

* Update all the example codes in the docs

I have run all the modified code with the newest version of k2.

* do some changes

* Fix compilation errors with CUB 1.15. (#865)

* Update README. (#873)

* Update README.

* Fix typos.

* Fix ctc graph (make aux_labels of final arcs -1) (#877)

* Fix LICENSE location to k2 folder (#880)

* Release v1.11. (#881)

It contains bugfixes.

* Update documentation for hash.h (#887)

* Update documentation for hash.h

* Typo fix

* Wrap MonotonicLowerBound (#883)

* Wrap MonotonicLowerBound

* Add unit tests

* Support int64; update documents

* Remove extra commas after 'TOPSORTED' property and fix RaggedTensor constructor parameter 'byte_offset' out-of-range bug. (#892)

Co-authored-by: gzchenduisheng <gzchenduisheng@corp.netease.com>

* Fix small typos (#896)

* Fix k2.ragged.create_ragged_shape2 (#901)

Before the fix, we have to specify both `row_splits` and `row_ids`
while calling `k2.create_ragged_shape2` even if one of them is `None`.

After this fix, we only need to specify one of them.

* Add rnnt loss (#891)

* Add cpp code of mutual information

* mutual information working

* Add rnnt loss

* Add pruned rnnt loss

* Minor Fixes

* Minor fixes & fix code style

* Fix cpp style

* Fix code style

* Fix s_begin values in padding positions

* Fix bugs related to boundary; Fix s_begin padding value; Add more tests

* Minor fixes

* Fix comments

* Add boundary to pruned loss tests

* Use more efficient way to fix boundaries (#906)

* Release v1.12 (#907)

* Change the sign of the rnnt_loss and add reduction argument (#911)

* Add right boundary constraints for s_begin

* Minor fixes to the interface of rnnt_loss to make it return positive value

* Fix comments

* Release a new version

* Minor fixes

* Minor fixes to the docs

* Fix building doc. (#908)

* Fix building doc.

* Minor fixes.

* Minor fixes.

* Fix building doc (#912)

* Fix building doc

* Fix flake8

* Support torch 1.10.x (#914)

* Support torch 1.10.x

* Fix installing PyTorch.

* Update INSTALL.rst (#915)

* Update INSTALL.rst

Setting a few additional env variables to enable compilation from source *with CUDA GPU computation support enabled*

* Fix torch/cuda/python versions in the doc. (#918)

* Fix torch/cuda/python versions in the doc.

* Minor fixes.

* Fix building for CUDA 11.6 (#917)

* Fix building for CUDA 11.6

* Minor fixes.

* Implement Unstack (#920)

* Implement unstack

* Remove code that does not relate to this PR

* Remove for loop on output dim; add Unstack ragged

* Add more docs

* Fix comments

* Fix docs & unit tests

* SubsetRagged & PruneRagged (#919)

* Extend interface of SubsampleRagged.

* Add interface for pruning ragged tensor.

* Draft of new RNN-T decoding method

* Implements SubsampleRaggedShape

* Implements PruneRagged

* Rename subsample-> subset

* Minor fixes

* Fix comments

Co-authored-by: Daniel Povey <dpovey@gmail.com>

* Add Hash64 (#895)

* Add hash64

* Fix tests

* Resize hash64

* Fix comments

* fix typo

* Modified rnnt (#902)

* Add modified mutual_information_recursion

* Add modified rnnt loss

* Using more efficient way to fix boundaries

* Fix modified pruned rnnt loss

* Fix the s_begin constraints of pruned loss for the modified version of the transducer

* Fix Stack (#925)

* return the correct layer

* unskip the test

* Fix 'TypeError' of rnnt_loss_pruned function. (#924)

* Fix 'TypeError' of rnnt_loss_simple function.

Fix 'TypeError' exception when calling rnnt_loss_simple(..., return_grad=False)  at validation steps.

* Fix 'MutualInformationRecursionFunction.forward()' return type check error for pytorch < 1.10.x

* Modify return type.

* Add documents about class MutualInformationRecursionFunction.

* Formatted code style.

* Fix rnnt_loss_smoothed return type.

Co-authored-by: gzchenduisheng <gzchenduisheng@corp.netease.com>

* Support torch 1.11.0 and CUDA 11.5 (#931)

* Support torch 1.11.0 and CUDA 11.5

* Implement Rnnt decoding (#926)

* first working draft of rnnt decoding

* FormatOutput works...

* Different num frames for FormatOutput works

* Update docs

* Fix comments, break advance into several stages, add more docs

* Add python wrapper

* Add more docs

* Minor fixes

* Fix comments

* fix building docs (#933)

* Release v1.14

* Remove unused DiscountedCumSum. (#936)

* Fix compiler warnings. (#937)

* Fix compiler warnings.

* Minor fixes for RNN-T decoding. (#938)

* Minor fixes for RNN-T decoding.

* Removes arcs with label 0 from the TrivialGraph. (#939)

* Implement linear_fsa_with_self_loops. (#940)

* Implement linear_fsa_with_self_loops.

* Fix the pruning with max-states (#941)

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Jan "yenda" Trmal <jtrmal@gmail.com>
Co-authored-by: Mingshuang Luo <37799481+luomingshuang@users.noreply.github.com>
Co-authored-by: Ludwig Kürzinger <lumaku@users.noreply.github.com>
Co-authored-by: Daniel Povey <dpovey@gmail.com>
Co-authored-by: drawfish <duisheng.chen@gmail.com>
Co-authored-by: gzchenduisheng <gzchenduisheng@corp.netease.com>
Co-authored-by: alexei-v-ivanov <alexei_v_ivanov@ieee.org>
Co-authored-by: Wang, Guanbo <wgb14@outlook.com>

* update v2.0-pre (#953)

* Update doc URL. (#821)

* Support indexing 2-axes RaggedTensor, Support slicing for RaggedTensor (#825)

* Support index 2-axes RaggedTensor, Support slicing for RaggedTensor

* Fix compiling errors

* Fix unit test

* Change RaggedTensor.data to RaggedTensor.values

* Fix style

* Add docs

* Run nightly-cpu when pushing code to nightly-cpu branch

* Prune with max_arcs in IntersectDense (#820)

* Add checking for array constructor

* Prune with max arcs

* Minor fix

* Fix typo

* Fix review comments

* Fix typo

* Release v1.8

* Create a ragged tensor from a regular tensor. (#827)

* Create a ragged tensor from a regular tensor.

* Add tests for creating ragged tensors from regular tensors.

* Add more tests.

* Print ragged tensors in a way like what PyTorch is doing.

* Fix test cases.

* Trigger GitHub actions manually. (#829)

* Run GitHub actions on merging. (#830)

* Support printing ragged tensors in a more compact way. (#831)

* Support printing ragged tensors in a more compact way.

* Disable support for torch 1.3.1

* Fix test failures.

* Add levenshtein alignment (#828)

* Add levenshtein graph

* Construct k2.RaggedTensor in python part

* Fix review comments, return aux_labels in ctc_graph

* Fix tests

* Fix bug of accessing symbols

* Fix bug of accessing symbols

* Change argument name, add levenshtein_distance interface

* Fix test error, add tests for levenshtein_distance

* Fix review comments and add unit test for c++ side

* update the interface of levenshtein alignment

* Fix review comments

* Release v1.9

* Support a[b[i]] where both a and b are ragged tensors. (#833)

* Display import error solution message on MacOS (#837)

* Fix installation doc. (#841)

* Fix installation doc.

Remove Windows support. Will fix it later.

* Fix style issues.

* fix typos in the install instructions (#844)

* make cmake adhere to the modernized way of finding packages outside default dirs (#845)

* import torch first in the smoke tests to prevent SEGFAULT (#846)

* Add doc about how to install a CPU version of k2. (#850)

* Add doc about how to install a CPU version of k2.

* Remove property setter of Fsa.labels

* Update Ubuntu version in GitHub CI since 16.04 reaches end-of-life.

* Support PyTorch 1.10. (#851)

* Fix test cases for k2.union() (#853)

* Fix out-of-boundary access (read). (#859)

* Update all the example codes in the docs (#861)

* Update all the example codes in the docs

I have run all the modified code with the newest version of k2.

* do some changes

* Fix compilation errors with CUB 1.15. (#865)

* Update README. (#873)

* Update README.

* Fix typos.

* Fix ctc graph (make aux_labels of final arcs -1) (#877)

* Fix LICENSE location to k2 folder (#880)

* Release v1.11. (#881)

It contains bugfixes.

* Update documentation for hash.h (#887)

* Update documentation for hash.h

* Typo fix

* Wrap MonotonicLowerBound (#883)

* Wrap MonotonicLowerBound

* Add unit tests

* Support int64; update documents

* Remove extra commas after 'TOPSORTED' property and fix RaggedTensor constructor parameter 'byte_offset' out-of-range bug. (#892)

Co-authored-by: gzchenduisheng <gzchenduisheng@corp.netease.com>

* Fix small typos (#896)

* Fix k2.ragged.create_ragged_shape2 (#901)

Before the fix, we have to specify both `row_splits` and `row_ids`
while calling `k2.create_ragged_shape2` even if one of them is `None`.

After this fix, we only need to specify one of them.

* Add rnnt loss (#891)

* Add cpp code of mutual information

* mutual information working

* Add rnnt loss

* Add pruned rnnt loss

* Minor Fixes

* Minor fixes & fix code style

* Fix cpp style

* Fix code style

* Fix s_begin values in padding positions

* Fix bugs related to boundary; Fix s_begin padding value; Add more tests

* Minor fixes

* Fix comments

* Add boundary to pruned loss tests

* Use more efficient way to fix boundaries (#906)

* Release v1.12 (#907)

* Change the sign of the rnnt_loss and add reduction argument (#911)

* Add right boundary constraints for s_begin

* Minor fixes to the interface of rnnt_loss to make it return positive value

* Fix comments

* Release a new version

* Minor fixes

* Minor fixes to the docs

* Fix building doc. (#908)

* Fix building doc.

* Minor fixes.

* Minor fixes.

* Fix building doc (#912)

* Fix building doc

* Fix flake8

* Support torch 1.10.x (#914)

* Support torch 1.10.x

* Fix installing PyTorch.

* Update INSTALL.rst (#915)

* Update INSTALL.rst

Setting a few additional env variables to enable compilation from source *with CUDA GPU computation support enabled*

* Fix torch/cuda/python versions in the doc. (#918)

* Fix torch/cuda/python versions in the doc.

* Minor fixes.

* Fix building for CUDA 11.6 (#917)

* Fix building for CUDA 11.6

* Minor fixes.

* Implement Unstack (#920)

* Implement unstack

* Remove code that does not relate to this PR

* Remove for loop on output dim; add Unstack ragged

* Add more docs

* Fix comments

* Fix docs & unit tests

* SubsetRagged & PruneRagged (#919)

* Extend interface of SubsampleRagged.

* Add interface for pruning ragged tensor.

* Draft of new RNN-T decoding method

* Implements SubsampleRaggedShape

* Implements PruneRagged

* Rename subsample-> subset

* Minor fixes

* Fix comments

Co-authored-by: Daniel Povey <dpovey@gmail.com>

* Add Hash64 (#895)

* Add hash64

* Fix tests

* Resize hash64

* Fix comments

* fix typo

* Modified rnnt (#902)

* Add modified mutual_information_recursion

* Add modified rnnt loss

* Using more efficient way to fix boundaries

* Fix modified pruned rnnt loss

* Fix the s_begin constraints of pruned loss for the modified version of the transducer

* Fix Stack (#925)

* return the correct layer

* unskip the test

* Fix 'TypeError' of rnnt_loss_pruned function. (#924)

* Fix 'TypeError' of rnnt_loss_simple function.

Fix 'TypeError' exception when calling rnnt_loss_simple(..., return_grad=False)  at validation steps.

* Fix 'MutualInformationRecursionFunction.forward()' return type check error for pytorch < 1.10.x

* Modify return type.

* Add documents about class MutualInformationRecursionFunction.

* Formatted code style.

* Fix rnnt_loss_smoothed return type.

Co-authored-by: gzchenduisheng <gzchenduisheng@corp.netease.com>

* Support torch 1.11.0 and CUDA 11.5 (#931)

* Support torch 1.11.0 and CUDA 11.5

* Implement Rnnt decoding (#926)

* first working draft of rnnt decoding

* FormatOutput works...

* Different num frames for FormatOutput works

* Update docs

* Fix comments, break advance into several stages, add more docs

* Add python wrapper

* Add more docs

* Minor fixes

* Fix comments

* fix building docs (#933)

* Release v1.14

* Remove unused DiscountedCumSum. (#936)

* Fix compiler warnings. (#937)

* Fix compiler warnings.

* Minor fixes for RNN-T decoding. (#938)

* Minor fixes for RNN-T decoding.

* Removes arcs with label 0 from the TrivialGraph. (#939)

* Implement linear_fsa_with_self_loops. (#940)

* Implement linear_fsa_with_self_loops.

* Fix the pruning with max-states (#941)

* Rnnt allow different encoder/decoder dims (#945)

* Allow different encoder and decoder dim in rnnt_pruning

* Bug fixes

* Supporting building k2 on Windows (#946)

* Fix nightly windows CPU build (#948)

* Fix nightly building k2 for windows.

* Run nightly build only if there are new commits.

* Check the versions of PyTorch and CUDA at the import time. (#949)

* Check the versions of PyTorch and CUDA at the import time.

* More straightforward message when CUDA support is missing (#950)

* Implement ArrayOfRagged (#927)

* Implement ArrayOfRagged

* Fix issues and pass tests

* fix style

* change a few statements of functions and move the definition of the template Array1OfRagged to the header file

* add offsets test code

* Fix precision (#951)

* Fix precision

* Using different pow version for windows and *nix

* Use int64_t pow

* Minor fixes

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Jan "yenda" Trmal <jtrmal@gmail.com>
Co-authored-by: Mingshuang Luo <37799481+luomingshuang@users.noreply.github.com>
Co-authored-by: Ludwig Kürzinger <lumaku@users.noreply.github.com>
Co-authored-by: Daniel Povey <dpovey@gmail.com>
Co-authored-by: drawfish <duisheng.chen@gmail.com>
Co-authored-by: gzchenduisheng <gzchenduisheng@corp.netease.com>
Co-authored-by: alexei-v-ivanov <alexei_v_ivanov@ieee.org>
Co-authored-by: Wang, Guanbo <wgb14@outlook.com>
Co-authored-by: Nickolay V. Shmyrev <nshmyrev@gmail.com>
Co-authored-by: LvHang <hanglyu1991@gmail.com>

* Add C++ Rnnt demo (#947)

* rnnt_demo compiles

* Change graph in RnntDecodingStream from shared_ptr to const reference

* Change out_map from Array1 to Ragged

* Add rnnt demo

* Minor fixes

* Add more docs

* Support log_add when getting best path

* Port kaldi::ParseOptions for parsing commandline options. (#974)

* Port kaldi::ParseOptions for parsing commandline options.

* Add more tests.

* More tests.

* Greedy search and modified beam search for pruned stateless RNN-T. (#975)

* First version of greedy search.

* WIP: Implement modified beam search and greedy search for pruned RNN-T.

* Implement modified beam search.

* Fix compiler warnings

* Fix style issues

* Update torch_api.h to include APIs for CTC decoding

Co-authored-by: Wei Kang <wkang@pku.org.cn>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Jan "yenda" Trmal <jtrmal@gmail.com>
Co-authored-by: pingfengluo <pingfengluo@gmail.com>
Co-authored-by: Mingshuang Luo <37799481+luomingshuang@users.noreply.github.com>
Co-authored-by: Ludwig Kürzinger <lumaku@users.noreply.github.com>
Co-authored-by: Daniel Povey <dpovey@gmail.com>
Co-authored-by: drawfish <duisheng.chen@gmail.com>
Co-authored-by: gzchenduisheng <gzchenduisheng@corp.netease.com>
Co-authored-by: alexei-v-ivanov <alexei_v_ivanov@ieee.org>
Co-authored-by: Wang, Guanbo <wgb14@outlook.com>
Co-authored-by: Nickolay V. Shmyrev <nshmyrev@gmail.com>
Co-authored-by: LvHang <hanglyu1991@gmail.com>