[WIP]: Move k2.Fsa to C++ #814

csukuangfj · 2021-08-29T15:48:01Z

This is an initial attempt to move most, if not all, of the code for k2.Fsa to C++.

@danpovey I am not sure if this is the right direction to go.

My intention is to make k2 a pure C++ library, so that all we provide is a set of *.so libraries.
(Of course, this does not change the way k2 is installed.)

Everything currently written in Python can be moved to C++, I believe (though it needs some time to verify whether it is feasible).

The advantage is that it makes k2 more production-friendly.

The following example demonstrates the current commit in this pull request.
(Note: k2.Fsa.arc_sort() supports autograd.)

#!/usr/bin/env python3

import torch
from _k2.ragged import Fsa

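# Each arc line is "src_state dest_state label aux_label score";
# the last line gives the final state.
s = """
0 1 2 10 0.2
0 1 1 20 0.3
1 2 -1 30 4.
2
"""
fsa = Fsa(s, ["aux_labels"])
fsa.requires_grad_(True)
print("fsa.scores:", fsa.scores)

# Sorting reorders the arcs leaving state 0 by label, so the first two arcs swap.
sorted_fsa = fsa.arc_sort()
print("sorted_fsa.scores:", sorted_fsa.scores)
# Backprop through the sort: the gradient is mapped back to the original arc order.
(sorted_fsa.scores * torch.tensor([10, 20, 30])).sum().backward()
print(fsa.grad)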

print(fsa)
print("-" * 10, "after sort", "-" * 10)
print(sorted_fsa)

The output of the above code is:

fsa.scores: tensor([0.2000, 0.3000, 4.0000], requires_grad=True)
sorted_fsa.scores: tensor([0.3000, 0.2000, 4.0000], grad_fn=<ArcSortFunction>>)
tensor([20., 10., 30.])
k2.Fsa: 0 1 2 10 0.2
0 1 1 20 0.3
1 2 -1 30 4
2

---------- after sort ----------
k2.Fsa: 0 1 1 0.3
0 1 2 0.2
1 2 -1 4
2
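
For readers who want to see why the gradient comes back as [20., 10., 30.], here is a minimal pure-Python sketch of sorting with an arc map and routing gradients back through it. It is not the C++ ArcSortFunction from this pull request: the helper names and the sort key (src_state, label, dest_state) are assumptions made for illustration, and the special handling of label -1 is ignored.

```python
def arc_sort_forward(arcs, scores):
    """arcs: list of (src_state, dest_state, label) tuples.

    Returns (arc_map, sorted_scores), where arc_map[new_idx] is the
    index of the same arc in the original order.
    """
    # Assumed sort key: group by source state, then order by label.
    arc_map = sorted(
        range(len(arcs)),
        key=lambda i: (arcs[i][0], arcs[i][2], arcs[i][1]),
    )
    sorted_scores = [scores[i] for i in arc_map]
    return arc_map, sorted_scores


def arc_sort_backward(arc_map, grad_sorted, num_arcs):
    """Route gradients w.r.t. the sorted scores back to the original arcs."""
    grad = [0.0] * num_arcs
    for new_idx, old_idx in enumerate(arc_map):
        grad[old_idx] += grad_sorted[new_idx]
    return grad


# Same arcs and scores as in the example above (aux_labels omitted).
arcs = [(0, 1, 2), (0, 1, 1), (1, 2, -1)]
scores = [0.2, 0.3, 4.0]

arc_map, sorted_scores = arc_sort_forward(arcs, scores)
grad = arc_sort_backward(arc_map, [10.0, 20.0, 30.0], len(arcs))
print(arc_map, sorted_scores, grad)
```

The weights [10, 20, 30] are applied to the sorted scores, so the weight 10 lands on the arc that was originally second, which is why the first two gradient entries appear swapped.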

csukuangfj · 2021-08-29T15:49:39Z

k2/python/csrc/torch/v2/autograd/arc_sort.h

+namespace k2 {
+
+// see https://pytorch.org/tutorials/advanced/cpp_autograd
+class ArcSortFunction : public torch::autograd::Function<ArcSortFunction> {


This is how we do backprop for ArcSort.

Ah OK, cool.

danpovey · 2021-08-30T03:29:18Z

docs/source/_k2/ragged/tensor.py

+    pass
+
+
+class Tensor(object):


I'm not sure that it is a good idea to call this simply Tensor; IMO there is too much potential for confusion here(?). RaggedTensor might be a possibility? Or just Ragged, maybe?

The Tensor is put inside the module k2.ragged; its fully qualified name is k2.ragged.Tensor.

If it is called RaggedTensor, shall we put it in k2, or k2.ragged? That is, shall we call it
k2.RaggedTensor or k2.ragged.RaggedTensor?

By the way, this file is going to be deleted as the documentation is now contained in the C++ header files.
This pull request is based on #812.

If you like the idea (i.e., moving all possible parts from Python to C++), I will close this pull request and
add its commits to #812.

I think we could maybe keep it as k2.ragged.RaggedTensor but also import it into k2.
I still have to think about this. I'm impressed that you figured out how to do it, and we probably need to do this at
some point, but for me the main question in my mind is what is going to happen to class k2.Fsa. That has a bunch of
logic, e.g. about attributes, that I wonder how it will be transferred to C++; and once it is C++, who will be able to
maintain it!

danpovey · 2021-08-30T03:30:05Z

k2/python/csrc/torch/v2/doc/any.h

+
+Caution:
+  Currently, only support for dtypes ``torch.int32``, ``torch.float32``, and
+  ``torch.float64`` are implemented. We can support other types if needed.


I'm curious how we access arrays of type Arc from Python.

That is doable, in the same way we do it currently: convert Array1<Arc> to a 2-D int32 torch.Tensor. The last column of that tensor holds the float scores, whose bits are reinterpreted as int32.
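
To illustrate the bit reinterpretation described above, here is a small sketch using only the Python standard library. The row layout (src_state, dest_state, label, score) and the helper names are assumptions for illustration; the real code operates on a torch.Tensor rather than on Python lists.

```python
import struct


def float_to_int32_bits(x):
    """Reinterpret the 4 bytes of a float32 as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]


def int32_bits_to_float(bits):
    """Inverse of float_to_int32_bits."""
    return struct.unpack("<f", struct.pack("<i", bits))[0]


# One arc stored as a row of four int32 values; the last entry carries
# the score's bit pattern, not its numeric value.
arc_row = [0, 1, 2, float_to_int32_bits(0.5)]

recovered_score = int32_bits_to_float(arc_row[3])
print(arc_row, recovered_score)
```

Reading the last column as an integer gives a meaningless number (here 0x3F000000), so any consumer has to reinterpret it as float32 before using it as a score.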

csukuangfj · 2021-08-30T03:50:57Z

k2/python/csrc/torch/v2/ragged_arc.h

+  Ragged<Arc> fsa;
+  torch::Tensor scores;  // shares the same memory with fsa.values
+
+  std::unordered_map<std::string, torch::Tensor> tensor_attrs;


#814 (comment)

but for me the main question in my mind is what is going to happen to class k2.Fsa. That has a bunch of
logic, e.g. about attributes, that I wonder how it will be transferred to C++; and once it is C++, who will be able to
maintain it!

I would propose to let these three maps manage all attributes of k2.Fsa.

We can add __getattr__ and __setattr__ to RaggedArc when wrapping it to Python. In this way, we can support assigning
arbitrary attributes to an Fsa in Python.

If the attribute is a tensor, we save it to tensor_attrs.

If the attribute is a ragged tensor, we save it to ragged_tensor_attrs.

Otherwise, we save it to other_attrs.

all_attr_names ensures that each attribute name is stored in exactly one of the maps, i.e., it has exactly one type.
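
The routing proposed above can be sketched in plain Python as follows. This is only an illustration of the idea, not the pybind11 wrapper from the pull request: TensorLike, RaggedLike, and FsaAttrs are hypothetical stand-ins, while the map names follow the comment above.

```python
class TensorLike:
    """Hypothetical stand-in for torch.Tensor."""


class RaggedLike:
    """Hypothetical stand-in for a ragged tensor."""


class FsaAttrs:
    def __init__(self):
        # object.__setattr__ bypasses the routing in __setattr__ below.
        object.__setattr__(self, "tensor_attrs", {})
        object.__setattr__(self, "ragged_tensor_attrs", {})
        object.__setattr__(self, "other_attrs", {})
        object.__setattr__(self, "all_attr_names", set())

    def _maps(self):
        return (self.tensor_attrs, self.ragged_tensor_attrs, self.other_attrs)

    def __setattr__(self, name, value):
        # Drop any earlier entry so that a name lives in exactly one map.
        if name in self.all_attr_names:
            for m in self._maps():
                m.pop(name, None)
        if isinstance(value, TensorLike):
            self.tensor_attrs[name] = value
        elif isinstance(value, RaggedLike):
            self.ragged_tensor_attrs[name] = value
        else:
            self.other_attrs[name] = value
        self.all_attr_names.add(name)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        for m in self._maps():
            if name in m:
                return m[name]
        raise AttributeError(name)


fsa = FsaAttrs()
fsa.aux_labels = TensorLike()
fsa.aux_labels = RaggedLike()  # re-assigning with another type moves the entry
fsa.language = "en"
print(sorted(fsa.all_attr_names))
```

Re-assigning aux_labels with a different type moves it from tensor_attrs to ragged_tensor_attrs, which is the "exactly one type" invariant that all_attr_names is meant to guard.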

Ah, I see, I missed this.
I feel like the name RaggedArc is misleading, since this implements the full Fsa logic.
Maybe something like FsaWrapper or FsaClass would be clearer.
My main concern with this, I suppose, is that the code becomes less easy for others to understand,
view or maintain, because C++ is less accessible. But it does solve a potential problem with deployment
in a C++-only environment.
Also, if we go down this road, it's going to be a lot of work, and I wonder whether it can be broken up
into stages somehow? E.g. first adding the Any stuff, and later move class Fsa to C++?

Also, if we go down this road, it's going to be a lot of work, and I wonder whether it can be broken up
into stages somehow? E.g. first adding the Any stuff, and later move class Fsa to C++?

Yes, I agree. I will try to get RaggedAny done first.

My main concern with this, I suppose, is that the code becomes less easy for others to understand,

I think this can be mitigated by detailed documentation.

Sounds exciting!
@csukuangfj You could submit RaggedAny first; I can help with part of the work of moving class Fsa to C++, if possible.

@pkufool
Thanks!

csukuangfj · 2021-09-08T07:43:21Z

Shall we move this to a separate branch, say v2.0-pre?

danpovey · 2021-09-08T07:57:44Z

Yes, that's a good idea I think.

pkufool · 2021-09-09T02:55:50Z

Since this is not urgent, I want to have a try to take over this task, doing it gradually (about 30% of my time), finally finish this in about one or two months. @csukuangfj What do you think?

csukuangfj · 2021-09-09T03:13:00Z

Since this is not urgent, I want to have a try to take over this task, doing it gradually (about 30% of my time), finally finish this in about one or two months. @csukuangfj What do you think.

👍
You can go ahead and I will help you to do this.

pkufool · 2021-09-09T03:15:18Z

Since this is not urgent, I want to have a try to take over this task, doing it gradually (about 30% of my time), finally finish this in about one or two months. @csukuangfj What do you think.

👍
You can go ahead and I will help you to do this.

Thanks! Of course, I will need your help!

csukuangfj · 2021-09-12T11:24:34Z

k2/python/tests/fsa_v2_test.py

+                torch.cuda.set_device(1)
+                cls.devices.append(torch.device("cuda", 1))
+
+    def test_init_acceptor(self):


@danpovey

This shows how we access attributes of FSAs from Python, whose
underlying implementation is in C++.

I will show how to do autograd on these attributes for FSA operations in later commits.

csukuangfj · 2021-09-17T11:31:03Z

There are several TODOs:

fsa_from_unary_function_tensor

k2/k2/python/k2/utils.py

Line 423 in 8030001

def fsa_from_unary_function_tensor(src: Fsa, dest_arcs: _k2.RaggedArc,

fsa_from_unary_function_ragged

k2/k2/python/k2/utils.py

Line 467 in 8030001

def fsa_from_unary_function_ragged(src: Fsa,

fsa_from_binary_function_tensor

k2/k2/python/k2/utils.py

Line 552 in 8030001

def fsa_from_binary_function_tensor(a_fsa: Fsa, b_fsa: Fsa,

_IndexSelectFunction

k2/k2/python/k2/ops.py

Line 27 in 8030001

class _IndexSelectFunction(torch.autograd.Function):

After the above items are done, we can start to move fsa_algo.py to C++.

@pkufool You can select which item(s) you want to implement and I will take the remaining.
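
As background for the _IndexSelectFunction item, the following pure-Python sketch shows the general shape of an index-select with a hand-written backward pass: a gather in the forward direction and a scatter-add in the backward direction. It is not the code in k2/python/k2/ops.py; the function names are hypothetical, and the convention that an index of -1 yields 0 is an assumption here.

```python
def index_select_forward(src, index):
    """out[i] = src[index[i]]; an index of -1 is assumed to produce 0."""
    return [0.0 if i == -1 else src[i] for i in index]


def index_select_backward(index, grad_out, src_len):
    """Scatter-add the output gradients back onto the source positions."""
    grad_src = [0.0] * src_len
    for pos, i in enumerate(index):
        if i != -1:
            grad_src[i] += grad_out[pos]
    return grad_src


src = [1.0, 2.0, 3.0]
index = [2, -1, 0, 2]

out = index_select_forward(src, index)
grad_src = index_select_backward(index, [1.0, 1.0, 1.0, 1.0], len(src))
print(out, grad_src)
```

Because src[2] is selected twice, its gradient accumulates to 2.0, while src[1] is never selected and receives no gradient.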

pkufool · 2021-09-18T02:26:52Z

@csukuangfj I'll try the first two first.

* Add levenshtein graph * Contruct k2.RaggedTensor in python part * Fix review comments, return aux_labels in ctc_graph * Fix tests * Fix bug of accessing symbols * Fix bug of accessing symbols * Change argument name, add levenshtein_distance interface * Fix test error, add tests for levenshtein_distance * Fix review comments and add unit test for c++ side * update the interface of levenshtein alignment * Fix review comments

csukuangfj · 2021-09-19T08:25:22Z

k2/python/csrc/torch/v2/autograd/get_forward_scores.h

+             Its dtype is torch.float64 if use_double_scores is True.
+             The dtype is torch.float32 if use_double_scores is False.
+   */
+  static torch::Tensor forward(torch::autograd::AutogradContext *ctx,


This shows how to implement autograd for fsa.get_forward_scores().
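
For orientation, here is a minimal pure-Python sketch of what forward scores are, covering only the forward computation and not the backward pass from the header above. It assumes a single FSA whose states are numbered in topological order with state 0 as the start state; the function names and the log_semiring flag are illustrative. Python floats are double precision, which corresponds to the use_double_scores=True case.

```python
import math


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def forward_scores(arcs, num_states, log_semiring=True):
    """arcs: list of (src_state, dest_state, score) tuples.

    Returns one score per state: the log-sum (log semiring) or the
    max (tropical semiring) over all paths from state 0 to that state.
    """
    fwd = [-math.inf] * num_states
    fwd[0] = 0.0
    # Assumes dest_state > src_state, so visiting arcs by source state
    # finalizes fwd[src] before it is used.
    for src, dst, score in sorted(arcs, key=lambda a: a[0]):
        candidate = fwd[src] + score
        if log_semiring:
            fwd[dst] = log_add(fwd[dst], candidate)
        else:
            fwd[dst] = max(fwd[dst], candidate)
    return fwd


# Arcs of the example FSA from the first comment (labels omitted).
arcs = [(0, 1, 0.2), (0, 1, 0.3), (1, 2, 4.0)]
tropical = forward_scores(arcs, 3, log_semiring=False)
log_scores = forward_scores(arcs, 3, log_semiring=True)
print(tropical, log_scores)
```

In the tropical case state 1 keeps only the better of its two incoming arcs, whereas in the log semiring both arcs contribute to its score.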

* Construct RaggedArc from unary function tensor * Move fsa_from_unary_ragged and fsa_from_binary_tensor to C++ * add unit test to from unary function; add more functions to fsa * Remove some rabbish code * Add more unit tests and docs * Remove the unused code * Fix review comments, propagate attributes in To() * Change the argument type from RaggedAny to Ragged<int32_t> in autograd function * Delete declaration for template function * Apply suggestions from code review Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com> * Fix documentation errors Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

csukuangfj · 2021-09-30T11:10:45Z

Merging it into the branch v2.0-pre; we (@pkufool) will add more commits incrementally.

* [WIP]: Move k2.Fsa to C++ (#814) * Make k2 ragged tensor more PyTorch-y like. * Refactoring: Start to add the wrapper class AnyTensor. * Refactoring. * initial attempt to support autograd. * First working version with autograd for Sum(). * Fix comments. * Support __getitem__ and pickling. * Add more docs for k2.ragged.Tensor * Put documentation in header files. * Minor fixes. * Fix a typo. * Fix an error. * Add more doc. * Wrap RaggedShape. * [Not for Merge]: Move k2.Fsa related code to C++. * Remove extra files. * Update doc URL. (#821) * Support manipulating attributes of k2.ragged.Fsa. * Support indexing 2-axes RaggedTensor, Support slicing for RaggedTensor (#825) * Support index 2-axes RaggedTensor, Support slicing for RaggedTensor * Fix compiling errors * Fix unit test * Change RaggedTensor.data to RaggedTensor.values * Fix style * Add docs * Run nightly-cpu when pushing code to nightly-cpu branch * Prune with max_arcs in IntersectDense (#820) * Add checking for array constructor * Prune with max arcs * Minor fix * Fix typo * Fix review comments * Fix typo * Release v1.8 * Create a ragged tensor from a regular tensor. (#827) * Create a ragged tensor from a regular tensor. * Add tests for creating ragged tensors from regular tensors. * Add more tests. * Print ragged tensors in a way like what PyTorch is doing. * Fix test cases. * Trigger GitHub actions manually. (#829) * Run GitHub actions on merging. (#830) * Support printing ragged tensors in a more compact way. (#831) * Support printing ragged tensors in a more compact way. * Disable support for torch 1.3.1 * Fix test failures. 
* Draft of new RNN-T decoding method * Implements SubsampleRaggedShape * Implements PruneRagged * Rename subsample-> subset * Minor fixes * Fix comments Co-authored-by: Daniel Povey <dpovey@gmail.com> * Add Hash64 (#895) * Add hash64 * Fix tests * Resize hash64 * Fix comments * fix typo * Modified rnnt (#902) * Add modified mutual_information_recursion * Add modified rnnt loss * Using more efficient way to fix boundaries * Fix modified pruned rnnt loss * Fix the s_begin constrains of pruned loss for modified version transducer * Fix Stack (#925) * return the correct layer * unskip the test * Fix 'TypeError' of rnnt_loss_pruned function. (#924) * Fix 'TypeError' of rnnt_loss_simple function. Fix 'TypeError' exception when calling rnnt_loss_simple(..., return_grad=False) at validation steps. * Fix 'MutualInformationRecursionFunction.forward()' return type check error for pytorch < 1.10.x * Modify return type. * Add documents about class MutualInformationRecursionFunction. * Formated code style. * Fix rnnt_loss_smoothed return type. Co-authored-by: gzchenduisheng <gzchenduisheng@corp.netease.com> * Support torch 1.11.0 and CUDA 11.5 (#931) * Support torch 1.11.0 and CUDA 11.5 * Implement Rnnt decoding (#926) * first working draft of rnnt decoding * FormatOutput works... * Different num frames for FormatOutput works * Update docs * Fix comments, break advance into several stages, add more docs * Add python wrapper * Add more docs * Minor fixes * Fix comments * fix building docs (#933) * Release v1.14 * Remove unused DiscountedCumSum. (#936) * Fix compiler warnings. (#937) * Fix compiler warnings. * Minor fixes for RNN-T decoding. (#938) * Minor fixes for RNN-T decoding. * Removes arcs with label 0 from the TrivialGraph. (#939) * Implement linear_fsa_with_self_loops. (#940) * Implement linear_fsa_with_self_loops. 
* Fix the pruning with max-states (#941) * Rnnt allow different encoder/decoder dims (#945) * Allow different encoder and decoder dim in rnnt_pruning * Bug fixes * Supporting building k2 on Windows (#946) * Fix nightly windows CPU build (#948) * Fix nightly building k2 for windows. * Run nightly build only if there are new commits. * Check the versions of PyTorch and CUDA at the import time. (#949) * Check the versions of PyTorch and CUDA at the import time. * More straightforward message when CUDA support is missing (#950) * Implement ArrayOfRagged (#927) * Implement ArrayOfRagged * Fix issues and pass tests * fix style * change few statements of functions and move the definiation of template Array1OfRagged to header file * add offsets test code * Fix precision (#951) * Fix precision * Using different pow version for windows and *nix * Use int64_t pow * Minor fixes Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com> Co-authored-by: Piotr Żelasko <petezor@gmail.com> Co-authored-by: Jan "yenda" Trmal <jtrmal@gmail.com> Co-authored-by: Mingshuang Luo <37799481+luomingshuang@users.noreply.github.com> Co-authored-by: Ludwig Kürzinger <lumaku@users.noreply.github.com> Co-authored-by: Daniel Povey <dpovey@gmail.com> Co-authored-by: drawfish <duisheng.chen@gmail.com> Co-authored-by: gzchenduisheng <gzchenduisheng@corp.netease.com> Co-authored-by: alexei-v-ivanov <alexei_v_ivanov@ieee.org> Co-authored-by: Wang, Guanbo <wgb14@outlook.com> Co-authored-by: Nickolay V. Shmyrev <nshmyrev@gmail.com> Co-authored-by: LvHang <hanglyu1991@gmail.com> * Add C++ Rnnt demo (#947) * rnnt_demo compiles * Change graph in RnntDecodingStream from shared_ptr to const reference * Change out_map from Array1 to Ragged * Add rnnt demo * Minor fixes * Add more docs * Support log_add when getting best path * Port kaldi::ParseOptions for parsing commandline options. (#974) * Port kaldi::ParseOptions for parsing commandline options. * Add more tests. * More tests. 
* Greedy search and modified beam search for pruned stateless RNN-T. (#975) * First version of greedy search. * WIP: Implement modified beam search and greedy search for pruned RNN-T. * Implement modified beam search. * Fix compiler warnings * Fix style issues * Update torch_api.h to include APIs for CTC decoding Co-authored-by: Wei Kang <wkang@pku.org.cn> Co-authored-by: Piotr Żelasko <petezor@gmail.com> Co-authored-by: Jan "yenda" Trmal <jtrmal@gmail.com> Co-authored-by: pingfengluo <pingfengluo@gmail.com> Co-authored-by: Mingshuang Luo <37799481+luomingshuang@users.noreply.github.com> Co-authored-by: Ludwig Kürzinger <lumaku@users.noreply.github.com> Co-authored-by: Daniel Povey <dpovey@gmail.com> Co-authored-by: drawfish <duisheng.chen@gmail.com> Co-authored-by: gzchenduisheng <gzchenduisheng@corp.netease.com> Co-authored-by: alexei-v-ivanov <alexei_v_ivanov@ieee.org> Co-authored-by: Wang, Guanbo <wgb14@outlook.com> Co-authored-by: Nickolay V. Shmyrev <nshmyrev@gmail.com> Co-authored-by: LvHang <hanglyu1991@gmail.com>

csukuangfj added 15 commits August 25, 2021 22:27

Make k2 ragged tensor more PyTorch-y like.

f48563b

Refactoring: Start to add the wrapper class AnyTensor.

06a6d20

Refactoring.

2a70298

initial attempt to support autograd.

6bc05bf

First working version with autograd for Sum().

c7bb9d5

Fix comments.

d569b42

Support __getitem__ and pickling.

dcea808

Add more docs for k2.ragged.Tensor

cb4f00f

Put documentation in header files.

1b5c015

Minor fixes.

a8d4a8e

Fix a typo.

1f78c93

Fix an error.

892fb04

Add more doc.

fb96d97

Wrap RaggedShape.

2f01361

[Not for Merge]: Move k2.Fsa related code to C++.

626cc7a

csukuangfj commented Aug 29, 2021

danpovey reviewed Aug 30, 2021

csukuangfj commented Aug 30, 2021

Merge remote-tracking branch 'dan/master' into fsa

0e60a69

csukuangfj changed the base branch from master to v2.0-pre September 8, 2021 09:06

csukuangfj changed the title from "[Not for merge]: Move k2.Fsa to C++" to "[WIP]: Move k2.Fsa to C++" Sep 8, 2021

csukuangfj added 2 commits September 8, 2021 17:18

Remove extra files.

f11947e

Update doc URL. (k2-fsa#821)

9ac1e78

Merge remote-tracking branch 'dan/master' into fsa

44ff35b

Support manipulating attributes of k2.ragged.Fsa.

1dc7e1e

csukuangfj commented Sep 12, 2021

pkufool and others added 7 commits September 14, 2021 12:18

Prune with max_arcs in IntersectDense (k2-fsa#820)

2c28070

* Add checking for array constructor * Prune with max arcs * Minor fix * Fix typo * Fix review comments * Fix typo

Release v1.8

210175c

Create a ragged tensor from a regular tensor. (k2-fsa#827)

33a212c

* Create a ragged tensor from a regular tensor. * Add tests for creating ragged tensors from regular tensors. * Add more tests. * Print ragged tensors in a way like what PyTorch is doing. * Fix test cases.

Trigger GitHub actions manually. (k2-fsa#829)

971af7d

Run GitHub actions on merging. (k2-fsa#830)

646704e

Support printing ragged tensors in a more compact way. (k2-fsa#831)

8030001

* Support printing ragged tensors in a more compact way. * Disable support for torch 1.3.1 * Fix test failures.

csukuangfj and others added 6 commits September 18, 2021 17:10

Merge remote-tracking branch 'dan/master' into fsa

7029b1f

Release v1.9

f2fd997

Add Fsa.get_forward_scores.

b2cb9c0

Merge remote-tracking branch 'dan/master' into fsa

13408aa

Implement backprop for Fsa.get_forward_scores()

cbff6a1

csukuangfj commented Sep 19, 2021

csukuangfj merged commit 08198a9 into k2-fsa:v2.0-pre Sep 30, 2021
