Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build ctc graph from symbols in batch mode #776

Merged
merged 7 commits into from
Jul 8, 2021

Conversation

pkufool
Copy link
Collaborator

@pkufool pkufool commented Jul 7, 2021

See k2-fsa/snowfall#220

The python api and results are as follows:

image

@pzelasko
Copy link
Contributor

pzelasko commented Jul 7, 2021

Maybe a naive question -- what is the difference between this implementation, and doing it via Python-level k2 operations? I remember there was some code from @csukuangfj for MMI that batched all the FSA ops for a given list of utterances. Just curious what is the gain.

k2/csrc/fsa_algo.cu Outdated Show resolved Hide resolved
k2/python/k2/__init__.py Outdated Show resolved Hide resolved
k2/python/k2/fsa_algo.py Show resolved Hide resolved
k2/python/k2/fsa_algo.py Outdated Show resolved Hide resolved
k2/python/k2/fsa_algo.py Outdated Show resolved Hide resolved
k2/python/tests/ctc_graph_test.py Outdated Show resolved Hide resolved
k2/python/tests/ctc_graph_test.py Show resolved Hide resolved
k2/python/tests/ctc_graph_test.py Outdated Show resolved Hide resolved
@csukuangfj
Copy link
Collaborator

csukuangfj commented Jul 7, 2021

Maybe a naive question -- what is the difference between this implementation, and doing it via Python-level k2 operations? I remember there was some code from @csukuangfj for MMI that batched all the FSA ops for a given list of utterances. Just curious what is the gain.

@pzelasko
There are several differences that I can think of (there may be more):

(1) There are no optional silences before and after each word in this pull-request.
(2) This pull-request constructs the resulting graph directly, instead of creating it from k2.intersect. It is presumably faster
as it involves fewer kernel calls, though it needs benchmarks.
(3) For CTC training, we don't need to construct the graph ctc_topo anymore with this pull-request.
(4) The labels in this pull-request are input tokens, and its aux_labels are also tokens/epsilons. While in snowfall, the labels are phones and aux_labels are words/epsilons.
(5) In snowfall, it supports words with multiple pronunciations since there is a lexicon, while this pull-request
supports only one.
(6) In snowfall, it supports both MMI and CTC. This pull-request supports only CTC training since there is no bigram P.

Create an FsaVec containing ctc graph FSAs, given a list of sequences of
symbols

@param [in] symbols Input symbol sequences (must not contain
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add a check inside the kernel that none of the input symbols is -1?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add a checking in the kernel set_num_arcs, which will enumerate all the symbols. I think it's enough.

sym_state_idx01 = state_idx01 / 2 - fsa_idx0,
remainder = state_idx01 % 2,
current_num_arcs = 2; // normally there are two arcs, self-loop
// and arc points to the next state
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// and arc points to the next state
// and arc pointing to the next state

} else {
int32_t current_symbol = symbol_data[sym_state_idx01],
// we set the next symbol of the last symbol to -1, so
// the following if clause will always be true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the comment:

so the following if clause will always be true

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that the last symbol state would always have 3 arcs. we handle next_symbol here, first, to avoid segment fault error, second, to confirm that current_symbol != next_symbol so we will assign the last symbol state 3 arcs.

will explain more in the docs.

});

ExclusiveSum(num_states_for, &num_states_for);
Array1<int32_t> &fsa_to_states_row_splits = num_states_for;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to introduce another name for num_states_for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After doing ExclusiveSum, num_states_for is actually the row_splits, using different names here just for easy understanding, we'll use fsa_to_states_row_splits to construct ragged_shape below.

if (need_arc_map) tensor = ToTorch(arc_map);
return std::make_pair(graph, tensor);
},
py::arg("labels"), py::arg("need_arc_map") = true, py::arg("gpu_id"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
py::arg("labels"), py::arg("need_arc_map") = true, py::arg("gpu_id"));
py::arg("symbols"), py::arg("need_arc_map") = true, py::arg("gpu_id"));

m.def(
"ctc_graph",
[](const Ragged<int32_t> &symbols, bool need_arc_map = true,
int32_t /*unused_gpu_id*/)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last argument has no default value but it is after an argument with default value.
It's not valid C++, I think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw the error in actions, it could compile successfully on linux, will fix it.

@danpovey
Copy link
Collaborator

danpovey commented Jul 8, 2021

I think there should be a boolean option to specify the type of CTC topology: "standard" or "simplified", where the "standard" one makes the blank mandatory between a pair of identical symbols.

@pkufool
Copy link
Collaborator Author

pkufool commented Jul 8, 2021

I think there should be a boolean option to specify the type of CTC topology: "standard" or "simplified", where the "standard" one makes the blank mandatory between a pair of identical symbols.

Ok, will add the option.

@pkufool
Copy link
Collaborator Author

pkufool commented Jul 8, 2021

Add the option standard, default True, the standard one makes the blank mandatory between a pair of identical symbols.

An example to demonstrate their difference is as follow:

image

@danpovey
Copy link
Collaborator

danpovey commented Jul 8, 2021 via email

k2/csrc/fsa_algo.cu Outdated Show resolved Hide resolved
// There is no arcs for final states
if (sym_state_idx01 == sym_final_state) {
current_num_arcs = 0;
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else {
} else if(!standard) {
current_num_arcs = 3;
} else {
// same as before the latest change
}

For non-standard topo, current_num_arcs is always 3. Put it into
a separate if statement can save some work.

k2/python/k2/fsa_algo.py Outdated Show resolved Hide resolved
k2/csrc/fsa_algo.cu Outdated Show resolved Hide resolved
k2/csrc/fsa_algo.cu Outdated Show resolved Hide resolved
pkufool and others added 2 commits July 9, 2021 06:24
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
@csukuangfj csukuangfj merged commit fd59f07 into k2-fsa:master Jul 8, 2021
@pkufool pkufool deleted the ctc_graph branch July 8, 2021 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants