
[WIP] Move connect topsort addepsilonselfloops to C++ #838

Closed
wants to merge 5 commits

Conversation

@pkufool (Collaborator) commented Oct 1, 2021

  • Move connect to C++
  • Move top_sort to C++
  • Move add_epsilon_self_loops to C++
  • Refine CreateFsaVec
  • Refine ToString

@pkufool pkufool changed the title Move connect topsort addepsilonandselfloops to C++ [WIP] Move connect topsort addepsilonandselfloops to C++ Oct 1, 2021
@pkufool pkufool changed the title [WIP] Move connect topsort addepsilonandselfloops to C++ [WIP] Move connect topsort addepsilonselfloops to C++ Oct 2, 2021
@pkufool pkufool changed the title [WIP] Move connect topsort addepsilonselfloops to C++ Move connect topsort addepsilonselfloops to C++ Oct 4, 2021
@pkufool (Collaborator, Author) commented Oct 4, 2021

Ready for review; I will submit the remaining code in another PR to avoid making this pull request too large.

@pkufool pkufool mentioned this pull request Oct 28, 2021
5 tasks
@pkufool pkufool changed the title Move connect topsort addepsilonselfloops to C++ [WIP] Move connect topsort addepsilonselfloops to C++ Oct 28, 2021
@danpovey (Collaborator) commented
I have a general concern that this direction makes things harder to understand.
I know I have mentioned this possibility before, but I am thinking we could have a "fake" Python
version of the library, provided purely for documentation, whose implementations just say "pass"
and indicate to users what the library does. Then if someone runs, for instance, "git grep 'def epsilon_top_sort'" or something like that, they would find the fake library. I find the documentation hard to discover right now. Any comments?
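For illustration only, a documentation-only stub module along these lines might look like the sketch below. The module layout and docstrings are hypothetical, not part of k2; the function names simply mirror the operations this PR moves to C++.

```python
# Hypothetical documentation-only module: every body is just `pass`,
# so nothing here is executable logic.  Its sole purpose is that
# `git grep "def top_sort"` (or similar) leads a user to a readable
# description of what the real C++ implementation does.


def connect(fsa):
    """Remove states that are not on some path from the start state
    to the final state.  The actual implementation lives in C++."""
    pass


def top_sort(fsa):
    """Return an equivalent FSA whose states are topologically sorted.
    The actual implementation lives in C++."""
    pass


def add_epsilon_self_loops(fsa):
    """Add epsilon self-loops to the states of `fsa` (useful before
    certain compositions).  The actual implementation lives in C++."""
    pass
```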

@csukuangfj (Collaborator) commented
After writing #839, i.e., running a pre-trained model in C++, I realized we actually don't
need to move everything to C++ in order to deploy a trained model purely in C++.

Here are the pros and cons I can think of for moving everything to C++.

Pros

  • We only need to maintain one version, since training and decoding share the same code
  • Training will probably be faster, since everything is in C++ (though this needs benchmarking)

Cons

  • It makes the code less readable and harder to understand.
  • It probably hurts decoding speed, since the code serves both training and decoding rather than being specialized for decoding.
  • It raises the bar for contributions: contributors need a good command of C++.

But @pkufool has already started the work and put effort into this direction.

@pkufool (Collaborator, Author) commented Oct 29, 2021

I think the key point is how we deal with the Fsa class: we need it to propagate attributes for both training and decoding. Could we keep two versions of the Fsa class, a Python one for training and a C++ one for decoding? Obviously the C++ version would be less complicated, since we would not need to implement the autograd machinery.
As for the effort I have put into the current direction, that does not matter to me; I hope we can find the best solution for this issue.
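As a rough illustration of the attribute-propagation point (not k2's actual API), the sketch below assumes an FSA operation returns an `arc_map` that says, for each output arc, which input arc it came from; the class and function names are hypothetical. The same pattern applies regardless of the particular operation, as long as it reports such a map.

```python
import torch


class SimpleFsa:
    """A toy stand-in for an Fsa: a tensor of arcs plus named per-arc attributes."""

    def __init__(self, arcs: torch.Tensor):
        self.arcs = arcs   # e.g. shape (num_arcs, 4): src_state, dst_state, label, score
        self.attrs = {}    # name -> tensor whose first dimension is num_arcs


def propagate_attributes(src: SimpleFsa, new_arcs: torch.Tensor,
                         arc_map: torch.Tensor) -> SimpleFsa:
    """Build the output FSA and carry every attribute through `arc_map`.

    `arc_map` is an int64 tensor; `arc_map[i]` is the index of the arc in
    `src` that produced output arc `i`.  Gathering the old attribute tensors
    with `arc_map` keeps them connected to the autograd graph, which is what
    a Python (training) Fsa needs; a slim C++ (decoding) Fsa could do the
    same gather without any autograd.
    """
    out = SimpleFsa(new_arcs)
    for name, value in src.attrs.items():
        out.attrs[name] = value.index_select(0, arc_map)
    return out
```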

@danpovey (Collaborator) commented
I agree that having two versions of the code (Python and C++) is probably the best solution for now.
In any case, we now have more experience with the C++ side, so if we need to revisit this in the future we will know more about how to do it.

@pkufool (Collaborator, Author) commented Nov 2, 2021

See the comments above; closing this pull request.

@pkufool pkufool closed this Nov 2, 2021