MAGE Roadmap

kyojindo edited this page Dec 19, 2012 · 1 revision

Let us start by gathering possible evolutions of MAGE:

  • Matrix operations in the HTS code are handled largely by hand. Libraries like CBLAS, LAPACK or Armadillo (a C++ linear algebra library that builds on BLAS/LAPACK) give a handy API over SSE-optimized linear algebra, including matrix operations. Using such a library in the parameter generation algorithm might free up quite a lot of CPU.

  • We discussed aligning the way MAGE loads a voice with how video players work: you load the voice from its path, and then in the background the underlying trajectory and synthesis engines needed to properly load that voice are looked up and, if present, automatically loaded. If not: missing engine!

  • MAGE tends to be a shell around specific implementations for trajectory generation and synthesis. The question of API flexibility (adapting to any new engine) vs. usability (understanding how to use it easily) needs to be addressed. A set( "pitch", "overwrite", 354.4f ) is flexible but overwritePitch( 354.4f ) is more intuitive. Problem: intuitive calls might become engine-specific. Idea: keep a fairly low-level engine access with string keys and values anyway, but offer engine developers the ability to extend the MAGE shell API with easy-to-use methods, possibly forcing them to override virtual methods within the MAGE shell to maintain some consistency between the different engines, e.g. setPitch() is always called setPitch().

  • The two above-mentioned items implicitly mean that engines become plugins of the MAGE shell: MAGE and engines could be assembled as binaries to create a new project, and possibly swapped (if compatible, of course) without recompiling, just as updating a codec does not require recompiling the big media player app.

  • The big fluffy un-speech-ization idea: cf. eNTERFACE 2013 project. Fully move towards config files and parsers that define the whole topology of the system: number of streams, their names, model/frame formats, use of GV or not, etc. This goes with the whole "get rid of that Constants.h file" trend.

  • We talked about a MAGE-wide logging mechanism, so as to be able to probe any part of the software.

  • Implement an underlying scheduler, so as to be able to make decisions based on the timing and progress of various tasks such as loading a new voice, progress in the computation of a trajectory, etc. In the case of a runtime issue, it should be able to trigger a special callback to make a decision before time runs out.

  • PSOLA vocoder: yummy !
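
The flexibility vs. usability item above (string-keyed set() next to an intuitive overwritePitch()) could be sketched roughly as follows. This is only an illustration under assumptions: EngineShell, ToyEngine, get() and the "stream/action" key layout are hypothetical names invented for this sketch, not part of the current MAGE code.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical shell interface (not the actual MAGE API).
// It exposes generic string-keyed access plus one intuitive method
// that every engine must implement under the same name.
class EngineShell {
public:
    virtual ~EngineShell() = default;

    // Low-level, flexible access: works for any stream/action an engine defines.
    virtual void set(const std::string& stream,
                     const std::string& action,
                     float value) = 0;
    virtual float get(const std::string& stream,
                      const std::string& action) const = 0;

    // Intuitive call; pure virtual so the name stays consistent across engines.
    virtual void overwritePitch(float hz) = 0;
};

// Hypothetical engine used only to show how the two levels fit together.
class ToyEngine : public EngineShell {
public:
    void set(const std::string& stream,
             const std::string& action,
             float value) override {
        values_[stream + "/" + action] = value;
    }

    // Throws std::out_of_range if the key was never set.
    float get(const std::string& stream,
              const std::string& action) const override {
        return values_.at(stream + "/" + action);
    }

    // The easy-to-use method simply forwards to the string-keyed layer.
    void overwritePitch(float hz) override {
        set("pitch", "overwrite", hz);
    }

private:
    std::map<std::string, float> values_;
};
```

With such a layout, calling overwritePitch( 354.4f ) on a ToyEngine has the same effect as set( "pitch", "overwrite", 354.4f ), while engine-specific parameters remain reachable through the string keys alone.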
