Adding words or converting .wav file #4
Comments
Adding words is tricky. The examples (with the exception of Tom's Diner) are extracted from commercial ROMs mastered by the Texas Instruments mastering hardware for the TMS5220 family of chips. I'm not aware of any open source encoders - just some old hacked DOS versions that I've never managed to get running. (If you find other ROMs, the speech strings should work as is. If not, try bit reversing each byte, as there is no standard way of mapping bits to bytes on those ROMs.)

The encoder section of the repo contains the code I used to encode the Tom's Diner example. I'm not happy with it - the pitch detection makes some bad mistakes, often off by an octave - but it's the best I could come up with at the time. Get Freemat running, and run the 'romgen' file. Any 8kHz 16bit mono WAV should work the same way.

It looks like LPC-10 uses a very similar algorithm, with different coefficient encoding and bit mapping. That is publicly available, with source - but the official distribution is machine translated from FORTRAN and it shows. Not ideal, but probably enough for a DSP coder to produce something usable. In fact most 10-pole 8kHz LPC codecs could be ported to Talkie, given the time and expertise. The best source for the coefficient mapping on the TMS5220 is probably the emulator used by the MAME project, which was the inspiration for Talkie.

There is another option - the TI-99/4A cartridge 'Terminal Emulator II' includes an English -> Phoneme -> TMS5220 bitstream mapper that uses a set of rules to pronounce any word. Unfortunately it's written in Graphics Programming Language, which is poorly documented and has no good reverse engineering tools. Those rules could be ported to Arduino, and would give an infinite vocabulary synthesiser in about 8Kbytes - but it is a lot of work.

Neither of those is a trivial job, but both could be achieved. However, you might find that bulk memory (e.g. an SD card) and a WAV file are more appropriate to your purposes. Adafruit's Wave Shield is a good start (although I prefer to use on-chip PWM instead of an external DAC). |
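The "try bit reversing each byte" advice above can be tried quickly on a PC before touching the Arduino sketch. The following is a minimal Python sketch of that idea only; the function names `reverse_bits` and `reverse_all` are made up for illustration and are not part of Talkie.

```python
def reverse_bits(byte: int) -> int:
    """Return the 8-bit value with its bit order reversed (MSB <-> LSB)."""
    result = 0
    for _ in range(8):
        # Shift the lowest bit of the input into the bottom of the result.
        result = (result << 1) | (byte & 1)
        byte >>= 1
    return result


def reverse_all(data: bytes) -> bytes:
    """Bit-reverse every byte of a ROM dump or encoder output."""
    return bytes(reverse_bits(b) for b in data)


if __name__ == "__main__":
    # Arbitrary sample bytes, printed in C hex notation after reversal.
    sample = bytes([0xE1, 0xA7, 0x73, 0x6E, 0xC9])
    print(", ".join(f"0x{b:02x}" for b in reverse_all(sample)))
```

If a speech string sounds like noise in one bit order, running it through `reverse_all` and trying again is a cheap test.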
Hello Peter, Edit: Thanks for your time, |
Hello, Thanks |
Having a look through my old code, 'calc' is what you should be running. That generates tomsDinerStream.csv. As the csv file contains LPC coefficient parameters, having the file isn't much use unless you're trying to generate a Tom's Diner bitstream. It's clear that encoding is a problem, and the current encoder really isn't in a friendly, usable state. I'll see what I can do to rectify that in the near future. For now, you might have more luck using QBOX Pro ( http://www.ninerpedia.org/index.php/Development_resources ). I couldn't get it working here, but it does contain proper compression algorithms designed by the same team that designed the original speech chips, so it could well be worth the effort of trying. Tell it to encode for the TMS5220. You might need to reverse the bit order in each byte of the output. If you have success with QBOX Pro, please report back so others can save time. |
Hello Peter, |
Great news! You're most of the way there. The long binary string is what needs to be in the word dictionary:
uint8_t spCUSTOM[] PROGMEM = {
// 11100001 10100111 01110011 01101110 11001001 00110 ...
0xe1,0xa7,0x73,0x6e,0xc9, ...
};
Or it might be in reverse bit order, where the LSB is read first, in which case:
uint8_t spCUSTOM[] PROGMEM = {
// 11100001 10100111 01110011 01101110 11001001 00110 ...
0x87,0xe5,0xce,0x76,0x93, ...
};
A quick test should make it obvious which is which. You can pad out the bits of the final byte with '0's or '1's - it doesn't matter. (When I was coding Talkie, the available speech ROMs were inconsistent in bit order. That's the reason for the 'rev' function in the Talkie code. I can't remember which order it is now. Note to self - document better!) I have a horrible feeling that I generated the Tom's Diner script by running that binary string through a series of regex search and replaces until it was a C formatted hex array. It would have been much more productive to write a quick Python script. Is that enough information for you to throw together a script at your end, to get your example working? By the way, the last part of the output tells you how many speech frames there are: 55 Voiced, 0 Unvoiced (there should probably be more than that - the encoder hasn't identified any 's', 'th', 'f', 'p' or 'b' sounds), 2 Repeated frames and 2 Silent frames. |
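The "quick Python script" mentioned above might look something like the following. This is a sketch under assumptions, not the script used for Tom's Diner: the helper names `bits_to_bytes` and `to_c_array` are invented here, and the `const` qualifier in the generated declaration is an assumption about what newer Arduino toolchains expect for PROGMEM data. It takes the encoder's string of '0'/'1' characters, pads the final byte, and emits the array in either bit order.

```python
import re
from typing import List


def bits_to_bytes(bit_string: str, lsb_first: bool = False, pad: str = "0") -> List[int]:
    """Pack a string of '0'/'1' characters into byte values.

    Anything that is not '0' or '1' (spaces, newlines) is ignored.
    The last byte is padded with `pad`; the padding value does not matter.
    """
    bits = re.sub(r"[^01]", "", bit_string)
    if len(bits) % 8:
        bits += pad * (8 - len(bits) % 8)
    values = []
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        if lsb_first:
            chunk = chunk[::-1]  # first bit of the stream becomes the LSB
        values.append(int(chunk, 2))
    return values


def to_c_array(name: str, values: List[int], per_line: int = 12) -> str:
    """Format byte values as a C array declaration (hypothetical layout)."""
    rows = [
        ",".join(f"0x{v:02x}" for v in values[i:i + per_line])
        for i in range(0, len(values), per_line)
    ]
    body = ",\n  ".join(rows)
    return f"const uint8_t {name}[] PROGMEM = {{\n  {body}\n}};"


if __name__ == "__main__":
    stream = "11100001 10100111 01110011 01101110 11001001"
    print(to_c_array("spCUSTOM", bits_to_bytes(stream)))
    print(to_c_array("spCUSTOM", bits_to_bytes(stream, lsb_first=True)))
```

Generating both variants and listening to each is probably the fastest way to settle the bit order question.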
As a point of information, there is also this speech code for the AVR: It has amazingly compact text to speech using phoneme codes. --- bill |
Interesting. Cheers Bill! My, that package looks extremely like "Speech!" by Superior Software for the BBC Micro. I wonder if the author had access to the source code. |
I was able to get QBoxPro to generate an LPC file, but it's a heck of a messy toolchain. Here are some of my notes, in case anyone finds them useful. QBoxPro is a 16-bit Win3.1 application, so it will not run under a modern 64-bit OS (such as Win7 x64). I created a virtual 32-bit XP machine and installed it there. The QBoxPro zip file that's floating around doesn't work unless you unzip it to c:\QBOX and copy the ini file to c:\windows. More info on getting it running, including a link to the zip file, can be found in this thread: http://atariage.com/forums/topic/218272-trying-to-generate-some-ti99-speech/page-2 Once it's running, you need to process your WAV files. This link walks you through that procedure: http://furrtek.free.fr/index.php?a=speakandspell&ss=9&i=2 Start at step 3 (you shouldn't need to use DOSBOX or edit the ini file) and ignore the note in step 6 about the coefficient needing to be 10. I was able to preview (listen to) the original file, but not the compressed file, in QBoxPro. After that, take the .bin file, dump it out as hex, and import it into your words. |
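The last step, dumping the .bin file as hex, can be scripted. Below is a minimal Python sketch, assuming the .bin file is a raw byte stream with no header (not verified for QBoxPro output). The function names, the default array name `spCUSTOM`, and the `const` qualifier are illustrative choices, not anything QBoxPro or Talkie prescribes.

```python
from pathlib import Path


def bin_to_c_array(data: bytes, name: str = "spCUSTOM", reverse: bool = False) -> str:
    """Format raw bytes as a C array; optionally bit-reverse each byte first."""
    if reverse:
        # Reverse the bit order of each byte via its 8-character binary form.
        data = bytes(int(f"{b:08b}"[::-1], 2) for b in data)
    rows = [
        ",".join(f"0x{b:02X}" for b in data[i:i + 12])
        for i in range(0, len(data), 12)
    ]
    body = ",\n  ".join(rows)
    return f"const uint8_t {name}[] PROGMEM = {{\n  {body}\n}};"


def convert_file(path: str, name: str = "spCUSTOM", reverse: bool = False) -> str:
    """Read a .bin file from disk and return the C array text."""
    return bin_to_c_array(Path(path).read_bytes(), name, reverse)
```

For example, `print(convert_file("HUMIDITY.bin", "spHUMIDITY"))` would print an array ready to paste into a sketch (the file name here is hypothetical); pass `reverse=True` if the result sounds like noise.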
Hi Peter I am trying desperately to make the array but can't make it thanks a lot |
Hi deladriere & going-digital, With Freemat, I've managed to create the 10101010101010 binary string (like deladriere). Below is a small part of the song, encoded. I tested it and it sounds good! ;-) @deladriere if you want, send me your binary and I'll convert it for you. @going-digital Peter, thanks for this interesting library
|
@trevor-sonic : Well done ! |
Hi @deladriere , |
This method of creating the data arrays is way too messy in my opinion. --- bill |
@bperrybap You're right, it would be nice to have an application that does everything in one go. The heart of the hex creator is:
I hope this helps. |
@bperrybap can't wait to test your solution ! I am already thinking of a batch process ;-) |
What would be really cool is to use SoX to convert audio to LPC10 and then calculate all the parameters directly |
Hi, can anyone tell me step by step how to make my own words from a WAV? I don't know how to get the binary from a WAV file. |
I know it is an old thread, but I was messing with QBOX and it does work well (under Windows XP mode, but you have to disable the 80x87 emulator or it will crash in WIN87EM.DLL), except there are various parameters to set, and for some reason the Suzanne Vega sample I converted plays slower... and has a nice robotic voice that ends in a ghost voice :-). This is really ancient technology, but it actually works pretty well on Arduino.
|
Which function is used to print the binary numbers before the frames? I get an output of Frames and RomSize, but I am not getting the binary series before Frames:. |
Hi there, just a shameless plug: To create bitstreams you can either use BlueWizard (https://github.com/patrick99e99/BlueWizard) or my command line port Python Wizard (https://github.com/ptwz/python_wizard). |
@ptwz: I've successfully compiled and installed Python Wizard on my laptop running Ubuntu Xenial and Spyder 2.7, and have tried to encode a wav file through the command line interface using default settings with no additional options. However, I get the following error message relating to the command: "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" I think the assert(max(array)) function may be deprecated in more recent releases of Python/NumPy and wonder if this is what is causing the problem, and if there is a workaround. I'm very keen to get Python Wizard working and would be very grateful for your expertise and help. Full error message attached below. Dave |
@Dave67 Hope I could help, |
Many thanks for your help and for spotting the stereo file! I'm currently using the default settings on python_wizard. Do you have any tips for how to set the option parameters to values which will:
Best wishes Dave const unsigned char [] PROGMEM = {0x88,0xCE,0xAA,0xB4,0xAC,0x1C,0x12,0xA6,0x59,0x89,0xE2,0x8C,0x49,0xD8,0xAA,0xC2,0xD2,0xC2,0x16,0x23,0x86,0x09,0x49,0xB5,0xD8,0x02,0x98,0x3A,0x54,0x5C,0x53,0x1B,0x1B,0x87,0x23,0xF1,0x0C,0x2E,0xEC,0xE2,0xB2,0x94,0xD5,0x28,0xB3,0xB2,0x3B,0x2E,0xC0,0x54,0x2E,0x27,0xAB,0xDA,0x62,0x34,0x6A,0x9F,0xAC,0x39,0x73,0xF3,0xAA,0xBC,0xB2,0x16,0xC4,0x24,0x3B,0x4A,0x49,0xB3,0x92,0x4C,0x4D,0x26,0x21,0x0D,0x9A,0x2B,0xB5,0x18,0x9B,0x2C,0x28,0xC9,0x92,0x94,0x2D,0xC4,0xA4,0x5D,0xCB,0xD2,0x76,0x03,0xCE,0xE9,0x48,0xC0,0x19,0xC1,0x09,0x38,0x7B,0x2C,0x01,0xFB,0x8D,0x97,0x77,0x78,0x26,0xA9,0x9A,0x35,0xDA,0x9E,0x98,0xB4,0xF2,0xF6,0xE8,0x7B,0x26,0xD4,0x8A,0x47,0x6D,0xA8,0x9E,0xD8,0xCB,0x1F,0xA7,0x21,0x29,0xD1,0x6C,0x5B,0xA4,0x9A,0xA4,0x58,0xAA,0xB4,0x11,0x1B,0x8B,0x16,0xCA,0xF0,0x94,0xA4,0xAC,0xCA,0x71,0xCC,0x5D,0x86,0xB6,0xEB,0x32,0xF5,0x70,0x92,0xFC,0x9E,0x43,0x58,0x35,0x4E,0x58,0x46,0x14,0x22,0xAB,0x5A,0xA5,0x1F,0x8A,0x58,0x7A,0x2A,0x95,0x71,0x48,0x66,0x9E,0xA9,0x94,0xA6,0xCE,0x59,0x74,0xD7,0x92,0x99,0x0A,0x11,0xED,0x19,0xCA,0xCC,0xCF,0x20,0xBC,0x3C,0x59,0x3B,0x60,0xC6,0xB4,0x04,0x5C,0x13,0x16,0x80,0xEF,0xD6,0x12,0xF0,0x65,0x71,0x9B,0xB3,0x91,0x4A,0xDB,0x58,0x63,0x29,0x46,0x2A,0x6D,0x1A,0xB5,0x25,0x1B,0xC9,0xF0,0xAE,0x15,0xD6,0x14,0x34,0x3C,0xD6,0x94,0xC9,0x4A,0x70,0x57,0x5B,0xDB,0x2A,0x2B,0x3E,0x5C,0x72,0xEC,0x08,0xB6,0xF8,0x50,0xF5,0x56,0x4D,0xD8,0x1A,0x4A,0xCC,0x52,0xF5,0x03}; |
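Since the error above turned out to involve a stereo file, it can help to check a WAV before encoding it. The following Python sketch uses only the standard library `wave` and `array` modules; the function names are made up for illustration, it handles 16-bit PCM only, and it does not resample, so getting to 8 kHz is still a job for Audacity or SoX.

```python
import array
import sys
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate_hz) of a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()


def downmix_to_mono(src_path, dst_path):
    """Average all channels of a 16-bit PCM WAV into a mono WAV."""
    with wave.open(src_path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    # Samples are interleaved: one value per channel for each frame.
    mono = array.array(
        "h",
        (sum(samples[i:i + channels]) // channels
         for i in range(0, len(samples), channels)),
    )
    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(dst_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(mono.tobytes())
```

If `describe_wav` reports anything other than 1 channel and 2 bytes per sample, that is worth fixing before suspecting the encoder.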
@ptwz how can we input our WAV file into Python Wizard? |
When using the command line and you want to convert the file called MYFILE.wav, use: python_wizard -f arduino MYFILE.wav > MYFILE.h |
Ok ... Thanks |
@ptwz sorry for disturbing but the command is still not working ... could you please guide me through it .. I opened the command line and entered the given command but it did not work ... any help would be highly appreciated 😃 |
@ptwz First of all, thank you for writing the Python code for converting WAV files into the bitstream used by the Talkie TTS library. Unfortunately I hit an error while running it. Here are the things I did before running it. Installed the following: I tried using your Python code by running the line below on the Windows 10 command line: python_wizard -f arduino testing.wav > testing.h where testing.wav is my WAV file and testing.h is the file that will contain the bitstream (I think :) ). I got an error message of the form: File "python_wizard", line 64 Is there something wrong that I did, or is it maybe a line in the Python code that is no longer compatible with the current version? Best Regards and Thank You :) Johnson |
Hi Johnson, Best regards, |
@heartlesspra use Audacity or SoX |
Hello, I downloaded Python 3.7 and used the command mentioned (python_wizard -f arduino MYFILE.wav > MYFILE.h), but it keeps telling me there is a syntax error. Can you please tell me the specific steps to follow? |
Well, please give me some more insight to help you out:
|
Okay thank you for your help. |
Hello,
First, this is an excellent library, very useful. Thanks!!!
This is really not an issue, more of a question.
But I don't know how to contact you otherwise.
How do I add/create my own words, or convert a .wav file like what was done for Tom's Diner?
Is there a tutorial on that?
I need the word "humidity" in a male voice.
Thanks
Elac