Sinewave Speech

A Python implementation of sinewave speech. Converts WAV files of human speech to sinewave speech using linear predictive coding (LPC). Sinewave speech is a "simplified" representation of speech built from a small number of frequency- and amplitude-modulated sine waves. It is surprising how much remains intelligible after this transformation.

Listen to some sounds. If you've not listened to these before, listen to the sinewave version first!

Processed | Original | sws.py options
Ex. 1 (sinewave) | Ex. 1 (original) | -d 4 --high 300 --low 100 -o 4
Ex. 2 (sinewave) | Ex. 2 (original) | -o 5 --window 200 --low 200
Ex. 3 (sinewave) | Ex. 3 (original) | -o 4
Ex. 4 (sinewave) | Ex. 4 (original) | -o 5 --high 2800 -d 8 --window 250
Ex. 5 (sinewave) | Ex. 5 (original) | -d 12 --high 2000 --window 90
Ex. 6 (sinewave) | Ex. 6 (original) | -d 8 --high 2500 --low 330 --window 90
Ex. 7 (buzz) | Ex. 7 (original) | --buzz 80 --window 300 -d 8 --high 2000
Ex. 8 (noise) | Ex. 8 (original) | --noise --low 200
Ex. 9 (sinewave) | Ex. 9 (original) | -d 2 --high 2800 --window 1500 -o 13

(these would probably be better if I spoke more clearly...)

Usage

Requires scipy and numpy only.

Examples of use:

    python sws.py hello.wav    # converts hello.wav to hello_sws.wav

    # explicit output name
    python sws.py lpc.wav lpc_modified.wav

More examples:

    # uses six sine wave components (order 6)
    python sws.py hello.wav -o 6 

    # sets the pre-filtering bandpass to [40, 4000] Hz and decimates
    # by a factor of 4 before resynthesizing
    python sws.py hello.wav --low 40 --high 4000 -d 4

    # uses modulation of a buzz (pulse train) @ 100Hz instead of sines
    python sws.py hello.wav --buzz 100

    # uses modulation of white noise instead of sinewaves
    python sws.py hello.wav --noise
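To convert a whole folder with the same settings, the command line can be driven from Python. The following is a minimal sketch rather than part of this repository: `sws_command` and `convert_folder` are hypothetical helper names, and the sketch assumes `sws.py` is in the current directory and that keyword names match the long flags listed under "Command line parameters".

```python
import subprocess
import sys
from pathlib import Path


def sws_command(wav, script="sws.py", **options):
    """Build the argument list for one sws.py run (hypothetical helper).

    Keyword names map directly to long flags, e.g. order=6 -> --order 6.
    A value of True emits a bare flag such as --noise; None and False
    are skipped.
    """
    cmd = [sys.executable, script, str(wav)]
    for name, value in options.items():
        if value is True:
            cmd.append(f"--{name}")
        elif value is not None and value is not False:
            cmd += [f"--{name}", str(value)]
    return cmd


def convert_folder(folder, **options):
    """Run sws.py once per .wav file in `folder`, skipping earlier outputs."""
    for wav in sorted(Path(folder).glob("*.wav")):
        if wav.stem.endswith("_sws"):
            continue  # default output name of a previous run
        subprocess.run(sws_command(wav, **options), check=True)
```

For example, `convert_folder("sounds", order=5, low=200)` would run `python sws.py <file> --order 5 --low 200` for each input file.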

Command line parameters

    usage: sws.py [-h] [--low LOW] [--high HIGH] [--order ORDER] [--bw_amp BW_AMP]
            [--decimate DECIMATE] [--window WINDOW] [--interpolate] [--sine]
            [--buzz BUZZ] [--noise] [--overlap OVERLAP]
            input_wav [output_wav]

    positional arguments:
    input_wav             The input file, as a WAV file; ideally 44.1KHz mono.
    output_wav            The output file to write to; defaults to
                            <input>_sws.wav

    optional arguments:
    -h, --help            show this help message and exit
    --low LOW             Lowpass filter cutoff
    --high HIGH           Highpass filter cutoff
    --order ORDER, -o ORDER
                            Number of components in synthesis
    --bw_amp BW_AMP       Amplitude scaling by bandwidth; larger values flatten
                            amplitude; smaller values emphasise stronger formants
    --decimate DECIMATE, -d DECIMATE
                            Sample rate decimation before analysis
    --window WINDOW, -w WINDOW
                            LPC window size; smaller means faster changing signal;
                            larger is smoother
    --interpolate, -i     Enable interpolation
    --sine, -s            Resynthesise using sinewave speech (default)
    --buzz BUZZ, -b BUZZ  Resynthesise using buzz at given frequency (Hz)
    --noise, -n           Resynthesize using filtered white noise
    --overlap OVERLAP, -l OVERLAP
                            Window overlap, as fraction of the window length

Technical details

Use 16-bit, 44.1 kHz mono WAV files as input for best results.
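A quick way to check whether a file matches this recommendation is the standard-library `wave` module. This is a small sketch, not something `sws.py` does itself; `check_wav` is a hypothetical helper name.

```python
import wave


def check_wav(path, want_rate=44100, want_width=2, want_channels=1):
    """Return a list of mismatches against the recommended input format.

    `path` may be a filename or an open binary file object.
    An empty list means the file is 16-bit, 44.1 kHz mono.
    """
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != want_rate:
            problems.append(
                f"sample rate is {w.getframerate()} Hz, expected {want_rate} Hz")
        if w.getsampwidth() != want_width:
            problems.append(
                f"sample width is {8 * w.getsampwidth()} bit, "
                f"expected {8 * want_width} bit")
        if w.getnchannels() != want_channels:
            problems.append(
                f"{w.getnchannels()} channels, expected {want_channels}")
    return problems
```

Files that fail the check can still be tried; the recommendation above is about getting the best results.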

Transformation steps

Input amplitude normalised
Bandpass filtered to [low, high] (order 4 Butterworth filter, non-causal)
Pre-emphasis applied to slightly emphasise higher frequencies
Decimated by a factor d
Windowed into chunks of length window, overlapping by default by window/2
Autocorrelation of signal computed
RMS power of each window computed
LPC computed from autocorrelation using Levinson-Durbin
LPC converted to line spectral pairs (LSP)
LSP converted to order (frequency, bandwidth) bands
Window by window, each band resynthesised using sinewave oscillators, weighting amplitude by inverse (exponential) bandwidth
Windows summed with a Hann window applied
Overall amplitude envelope applied from estimated RMS power of each chunk
Up-sampled by a factor of d and amplitude normalised
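
The analysis half of the steps above (normalisation through Levinson-Durbin) can be sketched roughly as follows. This is an illustration, not the code in `sws.py`: the function names, the default cutoffs, the pre-emphasis coefficient of 0.9 and the use of second-order sections for the Butterworth filter are all assumptions.

```python
import numpy as np
from scipy import signal


def preprocess(x, fs, low=100.0, high=3000.0, d=4, preemph=0.9):
    """Normalise, bandpass, pre-emphasise and decimate a mono signal.

    Every default value here is an illustrative assumption.
    """
    x = np.asarray(x, dtype=float)
    x = x / (np.max(np.abs(x)) + 1e-12)            # amplitude normalisation
    # Order-4 Butterworth bandpass. sosfiltfilt runs the filter forwards
    # and backwards, which makes it zero-phase (non-causal).
    sos = signal.butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
    x = signal.sosfiltfilt(sos, x)
    # First-order pre-emphasis: y[n] = x[n] - preemph * x[n-1]
    x = np.concatenate(([x[0]], x[1:] - preemph * x[:-1]))
    return signal.decimate(x, d), fs / d           # anti-aliased downsampling


def split_frames(x, window, overlap=0.5):
    """Cut x into frames of `window` samples; assumes len(x) >= window."""
    hop = max(1, int(window * (1.0 - overlap)))
    count = 1 + (len(x) - window) // hop
    return np.stack([x[i * hop:i * hop + window] for i in range(count)])


def autocorr_and_rms(frame, order):
    """Return autocorrelation lags 0..order and the RMS power of a frame."""
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1:len(frame) + order]
    return r, np.sqrt(np.mean(frame ** 2))


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation lags r[0..order].

    Returns the prediction polynomial a (a[0] == 1) and the residual error.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a, err
```

Each frame from `split_frames` would be passed through `autocorr_and_rms` and then `levinson_durbin`, giving one LPC polynomial and one RMS value per window.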

This implementation uses LPC analysis to locate the formants: the LPC coefficients are converted to line spectral pairs, which are then used to estimate formant frequencies and bandwidths. During resynthesis, the amplitude of each sinusoidal component is weighted inversely by its bandwidth, so noisier tracks become quieter.
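
To make the frequency/bandwidth idea concrete, here is a rough sketch of turning one LPC polynomial into bands and resynthesising them with sine oscillators. It deliberately swaps in a simpler technique: it reads frequencies and bandwidths from the roots of the LPC polynomial instead of going via line spectral pairs as this implementation does. The function names, the `exp(-bandwidth / bw_amp)` weighting and the default `bw_amp` value are assumptions chosen to match the description of `--bw_amp`, not the actual code.

```python
import numpy as np


def lpc_to_bands(a, fs):
    """Estimate (frequency, bandwidth) pairs in Hz from LPC polynomial roots.

    NOTE: root-finding is a stand-in here; the README describes using
    line spectral pairs for this step.
    """
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]           # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)  # pole angle -> centre frequency
    bws = -np.log(np.abs(roots)) * fs / np.pi   # pole radius -> bandwidth
    idx = np.argsort(freqs)
    return freqs[idx], bws[idx]


def synth_frame(freqs, bws, phases, n, fs, bw_amp=200.0):
    """Sum one sine oscillator per band over n samples.

    Wider bands get smaller amplitudes (assumed exponential weighting).
    Returns the samples and the phases to start the next frame with, so
    oscillators stay continuous across frame boundaries.
    """
    steps = 2 * np.pi * freqs / fs              # phase increment per sample
    amps = np.exp(-bws / bw_amp)
    k = np.arange(n)
    waves = amps[:, None] * np.sin(phases[:, None] + steps[:, None] * k)
    return waves.sum(axis=0), (phases + steps * n) % (2 * np.pi)
```

With this weighting, a large `bw_amp` pushes all amplitudes towards 1 (flatter), while a small `bw_amp` suppresses everything except the narrowest, strongest formants.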

Suggestions (especially pull requests!) on how to improve the quality of the output would be most welcome.

Acknowledgements

Levinson-Durbin iteration by David Cournapeau (MIT licensed).
ModFM synthesis algorithm used in buzz mode is from Lazzarini, V., & Timoney, J. (2010). New perspectives on distortion synthesis for virtual analog oscillators. Computer Music Journal, 34(1), 28-40.
The example sentences are read from this list, originally from IEEE Subcommittee on Subjective Measurements (1969). IEEE Recommended Practices for Speech Quality Measurements. IEEE Transactions on Audio and Electroacoustics, 17, 227-246.

johnhw/sinewave_speech
