Browser-Based AI Text-to-Song Generator
Transform text and lyrics into music entirely in your browser, with no server required.
SonicAI is a real-time text-to-song generation system that runs entirely client-side in the browser. It converts lyrics into musical compositions using the Web Audio API for instrument synthesis and formant vocal modeling for singing voices, paired with the browser's native Speech Synthesis API for lyric vocalization.
Zero dependencies. Single HTML file. Instant music.
| Feature | Description |
|---|---|
| 🎹 6 Genres | Pop, Rock, Jazz, Electronic, Classical, Lo-fi, each with unique scales, chords, and drum patterns |
| 🎤 Dual Vocal Engine | Formant synthesizer for melodic "aah/ooh" vocals + Speech Synthesis for lyric articulation |
| 🎼 Text-to-Melody | Character-to-note mapping algorithm that generates scale-appropriate melodies from any text |
| 📊 Live Visualizer | Real-time 80-bar frequency spectrum analyzer with genre-colored gradients |
| 🇮🇩 Bilingual Presets | 5 original Indonesian songs with English translations |
| 🎛️ Full Controls | Genre, key, tempo (60–180 BPM), volume, play/pause/stop |
| 📱 Responsive | Works on desktop and mobile browsers |
| ⚡ Zero Dependencies | Single index.html file: no npm, no build step, no server |
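The character-to-note mapping is not spelled out in this README, so the following is only a minimal sketch of one way such a mapping could work. It assumes a major scale, a root of middle C (MIDI 60), and a rule that spaces and punctuation become rests. The names `MAJOR_SCALE`, `midiToFrequency`, and `textToMelody` are illustrative and are not taken from the SonicAI source.

```javascript
// Semitone offsets of the major (Ionian) scale from its root.
const MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11];

// Convert a MIDI note number to a frequency in Hz (A4 = MIDI 69 = 440 Hz).
function midiToFrequency(midi) {
  return 440 * Math.pow(2, (midi - 69) / 12);
}

// Assumed mapping: each letter or digit picks a scale degree from its
// character code; anything else (spaces, punctuation) becomes a rest (null).
function textToMelody(text, rootMidi = 60, scale = MAJOR_SCALE) {
  const notes = [];
  for (const ch of text.toLowerCase()) {
    if (!/[a-z0-9]/.test(ch)) {
      notes.push(null);
      continue;
    }
    const code = ch.charCodeAt(0);
    const degree = code % scale.length;
    // Spread the notes over two octaves so the line is less static.
    const octave = Math.floor(code / scale.length) % 2;
    notes.push(rootMidi + scale[degree] + 12 * octave);
  }
  return notes;
}

// Print the melody for a preset title as frequencies, with rests marked.
const melody = textToMelody("Senja di Sudirman");
console.log(melody.map((n) => (n === null ? "rest" : midiToFrequency(n).toFixed(1))));
```

Because every pitch is drawn from the scale array, any input text stays in key; swapping in a different interval list changes the mode without touching the mapping.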
Visit sonic-ai-dun.vercel.app. No installation is needed.
git clone https://github.com/romizone/sonic-ai.git
cd sonic-ai
open index.html

# Or serve the folder from a local web server:
python3 -m http.server 3000
# Open http://localhost:3000

+--------------+     +--------------------+     +--------------+
| Input Text   |---->| Text-to-Melody     |---->| Melody       |---+
| / Lyrics     |     | (char -> note)     |     | Oscillator   |   |
+--------------+     +--------------------+     +--------------+   |
                                                                   |
+--------------+     +--------------------+     +--------------+   |    +----------+
| Genre Config |---->| Chord Engine       |---->| Pad Synth    |---+--->| Master   |
| (scale, bpm) |     | (I-IV-V-I etc.)    |     | + Bass       |   |    | Gain     |
+--------------+     +--------------------+     +--------------+   |    | + Reverb |
                                                                   |    +----+-----+
+--------------+     +--------------------+     +--------------+   |         |
| Formant      |---->| Bandpass Filter    |---->| Vocal        |---+         v
| Vocal Synth  |     | Chain (F1, F2, F3) |     | "Aah/Ooh"    |   |    +----------+
+--------------+     +--------------------+     +--------------+   |    | Analyser |
                                                                   |    | + Output |
+--------------+     +--------------------+     +--------------+   |    +----------+
| Drum         |---->| Kick/Snare/HH      |---->| Drum Bus     |---+
| Pattern      |     | Synthesis          |     |              |
+--------------+     +--------------------+     +--------------+
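The routing in the diagram can be sketched with standard Web Audio calls. This is an illustration rather than SonicAI's actual wiring: the helper name `buildMasterChain`, the bus names, the parallel dry/wet reverb layout, and the default `wet` level are all assumptions. Only `createGain`, `createConvolver`, `createAnalyser`, and `connect` come from the Web Audio API itself.

```javascript
// Source buses shown on the left of the diagram (names are assumptions).
const BUS_NAMES = ["melody", "pad", "vocal", "drums"];

// Hypothetical helper: creates one gain node per bus, mixes them into a
// master gain, and feeds the analyser through both a dry path and a reverb
// send. `wet` sets the level of the reverb return.
function buildMasterChain(ctx, wet = 0.3) {
  const master = ctx.createGain();
  const reverb = ctx.createConvolver();
  const wetGain = ctx.createGain();
  const analyser = ctx.createAnalyser();
  wetGain.gain.value = wet;

  master.connect(analyser); // dry path
  master.connect(reverb); // reverb send
  reverb.connect(wetGain);
  wetGain.connect(analyser);
  analyser.connect(ctx.destination); // analyser passes audio through to output

  const buses = {};
  for (const name of BUS_NAMES) {
    buses[name] = ctx.createGain();
    buses[name].connect(master);
  }
  return { buses, master, reverb, wetGain, analyser };
}

// Browser usage, inside a click handler so the autoplay policy allows audio:
//   const ctx = new AudioContext();
//   const { buses, reverb } = buildMasterChain(ctx);
//   reverb.buffer = impulseResponse; // an AudioBuffer you supply
//   oscillator.connect(buses.melody);
```

Giving each layer its own gain node keeps per-layer volume control separate from the master volume, and placing the analyser last means the visualizer sees exactly what reaches the speakers.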
| # | Song | Artist | Genre | BPM | Key |
|---|---|---|---|---|---|
| 1 | Senja di Sudirman | Dian Sastro | Pop | 110 | C |
| 2 | Hujan di Senopati | Wulan | Jazz | 95 | Am |
| 3 | Kereta Terakhir | Davina | Rock | 125 | Em |
| 4 | Kopi dan Janji | Titi Kamal | Lo-fi | 85 | F |
| 5 | Lampu Kota | Davina | Electronic | 128 | G |
All songs feature Jakarta city themes with both Indonesian and English lyrics.
| Layer | Technology | Purpose |
|---|---|---|
| Melody | OscillatorNode + LowpassFilter | Genre-specific waveforms with vibrato |
| Vocals | 3x Detuned Oscillators + BandpassFilter chain | Formant synthesis (vowel-like "aah/ooh") |
| Speech | SpeechSynthesisUtterance | Lyric articulation with pitch/rate tuning |
| Chords | Detuned OscillatorNode pairs + LowpassFilter | Pad sounds with smooth crossfade envelopes |
| Bass | OscillatorNode + LowpassFilter (400 Hz) | Genre-specific waveform bass lines |
| Drums | OscillatorNode + AudioBuffer (noise) | Synthesized kick, snare, hi-hat |
| Reverb | ConvolverNode (procedural impulse) | Exponential decay with early reflections |
| Visualizer | AnalyserNode + Canvas 2D | 80-bar frequency spectrum at 2x resolution |
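As an illustration of the Reverb row, a procedural impulse response can be generated as decaying noise plus a few discrete echoes. The sketch below is one plausible construction and not the project's code. The function name `makeImpulseResponse`, the reflection delays and gains, and the choice to reach -60 dB at `decaySeconds` are all assumptions.

```javascript
// Build one channel of a synthetic impulse response: white noise shaped by an
// exponential decay, plus a few discrete early reflections.
// `random` is injectable so the output can be made deterministic.
function makeImpulseResponse(sampleRate, decaySeconds, random = Math.random) {
  const length = Math.floor(sampleRate * decaySeconds);
  const data = new Float32Array(length);
  // Time constant chosen so the envelope has fallen by 60 dB (a factor of
  // 1000 in amplitude) at decaySeconds.
  const tau = decaySeconds / Math.log(1000);
  for (let i = 0; i < length; i++) {
    const t = i / sampleRate;
    data[i] = (random() * 2 - 1) * Math.exp(-t / tau);
  }
  // Early reflections as [delay in seconds, gain] pairs (illustrative values).
  const reflections = [[0.011, 0.6], [0.023, 0.4], [0.037, 0.25]];
  for (const [delay, gain] of reflections) {
    const index = Math.floor(delay * sampleRate);
    if (index < length) data[index] += gain;
  }
  return data;
}

// Browser usage with a ConvolverNode:
//   const ir = makeImpulseResponse(ctx.sampleRate, 2.0);
//   const buffer = ctx.createBuffer(1, ir.length, ctx.sampleRate);
//   buffer.copyToChannel(ir, 0);
//   convolver.buffer = buffer;
```

Generating the impulse in code is what lets the reverb ship inside a single HTML file: no recorded impulse-response audio has to be downloaded.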
Each genre defines a unique combination of:
- Scale: Ionian, Minor, Blues, Pentatonic, Dorian
- Chord Progression: I-IV-V-I, i-iv-III-i, Imaj7-IVmaj7, etc.
- Waveforms: sine, triangle, sawtooth, square
- Formant Frequencies: F1/F2/F3 tuning for vocal character
- Drum Pattern: Beat placement and swing feel
- Effects: Reverb decay (1.2–4.0 s), filter cutoff, wet/dry mix
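A genre definition along these lines could be a plain object, with chords derived from the scale by stacking thirds. The sketch below is hypothetical: the field names, the pairing of genres with scales and progressions, and every numeric value are illustrative assumptions, not SonicAI's real presets.

```javascript
// Scale intervals in semitones from the root.
const SCALES = {
  ionian: [0, 2, 4, 5, 7, 9, 11],
  minor: [0, 2, 3, 5, 7, 8, 10],
};

// Hypothetical genre presets. Progressions are zero-based scale degrees,
// so [0, 3, 4, 0] is I-IV-V-I and [0, 3, 2, 0] is i-iv-III-i.
const GENRES = {
  pop: {
    scale: "ionian",
    progression: [0, 3, 4, 0],
    waveform: "triangle",
    formants: [800, 1150, 2900], // F1, F2, F3 in Hz (illustrative)
    reverbDecay: 2.0, // seconds
  },
  rock: {
    scale: "minor",
    progression: [0, 3, 2, 0],
    waveform: "sawtooth",
    formants: [800, 1150, 2900],
    reverbDecay: 1.2,
  },
};

// Build a triad on a scale degree by taking every other scale note,
// wrapping into the next octave when the index runs past the scale.
function triadForDegree(rootMidi, scale, degree) {
  return [0, 2, 4].map((step) => {
    const index = degree + step;
    const octave = Math.floor(index / scale.length);
    return rootMidi + scale[index % scale.length] + 12 * octave;
  });
}

// Expand a genre's progression into MIDI note triads for a given root.
function chordsForGenre(genreName, rootMidi = 60) {
  const genre = GENRES[genreName];
  const scale = SCALES[genre.scale];
  return genre.progression.map((degree) => triadForDegree(rootMidi, scale, degree));
}

console.log(chordsForGenre("pop")); // C, F, G, C major triads as MIDI notes
```

Keeping the progression as scale degrees rather than fixed notes means the same genre preset works in any key: only `rootMidi` changes.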
| Aspect | SonicAI | Suno AI |
|---|---|---|
| Voice Quality | Formant synth + TTS | Neural vocal synthesis |
| Latency | Instant (client-side) | Seconds to minutes |
| Dependencies | None (browser only) | GPU servers |
| Offline | Full support | Requires internet |
| Cost | Free & open source | Subscription |
| Privacy | 100% local | Data sent to servers |
| File Size | ~40KB | Multi-GB models |
The full technical paper is available at SonicAI_Paper.pdf, covering:
- System architecture and audio signal flow
- Text-to-melody conversion algorithm
- Formant vocal synthesis with bandpass filter chains
- Genre-adaptive chord progression and drum pattern engines
- Chrome autoplay policy compliance
- Comparison with deep learning approaches
- Limitations and future work
This project is open source and available under the MIT License.
Romi Nur Ismanto
- Email: rominur@gmail.com
- GitHub: @romizone
Built with Web Audio API | Deployed on Vercel | Made with ❤️