An ESP32-based pronunciation training device that uses Azure Speech Services to assess spoken language and provide real-time feedback through an LCD display and RGB LED.
Idiomic is an IoT device designed for language learning. Users speak into a microphone while holding a button, and the device records their speech, sends it to Azure's pronunciation assessment API, and displays the results. The system supports multiple languages including English, Spanish, French, German, Italian, Portuguese, Japanese, Chinese (Mandarin), and Korean.
- ESP32 microcontroller
- INMP441 I2S MEMS microphone
- 16x2 I2C LCD display (HD44780 compatible, address 0x27)
- Common cathode RGB LED
- Momentary push button
| Component | Pin |
|---|---|
| RGB LED Red | GPIO 18 |
| RGB LED Green | GPIO 19 |
| RGB LED Blue | GPIO 21 |
| I2S Word Select (WS) | GPIO 25 |
| I2S Serial Clock (SCK) | GPIO 26 |
| I2S Serial Data (SD) | GPIO 27 |
| Button | GPIO 23 |
| I2C SDA | GPIO 14 |
| I2C SCL | GPIO 4 |
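The wiring table can be mirrored in code as a set of named constants. This is a minimal sketch: the GPIO numbers are taken from the table, while the `pins` namespace and the constant names are hypothetical and may not match the identifiers used in the firmware.

```cpp
#include <cstdint>

// Hypothetical names; the GPIO numbers come from the wiring table.
namespace pins {
constexpr std::uint8_t kLedRed   = 18;
constexpr std::uint8_t kLedGreen = 19;
constexpr std::uint8_t kLedBlue  = 21;
constexpr std::uint8_t kI2sWs    = 25;  // I2S word select
constexpr std::uint8_t kI2sSck   = 26;  // I2S serial clock
constexpr std::uint8_t kI2sSd    = 27;  // I2S serial data
constexpr std::uint8_t kButton   = 23;
constexpr std::uint8_t kI2cSda   = 14;
constexpr std::uint8_t kI2cScl   = 4;
}  // namespace pins

// GPIO 21 is the ESP32's default I2C SDA pin, but here it drives the blue LED,
// so the I2C bus has to be started on the pins from the table instead.
static_assert(pins::kI2cSda != pins::kLedBlue, "I2C SDA must not share the blue LED pin");
```

Because the I2C bus is not on the ESP32 defaults (GPIO 21 and 22), the bus would presumably be started with explicit pins, for example `Wire.begin(pins::kI2cSda, pins::kI2cScl)`.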
- Hold the button to record (momentary press-and-hold)
- Release to stop recording
- Auto-stops after 5 seconds maximum
- Blue pulsing LED indicates recording in progress
- Automatic pronunciation assessment via Azure Speech Services
- Scores displayed on LCD and indicated by LED color:
  - Green: 80% or higher
  - Yellow-green: 60-79%
  - Yellow: 40-59%
  - Red: Below 40%
- Shows the phrase to practice
- Displays score after assessment
- Supports Pinyin tone marks with visual indicators:
  - `-` flat tone (first tone)
  - `/` rising tone (second tone)
  - `v` falling-rising tone (third tone)
  - `\` falling tone (fourth tone)
- Auto-scrolls text longer than 16 characters
- Configuration page accessible via the device's IP address
- Set reference text for pronunciation assessment
- Set LCD display text (romanization with accents/tones)
- Select target language
- View assessment results and detailed scores (accuracy, fluency, completeness)
- Play back recorded audio
- English (US and UK)
- Spanish (Spain and Mexico)
- French
- German
- Italian
- Portuguese (Brazil)
- Japanese
- Chinese (Mandarin)
- Korean
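Two of the behaviors listed above, the score-to-color bands and the scrolling of text longer than 16 characters, can be sketched as plain C++ helpers. The thresholds and the 16-column width come from the lists; the `Rgb` struct, the function names, the exact RGB values, and the three-space wrap gap are assumptions made for illustration. The scroll helper counts bytes, so it assumes the display text is single-byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical RGB triple for a common-cathode LED (255 = channel fully on).
struct Rgb { int r, g, b; };

// Map a 0-100 score to the LED color bands.
// Only the thresholds come from the feature list; the RGB values are assumed.
inline Rgb colorForScore(int score) {
    if (score >= 80) return {0, 255, 0};     // green
    if (score >= 60) return {128, 255, 0};   // yellow-green
    if (score >= 40) return {255, 255, 0};   // yellow
    return {255, 0, 0};                      // red
}

// Return the slice of `text` visible at scroll position `step`.
// Text that fits in `width` columns is returned unchanged; longer text
// wraps around with a short gap of spaces between repetitions.
inline std::string scrollWindow(const std::string& text, std::size_t step,
                                std::size_t width = 16) {
    assert(width > 0);
    if (text.size() <= width) return text;
    const std::string padded = text + "   ";  // assumed gap before the wrap
    std::string out;
    out.reserve(width);
    for (std::size_t i = 0; i < width; ++i) {
        out.push_back(padded[(step + i) % padded.size()]);
    }
    return out;
}
```

Calling `scrollWindow(text, step)` with an increasing `step` on a timer would give the auto-scroll effect; each call yields exactly one LCD row.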
Before uploading, configure the following constants in the code:
```cpp
const char* ssid = "YOUR_WIFI_SSID";
const char* password = "YOUR_WIFI_PASSWORD";
const char* azureKey = "YOUR_AZURE_SPEECH_KEY";
const char* azureRegion = "YOUR_AZURE_REGION";
```

- ESP32 Arduino Core
- LiquidCrystal_I2C library
- Built-in libraries: WiFi, WebServer, Wire, SPIFFS, driver/i2s
- Power on the device and wait for WiFi connection
- Note the IP address displayed on the LCD
- Access the web interface from a browser on the same network
- Configure the language and reference phrase
- Hold the button and speak the phrase
- Release the button to trigger assessment
- View your score on the LCD and web interface
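The assessment step in this workflow amounts to one HTTPS POST of the recording to Azure. The sketch below shows how the request URL and the `Pronunciation-Assessment` header might be built, based on Azure's speech-to-text REST API for short audio. The function names are hypothetical, the parameter choices (`HundredMark`, `Phoneme`, `Comprehensive`) are assumptions, and the firmware may construct the request differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Standard Base64 (RFC 4648) with '=' padding.
inline std::string base64Encode(const std::string& in) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((in.size() + 2) / 3) * 4);
    for (std::size_t i = 0; i < in.size(); i += 3) {
        const std::size_t left = in.size() - i;  // bytes in this group (>= 1)
        std::uint32_t n = static_cast<unsigned char>(in[i]) << 16;
        if (left > 1) n |= static_cast<unsigned char>(in[i + 1]) << 8;
        if (left > 2) n |= static_cast<unsigned char>(in[i + 2]);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += left > 1 ? kAlphabet[(n >> 6) & 63] : '=';
        out += left > 2 ? kAlphabet[n & 63] : '=';
    }
    return out;
}

// Escape only quotes and backslashes; control characters are assumed absent.
inline std::string jsonEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Endpoint for short-audio recognition in the given region and locale.
inline std::string assessmentUrl(const std::string& region,
                                 const std::string& locale) {
    return "https://" + region +
           ".stt.speech.microsoft.com/speech/recognition/conversation/"
           "cognitiveservices/v1?language=" + locale + "&format=detailed";
}

// Value for the Pronunciation-Assessment header: Base64 of a JSON object.
inline std::string assessmentHeader(const std::string& referenceText) {
    const std::string json =
        "{\"ReferenceText\":\"" + jsonEscape(referenceText) +
        "\",\"GradingSystem\":\"HundredMark\",\"Granularity\":\"Phoneme\","
        "\"Dimension\":\"Comprehensive\"}";
    return base64Encode(json);
}
```

Alongside this header, the request would carry the key in `Ocp-Apim-Subscription-Key` and a `Content-Type` of `audio/wav; codecs=audio/pcm; samplerate=16000`, with the WAV file as the body.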
- Sample rate: 16000 Hz
- Bit depth: 16-bit
- Channels: Mono
- Format: WAV (PCM)
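These parameters fix both the size of a recording and the contents of its WAV header. At 16000 samples per second, 2 bytes per sample, and one channel, the 5-second maximum comes to 160000 bytes of audio data. The following sketch builds the standard 44-byte PCM header for that format; the constant and function names are hypothetical and not taken from the firmware.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kSampleRate = 16000;
constexpr std::uint16_t kBitsPerSample = 16;
constexpr std::uint16_t kChannels = 1;
constexpr std::uint32_t kMaxSeconds = 5;
constexpr std::uint32_t kBytesPerSecond =
    kSampleRate * kChannels * (kBitsPerSample / 8);                      // 32000
constexpr std::uint32_t kMaxDataBytes = kBytesPerSecond * kMaxSeconds;   // 160000

// Write the low `bytes` bytes of `v` in little-endian order, as WAV requires.
inline void putLe(std::uint8_t* p, std::uint32_t v, int bytes) {
    for (int i = 0; i < bytes; ++i) p[i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Build the canonical 44-byte RIFF/WAVE header for `dataBytes` of PCM audio.
inline std::array<std::uint8_t, 44> makeWavHeader(std::uint32_t dataBytes) {
    assert(dataBytes <= 0xFFFFFFFFu - 36);  // RIFF chunk size must fit in 32 bits
    std::array<std::uint8_t, 44> h{};
    std::memcpy(&h[0], "RIFF", 4);
    putLe(&h[4], 36 + dataBytes, 4);        // file size minus the first 8 bytes
    std::memcpy(&h[8], "WAVE", 4);
    std::memcpy(&h[12], "fmt ", 4);
    putLe(&h[16], 16, 4);                   // fmt chunk size for PCM
    putLe(&h[20], 1, 2);                    // audio format 1 = PCM
    putLe(&h[22], kChannels, 2);
    putLe(&h[24], kSampleRate, 4);
    putLe(&h[28], kBytesPerSecond, 4);      // byte rate
    putLe(&h[32], kChannels * (kBitsPerSample / 8), 2);  // block align
    putLe(&h[34], kBitsPerSample, 2);
    std::memcpy(&h[36], "data", 4);
    putLe(&h[40], dataBytes, 4);
    return h;
}
```

A full-length recording would therefore occupy 44 + 160000 bytes on the filesystem.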
The device uses SPIFFS for persistent storage:
- `/recording.wav`: Latest audio recording
- `/config.txt`: Saved configuration (reference text, language, romanization)
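The on-disk layout of `/config.txt` is not described here. As an illustration only, the sketch below assumes one `key=value` pair per line for the three saved fields; the `Config` struct, the key names, and both functions are hypothetical, and the firmware's real format may differ.

```cpp
#include <sstream>
#include <string>

// Hypothetical in-memory form of the saved configuration.
struct Config {
    std::string referenceText;
    std::string language;
    std::string romanization;
};

// Serialize to the assumed "key=value" line format.
// Values must not contain newlines in this simple scheme.
inline std::string serializeConfig(const Config& c) {
    return "reference=" + c.referenceText + "\n" +
           "language=" + c.language + "\n" +
           "romanization=" + c.romanization + "\n";
}

// Parse the assumed format; unknown keys and malformed lines are skipped.
inline Config parseConfig(const std::string& text) {
    Config c;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;
        const std::string key = line.substr(0, eq);
        const std::string value = line.substr(eq + 1);  // may itself contain '='
        if (key == "reference") c.referenceText = value;
        else if (key == "language") c.language = value;
        else if (key == "romanization") c.romanization = value;
    }
    return c;
}
```

On the device, the string returned by `serializeConfig` would be written to the file through the SPIFFS file API, and `parseConfig` would be applied to the file contents at boot.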
This project is provided as-is for educational purposes.