Idiomic

An ESP32-based pronunciation training device that uses Azure Speech Services to assess spoken language and provide real-time feedback through an LCD display and RGB LED.

Overview

Idiomic is an IoT device designed for language learning. Users speak into a microphone while holding a button, and the device records their speech, sends it to Azure's pronunciation assessment API, and displays the results. The system supports multiple languages including English, Spanish, French, German, Italian, Portuguese, Japanese, Chinese (Mandarin), and Korean.

Hardware Requirements

ESP32 microcontroller
INMP441 I2S MEMS microphone
16x2 I2C LCD display (HD44780 compatible, address 0x27)
Common cathode RGB LED
Momentary push button

Pin Configuration

Component	Pin
RGB LED Red	GPIO 18
RGB LED Green	GPIO 19
RGB LED Blue	GPIO 21
I2S Word Select (WS)	GPIO 25
I2S Serial Clock (SCK)	GPIO 26
I2S Serial Data (SD)	GPIO 27
Button	GPIO 23
I2C SDA	GPIO 14
I2C SCL	GPIO 4
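
The table above can be captured as named constants so that wiring changes happen in one place. This is a minimal sketch: the namespace, constant names, and the duplicate-pin check are illustrative assumptions and may not match the identifiers used in main.ino.

```cpp
#include <cstddef>
#include <cstdint>

// Pin numbers copied from the table above; the names are hypothetical.
namespace pins {
constexpr uint8_t kLedRed   = 18;
constexpr uint8_t kLedGreen = 19;
constexpr uint8_t kLedBlue  = 21;
constexpr uint8_t kI2sWs    = 25;  // I2S word select (WS)
constexpr uint8_t kI2sSck   = 26;  // I2S serial clock (SCK)
constexpr uint8_t kI2sSd    = 27;  // I2S serial data (SD)
constexpr uint8_t kButton   = 23;
constexpr uint8_t kI2cSda   = 14;
constexpr uint8_t kI2cScl   = 4;
}  // namespace pins

// Returns true when no GPIO number appears twice in the assignment.
constexpr bool allPinsDistinct() {
    const uint8_t all[] = {pins::kLedRed, pins::kLedGreen, pins::kLedBlue,
                           pins::kI2sWs,  pins::kI2sSck,   pins::kI2sSd,
                           pins::kButton, pins::kI2cSda,   pins::kI2cScl};
    const std::size_t count = sizeof(all) / sizeof(all[0]);
    for (std::size_t i = 0; i < count; ++i) {
        for (std::size_t j = i + 1; j < count; ++j) {
            if (all[i] == all[j]) return false;
        }
    }
    return true;
}

// Fails the build if two components are assigned the same GPIO.
static_assert(allPinsDistinct(), "two components share a GPIO");
```

Because the I2C bus is not on the ESP32 Arduino core's default pins (SDA 21, SCL 22), the sketch would need to pass the pins explicitly, for example `Wire.begin(pins::kI2cSda, pins::kI2cScl)`.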

Features

Recording

Hold the button to record (momentary press-and-hold)
Release to stop recording
Recording stops automatically after 5 seconds
Blue pulsing LED indicates recording in progress
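
The recording rules above reduce to two small pure functions: one decides whether to keep recording, the other produces a brightness value for the pulsing blue LED. This is a sketch under assumptions: the function names, the triangle-wave shape, and the 1-second pulse period are not taken from main.ino.

```cpp
#include <algorithm>
#include <cstdint>

constexpr uint32_t kMaxRecordingMs = 5000;  // 5-second cap from the list above

// True while recording should continue: the button is still held
// and the 5-second cap has not been reached.
bool shouldKeepRecording(bool buttonHeld, uint32_t elapsedMs) {
    return buttonHeld && elapsedMs < kMaxRecordingMs;
}

// Triangle-wave brightness in the range 0..255 for the blue LED.
// periodMs is an assumed pulse period and must be at least 2.
uint8_t pulseBrightness(uint32_t elapsedMs, uint32_t periodMs = 1000) {
    const uint32_t half = periodMs / 2;
    const uint32_t phase = elapsedMs % periodMs;
    // Ramp up during the first half of the period, down during the second.
    uint32_t ramp = (phase < half) ? phase : periodMs - phase;
    ramp = std::min(ramp, half);  // guards odd periods against overshoot
    return static_cast<uint8_t>(ramp * 255 / half);
}
```

On the device, `elapsedMs` would typically be derived from `millis()` at the moment the button was first pressed, and the brightness would be written to the blue channel with PWM.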

Assessment

Automatic pronunciation assessment via Azure Speech Services
Scores displayed on LCD and indicated by LED color:
- Green: 80% or higher
- Yellow-green: 60-79%
- Yellow: 40-59%
- Red: Below 40%
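
The score bands above map directly onto a threshold function. In this sketch the thresholds come from the list, while the `Rgb` struct, the function name, and the specific channel values chosen for each color are assumptions.

```cpp
#include <cstdint>

// 8-bit channel values; on a common cathode LED a higher value is brighter.
struct Rgb {
    uint8_t r;
    uint8_t g;
    uint8_t b;
};

// Maps a 0-100 pronunciation score to an LED color using the bands above.
Rgb scoreToColor(int scorePercent) {
    if (scorePercent >= 80) return {0, 255, 0};    // green
    if (scorePercent >= 60) return {128, 255, 0};  // yellow-green
    if (scorePercent >= 40) return {255, 255, 0};  // yellow
    return {255, 0, 0};                            // red
}
```

Checking the bands from highest to lowest keeps each comparison to a single lower bound, so the boundaries (80, 60, 40) appear exactly once.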

LCD Display

Shows the phrase to practice
Displays score after assessment
Supports Pinyin tone marks with visual indicators:
- - flat tone (first tone)
- / rising tone (second tone)
- v falling-rising tone (third tone)
- \ falling tone (fourth tone)
Auto-scrolls text longer than 16 characters
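
The tone indicators and the scrolling behavior described above can be sketched as two helpers. The function names, the three-space gap between repetitions, and the wrap-around scrolling style are assumptions, and the code treats the text as single-byte characters.

```cpp
#include <cstddef>
#include <string>

constexpr std::size_t kLcdColumns = 16;  // width of the 16x2 display

// Indicator character for Mandarin tones 1-4, as listed above.
// Any other value (for example a neutral tone) yields a space.
char toneIndicator(int tone) {
    switch (tone) {
        case 1: return '-';   // flat
        case 2: return '/';   // rising
        case 3: return 'v';   // falling-rising
        case 4: return '\\';  // falling
        default: return ' ';
    }
}

// Returns the 16-character window visible at a given scroll step.
// Short text is padded with spaces; longer text wraps around with a gap.
std::string scrollWindow(const std::string& text, std::size_t step) {
    if (text.size() <= kLcdColumns) {
        return text + std::string(kLcdColumns - text.size(), ' ');
    }
    const std::string looped = text + "   ";  // assumed gap before the repeat
    std::string window;
    for (std::size_t i = 0; i < kLcdColumns; ++i) {
        window += looped[(step + i) % looped.size()];
    }
    return window;
}
```

Incrementing `step` on a timer and writing the returned window to one LCD row produces the scrolling effect; text of 16 characters or fewer stays static.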

Web Interface

Configuration page accessible via the device's IP address
Set reference text for pronunciation assessment
Set LCD display text (romanization with accents/tones)
Select target language
View assessment results and detailed scores (accuracy, fluency, completeness)
Play back recorded audio

Supported Languages

English (US and UK)
Spanish (Spain and Mexico)
French
German
Italian
Portuguese (Brazil)
Japanese
Chinese (Mandarin)
Korean
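
Azure Speech Services identifies languages by locale code, so the list above would correspond to a lookup table along these lines. The display names, the `localeFor` helper, and the choice of `fr-FR`, `de-DE`, `it-IT`, and `zh-CN` for entries that name no region are assumptions; the codes actually used by main.ino may differ, and pronunciation assessment support for each locale should be checked against the Azure documentation.

```cpp
#include <cstring>

// Pairs a human-readable name with an Azure Speech locale code.
struct Locale {
    const char* name;
    const char* code;
};

constexpr Locale kLocales[] = {
    {"English (US)", "en-US"},        {"English (UK)", "en-GB"},
    {"Spanish (Spain)", "es-ES"},     {"Spanish (Mexico)", "es-MX"},
    {"French", "fr-FR"},              {"German", "de-DE"},
    {"Italian", "it-IT"},             {"Portuguese (Brazil)", "pt-BR"},
    {"Japanese", "ja-JP"},            {"Chinese (Mandarin)", "zh-CN"},
    {"Korean", "ko-KR"},
};

// Returns the locale code for a display name, or nullptr if it is not listed.
const char* localeFor(const char* name) {
    for (const Locale& entry : kLocales) {
        if (std::strcmp(entry.name, name) == 0) return entry.code;
    }
    return nullptr;
}
```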

Configuration

Before uploading, set the following constants in main.ino:

const char* ssid = "YOUR_WIFI_SSID";
const char* password = "YOUR_WIFI_PASSWORD";
const char* azureKey = "YOUR_AZURE_SPEECH_KEY";
const char* azureRegion = "YOUR_AZURE_REGION";
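
To illustrate where `azureRegion` and the reference text end up, here is a sketch of how a request to the Azure Speech-to-text REST API for short audio could be assembled. It assumes the firmware uses that REST endpoint, which this README does not state; the helper names are hypothetical, and the reference text is inserted without JSON escaping.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Builds the short-audio recognition URL for a region such as "eastus".
std::string buildEndpoint(const std::string& region, const std::string& locale) {
    return "https://" + region +
           ".stt.speech.microsoft.com/speech/recognition/conversation/"
           "cognitiveservices/v1?language=" + locale + "&format=detailed";
}

// JSON parameters for pronunciation assessment.
// referenceText is inserted verbatim; quotes or backslashes would need escaping.
std::string buildAssessmentParams(const std::string& referenceText) {
    return "{\"ReferenceText\":\"" + referenceText +
           "\",\"GradingSystem\":\"HundredMark\","
           "\"Granularity\":\"Phoneme\",\"Dimension\":\"Comprehensive\"}";
}

// Standard Base64 encoding with '=' padding.
std::string base64Encode(const std::string& in) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    auto byteAt = [&in](std::size_t i) {
        return static_cast<uint32_t>(static_cast<uint8_t>(in[i]));
    };
    std::string out;
    std::size_t i = 0;
    // Encode every complete 3-byte group as 4 characters.
    for (; i + 2 < in.size(); i += 3) {
        const uint32_t n = (byteAt(i) << 16) | (byteAt(i + 1) << 8) | byteAt(i + 2);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }
    // Encode the 1 or 2 leftover bytes and pad to a multiple of 4.
    const std::size_t rest = in.size() - i;
    if (rest == 1) {
        const uint32_t n = byteAt(i) << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        const uint32_t n = (byteAt(i) << 16) | (byteAt(i + 1) << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

With this API, the key is sent in the `Ocp-Apim-Subscription-Key` header, the Base64-encoded parameters in the `Pronunciation-Assessment` header, and the WAV data as the request body with `Content-Type: audio/wav; codecs=audio/pcm; samplerate=16000`.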

Dependencies

ESP32 Arduino Core
LiquidCrystal_I2C library
Built-in libraries: WiFi, WebServer, Wire, SPIFFS, driver/i2s

Usage

1. Power on the device and wait for the WiFi connection
2. Note the IP address displayed on the LCD
3. Open the web interface from a browser on the same network
4. Configure the language and reference phrase
5. Hold the button and speak the phrase
6. Release the button to trigger assessment
7. View your score on the LCD and web interface

Audio Format

Sample rate: 16000 Hz
Bit depth: 16-bit
Channels: Mono
Format: WAV (PCM)
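
These parameters fully determine the 44-byte header of a canonical PCM WAV file. The sketch below builds that header for a given payload size; the function and constant names are assumptions, and main.ino may construct its header differently.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

constexpr uint32_t kSampleRate = 16000;  // Hz
constexpr uint16_t kBitsPerSample = 16;
constexpr uint16_t kChannels = 1;        // mono

// Writes a 16-bit value in little-endian byte order.
void putLe16(uint8_t* dst, uint16_t value) {
    dst[0] = static_cast<uint8_t>(value & 0xFF);
    dst[1] = static_cast<uint8_t>(value >> 8);
}

// Writes a 32-bit value in little-endian byte order.
void putLe32(uint8_t* dst, uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        dst[i] = static_cast<uint8_t>((value >> (8 * i)) & 0xFF);
    }
}

// Builds the RIFF/WAVE header for dataBytes of 16 kHz, 16-bit mono PCM.
std::array<uint8_t, 44> makeWavHeader(uint32_t dataBytes) {
    std::array<uint8_t, 44> header{};
    const uint16_t blockAlign = kChannels * kBitsPerSample / 8;  // bytes per frame
    const uint32_t byteRate = kSampleRate * blockAlign;          // bytes per second

    std::memcpy(&header[0], "RIFF", 4);
    putLe32(&header[4], 36 + dataBytes);  // file size minus the first 8 bytes
    std::memcpy(&header[8], "WAVE", 4);
    std::memcpy(&header[12], "fmt ", 4);
    putLe32(&header[16], 16);             // size of the fmt chunk
    putLe16(&header[20], 1);              // audio format 1 = PCM
    putLe16(&header[22], kChannels);
    putLe32(&header[24], kSampleRate);
    putLe32(&header[28], byteRate);
    putLe16(&header[32], blockAlign);
    putLe16(&header[34], kBitsPerSample);
    std::memcpy(&header[36], "data", 4);
    putLe32(&header[40], dataBytes);
    return header;
}
```

At 16000 samples per second and 2 bytes per sample, a recording of the maximum 5-second length holds 160000 bytes of audio data, so `makeWavHeader(160000)` describes the largest file the device would write.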

File Storage

The device uses SPIFFS for persistent storage:

/recording.wav - Latest audio recording
/config.txt - Saved configuration (reference text, language, romanization)

License

This project is provided as-is for educational purposes.
