Mac Dia Server

A FastAPI server that exposes an OpenAI-compatible Text-to-Speech (TTS) API endpoint, using mlx-audio for generation on Apple Silicon.

About Dia-1.6B

Dia-1.6B is an open-source text-to-speech (TTS) model with 1.6 billion parameters, developed by Nari Labs. This server uses a single fixed model, mlx-community/Dia-1.6B-4bit. Key features of Dia-1.6B include:

- Multi-speaker dialogue generation using [S1], [S2] tags in the input text
- Fine-grained control over voice, emotion, and speaking style
- Support for non-verbal expressions such as laughter and coughing
- Voice cloning for personalized speech synthesis
- Optimized for English-language generation

Dia-1.6B is comparable in performance to leading commercial TTS solutions, while remaining fully open and customizable for research and production use.
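
The [S1] and [S2] speaker tags described above can be assembled programmatically. The following is a minimal sketch, not part of this project: `build_dialogue` is a hypothetical helper name, and restricting speakers to 1 and 2 is an assumption based on the two tags mentioned here.

```python
from typing import Iterable, Tuple


def build_dialogue(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_number, text) turns into one tagged input string."""
    parts = []
    for speaker, text in turns:
        # Assumption: only the [S1] and [S2] tags named in this README are valid.
        if speaker not in (1, 2):
            raise ValueError(f"unsupported speaker: {speaker}")
        text = text.strip()
        if not text:
            continue  # skip empty turns rather than emit a bare tag
        parts.append(f"[S{speaker}] {text}")
    return " ".join(parts)


script = build_dialogue([
    (1, "Dia is an open weights text to dialogue model."),
    (2, "You get full control over scripts and voices."),
    (1, "Wow. Amazing. (laughs)"),
])
print(script)
```

The resulting string can be passed as the `input` field of a speech request.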

Setup

Install Dependencies: Requires Python 3.12+ and uv.

uv venv  # Create virtual environment
source .venv/bin/activate
uv pip install .
# Installing mlx and mlx-audio may need special attention;
# follow the official MLX documentation.

Configure API Key: Create a .env file in the project root:
```
API_KEY=your_actual_api_key
```
Run the Server:
```
uv run start.py
```
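
For readers curious how an `API_KEY` from `.env` might be matched against the `Authorization: Bearer` header, here is a rough sketch using only the standard library. It is an illustration under assumptions, not this server's actual code: `load_env_file` and `is_authorized` are hypothetical names, and the real project may use a dotenv library and FastAPI dependencies instead.

```python
import hmac
import os
from pathlib import Path
from typing import Optional


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def is_authorized(authorization_header: Optional[str], expected_key: str) -> bool:
    """Return True only if the header is 'Bearer <expected_key>'."""
    if not authorization_header or not expected_key:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(token.strip().encode(), expected_key.encode())


# A real environment variable takes precedence over the .env file in this sketch.
api_key = os.environ.get("API_KEY") or load_env_file().get("API_KEY", "")
```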

API Endpoint

URL: /v1/audio/speech
Method: POST
Authentication: send the header Authorization: Bearer <YOUR_API_KEY>
Request Body (JSON; fields follow the OpenAI TTS API documentation):
- model (string): e.g., "tts-1"
- input (string): Text to synthesize.
- voice (string): e.g., "alloy"
- response_format (string, optional): Audio format, e.g., "mp3". Defaults to "mp3".
- speed (float, optional): Speech speed. Defaults to 1.
Response: Audio stream in the specified format.
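
The request body fields listed above can be modeled on the client side as a small data class. This is a sketch only: `SpeechRequest` is a hypothetical name, and the two sanity checks (non-empty input, positive speed) are assumptions, since the server's actual validation rules are not documented here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SpeechRequest:
    model: str                    # e.g. "tts-1"
    input: str                    # text to synthesize
    voice: str                    # e.g. "alloy"
    response_format: str = "mp3"  # optional, defaults to "mp3"
    speed: float = 1.0            # optional, defaults to 1

    def __post_init__(self) -> None:
        # Assumed sanity checks; the server's real validation may differ.
        if not self.input.strip():
            raise ValueError("input must not be empty")
        if self.speed <= 0:
            raise ValueError("speed must be positive")

    def to_json(self) -> str:
        """Serialize the request to the JSON body sent to /v1/audio/speech."""
        return json.dumps(asdict(self))


req = SpeechRequest(model="tts-1", input="[S1] Hello there.", voice="alloy")
print(req.to_json())
```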

cURL

curl -X POST 'http://localhost:8000/v1/audio/speech' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "model": "tts-1",
    "input": "[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face.",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": 1
  }' \
  --output speech.mp3
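
The same speech request can be made from Python. The sketch below uses only the standard library and mirrors the cURL call above; `build_speech_request` and `synthesize` are hypothetical helper names, and the host and port are taken from the example rather than from any confirmed server default.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the cURL example


def build_speech_request(text: str, api_key: str, voice: str = "alloy",
                         response_format: str = "mp3",
                         speed: float = 1.0) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {
        "model": "tts-1",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize(text: str, api_key: str, out_path: str = "speech.mp3") -> str:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, api_key)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path


# Usage (requires the server to be running):
# synthesize("[S1] Hello. [S2] Hi there. (laughs)", "YOUR_API_KEY")
```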

# Transcribe an audio file:
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file=@/yourfile \
  -F model=mlx-community/whisper-large-v3-turbo \
  -F language=en
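
A Python counterpart to the transcription call above needs a multipart/form-data body, which cURL builds automatically from the `-F` flags. The following sketch assembles one by hand with the standard library. `build_transcription_request` is a hypothetical name, the field names (`file`, `model`, `language`) are copied from the cURL command, and the response format is not documented here, so the sketch only builds the request.

```python
import urllib.request
import uuid
from pathlib import Path


def build_transcription_request(
    audio_path: str,
    api_key: str,
    model: str = "mlx-community/whisper-large-v3-turbo",
    language: str = "en",
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    filename = Path(audio_path).name
    chunks = []
    # Plain form fields, mirroring the -F model=... and -F language=... flags.
    for name, value in (("model", model), ("language", language)):
        chunks.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # File part, mirroring -F file=@...
    chunks.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
    )
    chunks.append(Path(audio_path).read_bytes())
    chunks.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=b"".join(chunks),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

The returned request can be sent with `urllib.request.urlopen` once the server is running.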
