ComfyUI-speech-dataset-toolkit

Overview

Basic audio tools for ComfyUI built on torchaudio, intended to assist in creating speech datasets for ASR, TTS, and similar tasks.

Features

Requirement

Install torchaudio according to your environment.

cd custom_nodes
git clone https://github.com/kale4eat/ComfyUI-speech-dataset-toolkit.git
cd ComfyUI-speech-dataset-toolkit
pip3 install torchaudio --index-url https://download.pytorch.org/whl/cu121
pip3 install -r requirements.txt

If you use silero-vad, install onnxruntime according to your environment.

pip install onnxruntime-gpu
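As a rough sketch of why onnxruntime is needed: silero-vad can run with its ONNX backend when loaded through torch.hub. The snippet below follows upstream silero-vad usage and is only illustrative; the file name is hypothetical and the extension's VAD node may wrap this differently.

import torch

# Load silero-vad with the ONNX backend (requires onnxruntime).
model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad", onnx=True)
get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks = utils

# Hypothetical input file placed in audio_input beforehand.
wav = read_audio("audio_input/example.wav", sampling_rate=16000)
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
print(speech_timestamps)  # e.g. [{'start': ..., 'end': ...}, ...]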

Usage

On first startup, the audio_input and audio_output folders are created.

ComfyUI
├── audio_input
├── audio_output
├── custom_nodes
│   └── ComfyUI-speech-dataset-toolkit
...

First, use a Load Audio node to load audio.

Load Audio node

Place the audio files you wish to process in the audio_input folder in advance. If you add files while the app is running, reload the page (press F5).

audio, the data type passed through the ComfyUI flow, consists of a waveform and a sample rate. Many nodes in this extension operate on this data.
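As a minimal sketch of what such an object contains, audio could be built from torchaudio like this; the dictionary keys are an assumption for illustration, not necessarily the extension's internal representation.

import torchaudio

# Load a file (hypothetical name) and pack waveform + sample rate together.
waveform, sample_rate = torchaudio.load("audio_input/example.wav")
audio = {"waveform": waveform, "sample_rate": sample_rate}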

For example, Demucs separates audio into drums, bass, vocals, and other stems; each stem is itself audio data.

Apply demucs node
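As a rough sketch of what stem separation involves, torchaudio ships a Hybrid Demucs pipeline; whether the Apply Demucs node uses this exact pipeline and these defaults is an assumption, and the file name is hypothetical.

import torch
import torchaudio

# Hybrid Demucs pipeline bundled with torchaudio; the node may use another backend.
bundle = torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS
model = bundle.get_model()

waveform, sr = torchaudio.load("audio_input/example.wav")
if waveform.shape[0] == 1:
    waveform = waveform.repeat(2, 1)  # the model expects stereo input
if sr != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    # Output shape: (batch, num_sources, channels, frames)
    sources = model(waveform.unsqueeze(0))

# model.sources is typically ["drums", "bass", "other", "vocals"];
# each stem below is again audio data (waveform + sample rate).
stems = dict(zip(model.sources, sources.squeeze(0)))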

Finally, use a Save Audio node to save the audio. The audio is saved to the audio_output folder.

Save Audio node
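Under the hood, saving presumably comes down to something like torchaudio.save; the file names below and the output naming scheme are assumptions for illustration.

import torchaudio

waveform, sample_rate = torchaudio.load("audio_input/example.wav")
# ... processing with the nodes above ...
torchaudio.save("audio_output/result.wav", waveform, sample_rate)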

Note

Some policies are still unsettled, so breaking changes may be made.

This repository does not include general-purpose nodes such as numerical operations or string processing.

Inspiration
