Improve Audio Component #5966

Merged
merged 68 commits on Oct 27, 2023
Changes from all commits (68 commits)
daa2545
replace <audio> with wavesurfer: add recording, playing and trimming,…
hannahblair Oct 17, 2023
77e1995
add changeset
gradio-pr-bot Oct 17, 2023
11bb06c
Merge branch 'v4' into waveform-audio
abidlabs Oct 18, 2023
05b9351
merge cleanup
hannahblair Oct 18, 2023
f7427f1
improving recording styling
hannahblair Oct 18, 2023
821ddb2
add recording timer
hannahblair Oct 18, 2023
81fc459
Merge branch 'v4' into waveform-audio
hannahblair Oct 19, 2023
787e476
add trim region duration
hannahblair Oct 18, 2023
55fe1f1
allow trimming recordings
hannahblair Oct 19, 2023
a79fe29
Merge branch 'v4' into waveform-audio
hannahblair Oct 19, 2023
e3cf88a
clean up playing logic
hannahblair Oct 19, 2023
2b3b76f
add pause_recording event
hannahblair Oct 19, 2023
c05d721
remove crop min/max
hannahblair Oct 20, 2023
6e16c29
Merge branch 'v4' into waveform-audio
hannahblair Oct 20, 2023
c3cb90d
add waveform options param
hannahblair Oct 21, 2023
e530210
remove trimmingmode and use mode
hannahblair Oct 21, 2023
50bcbf8
Merge branch 'v4' into waveform-audio
hannahblair Oct 23, 2023
dd058c0
streaming + cleanup
hannahblair Oct 23, 2023
15e46dd
add changeset
gradio-pr-bot Oct 23, 2023
9f5beed
Merge branch 'v4' into waveform-audio
hannahblair Oct 23, 2023
ecbc4b3
clean up types
hannahblair Oct 23, 2023
000b0a0
Merge branch 'waveform-audio' of github.com:gradio-app/gradio into wa…
hannahblair Oct 23, 2023
22904fa
mobile adjustments
hannahblair Oct 23, 2023
d868bbf
add min/max length + trim accessibility
hannahblair Oct 23, 2023
8f111eb
Merge branch 'v4' into waveform-audio
hannahblair Oct 23, 2023
2e41dc7
Merge branch 'v4' into waveform-audio
abidlabs Oct 23, 2023
961e399
update pnpm lock
abidlabs Oct 23, 2023
383cea4
amend source to a list and allow source switching
hannahblair Oct 24, 2023
b441b34
fix no microphone found logic
hannahblair Oct 24, 2023
0774e18
Merge branch 'v4' into waveform-audio
hannahblair Oct 24, 2023
605459b
change undo logic to reset trims
hannahblair Oct 24, 2023
256cbf9
Merge branch 'waveform-audio' of github.com:gradio-app/gradio into wa…
hannahblair Oct 24, 2023
8415736
fix conflict
abidlabs Oct 24, 2023
d1b67f0
tweaks
hannahblair Oct 24, 2023
84b1ddc
Merge branch 'waveform-audio' of github.com:gradio-app/gradio into wa…
hannahblair Oct 24, 2023
c6a0f0c
tweak reset logic
hannahblair Oct 24, 2023
a215271
ensure recording is sent to backend
hannahblair Oct 25, 2023
b21cad8
fix audio duration reactivity
hannahblair Oct 26, 2023
acd7474
list tweak
hannahblair Oct 26, 2023
ffdadcb
clean up
hannahblair Oct 26, 2023
913fee9
change source -> sources + restore wasm changes
hannahblair Oct 26, 2023
dbbdef1
Merge branch 'v4' into waveform-audio
hannahblair Oct 26, 2023
432dcdb
formatting
hannahblair Oct 26, 2023
7748b86
fix tests
hannahblair Oct 26, 2023
97bf63a
fix test
hannahblair Oct 26, 2023
2ae39e2
add default sources value in fe + fix audio demos
hannahblair Oct 26, 2023
348927a
fix audio file name test
hannahblair Oct 26, 2023
9a386f6
Merge branch 'v4' into waveform-audio
hannahblair Oct 26, 2023
275640b
add better sources typing
hannahblair Oct 26, 2023
ff3ceef
Merge branch 'waveform-audio' of github.com:gradio-app/gradio into wa…
hannahblair Oct 26, 2023
62e660a
ui test tweaks
hannahblair Oct 27, 2023
1021a9e
add default value in templates.py
hannahblair Oct 27, 2023
3b719e7
formatting
hannahblair Oct 27, 2023
2c03894
Merge branch 'v4' into waveform-audio
hannahblair Oct 27, 2023
0d379b4
remove unused prop
hannahblair Oct 27, 2023
e748918
Merge branch 'waveform-audio' of github.com:gradio-app/gradio into wa…
hannahblair Oct 27, 2023
d6a963c
add audio story
hannahblair Oct 27, 2023
c60fdde
add changeset
gradio-pr-bot Oct 27, 2023
0ebdf17
revert sources changes
hannahblair Oct 27, 2023
a4468ef
Merge branch 'waveform-audio' of github.com:gradio-app/gradio into wa…
hannahblair Oct 27, 2023
8861dcc
remove story id
hannahblair Oct 27, 2023
91ff1b9
fix be test
hannahblair Oct 27, 2023
323b7e5
fix be test
hannahblair Oct 27, 2023
81abbef
fix notebooks
pngwn Oct 27, 2023
33fd591
formatting
hannahblair Oct 27, 2023
ee3e016
Merge branch 'waveform-audio' of github.com:gradio-app/gradio into wa…
hannahblair Oct 27, 2023
a1ed438
fix test
hannahblair Oct 27, 2023
653ccf7
fix test again
hannahblair Oct 27, 2023
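
Taken together, these commits replace the native <audio> element with a wavesurfer.js-based player and reshape the component's Python API: the single source string becomes a sources list, and the commit messages also mention waveform options, min/max recording length, and a pause_recording event. A minimal sketch of the updated constructor follows; apart from sources, the parameter names (waveform_options, min_length, max_length) are inferred from the commit messages above and are assumptions, not confirmed by the diffs below.

import gradio as gr

# Sketch of the reworked Audio component. Only the `sources` change is
# confirmed by the diffs in this PR; the other parameter names are assumed
# from the commit messages ("add waveform options param", "add min/max length").
with gr.Blocks() as demo:
    audio = gr.Audio(
        sources=["microphone", "upload"],      # was: source="microphone" (single string)
        waveform_options=gr.WaveformOptions(   # assumed helper for styling the waveform
            waveform_color="#01C6FF",
            waveform_progress_color="#0066B4",
        ),
        min_length=2,                          # assumed: minimum recording length in seconds
        max_length=30,                         # assumed: maximum recording length in seconds
    )

if __name__ == "__main__":
    demo.launch()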
10 changes: 10 additions & 0 deletions .changeset/great-moles-matter.md
@@ -0,0 +1,10 @@
---
"@gradio/app": minor
"@gradio/audio": minor
"@gradio/icons": minor
"@gradio/storybook": minor
"@gradio/utils": minor
"gradio": minor
---

feat:Improve Audio Component
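
The changeset bumps gradio together with the affected frontend packages (@gradio/app, @gradio/audio, @gradio/icons, @gradio/storybook, @gradio/utils), since the change spans both the Python and Svelte sides. For context, here is a hedged sketch of wiring the recording events referenced in the commit log; the pause_recording listener name is inferred from the commit "add pause_recording event" and, like stop_recording, is shown as an assumption about the component's event set rather than something confirmed by this diff.

import gradio as gr

with gr.Blocks() as demo:
    mic = gr.Audio(sources=["microphone"])
    status = gr.Textbox(label="Status")

    # Event names are assumptions: pause_recording comes from the commit
    # message "add pause_recording event"; stop_recording fires when a
    # recording is finished.
    mic.pause_recording(lambda: "Recording paused", outputs=status)
    mic.stop_recording(lambda: "Recording stopped", outputs=status)

if __name__ == "__main__":
    demo.launch()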
2 changes: 1 addition & 1 deletion demo/asr/run.ipynb
@@ -1 +1 @@
{"cells": [{"cell_type": "markdown", "id": "302934307671667531413257853548643485645", "metadata": {}, "source": ["# Gradio Demo: asr"]}, {"cell_type": "code", "execution_count": null, "id": "272996653310673477252411125948039410165", "metadata": {}, "outputs": [], "source": ["!pip install -q gradio torch torchaudio transformers"]}, {"cell_type": "code", "execution_count": null, "id": "288918539441861185822528903084949547379", "metadata": {}, "outputs": [], "source": ["import gradio as gr\n", "from transformers import pipeline\n", "import numpy as np\n", "\n", "transcriber = pipeline(\"automatic-speech-recognition\", model=\"openai/whisper-base.en\")\n", "\n", "def transcribe(audio):\n", " sr, y = audio\n", " y = y.astype(np.float32)\n", " y /= np.max(np.abs(y))\n", "\n", " return transcriber({\"sampling_rate\": sr, \"raw\": y})[\"text\"]\n", "\n", "\n", "demo = gr.Interface(\n", " transcribe,\n", " gr.Audio(source=\"microphone\"),\n", " \"text\",\n", ")\n", "\n", "if __name__ == \"__main__\":\n", " demo.launch()\n"]}], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
{"cells": [{"cell_type": "markdown", "id": "302934307671667531413257853548643485645", "metadata": {}, "source": ["# Gradio Demo: asr"]}, {"cell_type": "code", "execution_count": null, "id": "272996653310673477252411125948039410165", "metadata": {}, "outputs": [], "source": ["!pip install -q gradio torch torchaudio transformers"]}, {"cell_type": "code", "execution_count": null, "id": "288918539441861185822528903084949547379", "metadata": {}, "outputs": [], "source": ["import gradio as gr\n", "from transformers import pipeline\n", "import numpy as np\n", "\n", "transcriber = pipeline(\"automatic-speech-recognition\", model=\"openai/whisper-base.en\")\n", "\n", "def transcribe(audio):\n", " sr, y = audio\n", " y = y.astype(np.float32)\n", " y /= np.max(np.abs(y))\n", "\n", " return transcriber({\"sampling_rate\": sr, \"raw\": y})[\"text\"]\n", "\n", "\n", "demo = gr.Interface(\n", " transcribe,\n", " gr.Audio(sources=[\"microphone\"]),\n", " \"text\",\n", ")\n", "\n", "if __name__ == \"__main__\":\n", " demo.launch()\n"]}], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
2 changes: 1 addition & 1 deletion demo/asr/run.py
@@ -14,7 +14,7 @@ def transcribe(audio):

demo = gr.Interface(
    transcribe,
-   gr.Audio(source="microphone"),
+   gr.Audio(sources=["microphone"]),
    "text",
)

2 changes: 1 addition & 1 deletion demo/blocks_kitchen_sink/run.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion demo/blocks_kitchen_sink/run.py
@@ -167,7 +167,7 @@ def clear():
    with gr.Tab("Audio"):
        with gr.Row():
            gr.Audio()
-           gr.Audio(source="microphone")
+           gr.Audio(sources=["microphone"])
            gr.Audio(join(KS_FILES, "cantina.wav"))
    with gr.Tab("Other"):
        # gr.Image(source="webcam")
2 changes: 1 addition & 1 deletion demo/kitchen_sink/run.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion demo/kitchen_sink/run.py
@@ -107,7 +107,7 @@ def fn(
        gr.Image(label="Webcam", source="webcam"),
        gr.Video(label="Video"),
        gr.Audio(label="Audio"),
-       gr.Audio(label="Microphone", source="microphone"),
+       gr.Audio(label="Microphone", sources=["microphone"]),
        gr.File(label="File"),
        gr.Dataframe(label="Dataframe", headers=["Name", "Age", "Gender"]),
    ],
2 changes: 1 addition & 1 deletion demo/main_note/run.ipynb
@@ -1 +1 @@
{"cells": [{"cell_type": "markdown", "id": "302934307671667531413257853548643485645", "metadata": {}, "source": ["# Gradio Demo: main_note"]}, {"cell_type": "code", "execution_count": null, "id": "272996653310673477252411125948039410165", "metadata": {}, "outputs": [], "source": ["!pip install -q gradio scipy numpy matplotlib"]}, {"cell_type": "code", "execution_count": null, "id": "288918539441861185822528903084949547379", "metadata": {}, "outputs": [], "source": ["# Downloading files from the demo repo\n", "import os\n", "os.mkdir('audio')\n", "!wget -q -O audio/cantina.wav https://github.com/gradio-app/gradio/raw/main/demo/main_note/audio/cantina.wav\n", "!wget -q -O audio/recording1.wav https://github.com/gradio-app/gradio/raw/main/demo/main_note/audio/recording1.wav"]}, {"cell_type": "code", "execution_count": null, "id": "44380577570523278879349135829904343037", "metadata": {}, "outputs": [], "source": ["from math import log2, pow\n", "import os\n", "\n", "import numpy as np\n", "from scipy.fftpack import fft\n", "\n", "import gradio as gr\n", "\n", "A4 = 440\n", "C0 = A4 * pow(2, -4.75)\n", "name = [\"C\", \"C#\", \"D\", \"D#\", \"E\", \"F\", \"F#\", \"G\", \"G#\", \"A\", \"A#\", \"B\"]\n", "\n", "\n", "def get_pitch(freq):\n", " h = round(12 * log2(freq / C0))\n", " n = h % 12\n", " return name[n]\n", "\n", "\n", "def main_note(audio):\n", " rate, y = audio\n", " if len(y.shape) == 2:\n", " y = y.T[0]\n", " N = len(y)\n", " T = 1.0 / rate\n", " yf = fft(y)\n", " yf2 = 2.0 / N * np.abs(yf[0 : N // 2])\n", " xf = np.linspace(0.0, 1.0 / (2.0 * T), N // 2)\n", "\n", " volume_per_pitch = {}\n", " total_volume = np.sum(yf2)\n", " for freq, volume in zip(xf, yf2):\n", " if freq == 0:\n", " continue\n", " pitch = get_pitch(freq)\n", " if pitch not in volume_per_pitch:\n", " volume_per_pitch[pitch] = 0\n", " volume_per_pitch[pitch] += 1.0 * volume / total_volume\n", " volume_per_pitch = {k: float(v) for k, v in volume_per_pitch.items()}\n", " return volume_per_pitch\n", "\n", "\n", "demo = gr.Interface(\n", " main_note,\n", " gr.Audio(source=\"microphone\"),\n", " gr.Label(num_top_classes=4),\n", " examples=[\n", " [os.path.join(os.path.abspath(''),\"audio/recording1.wav\")],\n", " [os.path.join(os.path.abspath(''),\"audio/cantina.wav\")],\n", " ],\n", " interpretation=\"default\",\n", ")\n", "\n", "if __name__ == \"__main__\":\n", " demo.launch()\n"]}], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
{"cells": [{"cell_type": "markdown", "id": "302934307671667531413257853548643485645", "metadata": {}, "source": ["# Gradio Demo: main_note"]}, {"cell_type": "code", "execution_count": null, "id": "272996653310673477252411125948039410165", "metadata": {}, "outputs": [], "source": ["!pip install -q gradio scipy numpy matplotlib"]}, {"cell_type": "code", "execution_count": null, "id": "288918539441861185822528903084949547379", "metadata": {}, "outputs": [], "source": ["# Downloading files from the demo repo\n", "import os\n", "os.mkdir('audio')\n", "!wget -q -O audio/cantina.wav https://github.com/gradio-app/gradio/raw/main/demo/main_note/audio/cantina.wav\n", "!wget -q -O audio/recording1.wav https://github.com/gradio-app/gradio/raw/main/demo/main_note/audio/recording1.wav"]}, {"cell_type": "code", "execution_count": null, "id": "44380577570523278879349135829904343037", "metadata": {}, "outputs": [], "source": ["from math import log2, pow\n", "import os\n", "\n", "import numpy as np\n", "from scipy.fftpack import fft\n", "\n", "import gradio as gr\n", "\n", "A4 = 440\n", "C0 = A4 * pow(2, -4.75)\n", "name = [\"C\", \"C#\", \"D\", \"D#\", \"E\", \"F\", \"F#\", \"G\", \"G#\", \"A\", \"A#\", \"B\"]\n", "\n", "\n", "def get_pitch(freq):\n", " h = round(12 * log2(freq / C0))\n", " n = h % 12\n", " return name[n]\n", "\n", "\n", "def main_note(audio):\n", " rate, y = audio\n", " if len(y.shape) == 2:\n", " y = y.T[0]\n", " N = len(y)\n", " T = 1.0 / rate\n", " yf = fft(y)\n", " yf2 = 2.0 / N * np.abs(yf[0 : N // 2])\n", " xf = np.linspace(0.0, 1.0 / (2.0 * T), N // 2)\n", "\n", " volume_per_pitch = {}\n", " total_volume = np.sum(yf2)\n", " for freq, volume in zip(xf, yf2):\n", " if freq == 0:\n", " continue\n", " pitch = get_pitch(freq)\n", " if pitch not in volume_per_pitch:\n", " volume_per_pitch[pitch] = 0\n", " volume_per_pitch[pitch] += 1.0 * volume / total_volume\n", " volume_per_pitch = {k: float(v) for k, v in volume_per_pitch.items()}\n", " return volume_per_pitch\n", "\n", "\n", "demo = gr.Interface(\n", " main_note,\n", " gr.Audio(sources=[\"microphone\"]),\n", " gr.Label(num_top_classes=4),\n", " examples=[\n", " [os.path.join(os.path.abspath(''),\"audio/recording1.wav\")],\n", " [os.path.join(os.path.abspath(''),\"audio/cantina.wav\")],\n", " ],\n", " interpretation=\"default\",\n", ")\n", "\n", "if __name__ == \"__main__\":\n", " demo.launch()\n"]}], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
2 changes: 1 addition & 1 deletion demo/main_note/run.py
@@ -42,7 +42,7 @@ def main_note(audio):

demo = gr.Interface(
    main_note,
-   gr.Audio(source="microphone"),
+   gr.Audio(sources=["microphone"]),
    gr.Label(num_top_classes=4),
    examples=[
        [os.path.join(os.path.dirname(__file__),"audio/recording1.wav")],