Audio Information Research Lab

32 repositories

Transkun
Public
A simple yet effective Audio-to-MIDI Automatic Piano Transcription system
Python
•
MIT License
•27•0•0•0•Updated Sep 28, 2024
HARP
Public
A sample editing application allowing for hosted, asynchronous, remote processing of audio with machine learning by routing through Gradio endpoints.
HTML
•
BSD 3-Clause "New" or "Revised" License
•7•0•0•0•Updated Sep 17, 2024
Y-vector
Public
Y-vector: Multiscale Waveform Encoder for Speaker Embedding
Python
•
MIT License
•9•0•0•0•Updated Sep 15, 2024
Filler-semi-CRF
Public
Codebase for "Transcription free filler word detection with Neural semi-CRFs" [ICASSP2023]
Python
•
MIT License
•3•0•0•0•Updated Sep 15, 2024
AIR-ASVspoof
Public
Implementation of the paper "One-class Learning towards Generalized Voice Spoofing Detection"
Jupyter Notebook
•
MIT License
•32•0•0•0•Updated Sep 14, 2024
pyharp
Public
Companion repository that facilitates creating Gradio endpoints accessible from within Digital Audio Workstations (DAWs) through HARP.
Python
•
BSD 3-Clause "New" or "Revised" License
•4•0•0•0•Updated Sep 6, 2024
SynthTab
Public
Official Repository for ICASSP 2024 Paper "SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription"
Python
•
Other
•4•0•0•0•Updated Aug 22, 2024
MSOC
Public
Python
•1•0•0•0•Updated Jul 29, 2024
BeatNet
Public
BeatNet is a state-of-the-art real-time and offline system for joint music beat, downbeat, tempo, and meter tracking, using a CRNN and particle filtering (implementation of the ISMIR 2021 paper).
Python
•
Creative Commons Attribution 4.0 International
•77•0•0•0•Updated May 29, 2024
Cacophony
Public
Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986
Python
•
MIT License
•4•0•0•0•Updated Apr 26, 2024
lhvqt
Public
Frontend filterbank learning module with HVQT initialization capabilities.
Python
•
MIT License
•3•0•0•0•Updated Feb 27, 2024
InvitedTalk
Public
Invited talk at group meeting of AIR lab
invited
0•0•0•0•Updated Dec 5, 2023
HRTF_field_norm
Public
Official Implementation of our WASPAA 2023 paper "Mitigating Cross-Database Differences for Learning Unified HRTF Representation"
Python
•
BSD 3-Clause "New" or "Revised" License
•3•0•0•0•Updated Dec 3, 2023
hrtf_field
Public
Official implementation of the ICASSP 2023 paper "HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields"
Python
•
MIT License
•2•0•0•0•Updated Dec 3, 2023
control-vc
Public
This is the implementation for "ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Rhythm"
Python
•
Other
•18•0•0•0•Updated Nov 29, 2023
1D-StateSpace
Public
This repository contains the implementation of an efficient joint beat, downbeat, tempo, and meter tracking system using a compact 1D probabilistic state space and a jump-back reward technique (ICASSP 2022).
Python
•
MIT License
•14•0•0•0•Updated Nov 28, 2023
SpeechEmotionAVLearning
Public
Official Implementation of our ICASSP 2024 paper "Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech"
HTML
•2•0•0•0•Updated Nov 25, 2023
HBAS_chapter_voice3
Public
Official implementation of the handbook chapter "Generalizing Voice Presentation Attack Detection to Unseen Synthetic Attacks and Channel Variation"
Python
•
MIT License
•1•0•0•0•Updated Oct 2, 2023
amt-tools
Public
Machine learning tools and framework for automatic music transcription.
Python
•
MIT License
•4•0•0•0•Updated Jul 30, 2023
harana
Public
A neural semi-CRF model for harmonic analysis
Python
•
MIT License
•1•0•0•0•Updated Jul 14, 2023
emotalkingface
Public
The code for the TMM paper "Speech Driven Talking Face Generation from a Single Image and an Emotion Condition"
Python
•
MIT License
•32•0•0•0•Updated Apr 9, 2023
samo
Public
Official Implementation of our ICASSP 2023 paper "SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing"
Python
•
MIT License
•10•0•0•0•Updated Apr 5, 2023
guitar-transcription-with-inhibition
Public
Code for the paper "A Data-Driven Methodology for Considering Feasibility and Pairwise Likelihood in Deep Learning Based Guitar Tablature Transcription Systems".
Python
•
MIT License
•2•0•0•0•Updated Dec 14, 2022
GenerativeSourceSeparation
Public
Open source code for the paper 'Music Source Separation with Generative Flow'
Jupyter Notebook
•
MIT License
•1•1•0•0•Updated Nov 18, 2022
DrawAndListen
Public
Code for the paper "Draw and Listen! A Sketch-based System for Music Inpainting", TISMIR 2022
Python
•
MIT License
•2•0•0•0•Updated Nov 4, 2022
Singing-Vocal-Beat-Tracking
Public
This repo contains the source code of the first deep learning-based singing voice beat tracking system. It leverages WavLM and DistilHuBERT pre-trained speech models to create vocal embeddings and trains linear multi-head self-attention layers on top of them to extract vocal beat activations.
Python
•
MIT License
•4•0•0•0•Updated Sep 4, 2022
BachDuet
Public
BachDuet enables a human performer to improvise a duet counterpoint with a computer agent in real time.
Python
•2•0•0•0•Updated Aug 8, 2022
DyViSE
Public
Official implementation of our MMSP 2022 paper, "Dynamic vision-guided speaker embedding for audio-visual speaker diarization"
Python
•2•0•0•0•Updated Jul 5, 2022
SASV_PR
Public
Official implementation of the Odyssey paper "A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification"
Python
•
MIT License
•5•0•0•0•Updated Jun 24, 2022
sparse-analytic-filters
Public
Code for the paper "Learning Sparse Analytic Filters for Piano Transcription".
Python
•
MIT License
•3•0•0•0•Updated Jun 22, 2022