audio-sound-and-speech

Repository of audio, sound, and speech related papers, tools, and docs

Papers

https://github.com/google/uis-rnn This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization. https://arxiv.org/abs/1810.04719

https://github.com/philipperemy/deep-speaker

https://github.com/qqueing/DeepSpeaker-pytorch

https://arxiv.org/abs/1604.07160 Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection

Surrey-CVSSP System for DCASE2017 Challenge Task4

https://arxiv.org/find/all/1/all:+dcase/0/1/0/all/0/1

https://wenku.baidu.com/view/b223255b3186bceb18e8bb71.html

https://etymo.io/search/Dcase

https://arxiv.org/abs/1612.01611v1

https://arxiv.org/abs/1607.03681v2

https://arxiv.org/abs/1703.06902v1

https://arxiv.org/abs/1609.06026v3

http://karol.piczak.com/papers/Piczak2015-ESC-ConvNet.pdf

https://arxiv.org/pdf/1609.05234.pdf (https://github.com/spragunr/deep_q_rl)

https://vijaychan.github.io/Publications/2011%20-%20Survey%20and%20evaluation%20of%20audio%20fingerprinting%20schemes%20for%20mobile%20audio%20search.pdf Survey and Evaluation of Audio Fingerprinting Schemes for Mobile Query-by-Example Applications

Tools and code

https://github.com/google/uis-rnn This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization. https://arxiv.org/abs/1810.04719

https://github.com/dake/openVP Voiceprint (speaker) recognition

https://github.com/tensorflow/models/tree/master/research/audioset CNN Architectures for Large-Scale Audio Classification

http://projects.csail.mit.edu/soundnet/ SoundNet: Learning Sound Representations from Unlabeled Video

https://github.com/tyiannak/pyAudioAnalysis

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0144610

https://github.com/librosa/librosa

https://github.com/readbeyond/aeneas

https://github.com/CPJKU/madmom

https://github.com/aalireza/SimpleAudioIndexer

https://github.com/craffel/mir_eval

Audio/Sound event detection:

https://github.com/gorinars/dcase16-cnn

https://github.com/liuhuang31/dcase17_cnn

https://github.com/kahst/AcousticEventDetection

https://github.com/nationalparkservice/acoustic_discovery

Visualization:

https://github.com/TUT-ARG/sed_vis Visualization tool

https://github.com/TUT-ARG/TUT_Rare_sound_events_mixture_synthesizer

http://tut-arg.github.io/sed_eval/ Evaluation tool

https://github.com/znichols/racKet

https://github.com/justinsalamon/UrbanSound8K-JAMS

http://bmcfee.github.io/papers/scipy2015_librosa.pdf

https://github.com/andabi/voice-vector A deep neural network for finding text-independent speaker embedding written in tensorflow and tensorpack

Musical fingerprinting systems:

https://github.com/echonest/echoprint-server Server components for Echoprint

https://github.com/beetbox/pyacoustid Python bindings for Chromaprint acoustic fingerprinting and the Acoustid Web service

https://acoustid.org AcoustID is a project providing complete audio identification service, based entirely on open source software.

https://labrosa.ee.columbia.edu/matlab/audfprint/ audfprint is a (compiled) Matlab script that takes a list of sound files and creates a database of landmarks, then takes one or more query audio files and matches them against the previously created database.

https://github.com/dpwe/audfprint Landmark-based audio fingerprinting
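
The general idea behind a landmark database and query matching can be sketched in a few lines of Python. This is a toy illustration, not audfprint's implementation: it assumes spectrogram peaks have already been extracted as (time_frame, frequency_bin) pairs, and the function names, fan_out, and max_dt are hypothetical choices made for the example.

```python
from collections import Counter, defaultdict


def landmark_hashes(peaks, fan_out=3, max_dt=50):
    """Pair each peak with up to `fan_out` later peaks.

    Yields ((f1, f2, dt), t1): a hashable landmark and its anchor time.
    """
    peaks = sorted(peaks)  # sort by time, then frequency
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # later peaks are even further away
            if dt == 0:
                continue  # skip peaks in the same frame
            yield (f1, f2, dt), t1
            paired += 1
            if paired == fan_out:
                break


def build_index(tracks):
    """Map each landmark to the (track_id, time) pairs where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for landmark, t in landmark_hashes(peaks):
            index[landmark].append((track_id, t))
    return index


def match(index, query_peaks):
    """Vote for (track, time offset) pairs; return the best one or None."""
    votes = Counter()
    for landmark, t_query in landmark_hashes(query_peaks):
        for track_id, t_ref in index.get(landmark, ()):
            votes[(track_id, t_ref - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count


# Made-up peak lists standing in for two reference tracks.
tracks = {
    "a": [(0, 10), (5, 20), (9, 15), (14, 30), (20, 12)],
    "b": [(0, 11), (4, 25), (8, 17), (13, 31), (21, 40)],
}
index = build_index(tracks)

# Query: an excerpt of track "a" starting 5 frames in.
query = [(0, 20), (4, 15), (9, 30), (15, 12)]
result = match(index, query)
print(result)
```

A true match shows up as many landmarks agreeing on the same time offset, which is why the votes are keyed on (track, offset) rather than on the track alone.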

https://github.com/spotify/echoprint-server Server for the Echoprint audio fingerprint system

https://github.com/worldveil/dejavu Audio fingerprinting and recognition in Python

https://github.com/jameslyons/python_speech_features This library provides common speech features for ASR including MFCCs and filterbank energies.
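
As background on filterbank energies and MFCCs, both are usually computed with filters spaced evenly on the mel scale. The sketch below uses only the Python standard library and one common form of the mel formula; it does not call python_speech_features, and the function names are hypothetical.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (one common variant of the formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, low_hz, high_hz):
    """Centre frequencies (Hz) of n_filters filters equally spaced in mels."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_filters + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(n_filters)]


# Assumed example: 26 filters covering 0 to 8000 Hz.
centers = mel_center_frequencies(26, 0.0, 8000.0)
print([round(c) for c in centers[:5]])
```

The centres are dense at low frequencies and sparse at high ones, which is the property the mel scale is meant to provide.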