Visually-Indicated-Sounds

Video-to-audio AI

Insert Description

AudioSet processing (see dedicated repo): I wrote code to download video-audio pairs from YouTube and store them on AWS S3, together with the strongly labelled annotations from AudioSet (100k+ videos).
Label augmentation with GPT: I augment the AudioSet labels to identify the sound-emitting objects and to classify each sound as a sound effect (SFX) or ambience (AMB), by repeatedly calling the OpenAI API and applying majority voting to the responses.
I use the ImageBind model (fork repo) to generate embeddings. ImageBind is a multimodal encoder that maps video, audio and text into the same embedding space.
I migrate the embeddings to Pinecone to build a vector database.
Pipeline
Semantic search
Eval
Streamlit app
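
The majority-voting step in the label augmentation can be sketched as follows. This is a minimal illustration, not the code in GPTLabelsAugmentation.ipynb: `ask_model` is a hypothetical callable standing in for a single OpenAI request, and the number of calls and the tie handling (returning `None`) are assumptions.

```python
from collections import Counter
from typing import Callable, List, Optional


def majority_vote(votes: List[str]) -> Optional[str]:
    """Return the most frequent vote, or None if there are no votes or the top two tie."""
    top = Counter(votes).most_common(2)
    if not top:
        return None
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None  # assumption: an ambiguous result is left unlabelled
    return top[0][0]


def classify_with_voting(
    label: str,
    ask_model: Callable[[str], str],  # hypothetical: one model call per invocation
    n_calls: int = 5,                 # assumption: the real number of calls is not stated
) -> Optional[str]:
    """Query the model n_calls times for one AudioSet label and keep the majority answer."""
    votes = [ask_model(label).strip().upper() for _ in range(n_calls)]
    return majority_vote(votes)
```

Using an odd number of calls with two classes (SFX and AMB) avoids ties as long as every response is one of the two expected strings.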

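Because video, audio and text share one embedding space, semantic search reduces to a nearest-neighbour lookup on vectors. The sketch below shows that idea with a brute-force cosine-similarity search over an in-memory dictionary. It is a stand-in for illustration only: it does not call Pinecone or ImageBind, the function names are hypothetical, and the three-dimensional vectors in the usage note are toy values rather than real embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, with `index = {"dog_bark": [1.0, 0.0, 0.0], "rain": [0.0, 1.0, 0.0], "door_slam": [0.9, 0.1, 0.0]}`, calling `top_k([1.0, 0.0, 0.0], index, k=2)` ranks `dog_bark` first and `door_slam` second. A vector database performs the same ranking without scanning every stored vector.
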
data
utils
.gitignore
EDA.ipynb
Final Report.pdf
GPTLabelsAugmentation.ipynb
Imagebind.ipynb
PineconeConnectin.ipynb
Progress Report.pdf
README.md
SceneSplitting.ipynb
SemanticSearch.ipynb
TimeSformer.ipynb
parallel_pipeline.py

giorgiodemarchi/Visually-Indicated-Sounds
