new papers and removed a gem for compilation on my home computer
macarbonneau committed Oct 31, 2023
1 parent 60557fc commit a1ec432
Showing 10 changed files with 52 additions and 28 deletions.
1 change: 0 additions & 1 deletion Gemfile
Original file line number Diff line number Diff line change
@@ -15,7 +15,6 @@ group :jekyll_plugins do
gem 'jekyll-toc'
gem 'jekyll-twitter-plugin'
gem 'jemoji'
gem 'mini_racer'
gem 'unicode_utils'
gem 'webrick'
end
31 changes: 19 additions & 12 deletions _bibliography/papers.bib
@@ -15,16 +15,22 @@ @ARTICLE{Carbonneau2022
selected={true}
}

@misc{vanniekerk2023rhythm,
title={Rhythm Modeling for Voice Conversion},
author={Benjamin {van Niekerk} and Marc-André Carbonneau and Herman Kamper},
year={2023},
eprint={2307.06040},
journal ={arXiv},
preview={Urhythmic.png},
bibtex_show={true},
selected={true},
}
@ARTICLE{vanniekerk2023rhythm,
author={{van Niekerk}, Benjamin and Carbonneau, Marc-André and Kamper, Herman},
journal={IEEE Signal Processing Letters},
title={Rhythm Modeling for Voice Conversion},
year={2023},
volume={30},
number={},
pages={1297-1301},
preview={Urhythmic.png},
bibtex_show={true},
selected={true},
abstract={Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce Urhythmic—an unsupervised method for rhythm conversion that does not require parallel data or text transcriptions. Using self-supervised representations, we first divide source audio into segments approximating sonorants, obstruents, and silences. Then we model rhythm by estimating speaking rate or the duration distribution of each segment type. Finally, we match the target speaking rate or rhythm by time-stretching the speech segments. Experiments show that Urhythmic outperforms existing unsupervised methods in terms of quality and prosody.},
doi={10.1109/LSP.2023.3313515}
}




@article{langevin_energy_2021,
@@ -232,9 +238,10 @@ @inproceedings{10.1145/3424636.3426898
booktitle = {Proceedings of the 13th ACM SIGGRAPH Conference on Motion, Interaction and Games},
articleno = {9},
numpages = {11},
keywords = {artist controlled character creation, fine facial features, face texture generation, image-to-image translation},
keywords = {artist controlled character creation, image-to-image translation, face texture generation, fine facial features},
location = {Virtual Event, SC, USA},
series = {MIG '20},
preview={facemig.png},
bibtex_show={true},
selected={false},
}

18 changes: 9 additions & 9 deletions _config.yml
@@ -119,18 +119,18 @@ bing_site_verification: # out your bing-site-verification ID (Bing Webmaster)
# Blog
# -----------------------------------------------------------------------------

blog_name: News # blog_name will be displayed in your blog page
blog_nav_title: news # your blog must have a title for it to be displayed in the nav bar
blog_description: This is unlikely to be up-to-date :)
permalink: /blog/:year/:title/
#blog_name: News # blog_name will be displayed in your blog page
#blog_nav_title: news # your blog must have a title for it to be displayed in the nav bar
#blog_description: This is unlikely to be up-to-date :)
#permalink: /blog/:year/:title/

# Pagination
pagination:
enabled: true
# pagination:
# enabled: true

related_blog_posts:
enabled: true
max_related: 5
# related_blog_posts:
# enabled: true
# max_related: 5

# -----------------------------------------------------------------------------
# Collections
6 changes: 4 additions & 2 deletions _news/announcement_1.md
@@ -1,8 +1,10 @@
---
layout: post
date: 2023-06-03 15:59:00-0400
date: 2023-07-15 15:59:00-0400
inline: true
related_posts: false
---

I am moving my old [personal page](https://sites.google.com/site/marcandrecarbonneau/publications) to github.
Ubisoft has published a [blog post](https://www.ubisoft.com/en-us/studio/laforge/news/5ADkkY0BMG9vNSDuUMtkeg/zeroeggs-zeroshot-examplebased-gesture-generation-from-speech) describing our system for gesture generation conditioned on speech.
\
This system was presented in ["ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech"](https://onlinelibrary.wiley.com/doi/pdf/10.1111/cgf.14734) and showcased on [2 minute papers](https://www.youtube.com/watch?v=Dt0cA2phKfU&ab_channel=TwoMinutePapers).
14 changes: 14 additions & 0 deletions _news/announcement_EDM_sound.md
@@ -0,0 +1,14 @@
---
layout: post
date: 2023-10-27 15:59:00-0400
inline: true
related_posts: false
---

Our paper "EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis" has been accepted for presentation at the NeurIPS Workshop on ML for Audio. This work was done in collaboration with colleagues from the University of Rochester.
\
\
In this paper, we propose a diffusion-based generative model in the spectrogram domain under the framework of elucidated diffusion models (EDM). We also reveal a potential concern with diffusion-based audio generation models: they tend to duplicate their training data.
\
\
Check out the [project page](https://agentcooper2002.github.io/EDMSound/)!
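
The announcement above mentions the elucidated diffusion model (EDM) framework, whose sampler integrates a probability-flow ODE over noise levels. As a rough illustration only (a hedged sketch, not the paper's released code; the `denoise` callable stands in for a trained network), a single Euler sampling step looks like:

```python
import numpy as np

def edm_euler_step(x, sigma, sigma_next, denoise):
    # EDM probability-flow ODE: dx/dsigma = (x - D(x; sigma)) / sigma,
    # where D is the learned denoiser. One explicit Euler step
    # moves the sample from noise level sigma to sigma_next.
    d = (x - denoise(x, sigma)) / sigma
    return x + d * (sigma_next - sigma)

# Toy check with an idealized denoiser that always predicts silence (zeros):
x = np.ones(4)
x_next = edm_euler_step(x, sigma=2.0, sigma_next=1.0,
                        denoise=lambda x, s: np.zeros_like(x))
```

A full sampler would repeat this step along a decreasing noise schedule; here the single step simply contracts the sample toward the denoiser's prediction.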
8 changes: 5 additions & 3 deletions _news/announcement_Urhythmic.md
@@ -1,11 +1,13 @@
---
layout: post
date: 2023-07-31 15:59:00-0400
date: 2023-09-21 15:59:00-0400
inline: true
related_posts: false
---

We released on Arxiv our latest research effort on voice conversion. In this paper we model the natural rhythm of speakers to perform conversion while respecting the target speaker's natural rhythm. We do more than approximating the global speech rate, we model duration for sonorants, obstruents, and silences.

Our paper ["Rhythm Modeling for Voice Conversion"](https://ieeexplore.ieee.org/document/10246359) has been published in IEEE Signal Processing Letters. We also released it on [Arxiv](https://arxiv.org/abs/2307.06040).
\
In this paper we model speaker rhythm to perform conversion that respects the target speaker's natural rhythm. We do more than approximate the global speech rate: we model durations for sonorants, obstruents, and silences.
\
Check out the [demo page](https://ubisoft-laforge.github.io/speech/urhythmic/)!
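
The final step the paper describes, time-stretching segments so they match the target speaker's rhythm, can be illustrated with a minimal numpy sketch (an assumption-laden simplification using linear interpolation and a hypothetical `match_speaking_rate` helper, not the released implementation):

```python
import numpy as np

def time_stretch(segment: np.ndarray, factor: float) -> np.ndarray:
    # Stretch a 1-D waveform segment by `factor` via linear interpolation
    # (factor > 1 lengthens the segment, i.e. slows it down).
    n_out = max(1, round(len(segment) * factor))
    src_positions = np.linspace(0, len(segment) - 1, num=n_out)
    return np.interp(src_positions, np.arange(len(segment)), segment)

def match_speaking_rate(segments, source_rate, target_rate):
    # Global variant: scale every segment by source_rate / target_rate so the
    # converted utterance approximates the target's syllables per second.
    factor = source_rate / target_rate
    return [time_stretch(s, factor) for s in segments]

segments = [np.random.randn(800), np.random.randn(1200)]
stretched = match_speaking_rate(segments, source_rate=5.0, target_rate=2.5)
```

The fine-grained variant in the paper instead fits a duration distribution per segment type (sonorant, obstruent, silence) and stretches each class separately; the sketch above covers only the global speaking-rate case.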

2 changes: 1 addition & 1 deletion _pages/about.md
@@ -28,7 +28,7 @@ Here's a subset of my research interests:


Since 2017, I work as a research scientist at Ubisoft in the [La Forge lab](https://www.ubisoft.com/en-us/studio/laforge).
I lead a group of resarchers applying the latest techniques in machine learning, speech, signal processingm, computer vision & graphics, animation to video games.
I lead a group of researchers applying the latest techniques in machine learning, speech, signal processing, computer vision & graphics, and animation to video games.

Before that, as a PhD student, I was affiliated with two labs:
- LIVIA - [Laboratory for Imagery, Vision and Artificial Intelligence](https://liviamtl.ca/)
Binary file removed assets/img/BrainStorm_1.jpg.crdownload
Binary file not shown.
Binary file removed assets/img/BrainStorm_2.jpg.crdownload
Binary file not shown.
Binary file added assets/img/facemig.png
Binary file not shown.
