[20220904] Weekly AI ArXiv Mandam (Chat) - Episode 65 #65

jungwoo-ha · 2022-09-02T13:20:00Z

News

Conferences
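- ICDM 2022 (Nov 28 - Dec 1, Orlando, US) Notification: congratulations to everyone.
- EMNLP 2022: rebuttal period extended by one week (until Sep 4 AoE)
- AI Rush Conference 2022 (Sep 7, from 1 PM)
Launch of the Digital Platform Government Committee
- Six subcommittees: AI & Data, infrastructure, services, ways of working, ecosystem, information security
- Not an extension of e-government, but a national-strategic-industry moonshot
US halts AI chip exports to China; China objects, calling it "hegemonism"
Everyone, please stay safe in the typhoon!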

ArXiv

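What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
- A survey by researchers from AI2, NYU, and Johns Hopkins on how NLP researchers perceive the current state of NLP research
- Respondents: 480 NLP researchers, surveyed in May-June 2022 (see the paper for demographics)
- Two kinds of questions: 1) a four-level response to each statement, 2) a prediction of how much the NLP community as a whole would agree with it.
  (This makes it possible to judge whether the community's views are underestimated or overestimated.)
- Five key questions and results
  - Many respondents disagree that scaling alone can solve all practical problems. Some do seem to see it as a step toward AGI
  - Many researchers are somewhat skeptical about the scientific value of the research
  - Many researchers recognize the importance of collaborating with related fields, yet seem to think the community as a whole does not...
  - Many researchers think NLP research has contributed in the past and will continue to in the future. Some, however, think it could also cause major risks
  - Over the next 10 years, "problem formulation and design" is expected to matter more than HW/data scaling...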
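The two-question design described above (own answer plus predicted community answer) can be turned into an over/underestimate label with simple arithmetic. Below is a minimal sketch; the function names, the 5-point tolerance, and the example percentages are all hypothetical and are not taken from the paper.

```python
def perception_gap(actual_agree, predicted_agree):
    """Predicted minus actual agreement, in percentage points."""
    return predicted_agree - actual_agree


def label(gap, tol=5.0):
    """Classify a gap; `tol` is an arbitrary tolerance chosen for illustration."""
    if gap > tol:
        return "overestimated"
    if gap < -tol:
        return "underestimated"
    return "roughly accurate"


# Hypothetical (actual %, predicted %) pairs, for illustration only.
example = {
    "statement A": (17.0, 47.0),
    "statement B": (60.0, 58.0),
}
labels = {name: label(perception_gap(a, p)) for name, (a, p) in example.items()}
```

A positive gap means respondents believed the community agrees with a statement more than it actually does.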
Interesting papers
- Augraphy: A Data Augmentation Library for Document Images
  - Good news for those working on OCR! A Python-based data augmentation library for document images that simulates printouts and even fax artifacts
- Visual Prompting via Image Inpainting (from UC Berkeley, Tel Aviv Univ.)
  - Attach input-output example images as a prompt to a pretrained image model (one with a decoder), and simply inpainting the blank region is enough??
  - MAE-VQGAN architecture. Project page: https://yossigandelsman.github.io/visual_prompt/
- Code for connecting Stable Diffusion to Slack: https://github.com/hunkim/slack_diffusion
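For the Visual Prompting item above, the idea of "attach input-output images and inpaint the blank part" can be sketched as building one canvas plus a mask. This is a rough sketch assuming a 2x2 grid layout and same-sized images; `build_visual_prompt` is a hypothetical helper, and the inpainting model itself is not included.

```python
import numpy as np


def build_visual_prompt(example_in, example_out, query_in):
    """Tile [example_in | example_out] over [query_in | blank] and mark the blank cell."""
    h, w = query_in.shape[:2]
    blank = np.zeros_like(query_in)
    top = np.concatenate([example_in, example_out], axis=1)
    bottom = np.concatenate([query_in, blank], axis=1)
    canvas = np.concatenate([top, bottom], axis=0)
    # True marks the pixels an inpainting model would be asked to fill.
    mask = np.zeros(canvas.shape[:2], dtype=bool)
    mask[h:, w:] = True
    return canvas, mask


# Constant-valued toy images stand in for real examples.
ex_in = np.full((4, 4, 3), 0.2)
ex_out = np.full((4, 4, 3), 0.8)
query = np.full((4, 4, 3), 0.5)
canvas, mask = build_visual_prompt(ex_in, ex_out, query)
```

The filled-in bottom-right cell would then be read off as the prediction for the query image.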

ghlee3401 · 2022-09-03T17:01:19Z

Arxiv

Mel Spectrogram Inversion with Stable Pitch
- ISMIR 2022 / Apple / Vocoder
- Sample URL : https://machinelearning.apple.com/research/mel-spectrogram (page not yet available)
- Problem : Unlike speech, musical sound has multiple overlapping notes and a complex structure, so vocoders designed for speech fail to reconstruct the phase properly when used for music generation, which produces artifacts
- Method
  - The paper proposes a phase-gradient vocoder that predicts magnitude and phase from the mel-spectrogram and reconstructs the waveform by applying the iSTFT
  - As Figure 2 shows, magnitude is easy to interpret while phase is not, so the model predicts the gradient of the phase along the frequency axis (vertical) and the frame axis (horizontal) and integrates it to recover the phase
  - In other words, performance can depend on how the gradient is computed and how it is integrated
- Experiment
  - Reconstruction is compared against MelGAN, HiFi-GAN, CARGAN, and DiffWave
  - 13 hours of ambient music loops are used
  - On the harmonic error defined in the paper, the phase-gradient model shows the most stable pitch when notes are played at intervals of several semitones
- Results
  - Evaluated on ambient music, the NSynth dataset, and one-second-long notes and chords (N+C)
  - HiFi-GAN has a better FAD than the proposed phase-gradient model on the ambient and N+C datasets, but its pitch appears less stable
  - DiffWave produces hiss noise at high frequencies but synthesizes harmonic components stably
  - CARGAN generates autoregressively in chunks, and noise appears at the chunk boundaries
  - MelGAN produced many mechanical sounds and artifacts
  - The neural network of the phase-gradient model is fast, but the phase integration step is autoregressive and therefore slow
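To make the "predict the phase gradient, then integrate" idea in the Method bullets concrete, here is a minimal NumPy sketch. It is a simplification and not the paper's algorithm: it uses only the frame-axis (time) gradient, takes the gradient from a known random phase instead of a network prediction, and integrates with a plain cumulative sum. All function names are hypothetical.

```python
import numpy as np


def wrap(x):
    """Wrap angles to the principal interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi


def time_gradient(phase):
    """Wrapped frame-to-frame phase difference; shape (n_bins, n_frames - 1)."""
    return wrap(np.diff(phase, axis=1))


def integrate_time(first_frame, grad_t):
    """Rebuild the phase by accumulating frame-axis gradients from an initial frame."""
    steps = np.concatenate([first_frame[:, None], grad_t], axis=1)
    return np.cumsum(steps, axis=1)


rng = np.random.default_rng(0)
phase = rng.uniform(-np.pi, np.pi, size=(5, 8))  # toy (bins, frames) phase
recon = integrate_time(phase[:, 0], time_gradient(phase))

# The result matches the original only up to multiples of 2*pi,
# so compare on the unit circle rather than comparing raw angles.
max_err = np.max(np.abs(np.exp(1j * recon) - np.exp(1j * phase)))
```

Because each frame depends on the running sum of the previous ones, this step is inherently sequential, which fits the note above that integration is the slow part.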
Training Text-To-Speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks
- INTERSPEECH2022 / Google LLC, DeepMind / TTS
- Sample URL : https://google.github.io/chive-prosody/chive-bert-synthetic/
- Problem : When a Tacotron-based TTS model is used for accent transfer, problems such as early stopping, word skipping, and word repetition occur, so a sufficiently large dataset is required
- Method : Build a Tacotron-based accent transfer model (a TTS model conditioned on speaker and accent) and synthesize speech with it, then use the synthesized data to produce prosody embeddings with CHiVE-BERT and synthesize the waveform with WaveGRU
- Experiment
  - US (North American accent) : 255K utterances (34 speakers)
  - AU (Australian accent) : 54K utterances (5 speakers)
  - GB (British accent) : 60K utterances (8 speakers)
  - IN (Indian accent) : 34K utterances (7 speakers)
- Results
  - Naturalness (short-form) : the difference between neutral and news-reading style on short sentences
  - Accent quality : speech samples are played to native speakers of the accent, who rate whether the accent is appropriate
  - Naturalness (long-form) : rates whether the accent is conveyed well on long sentences
  - News-reading style appropriateness : rates whether the intonation is appropriate for news
  - The upshot is that the quality of the synthetic speech affects the final result (isn't that obvious?)
  - The final result exceeds MOS 4.0, so it is usable (??)
  - For a task like this, one has to discuss and design experiments around how much synthetic data to use, what ratio of synthetic data to real recordings to use, and how natural the synthesized speech needs to be (isn't that obvious???)
- Interesting research
  - Turn-Taking Prediction for Natural Conversational Speech
    - INTERSPEECH2022 / Google / Conversational speech
    - Speech recognition built on conversational data; the work aims to recognize what the speaker intends to say when sounds are drawn out or filler words occur

hollobit · 2022-09-04T12:37:31Z

"We'll make your voice sound 'white'"... Controversy over a US startup's 'call center voice conversion' technology

https://www.chosun.com/international/international_general/2022/08/26/FYKTIJQ5D5BVBNULZKBW7BQ5QA/

News on AI-generated image services

https://multimodal.art/news/1-week-of-stable-diffusion What happened in the first week after Stable Diffusion was released
https://promptbase.com/marketplace A prompt marketplace
https://lexica.art/ A prompt search engine - Search over 5M+ Stable Diffusion images and prompts
https://arstechnica.com/information-technology/2022/08/ai-wins-state-fair-art-contest-annoys-humans/ An image made with Midjourney wins an art competition
- https://imnews.imbc.com/news/2022/world/article/6404569_35680.html

https://andys.page/posts/how-to-draw/ A walkthrough of creating an impressive picture with Stable Diffusion img2img on a 4.2GB GPU

https://twitter.com/karenxcheng/status/1564626773001719813?s=21&t=m8OhuBHEhiq63-zuSKFh7g Generating fashion in video with DALL-E

DALL-E outpainting

https://openai.com/blog/dall-e-introducing-outpainting/
https://petapixel.com/2022/09/01/dall-e-announce-feature-that-allows-you-to-extend-an-image-beyond-border/

inpainting - makes changes within an image that has already been generated or uploaded
outpainting - extends the original image to create large-scale images in any aspect ratio
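The difference between the two comes down to where the "pixels to generate" mask sits. As a rough sketch of the outpainting setup (not DALL-E's implementation; `outpaint_canvas` is a hypothetical helper), the original image is placed on a larger canvas and everything outside it is marked for generation:

```python
import numpy as np


def outpaint_canvas(image, target_h, target_w):
    """Center `image` on a larger canvas; mask is True where pixels must be generated."""
    h, w = image.shape[:2]
    if target_h < h or target_w < w:
        raise ValueError("target must be at least as large as the image")
    top, left = (target_h - h) // 2, (target_w - w) // 2
    canvas = np.zeros((target_h, target_w) + image.shape[2:], dtype=image.dtype)
    canvas[top:top + h, left:left + w] = image
    mask = np.ones((target_h, target_w), dtype=bool)
    mask[top:top + h, left:left + w] = False  # original pixels are kept
    return canvas, mask


# Extend a 4x4 square image to a 1:2 aspect ratio.
img = np.ones((4, 4, 3), dtype=np.float32)
canvas, mask = outpaint_canvas(img, 4, 8)
```

For inpainting, the mask would instead cover a region inside the original image bounds.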

veritas9872 · 2022-09-04T13:17:36Z

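I'd like to introduce a book that discusses in depth the connections between deep learning and linguistics, logic, and psychology.
It is by Professor Steven Pinker of Harvard University. Its core theme is logic and rationality, but I'm sharing it because it contains a very high-level discussion of deep learning and logic.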
https://stevenpinker.com/publications/rationality-what-it-why-it-seems-so-scarce-and-why-it-matters

haebom · 2022-09-04T13:33:07Z

Kakao Brain released the text-image pair dataset 🐺 Coyo on August 31, 2022.

Github : https://github.com/kakaobrain/coyo-dataset
HuggingFace : https://huggingface.co/datasets/kakaobrain/coyo-700m

Related article : https://www.aitimes.kr/news/articleView.html?idxno=25882

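'Coyo' was used in developing 'RQ-Transformer', the hyperscale AI image generation model that Kakao Brain released earlier in April, and the AI artist 'Karlo'. Recognized overall for its technical excellence, the paper earned the opportunity to be presented at CVPR 2022, the world-class academic conference held in June. In addition, the potential of the AI artist 'Karlo' was recently demonstrated through collaborations with contemporary artist Koh Sang Woo (고상우) and Samsung Electronics' 'Galaxy Book Art Project'.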

Oh, and I forgot to mention: Kakao Brain is actively hiring!

https://careers.kakaobrain.com/careers

jungwoo-ha closed this as completed Jan 7, 2023
