[20220529] Weekly AI ArXiv 만담 (Chat) - Episode 53 #53

jungwoo-ha · 2022-05-29T03:28:55Z

News
- Conferences
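  - ECCV 2022 reviews & rebuttal (tomorrow at 3 a.m.?)
  - NeurIPS 2022 supplementary deadline has passed. EMNLP is all that's left of this year's harvest~
  - AAAI 2023 deadlines are out: August 8 (abstracts), August 15 (papers)
- MS Azure as one of the Huggingface Endpoints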
ArXiv
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
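  - Google's very large text-to-image generation model, released as a rival to DALLE2
  - In line with the current trend, image generation uses diffusion models, both for T2I and for super-resolution
  - Cascade: text -> 64x64 (UNet) -> 256x256 -> 1024x1024
  - Notably, instead of a multimodal pretrained embedding (like CLIP), it uses T5-XXL (an LM trained on text only) as the text encoder
  - Proposes dynamic thresholding to improve diffusion sampling
  - Proposes DrawBench to replace the previously ad hoc evaluation protocols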
  - https://imagen.research.google/
- AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
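  - Adapter-based learning on top of ImageNet-pretrained models (from the University of Hong Kong and Tencent AI)
  - For the model architecture, check the teaser video at http://www.shoufachen.com/adaptformer-page/
  - Essentially, it starts from MAE and VideoMAE pretraining and adds bottleneck-style extra parameters to the MLP block (somewhat similar to LoRA)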
- Interesting papers
  - Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models
    - Data-driven modeling of protein structure and sequence: diffusion models (DDPM) are now used to generate protein structures too
  - Towards Learning Universal Hyperparameter Optimizers with Transformers
    - A Transformer that learns hyperparameter optimization too... a Nando paper, which we haven't seen in a while
  - How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
    - A must-read for anyone training ViTs! Looks like it may be the first TMLR paper~
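For the two diffusion items above (Imagen and the protein DDPM), here is a minimal pure-Python sketch of the standard DDPM forward noising step, recovering an $x_0$ estimate from a predicted noise, and Imagen-style dynamic thresholding of that estimate. Assumptions: plain Python lists stand in for images or coordinates, the linear beta schedule (1e-4 to 0.02) is the common DDPM default rather than what either paper necessarily uses, the percentile `p=99.5` is an arbitrary illustrative choice, and no denoising network is included (the true noise is used in place of a prediction).

```python
import math


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule (assumed DDPM-style default)."""
    alpha_bars = []
    prod = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, noise, alpha_bar):
    """Forward noising: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * n for x, n in zip(x0, noise)]


def predict_x0(xt, eps_hat, alpha_bar):
    """Invert the forward process given a (predicted) noise eps_hat."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(xt, eps_hat)]


def percentile(values, p):
    """Linear-interpolation percentile, p in [0, 100]."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def dynamic_threshold(x0_hat, p=99.5):
    """Imagen-style dynamic thresholding.

    s = p-th percentile of |x0_hat|, floored at 1; clip to [-s, s], then divide by s.
    When s == 1 this reduces to ordinary clipping to [-1, 1].
    """
    s = max(percentile([abs(v) for v in x0_hat], p), 1.0)
    return [max(-s, min(s, v)) / s for v in x0_hat]


if __name__ == "__main__":
    alpha_bars = linear_alpha_bars(1000)
    x0 = [0.5, -0.2, 0.9]
    noise = [0.1, -0.3, 0.2]
    xt = q_sample(x0, noise, alpha_bars[500])
    print(predict_x0(xt, noise, alpha_bars[500]))  # approximately x0
    print(dynamic_threshold([3.0, -0.5, 0.2]))     # rescaled into [-1, 1]
```

Rescaling by `s` instead of hard-clipping everything at 1 is what the Imagen paper motivates as reducing oversaturation at high classifier-free guidance weights.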

ghlee0304 · 2022-05-29T04:02:33Z

Arxiv

Self-Supervised Speech Representation Learning: A Review
- A paper surveying SSL research on speech representations
- There was an earlier one too: Audio Self-supervised Learning: A Survey
  -> If you're interested in SSL, reading the two papers side by side is the best way to study!
- Covers a broad range of speech SSL topics: the background that motivated SSL for speech, approaches to speech representation, datasets for pre-training and evaluation, and experimental setups for evaluating SSL techniques
- I agree with its explanation of how speech differs from computer vision and NLP
- SSL applications in speech synthesis
  - ProsoSpeech (2022): uses a BERT-like method to extract context information from speech, and uses VQ bottleneck features to extract prosody information with the speaker and content information removed
  - GenerSpeech (2022): uses wav2vec 2.0 fine-tuned to classify speaker and emotion, and uses it to extract a global style vector of the speech
  - IQDUBBING (2022): uses a pre-trained VQ-Wav2Vec model to extract prosody vectors
- Figures: SSL overview, SSL models, summary of SSL approaches, SSL datasets
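Both ProsoSpeech's VQ bottleneck and IQDUBBING's VQ-Wav2Vec rely on vector quantization, whose core step is replacing each continuous frame vector with its nearest codebook entry. A minimal sketch of that lookup, assuming a toy 2-D codebook and squared-L2 distance; the codebook values and function names are hypothetical, and training details (codebook updates, commitment loss, straight-through gradients) are omitted.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(frames, codebook):
    """Map each frame vector to its nearest codebook entry.

    Returns (indices, quantized): the discrete code per frame and the
    codebook vector that replaces it (the "bottleneck" output).
    """
    indices = []
    quantized = []
    for f in frames:
        best = min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
    frames = [[0.1, -0.1], [0.9, 1.2], [-0.8, 1.7]]
    print(vector_quantize(frames, codebook))
```

Because the output can only take one of a few codebook values, fine-grained detail such as speaker timbre has little room to pass through, which is the intuition behind using VQ as an information bottleneck.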
SUSing: SU-net for Singing Voice Synthesis
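- A Singing Voice Synthesis (SVS) paper whose distinguishing feature is the SU-net architecture
- WGANSing and BEGANSing already used U-net architectures, so the concept doesn't seem entirely new
- In the pre-net, phonemes and notes are aligned using the duration information in the music score, embedded, and then concatenated
- As in the standard U-net, each downsampling layer halves the length and doubles the channels, and upsampling uses skip connections to pass low-level information through
- Autoregressively, a mel-spectrogram segment is concatenated with the embedding vector and fed into the SU-net as input
- For pooling, it uses stripe pooling instead of global pooling
  - Can learn long-term dependencies along the time axis for a given frequency
  - Can learn relationships between harmonics along the frequency axis at a given time step
  - Each stripe learns local information, with the advantage that unrelated information need not be considered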
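The stripe pooling idea can be sketched on a toy mel-spectrogram: one stripe averages over time for each frequency bin, the other averages over frequency for each frame, and the two profiles are broadcast back and summed so that each output cell sees its whole row and column. This assumes SUSing follows the general strip-pooling formulation; the learned convolutions and gating used in a full strip-pooling module are left out, and `stripe_pool` is a hypothetical name.

```python
def stripe_pool(spec):
    """Simplified stripe pooling on a spectrogram.

    spec: list of F rows (frequency bins), each a list of T values (frames).
    Returns an F x T map where cell (f, t) = mean over time of row f
    + mean over frequency of column t.
    """
    num_freq = len(spec)
    num_time = len(spec[0])
    # Horizontal stripe: average along time for each frequency bin (F values)
    freq_profile = [sum(row) / num_time for row in spec]
    # Vertical stripe: average along frequency for each frame (T values)
    time_profile = [
        sum(spec[f][t] for f in range(num_freq)) / num_freq for t in range(num_time)
    ]
    # Broadcast both profiles back to F x T and fuse by addition
    return [[freq_profile[f] + time_profile[t] for t in range(num_time)] for f in range(num_freq)]


if __name__ == "__main__":
    toy_mel = [[1, 2, 3], [4, 5, 6]]  # 2 frequency bins x 3 frames
    print(stripe_pool(toy_mel))
```

Unlike global pooling, which collapses the whole map to one number, each output cell here only mixes information from its own frequency row and its own time column.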
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation
- https://transpeech.github.io/
- Proposes a speech-to-speech translation (S2ST) system
- In S2ST, for the same content, there is a mismatch between the language tokens used in text translation and the SSL representations
- Why? Because of speaker identity, rhythm, pitch, and energy!
- This paper proposes the Bilateral Perturbation (BiP) technique:
  - A method for fine-tuning the SSL model with a CTC loss
  - The goal is to produce deterministic representations regardless of acoustic characteristics
  - Information enhancement: keeping the content intact, applies random resampling, formant shifting, pitch randomization, etc. to generate speech $\hat{S}$ with diverse variation, which is used to fine-tune HuBERT
  - Style normalization: shifts pitch and normalizes energy to generate speech $\bar{S}$ with an average acoustic condition, which is used to fine-tune HuBERT
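BiP's style normalization aims for speech with an average acoustic condition by shifting pitch and normalizing energy. Below is a minimal sketch of only the energy half, assuming "energy" means the RMS amplitude of a mono waveform stored as a Python list; `target_rms` is a hypothetical parameter not taken from the paper, and pitch shifting is omitted because it needs a proper DSP implementation.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a waveform."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def normalize_energy(signal, target_rms=0.1):
    """Scale a waveform so its RMS equals target_rms (hypothetical target).

    Silent input is returned unchanged to avoid dividing by zero.
    """
    current = rms(signal)
    if current == 0.0:
        return list(signal)
    gain = target_rms / current
    return [x * gain for x in signal]


if __name__ == "__main__":
    # A 440 Hz tone at 16 kHz, then the same tone 3x louder
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
    louder = [3.0 * x for x in tone]
    a = normalize_energy(tone)
    b = normalize_energy(louder)
    print(rms(a), rms(b))  # both equal target_rms up to float error
```

The usage shows the point of normalizing: two renditions of the same content at different loudness collapse to the same signal, which is one small step toward the deterministic representation BiP is after.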
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
- Code URL: https://github.com/PaddlePaddle/PaddleSpeech
- A speech toolkit
- Released by Baidu

hollobit · 2022-05-29T12:46:23Z

In an era when AI has become a weapon, which 3 AI technologies should be prioritized?

https://www.sciencetimes.co.kr/news/%EC%9D%B8%EA%B3%B5%EC%A7%80%EB%8A%A5%EC%9D%B4-%EB%AC%B4%EA%B8%B0-%EB%90%9C-%EC%8B%9C%EB%8C%80-%EC%A4%91%EC%A0%90-%EC%9C%A1%EC%84%B1%ED%95%A0-3%EA%B0%80%EC%A7%80-ai-%EA%B8%B0%EC%88%A0%EC%9D%80/

Software Policy & Research Institute report: "intelligent semiconductors, autonomous weapons, GAN"

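"Because AI performs imitation, forgery, and strategic judgment beyond existing human capabilities, a hostile power possessing highly advanced AI would be a very serious threat to national security"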

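The US National Security Commission on Artificial Intelligence (NSCAI) issued statements opposing a ban on autonomous weapons, urging budget investment ($35B per year), and pushing for domestic semiconductor production (March 2021)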

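China aims to raise its semiconductor self-sufficiency rate through "Made in China 2025" ('15) and, based on the "New Generation AI Development Plan" ('17), to overtake the US and become No. 1 in AI worldwide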

Governments, associations, and academia around the world advocate banning reckless and inhumane uses of AI, and have recently made AI ethics and trustworthiness key priorities

Large Language Models are Zero-Shot Reasoners - the mysterious LLM

Simply adding “Let’s think step by step” before each answer

With GPT-3, this reportedly raises accuracy on the MultiArith benchmark from 17.7% to 78.7%, and on the GSM8K math word problem benchmark from 10.4% to 40.7%
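The zero-shot trick is two-stage prompting: first append the trigger phrase to elicit reasoning, then feed that reasoning back with an answer-extraction cue. A minimal sketch of building the two prompts; the model call itself is deliberately left out (no particular LLM API is assumed), and the answer-extraction wording shown is the numeric-answer variant, which would change per task.

```python
def reasoning_prompt(question: str) -> str:
    """Stage 1: add the trigger phrase so the model writes out its reasoning."""
    return f"Q: {question}\nA: Let's think step by step."


def answer_extraction_prompt(question: str, reasoning: str) -> str:
    """Stage 2: append the model's stage-1 reasoning plus a cue for the final answer."""
    return (
        reasoning_prompt(question)
        + " "
        + reasoning.strip()
        + "\nTherefore, the answer (arabic numerals) is"
    )


if __name__ == "__main__":
    q = "A juggler has 16 balls. Half are golf balls. How many golf balls are there?"
    print(reasoning_prompt(q))
    # 'reasoning' would be the LLM's completion of the stage-1 prompt
    print(answer_extraction_prompt(q, "Half of 16 is 8."))
```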

Autoformalization with Large Language Models

Given just a few examples, large language models can translate natural-language mathematical statements into formal specifications

Google blog - Language Models Perform Reasoning via Chain of Thought

https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html
Chain of Thought Prompting Elicits Reasoning in Large Language Models - https://arxiv.org/abs/2201.11903
Previously introduced by 최승준 in week 52: [20220522] Weekly AI ArXiv 만담 - 52회차 #52 (comment)
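In contrast to the zero-shot trigger, chain-of-thought prompting is few-shot: each exemplar shows the reasoning before the answer. A minimal sketch of assembling such a prompt and pulling the final number out of a completion; the single exemplar is the tennis-ball example from the CoT paper, while `few_shot_cot_prompt`, `extract_answer`, and the "The answer is N" parsing regex are hypothetical helpers, and no model call is included.

```python
import re

# (question, reasoning-then-answer) pairs; the paper uses several such exemplars
EXEMPLARS = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
        "5 + 6 = 11. The answer is 11.",
    ),
]


def few_shot_cot_prompt(question, exemplars=EXEMPLARS):
    """Concatenate worked exemplars, then the new question with an open answer slot."""
    parts = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def extract_answer(completion):
    """Return the number after 'The answer is', with thousands separators removed."""
    m = re.search(r"The answer is\s*(-?[\d,]*\.?\d+)", completion)
    return m.group(1).replace(",", "") if m else None


if __name__ == "__main__":
    print(few_shot_cot_prompt("The cafeteria had 23 apples. They used 20 and bought 6 more. How many apples do they have?"))
    print(extract_answer("23 - 20 = 3. 3 + 6 = 9. The answer is 9."))
```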

jungwoo-ha closed this as completed Aug 6, 2022
