[20220807] Weekly AI ArXiv Chat - Episode 61 #61

jungwoo-ha · 2022-08-06T04:27:30Z

News

Conference
- NeurIPS 2022: Rebuttal & Author-Reviewer Period
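- AAAI 2023: Abstract Aug 8 (AoE), full paper Aug 15 (AoE)
- KCCV 2022: August 8 ~ 10, COEX Auditorium
A party built around AI-generated policies running in a parliamentary election? Denmark's Synthetic Party
- Korean news coverage: https://n.news.naver.com/mnews/article/092/0002264669?sid=105
CVPR 2022 rankings by author / institution
AI Future Forum finance AI webinar: AI Redefines Finance
- August 11, 10:00 ~ 11:30 AM
CLOVA Studio (HyperCLOVA) applications are still open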

ArXiv

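Blender Bot 3.0: a chatbot built on OPT-175B that keeps improving, including on safety
- The third in Meta's Blender Bot series
- Of the three sizes, the 30B and 175B models use OPT. They are not built from OPT alone
- Like 2.0, uses Internet search and long-term memory
- Focuses on improving performance and safety after deployment
- As with OPT, code, data, and models are all released (API for the large model, checkpoints for the small models)
[Interesting papers]
- Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond
  - Authors from a very diverse mix of affiliations. If you are interested in causal inference research, read it as study material (accepted to TACL)
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
  - Uses images to find the input for text-to-image, inversion style?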

ghlee3401 · 2022-08-06T15:34:21Z

Arxiv

Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation
1. Keyword : INTERSPEECH2022 / Amazon Alexa, TTS Research / TTS, data augmentation
2. Problem : Expressive-style speech data is hard and expensive to obtain across many languages
3. Method : Uses voice conversion (VC) for data augmentation so that a low-resource target dataset can be learned
  - Only 1 hour of conversational speech is available for the target speaker
  - 8~10 hours of neutral-style speech exist from a supporting speaker of the same country and gender as the target speaker
  - Every module (VC, F0 predictor, TTS) is trained on all the data at each step and then fine-tuned on the target
  - Step1) Modify the VC model Copycat and train a VC model that uses F0 trajectories
  - Step2) Train an F0 predictor to predict F0 trajectories (used at VC inference time)
  - Step3) Train the TTS model on the target speaker data plus the augmented data
4. DataSet
  - target speaker : 9 speakers (5 females, 4 males) in 5 different locales (Canadian French, French, Italian, German, Spanish)
  - supporting speaker : 8~10 hours of neutral data each, matched by country and gender
5. Results :
  - 5 of the 9 speakers show significant results
  - Performance improves for all except Canadian French and German Female
  - Why did they include Recording rather than a baseline model trained on the 1 hour of data?
  - Compares the input F0 trajectories (blue) with the F0 of the synthesized speech (orange)
6. Some Comments :
  - The key point of this paper is that it uses a VC model, specifically one whose F0 is controllable
  - However, it does not say how g2p was done or whether the target and supporting speakers' sentences are paired
  - They do not appear to have crossed countries when generating data, and the speaker information likely encodes the country
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
1. Keyword : INTERSPEECH2022 / ByteDance AI Lab / TTS
2. Samples : https://p1ping.github.io/RI-TTS/
3. Problem : Intonation varies widely in everyday Cantonese conversation, but conventional TTS takes phonemes as input, so context is lost and intonation is hard to produce correctly
  - A statement spoken with a rising ending, like a question, differs from a real question
  - A real question can still be heard as a question even when read like a statement, but a statement used as a question needs its final intonation raised
4. Method : The paper proposes a Cantonese TTS (CanTTS) model that uses a BERT-based statement/question classifier
  - Explicitly provides a label for the sentence type
  - The model is based on Tacotron2, with a pre-trained BERT in the sentence encoder
  - The classifier uses three categories (statement, normal question, declarative question)
  - ImpJoint : no classifier, uses only the sentence embedding
  - ExpJoint : uses the classifier, trained jointly with the TTS
  - ExpSep : uses the classifier, trained separately from the TTS
  - ExpSep(FT) : ExpSep with the BERT parameters fine-tuned
  - ExpSep(GT) : feeds the sentence label directly, without the sentence encoder (no BERT)
5. DataSet
  - single-speaker Cantonese 12,010 utterances (10,000 statements, 2,000 questions, split evenly between normal and declarative)
6. Results
  - For statements, on F0 Frame Error (FFE), MOS, and perception accuracy, ExpJoint is the best or comparable to Tacotron2, setting aside ExpSep(GT/FT).
  - In particular, Tacotron2 fails to produce a proper rising intonation for DecQue.
  - Overall, ExpSep(GT/FT), which models sentence information separately, performs best.
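
For readers unfamiliar with the F0 Frame Error (FFE) metric mentioned in the results, here is a minimal sketch of how it is commonly computed: the share of frames that have either a voicing decision error or a gross pitch error. The 20% pitch tolerance, the convention that an F0 of 0 means "unvoiced", and the function name `f0_frame_error` are assumptions for illustration, not details taken from the paper.

```python
def f0_frame_error(f0_ref, f0_syn, tol=0.2):
    """Fraction of frames with a voicing error or a gross pitch error.

    f0_ref, f0_syn: frame-aligned F0 tracks in Hz; 0 marks an unvoiced
    frame (assumed convention). tol: relative pitch tolerance (assumed 20%).
    """
    if len(f0_ref) != len(f0_syn) or len(f0_ref) == 0:
        raise ValueError("F0 tracks must be non-empty and frame-aligned")

    errors = 0
    for ref, syn in zip(f0_ref, f0_syn):
        ref_voiced, syn_voiced = ref > 0, syn > 0
        if ref_voiced != syn_voiced:
            # Voicing decision error: one track is voiced, the other is not.
            errors += 1
        elif ref_voiced and abs(syn - ref) > tol * ref:
            # Gross pitch error: both voiced, but pitch deviates too much.
            errors += 1
    return errors / len(f0_ref)


reference = [0.0, 100.0, 100.0, 200.0]
synthesized = [0.0, 100.0, 130.0, 0.0]
print(f0_frame_error(reference, synthesized))
```

In this toy example the third frame is a gross pitch error and the fourth is a voicing error, so half of the frames count as errors.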

terryum · 2022-08-07T11:17:57Z

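Discussions in ML subreddit

"The most important unsolved problems in AI research" (7/20)
[D] Most important unsolved problems in AI research

Compositionality
: Acquiring knowledge from which a wide range of further knowledge can be derived, e.g. through symbolic operations.
Multimodality
: Combining multiple inputs (e.g. vision, speech, text) to make a judgment
Ability to match knowledge to context
: Making judgments according to context
Uncertainty awareness
: Being able to respond flexibly depending on uncertainty
Catastrophic forgetting
: Knowledge does not accumulate continuously
Enabling robust continuous learning in deployment
: Training, deployment, and continuous improvement are disconnected
Mid-level AI development: Figuring out an approach for the messy middle.
: AI does well on narrow low-level tasks (e.g. classification) and on high-level tasks with extreme degrees of freedom (e.g. robot RL), but building efficient mid-level AI that sensibly does the right thing on its own is really hard
Explainability
: Making the hidden factors behind an AI's decision understandable to humans
Alignment (value alignment)
: What humans value and what AI performance measures do not match exactly
Energy efficiency
: GPUs need far more power than the human brain

"The ML community seems far too biased toward positive results" (8/4)
[D] The Machine Learning Community is totally biased to positive results.

: The post itself has almost no content. The comments respond: "That's not just the ML community. Everywhere is like that."

[Terry's thoughts] If ML had actually progressed at the rate papers are published, the world would have advanced enormously. But pioneering papers that achieve a breakthrough are a tiny minority, and, like a saturated curve converging the way an infinite series does, far more tiny steps are being published than necessary, which is a pity.
At this point I think we need to discuss questions ranging from 'Why do we do research?' to 'Do we need this many papers?' and 'What are appropriate ways to contribute other than papers?'. Is there really any reason for so many people to be desperate to submit papers and for so many papers to be published..?
This also connects to another Reddit post, "Why is ML research all empirical papers?": I think we need to move past the era of "whatever works wins" into one that pursues real truth. For that, academia's ways of disseminating knowledge and its evaluation metrics would have to change first.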

Articles in Techcrunch

Theator, an AI platform that analyzes surgery videos, raised a Series A of about 50 billion KRW ($39.5M) (7/22)
Theator, an AI platform that analyzes surgery videos, closes out its Series A at $39.5M
Drover AI uses vision AI to keep electric scooters off sidewalks (7/27)
Drover AI is using computer vision to keep scooter riders off sidewalks
Seedtag raised about 320 billion KRW ($250M) for its cookie-free, AI-based ad technology (7/27)
Seedtag, the ex-Googler-founded, cookie-free, AI-based adtech startup, taps $250M+ in funding

(Other) Skin-diagnosis startup lululab raises a Series C of about 20 billion KRW (7/13)

hollobit · 2022-08-07T11:32:09Z

DeepMind's AlphaFold opens a database of 200 million protein structures (7/28)

https://www.deepmind.com/blog/alphafold-reveals-the-structure-of-the-protein-universe
https://alphafold.ebi.ac.uk/
https://www.deepmind.com/research/highlighted-research/alphafold
https://asia.ensembl.org/info/docs/tools/vep/index.html

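1,000 times more than the roughly 200,000 protein structures humanity has determined experimentally over 50 years
Took first place at the protein structure prediction competition (CASP) in November 2020. The first release on July 22, 2021 open-sourced 1 million structures, and on July 28, 2022 this grew to 200 million (the structures of nearly all proteins of known organisms on Earth)
The most monumental achievement AI has produced in biotechnology so far
This update includes predicted structures for plants, bacteria, animals, and other organisms
More than 500,000 researchers from 190 countries have accessed AlphaFold DB to view over 2 million structures
All 200+ million structures are available for bulk download through Google Cloud Public Datasets, making AlphaFold even more accessible to scientists worldwide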

TikTok goes on an AI music-making and machine learning hiring spree

https://www.musicbusinessworldwide.com/tiktok-goes-on-ai-music-making-and-machine-learning-specialist-hiring-spree1/

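Rapid growth of AI-based music-making businesses
Splice has CoSo and Bandlab has SongStarter. Jukedeck, acquired by ByteDance, launched Mawf, a machine-learning-based music-making app, and a music-making app called 'Sponge Band' was also launched in China
TikTok and its parent company ByteDance are hiring many highly skilled specialists in machine learning and AI music-making in the US and China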

How artificial intelligence is changing astronomy

https://astronomy.com/news/2022/07/how-artificial-intelligence-is-changing-astronomy

ML-based exoplanet search using stellar light curves, using ML to detect water, ice, and snow on rocky planets, fake galaxy images

12 machine-learning-related companies among "The Top 100 Healthcare Technology Companies of 2022"

https://thehealthcaretechnologyreport.com/the-top-100-healthcare-technology-companies-of-2022/

1. Axtria, 12. Leica Biosystems, 16. Tempus, 43. Augmedix, 48. 98point6, 52. LeanTaaS, 54. InterVenn Biosciences, 56. Quanticate, 58. Kaia Health, 62. Accelerate Diagnostics, 71. Qventus

Gartner: 4 emerging technologies expected to have a transformational impact on digital advertising

https://www.gartner.com/en/newsroom/press-releases/2022-08-03-gartner-identifies-four-emerging-technologies-expected-to-have-transformational-impact-on-digital-advertising
https://venturebeat.com/2022/08/05/gartner-research-2-types-of-emerging-ai-near-hype-cycle-peak/

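The growing role of AI in advertising: AI for marketing, Emotion AI, Influence engineering, Generative AI
Emotion AI: uses AI techniques to analyze a user's emotional state via computer vision, audio/voice input, sensors, and/or software logic. For marketers and advertisers, access to emotion data offers insight into motivational drivers that helps test and refine content, tailor digital experiences, and build deeper connections between people and brands
Generative AI learns from existing artifacts to generate new, realistic artifacts, such as video, narrative, speech, synthetic data, and product designs, that reflect the characteristics of the training data without repeating it. It is expected to reach mainstream adoption in digital advertising within the next 2~5 years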

How will Amazon approach U.S. primary care? (news that is not strictly about AI/ML)

https://hbr.org/2022/08/how-will-amazon-approach-u-s-primary-care
https://press.aboutamazon.com/news-releases/news-release-details/amazon-and-one-medical-sign-agreement-amazon-acquire-one-medical

Amazon agreed on July 21 to acquire One Medical for $3.9 billion. What comes next?

Kakao Healthcare, gathering medical professionals, establishes a technology research institute... recruits Professor Shin Soo-yong

https://www.techm.kr/news/articleView.html?idxno=99765&fbclid=IwAR0__UOsuMJ9pNxSZ_i5KLoGi2xg-4tnYzzlPusJIIWRWxsXHQHHI870EM8

veritas9872 · 2022-08-07T12:46:27Z

Generative Multiplane Images: Making a 2D GAN 3D-Aware

Blog: https://xiaoming-zhao.github.io/projects/gmpi/
GitHub: https://github.com/apple/ml-gmpi
Arxiv: https://arxiv.org/abs/2207.10642


Interesting news:
Google also reports that code completion helps.
https://ai.googleblog.com/2022/07/ml-enhanced-code-completion-improves.html

A method that applies deep learning to solve problems from the MIT linear algebra and calculus courses taught by Professor Strang has been published in PNAS.
https://www.pnas.org/doi/full/10.1073/pnas.2123433119

To improve compatibility between PyTorch and OpenVINO, the PyTorch ONNX Runtime (PyTorch-ORT) library has been open-sourced. It appears to be usable during training as well, which should be very helpful.
https://medium.com/openvino-toolkit/streamline-pytorch-with-intel-openvino-integration-with-torch-ort-a098358ef2e2
https://github.com/pytorch/ort

Sharing a collection of materials called Deep Learning Drizzle.
https://deep-learning-drizzle.github.io/

nick-jhlee · 2022-08-07T13:15:07Z

(Adding to Terry's comment..)

NeurIPS Workshop: I Can't Believe It's Not Better (ICBINB)

http://icbinb.cc
A workshop that has run steadily since 2020:
- 2020 Bridging the gap between theory and empiricism in probabilistic machine learning
- 2021 "beautiful" ideas that should have worked
- 2022 Understanding Deep Learning Through Empirical Falsification
promoting the idea that there is more to machine learning research than tables with bold numbers
Personally, the vision of this workshop really resonated with me

cf. parody from

nick-jhlee · 2022-08-07T13:21:48Z

News

CKAIA 2022 (Korean Artificial Intelligence Association)
- 08/01-08/03
- In person for the first time in a while (@ Jeju Shinhwa World)
- tutorials, workshops, poster session
Journal track!
- allow authors of JMLR papers published after 01/01/2022 to elect to present their work at one of these 3 ML conferences, alongside other poster presentations. Eligible papers must not be extensions of a previous conference paper.
- inspired from TACL
- NeurIPS 2022, ICLR 2023, ICML 2023
- https://twitter.com/hugo_larochelle/status/1554795396941201408?s=21&t=NCwQZ6Z4gzsnrMMM2wNKxQ
- As with TMLR, I like the intent to change the (rather problematic) review system and conferences

Research

Cracking nuts with a sledgehammer: when modern graph neural networks do worse than classical greedy algorithms
- Exactly what the title says
- "The greedy algorithm is faster by a factor of 10^4 with respect to the GNN for problems with a million variables. We do not see any good reason for solving the MIS with these GNN, as well as for using a sledgehammer to crack nuts" ~~savage~~
- deep learning is an overkill, for some tasks!
- another example: Tabular Data: Deep Learning is Not All You Need
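
To make the "classical greedy algorithm" side of that comparison concrete, here is a minimal sketch of a minimum-degree greedy heuristic for the maximum independent set (MIS) problem: repeatedly pick a lowest-degree vertex, add it to the set, and delete it together with its neighbours. Whether this matches the exact greedy variant benchmarked in the paper is an assumption, and the function name `greedy_mis` and the dict-of-sets graph format are illustrative choices. This naive version is quadratic in the worst case and is not the optimized implementation behind the quoted speedup.

```python
def greedy_mis(adjacency):
    """Minimum-degree greedy heuristic for an independent set.

    adjacency: dict mapping each node to the set of its neighbours
    (undirected graph, so the relation is assumed symmetric).
    Returns the chosen nodes in the order they were picked.
    """
    # Work on a copy so the caller's graph is left untouched.
    remaining = {v: set(nbrs) for v, nbrs in adjacency.items()}
    independent = []

    while remaining:
        # Pick a node of minimum current degree (ties broken by node id).
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        independent.append(v)

        # Delete the chosen node and all of its neighbours from the graph.
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed

    return independent


# Path graph 0 - 1 - 2 - 3
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(greedy_mis(path))
```

On the path graph the heuristic picks node 0, removes 0 and 1, then picks node 2, which gives an independent set of size two.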
On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence
- Yi Ma (UC Berkeley), Doris Tsao (UC Berkeley), Heung-Yeung Shum (International Digital Economy Academy)
- What principles can explain intelligence? A unified framework?
- Parsimony: what to learn? (Information/Coding Theory)
  - The objective of learning for an intelligent system is to identify low-dimensional structures in observations of the external world and re-organize them in the most compact and structured way.
- Self-Consistency: how to learn? (Control/Game Theory)
  - An autonomous intelligent system seeks a most self-consistent model for observations of the external world by minimizing the internal discrepancy between the observed and the regenerated.
Course on Geometric Deep Learning: updated!
- https://geometricdeeplearning.com/lectures/
- A very promising field! (deep learning on non-Euclidean data such as molecular modeling, graphs, etc.)

veritas9872 · 2022-08-07T14:10:38Z

https://pycaret.org Sharing a good library for tabular data.

jungwoo-ha closed this as completed Jan 7, 2023
