[20230305] Weekly AI ArXiv Chat, Season 2, Round 8 #74

Open
scene-the-ella opened this issue Feb 27, 2023 · 5 comments

@scene-the-ella
No description provided.

@jungwoo-ha
Owner

jungwoo-ha commented Mar 4, 2023

News

2022
2️⃣ ColabFold: making protein folding accessible to all -> (From multiple institutions, 1162 citations) An open-source and efficient protein folding model.

3️⃣ Hierarchical Text-Conditional Image Generation with CLIP Latents -> (From OpenAI, 718 citations) DALL·E 2, complex prompted image generation that left most in awe.

4️⃣ A ConvNet for the 2020s -> (From Meta and UC Berkeley, 690 citations) A successful modernization of CNNs at a time of boom for Transformers in Computer Vision.

5️⃣ PaLM: Scaling Language Modeling with Pathways -> (From Google, 452 citations) Google's mammoth 540B Large Language Model, a new MLOps infrastructure, and how it performs.

2021
1️⃣ Highly accurate protein structure prediction with AlphaFold -> (From DeepMind, 8965 citations) AlphaFold, a breakthrough in protein structure prediction using Deep Learning.

2️⃣ Swin Transformer: Hierarchical Vision Transformer using Shifted Windows -> (From Microsoft, 4810 citations) A robust variant of Transformers for Vision.

3️⃣ Learning Transferable Visual Models From Natural Language Supervision -> (From OpenAI, 3204 citations) CLIP, image-text pairs at scale to learn joint image-text representations in a self-supervised fashion.

4️⃣ On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? -> (From U. Washington, Black in AI, The Aether, 1266 citations) Famous position paper very critical of the trend of ever-growing language models, highlighting their limitations and dangers.

5️⃣ Emerging Properties in Self-Supervised Vision Transformers -> (From Meta, 1219 citations) DINO, showing how self-supervision on images led to the emergence of some sort of proto-object segmentation in Transformers.

2020
1️⃣ An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale -> (From Google, 11914 citations) The first work showing that a plain Transformer could excel in Computer Vision.

2️⃣ Language Models are Few-Shot Learners -> (From OpenAI, 8070 citations) GPT-3. This paper needs no further explanation at this stage.

3️⃣ YOLOv4: Optimal Speed and Accuracy of Object Detection -> (From Academia Sinica, Taiwan, 8014 citations) Robust and fast object detection sells like hotcakes.

4️⃣ Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer -> (From Google, 5906 citations) A rigorous study of transfer learning with Transformers, resulting in the famous T5.

5️⃣ Bootstrap your own latent: A new approach to self-supervised Learning -> (From DeepMind and Imperial College, 2873 citations) Showing that negatives are not even necessary for representation learning.

ArXiv

@gyunggyung

gyunggyung commented Mar 4, 2023

LLaMA talk.

The latest news around LLaMA, which Facebook recently shared.

  1. LLaMA-7B: the checkpoint has been published. It is still unclear how it was obtained.
  2. llama-up-data: hunkim built a chatbot with LLaMA. Given how small the model is, though, the performance is underwhelming....
  3. llama-int8: careful quantization makes it runnable even on a 3090 or 4090. LLaMA INT8 Inference guide (a rough loading sketch follows this list).
  4. LLaMA questions: I asked the questions I was curious about, but no one has answered yet....
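A minimal sketch of what 8-bit LLaMA loading can look like, assuming the weights have already been converted to the Hugging Face format and that the installed transformers version supports LLaMA with bitsandbytes available; the model path and prompt are placeholders, and the linked repo's own loader may differ:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical local path to weights converted to the Hugging Face format.
model_path = "path/to/llama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,   # bitsandbytes int8 quantization; 7B fits on a 24 GB GPU
    device_map="auto",   # place layers on available devices automatically
)

inputs = tokenizer("The capital of Korea is", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```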

Image-related models.

  1. Beating OpenAI CLIP with 100x less data and compute: it achieves strong performance with 100x less data. It is general-purpose enough that I expect it to be widely used going forward. It even handles Korean well. If you have questions about it, I can pass them along.
  2. AI Generated Images Are Getting Too Real | Asmongold Reacts: image generation now looks genuinely natural. I should study LoRA in particular. Sharing a few related results.
  3. AI Art is getting too good! Can YOU Tell the Difference?: try to spot the AI-generated images!

@veritas9872

veritas9872 commented Mar 4, 2023

High-resolution image reconstruction with latent diffusion models from human brain activity
BioArXiv: https://www.biorxiv.org/content/10.1101/2022.11.18.517004
Website: https://sites.google.com/view/stablediffusion-with-brain/


Sharing a paper that has been a big topic on Twitter since yesterday. The study shows that by fitting an L2-regularized linear model (?!) from the brain's fMRI signals to Stable Diffusion's image and text latent encodings, you can reconstruct images similar to the ones shown to the subject.

Each subject requires thousands of images, and a given model presumably works for only one subject, and probably only one scanner. Even so, showing that reconstruction is possible from brain signals without any deep learning training, using only a pretrained model plus a fitted linear model, should make this very influential. That said, I would only trust the results once reproducibility is confirmed.
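A minimal sketch of the core fitting step described above, assuming preprocessed fMRI voxel responses and Stable Diffusion latents already extracted for the same stimuli; the shapes, random data, and regularization strength below are illustrative, not the paper's settings:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 2048))  # fMRI responses, one row per stimulus
Z = rng.standard_normal((2000, 1024))  # flattened SD latents for the same stimuli

# L2-regularized linear map from brain activity to the diffusion latent space.
model = Ridge(alpha=100.0)
model.fit(X, Z)

# At test time, predict a latent from held-out brain activity; the predicted
# latent would then be decoded (and refined) by the pretrained diffusion model.
z_pred = model.predict(X[:1])
print(z_pred.shape)  # (1, 1024)
```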

Dropout Reduces Underfitting
ArXiv: https://arxiv.org/abs/2303.01500
GitHub: https://github.com/facebookresearch/dropout

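The headline recipe in this paper is early dropout: apply dropout only during an initial phase of training to help underfitting models (with a symmetric late dropout for overfitting ones). A minimal PyTorch sketch of that schedule, where the model, data, and cutoff epoch are all illustrative rather than the paper's configuration:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(64, 10)
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
early_dropout_epochs = 5  # illustrative cutoff for the early phase

def set_dropout(module: nn.Module, p: float) -> None:
    """Set the drop probability of every nn.Dropout layer in the module."""
    for layer in module.modules():
        if isinstance(layer, nn.Dropout):
            layer.p = p

for epoch in range(20):
    if epoch == early_dropout_epochs:
        set_dropout(model, 0.0)  # turn dropout off after the early phase
    x = torch.randn(128, 32)          # stand-in batch of inputs
    y = torch.randint(0, 10, (128,))  # stand-in labels
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```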

Consistency Models
ArXiv: https://arxiv.org/abs/2303.01469


Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
ArXiv: https://arxiv.org/abs/2303.01037

Full Stack Optimization of Transformer Inference: a Survey
ArXiv: https://arxiv.org/abs/2302.14017

Sharing a well-organized survey paper covering hardware and software optimizations and issues for Transformer model inference.

@ghlee3401

ghlee3401 commented Mar 5, 2023

ArXiv

  • WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
    • Keyword : ASR
    • Code : https://github.com/m-bain/whisperX
    • Affiliation : Visual Geometry Group, University of Oxford
    • Whisper's performance degrades on long-form audio transcription.
    • With a sliding-window approach, audio overlap can cause mismatches between the audio and the transcript, and words that land on segment boundaries are often transcribed incorrectly.
    • This paper uses VAD to find active speech regions and segment the audio, merges short segments (up to a maximum length of 30 seconds), and aligns the output using Whisper together with a phoneme model (see the usage sketch below).
    • VAD works robustly across many languages, but the alignment phoneme model appears to need additional per-language work.
    • Word-level timestamps can be produced without phoneme recognition, but accuracy suffers.
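A rough usage sketch of the released whisperX package, based on its README at the time; the exact function names and arguments may differ across versions, so treat this as an approximation rather than the definitive API:

```python
import whisperx

device = "cuda"
audio_file = "long_recording.wav"  # hypothetical long-form input

# 1. Transcribe with Whisper (per the paper, VAD-based cut & merge of the
#    audio into <=30 s segments happens before this step).
model = whisperx.load_model("large-v2", device)
result = model.transcribe(audio_file)

# 2. Align the transcript with a language-specific phoneme model to obtain
#    accurate word-level timestamps.
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
result_aligned = whisperx.align(
    result["segments"], align_model, metadata, audio_file, device
)

for segment in result_aligned["segments"]:
    print(segment["start"], segment["end"], segment["text"])
```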

  • Interesting research

@jwlee-neubla

  • Trains interactive text generation using a user simulator.
  • Emphasizes that an interactive approach matters because a generation model produces its output in one shot and therefore suffers from problems such as hallucination.

