[20230219] Weekly AI ArXiv 만담 Season 2, Session 6 #72
Non-arxiv

삶의 목적을 찾는 45가지 방법 (45 Ways to Find the Purpose of Life): the first book written by ChatGPT and Papago has been published. The illustrations were also generated with Shutterstock AI.

Arxiv

Symbolic Discovery of Optimization Algorithms
Google proposes a new optimizer algorithm discovered via AutoML-style program search (the paper names it Lion). A minimal sketch of the update rule follows.
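Below is a minimal NumPy sketch of the sign-based update rule reported in the paper; unlike Adam, Lion keeps only a single momentum buffer. The function name and default hyperparameter values here are illustrative, not the released implementation.

```python
import numpy as np

def lion_update(param, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.01):
    """One Lion step for a single parameter tensor, following the
    paper's pseudocode; default values are illustrative only."""
    # Interpolate the momentum buffer with the fresh gradient,
    # then keep only the sign of the result.
    update = np.sign(beta1 * m + (1.0 - beta1) * grad)
    # Apply the update with decoupled (AdamW-style) weight decay.
    param = param - lr * (update + wd * param)
    # Refresh the momentum buffer with a second, slower coefficient.
    m = beta2 * m + (1.0 - beta2) * grad
    return param, m
```

Because the sign makes the raw update magnitude uniform across coordinates, the paper pairs Lion with a smaller learning rate and a larger weight decay than AdamW.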
Here is a fun paper; you may have seen it already, but I am sharing it anyway.

@channel This Thursday, which is today: I am looking for people to study ChatGPT and the small models that aim to go beyond it. We will coordinate the time among the participants; 2 PM, 5 PM, and 11 PM are the current candidates. I have appointments at 4 PM and 6:30 PM.

InstructGPT: Training language models to follow instructions with human feedback
Blog post: https://towardsdatascience.com/the-new-version-of-gpt-3-is-much-much-better-53ac95f21cfb

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, Ben Mann, Jared Kaplan

We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants. We find this alignment training improves performance on almost all NLP evaluations, and is fully compatible with training for specialized skills such as python coding and summarization. We explore an iterated online mode of training, where preference models and RL policies are updated on a weekly cadence with fresh human feedback data, efficiently improving our datasets and models. Finally, we investigate the robustness of RLHF training, and identify a roughly linear relation between the RL reward and the square root of the KL divergence between the policy and its initialization. Alongside our main results, we perform peripheral analyses on calibration, competing objectives, and the use of OOD detection, compare our models with human writers, and provide samples from our models using prompts appearing in recent related work. (A toy illustration of the reward-vs-KL trend appears after this section.)

Multimodal Chain-of-Thought Reasoning in Language Models
Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola

Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer. However, existing CoT studies have focused on the language modality. We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework that separates rationale generation and answer inference. In this way, answer inference can leverage better generated rationales that are based on multimodal information. With Multimodal-CoT, our model under 1 billion parameters outperforms the previous state-of-the-art LLM (GPT-3.5) by 16 percentage points (75.17% -> 91.68% accuracy) on the ScienceQA benchmark and even surpasses human performance. Code is publicly available at this https URL. (A structural sketch of the two-stage pipeline appears after this section.)

BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining, by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.

News: the BioGPT-Large model with 1.5B parameters is coming; it is currently available on the PubMedQA task with SOTA performance of 81% accuracy. See Question Answering on PubMedQA for the evaluation. (A usage sketch appears after this section.)

Improving Language Models by Retrieving from Trillions of Tokens (RETRO)
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens.
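The most quotable finding in the Anthropic paper is the robustness result: RL reward grows roughly linearly in the square root of the KL divergence between the policy and its initialization. Here is a toy Python illustration of that trend; r_init and slope are hypothetical fitted constants, not numbers from the paper.

```python
import numpy as np

def predicted_reward(kl_nats, r_init, slope):
    """Empirical trend reported by Bai et al.:
    reward ~ r_init + slope * sqrt(KL(policy || initialization)).
    r_init and slope would be fit per training run; the values used
    below are hypothetical placeholders."""
    return r_init + slope * np.sqrt(kl_nats)

# A policy that has drifted 25 nats from its initialization:
print(predicted_reward(kl_nats=25.0, r_init=0.0, slope=0.4))  # -> 2.0
```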
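Multimodal-CoT's contribution is structural, so a control-flow sketch conveys it: stage 1 generates a rationale from fused text and vision inputs, and stage 2 infers the answer conditioned on that rationale. Both model functions below are hypothetical stubs standing in for the paper's fine-tuned sub-billion-parameter models; only the two-stage control flow is taken from the paper.

```python
def generate_rationale(question, image_features):
    # Stage 1: produce an intermediate reasoning chain from the fused
    # language + vision inputs (stubbed here).
    return "The sled speeds up, so an unbalanced force must act on it."

def infer_answer(question, image_features, rationale):
    # Stage 2: predict the final answer, conditioning on the stage-1
    # rationale in addition to the original inputs (stubbed here).
    return "(B) the pulling force"

def multimodal_cot(question, image_features):
    rationale = generate_rationale(question, image_features)
    return infer_answer(question, image_features, rationale)

print(multimodal_cot("Which force makes the sled move?", image_features=None))
```

Separating the two stages matters because the answer model conditions on a rationale grounded in the image, rather than one generated from text alone.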
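For anyone who wants to try BioGPT directly, recent Hugging Face transformers releases ship BioGpt classes; below is a minimal generation sketch against the microsoft/biogpt checkpoint, assuming transformers >= 4.25.

```python
from transformers import BioGptForCausalLM, BioGptTokenizer, pipeline, set_seed

# Pull the base BioGPT checkpoint from the Hugging Face Hub.
tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
set_seed(42)
# Sample a few biomedical continuations.
for out in generator("COVID-19 is", max_length=30, num_return_sequences=3, do_sample=True):
    print(out["generated_text"])
```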
Conferences (within 7 days)