---
theme: gaia
_class: lead
paginate: true
backgroundColor:
backgroundImage:
---

![bg left:100% 100%](…)


# GPT: Generative Pre-trained Transformers

GPTs are a type of deep learning model used to generate human-like text. Common uses include the following (a minimal prompting sketch follows the list):

- answering questions
- summarizing text
- translating text to other languages
- generating code
- generating blog posts, stories, conversations, and other content types
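
All of these uses are driven by a text prompt. As a rough illustration, here is a minimal sketch of prompting a GPT model through OpenAI's Python SDK; the model name, the prompt, and the assumption that an API key is set in the environment are illustrative choices, not something specified by this deck.

```python
# Minimal sketch: prompting a GPT model via the OpenAI Python SDK (openai>=1.0).
# Assumes OPENAI_API_KEY is set in the environment; the model name and prompt
# are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize Hamlet in two sentences."}],
)
print(response.choices[0].message.content)
```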

---

# Before GPT

- The current AI revolution in natural language only became possible with the invention of the Transformer architecture in 2017, which models such as Google's BERT soon built on. Before this, text generation was performed with other deep learning models, such as recurrent neural networks (RNNs) and long short-term memory networks (LSTMs). These performed well for outputting single words or short phrases but could not generate realistic longer content.

- BERT's transformer approach was a major breakthrough because it is not a supervised learning technique; that is, it did not require an expensive annotated dataset for training. Google used BERT to interpret natural language searches; however, it cannot generate text from a prompt (a short sketch of this contrast follows the list).

- The reign of small models (before 2015): Until 2015, small models were considered the state of the art for understanding language. These small models were better suited to analytical tasks, and so were used for everything from predicting delivery times to classifying fraudulent messages. For general-purpose generation tasks, however, their expressive power fell short; producing human-level prose or code remained a pipe dream.
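
To make the BERT contrast concrete, here is a hedged sketch of BERT's masked-token objective, assuming the Hugging Face `transformers` package and the public `bert-base-uncased` checkpoint: it fills in blanks in place rather than continuing a prompt the way a GPT-style decoder does.

```python
# Sketch of BERT's self-supervised objective: predicting a masked token.
# Assumes the Hugging Face `transformers` package and the public
# `bert-base-uncased` checkpoint.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("The capital of France is [MASK].", top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))

# BERT is encoder-only: it scores or fills tokens in place, so there is no
# natural way to ask it to continue a prompt as a GPT-style decoder does.
```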


---

# GPT-1

![bg left:90% 100%](…)

Transformer Architecture

In 2018, OpenAI published a paper, "Improving Language Understanding by Generative Pre-Training", on natural language understanding using their GPT-1 language model. This model was a proof of concept and was not released publicly.

The second wave of development: the scaling race (2015 to today)

A landmark paper from Google Research, "Attention Is All You Need", described a new neural network architecture for natural language understanding: the Transformer. It not only produced higher-quality language models but was also far more parallelizable, dramatically reducing the training time required.
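
The core operation the paper introduced can be sketched in a few lines. Below is a minimal NumPy illustration of scaled dot-product attention; the shapes and random inputs are toy placeholders, and real Transformers add learned projections, multiple attention heads, and masking.

```python
# Minimal NumPy sketch of scaled dot-product attention, the core operation
# of the Transformer. Shapes and inputs are toy placeholders.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))  # 4 tokens, d_k = 8
print(scaled_dot_product_attention(Q, K, V).shape)     # -> (4, 8)
```

Because every token attends to all others through a single matrix product, an entire sequence is processed at once rather than step by step as in an RNN, which is the source of the parallelism described above.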

As the models grew larger, they began to match and then surpass humans. From 2015 to 2020, the compute used to train these models increased by six orders of magnitude, and their performance surpassed human baseline levels in handwriting, speech, and image recognition, reading comprehension, and language understanding.

---

# GPT-2

![bg left:90% 100%](…)

Model performance on various tasks

The following year, OpenAI published another paper, "Language Models are Unsupervised Multitask Learners", on their latest model, GPT-2. This time, the model was made available to the machine learning community, where it found some adoption for text generation tasks. GPT-2 could often generate a couple of sentences before breaking down; this was state of the art in 2019.
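
Since the GPT-2 weights were released openly, the model can still be run locally. Here is a minimal sketch using the Hugging Face `transformers` pipeline; the prompt and length cap are arbitrary placeholders.

```python
# Minimal sketch: generating text with the openly released GPT-2 weights.
# Assumes the Hugging Face `transformers` package; the prompt and length
# cap are arbitrary placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("The history of language models began", max_new_tokens=40)
print(result[0]["generated_text"])
```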

---

# GPT-3

![bg left:90% 100%](…)

Results on three Open-Domain QA tasks

In 2020, OpenAI published the GPT-3 model. It had 100 times more parameters than GPT-2 and was trained on an even larger text dataset, resulting in better performance. The model continued to be improved through various iterations known as the GPT-3.5 series.

This version took the world by storm with its ability to generate pages of human-like text. ChatGPT became the fastest-growing web application ever, reaching 100 million users in just two months.

The third wave of development: better, faster, cheaper (after 2022)

First, computing costs began to fall: new techniques, such as diffusion models, cut the cost of training and of running inference. At the same time, the research community continued to develop better algorithms and larger models. For developers who had long wanted to work with large language models (LLMs), the door to exploration and application development was now open, and applications built on these technologies began to appear in large numbers.


---

![bg left:90% 100%](…)

<!-- Share Video -->