# Language Model (LM)
A language model defines a probability distribution over sequences of tokens.

![LMs can 'Generate' Text](Images/LM.png)
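
To make the definition concrete, here is a minimal sketch of a bigram language model in Python. The corpus, the `<s>`/`</s>` boundary markers, and the function names are assumptions made for illustration; real language models use neural networks rather than raw counts. It scores a sequence with the chain rule, multiplying the conditional probability of each token given the previous one.

```python
from collections import Counter, defaultdict

# Tiny hypothetical corpus; a real LM is trained on far more text.
corpus = [
    "the cat sat",
    "the cat ran",
    "the dog sat",
]

# Count how often each token follows the previous one (bigram counts).
counts = defaultdict(Counter)
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for prev, cur in zip(tokens, tokens[1:]):
        counts[prev][cur] += 1


def next_token_probs(prev):
    """Return P(next token | prev) as a dict, estimated from the counts."""
    total = sum(counts[prev].values())
    return {tok: n / total for tok, n in counts[prev].items()}


def sequence_prob(sentence):
    """Chain rule: multiply the conditional probability of each token."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= next_token_probs(prev).get(cur, 0.0)
    return prob


print(sequence_prob("the cat sat"))  # a sequence seen in the corpus
print(sequence_prob("the dog ran"))  # "dog ran" never occurs, so the probability is zero
```

Generating text with such a model amounts to repeatedly sampling the next token from `next_token_probs` until `</s>` is drawn.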

# 'Large' Language Models
'Large' refers both to the model's size (number of parameters) and to the massive size of the training dataset.

![LLMs in AI Landscape](Images/LLM.png)

## LLMs Evolution

[Evolution of LLMs](https://synthedia.substack.com/p/a-timeline-of-large-language-model)

- Google introduced the Transformer architecture in 2017 in the paper "Attention Is All You Need". In 2018 it released BERT, an Encoder-only language representation model that achieved SOTA (State Of The Art) on 11 NLP tasks. Because BERT had a large number of parameters for its time, smaller models based on it, such as DistilBERT, TinyBERT, and MobileBERT, were proposed. BERT marked the beginning of the use of Transformers as language representation models.

- OpenAI published "Improving Language Understanding by Generative Pre-Training" in 2018 and released the GPT-1 model (117M parameters, 512-token context), which used a Decoder-only architecture. The paper also introduced the idea of generative pre-training over a large corpus.

- OpenAI published the paper "Language Models are Unsupervised Multitask Learners" in 2019 and released the GPT-2 model (1.5B parameters, 1024-token context).

- Google developed the T5 (Text-To-Text Transfer Transformer) model in 2019. It uses an Encoder-Decoder architecture.

- Meta released the RoBERTa model and published the paper "RoBERTa: A Robustly Optimized BERT Pretraining Approach" in 2019. The paper found that BERT was significantly undertrained.

- Meta also released the XLM model along with the paper "Cross-lingual Language Model Pretraining". The paper proposed methods for learning cross-lingual language models (XLMs) and obtained SOTA results on cross-lingual classification and on both unsupervised and supervised machine translation.

- OpenAI released the GPT-3 model (175B parameters) and the paper "Language Models are Few-Shot Learners" in 2020. The paper observed the phenomenon of in-context learning. With GPT-3, OpenAI stopped open-sourcing its models.

- Google released the PaLM model and the paper "PaLM: Scaling Language Modeling with Pathways" in 2022. Google also stopped open-sourcing its models.

- Meta released the OPT models and the paper "OPT: Open Pre-trained Transformer Language Models" in 2022. OPT is a suite of Decoder-only pre-trained transformers ranging from 125M to 175B parameters, released to promote open-sourcing.

![Models in 2023](Images/2023Models.png)

LLMs show emergent capabilities that were not previously observed in ‘small’ LMs.
- In-context learning: A pre-trained language model can be guided with only prompts to perform different tasks (without separate task-specific fine-tuning).
- In-context learning is an example of emergent behavior.
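
As a rough illustration of in-context learning, the sketch below assembles a few-shot prompt for sentiment classification. The function name `build_few_shot_prompt`, the `Review:`/`Sentiment:` layout, and the example reviews are all assumptions made for illustration. The resulting string would be sent to a pre-trained LLM as-is, with no change to the model's weights.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a prompt: instruction, solved examples, then the new input."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final item is left unanswered so the model completes it.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical demonstrations; the task is specified only through the prompt.
examples = [
    ("I loved this movie.", "positive"),
    ("The plot was dull.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review.",
    examples,
    "A wonderful surprise.",
)
print(prompt)
```

Swapping the instruction and examples (for instance, to translation pairs) redirects the same model to a different task, which is the point of the bullet above.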

LLMs are widely adopted in the real world.
- Research: LLMs have transformed the NLP research world, achieving state-of-the-art performance across a wide range of tasks such as sentiment classification, question answering, summarization, and machine translation.
- Industry: Here is a very incomplete list of high-profile large language models that are used in production systems:
  - [Google Search](https://blog.google/products/search/search-language-understanding-bert/) (BERT)
  - [Facebook content moderation](https://ai.meta.com/blog/harmful-content-can-evolve-quickly-our-new-ai-system-adapts-to-tackle-it/) (XLM)
  - [Microsoft’s Azure OpenAI Service](https://news.microsoft.com/source/features/innovation/new-azure-openai-service/) (GPT-3/3.5/4)

Despite their tremendous capabilities, the use of LLMs also carries various risks.
- Reliability & Disinformation: LLMs often hallucinate: they generate responses that seem plausible but are factually incorrect.
  - Significant challenge for high-stakes applications like healthcare
- Social bias: Most LLMs show performance disparities across demographic groups, and their predictions can reinforce stereotypes.
  - P(He is a doctor) > P(She is a doctor)
  - Training data contains inherent bias
- Toxicity: LLMs can generate toxic/hateful content.
  - Trained on a huge amount of Internet data (e.g., Reddit), which inevitably contains offensive content
  - Challenge for applications such as writing assistants or chatbots
- Security: LLMs are trained on a scrape of the public Internet, so anyone can put up a website whose content may end up in the training data.
  - An attacker can perform a data poisoning attack.
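
The social-bias inequality above, P(He is a doctor) > P(She is a doctor), can be reproduced with a toy sketch. The corpus below is hypothetical and deliberately skewed, and the "model" is just relative sentence frequency, which is an assumption made for illustration rather than how an LLM actually computes probabilities. It shows only that a model fit to biased data inherits that bias.

```python
from collections import Counter

# Hypothetical, deliberately skewed training sentences (illustration only).
corpus = [
    "he is a doctor",
    "he is a doctor",
    "he is a doctor",
    "she is a doctor",
    "she is a nurse",
    "she is a nurse",
]

sentence_counts = Counter(corpus)
total = sum(sentence_counts.values())


def sentence_prob(sentence):
    """Maximum-likelihood estimate: relative frequency of the sentence."""
    return sentence_counts[sentence.lower()] / total


p_he = sentence_prob("He is a doctor")
p_she = sentence_prob("She is a doctor")

# The skew in the data carries over directly into the model's probabilities.
print(p_he, p_she, p_he > p_she)
```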

Credits: [Stanford CS324 (Winter 2022)](https://stanford-cs324.github.io/winter2022/)