# **Part_1_7_Language Models**

In Natural Language Processing (NLP), **Language Models** are foundational for generating, understanding, and analyzing text. From classic statistical models like `n-grams` to Large Language Models (`LLMs`) like those powering modern conversational AI, language models shape applications such as search engines, chatbots, summarization, and translation. This tutorial covers two main kinds of language model, **n-grams** and **Large Language Models (LLMs)**, along with hands-on exercises that use the **ChatGPT API** and **LangChain** to build prompt engineering workflows.

### **Objectives:**

By the end of this notebook, Parham will:
1. Gain an understanding of **n-gram language models**, their structure, and their role in basic text generation and probability-based language modeling.
2. Develop a fundamental knowledge of **Large Language Models (LLMs)** and their powerful role in NLP, including how they interpret and generate text in a human-like manner.
3. Learn to interact with an LLM, specifically through the **ChatGPT API** and **LangChain**.
4. Experiment with **prompt engineering** techniques to customize language model outputs, creating contextually relevant and refined responses.

### **Table of Contents:**
1. Import Libraries
2. Introduction to Language Modeling
3. N-gram Models
   <!-- - Overview and Theory
   - Implementing Unigram, Bigram, and Trigram Models
   - Probability and Smoothing Techniques
   - Applications and Limitations of N-gram Models -->
4. Using ChatGPT API for Language Modeling
   <!-- - Introduction to OpenAI’s ChatGPT API
   - Basic Setup and Request Handling
   - Generating Text and Answering Questions -->
5. LangChain and Prompt Engineering
   <!-- - Overview of LangChain for Building Applications with LLMs
   - Basics of Prompt Engineering: Designing Effective Prompts
   - Experimenting with Prompt Variations to Improve Model Responses
   - Use Cases: Building a Question-Answering Bot, Text Summarizer, or Conversational Agent -->
6. Closing Thoughts

## 1. Import Libraries

```python
from collections import Counter

import nltk
import numpy
import pandas as pd
import spacy
from loguru import logger
from nltk.corpus import words
from nltk.util import ngrams
```

## 2. Introduction to Language Models

Language modeling is the task of assigning a probability to a sequence of words. It is used in applications such as speech recognition and spam filtering, and it underlies many state-of-the-art Natural Language Processing models.

### Methods of Language Modeling
There are two broad approaches to language modeling:

- **Statistical Language Modeling**: Statistical language modeling builds probabilistic models that predict the next word in a sequence given the words that precede it. N-gram language models are the classic example.

- **Neural Language Modeling**: Neural network methods achieve better results than classical methods, both as standalone language models and when incorporated into larger systems for challenging tasks like speech recognition and machine translation. Neural language models typically represent words as learned embeddings (dense vectors).

## 3. N-gram Models

### Overview and Theory

An **N-gram** is a contiguous sequence of \( n \) items from a given text or speech sample, where the items can be letters, words, or even base pairs depending on the application. Typically, N-grams are extracted from a large corpus, providing insights into text patterns and dependencies.

For instance, N-grams can be:
- **Unigrams**: Individual words like “This,” “article,” “is,” “on,” and “NLP.”
- **Bigrams**: Word pairs like “This article,” “article is,” “is on,” and “on NLP.”
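
The unigrams and bigrams listed above can be reproduced with a few lines of plain Python. This is a minimal sketch: the helper name `extract_ngrams` is hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
def extract_ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    # Zip n staggered views of the token list: tokens, tokens[1:], ...
    return list(zip(*(tokens[i:] for i in range(n))))


# Whitespace tokenization is a simplifying assumption for this example.
tokens = "This article is on NLP".split()

unigrams = extract_ngrams(tokens, 1)
bigrams = extract_ngrams(tokens, 2)

print(unigrams)
print(bigrams)
```

`nltk.util.ngrams(tokens, n)`, which this notebook imports, yields the same tuples as an iterator.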

An **N-gram language model** estimates the likelihood of a word given a specific context or history. For example, a bigram model estimates the probability of each word given the previous word. The model's goal is to predict the next word, capturing dependencies and patterns in language sequences.

**Calculating N-gram Probabilities**

For example, in the sentence **“This article is on...”**, if we want to predict the probability that the next word is “NLP,” this can be represented as:
$$P(\text{“NLP”} | \text{“This”}, \text{“article”}, \text{“is”}, \text{“on”})$$
This probability is part of a conditional probability chain that models the probability of each word in a sentence based on its predecessors.

To generalize, the conditional probability of the \(n\)-th word given the preceding \(n-1\) words is written as:
$$P(w_n | w_1, w_2, ..., w_{n-1})$$
Using the **chain rule of probability**, the joint probability of a word sequence \( w_1, w_2, ..., w_n \) can be expanded as a product of such conditionals:
$$P(w_1, w_2, ..., w_n) = \prod_{i=1}^{n} P(w_i | w_1, w_2, ..., w_{i-1})$$
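
As a worked instance, applying the chain rule to the five-word sentence “This article is on NLP” gives the factorization below. No numerical values are assumed; this only illustrates how the product unfolds term by term.

$$
\begin{aligned}
P(\text{This}, \text{article}, \text{is}, \text{on}, \text{NLP}) = {} & P(\text{This}) \\
& \times P(\text{article} | \text{This}) \\
& \times P(\text{is} | \text{This}, \text{article}) \\
& \times P(\text{on} | \text{This}, \text{article}, \text{is}) \\
& \times P(\text{NLP} | \text{This}, \text{article}, \text{is}, \text{on})
\end{aligned}
$$

The last factor is the next-word probability from the example above. Each factor conditions on a longer history than the one before it, which is what makes the full expansion hard to estimate from data.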

**Markov Assumptions and Simplified Models:**

In practice, language models simplify this calculation by applying a **Markov assumption**: the probability of a word depends only on a limited history of previous words, rather than on the entire preceding sequence. Specifically, under a \(k\)-th order Markov assumption, each word depends only on the previous \(k\) words, which corresponds to a \((k+1)\)-gram model.

- For a **unigram model** (where \(k = 0\)), each word is considered independently:
  $$
  P(w_1, w_2, ..., w_n) \approx \prod_{i=1}^{n} P(w_i)
  $$

- For a **bigram model** (where \(k = 1\)), each word depends only on the immediately preceding word:
  $$
  P(w_i | w_1, w_2, ..., w_{i-1}) \approx P(w_i | w_{i-1})
  $$
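
The two approximations above can be estimated from counts. The sketch below uses relative frequencies (the usual maximum-likelihood estimate, which is an assumption here since the text does not fix an estimator). The three-sentence corpus and the function names are hypothetical and exist only for illustration.

```python
from collections import Counter

# Toy corpus (hypothetical); real models are estimated from large corpora.
corpus = [
    "this article is on nlp",
    "this article is short",
    "this tutorial is on nlp",
]

unigram_counts = Counter()
bigram_counts = Counter()
context_counts = Counter()  # times each word appears as the left word of a bigram

for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[(prev, word)] += 1
        context_counts[prev] += 1

total_tokens = sum(unigram_counts.values())


def unigram_prob(word):
    """k = 0: relative frequency of the word, ignoring all context."""
    return unigram_counts[word] / total_tokens


def bigram_prob(prev, word):
    """k = 1: count(prev, word) / count(prev as context)."""
    if context_counts[prev] == 0:
        return 0.0  # unseen context: no estimate without smoothing
    return bigram_counts[(prev, word)] / context_counts[prev]


print(unigram_prob("nlp"))        # 2 of 14 tokens
print(bigram_prob("is", "on"))    # "is" is followed by "on" in 2 of 3 cases
print(bigram_prob("on", "nlp"))   # "on" is always followed by "nlp" here
```

Any bigram that never occurs in the corpus gets probability zero under this estimate, which is the problem smoothing techniques are designed to address.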

By applying these assumptions, we make the model computationally feasible while still capturing relevant language patterns. This approach allows us to approximate word dependencies and make educated predictions, forming the basis of applications like autocomplete and text generation.
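
To make the autocomplete use case concrete, here is a minimal sketch of greedy next-word suggestion with a bigram model. The corpus, `suggest_next`, and `complete` are hypothetical names chosen for this example, and always picking the most frequent follower is one simple decoding choice among several.

```python
from collections import Counter, defaultdict

# Toy corpus (hypothetical), tokenized by whitespace for simplicity.
corpus = [
    "this article is on nlp",
    "this article is short",
    "this tutorial is on nlp",
]

# Map each word to a Counter of the words observed immediately after it.
followers = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, word in zip(tokens, tokens[1:]):
        followers[prev][word] += 1


def suggest_next(prev):
    """Return the most frequent follower of `prev`, or None if it has none."""
    if prev not in followers:
        return None
    return followers[prev].most_common(1)[0][0]


def complete(start, max_words=5):
    """Greedily extend `start` one word at a time, up to `max_words` more words."""
    words = [start]
    for _ in range(max_words):
        nxt = suggest_next(words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


print(suggest_next("is"))
print(complete("this"))
```

Sampling the next word in proportion to its count, instead of always taking the most frequent one, would produce more varied text from the same counts.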

## 4. Using ChatGPT API for Language Modeling

## 5. LangChain and Prompt Engineering

## 6. Closing Thoughts