## A Tiny History of Natural Language Processing

Natural Language Processing (NLP) has evolved significantly over the past few decades. Initially, NLP relied heavily on hand-written rule-based systems and, later, statistical methods to understand and generate human language. These early approaches, prominent in the 1980s and 1990s, focused mainly on surface patterns and syntactic structure, using techniques such as n-grams and Hidden Markov Models (HMMs) to model language. However, these methods struggled to capture the semantic meaning and context of words.

The introduction of word embeddings in the early 2010s, such as Word2Vec and GloVe, marked a significant advance in NLP. These embeddings represent each word as a vector in a continuous vector space, so that words with related meanings end up close together. Paired with neural sequence models such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, which read text one token at a time while carrying context forward, this shift enabled systems that could process sequences of text and maintain context over longer passages. RNNs, in particular, played a crucial role in tasks like machine translation and sentiment analysis.
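
To make "capturing semantic relationships" concrete, here is a minimal sketch of how closeness between word vectors is usually measured: cosine similarity. The three vectors are made-up values, not real Word2Vec or GloVe output. The only point is that related words get vectors pointing in similar directions.

```python
import math

# Hypothetical 3-dimensional "embeddings", hand-picked for illustration only.
# Real Word2Vec or GloVe vectors are learned from large corpora and
# typically have 50 to 300 dimensions.
embeddings = {
    "king": [0.80, 0.65, 0.10],
    "queen": [0.75, 0.70, 0.15],
    "apple": [0.10, 0.05, 0.90],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(f"king vs queen: {sim_royal:.3f}")
print(f"king vs apple: {sim_fruit:.3f}")
```

With learned embeddings, the same comparison is what lets a model treat "king" and "queen" as related even though the strings share no letters.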

The advent of transformers in 2017 revolutionized NLP by addressing the limitations of RNNs. Transformers were introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017). They use self-attention to process all the tokens in a sequence at once rather than one at a time, which handles long-range dependencies better and allows training to be parallelized. This led to powerful models like BERT, GPT, and T5, which set new benchmarks on many NLP tasks by capturing a deeper semantic understanding of text.
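
Self-attention is covered properly in the Transformer Details lesson, but the core computation is small enough to preview here. The sketch below computes scaled dot-product self-attention, softmax(Q K^T / sqrt(d)) V, in plain Python. As a simplifying assumption, it uses the token vectors directly as queries, keys, and values. A real transformer first multiplies them by learned weight matrices and uses several attention heads. The token vectors are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (simplification).

    x is a list of token vectors. Returns one output vector per token
    plus the attention-weight matrix (one row of weights per token).
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:  # every token attends to every token, including itself
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w[j] * x[j][i] for j in range(len(x))) for i in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Three hypothetical 4-dimensional token vectors.
tokens = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 0.9, 0.1],
]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1, and each token attends most strongly to tokens with similar vectors. No token's scores depend on another token's output. The loop over queries can therefore be replaced by a few matrix multiplications over the whole sequence, which is what makes transformers easy to parallelize on GPUs.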

Transformers have almost entirely supplanted previous approaches because:

1. **Superior Performance:** Models like BERT, GPT, T5, and their successors dominate leaderboards on tasks such as text classification, translation, summarization, and question answering.
2. **Pretraining and Transfer Learning:** Unlike traditional methods that required training separate models from scratch for different tasks, transformers leverage large-scale pretraining on vast text corpora and fine-tune efficiently on specific tasks.
3. **Self-Attention and Contextual Representations:** Transformers provide rich, context-dependent word representations, whereas earlier models like Word2Vec and GloVe generated static embeddings.
4. **Scalability and Adaptability:** Empirical scaling laws show that transformer performance tends to improve predictably as model size and training data grow, an advantage that RNNs and classical machine learning approaches largely lacked.

There are a few areas where older approaches are still used:

1. **Small Datasets & Low Compute Environments:** Logistic regression, SVMs, and Lasso-penalized models often remain competitive when data is limited or when computational efficiency is a concern.
2. **Domain-Specific Applications:** Some applications, like biomedical text mining, may still rely on domain-specific feature engineering approaches alongside transformers.
3. **Traditional ML for Interpretability:** Some NLP applications in finance, healthcare, and legal fields still favor older methods due to the need for interpretability and robustness.

However, since transformer models for NLP are now so dominant, we will focus exclusively on them in this class.

## NLP Tasks Instead of Transformer Details

Transformers are more complicated than the CNNs we saw for computer vision, so we won't go as deeply into their details. In Lesson 8 - Transformer Details, we will learn about some of the nuts and bolts, especially the self-attention mechanism that allows transformers to capture relationships between words and understand context. Mostly, though, we will focus on applications of transformers. To this end, we'll use the open-source Hugging Face ecosystem. It hosts thousands of NLP models and datasets and makes it quite simple to get started with NLP applications without having to master much code. Many of the largest and newest openly released transformer models are hosted there, including those from Meta, Mistral, and DeepSeek. The main thing keeping us from running the biggest state-of-the-art models will be a lack of compute. We can, however, run their smaller cousins on the GPU in CoCalc's compute server or on a decent gaming GPU.

## Finetuning a Specialized Model versus Using a Large Language Model

As large language models (LLMs) continue to improve, they are increasingly used as general NLP task solvers via prompting, particularly when we don't have much training data. Our choices for solving an NLP task come down to:
1.  Using an LLM via an API (in the cloud), such as GPT-4o or Gemini.
2.  Using an LLM running on local hardware.
3.  Fine-tuning and using a specialized transformer model designed for the task.

For example, for a text-classification task we could choose:

- **LLM via API (GPT-4o, Claude, Gemini, etc.)**
    - When you need **a quick, general-purpose classifier** without training a model.
    - When **zero-shot or few-shot classification** (via prompting) is sufficient.
    - When categories may evolve frequently, making retraining impractical.
    - Example: Categorizing support tickets by topic.

- **Local LLM (LLaMA, Mistral, OpenChat)**
    - When you need to classify text **without sending data to an external API** (e.g., **privacy-sensitive data**).
    - When you need **occasional classification** and want to avoid API costs.
    - Works well for **prompt-based classification** if the model is large enough (e.g., LLaMA-2 13B or Mistral 7B).
    - Example: **Classifying internal legal documents**.

- **Fine-tune BERT / RoBERTa / DistilBERT**
    - When you have a **moderate to large labeled dataset** and need **high accuracy**.
    - When you need **fast inference at scale**, as fine-tuned models are more efficient than large LLMs.
    - When your classification task requires **domain-specific adaptation**.
    - Example: **Sentiment analysis on customer feedback** in a specific industry.

Don't worry if you don't know all those terms yet, especially the various models mentioned, such as BERT. Zero-shot classification means classifying text without any labeled examples: the LLM receives only a prompt listing the possible categories. Few-shot classification means the prompt also includes a small number of labeled examples.
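
The difference between zero-shot and few-shot prompting is easiest to see in the prompts themselves. The sketch below only builds prompt strings for the support-ticket example. It does not call any API. The wording, the category names, and the helper functions (`instructions`, `build_zero_shot_prompt`, `build_few_shot_prompt`, `parse_label`) are hypothetical. The resulting string would be sent to whichever hosted or local LLM you choose.

```python
def instructions(labels):
    """Shared instructions listing the allowed categories (hypothetical wording)."""
    return (
        "Classify the support ticket into exactly one of these categories: "
        f"{', '.join(labels)}.\n"
        "Answer with the category name only.\n\n"
    )


def build_zero_shot_prompt(text, labels):
    """Zero-shot: the prompt names the categories but contains no labeled examples."""
    return instructions(labels) + f"Ticket: {text}\nCategory:"


def build_few_shot_prompt(text, labels, examples):
    """Few-shot: the same instructions plus a handful of labeled (text, label) pairs."""
    shots = "".join(f"Ticket: {t}\nCategory: {label}\n\n" for t, label in examples)
    return instructions(labels) + shots + f"Ticket: {text}\nCategory:"


def parse_label(reply, labels):
    """Map the model's free-text reply back to a known label, or None if none matches."""
    cleaned = reply.strip().lower()
    for label in labels:
        if cleaned.startswith(label.lower()):
            return label
    return None


labels = ["billing", "technical issue", "account access"]
examples = [
    ("I was charged twice this month.", "billing"),
    ("The app crashes when I upload a photo.", "technical issue"),
]
ticket = "I can't log in after resetting my password."

print(build_zero_shot_prompt(ticket, labels))
print("-" * 40)
print(build_few_shot_prompt(ticket, labels, examples))
```

`parse_label` is there because LLMs do not always reply with exactly the label text. Matching the start of the reply against the allowed categories is one simple, admittedly crude, way to normalize the answer.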

Here are some guidelines for choosing the right approach for a given NLP task:

- **Use API-based LLMs (GPT-4o, Claude, Gemini, etc.) when**:
  - You need **quick, adaptable solutions** without training.
  - You **don’t have much data** for fine-tuning.
  - Privacy and latency are not major concerns.

- **Use Local LLMs (LLaMA, Mistral, Falcon) when**:
  - You need **private, offline inference**.
  - You want **control over deployment** without external dependencies.
  - **Few-shot learning is sufficient**, and you don’t want to fine-tune.

- **Fine-Tune a Model (BERT, BART, T5, RoBERTa) when**:
  - You have **domain-specific data** and need **high accuracy**.
  - Privacy, cost, or latency concerns prevent LLM use.
  - You require **structured, predictable outputs**.

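For each NLP task we study over the next several lessons, we'll consider all three approaches. Text classification, discussed above, is the first such task; the sections below outline the same three options for five more.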

---

## **2. Named Entity Recognition (NER)**
   - **LLM via API**
     - When extracting **common entities** (people, organizations, locations) without fine-tuning.
     - When you need **few-shot NER** for a task with little training data.
     - Example: Extracting entities from **news articles**.

   - **Local LLM (LLaMA, Mistral, Falcon)**
     - When processing **private data** locally (e.g., **medical records, financial data**).
     - When **privacy laws prevent cloud-based processing**.
     - When you need an **occasional, flexible NER system** (prompt-based).
     - Example: Detecting **medical terms** in patient notes **on-premise**.

   - **Fine-tune BERT / spaCy / RoBERTa**
     - When entity categories are **domain-specific** (e.g., **chemical compounds, legal clauses**).
     - When high **recall and precision** are required.
     - When **scalability** is needed (fine-tuned models can be deployed at scale).
     - Example: **Extracting biomedical terms from research papers**.
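
When an LLM is used for prompt-based NER, its output is free text, so a common safeguard is to check that every entity it reports actually appears in the source text. The sketch below assumes the model was asked to reply with a JSON list of `{"text": ..., "type": ...}` objects. That format, the example reply, the article, and `ground_entities` are all hypothetical, and no model is called.

```python
import json

# Hypothetical LLM reply. "Berlin" is deliberately included even though the
# article never mentions it, to mimic a hallucinated entity.
llm_reply = """
[
  {"text": "Maria Lopez", "type": "PERSON"},
  {"text": "Berlin", "type": "LOCATION"},
  {"text": "Acme Corp", "type": "ORGANIZATION"}
]
"""

article = "Maria Lopez visited the Acme Corp office in Munich on Tuesday."

ALLOWED_TYPES = {"PERSON", "ORGANIZATION", "LOCATION"}


def ground_entities(reply, text, allowed_types=ALLOWED_TYPES):
    """Parse the LLM's JSON and keep only entities that literally occur in text.

    Returns (start, end, type) character spans. Malformed JSON yields an empty
    list; unknown types and entities not found in the text are dropped.
    Only the first occurrence of each entity string is located.
    """
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    spans = []
    for item in items:
        if not isinstance(item, dict):
            continue
        surface = item.get("text", "")
        etype = item.get("type", "")
        start = text.find(surface) if isinstance(surface, str) and surface else -1
        if etype in allowed_types and start != -1:
            spans.append((start, start + len(surface), etype))
    return spans


for start, end, etype in ground_entities(llm_reply, article):
    print(f"{article[start:end]!r} -> {etype}")
```

This check catches entities the model invented (here, Berlin) but not ones it missed (here, Munich). Measuring recall still requires labeled data, which is one reason fine-tuned models are preferred when both precision and recall matter.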

---

## **3. Summarization**
   - **LLM via API**
     - When summarizing **general-purpose documents**.
     - When **different summary formats** (e.g., headlines, abstracts, bullet points) are needed.
     - Example: Summarizing **news articles for a daily digest**.

   - **Local LLM (LLaMA, Mistral, Falcon)**
     - When **data privacy** is a concern (e.g., summarizing **internal company reports**).
     - When occasional summarization is needed **without API costs**.
     - Example: Summarizing **legal contracts on an air-gapped machine**.

   - **Fine-tune BART / T5 / Pegasus**
     - When summarization must follow **strict formatting rules** (e.g., **legal, financial**).
     - When **domain-specific adaptation** is required.
     - When **high-volume summarization** needs to be done **efficiently**.
     - Example: Summarizing **medical patient histories into structured notes**.
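
With the API or local LLM options, switching between the summary formats listed above is often just a matter of changing the instruction in the prompt. Below is a sketch with hypothetical templates. The exact wording and the 12-word headline limit are illustrative assumptions, and nothing here calls a model.

```python
# Hypothetical prompt templates, one per summary format mentioned above.
SUMMARY_TEMPLATES = {
    "headline": "Write a single headline of at most 12 words for this article:\n\n{doc}",
    "abstract": "Write a one-paragraph abstract (3-4 sentences) of this document:\n\n{doc}",
    "bullets": "Summarize this document as {n} bullet points, one line each:\n\n{doc}",
}


def build_summary_prompt(doc, fmt="abstract", n=5):
    """Fill in the template for the requested summary format."""
    if fmt not in SUMMARY_TEMPLATES:
        raise ValueError(f"unknown format {fmt!r}; choose from {sorted(SUMMARY_TEMPLATES)}")
    # str.format ignores unused keyword arguments, so n is harmless for
    # templates that have no {n} placeholder.
    return SUMMARY_TEMPLATES[fmt].format(doc=doc, n=n)


report = "Quarterly revenue rose 8% while support tickets fell by a third."
for fmt in ("headline", "abstract", "bullets"):
    print(build_summary_prompt(report, fmt, n=3))
    print("-" * 40)
```

A fine-tuned BART, T5, or Pegasus model, by contrast, tends to produce the format it was trained on. That is why fine-tuning suits cases where the format is strict and fixed.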

---

## **4. Text Generation / Completion**
   - **LLM via API**
     - When responses need to be **creative and diverse**.
     - When you want **low-effort, high-quality generation**.
     - Example: AI assistant generating **marketing copy**.

   - **Local LLM (LLaMA, Mistral, OpenChat)**
     - When **privacy-sensitive generation** is needed (e.g., **drafting internal legal memos**).
     - When you want **customized local text generation** with **prompt-based control**.
     - Example: Generating **internal knowledge base articles**.

   - **Fine-tune GPT / LLaMA / T5**
     - When text generation must **strictly adhere to a style or structure**.
     - When outputs must be **factual and controllable**.
     - When **cost or speed of inference** is a concern (a smaller fine-tuned model can be cheaper and faster to run than a large general-purpose LLM).
     - Example: Generating **automated financial reports**.

---

## **5. Question Answering (QA)**
   - **LLM via API**
     - When **answers are based on general knowledge** (e.g., answering FAQs).
     - When handling **open-ended questions**.
     - Example: Answering **customer support queries**.

   - **Local LLM (LLaMA, Mistral, Falcon)**
     - When answers must be generated **without internet access**.
     - When **document retrieval + LLM summarization** is needed **locally**.
     - Example: Answering **internal policy questions** based on company documents.

   - **Fine-tune BERT / T5 / RoBERTa**
     - When answers must come **strictly from specific documents**.
     - When high **precision and efficiency** are needed.
     - Example: Answering **questions based on medical guidelines**.
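
A fine-tuned BERT-style model answers "strictly from specific documents" because of how extractive QA works. For each token in the passage, the model outputs one score for being the answer's start and another for being its end, and the answer is the highest-scoring valid span. The sketch below shows only that final selection step, with made-up scores standing in for real model outputs. Real pipelines work on sub-word tokens and also exclude the question tokens.

```python
def best_answer_span(start_scores, end_scores, max_answer_len=10):
    """Return the (start, end) token indices maximizing start + end score.

    Only spans with start <= end and at most max_answer_len tokens are
    considered, so the answer is always a contiguous piece of the passage.
    """
    best_span, best_score = None, float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_answer_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best_span, best_score = (i, j), score
    return best_span, best_score


# Passage tokens and hypothetical scores for the question "How long do refunds take?"
tokens = ["refunds", "are", "issued", "within", "14", "days", "of", "purchase"]
start_scores = [0.2, -0.5, -0.3, 1.1, 3.0, 0.4, -1.0, -0.2]
end_scores = [-0.4, -0.6, 0.1, -0.2, 1.2, 2.8, -0.9, 0.6]

(span_start, span_end), score = best_answer_span(start_scores, end_scores)
print(" ".join(tokens[span_start:span_end + 1]))
```

Because the answer is always a span copied from the passage, the model cannot state anything the document does not contain, which is the property the bullet above asks for. The trade-off is that it cannot rephrase or combine information from several places either.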

---

## **6. Retrieval-Augmented Generation (RAG) / Search-Augmented NLP**
   - **LLM via API + Embeddings Search**
     - When external **retrieval and summarization** are needed.
     - Example: Building a **customer support chatbot** that references a knowledge base.

   - **Local LLM (LLaMA, Mistral + FAISS / Chroma)**
     - When **privacy-sensitive document retrieval and response generation** is needed **locally**.
     - Example: Internal **legal or compliance search assistant**.

   - **Fine-tune a Hybrid Model (BERT + RAG)**
     - When **precise retrieval** and **response generation** are both needed.
     - Example: Automating **medical Q&A** from patient records.
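
The retrieve-then-generate pattern behind RAG can be sketched in a few lines. This is a toy version under stated assumptions. A bag-of-words count vector stands in for a real embedding model, and a Python list stands in for a vector store such as FAISS or Chroma. The documents and prompt wording are hypothetical. The final prompt would then be sent to an API or local LLM, which is not shown.

```python
import math
import re
from collections import Counter

# Tiny hypothetical knowledge base.
DOCS = [
    "To reset your password, click 'Forgot password' on the login page.",
    "Refunds are issued within 14 days of purchase.",
    "Our support line is open Monday to Friday, 9am to 5pm.",
]


def embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_rag_prompt(query, docs, k=1):
    """Augment the question with the retrieved context before generation."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


print(build_rag_prompt("How do I reset my password?", DOCS))
```

Swapping in a learned embedding model mainly improves retrieval for paraphrases. For example, "I forgot my login" shares no words with the password document but would land near it in embedding space.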

---

## **Decision Table: Local LLM vs API vs Fine-Tuned Model**

| Task | LLM via API | Local LLM (LLaMA, Mistral) | Fine-Tune Smaller Model |
|------|------------|-------------------|--------------------|
| **Text Classification** | Few-shot classification | Private, occasional classification | Domain-specific, high accuracy |
| **Named Entity Recognition (NER)** | General entities | Private, flexible extraction | Custom entities, high precision |
| **Summarization** | General text summaries | Private document summaries | Strict format, domain-specific |
| **Text Generation** | Creative text | Private, structured output | Factual, domain-controlled generation |
| **Question Answering** | Open-domain QA | Local knowledge retrieval | High-precision, document-restricted QA |
| **Search-Augmented QA** | External document retrieval | Local document search + LLM | Precision-driven document QA |
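
The guidelines above can be condensed into a rough rule of thumb. The function below is a hypothetical simplification, not a rule from any library or from this course. Real decisions also weigh budget, accuracy targets, and engineering effort, and the 1,000-example threshold is an arbitrary placeholder for "enough labeled data to fine-tune".

```python
def choose_approach(private_data, num_labeled_examples, need_high_accuracy,
                    need_cheap_fast_inference, min_examples=1000):
    """Hypothetical rule of thumb mirroring the decision table above.

    min_examples is an arbitrary illustrative threshold, not an established cutoff.
    """
    has_enough_data = num_labeled_examples >= min_examples
    if has_enough_data and (need_high_accuracy or need_cheap_fast_inference):
        # Labeled domain data plus accuracy, cost, or latency needs favor fine-tuning.
        return "fine-tune a smaller model"
    if private_data:
        # Little labeled data, but the data must stay in-house: prompt a self-hosted model.
        return "local LLM"
    # No strong constraints: prompting a hosted model is usually quickest.
    return "LLM via API"


# Support tickets, no labels yet, nothing sensitive.
print(choose_approach(False, 0, False, False))
# Internal legal documents, only a few labeled examples.
print(choose_approach(True, 50, False, False))
# Industry-specific sentiment analysis with plenty of labels, at scale.
print(choose_approach(False, 20000, True, True))
```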



---






# Getting Started with Natural Language Processing



In the second half of this course, we're going to focus mostly on the applications of natural language processing (NLP) and less on the details of the neural network models (transformers) that power these applications. For those of you who like the nuts and bolts, we will delve into some of the details of transformers in Lesson 9 - Transformer Details.

You'll be reading about various applications in Chapter 1 of our NLP textbook: "Natural Language Processing with Transformers."