# An Introduction to Large Language Models (LLMs) for Everyone

Check: <https://jalammar.github.io/illustrated-transformer/>

## A Gentle Welcome to the World of LLMs

Imagine a computer that can read, write, explain, and even appear to reason like a human: not perfectly, but surprisingly well. That’s what Large Language Models (LLMs) do.

They can:

- Power chatbots like ChatGPT, Gemini, and Claude
- Summarize research papers
- Write code
- Explain science concepts
- Help scientists discover new materials or drugs

But how do they work? 

> A language model is a computer program that predicts the next word in a sentence. Through this simple mechanism, it can even *simulate* reasoning.

Example:

`"The cat sat on the ____"`

A language model would guess: "mat", "couch", "windowsill", etc.

But it doesn’t just guess — it learns from millions of books, articles, and websites.

:::{exercise}
Try to predict the next word in each sentence:

- "The sun rises in the..." → ?
- "Water boils at..." → ?
- "DNA stands for..." → ?
- "The E=mc² formula was created by..." → ?
- "The E=mc² + AI formula was created by..." → ?

:::

## The Building Blocks - How do LLMs work?

At their core, LLMs are prediction machines: they predict the next word in a sequence. For example, if you give an LLM the sentence `"The cat sat on the..."`, it calculates a probability for every word that could come next and picks a likely one, such as `"mat"`.

This seemingly simple task, when performed on a massive scale, allows LLMs to do amazing things like writing essays, translating languages, and even writing computer code.
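To make the prediction step concrete, here is a minimal sketch in Python. The candidate words and their scores are invented for illustration; a real model computes a score for every token in its vocabulary. The `softmax` helper turns raw scores into probabilities that sum to 1.

```python
import math

def softmax(scores):
    """Turn a dict of raw scores into probabilities that sum to 1."""
    highest = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - highest) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical scores for words that could follow "The cat sat on the".
scores = {"mat": 3.0, "couch": 2.0, "windowsill": 1.0, "moon": -1.0}

probabilities = softmax(scores)
next_word = max(probabilities, key=probabilities.get)

for word, p in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{word:>10}: {p:.2f}")
print("Most likely next word:", next_word)
```

In practice, models often sample from these probabilities instead of always taking the top word, which is why the same prompt can produce different answers.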

### The Transformer: The Engine of LLMs

The magic behind modern LLMs lies in a groundbreaking architecture called the **Transformer**. Introduced in the 2017 paper "Attention Is All You Need" [1], the Transformer revolutionized how we process language. Unlike older models that processed words one by one, the Transformer can look at an entire sentence at once, allowing it to understand the context and relationships between words much more effectively.

**Key Components of a Transformer:**

*   **Embedding:** First, the text is broken down into smaller units called "tokens" (words or parts of words). Each token is then converted into a numerical vector, called an embedding, that captures its meaning.
*   **Positional Encoding:** To understand the order of words, the Transformer adds a special vector to each embedding that indicates its position in the sequence.
*   **Attention Mechanism:** This is the core innovation of the Transformer. It allows the model to weigh the importance of different words in the input when producing an output. For example, in the sentence, `"The robot picked up the ball because it was heavy"`, the attention mechanism helps the model understand that `"it"` refers to the `"ball"` and not the `"robot"`.
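The attention step can be sketched in a few lines of plain Python. This is a simplified illustration, not a full Transformer layer: the two-dimensional vectors are made up, and the same vectors are used as queries, keys, and values, whereas a real model first passes the embeddings through learned projection matrices.

```python
import math

def softmax(values):
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention over a short sequence of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Made-up embeddings: "it" is deliberately placed close to "ball".
tokens = ["robot", "ball", "it"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.9]]

outputs, weights = attention(vectors, vectors, vectors)
for token, w in zip(tokens, weights[2]):
    print(f'"it" attends to "{token}" with weight {w:.2f}')
```

Because the toy vector for "it" points in nearly the same direction as the one for "ball", the attention weight on "ball" comes out higher than the weight on "robot".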

### Visualization: The Attention Mechanism

Imagine you're reading a sentence. As you read each word, you're not just looking at that word in isolation. You're constantly making connections to other words in the sentence to understand the overall meaning. The attention mechanism works in a similar way, creating a network of connections between words.

![Attention Mechanism Visualization](https://miro.medium.com/v2/resize:fit:1400/1*--_t3_x5Z7L4U2X4Wv2a2w.gif)
*A simplified animation showing how attention might work. When processing the word "it", the model pays more "attention" to "ball".*

## LLM = Large + Language + Model
- **Large:** Trained on trillions of words, far more than a person could read in many lifetimes
- **Language:** Works with human language (English, Spanish, code, etc.)
- **Model:** A mathematical system that makes predictions

| Model | Parameters | Approx. Training Data |
|------|------------------------|------------------------|
| GPT-3 | 175 billion | ~300 billion tokens |
| Llama 3 | 8 or 70 billion | 15 trillion+ tokens |
| GPT-4 | ~1.8 trillion (unconfirmed estimate) | Not public |

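**Parameters** are the adjustable numbers (think of them as "memory cells") that store what the model learned during training.

As a rule of thumb: more parameters → more capacity to store knowledge → better predictions, provided there is enough good training data.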
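To get a feel for what a parameter count means in practice, the sketch below estimates how much memory is needed just to store the parameters. The 7-billion-parameter model is hypothetical, and the estimate ignores everything else a running model needs. It assumes each parameter is stored as a 16-bit (2-byte) or 32-bit (4-byte) number.

```python
def parameter_memory_gb(num_parameters, bytes_per_parameter=2):
    """Memory needed to store the parameters alone, in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

# Hypothetical model size, chosen only for illustration.
hypothetical_parameters = 7_000_000_000

print(parameter_memory_gb(hypothetical_parameters))                         # 16-bit storage
print(parameter_memory_gb(hypothetical_parameters, bytes_per_parameter=4))  # 32-bit storage
```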

## The Cutting Edge - What's New with LLMs?

The field of LLMs is moving at an incredible pace. Here are two of the most exciting recent developments:

### Mixture of Experts (MoE): The Power of Specialization

Imagine a team of experts. Instead of one person trying to know everything, you have specialists for different topics. A Mixture of Experts (MoE) model works on a similar principle. It's a neural network architecture that has multiple "expert" sub-networks, each specializing in different parts of the input data. [2]

A "gating network" decides which expert (or combination of experts) is best suited to handle a particular input. This makes the model much more efficient, as only a fraction of the model is used for any given task. This allows for the creation of much larger and more powerful models without a massive increase in computational cost.
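The routing idea can be sketched as follows. Everything here is a toy assumption: the "experts" are simple scaling functions rather than neural networks, and the gate scores are fixed numbers, whereas in a real MoE layer a learned gating network computes them from the input.

```python
import math

def softmax(values):
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    """Stand-in expert: a real expert is a small neural network."""
    return lambda x: [scale * v for v in x]

experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]

def moe_layer(x, gate_scores, top_k=2):
    """Send input x to the top_k experts and mix their outputs."""
    # Indices of the experts with the highest gate scores.
    chosen = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)[:top_k]
    # Normalize the gate weights over the chosen experts only.
    weights = softmax([gate_scores[i] for i in chosen])
    output = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        expert_out = experts[i](x)  # only the chosen experts are evaluated
        output = [o + w * e for o, e in zip(output, expert_out)]
    return output, chosen

# Hypothetical gate scores for one input token.
output, chosen = moe_layer([1.0, -1.0], gate_scores=[0.1, 2.0, 0.3, 1.0])
print("Experts used:", chosen)
print("Mixed output:", output)
```

Only two of the four experts run for this input, which is where the efficiency gain comes from: the model can hold many experts while paying the compute cost of just a few per token.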

#### Visualization: Mixture of Experts

Think of a large company with different departments (experts) like marketing, finance, and engineering. When a new project comes in, the manager (gating network) directs it to the most relevant department.

![MoE Diagram](https://huggingface.co/blog/assets/moe/moe.png)
*Image from Hugging Face, showing how an input token is routed by the gating network to specific experts.*

### Reasoning: Can LLMs "Think"?

A major area of research is improving the reasoning abilities of LLMs. While they are excellent at recognizing patterns in data, they can struggle with tasks that require genuine logical deduction. Researchers are developing techniques to train LLMs to perform multi-step reasoning, which improves their performance on tasks like math problems and programming.

However, it's important to note that the extent of their reasoning abilities is still a topic of debate. Some studies suggest that what appears to be reasoning might actually be a sophisticated form of pattern matching based on their training data. Recent research has shown that while these models can generate detailed "thinking processes," their accuracy can drop significantly as problems become more complex. 

#### Self-Consistency & Verification
New models don’t just give one answer. They check their work:

1.  They generate multiple possible answers
2.  Then they pick the most consistent one

Example: "What is the pH of pure water?"

- Model says: "7"
- It checks: "Yes, neutral, matches known science"
- Confidence: High

This is crucial for scientific accuracy.
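The "generate several answers, keep the most consistent one" idea can be sketched as a simple majority vote. The sampled answers below are invented for illustration; in a real setup they would come from asking the model the same question several times.

```python
from collections import Counter

def self_consistent_answer(answers):
    """Return the most common answer and the share of samples that agree with it."""
    counts = Counter(answers)
    best, votes = counts.most_common(1)[0]
    return best, votes / len(answers)

# Hypothetical answers to "What is the pH of pure water?" from five runs.
sampled_answers = ["7", "7", "7.0", "7", "6"]

# Normalize so that "7" and "7.0" count as the same answer.
normalized = [str(float(a)) for a in sampled_answers]

answer, agreement = self_consistent_answer(normalized)
print(f"Answer: {answer} (agreement: {agreement:.0%})")
```

High agreement is not proof of correctness, since a model can be consistently wrong, but low agreement is a useful warning sign.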

### **Simple Exercise 2: Testing Reasoning**

Try giving an LLM a simple logic puzzle. You can use an online LLM interface for this.

> **Prompt:** "If a plane crashes on the border between the United States and Canada, where do they bury the survivors?"

An LLM with good reasoning abilities should be able to identify the trick in the question (you don't bury survivors). This can be a fun way to see how these models handle a bit of wordplay and logic. Note down its response. Does it get it right? How does it explain its answer?

Another example: "If a car travels at 60 km/h for 2 hours, how far does it go?"

Direct answer: "120 km" (no steps shown)

Chain-of-Thought answer:

- Step 1: Speed = 60 km/h
- Step 2: Time = 2 hours
- Step 3: Distance = Speed × Time
- Step 4: 60 × 2 = 120 km

Final answer: 120 km
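One common way to get step-by-step answers is simply to ask for them in the prompt. The sketch below builds both kinds of prompt for the car question and checks the arithmetic. The `build_prompt` helper and the exact instruction wording are illustrative assumptions, not a fixed recipe.

```python
def build_prompt(question, chain_of_thought=False):
    """Return a prompt string, optionally asking for step-by-step reasoning."""
    if chain_of_thought:
        return f"{question}\nLet's think step by step, then give the final answer."
    return f"{question}\nGive only the final answer."

question = "If a car travels 60 km/h for 2 hours, how far does it go?"

print(build_prompt(question))
print()
print(build_prompt(question, chain_of_thought=True))

# The calculation the steps walk through: distance = speed × time.
speed_kmh = 60
time_h = 2
distance_km = speed_kmh * time_h
print(f"\nExpected distance: {distance_km} km")
```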

## LLMs in Action - Applications in Basic Sciences and Research

LLMs are not just for chatbots and creative writing. They are rapidly becoming powerful tools for scientists and researchers. Here are a few examples:

*   **Accelerating Literature Reviews:** Scientists can use LLMs to quickly scan and summarize vast amounts of research papers, helping them stay up-to-date with the latest findings in their field.
*   **Generating Hypotheses:** By analyzing existing data, LLMs can identify patterns and connections that might not be obvious to human researchers, leading to new and innovative hypotheses.
*   **Data Analysis and Interpretation:** In fields like genomics and climate science that generate massive datasets, LLMs can help researchers identify patterns, anomalies, and correlations.
*   **Drug Discovery and Protein Engineering:** LLMs are being used to accelerate drug discovery and even design new protein sequences with specific functions.
*   **Code Generation:** Researchers can use LLMs to generate code for data analysis and visualization, saving time and effort.

### Example in Action: AlphaFold

While not a traditional text-based LLM, DeepMind's AlphaFold uses similar deep learning principles to predict the 3D structure of proteins from their amino acid sequence. This has been a monumental breakthrough in biology and medicine.

![AlphaFold Protein Prediction](https://www.nature.com/immersive/d41586-022-03535-7/assets/M22A2pGZJ7/2022-11-21-protein-folding-1920x1080.gif)
*An animation from Nature showing how a protein folds into its complex 3D shape, a problem now largely solved by AI.*

## Limitations & Ethics

- Hallucinations: LLMs can make up facts.

- Bias: LLMs reflect biases present in their training data.

- Cost: Training large models requires huge computational resources.

Exercise 5: Find the projected electricity consumption of LLMs and compare it with humanity's current capacity to produce electricity.

## "Current" offering
<https://epoch.ai/data-insights/llm-apis-accuracy-runtime-tradeoff>
 

## Final Exercises

Now it's your turn to explore the world of LLMs! Here are a few exercises to get you started.

#### **Exercise 1: Become a Prompt Engineer**

The way you phrase your request to an LLM (the "prompt") can have a big impact on the quality of the response. Experiment with different ways of asking the same question. 

**Task:** Pick a scientific concept (e.g., photosynthesis, black holes, gene editing). Ask an LLM to explain it using at least three different prompts.

1.  **Simple Prompt:** `"Tell me about photosynthesis."`
2.  **Role-playing Prompt:** `"You are a science teacher. Explain photosynthesis to a 10-year-old."`
3.  **Detailed Prompt:** `"Provide a detailed, scientific explanation of the chemical reactions, inputs, and outputs involved in photosynthesis for a university-level biology student."`

In the cell below, write down your prompts and compare the outputs. What are the key differences?

    # Use this cell to write your observations for Exercise 1.
    # You can change it to a Markdown cell if you prefer.

    prompt_1_output = "..."
    prompt_2_output = "..."
    prompt_3_output = "..."

    print("Observation: The role-playing prompt gave a much simpler analogy, while the detailed prompt included specific chemical formulas...")

#### **Exercise 2: Explore Different LLMs**

There are many different LLMs available to the public (e.g., Gemini, ChatGPT, Claude, Llama). 

**Task:** Try out a few different ones. Give them the same prompt and compare their responses. 

**Prompt idea:** `"Write a short story in the style of Isaac Asimov about a scientist who discovers that their lab's AI has become self-aware."`

Do you notice any differences in their style, tone, creativity, or accuracy?

#### **Exercise 3: Investigate a Scientific Application**

**Task:** Choose a scientific field that interests you (e.g., climate science, neuroscience, archaeology, materials science) and do a quick search to see how LLMs are being used in that area. 

**Search terms to try:**
*   `"large language models in climate science"`
*   `"AI for drug discovery"`
*   `"using LLMs to analyze historical texts"`

In the cell below, write a short summary (3-4 sentences) of the most interesting application you find. Include a link to the article or paper.

    # Write your summary for Exercise 3 here.
    field = "Neuroscience"
    application_summary = "I found that researchers are using LLMs to analyze patient interviews to identify early signs of neurodegenerative diseases like Alzheimer's. The models can detect subtle changes in language patterns that are not easily noticeable by humans."
    link = "https://www.example.com/link-to-article"

    print(f"Field: {field}\nSummary: {application_summary}\nLink: {link}")

#### **Exercise 4: Try a Coding Challenge (Optional)**

If you have some programming experience (Python is great for this), try using an LLM to help you with a coding challenge.

**Task:** Ask an LLM to write a Python function for a simple task. For example:
`"Write a Python function that takes a list of numbers and returns a new list containing only the prime numbers."`

Then, copy the code into the cell below and run it to see if it works. Can you ask the LLM to add comments to the code to explain how it works?

    # Paste the Python code from the LLM here to test it.

    # Example code that an LLM might generate:
    def is_prime(n):
        """Checks if a number is prime."""
        if n <= 1:
            return False
        for i in range(2, int(n**0.5) + 1):
            if n % i == 0:
                return False
        return True

    def filter_primes(numbers):
        """Filters a list of numbers to return only the primes."""
        return [num for num in numbers if is_prime(num)]

    # Test the function
    my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
    prime_numbers = filter_primes(my_list)
    print(f"Original list: {my_list}")
    print(f"Prime numbers: {prime_numbers}")

#### **Exercise 5: Think Critically**

As you use LLMs, it's important to think critically about the information they provide. They are trained on data from the internet, which can contain biases and inaccuracies. They can also "hallucinate" or make up facts.

**Task:** Ask an LLM a question about a very recent event (something that happened in the last 24 hours) or a very niche, specific topic. 

1.  What is its response?
2.  Can you verify the information using a reliable source (like a major news website or a scientific journal)?
3.  Does the LLM cite its sources? 

This exercise highlights the importance of always double-checking important information from LLMs.

***
### References

1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In *Advances in neural information processing systems* (pp. 5998-6008).
2. Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., & Dean, J. (2017). Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. *arXiv preprint arXiv:1701.06538*.
3. Park, P. S., Goldstein, J. A., O'Gara, A., Chen, M., & Hendrycks, D. (2023). AI Deception: A Survey of Examples, Risks, and Potential Solutions. *arXiv preprint arXiv:2308.14752*.

## Some useful AI tools
- NotebookLM
- Local models
- Other LLMs: Claude, Gemini (UNAL account)
- Agents
- Google AI co-scientist