<a href="https://colab.research.google.com/github/jesusvillota/CSS_DataScience_2025/blob/main/Session3/3_5_LLM_Download_(Extra).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div style="max-width: 880px; margin: 20px auto 22px; padding: 0px; border-radius: 18px; border: 1px solid #e5e7eb; background: linear-gradient(180deg, #ffffff 0%, #f9fafb 100%); box-shadow: 0 8px 26px rgba(0,0,0,0.06); overflow: hidden;">

  <!-- Banner Header -->
  <div style="padding: 34px 32px 14px; text-align: center; line-height: 1.38;">
    <div style="font-size: 13px; letter-spacing: 0.14em; text-transform: uppercase; color: #6b7280; font-weight: bold; margin-bottom: 5px;">
      Session #3
    </div>
    <div style="font-size: 29px; font-weight: 800; color: #14276c; margin-bottom: 4px;">
      LLMs
    </div>
    <div style="font-size: 26px; font-weight: 800; color: #14276c; margin-bottom: 4px;">
      Extra: Downloading LLMs locally
    </div>
    <div style="font-size: 16.5px; color: #374151; font-style: italic; margin-bottom: 0;">
      Using Textual Data in Empirical Monetary Economics
    </div>
  </div>

  <!-- Logo Section -->
  <div style="background: none; text-align: center; margin: 30px 0 10px;">
    <img src="https://www.cemfi.es/images/Logo-Azul.png" alt="CEMFI Logo" style="width: 158px; filter: drop-shadow(0 2px 12px rgba(56,84,156,0.05)); margin-bottom: 0;">
  </div>

  <!-- Name -->
  <div style="font-family: 'Times New Roman', Times, serif; color: #38549c; text-align: center; font-size: 1.22em; font-weight: bold; margin-bottom: 0px;">
    Jesus Villota Miranda © 2025
  </div>

  <!-- Contact info -->
  <div style="font-family: 'Times New Roman', Times, serif; color: #38549c; text-align: center; font-size: 1em; margin-top: 7px; margin-bottom: 20px;">
    <a href="mailto:jesus.villota@cemfi.edu.es" style="color: #38549c; text-decoration: none; margin-right:8px;" title="Email">
      <!-- <img src="https://cdn-icons-png.flaticon.com/512/11679/11679732.png" alt="Email" style="width:18px; vertical-align:middle; margin-right:5px;"> -->
      jesus.villota@cemfi.edu.es
    </a>
    <span style="color:#9fa7bd;">|</span>
    <a href="https://www.linkedin.com/in/jesusvillotamiranda/" target="_blank" style="color: #38549c; text-decoration: none; margin-left:7px;" title="LinkedIn">
      <!-- <img src="https://1.bp.blogspot.com/-onvhHUdW1Us/YI52e9j4eKI/AAAAAAAAE4c/6s9wzOpIDYcAo4YmTX1Qg51OlwMFmilFACLcBGAsYHQ/s1600/Logo%2BLinkedin.png" alt="LinkedIn" style="width:17px; vertical-align:middle; margin-right:5px;"> -->
      LinkedIn
    </a>
  </div>
</div>

In [5]:
running_in_colab = False

Even though there is an "Open in Colab" option, this notebook is intended to be run locally on your computer, because the goal is to download the model files to your own machine and run them there.

## Why run LLMs locally?

Running small Large Language Models (LLMs) on your own machine is great for:
- **Privacy**: your data stays on device.
- **Cost control**: no API bills while experimenting.
- **Offline/edge use**: works without internet.
- **Reproducibility & customization**: full control over versions and files.

In this short, hands-on lesson you will:
- Download a compact, CPU-friendly model: Google's `gemma-3-270m` from Hugging Face.
- Locate the downloaded files on disk using Terminal and Finder.
- Understand the typical files in a model repository and what they do.

**Hugging Face model card**: https://huggingface.co/google/gemma-3-270m

### Prerequisites
- Python installed (3.9+ recommended) and a working internet connection.
- The package `huggingface_hub` installed. If not, install from Terminal:
  - macOS/Linux: `pip install -U huggingface_hub`
- A few hundred MB of free disk space.
- Optional but recommended: a free Hugging Face account and CLI login (`huggingface-cli login`) if a model requires access/terms acceptance.
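The checklist above can be verified from Python before downloading anything. The following is a minimal sketch, not part of the original lesson: the helper name `check_prerequisites` and its return keys are made up for illustration, and it only checks that the package can be found, not that its version is recent.

```python
import importlib.util
import shutil
import sys
from pathlib import Path


def check_prerequisites(min_python=(3, 9), package="huggingface_hub"):
    """Report whether Python is recent enough, the package is installed,
    and how much free disk space the home directory's drive has."""
    # Tuple comparison: (3, 11, ...) >= (3, 9) is True
    python_ok = sys.version_info >= min_python
    # find_spec returns None when a top-level package cannot be imported
    package_ok = importlib.util.find_spec(package) is not None
    # Free space on the drive that holds the home folder, in megabytes
    free_mb = shutil.disk_usage(Path.home()).free / (1024 ** 2)
    return {"python_ok": python_ok, "package_ok": package_ok, "free_mb": free_mb}


status = check_prerequisites()
print(f"Python version OK:         {status['python_ok']}")
print(f"huggingface_hub installed: {status['package_ok']}")
print(f"Free disk space:           {status['free_mb']:.0f} MB")
```

If `package_ok` comes back `False`, run the `pip install -U huggingface_hub` command listed above and try again.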

**Note**: `gemma-3-270m` is intentionally small so it downloads fast and runs on typical laptops. It's ideal for learning how local LLM tooling works. Some people call this type of model a Small Language Model (SLM), in contrast to the much larger models with many billions of parameters that are usually meant by Large Language Model (LLM).

In [6]:
if running_in_colab:
    ! pip install huggingface_hub

In [7]:
download_model = True

if download_model:
    # Download a small, CPU-friendly model locally using Hugging Face Hub
    # Note: Some models require accepting terms or logging in via `huggingface-cli login`.
    from huggingface_hub import snapshot_download

    model_id = "google/gemma-3-270m"

    # This downloads the full model repo to a local cache folder and returns the path
    local_dir = snapshot_download(model_id)

    print(f"Model downloaded at: {local_dir}")

Fetching 10 files:   0%|          | 0/10 [00:00<?, ?it/s]

Model downloaded at: /Users/jesusvillotamiranda/.cache/huggingface/hub/models--google--gemma-3-270m/snapshots/9b0cfec892e2bc2afd938c98eabe4e4a7b1e0ca1


## DeepSeek‑R1‑Distill‑Qwen‑1.5B — distilled, in plain English

### What is model distillation?

- A **smaller** "student" model is trained to mimic the outputs of a **large** "teacher" model.
- The student keeps most of the teacher's skill but is lighter, faster, and cheaper to run.
- **Why it matters**: near‑teacher quality without the hardware and cost of huge models.

### How it applies here

- DeepSeek‑R1‑Distill‑Qwen‑1.5B is a 1.5B‑parameter student built on the Qwen‑2.5 family and fine‑tuned on reasoning examples generated by DeepSeek‑R1, the teacher.
- You get R1‑style reasoning patterns (step‑by‑step, self‑checking) in a compact model that runs on typical laptops.

### The only details you really need

- **Size**: ~1.5B params → practical for local demos, teaching, and prototypes.
- **Context**: long context support helps with longer prompts/doc chunks.
- **Usage**: chat‑style prompting; ask it to "think step by step" for math/logic tasks.
- **Sampling**: temperature ~0.5–0.7 often yields clearer, less repetitive answers.
- **License**: permissive for classroom and projects (still review the model card).
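To make the sampling bullet concrete, here is a small sketch of what temperature does to next-token probabilities. It is a toy illustration in plain Python, not the code that `transformers` runs internally: the function name `softmax_with_temperature` and the example logits are invented for this demonstration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores (logits) into probabilities, rescaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens
toy_logits = [2.0, 1.0, 0.2]

for t in (0.6, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, temperature=t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, so the output is more focused; higher temperatures flatten the distribution, so the output is more varied.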


In summary, DeepSeek‑R1‑Distill‑Qwen‑1.5B brings R1‑style reasoning to a laptop‑friendly model. Below you'll download it from the Hub and locate it on disk for local use.

In [8]:
download_model = True

if download_model:
    from huggingface_hub import snapshot_download
    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
    local_dir = snapshot_download(model_id)
    print(f"Model downloaded at: {local_dir}")

Fetching 9 files:   0%|          | 0/9 [00:00<?, ?it/s]

Model downloaded at: /Users/jesusvillotamiranda/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-1.5B/snapshots/ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562


## Where did the model download to?

The function `snapshot_download` saves the full Hugging Face repository to your local cache. On macOS the default path is inside your user cache directory, for example:

```
/Users/<your-username>/.cache/huggingface/hub/models--<model>/snapshots/<commit-hash>
```

In my case:

```
/Users/jesusvillotamiranda/.cache/huggingface/hub/models--google--gemma-3-270m/snapshots/9b0cfec892e2bc2afd938c98eabe4e4a7b1e0ca1

/Users/jesusvillotamiranda/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-1.5B/snapshots/ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562
```

**Important**: Your exact path is printed by the download cells above. Use that path in the steps below.
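The folder naming in those paths follows a visible pattern: the `/` in the model ID becomes `--` and the result is prefixed with `models--`. The sketch below reproduces that pattern and looks for snapshot folders on disk. The helper names `cache_folder_name` and `find_snapshots` are made up for this example, and the default cache location is an assumption that matches the paths shown above; it can differ if environment variables such as `HF_HOME` are set.

```python
from pathlib import Path


def cache_folder_name(model_id):
    """Map 'google/gemma-3-270m' to 'models--google--gemma-3-270m'."""
    return "models--" + model_id.replace("/", "--")


def find_snapshots(model_id, cache_root=None):
    """Return the snapshot folders found for model_id (empty list if none)."""
    if cache_root is None:
        # Assumed default location, as in the paths printed above
        cache_root = Path.home() / ".cache" / "huggingface" / "hub"
    snapshots_dir = Path(cache_root) / cache_folder_name(model_id) / "snapshots"
    if not snapshots_dir.is_dir():
        return []
    # Each subfolder is named after a commit hash
    return sorted(p for p in snapshots_dir.iterdir() if p.is_dir())


for snapshot in find_snapshots("google/gemma-3-270m"):
    print(snapshot)
```

If nothing is printed, the model has not been downloaded yet or your cache lives somewhere else; in that case rely on the path returned by `snapshot_download`.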

### Option A: Use Terminal (macOS/Linux)
1. Open Terminal.
2. Change directory to your printed path. Example:
   ```bash
   cd /Users/yourname/.cache/huggingface/hub/models--google--gemma-3-270m/snapshots/<commit-hash>
   ```
3. List files with sizes:
   ```bash
   ls -lh
   ```

**Tip**: press Tab to auto-complete long paths. If you get "No such file or directory," double-check spaces and the exact hash.
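If you prefer to stay inside the notebook, the same listing can be produced from Python. This is a rough equivalent of `ls -lh`, written as a sketch: `human_size` and `list_files_with_sizes` are illustrative helper names, and the path in the usage line is a placeholder you would replace with your own printed path.

```python
from pathlib import Path


def human_size(num_bytes):
    """Format a byte count in the spirit of `ls -lh` (B, KB, MB, GB)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            return f"{size:.1f} {unit}"
        size /= 1024


def list_files_with_sizes(folder):
    """Return (file name, size in bytes) pairs for the files in folder."""
    folder = Path(folder).expanduser()
    rows = []
    for path in sorted(folder.iterdir()):
        # is_file() and stat() follow symlinks, so linked files report
        # the size of the file they point to
        if path.is_file():
            rows.append((path.name, path.stat().st_size))
    return rows


# Placeholder: replace "." with the snapshot path printed by the download cell
for name, size in list_files_with_sizes("."):
    print(f"{human_size(size):>10}  {name}")
```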

### Option B: Use Finder (macOS) or File Explorer (Windows)
1. In Finder, press Cmd + Shift + G (Go to Folder...). In File Explorer on Windows, click the address bar instead.
2. Paste the path you saw printed (the snapshot folder).
3. Press Enter to open the folder with the downloaded model files.

Below are screenshots showing the Hugging Face cache structure and the model folder contents.

### Exploring your local Hugging Face cache

In your Hugging Face cache (`~/.cache/huggingface/hub`), you'll find all models you've downloaded. In my case:

![Finder view 1](images/finder_1.png)

Now, drilling down into the `gemma-3-270m` snapshot folder, you'll see the actual model artifacts:

![Finder view 2](images/finder_2.png)

Common files you'll encounter in model repos:

| File Name | Purpose/Description |
|-----------|---------------------|
| `config.json` | Model architecture and hyperparameters used by libraries like Transformers. |
| `tokenizer.json` / `tokenizer.model` / `tokenizer_config.json` | Vocabulary and rules for turning text into tokens. |
| `model.safetensors` or `pytorch_model.bin` | The neural network weights. `safetensors` is preferred for safety and speed. |
| `generation_config.json` | Default generation parameters (temperature, max_new_tokens, etc.) used by convenience APIs. |
| `README.md` and license files | Model card and licensing terms. Always review licensing before redistribution or commercial use. |

These files together are what frameworks load to run the model locally.
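The JSON files in the table are plain text, so you can inspect them yourself without loading the model. The sketch below checks which of the common files exist in a snapshot folder and reads `config.json`. The helper name `inspect_snapshot` and the file list are choices made for this example; not every repository contains every file, and the keys inside `config.json` vary by model.

```python
import json
from pathlib import Path

# Files from the table above; a given repo may contain only some of them
COMMON_FILES = [
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "generation_config.json",
    "model.safetensors",
]


def inspect_snapshot(snapshot_dir):
    """Return (which common files are present, parsed config.json or {})."""
    snapshot_dir = Path(snapshot_dir).expanduser()
    present = {name: (snapshot_dir / name).exists() for name in COMMON_FILES}
    config = {}
    config_path = snapshot_dir / "config.json"
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    return present, config


# Placeholder: replace "." with the snapshot path printed by the download cell
present, config = inspect_snapshot(".")
for name, found in present.items():
    print(f"{'found' if found else 'missing':>8}  {name}")
# .get avoids an error if a model's config does not use this key
print("model_type:", config.get("model_type", "not specified"))
```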

## A quick mental model: how all the pieces fit

- Hugging Face Hub is like Git for models: each model repo has versions (commits) and files.
- `snapshot_download` fetches a specific snapshot (commit) to your local cache.
- Loading a model later only needs the local path; no re-download unless you change versions.
- The `tokenizer` turns text into tokens; the model turns tokens into next-token probabilities; a generation loop repeatedly picks a next token from those probabilities to produce text.
- `config` and `generation_config` tell libraries how to reconstruct the model and generate text sensibly.
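The tokenizer, model, and generation loop described above can be mimicked with a toy example in a few lines. Everything here is invented for illustration: the six-word vocabulary, the hand-written probability table standing in for a neural network, and the greedy rule of always taking the most likely token. Real tokenizers split text into subword pieces, and real models condition on the whole sequence rather than only the last token.

```python
# Toy vocabulary: position in the list is the token id
VOCAB = ["<eos>", "economists", "study", "markets", "and", "prices"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}


def encode(text):
    """Toy tokenizer: split on spaces and look up each word's id."""
    return [TOKEN_TO_ID[word] for word in text.split()]


def decode(ids):
    """Inverse of encode: turn ids back into text."""
    return " ".join(VOCAB[i] for i in ids)


# Toy "model": next-token probabilities that depend only on the last token
NEXT_PROBS = {
    "economists": {"study": 0.9, "and": 0.1},
    "study": {"markets": 0.7, "prices": 0.3},
    "markets": {"and": 0.6, "<eos>": 0.4},
    "and": {"prices": 0.8, "markets": 0.2},
    "prices": {"<eos>": 1.0},
}


def next_token_probs(ids):
    return NEXT_PROBS[VOCAB[ids[-1]]]


def generate(prompt, max_new_tokens=10):
    """Greedy generation loop: append the most likely token until <eos>."""
    ids = encode(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(ids)
        best = max(probs, key=probs.get)
        if best == "<eos>":  # end-of-sequence token stops generation
            break
        ids.append(TOKEN_TO_ID[best])
    return decode(ids)


print(generate("economists"))
```

Replacing the greedy choice with random sampling from `probs` is what makes real generations differ from run to run.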

## Running the model with Transformers (CPU-friendly)

Once downloaded, you can load the model from the local path using `transformers`.

In [9]:
if running_in_colab:
    ! pip install transformers

Models differ in how they expect to be called through `transformers`. For example, some require a specific input format (such as a chat template) or additional preprocessing steps. Always refer to the model's documentation for details on how to use it effectively.

1. Go to the model page, and click on "Use this model"

![Model page](images/model_page.png)

2. Follow the instructions provided on the model page to integrate it into your application.

![Pipeline](images/pipeline.png)
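One input format you will meet often is the chat template, which turns a list of role-tagged messages into the single string the model was trained on. The sketch below shows the idea only. The function `render_chat` and the `<|user|>` and `<|assistant|>` markers are made up; each real model ships its own template with its tokenizer files, and you should apply it through the tokenizer rather than by hand.

```python
def render_chat(messages, add_generation_prompt=True):
    """Flatten role-tagged messages into one prompt string (toy format)."""
    parts = [f"<|{m['role']}|>\n{m['content']}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [{"role": "user", "content": "What do economists do?"}]
print(render_chat(messages))
```

A base model such as `gemma-3-270m` can be prompted with raw text, whereas a chat-tuned model generally behaves best when its own template is applied.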

In [10]:
# -- Set parameters for comparison --
PROMPT = "What do economists do?"
MAX_TOKENS = 400

In [11]:
# google--gemma-3-270m
from transformers import AutoTokenizer, AutoModelForCausalLM

gemma_model_path = "/Users/jesusvillotamiranda/.cache/huggingface/hub/models--google--gemma-3-270m/snapshots/9b0cfec892e2bc2afd938c98eabe4e4a7b1e0ca1"

tokenizer = AutoTokenizer.from_pretrained(gemma_model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(gemma_model_path, trust_remote_code=True)

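model.eval()  # inference mode (disables dropout)

inputs = tokenizer(PROMPT, return_tensors="pt")
# max_new_tokens counts only generated tokens, whereas max_length also counts
# the prompt; using it here keeps the comparison with the DeepSeek cell fair
outputs = model.generate(**inputs, max_new_tokens=MAX_TOKENS)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))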


What do economists do? The answer is usually nothing, unless there’s a big discrepancy with reality and you think your research might help.  The more complex this question is, the more it becomes a reason for you to stop research and find alternative sources of solutions. In one example, my colleague Dan Soper is working on a novel solution for a smallpox virus, using RNA viruses as tools to make RNA vaccines. But first he makes a great deal of changes, not just in the study of these viruses, but also in his approach to how he builds them. He starts from the idea that RNA viruses tend to be much more complex than protein viruses and that they are much more resistant to infections than they seem to be. But he still has to get all the molecular details right, and his approach has caused him a great deal of criticism.

For an experiment we call “an RNA experiment,” the first part of the work is very straightforward. We make a sample from a single cell in a laboratory, remove the nucleus, 

In [12]:
# deepseek-ai--DeepSeek-R1-Distill-Qwen-1.5B
from transformers import AutoTokenizer, AutoModelForCausalLM

deepseek_model = "/Users/jesusvillotamiranda/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-1.5B/snapshots/ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562"

tokenizer = AutoTokenizer.from_pretrained(deepseek_model)
model = AutoModelForCausalLM.from_pretrained(deepseek_model)
messages = [
    {"role": "user", "content": PROMPT},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=MAX_TOKENS)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


Okay, so I'm trying to figure out what economists do. I remember from my basic classes that economists study economics, but I'm not entirely sure what that entails. I think it's about how societies function, right? Like, things like markets, production, and resource allocation. But I'm not clear on the specific areas they study. 

Let me start by breaking down the term. "Economists" comes from the Greek word "ekmonos," which means the study of money, but I think in economics it's more about the allocation of resources. So, maybe economists look at how money is used in the economy. 

I remember something about microeconomics and macroeconomics. Microeconomics must be about individual markets, like how a company decides what to produce or how to allocate resources within a company. Macroeconomics, on the other hand, deals with the whole economy, including things like inflation, unemployment, and government policies. 

Economists use different tools and methods to study these areas. I thi