##### How to Download & Run Open Source Models in Google Colab

##### 1. Using Hugging Face transformers directly (recommended for most models)

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
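
Before loading a model this way, it can help to check whether its weights are likely to fit in the Colab runtime's memory. The sketch below is a back-of-the-envelope calculation only, not part of the transformers API: the helper name `estimate_weight_memory_gb` and the byte-size table are illustrative assumptions, and the figure of 6 billion parameters for GPT-J is an approximation taken from the model's name. It counts weights only and ignores activations and framework overhead.

```python
# Rough estimate of the memory needed just to hold model weights.
# Hypothetical helper for illustration; not part of transformers.

BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
    "int8": 1,
}


def estimate_weight_memory_gb(num_params: float, dtype: str = "float32") -> float:
    """Return the approximate size of the weights in gigabytes (10**9 bytes)."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# GPT-J 6B has roughly 6 billion parameters (approximation).
for dtype in ("float32", "float16", "int8"):
    print(dtype, round(estimate_weight_memory_gb(6e9, dtype), 1), "GB")
```

Under these assumptions, full-precision weights for a 6B-parameter model come to roughly 24 GB, which is why half precision or a pre-quantized variant is often the practical choice on smaller runtimes.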


##### 2. Using huggingface_hub snapshot_download (download once, then load from disk)

In [None]:
from huggingface_hub import snapshot_download

local_dir = snapshot_download("EleutherAI/gpt-j-6B")


In [None]:
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(local_dir)
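
When loading from a local directory, a quick sanity check that the download finished can save a confusing error later. The following is a minimal heuristic sketch, assuming a typical Hugging Face model layout with a `config.json` and at least one `*.safetensors` or `*.bin` weight file; the helper name `looks_like_model_dir` is hypothetical, and a directory that passes the check is not guaranteed to be complete.

```python
from pathlib import Path

# Weight file extensions commonly found in Hugging Face model repositories.
WEIGHT_PATTERNS = ("*.safetensors", "*.bin")


def looks_like_model_dir(path) -> bool:
    """Heuristic: the directory has a config.json and at least one weight file."""
    root = Path(path)
    if not (root / "config.json").is_file():
        return False
    return any(any(root.glob(pattern)) for pattern in WEIGHT_PATTERNS)
```

A call such as `looks_like_model_dir(local_dir)` before `from_pretrained(local_dir)` lets you decide whether to re-run the download.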


##### 3. Using git-lfs to clone the model repository

In [None]:
!git clone https://huggingface.co/EleutherAI/gpt-j-6B
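
If git-lfs is not active when the repository is cloned, the large weight files arrive as small text pointers rather than the real data. The sketch below shows one way to detect that, assuming the standard Git LFS pointer header (`version https://git-lfs.github.com/spec/v1`); the function names are hypothetical helpers, not part of git or huggingface_hub.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(repo_dir):
    """List files under repo_dir (outside .git) that are still LFS pointers."""
    return sorted(
        str(p)
        for p in Path(repo_dir).rglob("*")
        if p.is_file() and ".git" not in p.parts and is_lfs_pointer(p)
    )
```

An empty result from `find_lfs_pointers("gpt-j-6B")` suggests the weights were actually fetched; a non-empty one suggests git-lfs still needs to be set up before pulling again.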


##### 4. Downloading pre-quantized or converted weights from third parties

In [None]:
!wget https://example.com/path/to/quantized-model.bin
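
Files fetched from third parties are worth verifying before use. A minimal sketch, assuming the publisher lists a SHA-256 checksum alongside the file (many do, but not all); `sha256_of_file` and `verify_download` are hypothetical helper names built on the standard library's `hashlib`.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024) -> str:
    """Hash the file in chunks so large model files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256: str) -> bool:
    """Compare the file's SHA-256 with the checksum published by the source."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, `verify_download("quantized-model.bin", published_checksum)` returning `False` means the file is corrupted or differs from what the publisher listed.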


##### 5. Mount Google Drive and store models there

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Then download or copy model files to /content/drive/MyDrive/models/
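
One way to make a notebook work both with and without a mounted Drive is to choose the storage folder at run time. The sketch below assumes the default Colab mount point `/content/drive/MyDrive`; the helper name `pick_model_dir` and the `models` subfolder are illustrative choices, not Colab conventions.

```python
import os


def pick_model_dir(drive_root="/content/drive/MyDrive",
                   local_root="/content",
                   subdir="models") -> str:
    """Use Google Drive when it is mounted, otherwise fall back to local disk."""
    base = drive_root if os.path.isdir(drive_root) else local_root
    model_dir = os.path.join(base, subdir)
    os.makedirs(model_dir, exist_ok=True)  # create the folder on first use
    return model_dir
```

The returned path can then be passed as `cache_dir` to `from_pretrained` or as `local_dir` to `snapshot_download`, so repeated sessions reuse the files already stored there.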


Note for method 3: the clone requires git-lfs to be installed in Colab (`!apt install git-lfs`) and initialized (`!git lfs install`).

| Method                              | Best For                     | Pros                           | Cons                        |
| ----------------------------------- | ---------------------------- | ------------------------------ | --------------------------- |
| `transformers.from_pretrained`      | Most models, easy            | Simple, automatic download     | Slow startup, requires net  |
| `huggingface_hub.snapshot_download` | Large models, manual control | Cache once, offline use        | Manual work, disk space     |
| `git-lfs` clone                     | Full repo control            | Inspect files, customize       | Setup overhead, slower      |
| Manual wget/download (quantized)    | Optimized, small models      | Fast, low memory usage         | Compatibility, manual steps |
| Google Drive mount                  | Persistent storage           | No redownloading every session | Slower IO                   |


| Model Type                                                  | API Key Required? | Why?                                           |
| ----------------------------------------------------------- | ----------------- | ---------------------------------------------- |
| Hosted API services (OpenAI, Anthropic, Google Gemini)      | Yes               | Company runs model on cloud, paid/gated access |
| Gated private models on Hugging Face (some LLaMA 3, etc.)   | Yes               | Restricted access requires login/token         |
| Fully open-source downloadable models (GPT-J, Falcon, etc.) | No                | You run model locally or on your own server    |


| Access Type | Famous Examples | Description / Notes |
| --- | --- | --- |
| **Hosted API (API key needed)** | - OpenAI GPT-4, GPT-3<br>- Anthropic Claude<br>- Google Gemini (formerly Bard)<br>- Cohere<br>- AI21 Labs Jurassic | Models run on the provider's cloud servers; an API key is required; mostly paid or limited free tiers. |
| **Gated Models (access token needed)** | - Meta LLaMA 3<br>- Meta LLaMA 2<br>- Other Hugging Face repos that require accepting a license or requesting access | Access requires approval, a license agreement, or a token; often not fully open-source; weights cannot be downloaded without permission. |
| **Fully Open Source (no API key needed)** | - GPT-J 6B (EleutherAI)<br>- GPT-Neo 2.7B<br>- GPT-NeoX 20B<br>- Falcon 7B & 40B (TII)<br>- Dolly 2.0 (Databricks)<br>- Mistral 7B<br>- StableLM | Model weights and code are freely downloadable; you can run them locally or in Colab without API keys, as the license permits. |


##### 1️⃣ Hosted API Model Example: OpenAI GPT-4 / GPT-3

Step-by-step to use OpenAI GPT in Colab:

1. Get an OpenAI API key:
   - Sign up at OpenAI.
   - Get your API key from your dashboard.
2. Install the OpenAI Python package:

In [None]:
!pip install openai


In [None]:
from openai import OpenAI

# Replace with your actual OpenAI API key
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

# openai>=1.0 uses a client object; the older openai.ChatCompletion interface was removed
response = client.chat.completions.create(
    model="gpt-4",  # or "gpt-3.5-turbo"
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, who won the world cup in 2022?"}
    ]
)

# The response is an object, so the reply text is read with attribute access
print(response.choices[0].message.content)
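
Pasting a key directly into a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key has been placed in an environment variable beforehand; `read_secret` is a hypothetical helper, and `OPENAI_API_KEY` is the variable name the OpenAI client library looks for by default.

```python
import os


def read_secret(name: str, env=None) -> str:
    """Fetch a secret from the environment and fail loudly if it is missing."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running this cell")
    return value


# Example usage (commented out so the cell runs without a key configured):
# api_key = read_secret("OPENAI_API_KEY")
```

The same pattern applies to a Hugging Face token: read it once at the top of the notebook and pass the value to the login call instead of a string literal.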


##### 2️⃣ Gated Private Model Example: Meta LLaMA 3

Step-by-step to use a gated LLaMA model in Colab:

1. Request access:
   - Go to the Meta LLaMA page on Hugging Face.
   - Apply and agree to the terms to get access.
2. Get your Hugging Face token:
   - Sign in on Hugging Face.
   - Go to Settings > Access Tokens and create a new token.
3. Install the libraries, log in with the token, and load the model:

In [None]:
!pip install transformers huggingface_hub


In [None]:
from huggingface_hub import login

login(token="YOUR_HUGGINGFACE_TOKEN")


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

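# Example uses a gated LLaMA 2 repo; a LLaMA 3 repo such as
# "meta-llama/Meta-Llama-3-8B-Instruct" loads the same way once access is granted
model_id = "meta-llama/Llama-2-7b-chat-hf"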

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello LLaMA!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)  # cap the reply length explicitly
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


##### 3️⃣ Fully Open Source Model Example: GPT-J 6B (EleutherAI)

In [None]:
!pip install transformers torch


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "EleutherAI/gpt-j-6B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is the capital of France?", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
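
The `max_length` argument used above counts the prompt tokens as well as the generated ones, so a long prompt leaves less room for the answer. The arithmetic can be sketched as follows; `new_token_budget` is a hypothetical helper for illustration, the prompt length of 8 tokens is an assumed value rather than a measured one, and generation may still stop earlier if the model emits an end-of-sequence token.

```python
def new_token_budget(prompt_len: int, max_length: int) -> int:
    """Upper bound on generated tokens when max_length includes the prompt."""
    return max(0, max_length - prompt_len)


# Assumed example: a prompt that tokenizes to 8 tokens with max_length=50.
print(new_token_budget(prompt_len=8, max_length=50))  # 42
```

When the intent is to bound only the generated text, `generate` also accepts `max_new_tokens`, which is independent of the prompt length.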
