<a href="https://colab.research.google.com/github/edwardhongwang/2025_Spring_ES26_ES294-/blob/main/api_setup.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Welcome to the Harvard ES 26 and ES 294 Recitation 3



# **API Setup**

Required: an OpenAI account and a HuggingFace account


Expected completion time: 20 min


## March 10, 2025  <br> Edward Hong Wang


**TL;DR** In this notebook you will find instructions to set up your OpenAI and HuggingFace API keys for the tutorial.



# Overview

🔑 First, we need an OpenAI API key since we will use ChatGPT for many demos. Don't worry 😉, you can easily switch to another LLM provider if you prefer, but you'll need to work out how to set up their API keys yourself.


🤗 You will also need a HuggingFace token for working directly with the LLM weights and code (as opposed to calling a hosted API).

🕵️‍♂️ In addition to the HuggingFace token, you need to request access to the Llama models that we'll be using in the demo.


<img src="https://drive.google.com/uc?export=view&id=1agwSn9ZKa7y-QgQ4YrmpKLwv2c7IZY_w" alt="drawing" width="400"/>




# OpenAI API Key


**Step 1:** Obtain the OpenAI API key from the OpenAI platform website.

1. Visit the [OpenAI Platform Website](https://auth.openai.com/). Create an account.
2. Log in, go to `Dashboard` on the Top Menu, then `API keys` on the left sidebar. Finally, select `+Create new secret Key` from the top right corner.
3. Choose a name for your key (e.g., `GoogleColab`). The screenshots below show the screens you should see.

4. Click on `Create secret key`, and it will take you to your API key. You must copy this key and store it for the next step.

<img src="https://drive.google.com/uc?export=view&id=14Mu4cgk1Js8Az79gQrkmsaaLSC5LXgEJ" alt="drawing" width="800"/>

<img src="https://drive.google.com/uc?export=view&id=1oqGFLgYKG_VWHuJRF0Wt3VXOV6qg78on" alt="drawing" width="800"/>

**Step 2:** Add to the Google Colab Notebook Secrets

1. On a Colab notebook (e.g., this one), open the fourth menu of the left sidebar, which has a key icon 🔑.

2. Click on `+ Add new secret`. In the `Name` column, enter `OPENAI_API_KEY`. Then, in the `Value` column, paste the API key you copied earlier. Make sure the `Notebook access` toggle is enabled for this notebook.


<img src="https://drive.google.com/uc?export=view&id=1CjR_VZh_aMuQbZeRLxjrxpDfCQEbu7iB" alt="drawing" width="400"/>


## Test OpenAI API

Let's now test with a simple LLM call.

In [None]:
# First, install the required package
%pip install -q openai

In [None]:
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
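
The cell above only works inside Google Colab. As a minimal sketch, assuming you may also run the notebook locally, the hypothetical helper `read_secret` below (not part of the original notebook) first tries Colab's Secrets panel and then falls back to an environment variable of the same name.

```python
import os


def read_secret(name: str) -> str:
    """Return a secret from Colab's Secrets panel, else from the environment."""
    try:
        # This module is only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    value = None
    if userdata is not None:
        try:
            value = userdata.get(name)
        except Exception:
            # Colab raises if the secret is missing or notebook access is off.
            value = None

    # Fallback (an assumption of this sketch): a regular environment variable.
    if not value:
        value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Secret {name!r} not found in Colab Secrets or the environment."
        )
    return value
```

With this helper, the assignment above could be written as `os.environ["OPENAI_API_KEY"] = read_secret("OPENAI_API_KEY")`.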

In [None]:
import openai

client = openai.OpenAI()

client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a joke about Ai and Machine learning"}],
)

ChatCompletion(id='chatcmpl-B9CatFsFMMLbw8LLutlJd2Gh3MqbF', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Why did the robot go on a diet? \n\nBecause it had too many bytes! ', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None))], created=1741533063, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier='default', system_fingerprint='fp_06737a9306', usage=CompletionUsage(completion_tokens=19, prompt_tokens=16, total_tokens=35, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))
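
The raw `ChatCompletion` object printed above is hard to read. The sketch below shows one way to pull out just the reply text and the token count. The helper name `summarize_completion` is hypothetical, and `fake_response` is a stand-in that mirrors only the fields visible in the output above, so the example runs without an API call.

```python
from types import SimpleNamespace


def summarize_completion(response) -> dict:
    """Pull the reply text, finish reason, and token count out of a response."""
    choice = response.choices[0]
    return {
        "text": choice.message.content.strip(),
        "finish_reason": choice.finish_reason,
        "total_tokens": response.usage.total_tokens,
    }


# Stand-in object mirroring the fields printed above (not a real API call).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            message=SimpleNamespace(
                content="Why did the robot go on a diet? \n\nBecause it had too many bytes! "
            ),
        )
    ],
    usage=SimpleNamespace(total_tokens=35),
)

summary = summarize_completion(fake_response)
print(summary["text"])
```

In the notebook, you would assign the result of `client.chat.completions.create(...)` to a variable and pass that variable to the helper instead of `fake_response`.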

# Hugging Face

The previous LLM call happened entirely on a remote server hosted by OpenAI. Let's now download an entire LLM and use it to generate text ourselves.

**Step 1**: Obtain the HuggingFace token.

1. Go to the [HuggingFace portal](https://huggingface.co/). Log in (create an account if needed).
2. Click on your avatar in the top-right corner (small circle), which opens the settings sidebar on the right. Then go to `Access Tokens`.

<img src="https://drive.google.com/uc?export=view&id=1kb45vIZCK1EWVumMuYCMrDwsu7NMPUSz" alt="drawing" width="500"/>

3. Select `+Create new token` in the new menu. Then choose any name (e.g., `ColabTest`) and a permission level (write is OK), and click on `Create token`. Finally, copy the new token (copy button); you will add it to Colab in the next step, in the same way as the OpenAI key.

<img src="https://drive.google.com/uc?export=view&id=1Vfayo4auZ3Y6_UgNknMmylkBDstY4WKU" alt="drawing" width="500"/>


**Step 2:** Add to the Google Colab Notebook Secrets

1. On a Colab notebook (e.g., this one), open the fourth menu of the left sidebar, which has a key icon 🔑.

2. Click on `+ Add new secret`. In the `Name` column, enter `HF_TOKEN`. Then, in the `Value` column, paste the token you copied earlier.

<img src="https://drive.google.com/uc?export=view&id=1CjR_VZh_aMuQbZeRLxjrxpDfCQEbu7iB" alt="drawing" width="400"/>
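
To confirm that a secret was saved without printing it in full, one option is to display only its last few characters. This is a minimal sketch; the helper `mask_secret` and the sample string are hypothetical and not part of the original notebook.

```python
def mask_secret(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(value) <= visible:
        # Too short to reveal anything safely: mask everything.
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Placeholder string for illustration only, not a real token.
print(mask_secret("not-a-real-token-1234"))
```

In Colab you could call it as `mask_secret(userdata.get('HF_TOKEN'))` after importing `userdata` from `google.colab`.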










## Test HF Token

In [None]:
%pip install -q transformers[torch] # HF main module for LLMs

In [None]:
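import os

import torch
import transformers
from google.colab import userdata

# Expose the token to the HuggingFace libraries through the environment.
os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')

model_id = "tiiuae/falcon-rw-1b"

# Download the tokenizer and the model weights (half precision to save memory).
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)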

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

pipeline(
    "This is a test: ",
    max_length=20,
    eos_token_id=tokenizer.eos_token_id,
)

Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'This is a test: \nThis is a test: \nThis is a test: '}]

# Llama 3.2 Models on Hugging Face

This step is important because we want to be able to use the new Meta Llama 3.2 1B models.

🧵 In general, 1B models are among the smallest modern LLMs. Fine-tuning LLM weights directly for a downstream task requires substantial computational resources, so 1B models are more convenient for Google Colab.

**Step: Request Access**

1. Navigate to the [Meta Llama 3.2 1B Model Card page](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on HuggingFace.
2. Expand the LLAMA 3.2 COMMUNITY LICENSE AGREEMENT, fill out the form, and submit. (Note: you might need to confirm your HuggingFace email if you haven't done so already.)

<img src="https://drive.google.com/uc?export=view&id=1pD27fsgpNf1F1emVSlTyWLQjdiLlNW-Y" alt="drawing" width="450"/>

3. Done. Now you just need to wait for approval, which usually takes only a few minutes.



## Test Llama 3.2


In [None]:
import transformers
import torch

model_id = "meta-llama/Llama-3.2-1B-Instruct"

pipe = transformers.pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])


OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like meta-llama/Llama-3.2-1B-Instruct is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
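
An `OSError` like the one above can have several causes, and the message alone does not always say which one applies. As a hedged sketch, the hypothetical wrapper `load_with_hint` below calls any loading function you pass in and, on `OSError`, re-raises with a short checklist. The checklist items are assumptions about common causes in this tutorial's setup, not a diagnosis of the error shown.

```python
def load_with_hint(loader, model_id: str):
    """Call loader(); on OSError, re-raise with a troubleshooting checklist."""
    try:
        return loader()
    except OSError as err:
        hint = (
            f"Could not load {model_id!r}. Things to check:\n"
            "  1. HF_TOKEN is saved in Colab Secrets with notebook access enabled.\n"
            "  2. The access request on the model card has been approved.\n"
            "  3. The runtime has internet access.\n"
        )
        # Keep the original error attached as the cause for debugging.
        raise RuntimeError(hint) from err
```

For example, the pipeline above could be created with `pipe = load_with_hint(lambda: transformers.pipeline("text-generation", model=model_id, torch_dtype=torch.bfloat16, device_map="auto"), model_id)`.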

In [None]:
# MIT License
#
# @title Copyright (c) 2025 Mauricio Tec { display-mode: "form" }

# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.