## Programmatic Access to AI Models
In this notebook we'll learn how to access the free-tier Google AI models and how to download and run Hugging Face models. Before starting, change the runtime type to T4 GPU using the menu at the top right.

The first thing we'll do is switch off warnings to remove unnecessary noise when we're running the notebook.

In [4]:
from warnings import filterwarnings
filterwarnings('ignore')

### Google Gemini 1.5
We can use Google's Gemini 1.5 model on the free tier. Note that Google Colab also lets us set the runtime to either CPU or a T4 GPU. Having a GPU available is a powerful capability for running agents.

We need to use Google's `google.generativeai` library. This is already available in the Colab environment, so we don't need to install it. We do, however, need to store our Google API key in Colab's vault (the Secrets panel) and then use it to configure our access.

In [1]:
import google.generativeai as genai
from google.colab import userdata

genai.configure(api_key=userdata.get('GOOGLE_API_KEY'))
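
The cell above only works inside Colab, because `google.colab` cannot be imported anywhere else. Here is a minimal sketch of a fallback, assuming the key may instead be stored in an environment variable with the same name. `load_api_key` is a hypothetical helper written for this notebook, not part of any Google library.

```python
import os


def load_api_key(name="GOOGLE_API_KEY"):
    """Return the key from Colab's vault if possible, else from the environment."""
    try:
        # Only importable when running inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            key = userdata.get(name)
            if key:
                return key
        except Exception:
            # Assumption: any lookup failure (missing secret, access not
            # granted) should simply fall through to the environment.
            pass

    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} was not found in Colab's vault or the environment")
    return key
```

With this in place, the configuration call would become `genai.configure(api_key=load_api_key())`.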

Let's now set up a handle specifying the model we want to use. Google offers the smaller Gemma models and its full Gemini models. Gemini 1.5 is available in the free tier, and there is a range of more advanced Gemini 2.0 models that can be used with the paid service.

In [2]:
model = genai.GenerativeModel("gemini-1.5-flash")

We can now send Gemini a request.

In [3]:
response = model.generate_content("What is the gravity on the surface of Mars relative to Earth's gravity?")

We have our response so let's print it.

In [None]:
print(response.text)
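
Accessing `response.text` can raise a `ValueError` when the response contains no usable text, for example if the request was blocked. The sketch below guards against that case. `response_text` and its fallback string are hypothetical names chosen for illustration.

```python
def response_text(response, fallback="[no text returned]"):
    """Return response.text, or a fallback string if no text is available."""
    try:
        return response.text
    except ValueError:
        # Raised by the quick accessor when the response has no valid text part.
        return fallback
```

In the cell above, `print(response_text(response))` would then print either the model's answer or the fallback message instead of stopping with a traceback.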

### Hugging Face Qwen2.5

We can use the Hugging Face `transformers` library to directly download and run any of the models on Hugging Face. Qwen is a widely used family of generative AI models available there. Note that for this we need `HF_TOKEN` set up in Colab's vault.

Colab comes with the `transformers` library already installed, so we can directly import `pipeline`, which loads and runs a model, and use it to create a pipe for the Qwen2.5 model. We'll use the model with 1.5 billion parameters. We'll set the device to `cuda` so the model runs on the GPU.

In [None]:
from transformers import pipeline

pipe = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B", device="cuda")
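
The cell above hard-codes `device="cuda"`, which fails if the runtime was left on CPU. Here is a minimal sketch of a check that picks the device at run time, assuming PyTorch is the backend, as it is in Colab. `pick_device` is a hypothetical helper name.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # No PyTorch available, so there is no CUDA backend to use.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

The result can then be passed to the pipeline as `device=device` in place of the literal string.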

Note that we've set the device the model will run on. `cuda` is much faster than `cpu`! Now let's send in our prompt and display the response.

In [None]:
response = pipe("What is the gravity on Mars relative to that on Earth?")
print(response[0]['generated_text'])
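
By default, a text-generation pipeline returns the prompt followed by the model's continuation in `generated_text`. The sketch below shows one way to keep only the continuation. `continuation_only` is a hypothetical helper, and `sample_outputs` is a hand-written stand-in with the same shape as the pipeline's result, not real model output.

```python
def continuation_only(outputs, prompt):
    """Strip the echoed prompt from the first generated sequence."""
    text = outputs[0]["generated_text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


prompt = "What is the gravity on Mars relative to that on Earth?"
# Stand-in data for illustration only; a real call would be pipe(prompt).
sample_outputs = [{"generated_text": prompt + " (model continuation here)"}]
print(continuation_only(sample_outputs, prompt))
```

The pipeline call also accepts `return_full_text=False`, which has the same effect, and `max_new_tokens` to cap the length of the reply, for example `pipe(prompt, max_new_tokens=100, return_full_text=False)`.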