# Working with Open Source Gen AI Models within a Jupyter Notebook

This notebook explains how you can use Google Cloud features, including Vertex AI, Cloud Storage and compute resources such as GPUs, to develop with Generative AI models.

It walks through creating a GPU-enabled notebook instance within your project and running local inference using open-source models available on **HuggingFace**. A notebook such as this can be used as a virtualised environment for various development tasks, and the variety of machine configurations available, including enterprise-grade GPUs, lets you trade cost against concerns such as training performance.

<mark>Refer to the following guide to understand instance configuration costs: https://cloud.google.com/vertex-ai/pricing#notebooks. Please also make sure to stop or delete instances when not in use to reduce costs: these resources use a pay-as-you-go model, so you pay only for the time they are provisioned.</mark>

**Section 1** of this notebook can be run in this pre-provisioned notebook instance (which is configured without a GPU) to create the GPU-enabled notebook. Once it is created, you can open this user guide in the GPU-enabled notebook to run **Section 2**.

**Please run the following cell to set up the variables required for <mark>both</mark> Sections 1 and 2**


In [None]:
# Project Specific Variables
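# "!" runs a shell command; IPython returns the captured output lines as an SList
PROJECT_ID = !gcloud config get project
# .n joins the captured lines into a single newline-separated string
PROJECT_ID = PROJECT_ID.n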
SERVICE_ACCOUNT = "%s-consumer-sa@%s.iam.gserviceaccount.com"  % (PROJECT_ID,PROJECT_ID)
USER_GUIDE_BUCKET = "gs://gen-ai-%s-user-guide-bucket" % PROJECT_ID
GEN_AI_BUCKET = "gs://gen-ai-%s-bucket" % PROJECT_ID

# Section 1: Creating a GPU Enabled Vertex AI Workbench Notebook

## Setup and Pre-Requisites
The prerequisite variables are set here. Some are taken directly from the environment so that the notebook is deployed to the right place, whereas others can be changed to alter the instance configuration. Some common configuration parameters are listed below.

**Configuration:**
- INSTANCE_NAME = Name of the instance to be created
- MACHINE_TYPE = CPU and RAM configuration of the notebook: https://cloud.google.com/compute/docs/machine-resource
- ACCELERATOR_TYPE = Type of GPU. The preset type is the only one available in europe-west2; you can deploy to europe-west4 for additional GPU types: https://cloud.google.com/compute/docs/gpus
- ACCELERATOR_NO = Number of attached GPUs
- BOOT_DISK_SIZE = Size in GB of the attached persistent disk. You can increase this if you require additional storage.
- VM_IMAGE_NAME = Image to use. This determines which libraries and tools (Python version etc.) are installed in the notebook. The preset image contains PyTorch 2.0 with Python 3.10.
- INSTANCE_LOCATION = Zone in which the instance is created
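The create command later in this section takes both a zone (`INSTANCE_LOCATION`) and a region (`--subnet-region`), and the two need to agree. As a minimal sketch, assuming the usual Compute Engine convention that a zone name is the region name followed by a one-letter suffix, the region can be derived from the zone instead of being typed twice. `region_from_zone` and `SUBNET_REGION` are hypothetical names introduced here for illustration and are not part of any Google library.

```python
def region_from_zone(zone: str) -> str:
    """Return the region part of a zone name, e.g. 'europe-west2-b' -> 'europe-west2'."""
    # Split on the last hyphen: everything before it is treated as the region.
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"not a zone name: {zone!r}")
    return region


INSTANCE_LOCATION = "europe-west2-b"  # same value as in the configuration cell
SUBNET_REGION = region_from_zone(INSTANCE_LOCATION)  # hypothetical variable
print(SUBNET_REGION)
```

If you move the instance to another zone, a derived value like this keeps the subnet region in step, provided a matching subnet exists in that region.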

In [None]:
# Customisable Variables
INSTANCE_NAME = "gen-ai-gpu-notebook"
MACHINE_TYPE = "n1-standard-4"
ACCELERATOR_TYPE = "NVIDIA_TESLA_T4"
ACCELERATOR_NO = 1
BOOT_DISK_SIZE = 100
VM_IMAGE_NAME = "pytorch-2-0-gpu-notebooks-v20230925-debian-11-py310"
INSTANCE_LOCATION = "europe-west2-b"

### Selecting a Notebook Image
To choose the image used to create the notebook, edit the `VM_IMAGE_NAME` variable. For an explanation of the available pre-built images, refer to: https://cloud.google.com/vertex-ai/docs/workbench/user-managed/images. To get a full list of image names, run the following command in the terminal: `gcloud compute images list --project deeplearning-platform-release | grep notebooks`
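The list returned by that command can be long, so it may help to narrow it down in Python. The sketch below assumes, based only on the preset image name, that image names begin with the framework and end with a Python tag such as `py310`; other images may not follow this pattern. `filter_images` is a hypothetical helper, and the `names` list stands in for the command output (the second and third entries are made-up placeholders, not real images).

```python
def filter_images(names, framework, python_tag):
    """Keep notebook images whose name starts with `framework` and ends with `python_tag`."""
    return [
        name
        for name in names
        if "notebooks" in name
        and name.startswith(framework + "-")
        and name.endswith("-" + python_tag)
    ]


names = [
    "pytorch-2-0-gpu-notebooks-v20230925-debian-11-py310",  # preset image from this notebook
    "example-framework-notebooks-v00000000-py39",           # made-up placeholder
    "example-non-notebook-image",                           # made-up placeholder
]
print(filter_images(names, "pytorch", "py310"))
```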

## Use gcloud CLI to create a notebook instance

The following gcloud command creates a notebook instance; its configuration is determined by the command line flags. This can also be done through the user interface in the cloud console, but you would need to make sure the configuration parameters are set as detailed here.

Note: a "!" before a line runs it as a terminal command instead of Python.
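Within a `!` line, IPython also substitutes `$NAME` with the value of the Python variable `NAME` before handing the line to the shell. A rough plain-Python equivalent, shown here only as a sketch, is to build the argument list yourself and inspect it before running it. `build_create_command` is a hypothetical helper, and it covers only a subset of the flags used in this notebook (the network, metadata and service account flags are left out for brevity).

```python
import shlex


def build_create_command(name, machine_type, accelerator_type, accelerator_count,
                         boot_disk_size, location, image_name):
    """Return a `gcloud notebooks instances create` call as an argument list."""
    return [
        "gcloud", "notebooks", "instances", "create", name,
        f"--machine-type={machine_type}",
        f"--accelerator-type={accelerator_type}",
        f"--accelerator-core-count={accelerator_count}",
        "--install-gpu-driver",
        f"--boot-disk-size={boot_disk_size}",
        f"--location={location}",
        "--vm-image-project=deeplearning-platform-release",
        f"--vm-image-name={image_name}",
    ]


cmd = build_create_command(
    "gen-ai-gpu-notebook", "n1-standard-4", "NVIDIA_TESLA_T4", 1, 100,
    "europe-west2-b", "pytorch-2-0-gpu-notebooks-v20230925-debian-11-py310",
)
print(shlex.join(cmd))  # inspect the full command line before running it
# To execute it (requires the gcloud CLI): subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems if a value ever contains spaces or special characters.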

In [None]:
! gcloud notebooks instances create $INSTANCE_NAME \
--machine-type=$MACHINE_TYPE \
--accelerator-type=$ACCELERATOR_TYPE \
--accelerator-core-count=$ACCELERATOR_NO \
--install-gpu-driver \
--boot-disk-size=$BOOT_DISK_SIZE \
--location=$INSTANCE_LOCATION \
--vm-image-project="deeplearning-platform-release" \
--vm-image-name=$VM_IMAGE_NAME \
--network="gen-ai-vpc" \
--subnet="gen-ai-vpc-subnet" \
--subnet-region="europe-west2" \
--metadata="idle-timeout-seconds"="18000","startup-script-url"="$USER_GUIDE_BUCKET/userguide_files_copy.sh" \
--service-account=$SERVICE_ACCOUNT \
--post-startup-script="$USER_GUIDE_BUCKET/userguide_files_copy.sh"

## Delete Created Notebook

Once you are done using the notebook or no longer require it, you can run the following cell to delete the notebook instance you created above. Alternatively, go to the cloud console notebooks page [here](https://console.cloud.google.com/vertex-ai/workbench/user-managed?), select the checkbox for the relevant notebook and press delete.

If this fails due to the command format, run the first cell in this notebook again.

In [None]:
! gcloud notebooks instances delete $INSTANCE_NAME --location=$INSTANCE_LOCATION

# Section 2: Run Inference for Open Source Models from HuggingFace

## Install Required Libraries

This notebook image comes pre-configured by Google with several common dependencies so you can start working quickly. The diffusers and transformers libraries are developed by HuggingFace; the model architectures supported by transformers are listed here: https://github.com/huggingface/transformers#model-architectures. To get the full list of installed Python libraries, run "`pip freeze`" in the terminal.
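Before installing, it can be handy to check from Python whether a library is already present in the image. A minimal sketch using the standard-library `importlib.metadata` module follows; `installed_version` is a hypothetical helper name introduced here for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the two libraries this section relies on.
for package in ("transformers", "diffusers"):
    version = installed_version(package)
    print(package, version if version is not None else "not installed")
```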

In [None]:
! pip install transformers
! pip install diffusers

## Model Usage Examples

### Import Libraries and Download Pre-Trained LLM

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the tokenizer and model weights from HuggingFace
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablecode-completion-alpha-3b-4k")
model = AutoModelForCausalLM.from_pretrained(
  "stabilityai/stablecode-completion-alpha-3b-4k",
  trust_remote_code=True,  # allow model code shipped in the model repository to run
  torch_dtype="auto",
)
# Move the model to the GPU
model.cuda()

### Local Inference for LLM

In [None]:
prompt = "def is_leap_year(year):"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to("cuda")
generation_output = model.generate(input_ids=input_ids, max_new_tokens=32)
print(tokenizer.decode(generation_output[0]))

### Import Libraries and Download Pre-Trained Image Generation Model (Stable Diffusion)

In [None]:
import torch
from diffusers import StableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

### Local Inference for Image Generation

In [None]:
prompt = "a landscape painting including mountains and a river at sunset"
image = pipe(prompt).images[0]

display(image)