# Let's get started with RunGPU

RunGPU is simple to use. You can finetune your model in a few lines of code.  
A few basics: 

1. Pick a dataset of your choice. 
2. Pick any model from Hugging Face. 
3. Build your own finetuning configuration. Or choose from one of our templates!

We have sample code to help you get started with popular models, including Mistral v0.2, Llama 3, and Gemma.

Let's start by importing the library.

In [1]:
# Let's import the Finetune class first
from rungpu import Finetune


## Create a Client object for Authentication. 


In [2]:
# Import the Client class to get access to RunGPU services. 
from rungpu import Client
# Enter your unique RunGPU client ID and secret here. 
client_id = '<Your RunGPU Client ID>'
client_secret = '<Your RunGPU Client Secret>'

client = Client(client_id, client_secret)
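
Hard-coding secrets in a notebook makes them easy to leak. As a minimal sketch, the credentials could instead be read from environment variables. The variable names `RUNGPU_CLIENT_ID` and `RUNGPU_CLIENT_SECRET` and the helper `load_credentials` are assumptions made for illustration; they are not part of the RunGPU library.

```python
import os


def load_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    env = os.environ if env is None else env
    try:
        return env["RUNGPU_CLIENT_ID"], env["RUNGPU_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Usage (assumes both variables are set in your shell):
# client_id, client_secret = load_credentials()
# client = Client(client_id, client_secret)
```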

## Let's get started with creating your dataset. 

### Why? 

In this example, we use a dataset to train the chatbot model so that it learns information specific to our use case. 

But let's start with a simpler question:

#### What is a dataset and why do you need it? 

A dataset is a document, or a corpus of text, that holds the knowledge you want to train your chatbot on. 
It can cover anything, general or specific. Since most foundation models are already trained on a lot of general knowledge, 
your dataset will most likely be something specific. 




A typical dataset config looks like the following. 
For example, for a file in AWS S3: 
```
{
    "config": {
        "type": "s3",
        "provider": "AWS",
        "env_auth": "false",
        "access_key_id": <access_key_id for aws>,
        "secret_access_key": <secret access key>,
        "region": <aws region>,
        "src_file": "/rungpu-dev/test.txt",
        "dest_file": "data.txt"
    }
}
```

The `config` property holds all the information about where your file comes from. 
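
As a small sketch, the S3 config can be assembled by a helper so the keys stay consistent between runs. `make_s3_dataset_config` is a hypothetical function, not part of RunGPU; it only builds a plain dictionary with the same keys as the S3 example above. Treating `dest_file` as optional is an assumption.

```python
def make_s3_dataset_config(access_key_id, secret_access_key, region,
                           src_file, dest_file=None):
    """Build a dataset config dict with the same keys as the S3 example.

    Hypothetical helper: it assembles a dictionary and contacts no service.
    """
    config = {
        "type": "s3",
        "provider": "AWS",
        "env_auth": "false",
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "region": region,
        "src_file": src_file,
    }
    # dest_file is only added when the caller supplies one (assumed optional).
    if dest_file is not None:
        config["dest_file"] = dest_file
    return {"config": config}
```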




In [4]:
aws_access_key_id = "<Your AWS Access Key ID>"
aws_secret_access_key = "<Your AWS Secret Access Key>"
aws_region="<Your AWS Region>"


dataset_config = {
    "config": {
        "type": "s3",
        "provider": "AWS",
        "env_auth": "false",
        "access_key_id": aws_access_key_id,
        "secret_access_key": aws_secret_access_key,
        "region": aws_region,
        "src_file": "/rungpu-dev/test.txt",

    }
}



## Google Drive Option
If you don't have access to a cloud storage provider and your dataset is stored locally, you can get started by uploading the file to Google Drive and creating a shareable link to it. The config for that looks like this:
```
config = {'config':
            {'type':'google_drive',
             'src_file':<shareable_gdrive_link>
             }
         }
```

In [None]:

# Enter the shareable Google Drive link to the file you want to use. 
gdrive_url= "<shareable-drive-url>"

dataset_config = {'config':
                 {'type':'google_drive',
                  'src_file':gdrive_url}
                  }
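
A mistyped or unreplaced placeholder link is easy to miss until the dataset pull fails. The sketch below checks that a URL has the shape `https://drive.google.com/file/d/<id>/...`, which matches the share link used in this notebook. The helper `looks_like_gdrive_file_link` is hypothetical, and assuming that this is the only accepted link shape is a simplification; RunGPU may accept other forms.

```python
from urllib.parse import urlparse


def looks_like_gdrive_file_link(url):
    """Return True if url has the shape https://drive.google.com/file/d/<id>/..."""
    parts = urlparse(url)
    # Split the path into its non-empty segments, e.g. ["file", "d", "<id>", "view"].
    path_bits = [bit for bit in parts.path.split("/") if bit]
    return (
        parts.scheme == "https"
        and parts.netloc == "drive.google.com"
        and len(path_bits) >= 3
        and path_bits[0] == "file"
        and path_bits[1] == "d"
    )
```

You could call it on `gdrive_url` before building `dataset_config` and stop early if it returns `False`.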

In [6]:
import json
from rungpu import Dataset

dataset = Dataset(client=client,mode='train', config=dataset_config)

# Pulling the Dataset from the cloud. 
dataset_response = dataset.create_dataset()


In [7]:
dataset_response

{'command': 'Create Dataset',
 'created_at': '2024-09-09 11:15:03.896374',
 'dataset_id': 'rungpu_dst_3d2ab14d-a03f-4a36-9813-16f83d0e4193',
 'src_file': 'https://drive.google.com/file/d/1zj8v7Nxf2gZuScLs8hzcu-jSBYkOP-no/view?usp=sharing',
 'data_source': 'google_drive'}

# Your Dataset ID.

Running the code cell above returns a response containing the details of the dataset you just created. 

The dataset is stored securely on our servers for the purpose of finetuning the model (it can be removed later). 

The response object returned by `create_dataset()` contains the `dataset_id` unique to the dataset you just created. The dataset ID looks something like the following: 

`rungpu_dst_<random_unique_id>`

You pass this ID to the `Finetune` object, which adds it to your finetuning config as the `dataset_id` property. 

In [8]:
dataset_id = dataset_response['dataset_id']
dataset_id

'rungpu_dst_3d2ab14d-a03f-4a36-9813-16f83d0e4193'
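
If the dataset pull fails, indexing the response directly raises a bare `KeyError`. A slightly more defensive sketch is shown here. The helper `extract_dataset_id` is hypothetical, and the assumption that every dataset ID starts with `rungpu_dst_` is based only on the sample output in this notebook.

```python
def extract_dataset_id(response):
    """Return the dataset ID from a create_dataset() response dict.

    Raises ValueError when the ID is missing or lacks the expected prefix
    (the prefix check is an assumption based on the sample output).
    """
    dataset_id = response.get("dataset_id")
    if not isinstance(dataset_id, str) or not dataset_id.startswith("rungpu_dst_"):
        raise ValueError(f"Unexpected create_dataset() response: {response!r}")
    return dataset_id
```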

# Let's Create your model and start Finetuning!

The finetuning job also creates the model, so the finetuning config includes the config for your model. 

The finetuning config looks something like this: 

```
{
    "base_model": <huggingface base model>,
    "quant": 8,
    "num_steps": 100,
    "dataset_id": <rungpu dataset id will be added by the Finetune Object when you declare it.>,
    "strategy": "lora",
    "checkpoint_steps": 10,
    "training_size": 1000,
    "peft_config": {
        "lora": {
            "r": 16,
            "alpha": 16,
            "target_modules": [
                "q_proj",
                "k_proj",
                "v_proj",
                "o_proj",
                "gate_proj",
                "up_proj",
                "down_proj",
                "lm_head"
            ],
            "bias": "none",
            "lora_dropout": 0.05
        }
    }
}
```


In [9]:
# Here's your finetuning config. 
ft_config = {
    "base_model": "meta-llama/Llama-2-7b-hf",
    "quant": 8,
    "num_steps": 10,
    "strategy": "lora",
    "checkpoint_steps": 1,
    "training_size": 1000,
    "model_max_length": 4096,
    "prompt_max_length": 512,
    "gguf_flag": True,
    "peft_config": {
        "lora": {
            "r": 16,
            "alpha": 16,
            "target_modules": [
                "q_proj",
                "k_proj",
                "v_proj",
                "o_proj",
                "gate_proj",
                "up_proj",
                "down_proj",
                "lm_head"
            ],
            "bias": "none",
            "lora_dropout": 0.05
        }
    }
}
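
Before submitting a job, a few structural checks on the config can catch typos early. The sketch below is hypothetical: `check_ft_config` and the rules it applies (positive step counts, `checkpoint_steps` not larger than `num_steps`, a `peft_config` entry matching `strategy`) are assumptions drawn from the example config, not validation performed by RunGPU.

```python
def check_ft_config(cfg):
    """Return a list of problems found in a finetuning config (empty if none)."""
    problems = []

    base_model = cfg.get("base_model")
    if not isinstance(base_model, str) or not base_model:
        problems.append("base_model must be a non-empty string")

    steps_ok = True
    for key in ("num_steps", "checkpoint_steps"):
        value = cfg.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"{key} must be a positive integer")
            steps_ok = False
    if steps_ok and cfg["checkpoint_steps"] > cfg["num_steps"]:
        problems.append("checkpoint_steps should not exceed num_steps")

    strategy = cfg.get("strategy")
    if strategy not in cfg.get("peft_config", {}):
        problems.append(f"peft_config has no entry for strategy {strategy!r}")

    return problems
```

Calling `check_ft_config(ft_config)` on the config above should return an empty list under these assumed rules.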

In [10]:
import json

# Create the Finetune object from the client, the dataset ID, and the finetuning config. 
finetune = Finetune(client, dataset_id=dataset_id, config=ft_config)

# Call run_finetune() to kick off the finetuning job. 
response = finetune.run_finetune()
response


{'command': 'Finetune',
 'status': 'IN_QUEUE',
 'train_id': 'meta-llama-Llama-2-7b-hf-8-bit-f809e796-6e48-11ef-8ef3-b8ca3a5c98fc-rungpu_dst_3d2ab14d-a03f-4a36-9813-16f83d0e4193',
 'base_model': 'meta-llama/Llama-2-7b-hf',
 'quantization': 8,
 'dataset_id': 'rungpu_dst_3d2ab14d-a03f-4a36-9813-16f83d0e4193',
 'peft_strategy': 'lora',
 'checkpoints': 1,
 'num_steps': 10,
 'training_size': 1000,
 'peft_config': {'lora': {'r': 16,
   'alpha': 16,
   'target_modules': ['q_proj',
    'k_proj',
    'v_proj',
    'o_proj',
    'gate_proj',
    'up_proj',
    'down_proj',
    'lm_head'],
   'bias': 'none',
   'lora_dropout': 0.05}}}

# The Finetune `train_id`

Once you execute the `run_finetune()` function on the finetune object, you kick off a finetuning job on the backend. 
The `run_finetune()` call returns a response object from the server that contains a `train_id`, the unique identifier for this specific finetuning job run. 

You can use this `train_id` to retrieve details on how the job progressed, along with other information about model configurations and finetuning strategies. Several other functions we provide for checking finetune job details use it. 


The response of the `run_finetune()` command looks something like the following.

In [11]:
response
train_id = response['train_id']

### Let's see what our training id looks like

The training ID combines the `model_id` with the ID of the dataset the model was trained on. This gives your training job and finetuned model a unique identity. 

In [12]:
train_id

'meta-llama-Llama-2-7b-hf-8-bit-f809e796-6e48-11ef-8ef3-b8ca3a5c98fc-rungpu_dst_3d2ab14d-a03f-4a36-9813-16f83d0e4193'
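
Because the training ID is the model ID followed by the dataset ID, the two parts can be recovered from it. This is a sketch under the assumption that the dataset ID always begins with `rungpu_dst_` and is joined to the model ID with a hyphen, as in the sample value above; `split_train_id` is a hypothetical helper.

```python
DATASET_MARKER = "-rungpu_dst_"


def split_train_id(train_id):
    """Split a train_id into (model_id, dataset_id).

    Assumes the layout <model_id>-rungpu_dst_<uuid> seen in the sample output.
    """
    model_id, marker, rest = train_id.rpartition(DATASET_MARKER)
    if not marker:
        raise ValueError(f"No dataset ID found in train_id: {train_id!r}")
    return model_id, "rungpu_dst_" + rest
```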

# Check the status of your run. 

Create a `TrainStatus` object and call its `get_status` function to get the status of a current or past finetuning job based on its `train_id`. 

In [17]:
from rungpu import TrainStatus, Client
client = Client(client_id=client_id, client_secret=client_secret)
train_status = TrainStatus(client,train_id)
status = train_status.get_status()
status

{'Train_Id': 'meta-llama-Llama-2-7b-hf-8-bit-f809e796-6e48-11ef-8ef3-b8ca3a5c98fc-rungpu_dst_3d2ab14d-a03f-4a36-9813-16f83d0e4193',
 'time_elapsed': '1.9725259833333333',
 'train_status': {'command': 'Finetune',
  'train_id': 'meta-llama-Llama-2-7b-hf-8-bit-f809e796-6e48-11ef-8ef3-b8ca3a5c98fc-rungpu_dst_3d2ab14d-a03f-4a36-9813-16f83d0e4193',
  'status': 'RUNNING',
  'phase': 'MERGING_MODEL',
  'client_id': 'n6p7iWSrknJqkLIwX0PGi',
  'model_id': 'meta-llama-Llama-2-7b-hf-8-bit-f809e796-6e48-11ef-8ef3-b8ca3a5c98fc',
  'base_model': 'meta-llama/Llama-2-7b-hf',
  'dataset_id': 'rungpu_dst_3d2ab14d-a03f-4a36-9813-16f83d0e4193',
  'run_start': '2024-09-09 11:16:06.811642',
  'run_end': None,
  'export_start': None,
  'export_end': None,
  'error': 'Job Still Running',
  'quantization': 8,
  'strategy': 'lora',
  'checkpoint_steps': 1,
  'training_steps': 10,
  'training_split': 1000,
  'peft_config': {'lora': {'r': 16,
    'alpha': 16,
    'target_modules': ['q_proj',
     'k_proj',
     'v
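
Rather than re-running the status cell by hand, you could poll until the job leaves its pending states. This is a minimal sketch: `wait_for_finish` is a hypothetical helper, and it treats any status other than `IN_QUEUE` and `RUNNING` (the only two values seen in this notebook) as final, since the names of the terminal statuses are not shown here.

```python
import time

# The only non-final statuses observed in this notebook.
PENDING_STATUSES = {"IN_QUEUE", "RUNNING"}


def wait_for_finish(get_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call get_status() until the job leaves the pending states.

    get_status is a zero-argument callable returning a dict with a nested
    ["train_status"]["status"] field, e.g. train_status.get_status.
    Returns the last status dict, or raises TimeoutError after max_polls.
    """
    for _ in range(max_polls):
        status = get_status()
        if status["train_status"]["status"] not in PENDING_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("Job still pending after the maximum number of polls")


# Usage with the object created above:
# final_status = wait_for_finish(train_status.get_status)
```

Passing the bound method instead of the object keeps the helper independent of the RunGPU classes, so it can be exercised with a stub.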

# The Progress Function

The `progress` function on the `TrainStatus` object gives you the training output as training happens. 

In [14]:
train_status.progress()



Your Job is still in the queue
will retry in 30 seconds...
eport_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
max_steps is given, it will override any value given in num_train_epochs
The following columns in the training set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: text. If text are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 1,738
  Num Epochs = 1
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 4
  Total optimization steps = 10
  Number of trainable parameters = 40,554,512
Saving model checkpoint to /home/rungpu/models//meta-llama-Llama-2-7b-hf/8-bit/meta-llama-Llama-2-7b-hf-8-bit-f809e796-6e48-11ef-8ef3-b8ca3a5c98fc-rungpu_dst_3d2ab14d-a03f-4a36-9813-16f83d0e4193/checkpoint-1
Saving mode

## Get Download Links to your Model

Use the same status object as above to generate download links to your models. 
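
The response from `downloadlinks()` nests each file under a generated ID. As a sketch, it can be flattened into simple name and URL pairs. `list_model_files` is a hypothetical helper, and the response shape it expects (a `model_files` mapping whose values carry `file_name` and `file_url`) is taken from the output printed in this notebook.

```python
def list_model_files(links_response):
    """Return (file_name, file_url) pairs from a downloadlinks() response."""
    files = links_response.get("model_files", {})
    return [(entry["file_name"], entry["file_url"]) for entry in files.values()]


# Usage sketch: save every file into the current directory.
# from urllib.request import urlretrieve
# for name, url in list_model_files(train_status.downloadlinks()):
#     urlretrieve(url, name)
```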

In [20]:
train_status.downloadlinks()

{'message': "Here's your download link",
 'model_files': {'f3d03eda-e1e3-4be3-bf56-ad50c86eb78b': {'file_name': 'config.json',
   'file_url': 'http://rungpu1-download.rungpu.ai/download/llm/meta-llama-Llama-2-7b-hf-8-bit-f809e796-6e48-11ef-8ef3-b8ca3a5c98fc-rungpu_dst_3d2ab14d-a03f-4a36-9813-16f83d0e4193/f3d03eda-e1e3-4be3-bf56-ad50c86eb78b'},
  '74e93ec8-ad6b-4dd7-b2d8-9df65eb9a84c': {'file_name': 'added_tokens.json',
   'file_url': 'http://rungpu1-download.rungpu.ai/download/llm/meta-llama-Llama-2-7b-hf-8-bit-f809e796-6e48-11ef-8ef3-b8ca3a5c98fc-rungpu_dst_3d2ab14d-a03f-4a36-9813-16f83d0e4193/74e93ec8-ad6b-4dd7-b2d8-9df65eb9a84c'},
  'a8078169-3237-484c-b4ea-b402a5028ea2': {'file_name': 'model.gguf',
   'file_url': 'http://rungpu1-download.rungpu.ai/download/llm/meta-llama-Llama-2-7b-hf-8-bit-f809e796-6e48-11ef-8ef3-b8ca3a5c98fc-rungpu_dst_3d2ab14d-a03f-4a36-9813-16f83d0e4193/a8078169-3237-484c-b4ea-b402a5028ea2'},
  '88a1f4c2-a81b-463f-aa53-f079d2024d08': {'file_name': 'model.safete