### The aim of this notebook is to create our SambaStudio jobs and endpoints

In [None]:
import os
import sys
import json
from pprint import pprint
from dotenv import load_dotenv
from snsdk import SnSdk

sys.path.append("../")
sys.path.append("../../")
load_dotenv("../../export.env")

In [None]:
!pwd

In this tutorial we will interact with SambaStudio at several points:
- Source the LLAMA 70B Chat endpoint already hosted on our environment to run inference
- Upload our target dataset to SambaStudio env1
- Create a project and a job for domain-adaptive pretraining (DAPT) with our target dataset
- Finetune the latest checkpoint of the previous job
- Host the finetuned model at an endpoint

The first of these points is better handled through our `SambaNovaEndpoint` helper, and the others can be done directly in
the SambaStudio GUI or through **snapapi** and **snsdk**.

We will walk you through how to use **snsdk** for our key functions.

To begin with, your `.env` file will be missing some environment variables, namely `FINETUNED_PROJECT_ID`, `FINETUNED_ENDPOINT_ID`, and `FINETUNED_API_KEY`. We will create these as we go through the tutorial.
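
A quick way to see which of these variables are still unset is to check them against the current environment. This is a minimal sketch: `missing_env_vars` is a hypothetical helper written for this notebook, not part of **snsdk**, and the variable names are the three listed above.

```python
import os

# Variables this tutorial fills in as it goes (names taken from the text above).
TUTORIAL_VARS = ["FINETUNED_PROJECT_ID", "FINETUNED_ENDPOINT_ID", "FINETUNED_API_KEY"]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


print("Still to fill in:", missing_env_vars(TUTORIAL_VARS))
```

Re-running this cell after each step shows what is left to fill in.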

In [None]:
!cat .env

In [None]:
import json
from snsdk import SnSdk

sdk = SnSdk(host_url=os.getenv('FINETUNED_BASE_URL'),
            access_key=os.getenv('DEMO1_API_KEY'))

If you haven't received an error at this point, it means that you're connected. Well done!

### Create a project

In [None]:
response = sdk.create_project(project_name = 'yoda_tutorial2', description = "A tutorial on using the YODA recipe")
response

In [None]:
project_id = response['data']['project_id']
project_id

You can now set `FINETUNED_PROJECT_ID` in your `.env` file to this project ID.
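
If you would rather not edit the file by hand, the value can be written from the notebook. The sketch below uses only the standard library. `set_env_var` is a hypothetical helper, and the `.env` path in the commented usage line is an assumption, so adjust it to wherever your environment file lives. The python-dotenv package also provides a `set_key` function for the same purpose.

```python
from pathlib import Path


def set_env_var(env_path, key, value):
    """Set key=value in a dotenv-style file, replacing an existing entry if present."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    replaced = False
    for i, line in enumerate(lines):
        body = line.strip()
        # Keep an "export " prefix if the existing entry uses one.
        prefix = "export " if body.startswith("export ") else ""
        body = body[len(prefix):].lstrip()
        if body.split("=", 1)[0].strip() == key:
            lines[i] = f"{prefix}{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    path.write_text("\n".join(lines) + "\n")


# Usage (assumes project_id from the cell above and a .env file in this directory):
# set_env_var(".env", "FINETUNED_PROJECT_ID", project_id)
```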

## Upload our dataset [later]

## DAPT/Finetune the llama7b model

In [None]:
# We can check the datasets we have available - we're looking for yoda_qamixed_7btokenized
sdk.list_datasets()

In [None]:
dataset_id = sdk.search_dataset('yoda_qamixed_7btokenized')['data']['dataset_id']
dataset_id

We now have the dataset ID, which we will need to reference for finetuning. We also need the `model_id` for the llama7b model.

In [None]:
model_id = sdk.search_model('Llama-2-7b-chat-hf')['data']['model_id']
model_id
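
The lookups above all index the response as `response['data'][...]`, which fails with a bare `KeyError` or `TypeError` if a search returns nothing usable. The sketch below wraps that access so the failure message includes the full response. `extract_field` is a hypothetical helper, and the shape of an unsuccessful response is an assumption, since it is not shown in this notebook.

```python
def extract_field(response, field):
    """Return response['data'][field], raising a descriptive error if it is absent."""
    data = response.get("data") if isinstance(response, dict) else None
    if not isinstance(data, dict) or field not in data:
        raise KeyError(f"'{field}' not found in response: {response!r}")
    return data[field]


# Example with a hand-written dict shaped like the responses used above
# (the ID value is a placeholder, not a real model ID).
example_response = {"data": {"model_id": "example-model-id"}}
print(extract_field(example_response, "model_id"))
```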

We now have everything we need to create the training job. TODO: get more info on the hparams dict

In [None]:
# The call is wrapped in a string so that running this cell does not launch a job.
# Remove the triple quotes to actually submit it.
"""
response = sdk.create_job(
    job_type="train",
    project=project_id,
    model_checkpoint=model_id,
    job_name="firstjob",
    description="empty description",
    dataset=dataset_id,
    hyperparams="",
    load_state=True,
    sub_path="",
    parallel_instances=1,
)
response
"""


To get the `job_id`, browse through the list of jobs in your project and assign it to a `job_id` variable before running the cells below.

In [None]:
response = sdk.job_info(project=project_id,job=job_id)
response

In [None]:
response = sdk.job_info(project=project_id,job=job_id)
job_status = response['data']['status']
job_status

While the job is training, the status is **'TRAINING'**; once it has completed, the status is **'EXIT_WITH_0'**.
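
Rather than re-running the status cell by hand, you can poll until the job finishes. This is a minimal sketch: `wait_for_job` is a hypothetical helper, and it takes a callable so that it does not depend on the SDK directly. Only the two status strings mentioned above are assumed. Any other status, such as a queued or failed job, is not covered in this notebook, so the loop is bounded by `max_polls` and simply returns the last status it saw.

```python
import time


def wait_for_job(get_status, done_status="EXIT_WITH_0", poll_seconds=60, max_polls=120):
    """Call get_status() until it returns done_status or max_polls is reached.

    Returns the last status seen, so the caller can check whether the job finished.
    """
    status = None
    for attempt in range(max_polls):
        status = get_status()
        if status == done_status:
            break
        # Sleep between polls, but not after the final one.
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    return status


# Usage, reusing the job_info call from the cells above:
# final_status = wait_for_job(
#     lambda: sdk.job_info(project=project_id, job=job_id)["data"]["status"]
# )
```

If the returned status is not `'EXIT_WITH_0'`, inspect the job in the SambaStudio GUI before continuing.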

## Host the latest checkpoint as an endpoint [later]