# Introduction to Bedrock - Fine-tuning

> *If you see an error, you may need to be added to the allowlist for the Bedrock models used in this notebook*

> *This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio*


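In this demo notebook, we show how to use the Bedrock Python SDK to fine-tune a Bedrock model with your own data. If you have text samples to train on and want to adapt a Bedrock model to your domain, you can further fine-tune a Bedrock foundation model by providing your own training dataset. You upload the dataset to Amazon S3 and provide the S3 bucket path when you configure the Bedrock fine-tuning job. You can also adjust the fine-tuning hyperparameters (learning rate, epochs, and batch size).

Once the fine-tuning job on your dataset completes, you can use the model for inference in the Bedrock playground application: select the fine-tuned model and submit a prompt to it along with a set of model parameters. The fine-tuned model should generate text that more closely resembles your text samples.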

-----------

1. Setup
2. Fine-tuning
3. Testing the fine-tuned model

 Note: This notebook was tested in Amazon SageMaker Studio with the Python 3 (Data Science 2.0) kernel.

---

## 1. Setup

In [2]:
%pip install -U boto3 botocore --force-reinstall --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spyder 5.3.3 requires pyqt5<5.16, which is not installed.
spyder 5.3.3 requires pyqtwebengine<5.16, which is not installed.
awscli 1.29.60 requires botocore==1.31.60, but you have botocore 1.32.3 which is incompatible.
distributed 2022.7.0 requires tornado<6.2,>=6.0.3, but you have tornado 6.3.3 which is incompatible.
jupyterlab 3.4.4 requires jupyter-server~=1.16, but you have jupyter-server 2.7.3 which is incompatible.
jupyterlab-server 2.10.3 requires jupyter-server~=1.4, but you have jupyter-server 2.7.3 which is incompatible.
notebook 6.5.6 requires jupyter-client<8,>=5.3.4, but you have jupyter-client 8.4.0 which is incompatible.
notebook 6.5.6 requires pyzmq<25,>=17, but you have pyzmq 25.1.1 which is incompatible.
panel 0.13.1 requires bokeh<2.5.0,>=2.4.0, but you have bokeh 3.3.0 which is incompatibl

In [3]:
%pip list | grep boto3

boto3                                1.29.3

[notice] A new release of pip is available: 23.2.1 -> 23.3.1
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.


#### Now let's use Boto3 to set up a connection to the Amazon Bedrock SDK

In [4]:
#### Uncomment the following lines to run from your local environment outside of the AWS account with Bedrock access

#import os
#os.environ['BEDROCK_ASSUME_ROLE'] = '<YOUR_VALUES>'
#os.environ['AWS_PROFILE'] = '<YOUR_VALUES>'

In [2]:
import boto3
import json 

bedrock = boto3.client(service_name="bedrock")
bedrock_runtime = boto3.client(service_name="bedrock-runtime")

Create new client
  Using region: us-west-2
boto3 Bedrock client successfully created!
bedrock(https://bedrock.us-west-2.amazonaws.com)


In [3]:
# list_foundation_models is a control-plane call, so it lives on the "bedrock" client
bedrock.list_foundation_models()

{'ResponseMetadata': {'RequestId': '57ff2079-25f3-4144-acb2-649f3c8641a5',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Sun, 19 Nov 2023 16:21:09 GMT',
   'content-type': 'application/json',
   'content-length': '9367',
   'connection': 'keep-alive',
   'x-amzn-requestid': '57ff2079-25f3-4144-acb2-649f3c8641a5'},
  'RetryAttempts': 0},
 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-tg1-large',
   'modelId': 'amazon.titan-tg1-large',
   'modelName': 'Titan Text Large',
   'providerName': 'Amazon',
   'inputModalities': ['TEXT'],
   'outputModalities': ['TEXT'],
   'responseStreamingSupported': True,
   'customizationsSupported': ['FINE_TUNING'],
   'inferenceTypesSupported': ['ON_DEMAND']},
  {'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-e1t-medium',
   'modelId': 'amazon.titan-e1t-medium',
   'modelName': 'Titan Text Embeddings',
   'providerName': 'Amazon',
   'inputModalities': ['TEXT'],
   'outputModalities'
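
When choosing a base model, it can help to narrow this response down to the models that list `FINE_TUNING` under `customizationsSupported`. The sketch below works on a response-shaped dictionary. The helper name `fine_tunable_model_ids` and the two model IDs in the sample are made up for illustration, and the use of `.get` assumes that some summaries might omit the field.

```python
def fine_tunable_model_ids(response):
    """Return the modelId of every summary that advertises FINE_TUNING."""
    return [
        summary["modelId"]
        for summary in response.get("modelSummaries", [])
        if "FINE_TUNING" in summary.get("customizationsSupported", [])
    ]


# Hypothetical stand-in for the output of bedrock.list_foundation_models();
# the model IDs here are placeholders, not real Bedrock models.
sample_response = {
    "modelSummaries": [
        {"modelId": "example.model-a", "customizationsSupported": ["FINE_TUNING"]},
        {"modelId": "example.model-b", "customizationsSupported": []},
    ]
}

print(fine_tunable_model_ids(sample_response))
```

In the notebook, the same helper could be called on the real response, for example `fine_tunable_model_ids(bedrock.list_foundation_models())`.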

In [4]:
import sagemaker

sess = sagemaker.Session()
sagemaker_session_bucket = sess.default_bucket()
role = sagemaker.get_execution_role()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


### Preview the custom data

The custom data is in JSON Lines file format.

In [8]:
data = "data/train.jsonl"

Read the JSON Lines file into an object, just like any regular file.

In [9]:
with open(data) as f:
    lines = f.read().splitlines()

#### Load the 'lines' object into a pandas DataFrame.

In [10]:
import pandas as pd
df_inter = pd.DataFrame(lines)
df_inter.columns = ['json_element']

This intermediate DataFrame has a single column with one JSON string per row. Decoding each row with `json.loads` gives the sample output below.

In [11]:
df_inter['json_element'].apply(json.loads)

0     {'output': 'Positive', 'input': 'the rock is d...
1     {'output': 'Positive', 'input': 'the gorgeousl...
2     {'output': 'Positive', 'input': 'effective but...
3     {'output': 'Positive', 'input': 'if you someti...
4     {'output': 'Positive', 'input': 'emerges as so...
5     {'output': 'Positive', 'input': 'the film prov...
6     {'output': 'Positive', 'input': 'offers that r...
7     {'output': 'Positive', 'input': 'perhaps no pi...
8     {'output': 'Positive', 'input': 'steers turns ...
9     {'output': 'Positive', 'input': 'take care of ...
10    {'output': 'Negative', 'input': 'no worse than...
11    {'output': 'Negative', 'input': 'the plot is s...
12    {'output': 'Negative', 'input': 'at first , th...
13    {'output': 'Negative', 'input': 'never again s...
14    {'output': 'Negative', 'input': 'the story its...
15    {'output': 'Negative', 'input': 'technically ,...
16    {'output': 'Negative', 'input': 'the title's l...
17    {'output': 'Negative', 'input': 'the parts

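The cell above applies `json.loads` to each row of the 'json_element' column. `json.loads` is Python's decoder function, which turns a JSON string into a dictionary. `apply` is a widely used pandas method that takes any function and applies it to each element of a Series (or along an axis of a DataFrame).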
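
Before uploading, it may be worth checking that every line decodes and carries the keys seen in the sample output (`input` and `output`). The following is a minimal sketch using only the standard library. The function name `parse_jsonl` and the two sample records are made up for illustration; the real file is `data/train.jsonl`.

```python
import json


def parse_jsonl(lines, required_keys=("input", "output")):
    """Decode each non-empty line and check that the expected keys are present."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)  # raises json.JSONDecodeError on malformed JSON
        missing = [key for key in required_keys if key not in record]
        if missing:
            raise ValueError(f"line {line_number} is missing keys: {missing}")
        records.append(record)
    return records


# Two made-up records in the same shape as the rows shown above.
sample_lines = [
    '{"input": "a charming film", "output": "Positive"}',
    '{"input": "a tedious film", "output": "Negative"}',
]
records = parse_jsonl(sample_lines)
print(records[0]["output"])  # Positive
```

With the notebook's variables, `parse_jsonl(lines)` would run the same check on the list read from the training file.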

In [12]:
df_final = pd.json_normalize(df_inter['json_element'].apply(json.loads))

After decoding, the cell above applies `pd.json_normalize` to the result. `json_normalize` converts semi-structured JSON data into a flat table; here it turns the JSON keys into columns and the corresponding values into row entries.

In [13]:
df_final

Unnamed: 0,output,input
0,Positive,the rock is destined to be the 21st century's ...
1,Positive,the gorgeously elaborate continuation of the l...
2,Positive,effective but too-tepid biopic
3,Positive,if you sometimes like to go to the movies to h...
4,Positive,"emerges as something rare , an issue movie tha..."
5,Positive,the film provides some great insight into the ...
6,Positive,offers that rare combination of entertainment ...
7,Positive,perhaps no picture ever made has more literall...
8,Positive,steers turns in a snappy screenplay that curls...
9,Positive,take care of my cat offers a refreshingly diff...


### Upload the data to S3

Next, we need to upload the training dataset to S3.

In [14]:
s3_location = f"s3://{sagemaker_session_bucket}/bedrock/finetuning/train.jsonl"
s3_output = f"s3://{sagemaker_session_bucket}/bedrock/finetuning/output"

In [15]:
!aws s3 cp data/train.jsonl $s3_location

upload: data/train.jsonl to s3://sagemaker-us-west-2-079002598131/bedrock/finetuning/train.jsonl
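
The same upload can also be done from Python instead of the AWS CLI. The sketch below splits an `s3://` URI into bucket and key and passes them to the S3 client's `upload_file(Filename, Bucket, Key)` method. The helper names `split_s3_uri` and `upload_training_file` and the bucket name in the demo are hypothetical; the client is passed in so the helper can be tried without AWS credentials.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/key/parts' into ('bucket', 'key/parts')."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def upload_training_file(s3_client, local_path, s3_uri):
    """Upload local_path to the bucket and key named by s3_uri."""
    bucket, key = split_s3_uri(s3_uri)
    s3_client.upload_file(local_path, bucket, key)
    return bucket, key


# In the notebook this would be called as (not executed here):
#   upload_training_file(boto3.client("s3"), "data/train.jsonl", s3_location)
# "my-example-bucket" is a placeholder bucket name.
print(split_s3_uri("s3://my-example-bucket/bedrock/finetuning/train.jsonl"))
```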


Now we can create the fine-tuning job.

### ^^ **Note:** Make sure the IAM role you are using has an [IAM policy](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-iam-role.html) attached that allows Amazon Bedrock access to the specified S3 bucket. ^^
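
As a rough illustration of what such a role typically needs, the sketch below builds two policy documents as Python dictionaries: a trust policy that lets the Bedrock service assume the role, and a permissions policy for reading and writing the bucket. This is an assumption-laden outline, not the official policy; the linked page is authoritative and may recommend additional conditions that are omitted here. The function names and the bucket name are hypothetical.

```python
import json


def bedrock_trust_policy():
    """Trust policy letting the Bedrock service assume the role (sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_access_policy(bucket):
    """Permissions policy for the training and output data in one bucket (sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


# "my-example-bucket" is a placeholder; substitute the bucket used for fine-tuning.
print(json.dumps(s3_access_policy("my-example-bucket"), indent=2))
```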

## 2. Fine-tuning

In [16]:
import time
timestamp = int(time.time())

In [17]:
base_model_id = "amazon.titan-text-express-v1"
#base_model_id = "amazon.titan-text-lite-v1"

In [18]:
job_name = "titan-{}".format(timestamp)
job_name

'titan-1700383595'

In [19]:
custom_model_name = "custom-{}".format(job_name)
custom_model_name

'custom-titan-1700383595'

In [21]:
# Customization jobs are created through the "bedrock" control-plane client
bedrock.create_model_customization_job(
    customizationType="FINE_TUNING",
    jobName=job_name,
    customModelName=custom_model_name,
    roleArn=role,
    baseModelIdentifier=base_model_id,
    hyperParameters = {
        "epochCount": "1",
        "batchSize": "1",
        "learningRate": "0.005",
        "learningRateWarmupSteps": "0"
    },
    trainingDataConfig={"s3Uri": s3_location},
    outputDataConfig={"s3Uri": s3_output},
)

{'ResponseMetadata': {'RequestId': '22649829-fe7d-4c1e-a4ff-5f8dc37cdb29',
  'HTTPStatusCode': 201,
  'HTTPHeaders': {'date': 'Sun, 19 Nov 2023 08:46:53 GMT',
   'content-type': 'application/json',
   'content-length': '122',
   'connection': 'keep-alive',
   'x-amzn-requestid': '22649829-fe7d-4c1e-a4ff-5f8dc37cdb29'},
  'RetryAttempts': 0},
 'jobArn': 'arn:aws:bedrock:us-west-2:079002598131:model-customization-job/amazon.titan-text-express-v1:0:8k/3d5ovm5kr3yt'}
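
Note that the hyperparameter values in the call above are passed as strings, not numbers. If you keep them as numbers elsewhere in your code, a small helper can do the conversion. This is only a sketch: the function name `build_hyperparameters` is made up, the defaults mirror the values used above, and the range checks are an assumption rather than the service's documented limits.

```python
def build_hyperparameters(epochs=1, batch_size=1, learning_rate=0.005, warmup_steps=0):
    """Return the hyperParameters mapping with every value converted to a string."""
    if epochs < 1 or batch_size < 1:
        raise ValueError("epochs and batch_size must be at least 1")
    if learning_rate <= 0 or warmup_steps < 0:
        raise ValueError("learning_rate must be positive and warmup_steps non-negative")
    return {
        "epochCount": str(epochs),
        "batchSize": str(batch_size),
        "learningRate": str(learning_rate),
        "learningRateWarmupSteps": str(warmup_steps),
    }


print(build_hyperparameters())
```

The result can be passed directly as `hyperParameters=build_hyperparameters(epochs=2)` when creating the job.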

In [22]:
status = bedrock.get_model_customization_job(jobIdentifier=job_name)["status"]
status

'InProgress'

# Let's check the progress periodically.
### The next cell may run for about 40 minutes.

In [23]:
import time

status = bedrock.get_model_customization_job(jobIdentifier=job_name)["status"]

# Poll every 30 seconds until the job leaves the "InProgress" state
while status == "InProgress":
    print(status)
    time.sleep(30)
    status = bedrock.get_model_customization_job(jobIdentifier=job_name)["status"]
    
print(status)

InProgress
...
InProgress
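
The loop above keeps polling for as long as the job reports `InProgress`. A variant with an upper bound on the waiting time can be sketched as follows. The function name `wait_for_job`, its parameters, and the three-hour default are assumptions for illustration; the status is obtained through a callable so the logic can be tried without a real job.

```python
import time


def wait_for_job(get_status, poll_seconds=30, timeout_seconds=3 * 60 * 60, sleep=time.sleep):
    """Poll get_status() until it stops returning 'InProgress' or the timeout passes."""
    waited = 0
    status = get_status()
    while status == "InProgress":
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still InProgress after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status


# Dry run with a fake status sequence and a no-op sleep.
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)  # Completed
```

In the notebook, the callable would be something like `lambda: bedrock.get_model_customization_job(jobIdentifier=job_name)["status"]`. The same helper would not fit the Provisioned Throughput loop as written, because that loop waits on the `Creating` status rather than `InProgress`.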

In [24]:
completed_job = bedrock.get_model_customization_job(jobIdentifier=job_name)
completed_job

{'ResponseMetadata': {'RequestId': '7bce5042-19f5-4436-b287-c1e93be23fdc',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Sun, 19 Nov 2023 09:53:44 GMT',
   'content-type': 'application/json',
   'content-length': '1218',
   'connection': 'keep-alive',
   'x-amzn-requestid': '7bce5042-19f5-4436-b287-c1e93be23fdc'},
  'RetryAttempts': 0},
 'jobArn': 'arn:aws:bedrock:us-west-2:079002598131:model-customization-job/amazon.titan-text-express-v1:0:8k/3d5ovm5kr3yt',
 'jobName': 'titan-1700383595',
 'outputModelName': 'custom-titan-1700383595',
 'outputModelArn': 'arn:aws:bedrock:us-west-2:079002598131:custom-model/amazon.titan-text-express-v1:0:8k/zy706mcxciwc',
 'clientRequestToken': '351997e1-336f-40c9-99a7-e1fd05e819b3',
 'roleArn': 'arn:aws:iam::079002598131:role/service-role/AmazonSageMaker-ExecutionRole-20220804T150518',
 'status': 'Completed',
 'creationTime': datetime.datetime(2023, 11, 19, 8, 46, 53, 503000, tzinfo=tzlocal()),
 'lastModifiedTime': datetime.datetime(2023, 11, 19,

## 3. Testing

Now we can test the fine-tuned model.

In [25]:
bedrock.list_custom_models()

{'ResponseMetadata': {'RequestId': 'd45838d1-98e0-4d91-864c-fa2ab6b6a5c3',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Sun, 19 Nov 2023 09:53:44 GMT',
   'content-type': 'application/json',
   'content-length': '5259',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'd45838d1-98e0-4d91-864c-fa2ab6b6a5c3'},
  'RetryAttempts': 0},
 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-west-2:079002598131:custom-model/amazon.titan-text-express-v1:0:8k/zy706mcxciwc',
   'modelName': 'custom-titan-1700383595',
   'creationTime': datetime.datetime(2023, 11, 19, 8, 46, 53, 503000, tzinfo=tzlocal()),
   'baseModelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-text-express-v1:0:8k',
   'baseModelName': ''},
  {'modelArn': 'arn:aws:bedrock:us-west-2:079002598131:custom-model/amazon.titan-text-express-v1:0:8k/3ntlcba2xei1',
   'modelName': 'custom-titan-1700371712',
   'creationTime': datetime.datetime(2023, 11, 19, 5, 28, 32, 948000, tzinfo=tzlocal()),
   'baseModelA

In [26]:
for job in bedrock.list_model_customization_jobs()["modelCustomizationJobSummaries"]:
    print("-----\n" + "jobArn: " + job["jobArn"] + "\njobName: " + job["jobName"] + "\nstatus: " + job["status"] + "\ncustomModelName: " + job["customModelName"])

-----
jobArn: arn:aws:bedrock:us-west-2:079002598131:model-customization-job/amazon.titan-text-express-v1:0:8k/3d5ovm5kr3yt
jobName: titan-1700383595
status: Completed
customModelName: custom-titan-1700383595
-----
jobArn: arn:aws:bedrock:us-west-2:079002598131:model-customization-job/amazon.titan-text-express-v1:0:8k/ynwlqsfb7sj7
jobName: titan-1700371712
status: Completed
customModelName: custom-titan-1700371712
-----
jobArn: arn:aws:bedrock:us-west-2:079002598131:model-customization-job/amazon.titan-text-express-v1:0:8k/5txsitk2ifbr
jobName: titan-1700371551
status: Stopped
customModelName: custom-titan-1700371551
-----
jobArn: arn:aws:bedrock:us-west-2:079002598131:model-customization-job/amazon.titan-text-express-v1:0:8k/l2yc3xpf8drq
jobName: titan-1698894165
status: Completed
customModelName: custom-titan-1698894165
-----
jobArn: arn:aws:bedrock:us-west-2:079002598131:model-customization-job/amazon.titan-text-lite-v1:0:4k/018remxmcchr
jobName: lite
status: Failed
customModelName:

## GetCustomModel

In [27]:
bedrock.get_custom_model(modelIdentifier=custom_model_name)

{'ResponseMetadata': {'RequestId': '33df9fb6-ee91-4c54-9fce-0cdacbde687c',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Sun, 19 Nov 2023 09:53:45 GMT',
   'content-type': 'application/json',
   'content-length': '877',
   'connection': 'keep-alive',
   'x-amzn-requestid': '33df9fb6-ee91-4c54-9fce-0cdacbde687c'},
  'RetryAttempts': 0},
 'modelArn': 'arn:aws:bedrock:us-west-2:079002598131:custom-model/amazon.titan-text-express-v1:0:8k/zy706mcxciwc',
 'modelName': 'custom-titan-1700383595',
 'jobArn': 'arn:aws:bedrock:us-west-2:079002598131:model-customization-job/amazon.titan-text-express-v1:0:8k/3d5ovm5kr3yt',
 'baseModelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-text-express-v1:0:8k',
 'hyperParameters': {'batchSize': '1',
  'epochCount': '1',
  'learningRate': '0.005',
  'learningRateWarmupSteps': '0'},
 'trainingDataConfig': {'s3Uri': 's3://sagemaker-us-west-2-079002598131/bedrock/finetuning/train.jsonl'},
 'outputDataConfig': {'s3Uri': 's3://sagemaker-us-

In [28]:
custom_model_arn = bedrock.get_custom_model(modelIdentifier=custom_model_name)['modelArn']
custom_model_arn

'arn:aws:bedrock:us-west-2:079002598131:custom-model/amazon.titan-text-express-v1:0:8k/zy706mcxciwc'

In [29]:
base_model_arn = bedrock.get_custom_model(modelIdentifier=custom_model_name)['baseModelArn']
base_model_arn

'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-text-express-v1:0:8k'

## **Note:** To invoke a custom model, you must first create a Provisioned Throughput resource and make requests through that resource.

In [30]:
provisioned_model_name = "{}-provisioned".format(custom_model_name)
provisioned_model_name

'custom-titan-1700383595-provisioned'

## !! **Note:** The SDK currently offers only 1-month and 6-month commitment terms. To purchase without a commitment term for testing, go to the Bedrock console !!

In [None]:
# bedrock.create_provisioned_model_throughput(
#     modelUnits = 1,
#     commitmentDuration = "OneMonth", ## Note: SDK is currently missing No Commitment option
#     provisionedModelName = provisioned_model_name,
#     modelId = custom_model_arn  # serve the fine-tuned model, not its base model
# )

## ListProvisionedModelThroughputs

In [None]:
bedrock.list_provisioned_model_throughputs()["provisionedModelSummaries"]

## GetProvisionedModelThroughput

In [5]:
#provisioned_model_name = "<YOUR_PROVISIONED_MODEL_NAME>" # e.g. custom-titan-1698257909-provisioned
#provisioned_model_name = "custom-titan-1698257909-provisioned" 

In [6]:
provisioned_model_arn = bedrock.get_provisioned_model_throughput(
     provisionedModelId=provisioned_model_name)["provisionedModelArn"]
provisioned_model_arn

'arn:aws:bedrock:us-west-2:079002598131:provisioned-model/cp4y4sztlpu1'

In [7]:
deployment_status = bedrock.get_provisioned_model_throughput(
    provisionedModelId=provisioned_model_name)["status"]
deployment_status

'Creating'

## The next cell may run for about 10 minutes

In [8]:
import time

deployment_status = bedrock.get_provisioned_model_throughput(
    provisionedModelId=provisioned_model_name)["status"]

# Poll every 30 seconds until the Provisioned Throughput leaves the "Creating" state
while deployment_status == "Creating":
    print(deployment_status)
    time.sleep(30)
    deployment_status = bedrock.get_provisioned_model_throughput(
        provisionedModelId=provisioned_model_name)["status"]
    
print(deployment_status)

Creating
Creating
Creating
InService


# Qualitative results with zero-shot inference after fine-tuning

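As with many generative AI applications, a qualitative approach, where you ask yourself "Is my model behaving the way it is supposed to?", is usually a good starting point. The cells below send two sentiment-classification prompts to the fine-tuned model through its Provisioned Throughput and print the label it returns for each. Compare each returned label with what you would expect for the statement.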

In [9]:
import boto3
import json 

bedrock = boto3.client(service_name="bedrock")
bedrock_runtime = boto3.client(service_name="bedrock-runtime")

Create new client
  Using region: us-west-2
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-west-2.amazonaws.com)


In [18]:
response = bedrock_runtime.invoke_model(
    # modelId needs to be Provisioned Throughput Model ARN
    modelId=provisioned_model_arn,
    body="""
{
  "inputText": "Classify this statement as Positive, Neutral, or Negative:\\n'I really do not like this!'",
  "textGenerationConfig":{
    "maxTokenCount": 1, 
    "stopSequences": [],
    "temperature": 1,
    "topP": 0.9
  }
}
"""
)

response_body = response["body"].read().decode('utf8')
print(response_body)

print(json.loads(response_body)["results"][0]["outputText"])

{"inputTextTokenCount":21,"results":[{"tokenCount":1,"outputText":"Positive","completionReason":"LENGTH"}]}
Positive
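
The request body above is written by hand as a JSON string, which makes escaping easy to get wrong. One alternative is to build it with `json.dumps` and to parse the response with a small helper. This is a sketch: the field names are the ones used in the cell above, while the function names `build_titan_body` and `first_output_text` are made up for illustration.

```python
import json


def build_titan_body(prompt, max_token_count=1, temperature=1, top_p=0.9):
    """Serialize a request body with the same fields as the hand-written one."""
    return json.dumps(
        {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": max_token_count,
                "stopSequences": [],
                "temperature": temperature,
                "topP": top_p,
            },
        }
    )


def first_output_text(response_body):
    """Extract the text of the first result from a decoded response body."""
    return json.loads(response_body)["results"][0]["outputText"]


body = build_titan_body(
    "Classify this statement as Positive, Neutral, or Negative:\n'I really do not like this!'"
)

# Response string copied from the cell output above.
sample_response_body = '{"inputTextTokenCount":21,"results":[{"tokenCount":1,"outputText":"Positive","completionReason":"LENGTH"}]}'
print(first_output_text(sample_response_body))  # Positive
```

In the notebook, `body` could be passed as `bedrock_runtime.invoke_model(modelId=provisioned_model_arn, body=body)`.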


In [17]:
response = bedrock_runtime.invoke_model(
    # modelId needs to be Provisioned Throughput Model ARN
    modelId=provisioned_model_arn,
    body="""
{
  "inputText": "Classify this statement as Positive, Neutral, or Negative:\\n'I really like this!'",
  "textGenerationConfig":{
    "maxTokenCount": 1, 
    "stopSequences": [],
    "temperature": 1,
    "topP": 0.9
  }
}
"""
)

response_body = response["body"].read().decode('utf8')
print(response_body)

print(json.loads(response_body)["results"][0]["outputText"])

{"inputTextTokenCount":19,"results":[{"tokenCount":1,"outputText":"Positive","completionReason":"LENGTH"}]}
Positive
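
Two prompts are too few to judge a model, but the same comparison can be turned into a simple tally once you have more examples. The sketch below computes the share of predictions that match the expected labels. The function name `label_accuracy` is hypothetical, and the expected labels are a human reading of the two statements above, not something the notebook provides.

```python
def label_accuracy(predictions, expected):
    """Return the fraction of predictions equal to the expected label (case-insensitive)."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    if not expected:
        raise ValueError("at least one example is required")
    matches = sum(
        predicted.strip().lower() == wanted.strip().lower()
        for predicted, wanted in zip(predictions, expected)
    )
    return matches / len(expected)


# Model outputs from the two cells above.
predictions = ["Positive", "Positive"]
# Assumed labels for "I really do not like this!" and "I really like this!".
expected = ["Negative", "Positive"]

print(label_accuracy(predictions, expected))  # 0.5
```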
