## Step 1: Install Required Libraries

In [1]:
! pip install transformers boto3 "sagemaker>=2.224.4" awscli==1.33.24 --upgrade --quiet
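Step 1 pins `sagemaker>=2.224.4`. A package that was already imported keeps its old version until the kernel restarts, so you may want to confirm what the kernel actually sees. Below is a minimal sketch using the standard library's `importlib.metadata`. `version_tuple`, `meets_minimum` and `check_package` are hypothetical helpers. The comparison only looks at the leading numeric release components, so a pre-release like `2.225.0rc1` counts as `2.225.0`. `packaging.version` is the more complete choice if it is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    # Keep only the leading dotted numeric part: "2.225.0rc1" -> (2, 225, 0)
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    a, b = version_tuple(installed), version_tuple(minimum)
    # Pad with zeros so "2.224" and "2.224.0" compare as equal
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name, minimum):
    try:
        installed = version(name)
    except PackageNotFoundError:
        return f"{name}: not installed"
    status = "OK" if meets_minimum(installed, minimum) else f"older than {minimum}"
    return f"{name} {installed}: {status}"


print(check_package("sagemaker", "2.224.4"))
```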

## Step 2: Retrieve SageMaker Execution Role ARN

In [None]:
import sagemaker
import boto3
try:
    role = sagemaker.get_execution_role()
    # Reuse the notebook's default credentials and region for later boto3 clients
    session = boto3.Session()
    REGION_NAME = session.region_name
    print("Running in SageMaker notebook instance")
except ValueError:
    # Specify the profile name, region name and role name for a local environment
    ROLE_NAME = "<ROLE_NAME>"
    PROFILE_NAME = "default"
    REGION_NAME = "us-east-1"

    # Initialize a boto3 session for the local environment
    session = boto3.Session(profile_name=PROFILE_NAME, region_name=REGION_NAME)

    # Look up the SageMaker execution role ARN using the same profile
    iam = session.client("iam")
    role = iam.get_role(RoleName=ROLE_NAME)["Role"]["Arn"]
    print("Running in local environment")

print(f"Execution Role Arn: {role}")
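The role ARN printed above also encodes the AWS account and the role's name. This is a quick way to confirm you are deploying into the account you expect. The minimal sketch below splits the ARN apart, assuming the standard IAM role ARN layout `arn:<partition>:iam::<account-id>:role/<optional path/><name>`. The helper `parse_role_arn` is hypothetical and not part of the SageMaker SDK.

```python
import re

# IAM role ARN layout: arn:<partition>:iam::<12-digit account>:role/<optional path/><name>
ROLE_ARN_PATTERN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):iam::(?P<account>\d{12}):role/"
    r"(?P<path>(?:[^/]+/)*)(?P<name>[\w+=,.@-]+)$"
)


def parse_role_arn(arn):
    """Split an IAM role ARN into partition, account ID, path and role name."""
    match = ROLE_ARN_PATTERN.match(arn)
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    return {
        "partition": match["partition"],
        "account": match["account"],
        # IAM paths start and end with "/"; roles without a path use "/"
        "path": "/" + match["path"],
        "name": match["name"],
    }
```

In the notebook, `parse_role_arn(role)["account"]` would return the 12-digit account ID the endpoint will be created in.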

## Step 3: Configure the Model

In [3]:
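# Note: aws_secret_manager is not one of the packages installed in Step 1;
# it must already be importable from this notebook's environment
from aws_secret_manager import get_secret

HF_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
HF_TASK = "text-generation"
# Get the Hugging Face Hub token from AWS Secrets Manager
HF_TOKEN = get_secret()["HUGGINGFACE_HUB_TOKEN"]

# Configure the environment for the model container
env = {
    "HF_MODEL_ID": HF_MODEL_ID,
    "HF_TASK": HF_TASK,
    # Llama 3 is a gated model, so the container needs a token with access to it
    "HUGGING_FACE_HUB_TOKEN": HF_TOKEN,
    # Enable the OpenAI-compatible Messages API used in Steps 6 and 7
    "MESSAGES_API_ENABLED": "true",
}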
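If `aws_secret_manager` is not available, you can build a stand-in for `get_secret` on the `boto3` Secrets Manager client's `get_secret_value` call. The minimal sketch below assumes the token is stored as a JSON secret string containing a `HUGGINGFACE_HUB_TOKEN` key. The default secret ID `huggingface-hub-token` is a hypothetical name; replace it with your own secret's name or ARN.

```python
import json


def get_secret(secret_id="huggingface-hub-token", region_name=None, client=None):
    """Return a Secrets Manager secret parsed from JSON into a dict.

    secret_id is a hypothetical default. A pre-built client can be passed in,
    which also makes the function easy to exercise without AWS access.
    """
    if client is None:
        import boto3  # imported lazily so tests can inject a fake client instead

        client = boto3.client("secretsmanager", region_name=region_name)
    response = client.get_secret_value(SecretId=secret_id)
    # Text secrets come back in SecretString (binary ones in SecretBinary)
    return json.loads(response["SecretString"])
```

The injectable `client` parameter is a design choice for testability. In the notebook you would simply call `get_secret()["HUGGINGFACE_HUB_TOKEN"]` as in the cell above.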


## Step 4: Create the Model Object

In [4]:
import re
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Derive a SageMaker-compliant model name: every run of characters other than
# letters, digits and hyphens becomes a single hyphen
regex = r"[^\-a-zA-Z0-9]+"
model_name = re.sub(regex, "-", env["HF_MODEL_ID"])

# Look up the Hugging Face TGI (text-generation-inference) LLM container image
image_uri = get_huggingface_llm_image_uri("huggingface", version="2.0.2")
print(f"llm image uri: {image_uri}")

# Create the model object (no AWS resources are created until deploy())
model = HuggingFaceModel(
    name=model_name,
    env=env,  # configuration for loading the model from the Hub
    role=role,  # IAM role with permissions to create an endpoint
    image_uri=image_uri,
)


llm image uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.0.2-gpu-py310-cu121-ubuntu22.04
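The regex above only replaces disallowed characters. SageMaker's API reference also caps model and endpoint names at 63 characters and requires them to start and end with a letter or digit, so a longer or oddly punctuated Hub ID could still be rejected. The hedged sketch below adds those two rules. It assumes the pattern `^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}` documented for `EndpointName`. The helper names are hypothetical.

```python
import re

MAX_NAME_LENGTH = 63  # assumed SageMaker limit for model and endpoint names
NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")


def to_sagemaker_name(model_id, max_length=MAX_NAME_LENGTH):
    # Same substitution as the cell above
    name = re.sub(r"[^\-a-zA-Z0-9]+", "-", model_id)
    # Truncate, then drop hyphens so the name starts and ends alphanumerically
    name = name[:max_length].strip("-")
    if not name:
        raise ValueError(f"cannot derive a name from {model_id!r}")
    return name


def is_valid_sagemaker_name(name):
    return NAME_PATTERN.fullmatch(name) is not None


print(to_sagemaker_name("mistralai/Mistral-7B-Instruct-v0.3"))
```

Note that dots are also replaced, so a version suffix like `v0.3` becomes `v0-3`.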


## Step 5: Deploy the Model to AWS SageMaker

In [5]:
from datetime import datetime

instance_type = "ml.g5.2xlarge"  # 1x NVIDIA A10G (24 GB); fits Llama 3 8B's ~16 GB of 16-bit weights
init_instance_count = 1
health_check_timeout = 200  # seconds to wait for the container to download and load the model

# Generate a unique endpoint name using datetime
endpoint_name = f'{model_name}-{datetime.now().strftime("%Y-%m-%d-%H-%M-%S")}'
print(f"endpoint_name: {endpoint_name}")

# Deploy the model to an endpoint
predictor = model.deploy(
    endpoint_name=endpoint_name,
    instance_type=instance_type,
    initial_instance_count=init_instance_count,
    container_startup_health_check_timeout=health_check_timeout,
)

endpoint_name: meta-llama-Meta-Llama-3-8B-Instruct-2024-07-20-18-08-01


Using already existing model: meta-llama-Meta-Llama-3-8B-Instruct


------------!
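The timestamp suffix (`-` plus `%Y-%m-%d-%H-%M-%S`) is 20 characters long. Under a 63-character endpoint-name limit, that leaves 43 characters for the model name. `meta-llama-Meta-Llama-3-8B-Instruct` (35 characters) fits, but a longer Hub ID would make `deploy()` fail. The minimal sketch below trims the base name rather than the timestamp, assuming that 63-character limit. `build_endpoint_name` is a hypothetical helper.

```python
from datetime import datetime

MAX_ENDPOINT_NAME_LENGTH = 63  # assumed SageMaker limit for endpoint names
TIMESTAMP_FORMAT = "%Y-%m-%d-%H-%M-%S"  # same format as the deploy cell above


def build_endpoint_name(base_name, when=None):
    when = when or datetime.now()
    suffix = "-" + when.strftime(TIMESTAMP_FORMAT)
    # Shorten the base name, never the timestamp, so the result fits the limit
    budget = MAX_ENDPOINT_NAME_LENGTH - len(suffix)
    base = base_name[:budget].rstrip("-")
    return base + suffix


print(build_endpoint_name("meta-llama-Meta-Llama-3-8B-Instruct", datetime(2024, 7, 20, 18, 8, 1)))
```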

## Step 6: Test the Deployed Model

In [9]:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me about AWS SageMaker"},
]

In [11]:
# Build a Messages API request for the deployed endpoint
input_data = {
    "messages": messages,
    "model": HF_MODEL_ID,
    "max_tokens": 50,  # short limit, so the answer below stops mid-sentence
    "temperature": 0.7,
}

# Print the prediction result
print(predictor.predict(input_data)['choices'][0]['message']['content'])


AWS SageMaker is a fully managed service offered by Amazon Web Services (AWS) that provides a range of tools and capabilities to build, train, and deploy machine learning (ML) models. SageMaker simplifies the ML workflow by providing a comprehensive platform
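The answer above stops mid-sentence because `max_tokens` was set to 50. The `['choices'][0]['message']['content']` access implies the endpoint returns OpenAI-style chat-completion JSON. Assuming that shape, `choices[0]['finish_reason']` should tell you whether generation ended because it hit the token limit (`"length"`). The helper `summarize_completion` is hypothetical. The sample response below is hand-written to show the shape, not captured from this endpoint.

```python
def summarize_completion(response):
    """Pull the reply text and a truncation flag out of a chat-completion dict."""
    choice = response["choices"][0]
    # "length" means the max_tokens budget ran out before the model finished
    truncated = choice.get("finish_reason") == "length"
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "truncated": truncated,
        "completion_tokens": usage.get("completion_tokens"),
    }


# Hand-written example in the assumed response shape (not real endpoint output)
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "AWS SageMaker is"},
            "finish_reason": "length",
        }
    ],
    "usage": {"completion_tokens": 50},
}
summary = summarize_completion(sample)
if summary["truncated"]:
    print("Hit max_tokens; raise it for a complete answer.")
```

With the notebook's predictor, you would pass `predictor.predict(input_data)` to `summarize_completion` instead of `sample`.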


## Step 7: Invoke the Endpoint with the `sagemaker-runtime` Client

In [12]:
messages = [
    {"role": "system", "content": "You are a helpful financial advisor."},
    {"role": "user", "content": "What are the most profitable stocks of all time?"},
]

In [13]:
parameters = {
    "max_tokens": 4000,
    "temperature": 0.5,
    "top_p": 0.2,
    "model": HF_MODEL_ID,
}

In [16]:
import json

def invoke_endpoint_for_inference(client, endpoint_name, messages, parameters):
    # Build a Messages API request: the chat history plus generation parameters
    payload = {
        "messages": messages,
        **parameters,
    }
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )

    # Decode the JSON response body and return the assistant's reply
    response_body = response["Body"].read().decode("utf-8")
    response_json = json.loads(response_body)
    generated_text = response_json["choices"][0]["message"]["content"]
    return generated_text
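With `max_tokens` set to 4000, `invoke_endpoint` only returns once the whole answer has been generated, which can be slow and can run into invocation timeouts. The SageMaker Runtime API also offers `invoke_endpoint_with_response_stream`, whose `Body` is an event stream of `{"PayloadPart": {"Bytes": ...}}` events. The sketch below assumes the TGI container answers a request containing `"stream": true` with server-sent events. Each `data:` line is assumed to hold an OpenAI-style chunk (`choices[0].delta.content`), and the stream may end with `data: [DONE]`. The exact framing may vary between TGI versions. `iter_stream_text` and `stream_chat` are hypothetical helpers.

```python
import json


def iter_stream_text(event_stream):
    """Yield text deltas from an invoke_endpoint_with_response_stream body.

    Assumes server-sent events whose data lines carry chunks shaped like
    {"choices": [{"delta": {"content": "..."}}]}.
    """
    buffer = b""
    for event in event_stream:
        part = event.get("PayloadPart")
        if part is None:  # skip anything that is not a payload chunk
            continue
        buffer += part["Bytes"]
        # One SSE line may be split across several PayloadPart chunks
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if not line.startswith(b"data:"):
                continue  # blank separators and other SSE fields
            data = line[len(b"data:"):].strip()
            if data == b"[DONE]":
                return
            chunk = json.loads(data)
            for choice in chunk.get("choices", []):
                content = (choice.get("delta") or {}).get("content")
                if content:
                    yield content


def stream_chat(client, endpoint_name, messages, parameters):
    # Same payload as invoke_endpoint_for_inference, plus "stream": True
    payload = {"messages": messages, **parameters, "stream": True}
    response = client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    yield from iter_stream_text(response["Body"])
```

For example, `for text in stream_chat(client, endpoint_name, messages, parameters): print(text, end="", flush=True)` would print the answer as it is generated.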

In [17]:
# Create a SageMaker Runtime client from the boto3 session set up in Step 2, then call the endpoint
client = session.client(service_name="sagemaker-runtime", region_name=REGION_NAME)
generated_text = invoke_endpoint_for_inference(client, endpoint_name, messages, parameters)

print(generated_text)

What a great question! As a financial advisor, I'd be happy to share with you some of the most profitable stocks of all time. Keep in mind that past performance is not a guarantee of future results, and it's essential to do your own research and consider your individual financial goals and risk tolerance before investing in any stock.

That being said, here are some of the most profitable stocks of all time:

1. Amazon (AMZN) - Up over 100,000% since its IPO in 1997. Yes, you read that right! Amazon has been a game-changer in the e-commerce space and has disrupted numerous industries.
2. Microsoft (MSFT) - Up over 50,000% since its IPO in 1986. Microsoft's dominance in the software industry, particularly with its Windows operating system, has made it a household name.
3. Alphabet (GOOGL) - Up over 20,000% since its IPO in 2004. As the parent company of Google, Alphabet has revolutionized the way we search, advertise, and access information online.
4. Facebook (FB) - Up over 10,000% sin

## Clean up

In [None]:
# Delete the model first (it is looked up through the endpoint config),
# then the endpoint and its endpoint configuration, to stop incurring charges
predictor.delete_model()
predictor.delete_endpoint()
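If the `predictor` object is gone (say, after a kernel restart), you can do the same cleanup with the low-level `boto3` SageMaker client. The approach is to walk from the endpoint to its endpoint configuration and then to the models that configuration references. The sketch below assumes the endpoint still exists. `delete_endpoint_resources` is a hypothetical helper.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint, its endpoint config, and the models the config references."""
    # Collect the names first, while the endpoint and config can still be described
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [v["ModelName"] for v in config["ProductionVariants"] if "ModelName" in v]

    # Delete the endpoint, then its config, then the models it used
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return {"endpoint_config": config_name, "models": model_names}
```

Call it as `delete_endpoint_resources(boto3.client("sagemaker", region_name=...), endpoint_name)`, using the region the endpoint was deployed in.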