# Model Deployment with SageMaker SKLearn Container

This notebook trains a K-means anomaly-detection model locally with our custom training function (`train_kmeans`), then deploys it with the SageMaker SKLearn container: first in local mode, then to a serverless endpoint. Running everything locally first lets us test the training and inference code before moving to cloud deployment.


In [None]:
%store -r

%set_env MLFLOW_TRACKING_URI={mlflow_arn}
%set_env MLFLOW_EXPERIMENT_NAME=anomaly_detection
%set_env MLFLOW_RUN_NAME=training-kmeans

In [None]:
# Project training function, SageMaker SDK helpers, and general utilities
from steps.training_kmeans import train_kmeans
from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.spec.inference_spec import InferenceSpec
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.mode.function_pointers import Mode
import sagemaker
import pickle as pkl
import numpy as np

In [None]:
import tarfile

# X_train is restored by `%store -r` in the first cell
model, labels = train_kmeans(X_train, 3)

# Pickle the model to disk
with open('model.pkl', 'wb') as f:
    pkl.dump(model, f)

# Package the pickle as model.tar.gz, the archive format SageMaker expects
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('model.pkl')

# Upload the archive to the session's default S3 bucket under the 'models' prefix
sagemaker_session = sagemaker.Session()
model_s3_path = sagemaker_session.upload_data(
    path='model.tar.gz',
    bucket=sagemaker_session.default_bucket(),
    key_prefix='models',
)
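
Before deploying, it can help to confirm that the archive has the expected layout, with `model.pkl` at the root of `model.tar.gz`. The sketch below is a self-contained round trip using only the standard library. The dictionary standing in for the trained model and the helper names `pack_model` and `unpack_model` are illustrative assumptions, not part of this notebook's code.

```python
import pickle
import tarfile
import tempfile
from pathlib import Path


def pack_model(model, workdir):
    """Pickle `model` and wrap it in model.tar.gz with model.pkl at the archive root."""
    workdir = Path(workdir)
    pkl_path = workdir / "model.pkl"
    with open(pkl_path, "wb") as f:
        pickle.dump(model, f)
    archive_path = workdir / "model.tar.gz"
    with tarfile.open(archive_path, mode="w:gz") as archive:
        # arcname keeps the member name flat, wherever the file lives on disk
        archive.add(pkl_path, arcname="model.pkl")
    return archive_path


def unpack_model(archive_path):
    """Read model.pkl straight out of the archive and unpickle it."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        member = archive.extractfile("model.pkl")
        return pickle.load(member)


# Stand-in object; in the notebook this would be the fitted model.
fake_model = {"n_clusters": 3, "centers": [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]}

with tempfile.TemporaryDirectory() as tmp:
    archive_path = pack_model(fake_model, tmp)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        member_names = archive.getnames()
    restored = unpack_model(archive_path)

print(member_names)
print(restored == fake_model)
```

Only unpickle archives you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.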

# Local Development and Testing with SageMaker

SageMaker local mode allows you to develop and test your machine learning workflows on your local machine before deploying to the cloud. This helps accelerate the development cycle and reduce costs during the experimentation phase.

## Key Benefits
- Faster development iterations
- Cost-effective testing
- Early bug detection
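- No cloud compute instances needed during development (AWS credentials are still used to pull the container image and read model artifacts from S3)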
- Same APIs as cloud deployment

## Requirements
- Docker installed locally
- SageMaker Python SDK
- Sufficient local compute resources
- AWS credentials configured

## Local Mode Components

### Local Training
```python
# Example of local training
from sagemaker.estimator import Estimator

estimator = Estimator(
    ...,
    instance_type='local'  # Use local mode
)
estimator.fit()
```


In [None]:
from sagemaker.sklearn.model import SKLearnModel

sklearn_model = SKLearnModel(
    model_data=model_s3_path,
    role=sagemaker.get_execution_role(),
    source_dir='./src/',
    py_version='py3',
    framework_version='1.2-1',
    entry_point='inference.py'
)
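
The `entry_point='inference.py'` script in `./src/` is not shown in this notebook. The SKLearn container looks for hook functions such as `model_fn(model_dir)` and `predict_fn(input_data, model)` in that script. The following is a minimal sketch of what such a file might contain, assuming the model was saved as `model.pkl` and that predictions consist of cluster labels plus the distance to the nearest centroid. `StubKMeans` is a hypothetical one-dimensional stand-in so the example runs without scikit-learn; the real script would more likely return a NumPy array.

```python
import os
import pickle
import tempfile


def model_fn(model_dir):
    """Called once at container startup: load the model from the extracted archive."""
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)


def predict_fn(input_data, model):
    """Called per request: return cluster labels and distance to the nearest centroid."""
    clusters = list(model.predict(input_data))
    # transform() yields one row per sample with a distance to every centroid
    dists = [min(row) for row in model.transform(input_data)]
    return [clusters, dists]


class StubKMeans:
    """Hypothetical stand-in exposing the predict/transform methods used above."""

    centers = [0.0, 10.0]

    def transform(self, X):
        return [[abs(x[0] - c) for c in self.centers] for x in X]

    def predict(self, X):
        return [row.index(min(row)) for row in self.transform(X)]


# Exercise model_fn with a plain picklable object written to a temporary directory.
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, "model.pkl"), "wb") as f:
        pickle.dump({"stub": True}, f)
    loaded = model_fn(model_dir)

# Exercise predict_fn with the stub model.
clusters, dists = predict_fn([[1.0], [9.0], [30.0]], StubKMeans())
print(loaded)
print(clusters, dists)
```

Returning the two sequences together is what would let a caller unpack the response as `clusters, dists`.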

In [None]:

predictor = sklearn_model.deploy(
    instance_type='local',
    initial_instance_count=1,
)

In [None]:
clusters, dists = predictor.predict(X_test[0:1000].values)

In [None]:
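# Indices of samples whose distance exceeds the threshold of 16 (flagged as anomalies)
print(np.where(dists > 16))
# Indices of samples labelled as anomalies, for comparison
np.where(y_test[0:1000].values == 1)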
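
Comparing the two index lists by eye gets harder as the sample grows. One way to quantify the overlap is to compute precision and recall for the flagged set. This is a dependency-free sketch: the threshold of 16 comes from the cell above, while the distances, labels, and helper names are made up for illustration.

```python
def flag_anomalies(dists, threshold):
    """Return the indices whose distance exceeds the threshold."""
    return {i for i, d in enumerate(dists) if d > threshold}


def precision_recall(flagged, actual):
    """Compare flagged indices against the indices labelled as anomalies."""
    true_positives = len(flagged & actual)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Made-up values standing in for `dists` and `y_test` from the notebook.
dists = [2.5, 17.0, 3.1, 40.2, 16.0, 21.7]
labels = [0, 1, 0, 1, 1, 0]

flagged = flag_anomalies(dists, threshold=16)
actual = {i for i, y in enumerate(labels) if y == 1}
precision, recall = precision_recall(flagged, actual)

print(sorted(flagged))
print(precision, recall)
```

Note that the comparison is strict, so a sample sitting exactly at the threshold is not flagged.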

# Deploying Models to SageMaker Serverless Endpoints

SageMaker serverless endpoints let you deploy machine learning models without managing the underlying infrastructure: AWS provisions the compute resources and scales them with the request volume.

## Key Benefits
- No infrastructure management required
- Pay only for compute time used during inference
- Automatic scaling based on workload
- Reduced operational complexity

## Deployment Process

1. **Create a Model**
   - Register your trained model in SageMaker
   - Specify the model artifacts and inference code

2. **Create Endpoint Configuration**
   - Specify serverless configuration parameters:
     - Memory size (1024 to 6144 MB, in 1 GB increments)
     - Maximum concurrency (1 to 200)
   - Choose model variant(s) for deployment

3. **Create Endpoint**
   - Deploy using the endpoint configuration
   - Wait for endpoint status to become "InService"
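
The three steps above map onto three SageMaker API calls. As a rough sketch, the helper below builds the request arguments for each call and checks the serverless limits listed above, without contacting AWS. The function name, resource names, and angle-bracket placeholders are assumptions for illustration; the dictionary keys follow the `CreateModel`, `CreateEndpointConfig`, and `CreateEndpoint` request shapes.

```python
VALID_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_requests(name, image_uri, model_data_url, role_arn,
                              memory_size_mb=2048, max_concurrency=1):
    """Build keyword arguments for the three API calls, without calling AWS."""
    if memory_size_mb not in VALID_MEMORY_SIZES_MB:
        raise ValueError(f"memory_size_mb must be one of {VALID_MEMORY_SIZES_MB}")
    if not 1 <= max_concurrency <= 200:
        raise ValueError("max_concurrency must be between 1 and 200")

    # Step 1: register the model artifacts and the serving container
    create_model = {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
    }
    # Step 2: describe how the model is served (serverless, one variant)
    create_endpoint_config = {
        "EndpointConfigName": f"{name}-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "ServerlessConfig": {
                "MemorySizeInMB": memory_size_mb,
                "MaxConcurrency": max_concurrency,
            },
        }],
    }
    # Step 3: create the endpoint from the configuration
    create_endpoint = {
        "EndpointName": f"{name}-endpoint",
        "EndpointConfigName": f"{name}-config",
    }
    return create_model, create_endpoint_config, create_endpoint


model_req, config_req, endpoint_req = build_serverless_requests(
    name="kmeans-anomaly",                               # hypothetical name
    image_uri="<sklearn-container-image-uri>",           # placeholder
    model_data_url="s3://<bucket>/models/model.tar.gz",  # placeholder
    role_arn="<execution-role-arn>",                     # placeholder
    memory_size_mb=4096,
    max_concurrency=10,
)
print(config_req["ProductionVariants"][0]["ServerlessConfig"])
```

With boto3, these dictionaries could be passed as `client.create_model(**model_req)`, `client.create_endpoint_config(**config_req)`, and `client.create_endpoint(**endpoint_req)`, followed by the `endpoint_in_service` waiter. The SageMaker Python SDK's `deploy` method wraps these steps, and a script-based model such as the SKLearn one would need extra container settings that this sketch omits.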

## Example Configuration Parameters
```python
serverless_config = {
    "MemorySizeInMB": 4096,
    "MaxConcurrency": 10
}
```

In [None]:
from sagemaker.serverless.serverless_inference_config import ServerlessInferenceConfig

sklearn_model_serverless = SKLearnModel(
    model_data=model_s3_path,
    role=sagemaker.get_execution_role(),
    source_dir='./src/',
    py_version='py3',
    framework_version='1.2-1',
    entry_point='inference.py'
)

predictor_serverless = sklearn_model_serverless.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=2048, max_concurrency=1,
    )
)

In [None]:
predictor_serverless.predict(X_test[0:1].values)