# SageMaker Autopilot

Based on: https://aws.amazon.com/blogs/aws/amazon-sagemaker-autopilot-fully-managed-automatic-machine-learning/

## AWSCLI Setup
First, the awscli tool was installed from the `conda-forge` channel. The access parameters were taken from https://www.rosettahub.com/ under "Show AWS API Keys" and the file was stored in `~/.aws/credentials`. The line `[default]` was added to the top, as no other AWS configuration was present on my own system.

The following file was saved under `~/.aws/config`:

```
[default]
region=eu-west-1
output=json
```
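As a quick sanity check, the profile layout can be parsed with Python's standard `configparser`. This is only a sketch: the AWS CLI uses its own parser, and the helper names `read_profile` and `has_profile_header` are made up for illustration. It also shows why the `[default]` line matters, since an INI-style file without a section header is rejected.

```python
import configparser

# Contents of ~/.aws/config as shown above, inlined so the sketch runs anywhere.
CONFIG_TEXT = """\
[default]
region=eu-west-1
output=json
"""


def read_profile(text: str, profile: str = "default") -> dict:
    """Return the key/value pairs of one profile section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser[profile])


def has_profile_header(text: str) -> bool:
    """Return False if the text has key/value lines before any [section] header."""
    try:
        configparser.ConfigParser().read_string(text)
    except configparser.MissingSectionHeaderError:
        return False
    return True


settings = read_profile(CONFIG_TEXT)
print(settings["region"], settings["output"])
print(has_profile_header("region=eu-west-1\n"))
```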


The preprocessed data (see my TransmogrifAI project) was uploaded into the bucket `lukasjautomlbuck`:

In [13]:
%%bash
# aws s3 mb s3://lukasjautomlbuck
defaultbucket="lukasjautomlbuck"
# Data directory inside the repository (no trailing slash; the paths below add one)
repodir="$(git rev-parse --show-toplevel)/TransmogrifAI/LukasJansen"
echo "$defaultbucket"
# Upload the train/test splits of both datasets
aws s3 cp "$repodir/college_train_headerfix.csv" s3://"$defaultbucket"/college/input_train
aws s3 cp "$repodir/college_test_headerfix.csv" s3://"$defaultbucket"/college/input_test
aws s3 cp "$repodir/phishing_train_headerfix.csv" s3://"$defaultbucket"/phishing/input_train
aws s3 cp "$repodir/phishing_test_headerfix.csv" s3://"$defaultbucket"/phishing/input_test
# List the uploaded objects
aws s3 ls --recursive "$defaultbucket"


lukasjautomlbuck
upload: ../../TransmogrifAI/LukasJansen/college_train_headerfix.csv to s3://lukasjautomlbuck/college/input_train
upload: ../../TransmogrifAI/LukasJansen/college_test_headerfix.csv to s3://lukasjautomlbuck/college/input_test
upload: ../../TransmogrifAI/LukasJansen/phishing_train_headerfix.csv to s3://lukasjautomlbuck/phishing/input_train
upload: ../../TransmogrifAI/LukasJansen/phishing_test_headerfix.csv to s3://lukasjautomlbuck/phishing/input_test
2021-11-11 15:42:37     724520 college/input_test
2021-11-11 15:42:33    2246087 college/input_train
2021-11-11 15:42:43     251623 phishing/input_test
2021-11-11 15:42:40     591152 phishing/input_train
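The same upload could also be done from Python. The sketch below is an assumption-laden equivalent of the shell cell, not what was actually run: `upload_plan` and `upload_all` are hypothetical helper names, and only `boto3`'s `upload_file` is a real library call. The plan is separated from the upload so the key layout can be inspected without AWS access.

```python
from pathlib import Path

DATASETS = ("college", "phishing")
SPLITS = ("train", "test")


def upload_plan(repodir):
    """Map each local CSV to the S3 key used in the shell cell above."""
    base = Path(repodir)
    return [
        (base / f"{dataset}_{split}_headerfix.csv", f"{dataset}/input_{split}")
        for dataset in DATASETS
        for split in SPLITS
    ]


def upload_all(bucket, repodir):
    """Upload every planned file; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the plan can be inspected without boto3

    s3 = boto3.client("s3")
    for local_path, key in upload_plan(repodir):
        s3.upload_file(str(local_path), bucket, key)


# Print the mapping only; call upload_all("lukasjautomlbuck", ...) to upload.
for local_path, key in upload_plan("TransmogrifAI/LukasJansen"):
    print(local_path, "->", key)
```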


Next, under `https://eu-west-1.console.aws.amazon.com/sagemaker/home?region=eu-west-1#/studio/create-domain`, a new role with access to any bucket was created. Again, this is not a domain but an execution role; the option is found under "User profile".

In [None]:
%%bash
aws iam list-roles | grep SageMaker  
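A Python counterpart to the `grep` above might look like the following sketch. It filters on the role name only, which is narrower than grepping the whole JSON output. The function names and the sample roles (with a placeholder account ID) are hypothetical; `list_roles` and its paginator are the real `boto3` IAM calls.

```python
def sagemaker_role_arns(roles):
    """Return the ARNs of roles whose name contains 'SageMaker'."""
    return [role["Arn"] for role in roles if "SageMaker" in role["RoleName"]]


def list_all_roles():
    """Fetch all IAM roles; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the filter can be tried without boto3

    iam = boto3.client("iam")
    roles = []
    for page in iam.get_paginator("list_roles").paginate():
        roles.extend(page["Roles"])
    return roles


# Hypothetical sample in the shape of the list_roles response.
sample_roles = [
    {"RoleName": "AmazonSageMaker-ExecutionRole-EXAMPLE",
     "Arn": "arn:aws:iam::000000000000:role/service-role/AmazonSageMaker-ExecutionRole-EXAMPLE"},
    {"RoleName": "SomeOtherRole",
     "Arn": "arn:aws:iam::000000000000:role/SomeOtherRole"},
]
print(sagemaker_role_arns(sample_roles))
```

With credentials configured, `sagemaker_role_arns(list_all_roles())` would return the candidate execution role ARNs.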

## Running

In [37]:
import datetime
bucket="lukasjautomlbuck"
arn="arn:aws:iam::573849816758:role/service-role/AmazonSageMaker-ExecutionRole-20211112T113938"

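def createjobconfig(problem: str, targetfield: str, minutes: int = 10):
    """Build the job name and the config arguments for sm.create_auto_ml_job."""
    # Training data: every object under s3://<bucket>/<problem>/input_train
    input_data_config = [{
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': f's3://{bucket}/{problem}/input_train'
            }
        },
        # Column that Autopilot should learn to predict
        'TargetAttributeName': targetfield
    }]
    output_data_config = {
        'S3OutputPath': f's3://{bucket}/{problem}/output'
    }
    # Limits the runtime of each individual training job, not of the whole AutoML job
    jobconfig = {"CompletionCriteria": {"MaxRuntimePerTrainingJobInSeconds": 60 * minutes}}
    # Job name with a day-month-hour-minute timestamp
    name = 'automl-dm-' + datetime.datetime.now().strftime("%d-%m-%H-%M")
    return (name, input_data_config, output_data_config, jobconfig)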

createjobconfig("college", "percent_pell_grant")

('automl-dm-12-11-12-20',
 [{'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix',
     'S3Uri': 's3://lukasjautomlbuck/college/input_train'}},
   'TargetAttributeName': 'percent_pell_grant'}],
 {'S3OutputPath': 's3://lukasjautomlbuck/college/output'},
 {'CompletionCriteria': {'MaxRuntimePerTrainingJobInSeconds': 600}})
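Because the job name is built from a timestamp with minute resolution, two jobs created within the same minute would get the same name. A possible variant adds seconds and checks the name before it is sent. This is a sketch: `make_job_name` is a hypothetical helper, and the 32-character limit and the character pattern are assumptions that should be verified against the `CreateAutoMLJob` API reference.

```python
import datetime
import re

# Assumed constraints; verify against the CreateAutoMLJob API reference.
MAX_NAME_LENGTH = 32
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")


def make_job_name(prefix="automl-dm-", now=None):
    """Return prefix + day-month-hour-minute-second, or raise ValueError."""
    now = now or datetime.datetime.now()
    name = prefix + now.strftime("%d-%m-%H-%M-%S")
    if len(name) > MAX_NAME_LENGTH or not NAME_PATTERN.match(name):
        raise ValueError(f"invalid job name: {name!r}")
    return name


print(make_job_name(now=datetime.datetime(2021, 11, 12, 12, 21, 2)))
```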

In [38]:
import boto3
sm = boto3.client('sagemaker')

name, input_data_config, output_data_config, jobconfig = createjobconfig("college", "percent_pell_grant")
sm.create_auto_ml_job(AutoMLJobName=name,
                      AutoMLJobConfig=jobconfig,
                      InputDataConfig=input_data_config,
                      OutputDataConfig=output_data_config,
                      RoleArn=arn)

{'AutoMLJobArn': 'arn:aws:sagemaker:eu-west-1:573849816758:automl-job/automl-dm-12-11-12-21',
 'ResponseMetadata': {'RequestId': 'f34ef19e-9833-4ced-9f61-56069dba3388',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'f34ef19e-9833-4ced-9f61-56069dba3388',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '92',
   'date': 'Fri, 12 Nov 2021 11:21:02 GMT'},
  'RetryAttempts': 0}}

In [40]:
name2, input_data_config, output_data_config, jobconfig = createjobconfig("phishing", "Result")
sm.create_auto_ml_job(AutoMLJobName=name2,
                      AutoMLJobConfig=jobconfig,
                      InputDataConfig=input_data_config,
                      OutputDataConfig=output_data_config,
                      RoleArn=arn)

{'AutoMLJobArn': 'arn:aws:sagemaker:eu-west-1:573849816758:automl-job/automl-dm-12-11-12-22',
 'ResponseMetadata': {'RequestId': 'dfc939b2-b405-4ed7-8334-50cfb5e99fb7',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'dfc939b2-b405-4ed7-8334-50cfb5e99fb7',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '92',
   'date': 'Fri, 12 Nov 2021 11:22:04 GMT'},
  'RetryAttempts': 0}}

In [64]:
print(sm.describe_auto_ml_job(AutoMLJobName=name)["AutoMLJobStatus"])
print(sm.describe_auto_ml_job(AutoMLJobName=name)["AutoMLJobSecondaryStatus"])
print(sm.describe_auto_ml_job(AutoMLJobName=name2)["AutoMLJobStatus"])
print(sm.describe_auto_ml_job(AutoMLJobName=name2)["AutoMLJobSecondaryStatus"])
#sm.stop_auto_ml_job(AutoMLJobName=name)

InProgress
ModelTuning
InProgress
ModelTuning
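Instead of re-running the status cell by hand, the job can be polled until it reaches a terminal state. The helper below is a minimal sketch; `wait_for_job` is a hypothetical name, and the `describe` and `sleep` arguments are injectable so the loop can be tried without an AWS connection.

```python
import time

# Statuses after which the job no longer changes.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_job(describe, job_name, poll_seconds=60, sleep=time.sleep):
    """Poll until the job reaches a terminal status and return that status.

    `describe` is any callable that accepts the keyword argument
    AutoMLJobName and returns a dict shaped like the response of
    sm.describe_auto_ml_job.
    """
    while True:
        response = describe(AutoMLJobName=job_name)
        status = response["AutoMLJobStatus"]
        print(status, response["AutoMLJobSecondaryStatus"])
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
```

With the client from above this would be called as `wait_for_job(sm.describe_auto_ml_job, name)`.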


In [62]:
print("College:")
candidates = sm.list_candidates_for_auto_ml_job(AutoMLJobName=name, SortBy='FinalObjectiveMetricValue')['Candidates']
for i, c in enumerate(candidates):
    print(f"{i} {c['CandidateName']} {c['FinalAutoMLJobObjectiveMetric']}")
print("Phishing:")
candidates2 = sm.list_candidates_for_auto_ml_job(AutoMLJobName=name2, SortBy='FinalObjectiveMetricValue')['Candidates']
for i, c in enumerate(candidates2):
    print(f"{i} {c['CandidateName']} {c['FinalAutoMLJobObjectiveMetric']}")

College:
0 automl-dm-12-11-12-21utjOKJZ8SOa-040-4ddd9732 {'MetricName': 'validation:objective_loss', 'Value': 0.35428234934806824}
1 automl-dm-12-11-12-21utjOKJZ8SOa-033-8478dd13 {'MetricName': 'validation:objective_loss', 'Value': 0.21350312232971191}
2 automl-dm-12-11-12-21utjOKJZ8SOa-032-880052ed {'MetricName': 'validation:objective_loss', 'Value': 0.09240235388278961}
3 automl-dm-12-11-12-21utjOKJZ8SOa-027-2524bb8c {'MetricName': 'validation:objective_loss', 'Value': 0.053189948201179504}
4 automl-dm-12-11-12-21utjOKJZ8SOa-013-7cde648a {'MetricName': 'validation:mse', 'Value': 0.04002000018954277}
5 automl-dm-12-11-12-21utjOKJZ8SOa-012-86db4661 {'MetricName': 'validation:mse', 'Value': 0.0399399995803833}
6 automl-dm-12-11-12-21utjOKJZ8SOa-011-c736f93c {'MetricName': 'validation:mse', 'Value': 0.03970000147819519}
7 automl-dm-12-11-12-21utjOKJZ8SOa-047-b0dcce31 {'MetricName': 'validation:mse', 'Value': 0.03791999816894531}
8 automl-dm-12-11-12-21utjOKJZ8SOa-052-d018c77d {'MetricNam
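In the College listing the metric values decrease with the index, so index 0 carries the largest loss. For a metric that should be minimized, the candidate can be picked by value rather than by position. The sketch below uses a hypothetical helper `pick_candidate` and two rows copied from the listing. It compares values across the different metric names shown, which assumes they are on a comparable scale. Once a job has progressed far enough, the `describe_auto_ml_job` response should also contain a `BestCandidate` entry that avoids the manual selection.

```python
def pick_candidate(candidates, minimize=True):
    """Return the candidate with the lowest (or highest) objective value."""
    def value(candidate):
        return candidate["FinalAutoMLJobObjectiveMetric"]["Value"]

    return min(candidates, key=value) if minimize else max(candidates, key=value)


# Rows 0 and 4 of the College listing above.
rows = [
    {"CandidateName": "automl-dm-12-11-12-21utjOKJZ8SOa-040-4ddd9732",
     "FinalAutoMLJobObjectiveMetric": {"MetricName": "validation:objective_loss",
                                       "Value": 0.35428234934806824}},
    {"CandidateName": "automl-dm-12-11-12-21utjOKJZ8SOa-013-7cde648a",
     "FinalAutoMLJobObjectiveMetric": {"MetricName": "validation:mse",
                                       "Value": 0.04002000018954277}},
]
print(pick_candidate(rows)["CandidateName"])
```

For a metric that should be maximized, `pick_candidate(candidates2, minimize=False)` selects the highest value instead.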

In [63]:
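# Timestamped model names, following the job naming scheme above
model_name = 'automl-colmodel-' + datetime.datetime.now().strftime("%d-%m-%H-%M")
# NOTE: candidates[0] is the first row of the listing above, i.e. the largest
# metric value; for a loss metric that is not the lowest-error candidate.
best_candidate = candidates[0]
model = sm.create_model(Containers=best_candidate['InferenceContainers'],
                        ModelName=model_name,
                        ExecutionRoleArn=arn)

model_name2 = 'automl-phimodel-' + datetime.datetime.now().strftime("%d-%m-%H-%M")
best_candidate2 = candidates2[0]
model2 = sm.create_model(Containers=best_candidate2['InferenceContainers'],
                         ModelName=model_name2,
                         ExecutionRoleArn=arn)

# Show the inference containers of the selected College candidate
best_candidate['InferenceContainers']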

[{'Image': '141502667606.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-sklearn-automl:2.4-1-cpu-py3',
  'ModelDataUrl': 's3://lukasjautomlbuck/college/output/automl-dm-12-11-12-21/data-processor-models/automl-dm-12-11-12-21-dpp5-1-c72b5620be334af087c05eb398134eeeb9/output/model.tar.gz',
  'Environment': {'AUTOML_SPARSE_ENCODE_RECORDIO_PROTOBUF': '1',
   'AUTOML_TRANSFORM_MODE': 'feature-transform',
   'SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT': 'application/x-recordio-protobuf',
   'SAGEMAKER_PROGRAM': 'sagemaker_serve',
   'SAGEMAKER_SUBMIT_DIRECTORY': '/opt/ml/model/code'}},
 {'Image': '438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest',
  'ModelDataUrl': 's3://lukasjautomlbuck/college/output/automl-dm-12-11-12-21/tuning/automl-dm--dpp5-ll/automl-dm-12-11-12-21utjOKJZ8SOa-040-4ddd9732/output/model.tar.gz',
  'Environment': {'MAX_CONTENT_LENGTH': '20971520',
   'SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT': 'text/csv'}}]