In [None]:
import boto3
import sagemaker
import os
from sagemaker import get_execution_role

region = boto3.session.Session().region_name

role = get_execution_role()

This notebook demonstrates SageMaker inference in bring-your-own-container (BYOC) mode, so the first step is to package the container.

The base image is an AWS Deep Learning Container. The list of available images is at https://aws.amazon.com/cn/releasenotes/available-deep-learning-containers-images/

Remember to change the base image to match the region you are using.
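As a rough illustration of switching the base image by region, the sketch below builds the image URI from a small lookup table. The helper name `dlc_base_image` is hypothetical, and the table only holds the two registry accounts that appear in this notebook's Dockerfile; any other region has to be looked up in the AWS list linked above.

```python
# Registry accounts taken from the Dockerfile in this notebook.
# Other regions are intentionally left out (assumption: look them up in the AWS list).
DLC_REGISTRY_ACCOUNTS = {
    "us-west-2": "763104351884",
    "cn-north-1": "727897471807",
}


def dlc_base_image(region, repo_and_tag="tensorflow-training:1.15.2-cpu-py36-ubuntu18.04"):
    """Return the base image URI for `region` (hypothetical helper)."""
    if region not in DLC_REGISTRY_ACCOUNTS:
        raise KeyError(f"No registry account recorded for region {region!r}")
    # China regions use the amazonaws.com.cn domain suffix.
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    account = DLC_REGISTRY_ACCOUNTS[region]
    return f"{account}.dkr.ecr.{region}.{suffix}/{repo_and_tag}"


print(dlc_base_image("cn-north-1"))
```

The printed value is what would go on the `FROM` line of the Dockerfile for that region.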

# Two ways to build the container (choose one)

## 1. Build it yourself (slower in the China regions)

## 2. Use the existing image (recommended)

In [None]:
# Run this cell only once to create the repository in ECR
import boto3

account_id = boto3.client('sts').get_caller_identity().get('Account')
ecr_repository = 'spoken-language-identification-sagemaker-inference-container'
tag = ':latest'
uri_suffix = 'amazonaws.com'
if region in ['cn-north-1', 'cn-northwest-1']:
    uri_suffix = 'amazonaws.com.cn'
inference_repository_uri = '{}.dkr.ecr.{}.{}/{}'.format(account_id, region, uri_suffix, ecr_repository + tag)
print(inference_repository_uri)
ecr = '{}.dkr.ecr.{}.{}'.format(account_id, region, uri_suffix)

!aws ecr create-repository --repository-name $ecr_repository
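Rerunning the `create-repository` command fails once the repository exists. A minimal sketch of a rerunnable alternative with boto3 follows; `ensure_repository` is a hypothetical helper name, and `ecr_client` is assumed to be `boto3.client("ecr")`.

```python
def ensure_repository(ecr_client, name):
    """Create the ECR repository if needed and return its URI.

    `ecr_client` is expected to be `boto3.client("ecr")`; it is passed in so the
    helper (a hypothetical name) can also be exercised with a stub.
    """
    try:
        created = ecr_client.create_repository(repositoryName=name)
        return created["repository"]["repositoryUri"]
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        # The repository exists from an earlier run; look up its URI instead.
        found = ecr_client.describe_repositories(repositoryNames=[name])
        return found["repositories"][0]["repositoryUri"]
```

In this notebook it could be called as `ensure_repository(boto3.client('ecr'), ecr_repository)`.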

## 1. Build it yourself (slower in the China regions)

In [1]:
%%writefile Dockerfile
# https://aws.amazon.com/cn/releasenotes/available-deep-learning-containers-images/
# FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-training:1.15.2-cpu-py36-ubuntu18.04
FROM 727897471807.dkr.ecr.cn-north-1.amazonaws.com.cn/tensorflow-training:1.15.2-cpu-py36-ubuntu18.04

RUN apt-get -y update && apt-get install -y --no-install-recommends \
         wget \
         nginx \
         libsm6 \
         libxrender1 \
         libglib2.0-dev \
         libxext6 \
         libsndfile1 \
         libsndfile-dev \
         libgmp-dev \
         libsox-dev \
    && rm -rf /var/lib/apt/lists/*

# RUN mkdir /opt/ml/code
WORKDIR /opt/ml/code
COPY slr ./

RUN pip install --upgrade pip
RUN pip install -r requirements.txt 
#-i https://mirrors.163.com/pypi/simple/

# Here we get all python packages.
# There's substantial overlap between scipy and numpy that we eliminate by
# linking them together. Likewise, pip leaves the install caches populated which uses
# a significant amount of space. These optimizations save a fair amount of space in the
# image, which reduces start up time.

RUN pip install flask gevent gunicorn boto3 && \
        rm -rf /root/.cache

WORKDIR /opt/
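# The dated directory name inside the archive changes with each upstream build,
# so extract into a fixed directory instead of relying on it.
RUN wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz && xz -d ffmpeg-git-amd64-static.tar.xz \
    && mkdir -p /opt/ffmpeg-static \
    && tar xvf ffmpeg-git-amd64-static.tar -C /opt/ffmpeg-static --strip-components=1
WORKDIR /opt/ffmpeg-static
RUN cp ffmpeg  ffprobe  qt-faststart  /usr/bin/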
    
WORKDIR /opt/
RUN wget http://downloads.xiph.org/releases/ogg/libogg-1.3.4.tar.gz \
    && tar -zvxf libogg-1.3.4.tar.gz 
WORKDIR /opt/libogg-1.3.4 
RUN ./configure && make && make install
    
WORKDIR /opt/
RUN wget http://downloads.xiph.org/releases/vorbis/libvorbis-1.3.6.tar.gz \
    && tar -zvxf libvorbis-1.3.6.tar.gz
WORKDIR /opt/libvorbis-1.3.6 
RUN ./configure && make && make install
    
WORKDIR /opt/
RUN wget https://ftp.osuosl.org/pub/xiph/releases/flac/flac-1.3.3.tar.xz \
    && xz -d flac-1.3.3.tar.xz \
    && tar xvf flac-1.3.3.tar
WORKDIR /opt/flac-1.3.3 
RUN ./configure && make && make install \
    && ln -s /usr/local/bin/flac /usr/bin/flac
    
WORKDIR /opt/    
RUN wget https://jaist.dl.sourceforge.net/project/sox/sox/14.4.2/sox-14.4.2.tar.gz \
    && tar -zvxf sox-14.4.2.tar.gz
WORKDIR /opt/sox-14.4.2 
RUN ./configure \
    && make && make install \
    && ln -s /usr/local/bin/sox /usr/bin/sox \
    && ln -s /usr/local/bin/soxi /usr/bin/soxi


# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard
# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE
# keeps Python from writing the .pyc files which are unnecessary in this case. We also update
# PATH so that the train and serve programs are found when the container is invoked.
WORKDIR /opt/ml/code

ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/ml/code/:${PATH}"

ENTRYPOINT ["python3"]

Overwriting Dockerfile


In [None]:
inference_repository_uri

### Build and push

In [None]:
!aws ecr get-login-password --region cn-north-1 | docker login --username AWS --password-stdin 727897471807.dkr.ecr.cn-north-1.amazonaws.com.cn

In [None]:
!aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $ecr

# Build the Docker image, then tag it and push it to the ECR repository
!docker build -t $ecr_repository ./

!docker tag {ecr_repository + tag} $inference_repository_uri
!docker push $inference_repository_uri

## 2. Use the existing Docker image

Pull the existing image to the local machine, then push it to your own ECR repository.

In [None]:
# Do not change the command below
!aws ecr get-login-password --region cn-north-1 | docker login --username AWS --password-stdin 346044390830.dkr.ecr.cn-north-1.amazonaws.com.cn

In [None]:
exist_image = '346044390830.dkr.ecr.cn-north-1.amazonaws.com.cn/spoken-language-identification-sagemaker-inference-container:latest'

In [None]:
!docker pull $exist_image

In [None]:
!aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $ecr

In [None]:
!docker tag $exist_image $inference_repository_uri
!docker push $inference_repository_uri

## Inference (works with either build option)

In [None]:
inference_repository_uri

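### Note

**Change `model_uri` below to the S3 path of the model produced in the training stage, in the form s3://YOUR_BUCKET_NAME/spoken/output/tensorflow-training-x-x-x-x-x-x-x/output/model.tar.gz**. To find it, open the training job in the console and copy the "S3 model artifact" value from the job's details page.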
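The same value can also be read programmatically instead of copied from the console. A minimal sketch, assuming `sm_client` is `boto3.client("sagemaker")` and that you know the training job name; `model_uri_from_training_job` is a hypothetical helper name.

```python
def model_uri_from_training_job(sm_client, training_job_name):
    """Look up the S3 model artifact of a finished training job.

    `sm_client` is expected to be `boto3.client("sagemaker")`.
    """
    job = sm_client.describe_training_job(TrainingJobName=training_job_name)
    status = job["TrainingJobStatus"]
    if status != "Completed":
        # Only a completed job is treated as having a usable model artifact.
        raise RuntimeError(f"Training job {training_job_name!r} is {status}, not Completed")
    return job["ModelArtifacts"]["S3ModelArtifacts"]
```

The returned string has the `s3://.../output/model.tar.gz` form described above and can be assigned to `model_uri`.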

In [None]:
image = inference_repository_uri
# update model_uri to your model S3 uri
model_uri = 'YOUR_MODEL_URI'


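An inference request is a JSON object with the following fields:

bucket: the S3 bucket that holds the audio file to run inference on

audio_uri: the S3 key of that audio file, without the bucket name

class_count: the number of languages the model was trained on. It must match training: if the model was trained on 5 languages, write 5 here.

**Upload the audio file to S3 before sending the inference request.**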
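The upload step can be sketched as below. The helper name `upload_audio` and the default `audio` prefix are illustrative assumptions; `s3_client` is assumed to be `boto3.client("s3")`.

```python
import os


def upload_audio(s3_client, local_path, bucket, prefix="audio"):
    """Upload one audio file and return the request fields that refer to it.

    `s3_client` is expected to be `boto3.client("s3")`. The helper name and the
    default `prefix` are illustrative assumptions.
    """
    key = f"{prefix.strip('/')}/{os.path.basename(local_path)}"
    s3_client.upload_file(local_path, bucket, key)
    # audio_uri is the object key only; the bucket travels in its own field.
    return {"bucket": bucket, "audio_uri": key}
```

The returned dictionary still needs a `class_count` entry before it is serialized and sent.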

In [None]:
import json
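# bucket is the name of the bucket that holds the audio file; audio_uri is the object key inside that bucket, without the bucket name.
# Example: demo1.mp3 uploaded to the audio/ directory of a bucket named test has the S3 URI s3://test/audio/demo1.mp3, so bucket is 'test' and audio_uri is 'audio/demo1.mp3'.
bucket = 'YOUR_BUCKET_STORE_AUDIO_TO_INFERENCE'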
audio_uri = 'xxxxxxxx.mp3'

test_data = {
    'bucket' : bucket,
    'audio_uri' : audio_uri,
    'class_count' : '3'
}
payload = json.dumps(test_data)
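If you only have the full S3 URI of the audio file, the split into `bucket` and `audio_uri` can be done in code. This is a small sketch; `payload_from_s3_uri` is a hypothetical helper, not part of the original notebook.

```python
import json
from urllib.parse import urlparse


def payload_from_s3_uri(s3_uri, class_count):
    """Build the JSON request body from a full s3://bucket/key URI."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"Expected s3://bucket/key, got {s3_uri!r}")
    return json.dumps({
        "bucket": parsed.netloc,
        "audio_uri": parsed.path.lstrip("/"),
        # The cell above sends class_count as a string, so this sketch does too.
        "class_count": str(class_count),
    })


print(payload_from_s3_uri("s3://test/audio/demo1.mp3", 3))
```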


### Method 1: Using sagemaker SDK

In [None]:
# Change the values below as needed

initial_instance_count = 1
instance_type = 'ml.m5.large'
endpoint_name = 'spl-endpoint-July'

In [None]:
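# Create the model
# Note: this uses the SageMaker Python SDK v1 API; in SDK v2 the `image` parameter is named `image_uri`.

from sagemaker.model import Model
image = inference_repository_uri
tfModel = Model(
            model_data=model_uri,
            role=role,
            image=image)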

In [None]:
# Create the endpoint

tfModel.deploy(
    initial_instance_count=initial_instance_count,
    instance_type=instance_type,
    endpoint_name=endpoint_name)

In [None]:
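# Create a predictor for sending inference requests
# Note: SDK v1 API; in SDK v2 this class is `Predictor` and the argument is `endpoint_name`.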

new_predictor = sagemaker.predictor.RealTimePredictor(
    endpoint=endpoint_name,
    content_type='application/json')

In [None]:
# Send the inference request

new_sm_response = new_predictor.predict(payload)

print(json.loads(new_sm_response.decode()))

### Method 2: Using boto3 SDK

In [None]:
# Change the values below as needed
# endpoint_name must not already exist; if you created the endpoint with Method 1, choose a different name here.

model_name = 'spl-demo'
endpoint_config_name = 'spl-endpoint-config-July'
variant_name = 'spl-variant-1-June'
initial_instance_count = 1
instance_type = 'ml.m5.large'
endpoint_name = 'spl-endpoint-July'

In [None]:
import boto3

sm_client = boto3.client('sagemaker')

# create model object

spl_model_demo = sm_client.create_model(
    ModelName=model_name,
    PrimaryContainer={
        'Image': image,
        'Mode': 'SingleModel',
        'ModelDataUrl': model_uri,
    },
    ExecutionRoleArn= role, 
    EnableNetworkIsolation=False
)

In [None]:
# create endpoint config

spl_endpoint_config = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            'VariantName': variant_name,
            'ModelName': model_name,
            'InitialInstanceCount': initial_instance_count,
            'InstanceType': instance_type
        },
    ]
)

In [None]:
# create endpoint

response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)

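Wait for the endpoint creation above to finish before sending inference requests. Creating the endpoint takes about 10 minutes. You can check its status in the console; once the status is InService, the endpoint is ready to use.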
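Instead of watching the console, the status can be polled from the notebook. The sketch below assumes `sm_client` is the `boto3.client('sagemaker')` created earlier; the helper name `wait_until_in_service` and the default intervals are illustrative assumptions.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll the endpoint status until it is InService.

    `sm_client` is expected to be `boto3.client("sagemaker")`; the helper name
    and the default intervals are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return description
        if status == "Failed":
            reason = description.get("FailureReason", "no reason reported")
            raise RuntimeError(f"Endpoint {endpoint_name!r} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name!r} still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

boto3 also ships a built-in waiter for this, `sm_client.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)`; the explicit loop is shown only to make the status transitions visible.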

Inference code:

In [None]:
import boto3
import json
import time

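# Use the region where the endpoint was created and a profile that exists in your AWS configuration
region_name = 'cn-north-1'
profile_name = 'default'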

session = boto3.session.Session(region_name=region_name, profile_name=profile_name)
client = session.client('sagemaker-runtime')

start_time = time.time()
spl_response=client.invoke_endpoint(EndpointName=endpoint_name,
        Body=payload,
        ContentType='application/json')
end_time = time.time()

print('time cost %s s' %(end_time - start_time))
print(json.loads(spl_response['Body'].read().decode()))