# OCR on SageMaker--Build Image

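## 1 Overview
This chapter builds the inference image and pushes it to AWS ECR. If a prebuilt image is already available, you can use it directly and skip the build.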

## 2 Environment
This notebook was tested with boto3 1.17.17.

In [None]:
import boto3
print(boto3.__version__)
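The version check can also be done programmatically rather than by eye. This is a minimal sketch that assumes plain numeric version strings such as `1.17.17`; `MIN_BOTO3`, `version_tuple`, and `is_recent_enough` are illustrative names, not part of boto3.

```python
MIN_BOTO3 = (1, 17, 17)  # the version this notebook was tested with


def version_tuple(version):
    """Turn '1.17.17' into (1, 17, 17) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".")[:3])


def is_recent_enough(version, minimum=MIN_BOTO3):
    """Return True when `version` is at least `minimum`."""
    return version_tuple(version) >= minimum


# In the notebook: is_recent_enough(boto3.__version__)
print(is_recent_enough("1.17.17"), is_recent_enough("1.9.0"))
```

Comparing tuples avoids the string-comparison trap where `"1.9.0"` sorts after `"1.17.17"`.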

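If the installed version is older, run the following cell to upgrade. After the upgrade finishes, restart the kernel and check the version again.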

In [None]:
!pip install boto3 --upgrade -i https://opentuna.cn/pypi/web/simple/

## 3 Prepare PaddleOCR

In [None]:
!git clone https://github.com/PaddlePaddle/PaddleOCR dockersource

## 4 Copy the web-related files into the docker build directory

In [None]:
!cp -r source/* dockersource

## 5 Local test

Create a symbolic link

In [None]:
import os
# lexists also detects a dangling symlink, which os.path.exists reports as missing
if os.path.lexists("/opt/ml/model"):
    os.remove("/opt/ml/model")

In [None]:
!ln -s $(pwd)/../1-training/container/local_test/model /opt/ml/model

Open a new shell window and run `conda activate ppocr`. Then cd into the `2-inference/dockersource` directory (this step is required) and run `python predictor.py`. A successful start prints the following:
```
-------------init_output_dir  /opt/ml/output_dir
 * Serving Flask app "predictor" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
```

In [None]:
# Change the file path as needed
!curl --data-binary @../image/id.png -H "Content-Type:image/jpeg" -X POST http://127.0.0.1:5000/invocations

## 6 Set the repository name and tag

In [None]:
ecr_repository = 'ocr-inference'
# The tag has four sections separated by "-": PaddlePaddle version, PaddleOCR version, build number, and CPU/GPU variant
tag = '2.0.2-2.0-4-cpu'
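The four-part tag convention can be checked before building. The following is a minimal sketch; `ImageTag` and `parse_tag` are hypothetical helpers, and the accepted variant values `cpu` and `gpu` are an assumption based on the example tag above.

```python
from typing import NamedTuple


class ImageTag(NamedTuple):
    paddle_version: str     # PaddlePaddle version, e.g. "2.0.2"
    paddleocr_version: str  # PaddleOCR version, e.g. "2.0"
    build: int              # build number
    device: str             # "cpu" or "gpu" (assumed spelling)


def parse_tag(tag):
    """Split a tag such as '2.0.2-2.0-4-cpu' into its four sections."""
    parts = tag.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 sections separated by '-', got {len(parts)}: {tag!r}")
    paddle, ocr, build, device = parts
    if device not in ("cpu", "gpu"):
        raise ValueError(f"unknown device section: {device!r}")
    return ImageTag(paddle, ocr, int(build), device)


print(parse_tag("2.0.2-2.0-4-cpu"))
```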

## 7 Build image
Use `Dockerfile` for CPU inference and `Dockerfile.gpu` for GPU inference.

In [None]:
%%time
!docker build -t $ecr_repository:$tag -f Dockerfile .
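Choosing the Dockerfile by hand is easy to get wrong when the tag changes. As a sketch, the choice can be derived from the tag, assuming GPU tags end in `-gpu` (the source only shows a `-cpu` tag); `docker_build_command` is a hypothetical helper that only assembles the command and does not run it.

```python
def docker_build_command(repository, tag, context="."):
    """Assemble the `docker build` argument list for a CPU or GPU tag."""
    # Assumption: the last tag section is "gpu" for GPU images.
    dockerfile = "Dockerfile.gpu" if tag.endswith("-gpu") else "Dockerfile"
    return ["docker", "build", "-t", f"{repository}:{tag}", "-f", dockerfile, context]


print(" ".join(docker_build_command("ocr-inference", "2.0.2-2.0-4-cpu")))
```

The returned list can be passed to `subprocess.run(..., check=True)` if you prefer to build outside of a `!` shell cell.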

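## 8 Local inference (optional)

### 8.1 Start the service

CPU inference  
CPU inference currently works only on `large` instance types. Other instance types fail with `Intel MKL function load error: cpu specific dynamic library is not loaded.`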

In [None]:
!docker run -v $(pwd)/../1-training/container/local_test/model/:/opt/ml/model/ -p 8080:8080 -d --rm $ecr_repository:$tag serve

GPU inference

In [None]:
!nvidia-docker run -v $(pwd)/../1-training/container/local_test/model/:/opt/ml/model/ -p 8080:8080 -d --rm $ecr_repository:$tag serve
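Because `docker run -d` returns before the server inside the container has finished loading, a request sent immediately afterwards may fail. The sketch below polls a health URL until it answers with HTTP 200. It assumes this container serves `GET /ping` on port 8080, the health-check route SageMaker expects from inference containers; whether `predictor.py` implements it is not confirmed by this notebook. `wait_until_healthy` is an illustrative name.

```python
import http.client
import time
import urllib.request

# Local endpoint: bypass any HTTP proxy configured in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_until_healthy(url, timeout=60.0, interval=1.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _opener.open(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # not ready yet: refused, reset, timed out, or an HTTP error status
        time.sleep(interval)
    return False


# In the notebook, after `docker run`:
#   wait_until_healthy("http://127.0.0.1:8080/ping")
```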

### 8.2 Send requests

Send an image directly

In [None]:
# Change the file path as needed
!curl --data-binary @../image/id.png -H "Content-Type:image/jpeg" -X POST http://127.0.0.1:8080/invocations

Fetch the image from S3

In [None]:
# Change the contents of data as needed
!curl --data '{"bucket":"nwcd-samples","image_uri":["nico/data/id.png"]}' -H "Content-Type: application/json" -X POST http://127.0.0.1:8080/invocations
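The same two requests can be built from Python, which is convenient when the result needs further processing. This is a sketch using only the standard library; the helper names are hypothetical, and the payload fields (`bucket`, `image_uri`) and content types are copied from the curl commands above. The response format is not documented here, so the example stops at building the request.

```python
import json
import urllib.request

ENDPOINT = "http://127.0.0.1:8080/invocations"


def build_image_request(image_bytes, url=ENDPOINT):
    """POST raw image bytes, mirroring `curl --data-binary @file`."""
    return urllib.request.Request(
        url, data=image_bytes, headers={"Content-Type": "image/jpeg"}, method="POST"
    )


def build_s3_request(bucket, keys, url=ENDPOINT):
    """POST a JSON body that points the service at objects in S3."""
    body = json.dumps({"bucket": bucket, "image_uri": list(keys)}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


request = build_s3_request("nwcd-samples", ["nico/data/id.png"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it once the container is running:
#   urllib.request.urlopen(request).read()
```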

## 9 Push the image to ECR

In [None]:
!aws ecr create-repository --repository-name $ecr_repository
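The CLI call above reports an error if the repository already exists, which happens whenever the notebook is rerun. A possible alternative, sketched with the boto3 ECR client, treats an existing repository as success. `ensure_repository` is a hypothetical helper; the client is passed in so the function itself needs no credentials until it is called.

```python
def ensure_repository(ecr_client, name):
    """Create the ECR repository if it is missing.

    Returns True when the repository was created, False when it already existed.
    """
    try:
        ecr_client.create_repository(repositoryName=name)
        return True
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        return False


# In the notebook (requires AWS credentials):
#   import boto3
#   ensure_repository(boto3.client("ecr"), ecr_repository)
```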

In [None]:
import boto3
region = boto3.session.Session().region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')
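# The amazonaws.com.cn suffix applies to AWS China regions only
image_uri = '{}.dkr.ecr.{}.amazonaws.com.cn/{}'.format(account_id, region, ecr_repository + ":" + tag)
!docker tag $ecr_repository:$tag $image_uri
# `aws ecr get-login` exists only in AWS CLI v1; v2 replaces it with `aws ecr get-login-password`
!$(aws ecr get-login --no-include-email)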
!docker push $image_uri
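The image URI above hardcodes the `amazonaws.com.cn` domain, which is correct only in AWS China regions. If the notebook may run elsewhere, the domain can be derived from the region name. This is a sketch: `ecr_image_uri` is a hypothetical helper, the account ID in the example is a placeholder, and only the China and standard domains are handled.

```python
def ecr_image_uri(account_id, region, repository, tag):
    """Build the full ECR image URI for a China or standard AWS region."""
    # China regions (cn-north-1, cn-northwest-1) use a separate domain.
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"{account_id}.dkr.ecr.{region}.{suffix}/{repository}:{tag}"


# Placeholder account ID for illustration only.
print(ecr_image_uri("123456789012", "cn-north-1", "ocr-inference", "2.0.2-2.0-4-cpu"))
```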