# Build the Training Image

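## 1 Overview
This chapter builds the training image and pushes it to AWS ECR. Users who already have access to a prebuilt image can use it directly and skip the build.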

## 2 Environment
This notebook was tested with boto3 1.17.17.

In [None]:
import boto3
print(boto3.__version__)
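
Because the notebook was tested against a single boto3 release, it can help to compare the installed version with the tested one before continuing. The following is a minimal sketch of such a check. `TESTED_VERSION`, `version_tuple`, and `check_boto3` are names introduced here for illustration, and the comparison assumes purely numeric `major.minor.patch` version strings.

```python
from importlib.metadata import PackageNotFoundError, version

TESTED_VERSION = "1.17.17"  # the boto3 release this notebook was tested with


def version_tuple(text):
    """Turn '1.17.17' into (1, 17, 17); assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def check_boto3(installed=None):
    """Return a status string comparing the installed boto3 with TESTED_VERSION."""
    if installed is None:
        try:
            installed = version("boto3")
        except PackageNotFoundError:
            return "boto3 is not installed"
    if version_tuple(installed) < version_tuple(TESTED_VERSION):
        return f"boto3 {installed} is older than the tested {TESTED_VERSION}"
    return f"boto3 {installed} is at or above the tested {TESTED_VERSION}"


print(check_boto3())
```

A newer boto3 is not guaranteed to behave identically, so the message is only a hint, not a compatibility guarantee.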

## 3 Prepare PaddleOCR

In [None]:
!git clone https://github.com/PaddlePaddle/PaddleOCR container/dockersource

## 4 Download the pretrained recognition model

In [None]:
!wget -P container/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar

In [None]:
%cd container
!tar -xf ch_ppocr_server_v2.0_rec_pre.tar && rm -rf ch_ppocr_server_v2.0_rec_pre.tar
%cd ch_ppocr_server_v2.0_rec_pre
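# Remove stray `._best*` files (most likely macOS metadata packed into the archive) and the bundled training log
!rm ._best*
!rm train.log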
%cd ../..
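
The extraction and cleanup steps above can also be done in pure Python, which avoids depending on `%cd` and the current working directory. This is a sketch rather than the notebook's own method: `extract_and_clean` is a hypothetical helper, and it assumes the archive unpacks into a single directory named after the archive (as `ch_ppocr_server_v2.0_rec_pre.tar` does here).

```python
import tarfile
from pathlib import Path


def extract_and_clean(archive, dest):
    """Extract `archive` into `dest`, delete the archive, and remove stray files.

    Returns the sorted names of the files left in the extracted model directory.
    """
    archive, dest = Path(archive), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(dest)
    archive.unlink()  # same effect as removing the .tar after extraction

    # Assumption: the top-level directory inside the archive matches the archive name.
    model_dir = dest / archive.stem
    for stray in list(model_dir.glob("._best*")) + [model_dir / "train.log"]:
        if stray.exists():
            stray.unlink()
    return sorted(p.name for p in model_dir.iterdir())


# Usage in this notebook, run from the notebook's directory:
#   extract_and_clean("container/ch_ppocr_server_v2.0_rec_pre.tar", "container")
```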

## 5 Set the repository name and tag

In [None]:
ecr_repository = 'ocr-training'
tag = 'rec'

## 6 Build image

In [None]:
%%time
%cd container
!docker build -t $ecr_repository:$tag .
%cd ../

## 7 Train locally with the container
The trained model is saved under `container/local_test/model/`.

In [None]:
# Prepare your own training data and replace the S3 paths below accordingly
!aws s3 cp s3://junzhong/data/ocr/Chinese2560/training.zip container/local_test/input/data/training/
!aws s3 cp s3://junzhong/data/ocr/English/validation.zip container/local_test/input/data/validation/

In [None]:
!nvidia-docker run -v $(pwd)/container/local_test/:/opt/ml/ --shm-size=12g --rm $ecr_repository:$tag train
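
The `docker run` command mounts `container/local_test/` at `/opt/ml/`, so the container reads its data from `input/data/training` and `input/data/validation` and writes results to `model`. The sketch below creates that layout and reports which input directories are still empty before a run is started. `EXPECTED_DIRS` and `prepare_local_test` are illustrative names, and the list of directories is taken only from the paths used in this notebook.

```python
from pathlib import Path

# Sub-directories of local_test/ that this notebook reads from or writes to.
EXPECTED_DIRS = ("input/data/training", "input/data/validation", "model")


def prepare_local_test(root):
    """Create the expected sub-directories under `root`.

    Returns the input directories that contain no files yet.
    """
    root = Path(root)
    empty_inputs = []
    for rel in EXPECTED_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        if rel.startswith("input/") and not any(path.iterdir()):
            empty_inputs.append(rel)
    return empty_inputs


# Usage in this notebook:
#   missing = prepare_local_test("container/local_test")
#   if missing:
#       print("No data found in:", missing)
```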

## 8 Test

In [None]:
!nvidia-docker run -v $(pwd)/container/local_test/:/opt/ml/ --rm $ecr_repository:$tag \
   python3 tools/infer_rec.py -c rec_chinese_common_train_v2.0.yml -o Global.pretrained_model=/opt/ml/model/rec/latest Global.load_static_weights=false Global.infer_img=doc/imgs_words/en/word_1.png

## 9 Push to ECR

In [None]:
!aws ecr create-repository --repository-name $ecr_repository
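
`aws ecr create-repository` fails if the repository already exists, which makes the cell above awkward to re-run. One possible alternative is to create the repository through boto3 and fall back to a lookup when it is already there. `ensure_repository` is a hypothetical helper, not part of boto3; it relies on the ECR client's `create_repository` and `describe_repositories` calls.

```python
def ensure_repository(ecr_client, name):
    """Return the repository URI, creating the repository only if it is missing."""
    try:
        response = ecr_client.create_repository(repositoryName=name)
        return response["repository"]["repositoryUri"]
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        # The repository is already there, so look up its URI instead.
        response = ecr_client.describe_repositories(repositoryNames=[name])
        return response["repositories"][0]["repositoryUri"]


# Usage in this notebook (requires AWS credentials):
#   import boto3
#   print(ensure_repository(boto3.client("ecr"), ecr_repository))
```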

In [None]:
import boto3
region = boto3.session.Session().region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')
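# The `.amazonaws.com.cn` suffix applies to AWS China regions; other commercial regions use `.amazonaws.com`.
image_uri = '{}.dkr.ecr.{}.amazonaws.com.cn/{}:{}'.format(account_id, region, ecr_repository, tag)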
!docker tag $ecr_repository:$tag $image_uri
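# `aws ecr get-login` exists only in AWS CLI v1; with AWS CLI v2, pipe `aws ecr get-login-password` into `docker login --password-stdin` instead.
!$(aws ecr get-login --no-include-email)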
!docker push $image_uri
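
The image URI above hard-codes the `amazonaws.com.cn` domain, which only matches AWS China regions. A small helper can pick the domain from the region name instead. This is a sketch under a stated assumption: `ecr_image_uri` is a hypothetical function that distinguishes only the China regions (names starting with `cn-`) from the standard commercial regions, and it does not cover other AWS partitions.

```python
def ecr_image_uri(account_id, region, repository, tag):
    """Build '<account>.dkr.ecr.<region>.<domain>/<repository>:<tag>'."""
    # AWS China regions (cn-north-1, cn-northwest-1) use amazonaws.com.cn;
    # the standard commercial regions use amazonaws.com.
    domain = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"{account_id}.dkr.ecr.{region}.{domain}/{repository}:{tag}"


# The account ID below is a placeholder, not a real account.
print(ecr_image_uri("123456789012", "cn-north-1", "ocr-training", "rec"))
# prints 123456789012.dkr.ecr.cn-north-1.amazonaws.com.cn/ocr-training:rec
```

In the notebook, the result could replace the hand-built `image_uri` before the `docker tag` and `docker push` steps.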