# Build the Training Image

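## 1 Overview
This chapter builds the training image and runs training locally. Users can use an already-built image directly and do not need to build it themselves.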

## 2 Runtime Environment
This notebook was tested with boto3 1.17.17.

In [None]:
import boto3
print(boto3.__version__)
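
To compare the installed version against the tested one programmatically, a minimal sketch could look like the following. The helper names `version_tuple` and `is_older_than_tested` are hypothetical and not part of boto3; the sketch assumes plain dotted numeric versions such as `1.17.17`.

```python
TESTED_BOTO3 = "1.17.17"  # version stated in this notebook


def version_tuple(version):
    """Turn '1.17.17' into (1, 17, 17); parsing stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_older_than_tested(installed, tested=TESTED_BOTO3):
    """Return True when the installed version is lower than the tested one."""
    return version_tuple(installed) < version_tuple(tested)
```

In the notebook this could be called as `is_older_than_tested(boto3.__version__)` after the import above. A newer version is not necessarily a problem; the check only flags versions older than the one tested here.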

## 3 Prepare PaddleOCR

In [None]:
!git clone https://github.com/PaddlePaddle/PaddleOCR container/dockersource

## 4 Download the Pretrained Recognition Model

In [None]:
!wget -P container/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar

In [None]:
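%cd container
# Extract the pretrained model, then delete the archive
!tar -xf ch_ppocr_server_v2.0_rec_pre.tar && rm -f ch_ppocr_server_v2.0_rec_pre.tar
%cd ch_ppocr_server_v2.0_rec_pre
# Remove the stray ._best* files and the training log shipped in the archive
!rm -f ._best*
!rm -f train.log
%cd ../..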
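
The same extraction and cleanup can be sketched in plain Python, which avoids depending on `tar` and `rm` being available in the shell. The function name `extract_and_clean` and its parameters are hypothetical. The `._best*` files are assumed to be leftover metadata files (typically produced when an archive is created on macOS) that are not needed for training.

```python
import glob
import os
import tarfile


def extract_and_clean(archive_path, dest_dir, model_dir_name):
    """Extract the archive, delete it, and remove files not needed for training."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest_dir)
    os.remove(archive_path)

    model_dir = os.path.join(dest_dir, model_dir_name)
    # The pattern starts with a dot, so glob also matches these hidden files.
    leftovers = glob.glob(os.path.join(model_dir, "._best*"))
    leftovers.append(os.path.join(model_dir, "train.log"))
    for path in leftovers:
        if os.path.exists(path):
            os.remove(path)
    return sorted(os.listdir(model_dir))
```

For this notebook the call would be `extract_and_clean("container/ch_ppocr_server_v2.0_rec_pre.tar", "container", "ch_ppocr_server_v2.0_rec_pre")`; the return value lists the files that remain in the model folder.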

## 5 Set Names & Create Folders

In [None]:
ecr_repository = 'ocr-training-local'
tag = 'rec'
train_path = 'container/local_test/input/data/training'
validation_path = 'container/local_test/input/data/validation'
model_path = 'container/local_test/model'


In [None]:
import os
def create_folder(folder_path):
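    # Skip creation when the folder is already present
    if os.path.exists(folder_path):
        print("Folder [{}] already exists".format(folder_path))
        return
    # Create the folder together with any missing parent folders
    os.makedirs(folder_path)
    print("Created folder [{}]".format(folder_path))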

create_folder(train_path)
create_folder(validation_path)
create_folder(model_path)
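
As an alternative to checking `os.path.exists` first, `os.makedirs` accepts `exist_ok=True`, which makes the call safe to repeat. The sketch below is one possible variant; `ensure_folder` is a hypothetical name, not something used elsewhere in this notebook.

```python
import os


def ensure_folder(folder_path):
    """Create folder_path (and parents) if needed; return True when it was newly created."""
    already_there = os.path.isdir(folder_path)
    # exist_ok=True suppresses the error raised when the folder already exists
    os.makedirs(folder_path, exist_ok=True)
    return not already_there
```

Re-running the cell then leaves existing folders untouched, and the return value tells whether anything was created.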

## 6 Build image

In [None]:
%%time
%cd container
!docker build -t $ecr_repository:$tag .
%cd ../
!pwd

## 7 Prepare Training Data

In [None]:
# Prepare your own training data; replace the S3 paths below with your own
!aws s3 cp s3://junzhong/data/ocr/Chinese2560/training.zip ./
!aws s3 cp s3://junzhong/data/ocr/English/validation.zip ./


In [None]:
%%time
!pwd
!unzip -q -o training.zip -d  $train_path
!unzip -q -o validation.zip  -d $validation_path
!rm -fr training.zip
!rm -fr validation.zip
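
If `unzip` is not installed, the same step can be approximated with the standard-library `zipfile` module. This is a minimal sketch; `unzip_and_remove` is a hypothetical helper name. Like `unzip -o`, it overwrites existing files in the target folder.

```python
import os
import zipfile


def unzip_and_remove(archive_path, target_dir):
    """Extract archive_path into target_dir, delete the archive, return the file count."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)
        # Count regular entries only, skipping directory entries
        count = sum(1 for info in zf.infolist() if not info.is_dir())
    os.remove(archive_path)
    return count
```

With the variables defined earlier, the calls would be `unzip_and_remove("training.zip", train_path)` and `unzip_and_remove("validation.zip", validation_path)`.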


## 8 Train Locally with the Container
The trained model is saved in `container/local_test/model/`.

In [None]:
!nvidia-docker run -v $(pwd)/container/local_test/:/opt/ml --shm-size=12g --rm $ecr_repository:$tag train
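
The `-v` option mounts `container/local_test/` at `/opt/ml` inside the container, so `input/data/training`, `input/data/validation`, and `model` created in section 5 appear under `/opt/ml/...` during training. The sketch below assembles the same command as an argument list, which can help when the image name or paths need to change. `local_train_command` and its parameters are hypothetical names, and the default `runner` simply mirrors the command used above.

```python
import os
import shlex


def local_train_command(repository, tag, local_root,
                        shm_size="12g", runner="nvidia-docker"):
    """Build the argument list for a local training run of the image."""
    # Docker needs an absolute host path on the left side of the mount
    mount = "{}:/opt/ml".format(os.path.abspath(local_root))
    return [
        runner, "run",
        "-v", mount,
        "--shm-size={}".format(shm_size),
        "--rm",
        "{}:{}".format(repository, tag),
        "train",
    ]


cmd = local_train_command("ocr-training-local", "rec", "container/local_test")
print(shlex.join(cmd))  # shell-quoted form, for inspection before running
```

The list can be passed to `subprocess.run(cmd, check=True)` to start the run from Python instead of using the `!` shell escape.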

## 9 Test

In [None]:
!nvidia-docker run -v $(pwd)/container/local_test/:/opt/ml/ --rm $ecr_repository:$tag \
   python3 tools/infer_rec.py -c /opt/ml/rec_chinese_common_train_v2.0.yml -o Global.pretrained_model=/opt/ml/model/latest Global.load_static_weights=false Global.infer_img=doc/imgs_words/en/word_1.png