
mingweihe/ImageNet


Trial on the Kaggle ImageNet object localization challenge with YOLOv3 on Google Cloud

project: https://www.kaggle.com/c/imagenet-object-localization-challenge
yolo: https://pjreddie.com/darknet/yolo/
nice YOLO explanation: https://medium.com/@jonathan_hui/real-time-object-detection-with-yolo-yolov2-28b1b93e2088
My operating system is macOS, although it doesn't matter much, because we do most steps on Google Cloud.
Let's do it step by step:

1.Basic environment preparation:

I.Sign up for a Google Cloud account.

ps: Google provides $300 of trial credit for first-time sign-ups.

II.Create an Ubuntu 16.04 Compute Engine instance on Google Cloud, with a 400 GB SSD disk, 4 CPU cores, and 15 GB of memory.

We will change this setup a little later (adding GPUs, resizing CPUs), but it's enough for now.

III.Request quota increases for Nvidia Tesla K80/P100/V100 GPUs, because we have no permission to use GPUs by default.

GPUs burn through credit quickly, so request only what's needed. I increased the quota to 1 K80 for testing and 4 P100s for training the model; I haven't tried a V100 yet.

IV.SSH connection with RSA keys.

#:ssh-keygen -t rsa -f ~/.ssh/gc_rsa -C anynamehere
Leaving the passphrase empty makes logging in easier.
#:cd ~/.ssh
#:vi gc_rsa.pub
Then go to the Google Cloud console and copy everything in gc_rsa.pub into the Ubuntu instance's SSH keys section.
#:chmod 400 gc_rsa
#:ssh -i gc_rsa anynamehere@<your google cloud external ip>
We can also connect with FileZilla; no more words needed here.
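Optionally, a minimal ~/.ssh/config entry saves retyping the key path and IP every time (the alias 'gc' here is just an example):

Host gc
    HostName <your google cloud external ip>
    User anynamehere
    IdentityFile ~/.ssh/gc_rsa

After that, plain 'ssh gc' is enough.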

V.pip installation

#:sudo apt update
#:sudo apt upgrade
#:sudo apt-get -y install python-pip
#:sudo apt-get -y install python3-pip

VI.kaggle-cli installation

#:pip install kaggle-cli

2.Dataset download

#:kg download -u <your kaggle username> -p <your kaggle password> -c imagenet-object-localization-challenge
// the dataset is about 160 GB, so the download takes about an hour at a speed of around 42.9 MiB/s.
// let's open another ssh connection and do the next steps while the download runs.
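// as an aside, kaggle-cli can store credentials so later commands don't need -u/-p; if memory serves, it's along these lines:
#:kg config -u <your kaggle username> -p <your kaggle password> -c imagenet-object-localization-challenge
#:kg download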

3.OpenCV 3.4.0 installation (we will turn on the OPENCV option in the yolo project later for better image processing)

Execute all the steps at the following URL:
http://www.python36.com/how-to-install-opencv340-on-ubuntu1604/
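Once the build finishes, a quick sanity check of the Python bindings:
#:python3 -c 'import cv2; print(cv2.__version__)'
// should print 3.4.0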

4.CUDA 9.0 with cuDNN 7.0 installation

We can use the following bash script; download it and execute it on the instance.
https://gist.github.com/ashokpant/5c4e9481615f54af4025ab2085f85869#file-cuda_9-0_cudnn_7-0-sh
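After the script completes, verify the toolkit landed in the usual place (PATH isn't set until the next step, so use the full path):
#:/usr/local/cuda-9.0/bin/nvcc --version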

5.cuDNN library configuration

Go to https://developer.nvidia.com/rdp/cudnn-download and download the cuDNN v7.0.5 Library for Linux, CUDA 9.0.
Its name should be cudnn-9.0-linux-x64-v7.tgz; we use the scp command or FileZilla to move this package from the local machine to the remote instance.
#:scp -i ~/.ssh/gc_rsa Downloads/cudnn-9.0-linux-x64-v7.tgz anynamehere@your google cloud external ip:~/
// come to instance window
#:tar zxvf cudnn-9.0-linux-x64-v7.tgz
#:cd cuda
#:sudo cp include/* /usr/local/cuda-9.0/include/
#:sudo cp lib64/* /usr/local/cuda-9.0/lib64/
#:echo 'export PATH=/usr/local/cuda-9.0/bin:$PATH' >> ~/.bashrc
#:echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64/:$LD_LIBRARY_PATH' >> ~/.bashrc
#:source ~/.bashrc
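To confirm the library is in place, grep the version macros out of the copied header:
#:grep -A 2 'define CUDNN_MAJOR' /usr/local/cuda-9.0/include/cudnn.h
// should report version 7.0.x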

6.Shut down the instance, add one K80 GPU, then boot the instance again.

ps: For frugality, we can reduce the number of CPU cores from 4 to 2.
// view detailed GPU info
#:nvidia-smi
// view the number of CPUs
#:nproc

7.X11 installation on both the instance and our local machine, so that we can view our predicted images remotely.

#:sudo apt-get install xorg openbox
// what I need on my Mac is XQuartz.
// install feh, so that we can view any picture remotely.
#:sudo apt install feh
// log out of the instance, reconnect with the -Y (X11 forwarding) flag, then test it
#:ssh -Y -i ~/.ssh/gc_rsa anynamehere@<your google cloud external ip>
// once darknet is cloned in the next step, test with any image, e.g.:
#:feh darknet/data/dog.jpg

8.YOLO installation

#:git clone https://github.com/pjreddie/darknet
#:cd darknet
#:make

9.Test YOLOv3

// Everything up to here went well, but we still won't see the expected result unless we change the Makefile a little bit.
// I haven't figured out the reason yet, so let's just change it now:
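// the flags: GPU=1 builds with CUDA, OPENCV=1 enables the OpenCV image pipeline, OPENMP=1 enables multi-threaded CPU code; CUDNN and DEBUG stay off.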
#:cd darknet
#:sed -i 's/GPU=./GPU=1/' Makefile
#:sed -i 's/CUDNN=./CUDNN=0/' Makefile
#:sed -i 's/OPENCV=./OPENCV=1/' Makefile
#:sed -i 's/OPENMP=./OPENMP=1/' Makefile
#:sed -i 's/DEBUG=./DEBUG=0/' Makefile
#:make
#:wget https://pjreddie.com/media/files/yolov3.weights
#:./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg
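// since we built with OPENCV=1, darknet should pop up the detection window directly over X11;
// it should also save the annotated image as predictions.jpg, which we can re-open any time:
#:feh predictions.jpg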

10.Now let's train it.

I.Training data preprocessing.

#:cd ~
#:tar zxvf imagenet_object_localization.tar.gz
// delete the archive so that we'll have enough disk space.
#:rm imagenet_object_localization.tar.gz
// view disk space info
#:df -h
// Data preparation. Darknet locates each label file by replacing 'images' with 'labels' and '.JPEG' with '.txt' in the image path, hence the images/ directories created below.
#:unzip LOC_synset_mapping.txt.zip
#:mkdir ILSVRC/Data/CLS-LOC/train/images
#:mv ILSVRC/Data/CLS-LOC/train/n* ILSVRC/Data/CLS-LOC/train/images/
#:mv ILSVRC/Data/CLS-LOC/val/ ILSVRC/Data/CLS-LOC/images
#:mkdir ILSVRC/Data/CLS-LOC/val/
#:mv ILSVRC/Data/CLS-LOC/images ILSVRC/Data/CLS-LOC/val/images
#:git clone https://github.com/mingweihe/ImageNet
#:pip3 install pandas
#:pip3 install pathlib
#:cd ImageNet
// generating all the formatted training label files takes about 20 minutes
#:python3 generate_labels.py ../LOC_synset_mapping.txt ../ILSVRC/Annotations/CLS-LOC/train ../ILSVRC/Data/CLS-LOC/train/labels 1
// generating all the formatted validation label files
#:python3 generate_labels.py ../LOC_synset_mapping.txt ../ILSVRC/Annotations/CLS-LOC/val ../ILSVRC/Data/CLS-LOC/val/labels 0
#:cd ~
#:find `pwd`/ILSVRC/Data/CLS-LOC/train/labels/ -name \*.txt > darknet/data/inet.train.list
#:sed -i 's/\.txt/\.JPEG/g' darknet/data/inet.train.list
#:sed -i 's/labels/images/g' darknet/data/inet.train.list
#:find `pwd`/ILSVRC/Data/CLS-LOC/val/labels/ -name \*.txt > darknet/data/inet.val.list
#:sed -i 's/\.txt/\.JPEG/g' darknet/data/inet.val.list
#:sed -i 's/labels/images/g' darknet/data/inet.val.list
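For reference, each generated label file contains one line per object: a class index followed by the box center and size, all normalized to [0,1]. Here is a minimal sketch of that conversion, assuming the standard PASCAL-VOC layout of the ILSVRC annotation XMLs (generate_labels.py in this repo is the authoritative version):

import xml.etree.ElementTree as ET

def voc_to_darknet(xml_path, class_index):
    # class_index maps a WordNet id like 'n01440764' to its 0-based
    # position in LOC_synset_mapping.txt
    root = ET.parse(xml_path).getroot()
    w = float(root.find('size/width').text)
    h = float(root.find('size/height').text)
    for obj in root.findall('object'):
        wnid = obj.find('name').text
        box = obj.find('bndbox')
        xmin = float(box.find('xmin').text)
        xmax = float(box.find('xmax').text)
        ymin = float(box.find('ymin').text)
        ymax = float(box.find('ymax').text)
        # darknet line: class x_center y_center width height (all relative)
        yield '%d %.6f %.6f %.6f %.6f' % (
            class_index[wnid],
            (xmin + xmax) / 2.0 / w,
            (ymin + ymax) / 2.0 / h,
            (xmax - xmin) / w,
            (ymax - ymin) / h)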

II.Pretrained weights preparation.

#:cd darknet
#:wget https://pjreddie.com/media/files/darknet53.conv.74

III.Training

#:./darknet detector train ~/ImageNet/ILSVRC.data ~/ImageNet/yolov3-ILSVRC.cfg darknet53.conv.74
// we can also restart training from a checkpoint:
#:./darknet detector train ~/ImageNet/ILSVRC.data ~/ImageNet/yolov3-ILSVRC.cfg backup/yolov3-ILSVRC.backup
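ILSVRC.data comes from this repo. For reference, a darknet .data file is just a small key/value config, roughly like this (the names-file path below is illustrative):

classes = 1000
train = data/inet.train.list
valid = data/inet.val.list
names = data/ILSVRC.names
backup = backup/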

IV.Training with multiple GPUs

// shut down the instance, swap the single K80 for 4 P100s, and increase to 6 CPUs.
// boot the instance and start training with the following command
#:./darknet detector train ~/ImageNet/ILSVRC.data ~/ImageNet/yolov3-ILSVRC.cfg backup/yolov3-ILSVRC.backup -gpus 0,1,2,3
// as above, continuing from a checkpoint just means replacing darknet53.conv.74 with the backup file.

V.Training without an ssh connection

#:screen
#:./darknet detector train ~/ImageNet/ILSVRC.data ~/ImageNet/yolov3-ILSVRC.cfg backup/yolov3-ILSVRC.backup -gpus 0,1,2,3
// press <Ctrl+a>, then <d>, to detach the session
// training now keeps running even after our local machine disconnects
// to bring the task back, connect over ssh again, then:
#:screen -r
// For more detailed instructions, just google "linux screen detach".
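An alternative sketch without screen, using nohup (the log file name is just an example):
#:nohup ./darknet detector train ~/ImageNet/ILSVRC.data ~/ImageNet/yolov3-ILSVRC.cfg backup/yolov3-ILSVRC.backup -gpus 0,1,2,3 > train.log 2>&1 &
#:tail -f train.log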

11.Prediction, and transferring the predictions into a CSV file.

#:unzip LOC_sample_submission.csv.zip
#:mkdir ~/submissions
#:python3 ~/ImageNet/predict.py
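For reference, the submission CSV has a header of ImageId,PredictionString, and each PredictionString holds up to five space-separated predictions of the form <wordnet id> <xmin> <ymin> <xmax> <ymax>; predict.py writes this out. Check LOC_sample_submission.csv for the exact layout; the values below are made up:

ImageId,PredictionString
<test image id>,n01751748 46 39 446 288 n01751748 10 20 200 300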

12.Submit our predictions.

#:kg submit <submission-file> -u <your kaggle username> -p <your kaggle password> -c imagenet-object-localization-challenge -m "my submission"
(Alternatively, submit on the Kaggle website using any web browser.)

13.Accuracy improvement.

I.Cross-validation & Ensembling.

TODO.

II.Training on the validation dataset for a few more epochs before the final submission.

Good luck, and thanks for your attention.
