
NVDLA result of AlexNet is different from Caffe #45

Closed
MINZHIJI opened this issue May 17, 2018 · 11 comments
@MINZHIJI

MINZHIJI commented May 17, 2018

[Solved] The problem was that the NVDLA runtime's image preprocessing must match the preprocessing used during training.

This includes things like raw scaling, mean subtraction, and the order of the RGB channels, etc.

Environment:

  • OS: Ubuntu 14.04
  • NVDLA Version: NV_FULL
  • Model: Alexnet (Link)
  • Weight: BAIR/BVLC AlexNet Model (Link)
  • Image: Quail (Link) and some ImageNet images

Question:
I ran AlexNet on both NVDLA and Caffe, but the results differ.

Result: (Link)

(result screenshots omitted)

@JunningWu

Mine doesn't match either.

@chagyun0213

Hi,
I had the same issue before, but I found it was due to a misunderstanding of the data preprocessing. Read the train_val.prototxt file to figure out which preprocessing the data layer applies there; then you should do the same preprocessing during inference, via deploy.prototxt or the runtime engine. Usually it is RGB mean subtraction, with or without scaling by 1/255.0, etc. You also have to take care of the RGB vs. BGR order of your training and inference data. With these, I got almost the same result as Caffe, but with a small accuracy reduction. I believe the reduction might be related to FP16 or weight compression, or something I cannot understand yet.
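The knobs described here (mean subtraction, optional 1/255 scaling, RGB vs. BGR order) can be collected into one preprocessing helper. This is a minimal numpy sketch for illustration, not NVDLA runtime code; the function name and defaults are my own:

```python
import numpy as np

def preprocess(img_rgb, mean_bgr, scale=1.0, to_bgr=True):
    """Replicate Caffe-style transform_param at inference time.

    img_rgb : HxWx3 uint8 array in RGB order (as most image loaders return).
    mean_bgr: per-channel means in BGR order, as used during training.
    scale   : raw scale factor; 1.0 if the net was trained on 0..255 pixels.
    """
    x = img_rgb.astype(np.float32)
    if to_bgr:
        x = x[:, :, ::-1]                        # RGB -> BGR to match LMDB data
    x -= np.asarray(mean_bgr, dtype=np.float32)  # per-channel mean subtraction
    return x * scale

# Solid-gray 4x4 "image": every pixel is (128, 128, 128).
img = np.full((4, 4, 3), 128, dtype=np.uint8)
out = preprocess(img, mean_bgr=[104.0, 116.0, 122.0])
```

If a network was instead trained on 0..1 inputs, the same helper applies with scale=1/255.0.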

@MINZHIJI
Author

MINZHIJI commented May 28, 2018

@chagyun0213 Thank you for your answer. Could you provide your example code?

@ned-varnica

Hi @chagyun0213

Can you please share an example of network and code (AlexNet or ResNet) that you were able to get to work? For example, how do you feed the input image into the network and where do you apply raw scaling? Really would appreciate any answer on this. Thanks!

@prasshantg
Collaborator

@ned-varnica previously we were normalizing the input image by 255.0 by default, which caused incorrect results due to incorrect pre-processing, as pointed out by @chagyun0213. We have fixed it by providing the runtime argument --normalize to specify the value. Please try 1.0 with this option; we were able to get correct results with it.

@chagyun0213

Here is the key information.
Let's look at the "train_val.prototxt" for BVLC AlexNet:
https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt

name: "AlexNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_lmdb"
    batch_size: 256
    backend: LMDB
  }
}
-- DELETED other layers --

  1. AlexNet was trained on 227x227 color images. Since the training database "ilsvrc12_train_lmdb" is in "BGR" format, you have to confirm that the input data for inference is written to the DLA in the same "BGR" order, especially in the createFF16ImageCopy() block.

  2. Due to the "mean_file" option above, you have to use the same mean values for subtraction.
    The values are (BGR) = (104, 116, 122). The subtraction can also easily be done in the
    createFF16ImageCopy() block.

  3. No scaling was chosen.

In summary, the correct pre-processing for AlexNet as included in BVLC is
input_DLA = 1.0 * (input_pixel - [104, 116, 122])
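As a sanity check, that formula can be written out in numpy. This is only an illustration of the arithmetic, not the actual createFF16ImageCopy() code; the center crop to 227x227 is assumed from the crop_size setting above:

```python
import numpy as np

MEAN_BGR = np.array([104.0, 116.0, 122.0], dtype=np.float32)

def alexnet_input(img_rgb):
    """input_DLA = 1.0 * (input_pixel - [104, 116, 122]), in BGR order."""
    h, w, _ = img_rgb.shape
    top, left = (h - 227) // 2, (w - 227) // 2
    crop = img_rgb[top:top + 227, left:left + 227]  # center-crop to 227x227
    bgr = crop[:, :, ::-1].astype(np.float32)       # RGB -> BGR
    return bgr - MEAN_BGR                           # no extra scaling (scale = 1.0)

# All-black 256x256 input, as a trivial example:
img = np.zeros((256, 256, 3), dtype=np.uint8)
x = alexnet_input(img)
```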

I hope it would be sufficient information for your problem.

CJ

@ned-varnica

@chagyun0213
Thanks very much, this is very helpful!

@MINZHIJI
Author

MINZHIJI commented Jun 7, 2018

I used cifar10_quick as the inference model and ran some images, including both test and training images.
Then I ran cifar10_quick with both Caffe and NVDLA and compared the results.
I just want to ask whether the results are correct or sensible.
Result analysis link
Analysis condition:

  • Model: cifar10_quick (BVLC)
  • Image preprocessing:
    • Caffe: images just scaled to [0..255] (raw_scale=255)
    • NVDLA: images just scaled to [0..255] (--normalize 1.0)
  • Comparison mechanism:
    • Loss functions: absolute loss and cross entropy
  • Images:
    • Randomly selected 42 images from the CIFAR-10 train/test sets and converted them to .jpg
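The two loss functions used in the comparison can be sketched as follows; the softmax vectors here are made-up placeholders, not actual run results:

```python
import numpy as np

def abs_loss(p_ref, p_dut):
    """Sum of absolute differences between two probability vectors."""
    return float(np.abs(np.asarray(p_ref) - np.asarray(p_dut)).sum())

def cross_entropy(p_ref, p_dut, eps=1e-12):
    """H(p_ref, p_dut) = -sum(p_ref * log(p_dut)); clipped to avoid log(0)."""
    p_dut = np.clip(np.asarray(p_dut, dtype=np.float64), eps, 1.0)
    return float(-(np.asarray(p_ref) * np.log(p_dut)).sum())

caffe_out = [0.70, 0.20, 0.10]  # hypothetical softmax from Caffe (FP32)
nvdla_out = [0.68, 0.21, 0.11]  # hypothetical softmax from NVDLA (FP16)
print(abs_loss(caffe_out, nvdla_out))       # about 0.04
print(cross_entropy(caffe_out, nvdla_out))  # small when the two agree
```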

@prasshantg
Collaborator

@MINZHIJI can you generate a similar loss report only for top-5 or top-1? The results look sensible; we are reviewing the cases where there is a mismatch in top-1/top-5.

@prasshantg
Collaborator

@MINZHIJI results look good, please reopen issue if you see any problem

@gitosu67

gitosu67 commented Jan 31, 2020

(quoting @chagyun0213's pre-processing explanation above: train_val.prototxt, BGR order, mean (BGR) = (104, 116, 122), no scaling)

How are you getting the mean value?
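For reference, the (104, 116, 122) values are the per-channel averages of the training mean image. A minimal numpy sketch, assuming the mean blob has already been decoded from imagenet_mean.binaryproto (in Caffe this is typically done with caffe.io.blobproto_to_array); the synthetic blob below just stands in for that decoded array:

```python
import numpy as np

# Stand-in for the array decoded from imagenet_mean.binaryproto
# (shape (1, 3, 256, 256), channels in BGR order).
mean_blob = np.empty((1, 3, 256, 256), dtype=np.float32)
mean_blob[0, 0] = 104.0  # B plane
mean_blob[0, 1] = 116.0  # G plane
mean_blob[0, 2] = 122.0  # R plane

# Average over batch, height, and width to get one mean per channel.
per_channel = mean_blob.mean(axis=(0, 2, 3))
print(per_channel)  # [104. 116. 122.]
```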
