
How to load pre-trained weights? #4

Closed

hiepph opened this issue Dec 6, 2018 · 24 comments

@hiepph

hiepph commented Dec 6, 2018

I see you posted the link to the pre-trained weights, but there is no tutorial on how to use them.

I tried with Keras:

from keras.models import load_model
load_model('vgg16_weights_th_dim_ordering_th_kernels_Holistic_91.11.h5')

but it failed with ValueError: No model found in config file.

How do I load these h5 weights? Can you provide the model source so I can use model.load_weights(...)?

@lgaida

lgaida commented Dec 7, 2018

I also wanted to play around with the pre-trained weights of the holistic model, so I downloaded 'vgg16_weights_th_dim_ordering_th_kernels_Holistic_91.11.h5'.

I used Keras with TensorFlow and assumed that using the VGG16 from keras.applications should work:

from keras import applications
vgg = applications.VGG16(include_top=True, weights='PATH_TO_WEIGHTSFILE', classes=16)

Turns out you don't even have to convert the weights from Theano to TensorFlow yourself, since Keras does this internally in model.load_weights (which is called inside VGG16 if you provide a weights file).

Initialization of the model plus loading the weights seemed to work; I didn't get any errors.
I then used a few examples from RVL-CDIP to test everything. Sadly, every image tested was classified as memo.

Being suspicious of the weight conversion, I set up a new project and installed Keras with Theano. Again, loading the model with the weights worked, but all test images were classified as memo.

Section 'IV-B Preprocessing' of the paper says:

Following the resizing, all datasets were standardized

Can someone clarify what "standardized" means? Mean pixel subtraction? Rescaling?

I would appreciate it if someone could confirm that the provided weights actually work.

@hiepph
Author

hiepph commented Dec 10, 2018

Thanks @lgaida, I successfully loaded the trained weights as you suggested. My input is preprocessed as:

import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

img = image.load_img('my_image', target_size=(224, 224))
img = image.img_to_array(img)
img = preprocess_input(img)

x = np.expand_dims(img, 0)

But when I tried to predict with the holistic model, I ran into the same problem as you:

y = vgg.predict(x)
np.argmax(y) # always ends up at id 8 (which is 'file folder')

@lgaida

lgaida commented Dec 10, 2018

@hiepph too bad 😢

@saikat-roy
Collaborator

Can someone clarify what "standardized" means? Mean pixel subtraction? Rescaling?

By "standardized", we mean subtract the mean and divide by the standard deviation.

Regarding the data loading issues, I can look into our old code and configuration and try to elaborate.

@saikat-roy
Collaborator

saikat-roy commented Dec 11, 2018

I can confirm, however, that everything we did used Theano as the backend.

So the input dimensions as well as the weights are in Theano ordering. If you are using TensorFlow as the backend, then I think you have to either switch the backend to Theano or change the weight ordering for everything to work.

Turns out you don't even have to convert the weights from Theano to TensorFlow yourself, since Keras does this internally in model.load_weights (which is called inside VGG16 if you provide a weights file).

However, I can neither confirm nor deny this, since I have not worked with that functionality myself.

@lgaida

lgaida commented Dec 11, 2018

Hi @saikat-roy, thank you for replying 👍
It would be fantastic if you could take another look at your code, maybe providing some snippets. Playing around with dim ordering is fine, but guessing at the preprocessing is much harder.

@saikat-roy
Collaborator

saikat-roy commented Dec 11, 2018

Hey @lgaida. We apologize for not replying sooner, but the source code of the project was never really written for what you might call public consumption (in other words, it's an absolute mess), so we are scrambling to dig it out of storage.

It would be fantastic if you could take another look at your code, maybe providing some snippets. Playing around with dim ordering is fine, but guessing at the preprocessing is much harder.

# X is the main data matrix, organized in (samples, channels, height, width) format.
# X was initially created with 3 channels to match the original VGG16 input, but
# since RVL-CDIP images are grayscale, we simply copy the 1st channel onto the
# 2nd and 3rd channels, though only after standardization, as you will see below.

_mean = X[:,0,:,:].mean(axis=0)
_std  = X[:,0,:,:].std(axis=0)

_jmp = 1000 # We essentially do the standardization in mini-batches
            # of size '_jmp' due to memory constraints

for i in range(0,X.shape[0],_jmp):
    end = min(i+_jmp,X.shape[0])
    X[i:end,0,:,:] = (X[i:end,0,:,:]-_mean)/_std # batch standardization
    X[i:end,1,:,:] = X[i:end,0,:,:] # batch copying to channel 2
    X[i:end,2,:,:] = X[i:end,0,:,:] # batch copying to channel 3

I am digging through our old files, and this is the preprocessing snippet I found we had used. I should warn you, however, that the _mean and _std calculations we used are the naive versions and consume a ridiculous amount of memory; without a very large amount of RAM they will probably lead to crashes. We used AWS EC2 instances (we were also being a bit lazy), so it wasn't a problem for us, but I would recommend modifying the computation in some way (for instance doing it manually in mini-batches, as sketched below) to suit lower hardware configurations.
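
For lower-memory setups, here is a minimal sketch (not the authors' code) of computing the same per-pixel _mean and _std incrementally with running sums:

import numpy as np

def batched_mean_std(X, batch=1000):
    # Accumulate the sum and the sum of squares of channel 0 in mini-batches,
    # so the full data matrix is never reduced in one shot.
    n = float(X.shape[0])
    s = np.zeros(X.shape[2:], dtype=np.float64)
    sq = np.zeros(X.shape[2:], dtype=np.float64)
    for i in range(0, X.shape[0], batch):
        chunk = X[i:i+batch, 0, :, :].astype(np.float64)
        s += chunk.sum(axis=0)
        sq += (chunk ** 2).sum(axis=0)
    mean = s / n
    std = np.sqrt(sq / n - mean ** 2)  # Var[x] = E[x^2] - (E[x])^2
    return mean, std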

@lgaida

lgaida commented Dec 11, 2018

Thanks for replying so quickly. I'm going to play around with your code snippet; I'm currently implementing something very similar of my own.

To remove a few more assumptions:
X in your code snippet represents the training images of RVL-CDIP, and you normalize the test samples with the mean & std of this X (= training samples), right?
Or is X the whole of RVL-CDIP, including train, test, and validation?

@saikat-roy
Collaborator

X in your code snippet represents the training images of RVL-CDIP, and you normalize the test samples with the mean & std of this X (= training samples), right?
Or is X the whole of RVL-CDIP, including train, test, and validation?

While the first case you suggested might be more experimentally sound, we actually ran this snippet separately for the train, test, and validation sets, standardizing each split with its own mean and standard deviation.
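
In other words (a minimal sketch; X_train, X_val, and X_test are hypothetical names for the three splits):

# Each split is standardized with its own statistics, as described above.
for X in (X_train, X_val, X_test):
    _mean = X[:,0,:,:].mean(axis=0)
    _std  = X[:,0,:,:].std(axis=0)
    X[:,0,:,:] = (X[:,0,:,:] - _mean) / _std
    X[:,1,:,:] = X[:,0,:,:]
    X[:,2,:,:] = X[:,0,:,:]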

@lgaida

lgaida commented Dec 11, 2018

Hello again,
I installed Theano and tested both my own and your implementation of the normalization. Still not able to make good predictions 👎
If you don't want to publish the code, is there any chance I might get it? I would try to come up with a publishable snippet providing a small example of how to use the weights for prediction.

@saikat-roy
Collaborator

saikat-roy commented Dec 11, 2018

I installed Theano and tested both my own and your implementation of the normalization. Still not able to make good predictions 👎

That's odd. I'm guessing you already made the changes in the keras.json configuration file by setting "backend" and "image_data_format". Strange that it wouldn't work.
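
For reference, a minimal sketch (an assumption about the setup, not something from the paper) of forcing the Theano backend and channels-first ordering programmatically, equivalent to editing ~/.keras/keras.json:

import os
os.environ['KERAS_BACKEND'] = 'theano'  # must be set before keras is imported

import keras.backend as K
K.set_image_data_format('channels_first')  # Theano-style dim ordering
print(K.backend(), K.image_data_format())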

If you don't want to publish the code, is there any chance I might get it? I would try to come up with a publishable snippet providing a small example of how to use the weights for prediction.

Sure. Give us a little time, like a day or so, and we'll give you the version of the code that we had used.

@saikat-roy
Collaborator

saikat-roy commented Dec 11, 2018

I installed Theano and tested both my own and your implementation of the normalization. Still not able to make good predictions

Hey @lgaida. I was digging around in our code and noticed something. I know the last version I gave you didn't have a NaN guard for the standardization. Did your version have one?

_jmp = 1000
eps = 0.0001
for i in range(0,X.shape[0],_jmp):
    end = min(i+_jmp,X.shape[0])
    X[i:end,0,:,:] = (X[i:end,0,:,:]-_mean)/(_std+eps) # batch standardization
    X[i:end,1,:,:] = X[i:end,0,:,:] # batch copying to channel 2
    X[i:end,2,:,:] = X[i:end,0,:,:] # batch copying to channel 3

@lgaida

lgaida commented Dec 11, 2018

Hey @lgaida. I was digging around in our code and noticed something. I know the last version I gave you didn't have a NaN guard for the standardization. Did your version have one?

Kind of, I initialized the array with zeros.

Sure. Give us a little time, like a day or so, and we'll give you the version of the code that we had used.

Sounds great 👍 I'll be waiting until then :) Feel free to contact me via GitHub or email (see my GitHub profile).

@saikat-roy
Collaborator

Kind of, I initialized the array with zeros.

I mean that (as far as I remember) the std of X is 0 in some places, so you would be getting NaNs in the standardized input. Do we mean the same thing? It was an issue for us, if I remember correctly. If you haven't specifically guarded against this, try adding a small value like 0.0001 to _std, as above, and run the examples again.

@hiarindam
Owner

Hello @lgaida, thanks for your interest in our work and for reaching out to us.

Kind of, I initialized the array with zeros.

I would repeat what @saikat-roy mentioned: even though initialization was done with all zeros, that unfortunately doesn't guarantee you won't get NaNs. Please consider adding this safeguard to your code and let us know.

@lgaida

lgaida commented Dec 12, 2018

Hello @lgaida, thanks for your interest in our work and for reaching out to us.

Kind of, I initialized the array with zeros.

I would repeat what @saikat-roy mentioned: even though initialization was done with all zeros, that unfortunately doesn't guarantee you won't get NaNs. Please consider adding this safeguard to your code and let us know.

I just added the guard but still get Label 8 for every tested sample 😢

@hiepph
Author

hiepph commented Dec 12, 2018

Hi @saikat-roy, can you provide the mean and std values of your training set, so I can standardize my inputs before forwarding them through the trained model?

@saikat-roy
Collaborator

Hey, sorry for the late reply.

Hi @saikat-roy, can you provide the mean and std values of your training set, so I can standardize my inputs before forwarding them through the trained model?

I'm really sorry, but the computational environment we had set up for processing the dataset is not currently available to us.

I just added the guard but still get Label 8 for every tested sample

We will, however, be looking into releasing more of our code and testing the model weights ourselves, since it is disturbing to hear that the model weights do not load as expected. While we cannot do it immediately, we do plan to try within a week or two.

So I would ask for your patience a while longer; hopefully we can get back to you with better news than "we don't know what's wrong, this shouldn't be happening".

@lgaida

lgaida commented Dec 17, 2018

Just a reminder that I could also take a look at the code 👋

@puneetiitian

Hi Saikat, Arindam,

First of all, thanks for writing this great article.
I am also getting everything predicted as class 8. Below is my code; kindly assist and let us know how to resolve this.

import numpy as np
from keras import applications
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

vgg = applications.VGG16(include_top=True, weights='F:/Doc_Image_Classification/vgg16_weights_th_dim_ordering_th_kernels_Holistic_91.11.h5', classes=16)
img = image.load_img('F:/Doc_Image_Classification/images/pic1.png', target_size=(224, 224))
img = image.img_to_array(img)
img = preprocess_input(img)
x = np.expand_dims(img, 0)
y = vgg.predict(x)
np.argmax(y)

@saikat-roy
Collaborator

Okay, so first and foremost: we are sincerely sorry about the ridiculously late updates to this issue. Unfortunately, as we mentioned, we have since stopped working on this project and have literally no hardware or software setup available to test the models any more. I know it's frustrating to have your queries go unanswered, but we have had little to no time to really go through the code for this bug. We have thought a lot about it and, simply put, it did NOT exist when we worked on it.

The reason I am writing this update is to mention that we recently went through multiple issues on the Keras forums regarding problems with model.save and model.load in Keras. From our end, the code should run fine if the data is simply standardized as I mentioned earlier, which everyone seems to be doing as well. So if you are still using our code, I gently urge you to look into whether the Keras serialization bugs are to blame here. We will go over it ourselves if we can, but without a proper hardware setup we sincerely can't promise anything in terms of time.
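
One quick way to probe for a silent loading failure (a diagnostic sketch, not the authors' code) is to check that load_weights actually moves the layer weights away from their random initialization:

import numpy as np
from keras import applications

vgg = applications.VGG16(include_top=True, weights=None, classes=16)
before = [w.copy() for w in vgg.get_weights()]  # random initialization
vgg.load_weights('vgg16_weights_th_dim_ordering_th_kernels_Holistic_91.11.h5')
after = vgg.get_weights()
changed = sum(not np.allclose(b, a) for b, a in zip(before, after))
print('weight arrays updated: {}/{}'.format(changed, len(after)))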

Thank you for being patient with us, and again we sincerely apologize for not actively helping out with the issue. For anyone who needs our code, we will attempt to release the .py files with some minor cleaning soon; since we can't help actively, this is the least we can do at this point.

@saikat-roy
Collaborator

An attempt at solving the weight loading has been added to the README, so we'll be closing this issue.

@martinnormark

martinnormark commented Nov 7, 2019

For anyone looking to run this with TensorFlow 2.0, the following will work.

Install dependencies:

pip install tensorflow
pip install keras
pip install pillow (used for inference later)

Download a weights file, e.g. vgg16_weights_th_dim_ordering_th_kernels_Holistic_91.11.h5 from Google Drive

Download the convert script from this repo.

Open the convert script, and make the following changes:

  • Set the model_weights array at the top to point to the weight file(s) you have downloaded
  • Replace K.set_image_dim_ordering('th') with K.common.set_image_dim_ordering('th').

Run python Weight_conversion_th_to_tf_Keras2.py from terminal/command prompt.

A new folder (tf-kernels-channels-last-dim-ordering) is created, containing the converted weights file.

Open the folder and create a new file called test.py with the following code:

import numpy as np
from keras import applications
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

vgg = applications.VGG16(include_top=True, weights='./vgg16_weights_th_dim_ordering_th_kernels_Holistic_91.11.h5', classes=16)

class_map = ['letter', 'form', 'email', 'handwritten', 'advertisement',
	'scientific report', 'scientific publication', 'specification', 'file folder',
	'news article', 'budget', 'invoice', 'presentation', 'questionnaire',
	'resume', 'memo']

def test(path):
	img = image.load_img(path, target_size=(224, 224))
	img = image.img_to_array(img)
	img = preprocess_input(img)
	x = np.expand_dims(img, 0)
	y = vgg.predict(x)
	print(y)

	idx = np.argmax(y)
	print('predicted class: {}'.format(class_map[idx]))

test('../form.jpg')

Now run the code with python test.py and it will print out the predicted class of the image.

@saikat-roy
Collaborator

@martinnormark Hey, thanks for the guide. I've added a link to it in the main README.
