
Preprocessing of input data #4

Open
Girish-03 opened this issue Apr 14, 2021 · 8 comments

@Girish-03

Hi,

The work is really impressive and the results seem astonishing.

I am a student trying to use this code for one of my research projects. I would like to know whether a specific preprocessing technique should be applied before feeding the images to the network.
For instance, I detect faces in video frames using OpenCV's Caffe DNN face detector, crop them, resize them to 256x256, and feed them to the network. However, the valence and arousal values, as well as the categorical emotion, do not match for many frames. I suspect I am missing some preprocessing of the input frames that the EmoNet model requires. I would also like to know whether there is a specific technique to be used for detecting and cropping the faces, so I would appreciate your guidance here.
I ran the estimation and visualization on the same video used in the paper to compare with your results, but they are not the same. Below are links to the video with the original results (valence/arousal bars and categorical emotions) and the results from my preprocessing (as explained above); a sketch of my pipeline follows the links.

(The green vertical and blue horizontal bars, with the emotion in red text, are my results.)
Using 5 class model
https://drive.google.com/file/d/1--GW_J3XUDNbo59YOTbLJ-VPWS4-2oey/view?usp=sharing
Using 8 class model
https://drive.google.com/file/d/1jJ9Ah7rcoN3aVkLYPq8cDajdRTnwsamU/view?usp=sharing
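
For reference, here is roughly what my detection-and-crop step looks like (a sketch only; the Caffe file paths and the confidence threshold are placeholders for my local setup):

import cv2
import numpy as np

# Placeholder paths to the OpenCV Caffe face detector files.
net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "res10_300x300_ssd_iter_140000.caffemodel")

def crop_face(frame, conf_threshold=0.5):
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0, (300, 300),
                                 (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()
    i = int(np.argmax(detections[0, 0, :, 2]))            # most confident detection
    if detections[0, 0, i, 2] < conf_threshold:
        return None
    x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * np.array([w, h, w, h])).astype(int)
    face = frame[max(y1, 0):y2, max(x1, 0):x2]            # crop the detected face
    return cv2.resize(face, (256, 256))                   # resize to the network's input size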

@antoinetlc
Contributor

antoinetlc commented May 3, 2021

Hello,

Thank you for your interest.
It is hard to say what is wrong without seeing the code. Are the values of the input image in the range [0, 1]?
Otherwise, I would advise you to have a look at the dataloader we provide for the AffectNet dataset, in particular these lines: https://github.com/face-analysis/emonet/blob/master/emonet/data/affecnet.py#L122#L131
This is where we apply the transformations to the cropped images obtained from a face detector.

You can also look at these lines in the test.py file: https://github.com/face-analysis/emonet/blob/master/test.py#L35#L51
This is where the transformations are created and passed to the dataloaders.
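
Roughly, those transformations amount to resizing the face crop to 256x256 and scaling the pixel values to [0, 1]. A torchvision sketch (not the repo's exact code; the file name is a placeholder):

from PIL import Image
from torchvision import transforms

# Approximate equivalent of the linked lines: resize to 256x256 and scale to [0, 1].
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),       # HWC uint8 in [0, 255] -> CHW float in [0, 1]
])

image = Image.open("face_crop.jpg").convert("RGB")   # placeholder file name
tensor = preprocess(image).unsqueeze(0)              # [1, 3, 256, 256], ready for the network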

Hope this helps

@AhmadAsnaashari

Hello @Girish-03
Like you, I got different results compared to the demo.
Is your problem solved?

@antoinetlc
Contributor

Hello,
Sorry for the delay in answering.
We do not do any specific preprocessing apart from what is done in the DataAugmentor class: https://github.com/face-analysis/emonet/blob/master/emonet/data_augmentation.py#L47

One issue I can think of is that OpenCV loads images in BGR format, whereas our network was trained on RGB images (we load images using skimage; see the AffectNet dataloader's __getitem__ function: https://github.com/face-analysis/emonet/blob/master/emonet/data/affecnet.py#L120). Maybe this is the issue...
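
For example, if the frames come from OpenCV, the channel order can be fixed before the crop is fed to the network (a minimal sketch; the file name is a placeholder):

import cv2

frame_bgr = cv2.imread("frame.jpg")                       # OpenCV decodes images as BGR
frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)    # RGB order, as the network expects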

Hope this helps!

@uf644

uf644 commented Aug 28, 2021

Hello,
I wonder what the "4-dimensional input" exactly is. I followed the DataAugmentor but only got a 3-dimensional input.
This is the error: "RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 256, 256] instead"
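
(For reference, that error means the first convolution expects a batched input of shape [N, 3, 256, 256]; a single preprocessed image just needs a batch dimension added, e.g. this sketch:)

import torch

image = torch.rand(3, 256, 256)    # stands in for a single preprocessed image
batch = image.unsqueeze(0)         # shape becomes [1, 3, 256, 256]
print(batch.shape)                 # torch.Size([1, 3, 256, 256]) -- matches the expected 4-D input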

@mdabbah

mdabbah commented Dec 11, 2022

I'm also having an issue validating the network's predictions on stock images.
I suspect the problem lies in the normalization and data-preparation part.

I've tried many variations, including flipping the channels from RGB to BGR and cropping the image to include only the face using an off-the-shelf face detector (I verified that the cropped image contains only my face).

I also tried several normalizations of the input array (sketched in the code after this list):

  • no normalization (values 0-255),
  • dividing by 255 (values 0-1),
  • subtracting the mean [0.485, 0.456, 0.406] and dividing by the std [0.229, 0.224, 0.225].

I always resize the image to 256x256.
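
As torchvision transforms, the three variants look roughly like this (a sketch of what I tried, not code from the repository):

from torchvision import transforms

resize = transforms.Resize((256, 256))

# Variant 1: no normalization, raw uint8 values in [0, 255]
raw_0_255 = transforms.Compose([resize, transforms.PILToTensor()])

# Variant 2: divide by 255, float values in [0, 1]
scaled_0_1 = transforms.Compose([resize, transforms.ToTensor()])

# Variant 3: ImageNet-style mean/std normalization applied on top of [0, 1]
imagenet_norm = transforms.Compose([
    resize,
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])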

None of the above variations worked; the network still predicts the wrong emotion, valence, and arousal
(the target is a happy expression, which should give a high positive valence and positive arousal).

In the repository code there is no input normalization, only a resize transform.

Could you please point me to the correct data-preparation steps?

Thanks

@kdoodoo

kdoodoo commented Oct 30, 2023

import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Resize to 256x256 and scale pixel values to [0, 1]; no mean/std normalization.
image_transforms = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])

def classify1(model, image_transforms, image_path):
    image = Image.open(image_path).convert('RGB')
    image = image_transforms(image).unsqueeze(0).cuda()   # add batch dimension, move to GPU
    with torch.no_grad():
        output = model(image)
    expression = output['expression'][0, :].tolist()
    print(image_path, ',', expression, ',', np.argmax(expression), ',',
          output['arousal'].tolist(), ',', output['valence'].tolist())

I got the results using this as is.

@SuperRuarua

nice!

@SuperRuarua

Good!

goodgood
