Low performance when using minimum code to reuse your models #28

Closed
kondela opened this issue Mar 23, 2020 · 6 comments

kondela commented Mar 23, 2020

I am trying to reuse your pre-trained Pix2Vox-A model, but the output quality is very low (close to random). Below is the bare minimum of code I used.

# Imports assumed from the Pix2Vox repository layout
import os
from collections import OrderedDict

import numpy as np
import torch
from PIL import Image

import utils.binvox_visualization
import utils.data_transforms
from config import cfg
from models.decoder import Decoder
from models.encoder import Encoder
from models.merger import Merger
from models.refiner import Refiner

encoder = Encoder(cfg)
decoder = Decoder(cfg)
refiner = Refiner(cfg)
merger = Merger(cfg)

cfg.CONST.WEIGHTS = '/Projects/Pix2Vox/pretrained_weights/Pix2Vox-A-ShapeNet.pth'
checkpoint = torch.load(cfg.CONST.WEIGHTS, map_location=torch.device('cpu'))

# Strip the 'module.' prefix that torch.nn.DataParallel adds to parameter names
fix_checkpoint = {}
for name in ('encoder_state_dict', 'decoder_state_dict', 'refiner_state_dict', 'merger_state_dict'):
    fix_checkpoint[name] = OrderedDict((k.split('module.')[1:][0], v) for k, v in checkpoint[name].items())

epoch_idx = checkpoint['epoch_idx']
encoder.load_state_dict(fix_checkpoint['encoder_state_dict'])
decoder.load_state_dict(fix_checkpoint['decoder_state_dict'])
# The refiner and merger need their weights too, otherwise they run randomly initialized
refiner.load_state_dict(fix_checkpoint['refiner_state_dict'])
merger.load_state_dict(fix_checkpoint['merger_state_dict'])

encoder.eval()
decoder.eval()
refiner.eval()
merger.eval()

img1_path = '/ShapeNetRendering/02691156/1a04e3eab45ca15dd86060f189eb133/rendering/00.png'
img1_np = np.asarray(Image.open(img1_path))

sample = np.array([img1_np])

IMG_SIZE = cfg.CONST.IMG_H, cfg.CONST.IMG_W
CROP_SIZE = cfg.CONST.CROP_IMG_H, cfg.CONST.CROP_IMG_W

test_transforms = utils.data_transforms.Compose([
    utils.data_transforms.CenterCrop(IMG_SIZE, CROP_SIZE),
    utils.data_transforms.RandomBackground(cfg.TEST.RANDOM_BG_COLOR_RANGE),
    utils.data_transforms.Normalize(mean=cfg.DATASET.MEAN, std=cfg.DATASET.STD),
    utils.data_transforms.ToTensor(),
])

rendering_images = test_transforms(rendering_images=sample)
rendering_images = rendering_images.unsqueeze(0)

with torch.no_grad():
    image_features = encoder(rendering_images)
    raw_features, generated_volume = decoder(image_features)

    if cfg.NETWORK.USE_MERGER and epoch_idx >= cfg.TRAIN.EPOCH_START_USE_MERGER:
        generated_volume = merger(raw_features, generated_volume)
    else:
        generated_volume = torch.mean(generated_volume, dim=1)

    if cfg.NETWORK.USE_REFINER and epoch_idx >= cfg.TRAIN.EPOCH_START_USE_REFINER:
        generated_volume = refiner(generated_volume)

For visualization I used the binvox_visualization from utilities:

generated_volume = generated_volume.squeeze(0)

img_dir = '/sample_images'
gv = generated_volume.cpu().numpy()
rendering_views = utils.binvox_visualization.get_volume_views(gv, img_dir, epoch_idx)

This is the model's output:
[image: generated voxel volume]

This is the input:
[image: input rendering 00.png]

There were a few problems with loading the pre-trained weights, as mentioned in other issues, but other than that everything seems to work properly except for the output quality. I guess I must be missing something?

kondela commented Mar 28, 2020

After digging a bit deeper, the difference turned out to be in the image loading: I had used Pillow instead of OpenCV, and switching to OpenCV made all the difference. For the visualization, all I had to do was swap axes with np.swapaxes(voxels, 2, 1).
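
For reference, a minimal sketch of both fixes, reusing the variables from the script above (the cv2.imread call mirrors what the repository's data loader does; the exact flags there may differ):

import cv2
import numpy as np

# Load the image the way the Pix2Vox data pipeline does: OpenCV, float32 in
# [0, 1], keeping the alpha channel (IMREAD_UNCHANGED) so RandomBackground can use it
img1_np = cv2.imread(img1_path, cv2.IMREAD_UNCHANGED).astype(np.float32) / 255.
sample = np.array([img1_np])

# ... transforms, encoder, decoder, merger, refiner exactly as above ...

# Swap axes before visualizing so the volume orientation matches the plot
gv = generated_volume.squeeze(0).cpu().numpy()
gv = np.swapaxes(gv, 2, 1)
rendering_views = utils.binvox_visualization.get_volume_views(gv, img_dir, epoch_idx)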

[image: corrected voxel output]

@kondela kondela closed this as completed Mar 28, 2020
@aashishbohra10

Hi @kondela, I am also facing this issue. If you could share your test.py, I would be very grateful; it would help me generate correct results.

ahmedshingaly commented Sep 8, 2020

I would be grateful if the script above could be collected into a single test file; I tried to reproduce the test results but failed.
@hzxie @kondela @aashishbohra10

encoder = Encoder(cfg)
decoder = Decoder(cfg)
refiner = Refiner(cfg)
merger = Merger(cfg)

cfg.CONST.WEIGHTS = 'pretrained/Pix2Vox-A-ShapeNet.pth'
checkpoint = torch.load(cfg.CONST.WEIGHTS, map_location=torch.device('cpu'))

fix_checkpoint = {}
fix_checkpoint['encoder_state_dict'] = OrderedDict((k.split('module.')[1:][0], v) for k, v in checkpoint['encoder_state_dict'].items())
fix_checkpoint['decoder_state_dict'] = OrderedDict((k.split('module.')[1:][0], v) for k, v in checkpoint['decoder_state_dict'].items())

epoch_idx = checkpoint['epoch_idx']
encoder.load_state_dict(fix_checkpoint['encoder_state_dict'])
decoder.load_state_dict(fix_checkpoint['decoder_state_dict'])

encoder.eval()
decoder.eval()
refiner.eval()
merger.eval()

img1_path = '/datasets/ShapeNetRendering/02691156/1a04e3eab45ca15dd86060f189eb133/rendering/00.png'
img1_np = np.asarray(Image.open(img1_path))

sample = np.array([img1_np])

IMG_SIZE = cfg.CONST.IMG_H, cfg.CONST.IMG_W
CROP_SIZE = cfg.CONST.CROP_IMG_H, cfg.CONST.CROP_IMG_W

test_transforms = utils.data_transforms.Compose([
    utils.data_transforms.CenterCrop(IMG_SIZE, CROP_SIZE),
    utils.data_transforms.RandomBackground(cfg.TEST.RANDOM_BG_COLOR_RANGE), 
    utils.data_transforms.Normalize(mean=cfg.DATASET.MEAN, std=cfg.DATASET.STD),
    utils.data_transforms.ToTensor(),
])

rendering_images = test_transforms(rendering_images=sample)
rendering_images = rendering_images.unsqueeze(0)

with torch.no_grad():
    image_features = encoder(rendering_images)
    raw_features, generated_volume = decoder(image_features)

    if cfg.NETWORK.USE_MERGER and epoch_idx >= cfg.TRAIN.EPOCH_START_USE_MERGER:
        generated_volume = merger(raw_features, generated_volume)
    else:
        generated_volume = torch.mean(generated_volume, dim=1)

    if cfg.NETWORK.USE_REFINER and epoch_idx >= cfg.TRAIN.EPOCH_START_USE_REFINER:
        generated_volume = refiner(generated_volume)


generated_volume = generated_volume.squeeze(0)

img_dir = '/output/myresults'
gv = generated_volume.cpu().numpy()
rendering_views = utils.binvox_visualization.get_volume_views(gv, img_dir, epoch_idx)

Thank you in advance.

saisai1002 commented Sep 28, 2020

> After digging a bit deeper, the difference turned out to be in the image loading: I had used Pillow instead of OpenCV, and switching to OpenCV made all the difference. For the visualization, all I had to do was swap axes with np.swapaxes(voxels, 2, 1).

I am also facing this issue. This is what appeared when I tested it:
[image: incorrect voxel output]
Where should the np.swapaxes(voxels, 2, 1) line be added? Thanks.

@LiyingCV

> I am also facing this issue. Where should the np.swapaxes(voxels, 2, 1) line be added?

Did you resolve this issue? I am also hitting it and still cannot resolve it.

b7leung commented Mar 19, 2021

In my case, this was resolved when I found out that Pix2Vox is trained such that, for transparent PNG input images, the RGB channels must be black wherever the alpha channel is transparent. In my case they were white. This is impossible to tell visually (because of the alpha), but it can be double-checked in code (e.g. with matplotlib): if you display only the RGB channels, the background should be black.
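
A minimal sketch of that check, assuming an RGBA PNG such as the ShapeNet renderings (the file name is a placeholder):

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

img = np.asarray(Image.open('00.png').convert('RGBA')).astype(np.float32) / 255.
rgb, alpha = img[..., :3], img[..., 3]

# Display only the RGB channels: the background should look black, not white
plt.imshow(rgb)
plt.show()

# If it is white, force the RGB channels to black wherever the pixel is transparent
rgb[alpha == 0] = 0.
fixed = np.dstack((rgb, alpha))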
