Low performance when using minimum code to reuse your models #28
I am trying to reuse your pre-trained Pix2Vox-A model, but the output seems to be of low quality (close to random). The bare minimum of code that I used is quoted in the first comment below. For visualization I used binvox_visualization from the utilities.

This is the model's output: [image]

This is the input: [image]

There were a few problems with loading the pre-trained weights, as mentioned in other issues, but other than that everything seems to work properly, except for the quality. I guess I must be missing something?

Comments
Hi @kondela,
I would be grateful if the above-mentioned script were collected into a test file. I tried to reproduce the test results with the script below but failed; thank you in advance.

from collections import OrderedDict
import numpy as np
import torch
from PIL import Image

from config import cfg
from models.encoder import Encoder
from models.decoder import Decoder
from models.refiner import Refiner
from models.merger import Merger
import utils.binvox_visualization
import utils.data_transforms

encoder = Encoder(cfg)
decoder = Decoder(cfg)
refiner = Refiner(cfg)
merger = Merger(cfg)

cfg.CONST.WEIGHTS = 'pretrained/Pix2Vox-A-ShapeNet.pth'
checkpoint = torch.load(cfg.CONST.WEIGHTS, map_location=torch.device('cpu'))

# The checkpoint was saved from DataParallel models, so strip the 'module.' prefix
fix_checkpoint = {}
fix_checkpoint['encoder_state_dict'] = OrderedDict((k.split('module.')[1:][0], v) for k, v in checkpoint['encoder_state_dict'].items())
fix_checkpoint['decoder_state_dict'] = OrderedDict((k.split('module.')[1:][0], v) for k, v in checkpoint['decoder_state_dict'].items())
fix_checkpoint['refiner_state_dict'] = OrderedDict((k.split('module.')[1:][0], v) for k, v in checkpoint['refiner_state_dict'].items())
fix_checkpoint['merger_state_dict'] = OrderedDict((k.split('module.')[1:][0], v) for k, v in checkpoint['merger_state_dict'].items())

epoch_idx = checkpoint['epoch_idx']
encoder.load_state_dict(fix_checkpoint['encoder_state_dict'])
decoder.load_state_dict(fix_checkpoint['decoder_state_dict'])
# Refiner and merger are used below, so their weights have to be loaded as well
refiner.load_state_dict(fix_checkpoint['refiner_state_dict'])
merger.load_state_dict(fix_checkpoint['merger_state_dict'])

encoder.eval()
decoder.eval()
refiner.eval()
merger.eval()

img1_path = '/datasets/ShapeNetRendering/02691156/1a04e3eab45ca15dd86060f189eb133/rendering/00.png'
img1_np = np.asarray(Image.open(img1_path))
sample = np.array([img1_np])

IMG_SIZE = cfg.CONST.IMG_H, cfg.CONST.IMG_W
CROP_SIZE = cfg.CONST.CROP_IMG_H, cfg.CONST.CROP_IMG_W
test_transforms = utils.data_transforms.Compose([
    utils.data_transforms.CenterCrop(IMG_SIZE, CROP_SIZE),
    utils.data_transforms.RandomBackground(cfg.TEST.RANDOM_BG_COLOR_RANGE),
    utils.data_transforms.Normalize(mean=cfg.DATASET.MEAN, std=cfg.DATASET.STD),
    utils.data_transforms.ToTensor(),
])

rendering_images = test_transforms(rendering_images=sample)
rendering_images = rendering_images.unsqueeze(0)

with torch.no_grad():
    image_features = encoder(rendering_images)
    raw_features, generated_volume = decoder(image_features)
    # Merge the per-view volumes, or average them if the merger is disabled
    if cfg.NETWORK.USE_MERGER and epoch_idx >= cfg.TRAIN.EPOCH_START_USE_MERGER:
        generated_volume = merger(raw_features, generated_volume)
    else:
        generated_volume = torch.mean(generated_volume, dim=1)
    if cfg.NETWORK.USE_REFINER and epoch_idx >= cfg.TRAIN.EPOCH_START_USE_REFINER:
        generated_volume = refiner(generated_volume)

generated_volume = generated_volume.squeeze(0)
img_dir = '/output/myresults'
gv = generated_volume.cpu().numpy()
rendering_views = utils.binvox_visualization.get_volume_views(gv, img_dir, epoch_idx)
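To rule out a problem in the visualization step itself, the raw volume can also be plotted directly with matplotlib. A minimal sketch, assuming gv is the 32x32x32 array of occupancy probabilities produced above (the show_volume helper and the 0.5 threshold are my own choices, not from the repo):

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: needed on older matplotlib versions

def show_volume(gv, threshold=0.5):
    # Binarize the predicted occupancy probabilities before plotting
    occupancy = gv.squeeze() > threshold
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    ax.voxels(occupancy, edgecolor='k')
    plt.show()

show_volume(gv)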
In my case, this was resolved when I found out that Pix2Vox is trained so that, for transparent PNG input images, the RGB channels under fully transparent alpha need to be black. In my case they were white. This is impossible to tell visually right away (because of the alpha), but it can be double-checked with code (e.g. matplotlib): if you show only the RGB channels, the background should be black.
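A minimal sketch for double-checking this, assuming a single RGBA rendering (the file paths are placeholders, not from this thread):

import numpy as np
from PIL import Image

# Placeholder path; use your own rendering
img = np.asarray(Image.open('00.png').convert('RGBA')).copy()
rgb, alpha = img[..., :3], img[..., 3]
transparent = alpha == 0

if transparent.any():
    print('max RGB value under transparent pixels:', rgb[transparent].max())
    rgb[transparent] = 0  # force the hidden background to black (in-place view)
    Image.fromarray(img).save('00_fixed.png')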