
Correct way to extract image features with VinVL #7

Open
stopmosk opened this issue Apr 14, 2021 · 14 comments

@stopmosk

Hi!

How can I extract image features from my dataset with VinVL if it's not in tsv format, but in the form of a folder with image files? What's the correct way to do this?

@vinson2233

vinson2233 commented Apr 14, 2021

Hi, I think we must use a TSV file. We can generate one with https://github.com/microsoft/scene_graph_benchmark/blob/main/tools/mini_tsv/tsv_demo.py .
data_path is the directory of your image folder; tsv_file, label_file, hw_file, and linelist_file are the output paths and names. You just need to set up the directories you want.

Just run the script up to line 56. It will produce tsv_file, label_file, hw_file, and linelist_file in your destination. But it seems the files it produces are missing a config.yaml. See my answer on creating the yaml file:
#6 (comment)

After that, you can run the script test_sg_net.py for image feature extraction according to the settings in readme.md.
Don't forget that dataset should point to the directory containing tsv_file, label_file, hw_file, and linelist_file, and rewrite DATASETS.TEST to point to the yaml file.
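For readers without the repo handy, here is a minimal stdlib-only sketch of the four files described above. The row format (image id, label JSON, base64-encoded image bytes) is my reading of tsv_demo.py, not a verified copy of it, and the height/width values are placeholders; real code should read them from the image (e.g. with cv2.imread).

```python
import base64
import json
import os

def build_tsv(img_dir, out_prefix):
    """Sketch of writing tsv_file, label_file, hw_file, and linelist_file
    for a folder of images. Labels are left empty (no ground-truth boxes);
    height/width are placeholders that real code should read per image."""
    rows = []
    for fname in sorted(os.listdir(img_dir)):
        if not fname.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        img_id = os.path.splitext(fname)[0]
        with open(os.path.join(img_dir, fname), "rb") as f:
            b64 = base64.b64encode(f.read()).decode("utf-8")
        rows.append((img_id, b64))

    with open(out_prefix + ".tsv", "w") as tsv, \
         open(out_prefix + ".label.tsv", "w") as label, \
         open(out_prefix + ".hw.tsv", "w") as hw, \
         open(out_prefix + ".linelist.txt", "w") as linelist:
        for i, (img_id, b64) in enumerate(rows):
            tsv.write(f"{img_id}\t{json.dumps([])}\t{b64}\n")  # no gt boxes
            label.write(f"{img_id}\t{json.dumps([])}\n")
            hw.write(f"{img_id}\t{json.dumps([{'height': 480, 'width': 640}])}\n")
            linelist.write(f"{i}\n")
    return len(rows)
```

The linelist file simply enumerates the row indices to process, which is why it is one integer per line.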

@stopmosk
Author

Thanks for the comprehensive answer!

@MikeDean2367
(quoting @vinson2233's answer above)

Hey friends, in line 37 of the tsv_demo.py file, does the code there need to change? I would be grateful for any help!

@alice-cool

alice-cool commented Jun 7, 2021

(quoting @vinson2233's answer above)

Hi, I'd like some suggestions about handling a larger object-label vocabulary. The project provides a labelmap file, but it only covers 50 objects, which is quite small. When I run on COCO 2014, the object label count (1370) is far bigger than that. Should we just add more ids and names to the labelmap file, or do we need to train the project's object detector from scratch? I have the COCO 2014 36-box labels, but I don't know how to build a labelmap file with a larger object vocabulary.

If you know a way, please help. Thanks.

KeyError: 'broccoli'
Killing subprocess 4356

@alice-cool

#7 (comment)
Dear scholar, have you implemented a function that can produce features for user-specified bounding boxes, rather than the proposals from the model?

@uthynauta

uthynauta commented Aug 16, 2021

(quoting @vinson2233's answer above)

@vinson2233

Hi there, thanks for your response. As of now I have been able to create the tsv and yaml files, but I'd like a little more help with the last paragraph of your answer: I don't understand how to point DATASETS.TEST to the yaml file, or how to make dataset point to the directory containing the other files.


Thank you very much in advance.

@SPQRXVIII001

Hi, I found a way to circumvent using tsv files by modifying scene_graph_benchmark/tools/demo/demo_image.py; now I only need a folder of jpg images, the VinVL yaml configuration file, and the model weight file. The predictions are saved in a dictionary and stored in pth format. I ran it on Google Colab and it generates predictions at roughly 2 s/image. I hope this helps.

# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json

import cv2
import os
import os.path as op
import argparse
import json
import torch
from PIL import Image


from scene_graph_benchmark.scene_parser import SceneParser
from scene_graph_benchmark.AttrRCNN import AttrRCNN
from maskrcnn_benchmark.data.transforms import build_transforms
from maskrcnn_benchmark.utils.checkpoint import DetectronCheckpointer
from maskrcnn_benchmark.config import cfg
from scene_graph_benchmark.config import sg_cfg
from maskrcnn_benchmark.data.datasets.utils.load_files import \
    config_dataset_file
from maskrcnn_benchmark.data.datasets.utils.load_files import load_labelmap_file
from maskrcnn_benchmark.utils.miscellaneous import mkdir

def cv2Img_to_Image(input_img):
    cv2_img = input_img.copy()
    img = cv2.cvtColor(cv2_img, cv2.COLOR_BGR2RGB)
    img = Image.fromarray(img)
    return img


def detect_objects_on_single_image(model, transforms, cv2_img):
    # cv2_img is the original input, so we can get the height and 
    # width information to scale the output boxes.
    img_input = cv2Img_to_Image(cv2_img)
    img_input, _ = transforms(img_input, target=None)
    img_input = img_input.to(model.device)

    with torch.no_grad():
        prediction = model(img_input)[0].to('cpu')

    img_height = cv2_img.shape[0]
    img_width = cv2_img.shape[1]

    prediction = prediction.resize((img_width, img_height))
    
    return prediction

#Setting configuration
cfg.set_new_allowed(True)
cfg.merge_from_other_cfg(sg_cfg)
cfg.set_new_allowed(False)
#Configuring VinVl
cfg.merge_from_file('/scene_graph_benchmark/sgg_configs/vgattr/vinvl_x152c4.yaml')

#This is a list of additional config overrides, given as ordered key/value pairs
#MODEL.WEIGHT specifies the full path of the VinVL weight pth file
argument_list = [
                 'MODEL.WEIGHT', 'vinvl_vg_x152c4.pth',
                 'MODEL.ROI_HEADS.NMS_FILTER', 1,
                 'MODEL.ROI_HEADS.SCORE_THRESH', 0.2, 
                 'TEST.IGNORE_BOX_REGRESSION', False,
                 'MODEL.ATTRIBUTE_ON', True
                 ]
cfg.merge_from_list(argument_list)
cfg.freeze()

output_dir = cfg.OUTPUT_DIR

model = AttrRCNN(cfg)
model.to(cfg.MODEL.DEVICE)
model.eval()

checkpointer = DetectronCheckpointer(cfg, model, save_dir=output_dir)
checkpointer.load(cfg.MODEL.WEIGHT)

transforms = build_transforms(cfg, is_train=False)

input_img_directory = 'insert your images directory path here'
#need to be pth
output_prediction_file = 'insert your output pth file path here'
dets = {}
for img_name in os.listdir(input_img_directory):
  #Convert png format to jpg format
  if img_name.split('.')[1]=='png' or img_name.split('.')[1]=='PNG':
    im = Image.open(os.path.join(input_img_directory, img_name))
    rgb_im = im.convert('RGB')
    new_name = img_name.split('.')[0]+'.jpg'
    rgb_im.save(os.path.join(input_img_directory, new_name))
    print(new_name)

  img_file_path = os.path.join(input_img_directory,img_name.split('.')[0]+'.jpg')
  print(img_file_path)

  cv2_img = cv2.imread(img_file_path)

  det = detect_objects_on_single_image(model, transforms, cv2_img)
  
  # prediction contains: 'labels', 'scores', 'box_features', 'scores_all',
  # 'boxes_all', 'attr_labels', 'attr_scores'
  # box_features are the features used by Oscar
  det_dict = {key: det.get_field(key) for key in det.fields()}

  dets[img_name.split('.')[0]] = det_dict


torch.save(dets, output_prediction_file)

@GabrieleFerrario

Hi @SPQRXVIII001, I don't understand how you can extract the box features from the demo_image.py file without using the TSV files. I tried to run your code with some modifications (there are some errors), but when I print the list of fields extracted from the prediction, there is no field called 'box_features'; running print("fields:", prediction.fields()) gives only ['labels', 'scores', 'attr_labels', 'attr_scores'].

@SPQRXVIII001

Hi @GabrieleFerrario, I think TSV files are unnecessary. You need to modify your configuration yaml file at scene_graph_benchmark/sgg_configs/vgattr/vinvl_x152c4.yaml to add new fields. I think you need to set the TEST entry as follows:

TEST:
  IMS_PER_BATCH: 1
  OUTPUT_FEATURE: True
  OUTPUT_RELATION_FEATURE: True
  SKIP_PERFORMANCE_EVAL: True
  SAVE_PREDICTIONS: True
  SAVE_RESULTS_TO_TSV: True
  TSV_SAVE_SUBSET: ['rect', 'class', 'conf', 'feature', 'relation_feature']
  GATHER_ON_CPU: True


@BigHyf

BigHyf commented Oct 24, 2021

(quoting @vinson2233's answer above)

@vinson2233 Hi, I still have some questions. Firstly, what does "dataset should point to directory" mean? After I have already set up the yaml and related files, what should I change in vinvl_x152c4.yaml besides DATASETS.TEST? Or could you show your vinvl_x152c4.yaml? Thanks a lot!
What other work should I do to run test_sg_net.py?
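For anyone stuck on the same question, a hypothetical sketch of the two config pieces involved. The file names, keys, and tuple syntax here are assumptions based on @vinson2233's description and the repo's mini_tsv tooling, not verified against the code:

```yaml
# train.yaml — the dataset yaml that DATASETS.TEST points to,
# listing the four files produced by tsv_demo.py (names assumed)
img: train.tsv
label: train.label.tsv
hw: train.hw.tsv
linelist: train.linelist.txt
```

and in vinvl_x152c4.yaml (or as command-line overrides to test_sg_net.py), something like:

```yaml
DATA_DIR: "path/to/your/tsv/folder"   # the directory containing the four files
DATASETS:
  TEST: ("train.yaml",)
```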

@BigHyf

BigHyf commented Oct 24, 2021

@GabrieleFerrario, you may also look at https://github.com/microsoft/scene_graph_benchmark/issues/8#issue-857781982 for some ideas.

Hi, I tried it and it really works. But how can I use these features to run run_captioning.py? The config file seems to be different.

@via815via
(quoting @vinson2233's answer above)

How can I get the label_file? My dataset doesn't have a list of dictionaries where each box has at least "rect" (xyxy mode) and "class" fields. Thank you very much!
This is noted in the original code: "Here is just a toy example of labels. The real labels can be generated from the annotation files given by each dataset. The label is a list of dictionary where each box with at least 'rect' (xyxy mode) and 'class' fields. It can have any other fields given by the dataset."
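As a sketch of what such a conversion could look like for COCO-style annotations (the function name is made up; the "rect"/"class" keys follow the tsv_demo.py comment quoted above, and COCO's [x, y, width, height] box convention is assumed):

```python
def coco_boxes_to_labels(annotations, category_names):
    """Convert COCO-style annotations for ONE image into the label list
    tsv_demo.py expects: each box a dict with "rect" (xyxy) and "class".
    COCO stores boxes as [x, y, width, height]."""
    labels = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        labels.append({
            "rect": [x, y, x + w, y + h],  # xywh -> xyxy
            "class": category_names[ann["category_id"]],
        })
    return labels
```

The resulting list would then be serialized with json.dumps and written into the label column for that image.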

knife982000 pushed a commit to AU-Nebula/scene_graph_benchmark that referenced this issue Sep 20, 2022
Resolve conflicts between main and KERN_NEMESIS branches + updates
@abhidipbhattacharyya
(quoting @SPQRXVIII001's script above)

Hello,

(quoting @GabrieleFerrario's comment above)

Hi,
I am having some errors. I changed det1 to det, but I still get:

"bbox should have 2 dimensions, got {}".format(bbox.ndimension())
ValueError: bbox should have 2 dimensions, got 1

Could you please share your updated code?
Thanks,
Abhidip
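The error message comes from a shape check: the box container expects an N x 4 matrix of boxes, while a single flat box of four numbers has only one dimension. A plain-Python sketch of that check and the usual fix follows (in torch the equivalent fix would be `bbox = bbox.view(-1, 4)`; I have not verified where the 1-D tensor originates in this particular script, so treat this as an illustration of the error, not a diagnosis):

```python
def ensure_2d_boxes(boxes):
    """Mimic the 'bbox should have 2 dimensions' check: boxes must be a
    list of [x1, y1, x2, y2] rows. A single flat box gets wrapped into a
    one-row 'matrix' (the torch equivalent is bbox.view(-1, 4))."""
    if boxes and not isinstance(boxes[0], (list, tuple)):
        boxes = [list(boxes)]  # 1-D input -> one-row 2-D
    for box in boxes:
        if len(box) != 4:
            raise ValueError("each box must have 4 coordinates (xyxy)")
    return boxes
```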
