
Converting from YOLO annotation files to a format readable by the VGG Image Annotator (VIA) #7

Closed
valentinitnelav opened this issue Nov 4, 2022 · 11 comments


@valentinitnelav

valentinitnelav commented Nov 4, 2022

I have YOLOv7 predictions like this:

path/to/test/
 |-img     # the images
 \-labels # txt YOLO files (predictions for each image)

I want to read the YOLO txt files and the image width & height and convert that info to a format readable by the VGG Image Annotator (VIA).

There are 2 options: CSV or JSON file formats.

Here are two files as examples:
via_project_4Nov2022_11h34m_csv.csv
via_project_4Nov2022_11h43m.json

Would it be possible to make use of the functionality of your package to convert from YOLO txt annotation files to something that I can easily import into the VGG Image Annotator (VIA)?

Meanwhile, I went ahead and tried globox.AnnotationSet.from_yolo and got the error message AttributeError: 'str' object has no attribute 'is_dir':

import globox
import os

# Get current directory based on where this script file is located
local_dir = os.path.dirname(os.path.abspath(__name__))

yolo_preds = globox.AnnotationSet.from_yolo(
    folder = os.path.join(local_dir, 'test', 'labels'), # path to yolo prediction txt files
    image_folder = os.path.join(local_dir, 'test', 'img')) # path to images
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/.../yolo_to_json.py in <module>
      6 local_dir = os.path.dirname(os.path.abspath(__name__))
      7 
----> 8 yolo_preds = globox.AnnotationSet.from_yolo(
      9     folder = os.path.join(local_dir, 'test', 'labels'), # path to yolo prediction txt files
     10     image_folder = os.path.join(local_dir, 'test', 'img')) # path to images

~/.local/lib/python3.10/site-packages/globox/annotationset.py in from_yolo(folder, image_folder, image_extension, verbose)
    188         verbose: bool = False
    189     ) -> "AnnotationSet":
--> 190         return AnnotationSet.from_txt(folder, 
    191             image_folder=image_folder,
    192             box_format=BoxFormat.XYWH,

~/.local/lib/python3.10/site-packages/globox/annotationset.py in from_txt(folder, image_folder, box_format, relative, file_extension, image_extension, separator, verbose)
    150         """This method won't try to retreive the image sizes by default. Specify `image_folder` if you need them. `image_folder` is required when `relative` is True."""
    151         # TODO: Add error handling
--> 152         assert folder.is_dir()
    153         assert image_extension.startswith(".")
    154 

AttributeError: 'str' object has no attribute 'is_dir'
@laclouis5
Owner

Hi @valentinitnelav,

You got this error because you passed a str object (as returned by os.path methods) instead of a Path one (as defined by pathlib) to the AnnotationSet.from_yolo() method. Convert your file paths to Paths:

import globox
import os
from pathlib import Path

# Get current directory based on where this script file is located
local_dir = Path(__name__).absolute().parent

yolo_preds = globox.AnnotationSet.from_yolo(
    folder =  local_dir / "test/labels/",  # Path to yolo prediction txt files
    image_folder = local_dir / "test/img/"  # Path to images
)

The VIA annotation file format is currently not supported. I'll take a look and implement it if it's not too complex.

@valentinitnelav
Author

Thanks @laclouis5,

If you want to invest some time into the file format for VIA, these links might be of help:

I have R code that does this for my own pipelines and was thinking of converting it to Python, but I simply haven't found the time.

@laclouis5
Owner

laclouis5 commented Nov 4, 2022

Thanks for the documentation. I invested some time in this issue.

It looks like the CSV format is hard to implement correctly because of the weird formatting used by the VIT annotation tool. Moreover, CSV is generally a bad serialisation format, so I will not implement VIT CSV parsing and conversion.

The JSON format was much simpler and more straightforward to implement, and I think it works properly. One oddity of this format is that it requires reading the images to get their sizes in bytes.

Could you check out https://github.com/laclouis5/globox/tree/vit-ann-csv-support and try it with your data, please? The command line is:

globox convert -f yolo -F vit-json --img_folder <path/to/img/folder> <yolo/folder/> <output.json>

If you prefer using the library:

from globox import AnnotationSet
from pathlib import Path

image_folder = Path("yolo/images/")

yolo_preds = AnnotationSet.from_yolo(
    folder = Path("yolo/predictions/"), 
    image_folder = image_folder, 
)

yolo_preds.save_vit_json(Path("output.json"), image_folder = image_folder)

@valentinitnelav
Author

VIA says that it cannot import the created output.json file because it is corrupted. If you have a look at the via_project_4Nov2022_11h43m.json file that I uploaded above, it needs some extra structure elements that are missing from output.json. The structures of the two files differ.

I admit that VIA's JSON structure can be confusing, but one of the strengths of VIA is its flexibility regarding the attribute table - a user can define their own attribute table, and that is why I use VIA; plus, it is just a simple HTML file that runs in any browser.

Also, I get negative coordinates for the bounding boxes (regions), which should not happen. I also think the coordinates for VIA should be rounded to the nearest integer (they are integer pixel values).

At this link you will find a zip file with:

  • the images in the img folder,
  • the YOLO txt files in the labels folder,
  • via.html,
  • the output.json generated with the code below,
  • and my original example of an annotation JSON file, via_project_4Nov2022_11h43m.json (the annotations there do not correspond to the YOLO predictions; they were drawn manually).

If I understood correctly, the acronym for the VGG Image Annotator is VIA, not VIT.

This is my Python code for testing on a sample of 4 images (to which the uploaded via_project_4Nov2022_11h43m.json file corresponds):

# Try the new vit-ann-csv-support branch from globox.
# In a terminal (I use Linux):
#   pip uninstall globox
#   pip install git+https://github.com/laclouis5/globox.git@vit-ann-csv-support

import globox # https://github.com/laclouis5/globox
from pathlib import Path

local_dir = Path(__name__).absolute().parent # current directory based on where this script file is located
txt_folder = Path(local_dir, 'test', 'labels') # path to folder with the yolo prediction txt files
image_folder = Path(local_dir, 'test', 'img') # path to folder with images

yolo_preds = globox.AnnotationSet.from_yolo(
    folder = txt_folder, 
    image_folder = image_folder) 

# Save as json in local_dir/test
yolo_preds.save_vit_json(
    path = Path(local_dir, 'test', 'output.json'), 
    image_folder = image_folder)

@laclouis5
Owner

VIA says that it cannot import the created output.json file because it is corrupted. If you have a look at the via_project_4Nov2022_11h43m.json file that I uploaded above, it needs some extra structure elements that are missing from output.json. The structures of the two files differ.

The JSON file you linked corresponds to the VIA project format. In addition to the annotation data, this file format also saves the VIA project settings. Meanwhile, AnnotationSet.save_vit_json() saves the annotations in the VIA annotation format. However, the two formats are very close: the former only adds additional information on top of the latter.

VIA project file:

{
  "_via_settings": { },
  "_via_img_metadata": { },
  "_via_attributes": { },
  "_via_data_format_version": "...",
  "_via_image_id_list": [ ]
}

The "_via_img_metadata" dictionary corresponds to the VIA annotation format. You could copy/paste the content of the annotation file there if you want. However, I would not recommend doing this. Just import the VIA annotation file (as returned by Globox) via the Annotation > Import Annotations (from json) menu of VIA, not via Project > Load. This is likely what you really want. Globox won't support the VIA project format since its only purpose is to deal with annotations, not with the project settings of annotation tools.
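That said, if you ever needed a project file built programmatically, the wrapping itself is a few lines of plain Python. This is an unverified sketch, not a Globox feature: the settings, attributes and version values below are placeholders, and the Import Annotations route above remains the recommended path.

import json
from pathlib import Path

# Unverified sketch: wrap a Globox VIA-annotation export into the project
# structure shown above. Placeholder values may need adjusting for VIA 2.
img_metadata = json.loads(Path("output.json").read_text())

project = {
    "_via_settings": {},                   # VIA UI settings (left empty here)
    "_via_img_metadata": img_metadata,     # the annotations exported by Globox
    "_via_attributes": {},                 # user-defined attribute table
    "_via_data_format_version": "2.0.10",  # placeholder version string
    "_via_image_id_list": list(img_metadata.keys()),
}

Path("project.json").write_text(json.dumps(project, indent=2))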

Also, I get negative coordinates for the bounding boxes (regions), which should not happen. I also think the coordinates for VIA should be rounded to the nearest integer (they are integer pixel values).

Globox supports negative coordinates for a few reasons:

  • Some CNNs may predict objects with negative coordinates
  • Some annotation tools may allow negative coordinates

In general this should not cause any issues, either for annotation tools or for CNN training. You can always clip the coordinates if this is a problem for you.

Also, Globox represents coordinates as floats rather than integers because rounding could cause a loss of precision during annotation conversion and evaluation. Float coordinates are not an issue for VIA. If you really need integer coordinates for some reason, you can always round them.
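If clipping or rounding matters for your use case, a few lines of plain Python are enough. This is just a sketch operating on absolute LTRB pixel coordinates; it does not rely on any Globox API:

def clip_and_round(xmin, ymin, xmax, ymax, img_width, img_height):
    """Clamp LTRB pixel coordinates to the image bounds and round to integers."""
    xmin = min(max(round(xmin), 0), img_width)
    ymin = min(max(round(ymin), 0), img_height)
    xmax = min(max(round(xmax), 0), img_width)
    ymax = min(max(round(ymax), 0), img_height)
    return xmin, ymin, xmax, ymax


# Example: a slightly out-of-bounds box on a 640x480 image.
print(clip_and_round(-3.2, 10.7, 645.9, 300.4, 640, 480))  # (0, 11, 640, 300)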

If I understood correctly, the acronym for the VGG Image Annotator is VIA, not VIT.

Thanks for reporting this typo, I'll update the name in the next commit.


Could you please try to import the output.json file using the Annotation > Import Annotations (json) menu and tell me if this works for you?


One last note: VIA exists in two major versions, 2 and 3. I'm assuming that you are using VIA 2, because VIA 3 seems to work differently and its project/annotation format differs from VIA 2's.

@valentinitnelav
Author

Hi @laclouis5,

Thanks for putting your free time into this. Indeed, I had loaded it through the Project > Load menu. Loading the output.json from the "Annotation" > "Import annotations (from json)" menu works.

However, I noticed that there is a big issue with the coordinates. It looks to me like the code doesn't read the coordinates properly.

Take for example the label file Cheilosia_morio_gbif_1952343590_media_15517.txt:

2 0.294375 0.355535 0.29125 0.399625 0.717285

2 = label id
0.294375 = x_center (not the model confidence)
0.355535 = y_center
0.29125 = width
0.399625 = height
0.717285 = model confidence

Then there is also an issue regarding how VIA reads the image width and height, which might differ from how PIL or cv2 read them - see this issue: https://gitlab.com/vgg/via/-/issues/380

@laclouis5
Owner

laclouis5 commented Nov 8, 2022

Thanks for putting your free time into this. Indeed, I had loaded it through the Project > Load menu. Loading the output.json from the "Annotation" > "Import annotations (from json)" menu works.

Great to hear that; the import seems functional.

However, I noticed that there is a big issue with the coordinates. It looks to me like the code doesn't read the coordinates properly.

Take for example the label file Cheilosia_morio_gbif_1952343590_media_15517.txt:

2 0.294375 0.355535 0.29125 0.399625 0.717285

2 = label id
0.294375 = x_center (not the model confidence)
0.355535 = y_center
0.29125 = width
0.399625 = height
0.717285 = model confidence

It looks like the annotation format you are using is not YOLO, nor one I know about. YOLO predictions are label confidence x_center y_center width height expressed in relative coordinates (between 0 and 1), while your format seems to store the confidence at the end of the line.

I could add support for this format in Globox if it's a widely used format and there is demand for it. In general I want to avoid bloating Globox with very specific or rarely used formats; the fewer, the better.

I advise you to check on your side whether the code generating such annotations is correct. You'll have better compatibility with existing tools and fewer opportunities for bugs and errors if you store your annotations in a well-known format.

As a fallback, you can add support for your own annotation format by implementing functions on top of Globox. This package is designed to be easily extended, for instance:

from globox import BoundingBox, Annotation, AnnotationSet

def read_my_custom_annotation(file) -> Annotation:
  # Read raw data
  data = ...

  # Loop over and create bounding boxes
  boxes = []
  for _ in data:
    box = BoundingBox.create(...)
    boxes.append(box)

  # Return the annotation
  return Annotation(..., boxes=boxes)

def read_annotations(folder) -> AnnotationSet:
  # If annotations are stored in individual txt files:
  return AnnotationSet.from_folder(
    folder=folder,
    extension=".txt",
    parser=read_my_custom_annotation
  )
  # If the `parser` callable takes more than the only required `Path` argument, use `functools.partial()`

You can take inspiration from what I wrote for BoundingBox.from_txt(), Annotation.from_txt() and AnnotationSet.from_txt().
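For the confidence-last txt files discussed above, a concrete version of this sketch could look like the following (not part of the thread's code). It assumes .jpg images living in a separate image folder and uses the plain BoundingBox(label, xmin, ymin, xmax, ymax, confidence) and Annotation(image_id, image_size, boxes) constructors, so adjust names and paths to your setup:

from functools import partial
from pathlib import Path

from PIL import Image  # only used to recover the image size in pixels

from globox import Annotation, AnnotationSet, BoundingBox


def read_conf_last_annotation(file: Path, image_folder: Path) -> Annotation:
    """Parse one YOLOv5/v7 `--save-txt --save-conf` file (confidence last)."""
    image_path = image_folder / f"{file.stem}.jpg"  # assumption: .jpg images
    img_width, img_height = Image.open(image_path).size

    boxes = []
    for line in file.read_text().splitlines():
        if not line.strip():
            continue
        label, xc, yc, w, h, conf = line.split()
        xc, yc, w, h = (float(v) for v in (xc, yc, w, h))

        # Convert relative, center-based XYWH to absolute LTRB pixel coordinates.
        boxes.append(BoundingBox(
            label=label,
            xmin=(xc - w / 2) * img_width,
            ymin=(yc - h / 2) * img_height,
            xmax=(xc + w / 2) * img_width,
            ymax=(yc + h / 2) * img_height,
            confidence=float(conf),
        ))

    return Annotation(
        image_id=image_path.name,
        image_size=(img_width, img_height),
        boxes=boxes,
    )


annotations = AnnotationSet.from_folder(
    folder=Path("test/labels/"),
    extension=".txt",
    parser=partial(read_conf_last_annotation, image_folder=Path("test/img/")),
)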

Then there is also an issue regarding how VIA reads the image width and height, which might differ from how PIL or cv2 read them - see this issue: https://gitlab.com/vgg/via/-/issues/380

I'll take a look at this issue and make sure that Globox reads the correct image size.

Update:
For an unknown reason, the image you linked in the VIA issue appears to be slightly corrupted or at least some EXIF data confuses most image tools. Normally, image tools should take into account the EXIF Orientation flag and report the apparent image size, not the underlying data size.

As you noted, exiftool says the underlying data size is 336⨉448 (W⨉H) and the orientation is 270 CW, so the apparent size should be 448⨉336. However, both Globox and PIL report the raw data size (336⨉448), which is wrong. The issue is not limited to those two: VIA, my computer's image viewer and GitLab also read the image incorrectly (the image should be interpreted in landscape orientation, not portrait).

Opening and then immediately saving the image with PIL solves the orientation issue, indicating that it is probably a file corruption issue:

from PIL import Image

path = "Diptera_Anthomyiidae_Delia_lamelliseta_2075125.jpg"
Image.open(path).save(path)

Given that, Globox (but also PIL and other tools) seems to read the image size correctly. I recommend that you check the code generating the images for potential errors. If you cannot modify that code, re-save the images with PIL or modify the EXIF values to solve the issue.
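If you would rather apply the Orientation flag explicitly than rely on the in-place re-save above, Pillow's ImageOps.exif_transpose can do it (just a side note, not something Globox does for you):

from PIL import Image, ImageOps

path = "Diptera_Anthomyiidae_Delia_lamelliseta_2075125.jpg"

# exif_transpose() rotates/flips the pixel data according to the EXIF
# Orientation tag and drops the tag, so all tools then agree on the size.
img = ImageOps.exif_transpose(Image.open(path))
img.save(path)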

@valentinitnelav
Author

Thanks @laclouis5

It looks like the annotation format you are using is not YOLO, nor one I know about. YOLO predictions are label confidence x_center y_center width height expressed in relative coordinates (between 0 and 1), while your format seems to store the confidence at the end of the line.

The format is produced by detect.py from the YOLOv5 & v7 repositories if one chooses the --save-conf option together with --save-txt. It puts the confidence at the end. In my case it is useful to also see the model confidence when I do a visual quality check of the detection results, as this sometimes gives me useful feedback regarding where the model performs well or not.
So in VIA one would have two attributes: label id and confidence. I think this would be a useful general standard that others could use as well, since it is the standard YOLO output under the circumstances described above.

@laclouis5
Owner

laclouis5 commented Nov 16, 2022

Ok, I'll take a look at the format from the YOLOv7 repo and implement it later.

If VIA support in globox works for you (YOLOv7 aside), I'll close this issue and merge the changes into the main branch.

@valentinitnelav
Author

I think it will work if you read the confidence in the last position of the line. Please see https://github.com/ultralytics/yolov5/blob/master/detect.py#L159; when save_conf=True, conf is written at the end of the line.

@laclouis5
Owner

laclouis5 commented Nov 16, 2022

Alright, I finished implementing the VIA and YOLOv7 annotation formats and published the new package to PyPI. For YOLOv7, just specify conf_last=True in the various from_yolo/to_yolo methods.
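For reference, the end-to-end conversion should then look something like the sketch below. The save method name is an assumption on my side (it was save_vit_json on the development branch and was due to be renamed after the VIT → VIA typo fix), so adjust it to match the installed version:

from pathlib import Path

from globox import AnnotationSet

image_folder = Path("test/img/")

# YOLOv5/v7 predictions written with `--save-txt --save-conf` (confidence last).
preds = AnnotationSet.from_yolo(
    folder=Path("test/labels/"),
    image_folder=image_folder,
    conf_last=True,
)

# Assumed name after the VIT -> VIA rename; it was `save_vit_json` before.
preds.save_via_json(Path("output.json"), image_folder=image_folder)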
