Converting from YOLO annotation files to a format readable by the VGG Image Annotator (VIA) #7
Comments
Hi @valentinitnelav, you got this error because you passed a str where a pathlib.Path is expected. Passing Path objects avoids it:
import globox
from pathlib import Path
# Get the current directory based on where this script file is located
local_dir = Path(__file__).absolute().parent
yolo_preds = globox.AnnotationSet.from_yolo(
    folder = local_dir / "test/labels/", # Path to YOLO prediction txt files
    image_folder = local_dir / "test/img/" # Path to images
)
The VIA annotation file format is currently not supported. I'll take a look and implement it if it is not too complex.
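The error message comes from calling a Path method on a plain string. A minimal sketch of the failure and the fix, using only the standard library; the `as_path` helper is hypothetical and not part of Globox:

```python
from pathlib import Path


def as_path(value):
    """Hypothetical helper: accept a str or a Path and always return a Path."""
    return value if isinstance(value, Path) else Path(value)


folder = "test/labels/"

# A plain str has no `is_dir` method, hence the AttributeError.
try:
    folder.is_dir()
except AttributeError as error:
    message = str(error)

# Wrapping the string in Path gives an object that supports `is_dir()`.
labels_dir = as_path(folder)
is_directory = labels_dir.is_dir()  # True or False, never an AttributeError
```

Converting at the call site like this keeps the rest of the script unchanged, whichever type the paths started as.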
Thanks @laclouis5, if you want to invest some time in the VIA file format, these links might be of help:
I have R code that does this for my own pipelines and was thinking of converting it to Python, but I simply do not have the time.
Thanks for the documentation, I invested some time in this issue. The CSV format looks hard to implement correctly because of the unusual formatting used by the VIT annotation tool. Moreover, CSV is generally a poor serialisation format, so I will not implement VIT CSV parsing and conversion. The JSON format was much simpler to implement and I think it works properly. One oddity of this format is that it requires reading the image files to get their sizes in bytes. Could you check out https://github.com/laclouis5/globox/tree/vit-ann-csv-support and try it with your data, please? The command line is:
globox convert -f yolo -F vit-json --img_folder <path/to/img/folder> <yolo/folder/> <output.json>
If you prefer using the library:
from globox import AnnotationSet
from pathlib import Path
image_folder = Path("yolo/images/")
yolo_preds = AnnotationSet.from_yolo(
    folder = Path("yolo/predictions/"),
    image_folder = image_folder,
)
yolo_preds.save_vit_json(Path("output.json"), image_folder = image_folder)
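For orientation, here is a rough sketch of what one image entry in a VIA 2 annotation JSON file is assumed to look like, and where the file size in bytes comes in. The key layout (file name concatenated with file size) and the field names are assumptions that should be checked against a file exported by VIA itself. `via_entry` and the `label` attribute are hypothetical and are not Globox's implementation.

```python
import json


def via_entry(filename, size_in_bytes, boxes):
    """Hypothetical helper building one VIA 2 style image entry.

    `boxes` is a list of (label, x, y, width, height) tuples in absolute
    pixels, where (x, y) is the top-left corner of the rectangle.
    """
    regions = [
        {
            "shape_attributes": {
                "name": "rect",
                "x": x,
                "y": y,
                "width": width,
                "height": height,
            },
            # The attribute name "label" is an assumption; VIA users define their own.
            "region_attributes": {"label": label},
        }
        for label, x, y, width, height in boxes
    ]
    # Assumed key layout: file name immediately followed by the size in bytes.
    key = f"{filename}{size_in_bytes}"
    entry = {
        "filename": filename,
        "size": size_in_bytes,
        "regions": regions,
        "file_attributes": {},
    }
    return key, entry


key, entry = via_entry("img_1.jpg", 2048, [("insect", 10, 20, 30, 40)])
annotations = {key: entry}
text = json.dumps(annotations, indent=2)
```

In a real script the size would come from the file itself, for example `Path(image_path).stat().st_size`, which is why the image folder has to be available during conversion.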
VIA says that it cannot import the created output.json file because it is corrupted. If you have a look at the via_project_4Nov2022_11h43m.json file that I uploaded above, it needs some extra structure elements that are missing from output.json. The structures of the two files differ. I admit that VIA's JSON structure can be confusing, but one of the strengths of VIA is its flexibility regarding the attribute table: a user can define their own attribute table, and that is why I use VIA. It is also just a simple HTML file that runs in any browser. Also, I get negative coordinates for the bounding boxes (regions), and that should not happen. I also think the coordinates for VIA should be rounded to the nearest integer (they are integer pixel values). At this link you find a zip file with:
If I understood correctly, the acronym for the VGG Image Annotator is VIA, not VIT. This is my Python code for testing on a sample of 4 images (to which the uploaded via_project_4Nov2022_11h43m.json file corresponds):
# Try the new vit-ann-csv-support branch from globox;
# In a terminal (I use Linux):
'''
pip uninstall globox
pip install git+https://github.com/laclouis5/globox.git@vit-ann-csv-support
'''
import globox # https://github.com/laclouis5/globox
from pathlib import Path
local_dir = Path(__file__).absolute().parent # current directory based on where this script file is located
txt_folder = Path(local_dir, 'test', 'labels') # path to folder with the YOLO prediction txt files
image_folder = Path(local_dir, 'test', 'img') # path to folder with images
yolo_preds = globox.AnnotationSet.from_yolo(
    folder = txt_folder,
    image_folder = image_folder)
# Save as JSON in local_dir/test
yolo_preds.save_vit_json(
    path = Path(local_dir, 'test', 'output.json'),
    image_folder = image_folder)
The JSON file you linked corresponds to the VIA project format. In addition to annotation data, this file format also saves VIA project settings. Meanwhile, VIA project file:
{
    "_via_settings": { },
    "_via_img_metadata": { },
    "_via_attributes": { },
    "_via_data_format_version": "...",
    "_via_image_id_list": [ ]
} The
Globox supports negative coordinates for a few reasons:
In general this should not cause any issue, either for annotation tools or for CNN training. You can always clip the coordinates if this is a problem for you. Also, Globox represents coordinates as floats rather than integers because rounding could lose some precision during annotation conversion and evaluation. Float coordinates are not an issue for VIA. If you really need integer coordinates for some reason, you can always round them.
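Clipping and rounding can be done in a few lines on plain numbers. The sketch below assumes boxes are given as absolute-pixel `(xmin, ymin, xmax, ymax)` values; `clip_and_round` is a hypothetical helper and does not use any Globox API.

```python
def clip_and_round(xmin, ymin, xmax, ymax, image_width, image_height):
    """Clip a box to the image bounds, then round to integer pixels."""
    xmin = min(max(xmin, 0.0), image_width)
    ymin = min(max(ymin, 0.0), image_height)
    xmax = min(max(xmax, 0.0), image_width)
    ymax = min(max(ymax, 0.0), image_height)
    return tuple(round(value) for value in (xmin, ymin, xmax, ymax))


# A box that spills over the left, right and bottom edges of a 100x200 image.
box = clip_and_round(-3.6, 10.4, 120.5, 250.2, image_width=100, image_height=200)
```

Rounding only at export time, as here, keeps full float precision everywhere else in the pipeline.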
Thanks for reporting this typo, I'll update the name in the next commit. Could you please try to import the
One last note: VIA exists in two major versions, 2 and 3. I'm assuming that you are using VIA 2, because VIA 3 seems to work differently and its Project/Annotations format is different from that of VIA 2.
Hi @laclouis5, thanks for putting your free time into this. Indeed, I loaded the file through the menu Project > Load. It works to load the
However, I noticed that there is a big issue with the coordinates. It looks to me as if the code doesn't read the coordinates properly. Take for example the label file
2 = label id
Then there is also an issue regarding how VIA reads the image width and height, which might differ from how PIL or cv2 read them. See this issue: https://gitlab.com/vgg/via/-/issues/380
Great to hear that, import seems functional.
It looks like the annotation format you are using is neither YOLO nor one I know about. YOLO predictions are
I could add support for this format in Globox if it is a widely used format and there is demand for it. In general I want to avoid bloating Globox with very specific or barely used formats: the fewer, the better. I advise you to check on your side whether the code generating these annotations is correct. You'll have better compatibility with existing tools and fewer opportunities for bugs and errors if you store your annotations in a well-known format. As a fallback, you can add support for your own annotation format by implementing functions on top of Globox. This package is designed to be easily extended, for instance:
from globox import BoundingBox, Annotation, AnnotationSet
def read_my_custom_annotation(file) -> Annotation:
    # Read raw data
    data = ...
    # Loop over the data and create bounding boxes
    boxes = []
    for _ in data:
        box = BoundingBox.create(...)
        boxes.append(box)
    # Return the annotation
    return Annotation(..., boxes=boxes)
def read_annotations(folder) -> AnnotationSet:
    # If annotations are stored in individual txt files:
    return AnnotationSet.from_folder(
        folder=folder,
        extension=".txt",
        parser=read_my_custom_annotation
    )
    # If the `parser` callable takes more than the only required `Path` argument, use `functools.partial()`
You can take inspiration from what I wrote for
I'll take a look at this issue and make sure that Globox reads the correct image size. Update: as you noted, opening then immediately saving the image with PIL solves the orientation issue, indicating that it is probably a file corruption issue:
from PIL import Image
path = "Diptera_Anthomyiidae_Delia_lamelliseta_2075125.jpg"
Image.open(path).save(path)
Given that, Globox (as well as PIL and other tools) seems to read the image size correctly. I recommend that you check the code generating the images for potential errors. If you cannot modify that code, save the images again with PIL or modify the EXIF values to solve the issue.
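One possible explanation, not confirmed in this thread, is the EXIF orientation tag: a viewer that honours the tag shows a rotated image, while a reader that ignores it reports the stored pixel dimensions. The pure-Python sketch below only illustrates how the two sizes can differ. `displayed_size` is a hypothetical helper, and the orientation value has to be read from the file separately.

```python
# EXIF orientation values 5 to 8 involve a 90 degree rotation, so a viewer
# that honours the tag shows the image with width and height swapped.
ROTATED_ORIENTATIONS = {5, 6, 7, 8}


def displayed_size(stored_width, stored_height, exif_orientation=1):
    """Return the (width, height) seen by a viewer that applies the EXIF orientation."""
    if exif_orientation in ROTATED_ORIENTATIONS:
        return stored_height, stored_width
    return stored_width, stored_height


# Stored as 4000x3000 with orientation 6: shown as 3000x4000 by such a viewer.
rotated = displayed_size(4000, 3000, exif_orientation=6)
unrotated = displayed_size(4000, 3000)
```

With Pillow installed, the tag can be read with `Image.open(path).getexif().get(0x0112)`, and `ImageOps.exif_transpose(image)` returns a copy with the rotation applied. Saving with `Image.save` without passing EXIF data drops the tag, which would be consistent with the re-save workaround above.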
Thanks @laclouis5
The format was given by detect.py from the YOLOv5 & v7 repositories if one chooses the
Ok, I'll take a look at the format from the YOLOv7 repo and implement it later. If everything is right about VIA support in globox for you (YOLOv7 aside), I'll close this issue and merge the changes into the main branch.
I think it will work if you read the confidence in the last position of the line. Please see https://github.com/ultralytics/yolov5/blob/master/detect.py#L159; when
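A minimal sketch of reading such a line, assuming the layout `class x_center y_center width height [confidence]` with coordinates normalized to the image size and the confidence, when present, in the last position. `parse_yolo_line` is a hypothetical function for illustration and is not the Globox parser.

```python
def parse_yolo_line(line, image_width, image_height):
    """Parse one prediction line into (label, absolute box, confidence).

    Assumed layout: `class x_center y_center width height [confidence]`,
    with coordinates normalized to [0, 1] and the confidence last.
    """
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}: {line!r}")

    label = fields[0]
    x_center, y_center, width, height = (float(value) for value in fields[1:5])
    confidence = float(fields[5]) if len(fields) == 6 else None

    # Convert normalized center/size to absolute corner coordinates.
    xmin = (x_center - width / 2) * image_width
    ymin = (y_center - height / 2) * image_height
    xmax = (x_center + width / 2) * image_width
    ymax = (y_center + height / 2) * image_height
    return label, (xmin, ymin, xmax, ymax), confidence


with_conf = parse_yolo_line("2 0.5 0.5 0.5 0.25 0.9", image_width=200, image_height=100)
without_conf = parse_yolo_line("2 0.5 0.5 0.5 0.25", image_width=200, image_height=100)
```

Reading the confidence from the wrong position shifts every coordinate by one field, which would produce the kind of misplaced boxes reported earlier in the thread.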
Alright, I finished the implementation of the VIT and YOLOv7 annotation formats and published the new package to PyPI. For YOLOv7, just specify
I have YOLOv7 predictions like this:
I want to read the YOLO txt files and the image width & height and convert that info to a format readable by the VGG Image Annotator (VIA).
There are 2 options: CSV or JSON file formats.
Here are two files as examples:
via_project_4Nov2022_11h34m_csv.csv
via_project_4Nov2022_11h43m.json
Would it be possible to use the functionality of your package to easily convert from YOLO txt annotation files to something that I can easily import into the VGG Image Annotator (VIA)?
Meanwhile, I went ahead and tried globox.AnnotationSet.from_yolo and got this error message:
AttributeError: 'str' object has no attribute 'is_dir'