Error in tiff images while executing python generate_aitod_imgs.py. #21

Open
Pranav051100 opened this issue Oct 23, 2022 · 21 comments

@Pranav051100

Hello Sir,

I followed the steps as described and got as far as running "python generate_aitod_imgs.py". However, my terminal shows the error in the image below. Could you let me know how I can fix it, and whether the error affects only some of the tiff images or is caused by something else? Thanks.

Screenshoterror

@Chasel-Tsui
Collaborator

An error may have occurred while uncompressing the files: some .tiff files can end up broken when they are extracted from the ".zip" file. Extracting from the provided ".tar" file instead seems to solve this problem.
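
One way to see which files in the image folder are affected is to look at their first four bytes. The sketch below is only an illustration, not part of the AI-TOD tooling: it uses the standard TIFF and BigTIFF signatures, the function name `find_suspect_tiffs` is hypothetical, and it checks only the header, so a file that passes can still be damaged further in.

```python
from pathlib import Path

# Standard signatures: little/big-endian TIFF, then little/big-endian BigTIFF.
TIFF_MAGICS = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def find_suspect_tiffs(folder):
    """Return names of .tif/.tiff files whose first 4 bytes are not a TIFF signature.

    Hypothetical helper: it only inspects the header and does not decode the image.
    """
    suspects = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in {".tif", ".tiff"}:
            continue
        with path.open("rb") as handle:
            header = handle.read(4)
        if header not in TIFF_MAGICS:
            suspects.append(path.name)
    return suspects
```

Calling `find_suspect_tiffs("train_images")` from the folder that contains the extracted xview images would list the files worth re-extracting or inspecting; the folder name is the one mentioned later in this thread.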

@Pranav051100
Author

I tried it the way you suggested and downloaded the tar files from xview for both the training set and the labels geojson file. However, the same error persists.

I have attached another screenshot that shows the tiff images within the train_images folder (there appear to be no issues with their names). However, the error in the previously added screenshot mentions the file name ._100.tiff. I guess that name is what is causing the issue, but when I look in the train_images folder directly, there is no file called ._100.tiff.

Please do let me know if there's a way I could help resolve this; any help you provide will be very valuable.

Screenshot from 2022-10-24 15-54-46
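
Files whose names start with `._` are typically AppleDouble metadata files that macOS writes alongside real files when archiving or copying them. Their names start with a dot, so most file managers hide them by default, which would be consistent with `._100.tiff` appearing in the error but not in the folder view. Whether that is the actual cause here is not confirmed in this thread. A minimal sketch for listing and optionally deleting such files follows; the function names are hypothetical.

```python
from pathlib import Path


def find_appledouble_files(folder):
    """Return every file under `folder` whose name starts with '._', sorted by path."""
    return sorted(p for p in Path(folder).rglob("._*") if p.is_file())


def remove_appledouble_files(folder, dry_run=True):
    """List '._*' files and delete them only when dry_run is False.

    Hypothetical helper; review the returned list before deleting anything.
    """
    found = find_appledouble_files(folder)
    if not dry_run:
        for path in found:
            path.unlink()
    return found
```

Running `remove_appledouble_files("train_images")` first with the default `dry_run=True` shows what would be removed without touching the real images.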

@Chasel-Tsui
Collaborator

Hi,
Is it possible to provide an email address? Maybe I can send you the AI-TOD dataset directly.

@Pranav051100
Author

Yes sure thank you, my email id is pranavhari2000@gmail.com. You could share it on this one. After which I just need to replace the train_images folder by the one you send right?

@Chasel-Tsui
Collaborator

Yes

@Pranav051100
Author

OK sir, I received the OneDrive link, and its contents (i.e. the AI-TOD dataset) seem to be fine. The error that pops up in the terminal comes purely from the tiff images within the xview dataset.

I am actually using the AI-TOD dataset to run your RFLA algorithm, and I followed the instructions on this page to set it up. So can I directly use only the AI-TOD images and leave out the xview images? Or do I need to include the tiff images as well? I am asking because I do not know how to solve the issue with the unrecognized tiff files.

@Chasel-Tsui
Collaborator

The link contains the whole AI-TOD dataset. As mentioned in the AI-TOD paper, the dataset is constructed partly from xview, so the provided data already contains the xview part; it is the same as what this repo generates. You can use the dataset from the provided link directly as training data, and you do not need to include the tiff images.

@Pranav051100
Author

Oh OK sir. So I should put the dataset images and annotations in a folder inside the cloned mmdet-rfla folder before running "python setup.py develop", right?

@Chasel-Tsui
Collaborator

"python setup.py develop" only installs the repo. For organizing the training data, refer to the official [mmdetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/en/2_new_data_model.md) guide: create a folder named data inside the mmdet-rfla folder and put the images and annotations into it. Please see the link for details.

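The data folder described above can be set up with symbolic links instead of copying the files. The sketch below is only an illustration: the sub-folder names `AI-TOD`, `images`, and `annotations` and the function name `link_dataset` are assumptions, and the paths that actually matter are whatever the chosen config file points at.

```python
from pathlib import Path


def link_dataset(repo_root, images_src, anns_src, dataset_name="AI-TOD"):
    """Create <repo_root>/data/<dataset_name>/ and link the image and annotation folders.

    All folder names here are hypothetical; match them to the paths in the config.
    """
    target = Path(repo_root) / "data" / dataset_name
    target.mkdir(parents=True, exist_ok=True)
    sources = {"images": Path(images_src), "annotations": Path(anns_src)}
    for name, source in sources.items():
        link = target / name
        if not link.exists():
            # Link to the absolute path so the link stays valid from any directory.
            link.symlink_to(source.resolve(), target_is_directory=True)
    return target
```

Creating symbolic links may need extra permissions on Windows; copying the folders into `data` works as well.
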
@Pranav051100
Author

OK sir, got it, thanks for the help. Finally, after adding the data this way, I would move forward by executing the train.py in the tools folder, right? (I tried it already, though, and it requires the mmcv version to be between 1.3.2 and 1.4.0, not the latest.) I just wanted to clarify this last part about executing train.py. Thank you.

@Chasel-Tsui
Collaborator

Yes, you may need to install the required version of mmcv and then run train.py with the config file.
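
Before launching training, the installed mmcv version can be compared against the range quoted above (1.3.2 to 1.4.0). This is a rough sketch, not the check mmdet itself performs: the helper names are hypothetical, and treating both ends of the range as inclusive is an assumption.

```python
from itertools import takewhile


def parse_version(text):
    """Turn a string such as '1.3.8' or '1.10.0+cu111' into a tuple like (1, 3, 8)."""
    numbers = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def mmcv_in_range(installed, minimum="1.3.2", maximum="1.4.0"):
    """Return True when `installed` lies in [minimum, maximum] (inclusive ends assumed)."""
    return parse_version(minimum) <= parse_version(installed) <= parse_version(maximum)


# Typical use inside the training environment:
#   import mmcv
#   print(mmcv.__version__, mmcv_in_range(mmcv.__version__))
```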

@Pranav051100
Author

Hello Sir, I was able to get past the previously mentioned issues. I am now using only the AI-TOD images (leaving out the tiff images), but I am facing an unpacking error (screenshot attached below). I am still quite new to reading and debugging code, so I thought I would post it here to get some ideas from you.

Please do let me know how to approach this and any help will be really appreciated. Thank you.

ScreenshotRFLA

@Chasel-Tsui
Collaborator

Hi, maybe you can enter the mmdet-rfla dir and try to execute:

python tools/train.py configs/rfla/aitod_cascade_r50_rfla_rfla_klf_1x.py

Under the mmdetection framework, some errors may occur if the command is not executed from the correct directory.
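
A quick way to confirm that the shell is in the right directory is to check that the relative paths used in the command exist from there. The helper below is a hypothetical sketch for that check.

```python
from pathlib import Path


def missing_paths(repo_root, relative_paths):
    """Return the entries of `relative_paths` that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in relative_paths if not (root / rel).exists()]


# Example, run from the directory where the training command will be issued:
#   print(missing_paths(Path.cwd(), ["tools/train.py", "configs/rfla"]))
# An empty list means both paths resolve from the current directory.
```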

@Pranav110500

Hi Sir, thanks for the update. However, even after executing that command in the mmdet-rfla dir, I have run into a new error, and I am not sure how it came about (I did try deleting the rfla folder, cloning it again, and redoing the setup). Please let me know if I should simply reset everything, uninstall all packages, and retry.

WTFerror2

@Chasel-Tsui
Collaborator

Hi, it seems that there is a mismatch between the installed mmcv and the mmcv version required by mmdet. The mmcv used in my mmdet-rfla is mmcv-full v1.3.8; maybe you can try it.

@Pranav051100
Author

Sir, I am also having trouble installing previous versions of mmcv-full on my system (such as 1.3.8, as you mentioned), so for my openmmlab environment I am thinking of reinstalling specific, matching versions of torch, mmcv, and the other requirements. Could you list the versions you are using for the rfla code (just as you specified mmcv 1.3.8)?

@Chasel-Tsui
Collaborator

All packages in the rfla env are listed below. I used an RTX 3090 GPU with CUDA 11.1:
addict 2.4.0
aitodpycocotools 12.0.3
albumentations 1.2.0
appdirs 1.4.4
asynctest 0.13.0
attrs 21.4.0
blessings 1.7
certifi 2022.6.15
charset-normalizer 2.1.0
cityscapesScripts 2.2.0
click 8.1.3
codecov 2.1.12
colorama 0.4.5
coloredlogs 15.0.1
coverage 6.4.1
cycler 0.10.0
Cython 0.29.30
flake8 4.0.1
gpustat 0.6.0
humanfriendly 10.0
idna 3.3
imagecorruptions 1.1.2
imageio 2.19.3
importlib-metadata 4.2.0
iniconfig 1.1.1
interrogate 1.5.0
isort 4.3.21
joblib 1.1.0
kiwisolver 1.3.1
kwarray 0.6.2
matplotlib 3.4.2
mccabe 0.6.1
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
mmcv-full 1.3.8
mmdet 2.13.0 /home/xuchang/mmdet-rfla
networkx 2.6.3
numpy 1.21.5
nvidia-ml-py3 7.352.0
olefile 0.46
onnx 1.7.0
onnxruntime 1.5.1
opencv-python-headless 4.6.0.66
packaging 21.3
Pillow 9.2.0
pip 21.2.2
pluggy 1.0.0
protobuf 4.21.2
psutil 5.9.1
py 1.11.0
pycocotools 2.0.4
pycodestyle 2.8.0
pyflakes 2.4.0
pyparsing 2.4.7
pyquaternion 0.9.9
pytest 7.1.2
python-dateutil 2.8.1
PyWavelets 1.3.0
PyYAML 5.4.1
qudida 0.0.4
requests 2.28.1
scikit-image 0.18.3
scikit-learn 1.0.2
scipy 1.7.3
setuptools 61.2.0
six 1.16.0
sklearn 0.0
tabulate 0.8.10
terminaltables 3.1.0
threadpoolctl 3.1.0
tifffile 2021.11.2
toml 0.10.2
tomli 2.0.1
torch 1.10.0
torchaudio 0.10.0
torchvision 0.11.0
tqdm 4.64.0
typing 3.7.4.3
typing_extensions 4.3.0
ubelt 1.1.2
urllib3 1.26.10
wheel 0.37.1
xdoctest 1.0.0
yapf 0.31.0
zipp 3.8.0
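
To see how a local environment differs from the list above, the installed versions can be read with the standard `importlib.metadata` module and compared against a few of the pinned entries. This is a sketch under assumptions: the selection in `PINNED` is just a subset copied from the list, and the function names are hypothetical.

```python
from importlib import metadata

# A few entries copied from the package list above; extend as needed.
PINNED = {
    "mmcv-full": "1.3.8",
    "mmdet": "2.13.0",
    "torch": "1.10.0",
    "torchvision": "0.11.0",
    "numpy": "1.21.5",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare_with_pins(pins, lookup=installed_version):
    """Map each package to (wanted, found, matches).

    A local build tag such as '+cu111' is ignored when comparing.
    """
    report = {}
    for name, wanted in pins.items():
        found = lookup(name)
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report
```

Iterating over `compare_with_pins(PINNED).items()` and printing the rows where `matches` is false gives a short list of packages to reinstall.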

@Pranav051100
Author

Pranav051100 commented Nov 30, 2022

Hi @Chasel-Tsui Sir, I was able to reinstall the packages and run the training. It ran through all 12 epochs using "python tools/train.py configs/rfla/aitod_faster_r50_rfla_kld_1x.py", but at that point a RuntimeError turns up (shown below).

training_end

I tried to fix it by adding .cpu() to the inds variables in the bbox_nms.py file, but that isn't working, so I thought of asking you about it.
The same error also occurs when I try to evaluate the model using "python tools/test.py configs/rfla/aitod_faster_r50_rfla_kld_1x.py work_dirs/aitod_faster_r50_rfla_kld_1x/epoch_8.pth --eval mAP".

@Chasel-Tsui
Collaborator

It seems that this error occurs in the inference stage, in the get_bboxes() function. Maybe you need to verify whether the inputs of the multiclass_nms() function called in bbox_head.py (as indicated by "line 371" in the error) are on the same device. If they are not, you need to move them to the same device.

@Pranav051100
Author

Hey sir, I am unable to understand exactly how to move the input to the same device. As mentioned in my previous comment, I was trying to use .cpu() to move the inds (indices) to the CPU, but I am unable to do the same in the function mentioned in your comment above (in bbox_head.py). Can you suggest how exactly I should go about transferring it to either the CPU or the GPU?

@Chasel-Tsui
Collaborator

Hi, to the best of my knowledge, this error occurs when "inds" and the variables it indexes are not on the same device. To tackle this problem, you can debug by printing out the device of each related variable and checking whether they match. Then you can move the problematic variable with "to(device)", where the "device" can be obtained from "inds.device", for example. This check can be done in bbox_nms, or you can check whether the inputs to multiclass_nms() are on the same device as "inds".
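
The debugging steps described above can be sketched as two small helpers. They rely only on the `.device` attribute and the `.to(device)` method that PyTorch tensors provide; the helper names, and the variable names in the usage comment, are hypothetical and would need to match the actual code in bbox_nms.

```python
def describe_devices(**named_tensors):
    """Map each name to the device its tensor lives on, for example 'cpu' or 'cuda:0'."""
    return {name: str(tensor.device) for name, tensor in named_tensors.items()}


def move_to_device_of(reference, *tensors):
    """Return `tensors` moved to the device of `reference`.

    For PyTorch tensors, .to(device) returns the tensor itself if it is
    already on that device, so calling this on matching inputs is harmless.
    """
    return tuple(tensor.to(reference.device) for tensor in tensors)


# Hypothetical use just before the indexing line that raises the error:
#   print(describe_devices(inds=inds, bboxes=bboxes, scores=scores))
#   bboxes, scores = move_to_device_of(inds, bboxes, scores)
```

Printing the devices first shows which variable is the odd one out, so only that one needs to be moved.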


3 participants