create_covidx no longer compatible with old datasets #127

Closed
Electro1111 opened this issue Feb 7, 2021 · 4 comments

Comments

@Electro1111

In addition to the issue I posted here:
#126 (comment)

create_covidx.ipynb also doesn't seem to work for earlier versions such as COVIDx5: some files listed in train_covidx5.txt and test_covidx5.txt are missing from data/train and data/test after running create_covidx.ipynb.

Specifically,

{'COVID-19(216).png', 'COVID-19(106).png', 'COVID-19(94).png', 'COVID-19(107).png', 'COVID-19(77).png', 'COVID-19(129).png', 'COVID-19(119).png', 'COVID-19(215).png', 'COVID-19(213).png', 'COVID-19(116).png', 'COVID-19(214).png', 'COVID-19(95).png', 'COVID-19(70).png', 'COVID-19(81).png', 'COVID-19(87).png', 'COVID-19(131).png', 'COVID-19(72).png'}

are missing from the data/test folder and

{'COVID-19(69).png', 'COVID-19(99).png', 'COVID-19(76).png', 'COVID-19(108).png', 'COVID-19(71).png', 'COVID-19(85).png', 'COVID-19(83).png', 'COVID-19(74).png', 'COVID-19(130).png', 'COVID-19(92).png', 'COVID-19(82).png', 'COVID-19(84).png', 'COVID-19(132).png', 'COVID-19(80).png', 'COVID-19(75).png', 'COVID-19(93).png', 'COVID-19(120).png', 'COVID-19(118).png', 'COVID-19(89).png', 'COVID-19(91).png', 'COVID-19(109).png', 'COVID-19(79).png', 'COVID-19(114).png', 'COVID-19(90).png', 'COVID-19(98).png', 'COVID-19(78).png', 'COVID-19(121).png', 'COVID-19(133).png', 'COVID-19(115).png'}

are missing from data/train.
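For anyone who wants to reproduce this check, here is a minimal sketch of the comparison. It assumes each line of the split file is whitespace-separated with the image file name in the second field; that layout, the `column` parameter, and both function names are my assumptions for illustration and are not taken from create_covidx.ipynb.

```python
from pathlib import Path


def listed_filenames(split_file, column=1):
    """Collect the image file names listed in a split file.

    Assumes one sample per line, whitespace-separated fields, and the
    file name in field `column` (an assumption; adjust if it differs).
    """
    names = set()
    for line in Path(split_file).read_text().splitlines():
        fields = line.split()
        if len(fields) > column:
            names.add(fields[column])
    return names


def missing_files(split_file, image_dir, column=1):
    """Return the listed names that have no matching file in image_dir."""
    present = {p.name for p in Path(image_dir).iterdir() if p.is_file()}
    return listed_filenames(split_file, column) - present
```

Calling `missing_files("train_covidx5.txt", "data/train")` and `missing_files("test_covidx5.txt", "data/test")` should produce sets like the ones above, provided the assumed column is right. File names that contain spaces would break the simple `split()` and need different parsing.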

@Electro1111
Author

OK, so I figured out the issue, but I'm not sure what to do about it.

It seems that the images are renamed when they are re-saved into the data folder.

In COVIDx7A, and in the data folder generated by the current version of create_covidx.ipynb, these files are named as shown here, missing the '-19' that they had in COVIDx5 and earlier versions of the code:

{'COVID(106).png',
'COVID(107).png',
'COVID(116).png',
'COVID(119).png',
'COVID(129).png',
'COVID(131).png',
'COVID(213).png',
'COVID(214).png',
'COVID(215).png',
'COVID(216).png',
'COVID(70).png',
'COVID(72).png',
'COVID(77).png',
'COVID(81).png',
'COVID(87).png',
'COVID(94).png',
'COVID(95).png'}
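One possible workaround, sketched here under assumptions, is to translate the old names to the new ones when looking files up, rather than touching the files on disk. This assumes the image numbering is unchanged between dataset versions, which nothing in this thread confirms, so spot-check a few images before relying on it. The pattern and the function name `to_new_name` are hypothetical.

```python
import re

# Old naming as listed in the COVIDx5 split files, e.g. 'COVID-19(106).png'.
OLD_NAME = re.compile(r"^COVID-19\((\d+)\)\.png$")


def to_new_name(filename):
    """Map 'COVID-19(N).png' to 'COVID(N).png'; leave other names unchanged."""
    match = OLD_NAME.match(filename)
    if match:
        return f"COVID({match.group(1)}).png"
    return filename
```

For example, `to_new_name("COVID-19(106).png")` gives `"COVID(106).png"`, while any name that does not match the old pattern is returned as-is.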

@baranaldemir

baranaldemir commented Mar 10, 2021

@Electro1111 Did you find a solution for this problem? I have the same issue. I mean other than renaming the files, of course.

@Electro1111
Author

Hello!

The issue seems to be with the versions of the COVID Kaggle dataset, so the easiest solution is to download the old version of that dataset. I believe version 1 of the Kaggle dataset should work for COVIDx5 and earlier, since COVIDx5 was committed on 10/24 and version 2 of the Kaggle dataset says it came out 3 months ago, which would mean roughly December.

It is also possible that changes were made to this repo, particularly to create_covidx.ipynb, to make it work with the new names (I am not sure), so to be safe it might be better to revert to an old version of this repo as well. I think the commits from Nov 3, 2020 would be the ones you want.

To summarize:

Download version 1 of the COVID Kaggle dataset instead of version 4 (which is the current version).

Clone the Nov 3, 2020 version of this repository.

Let me know if this helps!
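A rough sketch of the second step, assuming you already have a local clone and `git` on your PATH. It uses `git rev-list -n 1 --before=<date>` to find the last commit before a cutoff and then checks it out. The helper names and the cutoff value are my own choices for illustration; the exact commit to use is not confirmed anywhere in this thread.

```python
import subprocess


def rev_list_cmd(cutoff, ref="HEAD"):
    """Build the git command that prints the last commit on `ref` before `cutoff`."""
    return ["git", "rev-list", "-n", "1", f"--before={cutoff}", ref]


def checkout_before(repo_dir, cutoff, ref="HEAD"):
    """Check out the last commit before `cutoff` (leaves a detached HEAD)."""
    result = subprocess.run(
        rev_list_cmd(cutoff, ref),
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    sha = result.stdout.strip()
    if not sha:
        raise ValueError(f"no commit found before {cutoff}")
    subprocess.run(["git", "checkout", sha], cwd=repo_dir, check=True)
    return sha
```

Something like `checkout_before("path/to/clone", "2020-11-04 00:00:00")` should land on the last commit made on or before Nov 3, 2020, where `path/to/clone` is a placeholder for your local copy. The Kaggle download is not covered here; I would pick version 1 manually from the dataset page.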

@haydengunraj
Collaborator

Closing this now, and also adding that the current dataset is available in a prepared form on Kaggle.
