Can't use the dataset :( #1

Closed
diveu opened this issue May 20, 2020 · 11 comments

@diveu

diveu commented May 20, 2020

Hi! I'm working on my master thesis and trying to use your ICA labeled data to train my model to detect artifacts.
I keep getting 500 Error:
Downloading individual ICLabel training set CL label files...
Downloading label file 0 of 2...
HTTP Error: 500 https://labeling.ucsd.edu/download/ICLabels_experts.pkl
Downloading label file 1 of 2...
Done.
Loading full dataset...

and can't open the features dataset:
---------------------------------------------------------------------------
IOError Traceback (most recent call last)
in ()
----> 1 icl.load_data()

/Users/ivkitov/univer/diploma/diploma_code/data/ICLabel-Dataset/icldata.py in load_data(self)
954 self.check_for_download('train_features')
955 # topo maps, old psd, dipole, and handcrafted
--> 956 with h5py.File(join(self.datapath, 'features', 'features_0D1D2D.mat'), 'r') as f:
957 print('Loading 0D1D2D features...')
958 features.append(np.asarray(f['features']).T)

/Users/ivkitov/anaconda3/envs/python2Env/lib/python2.7/site-packages/h5py/_hl/files.pyc in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, **kwds)
406 fid = make_fid(name, mode, userblock_size,
407 fapl, fcpl=make_fcpl(track_order=track_order),
--> 408 swmr=swmr)
409
410 if isinstance(libver, tuple):

/Users/ivkitov/anaconda3/envs/python2Env/lib/python2.7/site-packages/h5py/_hl/files.pyc in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
171 if swmr and swmr_support:
172 flags |= h5f.ACC_SWMR_READ
--> 173 fid = h5f.open(name, flags, fapl=fapl)
174 elif mode == 'r+':
175 fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5f.pyx in h5py.h5f.open()

IOError: Unable to open file (truncated file: eof = 6178545152, sblock->base_addr = 512, stored_eof = 14419006177)

Can you help me please?

@ledovsky

I have the same =(

@lucapton
Owner

lucapton commented Mar 3, 2021

I'm sorry for having missed this for so long. The problem is that the automatic file download fails and then it can't find the file. I'll see if I can fix the problem. In the meantime, you can download the files manually.

The block of code containing the file URLs:

        self.base_url_download = 'https://labeling.ucsd.edu/download/'
        self.feature_train_zip_url = self.base_url_download + 'features.zip'
        self.feature_train_urls = [
            self.base_url_download + 'features_0D1D2D.mat',
            self.base_url_download + 'features_PSD_med_var_kurt.mat',
            self.base_url_download + 'features_AutoCorr.mat',
            self.base_url_download + 'features_ICAChanlocs.mat',
            self.base_url_download + 'features_MI.mat',
        ]
        self.label_train_urls = [
            self.base_url_download + 'ICLabels_experts.pkl',
            self.base_url_download + 'ICLabels_onlyluca.pkl',
        ]
        self.feature_test_url = self.base_url_download + 'features_testset_full.mat'
        self.label_test_url = self.base_url_download + 'ICLabels_test.pkl'
        self.db_url = self.base_url_download + 'anonymized_database.sqlite'
        self.cls_url = self.base_url_download + 'other_classifiers.mat'
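For a manual download, the feature-file URLs can be built the same way the block above does. This is a minimal Python 3 sketch, not the repository's own downloader: `feature_urls` and `download_all` are hypothetical helper names, and the `features` destination folder is assumed from the path the loader opens in the traceback.

```python
import os
import shutil
from urllib.request import urlopen

BASE_URL = 'https://labeling.ucsd.edu/download/'
FEATURE_FILES = [
    'features_0D1D2D.mat',
    'features_PSD_med_var_kurt.mat',
    'features_AutoCorr.mat',
    'features_ICAChanlocs.mat',
    'features_MI.mat',
]


def feature_urls(base_url=BASE_URL, names=FEATURE_FILES):
    """Return (url, file name) pairs for the training feature files."""
    return [(base_url + name, name) for name in names]


def download_all(dest_dir='features'):
    """Stream each file to dest_dir without holding it all in memory."""
    os.makedirs(dest_dir, exist_ok=True)
    for url, name in feature_urls():
        target = os.path.join(dest_dir, name)
        with urlopen(url) as response, open(target, 'wb') as out:
            shutil.copyfileobj(response, out)
```

Calling `download_all()` fetches all five files, so it needs a lot of disk space and a stable connection.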

@lucapton
Owner

lucapton commented Mar 7, 2021

These are actually two different problems.

  1. ICLabels_experts.pkl should be ICLabels_expert.pkl
  2. Downloading features_0D1D2D.mat stops before the file is complete.
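Problem (2) matches the earlier h5py error, which reports the file ending at 6178545152 bytes where the header expects 14419006177. One way to spot an incomplete download before opening the file is to compare its size on disk with the expected size. The sketch below assumes the expected size is known from somewhere (for example a `Content-Length` header), and `is_complete` and `range_header` are hypothetical helpers; whether the server honors HTTP `Range` requests is also an assumption.

```python
import os


def is_complete(path, expected_bytes):
    """True if the file exists and holds at least expected_bytes bytes."""
    return os.path.isfile(path) and os.path.getsize(path) >= expected_bytes


def range_header(path):
    """HTTP Range header asking only for the bytes not yet on disk."""
    have = os.path.getsize(path) if os.path.isfile(path) else 0
    return {'Range': 'bytes=%d-' % have}
```

If the server supports range requests, sending `range_header(path)` with the request and appending the response to the partial file would resume the download instead of restarting it.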

@lucapton
Owner

lucapton commented Mar 8, 2021

Item (1) has been fixed, but I'm still having trouble with (2). My attempt at fixing it was to recreate the zip archive as a multi-disk zip where each file is no more than 1 GB. Unfortunately, Python's zipfile library does not support multi-disk zips. I'm looking into using libarchive instead, but I've run out of time for now.
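For illustration only, a different approach from the multi-disk zip or libarchive routes mentioned above is to split a large file into plain byte-range parts and concatenate them again after download. This is not what the repository does; `split_file` and `join_files` are hypothetical names, and the sketch uses only the standard library.

```python
import shutil


def split_file(path, part_bytes=1024 ** 3, buf_bytes=1024 ** 2):
    """Split path into path.000, path.001, ... of at most part_bytes each."""
    parts = []
    with open(path, 'rb') as src:
        while True:
            chunk = src.read(min(buf_bytes, part_bytes))
            if not chunk:
                break  # nothing left to write, so no empty part is created
            part = '%s.%03d' % (path, len(parts))
            with open(part, 'wb') as dst:
                written = 0
                while chunk:
                    dst.write(chunk)
                    written += len(chunk)
                    if written >= part_bytes:
                        break
                    chunk = src.read(min(buf_bytes, part_bytes - written))
            parts.append(part)
    return parts


def join_files(parts, dest, buf_bytes=1024 ** 2):
    """Concatenate the parts, in the given order, into dest."""
    with open(dest, 'wb') as out:
        for part in parts:
            with open(part, 'rb') as src:
                shutil.copyfileobj(src, out, buf_bytes)
```

Each part can then be downloaded and retried independently, at the cost of needing roughly twice the disk space while joining.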

@lucapton
Owner

I believe this is fixed. I confirmed the download now works but I can't actually load the dataset on my personal computer due to lack of RAM. If there are any problems, feel free to reopen this issue.

@datalw

datalw commented Mar 18, 2021

@lucapton Thank you for sharing the valuable resource! I tried to download the data, but had a similar error:

Loading full dataset...
Traceback (most recent call last):
  File "e:\GoogleDriveBB\Program\ICLabel-Train\loading_data.py", line 4, in <module>
    icldata = icl.load_semi_supervised()
  File "e:\GoogleDriveBB\Program\ICLabel-Train\icldata.py", line 1231, in load_semi_supervised
    icl = self.load_data()
  File "e:\GoogleDriveBB\Program\ICLabel-Train\icldata.py", line 969, in load_data
    with h5py.File(join(self.datapath, 'features', 'features_0D1D2D.mat'), 'r') as f:
  File "C:\ProgramData\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 406, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "C:\ProgramData\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'features\features_0D1D2D.mat', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

I also tried to download manually from the link https://labeling.ucsd.edu/download, but it seems that this page does not exist... I tried in different browsers and got the same page as below:
[screenshot]

lucapton added a commit that referenced this issue Mar 19, 2021
@lucapton
Owner

@datalw
You need to call download_trainset_features in order to download the files that it says are missing (I've updated the readme to show that). If you want to download them manually, you have to use the links for the exact files:
https://labeling.ucsd.edu/download/features_0D1D2D.mat
https://labeling.ucsd.edu/download/features_PSD_med_var_kurt.mat
https://labeling.ucsd.edu/download/features_AutoCorr.mat
https://labeling.ucsd.edu/download/features_ICAChanlocs.mat
https://labeling.ucsd.edu/download/features_MI.mat

What I find curious, though, is that if you don't have the files you should have hit an assertion error at line 967 of icldata.py. Any idea why it didn't?
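The kind of guard described here, failing early when feature files are absent, can be sketched as follows. This is not the check in icldata.py; `missing_feature_files` and `require_feature_files` are hypothetical names, and the `features` subfolder is taken from the path in the traceback.

```python
import os

FEATURE_FILES = [
    'features_0D1D2D.mat',
    'features_PSD_med_var_kurt.mat',
    'features_AutoCorr.mat',
    'features_ICAChanlocs.mat',
    'features_MI.mat',
]


def missing_feature_files(datapath, names=FEATURE_FILES):
    """Return the feature files that are absent from <datapath>/features."""
    folder = os.path.join(datapath, 'features')
    return [n for n in names if not os.path.isfile(os.path.join(folder, n))]


def require_feature_files(datapath):
    """Raise a readable error before h5py is asked to open a missing file."""
    missing = missing_feature_files(datapath)
    if missing:
        raise IOError('Missing feature files: ' + ', '.join(missing))
```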

@datalw

datalw commented Mar 19, 2021

@lucapton Thanks a lot! Both ways work now: downloading via the three lines of code and via the links above :)
I checked why it did not work before, and here is what I found:
[screenshot]

As you can see, I printed data_type on line 1708, and the terminal shows that it is a string instead of a list. That's why val takes only a single letter, t, which matches none of the if-elif cases ;)
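The behavior described above is easy to reproduce: iterating over a string yields single characters, so a loop that expects a list of names sees `'t'` first. A minimal sketch of the pitfall and one possible guard follows; `normalize_types` is a hypothetical helper, not the fix that was pushed to the repository.

```python
def normalize_types(data_type):
    """Wrap a bare string in a list so iteration yields whole names."""
    if isinstance(data_type, str):
        return [data_type]
    return list(data_type)


# Iterating over the raw string gives one character at a time.
first_char = [val for val in 'train_features'][0]
print(first_char)  # t

# After normalizing, the loop sees the full name.
for val in normalize_types('train_features'):
    print(val)  # train_features
```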

@lucapton
Owner

Thanks for finding that! Glad it works for you.

@lucapton
Owner

So I looked into it, and I should say that the code works as-is in Python 2.7 but not in 3.x, which it appears you're using. I just want to provide a warning that I wrote this all in 2.7 and can't guarantee anything for Python 3. I realize that was a poor choice on my part, but it's just a fact now. That said, if you run into any more issues, let me know and I'll try to fix them. I'm about to push a change that makes this specific piece of the code work for both.

@datalw

datalw commented Mar 25, 2021

Thanks for the note! There was no big incompatibility when I ran the dataset code. All I had to change were a few print lines.
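The print change mentioned here is the usual Python 2 to 3 adjustment: `print` is a statement in 2.7 and a function in 3.x. A form that runs on both is a call with one pre-formatted string, sketched below. `progress_message` is a hypothetical helper, and the message text is copied from the log at the top of this thread. Adding `from __future__ import print_function` at the top of a module is the other common route.

```python
def progress_message(index, total):
    """Build the whole line first so print receives exactly one argument."""
    return 'Downloading label file %d of %d...' % (index, total)


# Python 2 only:        print 'Done.'
# Works in 2.7 and 3.x: print('Done.')
print(progress_message(0, 2))
```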
