Questions about loading data #5

Closed
ZeqiMao opened this issue Aug 24, 2020 · 3 comments

ZeqiMao commented Aug 24, 2020

Hi!
Thank you for this comprehensive dataset! This might be a really stupid question, but I am having trouble getting the audio files. I use Google Colab. I followed the tutorial and loaded the first two .gz files using the following code as a demo:

path = '/ME/My Drive/Colab Notebooks/DALI-master/'

dali_data_path = path + 'annot_tismir/'

dali_data = dali_code.get_the_DALI_dataset(dali_data_path, skip=[], keep=["0a0a723686924d228daef2a2f692d437","0a1a15671536498f8a856da781c017d7"])

dali_info = dali_code.get_info(dali_data_path + '0a1a15671536498f8a856da781c017d7.gz')

dali_info

(Got output <DALI.Annotations.Annotations at 0x7f9925c4c160>)

path_audio = '/ME/My Drive/Colab Notebooks/DALI-master/audio'

errors = dali_code.get_audio(dali_info, path_audio, skip=[], keep=[])

Here I got the error message "TypeError: 'Annotations' object is not subscriptable".
Also, if I try print(dali_info[0]), the same error message appears.
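
For context, this is the generic error Python raises when square-bracket indexing is applied to an object whose class defines no `__getitem__`. The sketch below reproduces it with a hypothetical stand-in class (`FakeAnnotations` and `try_index` are illustrative names, not part of the DALI package), so it shows the mechanism only, not the library's internals.

```python
class FakeAnnotations:
    """Hypothetical stand-in; NOT the real DALI.Annotations.Annotations class."""

    def __init__(self):
        # The object carries data in attributes, but defines no __getitem__.
        self.info = {"id": "example-id"}


def try_index(obj):
    """Return the TypeError message raised by obj[0], or None if indexing works."""
    try:
        obj[0]
    except TypeError as exc:
        return str(exc)
    return None


message = try_index(FakeAnnotations())
print(message)

# A plain list supports indexing, so no error is reported for it.
print(try_index([["DALI_ID", "NAME"]]))
```

An object like this is read through its attributes (for example `obj.info`), whereas code that does `obj[0]` expects a list-like value.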

Could you please tell me if there's anything wrong with my data loading?

Besides, I noticed that a lot of the YouTube links show "working: False". I'm not sure whether this affects data loading. Should I submit a request for an updated version of the data?

entry = dali_data['0a1a15671536498f8a856da781c017d7']
entry.info

The output looks like:
{'artist': 'Janis Ian',
'audio': {'path': 'DALI_v2.0/audio/0a1a15671536498f8a856da781c017d7.mp3',
'url': 'iepedfdjA80',
'working': False},...
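
To get a sense of how many entries carry this flag, one could split the ids by `info['audio']['working']`. This is a minimal sketch that assumes `dali_data` behaves like a dict of entries with an `.info` dictionary shaped as in the output above; `split_by_working` is a hypothetical helper, and the demo entries are made-up stand-ins rather than real DALI records.

```python
from types import SimpleNamespace


def split_by_working(dali_data):
    """Split entry ids into (working, not_working) using info['audio']['working']."""
    working, not_working = [], []
    for uid, entry in dali_data.items():
        flag = entry.info["audio"]["working"]
        (working if flag else not_working).append(uid)
    return working, not_working


# Made-up stand-in entries that mimic only the fields used above.
demo = {
    "id_a": SimpleNamespace(info={"audio": {"url": "url_a", "working": True}}),
    "id_b": SimpleNamespace(info={"audio": {"url": "url_b", "working": False}}),
}

ok_ids, bad_ids = split_by_working(demo)
print(len(ok_ids), "flagged working;", len(bad_ids), "flagged not working")
```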

Thank you in advance for your help.

@gabolsgabs
Owner

Hello Program-Kitty,

Thank you for your interest in the DALI dataset. I'm sorry if the doc wasn't clear enough. I'm preparing a PR that hopefully will clarify some vague explanations.

The dali_code.get_info function is meant to be used with a dali_info.gz file, which I forgot to add to the Zenodo repo for the second version. However, you can easily create it yourself as follows:

dali_info = [['DALI_ID', 'NAME', 'YOUTUBE', 'WORKING']]
for i in dali_data:
    uid = dali_data[i].info['id']
    name = "-".join([dali_data[i].info['artist'].replace(" ", "_"), dali_data[i].info['title'].replace(" ", "_")])
    youtube = dali_data[i].info['audio']['url']
    working = ''
    dali_info.append([uid, name, youtube, working])

And then just call the function as normal:

errors = dali_code.get_audio(dali_info, path_audio, skip=[], keep=[])

This should work. Please let me know if you have any trouble or further issues.
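
The same table-building loop can be tried on stand-in data before running it on the full dataset. The sketch below wraps it in a hypothetical helper, `build_dali_info`, and uses a made-up entry that mimics only the `info` fields the loop reads; it assumes `dali_data` is dict-like. The result is a plain list of rows, so indexing such as `table[0]` works.

```python
from types import SimpleNamespace

HEADER = ["DALI_ID", "NAME", "YOUTUBE", "WORKING"]


def build_dali_info(dali_data):
    """Build a header row plus one [id, name, youtube, working] row per entry."""
    rows = [list(HEADER)]
    for entry in dali_data.values():
        info = entry.info
        # NAME is "Artist-Title" with spaces replaced by underscores.
        name = "-".join([
            info["artist"].replace(" ", "_"),
            info["title"].replace(" ", "_"),
        ])
        # WORKING is left empty, as in the loop above.
        rows.append([info["id"], name, info["audio"]["url"], ""])
    return rows


# Made-up stand-in entry; real entries come from get_the_DALI_dataset.
demo = {
    "id_a": SimpleNamespace(info={
        "id": "id_a",
        "artist": "Some Artist",
        "title": "Some Title",
        "audio": {"url": "url_a", "working": False},
    }),
}

table = build_dali_info(demo)
print(table[0])
print(table[1])
```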

Best regards,
Gabriel

@gabolsgabs gabolsgabs self-assigned this Aug 31, 2020

ZeqiMao commented Sep 4, 2020

Hi gabolsgabs!
Thank you for your response. I solved the issue by following your instructions. I loaded dali_data (containing 7756 entries) following the instructions in the tutorial. But when I tried to load the ground-truth file, I came across this issue:

path = '/ME/My Drive/Colab Notebooks/DALI-master/'
gt_file = path + 'gt_v1.0_22_11_18.gz'
gt = dali_code.utilities.read_gzip(gt_file)
dali_gt = dali_code.get_the_DALI_dataset(dali_data_path, gt_file, keep=gt.keys())

KeyError Traceback (most recent call last)
in ()
----> 1 dali_data = dali_code.update_with_ground_truth(dali_data, gt_file)

/usr/local/lib/python3.6/dist-packages/DALI/main.py in update_with_ground_truth(dali, gt_file)
     64     if len(gt) > 0:
     65         for i in gt:
---> 66             entry = dali[i]
     67             change_time(entry, gt[i]['offset'], gt[i]['fr'])
     68             entry.info['ground-truth'] = True

KeyError: '557037a547e84ddba8148c137eee0eb5'

I checked our dataset and didn't find this key in dali_data.
Could you please check if this is the case on your end? I wonder if there's any issue with our ground truth file...

@gabolsgabs
Owner

The ground-truth file does not work with version 2. The ids are different and the alignment may also be different. If you plan to use the ground truth (which only concerns the correct offset and frame rate parameters), please use version 1.
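
A quick way to see this kind of mismatch before it raises a KeyError is to compare the two sets of ids. This is a minimal sketch assuming both the loaded dataset and the ground-truth contents are dict-like and keyed by id; `missing_ground_truth_ids` is a hypothetical helper and the demo values are made up.

```python
def missing_ground_truth_ids(dali_data, gt):
    """Return the sorted ids present in the ground truth but absent from the dataset."""
    return sorted(set(gt) - set(dali_data))


# Made-up stand-ins: only the keys matter for this check.
demo_data = {"id_a": object(), "id_b": object()}
demo_gt = {
    "id_b": {"offset": 0.0, "fr": 1.0},
    "id_c": {"offset": 0.1, "fr": 1.0},
}

missing = missing_ground_truth_ids(demo_data, demo_gt)
print(missing)  # ['id_c']
```

A non-empty result suggests that the ground-truth file and the annotations come from different dataset versions.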
