TEDx Talk with ID=D4TE28-L7FI is not available anymore #7
Comments
Hi @david-gimeno, Thanks for raising this issue. Could you post the full trace of the error? Thanks! |
You are right, I should have shared the full error trace. I have run the script twice and this is what I got both times: `Downloading mtedx_es.tgz from https://www.openslr.org/resources/100/mtedx_es.tgz Downloading es videos from YouTube Segmenting es audio files """ The above exception was the direct cause of the following exception: Traceback (most recent call last):` Another curious aspect is that, although D4TE28-L7FI is unavailable (i.e., it was not downloaded), there are audio segments for this sample. How is this possible? Thanks in advance, David. |
Hi @david-gimeno, Thanks for posting the full error trace! This issue occurred because there is a mismatch between the actual video frames and the downloaded metadata found at
# update your source code
$ git pull
# re-run your script
$ python get_data.py --root-path /home/david/phd --src-lang es  # should resume where it stopped
This issue has nothing to do with the file Also, the audio files Hope that fixes your issue! |
Thank you so much! The whole Spanish portion of the MuAViC database has been processed :) Just one more question: Regarding the transcripts in On the other hand, I would like to make a suggestion. In my experience, instead of saving the video samples as .mp4, using compressed .npz files (via the numpy library) is very efficient in terms of storage and convenient when creating data loaders for training models.
being Anyway, thank you again for your time. Best regards, David. |
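For readers who want to try the .npz suggestion, here is a minimal sketch. It assumes the frames of a clip are already decoded into a uint8 NumPy array; the helper names `save_clip` and `load_clip`, the archive key `data`, and the synthetic clip are hypothetical and not part of the MuAViC scripts. Whether the result is smaller than the original .mp4 depends on the content, so it is worth measuring on real samples.

```python
import os
import tempfile

import numpy as np


def save_clip(path, frames):
    """Store a clip as a compressed .npz archive under the key 'data'."""
    frames = np.asarray(frames, dtype=np.uint8)
    np.savez_compressed(path, data=frames)


def load_clip(path):
    """Load the frame array back from a .npz archive."""
    with np.load(path) as archive:
        return archive["data"]


# Synthetic stand-in for a cropped clip: 25 grayscale frames of 88x88 pixels.
clip = np.zeros((25, 88, 88), dtype=np.uint8)
clip[:, 20:60, 20:60] = 255

out_path = os.path.join(tempfile.mkdtemp(), "sample.npz")
save_clip(out_path, clip)
restored = load_clip(out_path)
```

A data loader could then call `load_clip` per sample and get an array directly, without running a video decoder.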
Hi David, Glad that everything is working now! Regarding your question, the answer is "Yes". Transcripts follow the same order as manifest files for AVSR and AVST. And thank you for your suggestion, my team and I will definitely take it into consideration. I'm gonna close this issue for now if you don't mind. Thanks! |
I was downloading the MuAViC database for the Spanish language when an error message suddenly appeared while segmenting videos. It seems that the video with ID=D4TE28-L7FI is not available anymore. Do you have a backup of the database for these cases? In addition, the script was interrupted, which I think should not happen.
Best regards,
David.
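As an illustration of the request that one failing video should not interrupt the whole run, here is a minimal sketch of a loop that records failures and keeps going. It is not how `get_data.py` works; `download_all`, `fake_download`, and the IDs `abc123` and `xyz789` are hypothetical placeholders.

```python
def download_all(video_ids, download_one):
    """Try every ID; collect failures instead of aborting on the first one."""
    done, failed = [], []
    for video_id in video_ids:
        try:
            download_one(video_id)
        except Exception as err:  # broad on purpose: any failure is recorded
            failed.append((video_id, str(err)))
        else:
            done.append(video_id)
    return done, failed


def fake_download(video_id):
    """Hypothetical stand-in for a real downloader; fails for one ID."""
    if video_id == "D4TE28-L7FI":
        raise RuntimeError("video unavailable")


done, failed = download_all(["abc123", "D4TE28-L7FI", "xyz789"], fake_download)
print(done)
print(failed)
```

The list of failed IDs can be reported at the end of the run, so the remaining samples are still processed.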