[BUG]: YouTube transcript retrieves an empty file on some channels #1593

Closed
brianorca opened this issue Jun 2, 2024 · 1 comment
Labels
investigating Core team or maintainer will or is currently looking into this issue possible bug Bug was reported but is not confirmed or is unable to be replicated.

Comments

@brianorca

How are you running AnythingLLM?

AnythingLLM desktop app

What happened?

I can get transcripts for some channels, such as https://www.youtube.com/watch?v=O5GY7_aVBtk

But other channels fail, such as https://www.youtube.com/watch?v=V9KJ7nvhRWk
This results in a zero-byte file in the AppData\Roaming\anythingllm-desktop\storage\documents[channelname] folder.

I confirmed that YouTube shows a transcript for that video. The failure seems to affect every video in that channel.

Are there known steps to reproduce?

Select Data Connectors > YouTube Transcript in the document upload dialog.
Use https://www.youtube.com/watch?v=V9KJ7nvhRWk as the video link.
Check whether the resulting file contains any data.
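
The last step can be checked across a whole folder at once. Below is a minimal Python sketch, not part of AnythingLLM, that lists every zero-byte file under a directory. The function name `find_empty_files` is hypothetical, and the directory to scan is whatever storage folder the desktop app uses on your machine.

```python
from pathlib import Path


def find_empty_files(root):
    """Return a sorted list of zero-byte files found anywhere under `root`."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")          # walk the tree recursively
        if path.is_file() and path.stat().st_size == 0
    )
```

Calling `find_empty_files(documents_dir)` with the path to the documents folder returns the `Path` objects of any empty transcript files, so it is easy to see whether the problem is limited to one channel folder.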

@brianorca brianorca added the possible bug Bug was reported but is not confirmed or is unable to be replicated. label Jun 2, 2024
@timothycarambat timothycarambat added the investigating Core team or maintainer will or is currently looking into this issue label Jun 3, 2024
@shatfield4
Collaborator

Have you tried this again recently? I tried both links you provided and was able to get the transcript for both videos. You may have been rate limited after pulling in too many video transcripts too quickly. This can happen because we rely on a hidden, undocumented YouTube API, with unknown rate limits, to pull in the transcripts.
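
If rate limiting is the cause, spacing out retries is one way to work around it. The following Python sketch illustrates exponential backoff under two assumptions: that a rate-limited request shows up as an empty result, and that `fetch` is a hypothetical caller-supplied function returning the transcript text for a URL. It is not how AnythingLLM fetches transcripts.

```python
import time


def fetch_with_backoff(fetch, url, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns non-empty text, doubling the wait each time.

    `fetch` is a hypothetical callable supplied by the caller.
    Returns the text, or None if every attempt came back empty.
    """
    for attempt in range(attempts):
        text = fetch(url)
        if text:
            return text
        if attempt < attempts - 1:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return None
```

The `sleep` parameter exists only so the delay can be replaced in tests. Since the real rate limits are unknown, the defaults here are guesses rather than tuned values.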

@timothycarambat timothycarambat closed this as not planned Won't fix, can't repro, duplicate, stale Jun 4, 2024
3 participants