Don't grab all pages if download cap is enabled #48
Comments
While this would be nice, I don't think it's possible. If it is possible, it would require a lot more hacking into
Yep, looks like I need to submit a PR to youtube-dl.
Dateafter shouldn’t download pages outside of the range.
While that upstream issue, if added, might help the logs, it likely won't remove the requirement that TubeSync index all YouTube video IDs in a playlist each time it runs an index, as YouTube is not guaranteed to return them in chronological order when it gets crawled. The initial requirement with TubeSync is to "find all new video IDs", which still means indexing entire channels and playlists. This flag, if implemented upstream in
I hate to suggest a major refactor, but I wanted to give some ideas that might help with this problem. Using youtube-dl to generate an index of all the videos and store them in a database to slowly download them does make sense, but when looking for new videos, TubeSync seems to be configured to re-download the entire index of videos to look for updates. A more efficient way to look for new videos would be to use the integrated YouTube RSS feeds. They're always ordered by "published" date: https://www.youtube.com/feeds/videos.xml?channel_id=someidhere Adding an RSS/XML parser to the system might be a slight hassle, but it would significantly reduce the risk of YouTube getting mad at excessive page indexing.
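A rough sketch of what the feed-based check suggested above could look like, using only the Python standard library. It assumes the feed is Atom with one `yt:videoId` element per entry under the namespaces shown; the function name and the sample feed are made up for illustration, and fetching the URL is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed to be used by the YouTube channel feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}

# Hypothetical, trimmed-down feed body used only for illustration.
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:yt="http://www.youtube.com/xml/schemas/2015">
  <entry>
    <yt:videoId>newvideo002</yt:videoId>
    <published>2021-02-02T10:00:00+00:00</published>
  </entry>
  <entry>
    <yt:videoId>oldvideo001</yt:videoId>
    <published>2021-01-01T10:00:00+00:00</published>
  </entry>
</feed>
"""


def parse_feed_videos(feed_xml):
    """Return (video_id, published) pairs in the order the feed lists them."""
    root = ET.fromstring(feed_xml.strip())
    videos = []
    for entry in root.findall("atom:entry", NS):
        video_id = entry.findtext("yt:videoId", default="", namespaces=NS)
        published = entry.findtext("atom:published", default="", namespaces=NS)
        if video_id:  # skip entries without a usable ID
            videos.append((video_id, published))
    return videos
```

Calling `parse_feed_videos(SAMPLE_FEED)` yields the two sample IDs, which could then be compared against the IDs already stored in the database.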
Cheers for the suggestion! I had noticed the RSS feeds, but compared to the current Additionally, unless I'm blind, I can't see any way to get more than the most recent 14 or so videos via RSS (there's no Also, I assume that if a channel added more than 14 videos between indexing runs it would have to fall back to the current way as well, which I guess is pretty unlikely, but no doubt someone will find a channel that does this and trigger an edge case of missing content. Using the feeds could shave off a few requests per day, but likely not enough to solve issues for anyone experiencing 429 rate limiting, for which I'll probably just have to add a 60 second delay between metadata requests to pad requests out for newly added channels or similar if people keep experiencing problems. I'll add it to the future roadmap as a possible feature, as using the feeds would be a nicer way to keep channels updated with new content. It won't replace anything too significant internally and it's also not that much work really: just use a different indexer once a source has been indexed at least once. It wouldn't require any massive internal reworking.
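The fallback described above could be decided roughly like this. This is only a sketch under stated assumptions: the function name is hypothetical, and it treats "none of the feed's IDs are already known" as the signal that more videos may have been added than the feed can show, so a full index is needed.

```python
from typing import Iterable, List, Optional


def new_ids_from_feed(feed_ids: Iterable[str],
                      known_ids: Iterable[str]) -> Optional[List[str]]:
    """Return unseen IDs from the feed, or None if a full index is needed.

    Assumption: if the feed overlaps with at least one known ID, nothing
    was missed between the last index and now. With no overlap (or an
    empty feed, or a source never indexed before) the caller should fall
    back to the full indexer.
    """
    feed_ids = list(feed_ids)
    known = set(known_ids)
    if not feed_ids or not any(v in known for v in feed_ids):
        return None  # no overlap: the feed window may not cover the gap
    return [v for v in feed_ids if v not in known]
```

A `None` result means "use the existing indexer"; an empty list means the channel has nothing new.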
On initial index, if "Download cap" is set, then pages should only be fetched until the cap is reached instead of fetching every single page of videos.
I'd like to avoid seeing this over and over in my logs if possible.
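To make the request concrete, here is a minimal sketch of the behaviour being asked for. It assumes the cap is an age cutoff in seconds and that pages arrive newest first, which, as noted in the comments, YouTube does not guarantee. `fetch_page` is a hypothetical callable that returns a list of `(video_id, published)` tuples for a page number and is not a youtube-dl or TubeSync API.

```python
from datetime import datetime, timedelta, timezone


def index_within_cap(fetch_page, cap_seconds, now=None):
    """Fetch pages until a whole page falls outside the cap.

    fetch_page(page_number) is a hypothetical helper returning a list of
    (video_id, published_datetime) tuples, or an empty list when there
    are no more pages.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=cap_seconds)
    kept = []
    page = 1
    while True:
        videos = fetch_page(page)
        if not videos:
            break  # ran out of pages
        in_range = [v for v in videos if v[1] >= cutoff]
        kept.extend(in_range)
        if not in_range:
            break  # every video on this page is older than the cap
        page += 1
    return kept
```

Stopping only when an entire page is out of range tolerates some disorder within a page, but it would still miss in-range videos that appear on a later page.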