Skip to content


Get YouTube data from API, not page scrape? #376

morbus opened this Issue · 4 comments

2 participants


G'day. I've been working on some local scripts to archive YouTube videos, and part of that is adding in metadata to the downloaded mp4 files from youtube-dl. After noticing that the description was coming out truncated (caused by me missing lxml), I'm now noticing that youtube-dl only knows the day (YYYY-MM-DD) and not the time the video was uploaded. This is through no fault of the script, however, as there is actually no complete timestamp on the video page itself.

The complete timestamp DOES seem to appear in the API though:

I'm not a Python coder, but I could probably figure out to get this support in, but the real question is: would you be OK with youtube-dl making an additional request per video URL? (If not, that's fine too, as I'd just write a custom API script to fetch the data and ignore the --write-info-json option.)


By using the YouTube API you are agreeing to its terms and conditions. See the "Terms" link from the top bar at this page:

Which, as of the time I'm writing this takes you to:

I'm worried about point number 5, "Caching". The YouTube API must be avoided if possible, IMHO. We should follow the steps a web browser does to avoid legal problems as much as possible.

Edit: also, in the prohibitions section, points 9, 11, etc.


@rg3: Fair enough and certainly acceptable. But, given that we're faking the user-agent for direct access, and that the data URL above doesn't require an API key or a specific client identification label, it'd be relatively hard for them to "tell" that youtube-dl is abusing the API (anymore than they can reliably tell that youtube-dl is downloading videos).

Still, I can understand the reticence.


The thing about the YouTube API terms and conditions is that they do not apply to the user of the application, but to the creator of the application.

BTW, I know there are a couple of InfoExtractors that use it, but I only recently found out about the terms and conditions for the API. :(


I'll close this. I ended up creating a cheap metadata-script which is innocent enough to include herein:

for FILEPATH in *
  FILENAME=$(basename "$FILEPATH")

  if [ $YOUTUBE_ID != "*" ]
    DATA=`curl -s$YOUTUBE_ID?v=2`
    PUBLISHED=`echo $DATA | php -r 'print simplexml_load_file("php://stdin")->published;' | sed 's/\..*Z/Z/'`
    AUTHOR=`echo $DATA | php -r 'print simplexml_load_file("php://stdin")->author->name;'`
    TITLE=`echo $DATA | php -r '$x = simplexml_load_file("php://stdin"); $ns = $x->getNameSpaces(true); $m = $x->children($ns["media"]); print $m->group->title;'`
    DESCRIPTION=`echo $DATA | php -r '$x = simplexml_load_file("php://stdin"); $ns = $x->getNameSpaces(true); $m = $x->children($ns["media"]); print $m->group->description;'`

    if [ "$1" ]

    AtomicParsley "$FILEPATH" --overWrite --podcastFlag true --artist "$AUTHOR" --title "$TITLE" --album "$ALBUM" --year "$PUBLISHED" --comment "$COMMENT" --description "$DESCRIPTION"
@morbus morbus closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.