Get YouTube data from API, not page scrape? #376

Closed
morbus opened this Issue · 4 comments

@morbus

G'day. I've been working on some local scripts to archive YouTube videos, and part of that is adding in metadata to the downloaded mp4 files from youtube-dl. After noticing that the description was coming out truncated (caused by me missing lxml), I'm now noticing that youtube-dl only knows the day (YYYY-MM-DD) and not the time the video was uploaded. This is through no fault of the script, however, as there is actually no complete timestamp on the video page itself.

The complete timestamp DOES seem to appear in the API though:
https://gdata.youtube.com/feeds/api/videos/WN5IzJ99Qys?v=2
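
As a rough illustration of what reading that timestamp could look like in Python, here is a minimal sketch. It assumes the feed returns an Atom entry with a `published` element in the form `YYYY-MM-DDTHH:MM:SS.fffZ`; the sample entry and its values are made up for illustration, and `parse_published` is a hypothetical helper, not part of youtube-dl. No network request is made.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Atom namespace in ElementTree's "{uri}tag" notation.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical, trimmed-down entry; the values are invented for this example.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <published>2012-05-20T14:03:12.000Z</published>
  <title>Example video</title>
</entry>"""


def parse_published(entry_xml):
    """Return the entry's <published> value as a timezone-aware datetime."""
    root = ET.fromstring(entry_xml)
    text = root.findtext(ATOM + "published")
    if text is None:
        raise ValueError("entry has no <published> element")
    # The trailing "Z" means UTC; strptime matches it literally, so attach
    # the timezone explicitly afterwards.
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


published = parse_published(SAMPLE_ENTRY)
print(published.isoformat())
```

The point of the sketch is only that the API entry carries both date and time, whereas the video page gives the day alone.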

I'm not a Python coder, but I could probably figure out how to get this support in. The real question is: would you be OK with youtube-dl making an additional request per video URL? (If not, that's fine too, as I'd just write a custom API script to fetch the data and ignore the --write-info-json option.)

@rg3
Owner

By using the YouTube API you are agreeing to its terms and conditions. See the "Terms" link from the top bar at this page:

http://gdata.youtube.com/demo/index.html

Which, as of the time I'm writing this, takes you to:

https://developers.google.com/youtube/terms

I'm worried about point number 5, "Caching". The YouTube API must be avoided if possible, IMHO. We should follow the steps a web browser does to avoid legal problems as much as possible.

Edit: also, in the prohibitions section, points 9, 11, etc.

@morbus

@rg3: Fair enough and certainly acceptable. But, given that we're faking the user-agent for direct access, and that the data URL above doesn't require an API key or a specific client identification label, it'd be relatively hard for them to "tell" that youtube-dl is abusing the API (any more than they can reliably tell that youtube-dl is downloading videos).

Still, I can understand the reticence.

@rg3
Owner

The thing about the YouTube API terms and conditions is that they do not apply to the user of the application, but to the creator of the application.

BTW, I know there are a couple of InfoExtractors that use it, but I only recently found out about the terms and conditions for the API. :(

@morbus

I'll close this. I ended up creating a cheap metadata-script which is innocent enough to include herein:

# Tag every file in the current directory. Each file is expected to be named
# <YouTube ID>.<extension>. An optional first argument overrides artist and album.
for FILEPATH in *
do
  FILENAME=$(basename "$FILEPATH")
  EXTENSION="${FILENAME##*.}"
  YOUTUBE_ID="${FILENAME%.*}"

  # Skip the literal "*" that the glob yields when the directory is empty.
  if [ "$YOUTUBE_ID" != "*" ]
  then
    # Quote the URL so the shell does not treat "?" as a glob character.
    DATA=$(curl -s "https://gdata.youtube.com/feeds/api/videos/$YOUTUBE_ID?v=2")
    # Drop the fractional seconds from the published timestamp.
    PUBLISHED=$(echo "$DATA" | php -r 'print simplexml_load_file("php://stdin")->published;' | sed 's/\..*Z/Z/')
    AUTHOR=$(echo "$DATA" | php -r 'print simplexml_load_file("php://stdin")->author->name;')
    # Title and description live in the "media" namespace.
    TITLE=$(echo "$DATA" | php -r '$x = simplexml_load_file("php://stdin"); $ns = $x->getNameSpaces(true); $m = $x->children($ns["media"]); print $m->group->title;')
    DESCRIPTION=$(echo "$DATA" | php -r '$x = simplexml_load_file("php://stdin"); $ns = $x->getNameSpaces(true); $m = $x->children($ns["media"]); print $m->group->description;')
    COMMENT="http://www.youtube.com/watch?v=$YOUTUBE_ID"
    ALBUM=$AUTHOR

    # With an argument, fold the uploader into the title and use the argument as artist/album.
    if [ "$1" ]
    then
      TITLE="$AUTHOR: $TITLE"
      AUTHOR=$1
      ALBUM=$1
    fi

    AtomicParsley "$FILEPATH" --overWrite --podcastFlag true --artist "$AUTHOR" --title "$TITLE" --album "$ALBUM" --year "$PUBLISHED" --comment "$COMMENT" --description "$DESCRIPTION"
  fi
done
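
For comparison, the field extraction the script performs can be sketched in Python with only the standard library. This is an approximation under assumptions, not a drop-in replacement: the sample entry and its values are invented, `extract_tags` is a hypothetical helper, and nothing here fetches from the API or calls AtomicParsley.

```python
import re
import xml.etree.ElementTree as ET

# Namespace prefixes used in the paths below.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "media": "http://search.yahoo.com/mrss/",
}

# Hypothetical, trimmed-down entry; the values are invented for this example.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:media="http://search.yahoo.com/mrss/">
  <published>2012-05-20T14:03:12.000Z</published>
  <author><name>ExampleChannel</name></author>
  <media:group>
    <media:title>Example video</media:title>
    <media:description>An example description.</media:description>
  </media:group>
</entry>"""


def extract_tags(entry_xml, video_id, override=None):
    """Collect the same fields the shell script passes to AtomicParsley."""
    root = ET.fromstring(entry_xml)
    published = root.findtext("atom:published", default="", namespaces=NS)
    author = root.findtext("atom:author/atom:name", default="", namespaces=NS)
    title = root.findtext("media:group/media:title", default="", namespaces=NS)
    description = root.findtext(
        "media:group/media:description", default="", namespaces=NS
    )
    tags = {
        # Same substitution as the script's sed: drop fractional seconds.
        "year": re.sub(r"\..*Z", "Z", published),
        "artist": author,
        "album": author,
        "title": title,
        "description": description,
        "comment": "http://www.youtube.com/watch?v=" + video_id,
    }
    if override:
        # Mirrors the script's optional first argument.
        tags["title"] = author + ": " + title
        tags["artist"] = tags["album"] = override
    return tags


tags = extract_tags(SAMPLE_ENTRY, "WN5IzJ99Qys")
for key, value in sorted(tags.items()):
    print(key, "=", value)
```

Parsing the entry once and reading all fields from the same tree avoids re-parsing the XML for each field, which the shell version does because every `php -r` call starts from scratch.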
@morbus morbus closed this