Skip to content
This repository has been archived by the owner on Dec 3, 2023. It is now read-only.

Add bitrate information to format output #59

Closed
wants to merge 1 commit into from
Closed

Add bitrate information to format output #59

wants to merge 1 commit into from

Conversation

tifroz
Copy link
Contributor

@tifroz tifroz commented Jan 30, 2015

The PR adds a bitrate field to the format output when the data is available. An example output would look like this as a result:

{ itag: 34, filetype: 'flv', resolution: '360x640' }
{ itag: 18, filetype: 'mp4', resolution: '360x640', bitrate: '1127k' }
{ itag: 43, filetype: 'webm', resolution: '360x640' }
{ itag: 5, filetype: 'flv', resolution: '240x400' }
{ itag: 17, filetype: 'mp4', resolution: '144x176', bitrate: '113k' }

@@ -234,7 +234,9 @@ ytdl.getInfo = function(url, args, options, callback) {


var formatsRegex =
/^(\d+)\s+([a-z0-9]+)\s+(\d+x\d+|\d+p|audio only|\?x\d+|unknown)/;
/^(\d+)\s+([a-z0-9]+)\s+(\d+x\d+|\d+p|audio only|\?x\d+|unknown)(.*)$/;
var extendedRegex =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be put into the first regex

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe it would be better to split the rest of the string by commas, and parse all of those fields not just the bitrate

@tifroz
Copy link
Contributor Author

tifroz commented Jan 30, 2015

So, yes and yes on both comments, however here are the reasons I did it the way I did it:

  • Need to split because as you parse additional fields, a single regex would become difficult to read & debug.
  • I have not parsed any additional fields for now because I don't fully understand the columns structure from the raw output (in particular entry 278 seems to have extra columns with the data webm container, VP9). I could not find any doc that would explain to my satisfaction so I elected to make my regex conservative and only capture the part I understand & need.

...this said, if you are more knowledgeable about the ytdl output and can point me to the doc that describes the columns structure, I would be happy to expand on my initial effort and parse additional fields. The ideal way to proceed would be to split on , and then parse each column individually I think

Example raw output:

[ '[youtube] 0k2Zzkw_-0I: Downloading webpage',
  '[youtube] 0k2Zzkw_-0I: Extracting video information',
  '[youtube] 0k2Zzkw_-0I: Downloading DASH manifest',
  '[info] Available formats for 0k2Zzkw_-0I:',
  'format code extension resolution  note ',
  '140         m4a       audio only  DASH audio  128k , aac  @128k (44100Hz), 4.44M... (length: 90)',
  '171         webm      audio only  DASH audio  185k , audio@128k (44100Hz), 5.36M... (length: 82)',
  '141         m4a       audio only  DASH audio  255k , aac  @256k (44100Hz), 8.90M... (length: 82)',
  '278         webm      192x144     DASH video  117k , webm container, VP9, 15fps,... (length: 100)',
  '160         mp4       192x144     DASH video  120k , 15fps, video only, 3.90MiB',
  '242         webm      320x240     DASH video  207k , 30fps, video only, 5.96MiB',
  '133         mp4       320x240     DASH video  248k , 30fps, video only, 8.53MiB',
  '243         webm      480x360     DASH video  358k , 30fps, video only, 10.21MiB',
  '134         mp4       480x360     DASH video  605k , 30fps, video only, 13.12MiB',
  '244         webm      640x480     DASH video  792k , 30fps, video only, 20.51MiB',
  '135         mp4       640x480     DASH video 1105k , 30fps, video only, 26.55MiB',
  '17          3gp       176x144     ',
  '36          3gp       320x240     ',
  '5           flv       400x240     ',
  '43          webm      640x360     ',
  '18          mp4       640x360     (best)' ]

@fent
Copy link
Collaborator

fent commented Jan 31, 2015

Can't find documentation on the output for this. From the output above, it looks like sometimes it has extra info, sometimes it doesn't, and when it does the number of fields is not constant. But it looks like we would be able to get the bitrate, fps, and size of the format.

@tifroz
Copy link
Contributor Author

tifroz commented Jan 31, 2015

Happy to add the format (e.g. DASH video, DASH audio), fps (videos only) in addition to the bitrate. Not sure what you are referring to with the size of the format though?

@fent
Copy link
Collaborator

fent commented Jan 31, 2015

Isn't the rightmost part the size? ie 26.55MiB for 135

@tifroz
Copy link
Contributor Author

tifroz commented Feb 1, 2015

I coded the parsing of the extra fields & I ran into a bug when testing. Happy to fix it in this PR however it would change the existing data structure returned by getFormat() because the itag would need to be a string (currently an integer)
I believe this is the field used by youtube-dl to id a particular version of a video (value used with the --format option), so it's likely people are using this value.

To avoid any regression, I think the best would be to keep the itag unchanged for now (but deprecate it), and add a new field of type string (call it tag?). The field tag would always be present but itag would only be present when the value can be parsed into an integer (youtube).

I will let you weigh-in before I make the fix

To reproduce, just try getting the format from http://vimeo.com/91089952 for example. I am creating a new issue for tracking purposes

@fent
Copy link
Collaborator

fent commented Feb 1, 2015

How about naming the field code since the output from youtube-dl refers to it as format code?

@tifroz
Copy link
Contributor Author

tifroz commented Feb 3, 2015

Updated to fix #60 as well.

Note that the regex is still split into multiple patterns, primarily because some attributes don't always show up, and the code is much more readable using if (exists) {//...} v. a long regex pattern

filetype : result[2],
resolution : result[3],
});
}
if (/^\d+$/.test(compiledResult.code)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would only be set for youtube videos right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The itag attributes will be set for any video with an integer code value. So the itag would be set for youtube and any of the other ~500+ web sites supported by youtube dl in theory that use integer code values (if any).

@fent
Copy link
Collaborator

fent commented Feb 11, 2015

tests still fail :(

@fent fent mentioned this pull request Feb 11, 2015
@fent
Copy link
Collaborator

fent commented Feb 11, 2015

There's a new PR #62

It uses the --dump-json object which prints the data that both getInfo() and getFormats() would parse. Which would deprecate this PR :(. Sorry for not realizing to use that earlier.

@tifroz
Copy link
Contributor Author

tifroz commented Feb 11, 2015

--dump-json is a much, much better option if it can be made to work.

I played with --dump-json, but couldn't get it to work initially - but I do remember thinking ah! this must be why @fent wrote the getFormats() API. I am glad I was wrong...much happier with the output & stability of the --dump-json option.

Let's close this PR after you accept #62

Not related but I am planning to work on a PR to allow submitting multiple urls at a time. Let me know if you have any thoughts on the topic?

@fent
Copy link
Collaborator

fent commented Feb 12, 2015

when I first made this, that option didn't exist. when it was first added, IIRC it didn't include a downloadable url from the formats.

@fent fent closed this Feb 12, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants