Commit face7e1

Add metascraper-audio (#113)
1 parent 531093b commit face7e1

21 files changed: +385 -11478 lines changed

README.md

Lines changed: 4 additions & 0 deletions
@@ -103,6 +103,9 @@ Here is an example of the metadata that **metascraper** can collect:
 
 - `description` — eg. *Venture capitalists are raising money at the fastest rate...*<br/>
   The publisher's chosen description of the article.
+
+- `audio` — eg. *https://cf-media.sndcdn.com/U78RIfDPV6ok.128.mp3*<br/>
+  An audio URL that best represents the article.
 
 - `video` — eg. *https://assets.entrepreneur.com/content/preview.mp4*<br/>
   A video URL that best represents the article.
@@ -185,6 +188,7 @@ const metascraper = require('metascraper')([
 | [`metascraper-author`](https://github.com/microlinkhq/metascraper/tree/master/packages/metascraper-author) | [![npm](https://img.shields.io/npm/v/metascraper-author.svg?style=flat-square)](https://www.npmjs.com/package/metascraper-author) | [![Dependency Status](https://david-dm.org/microlinkhq/metascraper.svg?path=packages/metascraper-author&style=flat-square)](https://david-dm.org/microlinkhq/metascraper?path=packages/metascraper-author) |
 | [`metascraper-date`](https://github.com/microlinkhq/metascraper/tree/master/packages/metascraper-date) | [![npm](https://img.shields.io/npm/v/metascraper-date.svg?style=flat-square)](https://www.npmjs.com/package/metascraper-date) | [![Dependency Status](https://david-dm.org/microlinkhq/metascraper.svg?path=packages/metascraper-date&style=flat-square)](https://david-dm.org/microlinkhq/metascraper?path=packages/metascraper-date) |
 | [`metascraper-description`](https://github.com/microlinkhq/metascraper/tree/master/packages/metascraper-description) | [![npm](https://img.shields.io/npm/v/metascraper-description.svg?style=flat-square)](https://www.npmjs.com/package/metascraper-description) | [![Dependency Status](https://david-dm.org/microlinkhq/metascraper.svg?path=packages/metascraper-description&style=flat-square)](https://david-dm.org/microlinkhq/metascraper?path=packages/metascraper-description) |
+| [`metascraper-audio`](https://github.com/microlinkhq/metascraper/tree/master/packages/metascraper-audio) | [![npm](https://img.shields.io/npm/v/metascraper-audio.svg?style=flat-square)](https://www.npmjs.com/package/metascraper-audio) | [![Dependency Status](https://david-dm.org/microlinkhq/metascraper.svg?path=packages/metascraper-audio&style=flat-square)](https://david-dm.org/microlinkhq/metascraper?path=packages/metascraper-audio) |
 | [`metascraper-video`](https://github.com/microlinkhq/metascraper/tree/master/packages/metascraper-video) | [![npm](https://img.shields.io/npm/v/metascraper-video.svg?style=flat-square)](https://www.npmjs.com/package/metascraper-video) | [![Dependency Status](https://david-dm.org/microlinkhq/metascraper.svg?path=packages/metascraper-video&style=flat-square)](https://david-dm.org/microlinkhq/metascraper?path=packages/metascraper-video) |
 | [`metascraper-image`](https://github.com/microlinkhq/metascraper/tree/master/packages/metascraper-image) | [![npm](https://img.shields.io/npm/v/metascraper-image.svg?style=flat-square)](https://www.npmjs.com/package/metascraper-image) | [![Dependency Status](https://david-dm.org/microlinkhq/metascraper.svg?path=packages/metascraper-image&style=flat-square)](https://david-dm.org/microlinkhq/metascraper?path=packages/metascraper-image) |
 | [`metascraper-logo`](https://github.com/microlinkhq/metascraper/tree/master/packages/metascraper-logo) | [![npm](https://img.shields.io/npm/v/metascraper-logo.svg?style=flat-square)](https://www.npmjs.com/package/metascraper-logo) | [![Dependency Status](https://david-dm.org/microlinkhq/metascraper.svg?path=packages/metascraper-logo&style=flat-square)](https://david-dm.org/microlinkhq/metascraper?path=packages/metascraper-logo) |

packages/metascraper-audio/.npmrc

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+unsafe-perm=true
+save-prefix=~
+shrinkwrap=false
+save=false

packages/metascraper-audio/README.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+# metascraper-audio
+
+[![npm](https://img.shields.io/npm/v/metascraper-audio.svg?style=flat-square)](https://www.npmjs.com/package/metascraper-audio)
+[![Dependency Status](https://david-dm.org/microlinkhq/metascraper.svg?path=packages/metascraper-audio&style=flat-square)](https://david-dm.org/microlinkhq/metascraper?path=packages/metascraper-audio)
+
+> Get audio property from HTML markup.
+
+## Install
+
+```bash
+$ npm install metascraper-audio --save
+```
+
+## License
+
+**metascraper-audio** © [microlink.io](https://microlink.io), Released under the [MIT](https://github.com/microlinkhq/metascraper-audio/blob/master/LICENSE.md) License.<br>
+Authored and maintained by microlink.io with help from [contributors](https://github.com/microlinkhq/metascraper-audio/contributors).
+
+> [microlink.io](https://microlink.io) · GitHub [@microlink.io](https://github.com/microlinkhq) · Twitter [@microlinkhq](https://twitter.com/microlinkhq)
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+exports['og:audio 1'] = {
+  "audio": "https://browserless.js.org/static/demo.mp3"
+}
+
+exports['og:audio:secure_url 1'] = {
+  "audio": "https://browserless.js.org/static/demo.mp3"
+}
+
+exports['twitter:player:stream 1'] = {
+  "audio": "https://browserless.js.org/static/demo.mp3"
+}
+
+exports['audio:src 1'] = {
+  "audio": "https://browserless.js.org/static/demo.mp3"
+}
+
+exports['audio:source:src 1'] = {
+  "audio": "https://browserless.js.org/static/demo.mp3"
+}
+

packages/metascraper-audio/index.js

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+'use strict'
+
+const { isMime, url: urlFn, isAudioUrl } = require('@metascraper/helpers')
+
+/**
+ * Wrap a rule with validation and formatting logic.
+ *
+ * @param {Function} rule
+ * @return {Function} wrapped
+ */
+
+const createWrapper = fn => rule => ({ htmlDom, url }) => {
+  const value = rule(htmlDom)
+  return fn(value, url)
+}
+
+const wrapAudio = createWrapper((value, url) => {
+  const urlValue = urlFn(value, { url })
+  return isAudioUrl(urlValue) && urlValue
+})
+
+const withContentType = (url, contentType) =>
+  isMime(contentType, 'audio') ? url : false
+
+/**
+ * Rules.
+ */
+module.exports = () => ({
+  audio: [
+    wrapAudio($ => $('meta[property="og:audio:secure_url"]').attr('content')),
+    wrapAudio($ => $('meta[property="og:audio"]').attr('content')),
+    wrapAudio($ => {
+      const contentType = $(
+        'meta[property="twitter:player:stream:content_type"]'
+      ).attr('content')
+      const streamUrl = $('meta[property="twitter:player:stream"]').attr(
+        'content'
+      )
+      return contentType ? withContentType(streamUrl, contentType) : streamUrl
+    }),
+    wrapAudio($ => $('audio').attr('src')),
+    wrapAudio($ => $('audio > source').attr('src'))
+  ]
+})
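
As a rough illustration of how the wrapped rules in `packages/metascraper-audio/index.js` compose, the sketch below reproduces the `createWrapper` shape with plain-JavaScript stand-ins. `AUDIO_EXTENSIONS`, `resolveUrl`, the local `isAudioUrl`, `firstMatch`, and the object-based `htmlDom` are hypothetical names for illustration only, not the `@metascraper/helpers` or cheerio APIs. It also assumes that rules are tried in order and the first truthy result wins, which this commit does not show.

```javascript
// Hypothetical stand-in for the audio extension list used by the helpers.
const AUDIO_EXTENSIONS = ['mp3', 'wav', 'ogg']

// Resolve a possibly relative value against the page URL; false when unusable.
const resolveUrl = (value, base) => {
  if (typeof value !== 'string' || value === '') return false
  try {
    return new URL(value, base).toString()
  } catch (err) {
    return false
  }
}

// Accept only URLs whose path ends in a known audio extension.
const isAudioUrl = value => {
  if (!value) return false
  const ext = new URL(value).pathname.split('.').pop().toLowerCase()
  return AUDIO_EXTENSIONS.includes(ext)
}

// Same shape as the commit's createWrapper: rule -> ({ htmlDom, url }) -> value.
const createWrapper = fn => rule => ({ htmlDom, url }) => fn(rule(htmlDom), url)

const wrapAudio = createWrapper((value, url) => {
  const urlValue = resolveUrl(value, url)
  return isAudioUrl(urlValue) && urlValue
})

// Rules read from a plain object here instead of a parsed DOM.
const rules = [
  wrapAudio(dom => dom['og:audio:secure_url']),
  wrapAudio(dom => dom['og:audio']),
  wrapAudio(dom => dom['audio[src]'])
]

// Assumed evaluation strategy: return the first truthy rule result, else null.
const firstMatch = (rules, context) => {
  for (const rule of rules) {
    const value = rule(context)
    if (value) return value
  }
  return null
}
```

Under these assumptions, a value that resolves to a non-audio URL is rejected by its own rule, and the next rule in the list gets a chance, which is why the more specific `og:audio:secure_url` selector can safely come first.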
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+{
+  "name": "metascraper-audio",
+  "description": "Get audio property from HTML markup",
+  "homepage": "https://metascraper.js.org",
+  "version": "0.0.0",
+  "main": "index.js",
+  "author": {
+    "email": "ian@ianstormtaylor.com",
+    "name": "Ian Storm Taylor"
+  },
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/microlinkhq/metascraper/tree/master/packages/metascraper-audio"
+  },
+  "bugs": {
+    "url": "https://github.com/microlinkhq/metascraper/issues"
+  },
+  "dependencies": {
+    "@metascraper/helpers": "^4.0.1"
+  },
+  "devDependencies": {
+    "lodash": "latest",
+    "mocha": "latest",
+    "nyc": "latest",
+    "should": "latest",
+    "snap-shot": "latest",
+    "standard": "11"
+  },
+  "engines": {
+    "node": ">= 8"
+  },
+  "files": [
+    "index.js"
+  ],
+  "scripts": {
+    "test": "NODE_PATH=.. TZ=UTC NODE_ENV=test nyc mocha test"
+  },
+  "license": "MIT",
+  "peerDependencies": {
+    "metascraper": "^4"
+  },
+  "standard": {
+    "env": [
+      "mocha"
+    ]
+  }
+}
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+'use strict'
+
+const snapshot = require('snap-shot')
+
+const metascraper = require('metascraper')([require('metascraper-audio')()])
+
+describe('metascraper-audio', () => {
+  it('og:audio', async () => {
+    const html = `<meta property="og:audio" content="https://browserless.js.org/static/demo.mp3">`
+    const url = 'https://browserless.js.org'
+    const metadata = await metascraper({ html, url })
+    snapshot(metadata)
+  })
+
+  it('og:audio:secure_url', async () => {
+    const html = `<meta property="og:audio:secure_url" content="https://browserless.js.org/static/demo.mp3">`
+    const url = 'https://browserless.js.org'
+    const metadata = await metascraper({ html, url })
+    snapshot(metadata)
+  })
+
+  it('twitter:player:stream', async () => {
+    const html = `<meta property="twitter:player:stream" content="https://browserless.js.org/static/demo.mp3">`
+    const url = 'https://browserless.js.org'
+    const metadata = await metascraper({ html, url })
+    snapshot(metadata)
+  })
+
+  it('audio:src', async () => {
+    const html = `<audio src="https://browserless.js.org/static/demo.mp3">`
+    const url = 'https://browserless.js.org'
+    const metadata = await metascraper({ html, url })
+    snapshot(metadata)
+  })
+
+  it('audio:source:src', async () => {
+    const html = `<audio><source src="https://browserless.js.org/static/demo.mp3"></source></audio>`
+    const url = 'https://browserless.js.org'
+    const metadata = await metascraper({ html, url })
+    snapshot(metadata)
+  })
+})
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+--require should
+--reporter spec
+--timeout 120000
+--slow 300
+--bail
+--recursive

packages/metascraper-helpers/index.js

Lines changed: 37 additions & 4 deletions
@@ -13,20 +13,29 @@ const {
 } = require('lodash')
 
 const imageExtensions = difference(require('image-extensions'), ['gif'])
+const audioExtensions = difference(require('audio-extensions'), ['mp4'])
 const videoExtensions = union(require('video-extensions'), ['gif'])
+const langs = require('iso-639-3').map(({ iso6391 }) => iso6391)
 const condenseWhitespace = require('condense-whitespace')
-const audioExtensions = require('audio-extensions')
+const urlRegex = require('url-regex')({ exact: true })
 const isRelativeUrl = require('is-relative-url')
 const fileExtension = require('file-extension')
 const { resolve: resolveUrl } = require('url')
 const _normalizeUrl = require('normalize-url')
 const smartquotes = require('smartquotes')
+const mimeTypes = require('mime-types')
 const chrono = require('chrono-node')
-const urlRegex = require('url-regex')({ exact: true })
 const isIso = require('isostring')
 const toTitle = require('title')
+
 const { URL } = require('url')
 
+const MIMES_EXTENSIONS = {
+  audio: audioExtensions,
+  video: videoExtensions,
+  image: imageExtensions
+}
+
 const REGEX_BY = /^[\s\n]*by|@[\s\n]*/i
 
 const REGEX_LOCATION = /^[A-Z\s]+\s+[-]\s+/
@@ -83,12 +92,21 @@ const protocol = url => {
 const createUrlExtensionValidator = collection => url =>
   isUrl(url) && includes(collection, extension(url))
 
+const createExtensionValidator = collection => url =>
+  includes(collection, extension(url))
+
 const isVideoUrl = createUrlExtensionValidator(videoExtensions)
 
 const isAudioUrl = createUrlExtensionValidator(audioExtensions)
 
 const isImageUrl = createUrlExtensionValidator(imageExtensions)
 
+const isVideoExtension = createExtensionValidator(videoExtensions)
+
+const isAudioExtension = createExtensionValidator(audioExtensions)
+
+const isImageExtension = createExtensionValidator(imageExtensions)
+
 const extension = url => fileExtension(url).split('?')[0]
 
 const description = value => isString(value) && getDescription(value)
@@ -120,10 +138,21 @@ const date = value => {
   if (parsed) return parsed.toISOString()
 }
 
-const lang = value => isString(value) && toLower(value.substring(0, 2))
+const lang = value => {
+  if (isEmpty(value)) return false
+  const lang = toLower(value.trim().substring(0, 2))
+  const isLang = includes(langs, lang)
+  return isLang ? lang : false
+}
 
 const title = value => isString(value) && titleize(value)
 
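+const isMime = (type, mime) => {
+  // `type` is a content type such as 'audio/mpeg'; `mime` is the kind
+  // ('audio', 'video' or 'image') that keys MIMES_EXTENSIONS.
+  const extension = mimeTypes.extension(type)
+  const collection = MIMES_EXTENSIONS[mime]
+  return includes(collection, extension)
+}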
+
 module.exports = {
   author,
   title,
@@ -139,8 +168,12 @@ module.exports = {
   protocol,
   publisher,
   normalizeUrl,
+  isMime,
   isUrl,
   isVideoUrl,
   isAudioUrl,
-  isImageUrl
+  isImageUrl,
+  isVideoExtension,
+  isAudioExtension,
+  isImageExtension
 }
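
The content-type check that `metascraper-audio` relies on can be sketched without third-party packages. This is a minimal illustration, not the helper as shipped: `MIME_TO_EXTENSION`, `EXTENSIONS_BY_KIND`, and `baseType` are hypothetical, hand-trimmed stand-ins for what `mime-types` and the `*-extensions` packages provide, and the real tables may map types to different extensions.

```javascript
// Hypothetical, trimmed lookup from content type to file extension.
const MIME_TO_EXTENSION = {
  'audio/mpeg': 'mp3',
  'audio/ogg': 'ogg',
  'video/mp4': 'mp4',
  'image/png': 'png'
}

// Hypothetical, trimmed extension lists keyed by media kind.
const EXTENSIONS_BY_KIND = {
  audio: ['mp3', 'ogg', 'wav'],
  video: ['mp4', 'webm', 'gif'],
  image: ['png', 'jpg']
}

// Strip parameters such as "; charset=utf-8" and normalise case.
const baseType = contentType =>
  String(contentType).split(';')[0].trim().toLowerCase()

// True when the content type maps to an extension listed for the given kind.
const isMime = (contentType, kind) => {
  const extension = MIME_TO_EXTENSION[baseType(contentType)]
  const collection = EXTENSIONS_BY_KIND[kind] || []
  return collection.includes(extension)
}
```

With these tables, `isMime('audio/mpeg', 'audio')` is true and `isMime('video/mp4', 'audio')` is false. The point of the sketch is that the kind selects the extension list and the content type supplies the extension to look for.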

packages/metascraper-helpers/package.json

Lines changed: 2 additions & 0 deletions
@@ -22,8 +22,10 @@
     "file-extension": "~4.0.5",
     "image-extensions": "~1.1.0",
     "is-relative-url": "~2.0.0",
+    "iso-639-3": "~1.1.0",
     "isostring": "0.0.1",
     "lodash": "~4.17.10",
+    "mime-types": "~2.1.20",
     "normalize-url": "~3.3.0",
     "smartquotes": "~2.3.1",
     "title": "~3.3.2",
