Article and oEmbed extractor for Node.js

article-parser

Extract the main article, main image and metadata from a URL.

Usage

npm install article-parser

Then:

var {
  extract
} = require('article-parser');

let url = 'https://goo.gl/MV8Tkh';

extract(url).then((article) => {
  console.log(article);
}).catch((err) => {
  console.log(err);
});

APIs

configure(Object conf)

{
  fetchOptions: Object,
  wordsPerMinute: Number,
  htmlRules: Object,
  SoundCloudKey: String,
  YouTubeKey: String,
  EmbedlyKey: String
}
  • fetchOptions: Object, a simplified version of the node-fetch options. Only headers, timeout and agent are supported.
  • wordsPerMinute: Number, default 300, used to estimate the time to read.
  • htmlRules: Object, options for cleaning HTML with sanitize-html.
  • SoundCloudKey: String, used to get audio duration.
  • YouTubeKey: String, used to get video duration.
  • EmbedlyKey: String, used to extract with the Embedly API.

The default configuration should work for most cases.
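
As a rough illustration of the options listed above, the sketch below builds a configuration object and shows one plausible way a words-per-minute setting could turn a word count into a reading time. The header value, the timeout and the `estimateReadingSeconds` helper are hypothetical and are not taken from the library. It also assumes that `configure()` accepts a partial object, which this README does not state explicitly.

```javascript
// Hypothetical configuration values, chosen only for illustration.
const conf = {
  fetchOptions: {
    headers: { 'user-agent': 'my-crawler/1.0' },
    timeout: 10000, // milliseconds, following node-fetch's convention
  },
  wordsPerMinute: 250,
};

// Hypothetical helper (not part of article-parser): converts a word count
// into an estimated reading time in seconds for a given reading speed.
function estimateReadingSeconds(wordCount, wordsPerMinute = 300) {
  return Math.ceil((wordCount / wordsPerMinute) * 60);
}

console.log(estimateReadingSeconds(1000, conf.wordsPerMinute));

// With the package installed, the object would be applied like this:
// const { configure } = require('article-parser');
// configure(conf);
```

The helper only illustrates what the setting means. The library's own estimate may be computed differently.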

extract(String url)

Extracts article data from the specified URL.

var {
  extract
} = require('article-parser');

let url = 'https://www.youtube.com/watch?v=tRGJj59G1x4';

extract(url).then((article) => {
  console.log(article);
}).catch((err) => {
  console.log(err);
});

The resolved article object looks something like this:

{
  title: 'Zato ESB - Test demo hosted on company server',
  alias: 'zato-esb-test-demo-hosted-on-company-server-1500021746537-PAQXw8IYcU',
  url: 'https://www.youtube.com/watch?v=tRGJj59G1x4',
  canonicals:
   [ 'https://www.youtube.com/watch?v=tRGJj59G1x4',
     'https://youtu.be/tRGJj59G1x4',
     'https://www.youtube.com/v/tRGJj59G1x4',
     'https://www.youtube.com/embed/tRGJj59G1x4' ],
  description: 'Our sample: https://github.com/greenglobal/zato-demo Zato homepage: https://zato.io Tutorial: "Zato — a powerful Python-based ESB solution for your SOA" http...',
  content: '<iframe src="https://www.youtube.com/embed/tRGJj59G1x4?feature=oembed" frameborder="0" allowfullscreen></iframe>',
  image: 'https://i.ytimg.com/vi/tRGJj59G1x4/hqdefault.jpg',
  author: 'Dong Nguyen',
  source: 'YouTube',
  domain: 'youtube.com',
  publishedTime: '',
  duration: 292
}
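
A minimal sketch of consuming such a result is shown below, using a few fields copied from the sample object. The `formatDuration` and `summarize` helpers are hypothetical. The sketch also assumes that `duration` is expressed in seconds, which the README does not state.

```javascript
// A subset of the fields from the sample result above.
const article = {
  title: 'Zato ESB - Test demo hosted on company server',
  author: 'Dong Nguyen',
  source: 'YouTube',
  duration: 292,
};

// Assumption: duration is a whole number of seconds.
function formatDuration(totalSeconds) {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = String(totalSeconds % 60).padStart(2, '0');
  return `${minutes}:${seconds}`;
}

// Hypothetical helper: builds a one-line summary of an article object,
// appending the duration only when one is present.
function summarize(a) {
  const parts = [a.title, `${a.author} on ${a.source}`];
  if (a.duration > 0) {
    parts.push(formatDuration(a.duration));
  }
  return parts.join(' | ');
}

console.log(summarize(article));
```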

extractWithEmbedly(String url [, String EmbedlyKey])

Extracts article data from the specified URL using the Embedly Extract API.

The second parameter is optional. If you have already set your Embedly key via configure(), you can omit it here.

var {
  extractWithEmbedly
} = require('article-parser');

let url = 'https://goo.gl/MV8Tkh';

extractWithEmbedly(url).then((article) => {
  console.log(article);
}).catch((err) => {
  console.log(err);
});
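
The optional second parameter can be pictured with the sketch below. The `resolveEmbedlyKey` helper is hypothetical and is not the library's code. In particular, letting an explicitly passed key take precedence over the configured one is an assumption.

```javascript
// Hypothetical helper (not part of article-parser): picks the Embedly key
// to use for a request. Assumption: an explicit argument wins over the
// key stored in the configuration.
function resolveEmbedlyKey(explicitKey, config = {}) {
  const key = explicitKey || config.EmbedlyKey;
  if (!key) {
    throw new Error(
      'No Embedly key: pass one as the second argument or set EmbedlyKey via configure()'
    );
  }
  return key;
}

// Placeholder keys, for illustration only.
console.log(resolveEmbedlyKey('key-from-argument', { EmbedlyKey: 'key-from-config' }));
console.log(resolveEmbedlyKey(undefined, { EmbedlyKey: 'key-from-config' }));
```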

getConfig()

Returns the current configuration.

Test

git clone https://github.com/ndaidong/article-parser.git
cd article-parser
npm install
npm test

License

The MIT License (MIT)