😎 @web-master/node-web-fetch 😎

Fetch web data as easily as possible

Description

It combines @web-master/node-web-crawler and @web-master/node-web-scraper.

It can:

FETCH
- SCRAPE
  - It scrapes a single, specified page
  - It gathers data from the page according to the ScrapeConfig
- CRAWL
  - It scrapes the specified page and gathers links
  - It crawls those links and scrapes each linked page
  - It gathers data from each page according to the CrawlConfig
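Judging from the usage examples in this README, the shape of `target` appears to decide whether a call scrapes one page or crawls several. The sketch below only illustrates that distinction. The `Target` type and the `describeMode` helper are hypothetical names introduced here; they are not part of the package and may not match its real typings.

```typescript
// Hypothetical types inferred from the README examples,
// NOT the package's actual type definitions.
type IteratorTarget = {
  url: string;
  iterator: { selector: string; convert: (link: string) => string };
};
type Target = string | string[] | IteratorTarget;

type Mode = 'scrape' | 'crawl-known-urls' | 'crawl-dynamic';

// Decide which mode a given `target` value implies (illustrative only).
function describeMode(target: Target): Mode {
  if (typeof target === 'string') return 'scrape'; // one page
  if (Array.isArray(target)) return 'crawl-known-urls'; // explicit URL list
  return 'crawl-dynamic'; // start URL plus link iterator
}

console.log(describeMode('http://example.com')); // scrape
console.log(describeMode(['https://example1.com', 'https://example2.com'])); // crawl-known-urls
```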

Installation

$ npm install --save @web-master/node-web-fetch

Usage

Single Page Scraping

Basic

import fetch from '@web-master/node-web-fetch';

const data = await fetch({
  target: 'http://example.com',
  fetch: {
    title: 'h1',
    info: {
      selector: 'p > a',
      attr: 'href',
    },
  },
});

console.log(data);
// {
//   title: 'Example Domain',
//   info: 'http://www.iana.org/domains/example'
// }
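The example above uses two ways to describe a field: a bare selector string (`title`) and an object with `selector` and `attr` (`info`). As a rough mental model, the string form can be read as shorthand for an object without `attr`. The sketch below assumes that reading; `FieldSpec` and `normalizeField` are hypothetical names for illustration, not functions exported by the package.

```typescript
// Hypothetical model of a field definition, inferred from the Basic example.
type FieldObject = { selector: string; attr?: string };
type FieldSpec = string | FieldObject;

// Expand the string shorthand into the object form (illustrative only).
function normalizeField(spec: FieldSpec): FieldObject {
  return typeof spec === 'string' ? { selector: spec } : spec;
}

const title = normalizeField('h1');
const info = normalizeField({ selector: 'p > a', attr: 'href' });

console.log(title.selector); // h1
console.log(info.attr); // href
```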

Waitable (by using `puppeteer`)

import fetch from '@web-master/node-web-fetch';

const data = await fetch({
  target: 'http://example.com',
  waitFor: 3 * 1000, // wait 3 seconds for the content to load (e.g. single-page apps)
  fetch: {
    title: 'h1',
    info: {
      selector: 'p > a',
      attr: 'href',
    },
  },
});

console.log(data);
// {
//   title: 'Example Domain',
//   info: 'http://www.iana.org/domains/example'
// }

Multi-Page Crawling

You Already Know the Target URLs

import fetch from '@web-master/node-web-fetch';

const pages = await fetch({
  target: [
    'https://example1.com',
    'https://example2.com',
    'https://example3.com',
  ],
  fetch: () => ({
    title: 'h1',
  }),
});

console.log(pages);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   { title: '[Experimental] React SSR as a view template engine' }
// ]

You Don't Know the Target URLs and Want to Crawl Dynamically

import fetch from '@web-master/node-web-fetch';

const pages = await fetch({
  target: {
    url: 'https://news.ycombinator.com',
    iterator: {
      selector: 'span.age > a',
      convert: (x) => `https://news.ycombinator.com/${x}`,
    },
  },
  fetch: () => ({
    title: '.title > a',
  }),
});

console.log(pages);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   ...
//   ...
//   { title: '[Experimental] React SSR as a view template engine' }
// ]
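The `convert` callback above builds an absolute URL by string concatenation, which works when every matched link is relative to the site root. A slightly more defensive variant, sketched below, resolves the link with the standard `URL` constructor so that root-relative and already-absolute links are handled too. This assumes the value passed to `convert` is the link's `href`; `toAbsoluteUrl` and `BASE_URL` are hypothetical names, not part of the package.

```typescript
const BASE_URL = 'https://news.ycombinator.com';

// Resolve a possibly relative link against a base URL.
// Assumption: `link` is the href of a matched anchor element.
function toAbsoluteUrl(link: string, base: string = BASE_URL): string {
  return new URL(link, base).href;
}

console.log(toAbsoluteUrl('item?id=1')); // https://news.ycombinator.com/item?id=1
console.log(toAbsoluteUrl('/item?id=1')); // https://news.ycombinator.com/item?id=1
console.log(toAbsoluteUrl('https://example.com/a')); // https://example.com/a
```

Under the same assumption, it could be passed as `convert: (x) => toAbsoluteUrl(x)`.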

Waitable (by using `puppeteer`)

import fetch from '@web-master/node-web-fetch';

const pages = await fetch({
  target: {
    url: 'https://news.ycombinator.com',
    iterator: {
      selector: 'span.age > a',
      convert: (x) => `https://news.ycombinator.com/${x}`,
    },
  },
  waitFor: 3 * 1000, // wait 3 seconds for the content to load (e.g. single-page apps)
  fetch: () => ({
    title: '.title > a',
  }),
});

console.log(pages);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   ...
//   ...
//   { title: '[Experimental] React SSR as a view template engine' }
// ]

TypeScript Support

import fetch from '@web-master/node-web-fetch';

interface HackerNewsPage {
  title: string;
}

const pages: HackerNewsPage[] = await fetch({
  target: {
    url: 'https://news.ycombinator.com',
    iterator: {
      selector: 'span.age > a',
      convert: (x) => `https://news.ycombinator.com/${x}`,
    },
  },
  fetch: () => ({
    title: '.title > a',
  }),
});

console.log(pages);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   ...
//   ...
//   { title: '[Experimental] React SSR as a view template engine' }
// ]
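A type annotation such as `HackerNewsPage[]` is checked only at compile time; it does not verify what a scraped page actually returned (a selector that matches nothing could plausibly yield a missing or non-string value). A minimal sketch of a runtime check follows. `isHackerNewsPage` and `keepValidPages` are hypothetical helpers written for this example, and the sample data is made up.

```typescript
interface HackerNewsPage {
  title: string;
}

// Type guard: true only for objects whose `title` is a string.
function isHackerNewsPage(value: unknown): value is HackerNewsPage {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { title?: unknown }).title === 'string'
  );
}

// Drop any result that does not match the expected shape.
function keepValidPages(values: unknown[]): HackerNewsPage[] {
  return values.filter(isHackerNewsPage);
}

// Made-up sample standing in for scraped results.
const sample: unknown[] = [{ title: 'Hello' }, { title: null }, 'oops'];
console.log(keepValidPages(sample)); // [ { title: 'Hello' } ]
```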
