Commit

Merge 26a13e4 into 0e80ba3
ndaidong committed Nov 13, 2022
2 parents 0e80ba3 + 26a13e4 commit 85a09c5
Showing 8 changed files with 117 additions and 62 deletions.
5 changes: 2 additions & 3 deletions .github/workflows/ci-test.yml
@@ -8,11 +8,11 @@ on: [push, pull_request]
jobs:
test:

-runs-on: ubuntu-20.04
+runs-on: ubuntu-22.04

strategy:
matrix:
-node_version: [14.x, 15.x, 16.x, 17.x, 18.x]
+node_version: [14.x, 15.x, 16.x, 17.x, 18.x, 19.x]

steps:
- uses: actions/checkout@v2
@@ -24,7 +24,6 @@ jobs:

- name: run npm scripts
run: |
-npm i -g standard
npm install
npm run lint
npm run build --if-present
26 changes: 25 additions & 1 deletion README.md
@@ -10,11 +10,35 @@ Extract main article, main image and meta data from URL.

[![Deploy](https://button.deta.dev/1/svg)](https://go.deta.dev/deploy?repo=https://github.com/ndaidong/article-parser-deta)


## Intro

*article-parser* is part of a tool set for content builders:

- [feed-reader](https://github.com/ndaidong/feed-reader): extract & normalize RSS/ATOM/JSON feed
- [article-parser](https://github.com/ndaidong/article-parser): extract main article from given URL
- [oembed-parser](https://github.com/ndaidong/oembed-parser): extract oEmbed data from supported providers

You can use one or a combination of these tools to build news sites, create automated content systems for marketing campaigns, or gather datasets for NLP projects, among other uses.

```
┌────────────────┐
┌───────► article-parser ├──────────┐
│ └────────────────┘ │
┌─────────────┐ ┌─────────┴────┐ ┌────────▼─────────┐ ┌─────────────┐
│ feed-reader ├───► feed entries │ │ content database ├───► public APIs │
└─────────────┘ └─────────┬────┘ └────────▲─────────┘ └─────────────┘
│ ┌────────────────┐ │
└───────► oembed-parser ├──────────┘
└────────────────┘
```
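One way to read the diagram is as a routing step that sends each feed entry to one of the two parsers. The sketch below assumes, purely for illustration, that entries whose link points at a known oEmbed provider host go to *oembed-parser* and all other entries go to *article-parser*; `routeEntry` and `OEMBED_HOSTS` are hypothetical names and are not part of any of these packages.

```javascript
// Hypothetical router for the pipeline in the diagram. OEMBED_HOSTS is an
// illustrative assumption, not a list shipped by oembed-parser.
const OEMBED_HOSTS = new Set(['youtube.com', 'www.youtube.com', 'vimeo.com'])

function routeEntry (entry) {
  let host
  try {
    host = new URL(entry.link).hostname
  } catch {
    return null // link is not a valid absolute URL: skip this entry
  }
  return OEMBED_HOSTS.has(host) ? 'oembed-parser' : 'article-parser'
}

// Example: split a batch of feed entries between the two parsers
const entries = [
  { link: 'https://www.youtube.com/watch?v=abc' },
  { link: 'https://www.freethink.com/technology/virtual-world' },
  { link: 'not a url' }
]
const routed = entries.map(routeEntry)
console.log(routed) // [ 'oembed-parser', 'article-parser', null ]
```

In a real pipeline the host check would more likely be driven by a maintained provider list; the hard-coded set only keeps the example self-contained.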

## Demo

- [Give it a try!](https://demos.pwshub.com/article-parser)
- [Example FaaS](https://extract-article.deta.dev/?url=https://www.freethink.com/technology/virtual-world)


## Install & Usage

### Node.js
@@ -215,7 +239,7 @@ Basically, the meaning of `transformation` can be interpreted like this:

Here is an example transformation:

-```ts
+```js
{
patterns: [
/([\w]+.)?domain.tld\/*/,
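To illustrate how a transformation's `patterns` list might decide whether the transformation applies to a given URL, here is a minimal sketch. `findTransformation` and the `name` field are hypothetical and are not the library's API. The sample pattern escapes its dots (`\.`) so they match only a literal dot; in the README example the unescaped `.` matches any character.

```javascript
// Hypothetical helper: return the first transformation whose patterns match the URL.
function findTransformation (url, transformations) {
  return transformations.find(({ patterns }) =>
    patterns.some((re) => re.test(url))
  ) || null
}

const transformations = [
  {
    // optional subdomain, then the literal host "domain.tld"
    patterns: [/^https?:\/\/([\w-]+\.)?domain\.tld\//],
    name: 'domain.tld rules' // illustrative label only
  }
]

console.log(findTransformation('https://news.domain.tld/a/b', transformations).name) // domain.tld rules
console.log(findTransformation('https://example.com/', transformations)) // null
```

The patterns deliberately avoid the `g` flag, because `RegExp.prototype.test` on a global regex keeps `lastIndex` state between calls.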
17 changes: 17 additions & 0 deletions SECURITY.md
@@ -0,0 +1,17 @@
# Security Policy

## Supported Versions

Due to resource limitations, only the latest stable minor release receives bug fixes (including security fixes).

For example, if the latest stable version is 7.2.5, the 7.2.x line will still get security fixes, but older versions (such as 7.1.x) won't get any fixes.

The description above is a general rule and may be altered on a case-by-case basis.
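The policy reduces to a comparison of the major and minor components. A minimal sketch, assuming plain `major.minor.patch` strings without prerelease tags; `isSupported` is a hypothetical helper, not something the project ships.

```javascript
// Hypothetical check for the policy: a version is supported only if it is on
// the same major.minor line as the latest stable release.
function isSupported (version, latest) {
  const [major, minor] = version.split('.').map(Number)
  const [latestMajor, latestMinor] = latest.split('.').map(Number)
  return major === latestMajor && minor === latestMinor
}

console.log(isSupported('7.2.3', '7.2.5')) // true
console.log(isSupported('7.1.9', '7.2.5')) // false
```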

## Reporting a Vulnerability

You can report low severity vulnerabilities as GitHub issues.

More severe vulnerabilities should be reported by email to ndaidong@pwshub.com.

---
34 changes: 17 additions & 17 deletions dist/article-parser.esm.js

Large diffs are not rendered by default.

64 changes: 32 additions & 32 deletions dist/cjs/article-parser.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion dist/cjs/package.json
@@ -1,5 +1,5 @@
{
"name": "article-parser",
-"version": "7.2.4",
+"version": "7.2.5",
"main": "./article-parser.js"
}
15 changes: 8 additions & 7 deletions package.json
@@ -1,5 +1,5 @@
{
-"version": "7.2.4",
+"version": "7.2.5",
"name": "article-parser",
"description": "To extract main article from given URL",
"homepage": "https://demos.pwshub.com/article-parser",
@@ -33,10 +33,10 @@
},
"dependencies": {
"@mozilla/readability": "^0.4.2",
-"bellajs": "^11.0.7",
+"bellajs": "^11.1.1",
"cross-fetch": "^3.1.5",
-"linkedom": "^0.14.16",
-"sanitize-html": "^2.7.2",
+"linkedom": "^0.14.19",
+"sanitize-html": "^2.7.3",
"string-similarity": "^4.0.4"
},
"standard": {
@@ -47,9 +47,10 @@
},
"devDependencies": {
"@types/sanitize-html": "^2.6.2",
-"esbuild": "^0.15.9",
-"jest": "^29.0.3",
-"nock": "^13.2.9"
+"esbuild": "^0.15.13",
+"jest": "^29.3.1",
+"nock": "^13.2.9",
+"standard": "^17.0.0"
},
"keywords": [
"article",
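The dependency bumps above use npm caret ranges. Under npm's semver rules, `^11.1.1` accepts `>=11.1.1 <12.0.0`, while for a `0.x` base such as `^0.14.19` the caret only accepts `>=0.14.19 <0.15.0`. The sketch below implements that rule for plain `major.minor.patch` versions only (no prerelease tags or build metadata); `satisfiesCaret` is a hypothetical helper, and real tooling should rely on the `semver` package instead.

```javascript
// Simplified caret-range check for plain "x.y.z" versions (no prerelease tags).
function parse (v) {
  return v.split('.').map(Number)
}

// Negative if a < b, zero if equal, positive if a > b.
function compare (a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i]
  }
  return 0
}

function satisfiesCaret (version, range) {
  const base = parse(range.replace(/^\^/, ''))
  const v = parse(version)
  // Upper bound (exclusive): bump the left-most non-zero component of the base.
  let upper
  if (base[0] > 0) upper = [base[0] + 1, 0, 0]
  else if (base[1] > 0) upper = [0, base[1] + 1, 0]
  else upper = [0, 0, base[2] + 1]
  return compare(v, base) >= 0 && compare(v, upper) < 0
}

console.log(satisfiesCaret('11.2.0', '^11.1.1')) // true
console.log(satisfiesCaret('0.15.0', '^0.14.19')) // false
```

This is why a `0.x` dependency such as `linkedom` or `esbuild` only picks up patch-level updates automatically, while `bellajs` can move to any later `11.x` release.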
16 changes: 15 additions & 1 deletion src/utils/extractMetaData.js
@@ -41,7 +41,10 @@ export default (html) => {
'twitter:description'
]
const imageAttrs = [
+'image',
'og:image',
+'og:image:url',
+'og:image:secure_url',
'twitter:image',
'twitter:image:src'
]
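The hunk above widens the list of meta keys checked for a lead image. As a minimal sketch of how such a priority list could be applied, assume the meta tags have already been collected into `[key, content]` pairs (for example from each `<meta>` element's `property` or `name` attribute) and that keys are compared case-insensitively; the first key in list order with a non-empty value wins. `pickFirst` is hypothetical and not taken from `extractMetaData.js`, and resolving a relative value with the WHATWG `URL` constructor is an extra assumption.

```javascript
// Priority list copied from the new imageAttrs array in the diff.
const imageAttrs = [
  'image',
  'og:image',
  'og:image:url',
  'og:image:secure_url',
  'twitter:image',
  'twitter:image:src'
]

// Hypothetical helper: return the value of the first key (in priority order)
// that has a non-empty value among the collected [key, content] pairs.
function pickFirst (pairs, keys) {
  const found = new Map()
  for (const [key, content] of pairs) {
    const k = key.toLowerCase()
    const value = (content || '').trim()
    // keep only the first non-empty occurrence of each key
    if (value && !found.has(k)) found.set(k, value)
  }
  for (const key of keys) {
    if (found.has(key)) return found.get(key)
  }
  return ''
}

const pairs = [
  ['twitter:image', 'https://cdn.example.com/tw.jpg'],
  ['og:image:secure_url', '/img/cover.jpg']
]
const raw = pickFirst(pairs, imageAttrs)
// Assumption: resolve a relative image path against the page URL.
const image = new URL(raw, 'https://example.com/post/1').href
console.log(image) // https://example.com/img/cover.jpg
```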
@@ -57,7 +60,18 @@ export default (html) => {
'article:published_time',
'article:modified_time',
'og:updated_time',
-'datepublished'
+'dc.date',
+'dc.date.issued',
+'dc.date.created',
+'dc:created',
+'dcterms.date',
+'datepublished',
+'datemodified',
+'updated_time',
+'modified_time',
+'published_time',
+'release_date',
+'date'
]

const document = new DOMParser().parseFromString(html, 'text/html')
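With many more date keys in play, some matching tags may hold values that are not parseable dates. A minimal sketch, assuming the same `[key, content]` pair representation as above: walk the keys in priority order and return the first value that parses as a valid date, normalized to ISO 8601. `pickDate` and `dateKeys` are hypothetical names, and the key list is partial because the entries before `'article:published_time'` are not visible in the diff.

```javascript
// Keys visible in the hunk above; earlier entries of the array are not shown
// in the diff, so this list is partial.
const dateKeys = [
  'article:published_time', 'article:modified_time', 'og:updated_time',
  'dc.date', 'dc.date.issued', 'dc.date.created', 'dc:created', 'dcterms.date',
  'datepublished', 'datemodified', 'updated_time', 'modified_time',
  'published_time', 'release_date', 'date'
]

// Hypothetical helper: return the first value (in key priority order) that
// parses as a valid date, as an ISO 8601 string; '' if none does.
function pickDate (pairs, keys) {
  const byKey = new Map()
  for (const [key, content] of pairs) {
    const k = key.toLowerCase()
    if (!byKey.has(k)) byKey.set(k, (content || '').trim())
  }
  for (const key of keys) {
    const value = byKey.get(key)
    if (!value) continue
    const time = Date.parse(value)
    // skip unparseable values and fall through to the next key
    if (!Number.isNaN(time)) return new Date(time).toISOString()
  }
  return ''
}

const pairs = [
  ['date', '2022-11-13'],
  ['article:published_time', 'not a date'],
  ['dc.date', '2022-11-10T08:30:00Z']
]
console.log(pickDate(pairs, dateKeys)) // 2022-11-10T08:30:00.000Z
```

Note that `Date.parse` is only reliable for ISO 8601 input; other formats fall back to engine-specific parsing.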