Merged
5 changes: 4 additions & 1 deletion .github/workflows/ci.yaml
@@ -90,7 +90,9 @@ jobs:
if: matrix.os == 'ubuntu-latest'

- name: Run build
run: npx lerna run build --stream -- --platform
run: |
cargo build --release
npx lerna run build --stream -- --platform

- name: Upload artifact
uses: actions/upload-artifact@v2
@@ -196,6 +198,7 @@ jobs:
shell: bash
- name: Lerna publish
run: |
find ./packages/ -type d -maxdepth 1 -exec cp LICENSE {} \;
echo "//registry.npmjs.org/:_authToken=$NPM_TOKEN" >> ~/.npmrc
npx lerna publish from-package --no-verify-access --yes
env:
3 changes: 2 additions & 1 deletion Cargo.toml
@@ -1,4 +1,5 @@
[workspace]
members = [
"./packages/crc32"
"./packages/crc32",
"./packages/jieba"
]
File renamed without changes.
7 changes: 4 additions & 3 deletions README.md
@@ -16,6 +16,7 @@ Make rust crates binding to NodeJS use [napi-rs](https://github.com/Brooooooklyn

# Packages

| Package | Status | Description |
| ---------------------------------------------- | ------------------------------------------------------------------- | ------------------------------------------- |
| [`@node-rs/crc32`](./packages/crc32/README.md) | ![](https://github.com/Brooooooklyn/node-rs/workflows/CI/badge.svg) | Fastest `CRC32` implementation using `SIMD` |
| Package | Status | Description |
| ---------------------------------------------- | ------------------------------------------------------------------- | ---------------------------------------------------------- |
| [`@node-rs/crc32`](./packages/crc32/README.md) | ![](https://github.com/Brooooooklyn/node-rs/workflows/CI/badge.svg) | Fastest `CRC32` implementation using `SIMD` |
| [`@node-rs/jieba`](./packages/jieba/README.md) | ![](https://github.com/Brooooooklyn/node-rs/workflows/CI/badge.svg) | [`jieba-rs`](https://github.com/messense/jieba-rs) binding |
2 changes: 1 addition & 1 deletion package.json
@@ -8,7 +8,7 @@
"private": true,
"workspaces": ["packages/*"],
"scripts": {
"bench": "lerna run bench --stream --no-prefix",
"bench": "lerna run bench --concurrency 1 --stream --no-prefix",
"build:ts": "tsc -b tsconfig.project.json -verbose",
"lint": "eslint . -c ./.eslintrc.yml 'packages/**/*.{ts,js}'",
"test": "ava",
3 changes: 3 additions & 0 deletions packages/crc32/Cargo.toml
@@ -12,5 +12,8 @@ napi-rs = { version = "0.2" }
napi-rs-derive = { version = "0.1" }
crc32fast = "1.2"

[target.'cfg(unix)'.dependencies]
jemallocator = { version = "0.3", features = ["disable_initial_exec_tls"] }

[build-dependencies]
napi-build = { version = "0.1" }
16 changes: 8 additions & 8 deletions packages/crc32/README.md
@@ -11,11 +11,11 @@ The 4 tested implementations are:
- **js_crc32c** CRC-32C implemented in JavaScript
- **js_crc32** CRC-32 implemented in JavaScript, from [buffer-crc32](https://github.com/brianloveswords/buffer-crc32)
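
As a point of reference for what the JavaScript baselines compute, here is a minimal table-driven sketch of CRC-32 and CRC-32C. It is illustrative only, not the code of `@node-rs/crc32` or of any benchmarked package; `makeTable` and `update` are hypothetical names, and the optional `initial` argument assumes the common convention that passing a previously returned CRC continues the checksum, which is how the benchmark appears to use `initialCrc32`.

```javascript
// Build a 256-entry lookup table for a reflected 32-bit CRC polynomial.
function makeTable(poly) {
  const table = new Uint32Array(256)
  for (let n = 0; n < 256; n++) {
    let c = n
    for (let k = 0; k < 8; k++) {
      c = c & 1 ? poly ^ (c >>> 1) : c >>> 1
    }
    table[n] = c >>> 0
  }
  return table
}

const CRC32_TABLE = makeTable(0xedb88320) // CRC-32 (IEEE 802.3)
const CRC32C_TABLE = makeTable(0x82f63b78) // CRC-32C (Castagnoli)

// Assumption: `initial` is a previously returned CRC; passing it continues the checksum.
function update(table, buf, initial = 0) {
  let c = ~initial
  for (const byte of buf) {
    c = table[(c ^ byte) & 0xff] ^ (c >>> 8)
  }
  return ~c >>> 0
}

const crc32 = (buf, initial) => update(CRC32_TABLE, buf, initial)
const crc32c = (buf, initial) => update(CRC32C_TABLE, buf, initial)
```

For example, `crc32c(Buffer.from('123456789'))` returns `0xe3069283`, the standard CRC-32C check value, and hashing a buffer in two pieces with `crc32(part2, crc32(part1))` gives the same result as hashing it in one call.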

# Performance
## Performance

```bash
sse4_crc32c_node_rs for inputs 1024B x 5,108,123 ops/sec ±1.86% (89 runs sampled)
sse4_crc32c_node_rs for inputs 16931844B, avg 2066B x 271 ops/sec ±1.15% (85 runs sampled)
@node-rs/crc32 for inputs 1024B x 5,108,123 ops/sec ±1.86% (89 runs sampled)
@node-rs/crc32 for inputs 16931844B, avg 2066B x 271 ops/sec ±1.15% (85 runs sampled)
sse4_crc32c_hw for inputs 1024B x 3,543,443 ops/sec ±1.39% (93 runs sampled)
sse4_crc32c_hw for inputs 16931844B, avg 2066B x 209 ops/sec ±0.78% (76 runs sampled)
sse4_crc32c_sw for inputs 1024B x 1,460,284 ops/sec ±2.35% (90 runs sampled)
@@ -27,7 +27,7 @@ js_crc32 for inputs 16931844B, avg 2066B x 22.12 ops/sec ±5.20% (40 runs sample
+---------------------+-------------------+----------------------+
| │ 1024B │ 16931844B, avg 2066B |
+---------------------+-------------------+----------------------+
| sse4_crc32c_node_rs │ 5,108,123 ops/sec │ 271 ops/sec |
| @node-rs/crc32 │ 5,108,123 ops/sec │ 271 ops/sec |
+---------------------+-------------------+----------------------+
| sse4_crc32c_hw │ 3,543,443 ops/sec │ 209 ops/sec |
+---------------------+-------------------+----------------------+
@@ -39,13 +39,13 @@ js_crc32 for inputs 16931844B, avg 2066B x 22.12 ops/sec ±5.20% (40 runs sample
+---------------------+-------------------+----------------------+
```
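
To read the 1024B column as throughput, multiply ops/sec by the bytes hashed per operation. This sketch assumes each operation hashes exactly one 1024-byte buffer; `opsPerSec` and `gibPerSec` are illustrative names, and the figures are copied from the results above.

```javascript
// Assumption: one benchmark operation hashes exactly one 1024-byte input.
const BYTES_PER_OP = 1024
const GIB = 1024 ** 3

// ops/sec for the 1024B inputs, copied from the benchmark output
const opsPerSec = {
  '@node-rs/crc32': 5108123,
  sse4_crc32c_hw: 3543443,
  sse4_crc32c_sw: 1460284,
}

// Convert an ops/sec figure into GiB hashed per second.
function gibPerSec(ops, bytesPerOp = BYTES_PER_OP) {
  return (ops * bytesPerOp) / GIB
}

for (const [name, ops] of Object.entries(opsPerSec)) {
  console.info(`${name}: ${gibPerSec(ops).toFixed(2)} GiB/s`)
}
```

Under that assumption, `@node-rs/crc32` hashes about 4.87 GiB/s, versus about 3.38 GiB/s for `sse4_crc32c_hw` and 1.39 GiB/s for `sse4_crc32c_sw`.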

# Support matrix
## Support matrix

| | node 10 | node12 | node13 | node14 |
| ----------------- | ------- | ------ | ------ | ------ |
| Windows 64 latest | | ✅ | ✅ | ✅ |
| macOS latest | | ✅ | ✅ | ✅ |
| Linux | | ✅ | ✅ | ✅ |
| Windows 64 latest | | ✓ | ✓ | ✓ |
| macOS latest | | ✓ | ✓ | ✓ |
| Linux | | ✓ | ✓ | ✓ |

## API

12 changes: 6 additions & 6 deletions packages/crc32/benchmark/crc32.js
@@ -23,13 +23,13 @@ egestas tempus leo. Duis condimentum turpis duis.`)
const initialCrc32 = crc32Node(TEST_BUFFER)
const initialCrc32c = Sse4Crc32.calculate(TEST_BUFFER)

console.assert(crc32(TEST_BUFFER), initialCrc32)
console.assert(crc32c(TEST_BUFFER), initialCrc32c)
console.assert(crc32(TEST_BUFFER) === initialCrc32)
console.assert(crc32c(TEST_BUFFER) === initialCrc32c)

const suite = new Suite('crc32c without initial crc')

suite
.add('SIMD + NAPI', () => {
.add('@node-rs/crc32 crc32c', () => {
crc32c(TEST_BUFFER)
})
.add('sse4_crc32', () => {
@@ -46,7 +46,7 @@ suite
const suite2 = new Suite('crc32c with initial crc')

suite2
.add('SIMD + NAPI', () => {
.add('@node-rs/crc32 crc32c', () => {
crc32c(TEST_BUFFER, initialCrc32c)
})
.add('sse4_crc32', () => {
@@ -63,7 +63,7 @@ suite2
const suite3 = new Suite('crc32 without initial crc')

suite3
.add('SIMD + NAPI', () => {
.add('@node-rs/crc32 crc32', () => {
crc32(TEST_BUFFER)
})
.add('Node crc', () => {
@@ -80,7 +80,7 @@ suite3
const suite4 = new Suite('crc32 with initial crc')

suite4
.add('SIMD + NAPI', () => {
.add('@node-rs/crc32 crc32', () => {
crc32(TEST_BUFFER, initialCrc32)
})
.add('Node crc32', () => {
4 changes: 2 additions & 2 deletions packages/crc32/package.json
@@ -19,8 +19,8 @@
},
"scripts": {
"bench": "cross-env NODE_ENV=production node benchmark/crc32.js",
"build": "cargo build --release && napi --release ./crc32",
"build:debug": "cargo build && napi ./index"
"build": "napi --release ./crc32",
"build:debug": "napi ./index"
},
"bugs": {
"url": "https://github.com/Brooooooklyn/node-rs/issues"
8 changes: 6 additions & 2 deletions packages/crc32/src/lib.rs
@@ -8,6 +8,10 @@ use crc32fast::Hasher;
use napi::{Buffer, CallContext, Env, Number, Object, Result, Value};
use std::convert::TryInto;

#[cfg(unix)]
#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

mod bytes;
mod crc32;
mod crc32_table;
@@ -24,7 +28,7 @@ fn init<'env>(
}

#[js_function(2)]
fn crc32c<'a>(ctx: CallContext<'a>) -> Result<Value<'a, Number>> {
fn crc32c(ctx: CallContext) -> Result<Value<Number>> {
let input_data = ctx.get::<Buffer>(0)?;
let init_state = ctx.get::<Number>(1);
let result = if init_state.is_ok() {
@@ -36,7 +40,7 @@ fn crc32c<'a>(ctx: CallContext<'a>) -> Result<Value<'a, Number>> {
}

#[js_function(2)]
fn crc32<'a>(ctx: CallContext<'a>) -> Result<Value<'a, Number>> {
fn crc32(ctx: CallContext) -> Result<Value<Number>> {
let input_data = ctx.get::<Buffer>(0)?;
let init_state = ctx.get::<Number>(1);
let mut hasher = if init_state.is_ok() {
2 changes: 1 addition & 1 deletion packages/helper/package.json
@@ -12,7 +12,7 @@
"registry": "https://registry.npmjs.org/",
"access": "public"
},
"files": ["lib"],
"files": ["lib", "LICENSE"],
"repository": {
"type": "git",
"url": "git+https://github.com/Brooooooklyn/node-rs.git"
20 changes: 20 additions & 0 deletions packages/jieba/Cargo.toml
@@ -0,0 +1,20 @@
[package]
name = "node-rs-jieba"
version = "0.1.0"
authors = ["LongYinan <lynweklm@gmail.com>"]
edition = "2018"

[lib]
crate-type = ["cdylib"]

[dependencies]
jieba-rs = { version = "0.4", features = ["default-dict", "tfidf", "textrank"] }
napi-rs = { version = "0.2" }
napi-rs-derive = { version = "0.1" }
once_cell = "1.3"

[target.'cfg(unix)'.dependencies]
jemallocator = { version = "0.3", features = ["disable_initial_exec_tls"] }

[build-dependencies]
napi-build = { version = "0.1" }
64 changes: 64 additions & 0 deletions packages/jieba/README.md
@@ -0,0 +1,64 @@
# `@node-rs/jieba`

![](https://github.com/Brooooooklyn/node-rs/workflows/CI/badge.svg)

[jieba-rs](https://github.com/messense/jieba-rs) binding to NodeJS

## Without node-gyp

`@node-rs/jieba` ships as a prebuilt binary, so you don't need to fight with `node-gyp` and C++ toolchains.

## Performance

Because [jieba-rs is 33% faster than cppjieba](https://blog.paulme.ng/posts/2019-06-30-optimizing-jieba-rs-to-be-33percents-faster-than-cppjieba.html) and N-API is faster than the `v8` C++ API, `@node-rs/jieba` is faster than `nodejieba` in the benchmarks below.

```bash
@node-rs/jieba x 3,763 ops/sec ±1.18% (92 runs sampled)
nodejieba x 2,783 ops/sec ±0.67% (91 runs sampled)
Cut 1184 words bench suite: Fastest is @node-rs/jieba

@node-rs/jieba x 16.10 ops/sec ±1.58% (44 runs sampled)
nodejieba x 9.81 ops/sec ±2.39% (29 runs sampled)
Cut 246568 words bench suite: Fastest is @node-rs/jieba

@node-rs/jieba x 1,739 ops/sec ±0.87% (92 runs sampled)
nodejieba x 931 ops/sec ±1.31% (89 runs sampled)
Tag 1184 words bench suite: Fastest is @node-rs/jieba

@node-rs/jieba x 6.19 ops/sec ±2.01% (20 runs sampled)
nodejieba x 3.06 ops/sec ±5.39% (12 runs sampled)
Tag 246568 words bench suite: Fastest is @node-rs/jieba
```
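
The relative speedup in each suite is the ratio of the two ops/sec figures. A small sketch using the numbers above; the `results` array and `speedup` function are illustrative, not part of the package, and because each figure carries the ± margin shown, the ratios are approximate.

```javascript
// ops/sec figures copied from the benchmark output above
const results = [
  { suite: 'Cut 1184 words', napi: 3763, nodejieba: 2783 },
  { suite: 'Cut 246568 words', napi: 16.1, nodejieba: 9.81 },
  { suite: 'Tag 1184 words', napi: 1739, nodejieba: 931 },
  { suite: 'Tag 246568 words', napi: 6.19, nodejieba: 3.06 },
]

// How many times more operations per second @node-rs/jieba completed.
function speedup({ napi, nodejieba }) {
  return napi / nodejieba
}

for (const r of results) {
  console.info(`${r.suite}: ${speedup(r).toFixed(2)}x`)
}
```

This prints ratios of roughly 1.35x and 1.64x for the two cut suites, and 1.87x and 2.02x for the two tag suites.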

## Support matrix

|                   | node 10 | node 12 | node 13 | node 14 |
| ----------------- | ------- | ------- | ------- | ------- |
| Windows 64 latest | ✓       | ✓       | ✓       | ✓       |
| macOS latest      | ✓       | ✓       | ✓       | ✓       |
| Linux             | ✓       | ✓       | ✓       | ✓       |

## Usage

```javascript
const { load, cut } = require('@node-rs/jieba')

load()

cut('我们中出了一个叛徒', false)

// ["我们", "中", "出", "了", "一个", "叛徒"]
```

```javascript
const { load, extract } = require('@node-rs/jieba')

load()

extract(
'今天纽约的天气真好啊,京华大酒店的张尧经理吃了一只北京烤鸭。后天纽约的天气不好,昨天纽约的天气也不好,北京烤鸭真好吃',
3,
)

// ["北京烤鸭", "纽约", "天气"]
```
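
`loadDict`, exercised in `__tests__/userdict.spec.ts`, takes a `Buffer` of dictionary lines such as `出了 10000`. The helper below is a hypothetical convenience, not part of `@node-rs/jieba`; it assumes the jieba dictionary convention of one entry per line, with the word, an optional frequency, and an optional part-of-speech tag separated by spaces.

```javascript
// Hypothetical helper (not exported by @node-rs/jieba): serialize entries into
// the assumed jieba user-dict format, one "word [freq] [tag]" entry per line.
function buildUserDict(entries) {
  const lines = entries.map(({ word, freq, tag }) => {
    // Fields are space-separated, so a word containing whitespace would be misparsed.
    if (!word || /\s/.test(word)) {
      throw new TypeError(`invalid dictionary word: ${JSON.stringify(word)}`)
    }
    // A tag without a frequency would be read as the frequency field.
    if (tag !== undefined && freq === undefined) {
      throw new TypeError(`entry ${JSON.stringify(word)} has a tag but no frequency`)
    }
    const fields = [word]
    if (freq !== undefined) fields.push(String(freq))
    if (tag !== undefined) fields.push(tag)
    return fields.join(' ')
  })
  return Buffer.from(lines.join('\n'), 'utf8')
}
```

With it, the dictionary from the test could be written as `loadDict(buildUserDict([{ word: '出了', freq: 10000 }]))`.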
24 changes: 24 additions & 0 deletions packages/jieba/__tests__/jieba.spec.ts
@@ -0,0 +1,24 @@
import test from 'ava'
import * as nodejieba from 'nodejieba'

import { cut, tag, extract } from '../index'

const sentence = '我是拖拉机学院手扶拖拉机专业的。不用多久,我就会升职加薪,走上人生巅峰。'

test('cut result should be equal to nodejieba', (t) => {
t.deepEqual(cut(sentence).join(''), nodejieba.cut(sentence).join(''))
})

test('tag result should be equal to nodejieba', (t) => {
t.deepEqual(tag(sentence), nodejieba.tag(sentence))
})

test('extract should be equal to nodejieba', (t) => {
const sentence =
'今天纽约的天气真好啊,京华大酒店的张尧经理吃了一只北京烤鸭。后天纽约的天气不好,昨天纽约的天气也不好,北京烤鸭真好吃'
const topn = 3
t.deepEqual(
extract(sentence, topn),
nodejieba.extract(sentence, topn).map((keyword) => keyword.word),
)
})
8 changes: 8 additions & 0 deletions packages/jieba/__tests__/tag.spec.ts
@@ -0,0 +1,8 @@
import test from 'ava'

import { load } from '../index'

test('should be able to load', (t) => {
const fn = () => load()
t.notThrows(fn)
})
9 changes: 9 additions & 0 deletions packages/jieba/__tests__/userdict.spec.ts
@@ -0,0 +1,9 @@
import test from 'ava'
import { loadDict, cut } from '../index'

test('should be able to load custom dict', (t) => {
const userdict = Buffer.from('出了 10000')
loadDict(userdict)
const fixture = '我们中出了一个叛徒'
t.deepEqual(cut(fixture), ['我们', '中', '出了', '一个', '叛徒'])
})
51 changes: 51 additions & 0 deletions packages/jieba/benchmark/jieba.js
@@ -0,0 +1,51 @@
const { Suite } = require('benchmark')
const nodejieba = require('nodejieba')
const chalk = require('chalk')
const fs = require('fs')
const { join } = require('path')

const { load, cut, tag } = require('../index')

load()
nodejieba.load()

const fixture = fs.readFileSync(join(__dirname, 'weicheng.txt'), 'utf8')

const preface = `
重印前记《围城》一九四七年在上海初版,一九四八年再版,一九四九年三版,以后国内没有重印过。偶然碰见它的新版,那都是香港的“盗印”本。没有看到台湾的“盗印”,据说在那里它是禁书。美国哥伦比亚大学夏志清教授的英文著作里对它作了过高的评价,导致了一些西方语言的译本。日本京都大学荒井健教授很久以前就通知我他要翻译,近年来也陆续在刊物上发表了译文。现在,人民文学出版社建议重新排印,以便原著在国内较易找着,我感到意外和忻辛。
我写完《围城》,就对它不很满意。出版了我现在更不满意的一本文学批评以后,我抽空又长篇小说,命名《百合心》,也脱胎于法文成语(Iecoeurd“artichaut),中心人物是一个女角。大约已写成了两万字。一九四九年夏天,全家从上海迁居北京,手忙脚乱中,我把一叠看来像乱纸的草稿扔到不知哪里去了。兴致大扫,一直没有再鼓起来,倒也从此省心省事。年复一年,创作的冲动随年衰减,创作的能力逐渐消失——也许两者根本上是一回事,我们常把自己的写作冲动误认为自己的写作才能,自以为要写就意味着会写。相传幸运女神偏向着年轻小伙子,料想文艺女神也不会喜欢老头儿的;不用说有些例外,而有例外正因为有公例。我慢慢地从省心进而收心,不作再写小说的打算。事隔三十余年,我也记不清楚当时腹稿里的人物和情节。就是追忆清楚了,也还算不得数,因为开得出菜单并不等于摆得成酒席,要不然,谁都可以马上称为善做菜的名厨师又兼大请客的阔东道主了,秉承曹雪芹遗志而拟定”后四十回“提纲的学者们也就可以凑得成和的得上一个或半个高鹗了。剩下来的只是一个顽固的信念:假如《百合心》写得成,它会比《围城》好一点。事情没有做成的人老有这类根据不充分的信念;我们对采摘不到的葡萄,不但想像它酸,也很可能想像它是分外地甜。
这部书禄版时的校读很草率,留下不少字句和标点的脱误,就无意中为翻译者安置了拦路石和陷阱。我乘重印的机会,校看一遍,也顺手有节制地修必了一些字句。《序》里删去一节,这一节原是郑西谛先生要我添进去的。在去年美国出版的珍妮·凯利(JeanneKelly)女士和茅国权(NathanK.Mao)先生的英译本里,那一节已省去了。
一九八0年二月这本书第二次印刷,我又改正了几个错字。两次印刷中,江秉祥同志给了技术上和艺术上的帮助,特此志谢。
一九八一年二月我乘第三次印刷的机会,修订了一些文字。有两处多年朦混过去的讹误,是这本书的德译者莫妮克(MonikaMotsch)博士发觉的。
一九八二年十二月为了塞尔望——许来伯(SylvieServan-Schreiber)女士的法语译本,我去年在原书里又校正了几外错漏,也修改了几处词句。恰好这本书又要第次印刷,那些改正就可以安插了。苏联索洛金(V.Sorokin)先生去年提醒我,他的俄译本比原著第一次重印本早问世五个月,我也借此带便提一下。
`

const prefaceLength = preface.length

function createBench(suitename, transform, napi, jieba, input) {
const cutSuite = new Suite(suitename)
console.assert(transform(napi(input)) === transform(jieba(input)))

cutSuite
.add('@node-rs/jieba', () => {
napi(input)
})
.add('nodejieba', () => {
jieba(input)
})
.on('cycle', function (event) {
console.info(String(event.target))
})
.on('complete', function () {
console.info(`${this.name} bench suite: Fastest is ${chalk.green(this.filter('fastest').map('name'))}`)
})
.run()
}

createBench(`Cut ${prefaceLength} words`, (output) => output.join(''), cut, nodejieba.cut, preface)

createBench(`Cut ${fixture.toString().length} words`, (output) => output.join(''), cut, nodejieba.cut, fixture)

createBench(`Tag ${prefaceLength} words`, (output) => typeof output, tag, nodejieba.tag, preface)

createBench(`Tag ${fixture.toString().length} words`, (output) => typeof output, tag, nodejieba.tag, fixture)