jieba-php 0.25
fukuball committed Feb 16, 2016
1 parent 3e3d0f2 commit 40c0834
Showing 16 changed files with 638,723 additions and 639 deletions.
126 changes: 125 additions & 1 deletion README.md
@@ -6,7 +6,9 @@ jieba-php
[![Codacy Badge](https://api.codacy.com/project/badge/grade/9360ebe8fc2d47d8a64f49f57d2f016f)](https://www.codacy.com/app/fukuball/jieba-php)
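[![Made with Love](https://img.shields.io/badge/made%20with-%e2%9d%a4-ff69b4.svg)](http://www.fukuball.com)

"Jieba" Chinese word segmentation: built to be the best PHP Chinese word segmentation component. The current port tracks jieba-0.20; it will be upgraded gradually, and performance still needs improvement. Interested developers are welcome to join development! For the Python version, see [fxsjy/jieba](https://github.com/fxsjy/jieba)
"Jieba" Chinese word segmentation: built to be the best PHP Chinese word segmentation component. The current port tracks jieba-0.25; it will be upgraded gradually, and performance still needs improvement. Interested developers are welcome to join development! For the Python version, see [fxsjy/jieba](https://github.com/fxsjy/jieba)

Traditional Chinese is now supported! Just switch the dictionary to `big` mode.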

"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best PHP Chinese word segmentation module.

@@ -476,6 +478,67 @@ array(21) {
}
```

Function 5): Switch to the Traditional Chinese dictionary
==============

Code example (Tutorial)

```php
ini_set('memory_limit', '1024M');

require_once dirname(dirname(__FILE__))."/vendor/multi-array/MultiArray.php";
require_once dirname(dirname(__FILE__))."/vendor/multi-array/Factory/MultiArrayFactory.php";
require_once dirname(dirname(__FILE__))."/class/Jieba.php";
require_once dirname(dirname(__FILE__))."/class/Finalseg.php";
use Fukuball\Jieba\Jieba;
use Fukuball\Jieba\Finalseg;
Jieba::init(array('mode'=>'default','dict'=>'big'));
Finalseg::init();

$seg_list = Jieba::cut("怜香惜玉也得要看对象啊!");
var_dump($seg_list);

$seg_list = Jieba::cut("憐香惜玉也得要看對象啊!");
var_dump($seg_list);
```

Output:

```php
array(7) {
[0]=>
string(12) "怜香惜玉"
[1]=>
string(3) "也"
[2]=>
string(3) "得"
[3]=>
string(3) "要"
[4]=>
string(3) "看"
[5]=>
string(6) "对象"
[6]=>
string(3) "啊"
}
array(7) {
[0]=>
string(12) "憐香惜玉"
[1]=>
string(3) "也"
[2]=>
string(3) "得"
[3]=>
string(3) "要"
[4]=>
string(3) "看"
[5]=>
string(6) "對象"
[6]=>
string(3) "啊"
}
```

FAQ
========
1) How is the model data generated? https://github.com/fxsjy/jieba/issues/7
@@ -920,6 +983,67 @@ array(21) {
}
```

Function 5): Use Traditional Chinese
==============

Example (Tutorial)

```php
ini_set('memory_limit', '1024M');

require_once dirname(dirname(__FILE__))."/vendor/multi-array/MultiArray.php";
require_once dirname(dirname(__FILE__))."/vendor/multi-array/Factory/MultiArrayFactory.php";
require_once dirname(dirname(__FILE__))."/class/Jieba.php";
require_once dirname(dirname(__FILE__))."/class/Finalseg.php";
use Fukuball\Jieba\Jieba;
use Fukuball\Jieba\Finalseg;
Jieba::init(array('mode'=>'default','dict'=>'big'));
Finalseg::init();

$seg_list = Jieba::cut("怜香惜玉也得要看对象啊!");
var_dump($seg_list);

$seg_list = Jieba::cut("憐香惜玉也得要看對象啊!");
var_dump($seg_list);
```

Output:

```php
array(7) {
[0]=>
string(12) "怜香惜玉"
[1]=>
string(3) "也"
[2]=>
string(3) "得"
[3]=>
string(3) "要"
[4]=>
string(3) "看"
[5]=>
string(6) "对象"
[6]=>
string(3) "啊"
}
array(7) {
[0]=>
string(12) "憐香惜玉"
[1]=>
string(3) "也"
[2]=>
string(3) "得"
[3]=>
string(3) "要"
[4]=>
string(3) "看"
[5]=>
string(6) "對象"
[6]=>
string(3) "啊"
}
```
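
The `string(12)` and `string(3)` values in the dump above are byte counts, not character counts: `var_dump` reports byte length, and each of these CJK characters occupies three bytes in UTF-8. A short sketch to check this, assuming the source file is saved as UTF-8 (the variable names are illustrative only):

```php
<?php
// Assumes this file is saved as UTF-8.
$word = "憐香惜玉";

// strlen() counts bytes, which is the number var_dump() prints.
$bytes = strlen($word);

// Splitting on the empty pattern with the /u modifier yields one
// array element per UTF-8 character.
$chars = count(preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY));

echo $bytes, " bytes, ", $chars, " characters\n"; // 12 bytes, 4 characters
```

This is why a four-character token such as "憐香惜玉" shows up as `string(12)` in the output.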

Part-of-speech tags
==============
```
2 changes: 1 addition & 1 deletion composer.json
@@ -4,7 +4,7 @@
"description": "結巴中文分詞(PHP 版本):做最好的 PHP 中文分詞、中文斷詞組件",
"keywords": ["Jieba", "PHP"],
"license": "MIT",
"version": "0.24",
"version": "0.25",
"authors": [
{
"name": "fukuball",
2 changes: 2 additions & 0 deletions src/class/Jieba.php
@@ -59,6 +59,8 @@ public static function init($options = array())

if ($options['dict']=='small') {
$f_name = "dict.small.txt";
} else if ($options['dict']=='big') {
$f_name = "dict.big.txt";
} else {
$f_name = "dict.txt";
}
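
The branch above maps the `dict` option to a dictionary file name. A minimal standalone sketch of that mapping follows; `resolveDictFile` is a hypothetical helper written for illustration and is not part of jieba-php. In the branch as shown, any unrecognized value, including a misspelling such as `samll`, falls through to `dict.txt`.

```php
<?php
// Hypothetical helper mirroring the dict-selection branch in the diff.
// Illustration only; jieba-php does not provide this function.
function resolveDictFile(array $options = array())
{
    // Assumption: a missing 'dict' key behaves like the default dictionary.
    $dict = isset($options['dict']) ? $options['dict'] : 'normal';

    if ($dict === 'small') {
        return 'dict.small.txt';
    } elseif ($dict === 'big') {
        return 'dict.big.txt';
    }

    // Any other value, including typos, selects the default file.
    return 'dict.txt';
}

echo resolveDictFile(array('dict' => 'big')), "\n";   // dict.big.txt
echo resolveDictFile(array('dict' => 'samll')), "\n"; // dict.txt
```

The silent fallback is worth keeping in mind when passing options: a typo in the `dict` value does not raise an error, it just loads the default dictionary.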
21 changes: 19 additions & 2 deletions src/cmd/demo.php
@@ -12,31 +12,48 @@
* @version GIT: <fukuball/jieba-php>
* @link https://github.com/fukuball/jieba-php
*/
ini_set('memory_limit', '600M');
ini_set('memory_limit', '1024M');

require_once dirname(dirname(__FILE__))."/vendor/multi-array/MultiArray.php";
require_once dirname(dirname(__FILE__))."/vendor/multi-array/Factory/MultiArrayFactory.php";
require_once dirname(dirname(__FILE__))."/class/Jieba.php";
require_once dirname(dirname(__FILE__))."/class/Finalseg.php";
use Fukuball\Jieba\Jieba;
use Fukuball\Jieba\Finalseg;
Jieba::init(array('mode'=>'test','dict'=>'samll'));
Jieba::init(array('mode'=>'test','dict'=>'big'));
Finalseg::init();

$seg_list = Jieba::cut("怜香惜玉也得要看对象啊!");
var_dump($seg_list);

$seg_list = Jieba::cut("憐香惜玉也得要看對象啊!");
var_dump($seg_list);

echo "Full Mode: \n";
$seg_list = Jieba::cut("我来到北京清华大学", true);
var_dump($seg_list);

echo "Full Mode: \n";
$seg_list = Jieba::cut("我來到北京清華大學", true);
var_dump($seg_list);

echo "Default Mode: \n";
$seg_list = Jieba::cut("我来到北京清华大学", false);
var_dump($seg_list);

echo "Default Mode: \n";
$seg_list = Jieba::cut("我來到北京清華大學", false);
var_dump($seg_list);

$seg_list = Jieba::cut("他来到了网易杭研大厦");
var_dump($seg_list);

$seg_list = Jieba::cut("他來到了網易杭研大廈");
var_dump($seg_list);

$seg_list = Jieba::cutForSearch("小明硕士毕业于中国科学院计算所,后在日本京都大学深造");
var_dump($seg_list);

$seg_list = Jieba::cutForSearch("小明碩士畢業于中國科學院計算所,後在日本京都大學深造");
var_dump($seg_list);
?>
4 changes: 2 additions & 2 deletions src/cmd/demo_extract_tags.php
@@ -12,7 +12,7 @@
* @version GIT: <fukuball/jieba-php>
* @link https://github.com/fukuball/jieba-php
*/
ini_set('memory_limit', '600M');
ini_set('memory_limit', '1024M');

require_once dirname(dirname(__FILE__))."/vendor/multi-array/MultiArray.php";
require_once dirname(dirname(__FILE__))."/vendor/multi-array/Factory/MultiArrayFactory.php";
@@ -22,7 +22,7 @@
use Fukuball\Jieba\Jieba;
use Fukuball\Jieba\Finalseg;
use Fukuball\Jieba\JiebaAnalyse;
Jieba::init(array('mode'=>'test','dict'=>'samll'));
Jieba::init(array('mode'=>'test','dict'=>'big'));
Finalseg::init();
JiebaAnalyse::init();

14 changes: 10 additions & 4 deletions src/cmd/demo_posseg.php
@@ -12,7 +12,7 @@
* @version GIT: <fukuball/jieba-php>
* @link https://github.com/fukuball/jieba-php
*/
ini_set('memory_limit', '600M');
ini_set('memory_limit', '1024M');

require_once dirname(dirname(__FILE__))."/vendor/multi-array/MultiArray.php";
require_once dirname(dirname(__FILE__))."/vendor/multi-array/Factory/MultiArrayFactory.php";
@@ -22,13 +22,19 @@
use Fukuball\Jieba\Jieba;
use Fukuball\Jieba\Finalseg;
use Fukuball\Jieba\Posseg;
Jieba::init();
Jieba::init(array('mode'=>'test','dict'=>'big'));
Finalseg::init();
Posseg::init();

$seg_list = Posseg::cut("这是一个伸手不见五指的黑夜。我叫孙悟空,我爱北京,我爱Python和C++。");
var_dump($seg_list);

//$seg_list = Posseg::posTagReadable($seg_list);
//var_dump($seg_list);
$seg_list = Posseg::posTagReadable($seg_list);
var_dump($seg_list);

$seg_list = Posseg::cut("這是一個伸手不見五指的黑夜。我叫孫悟空,我愛北京,我愛Python和C++");
var_dump($seg_list);

$seg_list = Posseg::posTagReadable($seg_list);
var_dump($seg_list);
?>
6 changes: 3 additions & 3 deletions src/cmd/gen_dict_json.php
@@ -18,7 +18,7 @@
require_once dirname(dirname(__FILE__))."/vendor/multi-array/Factory/MultiArrayFactory.php";
use Fukuball\Tebru\MultiArray;

$content = fopen(dirname(dirname(__FILE__))."/dict/dict.txt", "r");
$content = fopen(dirname(dirname(__FILE__))."/dict/dict.big.txt", "r");

$trie = new MultiArray(array());

@@ -39,6 +39,6 @@

}

file_put_contents(dirname(dirname(__FILE__))."/dict/dict.txt.json", json_encode($trie->storage));
file_put_contents(dirname(dirname(__FILE__))."/dict/dict.txt.cache.json", json_encode($trie->cache));
file_put_contents(dirname(dirname(__FILE__))."/dict/dict.big.txt.json", json_encode($trie->storage));
file_put_contents(dirname(dirname(__FILE__))."/dict/dict.big.txt.cache.json", json_encode($trie->cache));
?>
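
The script above reads a dictionary file, builds a trie, and writes it out as JSON, but the loop body is elided in this diff. The sketch below shows the general idea with plain nested arrays rather than the `MultiArray` class the script uses. The sample lines (including their frequencies), the assumed "word frequency tag" line format, and the empty-string end-of-word marker are all assumptions made for illustration, not details taken from this commit.

```php
<?php
// Illustration only: a character trie built from plain nested arrays,
// not the MultiArray-based structure used by gen_dict_json.php.
// Assumption: each dictionary line looks like "word frequency tag".
// The entries below are made-up sample data.
$lines = array(
    "北京 100 ns",
    "北京大學 10 nt",
);

$trie = array();
foreach ($lines as $line) {
    $parts = explode(" ", trim($line));
    $word = $parts[0];

    // Split the word into individual UTF-8 characters.
    $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

    // Walk down the trie by reference, creating nodes as needed.
    $node = &$trie;
    foreach ($chars as $char) {
        if (!isset($node[$char])) {
            $node[$char] = array();
        }
        $node = &$node[$char];
    }

    // Assumed convention: an empty-string key marks a complete word.
    $node[''] = '';
    unset($node); // drop the reference before the next word
}

// JSON_UNESCAPED_UNICODE keeps the characters readable in the output.
echo json_encode($trie, JSON_UNESCAPED_UNICODE), "\n";
```

Because "北京" is a prefix of "北京大學", both words share the first two levels of the trie, which is what makes prefix lookups cheap during segmentation.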