-
Notifications
You must be signed in to change notification settings - Fork 10
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
4 changed files
with
230 additions
and
27 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,198 @@ | ||
# Getting started | ||
|
||
## Installation | ||
|
||
### Installing Python and pip | ||
|
||
Before installing Kinoko, you need to make sure you have Python and `pip` | ||
– the Python package manager – up and running. You can verify if you're already | ||
good to go with the following commands: | ||
|
||
``` sh | ||
python --version | ||
# Python 2.7.13 or above | ||
pip --version | ||
# pip 9.0.1 or above | ||
``` | ||
|
||
Or else, you can install them by either of following | ||
|
||
- [pyenv](https://github.com/pyenv/pyenv#homebrew-on-macos) | ||
```bash | ||
curl https://pyenv.run | bash | ||
``` | ||
- [Anaconda](https://www.anaconda.com/distribution/) | ||
```bash | ||
curl https://repo.anaconda.com/archive/Anaconda3-2019.03-MacOSX-x86_64.sh | bash | ||
``` | ||
|
||
### Installing Kinoko | ||
|
||
using either the PyPI repo or directly from GitHub | ||
|
||
- official PyPI repo | ||
```bash | ||
pip install kinoko | ||
``` | ||
- directly from GitHub code | ||
```bash | ||
pip install git+https://github.com/koyo922/kinoko@master # or other branch | ||
``` | ||
|
||
??? tip "Speedup in mainland China" | ||
consider using Aliyun mirror of PyPI for speed up | ||
```bash | ||
pip install -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com kinoko | ||
``` | ||
|
||
## Usage | ||
|
||
A few demos below: | ||
|
||
### chasing HTTP-redirection(3xx) | ||
|
||
```bash | ||
$ chaseurl --help | ||
Usage: | ||
chase_url [options] | ||
|
||
Options: | ||
-i INPUT --input=INPUT input file [default: /dev/stdin] | ||
BETTER TO USE FILE THAN PIPE, for a meaningful progressbar | ||
-o OUTPUT --output=OUTPUT output file [default: /dev/stdout] | ||
-m MAX_DEPTH --max_depth=MAX_DEPTH max depth of redirection [default: 5] | ||
-t TEMPLATE --template=TEMPLATE output template [default: {n_jumps} {url} {tgt_url}] | ||
supported elements: (n_jumps, url, tgt_url, all_jumps, exception) | ||
NOTE: curly braces are needed, <tab> need to be bash-escaped via $'\t' | ||
|
||
$ echo 'http://www.jingdong.com' | chaseurl 2>/dev/null | ||
2 http://www.jingdong.com https://www.jd.com/ | ||
``` | ||
|
||
```bash | ||
$ echo -e 'http://www.jingdong.com\nhttp://www.baidu.com' > url.txt | ||
$ chaseurl -i url.txt -o 'redir.txt' --template '{tgt_url}' | ||
$ cat redir.txt | ||
https://www.jd.com/ | ||
http://www.baidu.com | ||
``` | ||
|
||
!!! note | ||
|
||
- `#!bash 2>/dev/null` suppress the progress bar output | ||
- default output template is `'{n_jumps}\t{url}\t{tgt_url}'` | ||
|
||
### get `logger` object | ||
|
||
```python tab='script.py' | ||
from kinoko.misc.log_writer import init_log | ||
logger = init_log(__name__) | ||
... | ||
logger.info('msg: %s', msg) | ||
``` | ||
|
||
```bash tab='call from console' | ||
LEVEL=DEBUG python script.py | ||
# default LEVEL=INFO | ||
``` | ||
|
||
!!! caution | ||
The default logger in python `logging` module is not multiprocess-rotation-safe; | ||
we are planing to fix it in version 1.1.0 | ||
|
||
### bash utils | ||
|
||
```bash | ||
$ colormsg "some message" WARNING # default LEVEL=INFO | ||
==> some message # in yellow color | ||
``` | ||
|
||
!!! warning | ||
`colormsg` does not work on Mac Bash | ||
|
||
```bash | ||
# turn on 64GB of virtual memory at /home/work/swap | ||
vmem.sh -a on -s 64 -f /home/work/swap | ||
``` | ||
|
||
### csv utils | ||
|
||
aggregate a tsv/csv file | ||
|
||
```bash | ||
cat <<'EOF' > infile.tsv | ||
わ わ 笑 14614975 | ||
わら わ 笑 1000 | ||
で で で 11270299 | ||
で で で 1000 | ||
が が が 11097238 | ||
EOF | ||
|
||
# aggregation by first 3 columns, summing the last column | ||
aggtsv --infile infile.tsv --sep $'\t' \ | ||
-k 0 1 2 -r 3 -a sum | ||
わ わ 笑 14614975 | ||
わら わ 笑 1000 | ||
で で で 11271299 | ||
が が が 11097238 | ||
``` | ||
|
||
patch a tsv file via one or more reference files | ||
|
||
```bash | ||
$ cat <<'EOF' > ref.tsv | ||
jiaose 角色 juese | ||
xxx 色情词 <DEL> | ||
EOF | ||
|
||
$ cat <<'EOF' > in.tsv | ||
field1 field2 角色 jiaose field4 | ||
field1 field2 角色 jiaose field4 | ||
field1 field2 色情词 xxx field4 | ||
EOF | ||
|
||
$ patchtsv -r ref.tsv -d $'\t' \ | ||
-i in.tsv -o out.tsv \ | ||
-k 3 2 -v 3 | ||
$ cat out.tsv | ||
field1 field2 角色 juese field4 | ||
field1 field2 角色 juese field4 | ||
``` | ||
|
||
### functional utils | ||
|
||
sliding window of any sequence | ||
|
||
```python | ||
>>> from kinoko.func import sliding | ||
>>> for grp in sliding(range(10), size=5 , step=3): | ||
... print(grp) | ||
... | ||
[0, 1, 2, 3, 4] | ||
[3, 4, 5, 6, 7] | ||
|
||
>>> for grp in sliding(range(10), size=5 , step=3, skip_non_full=False): | ||
... print(grp) | ||
... | ||
[0, 1, 2, 3, 4] | ||
[3, 4, 5, 6, 7] | ||
[6, 7, 8, 9] | ||
[9] | ||
``` | ||
|
||
C-equivalent static vars of function | ||
|
||
```python | ||
>>> from kinoko.func import static_vars | ||
>>> @static_vars(counter=0) | ||
... def foo(): | ||
... foo.counter += 1 | ||
... print(foo.counter * 10) | ||
... | ||
... foo() | ||
10 | ||
... foo() | ||
20 | ||
... foo() | ||
30 | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,27 @@ | ||
# Welcome to MkDocs | ||
# Kinoko for Text | ||
|
||
For full documentation visit [mkdocs.org](http://mkdocs.org). | ||
## Speed up your text workflow | ||
|
||
## Commands | ||
Kinoko is a Python/Bash package designed for common text/NLP pre/post processing tasks. | ||
It has various handy tools integrated for easy installation and usage. | ||
|
||
* `mkdocs new [dir-name]` - Create a new project. | ||
* `mkdocs serve` - Start the live-reloading docs server. | ||
* `mkdocs build` - Build the documentation site. | ||
* `mkdocs help` - Print this help message. | ||
## Quick Start | ||
|
||
## Project layout | ||
Install the latest version of kinoko with `pip`: | ||
|
||
mkdocs.yml # The configuration file. | ||
docs/ | ||
index.md # The documentation homepage. | ||
... # Other markdown pages, images and other files. | ||
```bash | ||
pip install kinoko | ||
``` | ||
|
||
## What to expect | ||
|
||
- Various Python Class/Function for: | ||
- common text processing tasks(csv processing, etc.) | ||
- python code profiling & efficiency-boosting utils(e.g. parallelization) | ||
- Bash scripts for: | ||
- virtual-memory management | ||
- color printing | ||
- More to come ... | ||
- Any contribution is welcomed | ||
|
||
For detailed instructions see the [getting started guide](getting-started.md). |
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters