Merge pull request #7 from fastnlp/dev0.5.0
Dev0.5.0 Update
dqwang122 committed Aug 26, 2019
2 parents ec08564 + ffd5fd8 commit 2b9aab4
Showing 287 changed files with 11,847 additions and 6,783 deletions.
3 changes: 3 additions & 0 deletions .travis.yml
@@ -1,6 +1,9 @@
language: python
python:
- "3.6"

env:
- TRAVIS=1
# command to install dependencies
install:
- pip install --quiet -r requirements.txt
29 changes: 22 additions & 7 deletions README.md
@@ -6,11 +6,12 @@
![Hex.pm](https://img.shields.io/hexpm/l/plug.svg)
[![Documentation Status](https://readthedocs.org/projects/fastnlp/badge/?version=latest)](http://fastnlp.readthedocs.io/?badge=latest)

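fastNLP is a lightweight NLP processing suite. You can use it to quickly complete tasks such as sequence labelling ([NER](reproduction/seqence_labelling/ner), POS-Tagging, etc.), Chinese word segmentation, [text classification](reproduction/text_classification), [Matching](reproduction/matching), [coreference resolution](reproduction/coreference_resolution), and [summarization](reproduction/Summarization); you can also use it to build many complex network models for research. It has the following features:
fastNLP is a lightweight NLP toolkit. You can use it to quickly complete tasks such as sequence labelling ([NER](reproduction/seqence_labelling/ner), POS-Tagging, etc.), Chinese word segmentation, [text classification](reproduction/text_classification), [Matching](reproduction/matching), [coreference resolution](reproduction/coreference_resolution), and [summarization](reproduction/Summarization); you can also use it to quickly build many complex network models for research. It has the following features:

- A unified tabular data container that keeps data preprocessing simple and clear. Built-in DataSet Loaders for many datasets save you from writing preprocessing code;
- A unified tabular data container that keeps data preprocessing simple and clear. Built-in Loaders and Pipes for many datasets save you from writing preprocessing code;
- A variety of training and testing components, such as the Trainer, the Tester, and various evaluation metrics;
- Various convenient NLP tools, such as loading of pretrained embeddings (including ELMo and BERT) and caching of intermediate data;
- Automatic download of some [datasets and pretrained models](https://docs.qq.com/sheet/DVnpkTnF6VW9UeXdh?c=A1A0A0)
- Detailed Chinese [documentation](https://fastnlp.readthedocs.io/) and [tutorials](https://fastnlp.readthedocs.io/zh/latest/user/tutorials.html) for reference;
- Many advanced modules, such as Variational LSTM, Transformer, and CRF;
- Ready-to-use models for sequence labelling, Chinese word segmentation, text classification, Matching, coreference resolution, summarization, and other tasks; see the [reproduction](reproduction) section for details;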
@@ -36,29 +37,39 @@ pip install fastNLP
python -m spacy download en
```

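The version of fastNLP currently installed from PyPI is 0.4.1, which is still missing many features; refer to the master branch for the latest content.
fastNLP 0.5.0 will be released soon, so stay tuned.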


## fastNLP Tutorials

- [0. Quick start](https://fastnlp.readthedocs.io/zh/latest/user/quickstart.html)
- [1. Preprocessing text with DataSet](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_1_data_preprocess.html)
- [2. Loading datasets with DataSetLoader](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_2_load_dataset.html)
- [2. Loading and processing datasets with Loader and Pipe](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_2_load_dataset.html)
- [3. Converting text to vectors with the Embedding module](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_3_embedding.html)
- [4. Building a text classifier I: quick training and testing with Trainer and Tester](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_4_loss_optimizer.html)
- [5. Building a text classifier II: a custom training loop with DataSetIter](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_5_datasetiter.html)
- [6. Quickly implementing a sequence labelling model](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_6_seq_labeling.html)
- [7. Quickly building custom models with Modules and Models](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_7_modules_models.html)
- [8. Quickly evaluating your model with Metric](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_8_metrics.html)
- [9. Customizing your training process with Callback](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_9_callback.html)
- [10. Using fitlog to assist research with fastNLP](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_10_fitlog.html)



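## Built-in Components

Most neural networks used for NLP tasks can be viewed as being composed of two kinds of modules: encoders and decoders.
Most neural networks used for NLP tasks can be viewed as being composed of word embeddings and two kinds of modules: encoders and decoders.

Taking text classification as an example, the figure below shows the model flow of a text classifier implemented with BiLSTM+Attention:


![](./docs/source/figures/text_classification.png)

fastNLP ships many components for both kinds of modules in its modules package, which help users quickly build the networks they need. The functions and common components of the two kinds of modules are as follows:
fastNLP's embeddings module has several built-in embeddings: static embeddings (GloVe, word2vec), contextual embeddings
(ELMo, BERT), and character embeddings (CNN- or LSTM-based CharEmbedding)

Meanwhile, fastNLP ships many components for both kinds of modules in its modules package, which help users quickly build the networks they need. The functions and common components of the two kinds of modules are as follows: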

<table>
<tr>
@@ -81,7 +92,7 @@ fastNLP ships many components for both kinds of modules in its modules package, which help

## Project Structure

![](./docs/source/figures/workflow.png)
<img src="./docs/source/figures/workflow.png" width="60%" height="60%">

The general workflow of fastNLP is shown in the figure above, and the project structure is as follows:

@@ -102,9 +113,13 @@ The general workflow of fastNLP is shown in the figure above, and the project structure is as follows:
<td><b> fastNLP.modules </b></td>
<td> Implements many components for building neural network models </td>
</tr>
<tr>
<td><b> fastNLP.embeddings </b></td>
<td> Implements conversion of index sequences into vector sequences, including reading pretrained embeddings </td>
</tr>
<tr>
<td><b> fastNLP.io </b></td>
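<td> Implements reading and writing, including data loading, model reading and writing, etc. </td>
<td> Implements reading and writing, including data loading and preprocessing, model reading and writing, automatic download, etc. </td>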
</tr>
</table>

4 changes: 2 additions & 2 deletions docs/Makefile
@@ -14,13 +14,13 @@ help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

apidoc:
	$(SPHINXAPIDOC) -efM -o source ../$(SPHINXPROJ)
	$(SPHINXAPIDOC) -efM -o source ../$(SPHINXPROJ) && python3 format.py

server:
	cd build/html && python -m http.server

dev:
	rm -rf build/html && make html && make server
	rm -rf build && make html && make server

.PHONY: help Makefile

65 changes: 65 additions & 0 deletions docs/count.py
@@ -0,0 +1,65 @@
import os
import sys


def find_all_modules():
    modules = {}
    children = {}
    to_doc = set()
    root = '../fastNLP'
    for path, dirs, files in os.walk(root):
        for file in files:
            if file.endswith('.py'):
                name = ".".join(path.split('/')[1:])
                if file.split('.')[0] != "__init__":
                    name = name + '.' + file.split('.')[0]
                __import__(name)
                m = sys.modules[name]
                modules[name] = m
                try:
                    m.__all__
                except:
                    print(name, "__all__ missing")
                    continue
                if m.__doc__ is None:
                    print(name, "__doc__ missing")
                    continue
                if "undocumented" not in m.__doc__:
                    to_doc.add(name)
    for module in to_doc:
        t = ".".join(module.split('.')[:-1])
        if t in to_doc:
            if t not in children:
                children[t] = set()
            children[t].add(module)
    for m in children:
        children[m] = sorted(children[m])
    return modules, to_doc, children


def create_rst_file(modules, name, children):
    m = modules[name]
    with open("./source/" + name + ".rst", "w") as fout:
        t = "=" * len(name)
        fout.write(name + "\n")
        fout.write(t + "\n")
        fout.write("\n")
        fout.write(".. automodule:: " + name + "\n")
        if len(m.__all__) > 0:
            fout.write(" :members: " + ", ".join(m.__all__) + "\n")
            fout.write(" :inherited-members:\n")
        fout.write("\n")
        if name in children:
            fout.write("子模块\n------\n\n.. toctree::\n\n")
            for module in children[name]:
                fout.write(" " + module + "\n")


def main():
    modules, to_doc, children = find_all_modules()
    for name in to_doc:
        create_rst_file(modules, name, children)


if __name__ == "__main__":
    main()
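
To make the behaviour of `docs/count.py` easier to follow, here is a minimal sketch of its two core steps: grouping documented modules under their documented parent package, and rendering an `automodule` stub. It builds the text in memory instead of importing fastNLP or writing to `./source/`. The helper names `group_children` and `render_rst` are hypothetical and do not appear in the commit. The three-space option indent, the English "Submodules" heading, and the choice to emit `:inherited-members:` only when a member list exists are all assumptions about the intended output.

```python
def group_children(to_doc):
    """Map each documented package to its sorted, documented direct children."""
    children = {}
    for module in to_doc:
        # "fastNLP.core.batch" -> parent "fastNLP.core"
        parent = module.rpartition(".")[0]
        if parent in to_doc:
            children.setdefault(parent, set()).add(module)
    return {parent: sorted(mods) for parent, mods in children.items()}


def render_rst(name, members, children):
    """Return the text of an automodule stub for the module `name`."""
    lines = [name, "=" * len(name), "", ".. automodule:: " + name]
    if members:
        # Assumption: options are indented three spaces under the directive.
        lines.append("   :members: " + ", ".join(members))
        lines.append("   :inherited-members:")
    lines.append("")
    if name in children:
        # Assumption: English heading in place of the Chinese one in the script.
        lines += ["Submodules", "----------", "", ".. toctree::", ""]
        lines += ["   " + child for child in children[name]]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    to_doc = {"fastNLP.core", "fastNLP.core.batch", "fastNLP.core.dataset"}
    children = group_children(to_doc)
    print(render_rst("fastNLP.core.dataset", ["DataSet"], children))
    print(render_rst("fastNLP.core", ["DataSet", "Instance"], children))
```

Passing the module's `__all__` as `members` mirrors how the script turns each module's public names into an explicit `:members:` list, which is the pattern visible in the regenerated `.rst` files in this commit.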
8 changes: 5 additions & 3 deletions docs/source/conf.py
@@ -48,12 +48,14 @@
autodoc_default_options = {
    'member-order': 'bysource',
    'special-members': '__init__',
    'undoc-members': True,
    'undoc-members': False,
}

autoclass_content = "class"

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# template_bridge
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
@@ -113,7 +115,7 @@
# -- Options for HTMLHelp output ---------------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = 'fastNLPdoc'
htmlhelp_basename = 'fastNLP doc'

# -- Options for LaTeX output ------------------------------------------------

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.batch.rst
@@ -2,6 +2,6 @@ fastNLP.core.batch
==================

.. automodule:: fastNLP.core.batch
:members:
:undoc-members:
:show-inheritance:
:members: BatchIter, DataSetIter, TorchLoaderIter
:inherited-members:

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.callback.rst
@@ -2,6 +2,6 @@ fastNLP.core.callback
=====================

.. automodule:: fastNLP.core.callback
:members:
:undoc-members:
:show-inheritance:
:members: Callback, GradientClipCallback, EarlyStopCallback, FitlogCallback, EvaluateCallback, LRScheduler, ControlC, LRFinder, TensorboardCallback, WarmupCallback, SaveModelCallback, EchoCallback, TesterCallback, CallbackException, EarlyStopError
:inherited-members:

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.const.rst
@@ -2,6 +2,6 @@ fastNLP.core.const
==================

.. automodule:: fastNLP.core.const
:members:
:undoc-members:
:show-inheritance:
:members: Const
:inherited-members:

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.dataset.rst
@@ -2,6 +2,6 @@ fastNLP.core.dataset
====================

.. automodule:: fastNLP.core.dataset
:members:
:undoc-members:
:show-inheritance:
:members: DataSet
:inherited-members:

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.field.rst
@@ -2,6 +2,6 @@ fastNLP.core.field
==================

.. automodule:: fastNLP.core.field
:members:
:undoc-members:
:show-inheritance:
:members: Padder, AutoPadder, EngChar2DPadder
:inherited-members:

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.instance.rst
@@ -2,6 +2,6 @@ fastNLP.core.instance
=====================

.. automodule:: fastNLP.core.instance
:members:
:undoc-members:
:show-inheritance:
:members: Instance
:inherited-members:

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.losses.rst
@@ -2,6 +2,6 @@ fastNLP.core.losses
===================

.. automodule:: fastNLP.core.losses
:members:
:undoc-members:
:show-inheritance:
:members: LossBase, LossFunc, LossInForward, CrossEntropyLoss, BCELoss, L1Loss, NLLLoss
:inherited-members:

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.metrics.rst
@@ -2,6 +2,6 @@ fastNLP.core.metrics
====================

.. automodule:: fastNLP.core.metrics
:members:
:undoc-members:
:show-inheritance:
:members: MetricBase, AccuracyMetric, SpanFPreRecMetric, ExtractiveQAMetric
:inherited-members:

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.optimizer.rst
@@ -2,6 +2,6 @@ fastNLP.core.optimizer
======================

.. automodule:: fastNLP.core.optimizer
:members:
:undoc-members:
:show-inheritance:
:members: Optimizer, SGD, Adam, AdamW
:inherited-members:

9 changes: 3 additions & 6 deletions docs/source/fastNLP.core.rst
@@ -2,15 +2,13 @@ fastNLP.core
============

.. automodule:: fastNLP.core
:members:
:undoc-members:
:show-inheritance:
:members: DataSet, Instance, FieldArray, Padder, AutoPadder, EngChar2DPadder, Vocabulary, DataSetIter, BatchIter, TorchLoaderIter, Const, Tester, Trainer, cache_results, seq_len_to_mask, get_seq_len, logger, Callback, GradientClipCallback, EarlyStopCallback, FitlogCallback, EvaluateCallback, LRScheduler, ControlC, LRFinder, TensorboardCallback, WarmupCallback, SaveModelCallback, EchoCallback, TesterCallback, CallbackException, EarlyStopError, LossFunc, CrossEntropyLoss, L1Loss, BCELoss, NLLLoss, LossInForward, AccuracyMetric, SpanFPreRecMetric, ExtractiveQAMetric, Optimizer, SGD, Adam, AdamW, SequentialSampler, BucketSampler, RandomSampler, Sampler
:inherited-members:

子模块
----------
------

.. toctree::
:titlesonly:

fastNLP.core.batch
fastNLP.core.callback
@@ -26,4 +24,3 @@ fastNLP.core
fastNLP.core.trainer
fastNLP.core.utils
fastNLP.core.vocabulary

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.sampler.rst
@@ -2,6 +2,6 @@ fastNLP.core.sampler
====================

.. automodule:: fastNLP.core.sampler
:members:
:undoc-members:
:show-inheritance:
:members: Sampler, BucketSampler, SequentialSampler, RandomSampler
:inherited-members:

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.tester.rst
@@ -2,6 +2,6 @@ fastNLP.core.tester
===================

.. automodule:: fastNLP.core.tester
:members:
:undoc-members:
:show-inheritance:
:members: Tester
:inherited-members:

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.trainer.rst
@@ -2,6 +2,6 @@ fastNLP.core.trainer
====================

.. automodule:: fastNLP.core.trainer
:members:
:undoc-members:
:show-inheritance:
:members: Trainer
:inherited-members:

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.utils.rst
@@ -2,6 +2,6 @@ fastNLP.core.utils
==================

.. automodule:: fastNLP.core.utils
:members:
:undoc-members:
:show-inheritance:
:members: cache_results, seq_len_to_mask, get_seq_len
:inherited-members:

6 changes: 3 additions & 3 deletions docs/source/fastNLP.core.vocabulary.rst
@@ -2,6 +2,6 @@ fastNLP.core.vocabulary
=======================

.. automodule:: fastNLP.core.vocabulary
:members:
:undoc-members:
:show-inheritance:
:members: Vocabulary, VocabularyOption
:inherited-members:

7 changes: 7 additions & 0 deletions docs/source/fastNLP.embeddings.bert_embedding.rst
@@ -0,0 +1,7 @@
fastNLP.embeddings.bert_embedding
=================================

.. automodule:: fastNLP.embeddings.bert_embedding
:members: BertEmbedding, BertWordPieceEncoder
:inherited-members:

7 changes: 7 additions & 0 deletions docs/source/fastNLP.embeddings.char_embedding.rst
@@ -0,0 +1,7 @@
fastNLP.embeddings.char_embedding
=================================

.. automodule:: fastNLP.embeddings.char_embedding
:members: CNNCharEmbedding, LSTMCharEmbedding
:inherited-members:

7 changes: 7 additions & 0 deletions docs/source/fastNLP.embeddings.contextual_embedding.rst
@@ -0,0 +1,7 @@
fastNLP.embeddings.contextual_embedding
=======================================

.. automodule:: fastNLP.embeddings.contextual_embedding
:members: ContextualEmbedding
:inherited-members:

7 changes: 7 additions & 0 deletions docs/source/fastNLP.embeddings.elmo_embedding.rst
@@ -0,0 +1,7 @@
fastNLP.embeddings.elmo_embedding
=================================

.. automodule:: fastNLP.embeddings.elmo_embedding
:members: ElmoEmbedding
:inherited-members:
