EduNLP is a Python library for advanced natural language processing (NLP) and one of the projects in the EduX plan of BDAA. It builds on recent research and was designed from day one for use in real educational products.
EduNLP now comes with pretrained pipelines and currently supports segmentation, tokenization and vectorization. It supports a variety of preprocessing steps for NLP in educational scenarios, such as formula parsing and multi-modal segmentation.
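To make the idea of segmentation concrete, here is a minimal sketch that splits an item into plain-text and formula segments. It is not EduNLP's implementation: the helper name ``split_text_and_formulas`` is hypothetical, and the sketch assumes formulas are wrapped in ``$...$``.

```python
import re

# Hypothetical helper, not part of EduNLP.
# Assumption: formulas are delimited by single dollar signs, e.g. "$x + 1$".
FORMULA_PATTERN = re.compile(r"\$(.+?)\$")


def split_text_and_formulas(item):
    """Return a list of (kind, content) pairs, where kind is 'text' or 'formula'."""
    segments = []
    cursor = 0
    for match in FORMULA_PATTERN.finditer(item):
        # Any text between the previous formula and this one is a text segment.
        if match.start() > cursor:
            segments.append(("text", item[cursor:match.start()]))
        segments.append(("formula", match.group(1)))
        cursor = match.end()
    # Trailing text after the last formula, if any.
    if cursor < len(item):
        segments.append(("text", item[cursor:]))
    return segments


segments = split_text_and_formulas("If $x + 1 = 3$, what is $x$?")
for kind, content in segments:
    print(kind, repr(content))
```

A real pipeline would also need to handle figures, blanks to fill in and nested markup, which this sketch deliberately ignores.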
EduNLP is commercial open-source software, released under the Apache-2.0 license.
EduNLP requires Python 3.6, 3.7, 3.8 or 3.9. EduNLP uses PyTorch as its backend tensor library.
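If you want to check an interpreter against this requirement before installing, a small sketch such as the following may help. The function name ``is_supported_python`` is hypothetical and not part of EduNLP. The list of versions is taken from the requirement above.

```python
import sys

# Minor versions of Python 3 listed in the requirement above.
SUPPORTED_MINORS = (6, 7, 8, 9)


def is_supported_python(version_info=None):
    """Return True if the given (major, minor, ...) version is in the supported list."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor in SUPPORTED_MINORS


if not is_supported_python():
    print("Warning: this Python version is outside the range listed for EduNLP.")
```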
We recommend installing EduNLP with pip:

.. code-block:: bash

   # basic installation
   pip install EduNLP

   # full installation
   pip install EduNLP[full]

You can also install from source:

.. code-block:: bash

   git clone https://github.com/bigdata-ustc/EduNLP.git
   cd EduNLP

   # basic installation
   pip install .

   # full installation
   pip install .[full]

One basic use of EduNLP is to convert an item into a vector:

.. code-block:: python

   from EduNLP import get_pretrained_i2v

   i2v = get_pretrained_i2v("d2v_all_256", "./model")
   item_vector, token_vector = i2v(["the content of item 1", "the content of item 2"])

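Once items are represented as vectors, a common next step is to compare them. The sketch below computes cosine similarity in plain Python. It is an illustration only: ``cosine_similarity`` is a hypothetical helper, not an EduNLP function, and the short vectors are stand-ins for the output of ``i2v``, so the example runs without downloading a model.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two equal-length sequences of numbers."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    # Treat a zero vector as having no similarity to anything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Stand-in vectors; in practice these would be the item vectors returned by i2v.
item_vector = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0]]
similarity = cosine_similarity(item_vector[0], item_vector[1])
print(similarity)
```

Values close to 1 indicate that two items point in nearly the same direction in the embedding space. How well that tracks the similarity of the underlying questions depends on the pretrained model.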
If you are an absolute beginner, start with the :doc:`Tutorial to EduNLP <tutorial/en/index>` :doc:`(Chinese version) <tutorial/zh/index>`. It covers the basic concepts of EduNLP and gives a step-by-step guide to training, loading and using the language models.
We will continue to publish new datasets in Standard Item Format (SIF) to encourage related research. The data resources can be accessed via another EduX project, EduData.
EduNLP is free software; you can redistribute it and/or modify it under the terms of the Apache License 2.0. We welcome contributions. Join us on GitHub and check out our contribution guidelines (Chinese version).
If this repository is helpful to you, please cite our work:

.. code-block:: bibtex

   @misc{bigdata2021edunlp,
     title={EduNLP},
     author={bigdata-ustc},
     publisher = {GitHub},
     journal = {GitHub repository},
     year = {2021},
     howpublished = {\url{https://github.com/bigdata-ustc/EduNLP}},
   }

.. toctree::
   :caption: Introduction
   :hidden:

   self

.. toctree::
   :maxdepth: 1
   :caption: Tutorial
   :hidden:
   :glob:

   tutorial/en/index
   tutorial/en/sif
   tutorial/en/parse
   tutorial/en/seg
   tutorial/en/tokenize
   tutorial/en/pretrain
   tutorial/en/vectorization

.. toctree::
   :maxdepth: 1
   :caption: 用户指南
   :hidden:

   tutorial/zh/index
   tutorial/zh/sif
   tutorial/zh/parse
   tutorial/zh/seg
   tutorial/zh/tokenize
   tutorial/zh/pretrain
   tutorial/zh/vectorization

.. toctree::
   :maxdepth: 2
   :caption: API Reference
   :hidden:
   :glob:

   api/sif
   api/utils
   api/formula
   api/tokenizer
   api/pretrain
   api/ModelZoo
   api/i2v
   api/vector