
SpExtor: Sparse Entity Extractor

This repository contains the implementation of the method described in our COLING 2018 paper "A Practical Incremental Learning Framework For Sparse Entity Extraction".


Paper and Slides


The annotated data mentioned in the paper, along with many other datasets, can be downloaded from this link.

Source Tree

└── src
    ├── main  
    │   ├── java 
    │   │   ├──       :  implements active learning
    │   │   ├──                 :  core code
    │   │   ├──    :  implements a class to use the CoreNLP NERFeatureFactory
    │   │   ├──   :  implementation of the entity set expansion method
    │   │   └──       :  feature factory for the entity set expansion method
    │   │
    │   └── resources
    │       └── wordnet.dict    : contains the WordNet dictionary files
    └── test 
        ├── java 
        │   ├──                    : prepares the testing data
        │   ├──     : testing class using CoreNLPFeaturizer, which uses the CoreNLP NERFeatureFactory
        │   └──               : main testing class that runs SpExtor to learn a model from the data
        └── resources   : contains the gold training and testing data

How to use

  • Clone SpExtor to your local machine:
  •   git clone
  • Download and install IntelliJ IDEA from
  • Open IntelliJ, click Open, navigate to where you cloned SpExtor, select the SpExtor folder, and click Open.
  • In src -> test -> java ->, modify the parameters as desired, then run the code.
  • You can serialize the final CRF model produced by active learning.
  • The sigma values over the different batches can be found under SpExtor/out.
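For the serialization step above, here is a minimal sketch that assumes the final model object implements `java.io.Serializable`. The README does not show SpExtor's actual model class or any built-in save method. `ModelIO`, `TrainedModel`, `save`, and `load` below are hypothetical names used only for illustration. The sketch writes the object to disk with standard Java object serialization and reads it back.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

// Sketch: save and reload a trained model with standard Java serialization.
// TrainedModel is a HYPOTHETICAL stand-in for the CRF model produced by active learning;
// SpExtor's real model class may differ or provide its own serialization method.
public class ModelIO {

    // Hypothetical model class; any class implementing Serializable is handled the same way.
    static class TrainedModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] weights; // illustrative model parameters
        final int batch;        // illustrative index of the batch that produced this model

        TrainedModel(double[] weights, int batch) {
            this.weights = weights;
            this.batch = batch;
        }
    }

    // Writes the model object to the given file.
    static void save(TrainedModel model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Reads a previously saved model back from the given file.
    static TrainedModel load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (TrainedModel) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TrainedModel model = new TrainedModel(new double[] {0.5, -1.25, 2.0}, 3);
        File file = File.createTempFile("crf-model", ".ser");
        file.deleteOnExit();
        save(model, file);
        TrainedModel restored = load(file);
        System.out.println(Arrays.toString(restored.weights) + " batch=" + restored.batch);
    }
}
```

Running `main` prints `[0.5, -1.25, 2.0] batch=3`. Plain Java serialization keeps the sketch dependency-free. If the model is a CoreNLP classifier, its own save/load facilities are likely the better fit.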


This work is licensed under the GPL-3.0 and CreativesForGood licenses. A copy of the GPL-3.0 license is included in this repository; the CreativesForGood license is available at this link: C4G License.



If you use SpExtor or any of its components, please cite the following publication:

Hussein S. Al-Olimat, Steven Gustafson, Jason Mackay, Krishnaprasad Thirunarayan, and Amit Sheth. 2018. 
A practical incremental learning framework for sparse entity extraction. In Proceedings of the 27th International
Conference on Computational Linguistics (COLING 2018), pages 700–710. Association for Computational Linguistics.

  author = 	"Al-Olimat, Hussein S.
            and Gustafson, Steven
            and Mackay, Jason
            and Thirunarayan, Krishnaprasad
            and Sheth, Amit",
  title = "A Practical Incremental Learning Framework For Sparse Entity Extraction",
  booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  pages = "700--710",
  location = "Santa Fe, New Mexico, USA",
  url = ""

We would also appreciate it if you provided a link to the GitHub repository:

... Sparse Entity Extractor tool (SpExtor)\footnote{