DisKoveror is a text analytics framework developed by Serendio. Built on top of other open source packages, it provides a flexible and extensible way to extract entities, topics, categories, sentiments, and keywords from unstructured text, regardless of its source. Its key advantage over the numerous open source options is that it provides access to best-of-breed components through a plug-and-play approach and a unified programming interface. In some cases, DisKoveror also improves output quality through training sets, domain-specific ontologies, and folksonomies.
DisKoveror has been used to mine brand sentiment from social media, understand customer satisfaction from emails, extract topics from Tweets, compute social influence scores, auto-categorize legal documents, and much more.
DisKoveror can be accessed through a command line API, a Java API, or a RESTful interface.
License: Apache 2.0
The architecture of the system is as given below.
DisKoveror supports Java APIs and a RESTful interface.
#####DisKoveror leverages the open source modules shown below:
######Named Entity extraction
######Sentiment extraction
######Topic extraction
######Keyword extraction
###Getting Started
The following software is required:
- JDK (version 7 or above)
- Maven (Apache Maven 3.0.5 or above)
- Thrift server (Apache Thrift 0.9.2)
- Python (version 2.7.X)
- Pip (version 7.1.X)
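Before installing anything else, it can help to confirm that the tools listed above are on your `PATH`. The snippet below is a minimal sketch, not part of DisKoveror: `check_tools` is a hypothetical helper, and the executable names (`java`, `mvn`, `thrift`, `python`, `pip`) are assumptions that may differ on your system. It only checks that each tool is present; it does not verify versions.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DisKoveror): report which of the
# given tools can be found on PATH. Returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return $missing
}

# Executable names are assumptions; adjust them to match your installation.
check_tools java mvn thrift python pip || echo "Install the missing tools before continuing."
```

To compare versions against the list above, run each tool's own version command by hand, for example `java -version`, `mvn -version`, `thrift -version`, `python --version`, and `pip --version`.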
The requirements.txt file lists the Python packages to install, along with their versions. Run the command below to install all Python dependencies for the Sentiment and Topic modules.
/diskoveror-ta/src/main/python$ sudo pip install -r requirements.txt
Start the Thrift servers for Topics and Sentiments:
/diskoveror-ta/src/main/python$ python server.py
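To package the project as a single executable jar (.jar file) for distribution, run the following command from the command line. Maven executes phases and goals in the order given, so `clean` must come first; placed last, it would delete the freshly built `target` directory.
/diskoveror-ta$ mvn clean package dependency:copy-dependencies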