Highly performant version of open-text-summarizer
C Shell
Latest commit 0ddc31e Feb 20, 2014 Reetesh Ranjan summarizer makefile changes.
articles
dict Initial version Dec 23, 2013
etc
src summarizer makefile changes. Feb 20, 2014
LICENSE Initial version Dec 23, 2013
Makefile.am
Makefile.in * Summarizer daemon Jan 17, 2014
README.md
aclocal.m4
config.h.in
configure
configure.ac * Summarizer daemon Jan 17, 2014
depcomp
install-sh
missing Initial version Dec 23, 2013

README.md

summarizer

Version

1.0

Overview

This is an adaptation of Open-Text-Summarizer (aka ots), optimized to
produce summaries faster.

Original work: http://libots.sourceforge.net/
Github version: https://github.com/neopunisher/Open-Text-Summarizer

Build

$ ./configure [--prefix=/usr/local/summarizer/]
$ make
$ [sudo] make install

Usage

Summarizer command line application

$ [prefix]/bin/summarizer -i <file-to-summarize> -r <summary-ratio>

Start/stop summarizer daemon

$ sudo service summarizerd start
$ sudo service summarizerd stop
$ [prefix]/bin/summarizerd -h (for command line options)

Performance Comparison

System: 1 VCPU, 512MB RAM, 20GB SSD

Article: washingtonpost1.txt
ots: between 40ms and 50ms
summarizer: between 10ms and 12ms

Article: washingtonpost2.txt
ots: between 40ms and 50ms
summarizer: between 8ms and 10ms

Limitations

Languages: English only (as of now)

Daemon protocol

NOTE: Refer to src/daemontest.c for a sample client application

Request

[2 bytes] Summarizerd protocol [Accepted: 0x1421]
[2 bytes] Summarizerd version  [Accepted: 1]
[4 bytes] Ratio (read as a float by the daemon: refer to daemontest.c)
[4 bytes] Document name length [Max: 256]
[N bytes] Document name (as long as above field's value)
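
The request layout above can be sketched in C as follows. This is a minimal illustration, not the code from src/daemontest.c: the helper name build_request is hypothetical, and the sketch assumes that fields are sent in host byte order, that the ratio is the raw 4-byte representation of a C float, and that the document name is sent without a terminating NUL. None of these details is stated in this README, so check daemontest.c before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SUMMARIZERD_PROTOCOL 0x1421 /* accepted protocol value, from the README */
#define SUMMARIZERD_VERSION  1      /* accepted version value, from the README */
#define MAX_DOC_NAME_LEN     256    /* maximum name length, from the README */
#define REQUEST_HEADER_LEN   12     /* 2 + 2 + 4 + 4 bytes */

/*
 * Hypothetical helper: serializes one request into buf.
 * Returns the number of bytes written, or 0 if the name is empty,
 * too long, or the buffer is too small.
 * ASSUMPTION: host byte order, name sent without a trailing NUL.
 */
static size_t build_request(uint8_t *buf, size_t cap, float ratio,
                            const char *doc_name)
{
    uint16_t protocol = SUMMARIZERD_PROTOCOL;
    uint16_t version = SUMMARIZERD_VERSION;
    uint32_t name_len;
    size_t len = strlen(doc_name);
    size_t off = 0;

    /* The ratio field is 4 bytes wide, so float must be too. */
    assert(sizeof(float) == 4);

    if (len == 0 || len > MAX_DOC_NAME_LEN || cap < REQUEST_HEADER_LEN + len)
        return 0;
    name_len = (uint32_t)len;

    memcpy(buf + off, &protocol, 2); off += 2; /* [2 bytes] protocol */
    memcpy(buf + off, &version, 2);  off += 2; /* [2 bytes] version */
    memcpy(buf + off, &ratio, 4);    off += 4; /* [4 bytes] ratio as float */
    memcpy(buf + off, &name_len, 4); off += 4; /* [4 bytes] name length */
    memcpy(buf + off, doc_name, len); off += len; /* [N bytes] name */
    return off;
}
```

A client would then write the returned number of bytes to the daemon's socket, for example after calling build_request(buf, sizeof buf, 0.2f, "articles/washingtonpost1.txt").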

Response

[2 bytes] Summarizerd protocol [Accepted: 0x1421]
[2 bytes] Summarizerd version  [Accepted: 1]
[4 bytes] Status code [0: summary, 1: bad request, 2: internal error]
[4 bytes] Length of summary (if status == summary)
[N bytes] Summary (as long as above field's value)
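
Decoding the response header might look like the sketch below. The struct and function names are hypothetical, and the same caveats apply: host byte order is assumed, and the sketch reads the length field only when the status is 0 (summary), which is one reading of the "if status == summary" note above. The daemon's actual behaviour on error responses is not specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SUMMARIZERD_PROTOCOL 0x1421 /* accepted protocol value, from the README */
#define SUMMARIZERD_VERSION  1      /* accepted version value, from the README */

enum { STATUS_SUMMARY = 0, STATUS_BAD_REQUEST = 1, STATUS_INTERNAL_ERROR = 2 };

/* Hypothetical in-memory view of the fixed-size response fields. */
struct response_header {
    uint16_t protocol;
    uint16_t version;
    uint32_t status;
    uint32_t summary_len; /* set to 0 unless status == STATUS_SUMMARY */
};

/*
 * Hypothetical helper: decodes the response header from buf.
 * Returns 0 on success, -1 if the buffer is too short or the
 * protocol/version fields do not match the accepted values.
 * ASSUMPTION: host byte order; length field present only on success.
 */
static int parse_response_header(const uint8_t *buf, size_t len,
                                 struct response_header *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 8) /* protocol + version + status */
        return -1;
    memcpy(&out->protocol, buf, 2);
    memcpy(&out->version, buf + 2, 2);
    memcpy(&out->status, buf + 4, 4);
    out->summary_len = 0;

    if (out->protocol != SUMMARIZERD_PROTOCOL ||
        out->version != SUMMARIZERD_VERSION)
        return -1;

    if (out->status == STATUS_SUMMARY) {
        if (len < 12) /* length field must follow the status */
            return -1;
        memcpy(&out->summary_len, buf + 8, 4);
    }
    return 0;
}
```

After a successful parse with status 0, the client would read summary_len further bytes from the socket to obtain the summary text.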

Tweaks

Summarizerd supports multiple command line options to tweak its config. Here
are the config options:

*  Number of worker threads to use
*  Number of clients to keep in listening queue
*  Socket port to listen on
*  Log/PID files, logging level
*  For debugging, foreground mode can be used

Run '$ [prefix]/bin/summarizerd -h' to list all the options

Bugs

*  The etc/summarizerd init script is hardcoded to use /usr/local/summarizer
   as the default prefix