Highly performant version of open-text-summarizer

summarizer

Version

1.0

Overview

This is an adaptation of Open-Text-Summarizer (aka ots), optimized
for faster output.

Original work: http://libots.sourceforge.net/
GitHub version: https://github.com/neopunisher/Open-Text-Summarizer

Build

$ ./configure [--prefix=/usr/local/summarizer/]
$ make
$ [sudo] make install

Usage

Summarizer command line application

$ [prefix]/bin/summarizer -i <file-to-summarize> -r <summary-ratio>

Start/stop summarizer daemon

$ sudo service summarizerd start
$ sudo service summarizerd stop
$ [prefix]/bin/summarizerd -h (for command line options)

Performance Comparison

System: 1 VCPU, 512MB RAM, 20GB SSD

Article: washingtonpost1.txt
ots: between 40ms and 50ms
summarizer: between 10ms and 12ms

Article: washingtonpost2.txt
ots: between 40ms and 50ms
summarizer: between 8ms and 10ms

Limitations

Languages: English only (as of now)

Daemon protocol

NOTE: Refer to src/daemontest.c for a sample client application.

Request

[2 bytes] Summarizerd protocol [Accepted: 0x1421]
[2 bytes] Summarizerd version  [Accepted: 1]
[4 bytes] Ratio (read as a float by the daemon; refer to src/daemontest.c)
[4 bytes] Document name length [Max: 256]
[N bytes] Document name (as long as above field's value)
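
The request layout above can be sketched as a small packing helper. This is only an illustration, not the project's client (see src/daemontest.c for that). The function name `build_request`, the `byte_order` parameter, and the choice of Python are assumptions made for this example. The README does not state the byte order of the fields, so the sketch defaults to the host's native order with no padding; the ratio value and its units are likewise placeholders.

```python
import struct

PROTOCOL_ID = 0x1421       # "Summarizerd protocol" value from the layout above
PROTOCOL_VERSION = 1       # "Summarizerd version" value from the layout above
MAX_NAME_LEN = 256         # maximum document name length from the layout above


def build_request(document_name: str, ratio: float, byte_order: str = "=") -> bytes:
    """Pack one request: protocol, version, ratio, name length, name.

    byte_order is a struct prefix ("=" native, "<" little-endian,
    ">" big-endian). The right choice is an assumption here; verify it
    against src/daemontest.c before talking to a real daemon.
    """
    name = document_name.encode("utf-8")
    if len(name) > MAX_NAME_LEN:
        raise ValueError("document name exceeds 256 bytes")
    # H = 2-byte unsigned, f = 4-byte float, I = 4-byte unsigned: 12 bytes total
    header = struct.pack(byte_order + "HHfI",
                         PROTOCOL_ID, PROTOCOL_VERSION, ratio, len(name))
    return header + name
```

A call such as `build_request("articles/washingtonpost1.txt", 20.0)` yields a 12-byte header followed by the name bytes, ready to be written to the daemon's socket.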

Response

[2 bytes] Summarizerd protocol [Accepted: 0x1421]
[2 bytes] Summarizerd version  [Accepted: 1]
[4 bytes] Status code [0: summary, 1: bad request, 2: internal error]
[4 bytes] Length of summary (if status == summary)
[N bytes] Summary (as long as above field's value)
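
A matching sketch for decoding the response layout above, under the same assumptions: `parse_response` is a hypothetical helper, the byte order is not stated in the README and defaults to native here, and the length and summary fields are assumed to be absent (or ignorable) when the status is not 0.

```python
import struct

PROTOCOL_ID = 0x1421
PROTOCOL_VERSION = 1
STATUS_NAMES = {0: "summary", 1: "bad request", 2: "internal error"}


def parse_response(data: bytes, byte_order: str = "="):
    """Return (status, summary_bytes) from a complete response buffer.

    byte_order is a struct prefix and is an assumption; check
    src/daemontest.c for what the daemon actually sends.
    """
    # Fixed part: protocol (2 bytes), version (2 bytes), status (4 bytes)
    proto, version, status = struct.unpack_from(byte_order + "HHI", data, 0)
    if proto != PROTOCOL_ID or version != PROTOCOL_VERSION:
        raise ValueError("unexpected protocol or version")
    if status != 0:
        # Length and summary are only defined when status == summary
        return status, b""
    (length,) = struct.unpack_from(byte_order + "I", data, 8)
    summary = data[12:12 + length]
    if len(summary) != length:
        raise ValueError("response truncated")
    return status, summary
```

`STATUS_NAMES[status]` maps the returned code to the labels listed in the layout.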

Tweaks

Summarizerd supports multiple command line options to tweak its config. Here
are the config options:

*  Number of worker threads to use
*  Number of clients to keep in listening queue
*  Socket port to listen on
*  Log/PID files, logging level
*  Foreground mode (for debugging)

Run '$ [prefix]/bin/summarizerd -h' to list all the options.

Bugs

*  The etc/summarizerd init script is hardcoded to use /usr/local/summarizer
   as the default prefix