keywords.py is a Python script that generates keywords from a text. It was used during the [SPINDLE](http://openspires.oucs.ox.ac.uk/spindle/) project to [generate keywords from automatic transcriptions](http://blogs.oucs.ox.ac.uk/openspires/2012/06/29/automatic-keyword-generation-from-automatic-speech-to-text-transcriptions/).
Usage:
python keywords.py text.txt
or
>>> from keywords import keywords_and_ngrams
Output:
A list containing two lists of tuples. The first list holds (keyword, log-likelihood value) pairs. The second list holds (bigram, number of appearances) pairs.
keyword-0 ll-0
keyword-1 ll-1
keyword-2 ll-2
bigram-0 n-appearances-bigram-0
bigram-1 n-appearances-bigram-1
bigram-2 n-appearances-bigram-2
Example
From the [Automatic Keyword Generation from Automatic Speech-to-Text Transcriptions blog post](http://blogs.oucs.ox.ac.uk/openspires/2012/06/29/automatic-keyword-generation-from-automatic-speech-to-text-transcriptions/):
[[["automatic", 154.36391852338383],
["keywords", 100.22612939881635],
["transcriptions", 71.04632660561263],
["corpus", 54.20602606031698],
["generated", 52.54525739261641],
["word", 43.869201333759946],
["keyword", 38.434091570196095],
["reference", 27.60386703890638],
["accuracy", 26.693961750667555],
["frequency", 26.58439010818277],
...
], [[["automatic", "transcriptions"], 3]]]
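A result with this shape can be unpacked into its two lists and printed one entry per line. The following is a minimal sketch, not part of keywords.py: the helper name `format_result` is hypothetical, and the sample data is a shortened copy of the example output rather than a real call to the script.

```python
# Shortened copy of the example output (two keywords, one bigram).
result = [
    [("automatic", 154.36391852338383), ("keywords", 100.22612939881635)],
    [(("automatic", "transcriptions"), 3)],
]


def format_result(result):
    """Hypothetical helper: turn the two lists into printable lines."""
    keywords, bigrams = result  # first list: keywords, second list: bigrams
    lines = []
    for word, log_likelihood in keywords:
        lines.append(f"{word}\t{log_likelihood:.2f}")
    for (first, second), appearances in bigrams:
        lines.append(f"{first} {second}\t{appearances}")
    return lines


for line in format_result(result):
    print(line)
```

Tuple unpacking works the same way if the pairs arrive as two-element lists, as in the example output.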
Parameters
- nKeywords: number of keywords generated by the script (default 100)
- thresholdLL: log-likelihood value threshold (default 19)
- nBigrams: number of bigrams generated by the script (default 25)
- thresholdBigrams: minimum number of appearances of a bigram (default 2)
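How the four parameters interact can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the function `apply_limits` is hypothetical, the thresholds are assumed to be inclusive, and results are assumed to be sorted from highest to lowest value before the `nKeywords` and `nBigrams` caps are applied. The entries `"the"` and `("speech", "text")` in the sample data are made up for the illustration.

```python
def apply_limits(keyword_scores, bigram_counts,
                 nKeywords=100, thresholdLL=19,
                 nBigrams=25, thresholdBigrams=2):
    """Hypothetical sketch: filter by threshold, sort descending, cap the length."""
    keywords = sorted(
        ((word, ll) for word, ll in keyword_scores.items() if ll >= thresholdLL),
        key=lambda pair: pair[1],
        reverse=True,
    )[:nKeywords]
    bigrams = sorted(
        ((bigram, n) for bigram, n in bigram_counts.items() if n >= thresholdBigrams),
        key=lambda pair: pair[1],
        reverse=True,
    )[:nBigrams]
    return [keywords, bigrams]


# Sample input: "the" and ("speech", "text") are invented to show the filtering.
scores = {"keywords": 100.23, "automatic": 154.36, "the": 3.2}
counts = {("automatic", "transcriptions"): 3, ("speech", "text"): 1}

print(apply_limits(scores, counts))
```

With the defaults, `"the"` is dropped because its log-likelihood value is below 19, and `("speech", "text")` is dropped because it appears fewer than 2 times.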