sgrau/spindle-code

keywords.py is a Python script that generates keywords from a text. It was used during the [SPINDLE](http://openspires.oucs.ox.ac.uk/spindle/) project to [generate keywords from automatic transcriptions](http://blogs.oucs.ox.ac.uk/openspires/2012/06/29/automatic-keyword-generation-from-automatic-speech-to-text-transcriptions/).

How to use it

Usage:

python keywords.py text.txt

or

>>> from keywords import keywords_and_ngrams

Output:

A list containing two lists of pairs. Each pair in the first list holds a keyword and its log-likelihood value. Each pair in the second list holds a bigram and its number of appearances.

keyword-0 ll-0
keyword-1 ll-1
keyword-2 ll-2

bigram-0 n-appearances-bigram-0
bigram-1 n-appearances-bigram-1
bigram-2 n-appearances-bigram-2
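This README does not say how the log-likelihood values are computed. As a rough illustration only, the sketch below uses the log-likelihood keyness measure commonly applied in corpus linguistics, which compares a word's frequency in the text against its frequency in a reference corpus. The function name `log_likelihood` and its arguments are hypothetical and are not taken from keywords.py.

```python
import math

def log_likelihood(freq_text, freq_ref, size_text, size_ref):
    """Log-likelihood keyness of one word (assumed measure, not read from keywords.py).

    freq_text / freq_ref: occurrences of the word in the text and in the reference corpus.
    size_text / size_ref: total number of tokens in each.
    """
    total_freq = freq_text + freq_ref
    total_size = size_text + size_ref
    # Expected counts if the word were equally common in both corpora.
    expected_text = size_text * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    # A zero observed count contributes nothing to the sum.
    if freq_text > 0:
        score += freq_text * math.log(freq_text / expected_text)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score
```

Under this measure, a word with the same relative frequency in both corpora scores 0, and the score grows as the two frequencies diverge.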

Example

From the [Automatic Keyword Generation from Automatic Speech-to-Text Transcriptions blog post](http://blogs.oucs.ox.ac.uk/openspires/2012/06/29/automatic-keyword-generation-from-automatic-speech-to-text-transcriptions/):

[[["automatic", 154.36391852338383], 
["keywords", 100.22612939881635], 
["transcriptions", 71.04632660561263], 
["corpus", 54.20602606031698], 
["generated", 52.54525739261641], 
["word", 43.869201333759946], 
["keyword", 38.434091570196095], 
["reference", 27.60386703890638], 
["accuracy", 26.693961750667555], 
["frequency", 26.58439010818277], 
...
], [[["automatic", "transcriptions"], 3]]]
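A result with this shape can be unpacked into its keyword and bigram lists. The sketch below works on a shortened copy of the example output rather than calling the script. The helper `format_result` is hypothetical and only shows one way to turn the nested lists into the line-per-item layout described under Output.

```python
# Shortened copy of the example output above (first two keywords only).
result = [
    [["automatic", 154.36391852338383], ["keywords", 100.22612939881635]],
    [[["automatic", "transcriptions"], 3]],
]

def format_result(result):
    """Return one text line per keyword, then one per bigram (hypothetical helper)."""
    keywords, bigrams = result
    # Each keyword entry is [word, log-likelihood value].
    lines = [f"{word}\t{ll:.2f}" for word, ll in keywords]
    # Each bigram entry is [[first_word, second_word], number_of_appearances].
    lines += [f"{' '.join(bigram)}\t{count}" for bigram, count in bigrams]
    return lines

for line in format_result(result):
    print(line)
```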

Parameters

  • nKeywords: number of keywords generated by the script (default 100)
  • thresholdLL: log-likelihood value threshold (default 19)
  • nBigrams: number of bigrams generated by the script (default 25)
  • thresholdBigrams: minimum number of appearances of a bigram (default 2)
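The sketch below shows one way these four parameters could interact: a threshold removes weak candidates, then a cap limits how many are returned. It is an assumption about the behaviour, not code from keywords.py. The function names are hypothetical, and treating the thresholds as inclusive (`>=`) is a guess.

```python
from collections import Counter

def top_keywords(scored, n_keywords=100, threshold_ll=19):
    """Keep (word, ll) pairs at or above threshold_ll, best first, at most n_keywords."""
    kept = [(word, ll) for word, ll in scored if ll >= threshold_ll]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:n_keywords]

def top_bigrams(tokens, n_bigrams=25, threshold_bigrams=2):
    """Count adjacent word pairs and keep those seen at least threshold_bigrams times."""
    counts = Counter(zip(tokens, tokens[1:]))
    # most_common() yields bigrams ordered from most to least frequent.
    frequent = [(bigram, n) for bigram, n in counts.most_common() if n >= threshold_bigrams]
    return frequent[:n_bigrams]

tokens = "automatic transcriptions are automatic transcriptions".split()
print(top_bigrams(tokens))
print(top_keywords([("a", 30.0), ("b", 10.0), ("c", 50.0)]))
```

With the defaults, a bigram that appears only once is dropped, and a word scoring below 19 is never reported no matter how few keywords are found.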
