bnc.p is a Python script that generates keywords from a text. It was used during the SPINDLE project to generate keywords from automatic transcriptions.

How to use it


python text.txt


>>> from keywords import keywords_and_ngrams


The result is a list containing two lists of tuples. The first list holds (keyword, log-likelihood value) tuples. The second list holds (bigram, number of appearances) tuples.

keyword-0 ll-0
keyword-1 ll-1
keyword-2 ll-2

bigram-0 n-appearances-bigram-0
bigram-1 n-appearances-bigram-1
bigram-2 n-appearances-bigram-2
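The log-likelihood value attached to each keyword measures how much more frequent a word is in the text than in a reference corpus. This README does not show how the script computes it, so the following is only a minimal sketch of the standard two-corpus log-likelihood statistic. The function name `log_likelihood` and its arguments are hypothetical, and the script's own calculation may differ.

```python
import math


def log_likelihood(freq_text, size_text, freq_ref, size_ref):
    """Two-corpus log-likelihood for one word (hypothetical helper).

    freq_text / size_text: word count and total words in the analysed text.
    freq_ref / size_ref:   word count and total words in the reference corpus.
    """
    total_freq = freq_text + freq_ref
    total_size = size_text + size_ref

    # Expected counts if the word were equally frequent in both corpora.
    expected_text = size_text * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    ll = 0.0
    # A zero observed count contributes nothing to the sum.
    if freq_text > 0:
        ll += freq_text * math.log(freq_text / expected_text)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * ll


# A word used 50 times in a 1,000-word text but only 10 times in a
# 100,000-word reference corpus scores far above a word with the same
# relative frequency in both.
print(log_likelihood(50, 1000, 10, 100000))
print(log_likelihood(10, 1000, 10, 1000))
```

A word with identical relative frequency in both corpora scores 0, and the score grows as the two frequencies diverge.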


Example output for the "Automatic Keyword Generation from Automatic Speech-to-Text Transcriptions" blog post:

[[["automatic", 154.36391852338383], 
["keywords", 100.22612939881635], 
["transcriptions", 71.04632660561263], 
["corpus", 54.20602606031698], 
["generated", 52.54525739261641], 
["word", 43.869201333759946], 
["keyword", 38.434091570196095], 
["reference", 27.60386703890638], 
["accuracy", 26.693961750667555], 
["frequency", 26.58439010818277], 
], [[["automatic", "transcriptions"], 3]]]
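A result of this shape can be unpacked into its two lists and printed one entry per line. The snippet below is a small sketch that uses an abbreviated copy of the example output as a literal, so it does not call the script itself. The variable names are illustrative only.

```python
# Abbreviated copy of the example output: [keywords, bigrams].
result = [
    [
        ["automatic", 154.36391852338383],
        ["keywords", 100.22612939881635],
        ["transcriptions", 71.04632660561263],
    ],
    [
        [["automatic", "transcriptions"], 3],
    ],
]

# The first element holds the keywords, the second the bigrams.
keywords, bigrams = result

# One "keyword log-likelihood" line per keyword.
keyword_lines = [f"{word} {ll:.2f}" for word, ll in keywords]

# One "bigram count" line per bigram; each bigram is a pair of words.
bigram_lines = [f"{' '.join(pair)} {count}" for pair, count in bigrams]

print("\n".join(keyword_lines))
print()
print("\n".join(bigram_lines))
```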


  • nKeywords: number of keywords generated by the script (default 100)
  • thresholdLL: log-likelihood value threshold (default 19)
  • nBigrams: number of bigrams generated by the script (default 25)
  • thresholdBigrams: minimum number of appearances of a bigram (default 2)
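To illustrate how these four parameters could interact, here is a minimal sketch of a filtering step over already scored keywords and counted bigrams. The function `select` is hypothetical and not part of the script. It also assumes that thresholds are inclusive and that results are ranked from highest to lowest score, neither of which this README states.

```python
def select(keywords, bigrams, n_keywords=100, threshold_ll=19,
           n_bigrams=25, threshold_bigrams=2):
    """Hypothetical filter mirroring the four parameters above.

    keywords: iterable of (word, log_likelihood) tuples.
    bigrams:  iterable of ((word1, word2), count) tuples.
    """
    # Keep keywords at or above the log-likelihood threshold (assumed inclusive),
    # rank them by score, and cut the list to n_keywords entries.
    kept_keywords = sorted(
        (kw for kw in keywords if kw[1] >= threshold_ll),
        key=lambda kw: kw[1],
        reverse=True,
    )[:n_keywords]

    # Keep bigrams that appear often enough, rank them by count,
    # and cut the list to n_bigrams entries.
    kept_bigrams = sorted(
        (bg for bg in bigrams if bg[1] >= threshold_bigrams),
        key=lambda bg: bg[1],
        reverse=True,
    )[:n_bigrams]

    return [kept_keywords, kept_bigrams]


# Made-up scores and counts, for illustration only.
scored = [("corpus", 54.2), ("the", 3.1), ("keyword", 38.4)]
counted = [(("automatic", "transcriptions"), 3), (("of", "the"), 1)]

print(select(scored, counted, n_keywords=1))
```

With the defaults, a keyword needs a log-likelihood of at least 19 and a bigram needs at least 2 appearances to be kept. The `nKeywords` and `nBigrams` limits then cap the length of each list.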