* updated changelog

* fixed lots of doctest-related issues


svn/trunk@8784
commit 4b802fc7a1093aa6f412fa39525ac90acda127f0 (1 parent: 392f099)
authored by @stevenbird
14 ChangeLog
@@ -1,4 +1,4 @@
-Version 2.0.1 2011-04-??
+Version 2.0.1 (rc1) 2011-04-11
NLTK:
* added interface to the Stanford POS Tagger
@@ -12,7 +12,7 @@ NLTK:
* fixed issue with NLTK's tokenize module colliding with the Python tokenize module
* fixed issue with stemming Unicode strings
* changed ViterbiParser.nbest_parse to parse
-* KNBC Japanese corpus reader
+* ChaSen and KNBC Japanese corpus readers
* preserve case in concordance display
* fixed bug in simplification of Brown tags
* a version of IBM Model 1 as described in Koehn 2010
@@ -28,9 +28,15 @@ NLTK:
* simplifications and corrections of Earley Chart Parser rules
* several changes to the feature chart parsers for correct unification
* bugfixes: FreqDist.plot, FreqDist.max, NgramModel.entropy, CategorizedCorpusReader, DecisionTreeClassifier
+* removal of Python >2.4 language features for 2.4 compatibility
+* removal of deprecated functions and associated warnings
+* added semantic domains to wordnet corpus reader
+* changed wordnet similarity functions to include instance hyponyms
+* updated to use latest version of Boxer
Data:
-* Japanese corpora...
+* JEITA Public Morphologically Tagged Corpus (in ChaSen format)
+* KNB Annotated corpus of Japanese blog posts
* Fixed some minor bugs in alvey.fcfg, and added number of parse trees in alvey_sentences.txt
* added more comtrans data
@@ -39,7 +45,7 @@ Documentation:
* NLTK Japanese book (chapter 12) by Masato Hagiwara
NLTK-Contrib:
-* Contribute a version of the Viethen and Dale referring expression algorithms
+* Viethen and Dale referring expression algorithms
Thanks to the following contributors to 2.0.1 (since 2.0b9, July 2010)
Yonatan Becker, Steven Bethard, David Coles, Dan Garrette,
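
One changelog entry above touches a public API ("changed ViterbiParser.nbest_parse to parse"). A minimal sketch of the renamed call, with an illustrative toy PCFG that is not from this commit:

    import nltk

    # Toy grammar for illustration only; any PCFG works here.
    grammar = nltk.parse_pcfg("""
        S -> NP VP [1.0]
        NP -> 'I' [0.5] | 'him' [0.5]
        VP -> V NP [1.0]
        V -> 'saw' [1.0]
        """)
    parser = nltk.ViterbiParser(grammar)
    print parser.parse('I saw him'.split())   # parse() replaces nbest_parse()
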
7 nltk/corpus/reader/bracket_parse.py
@@ -8,7 +8,7 @@
import sys
-from nltk.tree import bracket_parse, Tree
+from nltk.tree import Tree
from util import *
from api import *
@@ -75,14 +75,15 @@ def _normalize(self, t):
def _parse(self, t):
try:
- return bracket_parse(self._normalize(t))
+ return Tree.parse(self._normalize(t))
+
except ValueError, e:
sys.stderr.write("Bad tree detected; trying to recover...\n")
# Try to recover, if we can:
if e.args == ('mismatched parens',):
for n in range(1, 5):
try:
- v = bracket_parse(self._normalize(t+')'*n))
+ v = Tree.parse(self._normalize(t+')'*n))
sys.stderr.write(" Recovered by adding %d close "
"paren(s)\n" % n)
return v
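
The hunk above swaps the deprecated bracket_parse helper for the Tree.parse classmethod. A minimal sketch of the replacement call (the example tree is illustrative):

    from nltk.tree import Tree

    t = Tree.parse('(S (NP (DT the) (NN dog)) (VP (VBD barked)))')
    print t.node         # 'S' -- in NLTK 2.x the label is stored in .node
    print t[0].leaves()  # ['the', 'dog']
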
6 nltk/sem/chat80.py
@@ -403,7 +403,7 @@ def cities2table(filename, rel_name, dbname, verbose=False, setup=False):
cur.close()
except ImportError:
import warnings
- warnings.warn("To run this function, first install pysqlite.")
+ warnings.warn("To run this function, first install pysqlite, or else use Python 2.5 or later.")
def sql_query(dbname, query):
"""
@@ -423,7 +423,7 @@ def sql_query(dbname, query):
return cur.execute(query)
except ImportError:
import warnings
- warnings.warn("To run this function, first install pysqlite.")
+ warnings.warn("To run this function, first install pysqlite, or else use Python 2.5 or later.")
raise
def _str2records(filename, rel):
@@ -780,7 +780,7 @@ def sql_demo():
print row
except ImportError:
import warnings
- warnings.warn("To run the SQL demo, first install pysqlite.")
+ warnings.warn("To run the SQL demo, first install pysqlite, or else use Python 2.5 or later.")
if __name__ == '__main__':
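
The reworded warnings refer to the usual import fallback for SQLite on older interpreters; a sketch of that pattern (assuming the standard pysqlite2 package name):

    try:
        import sqlite3                           # standard library on Python 2.5+
    except ImportError:
        from pysqlite2 import dbapi2 as sqlite3  # third-party pysqlite elsewhere
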
8 nltk/test/ccg.doctest
@@ -196,12 +196,12 @@ Note that while the two derivations are different, they are semantically equival
(((S\NP)/NP)\.,((S\NP)/NP))
-----------------------------------------------------------------------<
((S\NP)/NP)
+ ------------------------------------------------------------------------------->B
+ ((S\NP)/N)
------------------------------------->
(N\.,N)
------------------------------------------------<
N
- -------------------------------------------------------->
- NP
------------------------------------------------------------------------------------------------------------------------------->
(S\NP)
-----------------------------------------------------------------------------------------------------------------------------------<
@@ -216,12 +216,12 @@ Note that while the two derivations are different, they are semantically equival
(((S\NP)/NP)\.,((S\NP)/NP))
-----------------------------------------------------------------------<
((S\NP)/NP)
- ------------------------------------------------------------------------------->B
- ((S\NP)/N)
------------------------------------->
(N\.,N)
------------------------------------------------<
N
+ -------------------------------------------------------->
+ NP
------------------------------------------------------------------------------------------------------------------------------->
(S\NP)
-----------------------------------------------------------------------------------------------------------------------------------<
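
The two hunks above only reorder derivation lines between the expected outputs. For orientation, a hedged sketch of driving the CCG chart parser that prints such derivations (toy lexicon; names as in nltk.ccg of this vintage):

    from nltk.ccg import chart, lexicon

    lex = lexicon.parseLexicon('''
        :- S, NP
        Pro :: NP
        IntransV :: S\\NP
        I => Pro
        sleep => IntransV
        ''')
    parser = chart.CCGChartParser(lex, chart.DefaultRuleSet)
    for t in parser.nbest_parse('I sleep'.split()):
        chart.printCCGDerivation(t)
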
3  nltk/test/chat80.doctest
@@ -199,9 +199,8 @@ to SQL:
Given this grammar, we can express, and then execute, queries in English.
- >>> from nltk.parse import load_earley
>>> from string import join
- >>> cp = load_earley('grammars/book_grammars/sql0.fcfg')
+ >>> cp = nltk.data.load('grammars/book_grammars/sql0.fcfg')
>>> query = 'What cities are in China'
>>> trees = cp.nbest_parse(query.split())
>>> answer = trees[0].node['SEM']
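
For context, a hedged sketch of how this doctest typically continues: the SEM feature is joined into a SQL string and executed via chat80.sql_query (seen in the chat80.py hunk above); the database path follows the book's sample data and may differ locally:

    import nltk
    from string import join
    from nltk.sem import chat80

    cp = nltk.data.load('grammars/book_grammars/sql0.fcfg')
    trees = cp.nbest_parse('What cities are in China'.split())
    q = join(trees[0].node['SEM'])
    rows = chat80.sql_query('corpora/city_database/city.db', q)
    for r in rows:
        print r[0],   # city names matching the query
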
2  nltk/test/probability.doctest
@@ -65,7 +65,7 @@ from the whole corpus, not just the training corpus
>>> symbols = list(set([word for sent in corpus for (word,tag) in sent]))
>>> print len(symbols)
1464
- >>> trainer = nltk.HiddenMarkovModelTrainer(tag_set, symbols)
+ >>> trainer = nltk.tag.HiddenMarkovModelTrainer(tag_set, symbols)
We divide the corpus into 90% training and 10% testing
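
A hedged sketch of the corrected reference in context: the trainer is reached via nltk.tag, and train_supervised is its supervised entry point (tag_set, symbols, and train_corpus stand in for the data prepared in the doctest above):

    import nltk

    # tag_set, symbols and train_corpus are as prepared in the doctest.
    trainer = nltk.tag.HiddenMarkovModelTrainer(tag_set, symbols)
    hmm = trainer.train_supervised(train_corpus)
    print hmm.tag('the quick brown fox'.split())
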
22 nltk/test/tree.doctest
@@ -158,26 +158,26 @@ then it simply delegates to `Tree.parse()`.
Trees can be compared for equality:
- >>> tree == bracket_parse(str(tree))
+ >>> tree == Tree.parse(str(tree))
True
- >>> tree2 == bracket_parse(str(tree2))
+ >>> tree2 == Tree.parse(str(tree2))
True
>>> tree == tree2
False
- >>> tree == bracket_parse(str(tree2))
+ >>> tree == Tree.parse(str(tree2))
False
- >>> tree2 == bracket_parse(str(tree))
+ >>> tree2 == Tree.parse(str(tree))
False
- >>> tree != bracket_parse(str(tree))
+ >>> tree != Tree.parse(str(tree))
False
- >>> tree2 != bracket_parse(str(tree2))
+ >>> tree2 != Tree.parse(str(tree2))
False
>>> tree != tree2
True
- >>> tree != bracket_parse(str(tree2))
+ >>> tree != Tree.parse(str(tree2))
True
- >>> tree2 != bracket_parse(str(tree))
+ >>> tree2 != Tree.parse(str(tree))
True
>>> tree < tree2 or tree > tree2
@@ -567,7 +567,7 @@ variable:
Define a helper function to create new parented trees:
>>> def make_ptree(s):
- ... ptree = ParentedTree.convert(bracket_parse(s))
+ ... ptree = ParentedTree.convert(Tree.parse(s))
... all_ptrees.extend(t for t in ptree.subtrees()
... if isinstance(t, Tree))
... return ptree
@@ -838,7 +838,7 @@ variable:
Define a helper function to create new parented trees:
>>> def make_mptree(s):
- ... mptree = MultiParentedTree.convert(bracket_parse(s))
+ ... mptree = MultiParentedTree.convert(Tree.parse(s))
... all_mptrees.extend(t for t in mptree.subtrees()
... if isinstance(t, Tree))
... return mptree
@@ -1126,6 +1126,6 @@ This used to cause an infinite loop (fixed in svn 6269):
This used to discard the ``(B b)`` subtree (fixed in svn 6270):
- >>> print bracket_parse('((A a) (B b))')
+ >>> print Tree.parse('((A a) (B b))')
( (A a) (B b))
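
The equality tests above all rest on the same round trip; a minimal sketch:

    from nltk.tree import Tree

    tree = Tree.parse('(S (NP I) (VP (V saw) (NP him)))')
    copy = Tree.parse(str(tree))
    print tree == copy, tree is copy   # True False: equal but distinct objects
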
2  nltk/test/treetransforms.doctest
@@ -11,7 +11,7 @@ Unit tests for the TreeTransformation class
>>> sentence = "(TOP (S (S (VP (VBN Turned) (ADVP (RB loose)) (PP (IN in) (NP (NP (NNP Shane) (NNP Longman) (POS 's)) (NN trading) (NN room))))) (, ,) (NP (DT the) (NN yuppie) (NNS dealers)) (VP (AUX do) (NP (NP (RB little)) (ADJP (RB right)))) (. .)))"
- >>> tree = bracket_parse(sentence)
+ >>> tree = Tree.parse(sentence)
>>> print tree
(TOP
(S
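
A hedged sketch of what this doctest exercises once the tree is parsed: the in-place CNF transform from nltk.treetransforms (example tree shortened for illustration):

    from nltk import treetransforms
    from nltk.tree import Tree

    tree = Tree.parse('(S (NP (DT the) (NN dog)) (VP (VBD barked) (ADVP (RB loudly))))')
    treetransforms.chomsky_normal_form(tree)   # binarizes the tree in place
    print tree
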
46 nltk/test/wordnet.doctest
@@ -171,13 +171,13 @@ The old behavior can be achieved by setting simulate_root to be False.
A score of 1 represents identity i.e. comparing a sense with itself
will return 1.
- >>> dog.path_similarity(cat)
+ >>> dog.path_similarity(cat) # doctest: +ELLIPSIS
0.2...
- >>> hit.path_similarity(slap)
+ >>> hit.path_similarity(slap) # doctest: +ELLIPSIS
0.142...
- >>> wn.path_similarity(hit, slap)
+ >>> wn.path_similarity(hit, slap) # doctest: +ELLIPSIS
0.142...
>>> print hit.path_similarity(slap, simulate_root=False)
@@ -194,13 +194,13 @@ of the taxonomy in which the senses occur. The relationship is given
as -log(p/2d) where p is the shortest path length and d the taxonomy
depth.
- >>> dog.lch_similarity(cat)
+ >>> dog.lch_similarity(cat) # doctest: +ELLIPSIS
2.028...
- >>> hit.lch_similarity(slap)
+ >>> hit.lch_similarity(slap) # doctest: +ELLIPSIS
1.312...
- >>> wn.lch_similarity(hit, slap)
+ >>> wn.lch_similarity(hit, slap) # doctest: +ELLIPSIS
1.312...
>>> print hit.lch_similarity(slap, simulate_root=False)
@@ -225,7 +225,7 @@ shortest path to the root node is the longest will be selected. Where
the LCS has multiple paths to the root, the longer path is used for
the purposes of the calculation.
- >>> dog.wup_similarity(cat)
+ >>> dog.wup_similarity(cat) # doctest: +ELLIPSIS
0.857...
>>> hit.wup_similarity(slap)
@@ -263,9 +263,9 @@ information content, the result is dependent on the corpus used to
generate the information content and the specifics of how the
information content was created.
- >>> dog.res_similarity(cat, brown_ic)
+ >>> dog.res_similarity(cat, brown_ic) # doctest: +ELLIPSIS
7.911...
- >>> dog.res_similarity(cat, genesis_ic)
+ >>> dog.res_similarity(cat, genesis_ic) # doctest: +ELLIPSIS
7.204...
``synset1.jcn_similarity(synset2, ic):``
@@ -275,9 +275,9 @@ Information Content (IC) of the Least Common Subsumer (most specific
ancestor node) and that of the two input Synsets. The relationship is
given by the equation 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).
- >>> dog.jcn_similarity(cat, brown_ic)
+ >>> dog.jcn_similarity(cat, brown_ic) # doctest: +ELLIPSIS
0.449...
- >>> dog.jcn_similarity(cat, genesis_ic)
+ >>> dog.jcn_similarity(cat, genesis_ic) # doctest: +ELLIPSIS
0.285...
``synset1.lin_similarity(synset2, ic):``
@@ -287,7 +287,7 @@ Information Content (IC) of the Least Common Subsumer (most specific
ancestor node) and that of the two input Synsets. The relationship is
given by the equation 2 * IC(lcs) / (IC(s1) + IC(s2)).
- >>> dog.lin_similarity(cat, semcor_ic)
+ >>> dog.lin_similarity(cat, semcor_ic) # doctest: +ELLIPSIS
0.886...
@@ -405,7 +405,7 @@ Bug 160: wup_similarity breaks when the two synsets have no common hypernym
>>> t = wn.synsets('picasso')[0]
>>> m = wn.synsets('male')[1]
- >>> t.wup_similarity(m)
+ >>> t.wup_similarity(m) # doctest: +ELLIPSIS
0.631...
>>> t = wn.synsets('titan')[1]
@@ -418,14 +418,14 @@ Bug 21: "instance of" not included in LCS (very similar to bug 160)
>>> a = wn.synsets("writings")[0]
>>> b = wn.synsets("scripture")[0]
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
- >>> a.jcn_similarity(b, brown_ic)
+ >>> a.jcn_similarity(b, brown_ic) # doctest: +ELLIPSIS
0.175...
Bug 221: Verb root IC is zero
>>> from nltk.corpus.reader.wordnet import information_content
>>> s = wn.synsets('say', wn.VERB)[0]
- >>> information_content(s, brown_ic)
+ >>> information_content(s, brown_ic) # doctest: +ELLIPSIS
4.623...
Bug 161: Comparison between WN keys/lemmas should not be case sensitive
@@ -451,7 +451,7 @@ Bug 382: JCN Division by zero error
>>> shlep = wn.synset('shlep.v.02')
>>> from nltk.corpus import wordnet_ic
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
- >>> tow.jcn_similarity(shlep, brown_ic)
+ >>> tow.jcn_similarity(shlep, brown_ic) # doctest: +ELLIPSIS
1...e+300
Bug 428: Depth is zero for instance nouns
@@ -473,7 +473,7 @@ Bug 470: shortest_path_distance ignored instance hypernyms
>>> google = wordnet.synsets("google")[0]
>>> earth = wordnet.synsets("earth")[0]
- >>> google.wup_similarity(earth)
+ >>> google.wup_similarity(earth) # doctest: +ELLIPSIS
0.1...
Bug 484: similarity metrics returned -1 instead of None for no LCS
@@ -505,17 +505,17 @@ Bug 482: Some nouns not being lemmatised by WordNetLemmatizer().lemmatize
Bug 284: instance hypernyms not used in similarity calculations
- >>> wn.synset('john.n.02').lch_similarity(wn.synset('dog.n.01'))
+ >>> wn.synset('john.n.02').lch_similarity(wn.synset('dog.n.01')) # doctest: +ELLIPSIS
1.335...
- >>> wn.synset('john.n.02').wup_similarity(wn.synset('dog.n.01'))
+ >>> wn.synset('john.n.02').wup_similarity(wn.synset('dog.n.01')) # doctest: +ELLIPSIS
0.571...
- >>> wn.synset('john.n.02').res_similarity(wn.synset('dog.n.01'), brown_ic)
+ >>> wn.synset('john.n.02').res_similarity(wn.synset('dog.n.01'), brown_ic) # doctest: +ELLIPSIS
2.224...
- >>> wn.synset('john.n.02').jcn_similarity(wn.synset('dog.n.01'), brown_ic)
+ >>> wn.synset('john.n.02').jcn_similarity(wn.synset('dog.n.01'), brown_ic) # doctest: +ELLIPSIS
0.075...
- >>> wn.synset('john.n.02').lin_similarity(wn.synset('dog.n.01'), brown_ic)
+ >>> wn.synset('john.n.02').lin_similarity(wn.synset('dog.n.01'), brown_ic) # doctest: +ELLIPSIS
0.252...
- >>> wn.synset('john.n.02').hypernym_paths()
+ >>> wn.synset('john.n.02').hypernym_paths() # doctest: +ELLIPSIS
[[Synset('entity.n.01'), ..., Synset('john.n.02')]]
Issue 541: add domains to wordnet
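
Most hunks above only add the +ELLIPSIS doctest directive: floating-point reprs differ across platforms, so the expected values keep stable leading digits and match the rest with "...". A minimal sketch of one of the calls involved:

    from nltk.corpus import wordnet as wn

    dog = wn.synset('dog.n.01')
    cat = wn.synset('cat.n.01')
    print dog.path_similarity(cat)   # about 0.2; trailing digits vary by platform
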