week 5 assignment

jdcheesman committed Apr 30, 2012
1 parent 7bfe8f8 commit b65f622e983581379f4f89f654097bcd2716742c
Binary file not shown.
@@ -0,0 +1,73 @@
+We've provided you with three basic tools for developing your PCFG. All of the
+tools revolve around flat, tab-separated grammar files (ending in .gr).
+Lines beginning with the # symbol are considered comments. These files contain
+rules of the form:
+ weight <tab> parent <tab> child1 <space> child2 ...
+We've given you a set of grammar files to get you started, which you should
+modify as you please. They have the following organization:
+ # a baseline grammar which can generate any sentence over the vocab
+ # a starter grammar which contains the top level smoothing rules and
+ # gives a few simple example rules
+ # a grammar describing the part(s) of speech for each term in the vocab
+PCFG Parser
+This program takes in a sentence file and a sequence of grammar files and parses
+each of the sentences with the grammar. It will print out the maximum probability
+parse tree and the log probability for that parse. It also computes the perplexity
+of your grammar on the given sentence file. Because your grade will be a function
+of your perplexity on the dev and test data sets, this is going to be the key tool
+for evaluating your grammar. To invoke the PCFGParser, call:
+ java -jar pcfg.jar parse dev.sen *.gr
+To print the parse trees in Penn Treebank format, use the -t option:
+ java -jar pcfg.jar parse -t dev.sen *.gr
+PCFG Generator
+This program takes a sequence of grammar files and samples sentences from the
+distribution of sentences described by your PCFG. This can be useful for finding
+glaring errors in your grammar, or undesirable biases. To generate the default
+number of sentences, 20, you can just call:
+ java -jar pcfg.jar generate *.gr
+or to generate an arbitrary number of sentences, use the -n option:
+ java -jar pcfg.jar generate -n 100 *.gr
+and to print the parse trees along with the actual sentences, use the -t option (first):
+ java -jar pcfg.jar generate -t -n 100 *.gr
+Validate Grammar
+This program checks the terminals of your grammar against a hard-coded list
+of allowed words (also given in the allowed words file). This is useful for making
+sure that you haven't created any non-terminals which never generate a terminal.
+While you won't be explicitly penalized for such mistakes, they will only hurt your
+perplexity because they will hold out probability for symbols which never actually
+occur in the dev or test set. It also makes sure that you have some rule which generates
+every word in the list of allowed words. The starter grammar files already satisfy this,
+but this will show you if you accidentally change things for the worse. This utility
+operates somewhat like Unix diff, where the first file is implicitly the list of allowed
+words. Specifically, it will print words in either set difference marked with the following
+ java -jar pcfg.jar validate *.gr
+To submit, run the following:
+ java -jar pcfg.jar submit *.gr
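The rule format described in the README above (weight, tab, parent, tab, space-separated children, with # starting a comment line) can be read with a few lines of Python. This is only a sketch for inspecting your own .gr files, not the loader inside pcfg.jar; the names `Rule` and `parse_rule_line` are hypothetical, and it assumes comments occupy whole lines.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    """One grammar rule (hypothetical helper type, not part of pcfg.jar)."""
    weight: float
    parent: str
    children: tuple


def parse_rule_line(line: str) -> Optional[Rule]:
    """Parse one line of a .gr file.

    Returns None for blank lines and for lines starting with '#'.
    Assumes the layout: weight <tab> parent <tab> child1 <space> child2 ...
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # Split on the first two tabs only; the rest is the child list.
    weight, parent, children = stripped.split("\t", 2)
    return Rule(float(weight), parent.strip(), tuple(children.split()))
```

For example, `parse_rule_line("5\tVP\tVerbT NP")` yields a rule with weight 5.0, parent `VP`, and children `("VerbT", "NP")`. A line with fewer than three tab-separated fields raises `ValueError`, which is a cheap way to spot rules that were typed with spaces instead of tabs.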
@@ -0,0 +1,130 @@
+# The start symbol is START.
+# These two rules are required; choose their weights carefully!
+60 START S1
+40 START S2
+5 S1 NP AnyVP EOS
+1 S1 NP Conj NP AnyVP EOS
+5 S1 Q EOS
+1 S1 WhPronoun Q ?
+1 S1 WhAdv Q ?
+1 S1 Num NounP VPP EOS
+1 S1 Conj 3PS Conj NP Does EOS
+1 S1 Conj NP Conj NP VP EOS
+1 S1 NP Passive With EOS
+1 S1 NP Conj NP VerbBase Adv EOS
+1 S1 NP VP Pause NP EOS
+1 S1 NP Pause NPP Pause Is NP EOS
+5 VP VerbT NP
+5 VP VerbTPS NP
+5 VP VerbT Adj
+5 VP VerbPastTense NP
+5 VP Modal VerbBase NP
+5 VP Modal VerbBase PP
+5 VP Modal VerbBase PresentParticiples
+5 VP Modal Have Been PresentParticiples
+5 VP Modal VerbBase PresentParticiples With
+5 VP Modal VerbBase PastParticiple NP
+5 VP Modal VerbBase PastParticiple
+5 VP Modal VerbBase PastParticiple With
+5 VP Is Loc
+5 VPP VerbBase NPP
+5 VPP VerbBase Adj
+5 VPP VerbPastTense NPP
+5 VPP Modal VerbBase NPP
+5 VPP Modal Have Been PresentParticiples
+1 VPP Modal VerbBase PastParticiple NPP
+5 VPP Are Loc
+20 3PS NP VerbTPS
+5 3PS NP VerbPastTense
+5 3PP NPP VerbBase
+5 3PP NPP VerbPastTense
+1 Any3P 3PS
+1 Any3P 3PP
+1 Passive Is PastParticiple
+1 Passive Was PastParticiple
+1 Passive Modal Have Been PastParticiple
+1 PassiveP Are PastParticiple
+1 PassiveP Were PastParticiple
+1 Q Do NPP VerbBase
+1 Q Do NPP VerbBase AnyNP
+1 Q Do NPP VerbBase AnyVP
+1 Q Do NPP VerbBase PersonalPronoun VerbBase
+1 Q Does NP VerbBase
+1 Q Does NP VerbBase AnyVP
+1 Q Does NP VerbBase AnyNP
+1 Q Does NP VerbBase PersonalPronoun VerbBase
+5 Q Are PersonalPronounP PresentParticiples NP Travel
+1 Q Are PersonalPronounP PresentParticiples NP TravelPT
+1 With Prep AnyNP
+20 NP Det Nbar
+20 NP Det Noun
+5 NP Num Noun
+5 NP Adj Noun
+5 NP Det Adj Noun
+1 NP Proper
+1 NP Det Proper
+1 NP Det PNP
+10 NP Nbar
+20 NP Det Adj Nbar
+1 NP Proper Conj Proper
+1 NP It Is NP Who
+20 Nbar Noun
+1 Nbar Nbar PP
+20 NPP DetP NbarP
+20 NPP NbarP
+20 NPP Num NbarP
+1 NPP They Are NbarP Who
+20 NbarP NounP
+1 NbarP NbarP PP
+20 PPP PresentParticiples To Places
+1 PPP PresentParticiples To VerbBase
+1 PPP PresentParticiples Adv To VerbBase
+5 Travel TravelVerb To Places
+5 TravelPT TravelVerbPT To Places
+1 Loc Prep Det Places
+1 Loc Prep Det PNP
+1 PP Prep NP
+1 PP Prep NPP
+1 PP Prep PersonalPronoun
+1 PP Prep PersonalPronounP
+1 AnyVP VP
+1 AnyVP VPP
+1 AnyNP NP
+1 AnyNP NPP
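The weights in the grammar above are relative, not probabilities. As a rough sketch, assuming the tool normalizes weights across all rules that share a parent (the usual PCFG convention; the README does not state this explicitly), the implied rule probabilities can be computed as follows. The function name `rule_probabilities` is hypothetical.

```python
from collections import defaultdict


def rule_probabilities(rules):
    """Turn (weight, parent, children) rules into per-parent probabilities.

    Assumption: weights are normalized over all rules with the same parent.
    Duplicate rules have their probabilities summed.
    """
    totals = defaultdict(float)
    for weight, parent, _ in rules:
        totals[parent] += weight

    probs = defaultdict(float)
    for weight, parent, children in rules:
        probs[(parent, children)] += weight / totals[parent]
    return dict(probs)


# A few rules copied from the grammar file above.
rules = [
    (60.0, "START", ("S1",)),
    (40.0, "START", ("S2",)),
    (1.0, "AnyNP", ("NP",)),
    (1.0, "AnyNP", ("NPP",)),
]
probs = rule_probabilities(rules)
```

Under that assumption, the 60/40 split on START sends 60% of the probability mass to S1 and 40% to S2, and each AnyNP expansion gets one half. The normalization is per parent, so raising every weight under one parent by the same factor changes nothing.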