Merge pull request #25 from philip-schrodt/master
added null_verbs/actors options to config.ini and updated documentati…
Philip Schrodt committed Jul 30, 2016
2 parents f4ca89e + e83d97a commit 3589e15
Showing 5 changed files with 26 additions and 14 deletions.
Binary file modified Petrarch2.pdf
Binary file not shown.
7 changes: 4 additions & 3 deletions Petrarch2.tex
@@ -423,8 +423,9 @@ \subsection{\texttt{--nullverbs} and \texttt{--nullactors}: Identifying verb and
These are complementary command line options intended for use in dictionary development:

\begin{itemize}
\item \texttt{--nullverbs} (or \texttt{-nv}) produces a file of verb phrases which have a valid source and target but which are not in the dictionary
\item \texttt{--nullactors} (or \texttt{-na}) produces a file of noun phrases which are associated with a codable verb phrase but which are not in the dictionary
\item \texttt{--nullverbs} (or \texttt{-nv}) produces a file of verb phrases which have a valid source and target but which are not in the dictionary. This can also be invoked using the option \texttt{null\_verbs = True} in the \textit{config.ini} file.
\item \texttt{--nullactors} (or \texttt{-na}) produces a file of noun phrases which are associated with a codable verb phrase but which are not in the dictionary. This can also be invoked using the option \texttt{null\_actors = True} and setting \texttt{new\_actor\_length} \textgreater 0 in the \textit{config.ini} file.

\end{itemize}

In other words, \texttt{--nullverbs} shows unrecorded actions that are occurring between known actors, and \texttt{--nullactors} shows unrecorded actors that are engaging in known behaviors. With both options, no events are generated and instead the output file is a series of JSON records which can be read as Python dictionaries: examples are given below.
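
As a rough illustration of the last sentence above, the null-coding output could be loaded along these lines. This is only a sketch: it assumes one JSON record per line, which this commit does not confirm, and the sample records are invented. Only the `evtcode` and `evttext` keys come from the documentation changed in this commit (they are described under `--nullactors`); `read_null_records` is a hypothetical helper, not part of PETRARCH2.

```python
import json

# Hypothetical sample output; the code values are made up and real
# records are likely to contain more fields than shown here.
SAMPLE_OUTPUT = """\
{"evtcode": "112", "evttext": "<*2*> ... contradicted by <*3*>"}
{"evtcode": "010", "evttext": "said"}
"""


def read_null_records(text):
    """Parse JSON records (assumed: one per line) into Python dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = read_null_records(SAMPLE_OUTPUT)
for rec in records:
    # Each record is an ordinary dict, so fields are accessed by key.
    print(rec["evtcode"], rec["evttext"])
```

If the real output file uses a different layout (for example, records spanning several lines), the line-splitting step would need to change accordingly.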
@@ -497,7 +498,7 @@ \subsubsection{\texttt{--nullactors}}
\noindent \textbf{Implementation notes:}
\begin{itemize}
\setlength{\itemsep}{4pt}
\item The required integer following the command sets the \texttt{new\_actor\_length} \\parameter---the maximum number of words that a new actor can have---and overrides any settings in the \texttt{config.ini} file. Large values of this will give add a lot of extended noun phrases that probably aren't named-entities into the output; small values run the risk that a named entity preceded by several adjectives---e.g. \textit{``The beleaguered Japanese Prime Minister Shinzo Abe''}---will be missed. So, adjust this value based on your source texts, experience, and post-filtering programs.
\item If set in the command line, the required integer following the command sets the \texttt{new\_actor\_length} parameter---the maximum number of words that a new actor can have---and overrides any settings in the \texttt{config.ini} file (if set in the \texttt{config.ini} file, that value will be used). Large values of this will give add a lot of extended noun phrases that probably aren't named-entities into the output; small values run the risk that a named entity preceded by several adjectives---e.g. \textit{``The beleaguered Japanese Prime Minister Shinzo Abe''}---will be missed. So, adjust this value based on your source texts, experience, and post-filtering programs: values of 4 to 8 will usually work for typical English-language names.
\item If a source or target is in the dictionary, the phrase will be followed by the code in brackets \texttt{[\ldots]}
\item \texttt{"evtcode"} is the CAMEO code for the verb phrase; \texttt{"evttext"} is the text which generated this code.
\item The actors not in the dictionary are assigned temporary codes of the form \texttt{"*i*"} where $i$ is an integer. These will show up in agent code such as \texttt{"*4*GOV"} (instead of \texttt{"- - -GOV"}) and in the source and target markers in the verb text, e.g. \texttt{"<*2*> ... contradicted by <*3*>"}. See Section \ref{sssec:writeeventtext} on the construction of event texts.
2 changes: 1 addition & 1 deletion petrarch2/PETRglobals.py
@@ -54,10 +54,10 @@
# NULL CODING OPTIONS
NullVerbs = False # Only get verb phrases that are not in the dictionary but are associated with coded noun phrases
NullActors = False # Only get actor phrases that are not in the dictionary but associated with coded verb phrases
NewActorLength = 0 # Maximum length for new actors extracted from noun phrases

# CODING OPTIONS
# Defaults are more or less equivalent to TABARI
NewActorLength = 0 # Maximum length for new actors extracted from noun phrases
RequireDyad = True # Events require a non-null source and target
StoponError = False # Raise stop exception on errors rather than recovering

4 changes: 3 additions & 1 deletion petrarch2/PETRreader.py
@@ -179,10 +179,12 @@ def get_config_boolean(optname):
raise
print("new_actor_length =", PETRglobals.NewActorLength)

PETRglobals.StoponError = get_config_boolean('stop_on_error')
PETRglobals.StoponError = get_config_boolean('stop_on_error')
PETRglobals.WriteActorRoot = get_config_boolean('write_actor_root')
PETRglobals.WriteActorText = get_config_boolean('write_actor_text')
PETRglobals.WriteEventText = get_config_boolean('write_event_text')
PETRglobals.NullVerbs = get_config_boolean('null_verbs')
PETRglobals.NullActors = get_config_boolean('null_actors')

if parser.has_option(
'Options', 'require_dyad'): # this one defaults to True
27 changes: 18 additions & 9 deletions petrarch2/data/config/PETR_config.ini
@@ -59,15 +59,6 @@ code_by_sentence = True
# CODING OPTIONS:
# Defaults are more or less equivalent to TABARI

# new_actor_length: Maximum length for new actors extracted from noun phrases if no
# actor or agent generating a code is found. To disable and just
# use null codes "---", set to zero; this is the default.
# Setting this to a large number will extract anything found in a (NP
# noun phrase, though usually true actors contain a small number of words
# This must be an integer.
new_actor_length = 0


# write_actor_root: If True, the event record will include the text of the actor root:
# The root is the text at the head of the actor synonym set in the
# dictionary. Default is False
Expand All @@ -81,6 +72,24 @@ write_actor_text = True
# the verb phrase that was used to identify the event. Default is False
write_event_text = True

# NULL CODING OPTIONS
# null_verbs: If True, only get verb phrases that are not in the dictionary but are associated
# with coded noun phrases
null_verbs = False

# null_actors: If True, only get actor phrases that are not in the dictionary but associated with
# coded verb phrases. This also requires new_actor_length to be set to a value > 0:
# typically a value of 4 to 8 will give good results.
null_actors = False

# new_actor_length: Maximum length for new actors extracted from noun phrases if no
# actor or agent generating a code is found. To disable and just
# use null codes "---", set to zero; this is the default.
# Setting this to a large number will extract anything found in a (NP
# noun phrase, though usually true actors contain a small number of words
# This must be an integer.
new_actor_length = 0

# require_dyad: Events require a non-null source and target: setting this false is likely
# to result in a very large number of nonsense events. As happened with the
# infamous GDELT data set of 2013-2014. And certainly no one wants to see
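
To make the interaction between the new options concrete, here is a minimal sketch of reading them with the standard-library `configparser`. It is not the code in `PETRreader.py`: the function `read_null_options`, the sample values, and the decision to raise an error when `null_actors` is set without a positive `new_actor_length` are all assumptions made for illustration. The `Options` section name is taken from the `parser.has_option('Options', ...)` call in the hunk above.

```python
import configparser

# Hypothetical config fragment using the option names added in this commit.
SAMPLE_INI = """
[Options]
null_verbs = False
null_actors = True
new_actor_length = 6
"""


def read_null_options(text):
    """Read the null-coding options from an INI string (illustrative only)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)

    def get_bool(name):
        # Assumption: a missing option is treated as False.
        if parser.has_option("Options", name):
            return parser.getboolean("Options", name)
        return False

    opts = {
        "null_verbs": get_bool("null_verbs"),
        "null_actors": get_bool("null_actors"),
        "new_actor_length": parser.getint("Options", "new_actor_length", fallback=0),
    }
    # The config comments say null_actors also needs new_actor_length > 0;
    # raising here is this sketch's choice, not necessarily PETRARCH2's behavior.
    if opts["null_actors"] and opts["new_actor_length"] <= 0:
        raise ValueError("null_actors requires new_actor_length > 0")
    return opts


print(read_null_options(SAMPLE_INI))
```

Checking the two settings together at load time surfaces a misconfiguration early instead of silently producing an empty actor file.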