# ASER-like normalization example

This example demonstrates ASER-like normalization, which normalizes personal words (words that refer to people) in a piece of free text.

In [1]:
import normalization
from corenlp_utils import get_corenlp_client, parse_sentence

In [2]:
nmlzer = normalization.ParsingBasedNormalizer()

##### Start the Stanford CoreNLP server

First, start the Stanford CoreNLP server, which is used to parse the text. You can do this by running `start_corenlp_server.sh`.

Once the server has started, run the following code to process sentences.  
Make sure the path and port below match those in your `start_corenlp_server.sh`.
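
Before creating the client, it can help to confirm that something is actually listening on the configured port. The snippet below is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper (not part of `corenlp_utils`), and it assumes the server runs on `localhost`. It only checks that the TCP port accepts connections, not that the process behind it is CoreNLP.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# 10086 is the port used in this notebook; "localhost" is an assumption.
print("server reachable:", is_port_open("localhost", 10086))
```

If this prints `False`, check that `start_corenlp_server.sh` is still running and uses the same port.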

In [3]:
# Stanford CoreNLP path and port.
# For instance, download Stanford CoreNLP from https://nlp.stanford.edu/software/stanford-corenlp-latest.zip and extract it;
# the path can then be set to the extracted folder (/path/to/stanford-corenlp-4.4.0).
STANFORD_CORENLP_PATH = "/home/ubuntu/stanfordcorenlp/stanford-corenlp-4.4.0"
STANFORD_CORENLP_PORT = 10086

In [4]:
annotators = ["tokenize", "ssplit", "pos", "lemma", "ner", "parse"]
# annotators = ["tokenize", "ssplit", "pos", "lemma", "ner", "parse", "coref"]
corenlp_client, _ = get_corenlp_client(
    corenlp_path=STANFORD_CORENLP_PATH, corenlp_port=STANFORD_CORENLP_PORT, annotators=annotators
)



In [5]:
doc = "He knows his weakness."
res = parse_sentence(doc, corenlp_client, annotators)
parsed_result = res['parsed_info']

In [6]:
parsed_result

[{'text': 'He knows his weakness.',
  'words': ['he', 'know', 'he', 'weakness', '.'],
  'pos_tags': ['PRP', 'VBZ', 'PRP$', 'NN', '.'],
  'dependencies': [((1, 'knows', 'VBZ'), 'nsubj', (0, 'He', 'PRP')),
   ((1, 'knows', 'VBZ'), 'obj', (3, 'weakness', 'NN')),
   ((1, 'knows', 'VBZ'), 'punct', (4, '.', '.')),
   ((3, 'weakness', 'NN'), 'nmod:poss', (2, 'his', 'PRP$'))],
  'lemmas': ['he', 'know', 'he', 'weakness', '.'],
  'ners': ['O', 'O', 'O', 'O', 'O'],
  'mentions': {},
  'parse': '(ROOT (S (NP (PRP He)) (VP (VBZ knows) (NP (PRP$ his) (NN weakness))) (. .)))'}]
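
Judging from the output above, each entry in `dependencies` appears to be a triple of (head, relation, dependent), where each node is an `(index, word, pos_tag)` tuple with 0-based token indices. The following sketch, which copies the list from the output rather than calling the parser, groups dependents by their head; the variable name `children` is illustrative only.

```python
from collections import defaultdict

# Dependency triples copied from the parsed output above.
dependencies = [
    ((1, 'knows', 'VBZ'), 'nsubj', (0, 'He', 'PRP')),
    ((1, 'knows', 'VBZ'), 'obj', (3, 'weakness', 'NN')),
    ((1, 'knows', 'VBZ'), 'punct', (4, '.', '.')),
    ((3, 'weakness', 'NN'), 'nmod:poss', (2, 'his', 'PRP$')),
]

# Map each head token index to its (relation, dependent index, dependent word) entries.
children = defaultdict(list)
for (head_idx, _head_word, _head_pos), rel, (dep_idx, dep_word, _dep_pos) in dependencies:
    children[head_idx].append((rel, dep_idx, dep_word))

for head_idx in sorted(children):
    print(head_idx, children[head_idx])
```

Here token 1 ("knows") governs the subject "He", the object "weakness", and the final punctuation, while token 3 ("weakness") governs the possessive "his".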

Finally, let's run the normalization.  
The returned `person_spans` contains all spans identified as a person or people.  
`coref` contains the coreference relationships between those personal spans.  
For instance, in the following example there are two mentions of `P0`: "he" at (0, 1) and "his" at (2, 3).

In [7]:
for info in parsed_result:
    person_spans = nmlzer.get_personal_words(info)
    coref = nmlzer.node_person_coref(person_spans, info)
    print(info['text'])
    print('person mentions:', person_spans)
    print('coreference:', coref)
    print()

He knows his weakness.
person mentions: [(((0, 1), {'target': [0], 'target_word': ['he']}, ['he']), 'default'), (((2, 3), {'target': [2], 'target_word': ['he']}, ['he']), 'possessive')]
coreference: {'persons': {'P0': [(((0, 1), {'target': [0], 'target_word': ['he']}, ['he']), 'default'), (((2, 3), {'target': [2], 'target_word': ['he']}, ['he']), 'possessive')]}, 'subset': []}
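
To make the nested result easier to read, the sketch below flattens `coref['persons']` into one row per mention. It works on a copy of the output printed above and does not call the normalizer. Two assumptions are made here: spans are read as half-open token ranges `(start, end)`, which matches this example, and `tokens` is a hand-written list of the surface tokens of the sentence.

```python
# Coreference result copied from the output above.
coref = {
    'persons': {
        'P0': [
            (((0, 1), {'target': [0], 'target_word': ['he']}, ['he']), 'default'),
            (((2, 3), {'target': [2], 'target_word': ['he']}, ['he']), 'possessive'),
        ]
    },
    'subset': [],
}

# Surface tokens of "He knows his weakness." (written by hand for illustration).
tokens = ['He', 'knows', 'his', 'weakness', '.']

rows = []
for person_id, mentions in coref['persons'].items():
    for ((start, end), _details, _lemmas), mention_type in mentions:
        # Assumption: the span is half-open, so tokens[start:end] is the mention.
        surface = ' '.join(tokens[start:end])
        rows.append((person_id, (start, end), surface, mention_type))

for row in rows:
    print(row)
```

Both rows share the identifier `P0`, which is how the result records that "He" and "his" refer to the same person.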

