# Using LX-NER to make a quantitative analysis of a text
This is an example notebook that illustrates how you can use the LX-NER web service to
analyse a text.

**Before you run this example**, replace `access_key_goes_here` below with your web service access key:

In [1]:
LXNER_WS_API_KEY = 'access_key_goes_here'
LXNER_WS_API_URL = 'https://portulanclarin.net/workbench/lx-ner/api/'

## Importing required Python modules
The next cell will take care of installing the `requests` package,
if not already installed, and make it available to use in this notebook.

In [2]:
try:
    import requests
except ImportError:
    # requests is not installed yet: install it, then retry the import
    !pip3 install requests
    import requests
from IPython.display import HTML, display_html

## Wrapping the complexities of the JSON-RPC API in a simple, easy to use function

The `WSException` class defined below will be used later to identify errors
reported by the webservice.

In [3]:
class WSException(Exception):
    'Webservice Exception'
    def __init__(self, errordata):
        "errordata is a dict returned by the webservice with details about the error"
        assert isinstance(errordata, dict)
        self.message = errordata["message"]
        # see https://json-rpc.readthedocs.io/en/latest/exceptions.html for more info
        # about JSON-RPC error codes
        if -32099 <= errordata["code"] <= -32000:  # Server Error
            if errordata["data"]["type"] == "WebServiceException":
                self.message += f": {errordata['data']['message']}"
            else:
                self.message += f": {errordata['data']!r}"
        # pass the final message (not the instance itself) to Exception,
        # so that self.args and repr() are meaningful
        super().__init__(self.message)
    def __str__(self):
        return self.message
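
To make the code range checked in `WSException` more concrete, here is a minimal sketch that maps a JSON-RPC 2.0 error code to the category reserved for it by the JSON-RPC 2.0 specification. The function name `classify_jsonrpc_error` and the fallback label `"Application error"` are hypothetical choices made for this illustration; they are not part of the LX-NER webservice.

```python
def classify_jsonrpc_error(code):
    """Return the JSON-RPC 2.0 category for a numeric error code.

    Hypothetical helper for illustration only; not part of the LX-NER API.
    """
    # error codes pre-defined by the JSON-RPC 2.0 specification
    predefined = {
        -32700: "Parse error",
        -32600: "Invalid Request",
        -32601: "Method not found",
        -32602: "Invalid params",
        -32603: "Internal error",
    }
    if code in predefined:
        return predefined[code]
    # range reserved for implementation-defined server errors
    if -32099 <= code <= -32000:
        return "Server error"
    # anything else is left to the application (label chosen here)
    return "Application error"
```

Only codes in the server error range trigger the extra `data` lookup in `WSException`; for every other code the exception message is just the `message` field.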

The next function invokes the LX-NER webservice through its public JSON-RPC API.

In [4]:
def recognize(text, format):
    '''
    Arguments
        text: a string with a maximum of 4000 characters, Portuguese text, with
             the input to be processed
        format: either "tagged" or "JSON"

    Returns a string or JSON object with the output according to specification in
       https://portulanclarin.net/workbench/lx-ner/
    
    Raises a WSException if an error occurs.
    '''

    request_data = {
        'method': 'recognize',
        'jsonrpc': '2.0',
        'id': 0,
        'params': {
            'text': text,
            'format': format,
            'key': LXNER_WS_API_KEY,
        },
    }
    response = requests.post(LXNER_WS_API_URL, json=request_data)
    response_data = response.json()
    if "error" in response_data:
        raise WSException(response_data["error"])
    else:
        return response_data["result"]
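
Since `recognize` accepts at most 4000 characters per call, longer texts have to be split before they are sent. The sketch below shows one possible way to do this, breaking at line ends where it can. The helper name `split_into_chunks` and its `max_chars` parameter are hypothetical and not part of the webservice; a line longer than the limit is simply cut at the limit, which may split a named entity in two.

```python
def split_into_chunks(text, max_chars=4000):
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: chunks end at line breaks whenever possible.
    """
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # a single line longer than the limit is cut into fixed-size pieces
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # start a new chunk when this line would overflow the current one
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to `recognize` in turn. Joining the chunks back together yields the original text, so no characters are lost in the split.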

## Highlighting recognized entities
Let's define a function to pretty print a text with recognized named entities highlighted:

In [5]:
def print_text_with_nes(paragraphs):
    html = ["<div class=\"ner-output\">"]
    for paragraph in paragraphs:
        html.append("<p>")
        for sentence in paragraph:
            html.append("<span class=\"sentence\">")
            within_ne = False
            within_ne_rb = False
            for token in sentence:
                # ne = named entity recognized with statistical recognizer
                # ne_rb = named entity recognized with rule-based recognizer
                ne, ne_rb = token["ne"], token["ne_rb"]
                if within_ne and not ne.startswith("I-"):
                    # close previous named entity
                    html.append("</span>")
                    within_ne = False
                if within_ne_rb and not ne_rb.startswith("I-"):
                    # close previous rule-based named entity
                    html.append("</span>")
                    within_ne_rb = False
                if ne.startswith("B-"):
                    html.append(f'<span class="ne {ne[2:].lower()}">')
                    within_ne = True
                if ne_rb.startswith("B-"):
                    html.append(f'<span class="ne {ne_rb[2:].lower()}">')
                    within_ne_rb = True
                html.append(token["form"])
                if "R" in token["space"]:
                    html.append(" ")
            if within_ne:
                html.append("</span>")
            if within_ne_rb:
                html.append("</span>")
            html.append("</span>")
        html.append("</p>")
    display_html(HTML("".join(html)))
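
The same `B-`/`I-` tag logic can be used to collect the recognized entities as plain data instead of HTML. The following is a minimal sketch under stated assumptions: the function name `extract_entities` is hypothetical, the sample tokens are hand-written for illustration (they are not real webservice output), and the `"O"` tag for tokens outside any entity is an assumption. Token forms are joined with single spaces, ignoring the `space` field.

```python
def extract_entities(paragraphs, tag_key="ne"):
    """Collect (entity_type, entity_text) pairs from B-/I- tagged tokens.

    tag_key is "ne" (statistical) or "ne_rb" (rule-based).
    """
    entities = []
    for paragraph in paragraphs:
        for sentence in paragraph:
            current_type, current_forms = None, []
            for token in sentence:
                tag = token[tag_key]
                if tag.startswith("I-") and current_type is not None:
                    # continuation of the entity that is currently open
                    current_forms.append(token["form"])
                    continue
                if current_type is not None:
                    # any other tag closes the open entity
                    entities.append((current_type, " ".join(current_forms)))
                    current_type, current_forms = None, []
                if tag.startswith("B-"):
                    current_type, current_forms = tag[2:], [token["form"]]
            if current_type is not None:
                entities.append((current_type, " ".join(current_forms)))
    return entities


# hand-written sample: one paragraph with one sentence (not real service output)
sample = [[[
    {"form": "Stade", "ne": "B-LOC", "ne_rb": "O", "space": "LR"},
    {"form": "de", "ne": "I-LOC", "ne_rb": "O", "space": "LR"},
    {"form": "France", "ne": "I-LOC", "ne_rb": "O", "space": "LR"},
    {"form": "em", "ne": "O", "ne_rb": "O", "space": "LR"},
    {"form": "2016", "ne": "O", "ne_rb": "B-TIMEX", "space": "LR"},
]]]
loc_entities = extract_entities(sample)
time_entities = extract_entities(sample, tag_key="ne_rb")
```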

Let's define a set of CSS rules for color-coding recognized named entities:

In [6]:
display_html(HTML("""<style>
.ne {
    color: #000;
    background-color: #eee;
    margin: 3px;
    padding: 3px 5px;
    border-radius: 3px;
    font-weight: bold;
}
.ne.numex { color: brown; }
.ne.measex { color: blue; }
.ne.timex { color: green; }
.ne.addrex { color: red; }
.ne.per { color: brown; }
.ne.org { color: blue; }
.ne.loc { color: green; }
.ne.evt { color: red; }
.ne.wrk { color: purple; }
.ne.msc { color: orchid; }
.reference {
    float: left;
    padding: 16px;
    border: 1px dotted #aaa;
}
</style>
"""))

The next function will print a reference for the color-coded highlighting of named entities:

In [7]:
def print_color_reference():
    display_html(HTML("""
    <p class="reference">Color coding for recognized named entities:
    <span class="ne numex">number</span>
    <span class="ne measex">measure</span>
    <span class="ne timex">time</span>
    <span class="ne addrex">address</span>
    <span class="ne per">person</span>
    <span class="ne org">organization</span>
    <span class="ne loc">location</span>
    <span class="ne evt">event</span>
    <span class="ne wrk">work</span>
    <span class="ne msc">miscellaneous</span>
    </p>
    """))

print_color_reference()

Next, we use the functions defined above to recognize named entities and pretty-print them as HTML, and then print a reference for the color-coded highlighting:

In [8]:
text = '''
A final do Campeonato Europeu de Futebol de 2016 realizou-se em 10 de julho de 2016 no Stade de France
em Saint-Denis, França. Foi disputada entre Portugal e a França, que era a equipa anfitriã. Os portugueses
ganharam a partida e sagraram-se campeões europeus de futebol. Esta foi a segunda participação numa final
deste campeonato para Portugal e a terceira para a França. Os portugueses haviam participado anteriormente
nas edições de 1984 e em todas as edições desde 1996. O seu melhor resultado anterior foi em 2004, com o
título de vice-campeão. Já os franceses participaram em 1960, 1984 e em todas as edições desde 1992,
tendo-se sagrado campeões nas edições de 1984 e de 2000.
'''
result = recognize(text, format="JSON")
print_text_with_nes(result)
print_color_reference()
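
For a quantitative view of a result like `result` above, one option is to count how many entities of each type were recognized. The sketch below counts every token whose tag starts with `B-`, since each entity has exactly one beginning token. The function name `count_entities_by_type` is hypothetical, and the sample input is hand-written for illustration rather than taken from the webservice.

```python
from collections import Counter


def count_entities_by_type(paragraphs):
    """Count recognized entities per type, over both recognizers."""
    counts = Counter()
    for paragraph in paragraphs:
        for sentence in paragraph:
            for token in sentence:
                # "ne" = statistical recognizer, "ne_rb" = rule-based recognizer
                for key in ("ne", "ne_rb"):
                    tag = token[key]
                    if tag.startswith("B-"):
                        counts[tag[2:]] += 1
    return counts


# hand-written sample with the same token fields used above (not real output)
sample_result = [[[
    {"form": "Portugal", "ne": "B-LOC", "ne_rb": "O", "space": "LR"},
    {"form": "e", "ne": "O", "ne_rb": "O", "space": "LR"},
    {"form": "França", "ne": "B-LOC", "ne_rb": "O", "space": "LR"},
    {"form": "2016", "ne": "O", "ne_rb": "B-TIMEX", "space": "LR"},
]]]
sample_counts = count_entities_by_type(sample_result)
```

With a real response, `count_entities_by_type(result).most_common()` would list the entity types from most to least frequent.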

## Getting the status of a webservice access key

In [9]:
def get_key_status():
    '''Returns a string with the detailed status of the webservice access key'''
    
    request_data = {
        'method': 'key_status',
        'jsonrpc': '2.0',
        'id': 0,
        'params': {
            'key': LXNER_WS_API_KEY,
        },
    }
    response = requests.post(LXNER_WS_API_URL, json=request_data)
    response_data = response.json()
    if "error" in response_data:
        raise WSException(response_data["error"])
    else:
        return response_data["result"]

In [10]:
get_key_status()

{'requests_remaining': 99999982,
 'chars_remaining': 999989426,
 'expiry': '2030-01-10T00:00+00:00'}
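
A status dictionary like the one shown above can be checked programmatically before a batch of requests is sent. This is a minimal sketch assuming the three fields seen in that output (`requests_remaining`, `chars_remaining`, `expiry`) and an ISO 8601 expiry timestamp with a UTC offset; the function name `key_is_usable` is hypothetical.

```python
from datetime import datetime, timezone


def key_is_usable(status, now=None):
    """Return True if the key has quota left and has not expired."""
    if now is None:
        now = datetime.now(timezone.utc)
    # the expiry string carries a UTC offset, so the result is timezone-aware
    expiry = datetime.fromisoformat(status["expiry"])
    return (
        status["requests_remaining"] > 0
        and status["chars_remaining"] > 0
        and now < expiry
    )


# values copied from the output shown above
example_status = {
    "requests_remaining": 99999982,
    "chars_remaining": 999989426,
    "expiry": "2030-01-10T00:00+00:00",
}
```

The optional `now` argument exists only to make the check reproducible; in normal use, `key_is_usable(get_key_status())` would compare against the current time.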