# Installing spaCy

In [1]:
!pip install spacy --upgrade



## Downloading spaCy models (en and pt)

### Model naming conventions
In general, spaCy expects all model packages to follow the naming convention [lang]_[name]. For the provided pipelines, spaCy divides the name into three components:

- type: Model capabilities:
  - core: a general-purpose model with tagging, parsing, lemmatization and named entity recognition
  - dep: only tagging, parsing and lemmatization
  - ent: only named entity recognition
  - sent: only sentence segmentation
- genre: Type of text the model is trained on (e.g. web for web text, news for news text)
- size: Model size indicator:
  - sm: no word vectors
  - md: reduced word vector table with 20k unique vectors for ~500k words
  - lg: large word vector table with ~500k entries

For example, en_core_web_md is a medium-sized English model trained on written web text (blogs, news, comments). It includes a tagger, a dependency parser, a lemmatizer, a named entity recognizer and a word vector table with 20k unique vectors.
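
The naming convention above can be illustrated with a small sketch that splits a package name into its parts. The `ModelName` type and the `parse_model_name` helper are hypothetical names introduced here for illustration; they are not part of spaCy, and the sketch assumes the name has exactly four underscore-separated parts (language, type, genre, size).

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """Hypothetical container for the parts of a spaCy package name."""
    lang: str        # language code, e.g. "en" or "pt"
    model_type: str  # capabilities, e.g. "core", "dep", "ent", "sent"
    genre: str       # training text genre, e.g. "web" or "news"
    size: str        # size indicator, e.g. "sm", "md", "lg"


def parse_model_name(name: str) -> ModelName:
    """Split a name such as 'en_core_web_md' into its four components."""
    parts = name.split("_")
    if len(parts) != 4:
        # Assumption: only names of the form lang_type_genre_size are handled.
        raise ValueError("expected lang_type_genre_size, got {!r}".format(name))
    return ModelName(*parts)


parsed = parse_model_name("en_core_web_md")
print(parsed.lang, parsed.model_type, parsed.genre, parsed.size, sep=" - ")
```

Custom packages may use a different [name] part, so a helper like this would only apply to the provided pipelines.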

The available models are listed at https://spacy.io/usage/models and https://github.com/explosion/spacy-models

For Portuguese (pt):
  pt_core_news_sm, pt_core_news_md, pt_core_news_lg

In [3]:
!python -m spacy download pt_core_news_sm

Collecting pt-core-news-sm==3.6.0
  Downloading https://github.com/explosion/spacy-models/releases/download/pt_core_news_sm-3.6.0/pt_core_news_sm-3.6.0-py3-none-any.whl (13.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.0/13.0 MB 13.9 MB/s eta 0:00:00
Installing collected packages: pt-core-news-sm
Successfully installed pt-core-news-sm-3.6.0
✔ Download and installation successful
You can now load the package via spacy.load('pt_core_news_sm')
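
Because the model is installed as a regular Python package, one way to avoid downloading it again on every run is to check whether the package can be imported first. The sketch below uses only the standard library; `is_model_installed` is a hypothetical helper name, and it assumes the model's import name matches its package name (pt_core_news_sm).

```python
import importlib.util


def is_model_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    # find_spec returns None when no top-level module with that name exists.
    return importlib.util.find_spec(package_name) is not None


if not is_model_installed("pt_core_news_sm"):
    # Only a hint is printed here; the download itself is the command shown above.
    print("Model missing. Run: python -m spacy download pt_core_news_sm")
```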


# Importing libraries

In [2]:
import spacy

In [6]:
nlp_model = spacy.load('pt_core_news_sm')

## NER (Named Entity Recognition)

- Entity label abbreviations: https://spacy.io/api/annotation#named-entities

### Functions

In [28]:
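def get_entities(model, text, verbose=True, display_image=True):

  """

    USING A NER MODEL TO RECOGNIZE ENTITIES IN A TEXT

    # Arguments
      model                 - Required: spaCy pipeline used to process the text (Model Spacy)
      text                  - Required: Text to analyze (String)
      verbose               - Optional: Print the recognized entities (Boolean)
      display_image         - Optional: Render the entities with displaCy in the notebook (Boolean)

    # Returns
      entities              - Entities recognized, as dicts with "TEXT" and "LABEL" keys (List)

  """

  # NON-STRING INPUT YIELDS NO ENTITIES
  if not isinstance(text, str):
    return []

  # GETTING THE ENTITIES WITH THE MODEL RECEIVED AS ARGUMENT
  doc = model(text)
  entities = [{"TEXT": ent.text, "LABEL": ent.label_} for ent in doc.ents]

  if verbose:
    print("{} ENTITIES RECOGNIZED\n".format(len(entities)))
    for ent in entities:
      print(ent["TEXT"], ent["LABEL"], sep=" - ")

  if display_image:
    # RENDERING THE HIGHLIGHTED ENTITIES INLINE IN THE NOTEBOOK
    spacy.displacy.render(doc,
                          style='ent',
                          jupyter=True)

  return entities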

### Examples

In [7]:
text = 'A IBM é uma empresa dos Estados Unidos voltada para a área de informática. Sua sede no Brasil fica em São Paulo e a receita em 2018 foi de aproximadamente 320 bilhões de reais'

In [29]:
entities = get_entities(model=nlp_model, text=text, verbose=True)

4 ENTITIES RECOGNIZED

IBM - ORG
Estados Unidos - LOC
Brasil - LOC
São Paulo - LOC


In [30]:
entities

[{'TEXT': 'IBM', 'LABEL': 'ORG'},
 {'TEXT': 'Estados Unidos', 'LABEL': 'LOC'},
 {'TEXT': 'Brasil', 'LABEL': 'LOC'},
 {'TEXT': 'São Paulo', 'LABEL': 'LOC'}]
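
Since the result is a plain list of dicts, it can be post-processed without spaCy. As a minimal sketch, the entities shown above could be grouped by label; `group_by_label` is a hypothetical helper, and the list is copied from the output above so the block runs on its own.

```python
from collections import defaultdict

# Entities copied from the output above.
entities = [{'TEXT': 'IBM', 'LABEL': 'ORG'},
            {'TEXT': 'Estados Unidos', 'LABEL': 'LOC'},
            {'TEXT': 'Brasil', 'LABEL': 'LOC'},
            {'TEXT': 'São Paulo', 'LABEL': 'LOC'}]


def group_by_label(entities):
    """Map each label to the entity texts carrying it, in order of appearance."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["LABEL"]].append(ent["TEXT"])
    return dict(grouped)


grouped = group_by_label(entities)
print(grouped)
# {'ORG': ['IBM'], 'LOC': ['Estados Unidos', 'Brasil', 'São Paulo']}
```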