# News Agencies Recognition and Linking with Impresso BERT models

Delivering swift and reliable news since the 1830s and 1840s, news agencies have played a pivotal role both nationally and internationally. However, understanding their precise impact on shaping news content has remained somewhat elusive. Our goal is to illuminate this aspect by identifying news agencies within historical newspaper articles. Using data from newspapers in Switzerland and Luxembourg as part of the impresso project, we've trained our pipeline to recognize these entities. 

If you're here, you likely seek to detect news agency entities in your own text. This notebook will guide you through the process of setting up a workflow to identify specific newspaper or agency mentions within your text.

Install the necessary libraries (if not already installed).

In [None]:
!pip install python-dotenv
!pip install transformers
!pip install torch

*Note: This notebook requires `HF_TOKEN` to be set as an environment variable. You can get a token by signing up on the [Hugging Face website](https://huggingface.co/join), and you can read more in the [official documentation](https://huggingface.co/docs/huggingface_hub/v0.20.2/en/quick-start#environment-variable). We use the [dotenv](https://pypi.org/project/python-dotenv/) library to load the `HF_TOKEN` value from a local `.env` file.*

In [None]:
from dotenv import load_dotenv
load_dotenv()  # take environment variables from .env.
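
If the token is missing, the model download further down may fail with an authentication error that is hard to read. The sketch below is an optional sanity check, not part of the original workflow; the helper name `hf_token_is_set` is hypothetical and only inspects the environment for a non-empty `HF_TOKEN`.

```python
import os


def hf_token_is_set(env=os.environ):
    """Return True if HF_TOKEN is present and non-empty in the given mapping."""
    # Hypothetical helper: it only checks presence, not whether the token is valid.
    return bool(env.get("HF_TOKEN", "").strip())


if not hf_token_is_set():
    print("HF_TOKEN is not set; add it to your .env file or export it in your shell.")
```

The check says nothing about whether the token is actually accepted by Hugging Face; it only catches the common case of a missing or empty `.env` entry.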

Now the fun part: the cell below downloads the required model and builds the pipeline you need to detect news agencies in your text.

In [None]:
from transformers import is_torch_available
from transformers import pipeline

# Check if PyTorch is available
print(is_torch_available())

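# Named Entity Recognition pipeline.
# "newsagency-ner" is not a built-in transformers task; its code is loaded from
# the model repository, which is why trust_remote_code=True is required.
nlp = pipeline("newsagency-ner", model="impresso-project/bert-newsagency-ner-fr", trust_remote_code=True)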

Run the example below to see how it works.

In [5]:
# Example
text = "Mon nom est François et j'habite à Paris. (Reuter)"
nlp(text)

[{'entity': 'org.ent.pressagency.Reuters',
  'score': 0.98180604,
  'index': 12,
  'word': 'Reuter',
  'start': 43,
  'end': 49}]
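
Each prediction is a dictionary holding the label, the confidence score, and the character offsets of the mention. A minimal sketch of how this output could be post-processed is shown below; the helper `summarize_agencies` and its `min_score` threshold are hypothetical, and it assumes the agency name is the last dot-separated part of the `entity` label, as in the result above.

```python
def summarize_agencies(entities, min_score=0.5):
    """Return (agency, surface form, score) tuples for confident predictions."""
    results = []
    for ent in entities:
        # Skip predictions below the (arbitrary, illustrative) threshold.
        if ent["score"] < min_score:
            continue
        # Assumption: the agency name is the last dot-separated part of the label.
        agency = ent["entity"].rsplit(".", 1)[-1]
        results.append((agency, ent["word"], round(float(ent["score"]), 3)))
    return results


# Output copied from the example above, so no model download is needed here.
example_output = [{'entity': 'org.ent.pressagency.Reuters',
                   'score': 0.98180604,
                   'index': 12,
                   'word': 'Reuter',
                   'start': 43,
                   'end': 49}]

print(summarize_agencies(example_output))
```

In the notebook itself you could pass `nlp(text)` directly instead of `example_output`. The threshold is worth tuning on your own data, since OCR noise in historical newspapers may lower the scores.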