# Named Entity Recognition

In this workshop, we are going to learn how to transform large amounts of text into a database using Named Entity Recognition (NER). NER can computationally identify people, places, laws, events, dates, and other elements in a text or collection of texts.

## What is Named Entity Recognition?
*Explanation borrowed from Melanie Walsh's [Introduction to Cultural Analytics & Python](https://melaniewalsh.github.io/Intro-Cultural-Analytics/05-Text-Analysis/12-Named-Entity-Recognition.html)*
<br>
<br>
Named Entity Recognition is a fundamental task in the field of natural language processing (NLP). NLP is an interdisciplinary field that blends linguistics, statistics, and computer science. The heart of NLP is to understand human language with statistics and computers. Applications of NLP are all around us. Have you ever heard of a little thing called spellcheck? How about autocomplete, Google Translate, chatbots, or Siri? These are all examples of NLP in action!

Thanks to recent advances in machine learning and to increasing amounts of available text data on the web, NLP has grown by leaps and bounds in the last decade. NLP models that generate texts and images are now getting eerily good.

Open-source NLP tools are getting very good, too. We’re going to use one of these open-source tools, the Python library spaCy, for our Named Entity Recognition tasks in this lesson.

## What is spaCy?
In this workshop, we are using the spaCy library to run the NER. spaCy relies on machine learning models that were trained on large amounts of carefully labeled text. These texts were, in fact, often labeled and corrected by hand. The English-language spaCy model that we’re going to use in this lesson was trained on an annotated corpus called “OntoNotes”: 2 million+ words drawn from “news, broadcast, talk shows, weblogs, usenet newsgroups, and conversational telephone speech,” which were meticulously tagged by a group of researchers and professionals for people’s names and places, for nouns and verbs, for subjects and objects, and much more. Like a lot of other major machine learning projects, OntoNotes was also sponsored by the Defense Advanced Research Projects Agency (DARPA), the branch of the Defense Department that develops technology for the U.S. military.

When spaCy identifies people and places in a text or collection of texts, the NLP model is actually making predictions about the text based on what it has learned about how people and places function in English-language sentences.

### spaCy Named Entities
Below is a Named Entities chart for English-language spaCy taken from [its website](https://spacy.io/api/annotation#named-entities). This chart shows the different named entities that spaCy can identify as well as their corresponding type labels.

|Type Label|Description|
|:---:|:---:|
|PERSON|People, including fictional.|
|NORP|Nationalities or religious or political groups.|
|FAC|Buildings, airports, highways, bridges, etc.|
|ORG|Companies, agencies, institutions, etc.|
|GPE|Countries, cities, states.|
|LOC|Non-GPE locations, mountain ranges, bodies of water.|
|PRODUCT|Objects, vehicles, foods, etc. (Not services.)|
|EVENT|Named hurricanes, battles, wars, sports events, etc.|
|WORK_OF_ART|Titles of books, songs, etc.|
|LAW|Named documents made into laws.|
|LANGUAGE|Any named language.|
|DATE|Absolute or relative dates or periods.|
|TIME|Times smaller than a day.|
|PERCENT|Percentage, including “%”.|
|MONEY|Monetary values, including unit.|
|QUANTITY|Measurements, as of weight or distance.|
|ORDINAL|“first”, “second”, etc.|
|CARDINAL|Numerals that do not fall under another type.|
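
If you forget what a label means while working, spaCy includes a small glossary helper, `spacy.explain`, that returns a short description for a label. The sketch below is only an illustration: the three labels are an arbitrary choice, and the wording returned may differ slightly from the chart above depending on your spaCy version.

```python
import spacy

# Look up the built-in glossary description for a few entity labels
for label in ["GPE", "NORP", "FAC"]:
    print(label, "->", spacy.explain(label))
```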


### Install spaCy:

In [73]:
# !pip install -U spacy

### Download the spaCy Language Model
Next we need to download the English-language model (en_core_web_sm), which will be processing and making predictions about our texts. This is the model that was trained on the annotated “OntoNotes” corpus. You can download the en_core_web_sm model by running the cell below:

In [74]:
# !python -m spacy download en_core_web_sm

*Note: spaCy offers models for other languages including Chinese, German, French, Spanish, Portuguese, Russian, Italian, Dutch, Greek, Norwegian, and Lithuanian.*

*spaCy offers language and tokenization support for other languages via external dependencies, such as Pyvi for Vietnamese and KoNLPy for Korean.*

## Import all relevant libraries for collecting data and processing the NER

We will import:
- spaCy and displaCy to run the NER and visualize our results
- en_core_web_sm to load the spaCy language model
- Counter (from collections) to count items
- pandas for organizing and displaying data (we’re also changing the pandas default max row and column width display settings)
- glob and pathlib to connect to folders on our operating system
- requests to get data from an API and to web scrape
- pprint to make our JSON results readable
- Beautiful Soup to parse our HTML results and make them readable


In [75]:
import spacy
from spacy import displacy
import en_core_web_sm
from collections import Counter
import pandas as pd
pd.options.display.max_rows = 600
pd.options.display.max_colwidth = 400
import glob
from pathlib import Path
import requests
import pprint
from bs4 import BeautifulSoup

## Load the spaCy language model

In [76]:
nlp = en_core_web_sm.load()

## Collect your Data: Combining APIs and Web Scraping

In this workshop, we are going to collect data from news articles in two steps. First, we connect to the NewsAPI and gather a collection of URLs related to a specific news topic. Next, we web scrape those URLs and save the articles as text files. For detailed instructions on working with the NewsAPI, please refer to this ["Working with APIs" tutorial](https://gist.github.com/rskrisel/4ff9629df9f9d6bf5a638b8ba6c13a68); for detailed instructions on how to web scrape a list of URLs, please refer to the ["Web Scraping Media URLs in Python"](https://github.com/rskrisel/web_scraping_workshop) tutorial.

### Install the News API

In [77]:
# !pip install newsapi-python

### Store your secret key

In [78]:
secret = '123456789'  # placeholder; replace with your own NewsAPI key
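
If you plan to share your notebook, one common alternative is to keep the key out of the code and read it from an environment variable. This is only a sketch: `NEWSAPI_KEY` is a hypothetical variable name that you would need to set yourself, and the fallback value is the same placeholder used above, not a working key.

```python
import os

# "NEWSAPI_KEY" is a hypothetical variable name; set it in your shell first.
# The fallback is the placeholder from the cell above, not a working key.
secret = os.environ.get("NEWSAPI_KEY", "123456789")
print("Key loaded:", bool(secret))
```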

### Define your endpoint

In [79]:
url = 'https://newsapi.org/v2/everything?'

### Define your query parameters

In [80]:
parameters = {
    'q': 'Ukraine',
    'searchIn': 'title',
    'pageSize': 20,
    'language': 'en',
    'apiKey': secret
}
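
To see how the endpoint and the parameters fit together, it can help to build the full request URL by hand. The sketch below uses the standard library's `urlencode` and approximates what the `requests` library sends when given these parameters. The key shown is a placeholder.

```python
from urllib.parse import urlencode

url = 'https://newsapi.org/v2/everything?'
parameters = {
    'q': 'Ukraine',
    'searchIn': 'title',
    'pageSize': 20,
    'language': 'en',
    'apiKey': 'YOUR_KEY_HERE',  # placeholder, not a real key
}

# Encode the dictionary as a query string and append it to the endpoint
full_url = url + urlencode(parameters)
print(full_url)
```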

### Make your data request

In [81]:
response = requests.get(url, params=parameters)

### Visualize your JSON results

In [82]:
response_json = response.json()
pprint.pprint(response_json)

{'articles': [{'author': 'https://www.facebook.com/bbcnews',
               'content': 'The battle for Bakhmut, the Ukrainian city which '
                          'Russia has spent months trying to capture, is '
                          '"stabilising", says Ukraine\'s commander in '
                          'chief.\r\n'
                          'Earlier this month, Western officials estimated '
                          'betwe… [+2634 chars]',
               'description': 'Despite Russia spending months trying to take '
                              'the city, Ukrainian officials say they are '
                              'holding on.',
               'publishedAt': '2023-03-24T22:46:31Z',
               'source': {'id': 'bbc-news', 'name': 'BBC News'},
               'title': "Ukraine war: Battle for Bakhmut 'stabilising' - "
                        'Ukraine commander',
               'url': 'https://www.bbc.co.uk/news/world-europe-65072173',
               'urlToImage': 'http

### Check what keys exist in your JSON data

In [83]:
response_json.keys()

dict_keys(['status', 'totalResults', 'articles'])

### See the data stored in each key

In [84]:
print(response_json['status'])
print(response_json['totalResults'])
print(response_json['articles'])

ok
8165
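
Before going further, it can be useful to confirm that `status` is `ok`, since a wrong key or a malformed query will not return any articles. The helper below is a hypothetical sketch, not part of the NewsAPI client. It assumes that failed requests carry a `status` other than `ok` and a `message` field describing the problem, and it uses made-up stand-in payloads.

```python
def check_payload(payload):
    """Return the articles, or raise a readable error if the request failed."""
    if payload.get('status') != 'ok':
        # 'message' is the field assumed to hold the error description
        detail = payload.get('message', 'no message provided')
        raise RuntimeError(f"Request failed: {detail}")
    return payload['articles']

# Stand-in payloads for illustration only (not real API output)
good = {'status': 'ok', 'totalResults': 1, 'articles': [{'title': 'Example'}]}
bad = {'status': 'error', 'message': 'Example error text'}

print(len(check_payload(good)))
try:
    check_payload(bad)
except RuntimeError as err:
    print(err)
```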


### Check the datatype for each key

In [85]:
print(type(response_json['status']))
print(type(response_json['totalResults']))
print(type(response_json['articles']))

<class 'str'>
<class 'int'>
<class 'list'>


### Make sure each item in the list is a dictionary

In [86]:
type(response_json['articles'][0])

dict
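
The check above looks only at the first item. To check every item at once, one option is `all()` with `isinstance()`. The list below is a small made-up stand-in for `response_json['articles']`.

```python
# Stand-in for response_json['articles'] (illustrative values only)
sample_articles = [
    {'title': 'First example', 'url': 'https://example.com/1'},
    {'title': 'Second example', 'url': 'https://example.com/2'},
]

# True only if every item in the list is a dictionary
all_dicts = all(isinstance(item, dict) for item in sample_articles)
print(all_dicts)
```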

### Convert the `articles` list into a pandas DataFrame

In [87]:
df_articles = pd.DataFrame(response_json['articles'])
df_articles

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,Ukraine war: Battle for Bakhmut 'stabilising' - Ukraine commander,"Despite Russia spending months trying to take the city, Ukrainian officials say they are holding on.",https://www.bbc.co.uk/news/world-europe-65072173,https://ichef.bbci.co.uk/news/1024/branded_news/97F8/production/_129140983_bakhmut.jpg,2023-03-24T22:46:31Z,"The battle for Bakhmut, the Ukrainian city which Russia has spent months trying to capture, is ""stabilising"", says Ukraine's commander in chief.\r\nEarlier this month, Western officials estimated betwe… [+2634 chars]"
1,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,Russia Ukraine invasion: Indians are braving war to study in Ukraine,"More than 1,000 students have returned to finish degrees after fleeing fighting a year ago.",https://www.bbc.co.uk/news/world-asia-india-64973574,https://ichef.bbci.co.uk/news/1024/branded_news/17DAE/production/_129001779_vaishalisethiawhoispursuinghersecond-yearmbbsisoneofthem.aresidentoffaridabadneardelhiwemetherinsidehercampusinwesternukrainesternopilnationalmedicaluniversity-tnmu.jpg,2023-03-17T00:32:49Z,"Rishi Dwivedi left India last autumn and is back studying medicine in Ukraine, despite the threat the war poses to his safety\r\n""Air raid sirens alerting us about incoming missile or drone attacks go … [+5813 chars]"
2,"{'id': None, 'name': 'Gizmodo.com'}",Kyle Barr,Musk's Twitter Downranks Any Post Regarding the 'Ukraine Crisis',"“Free speech absolutist” Elon Musk’s Twitter algorithm has been heavily suppressing any topic regarding the ongoing war in Ukraine, downranking it with the same vehemence as toxic, violent, or hate content on the platform, according to an analysis of the Twit…",https://gizmodo.com/twitter-musk-ukraine-crisis-open-source-code-russia-1850293386,"https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_675,pg_1,q_80,w_1200/970f223b96fcf9300bbe2f830123154b.jpg",2023-04-03T13:57:00Z,"Free speech absolutist Elon Musks Twitter algorithm has been heavily suppressing any topic regarding the ongoing war in Ukraine, downranking it with the same vehemence as toxic, violent, or hate cont… [+3995 chars]"
3,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,No Ukraine offensive without more weapons – Zelensky,There has been talk for some weeks of Ukraine launching a spring offensive against Russian forces.,https://www.bbc.co.uk/news/world-europe-65075952,https://static.files.bbci.co.uk/ws/simorgh-assets/public/news/images/metadata/poster-1024x576.png,2023-03-25T16:43:09Z,President Volodymyr Zelensky has said Ukraine's counter-offensive against Russia cannot start until Western allies send more military support. \r\nHe told a Japanese newspaper he would not send his tro… [+3006 chars]
4,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,Russia and Ukraine extend grain deal despite disagreement,"The renewed accord means exports can continue via Ukraine's Black Sea ports, but it is unclear how long for.",https://www.bbc.co.uk/news/world-europe-65000324,https://ichef.bbci.co.uk/news/1024/branded_news/E8AA/production/_129026595_09cf6f3f3f1f20454208fa07afec1c54868e87b50_0_1981_13211000x667.jpg,2023-03-18T19:31:15Z,"A deal allowing Ukraine to export millions of tonnes of grain through the Black Sea despite the ongoing conflict with Russia has been extended.\r\nBut it is unclear how long it will last, with Ukraine … [+1865 chars]"
5,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,"Ukraine war: Drone downed over Russia, Moscow says",Kyiv has dismissed previous claims that it has attacked Russian civilian targets with explosive drones.,https://www.bbc.co.uk/news/world-europe-65082747,https://ichef.bbci.co.uk/news/1024/branded_news/8269/production/_129158333_russiadefenceminister.png,2023-03-26T21:27:49Z,"A Ukrainian drone has been shot down by Russian air defences, Moscow says. \r\nThe drone came down in the town of Kireyevsk in the Tula region - some 400km from the border with Ukraine - on Sunday, the… [+1582 chars]"
6,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,Ukraine war: Life in Mariupol under Russian occupation,"Russia captured Mariupol in May after a brutal siege. Now, it's trying to win hearts and minds.",https://www.bbc.co.uk/news/world-europe-64887890,https://ichef.bbci.co.uk/news/1024/branded_news/1102A/production/_128947696_capture.png,2023-03-13T00:03:15Z,"Finding people willing to speak to me from Mariupol was never going to be easy. \r\nAfter 10 months of Russian occupation, fear and distrust are the two most frequent responses I encountered when looki… [+7901 chars]"
7,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,Ukraine war: Russian strike on eastern city Kostyantynivka 'kills six',Ukraine says six civilians died when Russian missiles crashed into homes in Kostyantynivka.,https://www.bbc.co.uk/news/world-europe-65155072,https://ichef.bbci.co.uk/news/1024/branded_news/EEA6/production/_129249016_mediaitem129249015.jpg,2023-04-02T11:54:33Z,"Ukraine says heavy Russian shelling has killed at least six civilians in Kostyantynivka, an industrial city near embattled Bakhmut in eastern Ukraine.\r\nMissiles and rockets damaged 16 apartment block… [+2505 chars]"
8,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,Ukraine war: Heavy losses reported as battle for Bakhmut rages,Both Ukraine and Russia have reported inflicting significant losses in the eastern city.,https://www.bbc.co.uk/news/world-europe-64935449,https://ichef.bbci.co.uk/news/1024/branded_news/15604/production/_128965578_mediaitem128965574.jpg,2023-03-13T00:42:03Z,Ukraine and Russia have reported inflicting heavy losses as the battle for Bakhmut rages on.\r\nMoscow has been trying to take the eastern Ukrainian city for months in a grinding war of attrition.\r\nUkr… [+2526 chars]
9,"{'id': 'cnn', 'name': 'CNN'}","Rob Picheta, Vasco Cotovio, Olga Voitovych","Ukraine eyes an offensive around Bakhmut, as Russian momentum stalls","Russian forces are depleted in Bakhmut and a Ukrainian counter-offensive could soon be launched, one of Kyiv's top generals has said, raising the prospect of an unlikely turnaround in the besieged city.",https://www.cnn.com/2023/03/23/europe/bakhmut-ukraine-counter-offensive-intl/index.html,"https://media.cnn.com/api/v1/images/stellar/prod/230323104647-bakhmut-troop-movement-032223.jpg?c=16x9&q=w_800,c_fill",2023-03-23T20:00:52Z,"Russian forces are depleted in Bakhmut and a Ukrainian counter-offensive could soon be launched, one of Kyivs top generals has said, raising the prospect of an unlikely turnaround in the besieged cit… [+3957 chars]"
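
Notice that the `source` column still holds a dictionary in every row. If you would rather have the outlet name as plain text, one possible approach is to pull the `name` value into its own column. The sketch below uses a two-row stand-in DataFrame, and `source_name` is a hypothetical column name.

```python
import pandas as pd

# Two-row stand-in for df_articles (titles are made up)
sample = pd.DataFrame({
    'source': [{'id': 'bbc-news', 'name': 'BBC News'},
               {'id': None, 'name': 'Gizmodo.com'}],
    'title': ['First example title', 'Second example title'],
})

# Pull the 'name' value out of each dictionary in the 'source' column
sample['source_name'] = sample['source'].apply(lambda source: source['name'])
print(sample[['source_name', 'title']])
```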


### Define a function to web scrape text from the list of URLs in the DataFrame

In [88]:
def scrape_article(url):
    response = requests.get(url)
    response.encoding = 'utf-8'
    html_string = response.text
    return html_string
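
The function above assumes that every request succeeds. When scraping a list of URLs, a single slow or broken page can stop the whole run, so a more defensive variant may be worth considering. The sketch below is one hypothetical way to do it: the function name, the ten-second timeout, and the choice to return an empty string on failure are all assumptions rather than part of the workshop's code.

```python
import requests

def scrape_article_safely(url, timeout=10):
    """Return the page HTML, or an empty string if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as failures
    except requests.RequestException:
        return ""
    response.encoding = 'utf-8'
    return response.text

# A malformed URL fails immediately and returns the empty string
print(repr(scrape_article_safely("not-a-valid-url")))
```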

### Apply the function to the DataFrame and store the results in a new column

In [89]:
df_articles['scraped_text'] = df_articles['url'].apply(scrape_article)

In [90]:
df_articles

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content,scraped_text
0,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,Ukraine war: Battle for Bakhmut 'stabilising' - Ukraine commander,"Despite Russia spending months trying to take the city, Ukrainian officials say they are holding on.",https://www.bbc.co.uk/news/world-europe-65072173,https://ichef.bbci.co.uk/news/1024/branded_news/97F8/production/_129140983_bakhmut.jpg,2023-03-24T22:46:31Z,"The battle for Bakhmut, the Ukrainian city which Russia has spent months trying to capture, is ""stabilising"", says Ukraine's commander in chief.\r\nEarlier this month, Western officials estimated betwe… [+2634 chars]","<!DOCTYPE html><html lang=""en-GB"" class=""no-js""><head><meta charSet=""utf-8"" /><meta name=""viewport"" content=""width=device-width, initial-scale=1"" /><title data-rh=""true"">Ukraine war: Battle for Bakhmut &#x27;stabilising&#x27;, says commander - BBC News</title><meta data-rh=""true"" name=""description"" content=""Despite Russia spending months trying to take the city, Ukrainian officials say they ar..."
1,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,Russia Ukraine invasion: Indians are braving war to study in Ukraine,"More than 1,000 students have returned to finish degrees after fleeing fighting a year ago.",https://www.bbc.co.uk/news/world-asia-india-64973574,https://ichef.bbci.co.uk/news/1024/branded_news/17DAE/production/_129001779_vaishalisethiawhoispursuinghersecond-yearmbbsisoneofthem.aresidentoffaridabadneardelhiwemetherinsidehercampusinwesternukrainesternopilnationalmedicaluniversity-tnmu.jpg,2023-03-17T00:32:49Z,"Rishi Dwivedi left India last autumn and is back studying medicine in Ukraine, despite the threat the war poses to his safety\r\n""Air raid sirens alerting us about incoming missile or drone attacks go … [+5813 chars]","<!DOCTYPE html><html lang=""en-GB"" class=""no-js""><head><meta charSet=""utf-8"" /><meta name=""viewport"" content=""width=device-width, initial-scale=1"" /><title data-rh=""true"">Russia Ukraine invasion: Indians are braving war to study in Ukraine - BBC News</title><meta data-rh=""true"" name=""description"" content=""More than 1,000 students have returned to finish degrees after fleeing fighting a year ago..."
2,"{'id': None, 'name': 'Gizmodo.com'}",Kyle Barr,Musk's Twitter Downranks Any Post Regarding the 'Ukraine Crisis',"“Free speech absolutist” Elon Musk’s Twitter algorithm has been heavily suppressing any topic regarding the ongoing war in Ukraine, downranking it with the same vehemence as toxic, violent, or hate content on the platform, according to an analysis of the Twit…",https://gizmodo.com/twitter-musk-ukraine-crisis-open-source-code-russia-1850293386,"https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_675,pg_1,q_80,w_1200/970f223b96fcf9300bbe2f830123154b.jpg",2023-04-03T13:57:00Z,"Free speech absolutist Elon Musks Twitter algorithm has been heavily suppressing any topic regarding the ongoing war in Ukraine, downranking it with the same vehemence as toxic, violent, or hate cont… [+3995 chars]","<!DOCTYPE html><html name=""top"" lang=""en"" style=""scroll-behavior:smooth"" data-reactroot=""""><head><meta name=""google-site-verification"" content=""Sc6OGUq5heWyOLFpCb_2K6mX7vFrWKGANts4QDUmgOY""/><meta name=""google-site-verification"" content=""QDPLbDJXTQNT0n69mvNADCeRmwnbkYyL20OKJAVCKq8""/><meta name=""ir-site-verification-token"" value=""-1270174611""/><meta name=""viewport"" content=""width=device-width, i..."
3,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,No Ukraine offensive without more weapons – Zelensky,There has been talk for some weeks of Ukraine launching a spring offensive against Russian forces.,https://www.bbc.co.uk/news/world-europe-65075952,https://static.files.bbci.co.uk/ws/simorgh-assets/public/news/images/metadata/poster-1024x576.png,2023-03-25T16:43:09Z,President Volodymyr Zelensky has said Ukraine's counter-offensive against Russia cannot start until Western allies send more military support. \r\nHe told a Japanese newspaper he would not send his tro… [+3006 chars],"<!DOCTYPE html><html lang=""en-GB"" class=""no-js""><head><meta charSet=""utf-8"" /><meta name=""viewport"" content=""width=device-width, initial-scale=1"" /><title data-rh=""true"">No Ukraine offensive without more weapons – Zelensky - BBC News</title><meta data-rh=""true"" name=""description"" content=""There has been talk for some weeks of Ukraine launching a spring offensive against Russian forces.""/><meta..."
4,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,Russia and Ukraine extend grain deal despite disagreement,"The renewed accord means exports can continue via Ukraine's Black Sea ports, but it is unclear how long for.",https://www.bbc.co.uk/news/world-europe-65000324,https://ichef.bbci.co.uk/news/1024/branded_news/E8AA/production/_129026595_09cf6f3f3f1f20454208fa07afec1c54868e87b50_0_1981_13211000x667.jpg,2023-03-18T19:31:15Z,"A deal allowing Ukraine to export millions of tonnes of grain through the Black Sea despite the ongoing conflict with Russia has been extended.\r\nBut it is unclear how long it will last, with Ukraine … [+1865 chars]","<!DOCTYPE html><html lang=""en-GB"" class=""no-js""><head><meta charSet=""utf-8"" /><meta name=""viewport"" content=""width=device-width, initial-scale=1"" /><title data-rh=""true"">Russia and Ukraine extend grain deal despite disagreement - BBC News</title><meta data-rh=""true"" name=""description"" content=""The renewed accord means exports can continue via Ukraine&#x27;s Black Sea ports, but it is unclear h..."
5,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,"Ukraine war: Drone downed over Russia, Moscow says",Kyiv has dismissed previous claims that it has attacked Russian civilian targets with explosive drones.,https://www.bbc.co.uk/news/world-europe-65082747,https://ichef.bbci.co.uk/news/1024/branded_news/8269/production/_129158333_russiadefenceminister.png,2023-03-26T21:27:49Z,"A Ukrainian drone has been shot down by Russian air defences, Moscow says. \r\nThe drone came down in the town of Kireyevsk in the Tula region - some 400km from the border with Ukraine - on Sunday, the… [+1582 chars]","<!DOCTYPE html><html lang=""en-GB"" class=""no-js""><head><meta charSet=""utf-8"" /><meta name=""viewport"" content=""width=device-width, initial-scale=1"" /><title data-rh=""true"">Ukraine war: Drone downed over Russia, Moscow says - BBC News</title><meta data-rh=""true"" name=""description"" content=""Kyiv has dismissed previous claims that it has attacked Russian civilian targets with explosive drones.""/><m..."
6,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,Ukraine war: Life in Mariupol under Russian occupation,"Russia captured Mariupol in May after a brutal siege. Now, it's trying to win hearts and minds.",https://www.bbc.co.uk/news/world-europe-64887890,https://ichef.bbci.co.uk/news/1024/branded_news/1102A/production/_128947696_capture.png,2023-03-13T00:03:15Z,"Finding people willing to speak to me from Mariupol was never going to be easy. \r\nAfter 10 months of Russian occupation, fear and distrust are the two most frequent responses I encountered when looki… [+7901 chars]","<!DOCTYPE html><html lang=""en-GB"" class=""no-js""><head><meta charSet=""utf-8"" /><meta name=""viewport"" content=""width=device-width, initial-scale=1"" /><title data-rh=""true"">Ukraine war: Life in Mariupol under Russian occupation - BBC News</title><meta data-rh=""true"" name=""description"" content=""Russia captured Mariupol in May after a brutal siege. Now, it&#x27;s trying to win hearts and minds.""/><..."
7,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,Ukraine war: Russian strike on eastern city Kostyantynivka 'kills six',Ukraine says six civilians died when Russian missiles crashed into homes in Kostyantynivka.,https://www.bbc.co.uk/news/world-europe-65155072,https://ichef.bbci.co.uk/news/1024/branded_news/EEA6/production/_129249016_mediaitem129249015.jpg,2023-04-02T11:54:33Z,"Ukraine says heavy Russian shelling has killed at least six civilians in Kostyantynivka, an industrial city near embattled Bakhmut in eastern Ukraine.\r\nMissiles and rockets damaged 16 apartment block… [+2505 chars]","<!DOCTYPE html><html lang=""en-GB"" class=""no-js""><head><meta charSet=""utf-8"" /><meta name=""viewport"" content=""width=device-width, initial-scale=1"" /><title data-rh=""true"">Ukraine war: Russian strike on eastern city Kostyantynivka &#x27;kills six&#x27; - BBC News</title><meta data-rh=""true"" name=""description"" content=""Ukraine says six civilians died when Russian missiles crashed into homes in Ko..."
8,"{'id': 'bbc-news', 'name': 'BBC News'}",https://www.facebook.com/bbcnews,Ukraine war: Heavy losses reported as battle for Bakhmut rages,Both Ukraine and Russia have reported inflicting significant losses in the eastern city.,https://www.bbc.co.uk/news/world-europe-64935449,https://ichef.bbci.co.uk/news/1024/branded_news/15604/production/_128965578_mediaitem128965574.jpg,2023-03-13T00:42:03Z,Ukraine and Russia have reported inflicting heavy losses as the battle for Bakhmut rages on.\r\nMoscow has been trying to take the eastern Ukrainian city for months in a grinding war of attrition.\r\nUkr… [+2526 chars],"<!DOCTYPE html><html lang=""en-GB"" class=""no-js""><head><meta charSet=""utf-8"" /><meta name=""viewport"" content=""width=device-width, initial-scale=1"" /><title data-rh=""true"">Ukraine war: Heavy losses reported as battle for Bakhmut rages - BBC News</title><meta data-rh=""true"" name=""description"" content=""Both Ukraine and Russia have reported inflicting significant losses in the eastern city.""/><meta..."
9,"{'id': 'cnn', 'name': 'CNN'}","Rob Picheta, Vasco Cotovio, Olga Voitovych","Ukraine eyes an offensive around Bakhmut, as Russian momentum stalls","Russian forces are depleted in Bakhmut and a Ukrainian counter-offensive could soon be launched, one of Kyiv's top generals has said, raising the prospect of an unlikely turnaround in the besieged city.",https://www.cnn.com/2023/03/23/europe/bakhmut-ukraine-counter-offensive-intl/index.html,"https://media.cnn.com/api/v1/images/stellar/prod/230323104647-bakhmut-troop-movement-032223.jpg?c=16x9&q=w_800,c_fill",2023-03-23T20:00:52Z,"Russian forces are depleted in Bakhmut and a Ukrainian counter-offensive could soon be launched, one of Kyivs top generals has said, raising the prospect of an unlikely turnaround in the besieged cit… [+3957 chars]","<!DOCTYPE html>\n<html lang=""en"" data-uri=""archive.cms.cnn.com/_pages/h_d6ebb83dacbd746b46f3697203b6188b@published"" data-layout-uri=""archive.cms.cnn.com/_layouts/layout-with-rail/instances/world-article-v1@published"">\n <head><style>body,h1,h2,h3,h4,h5{font-family:cnnsans,Helvetica Neue,Helvetica,Arial,Utkal,sans-serif}h1,h2,h3,h4,h5{font-weight:700}:root{--theme-primary:#cc0000;--theme-bac..."


### Use the Beautiful Soup library to make the scraped html text legible and save each article in a text file
*Note: make sure you create a folder named "files" before running this step*

In [91]:
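for article_id, html in enumerate(df_articles['scraped_text'], start=1):
    # Parse the raw HTML and keep only the text
    soup = BeautifulSoup(html, "html.parser")
    article = soup.get_text()

    # Write as UTF-8 so the files can be read back with the same encoding
    with open(f"files/{article_id}.txt", "w", encoding="utf-8") as file:
        file.write(article)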
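
Keep in mind that `soup.get_text()` returns all the text on the page, including menus and footers. One possible refinement, which is not part of the workshop's code, is to keep only the text inside paragraph tags. The HTML below is a made-up sample, and real news pages may need a different selection.

```python
from bs4 import BeautifulSoup

# Made-up page with a navigation menu and two paragraphs
sample_html = """
<html><body>
<nav><a>Home</a><a>News</a></nav>
<p>The battle for Bakhmut is stabilising.</p>
<p>Western officials gave estimates earlier this month.</p>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Collect the text of each <p> tag and join the paragraphs with newlines
paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
article_text = "\n".join(paragraphs)
print(article_text)
```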

### Use glob to connect to the file directory where your articles are saved and store it in a variable

In [92]:
directory = "files"
articles = glob.glob(f"{directory}/*.txt")

### Make sure you have data stored in your articles variable

In [93]:
articles

['files/15.txt',
 'files/14.txt',
 'files/16.txt',
 'files/17.txt',
 'files/13.txt',
 'files/12.txt',
 'files/10.txt',
 'files/11.txt',
 'files/9.txt',
 'files/8.txt',
 'files/5.txt',
 'files/4.txt',
 'files/6.txt',
 'files/7.txt',
 'files/3.txt',
 'files/2.txt',
 'files/1.txt',
 'files/20.txt',
 'files/19.txt',
 'files/18.txt']
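
The file paths above come back in no particular order. If you want them in numeric order (1, 2, 3, ...), one option is to sort by the number in each file name. The sketch below creates a few throwaway files in a temporary folder so that it runs on its own. With the workshop data, you would sort the paths in your "files" folder instead.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Create three small example files named 10.txt, 2.txt, and 1.txt
    for number in [10, 2, 1]:
        (folder / f"{number}.txt").write_text("example", encoding="utf-8")

    # Sort by the integer in the file name rather than alphabetically
    ordered = sorted(folder.glob("*.txt"), key=lambda path: int(path.stem))
    ordered_names = [path.name for path in ordered]

print(ordered_names)
```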

### Let's run the NER on a single article first

In [94]:
filepath = "files/1.txt"
text = Path(filepath).read_text(encoding='utf-8')  # reads the file and closes it
doc = nlp(text)

### Let's use displacy to visualize our results

In [95]:
displacy.render(doc, style="ent")
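
displaCy can also highlight only the entity types you care about, through the `ents` entry of its `options` argument. The sketch below builds a tiny document by hand with a blank English pipeline, so the entity spans are assigned manually for illustration and are not model predictions. With the workshop's `doc`, you would pass the same `options` to `displacy.render`.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# Blank pipeline: tokenizes only, so we set the entities by hand
blank_nlp = spacy.blank("en")
sample_doc = blank_nlp("Volodymyr Zelensky visited Bakhmut in December")
sample_doc.ents = [
    Span(sample_doc, 0, 2, label="PERSON"),
    Span(sample_doc, 3, 4, label="GPE"),
    Span(sample_doc, 5, 6, label="DATE"),
]

# Highlight only PERSON and GPE; jupyter=False returns the HTML as a string
html = displacy.render(
    sample_doc, style="ent", options={"ents": ["PERSON", "GPE"]}, jupyter=False
)
print(html[:80])
```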

### Let's see a list of the identified entities

In [96]:
doc.ents

(Ukraine,
 BBC,
 HomepageSkip,
 HelpYour accountHomeNewsSportReelWorklifeTravelFutureMore,
 menuBBC,
 UkraineClimateVideoWorldUS & CanadaUKBusinessTechScienceMoreStoriesEntertainment & ArtsHealthIn,
 PicturesReality CheckWorld News,
 ReadsWorldAfricaAsiaAustraliaEuropeLatin AmericaMiddle EastUkraine,
 MarchShareclose,
 TopicsRussia,
 Ukrainian,
 Bakhmut,
 months,
 Kathryn,
 NewsThe,
 Bakhmut,
 Ukrainian,
 Russia,
 months,
 Ukraine,
 Zaluzhnyi,
 Ukrainian,
 Russia,
 Earlier this month,
 Western,
 between 20,000 and 30,000,
 Russian,
 Bakhmut,
 last summer,
 Moscow,
 Bakhmut,
 Russian,
 Ukraine,
 Gen Zaluzhnyi,
 Ukraine,
 Bakhmut,
 Gen Zaluzhnyi,
 UK,
 Tony Radakin,
 Ukraine,
 Ukrainian,
 Bakhmut,
 UK,
 Ministry of Defence,
 Saturday,
 Russia,
 Bakhmut,
 Russian,
 Russia,
 Bakhmut,
 Russia,
 January,
 UK,
 Earlier this week,
 Oleksandr Syrsky,
 Russian,
 Bakhmut,
 Syrsky,
 Russia,
 Bakhmut,
 Ukrainian,
 Volodymyr Zelensky,
 Bakhmut,
 December,
 The Institute for the Study of War,
 Thursd

### Let's add the entity label next to each entity: 

In [97]:
for named_entity in doc.ents:
    print(named_entity, named_entity.label_)

Ukraine GPE
BBC ORG
HomepageSkip ORG
HelpYour accountHomeNewsSportReelWorklifeTravelFutureMore PRODUCT
menuBBC ORG
UkraineClimateVideoWorldUS & CanadaUKBusinessTechScienceMoreStoriesEntertainment & ArtsHealthIn ORG
PicturesReality CheckWorld News ORG
ReadsWorldAfricaAsiaAustraliaEuropeLatin AmericaMiddle EastUkraine PERSON
MarchShareclose PERSON
TopicsRussia ORG
Ukrainian GPE
Bakhmut ORG
months DATE
Kathryn PERSON
NewsThe ORG
Bakhmut ORG
Ukrainian NORP
Russia GPE
months DATE
Ukraine GPE
Zaluzhnyi PERSON
Ukrainian NORP
Russia GPE
Earlier this month DATE
Western NORP
between 20,000 and 30,000 CARDINAL
Russian NORP
Bakhmut GPE
last summer DATE
Moscow GPE
Bakhmut ORG
Russian NORP
Ukraine GPE
Gen Zaluzhnyi PERSON
Ukraine GPE
Bakhmut ORG
Gen Zaluzhnyi PERSON
UK GPE
Tony Radakin PERSON
Ukraine GPE
Ukrainian GPE
Bakhmut ORG
UK GPE
Ministry of Defence ORG
Saturday DATE
Russia GPE
Bakhmut ORG
Russian NORP
Russia GPE
Bakhmut ORG
Russia GPE
January DATE
UK GPE
Earlier this week DATE
Oleksandr Syrs
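
A quick way to summarize a long list like this is to count how often each label occurs, using the `Counter` class we imported earlier. The sketch below uses a handful of (entity, label) pairs copied from the output above as stand-in data. With the real document, `Counter(named_entity.label_ for named_entity in doc.ents)` would follow the same pattern.

```python
from collections import Counter

# Stand-in (text, label) pairs taken from the output above
pairs = [
    ("Ukraine", "GPE"),
    ("BBC", "ORG"),
    ("Ukrainian", "NORP"),
    ("Russia", "GPE"),
    ("months", "DATE"),
]

# Count how many times each label appears
label_counts = Counter(label for _, label in pairs)
print(label_counts.most_common())
```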

### Let's filter the results to see all entities labelled as "PERSON":

In [98]:
for named_entity in doc.ents:
    if named_entity.label_ == "PERSON":
        print(named_entity)

ReadsWorldAfricaAsiaAustraliaEuropeLatin AmericaMiddle EastUkraine
MarchShareclose
Kathryn
Zaluzhnyi
Gen Zaluzhnyi
Gen Zaluzhnyi
Tony Radakin
Oleksandr Syrsky
Syrsky
Volodymyr Zelensky
Yevgeny Prigozhin
Umar Khan
Khan
watching23 MarchZelensky
agoLeaked
Macron
Gwyneth Paltrow
Nureyev
Mario Movie
Michael Lerner
News ServicesOn


### Let's filter the results to see all entities labelled as "NORP":

In [99]:
for named_entity in doc.ents:
    if named_entity.label_ == "NORP":
        print(named_entity)

Ukrainian
Ukrainian
Western
Russian
Russian
Russian
Russian
Ukrainian
Ukrainian
Ukrainian
Russian
survival"
Russian
cities22 MarchRussian
Finnish
robbery5China
dies7UK-Israeli


### Let's filter the results to see all entities labelled as "GPE":

In [100]:
for named_entity in doc.ents:
    if named_entity.label_ == "GPE":
        print(named_entity)

Ukraine
Ukrainian
Russia
Ukraine
Russia
Bakhmut
Moscow
Ukraine
Ukraine
UK
Ukraine
Ukrainian
UK
Russia
Russia
Russia
UK
Bakhmut
Russia
Bakhmut
Ukraine
Russia
Bakhmut
Bakhmut
Russia
Ukraine
Russia
Russia
Bakhmut
Kentucky
Louisville
US
Ukraine
Louisville
Taiwan
West Bank
Barton Fink


### Let's filter the results to see all entities labelled as "LOC":

In [101]:
for named_entity in doc.ents:
    if named_entity.label_ == "LOC":
        print(named_entity)

### Let's filter the results to see all entities labelled as "FAC":

In [104]:
for named_entity in doc.ents:
    if named_entity.label_ == "FAC":
        print(named_entity)

### Let's filter the results to see all entities labelled as "ORG":

In [106]:
for named_entity in doc.ents:
    if named_entity.label_ == "ORG":
        print(named_entity)

BBC
HomepageSkip
menuBBC
UkraineClimateVideoWorldUS & CanadaUKBusinessTechScienceMoreStoriesEntertainment & ArtsHealthIn
PicturesReality CheckWorld News
TopicsRussia
Bakhmut
NewsThe
Bakhmut
Bakhmut
Bakhmut
Bakhmut
Ministry of Defence
Bakhmut
Bakhmut
Bakhmut
The Institute for the Study of War
Wagner
Wagner
Bakhmut
MoD
Wagner
the Russian
ICRC
Bakhmut
TopicsRussia
evolve16 MarchTop StoriesLive
EU
UkraineThe
Wagner
records9Elf
BBC NewsHomeNewsSportReelWorklifeTravelFutureCultureMusicTVWeatherSoundsTerms
UseAbout
NewslettersWhy
BBCAdvertise
BBC
BBC


### Let's loop over the document to identify all of its entities and save the output as a dictionary:

In [107]:
entities = []
entity_type = []
entity_identified = []
# Collect the label and the text of every entity in the document
for named_entity in doc.ents:
    entity_type.append(named_entity.label_)
    entity_identified.append(named_entity.text)
# Build the dictionary once, after the loop, so it is stored a single time
entity_dict = {'Entity_type': entity_type, 'Entity_identified': entity_identified}
entities.append(entity_dict)
print(entities)

[{'Entity_type': ['GPE', 'ORG', 'ORG', 'PRODUCT', 'ORG', 'ORG', 'ORG', 'PERSON', 'PERSON', 'ORG', 'GPE', 'ORG', 'DATE', 'PERSON', 'ORG', 'ORG', 'NORP', 'GPE', 'DATE', 'GPE', 'PERSON', 'NORP', 'GPE', 'DATE', 'NORP', 'CARDINAL', 'NORP', 'GPE', 'DATE', 'GPE', 'ORG', 'NORP', 'GPE', 'PERSON', 'GPE', 'ORG', 'PERSON', 'GPE', 'PERSON', 'GPE', 'GPE', 'ORG', 'GPE', 'ORG', 'DATE', 'GPE', 'ORG', 'NORP', 'GPE', 'ORG', 'GPE', 'DATE', 'GPE', 'DATE', 'PERSON', 'NORP', 'GPE', 'PERSON', 'GPE', 'ORG', 'NORP', 'PERSON', 'GPE', 'DATE', 'ORG', 'DATE', 'GPE', 'ORG', 'NORP', 'NORP', 'ORG', 'NORP', 'ORG', 'PERSON', 'ORG', 'GPE', 'GPE', 'ORG', 'ORG', 'CARDINAL', 'GPE', 'CARDINAL', 'DATE', 'ORG', 'PERSON', 'PERSON', 'NORP', 'ORG', 'GPE', 'CARDINAL', 'CARDINAL', 'GPE', 'GPE', 'DATE', 'ORG', 'NORP', 'PERSON', 'GPE', 'NORP', 'GPE', 'ORG', 'CARDINAL', 'GPE', 'CARDINAL', 'GPE', 'PERSON', 'GPE', 'TIME', 'ORG', 'DATE', 'DATE', 'ORG', 'NORP', 'ORG', 'PERSON', 'PERSON', 'GPE', 'CARDINAL', 'GPE', 'DATE', 'ORDINAL', 'NORP'
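
The dictionary above keeps labels and texts in two parallel lists, which have to stay aligned by position. An alternative layout, sketched below under the assumption that one record per entity is acceptable, keeps each label next to its text. `Ent` and `entities_as_rows` are hypothetical names, and `Ent` stands in for a spaCy span.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy entity span (.text and .label_ only)
Ent = namedtuple("Ent", ["text", "label_"])

def entities_as_rows(ents):
    """Return one small dictionary per entity instead of two parallel lists."""
    return [
        {"Entity_type": ent.label_, "Entity_identified": ent.text}
        for ent in ents
    ]

rows = entities_as_rows([Ent("Ukraine", "GPE"), Ent("BBC", "ORG")])
print(rows)
```

A list of dictionaries like this can be passed directly to `pd.DataFrame`, which yields one row per entity.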

### Let's build on this loop to run the process across our entire collection of texts:

In [108]:
all_entities = []
for filepath in articles:
    # Read the article; the with-block closes the file once it has been read
    with open(filepath, encoding='utf-8') as f:
        text = f.read()
    doc = nlp(text)
    # Collect the label and the text of every entity, in document order
    entity_type = [ent.label_ for ent in doc.ents]
    entity_identified = [ent.text for ent in doc.ents]
    # One dictionary per file, keyed to the file name
    ent_dict = {'File_name': filepath, 'Entity_type': entity_type, 'Entity_identified': entity_identified}
    all_entities.append(ent_dict)
print(all_entities)

[{'File_name': 'files/15.txt', 'Entity_type': ['GPE', 'GPE', 'PERSON', 'ORG', 'ORG', 'PRODUCT', 'ORG', 'ORG', 'ORG', 'PERSON', 'GPE', 'PERSON', 'PERSON', 'ORG', 'GPE', 'PRODUCT', 'CARDINAL', 'PERSON', 'ORG', 'GPE', 'PERSON', 'NORP', 'NORP', 'PERSON', 'DATE', 'NORP', 'GPE', 'GPE', 'PERSON', 'NORP', 'GPE', 'PERSON', 'PERSON', 'ORG', 'PERSON', 'ORG', 'PERSON', 'PERSON', 'GPE', 'GPE', 'GPE', 'CARDINAL', 'DATE', 'PERSON', 'NORP', 'GPE', 'PERSON', 'GPE', 'PERSON', 'LOC', 'GPE', 'GPE', 'ORG', 'GPE', 'ORG', 'PERSON', 'NORP', 'DATE', 'PERSON', 'PERSON', 'GPE', 'GPE', 'GPE', 'ORG', 'PERSON', 'ORG', 'ORG', 'ORG', 'LOC', 'ORG', 'GPE', 'NORP', 'ORG', 'GPE', 'GPE', 'GPE', 'PERSON', 'PERSON', 'PERSON', 'GPE', 'ORG', 'NORP', 'ORG', 'PERSON', 'GPE', 'GPE', 'ORG', 'NORP', 'GPE', 'MONEY', 'DATE', 'GPE', 'ORG', 'PERSON', 'DATE', 'GPE', 'GPE', 'GPE', 'GPE', 'GPE', 'PERSON', 'GPE', 'GPE', 'NORP', 'GPE', 'GPE', 'GPE', 'PERSON', 'NORP', 'ORG', 'GPE', 'GPE', 'NORP', 'ORG', 'CARDINAL', 'GPE', 'CARDINAL', 'GPE',
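
The per-file process above can also be wrapped in a reusable function. The sketch below is one possible shape, not the workshop's own code: `collect_entities` is a hypothetical name, and `fake_nlp` is a stand-in pipeline used only so the example runs without a model. In the notebook you would call `collect_entities(articles, nlp)`.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace

def collect_entities(filepaths, nlp):
    """Run `nlp` over each file and return one dictionary per file."""
    all_entities = []
    for filepath in filepaths:
        text = Path(filepath).read_text(encoding="utf-8")
        doc = nlp(text)
        all_entities.append({
            "File_name": str(filepath),
            "Entity_type": [ent.label_ for ent in doc.ents],
            "Entity_identified": [ent.text for ent in doc.ents],
        })
    return all_entities

# Hypothetical stand-in for a spaCy pipeline: it tags every occurrence
# of the word "Ukraine" as GPE and ignores everything else.
def fake_nlp(text):
    ents = [SimpleNamespace(text="Ukraine", label_="GPE")
            for _ in range(text.count("Ukraine"))]
    return SimpleNamespace(ents=ents)

# Write a small temporary file so the example is self-contained
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("Ukraine and the UK. Ukraine again.", encoding="utf-8")
    result = collect_entities([sample], fake_nlp)

print(result[0]["Entity_type"])
```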

### Let's display our results in a pandas DataFrame sorted by file name

In [109]:
df_NER = pd.DataFrame(all_entities)
df_NER = df_NER.sort_values(by='File_name', ascending=True)
df_NER 

Unnamed: 0,File_name,Entity_type,Entity_identified
16,files/1.txt,"[GPE, ORG, ORG, PRODUCT, ORG, ORG, ORG, PERSON, PERSON, ORG, GPE, ORG, DATE, PERSON, ORG, ORG, NORP, GPE, DATE, GPE, PERSON, NORP, GPE, DATE, NORP, CARDINAL, NORP, GPE, DATE, GPE, ORG, NORP, GPE, PERSON, GPE, ORG, PERSON, GPE, PERSON, GPE, GPE, ORG, GPE, ORG, DATE, GPE, ORG, NORP, GPE, ORG, GPE, DATE, GPE, DATE, PERSON, NORP, GPE, PERSON, GPE, ORG, NORP, PERSON, GPE, DATE, ORG, DATE, GPE, ORG,...","[Ukraine, BBC, HomepageSkip, HelpYour accountHomeNewsSportReelWorklifeTravelFutureMore, menuBBC, UkraineClimateVideoWorldUS & CanadaUKBusinessTechScienceMoreStoriesEntertainment & ArtsHealthIn, PicturesReality CheckWorld News, ReadsWorldAfricaAsiaAustraliaEuropeLatin AmericaMiddle EastUkraine, MarchShareclose, TopicsRussia, Ukrainian, Bakhmut, months, Kathryn, NewsThe, Bakhmut, Ukrainian, Russ..."
6,files/10.txt,"[GPE, NORP, ORG, ORG, CARDINAL, CARDINAL, PERSON, ORG, PERSON, ORG, LOC, LOC, LOC, GPE, GPE, LOC, GPE, LOC, GPE, LOC, GPE, GPE, LOC, GPE, PERSON, ORG, ORG, ORG, PERSON, ORG, ORG, ORG, LOC, GPE, GPE, LOC, GPE, ORG, GPE, PERSON, LOC, LOC, GPE, GPE, LOC, GPE, LOC, GPE, ORG, ORDINAL, CARDINAL, ORG, ORG, ORG, ORG, ORG, ORG, ORG, ORG, PERSON, ORG, ORG, ORG, ORG, PRODUCT, ORG, ORG, NORP, PERSON, PERS...","[Ukraine, Russian, CNN, CNN, 1, 2, Audio, Content, Cancel\n \n\n Submit, Ad Feedback\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nClose, Africa, Americas, Asia, Australia, China, Europe, India, Middle East, United Kingdom, Africa\n \n\n Americas\n \..."
7,files/11.txt,"[GPE, GPE, ORG, ORG, ORG, CARDINAL, CARDINAL, PERSON, ORG, PERSON, ORG, GPE, EVENT, EVENT, PERSON, ORG, ORG, ORG, PERSON, ORG, ORG, ORG, PERSON, LOC, LOC, GPE, GPE, LOC, GPE, LOC, GPE, ORG, ORDINAL, CARDINAL, ORG, ORG, ORG, ORG, ORG, ORG, ORG, ORG, PERSON, ORG, ORG, ORG, ORG, PRODUCT, ORG, PERSON, GPE, PERSON, PERSON, ORG, TIME, DATE, NORP, DATE, PERSON, GPE, ORG, QUANTITY, NORP, GPE, LAW, GPE...","[England, Ukraine, Wembley, CNN, CNN, 1, 2, Audio, Content, Cancel\n \n\n Submit, Ad Feedback\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nClose, Motorsport, US Sports\n \n\n\n\n \n Olympics\n \n\n\n\n \n ..."
5,files/12.txt,"[GPE, GPE, PERSON, ORG, ORG, PRODUCT, ORG, ORG, ORG, PERSON, GPE, PERSON, PERSON, ORG, PERSON, GPE, CARDINAL, CARDINAL, NORP, ORG, PERSON, GPE, GPE, GPE, LOC, ORG, ORG, CARDINAL, ORG, ORG, DATE, ORG, ORG, PERSON, CARDINAL, CARDINAL, GPE, NORP, PERSON, GPE, GPE, DATE, GPE, PERSON, DATE, TIME, ORG, GPE, GPE, GPE, LOC, DATE, ORG, GPE, ORG, ORG, CARDINAL, GPE, TIME, DATE, CARDINAL, GPE, CARDINAL, ...","[Ukraine, UK, Putin, BBC, HomepageSkip, HelpYour accountHomeNewsSportReelWorklifeTravelFutureMore, menuBBC, UkraineClimateVideoWorldUS & CanadaUKBusinessTechScienceMoreStoriesEntertainment & ArtsHealthIn, PicturesReality CheckWorld News, ReadsWorldAfricaAsiaAustraliaEuropeLatin AmericaMiddle EastUkraine, UK, Putin, MarchShareclose, TopicsRussia, Getty ImagesImage, UK, 14, 2, Ukrainians, Russia..."
4,files/13.txt,"[GPE, PERSON, ORG, PRODUCT, ORG, ORG, ORG, PERSON, PERSON, PERSON, ORG, PERSON, NORP, PERSON, NORP, NORP, DATE, PERSON, WORK_OF_ART, GPE, DATE, PERSON, DATE, GPE, DATE, ORG, PERSON, PERSON, NORP, GPE, GPE, DATE, GPE, GPE, DATE, WORK_OF_ART, NORP, GPE, GPE, GPE, DATE, GPE, DATE, NORP, NORP, PERSON, ORG, GPE, DATE, GPE, PERSON, PERSON, GPE, NORP, PERSON, GPE, DATE, NORP, ORG, GPE, NORP, PERSON, ...","[Ukraine, Zelensky, HomepageSkip, HelpYour accountHomeNewsSportReelWorklifeTravelFutureMore, menuBBC, UkraineClimateVideoWorldUS & CanadaUKBusinessTechScienceMoreStoriesEntertainment & ArtsHealthIn, PicturesReality CheckWorld News, ReadsWorldAfricaAsiaAustraliaEuropeLatin AmericaMiddle EastUkraine, Zelensky, MarchShareclose, TopicsRussia, Screenshot, Ukrainian, Volodymyr Zelensky, Ukrainian, R..."
1,files/14.txt,"[GPE, ORG, ORG, PRODUCT, ORG, ORG, ORG, PERSON, PERSON, ORG, GPE, ORG, ORG, GPE, GPE, PERSON, ORG, GPE, NORP, CARDINAL, DATE, PERSON, GPE, PERSON, PERSON, ORG, GPE, NORP, PERSON, PERSON, GPE, GPE, GPE, LOC, LOC, LOC, GPE, LOC, PERSON, GPE, GPE, GPE, NORP, GPE, GPE, PERSON, GPE, ORG, GPE, EVENT, DATE, GPE, NORP, DATE, NORP, PERSON, GPE, PERSON, GPE, ORG, GPE, GPE, GPE, PERSON, GPE, DATE, GPE, G...","[Ukraine, BBC, HomepageSkip, HelpYour accountHomeNewsSportReelWorklifeTravelFutureMore, menuBBC, UkraineClimateVideoWorldUS & CanadaUKBusinessTechScienceMoreStoriesEntertainment & ArtsHealthIn, PicturesReality CheckWorld News, ReadsWorldAfricaAsiaAustraliaEuropeLatin AmericaMiddle EastUkraine, ministerPublished17 MarchShareclose, TopicsRussia, Ukraine, UkraineBy James LandaleDiplomatic, KyivCo..."
0,files/15.txt,"[GPE, GPE, PERSON, ORG, ORG, PRODUCT, ORG, ORG, ORG, PERSON, GPE, PERSON, PERSON, ORG, GPE, PRODUCT, CARDINAL, PERSON, ORG, GPE, PERSON, NORP, NORP, PERSON, DATE, NORP, GPE, GPE, PERSON, NORP, GPE, PERSON, PERSON, ORG, PERSON, ORG, PERSON, PERSON, GPE, GPE, GPE, CARDINAL, DATE, PERSON, NORP, GPE, PERSON, GPE, PERSON, LOC, GPE, GPE, ORG, GPE, ORG, PERSON, NORP, DATE, PERSON, PERSON, GPE, GPE, G...","[Ukraine, Florida, Ron DeSantis, BBC, HomepageSkip, HelpYour accountHomeNewsSportReelWorklifeTravelFutureMore, menuBBC, UkraineClimateVideoWorldUS & CanadaUKBusinessTechScienceMoreStoriesEntertainment & ArtsHealthIn, PicturesReality CheckWorld News, ReadsWorldAfricaAsiaAustraliaEuropeLatin AmericaMiddle EastUkraine, Florida, Ron DeSantis, MarchShareclose, TopicsRussia, Ukraine, JavaScript, Fiv..."
2,files/16.txt,"[ORG, GPE, ORG, ORG, PRODUCT, ORG, ORG, ORG, GPE, PERSON, ORG, GPE, ORG, ORG, GPE, GPE, PERSON, ORG, GPE, CARDINAL, DATE, MONEY, GPE, DATE, GPE, ORG, CARDINAL, NORP, DATE, NORP, PERSON, ORG, PRODUCT, GPE, NORP, GPE, GPE, ORG, GPE, ORG, CARDINAL, NORP, ORG, CARDINAL, ORG, GPE, ORDINAL, MONEY, MONEY, MONEY, ORG, GPE, DATE, ORDINAL, MONEY, QUANTITY, GPE, ORG, NORP, DATE, LOC, GPE, GPE, ORG, ORG, ...","[EU, Ukraine, BBC, HomepageSkip, HelpYour accountHomeNewsSportReelWorklifeTravelFutureMore, menuBBC, UkraineClimateVideoWorldUS & CanadaUKBusinessTechScienceMoreStoriesEntertainment & ArtsHealthIn, PicturesReality CheckWorld News, Ukraine, MarchShareclose, TopicsRussia, Ukraine, Armed Forces 80th, Air Assault Brigade, Bakhmut, Ukraine, James GregoryBBC NewsMore, European Union, Ukraine, at lea..."
3,files/17.txt,"[NORP, NORP, GPE, ORG, ORG, PRODUCT, ORG, ORG, ORG, NORP, NORP, GPE, ORG, ORG, PERSON, GPE, PERSON, ORG, NORP, GPE, LOC, NORP, NORP, GPE, PERSON, GPE, GPE, PERSON, GPE, GPE, PERSON, GPE, GPE, DATE, PERSON, GPE, GPE, CARDINAL, GPE, PERSON, NORP, PERSON, ORDINAL, EVENT, DATE, PERSON, GPE, CARDINAL, DATE, NORP, PERSON, PERSON, ORDINAL, NORP, NORP, PERSON, GPE, CARDINAL, PERSON, GPE, ORG, ORG, DAT...","[Japanese, Chinese, Ukraine, BBC, HomepageSkip, HelpYour accountHomeNewsSportReelWorklifeTravelFutureMore, menuBBC, UkraineClimateVideoWorldUS & CanadaUKBusinessTechScienceMoreStoriesEntertainment & ArtsHealthIn, PicturesReality CheckWorld News, ReadsAsiaChinaIndiaJapanese, Chinese, Ukraine, MarchShareclose, TopicsRussia, Fumio Kishida, Bucha, Shaimaa Khalil, Gareth Evansin Tokyo, LondonIf, Uk..."
19,files/18.txt,"[GPE, PERSON, ORG, ORG, ORG, PRODUCT, ORG, ORG, ORG, PERSON, PERSON, ORG, PERSON, ORG, GPE, PRODUCT, GPE, PERSON, ORG, ORG, NORP, PERSON, ORG, NORP, GPE, PERSON, TIME, ORG, DATE, ORDINAL, NORP, ORG, ORG, PERSON, PERSON, NORP, PERSON, TIME, ORG, GPE, GPE, PERSON, ORG, NORP, PERSON, FAC, NORP, NORP, PERSON, NORP, GPE, ORG, NORP, DATE, GPE, CARDINAL, ORG, PERCENT, CARDINAL, CARDINAL, PERSON, ORG,...","[Ukraine, Putin, Mariupol, BBC, HomepageSkip, HelpYour accountHomeNewsSportReelWorklifeTravelFutureMore, menuBBC, UkraineClimateVideoWorldUS & CanadaUKBusinessTechScienceMoreStoriesEntertainment & ArtsHealthIn, PicturesReality CheckWorld News, ReadsWorldAfricaAsiaAustraliaEuropeLatin AmericaMiddle EastUkraine, Putin, Mariupol, MarchShareclose, TopicsRussia, Ukraine, JavaScript, Russia, Putin, ..."
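
Note that the file names above are sorted as strings, so `files/10.txt` comes directly after `files/1.txt`. If numeric order matters, a "natural sort" key is one option. The sketch below uses only the standard library; `natural_key` is a hypothetical helper and the paths are illustrative.

```python
import re

def natural_key(path):
    """Split a path into text and integer chunks so numbers compare by value."""
    # re.split with a capturing group keeps the digit runs in the result,
    # e.g. "files/10.txt" -> ["files/", "10", ".txt"]
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", path)]

paths = ["files/10.txt", "files/2.txt", "files/1.txt", "files/15.txt"]

print(sorted(paths))                   # plain string order
ordered = sorted(paths, key=natural_key)
print(ordered)                         # numeric order
```

The same key could be used to order the list of file paths before the DataFrame is built.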


### Let's explode our DataFrame so we have just one entity value per row, pegged to the file name

In [110]:
df_NER = df_NER.set_index(['File_name'])
df_NER = df_NER.apply(pd.Series.explode).reset_index()
df_NER[:25]

| | File_name | Entity_type | Entity_identified |
|---|---|---|---|
| 0 | files/1.txt | GPE | Ukraine |
| 1 | files/1.txt | ORG | BBC |
| 2 | files/1.txt | ORG | HomepageSkip |
| 3 | files/1.txt | PRODUCT | HelpYour accountHomeNewsSportReelWorklifeTravelFutureMore |
| 4 | files/1.txt | ORG | menuBBC |
| 5 | files/1.txt | ORG | UkraineClimateVideoWorldUS & CanadaUKBusinessTechScienceMoreStoriesEntertainment & ArtsHealthIn |
| 6 | files/1.txt | ORG | PicturesReality CheckWorld News |
| 7 | files/1.txt | PERSON | ReadsWorldAfricaAsiaAustraliaEuropeLatin AmericaMiddle EastUkraine |
| 8 | files/1.txt | PERSON | MarchShareclose |
| 9 | files/1.txt | ORG | TopicsRussia |
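
To make the explode step easier to follow, here is a small self-contained sketch on made-up data. It uses `DataFrame.explode` with a list of columns, which requires pandas 1.3 or later (an assumption about your environment). The file names and entities are illustrative only.

```python
import pandas as pd

# Two illustrative records in the same shape as all_entities
records = [
    {"File_name": "files/a.txt",
     "Entity_type": ["GPE", "ORG"],
     "Entity_identified": ["Ukraine", "BBC"]},
    {"File_name": "files/b.txt",
     "Entity_type": ["PERSON"],
     "Entity_identified": ["Macron"]},
]
df = pd.DataFrame(records)

# Explode both list columns together so each label stays paired with its text;
# ignore_index=True renumbers the rows from 0
exploded = df.explode(["Entity_type", "Entity_identified"], ignore_index=True)
print(exploded)
```

Each list element becomes its own row, and the file name is repeated on every row that came from that file.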


### Let's filter our results by GPE

In [111]:
df_NER[df_NER['Entity_type'] == 'GPE'][:15]

| | File_name | Entity_type | Entity_identified |
|---|---|---|---|
| 0 | files/1.txt | GPE | Ukraine |
| 10 | files/1.txt | GPE | Ukrainian |
| 17 | files/1.txt | GPE | Russia |
| 19 | files/1.txt | GPE | Ukraine |
| 22 | files/1.txt | GPE | Russia |
| 27 | files/1.txt | GPE | Bakhmut |
| 29 | files/1.txt | GPE | Moscow |
| 32 | files/1.txt | GPE | Ukraine |
| 34 | files/1.txt | GPE | Ukraine |
| 37 | files/1.txt | GPE | UK |
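
Once the data is in one-entity-per-row form, a filtered view like the one above can be summarised. The sketch below counts how often each place appears and how many distinct places each file mentions. The data and file names are made up for illustration, and the variable names are hypothetical.

```python
import pandas as pd

# Illustrative data in the exploded, one-entity-per-row shape
df = pd.DataFrame({
    "File_name": ["files/a.txt", "files/a.txt", "files/a.txt",
                  "files/b.txt", "files/b.txt"],
    "Entity_type": ["GPE", "GPE", "ORG", "GPE", "GPE"],
    "Entity_identified": ["Ukraine", "Russia", "BBC", "Ukraine", "Ukraine"],
})

# Keep only the rows labelled GPE
gpe = df[df["Entity_type"] == "GPE"]

# How often each place name occurs across all files
gpe_counts = gpe["Entity_identified"].value_counts()

# How many distinct places each file mentions
places_per_file = gpe.groupby("File_name")["Entity_identified"].nunique()

print(gpe_counts)
print(places_per_file)
```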


### Let's filter our results by LAW

In [112]:
df_NER[df_NER['Entity_type'] == 'LAW'][:15]

| | File_name | Entity_type | Entity_identified |
|---|---|---|---|
| 443 | files/11.txt | LAW | Euro 2024 |
| 490 | files/11.txt | LAW | Euro 2024 |


### Let's filter our results by MONEY

In [113]:
df_NER[df_NER['Entity_type'] == 'MONEY'][:15]

| | File_name | Entity_type | Entity_identified |
|---|---|---|---|
| 1026 | files/15.txt | MONEY | more than $112bn |
| 1110 | files/16.txt | MONEY | 2bn |
| 1136 | files/16.txt | MONEY | 1bn |
| 1137 | files/16.txt | MONEY | 1.06bn |
| 1138 | files/16.txt | MONEY | 863 |
| 1143 | files/16.txt | MONEY | 1bn |
| 1182 | files/16.txt | MONEY | 350 |
| 1585 | files/19.txt | MONEY | around $415 or £ |
| 1586 | files/19.txt | MONEY | 338 |
| 1813 | files/2.txt | MONEY | 87000 |
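
The MONEY values above are strings in inconsistent shapes ("more than $112bn", "1.06bn", "87000"), so they cannot be summed or compared as they are. The sketch below shows one rough way to pull a number out of such strings. `parse_amount` and the suffix table are assumptions for illustration: only "bn" appears in the output above, the other suffixes are guesses, and the currency is not recovered.

```python
import re
from decimal import Decimal

# Assumed suffix meanings; only "bn" appears in the output above.
MULTIPLIERS = {"bn": 10**9, "m": 10**6, "k": 10**3}

# A number, optionally followed by one of the assumed suffixes
AMOUNT = re.compile(r"(\d+(?:\.\d+)?)\s*(bn|m|k)?\b", re.IGNORECASE)

def parse_amount(text):
    """Return the first numeric amount in the text as a Decimal, or None."""
    match = AMOUNT.search(text.replace(",", ""))
    if match is None:
        return None
    number, suffix = match.groups()
    return Decimal(number) * MULTIPLIERS.get((suffix or "").lower(), 1)

for raw in ["more than $112bn", "1.06bn", "87000", "around $415 or £"]:
    print(raw, "->", parse_amount(raw))
```

`Decimal` is used instead of `float` so that values such as 1.06 are not affected by binary rounding.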
