#### Named Entity Recognition (NER)
**Named Entity Recognition (NER)** is an NLP technique that identifies and classifies named entities in text into predefined categories such as person names, organizations, locations, dates, and more. The goal is to transform unstructured text into structured data by tagging these entities, making it easier to analyze and extract meaningful information. NER is widely used in various applications, including:

- `Information extraction`: Pulling out specific data from large text corpora.
- `Question answering`: Enhancing the accuracy of responses by identifying relevant entities.
- `Sentiment analysis`: Understanding the sentiment towards specific entities.
- `Text summarization`: Highlighting key entities in a summary.

**Use Cases**

- `Information Extraction`: NER helps in extracting specific information from large volumes of text, such as identifying names of people, organizations, locations, dates, and more. This is particularly useful in fields like journalism and research.
- `Customer Support`: By identifying key entities in customer queries, NER can help chatbots and automated systems provide more accurate and relevant responses, improving customer service efficiency.
- `Sentiment Analysis`: NER can enhance sentiment analysis by pinpointing the entities being discussed, allowing for more precise sentiment scoring related to specific products, services, or individuals.
- `Question Answering Systems`: NER is crucial in developing systems that can understand and respond to user questions by identifying the entities involved in the query.
- `Financial Analysis`: In the finance sector, NER can be used to extract important data from financial reports, news articles, and other documents, aiding in trend analysis and risk assessment.
- `Content Recommendation`: By recognizing entities within text, NER can improve recommendation systems, suggesting relevant content based on identified entities.

**Models**

- spaCy
- Stanford CoreNLP
- Flair

**Why is NER important?**

1. Enhanced Information Retrieval: NER helps users quickly find relevant information by highlighting key entities in large volumes of text. This is particularly useful in fields like journalism, legal research, and customer support.
2. Improved Decision-Making: By extracting critical entities from documents, reports, and articles, NER enables users to make informed decisions based on accurate and relevant data.
3. Automation and Efficiency: NER automates the process of identifying important entities, reducing the need for manual data extraction and saving time and resources.
4. Personalization: In customer service and marketing, NER can be used to tailor responses and recommendations based on identified entities, enhancing the customer experience.

In [22]:
import pandas as pd

pd.options.mode.chained_assignment = None
pd.set_option('display.max_colwidth', None)

# spaCy
import spacy
from spacy import displacy

# Flair
from flair.data import Sentence
from flair.models import SequenceTagger

In [2]:
df = pd.read_csv(r"C:\Users\nene0\Desktop\Projects\greenflash\chat_data.csv", encoding_errors='ignore')

df.head()

Unnamed: 0,Chat_ID,Message_ID,Sender,Message
0,data_science_trend,0,user,What is the latest trend in data science?
1,data_science_trend,1,copilot,"Data science is evolving rapidly, and several ..."
2,data_science_trend,2,user,Can you tell me more about generative AI?
3,data_science_trend,3,copilot,Generative AI is a fascinating and rapidly evo...
4,data_science_trend,4,user,can you explain more about how the generative ...


In [3]:
df['Chat_ID'].unique()

array(['data_science_trend', 'food_history_companies', 'gaming',
       'greek_myth', 'job_market', 'jokes', 'music_kpop', 'pets',
       'philoshophy', 'rich_poor_countries',
       'tech_product_recommendation', 'travel', 'largest_adj_product',
       'jarritos_flavors', 'reason_for_sleepiness'], dtype=object)

In [4]:
# Copy the filtered rows so later column assignments do not touch a view of df
travel_chat = df[df['Chat_ID'] == 'travel'].reset_index(drop=True).copy()

travel_chat.head()

Unnamed: 0,Chat_ID,Message_ID,Sender,Message
0,travel,0,user,Can you recommend top 10 places to travel in E...
1,travel,1,copilot,Sure! Here are ten of the safest places to tra...
2,travel,2,user,What is the most searched topic for those peop...
3,travel,3,copilot,One of the most searched topics for travelers ...
4,travel,4,user,I am curious to know what other people are int...


## spaCy

In [5]:
ner = spacy.load("en_core_web_lg")

def extract_entities(text):
    doc = ner(text)
    return [(ent.text, ent.label_) for ent in doc.ents]

travel_chat.loc[:, 'Spacy_Ent'] = travel_chat['Message'].apply(extract_entities)

In [6]:
travel_chat.loc[:, 'Spacy_Count'] = travel_chat['Spacy_Ent'].apply(len)

In [7]:
travel_chat.head(3)

Unnamed: 0,Chat_ID,Message_ID,Sender,Message,Spacy_Ent,Spacy_Count
0,travel,0,user,Can you recommend top 10 places to travel in E...,"[(10, CARDINAL), (East Asia, LOC)]",2
1,travel,1,copilot,Sure! Here are ten of the safest places to tra...,"[(ten, CARDINAL), (East Asia, LOC), (Singapore...",28
2,travel,2,user,What is the most searched topic for those peop...,"[(Taiwan, GPE)]",1
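
Beyond the per-message counts, it can help to see which labels dominate. The following is a minimal sketch, not part of the original notebook: `count_labels` is a hypothetical helper, and the two entity lists are copied from rows 0 and 2 of the output above.

```python
from collections import Counter

# Entity lists copied from rows 0 and 2 of the spaCy output above.
spacy_ents = [
    [("10", "CARDINAL"), ("East Asia", "LOC")],
    [("Taiwan", "GPE")],
]

def count_labels(entity_lists):
    """Count how often each entity label occurs across all messages."""
    return Counter(label for ents in entity_lists for _, label in ents)

label_counts = count_labels(spacy_ents)
print(label_counts.most_common())
```

Because a pandas Series iterates over its values, the same helper should also accept the `travel_chat['Spacy_Ent']` column directly.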


In [17]:
spacy.explain("NORP") # To find the explanation of the tag.

'Nationalities or religious or political groups'

### displaCy

In [9]:
travel_chat.iloc[0]['Message']

'Can you recommend top 10 places to travel in East Asia where the public security is considered very safe?'

In [21]:
displacy.render(ner(travel_chat.iloc[7]['Message']), style="ent", jupyter=True)

In [11]:
message = travel_chat.iloc[0]['Message']

doc = ner(message) # process the message with the spaCy pipeline

print([token.text for token in doc]) # print the text of each token

['Can', 'you', 'recommend', 'top', '10', 'places', 'to', 'travel', 'in', 'East', 'Asia', 'where', 'the', 'public', 'security', 'is', 'considered', 'very', 'safe', '?']


In [12]:
# Sample text
text = "There was an earthquake in California today."

# Process the text
doc = ner(text)

# Extract named entities
for ent in doc.ents:
    print(ent.text, ent.label_)

California GPE
today DATE
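
To connect this output back to the goal of turning unstructured text into structured data, the sketch below groups the extracted pairs by label. `to_record` is a hypothetical helper introduced here for illustration; the entity pairs are the ones printed above, hard-coded so the snippet runs without loading a model.

```python
from collections import defaultdict

def to_record(text, entities):
    """Group (entity_text, label) pairs by label, keeping first-seen order."""
    grouped = defaultdict(list)
    for ent_text, label in entities:
        # Skip repeated mentions of the same entity under the same label.
        if ent_text not in grouped[label]:
            grouped[label].append(ent_text)
    return {"text": text, "entities": dict(grouped)}

record = to_record(
    "There was an earthquake in California today.",
    [("California", "GPE"), ("today", "DATE")],
)
print(record)
```

A dictionary of this shape is easy to serialize to JSON or to expand into one DataFrame column per label.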


## Flair

In [13]:
# Load the NER tagger
tagger = SequenceTagger.load('ner')

# Flair-specific extractor, named separately so it does not shadow the spaCy version
def extract_entities_flair(text):
    sentence = Sentence(text)
    tagger.predict(sentence)  # annotates the sentence in place
    return [(entity.text, entity.tag) for entity in sentence.get_spans('ner')]


travel_chat.loc[:, 'Flair_Ent'] = travel_chat['Message'].apply(extract_entities_flair)

2024-09-11 13:08:32,658 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>
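
The tag dictionary in this log follows the BIOES scheme: `B-`, `I-`, and `E-` mark the beginning, inside, and end of a multi-token entity, `S-` marks a single-token entity, and `O` marks tokens outside any entity. Flair decodes these tags into spans internally, so the sketch below is only an illustration of the scheme. `decode_bioes` is a hypothetical helper, and the example tags are hand-written rather than real model output.

```python
def decode_bioes(tokens, tags):
    """Merge per-token BIOES tags into (entity_text, label) spans."""
    spans, current, current_label = [], [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            # Single-token entity.
            spans.append((token, label))
            current, current_label = [], None
        elif prefix == "B":
            # Start a new multi-token entity.
            current, current_label = [token], label
        elif prefix in ("I", "E") and current and label == current_label:
            current.append(token)
            if prefix == "E":
                spans.append((" ".join(current), current_label))
                current, current_label = [], None
        else:
            # "O" or an inconsistent tag: drop any unfinished span.
            current, current_label = [], None
    return spans

# Hand-written tags for illustration (assumed, not produced by the tagger).
tokens = ["travel", "in", "East", "Asia", "and", "Taiwan"]
tags = ["O", "O", "B-LOC", "E-LOC", "O", "S-LOC"]
decoded = decode_bioes(tokens, tags)
print(decoded)
```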


In [14]:
travel_chat.loc[:, 'Flair_Count'] = travel_chat['Flair_Ent'].apply(len)

In [15]:
travel_chat.head()

Unnamed: 0,Chat_ID,Message_ID,Sender,Message,Spacy_Ent,Spacy_Count,Flair_Ent,Flair_Count
0,travel,0,user,Can you recommend top 10 places to travel in E...,"[(10, CARDINAL), (East Asia, LOC)]",2,"[(East Asia, LOC)]",1
1,travel,1,copilot,Sure! Here are ten of the safest places to tra...,"[(ten, CARDINAL), (East Asia, LOC), (Singapore...",28,"[(East Asia, LOC), (Singapore, LOC), (Singapor...",23
2,travel,2,user,What is the most searched topic for those peop...,"[(Taiwan, GPE)]",1,"[(Taiwan, LOC)]",1
3,travel,3,copilot,One of the most searched topics for travelers ...,"[(One, CARDINAL), (Taiwan, GPE), (Taiwan, GPE)...",9,"[(Taiwan, LOC), (Taiwan, LOC), (EasyCard, MISC...",7
4,travel,4,user,I am curious to know what other people are int...,"[(10, CARDINAL)]",1,[],0


In [24]:
travel_chat[['Spacy_Ent', 'Flair_Ent']].head(10)

Unnamed: 0,Spacy_Ent,Flair_Ent
0,"[(10, CARDINAL), (East Asia, LOC)]","[(East Asia, LOC)]"
1,"[(ten, CARDINAL), (East Asia, LOC), (Singapore, GPE), (Singapore, GPE), (one, CARDINAL), (Japan, GPE), (Japan, GPE), (South Korea, GPE), (South Korea, GPE), (Taiwan, GPE), (Taiwan, GPE), (Hong Kong, GPE), (Hong Kong, GPE), (Malaysia, GPE), (Kuala Lumpur, GPE), (Penang, GPE), (Malaysia, GPE), (Vietnam, GPE), (Hanoi, GPE), (Ho Chi Minh City, PERSON), (Mongolia, GPE), (Mongolia, GPE), (Bhutan, GPE), (Gross National Happiness, ORG), (Brunei, GPE), (Brunei, GPE), (Storyteller Travel, ORG), (Nomad, ORG)]","[(East Asia, LOC), (Singapore, LOC), (Singapore, LOC), (Japan, LOC), (Japan, LOC), (South Korea, LOC), (South Korea, LOC), (Taiwan, LOC), (Taiwan, LOC), (Hong Kong, LOC), (Hong Kong, LOC), (Malaysia, LOC), (Kuala Lumpur, LOC), (Penang, LOC), (Malaysia, LOC), (Vietnam, LOC), (Hanoi, LOC), (Ho Chi Minh City, LOC), (Mongolia, LOC), (Mongolia, LOC), (Bhutan, LOC), (Brunei, LOC), (Brunei, LOC)]"
2,"[(Taiwan, GPE)]","[(Taiwan, LOC)]"
3,"[(One, CARDINAL), (Taiwan, GPE), (Taiwan, GPE), (Travelers, ORG), (Taiwan, GPE), (Taiwan, GPE), (Taiwan, GPE), (Lonely Planet, WORK_OF_ART), (The Sandy Feet, WORK_OF_ART)]","[(Taiwan, LOC), (Taiwan, LOC), (EasyCard, MISC), (Taiwan, LOC), (Taiwan, LOC), (Taiwan, LOC), (Lonely Planet : The Sandy Feet, MISC)]"
4,"[(10, CARDINAL)]",[]
5,"[(Japan, GPE), (Tokyo, GPE)]","[(Japan, LOC), (Tokyo, LOC)]"
6,"[(Singapore, GPE), (Singapore, GPE), (Japan, GPE), (first, ORDINAL), (the first day, DATE)]","[(Singapore, LOC), (Singapore, LOC), (Japan, LOC)]"
7,"[(Singapore, GPE), (Japan, GPE), (first, ORDINAL), (the first day, DATE), (Asakusa, PERSON), (Tokyo, GPE), (Senso-ji Temple, FAC), (Tokyo, GPE), (Nakamise Street, FAC), (kibi dango, ORG), (Asakusa, PERSON), (Japanese, NORP), (first, ORDINAL), (Singapore, GPE)]","[(Singapore, LOC), (Japan, LOC), (Asakusa, LOC), (Tokyo, LOC), (Senso-ji Temple, LOC), (Tokyo, LOC), (Nakamise Street, LOC), (Asakusa, LOC), (Japanese, MISC), (Singapore, LOC)]"
8,"[(Singaporean, NORP), (laksa, satay, ORG), (bak kut teh, PERSON), (marina, LOC), (10, CARDINAL)]","[(Singaporean, MISC)]"
9,"[(Singapore, GPE), (10, CARDINAL), (Taiwan, GPE), (the first day, DATE), (Taipei, GPE), (Shilin Night Market, FAC), (Taipei, GPE), (Jiufen, ORG)]","[(Singapore, LOC), (Taiwan, LOC), (Taipei, LOC), (Shilin Night Market, LOC), (Taipei, LOC), (Jiufen, LOC)]"
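
One way to line up the two columns above is to map spaCy's labels onto the four Flair labels and then compare the entity texts. This is a rough sketch under stated assumptions: the `SPACY_TO_FLAIR` mapping is an illustrative choice made here, not an official correspondence, and `compare_entities` is a hypothetical helper. The input is row 9 of the table above.

```python
# Assumed mapping from spaCy labels to the Flair tagger's four labels.
# Labels with no counterpart (CARDINAL, ORDINAL, DATE, ...) map to None.
SPACY_TO_FLAIR = {
    "PERSON": "PER", "GPE": "LOC", "LOC": "LOC", "FAC": "LOC",
    "ORG": "ORG", "NORP": "MISC", "WORK_OF_ART": "MISC",
}

def compare_entities(spacy_ents, flair_ents):
    """Compare two entity lists by text; repeated texts keep their last label."""
    spacy_map = {text: SPACY_TO_FLAIR.get(label) for text, label in spacy_ents}
    flair_map = dict(flair_ents)
    shared = spacy_map.keys() & flair_map.keys()
    return {
        "agree": sorted(t for t in shared if spacy_map[t] == flair_map[t]),
        "disagree": sorted(t for t in shared if spacy_map[t] != flair_map[t]),
        "spacy_only": sorted(spacy_map.keys() - flair_map.keys()),
        "flair_only": sorted(flair_map.keys() - spacy_map.keys()),
    }

# Row 9 of the comparison table above.
spacy_row = [("Singapore", "GPE"), ("10", "CARDINAL"), ("Taiwan", "GPE"),
             ("the first day", "DATE"), ("Taipei", "GPE"),
             ("Shilin Night Market", "FAC"), ("Taipei", "GPE"), ("Jiufen", "ORG")]
flair_row = [("Singapore", "LOC"), ("Taiwan", "LOC"), ("Taipei", "LOC"),
             ("Shilin Night Market", "LOC"), ("Taipei", "LOC"), ("Jiufen", "LOC")]

comparison = compare_entities(spacy_row, flair_row)
print(comparison)
```

Entries under `disagree` are the ones worth checking by hand, since they are texts both models found but labeled differently.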


In [25]:
spacy.explain("FAC")

'Buildings, airports, highways, bridges, etc.'

In [20]:
travel_chat.iloc[7]['Message']

"Singapore sounds like a fantastic choice, especially with its amazing food scene! Revisiting as an adult will definitely give you a whole new perspective.\nIf I could travel to Japan, the first thing I'd want to do on the first day is visit Asakusa in Tokyo. It's home to the famous Senso-ji Temple, Tokyo's oldest temple, which offers a beautiful blend of history and culture. The area around the temple, known as Nakamise Street, is filled with traditional shops and street food stalls where you can try local snacks like taiyaki (fish-shaped cakes filled with sweet red bean paste) and kibi dango (sweet rice dumplings).\nAfter exploring Asakusa, I'd probably head to a nearby onsen (hot spring) to relax and unwind from the journey. It's a perfect way to experience a traditional Japanese bath and soak in the local culture.\nWhat about you? What's the first thing you'd like to do when you visit Singapore again?"

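- spaCy extracts the entities from this chat in about a second, while Flair takes about a minute.
- spaCy recognizes more entity types (for example CARDINAL, ORDINAL, DATE, FAC, and NORP), whereas this Flair model is limited to PER, LOC, ORG, and MISC. On the entities both libraries tag, however, Flair appears more accurate in this sample.
    - index 7: Asakusa is a place, not a person. spaCy labeled it PERSON, while Flair labeled it LOC.
    - index 8: "laksa, satay" and "bak kut teh" are foods. spaCy labeled them ORG and PERSON respectively, while Flair tagged neither.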
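
The speed observation above is informal. A minimal way to measure it, assuming each extractor is a plain function from text to a list of entities, is sketched below. `time_extractor` and `dummy_extract` are hypothetical names; the dummy extractor only stands in for a real model so the snippet runs without spaCy or Flair.

```python
import time

def time_extractor(extract_fn, messages):
    """Return (results, elapsed_seconds) for one pass of extract_fn over messages."""
    start = time.perf_counter()
    results = [extract_fn(message) for message in messages]
    return results, time.perf_counter() - start

def dummy_extract(text):
    """Stand-in extractor: tags capitalized words (not a real NER model)."""
    return [(word, "CAP") for word in text.split() if word[:1].isupper()]

results, elapsed = time_extractor(dummy_extract, ["I visited Tokyo", "no entities here"])
print(f"{len(results)} messages in {elapsed:.6f} s")
```

Passing the notebook's spaCy or Flair extraction function together with `travel_chat['Message']` would give comparable wall-clock numbers for the two libraries.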