#### Named Entity Recognition (NER)
**Named Entity Recognition (NER)** is an NLP technique that identifies named entities in text and classifies them into predefined categories such as people, organizations, locations, and dates. Tagging these entities turns unstructured text into structured data that is easier to analyze and query. Common applications include:

- `Information extraction`: Pulling out specific data from large text corpora.
- `Question answering`: Enhancing the accuracy of responses by identifying relevant entities.
- `Sentiment analysis`: Understanding the sentiment towards specific entities.
- `Text summarization`: Highlighting key entities in a summary.

**Use Cases**

- `Information Extraction`: NER helps in extracting specific information from large volumes of text, such as identifying names of people, organizations, locations, dates, and more. This is particularly useful in fields like journalism and research.
- `Customer Support`: By identifying key entities in customer queries, NER can help chatbots and automated systems provide more accurate and relevant responses, improving customer service efficiency.
- `Sentiment Analysis`: NER can enhance sentiment analysis by pinpointing the entities being discussed, allowing for more precise sentiment scoring related to specific products, services, or individuals.
- `Question Answering Systems`: NER is crucial in developing systems that can understand and respond to user questions by identifying the entities involved in the query.
- `Financial Analysis`: In the finance sector, NER can be used to extract important data from financial reports, news articles, and other documents, aiding in trend analysis and risk assessment.
- `Content Recommendation`: By recognizing entities within text, NER can improve recommendation systems, suggesting relevant content based on identified entities.

**Models**

- spaCy
- Stanford CoreNLP
- Flair

**Why is NER important?**

1. Enhanced Information Retrieval: NER helps users quickly find relevant information by highlighting key entities in large volumes of text. This is particularly useful in fields like journalism, legal research, and customer support.
2. Improved Decision-Making: By extracting critical entities from documents, reports, and articles, NER enables users to make informed decisions based on accurate and relevant data.
3. Automation and Efficiency: NER automates the process of identifying important entities, reducing the need for manual data extraction and saving time and resources.
4. Personalization: In customer service and marketing, NER can be used to tailor responses and recommendations based on identified entities, enhancing the customer experience.

In [1]:
import pandas as pd
import spacy
from spacy import displacy

In [2]:
df = pd.read_csv(r"C:\Users\nene0\Desktop\Projects\greenflash\chat_data.csv", encoding_errors='ignore')

df.head()

Unnamed: 0,Chat_ID,Message_ID,Sender,Message
0,data_science_trend,0,user,What is the latest trend in data science?
1,data_science_trend,1,copilot,"Data science is evolving rapidly, and several ..."
2,data_science_trend,2,user,Can you tell me more about generative AI?
3,data_science_trend,3,copilot,Generative AI is a fascinating and rapidly evo...
4,data_science_trend,4,user,can you explain more about how the generative ...


In [3]:
df['Chat_ID'].unique()

array(['data_science_trend', 'food_history_companies', 'gaming',
       'greek_myth', 'job_market', 'jokes', 'music_kpop', 'pets',
       'philoshophy', 'rich_poor_countries',
       'tech_product_recommendation', 'travel', 'largest_adj_product',
       'jarritos_flavors', 'reason_for_sleepiness'], dtype=object)

In [4]:
travel_chat = df[df['Chat_ID']=='travel']

travel_chat.head()

Unnamed: 0,Chat_ID,Message_ID,Sender,Message
384,travel,0,user,Can you recommend top 10 places to travel in E...
385,travel,1,copilot,Sure! Here are ten of the safest places to tra...
386,travel,2,user,What is the most searched topic for those peop...
387,travel,3,copilot,One of the most searched topics for travelers ...
388,travel,4,user,I am curious to know what other people are int...


In [5]:
ner = spacy.load("en_core_web_lg")

def extract_entities(text):
    doc = ner(text)
    return [(ent.text, ent.label_) for ent in doc.ents]

# Work on an explicit copy so the new column is not assigned to a slice of df
# (assigning to the slice directly triggers pandas' SettingWithCopyWarning).
travel_chat = travel_chat.copy()
travel_chat['Entities'] = travel_chat['Message'].apply(extract_entities)


In [6]:
travel_chat.head(3)

Unnamed: 0,Chat_ID,Message_ID,Sender,Message,Entities
384,travel,0,user,Can you recommend top 10 places to travel in E...,"[(10, CARDINAL), (East Asia, LOC)]"
385,travel,1,copilot,Sure! Here are ten of the safest places to tra...,"[(ten, CARDINAL), (East Asia, LOC), (Singapore..."
386,travel,2,user,What is the most searched topic for those peop...,"[(Taiwan, GPE)]"
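
A quick way to see which entity types dominate a conversation is to count labels across the `Entities` column. The following is a minimal sketch, assuming each row holds a list of `(text, label)` tuples as returned by `extract_entities`; the sample list `entities_per_message` is illustrative and contains only the entities fully visible in the output above.

```python
from collections import Counter

# Sample rows shaped like the Entities column above. Only entities that are
# fully visible in the printed output are included; the truncated row is cut short.
entities_per_message = [
    [("10", "CARDINAL"), ("East Asia", "LOC")],
    [("ten", "CARDINAL"), ("East Asia", "LOC")],
    [("Taiwan", "GPE")],
]

# Flatten the per-message lists and count how often each label occurs.
label_counts = Counter(
    label for entities in entities_per_message for _, label in entities
)

print(label_counts.most_common())
# [('CARDINAL', 2), ('LOC', 2), ('GPE', 1)]
```

On the real data, passing `travel_chat['Entities']` in place of the sample list should work the same way, since iterating over the column yields one entity list per message.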


In [7]:
spacy.explain("GPE")

'Countries, cities, states'

In [8]:
travel_chat.iloc[0]['Message']

'Can you recommend top 10 places to travel in East Asia where the public security is considered very safe?'

In [9]:
displacy.render(ner(travel_chat.iloc[1]['Message']), style="ent", jupyter=True)

In [10]:
message = travel_chat.iloc[0]['Message']

doc = ner(message)  # run the spaCy pipeline on the message

print([token.text for token in doc])  # print the text of each token

['Can', 'you', 'recommend', 'top', '10', 'places', 'to', 'travel', 'in', 'East', 'Asia', 'where', 'the', 'public', 'security', 'is', 'considered', 'very', 'safe', '?']
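
Entities are spans over these tokens, and a common way to express that per token is BIO tagging (Begin, Inside, Outside). The sketch below is an illustration in plain Python, not part of the original notebook: `bio_tags` is a hypothetical helper, and the span indices are derived by hand from the token list above and the two entities reported for this message. spaCy exposes comparable information through `token.ent_iob_`, `token.ent_type_`, `ent.start`, and `ent.end`.

```python
def bio_tags(n_tokens, spans):
    """Return one BIO tag per token for (start, end, label) spans (end exclusive)."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


tokens = ['Can', 'you', 'recommend', 'top', '10', 'places', 'to', 'travel', 'in',
          'East', 'Asia', 'where', 'the', 'public', 'security', 'is',
          'considered', 'very', 'safe', '?']

# Token spans for the two entities found in this message:
# "10" (CARDINAL) and "East Asia" (LOC).
spans = [(4, 5, "CARDINAL"), (9, 11, "LOC")]

for token, tag in zip(tokens, bio_tags(len(tokens), spans)):
    if tag != "O":
        print(token, tag)
# 10 B-CARDINAL
# East B-LOC
# Asia I-LOC
```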


In [11]:
# Sample text
text = "There was an earthquake in California today."

# Process the text
doc = ner(text)

# Extract named entities
for ent in doc.ents:
    print(ent.text, ent.label_)

California GPE
today DATE
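
To make the "unstructured text to structured data" idea concrete, the printed pairs can be grouped into a small record keyed by label. This is a minimal sketch under the assumption that entities arrive as `(text, label)` tuples; `group_by_label` is a hypothetical helper name, not a spaCy function.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs into {label: [text, ...]}, skipping duplicates."""
    grouped = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# The two entities printed above for the earthquake sentence.
entities = [("California", "GPE"), ("today", "DATE")]

record = group_by_label(entities)
print(record)
# {'GPE': ['California'], 'DATE': ['today']}
```

A record like this can be stored alongside the original message or used to filter messages by place or date.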
