### Explanation of Libraries and Settings

- **pandas (`pd`)**: A powerful Python library for data manipulation and analysis, especially useful for working with tabular data like CSV or Excel files.  
- **spaCy (`spacy`)**: A popular NLP library for tasks such as tokenization, part-of-speech tagging, and Named Entity Recognition (NER).  
- **BeautifulSoup (`bs4`)**: A library for parsing HTML and XML documents, commonly used for web scraping.  
- **`pd.set_option("display.max_rows", 200)`**: Raises the number of rows pandas prints before truncating a DataFrame from the default of 60 to 200. Tables of up to 200 rows then print in full; longer ones are still truncated.


In [None]:
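import pandas as pd                  # tabular data handling
import spacy                         # NLP pipelines, including NER
from bs4 import BeautifulSoup        # HTML/XML parsing for web-scraped text

# Print up to 200 rows of a DataFrame before truncating
pd.set_option("display.max_rows", 200)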
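A minimal sketch of how the display option can be inspected, scoped, and restored, assuming a standard pandas install where the default for `display.max_rows` is 60. `pd.option_context` applies a setting only inside a `with` block, which avoids changing the option for the rest of the notebook; `pd.reset_option` returns it to the default.

```python
import pandas as pd

# Read the current value before changing anything
default_rows = pd.get_option("display.max_rows")

# Global change, as in the setup cell
pd.set_option("display.max_rows", 200)
current_rows = pd.get_option("display.max_rows")

# Temporary change that is undone when the with-block exits
with pd.option_context("display.max_rows", 10):
    inside_context = pd.get_option("display.max_rows")
after_context = pd.get_option("display.max_rows")

# Restore the library default
pd.reset_option("display.max_rows")
restored_rows = pd.get_option("display.max_rows")

print(default_rows, current_rows, inside_context, after_context, restored_rows)
```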

### Named Entity Recognition (NER) using spaCy

- **Purpose**: Extract real-world entities such as people, organizations, locations, and dates from text.
- **Steps**:
  1. Load the small pretrained English pipeline `en_core_web_sm` (installed separately with `python -m spacy download en_core_web_sm`).
  2. Process the input text using the model to create a `Doc` object.
  3. Iterate through `doc.ents` to access the named entities.
- **Entity Attributes**:
  - `text` → the actual entity string (e.g., "Mahua Moitra")
  - `start_char` → starting character index of the entity in the text
  - `end_char` → ending character index of the entity in the text
  - `label_` → entity type code (e.g., `PERSON`, `ORG`)
  - `spacy.explain(label_)` → human-readable description of the entity type
- **Use Case**: Turns unstructured text into structured records for analytics, summarization, information extraction, or downstream ML tasks.


In [14]:
# Load the small English pipeline (tokenizer, tagger, parser, NER, ...)
nlp = spacy.load("en_core_web_sm")
content = "Trinamool Congress leader Mahua Moitra has moved the Supreme Court against her expulsion from the Lok Sabha over the cash-for-query allegations against her. Moitra was ousted from the Parliament last week after the Ethics Committee of the Lok Sabha found her guilty of jeopardising national security by sharing her parliamentary portal's login credentials with businessman Darshan Hiranandani."

# Run the pipeline; the resulting Doc exposes detected entities via doc.ents
doc = nlp(content)
for ent in doc.ents:
    # entity text, character span, label code, and its plain-English description
    print(ent.text, ent.start_char, ent.end_char, ent.label_, spacy.explain(ent.label_))

Trinamool Congress 0 18 ORG Companies, agencies, institutions, etc.
Mahua Moitra 26 38 PERSON People, including fictional
the Supreme Court 49 66 ORG Companies, agencies, institutions, etc.
Moitra 157 163 NORP Nationalities or religious or political groups
Parliament 184 194 ORG Companies, agencies, institutions, etc.
last week 195 204 DATE Absolute or relative dates or periods
the Ethics Committee 211 231 ORG Companies, agencies, institutions, etc.
Darshan Hiranandani 373 392 PERSON People, including fictional
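The entity attributes can be checked without downloading a trained model. This sketch swaps in spaCy's rule-based `entity_ruler` on a blank English pipeline, so the labels come from hand-written patterns rather than from `en_core_web_sm` (whose predictions can be wrong: above, the second "Moitra" is tagged `NORP` instead of `PERSON`). The shortened sentence and the two patterns are illustrative assumptions. The loop also confirms that `start_char` and `end_char` slice the original string back to `ent.text`.

```python
import spacy

# Blank English pipeline: tokenizer only, no statistical NER model needed
nlp = spacy.blank("en")

# Rule-based stand-in for the trained NER component (patterns are assumptions)
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Mahua Moitra"},
    {"label": "ORG", "pattern": "Supreme Court"},
])

text = "Mahua Moitra has moved the Supreme Court."
doc = nlp(text)

rows = []
for ent in doc.ents:
    # Character offsets index into the original string
    assert text[ent.start_char:ent.end_char] == ent.text
    rows.append((ent.text, ent.start_char, ent.end_char, ent.label_, spacy.explain(ent.label_)))

for row in rows:
    print(*row)
```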


### Visualizing Named Entities using spaCy's displaCy

- **Purpose**: To visually highlight named entities in text, making it easier to see and understand what the model has recognized.
- **Function**: `displacy.render(doc, style='ent')`
  - `doc` → the processed text (`Doc` object) containing entities.
  - `style='ent'` → tells displaCy to render named entities (NER visualization).
- **Output**:
  - Entities are highlighted in different colors according to their type (e.g., PERSON, ORG, DATE), and each highlight carries an inline tag showing its label.
  - In Jupyter, `render` displays the result directly in the notebook; in a plain Python script it returns the HTML markup as a string instead.
- **Use Case**: A quick visual check of NER results for text exploration, debugging, or presentations.


In [15]:
from spacy import displacy
# Highlight the entities found in `doc` (rendered inline in Jupyter)
displacy.render(doc, style="ent")
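Outside a notebook, `displacy.render` does not display anything; with `jupyter=False` it returns the markup as a string, which can be saved and opened in a browser. This sketch also uses displaCy's `manual=True` mode, which accepts a plain dictionary of text and character offsets instead of a `Doc`. The offsets and the output filename `entities.html` are illustrative assumptions.

```python
from spacy import displacy

# Manual-mode input: raw text plus entity character offsets and labels
example = {
    "text": "Mahua Moitra has moved the Supreme Court.",
    "ents": [
        {"start": 0, "end": 12, "label": "PERSON"},
        {"start": 27, "end": 40, "label": "ORG"},
    ],
    "title": None,
}

# jupyter=False returns the HTML string instead of displaying it;
# page=True wraps the markup in a complete HTML page
html = displacy.render(example, style="ent", manual=True, jupyter=False, page=True)

# Hypothetical output path; open this file in a browser to view the highlights
with open("entities.html", "w", encoding="utf-8") as f:
    f.write(html)
```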

### Creating a DataFrame of Named Entities

- **Purpose**: Convert the extracted named entities into a structured tabular format for easier analysis and manipulation.
- **Steps**:
  1. Iterate through `doc.ents` to access each entity in the text.
  2. For each entity, extract:
     - `text` → the actual entity string (e.g., "Mahua Moitra")
     - `label_` → the entity type (e.g., `PERSON`, `ORG`)
     - `lemma_` → the span's lemma, formed from the lemmas of its tokens (for proper-noun entities this is usually identical to the text)
  3. Store these tuples in a list called `entities`.
  4. Convert the list into a **pandas DataFrame** with columns `['text', 'type', 'lemma']`.
- **Output**: A DataFrame where each row represents a named entity with its text, type, and lemma.
- **Use Case**: Useful for data analysis, filtering, or exporting NER results to CSV for downstream tasks.


In [17]:
entities = [(ent.text, ent.label_, ent.lemma_) for ent in doc.ents]
df = pd.DataFrame(entities, columns=["text", "type", "lemma"])
df

|   | text | type | lemma |
|---|------|------|-------|
| 0 | Trinamool Congress | ORG | Trinamool Congress |
| 1 | Mahua Moitra | PERSON | Mahua Moitra |
| 2 | the Supreme Court | ORG | the Supreme Court |
| 3 | Moitra | NORP | Moitra |
| 4 | Parliament | ORG | Parliament |
| 5 | last week | DATE | last week |
| 6 | the Ethics Committee | ORG | the Ethics Committee |
| 7 | Darshan Hiranandani | PERSON | Darshan Hiranandani |
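Once the entities are in a DataFrame, ordinary pandas operations apply. This sketch rebuilds the table above from its printed values, counts entities per type, filters the organisations, and round-trips the frame through CSV. Writing with `index=False` keeps the row index out of the file, so reading it back does not create an extra `Unnamed: 0` column. An in-memory `io.StringIO` buffer stands in for a real file; in practice you would pass a path such as `"entities.csv"` (a hypothetical name).

```python
import io

import pandas as pd

# Entity rows copied from the printed DataFrame above: (text, type, lemma)
entities = [
    ("Trinamool Congress", "ORG", "Trinamool Congress"),
    ("Mahua Moitra", "PERSON", "Mahua Moitra"),
    ("the Supreme Court", "ORG", "the Supreme Court"),
    ("Moitra", "NORP", "Moitra"),
    ("Parliament", "ORG", "Parliament"),
    ("last week", "DATE", "last week"),
    ("the Ethics Committee", "ORG", "the Ethics Committee"),
    ("Darshan Hiranandani", "PERSON", "Darshan Hiranandani"),
]
df = pd.DataFrame(entities, columns=["text", "type", "lemma"])

# Number of entities per type
type_counts = df["type"].value_counts()

# Keep only organisations
orgs = df[df["type"] == "ORG"]["text"].tolist()

# Export without the index, then read back to check the columns
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(type_counts)
print(orgs)
print(list(round_trip.columns))
```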


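### Running NER on User-Supplied Text

- **Purpose**: Apply the same pipeline to any paragraph typed at the prompt, print each entity with its label, and highlight the entities with displaCy.


In [24]: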
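content = input('enter paragraph')    # read a paragraph from the user
doc = nlp(content)                     # reuse the pipeline loaded earlier
print('here is the detail:')
for ent in doc.ents:
    print(ent.text, ent.label_)
displacy.render(doc, style='ent')      # highlight entities inline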

enter paragraph BOAT BoAt is an Indian consumer electronics brand that has become a market leader in audio and wearable devices by focusing on affordability, durability, and style. The company was co-founded by Aman Gupta and Sameer Mehta in 2016 to fill the market gap for fashionable yet reasonably priced audio accessories for millennials. It has successfully carved out a strong brand identity through creative design and robust, youth-centric marketing strategies.


here is the detail:
BoAt GPE
Indian NORP
Aman Gupta PERSON
Sameer Mehta PERSON
2016 DATE
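Wrapping the cell's logic in a function makes it reusable and testable without `input()`. The helper `entities_by_label` below is hypothetical, and the demo uses a blank pipeline with hand-written `entity_ruler` patterns in place of `en_core_web_sm`, so its labels are assumptions rather than model predictions. With the trained model you would pass the loaded `nlp` instead; note that in the run above it tagged the brand "BoAt" as `GPE` (a geopolitical entity), another reminder to review small-model output.

```python
from collections import defaultdict

import spacy


def entities_by_label(nlp, text):
    """Hypothetical helper: run `nlp` on `text` and group entity strings by label."""
    doc = nlp(text)
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in pipeline for the demo: blank English plus a rule-based entity ruler
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Aman Gupta"},
    {"label": "PERSON", "pattern": "Sameer Mehta"},
    {"label": "DATE", "pattern": "2016"},
])

paragraph = "The company was co-founded by Aman Gupta and Sameer Mehta in 2016."
result = entities_by_label(nlp, paragraph)
print(result)
```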
