📌 Project: Named Entity Recognition (NER) using NLTK and spaCy
📌 Objective: Identify and extract named entities from text (e.g., people, organizations, locations, dates).

🔹 Techniques Used:

* Tokenization
* Part-of-Speech (POS) tagging
* Named Entity Recognition (NER)
* Visualization

🔹 Libraries:

* NLTK (for basic NLP processing)
* spaCy (for advanced entity recognition)
* Matplotlib and spaCy's displaCy (for visualization)


In [1]:
!pip install spacy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m12.8/12.8 MB[0m [31m69.9 MB/s[0m eta [36m0:00:00[0m
[38;5;2m✔ Download and installation successful[0m
You can now load the package via spacy.load('en_core_web_sm')
[38;5;3m⚠ Restart to reload dependencies[0m
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [3]:
import nltk
import spacy
from nltk import ne_chunk, pos_tag
from nltk.tokenize import word_tokenize
from nltk.tree import Tree  # type of the tree returned by ne_chunk()
import matplotlib.pyplot as plt



In [8]:
text = """ Elon Musk is the CEO of Tesla and SpaceX.
          He was born in South Africa and now lives in the United States.
          Tesla's headquarters is in California."""

In [13]:
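# Tokenizer, POS-tagger and NE-chunker models used below.
# NLTK 3.9+ looks up the 'punkt_tab' and '*_eng' variants, so fetch those too.
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('maxent_ne_chunker')

# Split the text into word tokens, then tag each token with its part of speech.
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)
print(pos_tags)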

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping chunkers/maxent_ne_chunker.zip.


[('Elon', 'NNP'), ('Musk', 'NNP'), ('is', 'VBZ'), ('the', 'DT'), ('CEO', 'NN'), ('of', 'IN'), ('Tesla', 'NNP'), ('and', 'CC'), ('SpaceX', 'NNP'), ('.', '.'), ('He', 'PRP'), ('was', 'VBD'), ('born', 'VBN'), ('in', 'IN'), ('South', 'NNP'), ('Africa', 'NNP'), ('and', 'CC'), ('now', 'RB'), ('lives', 'VBZ'), ('in', 'IN'), ('the', 'DT'), ('United', 'NNP'), ('States', 'NNPS'), ('.', '.'), ('Tesla', 'NNP'), ("'s", 'POS'), ('headquarters', 'NN'), ('is', 'VBZ'), ('in', 'IN'), ('California', 'NNP'), ('.', '.')]
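Before running a real NE chunker, it can help to see how far the POS tags alone get you. The sketch below is a naive baseline, not NER: it simply glues runs of consecutive proper-noun tags (`NNP`/`NNPS`) into candidate spans and assigns no entity types. The function name `proper_noun_spans` is hypothetical, and the sample is a slice of the `pos_tag` output printed above.

```python
from typing import List, Tuple


def proper_noun_spans(tagged: List[Tuple[str, str]]) -> List[str]:
    """Group consecutive NNP/NNPS tokens into candidate entity strings."""
    spans: List[str] = []
    current: List[str] = []
    for word, tag in tagged:
        if tag in ("NNP", "NNPS"):
            current.append(word)          # extend the running proper-noun span
        elif current:
            spans.append(" ".join(current))  # a non-proper-noun closes the span
            current = []
    if current:                           # flush a span that ends the input
        spans.append(" ".join(current))
    return spans


# A slice of the pos_tag() output printed above.
sample = [("Elon", "NNP"), ("Musk", "NNP"), ("is", "VBZ"), ("the", "DT"),
          ("CEO", "NN"), ("of", "IN"), ("Tesla", "NNP"), ("and", "CC"),
          ("SpaceX", "NNP"), (".", ".")]
print(proper_noun_spans(sample))  # ['Elon Musk', 'Tesla', 'SpaceX']
```

Applied to the full `pos_tags` list, this yields Elon Musk, Tesla, SpaceX, South Africa, United States, Tesla and California: the span boundaries are reasonable, but nothing says which are people, organizations or places, which is exactly what the chunker below adds.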


In [16]:
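# ne_chunk() needs the chunker model and the 'words' corpus.
nltk.download('maxent_ne_chunker_tab')
nltk.download('words')

def nltk_NER(text):
    """Tokenize, POS-tag and NE-chunk `text`; return the resulting nltk.Tree."""
    tokens = word_tokenize(text)
    pos_tags = pos_tag(tokens)
    ne_tree = ne_chunk(pos_tags)
    return ne_tree

ner_tree = nltk_NER(text)
print(ner_tree)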

[nltk_data] Downloading package maxent_ne_chunker_tab to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker_tab is already up-to-date!
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.


(S
  (PERSON Elon/NNP)
  (ORGANIZATION Musk/NNP)
  is/VBZ
  the/DT
  (ORGANIZATION CEO/NN of/IN Tesla/NNP)
  and/CC
  (ORGANIZATION SpaceX/NNP)
  ./.
  He/PRP
  was/VBD
  born/VBN
  in/IN
  (GPE South/NNP Africa/NNP)
  and/CC
  now/RB
  lives/VBZ
  in/IN
  the/DT
  (GPE United/NNP States/NNPS)
  ./.
  (PERSON Tesla/NNP)
  's/POS
  headquarters/NN
  is/VBZ
  in/IN
  (GPE California/NNP)
  ./.)
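The printed tree is hard to reuse directly. A minimal sketch for flattening it into `(entity text, label)` pairs follows; it assumes the layout shown above, where each entity is a labelled subtree of `(word, tag)` tuples and every other token is a bare tuple. The helper name `tree_to_entities` is hypothetical. Instead of `isinstance(node, Tree)` it checks for a `.label()` method, which `nltk.tree.Tree` provides and plain tuples do not, so the function itself has no import dependency.

```python
from typing import List, Tuple


def tree_to_entities(tree) -> List[Tuple[str, str]]:
    """Collect (entity text, label) pairs from an ne_chunk() result.

    Assumes a flat tree: labelled subtrees for entities, (word, tag)
    tuples for all other tokens.
    """
    entities: List[Tuple[str, str]] = []
    for node in tree:
        # Entity subtrees (nltk.Tree) expose .label(); leaf tuples do not.
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node)
            entities.append((text, node.label()))
    return entities
```

In the notebook, `tree_to_entities(ner_tree)` on the tree above should give `[('Elon', 'PERSON'), ('Musk', 'ORGANIZATION'), ('CEO of Tesla', 'ORGANIZATION'), ('SpaceX', 'ORGANIZATION'), ('South Africa', 'GPE'), ('United States', 'GPE'), ('Tesla', 'PERSON'), ('California', 'GPE')]`, which makes the chunker's errors easy to spot: "Elon Musk" is split in two, "Musk" and "CEO of Tesla" are tagged as organizations, and the second "Tesla" is tagged as a person.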


# Using spaCy

In [17]:
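# Load the small English pipeline downloaded earlier.
nlp = spacy.load("en_core_web_sm")

# Run the pipeline; recognised entities are exposed as doc.ents.
doc = nlp(text)

for ent in doc.ents:
    print(f"{ent.text} → {ent.label_}")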

Elon Musk → PERSON
Tesla → ORG
South Africa → GPE
the United States → GPE
Tesla → ORG
California → GPE
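To compare the two libraries on this sentence, the entity lists from both outputs can be normalized and intersected. This is a rough sketch under stated assumptions: the `NLTK_TO_SPACY` label mapping is my own choice (NLTK's chunker says `ORGANIZATION` where spaCy says `ORG`), stripping a leading "the " is a deliberate simplification so that "the United States" matches "United States", and the helper names `normalise` and `compare` are hypothetical. Duplicate mentions collapse because sets are used.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Pair = Tuple[str, str]

# Assumed mapping from NLTK ne_chunk labels to spaCy's labels.
NLTK_TO_SPACY: Dict[str, str] = {
    "ORGANIZATION": "ORG",
    "LOCATION": "LOC",
    "FACILITY": "FAC",
}


def normalise(pairs: Iterable[Pair], label_map: Optional[Dict[str, str]] = None) -> Set[Pair]:
    """Strip a leading 'the ' from each entity and map labels to one scheme."""
    label_map = label_map or {}
    out: Set[Pair] = set()
    for text, label in pairs:
        text = text.strip()
        if text.lower().startswith("the "):
            text = text[4:]
        out.add((text, label_map.get(label, label)))
    return out


def compare(nltk_pairs: Iterable[Pair], spacy_pairs: Iterable[Pair]) -> Dict[str, List[Pair]]:
    """Split entities into those found by both libraries or by only one."""
    a = normalise(nltk_pairs, NLTK_TO_SPACY)
    b = normalise(spacy_pairs)
    return {"both": sorted(a & b), "nltk_only": sorted(a - b), "spacy_only": sorted(b - a)}


# Entity lists copied from the NLTK tree and the spaCy output above.
nltk_ents = [("Elon", "PERSON"), ("Musk", "ORGANIZATION"), ("CEO of Tesla", "ORGANIZATION"),
             ("SpaceX", "ORGANIZATION"), ("South Africa", "GPE"), ("United States", "GPE"),
             ("Tesla", "PERSON"), ("California", "GPE")]
spacy_ents = [("Elon Musk", "PERSON"), ("Tesla", "ORG"), ("South Africa", "GPE"),
              ("the United States", "GPE"), ("Tesla", "ORG"), ("California", "GPE")]

for key, items in compare(nltk_ents, spacy_ents).items():
    print(key, items)
```

On this text the two agree only on the three place names (South Africa, the United States, California). spaCy gets "Elon Musk" and "Tesla" right but, in the output above, does not return SpaceX at all, while NLTK finds SpaceX but mislabels the other names.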


In [18]:
from spacy import displacy

# Highlight the recognised entities inline in the notebook.
displacy.render(doc, style="ent", jupyter=True)
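displaCy highlights entities in context, while Matplotlib (imported at the top but otherwise unused) suits a summary view. A minimal sketch of a bar chart of entity labels is below; the function names `label_counts` and `plot_label_counts` are hypothetical. Matplotlib is imported inside the plotting function so the counting step also works on its own.

```python
from collections import Counter
from typing import Iterable, Tuple


def label_counts(entities: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many entity mentions carry each label."""
    return Counter(label for _text, label in entities)


def plot_label_counts(counts: Counter, title: str = "Entity labels"):
    """Draw a bar chart of label frequencies and return the figure."""
    # Imported here so label_counts() does not require matplotlib.
    import matplotlib.pyplot as plt

    labels = sorted(counts)
    fig, ax = plt.subplots()
    ax.bar(labels, [counts[lab] for lab in labels])
    ax.set_xlabel("Entity label")
    ax.set_ylabel("Count")
    ax.set_title(title)
    return fig


# spaCy entities printed above, as (text, label) pairs.
spacy_ents = [("Elon Musk", "PERSON"), ("Tesla", "ORG"), ("South Africa", "GPE"),
              ("the United States", "GPE"), ("Tesla", "ORG"), ("California", "GPE")]
counts = label_counts(spacy_ents)
print(counts)  # Counter({'GPE': 3, 'ORG': 2, 'PERSON': 1})
```

In the notebook the same counts can be built straight from the parsed document with `label_counts((ent.text, ent.label_) for ent in doc.ents)`, then passed to `plot_label_counts(...)` followed by `plt.show()`.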