# NLTK - Natural Language Toolkit

* NLTK, or Natural Language Toolkit, is a prominent Python library designed for working with human language data.

* It offers a comprehensive suite of tools for various natural language processing (NLP) tasks, including:

  * Tokenization: Breaking text into individual words or sentences.
  * Stemming and Lemmatization: Reducing words to their base or root forms.
  * Part-of-Speech Tagging: Identifying grammatical categories of words.
  * Parsing: Analyzing the grammatical structure of sentences.
  * Named Entity Recognition: Detecting and classifying key entities in text.
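
Stemming is listed above but not demonstrated in this notebook, so here is a minimal sketch using NLTK's `PorterStemmer`, which needs no extra data download. The word list is chosen only for illustration. Lemmatization is left out because it depends on additional data (WordNet) that this notebook does not download.

```python
from nltk.stem import PorterStemmer

# The Porter stemmer strips common English suffixes with fixed rules.
stemmer = PorterStemmer()

# Illustrative words, two of them taken from the example sentence below.
words = ["mothers", "starts", "running"]
stems = [stemmer.stem(word) for word in words]

for word, stem in zip(words, stems):
    print(f"{word} -> {stem}")
```

A stem is not guaranteed to be a dictionary word; the stemmer only applies suffix rules, which is why lemmatization is often preferred when real base forms are needed.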

In [14]:
import nltk

In [17]:
sentence = """Most expectant mothers talk to their unborn. But what if the unborn starts to respond?"""
token = nltk.word_tokenize(sentence)

In [18]:
token

['Most',
 'expectant',
 'mothers',
 'talk',
 'to',
 'their',
 'unborn',
 '.',
 'But',
 'what',
 'if',
 'the',
 'unborn',
 'starts',
 'to',
 'respond',
 '?']

* Tokenization is the process of splitting text into smaller units called tokens, which can be words, sentences, or subwords. Here, `nltk.word_tokenize` splits the sentence into words and punctuation marks.
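
To illustrate the difference between sentence-level and word-level tokens, the sketch below uses only Python's `re` module. It is a rough approximation and not NLTK's actual algorithm: the helper names `simple_sentence_split` and `simple_word_split` are hypothetical, and the patterns would mishandle abbreviations, contractions, and similar cases that NLTK's tokenizers are designed for.

```python
import re

sentence = "Most expectant mothers talk to their unborn. But what if the unborn starts to respond?"


def simple_sentence_split(text):
    """Split after '.', '?' or '!' when followed by whitespace (naive)."""
    return re.split(r"(?<=[.?!])\s+", text.strip())


def simple_word_split(text):
    """Return runs of word characters and single punctuation marks (naive)."""
    return re.findall(r"\w+|[^\w\s]", text)


sentences = simple_sentence_split(sentence)
words = simple_word_split(sentence)

print(sentences)   # sentence-level tokens
print(words[:4])   # first few word-level tokens
```

For this particular sentence the word-level result happens to match the `nltk.word_tokenize` output shown above, but that should not be expected in general.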

In [23]:
tags = nltk.pos_tag(token)
tags[0:3]

[('Most', 'RBS'), ('expectant', 'JJ'), ('mothers', 'NNS')]

* `nltk.pos_tag` labels each token with its part of speech (its grammatical category in English), using Penn Treebank tags.
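
The tag abbreviations can be hard to read at first. As a small aid, the sketch below maps the three tags from the output above to their Penn Treebank meanings. The `TAG_MEANINGS` table and the `describe` helper are hypothetical names for this example and cover only these three tags, not the full tag set.

```python
# Meanings of the three Penn Treebank tags seen in the output above.
TAG_MEANINGS = {
    "RBS": "adverb, superlative",
    "JJ": "adjective",
    "NNS": "noun, plural",
}

# Copied from the pos_tag output above so the block runs on its own.
tags = [("Most", "RBS"), ("expectant", "JJ"), ("mothers", "NNS")]


def describe(tagged_tokens, meanings=TAG_MEANINGS):
    """Attach a readable meaning to each (word, tag) pair."""
    return [
        (word, tag, meanings.get(tag, "not in this small table"))
        for word, tag in tagged_tokens
    ]


described = describe(tags)
for word, tag, meaning in described:
    print(f"{word:<10} {tag:<4} {meaning}")
```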

# spaCy
spaCy is an open-source library for natural language processing.

In [5]:
#importing spacy
import spacy

In [7]:
#Loading spacy with 'en_core_web_sm'
nlp = spacy.load('en_core_web_sm')

In [9]:
#Calling nlp with string and checking type
introduction_doc = nlp("This is about Natural Language Processing in spacy")
type(introduction_doc)

spacy.tokens.doc.Doc

* The type tells us that `nlp` returned a spaCy `Doc` object, a container that holds the tokens rather than a single token.

In spaCy, a Token is a fundamental unit of text that represents individual pieces of a document, such as words or punctuation marks. The library processes text by first tokenizing it, which involves breaking down the text into these basic units. Each token is represented as a Token object within a Doc object, which holds all the annotations and information about the text.
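
The relationship between `Doc` and `Token` described above can be sketched as follows. To stay self-contained, this example assumes `spacy.blank("en")`, a tokenizer-only English pipeline that needs no model download, instead of the `en_core_web_sm` model loaded in this notebook. As a result it shows only tokenization, not tags or other annotations.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) rather than en_core_web_sm.
nlp = spacy.blank("en")
doc = nlp("This is about Natural Language Processing in spacy")

# Iterating over a Doc yields Token objects, each with its index and text.
for token in doc:
    print(token.i, token.text, token.is_alpha)

first_token = doc[0]   # indexing a Doc returns a single Token
phrase = doc[3:6]      # slicing a Doc returns a Span of several tokens

print(type(first_token).__name__, type(phrase).__name__)
print(phrase.text)
```

Indexing and slicing return views into the same `Doc`, so the tokens keep their position (`token.i`) within the original text.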

In [11]:
print([token.text for token in introduction_doc])

['This', 'is', 'about', 'Natural', 'Language', 'Processing', 'in', 'spacy']


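# Dependencies

* The cells below install NLTK and download the tokenizer and tagger data used in the NLTK examples.
* They then install spaCy and download its English language model, `en_core_web_sm`.
* Run them before the examples above.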
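
Before running the install cells, it can help to check which packages are already present. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and it checks only that the Python packages can be found, not that the NLTK data or the spaCy model has been downloaded.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages this notebook relies on.
missing = missing_packages(["nltk", "spacy"])
print("Missing packages:", missing)
```

An empty list means both packages are importable, so the `pip install` cells can be skipped.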

In [12]:
!python -m pip install nltk



In [16]:
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [20]:
nltk.download('averaged_perceptron_tagger_eng')

[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.


True

In [3]:
#Installing Spacy
!python -m pip install spacy



In [5]:
#default model for the English language
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
