## Natural Language Toolkit (NLTK)

Natural Language Toolkit (NLTK) is a Python library for natural language processing (NLP). It provides tools and resources for processing and analyzing human language data, which makes it useful to developers, researchers, and linguists who work with text. Its functionality includes tokenization, stemming, lemmatization, part-of-speech tagging, and parsing. It also gives access to corpora and lexical resources for various languages, along with modules for machine learning and language understanding tasks.

### What NLTK Provides

1. <b>Preprocessing tools</b>: such as tokenization, stemming, and stopword removal
2. <b>Analysis tools</b>: such as part-of-speech tagging
3. <b>Large language resources</b>: such as WordNet (a lexical database of English)
4. <b>Machine learning</b>: such as text classification

NLTK: https://www.nltk.org/

### Key Features and Capabilities of NLTK
1.  Tokenization
2.  Stemming and Lemmatization
3.  Part-of-Speech (POS) Tagging
4.  Parsing and Chunking
5.  Semantic Reasoning
6.  Named Entity Recognition (NER)
7.  Text Classification and Machine Learning
8.  Corpora and Lexical Resources: https://www.nltk.org/nltk_data/
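
As a quick illustration of the first two features in this list, the sketch below tokenizes a short sentence and stems each token. It uses `wordpunct_tokenize` and `PorterStemmer`, picked here because neither needs a downloaded model. The sample sentence is an assumption made for illustration only.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Illustrative sentence (an assumption, not taken from any corpus).
text = "Users loved the unveiled fabrics."

# Regex-based tokenizer: splits on whitespace and separates punctuation.
tokens = wordpunct_tokenize(text)

# Porter stemming strips common English suffixes; NLTK's implementation
# also lowercases each word by default.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

print(tokens)
print(stems)
```

Stems are not always dictionary words. Lemmatization (item 2 in the list) returns dictionary forms instead, but in NLTK it depends on the WordNet corpus being downloaded first.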

### Installing NLTK

In [1]:
# !pip install nltk

In [None]:
# Requirement already satisfied: nltk in c:\users\rizwan\...
# Requirement already satisfied: click in c:\users\rizwan\...
# Requirement already satisfied: joblib in c:\users\rizwan\...
# Requirement already satisfied: regex>=2021.8.3 in c:\users\rizwan\...
# Requirement already satisfied: tqdm in c:\users\rizwan\...

### Importing the NLTK Python Library

In [2]:
import nltk

In [3]:
nltk.__version__

'3.8.1'

### NLTK Corpora

In [4]:
# Download Tokenization Models
nltk.download('punkt')  

# Download POS Tagging Models
nltk.download('averaged_perceptron_tagger') 

In [None]:
# [nltk_data] Downloading package punkt to
# [nltk_data]     C:\Users\Rizwan\AppData\Roaming\nltk_data...
# [nltk_data]   Unzipping tokenizers\punkt.zip.
# [nltk_data] Downloading package averaged_perceptron_tagger to
# [nltk_data]     C:\Users\Rizwan\AppData\Roaming\nltk_data...
# [nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.

# True

In [5]:
# Download Twitter Samples
# ?

# Download Stopwords Corpus 
# ?

NLTK Corpora: https://www.nltk.org/nltk_data/
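
One possible way to fetch the Twitter samples and stopwords corpora mentioned above is sketched here. The helper names `is_installed` and `ensure_package` are hypothetical, introduced only for this example. The package ids `twitter_samples` and `stopwords` are assumed to match the entries in the NLTK data index linked above. The helper skips the download when the resource is already present locally.

```python
import nltk


def is_installed(resource_path):
    """Return True if the resource is already present in a local nltk_data folder."""
    try:
        nltk.data.find(resource_path)
        return True
    except LookupError:
        return False


def ensure_package(package_id, resource_path):
    """Download package_id only when resource_path cannot be found locally."""
    if is_installed(resource_path):
        return True
    # quiet=True suppresses the progress log; the call returns False on failure.
    return nltk.download(package_id, quiet=True)


if __name__ == "__main__":
    # Package ids are assumed to match the NLTK data index.
    ensure_package("twitter_samples", "corpora/twitter_samples")
    ensure_package("stopwords", "corpora/stopwords")
```

Checking with `nltk.data.find` first keeps repeated notebook runs from contacting the download server each time.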

### Some simple things you can do with NLTK

In [6]:
sentence = """Entity, headquartered in Lahore (Gulberg-III, Punjab, Pakistan), unveiled 
the new Luxury Fabrics at the Expo Centre Lahore, Muhammad Rizwan said in his keynote 
that users love their new Luxury Fabrics."""

In [7]:
tokens = nltk.word_tokenize(sentence)

In [8]:
tokens

['Entity',
 ',',
 'headquartered',
 'in',
 'Lahore',
 '(',
 'Gulberg-III',
 ',',
 'Punjab',
 ',',
 'Pakistan',
 ')',
 ',',
 'unveiled',
 'the',
 'new',
 'Luxury',
 'Fabrics',
 'at',
 'the',
 'Expo',
 'Centre',
 'Lahore',
 ',',
 'Muhammad',
 'Rizwan',
 'said',
 'in',
 'his',
 'keynote',
 'that',
 'users',
 'love',
 'their',
 'new',
 'Luxury',
 'Fabrics',
 '.']
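
Once text is tokenized, counting tokens is a natural next step. The sketch below uses `nltk.FreqDist` on a hand-copied slice of the token list above, so it runs without the tokenizer model. Which slice to copy is an arbitrary choice made for this example.

```python
import nltk

# A slice of the token list shown above, copied by hand so that no
# tokenizer model is needed to run this example.
tokens = ['unveiled', 'the', 'new', 'Luxury', 'Fabrics',
          'at', 'the', 'Expo', 'Centre', 'Lahore']

# FreqDist behaves like a counter keyed by token.
freq = nltk.FreqDist(tokens)

# most_common(1) returns a list with the single most frequent (token, count) pair.
most_common_token, count = freq.most_common(1)[0]

print(most_common_token, count)
print(freq["Lahore"])
```

Counts are case-sensitive, so lowercasing the tokens first may be appropriate depending on the task.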

In [9]:
tagged = nltk.pos_tag(tokens)

In [10]:
tagged

[('Entity', 'NN'),
 (',', ','),
 ('headquartered', 'VBN'),
 ('in', 'IN'),
 ('Lahore', 'NNP'),
 ('(', '('),
 ('Gulberg-III', 'NNP'),
 (',', ','),
 ('Punjab', 'NNP'),
 (',', ','),
 ('Pakistan', 'NNP'),
 (')', ')'),
 (',', ','),
 ('unveiled', 'VBD'),
 ('the', 'DT'),
 ('new', 'JJ'),
 ('Luxury', 'NNP'),
 ('Fabrics', 'NNS'),
 ('at', 'IN'),
 ('the', 'DT'),
 ('Expo', 'NNP'),
 ('Centre', 'NNP'),
 ('Lahore', 'NNP'),
 (',', ','),
 ('Muhammad', 'NNP'),
 ('Rizwan', 'NNP'),
 ('said', 'VBD'),
 ('in', 'IN'),
 ('his', 'PRP$'),
 ('keynote', 'NN'),
 ('that', 'IN'),
 ('users', 'NNS'),
 ('love', 'VBP'),
 ('their', 'PRP$'),
 ('new', 'JJ'),
 ('Luxury', 'NNP'),
 ('Fabrics', 'NNP'),
 ('.', '.')]

In [11]:
# NN:   Noun, singular
# NNP:  Proper noun, singular
# NNS:  Noun, plural
# VBN:  Verb, past participle
# VBD:  Verb, past tense
# VBP:  Verb, non-3rd person singular present
# IN:   Preposition or subordinating conjunction
# DT:   Determiner (e.g., articles like "the," "a," "an")
# JJ:   Adjective
# PRP$: Possessive pronoun (e.g., "his," "her," "its")
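
The tags can be put to work directly. As a minimal sketch, the example below groups consecutive proper nouns (`NNP`) into chunks with `nltk.RegexpParser`, which needs no downloaded model. The chunk label `NAME` and the one-rule grammar are assumptions made for illustration, and the input is a hand-copied slice of the tagged output above. This is plain pattern matching over tags, not named entity recognition.

```python
import nltk

# A slice of the tagged output shown above, copied by hand.
tagged_sample = [
    ('at', 'IN'), ('the', 'DT'),
    ('Expo', 'NNP'), ('Centre', 'NNP'), ('Lahore', 'NNP'),
    (',', ','),
    ('Muhammad', 'NNP'), ('Rizwan', 'NNP'),
    ('said', 'VBD'),
]

# Hypothetical grammar: a NAME chunk is one or more consecutive NNP tokens.
chunker = nltk.RegexpParser("NAME: {<NNP>+}")
tree = chunker.parse(tagged_sample)

# Collect the words of each NAME chunk, in sentence order.
names = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees()
    if subtree.label() == "NAME"
]

print(names)
```

A rule this simple cannot tell a person from a place. It only shows how POS tags feed into the chunking step listed among the key features.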

('Entity', 'ORG') 😍