<a href="https://colab.research.google.com/github/leilafarsani/NLP-go-get/blob/main/NLP01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NLP Workshop (Session 1)

This session introduces Natural Language Processing (NLP) techniques and implements them with Python's NLTK and spaCy libraries.

## 1. NLP Toolkit

This workshop relies on two widely used Python libraries for natural language processing:

### NLTK (Natural Language Toolkit)
- **Purpose**: Open-source Python library designed for educational and research applications
- **Features**: Comprehensive tools for tokenization, stemming, lemmatization, parsing, and more
- **Resources**: Includes extensive corpora and lexical resources
- **Documentation**: [NLTK Website](https://www.nltk.org/) | [Wikipedia](https://en.wikipedia.org/wiki/Natural_Language_Toolkit)

### spaCy
- **Purpose**: Industrial-strength NLP library optimized for real-world applications
- **Features**: Fast and accurate processing for tokenization, named entity recognition, and dependency parsing
- **Strengths**: Modern API with efficient design for large-scale text processing
- **Documentation**: [spaCy Website](https://spacy.io/) | [Wikipedia](https://en.wikipedia.org/wiki/SpaCy)

## 2. Library Setup

### NLTK Installation and Resource Downloads
- Installs the NLTK library and imports it
- Downloads essential NLTK resources:
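  - `punkt`: Pre-trained Punkt models for sentence splitting, which `word_tokenize` also relies on
  - `punkt_tab`: The same Punkt models stored as tab-separated text files; recent NLTK releases load these instead of the older pickled `punkt` files
  - `stopwords`: Lists of common words (such as "the" and "is") that are often filtered out in NLP tasks
  - `wordnet`: Lexical database of word relationships such as synonyms and hypernyms; used by NLTK's WordNet lemmatizer
  - `gutenberg`: Small selection of public-domain literary texts from Project Gutenberg
  - `averaged_perceptron_tagger_eng`: English model for NLTK's part-of-speech tagger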
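
One way to confirm that the resources listed above are present, and to fetch only the missing ones, is to look each one up with `nltk.data.find` before calling `nltk.download`. The following is a minimal sketch, assuming the standard `nltk_data` directory layout. The `NLTK_RESOURCES` mapping, the `ensure_nltk_resources` helper, and its `find`/`download` parameters are illustrative names, not part of NLTK.

```python
# Download ID -> location under nltk_data, in the form nltk.data.find() accepts.
NLTK_RESOURCES = {
    "punkt": "tokenizers/punkt",
    "punkt_tab": "tokenizers/punkt_tab",
    "stopwords": "corpora/stopwords",
    "wordnet": "corpora/wordnet",
    "gutenberg": "corpora/gutenberg",
    "averaged_perceptron_tagger_eng": "taggers/averaged_perceptron_tagger_eng",
}


def ensure_nltk_resources(resources=NLTK_RESOURCES, find=None, download=None):
    """Download only the resources that are not installed yet.

    `find` and `download` default to nltk.data.find and nltk.download.
    They are parameters (an illustrative choice, not an NLTK convention)
    so the logic can be exercised without touching the network.
    Returns the download IDs that had to be fetched.
    """
    if find is None or download is None:
        # Imported lazily so the helper can be defined before NLTK is installed.
        import nltk
        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for resource_id, path in resources.items():
        try:
            find(path)  # raises LookupError when the resource is missing
        except LookupError:
            download(resource_id)
            fetched.append(resource_id)
    return fetched
```

Once NLTK is installed, calling `ensure_nltk_resources()` with no arguments uses the real NLTK functions; a second call should return an empty list because every lookup then succeeds.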

### spaCy Installation and Model Setup
- Installs the spaCy library and imports it
- Downloads the small English language model (`en_core_web_sm`)
- Creates the spaCy NLP pipeline with `nlp = spacy.load("en_core_web_sm")`

In [None]:
# --- NLTK Setup ---
!pip install nltk
import nltk
# Download essential NLTK datasets: tokenizers, stopwords, WordNet, etc.
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('gutenberg')
nltk.download('averaged_perceptron_tagger_eng')

# --- spaCy Setup ---
!pip install spacy
import spacy

# Download and load the small English model for spaCy
!python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

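After the setup cell has run, a quick check can confirm that spaCy and the `en_core_web_sm` model are importable and that the pipeline processes text. This is a hedged sketch rather than part of the original workshop: `is_installed` and `MODEL_NAME` are illustrative names, the sample sentence is arbitrary, and the pipeline components are printed instead of assumed because they depend on the model version.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be imported in this environment."""
    return importlib.util.find_spec(package_name) is not None


MODEL_NAME = "en_core_web_sm"  # spaCy models are installed as regular Python packages

if is_installed("spacy") and is_installed(MODEL_NAME):
    import spacy

    nlp = spacy.load(MODEL_NAME)
    print("Pipeline components:", nlp.pipe_names)

    # Run one arbitrary sentence through the pipeline as a smoke test.
    doc = nlp("Apple is opening a new office in London next year.")
    for token in doc:
        print(token.text, token.pos_, token.dep_)
    print("Entities:", [(ent.text, ent.label_) for ent in doc.ents])
else:
    print(f"spaCy or {MODEL_NAME} is not installed; re-run the setup cell.")
```

Each printed token line shows the token text, its part-of-speech tag, and its dependency label, which exercises the tagger and parser in one pass.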