# spaCy and NLTK
In this tutorial, we'll cover the following topics:
- Introduction to NLP and spaCy.
- Installing spaCy and language models.
- Basic spaCy usage.
- Tokenization and Text Preprocessing.
- Part-of-Speech Tagging.
- Named Entity Recognition.
- Dependency Parsing.
- Customizing spaCy pipelines.
- Text Classification using spaCy.

**Note**: Before starting, make sure you have spaCy and a language model installed. Install spaCy with pip, then download a language model such as `en_core_web_sm` with spaCy's `download` command:

In [1]:
# pip install spacy
# python -m spacy download en_core_web_sm   (English language model)
# python -m spacy download en   (shortcut name; works only in spaCy v2, not in v3)
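A quick way to confirm the installation before going further is to ask spaCy whether the model package is present. The snippet below is a minimal sketch: `MODEL_NAME` and the helper `model_status` are hypothetical names introduced for illustration, and it relies on `spacy.util.is_package`, which reports whether a package with that name is installed in the current environment.

```python
import spacy
from spacy.util import is_package

# Model name taken from the note above; change it if you installed another model.
MODEL_NAME = "en_core_web_sm"


def model_status(name: str) -> str:
    """Return a short message saying whether a spaCy model package is installed.

    Hypothetical helper for this tutorial, not part of spaCy itself.
    """
    if is_package(name):
        return f"{name} is installed"
    return f"{name} is missing; run: python -m spacy download {name}"


print("spaCy version:", spacy.__version__)
print(model_status(MODEL_NAME))
```

If the second line reports the model as missing, run the download command shown in the message and execute the cell again.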

In [3]:
'''
I got an error while running the "python -m spacy download en" command:
ERROR: type object 'h5py.h5.H5PYConfig' has no attribute '__reduce_cython__'

Running "pip install --upgrade h5py" resolved the issue.
'''

### Sentence & Word Tokenization In spaCy

In [4]:
import spacy

In [5]:
nlp = spacy.load("en_core_web_sm")

doc = nlp("Dr. Strange loves pav bhaji of mumbai. Hulk loves chat of delhi")

In [6]:
for sentence in doc.sents:
    print(sentence)

Dr. Strange loves pav bhaji of mumbai.
Hulk loves chat of delhi


In [7]:
for sentence in doc.sents:
    for word in sentence:
        print(word)

Dr.
Strange
loves
pav
bhaji
of
mumbai
.
Hulk
loves
chat
of
delhi


### Sentence & Word Tokenization In NLTK

In [8]:
import nltk
from nltk.tokenize import sent_tokenize

# sent_tokenize relies on the Punkt sentence tokenizer data
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\erkun\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [9]:
sent_tokenize("Dr. Strange loves pav bhaji of mumbai. Hulk loves chat of delhi")

['Dr.', 'Strange loves pav bhaji of mumbai.', 'Hulk loves chat of delhi']

In [10]:
from nltk.tokenize import word_tokenize

In [11]:
word_tokenize("Dr. Strange loves pav bhaji of mumbai. Hulk loves chat of delhi")

['Dr',
 '.',
 'Strange',
 'loves',
 'pav',
 'bhaji',
 'of',
 'mumbai',
 '.',
 'Hulk',
 'loves',
 'chat',
 'of',
 'delhi']

In [None]:
# pip install nltk   (run this first if NLTK is not installed)

## Difference between spaCy and NLTK
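The code above shows the main difference: spaCy is object-oriented (calling `nlp` on a text returns a `Doc` whose sentences and tokens are objects), whereas NLTK is mainly a string-processing library (its tokenizer functions take a string and return a list of strings). The outputs differ as well: on this input, spaCy kept "Dr." attached to its sentence, while NLTK's `sent_tokenize` split the text right after it.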
![Alt text](img/img01.png)
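To make the object-versus-string contrast concrete, here is a small sketch under a few assumptions. It uses `spacy.blank("en")` (a tokenizer-only English pipeline) so that no model download is needed, and it swaps in NLTK's `TreebankWordTokenizer` for `word_tokenize` so that no Punkt data is needed either. The sample sentence and the variable names are chosen for illustration only.

```python
import spacy
from nltk.tokenize import TreebankWordTokenizer

text = "Hulk loves chat of delhi."

# spaCy: a blank English pipeline contains only the tokenizer (assumption:
# used here instead of en_core_web_sm to avoid a model download).
nlp = spacy.blank("en")
doc = nlp(text)

# Each element of the Doc is a Token object that carries attributes.
first = doc[0]
spacy_info = [(tok.text, tok.i, tok.is_alpha, tok.is_punct) for tok in doc]
print(type(first).__name__)
for row in spacy_info:
    print(row)

# NLTK: the tokenizer returns a plain Python list of str.
# TreebankWordTokenizer stands in for word_tokenize in this sketch.
nltk_tokens = TreebankWordTokenizer().tokenize(text)
print(nltk_tokens)
print(type(nltk_tokens[0]).__name__)
```

With spaCy, follow-up questions such as "is this token punctuation?" are answered by attributes on the token itself. With NLTK, the tokens are ordinary strings, so any further information has to come from separate function calls on those strings.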

# Reference: 
[1] https://www.youtube.com/watch?v=h2kBNEShsiE&t=1s