# Course Exercises Notebook: Advanced NLP with Python for Machine Learning Training Courses (LinkedIn)

## Build a spaCy Processing Pipeline

### Exercise #1: Load Resources

Load the spaCy resources:

- Install spaCy
- Download the English language model for spaCy
- Import pandas as `pd`
- Import spaCy
- Load the English model

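When you execute `nlp = spacy.load('en_core_web_sm')`, spaCy loads the pre-trained English pipeline into memory and assigns it to the variable `nlp`. Loading does not download anything: the model package must already be installed (for example with `python -m spacy download en_core_web_sm`), and the older `'en'` shortcut no longer works in spaCy 3. The pre-trained pipeline contains the components and weights for part-of-speech tagging, dependency parsing, named entity recognition, and other linguistic features needed for various NLP tasks. Static word vectors are included only in the larger packages such as `en_core_web_md`; the small model ships without them.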
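A minimal sketch of the loading step, assuming spaCy 3.x is installed. The helper name `load_english_pipeline` and the fallback to a blank pipeline are illustrative choices, not part of the course material. `spacy.load` raises `OSError` when the named package is not installed, so the sketch catches that and falls back to `spacy.blank("en")`, which provides tokenization only.

```python
import spacy


def load_english_pipeline(name: str = "en_core_web_sm"):
    """Load a trained English pipeline, or a tokenizer-only one if it is missing.

    Hypothetical helper for illustration; the fallback is an assumption,
    not something the exercise requires.
    """
    try:
        return spacy.load(name)
    except OSError:
        # The model package has not been downloaded yet.
        print(f"Model '{name}' not found. Run: python -m spacy download {name}")
        return spacy.blank("en")


nlp = load_english_pipeline()
doc = nlp("spaCy loads a pipeline into memory.")
print([token.text for token in doc])
```

With the fallback in place the notebook keeps running, but part-of-speech tags and entities stay empty until the trained model is installed.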

#### spaCy Processing Pipeline

In spaCy, the processing pipeline runs its components in a predefined sequence, which you can also customize. The table below lists the typical tasks in the order they are applied:

#### Order of Tasks in the Processing Pipeline

| Order | Name                        | Definition                                                                                          |
|-------|-----------------------------|-----------------------------------------------------------------------------------------------------|
| 1     | Tokenization                | Input text is split into individual tokens, such as words and punctuation marks.                   |
| 2     | Stop Words                  | Flags common words through `token.is_stop`; spaCy does not remove them, so filtering is an explicit step. |
| 3     | POS Tagging                 | Assigns grammatical labels (e.g., noun, verb, adjective) to each token in the text based on its syntactic role within the sentence. |
| 4     | Dependency Parsing          | Analyzes the grammatical structure of the text by determining the relationships between tokens.     |
| 5     | Lemmatization               | Reduces tokens to their base or root form (lemmas).                                                 |
| 6     | Named Entity Recognition    | Identifies and categorizes persons, organizations, locations, dates, etc.                           |
| 7     | Other Use Case Tasks        | May be included in the pipeline (e.g., Sentiment Analysis).                                        |
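The order in the table can be checked on a pipeline object. A minimal sketch, assuming spaCy 3.x; it uses a blank English pipeline so that it runs without a downloaded model, and the `token_counter` component is a hypothetical stand-in for a use-case-specific task. With `en_core_web_sm` loaded instead, `nlp.pipe_names` would list the trained components in execution order.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc


# Hypothetical custom component standing in for a use-case task (row 7).
@Language.component("token_counter")
def token_counter(doc: Doc) -> Doc:
    doc.user_data["n_tokens"] = len(doc)
    return doc


nlp = spacy.blank("en")        # tokenizer only, no trained components
nlp.add_pipe("token_counter")  # appended to the end of the pipeline
print(nlp.pipe_names)          # components run in this order, after the tokenizer

doc = nlp("This is a sentence about the spaCy pipeline.")

# Stop words are flagged, not removed: filtering is an explicit step.
content_tokens = [t.text for t in doc if not t.is_stop and not t.is_punct]
print(content_tokens)
```

The tokenizer always runs first and does not appear in `nlp.pipe_names`; every other component receives the `Doc` produced by the previous one.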


<h2 style="color: green;">Solution 1</h2>


In [1]:
# Step 1: Install spaCy (run this once in terminal)
# pip install spacy
# python -m spacy download en_core_web_sm

# Step 2: Import libraries
import pandas as pd  # For data handling (optional for this exercise)
import spacy  # For NLP tasks

# Step 3: Load the English model
nlp = spacy.load('en_core_web_sm')  # Load the English language model

# Step 4: Example text
text = "SpaCy is an open-source library for advanced Natural Language Processing."

# Step 5: Process the text
doc = nlp(text)

# Step 6: Output the results
print("Tokens:")
for token in doc:
    print(token.text)

print("\nNamed Entities:")
for ent in doc.ents:
    print(ent.text, ent.label_)

print("\nPart-of-Speech Tags:")
for token in doc:
    print(f"{token.text}: {token.pos_}")


Tokens:
SpaCy
is
an
open
-
source
library
for
advanced
Natural
Language
Processing
.

Named Entities:
SpaCy PERSON
Natural Language Processing WORK_OF_ART

Part-of-Speech Tags:
SpaCy: PROPN
is: AUX
an: DET
open: ADJ
-: PUNCT
source: NOUN
library: NOUN
for: ADP
advanced: ADJ
Natural: PROPN
Language: PROPN
Processing: NOUN
.: PUNCT
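
Since the exercise imports pandas, the per-token attributes can also be collected into a DataFrame for inspection. A minimal sketch, assuming spaCy 3.x and pandas are installed; it uses `spacy.blank("en")` so that it runs without a downloaded model, which means only lexical attributes are filled in. The column names are illustrative.

```python
import pandas as pd
import spacy

# Assumption: a blank pipeline is enough here; swap in
# spacy.load("en_core_web_sm") to add POS tags and entities.
nlp = spacy.blank("en")
doc = nlp("SpaCy is an open-source library for advanced Natural Language Processing.")

# One row per token, using lexical attributes available without a trained model.
rows = [
    {
        "text": token.text,
        "is_alpha": token.is_alpha,
        "is_punct": token.is_punct,
        "is_stop": token.is_stop,
    }
    for token in doc
]
df = pd.DataFrame(rows)
print(df)
```

With the trained model loaded, columns such as `token.pos_` and `token.lemma_` could be added to each row in the same way.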
