## Word Frequency Counter Using spaCy

In [3]:
import spacy
from collections import Counter

In [4]:
# Loading the small English model in spaCy

nlp = spacy.load("en_core_web_sm")

In [5]:
def word_frequency_counter(text):
    # Process the text with spaCy to create a Doc object
    doc = nlp(text)

    # Filter tokens: keep only alphabetic words (this drops punctuation and numbers) and remove stop words
    filtered_tokens = [token.text.lower() for token in doc
                       if token.is_alpha and not token.is_stop]

    # Use Counter to count the frequency of each filtered word
    frequency = Counter(filtered_tokens)

    return frequency

In [9]:
# Example text
sample_text = """
Cristiano Ronaldo's family helps him stay grounded but they can also break his heart at times.
The five-time Ballon d'Or winner was reduced to tears when recently shown a videotaped interview of his late father conducted in 2004, a year before Cristiano Ronaldo Father's death.
"""

In [10]:
# Get the word frequencies
freq = word_frequency_counter(sample_text)

In [11]:
# Print frequencies in descending order
for word, count in freq.most_common():
    print(f"{word}: {count}")

cristiano: 2
ronaldo: 2
father: 2
family: 1
helps: 1
stay: 1
grounded: 1
break: 1
heart: 1
times: 1
time: 1
ballon: 1
winner: 1
reduced: 1
tears: 1
recently: 1
shown: 1
videotaped: 1
interview: 1
late: 1
conducted: 1
year: 1
death: 1


***Why We Use These Functions:***

**nlp(text):** Converts the raw text into a spaCy Doc object, which holds the tokens along with linguistic annotations such as lemmas and part-of-speech tags. This is the core processing step; everything else works on the tokens it produces.

**token.is_alpha:** True only for tokens made up entirely of alphabetic characters. Filtering on it drops numbers, punctuation, and symbols (for example "2004" and the commas in the sample text), so the count stays focused on real words.

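**token.is_stop:** True for stop words, the common function words such as "the", "his", and "was". The condition "not token.is_stop" removes them, because they would dominate the counts without saying much about the text.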

**token.text.lower():** Converts each token to lowercase so that differently capitalized forms are counted together. In the sample text, "father" and "Father" both count toward father: 2.

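**Counter:** A dict subclass from Python's standard collections module that counts how many times each item appears. Its most_common() method returns the words sorted from most to least frequent, which is what the printing loop relies on.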
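
A few Counter behaviours beyond plain counting can be handy here. The sketch below is a minimal illustration using only the standard library. The token list is hypothetical and simply stands in for the filtered_tokens list built inside word_frequency_counter; it is not output from the notebook.

```python
from collections import Counter

# Hypothetical, already-filtered tokens (a stand-in for filtered_tokens)
tokens = ["goal", "match", "goal", "team", "match", "goal"]

freq = Counter(tokens)

# most_common(n) limits the result to the n most frequent words
top_two = freq.most_common(2)
print(top_two)

# A word that never appeared has a count of 0 instead of raising KeyError
print(freq["stadium"])

# update() adds the counts from another batch of tokens to the running totals
freq.update(["team", "coach"])
print(freq["team"])
```

Passing a number to most_common() is useful on longer texts, where printing every word with a count of 1 would bury the interesting ones. update() makes it possible to keep one running Counter across several documents.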

