# Morphological Analysis: Stemming, Lemmatization, and Morpheme Segmentation

This notebook demonstrates:
1. Stemming and Lemmatization using multiple libraries
2. Morpheme extraction (prefixes, roots, suffixes)
3. Classification of morphological changes (inflectional vs derivational)

## Setup and Import Libraries

In [1]:
import nltk
from nltk.stem import PorterStemmer, SnowballStemmer
from nltk.stem import WordNetLemmatizer
import spacy
import pandas as pd

# Download required NLTK data
nltk.download('wordnet', quiet=True)
nltk.download('omw-1.4', quiet=True)

# Load spaCy model
nlp = spacy.load('en_core_web_sm')

## Define Word Set

In [2]:
words = ["unbelievably", "happiest", "disagreement", "rewritten", "wolves", "singing", "unfolded"]

## Task 1: Stemming and Lemmatization

We'll use two stemming algorithms (Porter and Snowball) and two lemmatization approaches (NLTK WordNet and spaCy).

In [3]:
# Initialize stemmers and lemmatizers
porter = PorterStemmer()
snowball = SnowballStemmer('english')
wordnet_lemmatizer = WordNetLemmatizer()

# Process each word
results = []
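for word in words:
    # Each word is parsed on its own, so spaCy tags it without sentence context
    doc = nlp(word)
    results.append({
        'Word': word,
        'Porter Stem': porter.stem(word),
        'Snowball Stem': snowball.stem(word),
        # No pos argument is passed, so WordNet treats every word as a noun
        'WordNet Lemma': wordnet_lemmatizer.lemmatize(word),
        'spaCy Lemma': doc[0].lemma_
    })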

df_stemming = pd.DataFrame(results)
df_stemming

| | Word | Porter Stem | Snowball Stem | WordNet Lemma | spaCy Lemma |
|---|---|---|---|---|---|
| 0 | unbelievably | unbeliev | unbeliev | unbelievably | unbelievably |
| 1 | happiest | happiest | happiest | happiest | happy |
| 2 | disagreement | disagr | disagr | disagreement | disagreement |
| 3 | rewritten | rewritten | rewritten | rewritten | rewrite |
| 4 | wolves | wolv | wolv | wolf | wolf |
| 5 | singing | sing | sing | singing | singe |
| 6 | unfolded | unfold | unfold | unfolded | unfold |


### Interpretation

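- **Stemming** (Porter & Snowball): Strips affixes with hand-written rules and never checks the result against a dictionary, so it can produce non-words (e.g., "unbeliev", "disagr", "wolv"). Forms the rules do not cover come back unchanged ("happiest", "rewritten").
- **Lemmatization** (WordNet & spaCy): Returns dictionary forms (e.g., "wolf" from "wolves"). The WordNet lemmatizer treats every word as a noun unless a part of speech is passed, which is why only "wolves" changes in its column. spaCy tags each word first and recovers "happy", "rewrite", and "unfold", but it mislemmatizes "singing" as "singe".
- **Differences**: Stemming is fast and needs no lexicon, but it is crude. Lemmatization needs a vocabulary and, for good results, part-of-speech information, which makes it slower but usually more accurate.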
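
One way to give the WordNet lemmatizer the part-of-speech hint it is missing in the cell above is to translate a tagger's Penn Treebank tag into WordNet's one-letter codes. The helper below is a minimal sketch of that mapping: `penn_to_wordnet_pos` is a hypothetical name, and falling back to noun is an assumption chosen to mirror the lemmatizer's own default.

```python
def penn_to_wordnet_pos(tag):
    """Map a Penn Treebank tag to a WordNet POS code ('n', 'v', 'a', 'r')."""
    if tag.startswith("JJ"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("VB"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return "n"      # assumption: everything else is treated as a noun


# Tags a tagger might plausibly assign to words from the list above
for word, tag in [("happiest", "JJS"), ("singing", "VBG"), ("wolves", "NNS")]:
    print(word, tag, penn_to_wordnet_pos(tag))
```

In this notebook the helper could be combined with the existing objects as `wordnet_lemmatizer.lemmatize(word, pos=penn_to_wordnet_pos(doc[0].tag_))`, since spaCy's English models expose Penn-style tags through `tag_`. The result still depends on the tagger guessing correctly for a word seen without context.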

## Task 2: Morpheme Segmentation

Extract prefixes, roots, and suffixes for each word.

In [4]:
# Define common prefixes and suffixes
prefixes = ['un', 're', 'dis', 'pre', 'mis', 'de', 'over', 'sub']
suffixes = ['ly', 'est', 'ing', 'ed', 'ment', 's', 'es', 'er', 'en', 'able', 'ible']

def extract_morphemes(word):
    """Extract prefixes, root, and suffixes from a word"""
    original = word
    prefix_list = []
    suffix_list = []
    
    # Extract prefixes
    while True:
        found = False
        for prefix in sorted(prefixes, key=len, reverse=True):
            if word.startswith(prefix):
                prefix_list.append(prefix)
                word = word[len(prefix):]
                found = True
                break
        if not found:
            break
    
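    # Extract suffixes, longest first. The only guard is that at least one
    # character must remain, so a root that itself ends in a suffix string
    # is stripped again (e.g., "sing" -> "s").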
    while True:
        found = False
        for suffix in sorted(suffixes, key=len, reverse=True):
            if word.endswith(suffix) and len(word) > len(suffix):
                suffix_list.insert(0, suffix)
                word = word[:-len(suffix)]
                found = True
                break
        if not found:
            break
    
    root = word
    
    return {
        'Word': original,
        'Prefixes': ', '.join(prefix_list) if prefix_list else 'None',
        'Root': root,
        'Suffixes': ', '.join(suffix_list) if suffix_list else 'None',
        'Structure': ' + '.join(prefix_list + [root] + suffix_list)
    }

# Extract morphemes for all words
morpheme_results = [extract_morphemes(word) for word in words]
df_morphemes = pd.DataFrame(morpheme_results)
df_morphemes

| | Word | Prefixes | Root | Suffixes | Structure |
|---|---|---|---|---|---|
| 0 | unbelievably | un | believab | ly | un + believab + ly |
| 1 | happiest | | happi | est | happi + est |
| 2 | disagreement | dis | agree | ment | dis + agree + ment |
| 3 | rewritten | re | writt | en | re + writt + en |
| 4 | wolves | | wolv | es | wolv + es |
| 5 | singing | | s | ing, ing | s + ing + ing |
| 6 | unfolded | un | fold | ed | un + fold + ed |


### Interpretation

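- **Prefixes**: Attached before the root to modify meaning (e.g., "un-" negates, "re-" marks repetition)
- **Root**: Core meaning-bearing unit of the word
- **Suffixes**: Attached after the root to modify meaning or grammatical function
- **Structure**: Shows the decomposition the function found, in surface order
- **Limitations**: The function strips strings, not morphemes. It does not undo spelling changes ("happi" for "happy", "writt" for "write", "wolv" for "wolf"), it misses "-able" in "unbelievably" (leaving "believab"), and it over-segments "singing" into "s + ing + ing" because the root "sing" itself ends in "ing".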
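
The `extract_morphemes` function above strips any matching affix as long as a single character remains. A common guard is to require a minimum root length before stripping. The variant below is a sketch of that idea, not a replacement for a real morphological analyzer: `segment` and `min_root_len` are hypothetical names, and the default of 3 is an arbitrary assumption.

```python
PREFIXES = ['un', 're', 'dis', 'pre', 'mis', 'de', 'over', 'sub']
SUFFIXES = ['ly', 'est', 'ing', 'ed', 'ment', 's', 'es', 'er', 'en', 'able', 'ible']


def segment(word, min_root_len=3):
    """Strip affixes longest-first, keeping at least min_root_len root characters."""
    found_prefixes, found_suffixes = [], []

    stripped = True
    while stripped:
        stripped = False
        for prefix in sorted(PREFIXES, key=len, reverse=True):
            # Only strip if enough of the word is left to serve as a root
            if word.startswith(prefix) and len(word) - len(prefix) >= min_root_len:
                found_prefixes.append(prefix)
                word = word[len(prefix):]
                stripped = True
                break

    stripped = True
    while stripped:
        stripped = False
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= min_root_len:
                found_suffixes.insert(0, suffix)  # keep surface order
                word = word[:-len(suffix)]
                stripped = True
                break

    return found_prefixes, word, found_suffixes


print(segment("singing"))   # ([], 'sing', ['ing'])
print(segment("unfolded"))  # (['un'], 'fold', ['ed'])
print(segment("wolves"))    # ([], 'wolv', ['es'])
```

The guard stops "sing" from being reduced to "s", but it does nothing about spelling alternations, so "wolves" still yields "wolv" rather than "wolf".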

## Task 3: Morphological Classification

Classify each morphological change as inflectional or derivational with justification.

In [5]:
def classify_morphological_change(word):
    """Classify morphological changes and provide justification"""
    classifications = []
    
    # Inflectional patterns (preserve word class)
    inflectional_patterns = {
        'est': 'Superlative adjective form',
        'ing': 'Present participle/gerund verb form',
        'ed': 'Past tense/past participle verb form',
        'en': 'Past participle verb form',
        's': 'Plural noun form',
        'es': 'Plural noun form'
    }
    
    # Derivational patterns (change word class or meaning)
    derivational_patterns = {
        'un': 'Negation prefix (changes meaning)',
        're': 'Repetition prefix (changes meaning)',
        'dis': 'Negation/reversal prefix (changes meaning)',
        'ly': 'Adverb-forming suffix (adjective → adverb)',
        'ment': 'Noun-forming suffix (verb → noun)'
    }
    
    morphemes = extract_morphemes(word)
    
    # Check prefixes
    if morphemes['Prefixes'] != 'None':
        for prefix in morphemes['Prefixes'].split(', '):
            if prefix in derivational_patterns:
                classifications.append({
                    'Morpheme': prefix + '-',
                    'Type': 'Derivational',
                    'Justification': derivational_patterns[prefix]
                })
    
    # Check suffixes
    if morphemes['Suffixes'] != 'None':
        for suffix in morphemes['Suffixes'].split(', '):
            if suffix in inflectional_patterns:
                classifications.append({
                    'Morpheme': '-' + suffix,
                    'Type': 'Inflectional',
                    'Justification': inflectional_patterns[suffix]
                })
            elif suffix in derivational_patterns:
                classifications.append({
                    'Morpheme': '-' + suffix,
                    'Type': 'Derivational',
                    'Justification': derivational_patterns[suffix]
                })
    
    return word, classifications

# Classify all words
classification_results = []
for word in words:
    word_orig, classifications = classify_morphological_change(word)
    for classification in classifications:
        classification_results.append({
            'Word': word_orig,
            'Morpheme': classification['Morpheme'],
            'Type': classification['Type'],
            'Justification': classification['Justification']
        })

df_classification = pd.DataFrame(classification_results)
df_classification

| | Word | Morpheme | Type | Justification |
|---|---|---|---|---|
| 0 | unbelievably | un- | Derivational | Negation prefix (changes meaning) |
| 1 | unbelievably | -ly | Derivational | Adverb-forming suffix (adjective → adverb) |
| 2 | happiest | -est | Inflectional | Superlative adjective form |
| 3 | disagreement | dis- | Derivational | Negation/reversal prefix (changes meaning) |
| 4 | disagreement | -ment | Derivational | Noun-forming suffix (verb → noun) |
| 5 | rewritten | re- | Derivational | Repetition prefix (changes meaning) |
| 6 | rewritten | -en | Inflectional | Past participle verb form |
| 7 | wolves | -es | Inflectional | Plural noun form |
| 8 | singing | -ing | Inflectional | Present participle/gerund verb form |
| 9 | singing | -ing | Inflectional | Present participle/gerund verb form |


### Interpretation

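**Inflectional Morphemes:**
- Do not change word class (noun stays noun, verb stays verb)
- Express grammatical categories such as tense, number, and degree
- Examples: -est (superlative), -ing (progressive), -s (plural)

**Derivational Morphemes:**
- Often change word class (verb → noun, adjective → adverb), although prefixes such as un- and re- change meaning only
- Create new words with different meanings
- Examples: un- (negation), -ly (adverb formation), -ment (noun formation)

**Caveats:**
- The two "-ing" rows for "singing" are inherited from the over-segmentation in Task 2; the word contains a single "-ing" suffix.
- The classifier looks up suffix strings only, so it cannot tell apart suffixes that share a spelling: "-en" is inflectional in "rewritten" but derivational in "widen".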
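
Because the same suffix string can be inflectional on one base and derivational on another, one possible refinement is to key the lookup on the suffix together with the part of speech of the base it attaches to. The sketch below illustrates that idea under stated assumptions: `SUFFIX_RULES` and `classify_suffix` are hypothetical names, the table is illustrative rather than exhaustive, and the base part of speech is assumed to be supplied by the caller.

```python
# (suffix, part of speech of the base) -> (type, justification)
SUFFIX_RULES = {
    ("s", "noun"): ("Inflectional", "plural"),
    ("s", "verb"): ("Inflectional", "third-person singular present"),
    ("ed", "verb"): ("Inflectional", "past tense / past participle"),
    ("ing", "verb"): ("Inflectional", "present participle"),
    ("en", "verb"): ("Inflectional", "past participle"),
    ("en", "adjective"): ("Derivational", "verb-forming (adjective → verb)"),
    ("er", "adjective"): ("Inflectional", "comparative"),
    ("er", "verb"): ("Derivational", "agent noun (verb → noun)"),
    ("est", "adjective"): ("Inflectional", "superlative"),
    ("ly", "adjective"): ("Derivational", "adverb-forming (adjective → adverb)"),
    ("ment", "verb"): ("Derivational", "noun-forming (verb → noun)"),
}


def classify_suffix(suffix, base_pos):
    """Look up a suffix in the context of the base's part of speech."""
    return SUFFIX_RULES.get((suffix, base_pos), ("Unknown", "no rule for this pair"))


print(classify_suffix("en", "verb"))       # as in write -> written
print(classify_suffix("en", "adjective"))  # as in wide -> widen
print(classify_suffix("er", "adjective"))  # as in tall -> taller
print(classify_suffix("er", "verb"))       # as in teach -> teacher
```

Returning "Unknown" for unlisted pairs keeps the function from guessing, which matters once the word list grows beyond the seven examples used here.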

### Key Takeaways

1. **Stemming vs Lemmatization**: Lemmatization returns dictionary forms (though not always the right one), while stemming may return truncated non-words
2. **Morpheme Analysis**: Words are built from prefixes, roots, and suffixes combined systematically
3. **Inflectional vs Derivational**: Inflectional morphemes modify grammatical form; derivational morphemes create new words
4. **Practical Application**: Understanding morphology helps in NLP tasks like information retrieval, text normalization, and semantic analysis
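
To make the information-retrieval point concrete, here is a toy sketch of an inverted index keyed by normalized tokens, so that a query for one form also matches documents containing another. `LEMMAS` is a hypothetical lookup table standing in for a real lemmatizer (its entries copy spaCy results from Task 1), and the three documents are made up for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for a lemmatizer; entries mirror the spaCy column in Task 1
LEMMAS = {
    "wolves": "wolf",
    "unfolded": "unfold",
    "happiest": "happy",
    "rewritten": "rewrite",
}


def normalize(token):
    """Lowercase a token and map it to its lemma when one is known."""
    token = token.lower()
    return LEMMAS.get(token, token)


def build_index(docs):
    """Map each normalized token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents whose tokens share the query's normal form."""
    return sorted(index.get(normalize(query), set()))


docs = {1: "The wolves howled", 2: "A lone wolf", 3: "She unfolded the map"}
index = build_index(docs)
print(search(index, "wolf"))    # [1, 2]
print(search(index, "unfold"))  # [3]
```

Without the normalization step, the query "wolf" would match only document 2.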