### Morphological Analysis (with spaCy)

Morphological Analysis is the process of analyzing the structure of words by breaking them down into their components (morphemes), such as stems, prefixes, suffixes, and roots. It shows how a word was derived, how it is inflected, and which grammatical features it carries.

#### Common aspects of Morphological Analysis include:

- Stemming: Reducing words to their stem, usually by stripping affixes with heuristic rules; the result need not be a dictionary word (e.g., "running" → "run").
- Lemmatization: Reducing words to their base or dictionary form (e.g., "better" → "good").
- Inflection: Analyzing how words change form to express grammatical features such as tense, number, and gender (e.g., "dogs" = "dog" + plural suffix "-s").
- Derivation: Understanding how new words are created from base words by adding prefixes or suffixes (e.g., "happy" → "unhappy").

In short, Morphological Analysis involves understanding the structure of words and how they are formed from smaller units.
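
The difference between stemming and lemmatization can be sketched in a few lines of plain Python. This is a toy illustration, not how spaCy or any production stemmer works: the suffix list `SUFFIXES`, the mini-lexicon `LEMMA_LOOKUP`, and the functions `toy_stem` and `toy_lemma` are all hypothetical names invented for this example.

```python
# Toy illustration only: real stemmers and lemmatizers use far richer
# rules and lexicons than the tiny, hypothetical ones below.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical mini-lexicon mapping irregular forms to dictionary forms.
LEMMA_LOOKUP = {"better": "good", "went": "go", "mice": "mouse"}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, then undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # e.g. "running" -> "runn" -> "run"
            if stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word


def toy_lemma(word: str) -> str:
    """Look the word up in the lexicon first; fall back to the toy stemmer."""
    word = word.lower()
    return LEMMA_LOOKUP.get(word, toy_stem(word))


for w in ["running", "dogs", "better", "apples"]:
    print(f"{w:<10} stem={toy_stem(w):<8} lemma={toy_lemma(w)}")
```

Rule-based stripping handles "running" and "dogs" but cannot relate "better" to "good"; that needs a lexicon, which is the essential difference between the two techniques. The stem of "apples" here is not a dictionary word, which is typical of stemmers.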

#### Morphological Analysis Examples:

- Stemming: "running" → "run"
- Lemmatization: "better" → "good"
- Inflection: "dogs" → "dog" + "-s" (plural)
- Derivation: "unhappy" → "happy" with prefix "un-"
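
Inflectional features such as number and tense are often written as `Key=Value` pairs joined by `|` (the Universal Dependencies convention, which is also the form produced by `str(token.morph)` in spaCy). As a minimal sketch, the hypothetical helper `parse_feats` below turns such a string into a dictionary; the input strings are illustrative examples, not output copied from a model.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Parse a 'Key=Value|Key=Value' feature string into a dict."""
    # An empty string or "_" conventionally means "no features".
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Illustrative feature strings (assumed for this example).
verb_feats = parse_feats("Number=Sing|Person=3|Tense=Pres")
noun_feats = parse_feats("Number=Plur")

print(verb_feats["Tense"])                  # Pres
print(noun_feats.get("Number") == "Plur")   # True
```

Checking a parsed key for an exact value avoids accidental substring matches that a plain `"Number=Plur" in some_string` test could produce on longer feature strings.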


---


In [None]:
import pandas as pd
import spacy

pd.set_option("display.width", 200)  # Set the display width

nlp = spacy.load("en_core_web_sm")

corpus = [
    "John has 5 apples, but he gave 2 to his friend.",
    "The event will start at 7:30 PM, and we expect around 100 guests.",
    "I have read 3 books this month, and I'm planning to read 2 more before the end of the week.",
]

# Single-token walkthrough (uncomment to run):
# word = "books"
# doc = nlp(word)  # the loop below needs a Doc built from `word`
# for token in doc:
#     print(f"{'Text':<10}: {token.text}")                          # books
#     print(f"{'Lemma':<10}: {token.lemma_}")                       # book
#     print(f"{'PoS':<10}: {token.pos_}")                           # NOUN
#     print(f"{'Tag':<10}: {token.tag_}")                           # NNS (plural noun)
#     print(f"{'Dependency':<10}: {token.dep_}")                    # ROOT
#     print(f"{'Shape':<10}: {token.shape_}")                       # xxxx
#     print(f"{'Is alpha':<10}: {token.is_alpha}")                  # True
#     print(f"{'Is stop':<10}: {token.is_stop}")                    # False
#     print(f"{'Morphology':<10}: {token.morph}")                   # Number=Plur
#     print(f"{'Is plural':<10}: {'Number=Plur' in token.morph}")   # True

docs = []
for sentence in corpus:
    doc = nlp(sentence)
    for token in doc:
        res = {
            "word": token.text,
            "lemma_": token.lemma_,
            "pos_": token.pos_,
            "tag_": token.tag_,
            "dep_": token.dep_,
            "shape_": token.shape_,
            "is_alpha": token.is_alpha,
            "is_stop": token.is_stop,
            "morph": token.morph,
            "is_plural": "Number=Plur" in token.morph,
        }
        docs.append(res)


df = pd.DataFrame(docs)

print(df)

        word  lemma_   pos_  tag_      dep_ shape_  is_alpha  is_stop                                              morph  is_plural
0       John    John  PROPN   NNP     nsubj   Xxxx      True    False                                      (Number=Sing)      False
1        has    have   VERB   VBZ      ROOT    xxx      True     True  (Mood=Ind, Number=Sing, Person=3, Tense=Pres, ...      False
2          5       5    NUM    CD    nummod      d     False    False                                     (NumType=Card)      False
3     apples   apple   NOUN   NNS      dobj   xxxx      True    False                                      (Number=Plur)       True
4          ,       ,  PUNCT     ,     punct      ,     False    False                                   (PunctType=Comm)      False
5        but     but  CCONJ    CC        cc    xxx      True     True                                     (ConjType=Cmp)      False
6         he      he   PRON   PRP     nsubj     xx      True     True  (Case