✅ 1. Shallow Parsing (Chunking)
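Shallow parsing (chunking) groups words into non-overlapping phrases such as noun phrases (NP) and verb phrases (VP) without building a full syntax tree for the sentence. Here we use NLTK's RegexpParser, which matches patterns over part-of-speech (POS) tags.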

In [1]:
import nltk
from nltk import word_tokenize, pos_tag
from nltk.chunk import RegexpParser

# Download the tokenizer model used by word_tokenize
nltk.download('punkt_tab')
# Download the English model used by pos_tag
nltk.download('averaged_perceptron_tagger_eng')

# Sample sentence
sentence = "Gauri is working on a cloud-based resume shortlisting project."

# Tokenization and POS tagging
tokens = word_tokenize(sentence)
pos_tags = pos_tag(tokens)

# Chunk grammar for noun phrases (NP): an optional determiner (DT),
# any number of adjectives (JJ), then one or more nouns (NN, NNS, NNP, NNPS)
chunk_grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
chunk_parser = RegexpParser(chunk_grammar)

# Parse
tree = chunk_parser.parse(pos_tags)

# Print a bracketed text version of the tree (tree.draw() would open a GUI window instead)
print(tree)

(S
  (NP Gauri/NNP)
  is/VBZ
  working/VBG
  on/IN
  (NP a/DT cloud-based/JJ resume/NN)
  shortlisting/VBG
  (NP project/NN)
  ./.)


[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\Gauri\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     C:\Users\Gauri\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!
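To see what RegexpParser does with the grammar `NP: {<DT>?<JJ>*<NN.*>+}`, here is a minimal stdlib-only sketch of the same idea: join the POS tags into one string and run an ordinary regular expression over it. This is an illustrative approximation, not NLTK's actual implementation; `chunk_np` and `NP_PATTERN` are hypothetical names, and the tagged tokens are hard-coded from the output above so the example runs without NLTK or its data packages.

```python
import re

# Tagged tokens copied from the NLTK output above (hard-coded so no NLTK is needed)
tagged = [("Gauri", "NNP"), ("is", "VBZ"), ("working", "VBG"), ("on", "IN"),
          ("a", "DT"), ("cloud-based", "JJ"), ("resume", "NN"),
          ("shortlisting", "VBG"), ("project", "NN"), (".", ".")]

# The grammar <DT>?<JJ>*<NN.*>+ rewritten as a regex over a string like "<DT><JJ><NN>"
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NN[^>]*>)+")

def chunk_np(tagged):
    """Return noun-phrase chunks as lists of words (hypothetical helper)."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    # Map the offset where each token's "<TAG>" starts back to the token index
    offsets, pos = {}, 0
    for i, (_, tag) in enumerate(tagged):
        offsets[pos] = i
        pos += len(tag) + 2  # +2 for the angle brackets
    chunks = []
    for m in NP_PATTERN.finditer(tag_string):
        start = offsets[m.start()]    # every match begins at a "<", i.e. a token boundary
        n = m.group().count("<")      # number of tokens covered by the match
        chunks.append([word for word, _ in tagged[start:start + n]])
    return chunks

print(chunk_np(tagged))
# → [['Gauri'], ['a', 'cloud-based', 'resume'], ['project']]
```

This reproduces the three NP chunks printed by NLTK. It also makes the split visible: because `shortlisting` is tagged VBG, no noun-only pattern can cover "resume shortlisting project" as a single phrase.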


✅ 2. Regex Parsing
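Regex parsing applies regular expressions directly to the raw text. It is flexible enough to extract surface patterns such as email addresses and dates, but it knows nothing about parts of speech or syntax. (The RegexpParser above also uses regex-like rules, but over POS tags rather than characters.)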

In [2]:
import re

text = """
Contact us at support@example.com. The project deadline is 12/05/2025.
"""

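# Email pattern: local part, "@", domain labels, and an alphabetic top-level domain
# (greedy quantifiers, so multi-level domains such as mail.example.co.uk are captured whole)
emails = re.findall(r'\b[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}\b', text)

# Date pattern: any NN/NN/NNNN; it cannot tell dd/mm/yyyy from mm/dd/yyyy
# and does not reject impossible dates such as 31/02/2025
dates = re.findall(r'\b\d{2}/\d{2}/\d{4}\b', text)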

print("Emails:", emails)
print("Dates:", dates)


Emails: ['support@example.com']
Dates: ['12/05/2025']
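The date pattern above accepts any `NN/NN/NNNN` string, including impossible dates. One common remedy, sketched here under the assumption that dates are day-first (`dd/mm/yyyy`), is to treat regex matches only as candidates and validate each one with `datetime.strptime`. `extract_dates`, `CANDIDATE`, and the sample text are hypothetical, for illustration only.

```python
import re
from datetime import datetime

# Candidate dates: any NN/NN/NNNN token (same pattern as in the cell above)
CANDIDATE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")

def extract_dates(text, fmt="%d/%m/%Y"):
    """Return the candidates that parse as real calendar dates in `fmt` (assumed day-first)."""
    dates = []
    for s in CANDIDATE.findall(text):
        try:
            dates.append(datetime.strptime(s, fmt).date())
        except ValueError:
            pass  # not a valid date in this format, e.g. 31/02/2025
    return dates

sample = "Deadline 12/05/2025. Typo 31/02/2025. Review 01/06/2025."
print(extract_dates(sample))
# → [datetime.date(2025, 5, 12), datetime.date(2025, 6, 1)]
```

Passing `fmt="%m/%d/%Y"` reads the same strings as month-first (12/05/2025 becomes 5 December), which shows that the chosen format, not the regex, resolves ambiguous dates.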
