🎯 Problem: Occurrences After Bigram
📖 The Real Problem
You are given:
- A text (string of words separated by spaces)
- A first word
- A second word
Your task: Find all words that immediately follow the bigram "first second" in the text.
What is a bigram?
- A bigram is a pair of consecutive words
- Example: In "the quick brown fox", bigrams are: (the,quick), (quick,brown), (brown,fox)
Why this problem exists:
- Fundamental text processing operation
- Tests string manipulation and pattern matching
- Foundation for more complex NLP tasks
💡 Why This Matters
Real-world applications:
- Search engines - phrase search and autocomplete
- Text prediction - next word prediction in keyboards
- Language modeling - statistical language models
- Plagiarism detection - matching phrase patterns
- Chatbots - response generation based on context
Skills you'll develop:
- ✅ String splitting and manipulation
- ✅ Pattern matching in text
- ✅ List comprehension
- ✅ Index tracking in arrays
📋 Contributor Tasks
Step 1: Understand the Problem
- Read the text carefully
- Identify all occurrences of "first second" pattern
- For each occurrence, note the word that follows
- Collect all such words
Step 2: Plan Your Approach
- Split text into words (list of words)
- Iterate through the word list
- Look for pattern: words[i] == first AND words[i+1] == second
- When found, add words[i+2] to results (if exists)
- Return the list of result words
Step 3: Implement the Solution
- Split text by spaces into word list
- Loop through indices 0 to len(words) - 3 (inclusive), so that words[i+2] always exists
- Check if current and next word match first and second
- If match, append the word after them to results
- Return results list
Step 4: Test Your Solution
- Test with pattern at beginning
- Test with pattern at end
- Test with multiple occurrences
- Test with no occurrences
- Test with pattern appearing consecutively
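The checks in Step 4 could be written as plain assertions, roughly as follows. The reference implementation mirrors the simple-iteration approach described in this document; the short sample texts for the "beginning" and "end" cases are made up for illustration and are not part of the exercise.

```python
def findOcurrences(text, first, second):
    """Reference implementation (simple iteration) used only to run the checks."""
    words = text.split()
    result = []
    # Stop two positions before the end so words[i + 2] is always valid.
    for i in range(len(words) - 2):
        if words[i] == first and words[i + 1] == second:
            result.append(words[i + 2])
    return result


# Pattern at the beginning of the text
assert findOcurrences("a good day", "a", "good") == ["day"]
# Pattern at the end: there is no following word, so nothing is collected
assert findOcurrences("have a good", "a", "good") == []
# Multiple occurrences
assert findOcurrences(
    "alice is a good girl she is a good student", "a", "good"
) == ["girl", "student"]
# No occurrences
assert findOcurrences("hello world", "foo", "bar") == []
# Pattern appearing consecutively
assert findOcurrences("a b a b a b c", "a", "b") == ["a", "a", "c"]

print("all Step 4 checks passed")
```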
✅ Expected Outcome
Function Signature:
def findOcurrences(text, first, second):
    """
    Find all words following the bigram 'first second' in text.

    Args:
        text: str - input text (space-separated words)
        first: str - first word of bigram
        second: str - second word of bigram

    Returns:
        List[str]: list of words following the bigram

    Example:
        >>> findOcurrences("alice is a good girl she is a good student", "a", "good")
        ['girl', 'student']
    """
Expected Behavior:
- ✅ Returns list of all words following "first second"
- ✅ Returns empty list if pattern not found
- ✅ Handles multiple occurrences correctly
- ✅ Case-sensitive matching
- ✅ Preserves order of occurrences
Example Test Cases:
# Test 1: Basic example
text = "alice is a good girl she is a good student"
findOcurrences(text, "a", "good")
# Returns: ['girl', 'student']
# Explanation: "a good" appears twice, followed by "girl" and "student"
# Test 2: Overlapping occurrences (the word after the first match starts the second match)
text = "we will we will rock you"
findOcurrences(text, "we", "will")
# Returns: ['we', 'rock']
# Explanation: "we will" appears twice, followed by "we" and "rock"
# Test 3: No occurrences
text = "hello world"
findOcurrences(text, "foo", "bar")
# Returns: []
# Test 4: Single occurrence
text = "the quick brown fox"
findOcurrences(text, "quick", "brown")
# Returns: ['fox']
# Test 5: Consecutive patterns
text = "a b a b a b c"
findOcurrences(text, "a", "b")
# Returns: ['a', 'a', 'c']
📚 Additional Context & References
Understanding Bigrams
What is a Bigram?
- A bigram is a sequence of 2 adjacent elements
- In text: 2 consecutive words
- Used extensively in NLP and speech recognition
Example:
Text: "the cat sat on the mat"
Bigrams: (the,cat), (cat,sat), (sat,on), (on,the), (the,mat)
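The bigram list in this example can be reproduced with a short sketch. The helper name `bigrams` is an assumption made for illustration; it is not part of the exercise.

```python
def bigrams(text):
    """Return every pair of adjacent words in text (hypothetical helper)."""
    words = text.split()
    # zip pairs each word with its successor; the last word has no successor,
    # so a text of n words yields n - 1 bigrams.
    return list(zip(words, words[1:]))


print(bigrams("the cat sat on the mat"))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```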
Implementation Approaches
Approach 1: Simple Iteration (Recommended)
def findOcurrences(text, first, second):
    words = text.split()
    result = []
    # range(len(words) - 2) keeps words[i+2] in bounds
    for i in range(len(words) - 2):
        if words[i] == first and words[i+1] == second:
            result.append(words[i+2])
    return result
Approach 2: List Comprehension
def findOcurrences(text, first, second):
    words = text.split()
    return [words[i+2] for i in range(len(words)-2)
            if words[i] == first and words[i+1] == second]
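A third variant, not one of the two approaches listed here, avoids index arithmetic by zipping the word list against two shifted copies of itself. This is only a sketch of an alternative; the name `find_occurrences_zip` is an assumption, and the function is expected to return the same results as the index-based versions.

```python
def find_occurrences_zip(text, first, second):
    """Sliding window of three words built with zip (hypothetical variant)."""
    words = text.split()
    # zip stops at the shortest input (words[2:]), so a bigram at the very
    # end of the text never produces a triple and is skipped automatically.
    return [w3 for w1, w2, w3 in zip(words, words[1:], words[2:])
            if w1 == first and w2 == second]


print(find_occurrences_zip("we will we will rock you", "we", "will"))
# ['we', 'rock']
```

The trade-off is two extra list slices in exchange for not having to reason about the loop bound.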
Hints (Use Only If Stuck!)
💡 Hint 1
Split the text into a list of words first. Then iterate through the list.
💡 Hint 2
You need to check three consecutive positions: i, i+1, and i+2.
💡 Hint 3
Make sure you don't go out of bounds when accessing i+2.
Complexity Analysis
Time Complexity: O(n)
- Split text: O(n) where n is text length
- Single pass through words: O(m) where m is number of words
- Overall: O(n), since m ≤ n and each word is compared against first and second at most once each
Space Complexity: O(m)
- Store words list: O(m)
- Store results: O(k) where k is number of occurrences
- Overall: O(m)
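If the O(m) word list matters, one possible sketch scans the text lazily and remembers only the previous two words. The name `iter_occurrences` is an assumption, and the use of `re.finditer` with `\S+` in place of `split()` is a swapped-in technique rather than the approach this exercise prescribes. Time stays O(n); the extra space beyond the input and the collected results is constant.

```python
import re


def iter_occurrences(text, first, second):
    """Yield words following 'first second' without building a word list."""
    prev2 = prev1 = None  # the two most recently seen words
    # re.finditer returns a lazy iterator over runs of non-whitespace.
    for match in re.finditer(r"\S+", text):
        word = match.group()
        if prev2 == first and prev1 == second:
            yield word
        # Slide the two-word window forward by one word.
        prev2, prev1 = prev1, word


print(list(iter_occurrences("a b a b a b c", "a", "b")))
# ['a', 'a', 'c']
```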
Edge Cases to Consider
- Empty text: Return []
- Fewer than 3 words: Return [] (can't have bigram + following word)
- Pattern at end: Don't include (no following word)
- Overlapping patterns: Handle correctly (e.g., "a a a" with first="a", second="a")
- Case sensitivity: "The" ≠ "the"
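Each edge case in this list can be pinned down with a one-line assertion. The sketch below reuses the list-comprehension approach from this document; the sample inputs other than "a a a" are invented for illustration.

```python
def findOcurrences(text, first, second):
    words = text.split()
    return [words[i + 2] for i in range(len(words) - 2)
            if words[i] == first and words[i + 1] == second]


# Empty text: split() gives [], and the range is empty
assert findOcurrences("", "a", "b") == []
# Fewer than 3 words: no room for a bigram plus a following word
assert findOcurrences("a b", "a", "b") == []
# Pattern at end: matched bigram has no following word
assert findOcurrences("x a b", "a", "b") == []
# Overlapping patterns: every starting position is checked independently
assert findOcurrences("a a a", "a", "a") == ["a"]
assert findOcurrences("a a a a", "a", "a") == ["a", "a"]
# Case sensitivity: "The" does not match "the"
assert findOcurrences("The cat sat", "the", "cat") == []

print("all edge-case checks passed")
```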
Related Problems
Once you solve this, try:
- Valid Palindrome - String manipulation
- Reverse Words in a String - Word processing
- Implement strStr() - Pattern matching
- Text Justification - Advanced text processing
📝 Notes
- Words are separated by single spaces
- Matching is case-sensitive
- Return words in order of occurrence
- Don't deduplicate the results: a word appears once for each time it follows the bigram (e.g., Test 5 returns 'a' twice)
- Text contains only lowercase English letters and spaces (in most cases)
Ready to contribute?
- Fork the repository
- Create your solution file
- Test with provided examples
- Submit a pull request!
File Location: exercises/1000_programs/medium/1078_occurrences_after_bigram.py
🚀 Happy coding!