Skip to content

[MEDIUM] #1078 - Occurrences After Bigram #1865

@hackdartstorm

Description

@hackdartstorm

🎯 Problem: Occurrences After Bigram

📖 The Real Problem

You are given:

  1. A text (string of words separated by spaces)
  2. A first word
  3. A second word

Your task: Find all words that immediately follow the bigram "first second" in the text.

What is a bigram?

  • A bigram is a pair of consecutive words
  • Example: In "the quick brown fox", bigrams are: (the,quick), (quick,brown), (brown,fox)

Why this problem exists:

  • Fundamental text processing operation
  • Tests string manipulation and pattern matching
  • Foundation for more complex NLP tasks

💡 Why This Matters

Real-world applications:

  • Search engines - phrase search and autocomplete
  • Text prediction - next word prediction in keyboards
  • Language modeling - statistical language models
  • Plagiarism detection - matching phrase patterns
  • Chatbots - response generation based on context

Skills you'll develop:

  • ✅ String splitting and manipulation
  • ✅ Pattern matching in text
  • ✅ List comprehension
  • ✅ Index tracking in arrays

📋 Contributor Tasks

Step 1: Understand the Problem

  1. Read the text carefully
  2. Identify all occurrences of "first second" pattern
  3. For each occurrence, note the word that follows
  4. Collect all such words

Step 2: Plan Your Approach

  1. Split text into words (list of words)
  2. Iterate through the word list
  3. Look for pattern: words[i] == first AND words[i+1] == second
  4. When found, add words[i+2] to results (if exists)
  5. Return the list of result words

Step 3: Implement the Solution

  1. Split text by spaces into word list
  2. Loop through indices from 0 to len-3
  3. Check if current and next word match first and second
  4. If match, append the word after them to results
  5. Return results list

Step 4: Test Your Solution

  1. Test with pattern at beginning
  2. Test with pattern at end
  3. Test with multiple occurrences
  4. Test with no occurrences
  5. Test with pattern appearing consecutively

✅ Expected Outcome

Function Signature:

def findOcurrences(text, first, second):
    """
    Find all words following the bigram 'first second' in text.
    
    Args:
        text: str - input text (space-separated words)
        first: str - first word of bigram
        second: str - second word of bigram
    
    Returns:
        List[str]: list of words following the bigram
    
    Example:
        >>> findOcurrences("alice is a good girl she is a good student", "a", "good")
        ['girl', 'student']
    """

Expected Behavior:

  • ✅ Returns list of all words following "first second"
  • ✅ Returns empty list if pattern not found
  • ✅ Handles multiple occurrences correctly
  • ✅ Case-sensitive matching
  • ✅ Preserves order of occurrences

Example Test Cases:

# Test 1: Basic example
text = "alice is a good girl she is a good student"
findOcurrences(text, "a", "good")
# Returns: ['girl', 'student']
# Explanation: "a good" appears twice, followed by "girl" and "student"

# Test 2: Pattern at end (no following word)
text = "we will we will rock you"
findOcurrences(text, "we", "will")
# Returns: ['we', 'rock']
# Explanation: "we will" appears twice, followed by "we" and "rock"

# Test 3: No occurrences
text = "hello world"
findOcurrences(text, "foo", "bar")
# Returns: []

# Test 4: Single occurrence
text = "the quick brown fox"
findOcurrences(text, "quick", "brown")
# Returns: ['fox']

# Test 5: Consecutive patterns
text = "a b a b a b c"
findOcurrences(text, "a", "b")
# Returns: ['a', 'a', 'c']

📚 Additional Context & References

Understanding Bigrams

What is a Bigram?

  • A bigram is a sequence of 2 adjacent elements
  • In text: 2 consecutive words
  • Used extensively in NLP and speech recognition

Example:

Text: "the cat sat on the mat"
Bigrams: (the,cat), (cat,sat), (sat,on), (on,the), (the,mat)

Implementation Approaches

Approach 1: Simple Iteration (Recommended)

def findOcurrences(text, first, second):
    words = text.split()
    result = []
    
    for i in range(len(words) - 2):
        if words[i] == first and words[i+1] == second:
            result.append(words[i+2])
    
    return result

Approach 2: List Comprehension

def findOcurrences(text, first, second):
    words = text.split()
    return [words[i+2] for i in range(len(words)-2) 
            if words[i] == first and words[i+1] == second]

Hints (Use Only If Stuck!)

💡 Hint 1 Split the text into a list of words first. Then iterate through the list.
💡 Hint 2 You need to check three consecutive positions: i, i+1, and i+2.
💡 Hint 3 Make sure you don't go out of bounds when accessing i+2.

Complexity Analysis

Time Complexity: O(n)

  • Split text: O(n) where n is text length
  • Single pass through words: O(m) where m is number of words
  • Overall: O(n)

Space Complexity: O(m)

  • Store words list: O(m)
  • Store results: O(k) where k is number of occurrences
  • Overall: O(m)

Edge Cases to Consider

  1. Empty text: Return []
  2. Less than 3 words: Return [] (can't have bigram + following word)
  3. Pattern at end: Don't include (no following word)
  4. Overlapping patterns: Handle correctly (e.g., "a a a" with first="a", second="a")
  5. Case sensitivity: "The" ≠ "the"

Related Problems

Once you solve this, try:

  • Valid Palindrome - String manipulation
  • Reverse Words in a String - Word processing
  • Implement strStr() - Pattern matching
  • Text Justification - Advanced text processing

Helpful Resources

📝 Notes

  • Words are separated by single spaces
  • Matching is case-sensitive
  • Return words in order of occurrence
  • Don't include duplicates automatically (only if they appear multiple times)
  • Text contains only lowercase English letters and spaces (in most cases)

Ready to contribute?

  1. Fork the repository
  2. Create your solution file
  3. Test with provided examples
  4. Submit a pull request!

File Location: exercises/1000_programs/medium/1078_occurrences_after_bigram.py

🚀 Happy coding!

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions