
Why does chapter 2 section 7 solution use iteration instead of matching? #59

Closed
ab-10 opened this issue May 30, 2020 · 2 comments
Labels
content Issues and PRs related to course content

Comments


ab-10 commented May 30, 2020

Could someone please explain why the section on data structures best practices uses iteration instead of matching to find a proper noun before a verb? Is it simply to make the pairing between the naive solution and the recommended one more direct, or is there additional rationale on when to use iteration instead of the excellent matching functionality provided by spaCy?

Here's the relevant solution code snippet provided:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Berlin is a nice city")

# Iterate over the tokens
for token in doc:
    # Check if the current token is a proper noun
    if token.pos_ == "PROPN":
        # Check if the next token is a verb
        if doc[token.i + 1].pos_ == "VERB":
            print("Found proper noun before a verb:", token.text)

And here's an example of how I would construct a matcher to extract a proper noun followed by a verb:

from spacy.matcher import Matcher

doc = nlp("Berlin is a nice city")
matcher = Matcher(nlp.vocab)
matcher.add("Proper nouns", None, [{"POS": "PROPN"}, {"POS": "VERB"}])

matches = matcher(doc)
# Each match is a (match_id, start, end) tuple of token indices
for match_id, start, end in matches:
    print("Found proper noun before a verb:", doc[start])
@ines ines added the content Issues and PRs related to course content label Jun 1, 2020
Member

ines commented Jun 1, 2020

Hi! This is a good question and definitely a valid point. There's no particular reason and the Matcher would definitely be the more elegant and scalable solution in this case. There are other cases where you might still want to iterate over the tokens – for instance, if you're working with the dependency tree or if you're extracting additional information after matching (e.g. check the previous sentence for something).

This particular exercise is based on a real example I once saw on Stack Overflow and I wanted to get the point of using the Doc as the "single source of truth" across. And since rewrite exercises are often a bit more challenging than "fill in the gaps", I didn't want to introduce any new concepts or new token attributes here, and keep the result a bit closer to the original code.

tl;dr: No particular reason, mostly to make the rewrite exercise more straightforward and not ask for too much at once. Matching is perfectly reasonable, too 🙂

Author

ab-10 commented Jun 1, 2020

Hi @ines, thank you for the reply (and the exercises too); that clears things up and adds more context to the tradeoffs between the two methods.

@ab-10 ab-10 closed this as completed Jun 1, 2020