Could someone please explain why the section on Data structures best practices uses iteration instead of matching to find a proper noun before a verb? Is it simply to make the pairing between the naive solution and the recommended one more direct, or is there additional rationale for when to use iteration instead of the excellent matching functionality provided by spaCy?
Here's the relevant solution code snippet provided:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Berlin is a nice city")

# Iterate over the tokens
for token in doc:
    # Check if the current token is a proper noun
    if token.pos_ == "PROPN":
        # Check if the next token is a verb
        if doc[token.i + 1].pos_ == "VERB":
            print("Found proper noun before a verb:", token.text)
And here's an example of how I would construct a matcher to extract a proper noun followed by a verb:
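from spacy.matcher import Matcher

doc = nlp("Berlin is a nice city")
matcher = Matcher(nlp.vocab)
# spaCy v2 signature; in spaCy v3 this is matcher.add("Proper nouns", [[{"POS": "PROPN"}, {"POS": "VERB"}]])
matcher.add("Proper nouns", None, [{"POS": "PROPN"}, {"POS": "VERB"}])
matches = matcher(doc)
# Each match is a (match_id, start, end) tuple; doc[start] is the proper noun
for match_id, start, end in matches:
    print("Found proper noun before a verb:", doc[start])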
Hi! This is a good question and definitely a valid point. There's no particular reason and the Matcher would definitely be the more elegant and scalable solution in this case. There are other cases where you might still want to iterate over the tokens – for instance, if you're working with the dependency tree or if you're extracting additional information after matching (e.g. check the previous sentence for something).
This particular exercise is based on a real example I once saw on Stack Overflow, and I wanted to get across the point of using the Doc as the "single source of truth". Since rewrite exercises are often a bit more challenging than "fill in the gaps", I didn't want to introduce any new concepts or token attributes here, and I wanted to keep the result a bit closer to the original code.
tl;dr: No particular reason, mostly to make the rewrite exercise more straightforward and not ask for too much at once. Matching is perfectly reasonable, too 🙂
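For anyone landing here later, the "extract additional information after matching" case mentioned above could look roughly like the sketch below. It assumes spaCy v3 (the `Matcher.add(key, patterns)` signature and the `pos` argument to `Doc`). The sentence, the hand-assigned part-of-speech tags, and the `PROPN_VERB` key are made up for illustration, so that the example runs without downloading a model; with a real pipeline you would call `nlp(text)` instead and the tags would come from the tagger.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

# Blank pipeline: provides a vocab but no tagger, so POS tags are set by hand.
nlp = spacy.blank("en")

# Hypothetical example sentence with manually assigned coarse POS tags.
words = ["Anna", "visited", "Berlin", "yesterday"]
pos = ["PROPN", "VERB", "PROPN", "NOUN"]
doc = Doc(nlp.vocab, words=words, pos=pos)

# Let the Matcher find the "proper noun followed by a verb" pattern.
matcher = Matcher(nlp.vocab)
matcher.add("PROPN_VERB", [[{"POS": "PROPN"}, {"POS": "VERB"}]])

results = []
for match_id, start, end in matcher(doc):
    span = doc[start:end]
    # Extra lookup after matching: the token right after the verb, if any.
    # The bounds check avoids an IndexError when the match ends the Doc.
    following = doc[end].text if end < len(doc) else None
    results.append((span[0].text, span[1].text, following))

print(results)
```

The Matcher handles the pattern itself, while the loop over the matches still goes back to the Doc for anything the pattern does not cover, which keeps the Doc as the single source of truth.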