You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
def transform_texts(texts):
# Load the annotation models
nlp = English()
# Stream texts through the models. We accumulate a buffer and release
# the GIL around the parser, for efficient multi-threading.
for doc in nlp.pipe(texts, n_threads=4):
# Iterate over base NPs, e.g. "all their good ideas"
for np in doc.noun_chunks:
# Only keep adjectives and nouns, e.g. "good ideas"
while len(np) > 1 and np[0].dep_ not in ('amod', 'compound'):
np = np[1:]
if len(np) > 1:
# Merge the tokens, e.g. good_ideas
np.merge(np.root.tag_, np.text, np.root.ent_type_)
# Iterate over named entities
for ent in doc.ents:
if len(ent) > 1:
# Merge them into single tokens
ent.merge(ent.root.tag_, ent.text, ent.label_)
token_strings = []
for token in tokens:
text = token.text.replace(' ', '_')
tag = token.ent_type_ or token.pos_
token_strings.append('%s|%s' % (text, tag))
yield ' '.join(token_strings)
where is the "tokens" variable defined (from the for loop)?
The text was updated successfully, but these errors were encountered:
For the first code snippet in this link:
https://explosion.ai/blog/sense2vec-with-spacy
where is the "tokens" variable defined (from the for loop)?
The text was updated successfully, but these errors were encountered: