Blog post question #22

Closed
vgoklani opened this issue Jan 13, 2017 · 1 comment

In the first code snippet at this link:

https://explosion.ai/blog/sense2vec-with-spacy

def transform_texts(texts):
    # Load the annotation models
    nlp = English()
    # Stream texts through the models. We accumulate a buffer and release
    # the GIL around the parser, for efficient multi-threading.
    for doc in nlp.pipe(texts, n_threads=4):
        # Iterate over base NPs, e.g. "all their good ideas"
        for np in doc.noun_chunks:
            # Only keep adjectives and nouns, e.g. "good ideas"
            while len(np) > 1 and np[0].dep_ not in ('amod', 'compound'):
                np = np[1:]
            if len(np) > 1:
                # Merge the tokens, e.g. good_ideas
                np.merge(np.root.tag_, np.text, np.root.ent_type_)
            # Iterate over named entities
            for ent in doc.ents:
                if len(ent) > 1:
                    # Merge them into single tokens
                    ent.merge(ent.root.tag_, ent.text, ent.label_)
        token_strings = []
        for token in tokens:
            text = token.text.replace(' ', '_')
            tag = token.ent_type_ or token.pos_
            token_strings.append('%s|%s' % (text, tag))
        yield ' '.join(token_strings)

Where is the "tokens" variable (used in the for loop) defined?

elyase commented Jan 13, 2017

That should probably be:

for token in doc:

You can find a working and improved version in the bin/merge_text.py script.
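For later readers, here is a minimal corrected sketch with that fix applied, assuming the spaCy 1.x API the blog post targets (Span.merge and the n_threads argument were removed in later spaCy releases, which use doc.retokenize() instead). The entity loop is also moved out of the noun-chunk loop so entities are merged once per doc rather than once per noun chunk, which appears to be the intent:

from spacy.en import English

def transform_texts(texts):
    # Load the annotation models (spaCy 1.x)
    nlp = English()
    # Stream texts through the pipeline; the GIL is released around
    # the parser for multi-threading (spaCy 1.x / early 2.x only)
    for doc in nlp.pipe(texts, n_threads=4):
        # Trim each base NP to its adjective/noun core, then merge it
        for np in doc.noun_chunks:
            while len(np) > 1 and np[0].dep_ not in ('amod', 'compound'):
                np = np[1:]
            if len(np) > 1:
                np.merge(np.root.tag_, np.text, np.root.ent_type_)
        # Merge multi-token named entities into single tokens
        for ent in doc.ents:
            if len(ent) > 1:
                ent.merge(ent.root.tag_, ent.text, ent.label_)
        token_strings = []
        for token in doc:  # the fix: iterate over the doc's (now merged) tokens
            text = token.text.replace(' ', '_')
            tag = token.ent_type_ or token.pos_
            token_strings.append('%s|%s' % (text, tag))
        yield ' '.join(token_strings)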

honnibal closed this as completed Apr 8, 2018