Commit

Merge branch 'master' of https://github.com/markgw/pimlico
markgw committed Oct 30, 2020
2 parents 184acc4 + 34c8d0e commit 0814c84
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion src/python/pimlico/modules/spacy/extract_nps/execute.py
@@ -23,8 +23,12 @@ def process_document(worker, archive, filename, doc):
     # Apply tagger and parser to the raw text
     doc = worker.nlp(doc.text)
     # Now doc.noun_chunks contains the NP chunks from the parser
+    chunks = [[token.text.strip() for token in np] for np in doc.noun_chunks]
+    # Filter out space
+    chunks = [[token for token in np if len(token)] for np in chunks]
+    chunks = [np for np in chunks if len(np)]
     return {
-        "sentences": [[token.text for token in np] for np in doc.noun_chunks]
+        "sentences": chunks
     }
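
The filtering added in this hunk can be tried without spaCy. The following is a minimal sketch, assuming plain strings stand in for the `token.text` values of each noun chunk; the function name `clean_chunks` and the sample data are hypothetical and do not appear in the commit.

```python
def clean_chunks(raw_chunks):
    """Mirror the three steps in the hunk on lists of token strings."""
    # Step 1: strip surrounding whitespace from every token text
    stripped = [[text.strip() for text in chunk] for chunk in raw_chunks]
    # Step 2: drop tokens that are empty after stripping
    non_empty = [[text for text in chunk if len(text)] for chunk in stripped]
    # Step 3: drop chunks that have no tokens left
    return [chunk for chunk in non_empty if len(chunk)]


# Hypothetical token texts standing in for doc.noun_chunks
example = [["the", " ", "quick", "fox"], ["\n"], ["a ", "dog"]]
result = clean_chunks(example)
print(result)  # [['the', 'quick', 'fox'], ['a', 'dog']]
```

Whitespace-only tokens disappear, and a chunk made up only of such tokens is removed entirely, so the "sentences" list would not contain empty entries.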

