Cannot process a txt file with spacy #1851

ghost · 2018-01-16T14:38:38Z

Hi all,

I am trying to load an English text file but failing to do so. I am following the spaCy 101 tutorial on the spaCy website:

ff = io.open('moby_dick.txt', 'r', encoding='utf-8')
nlp(ff)
ff.close()

TypeError                                 Traceback (most recent call last)
<ipython-input-20-f474742159a7> in <module>()
      1 ff = io.open(files[7], 'r', encoding='utf-8')
----> 2 nlp(ff)
      3 ff.close()

~/.local/lib/python3.5/site-packages/spacy/language.py in __call__(self, text, disable)
    327             ('An', 'NN')
    328         """
--> 329         doc = self.make_doc(text)
    330         for name, proc in self.pipeline:
    331             if name in disable:

~/.local/lib/python3.5/site-packages/spacy/language.py in make_doc(self, text)
    355 
    356     def make_doc(self, text):
--> 357         return self.tokenizer(text)
    358 
    359     def update(self, docs, golds, drop=0., sgd=None, losses=None):

TypeError: Argument 'string' has incorrect type (expected str, got _io.TextIOWrapper)

Your Environment

spaCy version: 2.0.5
Platform: Linux-4.13.0-26-generic-x86_64-with-LinuxMint-18.3-sylvia
Models: en
Python version: 3.5.2

bwj-GitHub · 2018-01-16T16:23:39Z

I wasn't aware that nlp could take a file; you should read the file first: nlp(ff.read()).
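A minimal sketch of that suggestion, assuming `nlp` is an already loaded spaCy pipeline (for example the `en` model from the report). The helper names `read_text` and `parse_file` are made up for illustration and are not part of spaCy:

```python
import io


def read_text(path, encoding="utf-8"):
    """Return the whole file as a str; the with-block closes the file."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


def parse_file(nlp, path):
    """Read the file first, then hand the resulting str to nlp.

    nlp is assumed to be a loaded spaCy pipeline; calling it on the
    file object itself raises the TypeError shown in the report.
    """
    return nlp(read_text(path))
```

With a loaded pipeline this would be called as `doc = parse_file(nlp, 'moby_dick.txt')`.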

ghost · 2018-01-16T16:51:06Z

Oh. I see. I took the idea from the example in the serialization section.

https://spacy.io/usage/spacy-101#serialization

ines · 2018-01-16T17:39:56Z

Sorry, this is a mistake in the example: it's indeed missing the .read(). Fixing! We might also update the example to use a made-up filename instead of Moby Dick (like customer_feedback_627.txt, which we use in the lightning tour example on the homepage). Loading and parsing a large document like this in one go is actually not the best strategy, especially in spaCy v2.0. For better performance, you'd usually want to split up the text and use nlp.pipe, which returns a generator.
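As a rough illustration of the nlp.pipe approach, the sketch below splits the text on blank lines and streams the pieces through the pipeline. Splitting by paragraph is only an assumption (the comment above does not say how to split), and `iter_paragraphs` and `parse_in_chunks` are hypothetical helper names, not spaCy functions:

```python
import re


def iter_paragraphs(text):
    """Yield non-empty blocks of text separated by one or more blank lines."""
    for block in re.split(r"\n\s*\n", text):
        block = block.strip()
        if block:
            yield block


def parse_in_chunks(nlp, text):
    """Stream the paragraphs through nlp.pipe instead of one big nlp(text) call.

    nlp is assumed to be a loaded spaCy pipeline. nlp.pipe takes an
    iterable of str and returns a generator of Doc objects, so nothing
    is processed until the result is iterated.
    """
    return nlp.pipe(iter_paragraphs(text))
```

Iterating the result, e.g. `for doc in parse_in_chunks(nlp, text): ...`, then yields one Doc per paragraph rather than a single Doc for the whole book.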

lock · 2018-05-08T02:54:59Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

ines added the docs Documentation and website label Jan 16, 2018

ines closed this as completed in 67ba733 Jan 16, 2018

lock bot locked as resolved and limited conversation to collaborators May 8, 2018
