Issue with non-standard characters? #6

Closed
BrekiTomasson opened this issue May 19, 2016 · 4 comments

Comments

@BrekiTomasson

Is there a potential problem if the corpus contains non-standard characters? I'm thinking, for example, Swedish characters like Å Ä Ö. I'm testing a corpus which contains this kind of character in most rows, and the process seems to hang without outputting anything.

@fitnr
Owner

fitnr commented May 19, 2016

Are the files UTF-8 encoded? Could you post a gist with an example?

@BrekiTomasson
Author

Here is the gist: https://gist.github.com/BrekiTomasson/e9814f32c42376af4fe4795dc3cf95aa

The file is in UTF-8.
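
For readers who want to confirm the encoding themselves, here is a minimal sketch that checks whether a file decodes cleanly as UTF-8. The helper names `is_valid_utf8` and `file_is_utf8` are hypothetical and not part of twitter_markov or markovify, and the code assumes Python 3 rather than the Python 2.7 used in this thread.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path: str) -> bool:
    """Read a file as raw bytes and report whether it is valid UTF-8."""
    # Hypothetical helper: reads the whole file, so it suits small corpora.
    with open(path, "rb") as handle:
        return is_valid_utf8(handle.read())
```

A corpus saved as Latin-1 would fail this check as soon as it contains characters such as Å, Ä, or Ö, because their single-byte Latin-1 forms are not valid UTF-8 sequences.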

@BrekiTomasson
Author

Also, for reference, when it has been standing still for a while without outputting anything and I kill it with Ctrl-C, this is what I get:

Traceback (most recent call last):
  File "/usr/local/bin/twittermarkov", line 9, in <module>
    load_entry_point('twitter-markov==0.4.3', 'console_scripts', 'twittermarkov')()
  File "/usr/local/lib/python2.7/dist-packages/twitter_markov/cli.py", line 70, in main
    func(argdict)
  File "/usr/local/lib/python2.7/dist-packages/twitter_markov/cli.py", line 77, in tweet_func
    tm.tweet()
  File "/usr/local/lib/python2.7/dist-packages/twitter_markov/twitter_markov.py", line 178, in tweet
    text = self.compose(model, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/twitter_markov/twitter_markov.py", line 196, in compose
    sent = model.make_sentence(**kwargs)
  File "/usr/local/lib/python2.7/dist-packages/markovify/text.py", line 115, in make_sentence
    words = self.chain.walk(init_state)
  File "/usr/local/lib/python2.7/dist-packages/markovify/chain.py", line 98, in walk
    return list(self.gen(init_state))
  File "/usr/local/lib/python2.7/dist-packages/markovify/chain.py", line 87, in gen
    next_word = self.move(state)
  File "/usr/local/lib/python2.7/dist-packages/markovify/chain.py", line 74, in move
    cumdist = list(accumulate(weights))
  File "/usr/local/lib/python2.7/dist-packages/markovify/chain.py", line 9, in accumulate
    def accumulate(iterable, func=operator.add):
KeyboardInterrupt

@BrekiTomasson
Author

Actually, never mind. It seems the problem might have been other non-standard characters in the corpus, especially parentheses and quotation marks. Once I cleaned them out, it worked better.
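
The cleanup described above could look roughly like the following sketch. It is not the code actually used in this thread: the character set in `UNWANTED` and the function name `clean_line` are assumptions chosen for illustration, and it assumes Python 3. Letters such as Å, Ä, and Ö are left untouched.

```python
import re

# Assumed set of characters to strip: parentheses, brackets, and common
# straight and typographic double quotation marks. Adjust for your corpus.
UNWANTED = re.compile(r"[()\[\]\"“”„«»]")


def clean_line(line: str) -> str:
    """Remove unwanted punctuation and collapse leftover whitespace."""
    stripped = UNWANTED.sub("", line)
    return re.sub(r"\s+", " ", stripped).strip()
```

For example, `clean_line('Hej (världen) "Å Ä Ö"')` returns `'Hej världen Å Ä Ö'`. Apostrophes are deliberately kept so that contractions survive the cleanup.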

fitnr closed this as completed May 20, 2016