Doesn't work with http://www.anc.org/data/oanc/ngram/ #97

Closed
patarapolw opened this issue Apr 12, 2018 · 2 comments

patarapolw commented Apr 12, 2018

Even with #93, it still returns None.

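import string

import requests
from bs4 import BeautifulSoup

params = {
    'key': '* love *',
    'print_max': 100,
    'freq_threshold': 0,
    'output_style': 'sentence',
    'output_aux': 0,
    'print_format': 'text',
    'sort': None  # requests omits parameters whose value is None
}

r = requests.get('http://www.anc.org/cgi-bin/ngrams.cgi', params=params)
# Strip the HTML markup, then remove every punctuation character from the text.
text = BeautifulSoup(r.text, 'html.parser').text
for punc in string.punctuation:
    text = text.replace(punc, '')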

jsvine (Owner) commented Apr 29, 2018

Thanks for filing this issue. Can you provide a script that reproduces the problem you're seeing? (The code included above does not appear to use markovify.)

patarapolw (Author) commented

http://www.anc.org/cgi-bin/ngrams.cgi?key=*+love+*&print_max=100&freq_threshold=0&output_style=sentence&output_aux=0&print_format=table&sort=none

It appears that markovify.Chain works OK with POS-separated words (separated by spaces in this case) / tagged corpus.
