python 3 issues with working-with-web-pages #407
Comments
Here are a few workarounds. The lesson is indeed written for Python 2.7. In Python 3, import urllib.request and open the page like this:

    import urllib.request
    url = 'http://www.oldbaileyonline.org/browse.jsp?id=t17800628-33&div=t17800628-33'
    response = urllib.request.urlopen(url)
    # In Python 3, read() returns bytes, so this prints a b'...' literal
    webContent = response.read()
    print(webContent[0:300])

and

    # save-webpage.py
    import urllib.request
    url = 'http://www.oldbaileyonline.org/browse.jsp?id=t17800628-33&div=t17800628-33'
    response = urllib.request.urlopen(url)
    # Decode the bytes to text before writing to a text-mode file
    webContent = response.read().decode('utf-8')
    # The with block closes (and flushes) the file automatically
    with open('test.html', 'w', encoding='utf-8') as f:
        f.write(webContent)

Putting it together:

    import urllib.request
    from obo import stripTags  # assumes the lesson's obo.py is in the same folder
    url = 'http://www.oldbaileyonline.org/browse.jsp?id=t17800628-33&div=t17800628-33'
    response = urllib.request.urlopen(url)
    HTML = response.read().decode('utf-8')
    print(stripTags(HTML))

and to get the word list, I did:

    import urllib.request
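    from bs4 import BeautifulSoup  # needs the beautifulsoup4 and html5lib packages
    # stripTags is the function from the lesson's obo.py, as in the previous snippet
    url = 'http://www.oldbaileyonline.org/browse.jsp?id=t17800628-33&div=t17800628-33'
    response = urllib.request.urlopen(url)
    HTML = response.read().decode('utf-8')
    clip = stripTags(HTML)
    text = BeautifulSoup(clip, "html5lib").get_text().lower()
    wordlist = text.split()
    print(wordlist[0:120])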
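The workarounds above all hard-code `'utf-8'` when decoding. A minimal sketch of a slightly more defensive variant follows, assuming the server may declare a different charset in its `Content-Type` header. The helper names `decode_body` and `fetch_text` are hypothetical and not part of the lesson; treat this as one possible approach rather than the lesson's method.

```python
import urllib.request


def decode_body(raw, charset=None):
    """Turn the bytes returned by response.read() into a str."""
    # Fall back to UTF-8 when no charset is given; errors="replace"
    # substitutes undecodable bytes instead of raising UnicodeDecodeError.
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_text(url):
    """Download url and return its content as text (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # get_content_charset() returns None if the header names no charset
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


# Usage (needs network access):
# html = fetch_text('http://www.oldbaileyonline.org/browse.jsp?id=t17800628-33&div=t17800628-33')
# print(html[0:300])
```

Because `fetch_text` returns a `str`, the result can be passed to `stripTags` or written to a text-mode file without a separate `.decode()` call.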
Is this closeable now that we have the Python warnings?
We used to have comments back when we were on WordPress. It's a shame we've lost that actually because this could be the type of thing solved through a comment (I'm not suggesting we implement that now!). But yes, I suppose we can close it.
I'm using from urllib.request import Request, urlopen. Can anyone help me decode that link? I don't know what causes the problem; when I tried with another link, it worked.
Doesn't work.
@guanbuc could you provide a bit more information about what isn't working for you?
Hi! I'm enjoying learning to work with text files and web pages using Programming Historian.
I installed Python 3.6 and have run into a couple of major differences from how the directions are written. Once I power through the lessons and hopefully do very well in a job interview, I'd be happy to help update the directions with more detail.
For now, this is what I've run into on http://programminghistorian.org/lessons/working-with-web-pages :
Thank you again for the site! I hope this is helpful info.
All the best,
Cathi