python 3 issues with working-with-web-pages #407

fredgibbs · 2017-04-04T14:26:19Z

Hi! I'm enjoying learning to work with text files and web pages using Programming Historian.

I installed Python 3.6 and have run into a couple of major differences from how the directions are written. Once I power through the lessons and hopefully do very well in a job interview, I'd be happy to help update the directions with more detail.

For now, this is what I've run into on http://programminghistorian.org/lessons/working-with-web-pages :

"import urllib2" does not work. StackOverflow kindly suggested the fix "from urllib.request import urlopen".
"f = open('obo-t17800628-33.html', 'w')" no longer works. Again, StackOverflow pointed out that it needs to be opened in binary: "f = open('obo-t17800628-33.html', 'wb')". I'd love to understand that better.

Thank you again for the site! I hope this is helpful info.

All the best,
Cathi

ianmilligan1 · 2017-04-04T16:47:02Z

Here are a few workarounds. The lesson is indeed written for Python 2.7.

To import, use urllib.request as you've noted.

import urllib.request

url = 'http://www.oldbaileyonline.org/browse.jsp?id=t17800628-33&div=t17800628-33'

response = urllib.request.urlopen(url)
webContent = response.read()

print(webContent[0:300])

and

# save-webpage.py

import urllib.request

url = 'http://www.oldbaileyonline.org/browse.jsp?id=t17800628-33&div=t17800628-33'

response = urllib.request.urlopen(url)
webContent = response.read().decode('utf-8')

# the with-block closes (and flushes) the file automatically
with open('test.html', 'w', encoding='utf-8') as f:
    f.write(webContent)
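
On the 'wb' question from the first post: the likely explanation is that in Python 3 response.read() returns bytes rather than str, and a file opened in text mode ('w') only accepts str. So either the bytes are written unchanged to a file opened with 'wb', or they are decoded first, as save-webpage.py above does. A minimal sketch that avoids the network; the web_content value and the temporary path are stand-ins invented for illustration:

```python
import os
import tempfile

# Stand-in for response.read(): in Python 3 this is bytes, not str.
web_content = '<html><body>Old Bailey</body></html>'.encode('utf-8')

# Hypothetical scratch location so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), 'test.html')

# Writing bytes to a text-mode file is what fails in Python 3.
try:
    with open(path, 'w') as f:
        f.write(web_content)
    mismatch_error = None
except TypeError as exc:
    mismatch_error = exc

# Option 1: keep the bytes as they are and open the file in binary mode.
with open(path, 'wb') as f:
    f.write(web_content)

# Option 2: decode the bytes to str, then text mode works.
with open(path, 'w', encoding='utf-8') as f:
    f.write(web_content.decode('utf-8'))
```

Either option produces the same file here. Decoding is the more convenient choice when the text is going to be processed further as a string.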

Putting together:

import urllib.request

url = 'http://www.oldbaileyonline.org/browse.jsp?id=t17800628-33&div=t17800628-33'

response = urllib.request.urlopen(url)
HTML = response.read().decode('utf-8')

# stripTags is not defined in this snippet; it is assumed to come from the lesson's own code
print(stripTags(HTML))

and to get the word list, I did

import urllib.request
from bs4 import BeautifulSoup  # needed for the BeautifulSoup call below

url = 'http://www.oldbaileyonline.org/browse.jsp?id=t17800628-33&div=t17800628-33'

response = urllib.request.urlopen(url)
HTML = response.read().decode('utf-8')
clip = stripTags(HTML)
text = BeautifulSoup(clip,"html5lib").get_text().lower()

wordlist = text.split()

print(wordlist[0:120])
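
Since stripTags is not defined in the snippets above, here is a rough standard-library stand-in for trying out the word-list step offline. It is only a sketch: strip_tags_sketch, TextExtractor, and the sample string are hypothetical names and data, and this is not the lesson's own stripTags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text found between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text outside of a tag.
        self.parts.append(data)


def strip_tags_sketch(html):
    # Hypothetical helper: returns the text content of an HTML string.
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)


# Made-up sample standing in for the downloaded page.
sample = '<html><body><p>Old Bailey <b>Proceedings</b></p></body></html>'

wordlist = strip_tags_sketch(sample).lower().split()
print(wordlist)  # ['old', 'bailey', 'proceedings']
```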

mdlincoln · 2017-07-27T00:06:55Z

Is this closeable now that we have the python warnings?

acrymble · 2017-07-28T08:52:43Z

We used to have comments back when we were on Wordpress. It's a shame we've lost that actually because this could be the type of thing solved through a comment (I'm not suggesting we implement that now!). But yes, I suppose we can close it.

lukifer195 · 2019-07-30T11:15:14Z

from urllib.request import Request, urlopen
url = 'https://dict.laban.vn/find?type=1&query=ch%C3%A2n'
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
resource = urlopen(req)
print(resource.read())
content = resource.read().decode(resource.headers.get_content_charset())

Can anyone help me decode that link? I don't know what the deciding factor is: when I tried another link, it worked.
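
Two things in the snippet above may be worth checking, though this is a guess without seeing the error. First, resource.read() is called twice, and a response body can only be read once, so the second call returns empty bytes. Second, get_content_charset() returns None when the Content-Type header names no charset, and decode(None) raises a TypeError. A minimal sketch of both points, using io.BytesIO as a stand-in for the response and a hard-coded charset value, both assumptions made so the example runs offline:

```python
import io

# Stand-in for the object returned by urlopen(req); like a real response,
# its body is consumed by the first read().
resource = io.BytesIO('chân'.encode('utf-8'))

first = resource.read()   # the whole body
second = resource.read()  # nothing left to read

# Assumed value: what get_content_charset() gives when no charset is declared.
charset = None

# Read once, keep the bytes, and fall back to UTF-8 if no charset is known.
content = first.decode(charset or 'utf-8')
print(content)
```

In the original snippet this would mean storing the result of a single resource.read() in a variable, printing that variable, and decoding the same variable afterwards. The UTF-8 fallback is itself an assumption and may be wrong for some pages.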

guanbuc · 2019-12-05T18:57:46Z

Doesn't work.

svmelton · 2019-12-05T19:58:36Z

@guanbuc could you provide a bit more information about what isn't working for you?

fredgibbs mentioned this issue Apr 4, 2017

python 3 strategy #408

Closed

acrymble mentioned this issue Apr 24, 2017

Editorial Meeting Agenda 28/04/17 #412

Closed

mdlincoln added the Lesson Maintenance label Jul 27, 2017

acrymble closed this as completed Jul 28, 2017

vgayolrs mentioned this issue Apr 8, 2019

2019 ES-Team objectives #1182

Closed

jenniferisasi mentioned this issue Apr 13, 2019

April 2019 Spanish-Team Meeting Agenda #1273

Closed

15 tasks

vgayolrs mentioned this issue Apr 26, 2019

Create solicitud-lecciones #1261

Merged

acrymble mentioned this issue Sep 1, 2020

Python programming sequence not fully updated for Python 3 #1885

Closed

walshbr mentioned this issue Jan 27, 2021

Fix bug on 'Creating and Viewing HTML files with Python' #1997

Merged

6 tasks
