## Exceptions
### Exception 1: HTTPError
An HTTP error can be "404 Page Not Found", "500 Internal Server Error", and so on. In all of these cases, the `urlopen` function raises an `HTTPError` exception.

```python
from urllib.request import urlopen
from urllib.error import HTTPError

try:
    html = urlopen('http://www.pythonscraping.com/pages/page.html')
except HTTPError as e:
    print(e)
    # Return None, stop the program, or fall back to another plan
else:
    print('Program continues')
```

```text
HTTP Error 404: Not Found
```
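Besides the printable message, the caught `HTTPError` exposes the numeric status as `e.code`, which is easier to branch on. The sketch below reproduces the 404 case without touching the real site: a throwaway local `http.server` answers every request with 404. The names `NotFoundHandler` and `fetch_status` are hypothetical, and the opener is built with proxies disabled (an assumption, so an environment proxy cannot intercept the local request); plain `urlopen` raises `HTTPError` the same way.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import ProxyHandler, build_opener


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET request with 404."""

    def do_GET(self):
        self.send_error(404, 'Not Found')

    def log_message(self, format, *args):
        pass  # keep the demo quiet: no per-request log lines on stderr


def fetch_status(url):
    """Hypothetical helper: return the HTTP status code, even for error responses."""
    # Opener with no proxies, so the request goes straight to the local server
    opener = build_opener(ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status
    except HTTPError as e:
        # e.code is the numeric status; str(e) gives 'HTTP Error <code>: <reason>'
        return e.code


# Port 0 asks the OS for any free port
server = HTTPServer(('127.0.0.1', 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

status = fetch_status(f'http://127.0.0.1:{port}/pages/page.html')
print(status)  # 404

server.shutdown()
server.server_close()
```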


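### Exception 2: URLError
The server does not exist. When the host cannot be reached at all (for example, its domain name cannot be resolved), there is no HTTP status code to report, so `urlopen` raises a `URLError` instead.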

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

try:
    html = urlopen('https://pythonscrapingthisurldoesnotexist.com')
except HTTPError as e:
    # The server answered, but with an error status code
    print(e)
except URLError as e:
    # The server could not be reached at all (e.g. the DNS lookup failed)
    print(e)
else:
    print('It worked!')
```

```text
<urlopen error [Errno 11001] getaddrinfo failed>
```
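`HTTPError` is a subclass of `URLError`, so the `except HTTPError` clause must come first; in the reverse order, the `URLError` clause would also swallow HTTP status errors. The sketch below checks that relationship and triggers a `URLError` offline by connecting to a local port that nothing listens on, as a stand-in for a server that does not exist. The helper `classify` is hypothetical, and proxies are disabled (an assumption) so that an environment proxy cannot turn the failure into an HTTP error.

```python
import socket
from urllib.error import HTTPError, URLError
from urllib.request import ProxyHandler, build_opener

# HTTPError inherits from URLError, hence the clause order below
print(issubclass(HTTPError, URLError))  # True


def classify(url):
    """Hypothetical helper: describe how a request failed (or that it worked)."""
    opener = build_opener(ProxyHandler({}))  # bypass any configured proxy
    try:
        with opener.open(url, timeout=5):
            return 'ok'
    except HTTPError as e:   # more specific: the server replied with an error status
        return f'http error {e.code}'
    except URLError as e:    # no usable server: DNS failure, refused connection, ...
        return f'url error: {type(e.reason).__name__}'


# Grab a free port, then close the socket so nothing is listening on it
with socket.socket() as s:
    s.bind(('127.0.0.1', 0))
    free_port = s.getsockname()[1]

result = classify(f'http://127.0.0.1:{free_port}/')
print(result)
```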


### Exception 3: AttributeError
`AttributeError` shows up when you work with tags that a BeautifulSoup object does not contain. Accessing a missing tag by itself does not raise anything; it simply returns `None`:

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://www.pythonscraping.com/pages/page1.html')
bs = BeautifulSoup(html.read(), 'html.parser')
print(bs.nonExistentTag)
```

```text
None
```


```python
bs
```

```html
<html>
<head>
<title>A Useful Page</title>
</head>
<body>
<h1>An Interesting Title</h1>
<div>
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</div>
</body>
</html>
```

Accessing a further tag (or any attribute) on that `None` object is what raises the `AttributeError`:

```python
bs.nonExistentTag.someTag
```

```text
AttributeError: 'NoneType' object has no attribute 'someTag'
```
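Checking for `None` before going one level deeper avoids the exception. The sketch below parses an inline copy of the page shown above (so it runs without network access), uses `find()`, which returns `None` for a missing tag, and wraps the check in a hypothetical helper `find_chain` that stops at the first missing tag.

```python
from bs4 import BeautifulSoup

html = '''
<html><head><title>A Useful Page</title></head>
<body><h1>An Interesting Title</h1><div>Lorem ipsum</div></body></html>
'''
bs = BeautifulSoup(html, 'html.parser')

# A missing tag comes back as None; no exception yet
missing = bs.find('nonexistenttag')
print(missing)  # None

# Chaining onto that None is what raises AttributeError
try:
    bs.find('nonexistenttag').find('sometag')
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'find'


def find_chain(soup, *names):
    """Hypothetical helper: follow nested tag names, returning None at the first gap."""
    node = soup
    for name in names:
        node = node.find(name)
        if node is None:
            return None
    return node


print(find_chain(bs, 'body', 'h1'))                 # <h1>An Interesting Title</h1>
print(find_chain(bs, 'nonexistenttag', 'sometag'))  # None
```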

An example that handles all three kinds of exceptions together:

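```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup


def getTitle(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        # The server returned an error status (404, 500, ...)
        print(e)
        return None
    except URLError as e:
        # The server could not be reached at all
        print(e)
        return None
    else:
        try:
            bs = BeautifulSoup(html.read(), 'html.parser')
            title = bs.body.h1
        except AttributeError as e:
            # The page has no <body>, so .h1 was looked up on None
            print(e)
            return None
        else:
            # Can itself be None if <body> exists but contains no <h1>
            return title


title = getTitle('http://www.pythonscraping.com/pages/page1.html')
if title is None:
    print('Title could not be found!')
else:
    print(title)
```

```text
<h1>An Interesting Title</h1>
```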
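The `except AttributeError` branch in `getTitle` only fires when the page has no `<body>` at all. If `<body>` exists but lacks an `<h1>`, `bs.body.h1` just returns `None`. Both paths end in `None`, which is why the caller's single `None` check covers them. A minimal offline sketch of only the parsing half, under the hypothetical name `title_from_html`, which takes an HTML string instead of a URL:

```python
from bs4 import BeautifulSoup


def title_from_html(markup):
    """Hypothetical helper: the parsing half of getTitle, applied to a string."""
    bs = BeautifulSoup(markup, 'html.parser')
    try:
        title = bs.body.h1  # AttributeError only if <body> itself is missing
    except AttributeError:
        return None
    return title            # None if <body> exists but has no <h1>


cases = {
    'full page': '<html><body><h1>An Interesting Title</h1></body></html>',
    'body without h1': '<html><body><p>text</p></body></html>',
    'no body at all': '<p>text</p>',
}
for label, markup in cases.items():
    print(label, '->', title_from_html(markup))
```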
