
# urllib module

The **urllib** package provides a high-level interface for fetching data across the World Wide Web. In particular, the urlopen() function is similar to the built-in function open(), but accepts Uniform Resource Locators (URLs) instead of filenames. Some restrictions apply: it can only open URLs for reading, and no seek operations are available (source: https://docs.python.org/2/library/urllib.html). That description comes from the Python 2 documentation; in Python 3, which the code in this notebook uses, urlopen() lives in the urllib.request submodule.

# urlopen
Open a network object denoted by a URL for reading. In Python 3 the URL must include a scheme such as http:, https: or file:; a string without a scheme raises ValueError. If the connection cannot be made, urllib.error.URLError (a subclass of OSError) is raised. If all went well, a file-like object is returned; for HTTP and HTTPS URLs this is an http.client.HTTPResponse. It supports the following methods: read(), readline(), readlines(), fileno(), close(), info(), getcode() and geturl(). Since Python 3.9, info(), getcode() and geturl() are deprecated in favor of the headers, status and url attributes.
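
As a minimal sketch of the behavior described above, the snippet below wraps urlopen() in a helper that uses the response as a context manager and catches urllib.error.URLError. The helper name fetch(), the 5-second timeout, the data: URL and the unused local port are assumptions chosen so that the example runs without network access; none of them come from the original notebook.

```python
import socket
import urllib.error
import urllib.request


def fetch(url, timeout=5):
    """Hypothetical helper: return the body as bytes, or None on failure."""
    try:
        # The context manager closes the response even if read() raises.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as error:
        # HTTPError is a subclass of URLError, so HTTP failures land here too.
        print("could not open", url, "->", error.reason)
        return None


# A data: URL is handled locally by urllib, so this call needs no network.
body = fetch("data:text/plain;charset=utf-8,hello")
print(body)

# Pick a local port with nothing listening on it to provoke a connection failure.
with socket.socket() as probe:
    probe.bind(("127.0.0.1", 0))
    free_port = probe.getsockname()[1]

missing = fetch("http://127.0.0.1:%d/" % free_port)
print(missing)
```

Returning None on failure is only one possible design; re-raising the exception or returning the error object may suit other programs better.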

In [1]:
# urllib.request provides urlopen() for making HTTP requests
import urllib.request

# Open the URL; read() returns the response body as bytes
url_google = urllib.request.urlopen('https://www.google.com/')
print(url_google.read())

b'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world\'s information, including webpages, images, videos and more. Google has many special features to help you find exactly what you\'re looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script nonce="x2G4CXZ+/Rypg9YPic7v6g==">(function(){window.google={kEI:\'xXcjYbHmF7OuwbkP17yjgAQ\',kEXPI:\'0,202343,569872,1,530320,56873,954,756,4348,207,4804,2316,383,246,5,1354,4936,314,1122516,1197745,537,328985,51224,16114,28684,17572,4859,1361,284,9006,3028,2816,14765,4020,978,13228,3847,4192,6430,1142,13385,4518,2778,918,5081,885,708,1279,2212,530,149,1103,842,1981,214,4100,3514,606,2023,1777,522,14668,3227,2845,7,5599,6755,5096,7877,5036,1483,1371,553,908,2,941,6398,8926,432,3,346,1

In [2]:
print(url_google.info())

Date: Mon, 23 Aug 2021 10:26:13 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2021-08-23-10; expires=Wed, 22-Sep-2021 10:26:13 GMT; path=/; domain=.google.com; Secure
Set-Cookie: NID=222=s4DBrSZbMoKa6MX76iQW7iZ1Dx4nL-dtTsPNXrQSFqzJRlwi0mZIJYSkAhikTYYGNw-mvZ7BNkRW4EGD7bTV7VEK6jGR0r_DBjq33sepq2EIovM-qOuHXnBBDRSaSAZ5rS8jEpx4t0L-tdmv5ybtUz9WuH7HejOZhUWeCSpAqrY; expires=Tue, 22-Feb-2022 10:26:13 GMT; path=/; domain=.google.com; HttpOnly
Accept-Ranges: none
Vary: Accept-Encoding
Connection: close
Transfer-Encoding: chunked
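
The Content-Type header shown above carries a charset, which can be used to turn the bytes returned by read() into text. The following is a sketch under stated assumptions: the data: URL is a hypothetical stand-in for a real page so the example runs offline, and the fallback to UTF-8 is an illustrative choice rather than a rule.

```python
import urllib.request

# Hypothetical stand-in for a real page: a data: URL that declares its charset.
# %E9 is the ISO-8859-1 byte for the letter e with an acute accent.
url = "data:text/plain;charset=iso-8859-1,caf%E9"

with urllib.request.urlopen(url) as response:
    raw = response.read()  # bytes, exactly as received
    # headers is an email.message.Message; this reads charset from Content-Type.
    charset = response.headers.get_content_charset()
    # Assumed fallback: use UTF-8 when the server declares no charset.
    text = raw.decode(charset or "utf-8", errors="replace")

print(charset)
print(text)
```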




In [3]:
print(url_google.getheaders())

[('Date', 'Mon, 23 Aug 2021 10:26:13 GMT'), ('Expires', '-1'), ('Cache-Control', 'private, max-age=0'), ('Content-Type', 'text/html; charset=ISO-8859-1'), ('P3P', 'CP="This is not a P3P policy! See g.co/p3phelp for more info."'), ('Server', 'gws'), ('X-XSS-Protection', '0'), ('X-Frame-Options', 'SAMEORIGIN'), ('Set-Cookie', '1P_JAR=2021-08-23-10; expires=Wed, 22-Sep-2021 10:26:13 GMT; path=/; domain=.google.com; Secure'), ('Set-Cookie', 'NID=222=s4DBrSZbMoKa6MX76iQW7iZ1Dx4nL-dtTsPNXrQSFqzJRlwi0mZIJYSkAhikTYYGNw-mvZ7BNkRW4EGD7bTV7VEK6jGR0r_DBjq33sepq2EIovM-qOuHXnBBDRSaSAZ5rS8jEpx4t0L-tdmv5ybtUz9WuH7HejOZhUWeCSpAqrY; expires=Tue, 22-Feb-2022 10:26:13 GMT; path=/; domain=.google.com; HttpOnly'), ('Accept-Ranges', 'none'), ('Vary', 'Accept-Encoding'), ('Connection', 'close'), ('Transfer-Encoding', 'chunked')]
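
Because getheaders() returns a list of (name, value) pairs, a header that appears more than once, such as Set-Cookie above, is lost if the list is passed straight to dict(). The sketch below groups values by header name instead. The sample list is a shortened, illustrative stand-in for the real output, and its cookie values are placeholders.

```python
from collections import defaultdict

# Shortened, illustrative sample shaped like the getheaders() output above.
# The cookie values are placeholders, not real data.
header_pairs = [
    ("Content-Type", "text/html; charset=ISO-8859-1"),
    ("Set-Cookie", "first=1; path=/"),
    ("Set-Cookie", "second=2; path=/; HttpOnly"),
]

# dict() keeps only the last value for a repeated name.
flat = dict(header_pairs)

# Grouping into lists keeps every value; names are lowercased because
# HTTP header names are case-insensitive.
grouped = defaultdict(list)
for name, value in header_pairs:
    grouped[name.lower()].append(value)

print(flat["Set-Cookie"])
print(grouped["set-cookie"])
```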


In [4]:
print(url_google.geturl())

https://www.google.com/


In [5]:
print(url_google.getcode())

200


# urllib.parse

In [23]:
# Fetch a search results page; the query string carries the search parameters
import urllib.request

url_python = urllib.request.urlopen('https://www.python.org/search/?q=urlopen&submit=Search')
print(url_python.read())
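
The URL used above can be taken apart with urllib.parse. This short sketch uses urlparse() to split it into components and parse_qs() to decode the query string; it needs no network access, and the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlparse

parts = urlparse("https://www.python.org/search/?q=urlopen&submit=Search")

print(parts.scheme)  # https
print(parts.netloc)  # www.python.org
print(parts.path)    # /search/
print(parts.query)   # q=urlopen&submit=Search

# parse_qs() maps each parameter name to a list, since names may repeat.
params = parse_qs(parts.query)
print(params)        # {'q': ['urlopen'], 'submit': ['Search']}
```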



In [24]:
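import urllib.parse
import urllib.request

url = 'https://www.python.org/search'
values = {'q': 'basic', 'submit': 'search'}

# urlencode() builds the string 'q=basic&submit=search'
data = urllib.parse.urlencode(values)
# The request body must be bytes; passing data makes this a POST request
data = data.encode('utf-8')
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
respData = resp.read()
print(respData)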
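
Whether urllib sends a GET or a POST depends on whether the Request carries a data argument. The sketch below builds both kinds of request from the same parameters and inspects them without sending anything, so it runs offline. The names post_request and get_request are illustrative.

```python
import urllib.parse
import urllib.request

url = "https://www.python.org/search"
values = {"q": "basic", "submit": "search"}

query = urllib.parse.urlencode(values)
print(query)  # q=basic&submit=search

# With data, the parameters travel in the request body and the method is POST.
post_request = urllib.request.Request(url, data=query.encode("utf-8"))
print(post_request.get_method())  # POST

# Without data, the parameters go into the URL and the method is GET.
get_request = urllib.request.Request(url + "?" + query)
print(get_request.get_method())  # GET
print(get_request.full_url)
```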



In [9]:
print(resp)

<http.client.HTTPResponse object at 0x7fb5be727250>


In [10]:
import requests

In [16]:
r = requests.get('https://api.github.com/events')

In [17]:
r.headers

{'Server': 'GitHub.com', 'Date': 'Mon, 23 Aug 2021 10:38:40 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Cache-Control': 'public, max-age=60, s-maxage=60', 'Vary': 'Accept, Accept-Encoding, Accept, X-Requested-With', 'ETag': 'W/"ae926af20c9303f0c9695fb8877ecdc12bab6ab65f1f33ab5d0f49331780e351"', 'Last-Modified': 'Mon, 23 Aug 2021 10:33:40 GMT', 'X-Poll-Interval': '60', 'X-GitHub-Media-Type': 'github.v3; format=json', 'Link': '<https://api.github.com/events?page=2>; rel="next", <https://api.github.com/events?page=10>; rel="last"', 'Access-Control-Expose-Headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Frame-Options': 'deny', 'X-Content-Type-Options': 

In [25]:
x = urllib.request.urlopen('https://www.google.com/search?q=test')


HTTPError: ignored
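
The traceback above is truncated, so the exact status code is not shown. One common cause of such an error is a server that refuses urllib's default User-Agent header; that is an assumption here, not something the output confirms. The sketch below shows how to catch urllib.error.HTTPError and how to send a custom header. To stay runnable offline it talks to a small local server, PickyHandler, which is a hypothetical stand-in and does not reproduce any real site's behavior. The helper fetch_status() and the User-Agent value are likewise illustrative.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PickyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server that rejects urllib's default User-Agent."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent.startswith("Python-urllib"):
            self.send_error(403, "Forbidden")
            return
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the notebook output quiet


# ProxyHandler({}) ignores any proxy configured in the environment so the
# request reaches the local demo server directly.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url, headers=None):
    """Return (status, body); an HTTPError becomes (error code, None)."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener.open(request, timeout=5) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        return error.code, None


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), PickyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/search?q=test" % server.server_port

plain = fetch_status(url)
custom = fetch_status(url, headers={"User-Agent": "Mozilla/5.0"})

server.shutdown()
server.server_close()

print(plain)
print(custom)
```

Against a real site, a custom User-Agent may or may not change the response, and the site's terms of use still apply.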