# urllib & Requests

## Urllib

urllib is a package in the Python standard library that collects several modules for working with URLs. In simple words, it is an HTTP client for the Python programming language.

urllib3 is a separate third-party package, not a newer version of urllib. At the time of writing, its latest release was 1.26.2, which supports thread-safe connections, connection pooling, client-side SSL/TLS verification, multipart encoding, and gzip and Brotli encoding. It brings many critical features that are missing from the Python standard library.
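As a minimal sketch of the pooling and retry features mentioned above, a single `PoolManager` can be configured once and reused for every request. The pool sizes, retry count, and timeout values below are illustrative assumptions, not recommended settings.

```python
import urllib3

# Illustrative values (assumptions): tune them for your own workload.
retries = urllib3.Retry(total=3, backoff_factor=0.5)
timeout = urllib3.Timeout(connect=2.0, read=5.0)

# One PoolManager caches a connection pool per host and applies the
# retry and timeout settings to every request made through it.
http = urllib3.PoolManager(num_pools=10, maxsize=4, retries=retries, timeout=timeout)

# Requests made through `http` reuse pooled connections, for example:
# r = http.request('GET', 'http://httpbin.org/robots.txt')
```

Creating the manager does not open any connection; connections are opened lazily on the first request to each host.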

urllib3 is one of the most downloaded packages on PyPI, and many web scraping scripts rely on it, either directly or through libraries built on top of it such as Requests. It is available under the MIT license.

With urllib.request we can open and read URLs.

urllib.error defines the exceptions raised by urllib.request.
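The sketch below shows how these two modules usually work together: urllib.request opens the URL and urllib.error supplies the exceptions to catch. The helper name `fetch` and the 10-second default timeout are assumptions made for this example.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Hypothetical helper: return the response body as bytes, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        print(f"HTTP error {err.code}: {err.reason}")
    except urllib.error.URLError as err:
        # No usable response: DNS failure, refused connection, missing file, ...
        print(f"Could not open {url}: {err.reason}")
    return None
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first; otherwise the more general handler would swallow it.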

urllib.parse is used for parsing URLs.
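A short sketch of what parsing looks like in practice, reusing the search values from the scraping example in this notebook. It builds a query string, splits the resulting URL into components, and resolves a relative link.

```python
from urllib.parse import urlencode, urlsplit, parse_qs, urljoin

# Build a query string from a dict; spaces are encoded as '+'.
query = urlencode({'s': 'Web Scraping', 'submit': 'search'})
url = 'https://analyticsindiamag.com/?' + query

# Split the URL back into its components and decode the query.
parts = urlsplit(url)
params = parse_qs(parts.query)

print(parts.netloc)   # analyticsindiamag.com
print(params['s'])    # ['Web Scraping']

# Resolve a relative link against the page it was found on.
print(urljoin(url, '/web-scraping-frameworks/'))
# https://analyticsindiamag.com/web-scraping-frameworks/
```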

urllib.robotparser is used for parsing [robots.txt](https://developers.google.com/search/docs/advanced/robots/intro) files.
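As a minimal sketch, the parser can be fed robots.txt rules directly from memory, which makes its behavior easy to inspect without a network request. The rules, the `my-scraper` user agent, and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from memory (no network access).
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch('my-scraper', 'https://example.com/index.html'))    # True
print(parser.can_fetch('my-scraper', 'https://example.com/private/data'))  # False
```

Against a live site, `set_url()` followed by `read()` downloads the real robots.txt before `can_fetch()` is called.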

## Requests

Requests is an open-source Python library that makes HTTP requests more human-friendly and simple to use. It is developed by Kenneth Reitz, Cory Benfield, Ian Stapleton Cordasco, and Nate Prewitt, and was first released in February 2011.

Requests is written in Python and released under the Apache 2.0 license.

To read more about it, please refer to [this article](https://analyticsindiamag.com/web-scraping-frameworks/).

# Code Implementation

## Installation

In [None]:
!python -m pip install pip --upgrade --user -q
!python -m pip install urllib3 requests beautifulsoup4 --user -q

In [None]:
# Restart the kernel so the newly installed packages can be imported
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

## Quickstart

In [None]:
import urllib3

http = urllib3.PoolManager()
r = http.request('GET', 'http://httpbin.org/robots.txt')
r.status

In [None]:
r.data

Let’s scrape a website using urllib and regular expressions.

In [None]:
#1 libraries needed
import urllib.request
import urllib.parse
import re

#2 target URL and search form values
url = 'https://analyticsindiamag.com/'
values = {'s': 'Web Scraping',
          'submit': 'search'}

# request headers that make the client look like a regular browser
header = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
          'AppleWebKit/537.11 (KHTML, like Gecko) '
          'Chrome/23.0.1271.64 Safari/537.11',
          'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
          'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
          'Accept-Encoding': 'none',
          'Accept-Language': 'en-US,en;q=0.8',
          'Connection': 'keep-alive'}

#3 encode the form values and build the request
data = urllib.parse.urlencode(values)  # 's=Web+Scraping&submit=search'
data = data.encode('utf-8')            # the request body must be bytes
# passing data makes urllib send this as a POST request
req = urllib.request.Request(url, data, headers=header)

In [None]:
resp = urllib.request.urlopen(req) 

In [None]:
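respData = resp.read()

#4 extract using regular expressions
# decode the bytes first; str(respData) would return the b'...' repr with escaped newlines
html = respData.decode('utf-8', errors='replace')
# re.DOTALL lets '.' match newlines, so paragraphs spanning several lines are found too
document = re.findall(r'<p>(.*?)</p>', html, flags=re.DOTALL)

for line in document:
    print(line)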
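To see how the `<p>(.*?)</p>` pattern from step 4 behaves without making a request, it can be run against a small inline HTML string. The sample markup below is made up for illustration.

```python
import re

# Made-up HTML snippet (an assumption for illustration, not real site content).
sample_html = """
<div>
  <p>First paragraph.</p>
  <p>Second paragraph
  spans two lines.</p>
</div>
"""

# (.*?) is non-greedy, so each match stops at the nearest </p>.
# re.DOTALL lets '.' match the newline inside the second paragraph.
paragraphs = re.findall(r'<p>(.*?)</p>', sample_html, flags=re.DOTALL)

for text in paragraphs:
    print(' '.join(text.split()))  # collapse internal whitespace
```

The pattern only matches bare `<p>` tags. Paragraphs that carry attributes, such as `<p class="intro">`, are skipped, which is one reason an HTML parser like BeautifulSoup is usually the more robust choice.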

## Requests

In [None]:
#1 importing modules
import requests
from bs4 import BeautifulSoup

#2 using requests.get()
res = requests.get('https://analyticsindiamag.com/', timeout=10)
res.raise_for_status()  # stop early on a 4xx/5xx response

#3 BeautifulSoup for extracting only the relevant data
soup = BeautifulSoup(res.text, 'html.parser')
article_block = soup.find_all('div', class_='post-title')
for block in article_block:
    span = block.find('span')
    if span is not None:  # skip blocks that have no <span> title
        print(span.get_text())

## Use case of Requests other than web scraping

We can also use the requests module to call a web API. In this case, we send a POST request to this web API: https://loan5.herokuapp.com/api

This API predicts loan approval. Given attributes such as gender, credit history, and marital status, it returns 1 (approved) or 0 (rejected).

In [None]:
#1
import requests
url = 'https://loan5.herokuapp.com/api'

#2 sample data
data = {'Gender': 1, 'Married': 1, 'Dependents': 2, 'Education': 0,
        'Self_Employed': 1, 'Credit_History': 0, 'Property_Area': 1, 'Income': 1}

#3 send the request with the data to the web API and print the answer;
# json= serializes the dict to JSON and sets the Content-Type header
send_req = requests.post(url, json=data, timeout=10)
send_req.raise_for_status()  # stop early on a 4xx/5xx response
print(send_req.json())
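For comparison, the same POST can be built with urllib.request alone. This is a sketch rather than a drop-in replacement: the variable name `payload` is introduced here, and setting the `Content-Type` header to JSON is an assumption about what the API accepts.

```python
import json
import urllib.request

url = 'https://loan5.herokuapp.com/api'
payload = {'Gender': 1, 'Married': 1, 'Dependents': 2, 'Education': 0,
           'Self_Employed': 1, 'Credit_History': 0, 'Property_Area': 1, 'Income': 1}

# urllib expects the body as bytes and does not add a JSON content type itself.
body = json.dumps(payload).encode('utf-8')
req = urllib.request.Request(
    url,
    data=body,
    headers={'Content-Type': 'application/json'},  # assumption: the API accepts JSON
    method='POST',
)

print(req.get_method())                # POST
print(req.get_header('Content-type'))  # application/json

# Sending is left commented out because it needs the API to be reachable:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(json.loads(resp.read()))
```

The extra encoding and header handling is what Requests takes care of automatically, which is why it is often preferred for API calls.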