# <p style="text-align:center;font-size:70px;background-color:#0ad61b;color:white;font-style:italic;">Python for web</p>

![](http://www.blog.skytopper.com/wp-content/uploads/2015/06/Global-computer-network.jpg)

This bootcamp is all about interacting with the **web** using the Python programming language!

In this bootcamp, we will learn:

- to work with web APIs
- to download content from the web
- web scraping
- web automation

using simple Python scripts!

![](https://i.amz.mshcdn.com/mqczOBQlR2uS7uALqB4fkKylDx0=/fit-in/1200x9600/https%3A%2F%2Fblueprint-api-production.s3.amazonaws.com%2Fuploads%2Fcard%2Fimage%2F193985%2Fnewhere.jpg)

# 1. Working with web APIs

- **What is an API?**<br>
    An API (Application Programming Interface) is a set of routines, protocols, and tools for building software applications. An API specifies how software components should interact. 
![](https://www.retriever.nl/wp-content/uploads/2016/11/api-321x250.png)
-------------

- **What is a web API?**<br>
    A web API is an API exposed over HTTP: a service that a broad range of clients, including browsers, mobile phones, and tablets, can consume by sending HTTP requests.
![](http://dselva.co.in/blog/wp-content/uploads/2017/09/Web-APIs.png)
-----------------
- **Some examples of public web APIs:**
    - [Facebook Graph API](https://developers.facebook.com/docs/graph-api)
    - [Twitter API](https://dev.twitter.com/rest/public)
    - [Google API explorer](https://developers.google.com/apis-explorer/#p/)
--------------

- **What is REST?**<br>
    REST (Representational State Transfer) is an architectural style for web services in which requesting systems access and manipulate web resources using a uniform, predefined set of **stateless operations**.
    
    >In computing, a stateless protocol is a communications protocol in which no information is retained by either sender or receiver. The sender transmits a packet to the receiver and does not expect an acknowledgment of receipt. There is nothing saved that has to be remembered by the next transaction. The server must be able to completely understand the client request without using any server context or server session state. 
    
    Advantages of REST:
    - Because every request is stateless (no sessions are involved), it can be routed to any instance of the web service, so the service can scale out to handle changes in load.
    - Binding to a service through its API is largely a matter of constructing the right URL (resource path plus parameters) and controlling how the server decodes it.

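To make statelessness concrete, here is a toy sketch (not a real web framework) of a request handler that decides its answer using only what the request itself carries: the method, the path, and an `Authorization` header. The resource table, the token, and the `handle` function are all hypothetical. Because the function keeps no per-client session, any copy of it, on any server, returns the same answer for the same request.

```python
# Hypothetical data store: the server keeps resources, but no per-client session state.
RESOURCES = {"/users/1": {"id": 1, "name": "Ada"}}
VALID_TOKENS = {"Bearer secret-token"}  # hypothetical credential


def handle(method, path, headers):
    """Return (status_code, body) using only the information in the request."""
    # Every request must re-send its credentials; nothing is remembered between calls.
    if headers.get("Authorization") not in VALID_TOKENS:
        return 401, None
    if method == "GET" and path in RESOURCES:
        return 200, RESOURCES[path]
    return 404, None


# The same request always gets the same answer, whichever server instance runs it.
print(handle("GET", "/users/1", {"Authorization": "Bearer secret-token"}))  # (200, {'id': 1, 'name': 'Ada'})
print(handle("GET", "/users/1", {}))                                        # (401, None)
```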
-----------------
- **Types of HTTP requests**
    - GET
    - POST
    - DELETE
    - PUT
    - PATCH, etc.
    
![](http://lotsofthing.com/wp-content/uploads/2017/11/rest-api-1.jpg)
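The methods listed above are the verb at the start of each HTTP request. As a small offline sketch, Python's standard-library `urllib.request.Request` records the method without sending anything (the URL below is a placeholder). When no method is given, urllib picks GET for a request without a body and POST for one with a body.

```python
from urllib.request import Request

URL = "https://api.example.com/items/1"  # placeholder URL; nothing is sent

# Build (but do not send) one request per method.
for method in ["GET", "POST", "PUT", "PATCH", "DELETE"]:
    req = Request(URL, method=method)
    print(req.get_method(), req.full_url)

# Without an explicit method, urllib infers it from whether a body is attached.
print(Request(URL).get_method())                    # GET
print(Request(URL, data=b"name=pen").get_method())  # POST
```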

### HTTP for humans: [requests](http://docs.python-requests.org/en/master/)

<img src="http://docs.python-requests.org/en/master/_static/requests-sidebar.png"  height=200 width=200>


- Requests is one of the most downloaded Python packages of all time, pulling in over 7,000,000 downloads every month. All the cool kids are doing it!

- Recreational use of other HTTP libraries may result in dangerous side-effects, including: security vulnerabilities, verbose code, reinventing the wheel, constantly reading documentation, depression, headaches, or even death. Requests is the only Non-GMO HTTP library for Python, safe for human consumption.

- Python HTTP: When in doubt, or when not in doubt, use Requests. Beautiful, simple, Pythonic.

***Everybody loves it!***

#### Installation

```
pip install requests
```

![](http://graph.facebook.com/100002023231822/picture?type=large)

## GET request

### Example 1

http://graph.facebook.com/7/picture?type=large

![](http://graph.facebook.com/10/picture?type=small)

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTJ3ELNuC_coeH9tvLn62fsTMoe-vMVQrsfTrLUOIhsUI69i5QIyg)

![](http://i.imgur.com/gRvt4lV.png)

In [1]:
import requests

In [2]:
url = "http://graph.facebook.com/4/picture?type=large"

In [3]:
response = requests.get(url)

In [23]:
with open("markie.jpg", "wb") as file:
    file.write(response.content)

In [24]:
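import os

os.makedirs("data", exist_ok=True)  # the target folder must exist before writing into it
url = "http://graph.facebook.com/{}/picture?type=large"
for i in range(4, 20):
    response = requests.get(url.format(i))
    response.raise_for_status()  # stop on HTTP errors instead of saving an error page as .jpg
    with open("data/{}.jpg".format(i), "wb") as file:
        file.write(response.content)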

### Example 2

[Google maps geocoding API](https://developers.google.com/maps/documentation/geocoding/intro)

In [3]:
url = "https://maps.googleapis.com/maps/api/geocode/json"

In [4]:
parameters = {
    "address": "coding blocks noida"
}

In [5]:
response = requests.get(url, params = parameters)

In [6]:
response.url

'https://maps.googleapis.com/maps/api/geocode/json?address=coding+blocks+noida'
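You can check how `params` will be encoded before making any network call: `requests.Request(...).prepare()` builds the final URL offline. This sketch also adds a `key` parameter, because Google's current Geocoding API documentation requires an API key on every request; `YOUR_API_KEY` is a placeholder.

```python
import requests

url = "https://maps.googleapis.com/maps/api/geocode/json"
parameters = {
    "address": "coding blocks noida",
    "key": "YOUR_API_KEY",  # placeholder: substitute your own key
}

# Build the request without sending it, then inspect the encoded URL.
prepared = requests.Request("GET", url, params=parameters).prepare()
print(prepared.url)
# https://maps.googleapis.com/maps/api/geocode/json?address=coding+blocks+noida&key=YOUR_API_KEY
```

Sending it is then just `requests.get(url, params=parameters)`.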

In [7]:
response.status_code

200

In [8]:
response.content

b'{\n   "results" : [\n      {\n         "address_components" : [\n            {\n               "long_name" : "Knowledge Park I",\n               "short_name" : "Knowledge Park I",\n               "types" : [ "political", "sublocality", "sublocality_level_1" ]\n            },\n            {\n               "long_name" : "Greater Noida",\n               "short_name" : "Greater Noida",\n               "types" : [ "locality", "political" ]\n            },\n            {\n               "long_name" : "Gautam Buddh Nagar",\n               "short_name" : "Gautam Buddh Nagar",\n               "types" : [ "administrative_area_level_2", "political" ]\n            },\n            {\n               "long_name" : "Uttar Pradesh",\n               "short_name" : "UP",\n               "types" : [ "administrative_area_level_1", "political" ]\n            },\n            {\n               "long_name" : "India",\n               "short_name" : "IN",\n               "types" : [ "country", "political" ]\n

In [41]:
import json

In [42]:
d = json.loads(response.content.decode("UTF-8"))  # equivalently: d = response.json()

In [45]:
d['results'][1]

{'address_components': [{'long_name': 'Noida',
   'short_name': 'Noida',
   'types': ['locality', 'political']},
  {'long_name': 'Uttar Pradesh',
   'short_name': 'UP',
   'types': ['administrative_area_level_1', 'political']},
  {'long_name': 'India',
   'short_name': 'IN',
   'types': ['country', 'political']},
  {'long_name': '201301', 'short_name': '201301', 'types': ['postal_code']}],
 'formatted_address': 'A-73, Sector 2, Near Sector 15 Metro Station Noida, Noida, Uttar Pradesh 201301, India',
 'geometry': {'location': {'lat': 28.5852881, 'lng': 77.3127031},
  'location_type': 'GEOMETRIC_CENTER',
  'viewport': {'northeast': {'lat': 28.5866370802915, 'lng': 77.3140520802915},
   'southwest': {'lat': 28.5839391197085, 'lng': 77.3113541197085}}},
 'place_id': 'ChIJzRXJljHlDDkRDK62vIMPJV4',
 'plus_code': {'compound_code': 'H8P7+43 Noida, Uttar Pradesh, India',
  'global_code': '7JWVH8P7+43'},
 'types': ['establishment', 'point_of_interest']}

In [46]:
for result in d['results']:
    print(result['formatted_address'])

Knowledge Park I, Greater Noida, Uttar Pradesh 201310, India
A-73, Sector 2, Near Sector 15 Metro Station Noida, Noida, Uttar Pradesh 201301, India
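Geocoding responses also carry a top-level `status` field (for example `"OK"` or `"ZERO_RESULTS"`), so it is worth checking it before reading `results`. The sketch below works on a trimmed, hand-written sample shaped like the response above rather than a live call; the first result's `geometry` is left out on purpose to show the fallback. `addresses_and_locations` is a hypothetical helper, and with a real response you could pass `response.json()` to it instead.

```python
import json

# Trimmed, hand-written sample in the shape of a Geocoding API response.
raw = """
{
  "status": "OK",
  "results": [
    {"formatted_address": "Knowledge Park I, Greater Noida, Uttar Pradesh 201310, India"},
    {"formatted_address": "A-73, Sector 2, Near Sector 15 Metro Station Noida, Noida, Uttar Pradesh 201301, India",
     "geometry": {"location": {"lat": 28.5852881, "lng": 77.3127031}}}
  ]
}
"""


def addresses_and_locations(data):
    """Return (address, (lat, lng) or None) per result; raise if the status is not OK."""
    if data.get("status") != "OK":
        raise ValueError("Geocoding failed with status: {}".format(data.get("status")))
    rows = []
    for result in data.get("results", []):
        # .get() chains avoid a KeyError when a field is missing.
        location = result.get("geometry", {}).get("location")
        coords = (location["lat"], location["lng"]) if location else None
        rows.append((result["formatted_address"], coords))
    return rows


for address, coords in addresses_and_locations(json.loads(raw)):
    print(address, coords)
```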


## POST request

![](https://indianpythonista.files.wordpress.com/2016/12/iservice_post_get.png?w=809)

### Example 1

[Pastebin API](https://pastebin.com/api)

In [48]:
key = "YOUR_PASTEBIN_DEV_KEY"  # placeholder: use your own api_dev_key and keep it out of shared notebooks
url = "https://pastebin.com/api/api_post.php"

In [52]:
with open("library.py", "r") as file:
    content = file.read()

In [53]:
data = {
    "api_dev_key": key,
    "api_option": "paste",
    "api_paste_code": content,
    "api_paste_format": "python"
}

In [54]:
# Pastebin reads these fields from the form-encoded POST body, so no query parameters are needed
response = requests.post(url, data = data)

In [55]:
response.content

b'https://pastebin.com/73DpBjUs'
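Passing a dictionary as `data=` makes `requests` send the fields form-encoded (`application/x-www-form-urlencoded`). Preparing the request offline shows exactly what would go over the wire; the key and the paste text below are placeholders.

```python
import requests

url = "https://pastebin.com/api/api_post.php"
data = {
    "api_dev_key": "YOUR_PASTEBIN_DEV_KEY",  # placeholder
    "api_option": "paste",
    "api_paste_code": "print('hello')",      # placeholder paste content
    "api_paste_format": "python",
}

# Build the POST request without sending it.
prepared = requests.Request("POST", url, data=data).prepare()
print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.body)                     # the URL-encoded form fields
```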

# 2. Downloading files

![](https://pics.onsizzle.com/downloading-98-downloading-99-downloading-failed-11367153.png)

Downloading large files in chunks!

http://www.greenteapress.com/thinkpython/thinkpython.pdf

```python
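import requests

url = "http://www.greenteapress.com/thinkpython/thinkpython.pdf"
chunk_size = 256  # bytes per chunk

# stream=True defers downloading the body until iter_content() is consumed
r = requests.get(url, stream=True)
r.raise_for_status()  # fail early on 4xx/5xx instead of saving an error page

with open("python.pdf", "wb") as f:
    for chunk in r.iter_content(chunk_size=chunk_size):
        f.write(chunk)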
```
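The loop above never holds the whole file in memory: it reads a fixed-size chunk, writes it, and repeats. The same pattern can be tried offline with in-memory buffers standing in for the HTTP response and the output file; `copy_in_chunks` is a hypothetical helper, not part of `requests`.

```python
import io


def copy_in_chunks(source, destination, chunk_size=256):
    """Copy a binary stream chunk by chunk and return how many chunks were written."""
    chunks = 0
    # iter(callable, sentinel) keeps calling source.read until it returns b"" (end of stream).
    for chunk in iter(lambda: source.read(chunk_size), b""):
        destination.write(chunk)
        chunks += 1
    return chunks


src = io.BytesIO(b"x" * 1000)  # stand-in for a 1000-byte download
dst = io.BytesIO()             # stand-in for the output file
print(copy_in_chunks(src, dst, chunk_size=256))  # 4 (256 + 256 + 256 + 232 bytes)
print(dst.getvalue() == b"x" * 1000)            # True
```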

In [56]:
url = "http://www.greenteapress.com/thinkpython/thinkpython.pdf"

In [57]:
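response = requests.get(url)
response.raise_for_status()

# Download first, then open the file, so a failed request doesn't leave an empty python.pdf behind
with open("python.pdf", "wb") as file:
    file.write(response.content)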

In [58]:
chunk_size = 256

In [70]:
url = "http://www.greenteapress.com/thinkpython/thinkpython.pdf"

In [83]:
response = requests.get(url, stream = True)

In [84]:
response.headers

{'Date': 'Sat, 15 Sep 2018 07:09:00 GMT', 'Server': 'Apache', 'Last-Modified': 'Mon, 22 Feb 2016 17:23:24 GMT', 'ETag': '"cbc7d-52c5f17101b00"', 'Accept-Ranges': 'bytes', 'Content-Length': '834685', 'Content-Type': 'application/pdf', 'Connection': 'keep-alive'}

In [73]:
from math import ceil

In [85]:
total_iterations = ceil(int(response.headers['Content-Length']) / chunk_size)

In [86]:
total_iterations

3261
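The arithmetic behind `3261`: 834685 bytes ÷ 256 bytes per chunk = 3260.49, and the final partial chunk still counts, hence the ceiling. Some servers omit `Content-Length` (for example with chunked transfer encoding), so a sketch that falls back to `None`, which `tqdm` accepts as an unknown total, may be safer. `expected_chunks` is a hypothetical helper, and its result is an estimate, since compressed responses can yield differently sized chunks.

```python
def expected_chunks(headers, chunk_size):
    """Estimated number of chunks iter_content will yield, or None if the size is unknown."""
    length = headers.get("Content-Length")
    if length is None:
        return None
    # Integer ceiling division: a final partial chunk still counts as one chunk.
    return -(-int(length) // chunk_size)


print(expected_chunks({"Content-Length": "834685"}, 256))  # 3261
print(expected_chunks({}, 256))                             # None
```

With a real response, `expected_chunks(response.headers, chunk_size)` can be passed straight to `tqdm(..., total=...)`.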

In [76]:
with open("python.pdf", "wb") as file:
    for chunk in response.iter_content(chunk_size = chunk_size):
        file.write(chunk)

In [79]:
from tqdm import tqdm  # progress bars; install with: pip install tqdm

In [81]:
for i in tqdm(range(10000000), total = 10000000):
    pass

100%|██████████| 10000000/10000000 [00:03<00:00, 2959135.01it/s]


In [88]:
url = "https://www.iso.org/files/live/sites/isoorg/files/archive/pdf/en/annual_report_2009.pdf"

response = requests.get(url, stream = True)

total_iterations = ceil(int(response.headers['Content-Length']) / chunk_size)

with open("big_file.pdf", "wb") as file:
    iterable = response.iter_content(chunk_size = chunk_size)
    for chunk in tqdm(iterable, total = total_iterations):
        file.write(chunk)

100%|██████████| 42040/42040 [00:27<00:00, 1527.75it/s]


# 3. Web scraping

![](https://image.slidesharecdn.com/scrapingtotherescue-160713133749/95/getting-started-with-web-scraping-in-python-9-638.jpg?cb=1468417631)


## [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

>Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

### Installation

```
pip install beautifulsoup4
```

**Bonus:** install `html5lib`, a lenient parser that handles messy HTML the way a web browser does:
```
pip install html5lib
```
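Before pointing Beautiful Soup at a live page, it can help to try it on a small, fixed snippet. The HTML below is a hand-trimmed copy of one article's markup from the quotes page scraped later (the live markup may have changed since). `"html.parser"` is Python's built-in parser, so nothing beyond `beautifulsoup4` is needed.

```python
from bs4 import BeautifulSoup

html = """
<article>
  <div class="portfolio-image">
    <a href="/inspirational-quotes/4108-a-will-finds-a-way">
      <img alt="A will finds a way. #&lt;Author:0x007f335d535420&gt;" class="hover"
           src="https://quotes.values.com/quote_artwork/4108/medium/20180914_friday_quote.jpg?1536354551"/>
    </a>
  </div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

for article in soup.find_all("article"):
    img = article.img          # first <img> anywhere inside the article
    print(img["alt"])          # entities such as &lt; are decoded to <
    print(img["src"])
    print(article.a["href"])   # first <a> inside the article
```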

https://www.values.com/inspirational-quotes

In [11]:
from bs4 import BeautifulSoup
import requests

In [63]:
url = "https://www.passiton.com/inspirational-quotes"

response = requests.get(url)

In [64]:
soup = BeautifulSoup(response.content, "html.parser")  # name the parser explicitly ("html5lib" also works if installed)

In [65]:
article_tags = soup.find_all('article')

In [71]:
second_article = article_tags[1]  # index 1 is the second <article>; article_tags[0] is the first

In [72]:
img_tag = second_article.img

In [73]:
img_tag

<img alt="Obstacles don’t have to stop you. If you run into a wall, don’t turn around and give up. Figure out how to climb it, go through it or work around it. #&lt;Author:0x007f33615c1ea8&gt;" class="hover" src="https://quotes.values.com/quote_artwork/7058/medium/20180913_thursday_quote.jpg?1536354522"/>

In [74]:
img_tag.attrs

{'alt': 'Obstacles don’t have to stop you. If you run into a wall, don’t turn around and give up. Figure out how to climb it, go through it or work around it. #<Author:0x007f33615c1ea8>',
 'class': ['hover'],
 'src': 'https://quotes.values.com/quote_artwork/7058/medium/20180913_thursday_quote.jpg?1536354522'}

In [50]:
url = "http://www.google.com"
response = requests.get(url)
print(type(response.content))  # raw bytes; response.text (or .content.decode()) gives a str

<class 'bytes'>


In [53]:
with open("quote.jpg", "wb") as file:
    response = requests.get(img_tag.attrs['src'])  # download the quote image found above
    file.write(response.content)

In [55]:
l = [(1, 2), (2, 3), (3, 4)]

for i in enumerate(l):
    print(i)

(0, (1, 2))
(1, (2, 3))
(2, (3, 4))


In [None]:
session = requests.Session()

resp = session.get(url)
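A `requests.Session` reuses the underlying connection and carries cookies and default headers across requests, which helps when scraping several pages from one site. This offline sketch sets a default `User-Agent` (the value is a made-up example) and shows it being merged into a prepared request without sending anything.

```python
import requests

session = requests.Session()
# Default headers set on the session are added to every request it sends.
session.headers.update({"User-Agent": "quotes-scraper/0.1 (example)"})

request = requests.Request("GET", "https://www.passiton.com/inspirational-quotes")
prepared = session.prepare_request(request)  # merges session headers and cookies; no network call
print(prepared.headers["User-Agent"])        # quotes-scraper/0.1 (example)

# To actually fetch the page: session.send(prepared), or simply session.get(url)
session.close()
```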

In [56]:
url = "https://www.passiton.com/inspirational-quotes"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

article_tags = soup.find_all('article')

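import os

os.makedirs("quotes", exist_ok=True)  # create the output folder if it is missing

for i, article in enumerate(article_tags):
    if article.img is None:  # skip articles that contain no image
        continue
    article_img_url = article.img.attrs['src']

    with open("quotes/{}.jpg".format(i), "wb") as file:
        r = requests.get(article_img_url)
        file.write(r.content)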

In [80]:
url = "https://www.passiton.com/inspirational-quotes"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

article_tags = soup.find_all('article')

quotes = []

for i, article in enumerate(article_tags):
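    # the alt text ends with a "#<Author:...>" marker; keep the quote and trim trailing whitespace
    article_text = article.img.attrs['alt'].split("#")[0].strip()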
    article_img_url = article.img.attrs['src']
    
    quotes.append((i, article_text, article_img_url))
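
`split("#")[0]` keeps only the text before the *first* `#`, which would also cut a quote that itself contains `#`. A slightly stricter sketch removes only a trailing `#<Author:0x...>` marker. The marker format is inferred from the `alt` values shown above, not from any documented format, and `clean_alt` is a hypothetical helper; the second sample quote is made up.

```python
import re

# Matches a trailing "#<Author:0x...>" marker, as seen in the alt texts above (assumed format).
AUTHOR_MARKER = re.compile(r"\s*#<Author:0x[0-9a-fA-F]+>\s*$")


def clean_alt(alt):
    """Strip the trailing author marker and surrounding whitespace from an alt text."""
    return AUTHOR_MARKER.sub("", alt).strip()


print(clean_alt("A will finds a way. #<Author:0x007f335d535420>"))  # A will finds a way.
print(clean_alt("Be #1 at trying. #<Author:0x00ab>"))              # Be #1 at trying. (made-up quote)
```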


In [86]:
import csv
with open("quotes.csv", "w", newline="", encoding="utf-8") as file:  # newline="" avoids blank rows on Windows
    writer = csv.writer(file)
    writer.writerow(["S. no.", "quote", "image"])
    writer.writerows(quotes)
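`csv.writer` quotes any field that contains a comma or a quote character, so quotes with punctuation survive a round trip. The sketch below writes to an in-memory `io.StringIO` instead of `quotes.csv` so it can run anywhere; the rows are made-up examples in the same `(index, quote, image URL)` shape.

```python
import csv
import io

# Made-up sample rows in the same shape as `quotes`.
rows = [
    (0, "Dream big, work hard.", "https://example.com/0.jpg"),
    (1, 'He said "go".', "https://example.com/1.jpg"),
]

buffer = io.StringIO()  # in-memory stand-in for quotes.csv
writer = csv.writer(buffer)
writer.writerow(["S. no.", "quote", "image"])
writer.writerows(rows)
print(buffer.getvalue())

# Read it back: csv.reader undoes the quoting, and every field comes back as a str.
buffer.seek(0)
header, *data = csv.reader(buffer)
print(data[0])  # ['0', 'Dream big, work hard.', 'https://example.com/0.jpg']
```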

In [85]:
import csv
with open("temp.csv", "w", newline="") as file:
    writer = csv.writer(file)
    
    writer.writerows([(1, 2), [2, 3], [3, 4]])

In [26]:
a.div

<div class="portfolio-image">
        <a href="/inspirational-quotes/4108-a-will-finds-a-way"><img alt="A will finds a way. #&lt;Author:0x007f335d535420&gt;" class="hover" src="https://quotes.values.com/quote_artwork/4108/medium/20180914_friday_quote.jpg?1536354551"/></a>
    </div>

![](http://www.entropywebscraping.com/wp-content/uploads/2017/02/Screenshot-from-2017-02-01-10-23-00.png)

# 4. Web automation
 
![](https://images.contentful.com/qs7jgwzogkzr/6HeUbprAsMYek2Keqi0WYo/d8ad7cf2f15e706ead76e00a53859cc7/testing-automation-alternatives.jpg)

**Task:** Automatically submit the code for a [problem](https://www.codechef.com/problems/TEST) on [codechef](https://www.codechef.com/).

### [Selenium](http://selenium-python.readthedocs.io/) : Web automation and testing

![](https://udemy-images.udemy.com/course/750x422/482754_7146_4.jpg)


#### Installation

- To install the Python bindings for Selenium:
    ```
    pip install selenium
    ```

- To install a webdriver:

    http://selenium-python.readthedocs.io/installation.html#drivers

    [How to put webdriver in PATH?](https://stackoverflow.com/questions/40208051/selenium-using-python-geckodriver-executable-needs-to-be-in-path)

    Selenium 4.6 and later can also locate or download a matching driver automatically through Selenium Manager.

#### To start a browser session
```python
from selenium import webdriver
browser = webdriver.Chrome()
```

#### To open a webpage
```python
browser.get('https://www.codechef.com')
```

#### To select an element by its id
```python
from selenium.webdriver.common.by import By
element = browser.find_element(By.ID, "<id>")  # "<id>" stands for the element's id attribute
```

#### Input value in element
```python
element.send_keys("text to type")  # simulate typing into the element
```

#### Click on an element
```python
element.click()
```

In [100]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

In [101]:
# Selenium 4 takes the driver path through a Service object
chrome = webdriver.Chrome(service=Service('./chromedriver'))

In [102]:
chrome.get("http://www.codechef.com")

In [103]:
element = chrome.find_element(By.ID, "edit-name")

In [104]:
element.send_keys("jatin_katyal")

In [105]:
element = chrome.find_element(By.ID, "edit-pass")

In [106]:
from getpass import getpass
password = getpass()

········


In [107]:
element.send_keys(password)

In [108]:
element = chrome.find_element(By.ID, "edit-submit")

In [109]:
element.click()

In [133]:
chrome.get("https://www.codechef.com/submit/TEST")

In [138]:
element = chrome.find_element(By.ID, "edit_area_toggle_checkbox_edit-program")

In [139]:
element.click()

In [144]:
element = chrome.find_element(By.ID, "edit-program")

In [141]:
with open("code.c", "r") as file:
    element.send_keys(file.read())

In [142]:
element = chrome.find_element(By.ID, "edit-submit")

In [143]:
element.click()

In [146]:
element = chrome.find_element(by = "data-testid", value = "royal_login_button")

WebDriverException: Message: unknown error: Unsupported locator strategy: data-testid
  (Session info: chrome=68.0.3440.106)
  (Driver info: chromedriver=2.42.591059 (a3d9684d10d61aa0c45f6723b327283be1ebaad8),platform=Mac OS X 10.13.6 x86_64)
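
The error is expected: `data-testid` is an HTML attribute, not one of Selenium's locator strategies (such as `id`, `name`, `tag name`, `css selector` or `xpath`). An element with a given attribute can instead be found with a CSS attribute selector, e.g. `chrome.find_element(By.CSS_SELECTOR, '[data-testid="royal_login_button"]')` after `from selenium.webdriver.common.by import By`. The helper below, a hypothetical name, only builds that selector string (escaping backslashes and double quotes), so it runs without a browser.

```python
def attribute_selector(attribute, value):
    """Build a CSS attribute selector such as [data-testid="x"] for find_element(By.CSS_SELECTOR, ...)."""
    # Inside a double-quoted CSS string, backslashes and double quotes must be escaped.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '[{}="{}"]'.format(attribute, escaped)


print(attribute_selector("data-testid", "royal_login_button"))
# [data-testid="royal_login_button"]
```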


In [167]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chrome = webdriver.Chrome(service=Service("./chromedriver"))

In [168]:
chrome.get("https://www.facebook.com/")

In [169]:
element = chrome.find_element(By.ID, "email")

In [170]:
element.send_keys("your_email@example.com")  # placeholder: use your own login

In [171]:
element = chrome.find_element(By.ID, "pass")

In [172]:
from getpass import getpass
password = getpass()
element.send_keys(password)

········


In [173]:
element = chrome.find_element(By.ID, "loginbutton")

In [174]:
element = element.find_element(By.TAG_NAME, 'input')

In [175]:
element.click()

In [166]:
chrome.get("https://www.facebook.com/?ref=tn_tnmn")

![](https://i.imgflip.com/poxkz.jpg)

## Resources:

- Python packages:

    - [requests](http://docs.python-requests.org/en/master/)

    - [bs4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
    
    - [html5lib](https://html5lib.readthedocs.io/en/latest/)

    - [selenium](http://selenium-python.readthedocs.io/)
 