# <p style="text-align:center;font-size:70px;background-color:#0ad61b;color:white;font-style:italic;">Python for web</p>

![](http://www.blog.skytopper.com/wp-content/uploads/2015/06/Global-computer-network.jpg)

This bootcamp is all about interacting with the **web** using the Python programming language!

In this bootcamp, we will use simple Python scripts to:

- work with web APIs
- download content from the web
- scrape web pages
- automate a web browser

![](https://i.amz.mshcdn.com/mqczOBQlR2uS7uALqB4fkKylDx0=/fit-in/1200x9600/https%3A%2F%2Fblueprint-api-production.s3.amazonaws.com%2Fuploads%2Fcard%2Fimage%2F193985%2Fnewhere.jpg)

# 1. Working with web APIs

- **What is an API?**<br>
    An API (Application Programming Interface) is a set of routines, protocols, and tools for building software applications. It specifies how software components should interact.
![](https://www.retriever.nl/wp-content/uploads/2016/11/api-321x250.png)
-------------

- **What is a web API?**<br>
    A web API is an API exposed over HTTP, so it can be consumed by a broad range of clients, including browsers, mobile phones, and tablets.
![](http://dselva.co.in/blog/wp-content/uploads/2017/09/Web-APIs.png)
-----------------
- **Some examples of public web APIs:**
    - [Facebook Graph API](https://developers.facebook.com/docs/graph-api)
    - [Twitter API](https://dev.twitter.com/rest/public)
    - [Google API explorer](https://developers.google.com/apis-explorer/#p/)
--------------

- **What is REST?**<br>
    REST (Representational State Transfer) is an architectural style for web services in which requesting systems access and manipulate web resources through a uniform, predefined set of **stateless operations**.
    
    >In computing, a stateless protocol is a communications protocol in which no information is retained by either sender or receiver. The sender transmits a packet to the receiver and does not expect an acknowledgment of receipt. There is nothing saved that has to be remembered by the next transaction. The server must be able to completely understand the client request without using any server context or server session state. 
    
   Advantages of REST:
   - Because the transactions are stateless and no sessions are involved, a request can be directed to any instance of the web service. Hence, the web service can scale to accommodate load changes.
   - Binding to a service through an API is a matter of controlling how the URL is decoded.
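
To make the scaling argument concrete, here is a minimal sketch of a stateless service. The `ServiceInstance` class, the request dictionary layout, and the round-robin pool are all made up for illustration; the point is only that each request carries everything needed to answer it, so any instance can serve it.

```python
from itertools import cycle


class ServiceInstance:
    """One copy of the web service; it keeps no per-client session state."""

    def __init__(self, name):
        self.name = name

    def handle(self, request):
        # Everything needed to answer is carried by the request itself.
        user = request["headers"].get("Authorization", "anonymous")
        item_id = request["path"].rsplit("/", 1)[-1]
        return {"status": 200, "body": f"item {item_id} for {user}"}


# Hypothetical pool of instances behind a round-robin load balancer.
pool = cycle([ServiceInstance("a"), ServiceInstance("b"), ServiceInstance("c")])

request = {
    "method": "GET",
    "path": "/items/42",
    "headers": {"Authorization": "token-abc"},  # made-up credential
}

# The same request gets the same answer no matter which instance serves it.
responses = [next(pool).handle(request) for _ in range(3)]
print(all(r == responses[0] for r in responses))
```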

-----------------
- **Types of HTTP requests**
    - GET
    - POST
    - DELETE
    - PUT
    - PATCH, etc.
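
As a rough guide to what each request type is conventionally used for, the sketch below builds (but never sends) one request per method using only the standard library. The URL is a placeholder and the purpose strings reflect common REST conventions rather than anything a particular API guarantees. With requests, the matching helpers are `requests.get`, `requests.post`, `requests.put`, `requests.patch`, and `requests.delete`.

```python
from urllib.request import Request

# Conventional pairing of HTTP methods with resource operations.
METHOD_PURPOSE = {
    "GET": "read a resource",
    "POST": "create a resource or submit data",
    "PUT": "replace a resource",
    "PATCH": "partially update a resource",
    "DELETE": "remove a resource",
}


def build_request(method, url):
    """Build (but do not send) a request that uses the given HTTP method."""
    return Request(url, method=method)


for method, purpose in METHOD_PURPOSE.items():
    # https://api.example.com/items/1 is a placeholder URL, not a real API.
    req = build_request(method, "https://api.example.com/items/1")
    print(f"{req.get_method():6} {req.full_url}  ->  {purpose}")
```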
    
![](http://lotsofthing.com/wp-content/uploads/2017/11/rest-api-1.jpg)

### HTTP for humans: [requests](http://docs.python-requests.org/en/master/)

<img src="http://docs.python-requests.org/en/master/_static/requests-sidebar.png"  height=200 width=200>


- Requests is one of the most downloaded Python packages of all time, pulling in over 7,000,000 downloads every month. All the cool kids are doing it!

- Recreational use of other HTTP libraries may result in dangerous side-effects, including: security vulnerabilities, verbose code, reinventing the wheel, constantly reading documentation, depression, headaches, or even death. Requests is the only Non-GMO HTTP library for Python, safe for human consumption.

- Python HTTP: When in doubt, or when not in doubt, use Requests. Beautiful, simple, Pythonic.

***Everybody loves it!***

#### Installation

```
pip install requests
```

## GET request

### Example 1

http://graph.facebook.com/4/picture?type=large

![](http://graph.facebook.com/4/picture?type=large)

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTJ3ELNuC_coeH9tvLn62fsTMoe-vMVQrsfTrLUOIhsUI69i5QIyg)

![](http://i.imgur.com/gRvt4lV.png)

In [1]:
import requests

In [2]:
url = "http://graph.facebook.com/4/picture?type=large"

In [3]:
resp = requests.get(url)

In [5]:
with open("mark.jpg", 'wb') as f:
    f.write(resp.content)

### Example 2

[Google Maps Geocoding API](https://developers.google.com/maps/documentation/geocoding/intro)

In [6]:
url = "https://maps.googleapis.com/maps/api/geocode/json"

In [7]:
params = {
    'address': 'Coding blocks, kohat'
}

In [8]:
r = requests.get(url, params=params)

In [9]:
r.url

'https://maps.googleapis.com/maps/api/geocode/json?address=Coding+blocks%2C+kohat'
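
The `params` dictionary is turned into a URL-encoded query string. The standard-library sketch below reproduces that encoding without making a request, and then decodes it again. It illustrates the encoding rules only and is not how requests is implemented internally.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base_url = "https://maps.googleapis.com/maps/api/geocode/json"
params = {"address": "Coding blocks, kohat"}

# Spaces become "+" and the comma becomes "%2C" in the query string.
full_url = f"{base_url}?{urlencode(params)}"
print(full_url)

# Decoding the query string recovers the original value.
decoded = parse_qs(urlsplit(full_url).query)
print(decoded["address"][0])
```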

In [12]:
data = r.json()

In [20]:
from pprint import pprint

In [26]:
address = data['results'][0]['formatted_address']
lat = data['results'][0]['geometry']['location']['lat']
lng = data['results'][0]['geometry']['location']['lng']
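
Indexing `data['results'][0]` raises an `IndexError` when the API finds no match. A slightly more defensive version might look like the sketch below. The helper name `first_location` is made up, and `sample` is a trimmed-down copy of the response shown in this notebook.

```python
def first_location(data):
    """Return (address, lat, lng) for the first result, or None if there is none."""
    results = data.get("results") or []
    if not results:
        return None
    first = results[0]
    location = first["geometry"]["location"]
    return first["formatted_address"], location["lat"], location["lng"]


# Trimmed-down sample shaped like the geocoding response in this notebook.
sample = {
    "results": [
        {
            "formatted_address": (
                "47, Nishant Kunj, 1st & 2nd Floor, Pitampura Main Road, "
                "Opposite Metro Pillar 337, Pitampura, New Delhi, Delhi 110034, India"
            ),
            "geometry": {"location": {"lat": 28.6969421, "lng": 77.14238250000001}},
        }
    ]
}

print(first_location(sample))
print(first_location({"results": []}))
```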

In [21]:
pprint(data['results'])

[{'address_components': [{'long_name': 'New Delhi',
                          'short_name': 'New Delhi',
                          'types': ['locality', 'political']},
                         {'long_name': 'Delhi',
                          'short_name': 'DL',
                          'types': ['administrative_area_level_1',
                                    'political']},
                         {'long_name': 'India',
                          'short_name': 'IN',
                          'types': ['country', 'political']},
                         {'long_name': '110034',
                          'short_name': '110034',
                          'types': ['postal_code']}],
  'formatted_address': '47, Nishant Kunj, 1st & 2nd Floor, Pitampura Main '
                       'Road, Opposite Metro Pillar 337, Pitampura, New Delhi, '
                       'Delhi 110034, India',
  'geometry': {'location': {'lat': 28.6969421, 'lng': 77.14238250000001},
               'location_type': 'G

In [19]:
data['results']

[{'address_components': [{'long_name': 'New Delhi',
    'short_name': 'New Delhi',
    'types': ['locality', 'political']},
   {'long_name': 'Delhi',
    'short_name': 'DL',
    'types': ['administrative_area_level_1', 'political']},
   {'long_name': 'India',
    'short_name': 'IN',
    'types': ['country', 'political']},
   {'long_name': '110034', 'short_name': '110034', 'types': ['postal_code']}],
  'formatted_address': '47, Nishant Kunj, 1st & 2nd Floor, Pitampura Main Road, Opposite Metro Pillar 337, Pitampura, New Delhi, Delhi 110034, India',
  'geometry': {'location': {'lat': 28.6969421, 'lng': 77.14238250000001},
   'location_type': 'GEOMETRIC_CENTER',
   'viewport': {'northeast': {'lat': 28.6982910802915,
     'lng': 77.14373148029152},
    'southwest': {'lat': 28.6955931197085, 'lng': 77.14103351970851}}},
  'place_id': 'ChIJ_ZBg-dEDDTkRCYK3Ee8ywoI',
  'plus_code': {'compound_code': 'M4WR+QX Delhi, India',
   'global_code': '7JWVM4WR+QX'},
  'types': ['establishment', 'point_o

## POST request

![](https://indianpythonista.files.wordpress.com/2016/12/iservice_post_get.png?w=809)

### Example 1

[Pastebin API](https://pastebin.com/api)

In [28]:
url = "https://pastebin.com/api/api_post.php"

In [29]:
api_dev_key = "b4d2dc565cf00f0a1e89a0afdb20addc"

In [37]:
proxies = {
    'https': "195.138.88.176:8080"
}

In [30]:
data = {
    'api_dev_key': api_dev_key,
    'api_option': 'paste',
    'api_paste_code': "Hello, world!"
}

In [38]:
r = requests.post(url, data=data, proxies=proxies)
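
Unlike the GET examples, a POST sends its fields in the request body rather than in the URL. The sketch below builds the same payload and shows the form-encoded body that `requests.post(url, data=...)` would send for it. It also reads the developer key from an environment variable instead of hard-coding it. The variable name `PASTEBIN_DEV_KEY` and both helper functions are made up for this example.

```python
import os
from urllib.parse import urlencode


def build_paste_payload(dev_key, text):
    """Form fields for creating a paste, mirroring the dictionary used in this notebook."""
    return {
        "api_dev_key": dev_key,
        "api_option": "paste",
        "api_paste_code": text,
    }


def encode_form(payload):
    """Form-encode the fields the way they travel in a POST body."""
    return urlencode(payload)


# PASTEBIN_DEV_KEY is a hypothetical variable name; keeping the key out of
# the notebook means it is not shared along with the code.
dev_key = os.environ.get("PASTEBIN_DEV_KEY", "<your-dev-key>")
payload = build_paste_payload(dev_key, "Hello, world!")
print(sorted(payload))
print(encode_form(build_paste_payload("KEY", "Hello, world!")))
```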

# 2. Downloading files

![](https://pics.onsizzle.com/downloading-98-downloading-99-downloading-failed-11367153.png)

Downloading large files in chunks!

http://www.greenteapress.com/thinkpython/thinkpython.pdf

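```python
import requests

url = "http://www.greenteapress.com/thinkpython/thinkpython.pdf"
chunk_size = 256  # bytes read per iteration

# stream=True defers downloading the body until we iterate over it
r = requests.get(url, stream=True)

with open("python.pdf", "wb") as f:
    for chunk in r.iter_content(chunk_size=chunk_size):
        f.write(chunk)
```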

In [40]:
chunk_size = 256

In [41]:
url = "http://www.greenteapress.com/thinkpython/thinkpython.pdf"

In [67]:
r = requests.get(url, stream=True)

In [44]:
from tqdm import tqdm

In [64]:
from math import ceil

In [65]:
total = ceil(int(r.headers['Content-Length']) / chunk_size)

In [51]:
r.headers

{'Date': 'Sat, 11 Aug 2018 09:09:48 GMT', 'Server': 'Apache', 'Last-Modified': 'Mon, 22 Feb 2016 17:23:24 GMT', 'ETag': '"cbc7d-52c5f17101b00"', 'Accept-Ranges': 'bytes', 'Content-Length': '834685', 'Keep-Alive': 'timeout=2, max=100', 'Connection': 'Keep-Alive', 'Content-Type': 'application/pdf'}
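
The progress-bar total is just the file size divided by the chunk size, rounded up. Here is that arithmetic as a small sketch. The helper name `chunk_count` is made up, and returning `None` when the header is missing is one possible choice (a server is not obliged to send `Content-Length`).

```python
from math import ceil


def chunk_count(content_length, chunk_size):
    """Number of chunks needed to cover content_length bytes, or None if unknown."""
    if content_length is None:
        return None
    return ceil(int(content_length) / chunk_size)


# 834685 is the Content-Length value from the headers in this notebook.
print(chunk_count("834685", 256))  # 3261
print(chunk_count(None, 256))      # None
```

With a real response, `chunk_count(r.headers.get('Content-Length'), chunk_size)` avoids a `KeyError` when the header is absent.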

In [68]:
with open("python.pdf", "wb") as f:
    for chunk in tqdm(r.iter_content(chunk_size=chunk_size), total=total):
        f.write(chunk)


  0%|          | 0/3261 [00:00<?, ?it/s]
 62%|██████▏   | 2018/3261 [00:00<00:00, 19680.40it/s]
100%|██████████| 3261/3261 [00:00<00:00, 23482.51it/s]
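
The write loop can be wrapped in a small reusable helper. In the sketch below, `save_chunks` accepts any iterable of byte chunks, so `r.iter_content(chunk_size=chunk_size)` could be passed in directly. `fake_stream` is a made-up stand-in that lets the helper be tried without network access.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # ignore empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


def fake_stream(payload, chunk_size):
    """Stand-in for r.iter_content(chunk_size=...): yields payload in slices."""
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


payload = b"x" * 1000
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
print(save_chunks(fake_stream(payload, 256), path))
```

Only one chunk is held in memory at a time, which is the reason for streaming large downloads in the first place.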

# 3. Web scraping

![](https://image.slidesharecdn.com/scrapingtotherescue-160713133749/95/getting-started-with-web-scraping-in-python-9-638.jpg?cb=1468417631)


## [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

>Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

### Installation

```
pip install beautifulsoup4
```

**Bonus:**
```
pip install html5lib
```

https://www.passiton.com/inspirational-quotes

In [72]:
from bs4 import BeautifulSoup

In [69]:
url = "https://www.passiton.com/inspirational-quotes"

In [70]:
r = requests.get(url)

In [73]:
soup = BeautifulSoup(r.content, 'html5lib')

In [79]:
article_elements = soup.find_all('article', attrs={'class': 'quotation'})

In [80]:
article_element = article_elements[0]

In [98]:
article_element.a

<a href="/inspirational-quotes/4053-no-legacy-is-so-rich-as-honesty"><img alt="No legacy is so rich as honesty. #&lt;Author:0x007f2798b7ad88&gt;" class="hover" src="https://quotes.values.com/quote_artwork/4053/medium/20180810_friday_quote.jpg?1533403416"/></a>

In [93]:
articles = []

In [99]:
for article_element in article_elements:
    article = {}
    article['txt'] = article_element.img['alt']
    article['img'] = article_element.img['src']
    article['url'] = article_element.a['href']
    articles.append(article)

In [100]:
articles

[{'txt': 'No legacy is so rich as honesty. #<Author:0x007f2798b7ad88>',
  'img': 'https://quotes.values.com/quote_artwork/4053/medium/20180810_friday_quote.jpg?1533403416',
  'url': '/inspirational-quotes/4053-no-legacy-is-so-rich-as-honesty'},
 {'txt': 'Honesty is the fastest way to prevent a mistake from turning into a failure. #<Author:0x007f279a3ad9c8>',
  'img': 'https://quotes.values.com/quote_artwork/7793/medium/20180809_thursday_quote.jpg?1533403382',
  'url': '/inspirational-quotes/7793-honesty-is-the-fastest-way-to-prevent-a-mistake'},
 {'txt': 'We know the truth, not only by the reason, but also by the heart. #<Author:0x007f279961db78>',
  'img': 'https://quotes.values.com/quote_artwork/6114/medium/20180808_wednesday_quote.jpg?1533403338',
  'url': '/inspirational-quotes/6114-we-know-the-truth-not-only-by-the-reason-but'},
 {'txt': 'There is nothing so strong or safe in an emergency of life as the simple truth. #<Author:0x007f2798be8928>',
  'img': 'https://quotes.values.com
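
The scraped records still need a little tidying: the quote text ends with a stray `#<Author:0x...>` marker, and the `url` values are relative. The sketch below shows one way to clean a record. The helper name `clean_article` and the regular expression are assumptions based only on the output shown in this notebook.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.passiton.com/inspirational-quotes"

# Matches the trailing " #<Author:0x...>" debris seen in the alt text.
AUTHOR_DEBRIS = re.compile(r"\s*#<Author:0x[0-9a-f]+>$")


def clean_article(article, base_url=BASE_URL):
    """Return a copy with tidied quote text and an absolute URL."""
    return {
        "txt": AUTHOR_DEBRIS.sub("", article["txt"]),
        "img": article["img"],
        "url": urljoin(base_url, article["url"]),
    }


# First record from the output in this notebook.
sample = {
    "txt": "No legacy is so rich as honesty. #<Author:0x007f2798b7ad88>",
    "img": "https://quotes.values.com/quote_artwork/4053/medium/20180810_friday_quote.jpg?1533403416",
    "url": "/inspirational-quotes/4053-no-legacy-is-so-rich-as-honesty",
}

print(clean_article(sample))
```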

In [101]:
import csv

In [104]:
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=['img', 'txt', 'url'])
    writer.writeheader()
    writer.writerows(articles)
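
To check that a file written this way can be read back, here is a small round-trip sketch using `csv.DictReader`. The rows are made up, and a temporary directory is used so the real `quotes.csv` is left untouched.

```python
import csv
import os
import tempfile

# Made-up rows; the second one checks that commas and quotes survive.
rows = [
    {"img": "a.jpg", "txt": "First quote", "url": "/quotes/1"},
    {"img": "b.jpg", "txt": 'Quote with a comma, and "quotes"', "url": "/quotes/2"},
]
path = os.path.join(tempfile.mkdtemp(), "quotes.csv")

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["img", "txt", "url"])
    writer.writeheader()
    writer.writerows(rows)

with open(path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

print(loaded == rows)
```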

![](http://www.entropywebscraping.com/wp-content/uploads/2017/02/Screenshot-from-2017-02-01-10-23-00.png)

# 4. Web automation
 
 ![](https://images.contentful.com/qs7jgwzogkzr/6HeUbprAsMYek2Keqi0WYo/d8ad7cf2f15e706ead76e00a53859cc7/testing-automation-alternatives.jpg)
 
 **Task:** Automatically submit the code for a [problem](https://www.codechef.com/problems/TEST) on [CodeChef](https://www.codechef.com/).
 
 ### [Selenium](http://selenium-python.readthedocs.io/) : Web automation and testing
 
 ![](https://udemy-images.udemy.com/course/750x422/482754_7146_4.jpg)
 
 
 #### Installation
 
 - To install the Python bindings for Selenium:
     ```
     pip install selenium
     ```
     
 - To install a WebDriver for your browser:
 
     http://selenium-python.readthedocs.io/installation.html#drivers
     
     [How to put webdriver in PATH?](https://stackoverflow.com/questions/40208051/selenium-using-python-geckodriver-executable-needs-to-be-in-path)
 
 #### To start a browser session
 ```python
 from selenium import webdriver
 browser = webdriver.Chrome()
 ```
 
 #### To open a webpage
 ```python
 browser.get('https://www.codechef.com')
 ```
 
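 #### To select an element by its id
 ```python
 from selenium.webdriver.common.by import By
 element = browser.find_element(By.ID, "<id>")  # older Selenium: browser.find_element_by_id("<id>")
 ```
 
 #### Input value in element
 ```python
 element.send_keys("text to type")
 ```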
 
 #### Click on an element
 ```python
 element.click()
 ```

In [105]:
from selenium import webdriver
# Recent Selenium 4 releases no longer provide find_element_by_id and friends;
# find_element(By.ID, ...) is the supported form.
from selenium.webdriver.common.by import By

In [107]:
browser = webdriver.Chrome()

In [108]:
browser.get("https://www.codechef.com")

In [109]:
username_element = browser.find_element(By.ID, 'edit-name')

In [110]:
username_element.send_keys('nikhilksingh97')

In [111]:
from getpass import getpass

In [113]:
password_element = browser.find_element(By.ID, 'edit-pass')

In [None]:
password_element.send_keys(getpass())

In [115]:
browser.find_element(By.ID, 'edit-submit').click()

In [127]:
browser.get("https://www.codechef.com/submit/TEST")

In [128]:
browser.find_element(By.ID, 'edit_area_toggle_checkbox_edit-program').click()

In [129]:
code_element = browser.find_element(By.ID, 'edit-program')

In [130]:
with open('solution.cpp', 'r') as f:
    code_element.send_keys(f.read())

In [131]:
browser.find_element(By.XPATH, '//*[@id="edit-language"]/option[2]').click()

In [132]:
browser.find_element(By.ID, 'edit-submit').click()
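
The login and submit steps follow the same pattern: find some elements by id, type into them, then click a button. The sketch below wraps that pattern in a helper. It assumes Selenium 4's `find_element(by, value)` signature and uses the string `"id"`, which is the value of `By.ID`. `fill_form`, `FakeBrowser`, and `FakeElement` are made-up names; the fakes only record calls so the helper can be tried without launching a browser.

```python
ID = "id"  # same value as selenium.webdriver.common.by.By.ID


def fill_form(browser, fields, submit_id):
    """Type each value into the element with the given id, then click submit."""
    for element_id, text in fields.items():
        browser.find_element(ID, element_id).send_keys(text)
    browser.find_element(ID, submit_id).click()


class FakeElement:
    """Records what was typed or clicked instead of driving a real page."""

    def __init__(self, element_id, log):
        self.element_id = element_id
        self.log = log

    def send_keys(self, text):
        self.log.append(("type", self.element_id, text))

    def click(self):
        self.log.append(("click", self.element_id))


class FakeBrowser:
    """Minimal stand-in exposing only find_element(by, value)."""

    def __init__(self):
        self.log = []

    def find_element(self, by, value):
        return FakeElement(value, self.log)


browser = FakeBrowser()
fill_form(browser, {"edit-name": "some-user", "edit-pass": "secret"}, "edit-submit")
print(browser.log)
```

With a real session, the same call would take the `webdriver.Chrome()` object in place of `FakeBrowser()`.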

![](https://i.imgflip.com/poxkz.jpg)

## Resources:

- Python packages:

    - [requests](http://docs.python-requests.org/en/master/)

    - [bs4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
    
    - [html5lib](https://html5lib.readthedocs.io/en/latest/)
 

- Articles:

    - https://indianpythonista.wordpress.com/2016/12/10/get-and-post-requests-using-python/

    - https://indianpythonista.wordpress.com/2016/10/18/requests-http-for-pythonistas/

    - https://indianpythonista.wordpress.com/2016/12/10/downloading-files-from-web-using-python/

    - https://indianpythonista.wordpress.com/2016/12/10/implementing-web-scraping-in-python-with-beautiful-soup/


- Videos:

    - File downloader: https://www.youtube.com/watch?v=Xhw2l-hzoKk
    - Web scraping: https://www.youtube.com/watch?v=lIkd_jt28i0&t=557s