# <p style="text-align:center;font-size:70px;background-color:#0ad61b;color:white;font-style:italic;">Python for web</p>

![](http://www.blog.skytopper.com/wp-content/uploads/2015/06/Global-computer-network.jpg)

This bootcamp is all about interacting with the **web** using the Python programming language!

In this bootcamp, we will learn to:

- work with web APIs
- download content from the web
- scrape web pages
- automate web tasks

using simple Python scripts!

![](https://i.amz.mshcdn.com/mqczOBQlR2uS7uALqB4fkKylDx0=/fit-in/1200x9600/https%3A%2F%2Fblueprint-api-production.s3.amazonaws.com%2Fuploads%2Fcard%2Fimage%2F193985%2Fnewhere.jpg)

# 1. Working with web APIs

- **What is an API?**<br>
    An API (Application Programming Interface) is a set of routines, protocols, and tools for building software applications. It specifies how software components should interact.
![](https://www.retriever.nl/wp-content/uploads/2016/11/api-321x250.png)
-------------

- **What is a web API?**<br>
    A web API is an API exposed over HTTP, so that its services can be consumed by a broad range of clients, including browsers, mobile phones, and tablets.
![](http://dselva.co.in/blog/wp-content/uploads/2017/09/Web-APIs.png)
-----------------
- **Some examples of public web APIs:**
    - [Facebook Graph API](https://developers.facebook.com/docs/graph-api)
    - [Twitter API](https://dev.twitter.com/rest/public)
    - [Google API explorer](https://developers.google.com/apis-explorer/#p/)
--------------

- **What is REST?**<br>
    REST (Representational State Transfer) is an architectural style followed by web services, in which they allow requesting systems to access and manipulate their web resources using a uniform, predefined set of **stateless operations**.
    
    >In computing, a stateless protocol is a communications protocol in which the receiver, usually a server, retains no session information from previous requests. Nothing is saved that has to be remembered by the next transaction: each request carries everything needed to process it, so the server can understand the client request without using any stored server context or session state.
    
   Advantages of REST:
   - As the transactions are stateless and no sessions are involved, we can direct them to any instance of the web service. Hence, the web service can scale to accommodate load changes.
   - Binding to a service through an API is a matter of controlling how the URL is decoded.

-----------------
- **Types of HTTP requests**
    - GET
    - POST
    - DELETE
    - PUT
    - PATCH, etc.
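
As a rough illustration of how these methods are conventionally used against a REST resource, the sketch below builds (but never sends) one request per method with the standard library's `urllib.request.Request`. The `/api/users/2` path and the JSON bodies are made up for illustration. With Requests, the corresponding calls would be `requests.get`, `requests.post`, `requests.put`, `requests.patch` and `requests.delete`.

```python
from urllib.request import Request

# Hypothetical resource URLs, used only for illustration.
COLLECTION = "https://reqres.in/api/users"
ITEM = COLLECTION + "/2"

# (method, url, body, conventional meaning)
PLAN = [
    ("GET", ITEM, None, "read a resource"),
    ("POST", COLLECTION, b'{"name": "neo"}', "create a new resource"),
    ("PUT", ITEM, b'{"name": "neo", "job": "the one"}', "replace a resource"),
    ("PATCH", ITEM, b'{"job": "the one"}', "update part of a resource"),
    ("DELETE", ITEM, None, "remove a resource"),
]

# Building a Request object does not touch the network.
built = [Request(url, data=body, method=method) for method, url, body, _ in PLAN]

for request, (_, _, _, meaning) in zip(built, PLAN):
    print(f"{request.get_method():6} {request.full_url}  ->  {meaning}")
```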
    
![](http://lotsofthing.com/wp-content/uploads/2017/11/rest-api-1.jpg)

### HTTP for humans: [requests](http://docs.python-requests.org/en/master/)

<img src="http://docs.python-requests.org/en/master/_static/requests-sidebar.png"  height=200 width=200>


- Requests is one of the most downloaded Python packages of all time, pulling in over 7,000,000 downloads every month. All the cool kids are doing it.

- Recreational use of other HTTP libraries may result in dangerous side-effects, including: security vulnerabilities, verbose code, reinventing the wheel, constantly reading documentation, depression, headaches, or even death. Requests is the only Non-GMO HTTP library for Python, safe for human consumption.

- Python HTTP: When in doubt, or when not in doubt, use Requests. Beautiful, simple, Pythonic.

***Everybody loves it!***

#### Installation

```
pip install requests
```

## GET request

### Example 1

http://graph.facebook.com/4/picture?type=large

![](http://graph.facebook.com/4/picture?type=large)

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTJ3ELNuC_coeH9tvLn62fsTMoe-vMVQrsfTrLUOIhsUI69i5QIyg)

![](http://graph.facebook.com/100002953986902/picture?type=large)

![](http://i.imgur.com/gRvt4lV.png)

In [4]:
import requests

In [5]:
url = "https://reqres.in"

In [6]:
all_user_endpoint = "/api/users/"
r = requests.get(url + all_user_endpoint)

In [7]:
r.headers

{'Date': 'Thu, 27 Jun 2019 05:45:52 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=dc001213f6ee3e7b767f09cad62eb590b1561614352; expires=Fri, 26-Jun-20 05:45:52 GMT; path=/; domain=.reqres.in; HttpOnly; Secure', 'X-Powered-By': 'Express', 'Access-Control-Allow-Origin': '*', 'Etag': 'W/"21b-fFOA+z2p0qqZtiTcYG5d6jtK5Cs"', 'Via': '1.1 vegur', 'CF-Cache-Status': 'REVALIDATED', 'Expires': 'Thu, 27 Jun 2019 09:45:52 GMT', 'Cache-Control': 'public, max-age=14400', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '4ed51e08dfcc6b6b-LHR', 'Content-Encoding': 'gzip'}

In [8]:
content = r.content

In [9]:
content = content.decode("UTF-8")

In [10]:
type(content)

str

In [11]:
import json

In [12]:
data = json.loads(content)

In [13]:
data

{'page': 1,
 'per_page': 3,
 'total': 12,
 'total_pages': 4,
 'data': [{'id': 1,
   'email': 'george.bluth@reqres.in',
   'first_name': 'George',
   'last_name': 'Bluth',
   'avatar': 'https://s3.amazonaws.com/uifaces/faces/twitter/calebogden/128.jpg'},
  {'id': 2,
   'email': 'janet.weaver@reqres.in',
   'first_name': 'Janet',
   'last_name': 'Weaver',
   'avatar': 'https://s3.amazonaws.com/uifaces/faces/twitter/josephstein/128.jpg'},
  {'id': 3,
   'email': 'emma.wong@reqres.in',
   'first_name': 'Emma',
   'last_name': 'Wong',
   'avatar': 'https://s3.amazonaws.com/uifaces/faces/twitter/olegpogodaev/128.jpg'}]}
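
The decode-then-parse steps above can be collapsed: `json.loads` accepts `bytes` directly (Python 3.6+), and Requests also offers `r.json()` as a shortcut. The sketch below uses a trimmed, hand-copied subset of the response shown above as a stand-in for `r.content`, so it runs without a network call, and then pulls a few fields out of the parsed data.

```python
import json

# Stand-in for r.content: a trimmed copy of the response shown above.
raw = b"""{"page": 1, "per_page": 3, "total": 12, "total_pages": 4,
 "data": [
  {"id": 1, "email": "george.bluth@reqres.in", "first_name": "George", "last_name": "Bluth"},
  {"id": 2, "email": "janet.weaver@reqres.in", "first_name": "Janet", "last_name": "Weaver"},
  {"id": 3, "email": "emma.wong@reqres.in", "first_name": "Emma", "last_name": "Wong"}
 ]}"""

payload = json.loads(raw)  # bytes are decoded automatically (UTF-8 here)

# The parsed result is plain dicts and lists, so normal indexing works.
emails = [user["email"] for user in payload["data"]]
full_names = [f'{user["first_name"]} {user["last_name"]}' for user in payload["data"]]

print(emails)
print(full_names)
```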

### Example 2

[Google maps geocoding API (Paid)](https://developers.google.com/maps/documentation/geocoding/intro)

[Open Street Map](https://nominatim.openstreetmap.org/search/)

In [14]:
base_url = "https://nominatim.openstreetmap.org/search/"

In [15]:
parameters = {
    "format": "json",
    "q": "coding blocks",
    "name": "jatin"
}

In [16]:
r = requests.get(base_url, params=parameters)

In [17]:
r.url

'https://nominatim.openstreetmap.org/search/?format=json&q=coding+blocks&name=jatin'
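
To see roughly what `params=` does, the following sketch reproduces the same query string with the standard library's `urllib.parse.urlencode`. It approximates Requests' behaviour for simple string values and is not its actual implementation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base_url = "https://nominatim.openstreetmap.org/search/"
parameters = {"format": "json", "q": "coding blocks", "name": "jatin"}

# Spaces become '+', and key=value pairs are joined with '&'.
query = urlencode(parameters)
full_url = base_url + "?" + query
print(full_url)

# Decoding the query string gives back the original values (as lists).
decoded = parse_qs(urlsplit(full_url).query)
print(decoded)
```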

In [18]:
data = json.loads(r.content.decode('utf-8'))

In [19]:
data

[{'place_id': 237920251,
  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
  'osm_type': 'way',
  'osm_id': 349993758,
  'boundingbox': ['28.6967764', '28.6969339', '77.1424004', '77.1425482'],
  'lat': '28.6968552',
  'lon': '77.1424743283951',
  'display_name': 'Coding Blocks, Pitampura, 47, Mahatma Gandhi Road, Rohit Kunj, North West Delhi, Delhi, 110034, India',
  'class': 'office',
  'type': 'educational_institution',
  'importance': 0.201}]

In [20]:
data[0]['lat'], data[0]['lon']

('28.6968552', '77.1424743283951')

## POST request

![](https://indianpythonista.files.wordpress.com/2016/12/iservice_post_get.png?w=809)

### Example 1

[Pastebin API](https://pastebin.com/api)

In [77]:
key = "e7d82892722c4597a57091897f2a449f"

base_url = "https://pastebin.com/api/api_post.php"

data = {
    "api_dev_key": key,
    "api_option": "paste",
    "api_paste_format": "Python",
    "api_paste_code": """
        n = 5
        for i in range(n):
            for j in range(n):
                print(min(i, j, n-i-1, n-j-1))
    """
}

In [78]:
r = requests.post(base_url, data = data)

In [79]:
r.content

b'https://pastebin.com/zhSxpDLR'
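
When `data=` is given a dict, Requests form-encodes it into the request body rather than into the URL. Below is a minimal sketch of that encoding using only the standard library, without sending anything. The key is a placeholder, not a real credential, and the pasted code is a made-up one-liner.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

base_url = "https://pastebin.com/api/api_post.php"
data = {
    "api_dev_key": "YOUR_API_DEV_KEY",  # placeholder, not a real key
    "api_option": "paste",
    "api_paste_code": "print('hello')",
}

# Form data travels URL-encoded in the body, not in the query string.
body = urlencode(data).encode("utf-8")
request = Request(
    base_url,
    data=body,
    method="POST",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(request.get_method(), request.full_url)
print(body)
```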

### Hot or Not

In [80]:
base_url_format = "http://graph.facebook.com/{}/picture?type=large"

In [81]:
r = requests.get(base_url_format.format(4))

In [83]:
with open("zuckerbhai.png", "wb") as file:
    file.write(r.content)

In [85]:
import os

In [87]:
# Create the folder if it does not exist yet (no error if it already does)
os.makedirs('fb_profiles', exist_ok=True)

In [84]:
for i in range(4, 100):
    gen_url = base_url_format.format(i)
    r = requests.get(gen_url)
    with open("fb_profiles/{}_image.png".format(i), "wb") as file:
        file.write(r.content)
    print("done with {} id".format(i))

done with 4 id
done with 5 id
done with 6 id
done with 7 id
done with 8 id
done with 9 id
done with 10 id
done with 11 id
done with 12 id
done with 13 id
done with 14 id
done with 15 id
done with 16 id
done with 17 id
done with 18 id
done with 19 id
done with 20 id
done with 21 id
done with 22 id
done with 23 id
done with 24 id
done with 25 id
done with 26 id
done with 27 id
done with 28 id
done with 29 id
done with 30 id
done with 31 id
done with 32 id
done with 33 id
done with 34 id
done with 35 id
done with 36 id
done with 37 id
done with 38 id
done with 39 id
done with 40 id
done with 41 id
done with 42 id
done with 43 id
done with 44 id
done with 45 id
done with 46 id
done with 47 id
done with 48 id
done with 49 id
done with 50 id
done with 51 id
done with 52 id
done with 53 id
done with 54 id
done with 55 id
done with 56 id
done with 57 id
done with 58 id
done with 59 id
done with 60 id
done with 61 id
done with 62 id
done with 63 id
done with 64 id
done with 65 id
done with 66 i
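
The loop above saves every response under a `.png` name, regardless of what the server actually returned, and never checks whether the request succeeded. Here is a hedged sketch of a small helper that picks the extension from the `Content-Type` header instead. The mapping and the function name `extension_for` are illustrative and not part of Requests.

```python
# Illustrative mapping; extend it for other image types you expect.
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
}

def extension_for(content_type, default=".bin"):
    """Return a file extension for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    mime = content_type.split(";")[0].strip().lower()
    return EXTENSIONS.get(mime, default)

print(extension_for("image/jpeg"))
print(extension_for("IMAGE/PNG; q=1"))
print(extension_for("text/html; charset=utf-8"))
```

Inside the loop this could be called as `extension_for(r.headers.get("Content-Type", ""))`, ideally only after checking `r.ok`.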

# 2. Downloading files

![](https://pics.onsizzle.com/downloading-98-downloading-99-downloading-failed-11367153.png)

Downloading large files in chunks!

http://www.greenteapress.com/thinkpython/thinkpython.pdf

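```python
import requests

url = "http://www.greenteapress.com/thinkpython/thinkpython.pdf"
chunk_size = 256  # bytes per chunk; larger values mean fewer iterations

# stream=True defers downloading the body until we iterate over it
r = requests.get(url, stream=True)

with open("python.pdf", "wb") as f:
    for chunk in r.iter_content(chunk_size=chunk_size):
        f.write(chunk)
```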
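
The same chunked-copy idea can be tried without a network connection. In this sketch an in-memory `io.BytesIO` object stands in for the streamed response, and `copy_in_chunks` is a hypothetical helper name. It reads fixed-size chunks until the source is exhausted, which is the same pattern as looping over `r.iter_content`.

```python
import io

def copy_in_chunks(source, destination, chunk_size=256):
    """Copy source to destination chunk by chunk; return (bytes, chunks)."""
    total_bytes = 0
    chunks = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means the stream is exhausted
            break
        destination.write(chunk)
        total_bytes += len(chunk)
        chunks += 1
    return total_bytes, chunks

payload = b"x" * 1000  # stand-in for a downloaded file
source, destination = io.BytesIO(payload), io.BytesIO()

result = copy_in_chunks(source, destination)
print(result)
```

Only one chunk is held in memory at a time, which is why this approach suits large files.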

In [117]:
import requests

In [118]:
import tqdm

In [119]:
url=("http://www.cartographicperspectives.org/index.php/journal/article/view/cp43-complete-issue/577")

In [120]:
chunk_size=256

In [121]:
r=requests.get(url,stream=True)

In [132]:
import requests

In [133]:
import tqdm

In [134]:
import math

In [135]:
url1=("https://cb.lk/4w1Q1")

In [136]:
chunk=256

In [137]:
r=requests.get(url1,stream=True)

In [138]:
r.headers

{'Date': 'Thu, 03 Jan 2019 04:28:48 GMT', 'Server': 'Apache', 'Last-Modified': 'Mon, 22 Feb 2016 17:23:24 GMT', 'ETag': '"cbc7d-52c5f17101b00"', 'Accept-Ranges': 'bytes', 'Content-Length': '834685', 'Content-Type': 'application/pdf', 'Connection': 'keep-alive'}

In [139]:
r.headers['Date']

'Thu, 03 Jan 2019 04:28:48 GMT'

In [96]:
# with open("lafbw.pdf","wb")as file:
#     file.write(r.content)

In [140]:
n=math.ceil(int(r.headers['Content-Length'])/chunk)
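
As a worked check of this arithmetic, take the `Content-Length` of 834685 bytes from the headers above and 256-byte chunks. The number of chunks is the length divided by the chunk size, rounded up, and each chunk counts as 256/1024 = 0.25 KB on the progress bar.

```python
import math

content_length = 834685  # bytes, taken from the Content-Length header above
chunk = 256              # bytes per chunk

n = math.ceil(content_length / chunk)               # the last chunk may be partial
last_chunk_size = content_length - (n - 1) * chunk  # bytes in that final chunk
total_kb = n * (chunk / 1024)                       # total shown by the progress bar

print(n, last_chunk_size, total_kb)
```

The total comes out slightly above the true file size in KB because the final, partial chunk is counted as a full one.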

In [142]:
with open("lar.pdf","wb") as file:
    for chunki in tqdm.tqdm(r.iter_content(chunk_size=chunk),total=n,unit="KB",unit_scale=(256/1024)):
        file.write(chunki)


  0%|                                                                                     | 0.0/815.25 [00:00<?, ?KB/s]
 17%|████████████                                                            | 136.25/815.25 [00:00<00:00, 1233.73KB/s]
 28%|███████████████████▊                                                    | 224.75/815.25 [00:00<00:00, 1097.70KB/s]
 38%|███████████████████████████▊                                              | 306.0/815.25 [00:00<00:00, 986.76KB/s]
 49%|███████████████████████████████████▉                                      | 396.5/815.25 [00:00<00:00, 941.07KB/s]
 60%|████████████████████████████████████████████▌                             | 491.0/815.25 [00:00<00:00, 925.24KB/s]
 71%|████████████████████████████████████████████████████▊                     | 581.5/815.25 [00:00<00:00, 916.76KB/s]
 83%|████████████████████████████████████████████████████████████▏            | 672.75/815.25 [00:00<00:00, 903.38KB/s]
 94%|██████████████████████████████████

# 3. Web scraping

![](https://image.slidesharecdn.com/scrapingtotherescue-160713133749/95/getting-started-with-web-scraping-in-python-9-638.jpg?cb=1468417631)


## [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

>Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

### Installation

```
pip install beautifulsoup4
```

**Bonus:**
```
pip install html5lib
```

https://www.values.com/inspirational-quotes

In [37]:
from bs4 import BeautifulSoup

In [38]:
url=("https://www.passiton.com/inspirational-quotes")


In [39]:
import requests

In [40]:
r=requests.get(url)

In [41]:
content=r.content.decode("utf-8")

In [42]:
content

'<!DOCTYPE html>\n<html dir="ltr" lang="en-US">\n<head>\n    <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n    <link href="https://fonts.googleapis.com/css?family=Roboto:900|Lato:300,400,400italic,600,700|Raleway:300,400,500,600,700|Crete+Round:400italic|Zilla+Slab" rel="stylesheet" type="text/css" />\n    <link rel="stylesheet" media="all" href="/assets/application-f7c42ac1a26bc766307e42ff5df71b58.css" />\n\n    <meta charset="utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <!--[if lt IE 9]><script src="http://css3-mediaqueries-js.googlecode.com/svn/trunk/css3-mediaqueries.js"></script><![endif]-->\n    <title>Inspirational Quotes - Motivational Quotes - Leadership Quotes | PassItOn.com</title>\n    <meta name="description" content="Find the perfect quotation from our hand-picked collection of inspiring quotes by hundreds of authors." />\n    <meta name="csrf-param" content="authenticity_token" />\n<meta name="csrf-to

In [43]:
soup = BeautifulSoup(content, "html5lib")  # name the parser explicitly; use "html.parser" if html5lib is not installed

In [44]:
soup

<!DOCTYPE html>
<html dir="ltr" lang="en-US"><head>
    <meta content="text/html; charset=utf-8" http-equiv="content-type"/>
    <link href="https://fonts.googleapis.com/css?family=Roboto:900|Lato:300,400,400italic,600,700|Raleway:300,400,500,600,700|Crete+Round:400italic|Zilla+Slab" rel="stylesheet" type="text/css"/>
    <link href="/assets/application-f7c42ac1a26bc766307e42ff5df71b58.css" media="all" rel="stylesheet"/>

    <meta charset="utf-8"/>
    <meta content="width=device-width, initial-scale=1" name="viewport"/>
    <!--[if lt IE 9]><script src="http://css3-mediaqueries-js.googlecode.com/svn/trunk/css3-mediaqueries.js"></script><![endif]-->
    <title>Inspirational Quotes - Motivational Quotes - Leadership Quotes | PassItOn.com</title>
    <meta content="Find the perfect quotation from our hand-picked collection of inspiring quotes by hundreds of authors." name="description"/>
    <meta content="authenticity_token" name="csrf-param"/>
<meta content="LsZTPUNGVbU1eTKduMcjyyFitK

In [16]:
articles=soup.find_all('article')

In [17]:
articles[0]

<article class="portfolio-item quotation optimism">
    <div class="portfolio-image">
        <a href="/inspirational-quotes/7858-what-a-wonderful-thought-it-is-that-some-of-the"><img alt="What a wonderful thought it is that some of the best days of our lives haven't even happened yet. #&lt;Author:0x007f2240266998&gt;" class="hover" src="https://quotes.values.com/quote_artwork/7858/medium/20190101_tuesday_quote_alternate.jpg?1546208533"/></a>
    </div>
</article>

In [18]:
articles[0].find('img')['alt']

"What a wonderful thought it is that some of the best days of our lives haven't even happened yet. #<Author:0x007f2240266998>"

In [19]:
d={}
for i,article in enumerate(articles):
    d[i]=article.find('img')['alt']
    print(i, "=>", article.find('img')['alt'])

0 => What a wonderful thought it is that some of the best days of our lives haven't even happened yet. #<Author:0x007f2240266998>
1 => The future belongs to those who believe in the beauty of their dreams. #<Author:0x007f223f1d2fa0>
2 => Love is not only something you feel, it is something you do. #<Author:0x007f223db8d5b0>
3 => Your circumstance doesn’t make life extraordinary. Love does. #<Author:0x007f223f360e58>
4 => Let us see what love can do. #<Author:0x007f223ceb48c0>
5 => And the Grinch, with his Grinch-feet ice cold in the snow, stood puzzling and puzzling, how could it be so? It came without ribbons. It came without tags. It came without packages, boxes or bags. And he puzzled and puzzled 'till his puzzler was sore. Then the Grinch thought of something he hadn't before. What if Christmas, he thought, doesn't come from a store? What if Christmas, perhaps, means a little bit more? #<Author:0x007f223c9ff8c0>
6 => The love in your heart wasn't put there to stay. Love isn't love 
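
To see what the `find('img')['alt']` lookup is doing, the same extraction can be sketched with only the standard library's `html.parser`. This is a swapped-in technique for illustration, not how Beautiful Soup is implemented. The snippet is a shortened stand-in for one `<article>` from the page, and its `href` and `src` values are made up.

```python
from html.parser import HTMLParser

class ImageCollector(HTMLParser):
    """Collect the attributes of every <img> tag as a dict."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "img":
            self.images.append(dict(attrs))

# Shortened stand-in for one <article>; the href and src values are made up.
snippet = """
<article class="portfolio-item quotation optimism">
    <div class="portfolio-image">
        <a href="/inspirational-quotes/example"><img alt="Let us see what love can do." class="hover" src="https://example.com/quote.jpg"/></a>
    </div>
</article>
"""

collector = ImageCollector()
collector.feed(snippet)
first = collector.images[0]
print(first["alt"])
print(first["src"])
```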

In [25]:
d

{0: "What a wonderful thought it is that some of the best days of our lives haven't even happened yet. #<Author:0x007f2240266998>",
 1: 'The future belongs to those who believe in the beauty of their dreams. #<Author:0x007f223f1d2fa0>',
 2: 'Love is not only something you feel, it is something you do. #<Author:0x007f223db8d5b0>',
 3: 'Your circumstance doesn’t make life extraordinary. Love does. #<Author:0x007f223f360e58>',
 4: 'Let us see what love can do. #<Author:0x007f223ceb48c0>',
 5: "And the Grinch, with his Grinch-feet ice cold in the snow, stood puzzling and puzzling, how could it be so? It came without ribbons. It came without tags. It came without packages, boxes or bags. And he puzzled and puzzled 'till his puzzler was sore. Then the Grinch thought of something he hadn't before. What if Christmas, he thought, doesn't come from a store? What if Christmas, perhaps, means a little bit more? #<Author:0x007f223c9ff8c0>",
 6: "The love in your heart wasn't put there to stay. Love
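
Each `alt` text ends with a stray `#<Author:0x...>` marker, which looks like a leftover from the site's templates rather than part of the quote. Below is a small sketch that strips it with a regular expression before saving. The pattern is an assumption based on the handful of samples shown above, and `clean_quote` is a hypothetical helper name.

```python
import re

# Matches a trailing marker such as " #<Author:0x007f2240266998>".
AUTHOR_MARKER = re.compile(r"\s*#<Author:0x[0-9a-fA-F]+>\s*$")

def clean_quote(alt_text):
    """Remove the trailing author marker from a scraped alt text."""
    return AUTHOR_MARKER.sub("", alt_text)

# Two entries copied from the output above.
d = {
    1: "The future belongs to those who believe in the beauty of their dreams. #<Author:0x007f223f1d2fa0>",
    4: "Let us see what love can do. #<Author:0x007f223ceb48c0>",
}

cleaned = {key: clean_quote(text) for key, text in d.items()}
print(cleaned[4])
```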

In [26]:
# import json

In [27]:
# filet=json.dump(d,'filrt')

In [28]:
import os

In [46]:
# Create the folder if it does not exist yet (no error if it already does)
os.makedirs('moti', exist_ok=True)

In [47]:
import json

In [48]:
with open("jst1.json", "w+") as file:
    json.dump(d, file)

In [34]:
# import os

In [35]:
# if not 'moti' in os.listdir():
#     os.mkdir('moti')

In [36]:
for i, article in enumerate(articles):
    img_url = article.find('img')['src']
    r = requests.get(img_url)
    with open("moti/{}.jpg".format(i), "wb") as file:
        file.write(r.content)

In [41]:
articles=soup.find_all('div',attrs={"class":"portfolio-image"})

![](http://www.entropywebscraping.com/wp-content/uploads/2017/02/Screenshot-from-2017-02-01-10-23-00.png)

# 4. Web automation
 
 ![](https://images.contentful.com/qs7jgwzogkzr/6HeUbprAsMYek2Keqi0WYo/d8ad7cf2f15e706ead76e00a53859cc7/testing-automation-alternatives.jpg)
 
 **Task:** Automatically submit the code for a [problem](https://www.codechef.com/problems/TEST) on [codechef](https://www.codechef.com/).
 
 ### [Selenium](http://selenium-python.readthedocs.io/) : Web automation and testing
 
 ![](https://udemy-images.udemy.com/course/750x422/482754_7146_4.jpg)
 
 
 #### Installation
 
 - To install python bindings for selenium:
     ```
     pip install selenium
     ```
     
 - To install webdriver:
 
     http://selenium-python.readthedocs.io/installation.html#drivers
     
     [How to put webdriver in PATH?](https://stackoverflow.com/questions/40208051/selenium-using-python-geckodriver-executable-needs-to-be-in-path)
 
 #### To start a browser session
 ```python
 from selenium import webdriver
 browser = webdriver.Chrome()
 ```
 
 #### To open a webpage
 ```python
 browser.get('https://www.codechef.com')
 ```
 
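 #### To select an element by its id
 ```python
 from selenium.webdriver.common.by import By
 element = browser.find_element(By.ID, "<id>")  # find_element_by_id was removed in Selenium 4
 ```
 
 #### Input value in element
 ```python
 element.send_keys("<text to type>")
 ```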
 
 #### Click on an element
 ```python
 element.click()
 ```

![](https://i.imgflip.com/poxkz.jpg)

## Resources:

- Python packages:

    - [requests](http://docs.python-requests.org/en/master/)

    - [bs4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
    
    - [html5lib](https://html5lib.readthedocs.io/en/latest/)
 

- Articles:

    - https://indianpythonista.wordpress.com/2016/12/10/get-and-post-requests-using-python/

    - https://indianpythonista.wordpress.com/2016/10/18/requests-http-for-pythonistas/

    - https://indianpythonista.wordpress.com/2016/12/10/downloading-files-from-web-using-python/

    - https://indianpythonista.wordpress.com/2016/12/10/implementing-web-scraping-in-python-with-beautiful-soup/


- Videos:

    - File downloader: https://www.youtube.com/watch?v=Xhw2l-hzoKk
    - Web scraping: https://www.youtube.com/watch?v=lIkd_jt28i0&t=557s