# Python for web

![](http://www.blog.skytopper.com/wp-content/uploads/2015/06/Global-computer-network.jpg)

This bootcamp is all about interacting with the **web** using the Python programming language.

In this bootcamp, we will learn:

- to work with web APIs
- to download content from the web
- web scraping
- web crawling
- web automation

using simple Python scripts!

# 1. Working with web APIs

- **What is an API?**<br>
    An API (Application Programming Interface) is a set of routines, protocols, and tools for building software applications. An API specifies how software components should interact.
-------------

- **What is a web API?**<br>
    A web API is an API that is exposed over HTTP, so it can be consumed by a broad range of clients, including browsers, mobile apps and tablets.

-----------------
- **Some examples of public web APIs:**
    - [Facebook Graph API](https://developers.facebook.com/docs/graph-api)
    - [Twitter API](https://dev.twitter.com/rest/public)
    - [Google API explorer](https://developers.google.com/apis-explorer/#p/)
--------------

- **What is REST?**<br>
    REST (Representational State Transfer) is an architectural style for web services in which requesting systems access and manipulate web resources through a uniform, predefined set of **stateless operations**.
    
    >In computing, a stateless protocol is a communications protocol in which no information is retained by either sender or receiver. The sender transmits a packet to the receiver and does not expect an acknowledgment of receipt. There is nothing saved that has to be remembered by the next transaction. The server must be able to completely understand the client request without using any server context or server session state. 
    
   Advantages of REST:
   - Because the transactions are stateless and no sessions are involved, each request can be directed to any instance of the web service. The web service can therefore scale to accommodate load changes.
   - Binding to a service through an API is a matter of controlling how the URL is decoded.
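
To make the first advantage concrete, here is a minimal sketch of a stateless handler. The `handle_request` function and the `ITEMS` data are hypothetical and do not belong to any real web service. The point is only that the response depends on nothing but the request itself, so any instance of the service can answer it.

```python
# Hypothetical read-only resource store shared by every service instance.
ITEMS = {1: "keyboard", 2: "mouse"}


def handle_request(method, path):
    """Return (status_code, body) using only the data carried by the request."""
    if method != "GET":
        return 405, "method not allowed"
    prefix = "/items/"
    if not path.startswith(prefix):
        return 404, "not found"
    try:
        item_id = int(path[len(prefix):])
    except ValueError:
        return 400, "bad request"
    if item_id not in ITEMS:
        return 404, "not found"
    return 200, ITEMS[item_id]


# Two "instances" of the service are just two references to the same
# stateless function: neither remembers earlier requests.
instance_a = handle_request
instance_b = handle_request

print(instance_a("GET", "/items/1"))
print(instance_b("GET", "/items/1"))  # same answer from the other instance
```

Because no session is stored between calls, a load balancer could send each request to either instance without changing the result.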

-----------------
- **Types of HTTP requests**
    - GET
    - POST
    - DELETE
    - PUT
    - PATCH, etc.
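
As a rough guide, GET retrieves a resource, POST submits new data, PUT replaces a resource, PATCH updates part of it, and DELETE removes it. The sketch below only builds request objects for each method and sends nothing over the network. It uses the standard library's `urllib.request.Request` so that it runs without extra installs, and the URL is a placeholder. Requests offers a matching helper for each method (`requests.get`, `requests.post`, `requests.put`, `requests.patch`, `requests.delete`).

```python
from urllib.request import Request

url = "https://example.com/api/items/1"  # placeholder URL, nothing is sent

# Build (but do not send) one request object per HTTP method.
methods = ["GET", "POST", "PUT", "PATCH", "DELETE"]
prepared = [Request(url, method=m) for m in methods]

for req in prepared:
    print(req.get_method(), req.full_url)
```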

### HTTP for humans: [requests](http://docs.python-requests.org/en/master/)

<img src="http://docs.python-requests.org/en/master/_static/requests-sidebar.png"  height=200 width=200>


- Requests is one of the most downloaded Python packages of all time, pulling in over 7,000,000 downloads every month. All the cool kids are doing it.

- Recreational use of other HTTP libraries may result in dangerous side-effects, including: security vulnerabilities, verbose code, reinventing the wheel, constantly reading documentation, depression, headaches, or even death. Requests is the only Non-GMO HTTP library for Python, safe for human consumption.

- Python HTTP: When in doubt, or when not in doubt, use Requests. Beautiful, simple, Pythonic.

***Everybody loves it!***

#### Installation

```
pip install requests
```

## GET request

### Example 1
![](http://graph.facebook.com/4/picture?type=large)

![](http://i.imgur.com/gRvt4lV.png)

In [116]:
import requests

url = "http://graph.facebook.com/4/picture?type=large"

r = requests.get(url)

In [117]:
r.status_code

200

In [118]:
with open("mark.jpg", "wb") as f:
    f.write(r.content)
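
The status code checked in Example 1 tells you whether the request succeeded before you write `r.content` to disk. As a small illustration, the hypothetical helper `describe_status` below uses the standard library's `http.HTTPStatus` to turn a code into a readable description. With Requests, calling `r.raise_for_status()` is a common way to stop early on 4xx and 5xx responses.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    status = HTTPStatus(code)
    outcome = "success" if 200 <= code < 300 else "not a success"
    return f"{code} {status.phrase} ({outcome})"


print(describe_status(200))
print(describe_status(404))
```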

### Example 2

[Google maps geocoding API](https://developers.google.com/maps/documentation/geocoding/intro)

In [10]:
url = "https://maps.googleapis.com/maps/api/geocode/json"

In [21]:
params = {
    "address": "Delhi Technological University"
}

r = requests.get(url, params=params)

In [23]:
data  = r.json()

In [42]:
data['results'][0]['geometry']['location']['lat']

28.7500749
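
The `params` dictionary passed to `requests.get` ends up in the query string of the URL. The sketch below approximates that encoding with the standard library's `urlencode`, without sending a request. It is an illustration rather than a copy of what Requests does internally; inspect `r.url` to see the exact URL that was requested.

```python
from urllib.parse import urlencode

base_url = "https://maps.googleapis.com/maps/api/geocode/json"
params = {"address": "Delhi Technological University"}

# urlencode escapes the values and joins them as key=value pairs.
query_string = urlencode(params)
full_url = f"{base_url}?{query_string}"

print(full_url)
```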

In [43]:
from pprint import pprint

In [46]:
pprint(data)

{'results': [{'address_components': [{'long_name': 'Delhi',
                                      'short_name': 'Delhi',
                                      'types': ['locality', 'political']},
                                     {'long_name': 'Delhi',
                                      'short_name': 'DL',
                                      'types': ['administrative_area_level_1',
                                                'political']},
                                     {'long_name': 'India',
                                      'short_name': 'IN',
                                      'types': ['country', 'political']},
                                     {'long_name': '110042',
                                      'short_name': '110042',
                                      'types': ['postal_code']}],
              'formatted_address': 'Shahbad Daulatpur, Main Bawana Road, '
                                   'Delhi, 110042, India',
              'geometry': {'l
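
Indexing straight into `data['results'][0]` raises an `IndexError` when the service returns no results. The sketch below shows one defensive way to read the latitude. The two sample dictionaries are hand-written and contain only the fields used here, and `first_latitude` is a hypothetical helper. The Geocoding API reports the outcome in a top-level `status` field (for example `OK` or `ZERO_RESULTS`), and the service may also reject requests that lack an API key.

```python
import json

# Hand-written samples shaped like geocoding responses (trimmed to the
# fields used below).
sample_ok = json.loads('''
{"status": "OK",
 "results": [{"geometry": {"location": {"lat": 28.7500749}}}]}
''')
sample_empty = json.loads('{"status": "ZERO_RESULTS", "results": []}')


def first_latitude(data):
    """Return the latitude of the first result, or None if there is none."""
    if data.get("status") != "OK" or not data.get("results"):
        return None
    return data["results"][0]["geometry"]["location"]["lat"]


print(first_latitude(sample_ok))
print(first_latitude(sample_empty))
```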

## POST request

![](https://www.safaribooksonline.com/library/view/head-first-servlets/9780596516680/httpatomoreillycomsourceoreillyimages1377910.png.jpg)

### Example 1

[Pastebin API](https://pastebin.com/api)

In [47]:
key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

url = "https://pastebin.com/api/api_post.php"

In [76]:
code = '''
a = int(raw_input())
print(a**a)
'''

In [49]:
data = {
    "api_dev_key": key,
    "api_option": 'paste',
    "api_paste_code": code
}

In [50]:
r = requests.post(url, data=data)

In [51]:
r.status_code

200

In [52]:
r.content

b'https://pastebin.com/JdGM4nbE'
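
Unlike the `params` of a GET request, the `data` dictionary of a POST request travels in the request body. The sketch below approximates the form encoding with the standard library and sends nothing; the key and the pasted code are placeholders. It also shows why `r.content` printed with a `b` prefix: response bodies are bytes, and decoding them gives a string (Requests exposes the decoded form as `r.text`).

```python
from urllib.parse import urlencode, parse_qs

data = {
    "api_dev_key": "xxxx",  # placeholder, not a real key
    "api_option": "paste",
    "api_paste_code": "print('hello')",
}

# A form-encoded POST carries the fields in the request body, not in the URL.
body = urlencode(data).encode("ascii")
print(body)

# The receiving side can decode the same bytes back into fields.
decoded = parse_qs(body.decode("ascii"))
print(decoded["api_option"])

# Response bodies arrive as bytes too; decode them to get a str.
response_bytes = b"https://pastebin.com/JdGM4nbE"
print(response_bytes.decode("utf-8"))
```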

### Example 2

https://m.me/mycodebot

[HackerRank code checker API](https://www.hackerrank.com/api/docs)

In [67]:
key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

In [68]:
url = "http://api.hackerrank.com/checker/submission.json"

In [85]:
code = '''
a = int(raw_input())
print(a**a)
'''

In [77]:
data = {
    'format': 'json',
    'source': code,
    'lang': 5,
    'testcases':'["2","3"]',
    'api_key': key
}

In [78]:
r = requests.post(url, data=data)

In [79]:
r.status_code

200

In [84]:
r.json()['result']['stdout']

['4\n', '27\n']
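
In the request above, the `testcases` field is a JSON-encoded string rather than a Python list. The sketch below builds such a string with `json.dumps` instead of typing it by hand, and computes locally what the submitted program (`print(a**a)`) should print for each input. It runs offline and does not call the API.

```python
import json

testcases = ["2", "3"]

# Encode the list of inputs as a single JSON string for the form field.
testcases_field = json.dumps(testcases)
print(testcases_field)

# What the submitted program (print(a**a)) should output for each input.
expected_stdout = [f"{int(t) ** int(t)}\n" for t in testcases]
print(expected_stdout)
```

The expected values match the `stdout` list returned by the checker in the example.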

# 2. Downloading files

![](http://noclone.net/images/file-types-s.png)

Downloading large files in chunks!

In [93]:
import math

from tqdm import tqdm

url = "http://www.tutorialspoint.com/python/python_tutorial.pdf"

chunk_size = 1024

# stream=True defers downloading the body until it is iterated over
r = requests.get(url, stream=True)

# Number of chunks, rounded up so that the final partial chunk is counted
iterations = math.ceil(int(r.headers['content-length']) / chunk_size)

with open("python.pdf", "wb") as f:
    for chunk in tqdm(r.iter_content(chunk_size=chunk_size),
                      total=iterations):
        f.write(chunk)

2717it [00:01, 1784.23it/s]                                     
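
The same chunking idea can be tried without a network connection. The sketch below copies an in-memory stream in fixed-size chunks and checks the chunk count against the rounded-up estimate. The `copy_in_chunks` helper and the 2500-byte payload are hypothetical stand-ins for a real download.

```python
import io
import math


def copy_in_chunks(source, destination, chunk_size=1024):
    """Copy a binary stream chunk by chunk; return the number of chunks written."""
    chunks = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        destination.write(chunk)
        chunks += 1
    return chunks


payload = b"x" * 2500  # stand-in for a downloaded file
source = io.BytesIO(payload)
destination = io.BytesIO()

written = copy_in_chunks(source, destination, chunk_size=1024)
print(written, math.ceil(len(payload) / 1024))
```

Only one chunk is held in memory at a time, which is what makes this approach suitable for large files.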


# 3. Web scraping

![](https://cdn-images-1.medium.com/max/1600/0*yxxFwUEPQU3lAz4W.png)

## [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

>Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

### Installation

```
pip install beautifulsoup4
```

**The `html5lib` parser, which the example below uses:**
```
pip install html5lib
```

In [97]:
from bs4 import BeautifulSoup

url = "https://www.values.com/inspirational-quotes"

r = requests.get(url)

soup = BeautifulSoup(r.content, 'html5lib')

In [100]:
quote_html = soup.find_all('div', attrs={'class': 'quote'})

In [103]:
quote = quote_html[0]

In [104]:
quote

<div class="quote" data-url="/inspirational-quotes/7673-education-breeds-confidence">

	<a href="/inspirational-quotes/7673-education-breeds-confidence"><img alt="Discuss and promote the importance of education to the youth in your community. #education #passiton www.values.com" src="https://quotes.values.com/quote_artwork/7673/medium/20170811_friday_quote.jpg?1501716093"/></a>

	<h5>EDUCATION</h5>

	<h6><a href="/inspirational-quotes/7673-education-breeds-confidence">Education breeds confidence.</a></h6>

	<p>Confucius </p>

</div>

In [110]:
quotes = []

for data in quote_html:
    quote = {}
    quote['link'] = data.a['href']
    quote['text'] = data.h6.a.text
    quote['author'] = data.p.text.strip()
    quote['img'] = data.a.img['src']
    
    quotes.append(quote)
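
The `link` values collected above are relative paths such as `/inspirational-quotes/7673-education-breeds-confidence`. If absolute URLs are needed, one option is `urllib.parse.urljoin`, sketched here with the page URL and the link from the sample output.

```python
from urllib.parse import urljoin

page_url = "https://www.values.com/inspirational-quotes"
relative_link = "/inspirational-quotes/7673-education-breeds-confidence"

# Resolve the relative link against the URL of the page it was found on.
absolute_link = urljoin(page_url, relative_link)
print(absolute_link)
```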
    

In [115]:
import csv

with open("quotes.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames = ['link', 'text', 'author','img'])
    writer.writeheader()
    
    for quote in quotes:
        writer.writerow(quote)
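
To check the CSV layout without touching the disk, the same `DictWriter` calls can be pointed at an in-memory buffer and read back with `DictReader`. This is a small sketch that uses the single quote shown in the sample output as its data.

```python
import csv
import io

quotes = [
    {"link": "/inspirational-quotes/7673-education-breeds-confidence",
     "text": "Education breeds confidence.",
     "author": "Confucius",
     "img": "https://quotes.values.com/quote_artwork/7673/medium/20170811_friday_quote.jpg?1501716093"},
]

# newline="" lets the csv module control line endings itself.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["link", "text", "author", "img"])
writer.writeheader()
writer.writerows(quotes)

# Rewind and read the rows back as dictionaries keyed by the header.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0]["author"], "-", rows[0]["text"])
```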

# 4. Web crawling

![](http://seopressor.com/wp-content/uploads/2016/04/how-crawler-works.png)

**Task:** Make a list of all public URLs on [Indian Pythonista](https://indianpythonista.wordpress.com)

### [Scrapy](https://scrapy.org/): a fast high-level web crawling & scraping framework for Python

![](https://scrapinghub.files.wordpress.com/2016/08/scrapy.png)

```
pip install scrapy
```

Start a new project:
```
scrapy startproject <project-name>
```

# 5. Web automation


**Task:** Automatically submit the code for a problem on [CodeChef](https://www.codechef.com/).

### [Selenium](http://selenium-python.readthedocs.io/): Web automation and testing

![](https://udemy-images.udemy.com/course/750x422/482754_7146_4.jpg)


#### Installation

```
pip install selenium
```

#### To start a browser session
```python
from selenium import webdriver
browser = webdriver.Firefox()
```

#### To open a webpage
```python
browser.get('https://www.codechef.com')
```

#### To select an element by its id
```python
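from selenium.webdriver.common.by import By
element = browser.find_element(By.ID, "<id>")  # "<id>" stands for the element's id attribute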
```

#### To enter a value in an element
```python
element.send_keys("text to type")  # placeholder text
```

#### Click on an element
```python
element.click()
```