# Intro

**API (Application Programming Interface)** - a set of rules and protocols that allows different software applications to communicate, exchange data, and share functionality

![alt text](./imgs/api_arch.jpg)

# REST (Representational State Transfer)

**REST** - an architectural style for web services that simplifies communication between systems by using standard HTTP methods

## HTTP Methods

![alt text](./imgs/http.png)

## Working with APIs

### GET Request

In [None]:
import requests

In [None]:
response = requests.get('https://api.example.com/data')

To pass query parameters, use the `params` argument of `requests.get()`. The dictionary is encoded into the URL's query string, e.g. `https://httpbin.org/get?key1=value1&key2=value2`.

In [None]:
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('https://httpbin.org/get', params=payload)
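To see what `params` does without sending a request, you can build the query string by hand. This is a minimal sketch that uses the standard library's `urllib.parse.urlencode` as a stand-in (it is not a call into `requests`), so it only shows roughly the URL that `requests` would produce for the payload above.

```python
from urllib.parse import urlencode

base = "https://httpbin.org/get"
payload = {"key1": "value1", "key2": "value2"}

# urlencode turns the dict into a "key=value&key=value" string,
# keeping the dictionary's insertion order
query = urlencode(payload)
full_url = f"{base}?{query}"

print(full_url)
```

With a real response object, `r.url` holds the final URL that was actually requested.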

### POST Request

To send data in the request body, such as a login and password, use the `data` argument. A dictionary passed this way is form-encoded.

In [None]:
payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.post('https://api.example.com/data', data=payload)
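To check what the `data` argument puts into the request, you can prepare the request without sending it. This sketch assumes `requests` is installed and reuses the placeholder URL from above. Nothing goes over the network.

```python
import requests

payload = {"key1": "value1", "key2": "value2"}

# Build the request object and prepare it, but do not send it
prepared = requests.Request(
    "POST", "https://api.example.com/data", data=payload
).prepare()

print(prepared.method)                   # HTTP method of the prepared request
print(prepared.headers["Content-Type"])  # how the body is encoded
print(prepared.body)                     # the form-encoded payload
```

The body is a form-encoded string, which is the same format a browser uses when it submits an HTML form.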

In [None]:
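# api_key, base_url and product are assumed to be defined elsewhere
header = {"Authorization": api_key}

# Query parameters for sorting and paging
payload = {
    "sortBy": "DEFAULT",
    "order": "ASC",
    "size": 100,
    "page": 0,
    "filter": "ALL",
}

# Send the header with the request and parse the JSON body
resp = requests.get(base_url + product, headers=header, params=payload).json()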

## Response

### Content

`response.text` - returns the body decoded as text (`str`)<br>
`response.json()` - parses the body as JSON and returns the resulting Python object <br>
`response.content` - returns the raw body as `bytes`
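The three accessors are related: the raw bytes are decoded into text, and the text is parsed into Python objects. The sketch below imitates that chain with the standard library only. The sample body is made up, and UTF-8 is assumed, whereas `requests` works out the encoding from the response itself.

```python
import json

# Stand-in for response.content: raw bytes of a JSON body (sample data)
raw_body = b'{"count": 826, "pages": 42}'

text = raw_body.decode("utf-8")  # analogous to response.text
data = json.loads(text)          # analogous to response.json()

print(type(raw_body).__name__, type(text).__name__, type(data).__name__)
print(data["pages"])
```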

### Status Codes

![alt text](./imgs/status-code.png)

`response.status_code` - returns the HTTP status code of the response
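Status codes fall into five classes according to their first digit. As a small illustration, the helper below (`describe_status` is a hypothetical name, not part of `requests`) uses the standard library's `http.HTTPStatus` to turn a numeric code into a readable label.

```python
from http import HTTPStatus

# First digit of the code -> class of response
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe_status(code: int) -> str:
    """Return a label such as '404 Not Found (client error)'."""
    phrase = HTTPStatus(code).phrase
    return f"{code} {phrase} ({STATUS_CLASSES[code // 100]})"

for code in (200, 301, 404, 500):
    print(describe_status(code))
```

In `requests` itself, `response.ok` is true for codes below 400, and `response.raise_for_status()` raises an exception for 4xx and 5xx responses.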

In [6]:
import requests
base_url = "https://rickandmortyapi.com/api"
character = "/character"

In [8]:
response = requests.get(base_url+character)

In [10]:
type(response)

requests.models.Response

In [12]:
response.status_code

200

In [26]:
response.json()['info']

{'count': 826,
 'pages': 42,
 'next': 'https://rickandmortyapi.com/api/character?page=2',
 'prev': None}
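The `next` field in `info` can be followed to walk through every page. Here is a minimal sketch of that loop. `iter_pages` and `fetch` are hypothetical names, the `results` key is an assumption about the payload (only `info` is shown above), and a small fake API stands in for the real one so the example runs offline.

```python
def iter_pages(first_url, fetch):
    """Yield each page's JSON, following info['next'] until it is None."""
    url = first_url
    while url:
        page = fetch(url)
        yield page
        url = page["info"]["next"]

# Offline stand-in for the API: two tiny fake pages keyed by "URL"
fake_api = {
    "page1": {"info": {"next": "page2"}, "results": [{"name": "A"}, {"name": "B"}]},
    "page2": {"info": {"next": None}, "results": [{"name": "C"}]},
}

names = [
    item["name"]
    for page in iter_pages("page1", fake_api.__getitem__)
    for item in page["results"]
]
print(names)
```

Against the real API you would pass something like `fetch=lambda url: requests.get(url).json()` and start from `base_url + character`.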

# Beautiful Soup

![alt text](./imgs/overview.png)

**Beautiful Soup** is a Python library for pulling data out of HTML and XML files.

|Parser|	Typical usage|	Advantages|	Disadvantages|
|---------|--------------|------------|--------------|
|Python’s html.parser|	`BeautifulSoup(markup, "html.parser")`|<ul><li>Batteries included</li><li> Decent speed</li><li> Lenient (As of Python 2.7.3 and 3.2.)</li></ul>|Not as fast as lxml, less lenient than html5lib.|
|lxml’s HTML parser|	`BeautifulSoup(markup, "lxml")`	|<ul><li>Very fast</li> <li>Lenient</li></ul>| External C dependency|
|lxml’s XML parser|`BeautifulSoup(markup, "lxml-xml")` `BeautifulSoup(markup, "xml")`|<ul><li>Very fast</li><li> The only currently supported XML parser</li></ul> |External C dependency|
|html5lib|	`BeautifulSoup(markup, "html5lib")`	|<ul><li>Extremely lenient</li><li> Parses pages the same way a web browser does</li><li> Creates valid HTML5</li></ul>|<ul><li>Very slow</li><li> External Python dependency</li></ul>|

**Main methods** - `find()` (returns the first match, or `None`) and `find_all()` (returns all matches).
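The difference between the two methods is easiest to see on a tiny document. The HTML below is a hand-written sample, not taken from a real page, and the sketch assumes `beautifulsoup4` is installed.

```python
from bs4 import BeautifulSoup

# Small hand-written sample document
html = """
<div id="page">
  <h3 class="country-name">Andorra</h3>
  <h3 class="country-name">Albania</h3>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("h3", class_="country-name")          # first matching Tag
all_names = soup.find_all("h3", class_="country-name")  # list-like ResultSet
missing = soup.find("h3", class_="no-such-class")       # no match at all

print(first.text)
print(len(all_names))
print(missing)
```

Because `find()` returns `None` when nothing matches, check the result before accessing `.text` on it.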

In [28]:
from bs4 import BeautifulSoup

In [30]:
site = "https://www.scrapethissite.com/pages/simple/"
page = requests.get(site)

In [36]:
soup = BeautifulSoup(page.content, "html.parser")

In [38]:
soup

<!DOCTYPE html>

<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Countries of the World: A Simple Example | Scrape This Site | A public sandbox for learning web scraping</title>
<link href="/static/images/scraper-icon.png" rel="icon" type="image/png"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<meta content="A single page that lists information about all the countries in the world. Good for those just get started with web scraping." name="description"/>
<link crossorigin="anonymous" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css" integrity="sha256-MfvZlkHCEqatNoGiOXveE8FIwMzZg4W85qfrfIFBfYc= sha512-dTfge/zgoMYpP7QbHy4gWMEGsbsdZeCXz7irItjcC3sPUFtf0kuFbDz/ixG7ArTxmDjLXDmezHubeNikyKGVyQ==" rel="stylesheet"/>
<link href="https://fonts.googleapis.com/css?family=Lato:400,700" rel="stylesheet" type="text/css"/>
<link href="/static/css/styles.css" rel="stylesheet" type="text/css"/>
<meta content="noindex" name="robots"/>
<link h

## Find Elements By ID

In [40]:
results = soup.find(id="page")

In [42]:
results

<div id="page">
<section id="countries">
<div class="container">
<div class="row">
<div class="col-md-12">
<h1>
                            Countries of the World: A Simple Example
                            <small>250 items</small>
</h1>
<hr/>
</div>
</div>
<div class="row">
<div class="col-md-12">
<p class="lead">
                            A single page that lists information about all the countries in the world. Good for those just get started with web scraping.
                            Practice looking for patterns in the HTML that will allow you to extract information about each country. Then, build a simple web scraper that makes a request to this page, parses the HTML and prints out each country's name.
                        </p>
<hr/>
</div>
</div>
<div class="row">
<div class="col-md-6">
<p>
<i class="glyphicon glyphicon-education"></i> There are <a href="/lessons/">4 video lessons</a> that show you how to scrape this page.
                        </p>
<hr/>
</div>
<di

In [47]:

print(results.prettify())

<div id="page">
 <section id="hockey">
  <div class="container">
   <div class="row">
    <div class="col-md-12">
     <h1>
      Hockey Teams: Forms, Searching and Pagination
      <small>
       25 items
      </small>
     </h1>
     <hr/>
    </div>
   </div>
   <div class="row">
    <div class="col-md-12">
     <p class="lead">
      Browse through a database of NHL team stats since 1990. Practice building a scraper that handles common website interface components.
                            Take a look at how pagination and search elements change the URL as your browse. Build a web scraper that can conduct searches and paginate through the results.
     </p>
     <hr/>
    </div>
   </div>
   <div class="row">
    <div class="col-md-6">
     <p>
      <i class="glyphicon glyphicon-education">
      </i>
      There are
      <a href="/lessons/">
       8 video lessons
      </a>
      that show you how to scrape this page.
     </p>
     <hr/>
    </div>
    <div class="col-md-6

## Find Elements by HTML Class Name

In [56]:
country_info = results.find_all("div", class_="country-info")
country_name = results.find_all("h3", class_="country-name")

In [50]:
type(country_info)

bs4.element.ResultSet

In [58]:
country_name

[<h3 class="country-name">
 <i class="flag-icon flag-icon-ad"></i>
                             Andorra
                         </h3>,
 <h3 class="country-name">
 <i class="flag-icon flag-icon-ae"></i>
                             United Arab Emirates
                         </h3>,
 <h3 class="country-name">
 <i class="flag-icon flag-icon-af"></i>
                             Afghanistan
                         </h3>,
 <h3 class="country-name">
 <i class="flag-icon flag-icon-ag"></i>
                             Antigua and Barbuda
                         </h3>,
 <h3 class="country-name">
 <i class="flag-icon flag-icon-ai"></i>
                             Anguilla
                         </h3>,
 <h3 class="country-name">
 <i class="flag-icon flag-icon-al"></i>
                             Albania
                         </h3>,
 <h3 class="country-name">
 <i class="flag-icon flag-icon-am"></i>
                             Armenia
                         </h3>,
 <h3 class="countr

In [64]:
for country in country_name:
    print(country.text)



                            Andorra
                        


                            United Arab Emirates
                        


                            Afghanistan
                        


                            Antigua and Barbuda
                        


                            Anguilla
                        


                            Albania
                        


                            Armenia
                        


                            Angola
                        


                            Antarctica
                        


                            Argentina
                        


                            American Samoa
                        


                            Austria
                        


                            Australia
                        


                            Aruba
                        


                            Åland
                        


              

In [54]:
for country in country_info:
    capital = country.find("span", class_="country-capital")
    print(capital.text)

Andorra la Vella
Abu Dhabi
Kabul
St. John's
The Valley
Tirana
Yerevan
Luanda
None
Buenos Aires
Pago Pago
Vienna
Canberra
Oranjestad
Mariehamn
Baku
Sarajevo
Bridgetown
Dhaka
Brussels
Ouagadougou
Sofia
Manama
Bujumbura
Porto-Novo
Gustavia
Hamilton
Bandar Seri Begawan
Sucre
Kralendijk
Brasília
Nassau
Thimphu
None
Gaborone
Minsk
Belmopan
Ottawa
West Island
Kinshasa
Bangui
Brazzaville
Bern
Yamoussoukro
Avarua
Santiago
Yaoundé
Beijing
Bogotá
San José
Havana
Praia
Willemstad
Flying Fish Cove
Nicosia
Prague
Berlin
Djibouti
Copenhagen
Roseau
Santo Domingo
Algiers
Quito
Tallinn
Cairo
Laâyoune / El Aaiún
Asmara
Madrid
Addis Ababa
Helsinki
Suva
Stanley
Palikir
Tórshavn
Paris
Libreville
London
St. George's
Tbilisi
Cayenne
St Peter Port
Accra
Gibraltar
Nuuk
Bathurst
Conakry
Basse-Terre
Malabo
Athens
Grytviken
Guatemala City
Hagåtña
Bissau
Georgetown
Hong Kong
None
Tegucigalpa
Zagreb
Port-au-Prince
Budapest
Jakarta
Dublin
None
Douglas
New Delhi
None
Baghdad
Tehran
Reykjavik
Rome
Saint Helier
Kingston
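The country names printed earlier carry a lot of surrounding whitespace. One way to clean them up, and to pair each name with its capital, is `get_text(strip=True)`. The sketch below runs on a hand-written sample. The wrapping `<div class="country">` is an assumption about the page layout, so adjust the selector to the real markup.

```python
from bs4 import BeautifulSoup

# Hand-written sample that imitates the structure shown above
html = """
<div class="country">
  <h3 class="country-name">
      <i class="flag-icon flag-icon-ad"></i>
      Andorra
  </h3>
  <div class="country-info">
    <span class="country-capital">Andorra la Vella</span>
  </div>
</div>
<div class="country">
  <h3 class="country-name">
      <i class="flag-icon flag-icon-al"></i>
      Albania
  </h3>
  <div class="country-info">
    <span class="country-capital">Tirana</span>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

countries = []
for block in soup.find_all("div", class_="country"):
    # strip=True removes the leading and trailing whitespace around the text
    name = block.find("h3", class_="country-name").get_text(strip=True)
    capital = block.find("span", class_="country-capital").get_text(strip=True)
    countries.append({"name": name, "capital": capital})

print(countries)
```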

# Dynamic Content

In [89]:
base_url = "https://www.scrapethissite.com/pages/ajax-javascript/?ajax=true&year=2013"

In [91]:
page = requests.get(base_url)

In [99]:
page.json()

[{'title': '12 Years a Slave',
  'year': 2013,
  'awards': 3,
  'nominations': 9,
  'best_picture': True},
 {'title': 'Gravity', 'year': 2013, 'awards': 7, 'nominations': 10},
 {'title': 'Dallas Buyers Club', 'year': 2013, 'awards': 3, 'nominations': 6},
 {'title': 'Frozen', 'year': 2013, 'awards': 2, 'nominations': 2},
 {'title': 'The Great Gatsby', 'year': 2013, 'awards': 2, 'nominations': 2},
 {'title': 'Her', 'year': 2013, 'awards': 1, 'nominations': 5},
 {'title': 'Blue Jasmine', 'year': 2013, 'awards': 1, 'nominations': 3},
 {'title': 'Mr Hublot', 'year': 2013, 'awards': 1, 'nominations': 1},
 {'title': 'The Lady in Number 6: Music Saved My Life',
  'year': 2013,
  'awards': 1,
  'nominations': 1},
 {'title': 'Helium', 'year': 2013, 'awards': 1, 'nominations': 1},
 {'title': 'The Great Beauty', 'year': 2013, 'awards': 1, 'nominations': 1},
 {'title': '20 Feet from Stardom',
  'year': 2013,
  'awards': 1,
  'nominations': 1}]
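Since the endpoint already returns a list of dictionaries, no HTML parsing is needed to work with it. Note that `best_picture` appears only on the winning film, so the sketch below reads it with `.get()` instead of indexing. The `films` list is a shortened copy of the output above.

```python
# Shortened copy of the JSON returned for year=2013
films = [
    {"title": "12 Years a Slave", "year": 2013, "awards": 3,
     "nominations": 9, "best_picture": True},
    {"title": "Gravity", "year": 2013, "awards": 7, "nominations": 10},
    {"title": "Her", "year": 2013, "awards": 1, "nominations": 5},
]

# .get() avoids a KeyError for films without the "best_picture" key
best = [film["title"] for film in films if film.get("best_picture", False)]

# The film with the highest number of awards
most_awarded = max(films, key=lambda film: film["awards"])

print(best)
print(most_awarded["title"])
```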

In [92]:
parsed_page = BeautifulSoup(page.content, "html.parser")

In [85]:
print(parsed_page.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Oscar Winning Films: AJAX and Javascript | Scrape This Site | A public sandbox for learning web scraping
  </title>
  <link href="/static/images/scraper-icon.png" rel="icon" type="image/png"/>
  <meta content="width=device-width, initial-scale=1.0" name="viewport"/>
  <meta content="Click through a bunch of great films. Learn how content is added to the page asynchronously with Javascript and how you can scrape it." name="description"/>
  <link crossorigin="anonymous" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css" integrity="sha256-MfvZlkHCEqatNoGiOXveE8FIwMzZg4W85qfrfIFBfYc= sha512-dTfge/zgoMYpP7QbHy4gWMEGsbsdZeCXz7irItjcC3sPUFtf0kuFbDz/ixG7ArTxmDjLXDmezHubeNikyKGVyQ==" rel="stylesheet"/>
  <link href="https://fonts.googleapis.com/css?family=Lato:400,700" rel="stylesheet" type="text/css"/>
  <link href="/static/css/styles.css" rel="stylesheet" type="text/css"/>
  <meta content

# Resources

https://www.youtube.com/@JohnWatsonRooney

https://www.scrapethissite.com/pages/simple/