## Summary

### Requests
The `requests` library is used to send all kinds of HTTP requests. Here is a basic example of a GET request:

    import requests

    # Make a GET request
    response = requests.get('https://www.example.com')

    # Print the status code
    print(response.status_code)

    # Print the content of the response
    print(response.text)

#### Status codes
- Codes from 200 to 299 indicate that the request was successful.
- Codes from 400 to 599 indicate errors (400-499 are client errors, 500-599 are server errors). Examples include:
  - 404 Not Found, returned when the requested page does not exist (for example, it was removed or the URL is mistyped);
  - 403 Forbidden, which indicates the client is not allowed to access a valid URL.

To scrape a website, you will typically make a GET request to the website and then parse the content of the response. To parse the HTML content, you can use libraries such as `BeautifulSoup`.
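The status-code ranges above can be turned into a quick check. `describe_status` below is a hypothetical helper written for illustration, not part of `requests`. In practice, `requests` already offers `response.ok` (true for codes below 400) and `response.raise_for_status()`, which raises an `HTTPError` for 4xx and 5xx responses.

```python
# Hypothetical helper (not part of requests): classify a status code
# using the ranges described in the notes.
def describe_status(code: int) -> str:
    """Return a short label for an HTTP status code."""
    if 200 <= code <= 299:
        return "success"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    return "other"  # e.g. 1xx informational or 3xx redirects


for code in (200, 403, 404, 503):
    print(code, describe_status(code))
```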

### BeautifulSoup
HTML files are text files that can be opened and inspected in a text editor. While we could open one in a notepad application and search for tags and their contents manually, it's much easier to do this with BeautifulSoup!

BeautifulSoup is a Python library for parsing HTML (and XML) documents; the second argument in `BeautifulSoup(response.text, 'html.parser')` selects the underlying parser it uses. As an interesting aside, the name comes from "tag soup", a term for the unstructured, hard-to-parse HTML found on many websites.

BeautifulSoup's `.find()` and `.find_all()` methods are immensely useful for finding tags. `.find()` returns the first tag that matches (or `None` if nothing matches), while `.find_all()` returns a list of all tags that match the conditions. We can then extract a tag's text using `.text`.

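    import requests
    from bs4 import BeautifulSoup

    # Fetch the page (the same GET request as above)
    response = requests.get('https://www.example.com')

    # Parse the HTML content of the page with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the first <h1> tag on the page (None if the page has none)
    h1_tag = soup.find('h1')

    # Print the text of the first <h1> tag
    if h1_tag is not None:
        print(h1_tag.text)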
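The example above needs a live web page. The sketch below runs offline on a hard-coded HTML snippet (made up for illustration) to contrast `.find()` with `.find_all()`; it assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML snippet so the example runs without a network request
html = """
<div>
  <h1>Data Wrangling</h1>
  <p>First paragraph.</p>
  <p>Second paragraph with a <a href="https://www.example.com">link</a>.</p>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# .find() returns only the first matching tag
first_p = soup.find('p')
print(first_p.text)

# .find() returns None when nothing matches
print(soup.find('table'))

# .find_all() returns a list of every matching tag
all_p = soup.find_all('p')
print(len(all_p))

# Conditions can include attributes: keep only <a> tags that have an href
links = [a['href'] for a in soup.find_all('a', href=True)]
print(links)
```

Because `.find_all()` always returns a list (possibly empty), it is safe to loop over even when nothing matches, whereas the result of `.find()` should be checked for `None` before using `.text`.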

### Additional Reading: Hypertext Markup Language
The Hypertext Markup Language (or HTML) is used to create World Wide Web documents.

An HTML file is a collection of tags. Here's a recap of the HTML tags we covered:

- `<div>`: The div element is used to group content together into a block.

- `<h1>`, `<h2>`, `<h3>`, `<h4>`: Heading elements are used for section headings.

- `<img>`: The image element is used to embed an image in a web page.

- `<p>`: The paragraph element is used for standard blocks of text.

- `<span>`: The span element is used to group text within another block of text, often for styling.

- `<a>`: The anchor (hyperlink) element is used to link from one page to another; the target URL goes in its `href` attribute.

As you write HTML code, the hierarchy you create can grow as wide or deep as you need.
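To make that hierarchy visible, the sketch below walks a small, made-up page built from the tags above and prints each start tag indented by how deeply it is nested. It uses Python's built-in `html.parser.HTMLParser` rather than BeautifulSoup so the depth tracking is explicit; the `TreePrinter` class, the page content, and the `chart.png` file name are hypothetical.

```python
from html.parser import HTMLParser

# Void elements such as <img> have no closing tag, so they don't add depth
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class TreePrinter(HTMLParser):
    """Print each start tag indented by its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.tags = []  # (depth, tag) pairs, kept for inspection

    def handle_starttag(self, tag, attrs):
        self.tags.append((self.depth, tag))
        print("  " * self.depth + f"<{tag}>")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1


page = """
<div>
  <h1>Web Scraping</h1>
  <div>
    <p>Some text with a <span>highlighted</span> word.</p>
    <img src="chart.png" alt="A chart">
    <a href="https://www.example.com">Read more</a>
  </div>
</div>
"""

parser = TreePrinter()
parser.feed(page)
parser.close()
```

Nesting a `<div>` inside another `<div>` makes the tree deeper, while adding more siblings (such as the `<p>`, `<img>`, and `<a>` inside the inner `<div>`) makes it wider.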

#### Comparing file formats
Recall that HTML is used for building websites, whereas JSON is used for communicating information. Both JSON and HTML files allow elements to be nested inside other elements. JSON is better suited to complex data, while HTML's structure is simpler by comparison. Even so, HTML has a much more complicated structure than flat files such as CSV, which are limited to rows and columns.
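To make the contrast with flat files concrete, here is a small sketch using Python's standard `json` and `csv` modules. The record and its field names are made up for illustration: the JSON version nests a list of objects inside an object, while the CSV version has to flatten that nesting into rows.

```python
import csv
import io
import json

# A made-up record: JSON can nest a list of objects inside an object
record_json = """
{
  "name": "Annie",
  "courses": [
    {"title": "Data Wrangling", "score": 92},
    {"title": "Web Scraping", "score": 88}
  ]
}
"""
record = json.loads(record_json)

# Nested access: object -> list -> object
print(record["courses"][1]["title"])

# A flat CSV file only has rows and columns, so the outer "name"
# field is repeated on every row to flatten the nested list
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "title", "score"])
for course in record["courses"]:
    writer.writerow([record["name"], course["title"], course["score"]])

print(buffer.getvalue())
```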

### XML files
At some point, you may also encounter XML files. XML stands for Extensible Markup Language; it is a popular format for communicating information and an alternative to JSON.

Note that, unlike HTML and like JSON, XML isn’t used to compose a website’s structure. What XML does share with HTML is its use of tags, with information nested between opening and closing tags.

Example of an XML file:

    <note>
        <to>Barry</to>
        <from>Annie</from>
        <heading>Data Rocks!</heading>
        <body>I'm learning data wrangling - I love it!</body>
    </note>
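The `<note>` document can be read with Python's built-in `xml.etree.ElementTree` module. This sketch parses it, looks up a single element, and then re-expresses the same information as a JSON object, to show how closely the two formats can map onto each other for simple, flat data like this.

```python
import json
import xml.etree.ElementTree as ET

# The <note> document from the notes, as a string
xml_text = """<note>
  <to>Barry</to>
  <from>Annie</from>
  <heading>Data Rocks!</heading>
  <body>I'm learning data wrangling - I love it!</body>
</note>"""

# Parse the string; the returned root element is <note>
root = ET.fromstring(xml_text)
print(root.tag)

# Element.find() returns the first child element with a matching tag
print(root.find("to").text)

# The same information as a JSON object: one key per child element
note = {child.tag: child.text for child in root}
print(json.dumps(note, indent=2))
```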

[DEMO WORKSPACE](./5.4d.ipynb)
