# Fetching Web Page Content

Now that we have a basic understanding of web scraping and the composition of a web page, let's proceed to obtaining the HTML content of a web page. We will use the `requests` library for this task.

## Required Libraries

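Before we proceed, make sure you have installed all the libraries listed in the `requirements.txt` file. If you have not, you can install them by running the following command in a notebook cell (the leading `!` tells Jupyter to run the line as a shell command; drop it when running in a terminal):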

```bash
!pip install -r requirements.txt
```

From here, I will assume that you have installed all the necessary libraries. If you haven't, please do so before proceeding.

## What is the `requests` library?

`requests` is a third-party Python library for sending HTTP requests. It is widely used for obtaining web page content because its interface is small and readable. You can find more information in the [official documentation](https://docs.python-requests.org/en/latest/).

## Basic Usage

With the `requests` library, you send an HTTP request to a web server and get back a response object. For a web page, the body of that response is the page's HTML. Here is a simple example of how to use the `requests` library to obtain the HTML content of a web page:

In [1]:
import requests

link = "https://bicol-u.edu.ph/"
response = requests.get(link)
response

<Response [200]>

Let's analyze the code above:

1. We first import the `requests` library using the `import requests` statement.

2. We then define the `link` variable, which contains the URL of the web page we want to scrape. In this case, it's the website of [Bicol University](https://bicol-u.edu.ph/).

3. The `requests.get()` function is called with the `link` variable as an argument. This function sends an HTTP GET request to the web server and returns the response. The response is then stored in the `response` variable.

4. Finally, we display the `response` variable to see the result. Because we are using a Jupyter notebook, there is no need to call `print()`: the notebook automatically shows the value of the last expression in a cell.
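The number in `<Response [200]>` is the HTTP status code of the response; on a live response object the same number is available as `response.status_code`. As a small aside, the sketch below uses Python's standard `http` module to translate a few common codes into their standard phrases. The helper name `status_phrase` is only an illustration and is not part of `requests`.

```python
from http import HTTPStatus


def status_phrase(code):
    """Return the standard reason phrase for an HTTP status code."""
    return HTTPStatus(code).phrase


# 200 means success, 404 means the page was not found,
# and 500 means the server failed while handling the request.
for code in (200, 404, 500):
    print(code, status_phrase(code))
```

A status code in the 200 range means the request succeeded, so the `200` we received tells us the page was fetched without problems.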

## Response Content

To see the HTML content of the web page, we can access the `text` attribute of the `response` object, which holds the response body decoded as a string. Since the HTML is quite long, we will only display its first 1000 characters.

In [2]:
response.text[:1000]

'<!DOCTYPE html>\n<html lang="en-US">\n    <head>\n        <meta charset="UTF-8">\n        <meta name="viewport" content="width=device-width, initial-scale=1">\n        <link rel="icon" href="/wp-content/uploads/2022/11/Bicol_University-1.png" sizes="any">\n                <link rel="apple-touch-icon" href="/wp-content/themes/yootheme/packages/theme-wordpress/assets/images/apple-touch-icon.png">\n                <title>Official Website of Bicol University</title>\n<meta name=\'robots\' content=\'max-image-preview:large\' />\n<link rel=\'dns-prefetch\' href=\'//translate.google.com\' />\n<link rel="alternate" type="application/rss+xml" title="Official Website of Bicol University &raquo; Feed" href="https://bicol-u.edu.ph/feed/" />\n<link rel="alternate" type="application/rss+xml" title="Official Website of Bicol University &raquo; Comments Feed" href="https://bicol-u.edu.ph/comments/feed/" />\n<script type="text/javascript">\n/* <![CDATA[ */\nwindow._wpemojiSettings = {"baseUrl":"https:
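In practice, it can help to wrap these two steps (sending the request and reading `text`) in a small function. The following is a minimal sketch, not part of the original notebook: the function name `fetch_html` and the 10-second default timeout are assumptions chosen for illustration. It relies only on `requests.get()`, its `timeout` argument, and `Response.raise_for_status()`.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML of `url` as a string, or raise if the request fails."""
    # The timeout (in seconds) keeps the call from waiting
    # indefinitely on a server that does not respond.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx and 5xx status codes
    # instead of silently returning an error page.
    response.raise_for_status()
    return response.text
```

With this helper, `fetch_html("https://bicol-u.edu.ph/")[:1000]` would give the same kind of preview as the cell above, assuming the site is reachable.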

<h3 style="color: darkorange">Note: Persistent Variables in Jupyter Notebook</h3>

Have you noticed that we can still access the `response` variable even though we did not define it in this cell? This is because **Jupyter notebook is stateful**: variables defined in one cell can be accessed in any other cell, provided the defining cell has already been run.

This is a very useful feature of Jupyter notebook, but it can also lead to confusion. To avoid errors, always run the cells in order, from top to bottom.

## Query Parameters

Before jumping into the code, let's review some basic concepts. When you visit a website, you often see a URL with a `?` followed by one or more `key=value` pairs, separated by `&`. These are called query parameters, and they are used to send data to the server.

For example, the URL `https://www.example.com/search?q=python` contains a query parameter `q` with the value `python`. This URL is used to search for the term `python` on the website.
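To make the structure of such a URL concrete, here is a short sketch that takes the example URL apart using Python's standard `urllib.parse` module. It does not send any request; it only splits the string.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://www.example.com/search?q=python"

# Split the URL into its components (scheme, host, path, query, ...).
parts = urlsplit(url)
print(parts.path)   # /search
print(parts.query)  # q=python

# Turn the query string into a dictionary. Each value is a list
# because a key may appear more than once in a URL.
query = parse_qs(parts.query)
print(query)        # {'q': ['python']}
```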

You can also use query parameters when sending HTTP requests using the `requests` library. You can pass the query parameters as a dictionary to the `params` parameter of the `requests.get()` function. Here is an example:

In [3]:
base_link = "https://bicol-u.edu.ph/"
search_term = "computer science"
params = {"s": search_term}

response = requests.get(base_link, params=params)
response.url

'https://bicol-u.edu.ph/?s=computer+science'

Time to break down the code:

1. We define the `base_link` variable, which contains the URL of the website we want to scrape. We also define the `search_term` variable, which contains the search term we want to use.

2. We define the `params` variable, a dictionary containing the query parameters we want to send to the server. Here the key `s` is the parameter name and its value is the search term.

3. We call the `requests.get()` function with the `base_link` and `params` variables as arguments. This function sends an HTTP GET request to the web server with the query parameters and returns the response. The response is then stored in the `response` variable.

4. Finally, we display the `response.url` attribute to see the URL that was actually requested. It contains the query parameters, with the space in the search term encoded as `+`.

<p style="color: darkorange">Note: If you're wondering why we did not import the <code>requests</code> library again, it is because we already imported it in the first cell. Since Jupyter notebook is stateful, we can keep using <code>requests</code> without importing it again.</p>
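If you want to check how `requests` will encode your parameters before contacting the server, one option is to build the request without sending it. The sketch below uses `requests.Request(...).prepare()` for that purpose. The URL reuses the earlier `example.com` search example, and `page` is a hypothetical second parameter added only to show how several pairs are joined.

```python
import requests

base_link = "https://www.example.com/search"
# "page" is a hypothetical parameter, included only to show
# that multiple key-value pairs are joined with "&".
params = {"q": "python", "page": 2}

# Build the request without sending it, then read the final URL.
prepared = requests.Request("GET", base_link, params=params).prepare()
print(prepared.url)
```

Since nothing is sent over the network, this is a cheap way to confirm the final URL looks the way you expect.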