What is Web Scraping?

Web scraping (also called scraping, data extraction, or data harvesting) is a technique for extracting data from websites. It can be very useful, because you can get the data you are looking for straight from the web. It can also be a bad way to get data, because scraping a website without permission is like taking its data without asking. If you do scrape, limit it to a run or two so that you stay out of trouble.
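
One common convention for signalling what a site permits is its robots.txt file. The sketch below checks a URL against robots.txt rules using Python's standard urllib.robotparser module. The rules, the "my-scraper" user agent, the example.com URLs, and the is_allowed helper are all hypothetical and are not taken from any real site; a real check would read the site's own robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real site publishes its own rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The first URL falls under the Disallow rule, the second does not.
print(is_allowed(ROBOTS_TXT, "my-scraper", "https://example.com/private/data"))
print(is_allowed(ROBOTS_TXT, "my-scraper", "https://example.com/public/page"))
```

A robots.txt check is only one signal; a site's terms of use may add further restrictions.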

The most useful libraries for web scraping are:
   1. Beautiful Soup
   2. Requests

# Step 1: Importing the required libraries

In [1]:
import requests
from bs4 import BeautifulSoup

# Step 2: Getting the URL and storing it in a variable

Let us store the URL of the professor's page in a variable named “url”. The page is on the “Rate My Professors” website.

In [2]:
url = 'https://www.ratemyprofessors.com/ShowRatings.jsp?tid=941931'

# Step 3: Making a request to the website using the requests library.

Here we call requests.get, passing “url” as the parameter. Be careful not to run this cell many times. If the output is <Response [200]>, the request succeeded. Any other status code means the request did not succeed, for example because the URL is wrong or the website refused the request.

In [3]:
page = requests.get(url)

In [4]:
page

<Response [200]>
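
To make the meaning of the number in the response concrete, here is a minimal sketch that turns a status code into a readable message using the standard library's http.HTTPStatus. The describe_status helper is a hypothetical name introduced for illustration. In the notebook above, the code to pass in would be page.status_code.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short human-readable summary for an HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        return f"{code}: not a standard HTTP status code"
    # Codes in the 200-299 range mean the request succeeded.
    outcome = "success" if 200 <= code < 300 else "not a success"
    return f"{code} {status.phrase} ({outcome})"


print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (not a success)
```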

# Step 4: Using the Beautiful Soup library to get the HTML (raw) data from the website.

Here we create a BeautifulSoup object by passing page.text as the parameter and choosing the HTML parser. You can try to print the soup, but the output is just a huge chunk of raw HTML rather than the data we are looking for, so it is not shown here.

In [5]:
soup = BeautifulSoup(page.text, "html.parser")
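
To see what a soup object holds without printing a full webpage, the sketch below parses a tiny HTML string instead. The sample_html string is a hypothetical stand-in for page.text, so no request is made.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.text, so no network request is needed.
sample_html = (
    "<html><head><title>Demo page</title></head>"
    "<body><p>Hello</p></body></html>"
)

sample_soup = BeautifulSoup(sample_html, "html.parser")

print(sample_soup.prettify())    # indented view of the whole parsed document
print(sample_soup.title.string)  # text inside the <title> tag
print(sample_soup.p.get_text())  # text inside the first <p> tag
```

On a real page the prettify output runs to thousands of lines, which is why the next step narrows the search to specific tags.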

# Step 5: Using the soup.findAll method to get the tags that we are looking for.

This is where you specify the tags that you are looking for. To find a tag's name, right-click the element on the webpage and choose Inspect, or press Ctrl-Shift-I. The browser's developer tools then open on the right-hand side with the selected tag highlighted.

You can then copy the HTML tag and its class, if any, and pass them to the soup.findAll method. In this case, the HTML tag is “span” and the class is “Tag-bs9vf4-0”.

In [6]:
proftags = soup.findAll("span", {"class": "Tag-bs9vf4-0"})
proftags

[<span class="Tag-bs9vf4-0 hHOVKF">Tough grader</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Lots of homework</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Skip class? You won't pass.</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Beware of pop quizzes</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Caring</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Graded by few things</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Tough grader</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Skip class? You won't pass.</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Test heavy</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Tough grader</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Respected</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Skip class? You won't pass.</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Caring</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Respected</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Hilarious</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Amazing lectures</span>,
 <span class="Tag-bs9vf4-0 hHOVKF">Respected</span>,
 ...
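
Notice that each span carries two classes, while the search names only one of them. The sketch below, run on a small hypothetical HTML snippet modeled on the output above, illustrates that Beautiful Soup matches a tag if any one of its classes equals the search value. It also shows find_all, the current spelling of findAll, with the class_ keyword as an alternative to the dictionary form. The "other-class" span is made up to show a non-match.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modeled on the output above; "other-class" is made up.
sample_html = """
<div>
  <span class="Tag-bs9vf4-0 hHOVKF">Tough grader</span>
  <span class="Tag-bs9vf4-0 hHOVKF">Caring</span>
  <span class="other-class">Not a professor tag</span>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Two equivalent ways to filter by class.
by_dict = sample_soup.find_all("span", {"class": "Tag-bs9vf4-0"})
by_keyword = sample_soup.find_all("span", class_="Tag-bs9vf4-0")

print(len(by_dict), len(by_keyword))
# Each matched tag still lists both of its classes.
print([tag["class"] for tag in by_dict])
```

Class names such as "Tag-bs9vf4-0" look auto-generated, so they may change when the site is updated. If the search returns an empty list, inspect the page again for the current class name.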

# Step 6: Removing all the HTML tags and converting the result to plain text.

Here we strip the HTML tags and keep only the text. This is done by calling the get_text method on each tag inside a for loop.

In [7]:
for mytag in proftags:
    print(mytag.get_text())

Tough grader
Lots of homework
Skip class? You won't pass.
Beware of pop quizzes
Caring
Graded by few things
Tough grader
Skip class? You won't pass.
Test heavy
Tough grader
Respected
Skip class? You won't pass.
Caring
Respected
Hilarious
Amazing lectures
Respected
TEST HEAVY
Amazing lectures
Inspirational
Hilarious
Caring
Tough Grader
Skip class? You won't pass.
LOTS OF HOMEWORK
Tough Grader
Skip class? You won't pass.
LOTS OF HOMEWORK
Respected
Tough Grader
Skip class? You won't pass.
BEWARE OF POP QUIZZES
LOTS OF HOMEWORK
GROUP PROJECTS
Tough Grader
Skip class? You won't pass.
Caring
GRADED BY FEW THINGS
GROUP PROJECTS
LECTURE HEAVY
Skip class? You won't pass.
Caring
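
The printed list contains the same tag in different capitalizations, for example "Tough grader" and "Tough Grader". As a possible next step, the sketch below normalizes the case and counts how often each tag occurs. The scraped list is a small hand-copied sample of the output above, not the full result, and the variable names are introduced here for illustration.

```python
from collections import Counter

# A small sample copied by hand from the output above (not the full list).
scraped = [
    "Tough grader",
    "Tough Grader",
    "TEST HEAVY",
    "Test heavy",
    "Caring",
    "Caring",
    "LOTS OF HOMEWORK",
]

# casefold() lowercases the text so that different capitalizations match.
counts = Counter(text.strip().casefold() for text in scraped)

for tag, n in counts.most_common():
    print(f"{tag}: {n}")
```

With the real data, the same approach would take the strings from mytag.get_text() instead of a hard-coded list.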


We now have the information that we were looking for: all of the professor's tags. This is how you scrape data from the internet using the Requests and Beautiful Soup libraries.