In this exercise, you'll practice using BeautifulSoup to parse the content of a web page. The page that you'll be scraping, https://realpython.github.io/fake-jobs/, contains job listings. Your job is to extract the data on each job and convert it into a pandas DataFrame.

1. Start by performing a GET request on the URL above and converting the response into a BeautifulSoup object.  
a. Use the .find method to find the tag containing the first job title ("Senior Python Developer"). Hint: can you find a tag type and/or a class that could be helpful for extracting this information? Extract the text from this title.  
b. Now, use what you did for the first title, but extract the job title for all jobs on this page. Store the results in a list.  
c. Finally, extract the companies, locations, and posting dates for each job. For example, the first job has a company of "Payne, Roberts and Davis", a location of "Stewartbury, AA", and a posting date of "2021-04-08". Ensure that the text that you extract is clean, meaning no extra spaces or other characters at the beginning or end.  
d. Take the lists that you have created and combine them into a pandas DataFrame.
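
For the cleaning requirement in part c, here is a minimal sketch of what "clean" text means in practice. The raw string is an invented example of what `.text` can return when a tag's content is indented in the HTML; it is not copied from the page.

```python
# Hypothetical raw text, similar to what .text may return for an indented tag.
raw_location = "\n        Stewartbury, AA\n      "

# str.strip() removes leading and trailing whitespace (spaces, tabs, newlines)
# and leaves the characters in the middle untouched.
clean_location = raw_location.strip()

print(repr(clean_location))
```

BeautifulSoup's `get_text(strip=True)` is an alternative that strips the text while extracting it.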

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [3]:
URL = 'https://realpython.github.io/fake-jobs/'

response = requests.get(URL)

In [4]:
fakejobs = BeautifulSoup(response.text, features="html.parser")

In [5]:
print(fakejobs.prettify())

<!DOCTYPE html>
<html>
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <title>
   Fake Python
  </title>
  <link href="https://cdn.jsdelivr.net/npm/bulma@0.9.2/css/bulma.min.css" rel="stylesheet"/>
 </head>
 <body>
  <section class="section">
   <div class="container mb-5">
    <h1 class="title is-1">
     Fake Python
    </h1>
    <p class="subtitle is-3">
     Fake Jobs for Your Web Scraping Journey
    </p>
   </div>
   <div class="container">
    <div class="columns is-multiline" id="ResultsContainer">
     <div class="column is-half">
      <div class="card">
       <div class="card-content">
        <div class="media">
         <div class="media-left">
          <figure class="image is-48x48">
           <img alt="Real Python Logo" src="https://files.realpython.com/media/real-python-logo-thumbnail.7f0db70c2ed2.jpg?__no_cf_polish=1"/>
          </figure>
         </div>
         <div class="media-content">
          <h2 c

In [6]:
fakejobs.find(class_='title is-5').text

'Senior Python Developer'

b. Now, use what you did for the first title, but extract the job title for all jobs on this page. Store the results in a list.  

In [26]:
# find_all is the current name; findAll is a deprecated alias
job_list = [x.text for x in fakejobs.find_all(class_='title is-5')]
job_list

['Senior Python Developer',
 'Energy engineer',
 'Legal executive',
 'Fitness centre manager',
 'Product manager',
 'Medical technical officer',
 'Physiological scientist',
 'Textile designer',
 'Television floor manager',
 'Waste management officer',
 'Software Engineer (Python)',
 'Interpreter',
 'Architect',
 'Meteorologist',
 'Audiological scientist',
 'English as a second language teacher',
 'Surgeon',
 'Equities trader',
 'Newspaper journalist',
 'Materials engineer',
 'Python Programmer (Entry-Level)',
 'Product/process development scientist',
 'Scientist, research (maths)',
 'Ecologist',
 'Materials engineer',
 'Historic buildings inspector/conservation officer',
 'Data scientist',
 'Psychiatrist',
 'Structural engineer',
 'Immigration officer',
 'Python Programmer (Entry-Level)',
 'Neurosurgeon',
 'Broadcast engineer',
 'Make',
 'Nurse, adult',
 'Air broker',
 'Editor, film/video',
 'Production assistant, radio',
 'Engineer, communications',
 'Sales executive',
 'Software Deve

c. Finally, extract the companies, locations, and posting dates for each job. For example, the first job has a company of "Payne, Roberts and Davis", a location of "Stewartbury, AA", and a posting date of "2021-04-08". Ensure that the text that you extract is clean, meaning no extra spaces or other characters at the beginning or end.  

In [20]:
job_listings = []

for job in fakejobs.find_all(class_='title is-5'):
    job_title = job.text.strip()
    company = job.find_next(class_='subtitle is-6 company').text.strip()
    location = job.find_next(class_='location').text.strip()
    date = job.find_next(class_='is-small has-text-grey').text.strip()

    job_listings.append({
        'job_title': job_title,
        'company': company,
        'location': location,
        'date': date
    })

job_listings

[{'job_title': 'Senior Python Developer',
  'company': 'Payne, Roberts and Davis',
  'location': 'Stewartbury, AA',
  'date': '2021-04-08'},
 {'job_title': 'Energy engineer',
  'company': 'Vasquez-Davidson',
  'location': 'Christopherville, AA',
  'date': '2021-04-08'},
 {'job_title': 'Legal executive',
  'company': 'Jackson, Chambers and Levy',
  'location': 'Port Ericaburgh, AA',
  'date': '2021-04-08'},
 {'job_title': 'Fitness centre manager',
  'company': 'Savage-Bradley',
  'location': 'East Seanview, AP',
  'date': '2021-04-08'},
 {'job_title': 'Product manager',
  'company': 'Ramirez Inc',
  'location': 'North Jamieview, AP',
  'date': '2021-04-08'},
 {'job_title': 'Medical technical officer',
  'company': 'Rogers-Yates',
  'location': 'Davidville, AP',
  'date': '2021-04-08'},
 {'job_title': 'Physiological scientist',
  'company': 'Kramer-Klein',
  'location': 'South Christopher, AE',
  'date': '2021-04-08'},
 {'job_title': 'Textile designer',
  'company': 'Meyers-Johnson',
  '
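
An alternative to `find_next` is to loop over each job card and search only inside it, so a missing field cannot be filled in from the following card. The sketch below runs on a small hand-written sample rather than the live page. The `card-content` class and the class names for title, company, location, and date come from the page output and the code above; the tag names other than `h2` and the exact nesting are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified two-card sample that reuses the page's class names.
sample_html = """
<div class="card-content">
  <h2 class="title is-5">Senior Python Developer</h2>
  <h3 class="subtitle is-6 company">Payne, Roberts and Davis</h3>
  <p class="location">
    Stewartbury, AA
  </p>
  <p class="is-small has-text-grey">2021-04-08</p>
</div>
<div class="card-content">
  <h2 class="title is-5">Energy engineer</h2>
  <h3 class="subtitle is-6 company">Vasquez-Davidson</h3>
  <p class="location">
    Christopherville, AA
  </p>
  <p class="is-small has-text-grey">2021-04-08</p>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

sample_listings = []
for card in sample_soup.find_all("div", class_="card-content"):
    # Each lookup is scoped to the current card only.
    # get_text(strip=True) trims surrounding whitespace while extracting.
    sample_listings.append({
        "job_title": card.find(class_="title").get_text(strip=True),
        "company": card.find(class_="company").get_text(strip=True),
        "location": card.find(class_="location").get_text(strip=True),
        "date": card.find(class_="has-text-grey").get_text(strip=True),
    })

print(sample_listings)
```

Passing a single class name such as `class_="company"` matches any tag that has that class among several, which keeps the lookups shorter than spelling out the full class string.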

d. Take the lists that you have created and combine them into a pandas DataFrame.

In [22]:
Jobs = pd.DataFrame(job_listings, columns=['job_title', 'company', 'location', 'date'])
Jobs

     job_title                           company                     location              date
0    Senior Python Developer             Payne, Roberts and Davis    Stewartbury, AA       2021-04-08
1    Energy engineer                     Vasquez-Davidson            Christopherville, AA  2021-04-08
2    Legal executive                     Jackson, Chambers and Levy  Port Ericaburgh, AA   2021-04-08
3    Fitness centre manager              Savage-Bradley              East Seanview, AP     2021-04-08
4    Product manager                     Ramirez Inc                 North Jamieview, AP   2021-04-08
...  ...                                 ...                         ...                   ...
95   Museum/gallery exhibitions officer  Nguyen, Yoder and Petty     Lake Abigail, AE      2021-04-08
96   Radiographer, diagnostic            Holder LLC                  Jacobshire, AP        2021-04-08
97   Database administrator              Yates-Ferguson              Port Susan, AE        2021-04-08
98   Furniture designer                  Ortega-Lawrence             North Tiffany, AA     2021-04-08


2. Next, add a column that contains the URL for the "Apply" button. Try this in two ways.   
    a. First, use the BeautifulSoup find_all method to extract the URLs.  
    b. Next, get those same URLs in a different way. Examine the URLs and see if you can spot the pattern of how they are constructed. Then, build each URL from the elements you have already extracted. Ensure that the URLs you create match those you extracted using BeautifulSoup. Warning: You will need to do some string cleaning and prep in constructing the URLs this way. For example, look carefully at the URLs for the "Software Engineer (Python)" job and the "Scientist, research (maths)" job.
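
One possible approach to part b, sketched with only the standard library. The pattern (lowercased title, punctuation dropped, spaces turned into hyphens, then the job's position on the page) is inferred from the URLs this notebook extracts for the first few jobs. `build_apply_url` is a hypothetical helper name, and how the site treats characters such as "/" is an assumption that should be checked against the extracted URLs.

```python
import re

BASE_URL = "https://realpython.github.io/fake-jobs/jobs/"


def build_apply_url(job_title: str, position: int) -> str:
    """Build an "Apply" URL from a job title and its 0-based position on the page."""
    slug = job_title.lower()
    # Drop everything except letters, digits, whitespace, and hyphens
    # (assumed rule; this removes parentheses, commas, and slashes).
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)
    # Collapse each run of whitespace into a single hyphen.
    slug = re.sub(r"\s+", "-", slug.strip())
    return f"{BASE_URL}{slug}-{position}.html"


print(build_apply_url("Senior Python Developer", 0))
print(build_apply_url("Software Engineer (Python)", 10))
```

With the full title list, something like `[build_apply_url(t, i) for i, t in enumerate(job_list)]` can then be compared element by element with the URLs extracted by BeautifulSoup.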

In [48]:
apply_urls = []
for button in fakejobs.find_all('a', string='Apply'):
    apply_urls.append(button['href'])

apply_urls

['https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html',
 'https://realpython.github.io/fake-jobs/jobs/energy-engineer-1.html',
 'https://realpython.github.io/fake-jobs/jobs/legal-executive-2.html',
 'https://realpython.github.io/fake-jobs/jobs/fitness-centre-manager-3.html',
 'https://realpython.github.io/fake-jobs/jobs/product-manager-4.html',
 'https://realpython.github.io/fake-jobs/jobs/medical-technical-officer-5.html',
 'https://realpython.github.io/fake-jobs/jobs/physiological-scientist-6.html',
 'https://realpython.github.io/fake-jobs/jobs/textile-designer-7.html',
 'https://realpython.github.io/fake-jobs/jobs/television-floor-manager-8.html',
 'https://realpython.github.io/fake-jobs/jobs/waste-management-officer-9.html',
 'https://realpython.github.io/fake-jobs/jobs/software-engineer-python-10.html',
 'https://realpython.github.io/fake-jobs/jobs/interpreter-11.html',
 'https://realpython.github.io/fake-jobs/jobs/architect-12.html',
 'https://realpython.gi

In [46]:
# columns must be list-like; a bare string such as 'URL_link' raises a TypeError
df = pd.DataFrame(apply_urls, columns=['URL_link'])
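
Since the task asks for a new column rather than a separate DataFrame, here is a small sketch of assigning a list of URLs as a column. It uses a two-row stand-in frame built from values shown earlier in this notebook; `sample_jobs`, `sample_urls`, and the column name `apply_url` are placeholder names, not part of the original solution.

```python
import pandas as pd

# Two-row stand-in for the full jobs DataFrame.
sample_jobs = pd.DataFrame({
    "job_title": ["Senior Python Developer", "Energy engineer"],
    "company": ["Payne, Roberts and Davis", "Vasquez-Davidson"],
})

sample_urls = [
    "https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html",
    "https://realpython.github.io/fake-jobs/jobs/energy-engineer-1.html",
]

# A plain list is assigned by position, so it must have one entry per row;
# pandas raises a ValueError if the lengths differ.
sample_jobs["apply_url"] = sample_urls

print(sample_jobs)
```

Because the assignment is positional, it relies on the URLs having been extracted in the same order as the job titles.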