
This notebook scans the program pages listed in `urls` to compile a list of all AQ/ABQ courses offered by OISE. It then checks each course page to see whether that course is offered in the term set by `term` (see the global variables cell). The name and URL of each course offered that term are then saved locally in an HTML page (`courses.html`).

GitHub URL: https://github.com/pbeens/python/blob/master/OISE_AQ_ABQ_Courses.ipynb

Colab URL: https://colab.research.google.com/drive/18DxRzxTiDYHEOQ6ZlE4qqQO8C0_2r0t-

In [1]:
# imports
from bs4 import BeautifulSoup
import urllib.request
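The cells below call `urllib.request.urlopen` with no timeout and Python's default User-Agent, one request after another. Below is a minimal sketch of a more cautious fetch. It assumes a 10-second timeout and a 1-second pause between requests are reasonable; both values are guesses, not OISE requirements. `make_request`, `fetch_html` and the User-Agent string are hypothetical names, not part of the original notebook.

```python
import time
import urllib.request

# Hypothetical helper: build a request that carries an explicit User-Agent
# so the server can identify the script.
def make_request(url, user_agent='oise-course-scraper/0.1'):
    return urllib.request.Request(url, headers={'User-Agent': user_agent})

# Hypothetical helper: fetch a page with a timeout, decode it using the
# charset the server reports (UTF-8 if none), then pause briefly so a long
# run of course pages doesn't hammer the server.
def fetch_html(url, timeout=10, delay=1.0):
    with urllib.request.urlopen(make_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or 'utf-8'
        html = response.read().decode(charset, errors='replace')
    time.sleep(delay)  # be polite between requests
    return html
```

With this in place, `BeautifulSoup(fetch_html(url), "html.parser")` could stand in for the bare `urlopen` calls.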

In [2]:
# global variable(s)
urls = ['https://cpl.oise.utoronto.ca/program_certificate/abq-primary-junior/',
        'https://cpl.oise.utoronto.ca/program_certificate/abq-intermediate/',
        'https://cpl.oise.utoronto.ca/program_certificate/abq-senior/',
        'https://cpl.oise.utoronto.ca/program_certificate/one-session-additional-qualifications/',
        'https://cpl.oise.utoronto.ca/program_certificate/three-session-additional-qualifications/',
        'https://cpl.oise.utoronto.ca/program_certificate/honour-specialist/',
        'https://cpl.oise.utoronto.ca/program_certificate/technological-education/']
term = '2022 Winter'

In [16]:
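# grab every link (href) on each program page
links = []
for url in urls:
    print(f'Grabbing links from {url}... ')
    html_page = urllib.request.urlopen(url)
    soup = BeautifulSoup(html_page, "html.parser")
    # find_all is the current name for findAll; href=True skips <a> tags
    # without an href, which str() would otherwise turn into the string 'None'
    for link in soup.find_all('a', href=True):
        links.append(link['href'])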
links = list(set(links)) # delete dupes
links.sort()
print('Done.')

Grabbing links from https://cpl.oise.utoronto.ca/program_certificate/abq-primary-junior/... 
Grabbing links from https://cpl.oise.utoronto.ca/program_certificate/abq-intermediate/... 
Grabbing links from https://cpl.oise.utoronto.ca/program_certificate/abq-senior/... 
Grabbing links from https://cpl.oise.utoronto.ca/program_certificate/one-session-additional-qualifications/... 
Grabbing links from https://cpl.oise.utoronto.ca/program_certificate/three-session-additional-qualifications/... 
Grabbing links from https://cpl.oise.utoronto.ca/program_certificate/honour-specialist/... 
Grabbing links from https://cpl.oise.utoronto.ca/program_certificate/technological-education/... 
Done.
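The cell above keeps each `href` exactly as written. Relative links and `#fragment` variants therefore stay in the list, and (before the `href=True` filter) anchors without an `href` ended up as the string `'None'`. Below is a stdlib-only sketch of the same step. It swaps in Python's `html.parser` for BeautifulSoup so it runs without extra packages, and uses `urllib.parse.urljoin` to resolve relative links. `LinkCollector` and `collect_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href:  # skip <a> tags with a missing or empty href
            # resolve relative links, then drop any #fragment
            absolute, _fragment = urldefrag(urljoin(self.base_url, href))
            self.links.append(absolute)

def collect_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return sorted(set(parser.links))  # de-duplicate, then sort
```

Because the function takes page text rather than a URL, it can be checked against saved HTML without touching the network.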


In [19]:
# keep only the course links (URLs containing '/course/'), printing each one
# for a warm fuzzy feeling that all the courses were grabbed
all_courses = []
for link in links:
    if link.find('/course/') > 0:
        all_courses.append(link)
        print(link)

https://cpl.oise.utoronto.ca/course/biology-honour-specialist/
https://cpl.oise.utoronto.ca/course/chemistry-honour-specialist/
https://cpl.oise.utoronto.ca/course/co-operative-education-part-1/
https://cpl.oise.utoronto.ca/course/co-operative-education-part-2/
https://cpl.oise.utoronto.ca/course/co-operative-education-specialist/
https://cpl.oise.utoronto.ca/course/computer-studies-senior/
https://cpl.oise.utoronto.ca/course/construction-technology-grades-11-12/
https://cpl.oise.utoronto.ca/course/construction-technology-grades-9-10/
https://cpl.oise.utoronto.ca/course/dramatic-arts-honour-specialist/
https://cpl.oise.utoronto.ca/course/dramatic-arts-intermediate/
https://cpl.oise.utoronto.ca/course/dramatic-arts-senior/
https://cpl.oise.utoronto.ca/course/english-honour-specialist/
https://cpl.oise.utoronto.ca/course/english-intermediate/
https://cpl.oise.utoronto.ca/course/english-senior/
https://cpl.oise.utoronto.ca/course/environmental-education-part-1/
https://cpl.oise.utoronto.c
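The filter `link.find('/course/') > 0` has two quirks. It returns `0`, and so rejects the link, when an href *starts* with `/course/` (a relative course link). It also accepts any off-site URL that happens to contain `/course/`. Below is a sketch that checks the parsed host and path instead. It assumes every course page lives on `cpl.oise.utoronto.ca` under `/course/`, which is inferred from the printed links above. It also assumes the links are already absolute (e.g. resolved with `urljoin`). `course_links` and its `host` parameter are hypothetical.

```python
from urllib.parse import urlparse

def course_links(links, host='cpl.oise.utoronto.ca'):
    """Keep only absolute links on `host` whose path starts with /course/."""
    courses = []
    for link in links:
        parts = urlparse(link)
        if parts.netloc == host and parts.path.startswith('/course/'):
            courses.append(link)
    return courses
```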

In [20]:
# inspect each course page for the term set in `term` (e.g. '2022 Winter')
term_courses = {}
for course in all_courses:
  print(f'Processing {course}...')
  html_page = urllib.request.urlopen(course)
  soup = BeautifulSoup(html_page, "html.parser")
  # the term text (e.g. '2022 Winter') is stored in <div class="grid--auto"> elements
  divs = soup.find_all('div', {'class':'grid--auto'})
  for div in divs:
    # strip whitespace so a padded term still matches; empty divs never match
    if div.text.strip() == term:
      # clean up the title (course name) for use in the HTML file
      title = str(soup.title) \
        .replace('<title>','') \
        .replace(' - OISE Continuing and Professional Learning</title>','')
      print(f'{term}: {title}') # tell us which courses were found
      term_courses[title] = course # add to dict of term_courses
      break # once found we can move on
print('Done.')

Processing https://cpl.oise.utoronto.ca/course/biology-honour-specialist/...
Processing https://cpl.oise.utoronto.ca/course/chemistry-honour-specialist/...
Processing https://cpl.oise.utoronto.ca/course/co-operative-education-part-1/...
2022 Winter: Co-Operative Education Part 1
Processing https://cpl.oise.utoronto.ca/course/co-operative-education-part-2/...
Processing https://cpl.oise.utoronto.ca/course/co-operative-education-specialist/...
Processing https://cpl.oise.utoronto.ca/course/computer-studies-senior/...
Processing https://cpl.oise.utoronto.ca/course/construction-technology-grades-11-12/...
Processing https://cpl.oise.utoronto.ca/course/construction-technology-grades-9-10/...
Processing https://cpl.oise.utoronto.ca/course/dramatic-arts-honour-specialist/...
Processing https://cpl.oise.utoronto.ca/course/dramatic-arts-intermediate/...
Processing https://cpl.oise.utoronto.ca/course/dramatic-arts-senior/...
Processing https://cpl.oise.utoronto.ca/course/english-honour-specialis
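The term check relies on the term text sitting in `div` elements with class `grid--auto`. That is an observation about OISE's current page markup, not a documented interface, so it may break if the site changes. Below is a stdlib-only sketch of the same check as a function of page text. It uses `html.parser` in place of BeautifulSoup, and returns the cleaned course name along with whether the term was found. `CoursePageParser`, `check_course_page` and `TITLE_SUFFIX` are hypothetical names.

```python
from html.parser import HTMLParser

TITLE_SUFFIX = ' - OISE Continuing and Professional Learning'

class CoursePageParser(HTMLParser):
    """Collects the <title> text and the text of each <div class="grid--auto">."""
    def __init__(self):
        super().__init__()
        self.title = ''
        self.grid_texts = []
        self._in_title = False
        self._div_depth = 0  # div nesting depth inside the current grid--auto div
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True
        elif tag == 'div':
            classes = (dict(attrs).get('class') or '').split()
            if self._div_depth:
                self._div_depth += 1  # a div nested inside the grid div
            elif 'grid--auto' in classes:
                self._div_depth = 1
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False
        elif tag == 'div' and self._div_depth:
            self._div_depth -= 1
            if self._div_depth == 0:  # closed the grid div: store its text
                self.grid_texts.append(''.join(self._buffer).strip())

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._div_depth:
            self._buffer.append(data)

def check_course_page(html, term):
    """Return (course_name, offered_in_term) for one course page."""
    parser = CoursePageParser()
    parser.feed(html)
    parser.close()
    name = parser.title.strip()
    if name.endswith(TITLE_SUFFIX):
        name = name[:-len(TITLE_SUFFIX)]
    return name, term in parser.grid_texts
```

Keeping parsing separate from fetching means a saved copy of a course page is enough to confirm the check still works.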

In [21]:
# sanity check: print each course found for the term and its URL
for (k, v) in term_courses.items():
  print(f'{k}: {v}')

Co-Operative Education Part 1: https://cpl.oise.utoronto.ca/course/co-operative-education-part-1/
English - Honour Specialist: https://cpl.oise.utoronto.ca/course/english-honour-specialist/
English - Intermediate: https://cpl.oise.utoronto.ca/course/english-intermediate/
English - Senior: https://cpl.oise.utoronto.ca/course/english-senior/
First Nations, Metis and Inuit Studies - Intermediate: https://cpl.oise.utoronto.ca/course/first-nations-metis-and-inuit-studies-intermediate/
First Nations, Metis and Inuit Studies - Senior: https://cpl.oise.utoronto.ca/course/first-nations-metis-and-inuit-studies-senior/
French as a Second Language Part 1: https://cpl.oise.utoronto.ca/course/french-as-a-second-language-part-1/
French as a Second Language Part 2: https://cpl.oise.utoronto.ca/course/french-as-a-second-language-part-2/
Geography - Senior: https://cpl.oise.utoronto.ca/course/geography-senior/
Guidance and Career Education Part 1: https://cpl.oise.utoronto.ca/course/guidance-and-career-

In [22]:
# create an HTML file listing the courses offered in the term
file = './courses.html'
with open(file, 'w', encoding='utf-8') as f:
  s = f'<HTML>\n<HEAD>\n\t<META charset="utf-8">\n\t<TITLE>{term}</TITLE>\n</HEAD>\n<BODY>\n'
  f.write(s)
  for k, v in term_courses.items():
    f.write(f'\t<a href="{v}">{k}</a><br>\n')
  s = '</BODY>\n</HTML>\n'  # the closing tag needs its slash
  f.write(s)
# no f.close() needed: the with block closes the file
print(f'{file} created.')

./courses.html created.
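The last cell writes course names and URLs straight into the markup. Below is a sketch of a hypothetical `build_course_page` function. It returns the page as a string, escapes the link text and `href` values with `html.escape`, declares UTF-8, and lists courses alphabetically. It assumes the course names are plain text. The notebook's `str(soup.title)` approach already turns `&` into `&amp;`, so names produced that way would be escaped twice.

```python
import html

def build_course_page(term, term_courses):
    """Return an HTML page linking each course name to its URL, sorted by name."""
    lines = ['<!DOCTYPE html>',
             '<html>',
             '<head>',
             '\t<meta charset="utf-8">',
             f'\t<title>{html.escape(term)}</title>',
             '</head>',
             '<body>']
    for name, url in sorted(term_courses.items()):
        # escape the href attribute value and the visible link text
        lines.append(f'\t<a href="{html.escape(url, quote=True)}">{html.escape(name)}</a><br>')
    lines += ['</body>', '</html>']
    return '\n'.join(lines) + '\n'
```

Writing the file then reduces to `f.write(build_course_page(term, term_courses))` inside a `with open('./courses.html', 'w', encoding='utf-8') as f:` block. The string can also be inspected in a test without creating the file.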
