<a href="https://colab.research.google.com/github/pbeens/python/blob/master/Western_AQ_ABQ_Courses.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

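This notebook scrapes the Additional Qualification (AQ) and Additional Basic Qualification (ABQ) course pages on the Western University (UWO) website, keeps the courses offered in the selected term, and writes links to them to an HTML file.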

GitHub URL: https://github.com/pbeens/python/blob/master/Western_AQ_ABQ_Courses.ipynb

Colab URL: https://colab.research.google.com/drive/1DfF86AtXKKhniCdes0IDHIAaz75m4hvW 

In [1]:
# imports
from bs4 import BeautifulSoup
import urllib.request

In [2]:
# global variable(s)
url = 'https://www.aspire.uwo.ca/aq-courses/index.html'
term = 'Fall 2021'
subjects = ['Business Studies', 'Co-Operative Education', 'Dramatic Arts', 
            'Family Studies', 'French as a Second Language', 
            'Guidance and Career Education', 'Honour Specialist', 
            'Integration of Information and Computer Technology in Instruction',
            'Intermediate Division', 'Junior', 'Kindergarten', 'Mathematics',
            'Music', 'Primary', 'Reading', 'Religious Education in Catholic Schools',
            'Senior Division', 'Special Education', 'Teacher Librarian',
            'Teaching Children with Communication Needs (Autism Spectrum Disorders)',
            'Teaching English Language Learners', 'Teaching Students Who are Blind/Low Vision',
            'Visual Arts']

# grab each subject URL
subject_links = []

html_page = urllib.request.urlopen(url)
soup = BeautifulSoup(html_page, "html.parser")

for link in soup.find_all('a', href=True):
    if link.text.strip() in subjects:
        subject_links.append('https://www.aspire.uwo.ca/aq-courses/' + link['href'])

# grab each course URL
all_courses = []
for subject_link in subject_links:
  print(f'Grabbing courses from {subject_link}... ')
  html_page = urllib.request.urlopen(subject_link)
  soup = BeautifulSoup(html_page, "html.parser")

  # keep only the links whose href contains publicCourseSearchDetails
  # (href=True skips anchors that have no href attribute at all)
  for anchor in soup.find_all('a', href=True):
    href = anchor['href']
    if 'publicCourseSearchDetails' in href:
      all_courses.append(href)

print('Done.')

Grabbing courses from https://www.aspire.uwo.ca/aq-courses/business-studies.html... 
Grabbing courses from https://www.aspire.uwo.ca/aq-courses/cooperative-education.html... 
Grabbing courses from https://www.aspire.uwo.ca/aq-courses/dramatic-arts.html... 
Grabbing courses from https://www.aspire.uwo.ca/aq-courses/family-studies.html... 
Grabbing courses from https://www.aspire.uwo.ca/aq-courses/french-second-language.html... 
Grabbing courses from https://www.aspire.uwo.ca/aq-courses/guidance-career-education.html... 
Grabbing courses from https://www.aspire.uwo.ca/aq-courses/honour-specialist.html... 
Grabbing courses from https://www.aspire.uwo.ca/aq-courses/integration-information-computer-technology.html... 
Grabbing courses from https://www.aspire.uwo.ca/aq-courses/intermediate-division.html... 
Grabbing courses from https://www.aspire.uwo.ca/aq-courses/junior-division.html... 
Grabbing courses from https://www.aspire.uwo.ca/aq-courses/kindergarten.html... 
Grabbing courses from 
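
The cell above builds each subject URL by prepending a fixed prefix to the link's `href`. As a minimal sketch of an alternative, assuming the index page may contain relative, root-relative, or already-absolute links, `urllib.parse.urljoin` from the standard library resolves all three against the page URL. The helper name `absolute_link` is hypothetical and not part of the notebook.

```python
from urllib.parse import urljoin

# Same index page the notebook uses as its starting point.
BASE_URL = 'https://www.aspire.uwo.ca/aq-courses/index.html'


def absolute_link(href, base=BASE_URL):
    """Resolve an href found on `base` into an absolute URL.

    Hypothetical helper for illustration only.
    """
    return urljoin(base, href)


# A relative href is resolved against the directory of the index page.
print(absolute_link('business-studies.html'))
# An absolute href is returned unchanged (example.com is a placeholder).
print(absolute_link('https://example.com/course'))
```

With this approach the prefix does not have to be kept in sync with `url` by hand.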

In [3]:
# populate course_urls_dict 
course_urls_dict = {}

for link in all_courses:

    html_page = urllib.request.urlopen(link)
    soup = BeautifulSoup(html_page, "html.parser")

    subject = soup.select_one('span[class*=title]').text.strip()
    
    # miscellaneous fixes
    subject = subject.replace('Business Studies - ', '')
    subject = subject.replace('Honour Specialist-', 'Honour Specialist -')
    if subject.find('Science General') == -1:
        subject = subject.replace('Honour Specialist - Science', 'Honour Specialist -')

    divs = soup.find_all('div', {'class':'courseSectionSemester'}) # where the term is stored
    for div in divs:
        # empty divs can never equal the term, so no separate length check is needed
        if div.text.strip() == term:
            course_urls_dict[subject] = link
            print(f'{div.text.strip()}: {subject}') # tell us which courses were found

print('Done.')

# check the courses dictionary
# for (k,v) in course_urls_dict.items():
#     print(f'{k}: {v}')   

Fall 2021: Accounting Part 1
Fall 2021: Accounting Part 2
Fall 2021: Accounting Specialist
Fall 2021: Entrepreneurship Part 1
Fall 2021: Entrepreneurship Part 2
Fall 2021: Entrepreneurship Specialist
Fall 2021: Honour Specialist - Business Studies
Fall 2021: Information and Communications Technology Part 1
Fall 2021: Information and Communications Technology Part 2
Fall 2021: Information and Communications Technology Specialist
Fall 2021: Intermediate Division - Business Studies General
Fall 2021: Senior Division - Business Studies General
Fall 2021: Family Studies Part 1
Fall 2021: Family Studies Part 2
Fall 2021: Family Studies Specialist
Fall 2021: Honour Specialist - Family Studies
Fall 2021: Intermediate Division - Family Studies
Fall 2021: Senior Division - Family Studies
Fall 2021: French as a Second Language Part 1
Fall 2021: French as a Second Language Part 2
Fall 2021: French as a Second Language Specialist
Fall 2021: Honour Specialist - French as a Second Language
Fall 2021:
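
The "miscellaneous fixes" in the cell above are easier to check when they live in a small function that can be run without hitting the website. The sketch below copies the three replacements as written in the notebook; the function name `normalize_subject` and the sample titles are hypothetical, chosen only to exercise each rule, and may not match the titles the site actually returns.

```python
def normalize_subject(subject):
    """Apply the notebook's title clean-up rules to one course title.

    Hypothetical helper; the rules themselves are copied from the cell above.
    """
    subject = subject.strip()
    # Drop the "Business Studies - " prefix.
    subject = subject.replace('Business Studies - ', '')
    # Make sure there is a space before the dash.
    subject = subject.replace('Honour Specialist-', 'Honour Specialist -')
    # Drop the word "Science" unless the course is "Science General".
    if 'Science General' not in subject:
        subject = subject.replace('Honour Specialist - Science', 'Honour Specialist -')
    return subject


# Sample titles are made up for illustration.
for title in ['Business Studies - Accounting Part 1',
              'Honour Specialist - Science Biology',
              'Honour Specialist - Science General']:
    print(normalize_subject(title))
```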

In [4]:
# create html file with course listings
file = './courses.html'
with open(file, 'w') as f:
  s = f'<HTML>\n<HEAD>\n\t<TITLE>{term}</TITLE>\n</HEAD>\n<BODY>\n'
  f.write(s)
  for course, url in course_urls_dict.items():
    f.write(f'\t<a href="{url}">{course}</a><br>\n')
  s = '</BODY>\n</HTML>'
  f.write(s)
# no f.close() needed: the with block closes the file
print(f'{file} created.')

./courses.html created.
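
Course names and URLs can contain characters such as `&` that have a special meaning in HTML. As a minimal sketch, assuming the same page layout as the cell above, the page can be built as a string with `html.escape` applied to every value before it is written. The function name `build_course_page` and the sample course entry are hypothetical.

```python
from html import escape


def build_course_page(term, courses):
    """Return the HTML page as a string, escaping all inserted values.

    Hypothetical helper; `courses` maps course name to URL, like
    course_urls_dict in the notebook.
    """
    lines = ['<HTML>', '<HEAD>', f'\t<TITLE>{escape(term)}</TITLE>', '</HEAD>', '<BODY>']
    for course, url in courses.items():
        # escape() also converts quotes by default, which keeps href values safe
        lines.append(f'\t<a href="{escape(url)}">{escape(course)}</a><br>')
    lines += ['</BODY>', '</HTML>']
    return '\n'.join(lines) + '\n'


# Made-up entry for illustration only.
page = build_course_page('Fall 2021', {'Reading & Writing': 'https://example.com/?a=1&b=2'})
print(page)
```

Returning a string rather than writing directly keeps the formatting testable; the result can then be passed to `f.write` inside the existing `with` block.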
