<a href="https://colab.research.google.com/github/pbeens/python/blob/master/York_AQ_ABQ_Courses.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#ToDo:

This program scans York's category pages to compile a list of all AQ/ABQ courses offered, then scans each course page to check whether the course is offered in the specified term. The name and URL of each course offered that term are then stored locally in a webpage.

GitHub URL: https://github.com/pbeens/python/blob/master/York_AQ_ABQ_Courses.ipynb

Colab URL: https://colab.research.google.com/drive/1BKyub3iYTUgvGmmami1hl1YEQHPjP6Zs

York has a main page with links to category pages:
https://www.yorku.ca/edu/professional-learning/aq-abq-pqp-courses/

* 3 Part AQs
* ABQs
* Schedule C AQs
* Honour Specialist AQs
* PQPs

Each category page links to the individual course pages. There is a separate course page for each offering of a course, so we search each page for the registration deadline of the target term. The course name is extracted from the page's H1 tag.
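The deadline check amounts to a substring search on each course page's text. Below is a minimal sketch of that check. It assumes the pages print the deadline in the form `Registration Deadline: Jan 11, 2022`, with a three-letter English month and an unpadded day. The helper names `deadline_string` and `is_offered` and the sample text are hypothetical, and the real pages may format other months or days differently.

```python
from datetime import date

# English month abbreviations, hard-coded so the result does not depend on the locale
# (assumption: the site uses these three-letter forms for every month)
MONTH_ABBR = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')


def deadline_string(deadline: date) -> str:
    """Build the text to search for, e.g. 'Registration Deadline: Jan 11, 2022'."""
    month = MONTH_ABBR[deadline.month - 1]
    return f'Registration Deadline: {month} {deadline.day}, {deadline.year}'


def is_offered(page_text: str, deadline: date) -> bool:
    """Return True if the page text mentions the given registration deadline."""
    # Collapse runs of whitespace so line breaks in the page text do not hide a match
    normalized = ' '.join(page_text.split())
    return deadline_string(deadline) in normalized


# Hypothetical page text, for illustration only
sample_text = 'Intermediate Basic (English)\n  Registration Deadline:   Jan 11, 2022\n'
print(is_offered(sample_text, date(2022, 1, 11)))
```

Building the string from a `date` keeps the term in one place, so switching terms means changing a single value rather than editing a literal.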

Using this tutorial for guidance:
https://www.dataquest.io/blog/web-scraping-python-using-beautiful-soup/

In [1]:
# imports
from bs4 import BeautifulSoup
import urllib.request

In [2]:
main_url = 'https://www.yorku.ca/edu/professional-learning/aq-abq-pqp-courses/'

page = urllib.request.urlopen(main_url)

#
# ### warm fuzzy stuff - comment out as appropriate ###
#
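# return_code = page.status
# print(f'return_code = {return_code}')

# note: page.read() consumes the response, so re-open the URL before parsing if you uncomment this
# content = page.read()
# print(content)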

# convert page to soup object
soup = BeautifulSoup(page, "html.parser")

# ### warm fuzzy stuff - comment out as appropriate ###
# print(soup.prettify())
# print(list(soup.children))
# print([type(item) for item in list(soup.children)])



In [3]:
#
# Get the category links 
#
category_links = []

print(f'Finding category links in {main_url}... ')

for link in soup.find_all('a'):
    s = str(link.get('href'))
    # substring test; find() > 0 would miss a match at index 0
    if 'pdis/web/' in s:
        category_links.append(s)

category_links.sort()

# warm fuzzy
# for link in category_links:
#     print(link)

print('Category links found.')

Finding category links in https://www.yorku.ca/edu/professional-learning/aq-abq-pqp-courses/... 
Category links found.


In [4]:
# 
# Get the course links from each category link
#
course_links = []

for category_link in category_links:
    page = urllib.request.urlopen(category_link)
    soup = BeautifulSoup(page, "html.parser")
    for link in soup.find_all('a'):
        s = str(link.get('href'))
        # keep individual offerings only; URLs ending in '/' are skipped
        if '/pdis/course/' in s and not s.endswith('/'):
            course_links.append(s)

course_links = list(set(course_links)) # delete dupes
course_links.sort()

# warm fuzzy
# for link in course_links:
#     print(link)

print('Course links found.')

Course links found.
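A minimal sketch of the course-link filter as a standalone helper, which makes the rule easy to check without fetching any pages. The name `is_course_link` is hypothetical, and the second sample URL (ending in a slash) is made up for illustration. The first sample URL is taken from this notebook's output.

```python
def is_course_link(href):
    """Return True for hrefs that look like individual course offerings."""
    # link.get('href') returns None for anchors without an href attribute
    if not href:
        return False
    # Offerings live under /pdis/course/ and do not end with a slash
    return '/pdis/course/' in href and not href.endswith('/')


hrefs = [
    'http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/english-senior-division/yw22sbe1',
    'http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/',  # hypothetical
    'https://www.yorku.ca/edu/professional-learning/aq-abq-pqp-courses/',
    None,
]

# A set comprehension removes duplicates; sorted() returns an ordered list
course_links = sorted({h for h in hrefs if is_course_link(h)})
print(course_links)
```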


In [5]:
#
# Process all the course links to find which courses are running
#
s_to_find = 'Registration Deadline: Jan 11, 2022'
courses_this_term = []
count = 0

print('Finding courses this term...')

for course_link in course_links:
    # print(course_link)
    page = urllib.request.urlopen(course_link)
    soup = BeautifulSoup(page, "html.parser")
    text = soup.get_text()
    if s_to_find in text:
        courses_this_term.append(course_link)
        count += 1
        print(count, course_link)

print(f'\nFound {count} courses this term.')

Finding courses this term...
1 http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/dramatic-arts-intermediate-division/vw22ind1
2 http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/dramatic-arts-senior-division/tcvw22sdr1
3 http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/dramatic-arts-senior-division/vw22sdr1
4 http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/english-intermediate-division/yw22ine1
5 http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/english-senior-division/yw22sbe1
6 http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/first-nations-m-tis-and-inuit-studies-intermediate-division/yw22inns1
7 http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/first-nations-m-tis-and-inuit-studies-senior-division/yw22sns1
8 http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/geography-intermediate-division/yw22ing1
9 http://apps.edu.yorku.ca/pdis/

In [6]:
term_course_dict = {}
for course_link in courses_this_term:
    page = urllib.request.urlopen(course_link)
    soup = BeautifulSoup(page, "html.parser")
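    h1 = soup.find('h1')
    # fall back to the URL if a page has no H1, instead of raising AttributeError
    h1_tag = h1.get_text() if h1 else course_link
    # note: courses sharing a title overwrite each other, since the title is the dict key
    term_course_dict[h1_tag] = course_link
    print(f'{h1_tag}\n\t{course_link}')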

Intermediate Basic (Dramatic Arts)
	http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/dramatic-arts-intermediate-division/vw22ind1
Senior Basic (Dramatic Arts)
	http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/dramatic-arts-senior-division/tcvw22sdr1
Senior Basic (Dramatic Arts)
	http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/dramatic-arts-senior-division/vw22sdr1
Intermediate Basic (English)
	http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/english-intermediate-division/yw22ine1
Senior Basic (English)
	http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/english-senior-division/yw22sbe1
Intermediate Basic (First Nations, Metis and Inuit Studies)
	http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/first-nations-m-tis-and-inuit-studies-intermediate-division/yw22inns1
Senior Basic (First Nations, Metis and Inuit Studies)
	http://apps.edu.yorku.ca/pdis/course/additional-basic
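"Senior Basic (Dramatic Arts)" appears twice in the output above with different URLs. A dict keyed on the title keeps only the last URL for each title, so one offering is lost. One possible way around this, sketched here with the first three results from the output above, is to collect the URLs for each title in a list. The names `titled_links` and `courses_by_title` are hypothetical.

```python
from collections import defaultdict

# (title, url) pairs copied from the output above
titled_links = [
    ('Intermediate Basic (Dramatic Arts)',
     'http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/dramatic-arts-intermediate-division/vw22ind1'),
    ('Senior Basic (Dramatic Arts)',
     'http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/dramatic-arts-senior-division/tcvw22sdr1'),
    ('Senior Basic (Dramatic Arts)',
     'http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/dramatic-arts-senior-division/vw22sdr1'),
]

# Map each title to every URL that carries it, so repeated titles are not overwritten
courses_by_title = defaultdict(list)
for title, url in titled_links:
    courses_by_title[title].append(url)

for title, urls in courses_by_title.items():
    print(f'{title}: {len(urls)} offering(s)')
```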

In [7]:
# test section to check the contents of term_course_dict
for k, v in term_course_dict.items():
    print(f'{k}: {v}')

Intermediate Basic (Dramatic Arts): http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/dramatic-arts-intermediate-division/vw22ind1
Senior Basic (Dramatic Arts): http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/dramatic-arts-senior-division/vw22sdr1
Intermediate Basic (English): http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/english-intermediate-division/yw22ine1
Senior Basic (English): http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/english-senior-division/yw22sbe1
Intermediate Basic (First Nations, Metis and Inuit Studies): http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/first-nations-m-tis-and-inuit-studies-intermediate-division/yw22inns1
Senior Basic (First Nations, Metis and Inuit Studies): http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/first-nations-m-tis-and-inuit-studies-senior-division/yw22sns1
Intermediate Basic (Geography): http://apps.edu.yorku.ca/pdis
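The introduction says the results are stored locally in a webpage, but the cells above stop at printing the dict. The following is a minimal sketch of how that last step might look, not the notebook's actual implementation. The function name `build_course_page`, the page heading, and the two-entry sample dict are assumptions for illustration. The sample URLs are taken from the output above.

```python
from html import escape


def build_course_page(course_dict, heading):
    """Return an HTML document listing each course title as a link to its URL."""
    items = '\n'.join(
        # escape() guards against characters such as & or < in titles and URLs
        f'    <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in sorted(course_dict.items())
    )
    return (
        '<!DOCTYPE html>\n<html>\n<head>\n<meta charset="utf-8">\n'
        f'<title>{escape(heading)}</title>\n</head>\n<body>\n'
        f'<h1>{escape(heading)}</h1>\n<ul>\n{items}\n</ul>\n</body>\n</html>\n'
    )


# Two entries copied from the output above; the real dict comes from the earlier cells
sample_dict = {
    'Senior Basic (English)':
        'http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/english-senior-division/yw22sbe1',
    'Intermediate Basic (English)':
        'http://apps.edu.yorku.ca/pdis/course/additional-basic-qualifications/english-intermediate-division/yw22ine1',
}

html_page = build_course_page(sample_dict, 'York AQ/ABQ courses this term')
print(html_page)
```

To save the page, the returned string can be written with `open('york_courses.html', 'w', encoding='utf-8')`. The file name here is only a placeholder.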