<a href="https://colab.research.google.com/github/jeffreyong15/Counsel.NLP/blob/main/Baseline%20Experiment/Data%20Collection/Data_Collection_Jeffrey.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Synthetic Academic Advising Dataset

In [None]:
import pandas as pd
import random

In [None]:
# Define categories and templates for prompts and responses
data_templates = {
    "Prerequisites": [
        ("What are the prerequisites for {course}?", "You need to complete {course1} and {course2}."),
        ("Can I take {course} without {course1}?", "No, you need to complete {course1} first."),
        ("Are there prerequisites for {course}?", "Yes, you need to complete {course1} and {course2}.")
    ],
    "Graduation Requirements": [
        ("How many credits do I need to graduate?", "You need a total of {credits} credits."),
        ("What are the core requirements for graduation?", "You must complete core courses in Math, Science, and English."),
        ("Do I need elective credits to graduate?", "Yes, you need at least {elective_credits} elective credits.")
    ],
    "Academic Support": [
        ("Where can I find tutoring services?", "Tutoring services are available at the Academic Resource Center."),
        ("Is there a study group for {course}?", "Yes, check the bulletin board for study group information for {course}."),
        ("How can I get help with assignments?", "You can get help from tutors and your course TA.")
    ],
    "Course Scheduling": [
        ("When is {course} offered?", "{course} is offered every {semester}."),
        ("Are summer courses available?", "Yes, summer courses are available for selected subjects."),
        ("How do I register for next semester?", "You can register through the online portal starting in October.")
    ],
    "Changing Major": [
        ("How can I change my major?", "Meet with an academic advisor to discuss changing your major."),
        ("What are the steps to change my major?", "Fill out a change of major form and get approval from your advisor."),
        ("Can I switch to a double major?", "Yes, you can discuss this option with your advisor.")
    ],
    "Academic Policies": [
        ("What is the grading scale?", "The grading scale is A, B, C, D, and F."),
        ("What happens if I fail a course?", "You should meet with your advisor to discuss options."),
        ("Can I retake a course for a better grade?", "Yes, you can retake a course, and the new grade will replace the old one.")
    ],
    "Senior Project Requirements": [
        ("When should I take the Senior Project course?", "The Senior Project course should be taken in your final semester."),
        ("What is required for the Senior project?", "The Senior project requires a comprehensive research or practical project."),
        ("Is there a prerequisite for the Senior Project course?", "Yes, you need to complete all core courses before the Senior Project course.")
    ]
}

In [None]:
# Generate random values
def random_course_code():
    return f"CS{random.randint(100, 499)}"

def random_credits():
    return random.choice([120, 130, 140])

def random_semester():
    return random.choice(["Fall", "Spring", "Fall and Spring", "Summer"])

In [None]:
num_samples = 10000
rows = []

for _ in range(num_samples):
    category = random.choice(list(data_templates.keys()))
    query_template, response_template = random.choice(data_templates[category])

    course = random_course_code()
    course1 = random_course_code()
    course2 = random_course_code()
    credits = random_credits()
    elective_credits = random.choice([20, 30, 40])
    semester = random_semester()

    query = query_template.format(
        course=course,
        course1=course1,
        course2=course2,
        credits=credits,
        elective_credits=elective_credits,
        semester=semester
    )
    response = response_template.format(
        course=course,
        course1=course1,
        course2=course2,
        credits=credits,
        elective_credits=elective_credits,
        semester=semester
    )

    rows.append((query, response, category))

df = pd.DataFrame(rows, columns=["Prompt", "Response", "Category"])

output_path = "academic_advising_data.csv"
df.to_csv(output_path, index=False)

In [None]:
output_path

'academic_advising_data.csv'
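
The generation loop above passes every placeholder value to every template, which works because `str.format` ignores keyword arguments a template does not use. The sketch below isolates that behaviour and adds a seeded generator so a run can be repeated. The helper names `fill` and `generate`, and the use of `random.Random(seed)`, are illustrative assumptions and not part of the original notebook.

```python
import random

# Two templates copied from data_templates above; the second has no placeholders.
templates = [
    ("When is {course} offered?", "{course} is offered every {semester}."),
    ("Are summer courses available?", "Yes, summer courses are available for selected subjects."),
]

def fill(template_pair, rng):
    """Fill one (prompt, response) pair; str.format ignores unused keyword arguments."""
    values = {
        "course": f"CS{rng.randint(100, 499)}",
        "semester": rng.choice(["Fall", "Spring", "Fall and Spring", "Summer"]),
    }
    prompt, response = template_pair
    return prompt.format(**values), response.format(**values)

def generate(seed, n=5):
    """Hypothetical helper: build n rows from a local, seeded generator."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    return [fill(rng.choice(templates), rng) for _ in range(n)]

rows_a = generate(seed=42)
rows_b = generate(seed=42)
print(rows_a == rows_b)  # True: the same seed yields the same rows
```

Seeding is optional for a synthetic dataset, but it makes it possible to regenerate the same CSV later.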

## Real Student Academic Dataset

#### Install Required Libraries

In [1]:
!echo | sudo add-apt-repository ppa:saiarcot895/chromium-beta
!sudo apt remove chromium-browser
!sudo snap remove chromium
!sudo apt install chromium-browser -qq
# Installs Chromium (the open-source browser that Chrome is built on). The next cell adds the Chromium WebDriver, which lets Selenium control Chromium.

PPA publishes dbgsym, you may need to include 'main/debug' component
Repository: 'deb https://ppa.launchpadcontent.net/saiarcot895/chromium-beta/ubuntu/ jammy main'
Description:
This PPA contains the latest Chromium Beta builds, with hardware video decoding enabled (hidden behind a flag), and support for Widevine (needed for viewing many DRM-protected videos) enabled.

== Hardware Video Decoding ==

To enable hardware video decoding, start Chromium with the --enable-features=VaapiVideoDecoder argument. To make this persistent, create a file at /etc/chromium-browser/customizations/92-vaapi-hardware-decoding with the following contents:

CHROMIUM_FLAGS="${CHROMIUM_FLAGS} --enable-features=VaapiVideoDecoder"

See also https://wiki.archlinux.org/title/Chromium#Hardware_video_acceleration for more information on VAAPI video decoding support.

=== Widevine Support ===

The packages in this PPA have support for Widevine inside Chromium enabled. However, you still need to copy some files from 

In [2]:
!pip3 install selenium --quiet
!apt-get update
!apt install chromium-chromedriver -qq
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
# Selenium needs a browser driver (here, chromedriver) to control the browser. This cell installs it from the chromium-chromedriver package and copies it to /usr/bin for easy access.

Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:2 https://developer.download.nvidia.com/compute/cu

In [3]:
!pip install selenium
!apt-get update
!apt-get install -y chromium-chromedriver

Hit:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:4 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:5 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:7 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:8 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/saiarcot895/chromium-beta/ubuntu jammy InRelease
Hit:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Reading package lists... Done
W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.lis

#### Import Libraries

In [6]:
import time
import sys
import warnings
import json
import glob
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
from selenium.webdriver.common.action_chains import ActionChains
from urllib.parse import urljoin
from bs4.element import NavigableString, Tag

#### Load the Chrome WebDriver

In [80]:
# Point executable_path below at the chromedriver binary installed earlier.
# With --headless set, Chrome runs without opening a visible window.
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_service = ChromeService(
    executable_path='/usr/lib/chromium-browser/chromedriver',
    log_path='/dev/null'  # You can change the log path as needed
)
driver = webdriver.Chrome(service=chrome_service,options=chrome_options)
#The ChromeService class sets up the path to the chromedriver

#### Data Scraping and Collection

In [84]:
# Extract course description
def extract_description(course_table):
    description = "Description not found"
    hr_tag = course_table.find('hr')

    if hr_tag:
        description_parts = []
        content_div = course_table.find('div', {'class': None, 'style': None})

        if content_div:
            current = content_div.find('hr').next_sibling

            # Skip irrelevant siblings
            while current and (
                (isinstance(current, NavigableString) and not current.strip()) or
                (isinstance(current, Tag) and current.name == 'em' and 'unit' in current.text.lower())
            ):
                current = current.next_sibling

            # Collect description until encountering a stopping keyword
            stop_keywords = ['Lecture', 'Prerequisite(s)', 'Corequisite(s)', 'Grading', 'Note(s)']

            while current:
                if isinstance(current, NavigableString):
                    text = current.strip()
                    if text:
                        description_parts.append(text)
                elif isinstance(current, Tag):
                    if current.name == 'br':
                        next_sibling = current.next_sibling
                        while isinstance(next_sibling, NavigableString) and not next_sibling.strip():
                            next_sibling = next_sibling.next_sibling
                        if isinstance(next_sibling, Tag) and next_sibling.name == 'strong':
                            if any(keyword in next_sibling.text for keyword in stop_keywords):
                                break
                    elif current.name == 'strong' and any(keyword in current.text for keyword in stop_keywords):
                        break
                    elif current.name not in ['em', 'strong']:
                        if current.text and 'unit' not in current.text.lower():
                            description_parts.append(current.text.strip())

                current = current.next_sibling

            if description_parts:
                description = ' '.join(description_parts).strip()

    return description

# Extract course units
def extract_units(course_table):
    hr_tag = course_table.find('hr')
    if hr_tag:
        unit_ems = hr_tag.find_next_siblings('em', limit=2)
        if len(unit_ems) >= 2:
            return f"{unit_ems[0].text.strip()} {unit_ems[1].text.strip()}"
    return 'Units not found'

# Extract class structure (lecture/lab hours)
def extract_class_structure(course_table):
    lecture_lab = course_table.find('em', string=lambda x: x and ('hour' in x.lower() or 'lab' in x.lower()))
    return lecture_lab.text.strip() if lecture_lab else 'Class structure not found'

# Extract only prerequisites
def extract_prerequisites(course_table):
    prerequisites = []

    for strong_tag in course_table.find_all('strong'):
        text = strong_tag.get_text(strip=True)
        if "Prerequisite(s)" in text:
            next_elem = strong_tag.next_sibling
            while next_elem and not (isinstance(next_elem, type(strong_tag)) and next_elem.name == 'strong'):
                if isinstance(next_elem, str):
                    prerequisites.append(next_elem.strip())
                elif hasattr(next_elem, 'get_text'):
                    prerequisites.append(next_elem.get_text(strip=True))
                next_elem = next_elem.next_sibling

    return " ".join(prerequisites).replace(" .", "").strip() if prerequisites else "No prerequisites listed"

# Extract only corequisites
def extract_corequisites(course_table):
    corequisites = []

    for strong_tag in course_table.find_all('strong'):
        text = strong_tag.get_text(strip=True)
        if "Corequisite(s)" in text and "Pre/Corequisite(s)" not in text:
            next_elem = strong_tag.next_sibling
            while next_elem and not (isinstance(next_elem, type(strong_tag)) and next_elem.name == 'strong'):
                if isinstance(next_elem, str):
                    corequisites.append(next_elem.strip())
                elif hasattr(next_elem, 'get_text'):
                    corequisites.append(next_elem.get_text(strip=True))
                next_elem = next_elem.next_sibling

    return " ".join(corequisites).replace(" .", "").strip() if corequisites else "No corequisites listed"

# Extract Pre/Corequisite(s)
def extract_pre_corequisites(course_table):
    pre_corequisites = []

    for strong_tag in course_table.find_all('strong'):
        text = strong_tag.get_text(strip=True)
        if "Pre/Corequisite(s)" in text:
            next_elem = strong_tag.next_sibling
            while next_elem and not (isinstance(next_elem, type(strong_tag)) and next_elem.name == 'strong'):
                if isinstance(next_elem, str):
                    pre_corequisites.append(next_elem.strip())
                elif hasattr(next_elem, 'get_text'):
                    pre_corequisites.append(next_elem.get_text(strip=True))
                next_elem = next_elem.next_sibling

    return " ".join(pre_corequisites).replace(" .", "").strip() if pre_corequisites else "No pre/corequisites listed"

# Extract grading information
def extract_grading(course_table):
    grading_tag = course_table.find('strong', string=lambda x: x and 'Grading' in x)
    if grading_tag:
        next_elem = grading_tag.next_sibling
        while next_elem and isinstance(next_elem, NavigableString):
            grading_text = next_elem.strip()
            if grading_text:
                return grading_text  # Return only the grading information without the label
            next_elem = next_elem.next_sibling
    return 'Grading info not found'

def expand_all_course_links(driver):
    course_links = driver.find_elements(By.XPATH, "//td[@class='width']/a[contains(@onclick, 'showCourse')]")

    for course_link in course_links:
        try:
            # Click each course link to expand
            ActionChains(driver).move_to_element(course_link).click().perform()
            time.sleep(2)
        except Exception as e:
            print(f"Error clicking course link: {e}")
            continue

def get_total_pages(soup):
    try:
        all_tds = soup.find_all('td')
        page_td = None
        for td in all_tds:
            if 'Page:' in td.get_text():
                page_td = td
                break

        if page_td:
            page_links = page_td.find_all('a')
            if page_links:
                last_page = max(int(link.text.strip()) for link in page_links if link.text.strip().isdigit())
                return last_page
    except Exception as e:
        print(f"Error in page detection: {str(e)}")
    return 1

def count_course_pages(driver, course_filters):
    page_counts = {}

    for course_filter in course_filters:
        # Construct the URL
        url = (f"https://catalog.sjsu.edu/content.php?"
               f"catoid=15&navoid=5382&filter%5B27%5D={course_filter}"
               f"&filter%5Bexact_match%5D=1&filter%5Bitem_type%5D=3"
               f"&filter%5Bonly_active%5D=1")

        try:
            driver.get(url)
            time.sleep(3)

            html = driver.page_source
            soup = BeautifulSoup(html, 'html.parser')

            total_pages = get_total_pages(soup)
            page_counts[course_filter] = total_pages
            print(f"Found {total_pages} pages for {course_filter}")

        except Exception as e:
            print(f"Error processing {course_filter}: {str(e)}")
            page_counts[course_filter] = 1

    return page_counts

def extract_course_details(driver):
    course_details = []

    expand_all_course_links(driver)

    soup = BeautifulSoup(driver.page_source, 'html.parser')
    course_tables = soup.find_all('table', class_='td_dark')

    for course_table in course_tables:
        raw_title = course_table.find('h3').text.strip() if course_table.find('h3') else 'Title not found'
        clean_title = " ".join(raw_title.split())  # Removes weird spaces/non-breaking spaces
        details = {
            'title': clean_title,
            'units': extract_units(course_table),
            'description': extract_description(course_table),
            'class_structure': extract_class_structure(course_table),
            'prerequisite(s)': extract_prerequisites(course_table),
            'corequisite(s)': extract_corequisites(course_table),
            'pre/corequisite(s)': extract_pre_corequisites(course_table),
            'grading': extract_grading(course_table)
        }

        course_details.append(details)

    return course_details

def extract_courses_details(driver, course_filters):  # '-1' for all courses
    all_course_details = []

    # Get the total pages for each course filter
    page_counts = count_course_pages(driver, course_filters)

    for course_filter in course_filters:
        # Get the total pages for the current course filter
        total_pages = page_counts.get(course_filter, 1)

        for page in range(1, total_pages + 1):
            url = (f"https://catalog.sjsu.edu/content.php?catoid=15&navoid=5382&filter%5B27%5D={course_filter}"
                   f"&filter%5Bcpage%5D={page}&filter%5Bexact_match%5D=1&filter%5Bitem_type%5D=3&filter%5Bonly_active%5D=1")

            driver.get(url)
            time.sleep(3)

            course_details = extract_course_details(driver)
            all_course_details.extend(course_details)

    pd.set_option('display.max_colwidth', None)
    pd.set_option('display.width', 1000)
    df = pd.DataFrame(all_course_details)

    df['id'] = range(1, len(df) + 1)
    df = df[['id'] + [col for col in df.columns if col != 'id']]  # Reorder columns to move 'id' to the front

    return df

def extract_all_course_details(driver, course_filters=('-1',), start_page=1, end_page=10, save_interval=10):
    all_course_details = []
    page_counts = count_course_pages(driver, course_filters)

    for course_filter in course_filters:
        total_pages = page_counts.get(course_filter, 1)

        # Clamp per filter without overwriting the end_page argument,
        # so one short filter does not shrink the range for the next one
        last_page = min(end_page, total_pages)

        for page in range(start_page, last_page + 1):
            print(f"Scraping page {page} now...")

            url = (f"https://catalog.sjsu.edu/content.php?catoid=15&navoid=5382&filter%5B27%5D={course_filter}"
                   f"&filter%5Bcpage%5D={page}&filter%5Bexact_match%5D=1&filter%5Bitem_type%5D=3&filter%5Bonly_active%5D=1")

            driver.get(url)
            time.sleep(3)

            course_details = extract_course_details(driver)
            all_course_details.extend(course_details)

            # Build a progress snapshot every 'save_interval' pages and on the last page.
            # Note: the snapshot is only held in memory; nothing is written to disk here.
            if (page - start_page + 1) % save_interval == 0 or page == last_page:
                partial_df = pd.DataFrame(all_course_details)
                partial_df['id'] = range(1, len(partial_df) + 1)
                partial_df = partial_df[['id'] + [col for col in partial_df.columns if col != 'id']]

    # Display settings
    pd.set_option('display.max_colwidth', None)
    pd.set_option('display.width', 1000)

    # Create and format the final DataFrame
    df = pd.DataFrame(all_course_details)
    df['id'] = range(1, len(df) + 1)
    df = df[['id'] + [col for col in df.columns if col != 'id']]  # Reorder columns to move 'id' to the front

    return df

def save_to_json(df, filename="SJSU_courses_dataset.json"):
    course_json = df.to_dict(orient='records')

    with open(filename, 'w') as f:
        json.dump(course_json, f, indent=4)

    print(f"Data saved to {filename}")
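
The scraping functions above assemble the catalog URL by hand, with the brackets in `filter[27]` and `filter[cpage]` already percent-encoded as `%5B` and `%5D`. As a sketch of an alternative, the same query string can be produced with `urllib.parse.urlencode`, which does the encoding itself. The helper name `build_catalog_url` is hypothetical; the parameter names and values are taken from the URLs in the code above.

```python
from urllib.parse import urlencode

BASE_URL = "https://catalog.sjsu.edu/content.php"

def build_catalog_url(course_filter, page=None):
    """Hypothetical helper: rebuild the catalog URL used by the scraper.

    course_filter is a subject prefix such as 'AE', or '-1' for all courses.
    page is optional; when omitted, no filter[cpage] parameter is added.
    """
    params = {"catoid": 15, "navoid": 5382, "filter[27]": course_filter}
    if page is not None:
        params["filter[cpage]"] = page
    params.update({
        "filter[exact_match]": 1,
        "filter[item_type]": 3,
        "filter[only_active]": 1,
    })
    # urlencode percent-encodes '[' and ']' as %5B and %5D
    return f"{BASE_URL}?{urlencode(params)}"

print(build_catalog_url("AE"))
print(build_catalog_url("-1", page=2))
```

Keeping the parameters in a dictionary makes it harder for the three hand-written copies of the URL to drift apart.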

In [85]:
courses_df = extract_courses_details(driver, course_filters=['AE'])
courses_df_display = courses_df.drop(columns='id')
courses_df_display

Found 1 pages for AE


Unnamed: 0,title,units,description,class_structure,prerequisites,corequisites,pre_corequisites,grading
0,"AE 15 - Air & Space Flight: Past, Present, and Future",1 unit(s),"Introduction to the history, basic principles, current and future developments of the aerospace engineering field through projects, guest speakers and field trips.",Class structure not found,No prerequisites listed,No corequisites listed,No pre/corequisites listed,Letter Graded
1,AE 20 - Computer-Aided Design for Aerospace Engineers,2 unit(s),Introduction to the fundamentals of drafting and computer-aided design with applications in aircraft and spacecraft design.,Lecture 1 hour/Lab 3 hours,No prerequisites listed,No corequisites listed,No pre/corequisites listed,Letter Graded
2,AE 30 - Computer Programming for Aerospace Engineers,2 unit(s),"C language: Variables, data types, operators, functions, modular programming, input/output sequence, pointers and memory addressing, external libraries, dynamic memory allocation. MATLAB: Variables, scripts, operations, visualization, plotting and programming. Equation solving and curve fitting. Symbolics, Simulink and I/O building block.",Lecture 1 hour/Lab 3 hours,No prerequisites listed,No corequisites listed,No pre/corequisites listed,Letter Graded
3,AE 92 - International Program Studies,1-12 unit(s),"Study Abroad and Away transfer credit course. Study Abroad and Away provides students the opportunity to study abroad on long term programs (Exchange Programs, CSU International Programs, and International Student Exchange Programs) and short-term programs (Faculty-Led Programs and Summer School Abroad Programs) for academic credit, offering Alternative Break Programs for cultural immersion, and designing other globally focused opportunities. This course is designated as a placeholder course for Study Abroad and Away programs.",Class structure not found,No prerequisites listed,No corequisites listed,No pre/corequisites listed,Mixed Grading
4,AE 100 - Fundamentals of Aerospace Engineering,3 unit(s),"Introduction to the fundamental disciplines and concepts of aerospace engineering and in particular of aerodynamics, aerospace structures, stability and control, propulsion, and flight mechanics.",Class structure not found,"C or better in ( MATH 30 or MATH 30X ) and PHYS 50 , or graduate standing.",No corequisites listed,No pre/corequisites listed,Letter Graded
...,...,...,...,...,...,...,...,...
56,AE 295B - Aerospace Engineering Masters Project II,3 unit(s),"This is a second-semester aerospace engineering Master’s Project supervision course, following AE295A. Students perform original, graduate level research and/or design and/or development, involving aerospace systems or subsystems under the supervision of an aerospace engineering faculty member and/or engineers from NASA / aerospace industry.",Class structure not found,“B” or better in AE 295A,No corequisites listed,No pre/corequisites listed,Letter Graded/RP.
57,AE 297 - Special Topics in Aerospace Engineering,1-4 unit(s),Special topics that are currently of interest to industry and academia. Content varies from semester to semester. Repeatable for up to six units.,Class structure not found,Graduate Standing or instructor consent,No corequisites listed,No pre/corequisites listed,Letter Graded
58,AE 298 - Special Projects in Aerospace Engineering,1-3 unit(s),Advanced individual work in Aerospace Engineering.,Class structure not found,Instructor consent.,No corequisites listed,No pre/corequisites listed,Credit/No Credit/RP.
59,AE 299 - Aerospace Engineering Masters Thesis,3 unit(s),Master’s thesis work in aerospace engineering.,Class structure not found,“CR” in 1st semester to continue in 2nd semester.,No corequisites listed,No pre/corequisites listed,Mandatory Credit/No Credit/RP


In [86]:
save_to_json(courses_df, filename="SJSU_courses_dataset.json")

Data saved to SJSU_courses_dataset.json


In [73]:
# Run for pages 1 to 10
courses_df1_10 = extract_all_course_details(driver, start_page=1, end_page=10)
courses_df1_10_display = courses_df1_10.drop(columns='id')
courses_df1_10_display

Found 54 pages for -1
Scraping page 1 now...
Scraping page 2 now...
Scraping page 3 now...
Scraping page 4 now...
Scraping page 5 now...
Scraping page 6 now...
Scraping page 7 now...
Scraping page 8 now...
Scraping page 9 now...
Scraping page 10 now...


Unnamed: 0,title,units,description,class_structure,prerequisites/corequisite,grading
0,KIN 1 - Adapted Physical Activities,1 unit(s),"Structured individualized physical activities to enhance physical/motor fitness and develop an active, health-oriented lifestyle for students unable to participate in the general activity program.",Class structure not found,No prerequisites/corequisites listed,Letter Graded
1,KIN 2A - Beginning Swimming,1 unit(s),This course is designed for the non-swimmer and beginning swimmer. It is assumed that all students enrolled in the class have had little or no experience in learning the basic skills of swimming. The course is designed to instruct the student in the basic skills necessary to enable him/her to swim safely in deep water. There are no prerequisites for the course.,Class structure not found,No prerequisites/corequisites listed,Letter Graded
2,KIN 2B - Intermediate Swimming,1 unit(s),This course is designed to meet the needs of students who have satisfactorily completed the skills involved in beginning swimming.,Class structure not found,Beginning level or its equivalent.,Letter Graded
3,KIN 2C - Advanced Swimming,1 unit(s),This course is designed to refine and extend the development of advanced skills in swimming.,Class structure not found,Intermediate level or its equivalent.,Letter Graded
4,KIN 3 - Water Polo,1 unit(s),"Fundamental skills, techniques, strategies, rules, and knowledge necessary to safely and correctly play water polo.",Class structure not found,Beginning level swimming proficiency.,Letter Graded
...,...,...,...,...,...,...
995,BUS 298I - Applied Business Experience Internship,1 unit(s),For the student with a specific internship providing a quality experience that reinforces the curriculum and involves meaningful work. The student must submit a one-page formal proposal to the graduate program director. A final report is required. The internship must qualify as Curricular Practical Training (CPT) for international students.,Class structure not found,Approved advancement to candidacy.,Mandatory Credit/No Credit/RP
996,BUS 299 - Master’s Thesis,1-4 unit(s),Master’s Thesis Plan A.,Class structure not found,Approval of the instructor and advancement to candidacy. Not available to Open University Students,Mandatory Credit/No Credit/RP
997,"HSPM 1 - Travel to Learn, Learn to Travel",3 unit(s),"Course examines the relations among tourists, locals, and the tourism industry and how the global tourism industry facilitates diverse travelers¿ travel experience from beginning to end. Focus on the industry¿s history, growth, development, impacts, trends, technology and career opportunity.",Class structure not found,No prerequisites/corequisites listed,Letter Graded.
998,HSPM 11 - Restaurant Entrepreneurship,3 unit(s),"The comprehensive process of conceptualizing, planning, starting, and managing a restaurant business. The topics cover business planning, operations, menu planning, staffing, marketing, and customer service.",Class structure not found,No prerequisites/corequisites listed,Letter Graded.


In [74]:
save_to_json(courses_df1_10, filename="SJSU_courses_dataset(1-10).json")

Data saved to SJSU_courses_dataset(1-10).json
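
The progress condition inside `extract_all_course_details` is easier to reason about in isolation. The sketch below reproduces only that condition, with no scraping, to show which pages would trigger a snapshot for a given range and `save_interval`. The helper name `checkpoint_pages` is an assumption made for illustration.

```python
def checkpoint_pages(start_page, end_page, save_interval):
    """Return the pages at which the progress condition is true."""
    return [
        page
        for page in range(start_page, end_page + 1)
        # Same test as in extract_all_course_details: every save_interval-th
        # page counted from start_page, plus the final page of the range
        if (page - start_page + 1) % save_interval == 0 or page == end_page
    ]

print(checkpoint_pages(1, 10, 10))   # [10]
print(checkpoint_pages(11, 20, 3))   # [13, 16, 19, 20]
```

With the default `save_interval=10` and a ten-page range, the condition fires only once, on the last page, so a smaller interval is needed if intermediate snapshots are wanted.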


In [None]:
# Run for pages 11 to 20 in another script
courses_df2 = extract_all_course_details(driver, start_page=11, end_page=20)
courses_df2_display = courses_df2.drop(columns='id')
courses_df2_display

In [None]:
# Run for pages 21 to 30 in another script
courses_df3 = extract_all_course_details(driver, start_page=21, end_page=30)
courses_df3_display = courses_df3.drop(columns='id')
courses_df3_display

In [None]:
# Run for pages 31 to 40 in another script
courses_df4 = extract_all_course_details(driver, start_page=31, end_page=40)
courses_df4_display = courses_df4.drop(columns='id')
courses_df4_display

In [None]:
# Run for pages 41 to 50 in another script
courses_df5 = extract_all_course_details(driver, start_page=41, end_page=50)
courses_df5_display = courses_df5.drop(columns='id')
courses_df5_display
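
Because each page range is scraped in a separate run, every run numbers its `id` column from 1. A minimal sketch of one way to combine the per-range JSON files into a single list with continuous ids follows. It assumes the remaining ranges are saved with the same naming pattern as `SJSU_courses_dataset(1-10).json`; the helper name `merge_course_files` is hypothetical.

```python
import glob
import json

def merge_course_files(pattern):
    """Hypothetical helper: concatenate JSON record lists and renumber ids."""
    merged = []
    # sorted() orders paths lexicographically, which happens to match the
    # numeric order for names like (1-10), (11-20), (21-30)
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            merged.extend(json.load(f))
    # Reassign ids so they run 1..N across all files
    for new_id, record in enumerate(merged, start=1):
        record["id"] = new_id
    return merged

# Assumed file naming; returns an empty list if no files match
all_courses = merge_course_files("SJSU_courses_dataset(*).json")
print(len(all_courses))
```

The merged list can be passed to `pd.DataFrame` or written back out with `json.dump`, as in `save_to_json` above.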