# Group components

|**Name**|**Surname**|**ID**|**Master Degree**|
|-------|------|-------|--------|
|Nicola|Lorenzon|2087643|Computer Engineering - AI curriculum|
|Manuel|Lovo|2122856|Computer Engineering - AI curriculum|

# Creation of the dataset





In this project, the goal is to develop a Retrieval-Augmented Generation (RAG) system that provides comprehensive information about the Master's degree courses in Computer Engineering and Bioengineering at the University of Padua (UniPD). The source of our textual data is the individual course pages provided by the university.

More specifically, the information about each course comes from that course's page on the UniPD website.
It covers the syllabus (prerequisites, suggested material, examination methods, etc.) as well as more general details such as the course code and the language of instruction.
In addition, the study plan taken from Allegato 3 of each degree course is included; it describes mandatory and elective courses, the final project, the internship, etc.

In this initial phase, a dataset generator is created. This tool addresses various challenges such as the structural differences between web pages, and extracting data from tables and other non-standard formats. The dataset generator ensures that we can efficiently gather and organize the necessary course information for our RAG system.

## Libraries and collection of links

This code collects the links to all course pages, taken from UniPD's pages for the Bioengineering and Computer Engineering Master's degrees, so that they are ready to be processed.


In [None]:
import requests #Tools for handling HTTP requests
from bs4 import BeautifulSoup #Tools for parsing HTML content
import pandas as pd #Tools for the creation of the dataframe

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#URL of courses that have to be parsed
urls = [
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/000ZZ/INQ0091306/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/000ZZ/INP9087775/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/002PD/INQ0091582/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/000ZZ/INQ0091561/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/001PD/INQ0091601/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/001PD/INQ3103090/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/001PD/INP7079233/G2GR1',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/001PD/INP7079233/G2GR2',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/001PD/INP9087774/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/001PD/INQ0091579/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/001PD/INQ0091311/N0',
    'https://en.didattica.unipd.it/off/2022/LM/IN/IN2547/001PD/INP9087836/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/001PD',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/002PD',
    'https://en.didattica.unipd.it/off/2022/LM/IN/IN2547/001PD/INQ0091301/N0',
    'https://en.didattica.unipd.it/off/2022/LM/IN/IN2547/001PD/INP8084359/N0',
    'https://en.didattica.unipd.it/off/2022/LM/IN/IN2547/001PD/INQ0091300/N0',
    'https://en.didattica.unipd.it/off/2022/LM/IN/IN2547/001PD/INQ0091621/N0',
    'https://en.didattica.unipd.it/off/2022/LM/IN/IN2547/001PD/INQ0091105/N0',
    'https://en.didattica.unipd.it/off/2022/LM/IN/IN2547/000ZZ/INQ0091640/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/005PD/INQ0092699/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/002PD/INQ3103402/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/002PD/INQ1096824/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/002PD/INL1001846/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/004PD/INQ0091998/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/000ZZ/INP9087105/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/004PD/INP9086343/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/004PD/INQ1096858/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/002PD/INQ0092039/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/005PD/INQ0092019/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/003PD/INQ0092698/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/001PD/INQ0091585/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/004PD/INP9087820/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/002PD/INQ3103421/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/005PD/INP9087854/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/004PD/INQ3103362/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/005PD/INQ3103363/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/002PD/INQ2101139/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/003PD/INL1000215/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/002PD/INQ0092699/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/002PD/INQ0092860/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/002PD/INQ0092861/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/004PD/INQ3103200/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/003PD/INP9087773/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/005PD/INQ0091285/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/004PD/INQ1096859/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/003PD/INQ3103240/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/003PD/INP9086378/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/005PD/INQ0092102/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/002PD/INQ3103401/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/001PD/INQ0092101/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/002PD/INQ3103420/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/001PD/INP9087772/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/001PD/INQ0092039/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/001PD/INQ0092018/N0',
    'https://en.didattica.unipd.it/off/2022/LM/IN/IN0532/004PD/INQ0091642/N0',
    'https://en.didattica.unipd.it/off/2023/LM/IN/IN0532/004PD/INQ0092120/N0',
]

responses = []
for url in urls:
  responses.append(requests.get(url))
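
The URLs above appear to follow a fixed path layout, `/off/<year>/LM/IN/<degree code>/<curriculum code>/<course code>/<group>`. This reading is inferred from the list itself and is an assumption, not something documented here. Under that assumption, a small helper (the name `split_course_url` is hypothetical) can tell course pages apart from the two curriculum-level pages, which stop at the curriculum code:

```python
from urllib.parse import urlparse

def split_course_url(url):
    """Split a didattica.unipd.it URL into its path components.

    Assumes the layout /off/<year>/LM/IN/<degree>/<curriculum>[/<course>/<group>],
    which is inferred from the URL list and not from official documentation.
    """
    parts = urlparse(url).path.strip("/").split("/")
    info = {
        "year": parts[1],
        "degree": parts[4],
        "curriculum": parts[5],
        "course": None,
        "group": None,
    }
    if len(parts) >= 8:  # course pages carry two extra path segments
        info["course"], info["group"] = parts[6], parts[7]
    return info

course_page = split_course_url(
    "https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/000ZZ/INQ0091306/N0"
)
curriculum_page = split_course_url(
    "https://en.didattica.unipd.it/off/2023/LM/IN/IN2547/001PD"
)
print(course_page)
print(curriculum_page)
```

A breakdown like this can help spot entries that differ only in curriculum or group, which may lead to the same course being scraped more than once.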


## Utility functions for parsing text from the structure of a single course page


This section collects the functions used to parse the page of a single course:

- **parse_sel_informazioni(div)**: extracts information from an HTML division that contains a table. It assumes that the information is structured in rows, each containing a table header and a table data cell, and returns a formatted string with the extracted information.
- **parse_sel_docenti(div)**: extracts and formats the information about the instructors from an HTML division that contains a table, in a similar way to the function above.
- **parse_sel_boll(div)**: extracts and formats the information regarding the syllabus, prerequisites, books, etc., in a similar way to the functions above.
- **parse_titolopagina(div)**: returns the course title, read from the second direct text node of the division.

In [None]:
def parse_sel_informazioni(div):
  #Extracting all the table rows
  rows = div.find_all('tr')
  #Collecting each piece of information as "header: value"
  extracted_info = ''
  for row in rows:
    th = row.find('th')
    td = row.find('td')
    if th and td:
      extracted_info += "{}: {} ".format(th.text.strip(), td.text.strip())
  return extracted_info
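
A quick way to check this kind of row parser is to run it on a toy table. The sketch below is not the notebook's function: `header_value_pairs` is a hypothetical variant that joins the pairs with single spaces instead of appending a trailing space, and the sample markup is only assumed to resemble the real UniPD pages.

```python
from bs4 import BeautifulSoup

# Toy markup; the real pages are assumed (not verified here) to have this shape.
SAMPLE_HTML = """
<div class="sel_informazioni">
  <table>
    <tr><th>Course code</th><td>INQ0091306</td></tr>
    <tr><th>ECTS</th><td> 9 </td></tr>
    <tr><td>row without a header cell</td></tr>
  </table>
</div>
"""

def header_value_pairs(div):
    """Return 'header: value' pairs for rows that have both a <th> and a <td>."""
    pairs = []
    for row in div.find_all("tr"):
        th, td = row.find("th"), row.find("td")
        if th is not None and td is not None:
            pairs.append(f"{th.get_text(strip=True)}: {td.get_text(strip=True)}")
    return " ".join(pairs)

sample_div = BeautifulSoup(SAMPLE_HTML, "html.parser").find(
    "div", class_="sel_informazioni"
)
summary = header_value_pairs(sample_div)
print(summary)
```

The third row is skipped because it has no header cell, which mirrors the `if th and td` guard in the function above.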

In [None]:
def parse_sel_docenti(div):
    result = ""
    # Find all table rows
    rows = div.find_all('tr')
    for row in rows:
        # Find all cells in the row
        cells = row.find_all(['td', 'th'])  # Include 'th' as headers are common in tables
        row_data = []
        for cell in cells:
            # Extract text from cell and append to row_data
            row_data.append(cell.get_text(strip=True))
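        # Drop empty cells (list.remove('') would raise ValueError if no cell were empty)
        row_data = [text for text in row_data if text]
        # Convert row_data to a string and append to result
        result += ' '.join(row_data)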
    return result

In [None]:
def parse_sel_boll(div):
    result = ""
    # Find all table rows
    rows = div.find_all('tr')
    for row in rows:
        # Find all cells in the row
        cells = row.find_all('td')  # Only data cells are read; header cells ('th') are skipped
        row_data = []
        for cell in cells:
            # Extract text from cell and append to row_data
            row_data.append(cell.get_text(strip=True))
        # Convert row_data to a string and append to result
        result += ' '.join(row_data)
    return result

In [None]:
def parse_titolopagina(div):
  # Extract the text
  course_title = div.find_all(string=True, recursive=False)[1].strip()
  return course_title
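
`parse_titolopagina` relies on the title being the second direct text node of the division. The sketch below shows what `find_all(string=True, recursive=False)` returns on a hypothetical piece of markup (the real page structure is an assumption here), and a more defensive alternative that keeps only non-empty direct text nodes. The helper name `direct_text_nodes` is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a link followed by the title as a bare text node.
SAMPLE_TITLE_HTML = (
    '<div class="titolopagina">\n'
    '  <a href="#">Computer Engineering</a>\n'
    '  AUTOMATA, LANGUAGES AND COMPUTATION\n'
    '</div>'
)

def direct_text_nodes(div):
    """Return the non-empty text nodes that are direct children of div."""
    nodes = div.find_all(string=True, recursive=False)
    return [node.strip() for node in nodes if node.strip()]

title_div = BeautifulSoup(SAMPLE_TITLE_HTML, "html.parser").find(
    "div", class_="titolopagina"
)
titles = direct_text_nodes(title_div)
print(titles)
```

Text inside the nested link is not a direct child, so it is ignored; only the bare title survives the filter. Filtering out whitespace-only nodes avoids depending on a fixed index when the surrounding whitespace changes.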

## Creation of the dataframe

In this section, the dataset is finally created.

First, a dataframe is built with three columns: the course, the type of information, and the content of each part. The pages are parsed with BeautifulSoup.

During the dataset creation, some challenges were encountered:

- When the information about the teacher was kept in a separate section, the RAG system had difficulty finding it. To resolve this, we merged it into the part that contains the number of CFU, the exam code, etc.
- The information on mandatory and elective exams for each Master's degree is provided in table format by "Allegato 3" on the UniPD page. The RAG system struggled to extract this information effectively, so we used GPT-4 to rewrite the table as a comprehensive prose description.
- The type of information for each course (e.g., sel_boll) was removed, producing a dataframe with only a column for the course and a column for the content.
- Adding information from Wikipedia on topics related to the courses was tested. However, the RAG system performed worse, probably because of conflicts between the sources.

In the end, two documents are available for each course: one with the CFU, course code, etc., and one with the prerequisites, suggested materials, etc. Each document begins with the name of the course.
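
The last two steps described above (dropping the type column and prefixing each document with its course name) can be sketched with pandas as follows. The rows are invented toy data, and the `". "` separator is an assumption, since the exact format used for the prefix is not shown here.

```python
import pandas as pd

# Toy rows standing in for the scraped data (content is invented).
toy = pd.DataFrame(
    [
        ["Machine Learning", "sel_informazioni", "Course code: INP9087775 ECTS: 6"],
        ["Machine Learning", "sel_boll", "Prerequisites: linear algebra"],
    ],
    columns=["course", "info", "content"],
)

# Prefix each document with its course name, then drop the 'info' column.
documents = toy.assign(content=toy["course"] + ". " + toy["content"]).drop(
    columns="info"
)
print(documents)
```

Keeping the course name inside the text itself means that a retrieved chunk still identifies its course even when the `course` column is not passed to the retriever.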


In [None]:
# Define the column names
columns = ['course', 'info', 'content']

# Create an empty DataFrame with the specified columns
df = pd.DataFrame(columns=columns)


In [None]:
df = pd.DataFrame(columns=columns)

def concatenated_info(existing_info, new_info):
  return existing_info + " " + new_info

for response in responses:
    # Check if the request was successful
    if response.status_code == 200:
        # Parse the HTML content of the page
        soup = BeautifulSoup(response.text, 'html.parser')

        # Find the div elements containing the table
        divs = soup.find_all('div')
        for div in divs:
            if div:
                classes = div.get('class', [])

                if 'titolopagina' in classes:
                    course = parse_titolopagina(div)

                elif 'sel_informazioni' in classes:
                    df.loc[len(df)] = [course, 'sel_informazioni', parse_sel_informazioni(div)]

                elif 'sel_docenti' in classes:
                    existing_info = df[(df['course'] == course) & (df['info'] == 'sel_informazioni')]
                    if not existing_info.empty:
                        existing_index = existing_info.index[0]
                        existing_info_value = df.at[existing_index, 'content']
                        concatenated_info_value = concatenated_info(existing_info_value, parse_sel_docenti(div))
                        df.at[existing_index, 'content'] = concatenated_info_value

                elif 'sel_boll' in classes:
                    df.loc[len(df)] = [course, 'sel_boll', parse_sel_boll(div)]
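
The loop above finds the row to extend by matching on the course name, so if two pages share the same title, the teachers from the second page are appended to the first page's row. A possible alternative, sketched here in plain Python with a hypothetical helper `merge_page_sections`, is to process one page at a time and remember the position of that page's `sel_informazioni` row:

```python
def merge_page_sections(sections):
    """Merge (course, info, text) triples from ONE page into rows.

    Text from 'sel_docenti' is appended to the page's 'sel_informazioni'
    row instead of becoming a row of its own.
    """
    rows = []
    info_index = None  # position of this page's 'sel_informazioni' row
    for course, info, text in sections:
        if info == "sel_informazioni":
            info_index = len(rows)
            rows.append([course, info, text])
        elif info == "sel_docenti":
            if info_index is not None:
                rows[info_index][2] += " " + text
        else:
            rows.append([course, info, text])
    return rows

# Toy sections for a single page (content is invented).
page = [
    ("Machine Learning", "sel_informazioni", "Course code: INP9087775"),
    ("Machine Learning", "sel_docenti", "Teacher in charge: A. Example"),
    ("Machine Learning", "sel_boll", "Prerequisites: none"),
]
merged = merge_page_sections(page)
print(merged)
```

The resulting lists can be collected across pages and passed to `pd.DataFrame(rows, columns=columns)` in a single call, which also avoids growing the dataframe one row at a time.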

In [None]:
new_row_1 = pd.DataFrame([{
    'course': 'Study Plan AI&Robotics track',
    'info': 'plan',
    'content': '''For the master's degree course in Computer Engineering at the University of Padua, specifically within the Artificial Intelligence and Robotics curriculum (curriculum code 001PD) for the 2020 curriculum and cohort 2023, students are required to follow a specific formative plan. Below is a detailed description of the courses and activities involved in this plan.
    In the first year, students must complete several mandatory courses, each with specific details regarding content, credit units (CFU or ECTS), lecture hours, and evaluation methods.
    During the first semester of the first year, students need to take "Automata, Languages, and Computation," which falls under the subject area MAT/01. The course code is INQ0091306, and it offers 9 CFU, with 72 hours of lectures. Attendance is not required, and the course is taught in English, with evaluation based on a final grade. Another course in the first semester is "Operations Research 1," under the subject area MAT/09, with the code INQ0091561. This course also offers 9 CFU and involves 72 hours of lectures. It is taught in English, with no attendance requirement and a final grade evaluation. "Machine Learning" is another first-semester course, under the subject area ING-INF/05, with the code INP9087775. It offers 6 CFU, with 48 hours of lectures, and follows the same pattern of not requiring attendance, being taught in English, and having a final grade evaluation.
    Moving to the second semester of the first year, students must take "Artificial Intelligence." This course, under the subject area ING-INF/05, has the code INQ3103090 and provides 9 CFU with 72 hours of lectures. Like the other courses, attendance is not required, the language of instruction is English, and evaluation is based on a final grade. Also in the second semester of the first year, students will take "Computer Vision," which is also under the subject area ING-INF/05, with the code INP9087774. This course offers 9 CFU and includes 48 hours of lectures plus 24 hours of practicals. It is taught in English, with no attendance requirement and a final grade evaluation. Another second-semester course is "Deep Learning," under the subject area ING-INF/05, with the code INQ0091579. This course provides 6 CFU with 48 hours of lectures, taught in English, with no attendance requirement and a final grade evaluation.
    In the second year, there is one mandatory course called "Intelligent Robotics," under the subject area ING-INF/05, with the code INQ0091300. This course offers 9 CFU and includes 72 hours of lectures in the first semester. It follows the same pattern of being taught in English, with no attendance requirement, and evaluation based on a final grade.
    Students must also choose elective courses totaling 18 credits from several options. One option is "Big Data Computing," under the subject area ING-INF/05, with the code INP7079233, offering 6 CFU and 48 hours of lectures in the second semester. "Robotics and Control 1" is another option, under the subject area ING-INF/04, with the code INQ0091311. This course provides 9 CFU with 72 hours of lectures in the second semester, taught in English with no attendance requirement and a final grade evaluation. Additionally, students can choose "Neurorobotics and Neurorehabilitation," under the subject area ING-INF/05, with the code INQ0091642, offering 6 CFU and 48 hours of lectures in the first semester. It is taught in English, with no attendance requirement and a final grade evaluation.
    Another elective option is "Learning from Networks," under the subject area ING-INF/05, with the code INQ0091104. This course offers 6 CFU with 48 hours of lectures in the first semester, taught in English, with no attendance requirement and a final grade evaluation. Students may also select "Natural Language Processing," under the subject area ING-INF/05, with the code INQ0091105. This course provides 6 CFU and 48 hours of lectures in the second semester. It follows the same pattern of being taught in English, with no attendance requirement and evaluation based on a final grade. Lastly, "3D Data Processing," under the subject area ING-INF/05, with the code INQ0091621, offers 6 CFU and includes 48 hours of lectures in the second semester. The course is taught in English, with no attendance requirement, and evaluation is based on a final grade.
    In addition to the aforementioned courses, students must choose 12 credits from the following courses or from the courses not chosen previously:
    One option is "Game Theory," under the subject area INF/01 (CFU 3.0), ING-INF/03 (CFU 3.0), with the code INP9087836. This course offers 6 CFU and includes 48 hours of lectures in the first semester. "Industrial Robotics" is another option, under the subject area ING-IND/13, with the code INQ0091301. This course provides 9 CFU with 72 hours of lectures in the first semester, taught in English with no attendance requirement and a final grade evaluation. Additionally, students can choose "Innovation, Entrepreneurship and Finance," under the subject area ING-IND/35, with the code INP8084359, offering 9 CFU and 72 hours of lectures in the first semester. It is taught in English, with no attendance requirement and a final grade evaluation. "Operations Research 2," under the subject area MAT/09, with the code INQ0091640, offers 6 CFU and 48 hours of lectures in the second semester. The course is taught in English, with no attendance requirement and evaluation based on a final grade. Lastly, "Quality Engineering," under the subject area ING-INF/07, with the code INQ0091601, offers 6 CFU and 48 hours of lectures in the first semester. The course is taught in English, with no attendance requirement and evaluation based on a final grade.
    In addition to the aforementioned courses, students must choose one of the following language proficiency activities:
    One option is "English Language B2 (Productive Skills)," with the code INP9087943. This course is part of the Comune curriculum, offers 3 CFU, and involves 75 hours of practice in the first year. The course is taught in English, with evaluation based on a final judgement. Another option is "Italian Language," with the code INQ0093091. This course is also part of the Comune curriculum, offers 3 CFU, and involves 75 hours of practice in the first year. It is taught in Italian, with evaluation based on a final judgement.
    In their second year, students must choose one of the following activities to gain practical experience:
    One option is the "Internship," with the course code INP9087862. This course is part of the Comune curriculum, offers 9 CFU, and involves 225 hours of practical work. The internship provides students with the opportunity to apply theoretical knowledge in a real-world professional setting, enhancing their practical skills and gaining valuable work experience. It is taught in English, and evaluation is based on a final judgement.
    Another option is "Research Training," with the course code INQ0091098. This course is also part of the Comune curriculum, offers 9 CFU, and involves 225 hours of research-oriented practice. Research training allows students to engage in a structured research project, working closely with faculty members or researchers. This experience helps students develop advanced research skills, critical thinking, and problem-solving abilities. It is taught in English, and evaluation is based on a final judgement.
    Finally, an additional mandatory activity in the second year is the "Final Project," with the course code INP9087846. This course is part of the Comune curriculum, offers 21 CFU, and involves 525 hours of research. The final project, often referred to as the thesis, is a comprehensive research project where students conduct independent research on a topic related to their field of study. This project allows students to demonstrate their ability to apply the knowledge and skills they have acquired throughout the course to solve complex problems. The thesis is a significant component of the master's degree program and is evaluated based on a final judgement.
    Note on Study Plan Approval
    These courses constitute the study plan with automatic approval. Students may also select electives from other courses, but such choices must be approved by the commission of the University of Padua (UNIPD).
'''
}])

# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row_1], ignore_index=True)

In [None]:
new_row_2 = pd.DataFrame([{
    'course': 'Study Plan Bioinformatics track',
    'info': 'plan',
    'content': '''For the master's degree course in Computer Engineering at the University of Padua, specifically within the Bioinformatics curriculum (curriculum code 002PD) for the 2020 curriculum and cohort 2023, students are required to follow a specific formative plan. Below is a detailed description of the courses and activities involved in this plan.

    In the first year, students must complete several mandatory courses, each with specific details regarding content, credit units (CFU or ECTS), lecture hours, and evaluation methods.
    During the first semester of the first year, students need to take "Automata, Languages, and Computation," which falls under the subject area MAT/01. The course code is INQ0091306, and it offers 9 CFU, with 72 hours of lectures. Attendance is not required, and the course is taught in English, with evaluation based on a final grade. Another course in the first semester is "Operations Research 1," under the subject area MAT/09, with the code INQ0091561. This course also offers 9 CFU and involves 72 hours of lectures. It is taught in English, with no attendance requirement and a final grade evaluation. "Machine Learning" is another first-semester course, under the subject area ING-INF/05, with the code INP9087775. It offers 6 CFU, with 48 hours of lectures, and follows the same pattern of not requiring attendance, being taught in English, and having a final grade evaluation. Additionally, "Inferential Statistics" under the subject area SECS-S/01 with the code INQ0091562, provides 6 CFU with 48 hours of lectures. It is taught in English, with no attendance requirement, and a final grade evaluation. "Bioinformatics" is another course in the first semester under the subject area ING-INF/05 with the code INQ0091560, offering 9 CFU with 72 hours of lectures, taught in English, with no attendance requirement, and a final grade evaluation.

    In the second year, there are mandatory courses as well. "Computational Genomics," under the subject area ING-INF/05, with the code INP9087773, provides 6 CFU with 48 hours of lectures in the first semester. "Learning from Networks" under the subject area ING-INF/05 with the code INQ0101044, offers 6 CFU with 48 hours of lectures in the first semester. Both courses follow the same pattern of being taught in English, with no attendance requirement, and evaluation based on a final grade.

    Students must also choose elective courses totaling 24 credits from several options. One option is "Big Data Computing," under the subject area ING-INF/05, with the code INP7079233, offering 6 CFU and 48 hours of lectures in the second semester. "Deep Learning" is another option, under the subject area ING-INF/05, with the code INQ0091579. This course provides 6 CFU with 48 hours of lectures in the second semester, taught in English with no attendance requirement and a final grade evaluation. Additionally, students can choose "Search Engines," under the subject area ING-INF/05, with the code INQ0091116, offering 6 CFU and 48 hours of lectures in the second semester. "Web Applications" is another elective under the subject area ING-INF/05 with the code INQ0091420, offering 6 CFU with 48 hours of lectures in the second semester. "Distributed Systems," under the subject area ING-INF/05, with the code INQ0091520, provides 9 CFU with 72 hours of lectures in the first semester. "Natural Language Processing," under the subject area ING-INF/05, with the code INQ0091105, offers 6 CFU and 48 hours of lectures in the second semester.

    In addition to the aforementioned courses, students must choose 12 credits from the following courses or from the courses not chosen previously:
    One option is "Imaging for Neuroscience," under the subject area ING-INF/06, with the code INQ0091565. This course offers 6 CFU and includes 72 hours of lectures in the second semester. "Structural Bioinformatics" is another option, under the subject area BIO/10, with the code INP9088046. This course provides 6 CFU with 32 hours of lectures and 16 hours of practicals in the second semester, taught in English with no attendance requirement and a final grade evaluation. Additionally, students can choose "Advanced Algorithm Design," under the subject area INF/01, CFU 3.0, ING-INF/05, CFU 6.0, with the code INQ0091559, offering 9 CFU and 72 hours of lectures in the first semester. "Genomics and NGS Data Analysis," under the subject area BIO/11, with the code INQ0091640, offers 6 CFU and 48 hours of lectures in the second semester. Lastly, "Operations Research 2," under the subject area MAT/09, with the code INQ0091640, offers 6 CFU and 48 hours of lectures in the second semester.

    In addition to the aforementioned courses, students must choose one of the following language proficiency activities:
    One option is "English Language B2 (Productive Skills)," with the code INP9087943. This course is part of the Comune curriculum, offers 3 CFU, and involves 75 hours of practice in the first year. The course is taught in English, with evaluation based on a final judgement. Another option is "Italian Language," with the code INQ0093091. This course is also part of the Comune curriculum, offers 3 CFU, and involves 75 hours of practice in the first year. It is taught in Italian, with evaluation based on a final judgement.

    In their second year, students must choose one of the following activities to gain practical experience:
    One option is the "Internship," with the course code INP9087862. This course is part of the Comune curriculum, offers 9 CFU, and involves 225 hours of practical work. The internship provides students with the opportunity to apply theoretical knowledge in a real-world professional setting, enhancing their practical skills and gaining valuable work experience. It is taught in English, and evaluation is based on a final judgement.
    Another option is "Research Training," with the course code INQ0091098. This course is also part of the Comune curriculum, offers 9 CFU, and involves 225 hours of research-oriented practice. Research training allows students to engage in a structured research project, working closely with faculty members or researchers. This experience helps students develop advanced research skills, critical thinking, and problem-solving abilities. It is taught in English, and evaluation is based on a final judgement.

    Finally, an additional mandatory activity in the second year is the "Final Project," with the course code INP9087846. This course is part of the Comune curriculum, offers 21 CFU, and involves 525 hours of research. The final project, often referred to as the thesis, is a comprehensive research project where students conduct independent research on a topic related to their field of study. This project allows students to demonstrate their ability to apply the knowledge and skills they have acquired throughout the course to solve complex problems. The thesis is a significant component of the master's degree program and is evaluated based on a final judgement.

    Note on Study Plan Approval
    These courses constitute the study plan with automatic approval. Students may also select electives from other courses, but such choices must be approved by the commission of the University of Padua (UNIPD).
'''
}])

# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row_2], ignore_index=True)


In [None]:
new_row_3 = pd.DataFrame([{
    'course': 'Study Plan High Performance and Big Data Computing track',
    'info': 'plan',
    'content': '''For the master's degree course in Computer Engineering at the University of Padua, specifically within the High Performance and Big Data Computing curriculum (curriculum code 003PD) for the 2020 curriculum and cohort 2023, students are required to follow a specific formative plan. Below is a detailed description of the courses and activities involved in this plan.

    In the first year, students must complete several mandatory courses, each with specific details regarding content, credit units (CFU or ECTS), lecture hours, and evaluation methods.
    During the first semester of the first year, students need to take "Automata, Languages, and Computation," which falls under the subject area MAT/01. The course code is INQ0091306, and it offers 9 CFU, with 72 hours of lectures. Attendance is not required, and the course is taught in English, with evaluation based on a final grade. Another course in the first semester is "Operations Research 1," under the subject area MAT/09, with the code INQ0091561. This course also offers 9 CFU and involves 72 hours of lectures. It is taught in English, with no attendance requirement and a final grade evaluation. "Machine Learning" is another first-semester course, under the subject area ING-INF/05, with the code INP9087775. It offers 6 CFU, with 48 hours of lectures, and follows the same pattern of not requiring attendance, being taught in English, and having a final grade evaluation. Additionally, "Inferential Statistics" under the subject area SECS-S/01 with the code INP7079233, provides 6 CFU with 48 hours of lectures. It is taught in English, with no attendance requirement, and a final grade evaluation. "Big Data Computing" is another course in the first semester under the subject area ING-INF/05 with the code INP7079233, offering 6 CFU with 48 hours of lectures, taught in English, with no attendance requirement, and a final grade evaluation. "Parallel Computing" is another first-semester course under the subject area ING-INF/05 with the code INQ0091560, offering 9 CFU with 72 hours of lectures, taught in English, with no attendance requirement, and a final grade evaluation.

    In the second year, there are mandatory courses as well. "Advanced Algorithm Design," under the subject area INF/01, CFU 3.0, ING-INF/05, CFU 6.0, with the code INQ0091643, provides 9 CFU with 72 hours of lectures in the first semester. It follows the same pattern of being taught in English, with no attendance requirement, and evaluation based on a final grade.

    Students must also choose elective courses totaling 21 credits from several options. One option is "Artificial Intelligence," under the subject area ING-INF/05, with the code INQ3103090, offering 9 CFU and 72 hours of lectures in the second semester. "Computer Networks" is another option, under the subject area ING-INF/05, with the code INQ0091302. This course provides 9 CFU with 72 hours of lectures in the second semester, taught in English with no attendance requirement and a final grade evaluation. Additionally, students can choose "Deep Learning," under the subject area ING-INF/05, with the code INQ0091579, offering 6 CFU and 48 hours of lectures in the second semester. "Search Engines" is another elective under the subject area ING-INF/05 with the code INQ0091116, offering 9 CFU with 72 hours of lectures in the first semester. "Distributed Systems," under the subject area ING-INF/05, with the code INQ0091520, provides 9 CFU with 72 hours of lectures in the first semester. "Learning from Networks," under the subject area ING-INF/05, with the code INQ0091104, offers 6 CFU and 48 hours of lectures in the first semester. "Computers and Networks Security," under the subject area ING-INF/05, with the code INQ0091641, offers 6 CFU and 48 hours of lectures in the second semester.

    In addition to the aforementioned courses, students must choose 12 credits from the following courses or from the courses not chosen previously:
    One option is "Bioinformatics," under the subject area ING-INF/05, with the code INQ0091581. This course offers 9 CFU and includes 72 hours of lectures in the second semester. "Computational Genomics" is another option, under the subject area ING-INF/05, with the code INP9087773. This course provides 6 CFU with 48 hours of lectures in the second semester, taught in English with no attendance requirement and a final grade evaluation. Additionally, students can choose "Cryptography," under the subject area INF/01, with the code INQ0091559, offering 6 CFU and 48 hours of lectures in the first semester. "Game Theory," under the subject areas INF/01 (CFU 3.0) and ING-INF/05 (CFU 6.0), with the code INP9087836, offers 6 CFU and 48 hours of lectures in the first semester. "Operations Research 2," under the subject area MAT/09, with the code INQ0091640, offers 6 CFU and 48 hours of lectures in the second semester. "Stochastic Processes," under the subject area ING-INF/03, with the code INP9087876, offers 6 CFU and 48 hours of lectures in the second semester.

    In addition to the aforementioned courses, students must choose one of the following language proficiency activities:
    One option is "English Language B2 (Productive Skills)," with the code INP9087943. This course is part of the Comune curriculum, offers 3 CFU, and involves 75 hours of practice in the first year. The course is taught in English, with evaluation based on a final judgement. Another option is "Italian Language," with the code INQ0093091. This course is also part of the Comune curriculum, offers 3 CFU, and involves 75 hours of practice in the first year. It is taught in Italian, with evaluation based on a final judgement.

    In their second year, students must choose one of the following activities to gain practical experience:
    One option is the "Internship," with the course code INP9087862. This course is part of the Comune curriculum, offers 9 CFU, and involves 225 hours of practical work. The internship provides students with the opportunity to apply theoretical knowledge in a real-world professional setting, enhancing their practical skills and gaining valuable work experience. It is taught in English, and evaluation is based on a final judgement.
    Another option is "Research Training," with the course code INQ0091098. This course is also part of the Comune curriculum, offers 9 CFU, and involves 225 hours of research-oriented practice. Research training allows students to engage in a structured research project, working closely with faculty members or researchers. This experience helps students develop advanced research skills, critical thinking, and problem-solving abilities. It is taught in English, and evaluation is based on a final judgement.

    Finally, an additional mandatory activity in the second year is the "Final Project," with the course code INP9087846. This course is part of the Comune curriculum, offers 21 CFU, and involves 525 hours of research. The final project, often referred to as the thesis, is a comprehensive research project where students conduct independent research on a topic related to their field of study. This project allows students to demonstrate their ability to apply the knowledge and skills they have acquired throughout the course to solve complex problems. The thesis is a significant component of the master's degree program and is evaluated based on a final judgement.

    Note on Study Plan Approval
    These courses constitute the study plan with automatic approval. Students may also select electives from other courses, but such choices must be approved by the commission of the University of Padua (UNIPD).
'''
}])

# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row_3], ignore_index=True)
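
Because the plan descriptions are typed by hand, a course code can end up attached to two different titles (for instance, INP7079233 appears above for both "Inferential Statistics" and "Big Data Computing"). The following is a minimal sketch of a consistency check, not part of the original generator. `codes_with_multiple_titles` is a hypothetical helper, and it assumes that course titles are wrapped in double quotes and that codes look like `IN`, one letter, then six or seven digits.

```python
import re
from collections import defaultdict

# Either a double-quoted title or a course code (assumed format: IN + letter + 6-7 digits)
TOKEN = re.compile(r'"([^"]+)"|\b(IN[A-Z]\d{6,7})\b')

def codes_with_multiple_titles(text):
    """Map each code to its titles, keeping only codes paired with 2+ titles.

    A code is attributed to the nearest quoted title that precedes it.
    """
    titles_by_code = defaultdict(set)
    last_title = None
    for match in TOKEN.finditer(text):
        title, code = match.groups()
        if title is not None:
            # Titles in the plan text often carry a trailing comma inside the quotes
            last_title = title.rstrip(',. ')
        elif last_title is not None:
            titles_by_code[code].add(last_title)
    return {c: sorted(t) for c, t in titles_by_code.items() if len(t) > 1}

sample = ('"Inferential Statistics" under the subject area SECS-S/01 with the code '
          'INP7079233, provides 6 CFU. "Big Data Computing" is another course with '
          'the code INP7079233. "Machine Learning" has the code INP9087775.')
suspicious = codes_with_multiple_titles(sample)
print(suspicious)
```

A flagged code is only a hint that one of the entries should be checked against the official course page; the sketch cannot tell which title is the right one.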

In [None]:
new_row_4 = pd.DataFrame([{
    'course': 'Study Plan Web Information and Data Engineering track',
    'info': 'plan',
    'content': '''For the master's degree course in Computer Engineering at the University of Padua, specifically within the Web Information and Data Engineering curriculum (curriculum code 004PD) for the 2020 curriculum and cohort 2023, students are required to follow a specific formative plan. Below is a detailed description of the courses and activities involved in this plan.

    In the first year, students must complete several mandatory courses, each with specific details regarding content, credit units (CFU or ECTS), lecture hours, and evaluation methods.
    During the first semester of the first year, students need to take "Automata, Languages, and Computation," which falls under the subject area MAT/01. The course code is INQ0091306, and it offers 9 CFU, with 72 hours of lectures. Attendance is not required, and the course is taught in English, with evaluation based on a final grade. Another course in the first semester is "Operations Research 1," under the subject area MAT/09, with the code INQ0091561. This course also offers 9 CFU and involves 72 hours of lectures. It is taught in English, with no attendance requirement and a final grade evaluation. "Machine Learning" is another first-semester course, under the subject area ING-INF/05, with the code INP9087775. It offers 6 CFU, with 48 hours of lectures, and follows the same pattern of not requiring attendance, being taught in English, and having a final grade evaluation. Additionally, "Computer Networks" under the subject area ING-INF/05 with the code INQ0091643, provides 9 CFU with 72 hours of lectures. It is taught in English, with no attendance requirement, and a final grade evaluation. "Search Engines" is another course in the first semester under the subject area ING-INF/05 with the code INQ0091116, offering 9 CFU with 72 hours of lectures, taught in English, with no attendance requirement, and a final grade evaluation. "Web Applications" is another first-semester course under the subject area ING-INF/05 with the code INP7079233, offering 6 CFU with 48 hours of lectures, taught in English, with no attendance requirement, and a final grade evaluation.

    In the second year, there are mandatory courses as well. "Graph Databases," under the subject area ING-INF/05, with the code INQ3103080, provides 9 CFU with 72 hours of lectures in the first semester. It follows the same pattern of being taught in English, with no attendance requirement, and evaluation based on a final grade.

    Students must also choose elective courses totaling 18 credits from several options. One option is "Software Platforms," under the subject area ING-INF/05, with the code INQ0091500, offering 6 CFU and 48 hours of lectures in the second semester. "Concurrent and Real-Time Programming" is another option, under the subject area ING-INF/05, with the code INQ0091400. This course provides 6 CFU with 48 hours of lectures in the second semester, taught in English with no attendance requirement and a final grade evaluation. Additionally, students can choose "Distributed Systems," under the subject area ING-INF/05, with the code INQ0091520, offering 9 CFU and 72 hours of lectures in the first semester. "Privacy Preserving Information Access" is another elective under the subject area ING-INF/05 with the code INQ0091641, offering 6 CFU with 48 hours of lectures in the second semester. "Computers and Networks Security," under the subject area ING-INF/05, with the code INQ0091105, provides 6 CFU with 48 hours of lectures in the second semester. "Computer Engineering for Music and Multimedia," under the subject area ING-INF/05, with the code INQ0091640, offers 6 CFU and 48 hours of lectures in the second semester. "Natural Language Processing," under the subject area ING-INF/05, with the code INQ0091105, offers 6 CFU and 48 hours of lectures in the second semester.

    In addition to the aforementioned courses, students must choose 12 credits from the following courses or from the courses not chosen previously:
    One option is "Inferential Statistics," under the subject area SECS-S/01, with the code INQ0091562. This course offers 6 CFU and includes 48 hours of lectures in the first semester. "Quality Engineering" is another option, under the subject area ING-INF/07, with the code INQ0091601. This course provides 6 CFU with 48 hours of lectures in the first semester, taught in English with no attendance requirement and a final grade evaluation. Additionally, students can choose "Big Data Computing," under the subject area ING-INF/05, with the code INP7079233, offering 6 CFU and 48 hours of lectures in the second semester. "Geographic Information Systems" is another elective under the subject area ING-INF/05 with the code INQ0091579, offering 6 CFU with 48 hours of lectures in the first semester. "Advanced Text Analytics," under the subject area ING-INF/05, with the code INQ0091116, provides 6 CFU with 48 hours of lectures in the first semester. "Operation Security," under the subject area ING-INF/05, with the code INQ0091640, offers 6 CFU and 48 hours of lectures in the second semester. "Operations Research 2," under the subject area MAT/09, with the code INQ0091640, offers 6 CFU and 48 hours of lectures in the second semester.

    In addition to the aforementioned courses, students must choose one of the following language proficiency activities:
    One option is "English Language B2 (Productive Skills)," with the code INP9087943. This course is part of the Comune curriculum, offers 3 CFU, and involves 75 hours of practice in the first year. The course is taught in English, with evaluation based on a final judgement. Another option is "Italian Language," with the code INQ0093091. This course is also part of the Comune curriculum, offers 3 CFU, and involves 75 hours of practice in the first year. It is taught in Italian, with evaluation based on a final judgement.

    In their second year, students must choose one of the following activities to gain practical experience:
    One option is the "Internship," with the course code INP9087862. This course is part of the Comune curriculum, offers 9 CFU, and involves 225 hours of practical work. The internship provides students with the opportunity to apply theoretical knowledge in a real-world professional setting, enhancing their practical skills and gaining valuable work experience. It is taught in English, and evaluation is based on a final judgement.
    Another option is "Research Training," with the course code INQ0091098. This course is also part of the Comune curriculum, offers 9 CFU, and involves 225 hours of research-oriented practice. Research training allows students to engage in a structured research project, working closely with faculty members or researchers. This experience helps students develop advanced research skills, critical thinking, and problem-solving abilities. It is taught in English, and evaluation is based on a final judgement.

    Finally, an additional mandatory activity in the second year is the "Final Project," with the course code INP9087846. This course is part of the Comune curriculum, offers 21 CFU, and involves 525 hours of research. The final project, often referred to as the thesis, is a comprehensive research project where students conduct independent research on a topic related to their field of study. This project allows students to demonstrate their ability to apply the knowledge and skills they have acquired throughout the course to solve complex problems. The thesis is a significant component of the master's degree program and is evaluated based on a final judgement.

    Note on Study Plan Approval
    These courses constitute the study plan with automatic approval. Students may also select electives from other courses, but such choices must be approved by the commission of the University of Padua (UNIPD).
'''
}])

# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row_4], ignore_index=True)
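
Every study-plan cell repeats the same two steps: build a one-row DataFrame, then concatenate it. As a sketch of how this could be factored out, the hypothetical helper `append_plan_row` below wraps both steps. It assumes the DataFrame has exactly the `course`, `info` and `content` columns used in these cells; `demo_df` and its values are placeholders, not real data.

```python
import pandas as pd

def append_plan_row(df: pd.DataFrame, course: str, content: str) -> pd.DataFrame:
    """Return a new DataFrame with one extra row whose 'info' field is 'plan'."""
    new_row = pd.DataFrame([{'course': course, 'info': 'plan', 'content': content}])
    return pd.concat([df, new_row], ignore_index=True)

# Stand-in frame with the same three columns as the notebook's df
demo_df = pd.DataFrame([
    {'course': 'Example course', 'info': 'example', 'content': 'Placeholder text.'},
])
demo_df = append_plan_row(demo_df, 'Study Plan Example track', 'Placeholder plan text.')
print(demo_df[['course', 'info']])
```

With such a helper, each cell would reduce to a single call that passes the track name and the long description string.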

In [None]:
new_row_5 = pd.DataFrame([{
    'course': 'Study Plan Digital Health and Clinical Engineering track',
    'info': 'plan',
    'content': '''For the master's degree course in Bioengineering at the University of Padua, specifically within the Digital Health and Clinical Engineering curriculum (curriculum code 001PD) for the 2020 curriculum and cohort 2023, students are required to follow a specific formative plan. Below is a detailed description of the courses and activities involved in this plan.
    In the first year, students must complete several mandatory courses, each with specific details regarding content, credit units (CFU or ECTS), lecture hours, and evaluation methods.
    During the first semester of the first year, students need to take "Statistical Methods for Bioengineering," which falls under the subject area ING-INF/06. The course code is INP087105, and it offers 9 CFU, with 72 hours of teaching divided into 48 hours of lectures and 24 hours of practice. Attendance is not required, and the course is taught in Italian, with evaluation based on a final grade. Another course in the first semester is "Biological Signal Processing," under the subject area ING-INF/06, with the code INL010851. This course also offers 9 CFU and involves 72 hours of teaching divided into 56 hours of lectures and 16 hours of practice. It is taught in Italian, with no attendance requirement and a final grade evaluation. "Mechanics of Biological Tissues" is another first-semester course, under the subject area ING-IND/34, with the code INL100186. It offers 9 CFU, with 72 hours of lectures. Attendance is not required, the course is taught in Italian, and evaluation is based on a final grade.
    Moving to the second semester of the first year, students must take "Bioimaging." This course, under the subject area ING-INF/06, has the code INP086343 and provides 9 CFU with 72 hours of lectures. Attendance is not required, the language of instruction is Italian, and evaluation is based on a final grade. Another second-semester course is "Machine Learning for Bioengineering," under the subject area ING-INF/06, with the code INP087820. This course offers 6 CFU and includes 48 hours of lectures. It is taught in English, with no attendance requirement and a final grade evaluation.
    In the second year, there are several mandatory courses. "Analysis of Biological Data," under the subject area ING-INF/06, with the code INL000215, offers 9 CFU and includes 48 hours of lectures in the first semester. It is taught in Italian, with no attendance requirement and a final grade evaluation. "Biomedical Wearable Technologies for Healthcare and Wellbeing," under the subject area ING-INF/06, with the code INQ009018, offers 6 CFU with 48 hours of lectures in the second semester. The course is taught in English, with no attendance requirement and evaluation based on a final grade. Another mandatory course is "Clinical Engineering and Health Technology Assessment," under the subject area ING-INF/06, with the code INP087772. This course provides 6 CFU with 48 hours of lectures in the second semester, taught in English, with no attendance requirement and a final grade evaluation.
    Students must also choose elective courses totaling 18 credits from several options. One option is "Functional Anatomy," under the subject area BIO/16, with the code INQ009268, offering 9 CFU and 72 hours of lectures in the second semester. "Biosensors" is another option, under the subject area ING-INF/01, with the code INQ313020, providing 9 CFU with 72 hours of lectures in the first semester, taught in English with no attendance requirement and a final grade evaluation. Additionally, students can choose "Cardiovascular Flows Modelling," under the subject area ICAR/01, with the code INQ009039, offering 9 CFU and 72 hours of lectures in the second semester. It is taught in English, with no attendance requirement and a final grade evaluation.
    Another elective option is "Imaging for Neuroscience," under the subject area ING-INF/06, with the code INQ091585. This course offers 9 CFU with 72 hours of lectures in the first semester, taught in English, with no attendance requirement and a final grade evaluation. Students may also select "Wearable Sensing Design for Healthcare," under the subject area ING-INF/07, with the code INQ310353. This course provides 9 CFU and 72 hours of lectures in the second semester, taught in English, with no attendance requirement and evaluation based on a final grade. Another option is "Computational Genomics," under the subject areas INF/01 (CFU 3.0) and ING-INF/06 (CFU 3.0), with the code INP087773. This course offers 6 CFU with 48 hours of lectures in the second semester. The course is taught in English, with no attendance requirement, and evaluation is based on a final grade. Another elective is "Medical Biotechnologies," under the subject area MED/07, with the code INP087821, offering 6 CFU and including 48 hours of lectures in the first semester. It is taught in English, with no attendance requirement and a final grade evaluation. Lastly, "Medical Big Data Sources and Clinical Decision Support Systems," under the subject areas ING-INF/06 (CFU 3.0) and MED/13 (CFU 3.0), with the code INQ009201, offers 6 CFU and includes 48 hours of lectures in the second semester. The course is taught in English, with no attendance requirement and a final grade evaluation.
    In addition to the aforementioned courses, students must complete the following activities:
    The first is "English Language B2 (Productive Skills)," with the code INP406837. This course is part of the Comune curriculum, offers 3 CFU, and involves 75 hours of practice in the first year. The course is taught in English, with evaluation based on a final judgement. The second is the "Final Examination," with the code INQ106498. This activity is also part of the Comune curriculum, offers 24 CFU, and involves 600 hours of practice in the first year. It is taught in Italian, with evaluation based on a final judgement.
    Note on Study Plan Approval
    These courses constitute the study plan with automatic approval. Students may also select electives from other courses, but such choices must be approved by the commission of the University of Padua (UNIPD).
'''
}])

# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row_5], ignore_index=True)
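
Most lecture courses in these descriptions pair 8 hours of lectures with each CFU (72 hours for 9 CFU, 48 hours for 6 CFU), so a sentence that breaks this ratio is worth a second look. The sketch below flags such sentences. The 8-hours-per-CFU ratio is an assumption inferred from the text rather than an official rule, `cfu_hour_mismatches` is a hypothetical helper, and the regular expression only covers the "N CFU ... M hours of lectures" phrasing used in these strings.

```python
import re

# Matches e.g. "9 CFU with 72 hours of lectures" within a single sentence
CFU_HOURS = re.compile(r'(\d+) CFU\b[^.]*?(\d+) hours of lectures')

def cfu_hour_mismatches(text, hours_per_cfu=8):
    """Return (cfu, hours) pairs where hours != cfu * hours_per_cfu."""
    mismatches = []
    for match in CFU_HOURS.finditer(text):
        cfu, hours = int(match.group(1)), int(match.group(2))
        if hours != cfu * hours_per_cfu:
            mismatches.append((cfu, hours))
    return mismatches

sample = ('"Analysis of Biological Data," with the code INL000215, offers 9 CFU '
          'and includes 48 hours of lectures in the first semester. '
          '"Artificial Organs," with the code INQ009290, offers 9 CFU with '
          '72 hours of lectures in the first semester.')
flagged = cfu_hour_mismatches(sample)
print(flagged)
```

Language and internship activities, which are described in "hours of practice", are deliberately ignored because they follow a different ratio in the text.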


In [None]:
new_row_6 = pd.DataFrame([{
    'course': 'Study Plan Industrial Bioengineering track',
    'info': 'plan',
    'content': '''For the master's degree course in Bioengineering at the University of Padua, specifically within the Industrial Bioengineering curriculum (curriculum code 002PD) for the 2020 curriculum and cohort 2023, students are required to follow a specific formative plan. Below is a detailed description of the courses and activities involved in this plan.
    In the first year, students must complete several mandatory courses, each with specific details regarding content, credit units (CFU or ECTS), lecture hours, and evaluation methods.
    During the first semester of the first year, students need to take "Statistical Methods for Bioengineering," which falls under the subject area ING-INF/06. The course code is INP087105, and it offers 9 CFU, with 72 hours of teaching divided into 48 hours of lectures and 24 hours of practice. Attendance is not required, and the course is taught in Italian, with evaluation based on a final grade. Another course in the first semester is "Mechanics of Biological Tissues," under the subject area ING-IND/34, with the code INL100186. This course also offers 9 CFU and involves 72 hours of lectures. It is taught in Italian, with no attendance requirement and a final grade evaluation.
    In the second year, there are several mandatory courses. "Advanced Biomaterials for Biomedicine," under the subject area ING-IND/34, with the code INQ101139, offers 6 CFU and includes 48 hours of lectures in the first semester. It is taught in Italian, with no attendance requirement and a final grade evaluation. "Artificial Organs," under the subject area ING-IND/34, with the code INQ009290, offers 9 CFU with 72 hours of lectures in the first semester. The course is taught in Italian, with no attendance requirement and evaluation based on a final grade. Another mandatory course is "Structures and Mechanics of Biomaterials," under the subject area ING-IND/34, with the code INQ313040, offering 6 CFU with 48 hours of lectures in the second semester. The course is taught in Italian, with no attendance requirement and a final grade evaluation.
    Students must choose one of the following pairs of courses in the second year:
    The first pair consists of "Computational Biomechanics," under the subject area ING-IND/34, with the code INQ009219, offering 9 CFU and including 72 hours of lectures in the second semester, and "Computational Mechanics for Clinical Applications," under the subject area ING-IND/34, with the code INQ313042, offering 9 CFU and including 72 hours of lectures in the second semester. The second pair consists of "Polymeric Biomaterials for Medicine," under the subject area ING-IND/34, with the code INQ313421, offering 9 CFU and including 72 hours of lectures in the second semester, and "Nanotechnologies for Bioengineering," under the subject area ING-IND/34, with the code INQ313401, offering 9 CFU and including 72 hours of lectures in the second semester. All these courses are taught in Italian, with no attendance requirement and a final grade evaluation.
    Students must also choose elective courses totaling 18 credits from several options. One option is "Functional Anatomy," under the subject area BIO/16, with the code INQ009268, offering 9 CFU and 72 hours of lectures in the second semester. "Cardiovascular Flows Modelling" is another option, under the subject area ICAR/01, with the code INQ009039, providing 9 CFU with 72 hours of lectures in the first semester, taught in English with no attendance requirement and a final grade evaluation. Additionally, students can choose "Biomimetic Activities," under the subject area CHIM/07, with the code INQ009280, offering 9 CFU and 72 hours of lectures in the second semester. It is taught in Italian, with no attendance requirement and a final grade evaluation.
    Another elective option is "Manufacturing for Biomedical Components," under the subject area ING-IND/16, with the code INQ098244. This course offers 6 CFU with 48 hours of lectures in the first semester, taught in English, with no attendance requirement and a final grade evaluation. Students may also select "Geometric Modeling of Medical Devices," under the subject area ING-IND/15, with the code INQ313402. This course provides 6 CFU and 48 hours of lectures in the second semester, taught in Italian, with no attendance requirement and evaluation based on a final grade. Another option is "Cellular and Tissue Engineering," under the subject area ING-IND/24, with the code INQ009278. This course offers 9 CFU with 72 hours of lectures in the first semester. The course is taught in Italian, with no attendance requirement, and evaluation is based on a final grade. Another elective is "Clinical Biotechnology and Bioengineering," under the subject area ING-IND/34, with the code INQ313032, offering 6 CFU and including 48 hours of lectures in the second semester. It is taught in Italian, with no attendance requirement and a final grade evaluation. Another option is "Biomaterials Technology," under the subject area ING-IND/22, with the code INQ009216, offering 9 CFU and including 72 hours of lectures in the second semester. The course is taught in Italian, with no attendance requirement and a final grade evaluation.
    In addition to the aforementioned courses, students must complete the following activities:
    The first is "English Language B2 (Productive Skills)," with the code INP406837. This course is part of the Comune curriculum, offers 3 CFU, and involves 75 hours of practice in the first year. The course is taught in English, with evaluation based on a final judgement. The second is the "Final Examination," with the code INQ106498. This activity is also part of the Comune curriculum, offers 24 CFU, and involves 600 hours of practice in the first year. It is taught in Italian, with evaluation based on a final judgement.
    Note on Study Plan Approval
    These courses constitute the study plan with automatic approval. Students may also select electives from other courses, but such choices must be approved by the commission of the University of Padua (UNIPD).
'''
}])

# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row_6], ignore_index=True)
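
The triple-quoted descriptions keep the source indentation, so most lines stored in `content` begin with four spaces and some paragraphs are separated by blank lines. Whether that matters depends on how the text is consumed later. If a uniform layout is preferred, a small normalisation step along the following lines could be applied. `normalize_whitespace` is a hypothetical helper and the sample text is a placeholder.

```python
def normalize_whitespace(text: str) -> str:
    """Strip every line and drop empty ones, keeping one paragraph per line."""
    stripped = (line.strip() for line in text.splitlines())
    return '\n'.join(line for line in stripped if line)

raw = '''First paragraph of a plan.

    Second paragraph, indented as in the cells above.
    Third paragraph.
'''
clean = normalize_whitespace(raw)
print(clean)
```

On the real data this could be applied column-wise, for example with `df['content'].map(normalize_whitespace)`.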


In [None]:
new_row_7 = pd.DataFrame([{
    'course': 'Study Plan Biomedical Data Models and Analysis track',
    'info': 'plan',
    'content': '''For the master's degree course in Bioengineering at the University of Padua, specifically within the Biomedical Data Models and Analysis curriculum (curriculum code 003PD) for the 2020 curriculum and cohort 2023, students are required to follow a specific formative plan. Below is a detailed description of the courses and activities involved in this plan.
    In the first year, students must complete several mandatory courses, each with specific details regarding content, credit units (CFU or ECTS), lecture hours, and evaluation methods.
    During the first semester of the first year, students need to take "Statistical Methods for Bioengineering," which falls under the subject area ING-INF/06. The course code is INP087105, and it offers 9 CFU, with 72 hours of teaching divided into 48 hours of lectures and 24 hours of practice. Attendance is not required, and the course is taught in Italian, with evaluation based on a final grade. Another course in the first semester is "Biological Signal Processing," under the subject area ING-INF/06, with the code INL010851. This course also offers 9 CFU and involves 72 hours of teaching divided into 56 hours of lectures and 16 hours of practice. It is taught in Italian, with no attendance requirement and a final grade evaluation. "Modeling Methodology for Physiology and Medicine" is another first-semester course, under the subject area ING-INF/06, with the code INQ009198. It offers 9 CFU, with 72 hours of lectures. Attendance is not required, the course is taught in English, and evaluation is based on a final grade. Another first-semester course is "Bioimaging," under the subject area ING-INF/06, with the code INP086343. It offers 9 CFU, with 72 hours of lectures. Attendance is not required, the course is taught in Italian, and evaluation is based on a final grade. "Machine Learning for Bioengineering" is a second-semester course, under the subject area ING-INF/06, with the code INP087820. This course offers 6 CFU and includes 48 hours of lectures. It is taught in English, with no attendance requirement and a final grade evaluation.
    In the second year, there are several mandatory courses. "Analysis of Biological Data," under the subject area ING-INF/06, with the code INL000215, offers 6 CFU and includes 48 hours of lectures in the first semester. It is taught in Italian, with no attendance requirement and a final grade evaluation. "Control of Biological Systems," under the subject area ING-INF/06, with the code INQ009285, offers 6 CFU with 48 hours of lectures in the second semester. The course is taught in English, with no attendance requirement and evaluation based on a final grade. Another mandatory course is "Mathematical Cell Biology," under the subject area ING-INF/06, with the code INP086734. This course provides 6 CFU with 48 hours of lectures in the first semester, taught in English, with no attendance requirement and a final grade evaluation.
    Students must also choose elective courses totaling 18 credits from several options. One option is "Functional Anatomy," under the subject area BIO/16, with the code INQ009268, offering 9 CFU and 72 hours of lectures in the second semester. "Biosensors" is another option, under the subject area ING-INF/01, with the code INQ313020, providing 9 CFU with 72 hours of lectures in the first semester, taught in English with no attendance requirement and a final grade evaluation. Another option is "Computational Genomics," under the subject areas INF/01 (CFU 3.0) and ING-INF/06 (CFU 3.0), with the code INP087773. This course offers 6 CFU with 48 hours of lectures in the first semester. The course is taught in English, with no attendance requirement, and evaluation is based on a final grade. Another elective is "Medical Biotechnologies," under the subject area MED/07, with the code INP087821, offering 6 CFU and including 48 hours of lectures in the first semester. It is taught in English, with no attendance requirement and a final grade evaluation. Lastly, "Systems Biology," under the subject area ING-INF/04, with the code INQ091284, offers 6 CFU and includes 48 hours of lectures in the second semester. The course is taught in English, with no attendance requirement and a final grade evaluation.
    Students must also choose 15 credits from the following courses or from courses not chosen previously:
    One option is "Imaging for Neuroscience," under the subject area ING-INF/06, with the code INQ091585. This course offers 9 CFU with 72 hours of lectures in the second semester, taught in English, with no attendance requirement and a final grade evaluation. Another option is "Human Neuromusculoskeletal Modelling," under the subject area ING-INF/06, with the code INQ310240. This course provides 6 CFU and 48 hours of lectures in the first semester, taught in English, with no attendance requirement and evaluation based on a final grade. Another option is "Medical Big Data Sources and Clinical Decision Support Systems," under the subject areas ING-INF/06 (CFU 3.0) and MED/13 (CFU 3.0), with the code INQ009201. This course offers 6 CFU with 48 hours of lectures in the second semester, taught in English, with no attendance requirement and a final grade evaluation.
    In addition to the aforementioned courses, students must complete the following activities:
    The first is "English Language B2 (Productive Skills)," with the code INP406837. This course is part of the Comune curriculum, offers 3 CFU, and involves 75 hours of practice in the first year. The course is taught in English, with evaluation based on a final judgement. The second is the "Final Examination," with the code INQ106498. This activity is also part of the Comune curriculum, offers 24 CFU, and involves 600 hours of practice in the first year. It is taught in Italian, with evaluation based on a final judgement.
    Note on Study Plan Approval
    These courses constitute the study plan with automatic approval. Students may also select electives from other courses, but such choices must be approved by the commission of the University of Padua (UNIPD).
'''
}])

# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row_7], ignore_index=True)
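
One thing to watch in a notebook: running any of these cells twice appends the same plan row twice, because `pd.concat` never checks for existing entries. A minimal guard, sketched below on placeholder data, is to drop duplicates on the `course` and `info` columns and keep the most recent version. `demo_df`, its course names and the `'syllabus'` value are illustrative assumptions, not values taken from the real dataset.

```python
import pandas as pd

# Placeholder frame: the first and last rows simulate the same cell run twice
demo_df = pd.DataFrame([
    {'course': 'Study Plan A track', 'info': 'plan', 'content': 'old text'},
    {'course': 'Course X', 'info': 'syllabus', 'content': 'some text'},
    {'course': 'Study Plan A track', 'info': 'plan', 'content': 'new text'},
])

# Keep only the last row for each (course, info) pair, then renumber the index
deduped = (demo_df
           .drop_duplicates(subset=['course', 'info'], keep='last')
           .reset_index(drop=True))
print(deduped)
```

Using `keep='last'` means that an edited description replaces the earlier one instead of being discarded.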


In [None]:
new_row_8 = pd.DataFrame([{
    'course': 'Study Plan Bioengineering for Neurosciences track',
    'info': 'plan',
    'content': '''For the master's degree course in Bioengineering at the University of Padua, specifically within the Bioengineering for Neurosciences curriculum (curriculum code 004PD) for the 2020 curriculum and cohort 2023, students are required to follow a specific formative plan. Below is a detailed description of the courses and activities involved in this plan.
    In the first year, students must complete several mandatory courses, each with specific details regarding content, credit units (CFU or ECTS), lecture hours, and evaluation methods.
    During the first semester of the first year, students need to take "Statistical Methods for Bioengineering," which falls under the subject area ING-INF/06. The course code is INP087105, and it offers 9 CFU, with 72 hours of teaching divided into 48 hours of lectures and 24 hours of practice. Attendance is not required, and the course is taught in Italian, with evaluation based on a final grade. Another course in the first semester is "Biological Signal Processing," under the subject area ING-INF/06, with the code INL010851. This course also offers 9 CFU and involves 72 hours of teaching divided into 56 hours of lectures and 16 hours of practice. It is taught in Italian, with no attendance requirement and a final grade evaluation. "Modeling Methodology for Physiology and Medicine" is another first-semester course, under the subject area ING-INF/06, with the code INQ009198. It offers 9 CFU, with 72 hours of lectures. Attendance is not required, the course is taught in English, and evaluation is based on a final grade.
    Moving to the second semester of the first year, students must take "Bioimaging." This course, under the subject area ING-INF/06, has the code INP086343 and provides 9 CFU with 72 hours of lectures. Attendance is not required, the language of instruction is Italian, and evaluation is based on a final grade. Another second-semester course is "Biomarkers, Precision Medicine and Drug Development," under the subject area ING-INF/06, with the code INQ106968. This course offers 9 CFU and includes 72 hours of lectures. It is taught in English, with no attendance requirement and a final grade evaluation. "Imaging for Neuroscience" is another second-semester course, under the subject area ING-INF/06, with the code INQ091585. This course offers 9 CFU and includes 72 hours of lectures. It is taught in English, with no attendance requirement and a final grade evaluation.
    In the second year, there are several mandatory courses. "Neurophysiology, Neural Computation and Neurotechnologies," under the subject area BIO/09, with the code INQ009102, offers 6 CFU and includes 48 hours of lectures in the second semester. It is taught in English, with no attendance requirement and a final grade evaluation. "Mathematical Cell Biology," under the subject area ING-INF/06, with the code INP086378, offers 6 CFU with 48 hours of lectures in the first semester. The course is taught in English, with no attendance requirement and evaluation based on a final grade.
    Students must also choose elective courses totaling 15 credits from several options. One option is "Cell and Tissue Bioengineering," under the subject area ING-IND/24, with the code INQ009278, offering 6 CFU and 48 hours of lectures in the second semester. "Translational Biomedical Engineering for Cell and Gene Therapy" is another option, under the subject area ING-IND/24, with the code INQ313032, providing 6 CFU with 48 hours of lectures in the first semester, taught in English with no attendance requirement and a final grade evaluation. Additionally, students can choose "Biosensors," under the subject area ING-INF/01, with the code INQ313020, offering 9 CFU and 72 hours of lectures in the first semester. It is taught in Italian, with no attendance requirement and a final grade evaluation. Another option is "Neurorobotics and Neurorehabilitation," under the subject areas ING-INF/05 (CFU 3.0) and ING-INF/06 (CFU 3.0), with the code INQ009642. This course offers 6 CFU and includes 48 hours of lectures in the first semester. The course is taught in English, with no attendance requirement and a final grade evaluation. Another elective is "Medical Robotics," under the subject area ING-IND/13, with the code INQ009120, offering 9 CFU and including 72 hours of lectures in the first semester. It is taught in Italian, with no attendance requirement and a final grade evaluation.
    Students must also choose 12 credits from the following courses or from courses not chosen previously:
    One option is "Machine Learning for Bioengineering," under the subject area ING-INF/06, with the code INP087820. This course offers 6 CFU with 48 hours of lectures in the second semester, taught in English, with no attendance requirement and a final grade evaluation. Another option is "Analysis of Biological Data," under the subject area ING-INF/06, with the code INL000215. This course provides 6 CFU and 48 hours of lectures in the first semester, taught in Italian, with no attendance requirement and evaluation based on a final grade. Another option is "Control of Biological Systems," under the subject area ING-INF/06, with the code INQ009285. This course offers 6 CFU with 48 hours of lectures in the second semester, taught in English, with no attendance requirement and evaluation based on a final grade. Another elective is "Deep Learning Applied to Neuroscience and Rehabilitation," under the subject area ING-INF/06, with the code INQ106969, offering 6 CFU and including 48 hours of lectures in the first semester. It is taught in English, with no attendance requirement and a final grade evaluation.
    In addition to the aforementioned courses, students must choose one of the following language proficiency activities:
    One option is "English Language B2 (Productive Skills)," with the code INP406837. This course is part of the Comune curriculum, offers 3 CFU, and involves 75 hours of practice in the first year. The course is taught in English, with evaluation based on a final judgement. Another option is "Final Examination," with the code INQ106498. This course is also part of the Comune curriculum, offers 24 CFU, and involves 600 hours of practice in the first year. It is taught in Italian, with evaluation based on a final judgement.
    Note on Study Plan Approval
    These courses constitute the study plan with automatic approval. Students may also select electives from other courses, but such choices must be approved by the commission of the University of Padua (UNIPD).
'''
}])

# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row_8], ignore_index=True)

In [None]:
new_row_9 = pd.DataFrame([{
    'course': 'Study Plan Bioengineering for Rehabilitation track',
    'info': 'plan',
    'content': '''For the master's degree course in Bioengineering for Rehabilitation at the University of Padua, specifically within the Bioengineering for Rehabilitation curriculum (curriculum code 005PD) for the 2020 curriculum and cohort 2023, students are required to follow a specific formative plan. Below is a detailed description of the courses and activities involved in this plan.
    In the first year, students must complete several mandatory courses, each with specific details regarding content, credit units (CFU or ECTS), lecture hours, and evaluation methods.
    During the first semester of the first year, students need to take "Statistical Methods for Bioengineering," which falls under the subject area ING-INF/06. The course code is INP087105, and it offers 9 CFU, with 72 hours of lectures divided into 48 hours of lectures and 24 hours of practice. Attendance is not required, and the course is taught in Italian, with evaluation based on a final grade. Another course in the first semester is "Biological Signal Processing," under the subject area ING-INF/06, with the code INL010851. This course also offers 9 CFU and involves 72 hours of lectures divided into 56 hours of lectures and 16 hours of practice. It is taught in Italian, with no attendance requirement and a final grade evaluation. "Mechanics of Biological Tissues" is another first-semester course, under the subject area ING-IND/34, with the code INL100186. It offers 9 CFU, with 72 hours of lectures. Attendance is not required, the course is taught in Italian, and evaluation is based on a final grade. Another first-semester course is "Artificial Organs," under the subject area ING-IND/34, with the code INQ009269. It offers 9 CFU, with 72 hours of lectures. Attendance is not required, the course is taught in Italian, and evaluation is based on a final grade.
    Moving to the second semester of the first year, students must take "Computational Biomechanics." This course, under the subject area ING-IND/34, has the code INQ009219 and provides 9 CFU with 72 hours of lectures. Attendance is not required, the language of instruction is Italian, and evaluation is based on a final grade. Another second-semester course is "Machine Learning for Bioengineering," under the subject area ING-INF/06, with the code INP087820. This course offers 6 CFU and includes 48 hours of lectures. It is taught in English, with no attendance requirement and a final grade evaluation. "Sports Engineering and Rehabilitation Devices" is another second-semester course, under the subject area ING-IND/14, with the code INP087854. This course offers 6 CFU and includes 48 hours of lectures. It is taught in English, with no attendance requirement and a final grade evaluation.
    In the second year, there are several mandatory courses. "Control of Biological Systems," under the subject area ING-INF/06, with the code INQ009285, offers 6 CFU and includes 48 hours of lectures in the first semester. It is taught in English, with no attendance requirement and a final grade evaluation. "Neurorobotics and Neurorehabilitation," under the subject areas ING-INF/05 (CFU 3.0) and ING-INF/06 (CFU 3.0), with the code INQ009642, offers 6 CFU with 48 hours of lectures in the first semester. The course is taught in English, with no attendance requirement and evaluation based on a final grade. Another mandatory course is "Medical Robotics," under the subject area ING-IND/13, with the code INQ009120. This course provides 9 CFU with 72 hours of lectures in the first semester, taught in Italian, with no attendance requirement and a final grade evaluation.
    Students must also choose elective courses totaling 15 credits from several options. One option is "Wearable Sensing Design for Healthcare," under the subject area ING-INF/07, with the code INQ310353, offering 9 CFU and 72 hours of lectures in the second semester. "Deep Learning Applied to Neuroscience and Rehabilitation" is another option, under the subject area ING-INF/06, with the code INQ106969, providing 6 CFU with 48 hours of lectures in the first semester, taught in English with no attendance requirement and a final grade evaluation. Additionally, students can choose "Biosensors," under the subject area ING-INF/01, with the code INQ313020, offering 9 CFU and 72 hours of lectures in the first semester. It is taught in Italian, with no attendance requirement and a final grade evaluation. Another option is "Biomedical Wearable Technologies for Healthcare and Wellbeing," under the subject area ING-INF/06, with the code INQ009018. This course offers 6 CFU and includes 48 hours of lectures in the second semester. The course is taught in English, with no attendance requirement and a final grade evaluation. Another elective is "Computational Mechanics for Clinical Applications," under the subject area ING-IND/34, with the code INQ313040, offering 9 CFU and including 72 hours of lectures in the second semester. It is taught in Italian, with no attendance requirement and a final grade evaluation. Another option is "Neurophysiology, Neural Computation and Neurotechnologies," under the subject area BIO/09, with the code INQ009102. This course offers 6 CFU and includes 48 hours of lectures in the second semester. The course is taught in English, with no attendance requirement and a final grade evaluation.
    In addition to the aforementioned courses, students must choose one of the following language proficiency activities:
    One option is "English Language B2 (Productive Skills)," with the code INP406837. This course is part of the Comune curriculum, offers 3 CFU, and involves 75 hours of practice in the first year. The course is taught in English, with evaluation based on a final judgement. Another option is "Final Examination," with the code INQ106498. This course is also part of the Comune curriculum, offers 24 CFU, and involves 600 hours of practice in the first year. It is taught in Italian, with evaluation based on a final judgement.
    Note on Study Plan Approval
    These courses constitute the study plan with automatic approval. Students may also select electives from other courses, but such choices must be approved by the commission of the University of Padua (UNIPD).
'''
}])

# Concatenate the new row to the DataFrame
df = pd.concat([df, new_row_9], ignore_index=True)
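
The study-plan rows above are appended one at a time, each with its own `pd.concat` call. As a possible alternative, the rows could be collected in a list and appended in a single call. The sketch below is only an illustration under stated assumptions: `append_rows` is a hypothetical helper, and the toy texts stand in for the real study-plan descriptions.

```python
import pandas as pd


def append_rows(df: pd.DataFrame, rows: list[dict]) -> pd.DataFrame:
    """Append several row dictionaries to df with a single concat call."""
    # Build one DataFrame from all the new rows, aligned to df's columns
    new_rows = pd.DataFrame(rows, columns=df.columns)
    # ignore_index=True renumbers the index from 0, as in the cells above
    return pd.concat([df, new_rows], ignore_index=True)


# Toy data (hypothetical), using the same three keys as the rows above
base = pd.DataFrame([
    {'course': 'MACHINE LEARNING', 'info': 'syllabus', 'content': 'Prerequisites: ...'},
])
plans = [
    {'course': 'Study Plan A track', 'info': 'plan', 'content': 'First-year courses ...'},
    {'course': 'Study Plan B track', 'info': 'plan', 'content': 'First-year courses ...'},
]

combined = append_rows(base, plans)
print(combined[['course', 'info']])
```

A single concatenation avoids copying the growing DataFrame once per row, which matters little for nine rows but keeps the cell shorter.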


In [None]:
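# Keep only the columns used downstream; .copy() avoids a SettingWithCopyWarning when a column is added later
df = df[['course', 'content']].copy()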
df

Unnamed: 0,course,content
0,"AUTOMATA, LANGUAGES AND COMPUTATION",Degree course: Second cycle degree in\nCOMPUTE...
1,"AUTOMATA, LANGUAGES AND COMPUTATION",Prerequisites: In order to be able to successf...
2,MACHINE LEARNING,Degree course: Second cycle degree in\nCOMPUTE...
3,MACHINE LEARNING,"Prerequisites: Basic Knowledge of Mathematics,..."
4,INFERENTIAL STATISTICS,Degree course: Second cycle degree in\nCOMPUTE...
...,...,...
103,Study Plan Digital Health and Clinical Enginee...,For the master's degree course in Digital Heal...
104,Study Plan Industrial Bioengineering track,For the master's degree course in Industrial B...
105,Study Plan Biomedical Data Models and Analysis...,For the master's degree course in Biomedical D...
106,Study Plan Bioengineering for Neurosciences track,For the master's degree course in Bioengineeri...


In [None]:
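# Merge the course name and its content into a single text field with .str.cat()
df['Merged_Column'] = df['course'].astype(str).str.cat(df['content'], sep=' ')
# Keep only the merged text: from here on, df is a pandas Series rather than a DataFrame
df = df['Merged_Column']
print(df)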

0      AUTOMATA, LANGUAGES AND COMPUTATION Degree cou...
1      AUTOMATA, LANGUAGES AND COMPUTATION Prerequisi...
2      MACHINE LEARNING Degree course: Second cycle d...
3      MACHINE LEARNING Prerequisites: Basic Knowledg...
4      INFERENTIAL STATISTICS Degree course: Second c...
                             ...                        
103    Study Plan Digital Health and Clinical Enginee...
104    Study Plan Industrial Bioengineering track For...
105    Study Plan Biomedical Data Models and Analysis...
106    Study Plan Bioengineering for Neurosciences tr...
107    Study Plan Bioengineering for Rehabilitation t...
Name: Merged_Column, Length: 108, dtype: object
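
One detail of `.str.cat()` worth keeping in mind: if a value in the second column is missing, the merged value is missing too, unless `na_rep` is given. The following minimal sketch shows this on a toy DataFrame (the two rows are made up for illustration and are not taken from the dataset).

```python
import pandas as pd

# Toy frame (hypothetical): the second row has no content
toy = pd.DataFrame({
    'course': ['MACHINE LEARNING', 'INFERENTIAL STATISTICS'],
    'content': ['Prerequisites: basic mathematics', None],
})

# Without na_rep, a missing 'content' makes the whole merged value missing
plain = toy['course'].astype(str).str.cat(toy['content'], sep=' ')

# With na_rep='', the course name survives; strip() removes the trailing separator
safe = toy['course'].astype(str).str.cat(toy['content'], sep=' ', na_rep='').str.strip()

print(plain.isna().tolist())
print(safe.tolist())
```

If the scraped dataset is known to have no missing `content` values, the plain call used in the cell above is sufficient.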


## Saving the dataset in different formats


In these snippets of code, the dataset is saved in two formats, CSV and plain text, ready to be further processed by other tools.

In [None]:
df.to_csv('/content/drive/MyDrive/Lorenzon_Lovo_NLP/Assignment_2/Data/first_dataset.csv', index=False)
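
Since `df` is a Series at this point, `to_csv` with `index=False` writes the Series name (`Merged_Column`) as a header line followed by one quoted text per record. A quick way to check that nothing is lost is to read the file back. The sketch below does this on toy data in a temporary directory. The texts and the temporary path are assumptions for illustration, not the real dataset or Drive path.

```python
import os
import tempfile

import pandas as pd

# Toy texts (hypothetical), including a comma and an embedded newline
texts = pd.Series(
    [
        'MACHINE LEARNING Degree course:\nsecond line of the description',
        'AUTOMATA, LANGUAGES AND COMPUTATION Prerequisites: none',
    ],
    name='Merged_Column',
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'first_dataset.csv')
    # Same call as above: no index column, header taken from the Series name
    texts.to_csv(csv_path, index=False)
    # Read the single column back by its header
    reloaded = pd.read_csv(csv_path)['Merged_Column']

print(reloaded.tolist() == texts.tolist())
```

Commas and line breaks inside a text are handled by CSV quoting, so the records do not need to be cleaned before saving in this format.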

In [None]:
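# Write one record per line to a plain-text file (UTF-8, so accented characters are preserved)
with open('output.txt', 'w', encoding='utf-8') as f:
    # df is a Series here, so iterate over its values directly
    for i, text in enumerate(df):
        # Flatten internal line breaks so that each record stays on a single line
        text = text.replace('\n', ' ')
        f.write(f"Text {i}: {text}\n\n")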
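
The text file stores each record on a single line, prefixed by `Text <i>: ` and followed by a blank line. A downstream tool could recover the records with a small parser like the one sketched here. `parse_text_dump` is a hypothetical helper, and the sample string is made up to mimic the format written above.

```python
import re


def parse_text_dump(dump: str) -> list[str]:
    """Recover the record texts from a 'Text <i>: <text>' dump."""
    records = []
    for line in dump.split('\n'):
        # Each record line starts with 'Text', its index, and a colon
        match = re.match(r'Text (\d+): (.*)', line)
        if match:
            records.append(match.group(2))
    return records


# Sample in the same layout as output.txt (toy content)
sample = (
    "Text 0: MACHINE LEARNING Prerequisites: ...\n\n"
    "Text 1: INFERENTIAL STATISTICS Degree course: ...\n\n"
)

print(parse_text_dump(sample))
```

This relies on the newline replacement done when writing the file: without it, a record spanning several lines would be cut at its first line break.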

In [None]:
!cp '/content/output.txt' '/content/drive/MyDrive/Lorenzon_Lovo_NLP/Assignment_2/Data/dataset.txt'