# Task
Analyze the provided syllabi data in "/content/testdata/combined_apc_syllabi_data.csv" using a pipeline of spaCy, BloomBERT (located in "/content/BloomBERT"), and TextBlob to extract and lemmatize verbs, assign Bloom's taxonomy levels to them, and analyze the sentiment and thematic alignment between Learning Outcomes, Deliverables Outcomes, and Assessments. Present the results showing the verb-to-taxonomy mapping, sentiment, and thematic alignment for each section.

In [2]:
import pandas as pd
import spacy

# Load the spaCy model
nlp = spacy.load("en_core_web_sm")

# Load the data into a DataFrame
# Raw string so the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r'\Programming\THESIS Model\combined_csv\combined_cleaned_syllabi_data.csv')


In [3]:
print(df.columns)  # Display the columns to check for leading/trailing spaces

Index(['Learning Outcomes', 'Deliverables', 'Assessments', 'Source_File',
       'Course_Code'],
      dtype='object')


In [4]:
# Define a function to process text and extract verb information
def process_text_and_extract_verbs(text):
    if pd.isna(text):
        return [], [], [], [], [] # Return 5 empty lists
    doc = nlp(str(text))
    tokens = []
    lemmas = []
    pos_tags = []
    verbs = []
    verb_lemmas = []
    for token in doc:
        tokens.append(token.text)
        lemmas.append(token.lemma_)
        pos_tags.append(token.pos_)
        if token.pos_ == 'VERB':
            verbs.append(token.text)
            verb_lemmas.append(token.lemma_)
    return tokens, lemmas, pos_tags, verbs, verb_lemmas

# Apply the function to the relevant columns
for col in ['Learning Outcomes', 'Deliverables', 'Assessments']:
    df[f'{col}_tokens'], df[f'{col}_lemmas'], df[f'{col}_pos_tags'], df[f'{col}_verbs'], df[f'{col}_verb_lemmas'] = zip(*df[col].apply(process_text_and_extract_verbs))

# Display the first few rows with new columns
display(df.head())

Unnamed: 0,Learning Outcomes,Deliverables,Assessments,Source_File,Course_Code,Learning Outcomes_tokens,Learning Outcomes_lemmas,Learning Outcomes_pos_tags,Learning Outcomes_verbs,Learning Outcomes_verb_lemmas,Deliverables_tokens,Deliverables_lemmas,Deliverables_pos_tags,Deliverables_verbs,Deliverables_verb_lemmas,Assessments_tokens,Assessments_lemmas,Assessments_pos_tags,Assessments_verbs,Assessments_verb_lemmas
0,Understand the overall class requirements and ...,"Accounts created in LinkedIn, SkillsBuild\nVie...",Registration in online courses sites\nActive p...,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Understand, the, overall, class, requirements...","[understand, the, overall, class, requirement,...","[VERB, DET, ADJ, NOUN, NOUN, CCONJ, NOUN, ADP,...",[Understand],[understand],"[Accounts, created, in, LinkedIn, ,, SkillsBui...","[account, create, in, LinkedIn, ,, SkillsBuild...","[NOUN, VERB, ADP, PROPN, PUNCT, PROPN, SPACE, ...",[created],[create],"[Registration, in, online, courses, sites, \n,...","[registration, in, online, course, site, \n, a...","[NOUN, ADP, ADJ, NOUN, NOUN, SPACE, ADJ, NOUN,...",[],[]
1,Discuss cloud native security including Kubern...,Cloud Native Security Exercise,Activity/Exercise,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Discuss, cloud, native, security, including, ...","[discuss, cloud, native, security, include, Ku...","[VERB, ADJ, ADJ, NOUN, VERB, PROPN, NOUN, PUNC...","[Discuss, including]","[discuss, include]","[Cloud, Native, Security, Exercise]","[Cloud, Native, Security, exercise]","[PROPN, PROPN, PROPN, NOUN]",[],[],"[Activity, /, Exercise]","[activity, /, exercise]","[NOUN, SYM, NOUN]",[],[]
2,Discuss hybrid data center security design con...,Hybrid Data Center Exercise,Quiz 1 - Oveview of Cloud Security and Cloud N...,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Discuss, hybrid, data, center, security, desi...","[discuss, hybrid, datum, center, security, des...","[VERB, ADJ, NOUN, NOUN, NOUN, NOUN, NOUN, PUNC...","[Discuss, including]","[discuss, include]","[Hybrid, Data, Center, Exercise]","[Hybrid, Data, Center, exercise]","[PROPN, PROPN, PROPN, NOUN]",[],[],"[Quiz, 1, -, Oveview, of, Cloud, Security, and...","[Quiz, 1, -, oveview, of, Cloud, Security, and...","[PROPN, NUM, PUNCT, NOUN, ADP, PROPN, PROPN, C...",[],[]
3,Learn the Alibaba Cloud Security,Alibaba Cloud Security Certificate,Completion of modules \nAlibaba Cloud Surveys,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Learn, the, Alibaba, Cloud, Security]","[learn, the, Alibaba, Cloud, Security]","[VERB, DET, PROPN, PROPN, PROPN]",[Learn],[learn],"[Alibaba, Cloud, Security, Certificate]","[Alibaba, Cloud, Security, Certificate]","[PROPN, PROPN, PROPN, PROPN]",[],[],"[Completion, of, modules, \n, Alibaba, Cloud, ...","[completion, of, module, \n, Alibaba, Cloud, S...","[NOUN, ADP, NOUN, SPACE, PROPN, PROPN, PROPN]",[],[]
4,Learn Machine Learning with Generative AI,,Assessment Exam,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Learn, Machine, Learning, with, Generative, AI]","[learn, Machine, Learning, with, Generative, AI]","[VERB, PROPN, PROPN, ADP, PROPN, PROPN]",[Learn],[learn],[],[],[],[],[],"[Assessment, Exam]","[Assessment, Exam]","[PROPN, PROPN]",[],[]
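
The `zip(*...)` idiom in the cell above turns a sequence of five-element tuples (one per row) into five sequences (one per new column). A minimal standalone sketch of that transposition, using hand-written tuples in place of real spaCy output (the sample values are illustrative only):

```python
# Each tuple stands in for one row's return value from
# process_text_and_extract_verbs: (tokens, lemmas, pos_tags, verbs, verb_lemmas).
# The values are hand-written for illustration, not produced by spaCy here.
rows = [
    (["Learn", "AI"], ["learn", "AI"], ["VERB", "PROPN"], ["Learn"], ["learn"]),
    ([], [], [], [], []),  # what a NaN cell returns
]

# zip(*rows) transposes "one tuple per row" into "one tuple per output column".
tokens_col, lemmas_col, pos_col, verbs_col, verb_lemmas_col = zip(*rows)

print(verbs_col)
```

Each resulting tuple has one element per row, which is why it can be assigned directly to a new DataFrame column.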


## Integrate BloomBERT

### Subtask:
Set up the BloomBERT model from the `/content/BloomBERT` folder and apply it to the extracted verbs to assign Bloom's levels.


**Reasoning**:
Load the BloomBERT model and tokenizer and define a function to predict Bloom's levels.



In [5]:
from transformers import TFDistilBertForSequenceClassification, DistilBertTokenizer
import tensorflow as tf

# Load the pre-trained BloomBERT model and tokenizer
# Raw string so the backslashes in the Windows path are not treated as escape sequences
model_path = r"\Programming\THESIS Model\BloomBERT_T4"
tokenizer = DistilBertTokenizer.from_pretrained(model_path)
model = TFDistilBertForSequenceClassification.from_pretrained(model_path)

# Define a function to predict Bloom's taxonomy levels
def predict_blooms_level(verb_lemmas_list):
    if not verb_lemmas_list:
        return []

    # Join the verb lemmas into a single string for prediction
    text = " ".join(verb_lemmas_list)

    # Tokenize the text and get predictions
    inputs = tokenizer(text, return_tensors="tf", padding=True, truncation=True, max_length=128)
    outputs = model(inputs)
    predictions = tf.argmax(outputs.logits, axis=1).numpy()

    # Assuming the model outputs a single prediction for the combined text,
    # we'll assign this prediction to all verbs in the list.
    # In a more sophisticated approach, you might process each verb individually
    # or use a different model architecture.
    predicted_levels = [predictions[0]] * len(verb_lemmas_list) # Apply the single prediction to all verbs

    return predicted_levels

# Apply the function to the verb_lemmas columns
for col in ['Learning Outcomes', 'Deliverables', 'Assessments']:
    df[f'{col}_blooms_levels'] = df[f'{col}_verb_lemmas'].apply(predict_blooms_level)

# Display the first few rows with the new columns
display(df.head())

TensorFlow and JAX classes are deprecated and will be removed in Transformers v5. We recommend migrating to PyTorch classes or pinning your version of Transformers.
All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.

All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at \Programming\THESIS Model\BloomBERT_T4.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


Unnamed: 0,Learning Outcomes,Deliverables,Assessments,Source_File,Course_Code,Learning Outcomes_tokens,Learning Outcomes_lemmas,Learning Outcomes_pos_tags,Learning Outcomes_verbs,Learning Outcomes_verb_lemmas,...,Deliverables_verbs,Deliverables_verb_lemmas,Assessments_tokens,Assessments_lemmas,Assessments_pos_tags,Assessments_verbs,Assessments_verb_lemmas,Learning Outcomes_blooms_levels,Deliverables_blooms_levels,Assessments_blooms_levels
0,Understand the overall class requirements and ...,"Accounts created in LinkedIn, SkillsBuild\nVie...",Registration in online courses sites\nActive p...,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Understand, the, overall, class, requirements...","[understand, the, overall, class, requirement,...","[VERB, DET, ADJ, NOUN, NOUN, CCONJ, NOUN, ADP,...",[Understand],[understand],...,[created],[create],"[Registration, in, online, courses, sites, \n,...","[registration, in, online, course, site, \n, a...","[NOUN, ADP, ADJ, NOUN, NOUN, SPACE, ADJ, NOUN,...",[],[],[0],[0],[]
1,Discuss cloud native security including Kubern...,Cloud Native Security Exercise,Activity/Exercise,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Discuss, cloud, native, security, including, ...","[discuss, cloud, native, security, include, Ku...","[VERB, ADJ, ADJ, NOUN, VERB, PROPN, NOUN, PUNC...","[Discuss, including]","[discuss, include]",...,[],[],"[Activity, /, Exercise]","[activity, /, exercise]","[NOUN, SYM, NOUN]",[],[],"[0, 0]",[],[]
2,Discuss hybrid data center security design con...,Hybrid Data Center Exercise,Quiz 1 - Oveview of Cloud Security and Cloud N...,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Discuss, hybrid, data, center, security, desi...","[discuss, hybrid, datum, center, security, des...","[VERB, ADJ, NOUN, NOUN, NOUN, NOUN, NOUN, PUNC...","[Discuss, including]","[discuss, include]",...,[],[],"[Quiz, 1, -, Oveview, of, Cloud, Security, and...","[Quiz, 1, -, oveview, of, Cloud, Security, and...","[PROPN, NUM, PUNCT, NOUN, ADP, PROPN, PROPN, C...",[],[],"[0, 0]",[],[]
3,Learn the Alibaba Cloud Security,Alibaba Cloud Security Certificate,Completion of modules \nAlibaba Cloud Surveys,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Learn, the, Alibaba, Cloud, Security]","[learn, the, Alibaba, Cloud, Security]","[VERB, DET, PROPN, PROPN, PROPN]",[Learn],[learn],...,[],[],"[Completion, of, modules, \n, Alibaba, Cloud, ...","[completion, of, module, \n, Alibaba, Cloud, S...","[NOUN, ADP, NOUN, SPACE, PROPN, PROPN, PROPN]",[],[],[0],[],[]
4,Learn Machine Learning with Generative AI,,Assessment Exam,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Learn, Machine, Learning, with, Generative, AI]","[learn, Machine, Learning, with, Generative, AI]","[VERB, PROPN, PROPN, ADP, PROPN, PROPN]",[Learn],[learn],...,[],[],"[Assessment, Exam]","[Assessment, Exam]","[PROPN, PROPN]",[],[],[0],[],[]
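
The comments in `predict_blooms_level` note that each verb could instead be classified individually. A minimal sketch of that variant follows. The `classify_batch` parameter is a hypothetical stand-in for a wrapper around the tokenizer and model, and the toy classifier exists only to make the sketch runnable; it is not BloomBERT.

```python
from typing import Callable, List, Sequence


def predict_levels_per_verb(
    verb_lemmas: Sequence[str],
    classify_batch: Callable[[List[str]], List[int]],
) -> List[int]:
    """Return one class index per verb lemma.

    `classify_batch` (hypothetical) takes a list of strings and returns
    one class index per string.
    """
    if not verb_lemmas:
        return []
    levels = classify_batch(list(verb_lemmas))
    if len(levels) != len(verb_lemmas):
        raise ValueError("classifier must return one index per input string")
    return [int(level) for level in levels]


# Toy classifier used only to make the sketch runnable; it is NOT BloomBERT.
def toy_classifier(texts: List[str]) -> List[int]:
    return [len(text) % 6 for text in texts]


print(predict_levels_per_verb(["learn", "discuss"], toy_classifier))
```

One possible wrapper (an assumption, not shown in the notebook) would pass the list of lemmas to `tokenizer` with the same arguments used in `predict_blooms_level` and take `tf.argmax(outputs.logits, axis=1)`. Whether the model gives meaningful predictions for isolated lemmas rather than full sentences is not established by this notebook.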


## Analyze with TextBlob

### Subtask:
Use TextBlob to analyze the sentiment and thematic alignment of the text, particularly focusing on the relationship between Learning Outcomes, Deliverables Outcomes, and Assessments.


**Reasoning**:
Import the TextBlob class and define the functions for sentiment analysis and thematic alignment, then apply them to the DataFrame.



In [8]:
from textblob import TextBlob

# Define a function to analyze sentiment
def analyze_sentiment(text):
    if pd.isna(text):
        return 0.0  # Return neutral sentiment for NaN
    return TextBlob(str(text)).sentiment.polarity

# Apply the analyze_sentiment function
df['Learning Outcomes_sentiment'] = df['Learning Outcomes'].apply(analyze_sentiment)
df['Deliverables Outcomes_sentiment'] = df['Deliverables'].apply(analyze_sentiment)
df['Assessments_sentiment'] = df['Assessments'].apply(analyze_sentiment)

# Define a function to calculate thematic alignment using Jaccard similarity of lemmas
def calculate_thematic_alignment(lemmas1, lemmas2):
    # The lemma columns always hold lists (empty when the source text was NaN);
    # an empty list on either side means no alignment.
    if not lemmas1 or not lemmas2:
        return 0.0

    set1 = set(lemmas1)
    set2 = set(lemmas2)

    # Jaccard similarity: shared lemmas divided by all distinct lemmas.
    # Both sets are non-empty here, so the union is never empty.
    return len(set1 & set2) / len(set1 | set2)

# Apply the calculate_thematic_alignment function
df['LO_DO_alignment'] = df.apply(lambda row: calculate_thematic_alignment(row['Learning Outcomes_lemmas'], row['Deliverables_lemmas']), axis=1)
df['LO_Assessments_alignment'] = df.apply(lambda row: calculate_thematic_alignment(row['Learning Outcomes_lemmas'], row['Assessments_lemmas']), axis=1)
df['DO_Assessments_alignment'] = df.apply(lambda row: calculate_thematic_alignment(row['Deliverables_lemmas'], row['Assessments_lemmas']), axis=1)

# Display the first few rows with the new columns
display(df.head())

Unnamed: 0,Learning Outcomes,Deliverables,Assessments,Source_File,Course_Code,Learning Outcomes_tokens,Learning Outcomes_lemmas,Learning Outcomes_pos_tags,Learning Outcomes_verbs,Learning Outcomes_verb_lemmas,...,Assessments_verb_lemmas,Learning Outcomes_blooms_levels,Deliverables_blooms_levels,Assessments_blooms_levels,Learning Outcomes_sentiment,Deliverables Outcomes_sentiment,Assessments_sentiment,LO_DO_alignment,LO_Assessments_alignment,DO_Assessments_alignment
0,Understand the overall class requirements and ...,"Accounts created in LinkedIn, SkillsBuild\nVie...",Registration in online courses sites\nActive p...,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Understand, the, overall, class, requirements...","[understand, the, overall, class, requirement,...","[VERB, DET, ADJ, NOUN, NOUN, CCONJ, NOUN, ADP,...",[Understand],[understand],...,[],[0],[0],[],0.0,0.0,-0.133333,0.045455,0.047619,0.117647
1,Discuss cloud native security including Kubern...,Cloud Native Security Exercise,Activity/Exercise,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Discuss, cloud, native, security, including, ...","[discuss, cloud, native, security, include, Ku...","[VERB, ADJ, ADJ, NOUN, VERB, PROPN, NOUN, PUNC...","[Discuss, including]","[discuss, include]",...,[],"[0, 0]",[],[],0.0,0.0,0.0,0.0,0.0,0.166667
2,Discuss hybrid data center security design con...,Hybrid Data Center Exercise,Quiz 1 - Oveview of Cloud Security and Cloud N...,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Discuss, hybrid, data, center, security, desi...","[discuss, hybrid, datum, center, security, des...","[VERB, ADJ, NOUN, NOUN, NOUN, NOUN, NOUN, PUNC...","[Discuss, including]","[discuss, include]",...,[],"[0, 0]",[],[],-0.066667,-0.1,0.0,0.0,0.074074,0.0
3,Learn the Alibaba Cloud Security,Alibaba Cloud Security Certificate,Completion of modules \nAlibaba Cloud Surveys,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Learn, the, Alibaba, Cloud, Security]","[learn, the, Alibaba, Cloud, Security]","[VERB, DET, PROPN, PROPN, PROPN]",[Learn],[learn],...,[],[0],[],[],0.0,0.0,0.0,0.5,0.2,0.222222
4,Learn Machine Learning with Generative AI,,Assessment Exam,CLDSRV2_Syllabus 2023_2024.csv,CLDSRV2,"[Learn, Machine, Learning, with, Generative, AI]","[learn, Machine, Learning, with, Generative, AI]","[VERB, PROPN, PROPN, ADP, PROPN, PROPN]",[Learn],[learn],...,[],[0],[],[],0.0,0.0,0.0,0.0,0.0,0.0
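
To make the alignment scores concrete, the sketch below recomputes two of the values in the output above by hand. The helper `jaccard` is a condensed restatement of `calculate_thematic_alignment`, and the lemma lists are copied from rows 3 and 1 of the displayed output.

```python
def jaccard(a, b):
    """Jaccard similarity of two lemma lists: shared items / all distinct items."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Row 3: Learning Outcomes lemmas vs Deliverables lemmas.
lo_lemmas = ["learn", "the", "Alibaba", "Cloud", "Security"]
do_lemmas = ["Alibaba", "Cloud", "Security", "Certificate"]
# Shared: Alibaba, Cloud, Security (3); distinct lemmas overall: 6.
row3_lo_do = jaccard(lo_lemmas, do_lemmas)
print(row3_lo_do)  # 0.5, matching LO_DO_alignment for row 3

# Row 1: Deliverables lemmas vs Assessments lemmas.
row1_do = ["Cloud", "Native", "Security", "exercise"]
row1_as = ["activity", "/", "exercise"]
# Shared: exercise (1); distinct lemmas overall: 6.
row1_do_as = jaccard(row1_do, row1_as)
print(round(row1_do_as, 6))  # 0.166667, matching DO_Assessments_alignment for row 1
```

Note that stop words such as "the" and punctuation such as "/" count as lemmas here, and matching is case-sensitive, so these items contribute to the union and lower the score.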


## Combine and present results

### Subtask:
Consolidate the results from spaCy, BloomBERT, and TextBlob into a structured format, showing the verb-to-taxonomy mapping, sentiment, and thematic alignment for each section.


**Reasoning**:
Select the relevant columns and create a new DataFrame to consolidate the results.



In [11]:
# Select relevant columns
relevant_columns = [
    'Learning Outcomes', 'Deliverables', 'Assessments',
    'Learning Outcomes_verbs', 'Deliverables_verbs', 'Assessments_verbs',
    'Learning Outcomes_blooms_levels', 'Deliverables_blooms_levels', 'Assessments_blooms_levels',
    'Learning Outcomes_sentiment', 'Deliverables Outcomes_sentiment', 'Assessments_sentiment',
    'LO_DO_alignment', 'LO_Assessments_alignment', 'DO_Assessments_alignment'
]

consolidated_df = df[relevant_columns].copy()

# Display the first few rows of the consolidated DataFrame
display(consolidated_df.head())

Unnamed: 0,Learning Outcomes,Deliverables,Assessments,Learning Outcomes_verbs,Deliverables_verbs,Assessments_verbs,Learning Outcomes_blooms_levels,Deliverables_blooms_levels,Assessments_blooms_levels,Learning Outcomes_sentiment,Deliverables Outcomes_sentiment,Assessments_sentiment,LO_DO_alignment,LO_Assessments_alignment,DO_Assessments_alignment
0,Understand the overall class requirements and ...,"Accounts created in LinkedIn, SkillsBuild\nVie...",Registration in online courses sites\nActive p...,[Understand],[created],[],[0],[0],[],0.0,0.0,-0.133333,0.045455,0.047619,0.117647
1,Discuss cloud native security including Kubern...,Cloud Native Security Exercise,Activity/Exercise,"[Discuss, including]",[],[],"[0, 0]",[],[],0.0,0.0,0.0,0.0,0.0,0.166667
2,Discuss hybrid data center security design con...,Hybrid Data Center Exercise,Quiz 1 - Oveview of Cloud Security and Cloud N...,"[Discuss, including]",[],[],"[0, 0]",[],[],-0.066667,-0.1,0.0,0.0,0.074074,0.0
3,Learn the Alibaba Cloud Security,Alibaba Cloud Security Certificate,Completion of modules \nAlibaba Cloud Surveys,[Learn],[],[],[0],[],[],0.0,0.0,0.0,0.5,0.2,0.222222
4,Learn Machine Learning with Generative AI,,Assessment Exam,[Learn],[],[],[0],[],[],0.0,0.0,0.0,0.0,0.0,0.0
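
The consolidated DataFrame keeps verbs and predicted levels in separate list columns. A minimal sketch of pairing them into an explicit verb-to-level mapping is shown below; `pair_verbs_with_levels` is a hypothetical helper, and the levels stay as class indices because the notebook does not show the model's index-to-name mapping.

```python
def pair_verbs_with_levels(verbs, levels):
    """Zip a row's verbs with its predicted class indices into (verb, index) pairs."""
    if len(verbs) != len(levels):
        raise ValueError("verbs and levels must have the same length")
    return list(zip(verbs, levels))


# Values copied from row 1 of the consolidated output above.
pairs = pair_verbs_with_levels(["Discuss", "including"], [0, 0])
print(pairs)  # [('Discuss', 0), ('including', 0)]
```

Applied row by row, for example with `consolidated_df.apply(..., axis=1)` over the `Learning Outcomes_verbs` and `Learning Outcomes_blooms_levels` columns, this would produce one list of pairs per syllabus entry.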


## Summary:

### Data Analysis Key Findings

*   spaCy extracted and lemmatized the verbs in the 'Learning Outcomes', 'Deliverables', and 'Assessments' columns of the syllabus data.
*   The BloomBERT model was loaded and applied to each entry's verb lemmas joined into a single string, so it yields one predicted class index per entry, repeated for every verb in that entry. In the five rows displayed, every prediction is class index 0.
*   Sentiment analysis using TextBlob calculated a polarity score for each of the 'Learning Outcomes', 'Deliverables', and 'Assessments' columns.
*   Thematic alignment between the sections ('Learning Outcomes' vs 'Deliverables', 'Learning Outcomes' vs 'Assessments', and 'Deliverables' vs 'Assessments') was quantified using the Jaccard similarity of the full lemma lists, with stop words and punctuation included.
*   The final consolidated DataFrame contains the original text, extracted verbs, assigned Bloom's levels, sentiment scores, and thematic alignment scores for each syllabus entry.

### Insights or Next Steps

*   Analyze the distribution of Bloom's levels across different sections to understand the cognitive demands placed on students by the syllabus.
*   Investigate correlations between sentiment scores, thematic alignment scores, and potentially other metrics (e.g., course difficulty, student performance) if such data were available.
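
As a starting point for the first next step, the sketch below counts how often each predicted class index occurs in one section. `level_distribution` is a hypothetical helper, and the sample data is the `Learning Outcomes_blooms_levels` column for the five rows displayed earlier.

```python
from collections import Counter
from itertools import chain


def level_distribution(level_lists):
    """Count how often each class index occurs across a column of per-row lists."""
    return Counter(chain.from_iterable(level_lists))


# Learning Outcomes_blooms_levels for the five rows displayed earlier.
lo_levels = [[0], [0, 0], [0, 0], [0], [0]]
lo_distribution = level_distribution(lo_levels)
print(lo_distribution)  # Counter({0: 7})
```

The same function should accept a DataFrame column directly, for example `level_distribution(df['Learning Outcomes_blooms_levels'])`. A distribution concentrated on a single index, as in these five rows, is worth checking against the full dataset before drawing conclusions about cognitive demand.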
