# Overview

The aim: given a .csv file of Reddit posts related to grading in engineering, we want to answer the following questions:

* What are student perceptions of grades, particularly different grading practices?
* What is the spectrum of grading practices being discussed in these Reddit posts?
    * While this is not a dataset we can use to make statements about common grading practices in engineering, it would be useful for framing the posts.

The given dataset has 670 entries with the following fields: ID, number of comments, score (I assume this is karma or upvotes?), date created, date retrieved, URL, title, text (`selftext`), and `title_and_post`, which combines the title and text.

Things I'd like to do in this notebook:

Generate codebooks using Ollama under multiple conditions:

* Different prompts, perhaps comparing "you are an engineering student" vs. "you are an expert in education"
* Giving the model `title_and_post` vs. just the post text (`selftext`)
* Trying two different workflows: one where I have the model summarize each post before codebook development, and one with codebook development straight from the text
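
The three comparisons above combine into a small grid of runs. The following is a minimal sketch, assuming each comparison has exactly the two options listed; the labels (`"student"`, `"summarize_first"`, and so on) and the `conditions` name are illustrative and not part of the dataset or of Ollama.

```python
from itertools import product

# Axes of the comparison, taken from the list above. Labels are illustrative.
roles = ["student", "researcher"]
input_columns = ["selftext", "title_and_post"]
workflows = ["summarize_first", "direct"]

# Every combination of role, input column, and workflow is one codebook run.
conditions = [
    {"role": role, "input_column": column, "workflow": workflow}
    for role, column, workflow in product(roles, input_columns, workflows)
]

for condition in conditions:
    print(condition)
```

Enumerating the conditions up front makes it easier to check that every combination was actually run and to attach a runtime to each one.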

The comments and score columns are also worth considering. These might indicate agreement with the post (like-to-comment ratio is a thing on Facebook, but it might not translate to Reddit, so take it with a grain of salt).
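
One possible way to turn the two columns into a single number is a comments-per-score ratio. This is only a sketch of that idea: the helper name `comment_to_score_ratio` is hypothetical, and whether the ratio says anything about agreement on Reddit is exactly the open question above. The sample pairs are the `num_comments` and `score` values of the first five rows returned by `data_df.head()`.

```python
def comment_to_score_ratio(num_comments, score):
    """Return comments per point of score, or None when the score is not positive."""
    if score <= 0:
        return None
    return num_comments / score


# (num_comments, score) pairs copied from the first five rows of the dataset.
preview_rows = [(7, 25), (1, 2), (8, 43), (48, 233), (85, 876)]
ratios = [comment_to_score_ratio(c, s) for c, s in preview_rows]

for (c, s), r in zip(preview_rows, ratios):
    print(f"comments={c:>3} score={s:>4} ratio={r:.3f}")
```

On the full dataframe the same idea would be `data_df['num_comments'] / data_df['score']`, with zero or negative scores handled separately.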

Runtime should also be tracked as a metric, because a marginal improvement in the codebook might not justify a jump in computational cost (sustainable AI, kiddos!!).

## Import Necessary Packages
`pandas` is needed for working with dataframes.

`ollama` provides access to the LLM I'd like to use.

In [1]:
import pandas as pd

import ollama



In [2]:
# Import the grading data
data_df = pd.read_csv("data/EngineeringStudents_2022_Posts.csv")
data_df.head()

| Unnamed: 0 | id | num_comments | score | created_utc | retrieved_on | url | title | selftext | title_and_post |
|---|---|---|---|---|---|---|---|---|---|
| 0 | rug5n6 | 7 | 25 | 1641148940 | 1654196744 | https://www.reddit.com/r/EngineeringStudents/c... | Your New Year's Resolutions for engineering? | Mine is to start preparing for interviews seri... | Title: Your New Year's Resolutions for enginee... |
| 1 | rur8nq | 1 | 2 | 1641178949 | 1654196023 | https://www.reddit.com/r/EngineeringStudents/c... | Grad School GPA question | Hi! I am a bit confused on the GPA requirement... | Title: Grad School GPA question. Post: Hi! I a... |
| 2 | ruueii | 8 | 43 | 1641188900 | 1654195819 | https://www.reddit.com/r/EngineeringStudents/c... | Class ran like a “business” | One of my classes next semester requires stude... | Title: Class ran like a “business”. Post: One ... |
| 3 | rvl92h | 48 | 233 | 1641269541 | 1654194077 | https://www.reddit.com/r/EngineeringStudents/c... | Does anybody know how to turn there brains off... | Im having trouble trying to take a break and I... | Title: Does anybody know how to turn there bra... |
| 4 | rvrv9g | 85 | 876 | 1641294290 | 1654193669 | https://www.reddit.com/r/EngineeringStudents/c... | Does anyone get that heart dropping feeling af... | Not everyone learns at the same pace, and sure... | Title: Does anyone get that heart dropping fee... |


# Prime Ollama
We have two potential prompts to work from: priming the model either as a prospective engineering student or as an education researcher. We're interested in seeing how each role affects the themes the model produces.

In [3]:
# Prompt for a model mimicking a student
student_role = """
You are a prospective student who is interested in pursuing engineering. You want to understand the experiences of current students in the program.
"""
student_task = """ 
Given the following posts online from current students, summarize key themes. 
"""
# Prompt for a model mimicking an education researcher
researcher_role = """
You are an education researcher who is analyzing online discussions about grading in engineering programs.
You want to understand the perspectives of students on grading practices and their impact on learning.
"""

researcher_task = """
Given the following posts online from current students, summarize key themes.
In particular, focus on answering the following questions: 
What are student perceptions of grades?
What is the spectrum of grading practices being discussed in these posts? 
"""

We also have two workflows to compare: generating a summary of each post before codebook development, or analyzing the whole post directly. The following function summarizes a single post (either `title_and_post` or just `selftext` from the data).

In [4]:
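def summarize_post(post, role, model="llama3.1:8b"):
    """Summarize a single post in a few sentences, with `role` prepended as the persona."""
    # Build one user message: the role description followed by the summarization task
    prompt = (
        f"{role.strip()}\n\n"
        "You are given the task of summarizing the following post into a few sentences:\n\n"
        f"{post}"
    )
    # Send the prompt to the local Ollama model as a single-turn chat
    response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])

    return response['message']['content']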
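
The codebook step itself is not written yet, so here is one way it could look. This is a sketch under assumptions, not the final method: `build_codebook_prompt`, `batched`, and `generate_codebooks` are hypothetical helpers, and splitting the posts into batches (with an arbitrary `batch_size` of 20) is an assumption made to keep each prompt a manageable size. The model call is passed in as `chat_fn` so the sketch can run with a stand-in instead of Ollama.

```python
def build_codebook_prompt(role, task, posts):
    """Combine a role, a task, and a list of posts (or summaries) into one prompt."""
    numbered = "\n\n".join(f"Post {i}: {text}" for i, text in enumerate(posts, start=1))
    return f"{role.strip()}\n\n{task.strip()}\n\n{numbered}"


def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def generate_codebooks(posts, role, task, chat_fn, batch_size=20):
    """Return one model response per batch of posts.

    `chat_fn` takes a prompt string and returns the model's text, so the model
    call can be swapped out (or faked) without touching this function.
    """
    return [
        chat_fn(build_codebook_prompt(role, task, batch))
        for batch in batched(list(posts), batch_size)
    ]


# Stand-in for the model so the sketch runs without Ollama installed.
def fake_chat(prompt):
    return f"(model output for a prompt of {len(prompt)} characters)"


demo_posts = ["First post text", "Second post text", "Third post text"]
demo_output = generate_codebooks(
    demo_posts, "You are a researcher.", "Summarize key themes.", fake_chat, batch_size=2
)
print(demo_output)
```

With Ollama, `chat_fn` could wrap the same `ollama.chat(...)` call used in `summarize_post` and return `response['message']['content']`. Passing summaries instead of raw posts as `posts` covers the summarize-first workflow with the same function.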

# Working with just text column

## Student

In [None]:
summarized_posts = []
for post in data_df['selftext']:
    summary = summarize_post(post, student_role)
    summarized_posts.append(summary)
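
Since runtime is one of the metrics to compare, a loop like the one above can be wrapped in a timer. A minimal sketch, assuming wall-clock time per condition is a good enough measure; `timed_map` is a hypothetical helper, and `fake_summarize` stands in for the model call so the example runs on its own.

```python
import time


def timed_map(fn, items):
    """Apply `fn` to each item and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = [fn(item) for item in items]
    elapsed = time.perf_counter() - start
    return results, elapsed


# Stand-in for a model call so the sketch runs without Ollama.
def fake_summarize(post):
    return post[:20]


summaries, seconds = timed_map(
    fake_summarize, ["a short post", "another, somewhat longer post body"]
)
print(f"{len(summaries)} posts in {seconds:.4f} s")
```

In the notebook this could be called as `timed_map(lambda p: summarize_post(p, student_role), data_df['selftext'])`, storing the elapsed time next to each condition's codebook.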

In [None]:
# Run the model on the posts without summaries

In [None]:
# Run the model on the posts with summaries

## Researcher

In [None]:
# Summarize the posts with the researcher role

In [None]:
# Run the model on the posts without summaries

In [None]:
# Run the model on the posts with summaries

# Working with title_and_post

## Student

In [None]:
# Summarize the posts with the student role

# Run the model on the posts without summaries

# Run the model on the posts with summaries

## Researcher

In [None]:
# Summarize the posts with the researcher role

# Run the model on the posts without summaries

# Run the model on the posts with summaries