# Using Large Language Models for Data Collection in Social Sciences

In [29]:
# Install and load required packages
install.packages(c("ellmer", "patchwork", "irr"))
library(tidyverse)
library(ellmer)
library(patchwork)
library(irr)

Installing packages into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)



## Gabrielle Martins van Jaarsveld's SoDa fellowship dataset

Feel free to use your own data. By default, we will use a toy dataset from Gabrielle Martins van Jaarsveld's SoDa fellowship project on annotating markers of self-regulated learning from student conversation data.

This dataset contains the following columns:

- `id`: The id of the row/conversation/student.

- `conversation`: The conversation text from which the specificity scores are derived (by humans or LLMs).

- `score_specificity_llm`: The specificity score assigned by an LLM using a carefully engineered prompt. It takes the value 0, 1 or 2.

- `score_specificity_human`: The specificity score assigned by human expert annotators. It is treated as the gold standard (i.e., free from measurement error). It takes the value 0, 1 or 2.

- `performance`: The academic performance of a student, varying from 1 to 10.

Load the data into a dataframe.

In [35]:
# url where you can download our example data
data_url <- "https://sodascience.github.io/workshop_llm_data_collection/data/srl_data_example.csv"

# Read CSV into dataframe
df <- read_csv(data_url)

Rows: 80 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): id, conversation
dbl (3): score_specificity_llm, score_specificity_human, performance

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


Note that only the first 10 rows contain the text of the conversations. We will use these texts for the prompting experiments to come.

Display the first 10 rows of the dataset.

In [36]:
head(df, 10)

id,conversation,score_specificity_llm,score_specificity_human,performance
<chr>,<chr>,<dbl>,<dbl>,<dbl>
request_1,"PROMPT: Set an academic goal for the upcoming week. ANSWER: I would like to catch up on my geography reading PROMPT: Add details to make your goal more specific. ANSWER: I need to either read the book from last week and this week, or read my friends notes on the reading to take notes of my own so I dont fall behind. PROMPT: How will you measure progress on and acheivement of your goal? ANSWER: by the number of pages I write per day PROMPT: Why is this goal important to you in the context of your prior experiences and future goals? ANSWER: It is important to achieve because if I dont, I will fall behind and most likely wont be ready for the exam. PROMPT: Create a step-by-step plan for achieving this goal in the coming week. ANSWER: 1. evaluate how much there is to do 2. get help from my friends 3. takes notes day by day",1,1,3.5
request_2,"PROMPT: Set an academic goal for the upcoming week. ANSWER: I would like write notes on theme 1 from my mathematics course. PROMPT: Add details to make your goal more specific. ANSWER: I will take notes on Chapter 1 from the Basic Algebra book and the two articles that belonged to theme 1 PROMPT: How will you measure progress on and acheivement of your goal? ANSWER: I will split it into three parts. First the book chapter, then the first article and lastly the second article. When there is well written notes that sums up the important parts from the reading i will have achieved my goal. PROMPT: Why is this goal important to you in the context of your prior experiences and future goals? ANSWER: By achieving this goal I will remember and understand the material better, which will make it easier so study for the upcoming exam. I will not feel as overwhelmed as for earlier exam because I have made good notes that I can go through instead of going through all the materials. PROMPT: Create a step-by-step plan for achieving this goal in the coming week. ANSWER: 1. Go to a nice cafe where I can study uninterrupted 2. Read through the highlighted parts of the book chapter and simultaneously write down notes for it. 3. Read through the highlighted parts of the first article and simultaneously write down notes for it. 4. Read through the highlighted parts of the second article and simultaneously write down notes for it. 5. Red over all the notes and check that it makes sense and give a good summary of the important parts of the material.",2,2,6.8
request_3,"PROMPT: Set an academic goal for the upcoming week. ANSWER: To not procastinate PROMPT: Add details to make your goal more specific. ANSWER: To not spend a lot of time on my phone, and leave a lot of the reading till the last minute, and not late into the night PROMPT: How will you measure progress on and acheivement of your goal? ANSWER: By finishing work at a certain time of the day PROMPT: Why is this goal important to you in the context of your prior experiences and future goals? ANSWER: To help me strategically work, and study smart PROMPT: Create a step-by-step plan for achieving this goal in the coming week. ANSWER: 1. I will reduce my phone time 2. I will organize my workload 3. I will spread my workload evenly across the week",0,0,6.7
request_4,"PROMPT: Set an academic goal for the upcoming week. ANSWER: I would like to start studying earlier (right after class). PROMPT: Add details to make your goal more specific. ANSWER: Once I finish class, and after going to the gym, I will try to make a start to the reading for next class. I want to do this because I will at least have made some progress before going home and relaxing. PROMPT: How will you measure progress on and acheivement of your goal? ANSWER: I want to start small, so I should read minimum a quarter of the readings for next class before going home. If I go home before reading, I will have failed my goal. PROMPT: Why is this goal important to you in the context of your prior experiences and future goals? ANSWER: This will help me because I tend to procrastinate everything till the last minute. By making a start early, I will have more time for other things and feel more motivated to continue reading later on. It will feel less daunting when I have already read something. PROMPT: Create a step-by-step plan for achieving this goal in the coming week. ANSWER: 1. After class, find a quiet spot on campus. 2. Read at least one quarter of the readings. 3. Profit.",1,1,7.3
request_5,"PROMPT: Set an academic goal for the upcoming week. ANSWER: I want to watch the first lecture of history. PROMPT: Add details to make your goal more specific. ANSWER: I want to watch the lecture, make notes while listening and make the exercises from that week PROMPT: How will you measure progress on and acheivement of your goal? ANSWER: Evaluating moments? Lets do Friday an evaluating moment PROMPT: Why is this goal important to you in the context of your prior experiences and future goals? ANSWER: I have in 2 months the resits of this. I really want to get my degree next year, so I want to pass this exam. PROMPT: Create a step-by-step plan for achieving this goal in the coming week. ANSWER: 1. Watch the lectures, also taking notes 2. Make the exercises of that week 3. Check my answers",1,1,5.8
request_6,"PROMPT: Set an academic goal for the upcoming week. ANSWER: Doing the reading and taking notes before class PROMPT: Add details to make your goal more specific. ANSWER: I would like to read the given pages before my classes, which are on Mondays, Wednesdays, and Thursdays. Each class requires us to have read certain pages and articles, and then discuss in class. However, I have troubles which time management, which is why I never get the reading done in time and then cram before exams PROMPT: How will you measure progress on and acheivement of your goal? ANSWER: I will measure my progress by examinig the notes I have prepared for every class, and we can measure if Ive achieved it by maybe asking some questions about the literature? PROMPT: Why is this goal important to you in the context of your prior experiences and future goals? ANSWER: Because cramming all the material before exams has had a severe impact on my grades, and Im trying to fix my average with the upcoming exams, so I need to ensure that my method will give me best results PROMPT: Create a step-by-step plan for achieving this goal in the coming week. ANSWER: Start every preperation with a video explaining the concept briefly, then read the literature while taking notes by hand. Have all this done latest a day before the lesson.",1,1,6.4
request_7,"PROMPT: Set an academic goal for the upcoming week. ANSWER: I want to understand and summarize this weeks literature for my Philosophy course. PROMPT: Add details to make your goal more specific. ANSWER: This week, I have to read several pages of academic literature regarding the Philosophy course. We have a weekly in-class meeting in which we discuss the contents of the literature. My concrete goal is to have read and summarized all of the literature of this week before the tutorial meeting. PROMPT: How will you measure progress on and acheivement of your goal? ANSWER: The literature consists of multiple pagesorchapters, I can measure the number of pages I have processed as well as the number I still need to read. PROMPT: Why is this goal important to you in the context of your prior experiences and future goals? ANSWER: By keeping up with the weekly literature, I am spreading the workload across this Block, which I know is an effective learning method (especially compared to cramping in lots of information around the exam). PROMPT: Create a step-by-step plan for achieving this goal in the coming week. ANSWER: At the start of the week, I will make a planning for every weekday, specifying what parts of the literature I will read and summarize on what moment. I will check this planning at the start of the day and in-between tasks.",2,1,5.9
request_8,"PROMPT: Set an academic goal for the upcoming week. ANSWER: I want to get my motivation back to study PROMPT: Add details to make your goal more specific. ANSWER: Be happier while studying, focusing on the reason again why I chose this study, feeling enthusiastic and motivated when I have to study, be more efficient in planning, be more assertive during classes and contacting supervisors, feeling more relaxed while studying and less stressed out PROMPT: How will you measure progress on and acheivement of your goal? ANSWER: Measure by subjective feelings, so how am I feeling that day when studying? Also measure by productivity, how much can I get done in that period of time and is that different from before? Measure by the amount of work Im planning each week on my studies. But I dont know how to get more assertive PROMPT: Why is this goal important to you in the context of your prior experiences and future goals? ANSWER: Because studying has always been an important part of my life for me and because I wanted to keep studying I feel the pressure of succeeding. Right now Im not enjoying my study like I used to and am just looking forward to my master (if I find an internship) so I can go to work. But learning will always be an important part of my work so I need to be motivated to go back to school and keep up with scientific literature. If I want to do healthcare psychology I need to keep this pace up. PROMPT: Create a step-by-step plan for achieving this goal in the coming week. ANSWER: - Planning my studyhours everyday before studying - Planning a period of time in the morning to set up my studywork, making a plan on how to work on my studyload that day - Do the work - Take regular breaks, every 50 minutes. Use 10 minutes to relax, read, play a game, watch Netflix for a bit - At the end of the studyblock, write down any feelings about how it went - Reflect on the amount of work done - Reflect on the planning this week overall",0,0,7.6
request_9,"PROMPT: Set an academic goal for the upcoming week. ANSWER: I would like to keep on track with the readings I have to do PROMPT: Add details to make your goal more specific. ANSWER: Every week we have two tutorials, which are smaller groups, where we have a debate about the readings we have done before. Each week for both of them we have more than 100 pages to read and we need to arrive prepared at the tutorial. So I would like to manage better my time in order to arrive prepared at every tutorial with all the readings well done PROMPT: How will you measure progress on and acheivement of your goal? ANSWER: I can decide how many pages I have to study each day PROMPT: Why is this goal important to you in the context of your prior experiences and future goals? ANSWER: I think is good because it will help me in manage my time better, but also i will have more time for myself, for working, hanging out with friends. I also think is fundamental to learn how to manage many things all together at the same time now at university, because later on it will be even worse PROMPT: Create a step-by-step plan for achieving this goal in the coming week. ANSWER: At first I will count all the pages i have to do, so that i know exactly how much work is, after I will take the calendar and i will divide the work for every day, writing precisely how many pages i should do each day. After that, every day i will try to stick to the plan as much as possible and maybe to motivate myself, I will decide a reward that I can have just if i manage to do everything that was written in the plan of the day",1,1,7.6
request_10,"PROMPT: Set an academic goal for the upcoming week. ANSWER: Reading all the literature given PROMPT: Add details to make your goal more specific. ANSWER: Each week I am given 8 articles to read. We cover 2 topics and half the articles about the first topic and half are about the second topic. Usually I am not able to finish reading all of the articles before the class while it is necessary to be able to participate well. So Id like to read all the articles before class. PROMPT: How will you measure progress on and acheivement of your goal? ANSWER: Count how many articles I read before class. PROMPT: Why is this goal important to you in the context of your prior experiences and future goals? ANSWER: By completing this goal, I will be better set up for passing the course exam. PROMPT: Create a step-by-step plan for achieving this goal in the coming week. ANSWER: 1: make a schedule of the days I will read each article 2: make sure there are no distractions",2,2,7.0


## Using ellmer to call OpenAI's API

We will be using the R package `ellmer` to perform our prompting experiments. One great advantage of `ellmer` is that you do not have to learn a separate API for every LLM provider. Instead, you can switch between LLM APIs (both commercial and open-source) with only small modifications to your `ellmer` code!

We will be calling OpenAI's LLM in this notebook. Feel free to experiment with other APIs and models! To do so, check out https://ellmer.tidyverse.org/.

Let's first configure your OpenAI API key. Enter it when prompted. Don't have one? Ask the workshop instructors!

In [37]:
# Prompt user for API key
openai_api_key <- readline(prompt = "Enter API key for OpenAI: ")
Sys.setenv(OPENAI_API_KEY = openai_api_key)

Enter API key for OpenAI: [redacted]
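
Typing the key into a notebook cell leaves it in the saved output. A minimal sketch of an alternative, assuming the key may already be stored in the `OPENAI_API_KEY` environment variable (for example via an `.Renviron` file); the helper name `get_api_key` is hypothetical and not part of `ellmer`:

```r
# Hypothetical helper: use the key from the environment if it is there,
# and only fall back to an interactive prompt when it is missing.
get_api_key <- function(var = "OPENAI_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (nzchar(key)) {
    return(invisible(key))  # invisible() avoids printing the key
  }
  if (!interactive()) {
    stop("Environment variable ", var, " is not set.")
  }
  key <- readline(prompt = paste0("Enter value for ", var, ": "))
  # Sys.setenv() takes named arguments, so build the call from a named list
  do.call(Sys.setenv, stats::setNames(list(key), var))
  invisible(key)
}
```

Calling `get_api_key()` once at the top of a session would then replace the `readline()` cell above whenever the variable is already set.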


Let's now define a function that makes a call to the OpenAI API for gpt-4o-mini.

In [38]:
call_openai_api <- function(system_prompt, user_prompt) {
  chat <- chat_openai(
    model = "gpt-4o-mini",
    system_prompt = system_prompt,
    seed = 42,
    api_args = list(temperature = 0, max_tokens = 1000),
    echo = "none" #suppress the output from being printed
  )
  return(chat$chat(user_prompt))
}

## Working with a single prompt

Let's start with the system prompt (i.e., high-level instruction to the model).

In [39]:
system_prompt <- "
You are an expert in educational assessment and goal evaluation, with
specialized expertise in applying deductive coding schemes to score the quality
and content of student goals.

##TASK##
A university student was given a series of prompts, guiding them through the
process of setting and elaborating on an academic goal for the coming week. You
will be provided with the entire conversation including the prompts, and the
student answers. Your objective is to assess the specificity of the student’s
goal on a scale of 0 to 2 based on the entire conversation.
"


Create a prompt request with the system prompt and the user prompt based on the first conversation from the dataset.

In [40]:
response <- call_openai_api(
  system_prompt = system_prompt,
  user_prompt   = df[["conversation"]][1]
)
cat(response)

To assess the specificity of the student's goal, we can evaluate each part of their responses based on clarity, detail, and measurability.

1. **Initial Goal**: "I would like to catch up on my geography reading."
   - This goal is quite vague and lacks specificity regarding what "catching up" entails. It does not specify which readings or how much needs to be done. **Score: 0**

2. **Details Added**: "I need to either read the book from last week and this week, or read my friends notes on the reading to take notes of my own so I don't fall behind."
   - This response adds more detail by specifying the materials (the book and friends' notes) and the action (taking notes). However, it still lacks a clear plan on how much reading or note-taking is required. **Score: 1**

3. **Measuring Progress**: "by the number of pages I write per day."
   - This provides a measurable aspect to the goal, which is a positive addition. However, it does not specify a target number of pages or a timeline fo

Voila! You have your first successful prompting interaction with the API of a large language model!

## Working with multiple prompts
Next, we are going beyond a single prompt. Instead, we will work with **multiple prompts** at the same time!

Each conversation (and its request ID) can be pulled out of the data frame with `slice()` and `pull()`. This is how the loop further down forms each user prompt. For the first row:

In [41]:
df |> slice(1) |> pull("conversation")

Run a for loop through all the 10 conversations.

In [42]:
# initialize output list
responses <- list()

for (i in 1:10) {
  # extract current info
  cur_convo <- df |> slice(i) |> pull("conversation")
  cur_id    <- df |> slice(i) |> pull("id")

  # get rating from llm
  response <- call_openai_api(
    system_prompt = system_prompt,
    user_prompt   = cur_convo
  )

  # assign to output list
  responses[[cur_id]] <- response

  # report progress (does not work well in colab)
  cat(i, "/ 10 completed.\n")
}

1 / 10 completed.
2 / 10 completed.
3 / 10 completed.
4 / 10 completed.
5 / 10 completed.
6 / 10 completed.
7 / 10 completed.
8 / 10 completed.
9 / 10 completed.
10 / 10 completed.
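
API calls can fail midway (rate limits, timeouts), and an error inside a loop like this one stops it before the remaining conversations are processed. A sketch of one way to guard each call with base R's `tryCatch()`. The names `safe_call` and `flaky_fn` are hypothetical; `flaky_fn` merely stands in for `call_openai_api` so that the example runs offline:

```r
# Wrap a function call so that an error becomes NA (plus a warning)
# instead of stopping the whole loop.
safe_call <- function(fn, ...) {
  tryCatch(
    fn(...),
    error = function(e) {
      warning("Call failed: ", conditionMessage(e), call. = FALSE)
      NA_character_
    }
  )
}

# Stand-in for an API call: fails on one specific input.
flaky_fn <- function(text) {
  if (text == "bad") stop("simulated API error")
  paste("response to", text)
}

inputs  <- c(request_1 = "a", request_2 = "bad", request_3 = "c")
results <- list()
for (id in names(inputs)) {
  # warnings are silenced here only to keep the example output short
  results[[id]] <- suppressWarnings(safe_call(flaky_fn, inputs[[id]]))
}

# IDs whose call failed and could be retried later
failed_ids <- names(results)[vapply(results, is.na, logical(1))]
failed_ids
```

The same wrapper could be placed around `call_openai_api()` inside the loop, so that one failed request leaves the other responses intact.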


Let's inspect the responses!

In [43]:
cat(responses[[2]])

To assess the specificity of the student's goal, we can evaluate the clarity and detail provided in their responses throughout the conversation.

1. **Initial Goal**: The student starts with a general goal of writing notes on theme 1 from their mathematics course. This is quite vague and lacks specificity, earning a score of 0.

2. **Adding Details**: The student then specifies that they will take notes on Chapter 1 from the Basic Algebra book and two articles related to theme 1. This adds clarity and context to the goal, moving it to a score of 1.

3. **Measuring Progress**: The student outlines a clear plan to split the task into three parts: the book chapter and the two articles. They also define what constitutes achievement (well-written notes summarizing important parts). This further enhances the specificity of the goal, justifying a score of 2.

4. **Importance of the Goal**: The student explains the significance of the goal in relation to their understanding of the material and
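
Free-text replies like these are awkward to analyze: the score has to be dug out of the prose, and the model may report several sub-scores without a single overall one. A rough sketch of regex-based extraction, meant only to illustrate why this is fragile. The function `extract_scores` is hypothetical, and the example string is abridged from the first response shown earlier:

```r
extract_scores <- function(text) {
  # find every occurrence of "Score: <digit between 0 and 2>"
  hits <- regmatches(text, gregexpr("Score:\\s*[0-2]", text))[[1]]
  # keep only the digit from each match
  as.integer(sub("Score:\\s*", "", hits))
}

reply <- "1. **Initial Goal**: ... **Score: 0**\n2. **Details Added**: ... **Score: 1**"
scores <- extract_scores(reply)
scores
```

Here the extraction yields one number per sub-section rather than one score for the conversation, and a reply phrased as "earning a score of 0" would not be matched at all.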

## Using structured output with a single prompt

To make the LLM return its output in a format that you specify, use the `$extract_data()` method instead of the `$chat()` method.

Below, we define our desired output format as:
- "reasoning": a string that provides the model's reasoning.
- "specificity_score": an integer (either 0, 1 or 2) reflecting the specificity of a conversation.

In [44]:
output_structure <- type_object(
  specificity_score = type_integer("The specificity score of the entire conversation on a scale of 0, 1 and 2."),
  reasoning = type_string("Your reasoning process.")
)

call_openai_api_structured <- function(system_prompt, user_prompt) {
  chat <- chat_openai(
    model = "gpt-4o-mini",
    system_prompt = system_prompt,
    seed = 42,
    api_args = list(temperature = 0, max_tokens = 1000),
    echo = "none" #suppress the output from being printed
  )
  response <- chat$extract_data(user_prompt, type = output_structure)
  return(response)
}

Try with a single prompt request.

In [45]:
structured_response <- call_openai_api_structured(system_prompt, df[["conversation"]][1])
print(structured_response)

$specificity_score
[1] 1

$reasoning
[1] "The student's goal of catching up on geography reading is somewhat specific, as they mention reading the book from the previous week and this week or using a friend's notes. However, it lacks precise details such as the exact pages or chapters to read, which would enhance specificity. The measurement of progress by the number of pages written per day is a good indicator, but the overall goal still remains vague. The importance of the goal is clear, and the step-by-step plan provides a basic structure, but it could benefit from more detailed actions and timelines."
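
The schema asks for an integer, but the 0 to 2 range is only stated in the field description, so it seems prudent to check each response before analysis. A minimal sketch, assuming a response arrives as a named list like the one printed above; `validate_response` is a hypothetical helper:

```r
# Hypothetical validator for one structured response (a named list).
validate_response <- function(resp, allowed = 0:2) {
  stopifnot(
    is.list(resp),
    all(c("specificity_score", "reasoning") %in% names(resp))
  )
  score <- resp$specificity_score
  is_valid <- length(score) == 1 && !is.na(score) && score %in% allowed
  if (!is_valid) {
    warning("Unexpected specificity_score: ",
            paste(score, collapse = ", "), call. = FALSE)
  }
  is_valid
}

ok  <- validate_response(list(specificity_score = 1L, reasoning = "..."))
bad <- suppressWarnings(
  validate_response(list(specificity_score = 5L, reasoning = "..."))
)
```

Responses that fail the check can be flagged for a retry or for manual inspection rather than silently entering the agreement calculation.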



## Using structured output with multiple prompts

Being able to work with multiple prompts at the same time and obtain structured output will save you a substantial amount of time in research projects!

In [46]:
structured_responses <- list()
for (i in 1:10) {
  # extract current info
  cur_convo <- df |> slice(i) |> pull("conversation")
  cur_id    <- df |> slice(i) |> pull("id")

  # get rating from llm
  response <- call_openai_api_structured(
    system_prompt = system_prompt,
    user_prompt   = cur_convo
  )

  # assign to output list
  structured_responses[[cur_id]] <- response

  # report progress (does not work well in colab)
  cat(i, "/ 10 completed.\n")
}

1 / 10 completed.
2 / 10 completed.
3 / 10 completed.
4 / 10 completed.
5 / 10 completed.
6 / 10 completed.
7 / 10 completed.
8 / 10 completed.
9 / 10 completed.
10 / 10 completed.


Turn the structured responses into a data frame and show it.

In [47]:
# use map_dfr from purrr package to turn list into dataframe
response_df <- map_dfr(structured_responses, I, .id = "id")
response_df

id,specificity_score,reasoning
<chr>,<int>,<chr>
request_1,1,"The student's goal of catching up on geography reading is somewhat specific, as they mention reading the book from the previous week and this week or using a friend's notes. However, the goal lacks precise details such as the exact number of pages to read or a specific timeline for completion. The measurement of progress is based on the number of pages written per day, which adds some clarity but still lacks a clear target. The step-by-step plan provides a basic structure but is not detailed enough to fully outline how they will achieve the goal. Overall, the goal is more specific than a general statement but still lacks the clarity and detail needed for a higher score."
request_2,2,"The student's goal is specific, measurable, and detailed. They clearly state the focus on Chapter 1 of the Basic Algebra book and two articles related to theme 1. The breakdown of the goal into three parts (book chapter and two articles) allows for measurable progress. The step-by-step plan provides a clear method for achieving the goal, including a conducive study environment and a systematic approach to note-taking. Overall, the goal is well-defined and actionable, warranting a score of 2."
request_3,1,"The student's goal of 'not procrastinating' is vague and lacks clear metrics for success. While they provide some details about reducing phone time and organizing workload, these actions are still somewhat general and do not specify how they will be implemented or measured. The step-by-step plan offers a bit more clarity, but it still does not provide a concrete timeline or specific targets, which limits the overall specificity of the goal."
request_4,2,"The student's goal is specific, measurable, and actionable. They clearly state that they will start studying right after class, specifically after going to the gym, and aim to read at least a quarter of the readings for the next class. The goal includes a clear measurement of success (reading a quarter of the material) and a plan with actionable steps (finding a quiet spot, reading, and the humorous 'profit' step). This level of detail indicates a well-defined goal that is easy to track and evaluate."
request_5,2,"The student's goal is specific, measurable, and actionable. They clearly state the intention to watch a specific lecture, take notes, and complete exercises, which outlines a clear plan of action. The mention of evaluating progress on Friday adds a measurable component to the goal. Additionally, the context provided about the importance of passing the exam for their degree adds relevance to the goal, further enhancing its specificity."
request_6,2,"The student's goal is specific as it outlines clear actions: reading assigned pages before class on specific days (Monday, Wednesday, Thursday), taking notes, and preparing a day in advance. The student also identifies a method for measuring progress (examining notes and asking questions) and explains the importance of the goal in relation to past experiences and future academic performance. The step-by-step plan further enhances the specificity by detailing the preparation process, indicating a well-structured approach to achieving the goal."
request_7,2,"The student's goal is specific, measurable, and actionable. They clearly state their intention to read and summarize the literature for their Philosophy course before the tutorial meeting, which provides a clear deadline. The details about the number of pages and the method of measuring progress (counting pages read) add to the specificity. Additionally, the step-by-step plan outlines a structured approach to achieving the goal, indicating a well-thought-out strategy for managing their workload throughout the week."
request_8,1,"The student's goal of 'getting motivation back to study' is quite broad and lacks specific measurable outcomes. While the student elaborates on feelings and methods to improve motivation, such as being happier and more efficient, these are still somewhat vague and subjective. The step-by-step plan provides some structure, but it does not translate the overarching goal into specific, quantifiable targets. Therefore, the goal is somewhat specific but not fully measurable or actionable, leading to a score of 1."
request_9,2,"The student's goal is specific as it outlines a clear objective: to keep on track with the readings for tutorials. The elaboration provides context about the number of pages to read and the importance of being prepared for discussions. The student also details a measurable approach by deciding how many pages to study each day, which adds to the specificity. Furthermore, the step-by-step plan includes concrete actions such as counting pages, using a calendar, and setting daily goals, which demonstrates a structured approach to achieving the goal. Overall, the goal is well-defined and actionable, warranting a score of 2."
request_10,2,"The student's goal is specific as it clearly states the intention to read all 8 articles assigned for the week, which is a measurable and achievable target. The details provided about the distribution of articles across two topics and the importance of completing the readings for class participation further enhance the specificity. The plan includes actionable steps, such as creating a reading schedule and minimizing distractions, which indicates a clear strategy for achieving the goal. Overall, the goal is well-defined and structured, warranting a score of 2."


## Check annotation quality

The `kripp.alpha()` function from the `irr` package can be used to calculate agreement (i.e., Krippendorff's Alpha) of specificity scores between two raters (e.g., LLMs and human experts).

Let's check the agreement between the specificity scores we got from the LLM above and the human expert-coded specificity scores!
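
For intuition about what the statistic measures, here is a sketch of interval-level Krippendorff's Alpha written out in base R. It assumes exactly two raters and no missing values, so it is not a replacement for `kripp.alpha()`. The function name is hypothetical, and the two score vectors are copied by hand from the tables displayed earlier in this notebook (a fresh LLM run may return different scores):

```r
# Interval Krippendorff's alpha for exactly two raters, no missing values.
alpha_interval_2raters <- function(r1, r2) {
  stopifnot(length(r1) == length(r2), !anyNA(r1), !anyNA(r2))
  values <- c(r1, r2)
  n <- length(values)
  # observed disagreement: mean squared difference within each rated item
  d_obs <- mean((r1 - r2)^2)
  # expected disagreement: mean squared difference over all pairs of pooled values
  d_exp <- sum(outer(values, values, "-")^2) / (n * (n - 1))
  1 - d_obs / d_exp
}

human <- c(1, 2, 0, 1, 1, 1, 1, 0, 1, 2)  # score_specificity_human, rows 1-10
llm   <- c(1, 2, 1, 2, 2, 2, 2, 1, 2, 2)  # specificity_score as displayed in response_df
alpha <- alpha_interval_2raters(human, llm)
round(alpha, 3)  # 0.222
```

A value of 1 means perfect agreement and 0 means agreement no better than chance; the statistic can also be negative when raters disagree systematically.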

In [48]:
# create rating matrix (rows = raters, cols = items)
rating_matrix <- rbind(
  df |> slice(1:10) |> pull(score_specificity_human),
  response_df |> pull(specificity_score)
)

# compute agreement (1 = perfect agreement, 0 = chance level; can be negative)
kripp.alpha(rating_matrix, method = "interval")

 Krippendorff's alpha

 Subjects = 10 
   Raters = 2 
    alpha = 0.222 

Not a great agreement score!
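
A single coefficient does not show where the two raters diverge. A small sketch that cross-tabulates the scores, with both vectors copied by hand from the tables displayed earlier (so they reflect that particular run, not necessarily a new one):

```r
human <- c(1, 2, 0, 1, 1, 1, 1, 0, 1, 2)  # score_specificity_human, rows 1-10
llm   <- c(1, 2, 1, 2, 2, 2, 2, 1, 2, 2)  # specificity_score as displayed in response_df

# rows = human score, columns = LLM score;
# fixed factor levels keep categories with zero counts visible
confusion <- table(
  human = factor(human, levels = 0:2),
  llm   = factor(llm,   levels = 0:2)
)
confusion

# share of items on which both raters give exactly the same score
exact_agreement <- mean(human == llm)

# average signed difference: positive means the LLM scores higher than the human
mean_bias <- mean(llm - human)
```

For these ten items every disagreement goes in the same direction: the LLM never scores below the human annotators, which suggests the simple prompt is too lenient rather than just noisy.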

How about the agreement between the LLM specificity scores that already came with the dataset (i.e., column `score_specificity_llm`) and the human expert-coded scores?

Note that `score_specificity_llm` is based on prompts that were carefully engineered by Gabrielle.

In [3]:
# this cell needs the packages loaded at the top of the notebook;
# reload them (and re-create `df`) if the R session has been restarted
library(tidyverse)
library(irr)

rating_matrix <- rbind(
  df |> slice(1:10) |> pull(score_specificity_human),
  df |> slice(1:10) |> pull(score_specificity_llm)
)

kripp.alpha(rating_matrix, method = "interval")


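Wow! In the first 10 rows, the two score columns differ on only one conversation (request_7, scored 2 by the LLM and 1 by the human experts), which works out to an alpha of roughly 0.89. A much better result after some careful prompt engineering!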

## Exercise: Try different prompting techniques to get better results!

For example:

1. Improve clarity & specificity
2. Role-based prompting
3. Step-by-step reasoning (Chain-of-Thought Prompting)
4. Few-shot prompting
5. Output structuring
6. Self-consistency prompting
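
As an illustration of technique 4, here is a sketch of how a few-shot system prompt could be assembled from labelled examples. Everything in it is hypothetical: the helper `build_few_shot_prompt`, the `##EXAMPLE##` layout, and the two toy conversations. Real examples should come from human-scored conversations that are not part of the rows used for evaluation, otherwise the agreement score would be inflated.

```r
# Hypothetical labelled examples (made up for illustration).
examples <- data.frame(
  conversation = c(
    "PROMPT: Set an academic goal. ANSWER: Study more.",
    "PROMPT: Set an academic goal. ANSWER: Summarize chapters 1-3 of the statistics book by Friday."
  ),
  score = c(0, 2)
)

# Append one formatted block per example to the base system prompt.
build_few_shot_prompt <- function(base_prompt, examples) {
  blocks <- sprintf(
    "##EXAMPLE %d##\nConversation: %s\nSpecificity score: %d",
    seq_len(nrow(examples)),
    examples$conversation,
    as.integer(examples$score)
  )
  paste(c(trimws(base_prompt), blocks), collapse = "\n\n")
}

few_shot_prompt <- build_few_shot_prompt(
  "You are an expert in educational assessment.",
  examples
)
cat(few_shot_prompt)
```

The resulting string can be passed as `system_prompt` in place of the original one.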

Use the `kripp.alpha` function to check the LLM's annotation quality.
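
For self-consistency prompting, the same conversation is scored several times and the most frequent score is kept. A sketch of the aggregation step only, with made-up scores. Note the assumption that the repeated runs actually differ: with `temperature = 0` and a fixed `seed`, as in the functions in this notebook, repeated calls are likely to return nearly identical answers, so a higher temperature would probably be needed. The helper `majority_vote` is hypothetical.

```r
# Majority vote over repeated scores for one item; ties go to the lowest score.
majority_vote <- function(scores) {
  counts  <- table(scores)
  winners <- names(counts)[counts == max(counts)]
  min(as.integer(winners))
}

# Hypothetical scores from five repeated runs on three conversations
runs <- list(
  request_1 = c(1, 1, 2, 1, 1),
  request_2 = c(2, 2, 2, 1, 2),
  request_3 = c(0, 1, 1, 0, 2)
)

voted <- vapply(runs, majority_vote, integer(1))
voted
```

Breaking ties toward the lowest score is an arbitrary choice made for this sketch; a different rule, such as rerunning the item, may suit a given project better.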

In [2]:
# Let's write some code!

# load the packages again (needed after a session restart)
library(tidyverse)
library(ellmer)
library(irr)

# url where you can download our example data
data_url <- "https://sodascience.github.io/workshop_llm_data_collection/data/srl_data_example.csv"

# Read CSV into dataframe
df <- read_csv(data_url)
head(df, 10)

# Prompt user for API key (uncomment if OPENAI_API_KEY is not set in this session)
#openai_api_key <- readline(prompt = "Enter API key for OpenAI: ")
#Sys.setenv(OPENAI_API_KEY = openai_api_key)

# desired output format: an integer score plus the model's reasoning
output_structure <- type_object(
  specificity_score = type_integer("The specificity score of the entire conversation on a scale of 0, 1 and 2."),
  reasoning = type_string("Your reasoning process.")
)

call_openai_api_structured <- function(system_prompt, user_prompt) {
  chat <- chat_openai(
    model = "gpt-4o-mini",
    system_prompt = system_prompt,
    seed = 42,
    api_args = list(temperature = 0, max_tokens = 1000),
    echo = "none" #suppress the output from being printed
  )
  return(chat$extract_data(user_prompt, type = output_structure))
}

# modified system prompt: asks the model to rate like a human expert annotator
system_prompt <- "
You are an expert in educational assessment and goal evaluation, with
specialized expertise in applying deductive coding schemes to score the quality
and content of student goals.

##TASK##
A university student was given a series of prompts, guiding them through the
process of setting and elaborating on an academic goal for the coming week. You
will be provided with the entire conversation including the prompts, and the
student answers. Your objective is to assess the specificity of the student’s
goal on a scale of 0 to 2 based on the entire conversation. Do it as if you were
a human expert annotator, which is rated as the gold standard!
"

# quick check with a single conversation
print(call_openai_api_structured(system_prompt, df[["conversation"]][1]))

# score the first 10 conversations with structured output
structured_responses <- list()

for (i in 1:10) {
  # extract current info
  cur_convo <- df |> slice(i) |> pull("conversation")
  cur_id    <- df |> slice(i) |> pull("id")

  # get rating from llm
  response <- call_openai_api_structured(
    system_prompt = system_prompt,
    user_prompt   = cur_convo
  )

  # assign to output list
  structured_responses[[cur_id]] <- response

  # report progress (does not work well in colab)
  cat(i, "/ 10 completed.\n")
}

# use map_dfr from purrr package to turn list into dataframe
response_df <- map_dfr(structured_responses, I, .id = "id")
response_df

# create rating matrix (rows = raters, cols = items)
rating_matrix <- rbind(
  df |> slice(1:10) |> pull(score_specificity_human),
  response_df |> pull(specificity_score)
)

# compute agreement (1 = perfect agreement, 0 = chance level; can be negative)
kripp.alpha(rating_matrix, method = "interval")
