<a href="https://colab.research.google.com/github/jgbrenner/psychometrics/blob/main/C9.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Generating Psychometric Scale Items Using an Open Source LLM and Conducting Exploratory Graph Analysis**



# Methodology

# Overview

In this notebook, I use an open-source large language model (LLM) to generate items for a psychometric scale measuring three dimensions of perfectionism: Rigid Perfectionism, Self-Critical Perfectionism, and Narcissistic Perfectionism. After generating the items, I run Exploratory Graph Analysis (EGA) on their text embeddings to examine the structure and dimensionality of the item set.


# Steps Involved

# Item Generation:

I prompted the LLM to generate six items for each construct, specifying that five should be regular-keyed and one should be reverse-keyed. The output was formatted in JSON for easy processing.
The constructs targeted are:

- Rigid Perfectionism

- Self-Critical Perfectionism

- Narcissistic Perfectionism
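
An LLM may not follow the requested design exactly; the run recorded in this notebook reports 19 embeddings, one more than the 18 implied by three constructs with six items each. The sketch below is a minimal illustration of checking parsed items against the design, assuming they arrive as a list of dictionaries with the 'construct' and 'type' keys requested in the prompt. The helper name `check_item_set` and its parameters are hypothetical and not part of the notebook.

```python
from collections import Counter

# The three constructs named in this notebook.
EXPECTED_CONSTRUCTS = (
    "Rigid Perfectionism",
    "Self-Critical Perfectionism",
    "Narcissistic Perfectionism",
)


def check_item_set(items, n_regular=5, n_reverse=1):
    """Return a list of problems; an empty list means the set matches the design.

    Hypothetical helper for illustration only.
    """
    problems = []
    # Count items per (construct, keying type) pair.
    counts = Counter((row.get("construct"), row.get("type")) for row in items)
    for construct in EXPECTED_CONSTRUCTS:
        regular = counts[(construct, "regular-keyed")]
        reverse = counts[(construct, "reverse-keyed")]
        if regular != n_regular:
            problems.append(
                f"{construct}: {regular} regular-keyed items, expected {n_regular}"
            )
        if reverse != n_reverse:
            problems.append(
                f"{construct}: {reverse} reverse-keyed items, expected {n_reverse}"
            )
    # Flag construct labels the prompt did not ask for.
    unknown = {row.get("construct") for row in items} - set(EXPECTED_CONSTRUCTS)
    for construct in sorted(str(c) for c in unknown):
        problems.append(f"unexpected construct: {construct}")
    return problems
```

Calling `check_item_set(generated_items)` right after parsing would return an empty list when the counts match. Note that it checks only the labels; whether an item tagged 'reverse-keyed' is actually worded in the reverse direction still needs a human read.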


# Data Preparation:

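The generated items were compiled into a data frame, and the construct label of each item was extracted for further analysis.
Each item's text was then embedded with OpenAI's text-embedding-3-small model, giving an embeddings array with one row per item and one column per embedding dimension, which serves as the input for EGA.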
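
To make the idea of items living in a vector space concrete, here is a minimal sketch of cosine similarity, a common way to compare two embedding vectors. It uses plain Python and made-up 3-dimensional vectors rather than real embeddings; the function name and the toy values are illustrative assumptions, not part of the notebook.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors standing in for item embeddings (made-up values).
toy_embeddings = {
    "item_a": [1.0, 0.0, 0.0],
    "item_b": [2.0, 0.0, 0.0],  # same direction as item_a
    "item_c": [0.0, 1.0, 0.0],  # orthogonal to item_a
}

sim_ab = cosine_similarity(toy_embeddings["item_a"], toy_embeddings["item_b"])
sim_ac = cosine_similarity(toy_embeddings["item_a"], toy_embeddings["item_c"])
```

Items whose wording is semantically close would be expected to have similar embedding vectors, which is the kind of regularity a structure analysis of the embeddings relies on.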

# Transfer to R:

The embeddings array and constructs list were transferred from Python to R using the rpy2 interface.

# Exploratory Graph Analysis (EGA):

In R, the embeddings array was converted into a matrix format, with appropriate row names assigned based on the constructs.
The EGAnet library was utilized to perform EGA on the embeddings matrix, allowing for an exploration of the underlying structure of the psychometric items.

Results were printed and visualized to assess the dimensionality of the constructs.
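
EGA estimates a network among variables, and EGAnet reads the variables from the columns of the data it is given. For item embeddings this means the items need to be in the columns and the embedding dimensions in the rows. The sketch below is not EGA; it only illustrates the orientation on made-up numbers, in plain Python, by showing which correlation matrix each orientation produces. All helper names are hypothetical.

```python
import math


def transpose(table):
    """Swap rows and columns of a row-major table."""
    return [list(col) for col in zip(*table)]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def column_correlations(table):
    """Correlations among the columns of a table (columns are the variables)."""
    columns = transpose(table)
    return [[pearson(a, b) for b in columns] for a in columns]


# 3 items x 4 embedding dimensions, made-up values.
toy = [
    [0.1, 0.4, 0.3, 0.9],
    [0.2, 0.8, 0.6, 1.8],  # item 2 is item 1 scaled by 2
    [0.9, 0.1, 0.5, 0.2],
]

# Items in columns: the result is a 3 x 3 item-by-item matrix.
item_corr = column_correlations(transpose(toy))

# Items in rows: the result is a 4 x 4 matrix among embedding dimensions.
dimension_corr = column_correlations(toy)
```

With the real data, the first orientation would relate the items to one another, while the second would relate the 1536 embedding dimensions to one another using only a handful of observations.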

# Conclusion
This methodology provides a systematic approach to generating psychometric scale items and to exploring, through EGA of their text embeddings, whether the items group into the intended dimensions.

In [13]:
!pip install groq



In [14]:
# Load the R magic extension for rpy2
%load_ext rpy2.ipython

import os
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive', force_remount=True)

# Define the path where the R library will be saved in Google Drive
library_path = '/content/drive/MyDrive/R_libraries'

# Create the directory if it doesn't exist
os.makedirs(library_path, exist_ok=True)

The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython
Mounted at /content/drive


In [15]:
# Verify the rpy2 version, since some versions are known to cause errors (3.4.2 works fine)
import rpy2
print(f"rpy2 version: {rpy2.__version__}")

rpy2 version: 3.4.2


In [16]:
%%R -i library_path

# Set the custom library path to persist during the Colab session
.libPaths(library_path)

# Set CRAN repository globally
options(repos = c(CRAN = "https://cloud.r-project.org"))

# List of required R packages
required_packages <- c("gmp", "Rmpfr", "CVXR", "fungible", "EGAnet")

# Install each package only if it's not already installed
for (pkg in required_packages) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        install.packages(pkg, lib = library_path, dependencies = TRUE)
    }
}

# Load EGAnet library
library(EGAnet)


In [17]:
# Import necessary Python libraries
import json
import re

import numpy as np
import pandas as pd
import requests
from google.colab import userdata
from groq import Groq

# Fetch the OpenAI API key from Colab secrets (used later for the embeddings request,
# which goes through `requests`, so the `openai` package itself is not needed)
openai_api_key = userdata.get('OPENAI_API_KEY')

# Fetch the Groq API key from Colab secrets
groq_api_key = userdata.get('GROQ_API_KEY')

# Initialize the Groq client
client = Groq(api_key=groq_api_key)

# Prompting is everything.

Using the llama3-8b-8192 model with this particular prompt, I was able to get the results needed. Other models may need a modified prompt. Because the output is sampled (the request uses a temperature of 1), it is non-deterministic, so repeated runs can return different items.
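
Since repeated runs can differ, one option is to retry the request until the output parses and passes a check. The sketch below is a minimal illustration under stated assumptions: `generate` stands for any zero-argument function that returns the raw model text (for example, a wrapper around the completion request), and `is_valid` is any check you supply. Both names, and `generate_until_valid` itself, are hypothetical and not part of the Groq client.

```python
import json


def generate_until_valid(generate, is_valid, max_attempts=3):
    """Call `generate()` until its output parses as JSON and passes `is_valid`.

    Hypothetical helper for illustration; raises ValueError if every
    attempt fails.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        raw = generate()
        try:
            items = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"attempt {attempt}: invalid JSON ({exc.msg})"
            continue
        if is_valid(items):
            return items
        last_error = f"attempt {attempt}: JSON parsed but failed validation"
    raise ValueError(f"No valid output after {max_attempts} attempts; {last_error}")
```

Each retry is a new API call, so `max_attempts` is kept small here on purpose.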

In [18]:
# Define the messages for the LLM
messages = [
    {
        "role": "system",
        "content": "You are an expert psychometrician creating test items."
    },
    {
        "role": "user",
        "content": (
            "Please generate six items for each of the following constructs: "
            "Rigid Perfectionism, Self-Critical Perfectionism, and Narcissistic Perfectionism. "
            "For each construct, provide five regular-keyed and one reverse-keyed item. "
            "Make them concise and clear. "
            "Provide the output ONLY in JSON format as a list of dictionaries, "
            "without any additional text or explanation. "
            "Each dictionary should have keys 'construct', 'item', and 'type' (either 'regular-keyed' or 'reverse-keyed')."
        )
    }
]

try:
    # Create the completion request
    completion = client.chat.completions.create(
        model="llama3-8b-8192",  # using an open-source model for replication purposes
        messages=messages,
        temperature=1,
        max_tokens=2048,
        top_p=1,
        stream=False,  # Disable streaming for simplicity
        stop=None,
    )

    # Collect the response content
    response_content = completion.choices[0].message.content
    print("\nLLM Output:\n", response_content)

except Exception as e:
    print(f"An error occurred: {e}")
    response_content = ""



LLM Output:
 [
  {
    "construct": "Rigid Perfectionism",
    "item": "I have high standards for myself and expect to meet them all the time.",
    "type": "regular-keyed"
  },
  {
    "construct": "Rigid Perfectionism",
    "item": "I always strive to achieve perfection, no matter what the cost.",
    "type": "regular-keyed"
  },
  {
    "construct": "Rigid Perfectionism",
    "item": "I set very specific and rigorous goals for myself.",
    "type": "regular-keyed"
  },
  {
    "construct": "Rigid Perfectionism",
    "item": "I am not comfortable with imperfect or incomplete work.",
    "type": "regular-keyed"
  },
  {
    "construct": "Rigid Perfectionism",
    "item": "I often feel anxious or stressed when I don't meet my high standards.",
    "type": "regular-keyed"
  },
  {
    "construct": "Rigid Perfectionism",
    "item": "I don't tolerate mistakes well.",
    "type": "reverse-keyed"
  },
  {
    "construct": "Self-Critical Perfectionism",
    "item": "I often criticize mysel

In [19]:
# Parse the JSON response
try:
    generated_items = json.loads(response_content)
    items_df = pd.DataFrame(generated_items)
except json.JSONDecodeError as e:
    print(f"JSON decoding failed: {e}")
    # Attempt to extract JSON content from the response
    json_match = re.search(r'\[.*\]', response_content, re.DOTALL)
    if json_match:
        json_str = json_match.group(0)
        try:
            generated_items = json.loads(json_str)
            items_df = pd.DataFrame(generated_items)
        except json.JSONDecodeError as e2:
            print(f"Second JSON decoding attempt failed: {e2}")
            items_df = None
    else:
        print("No JSON content found in the response.")
        items_df = None

# Check if items_df is defined
if items_df is not None:
    # Save to CSV (optional)
    items_df.to_csv("perfectionism_items.csv", index=False)
    print("Items saved to 'perfectionism_items.csv'.")

    # Drop rows with empty item text in the data frame itself, so that the
    # embeddings stay aligned row-for-row with items_df["construct"]
    items_df = items_df[items_df["item"].str.strip() != ""].reset_index(drop=True)

    # Prepare the list of items
    item_texts = items_df["item"].tolist()

    # Set up the API endpoint and headers for OpenAI embeddings
    embedding_endpoint = "https://api.openai.com/v1/embeddings"
    embedding_model = "text-embedding-3-small"  # Using latest optimized model

    headers = {
        "Authorization": f"Bearer {openai_api_key}",
        "Content-Type": "application/json"
    }

    # Prepare the data payload
    data = {
        "model": embedding_model,
        "input": item_texts
    }

    # Make the API request
    try:
        response = requests.post(embedding_endpoint, headers=headers, json=data, timeout=60)

        # Check if the request was successful
        if response.status_code != 200:
            print(f"Request failed with status code {response.status_code}: {response.text}")
        else:
            response_data = response.json()
            # Extract embeddings
            embeddings = [item['embedding'] for item in response_data['data']]
            embeddings_array = np.array(embeddings)
            print(f"Embeddings generated successfully. Shape: {embeddings_array.shape}")
            # Save the embeddings
            np.save("embeddings.npy", embeddings_array)
            print("Embeddings saved as 'embeddings.npy'.")
    except Exception as e:
        print(f"An error occurred during embedding generation: {e}")
else:
    print("Error: items_df is not defined. Cannot proceed with embedding generation.")


Items saved to 'perfectionism_items.csv'.
Embeddings generated successfully. Shape: (19, 1536)
Embeddings saved as 'embeddings.npy'.
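
The fallback pattern `\[.*\]` used in the parsing cell is greedy: it captures everything from the first `[` to the last `]`, so it can over-capture if the model adds text containing brackets after the JSON. A possible alternative, sketched here with the standard library's `json.JSONDecoder.raw_decode`, decodes the first array it can find and ignores whatever follows. The function name `extract_first_json_array` is hypothetical and not part of the notebook.

```python
import json


def extract_first_json_array(text):
    """Return the first JSON array embedded in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("[")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and tolerates trailing text after it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, list):
            return value
        # Not a valid array here; try the next opening bracket.
        start = text.find("[", start + 1)
    return None
```

In the parsing cell, this could stand in for the `re.search` fallback, with a `None` result handled the same way as the existing "No JSON content found" branch.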


In [21]:
# Prepare constructs list
items_constructs = items_df['construct'].tolist()

# Transfer data to R
%R -i embeddings_array -i items_constructs

In [None]:
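%%R
# Convert embeddings_array to a matrix (rows = items, columns = embedding dimensions)
embeddings_matrix <- as.matrix(embeddings_array)

# Assign row names from the constructs; make.unique() keeps them distinct,
# since several items share the same construct label
rownames(embeddings_matrix) <- make.unique(unlist(items_constructs))

# EGAnet is already loaded in an earlier cell; loading it again is harmless
library(EGAnet)

# Perform EGA analysis. EGA treats columns as variables, so the matrix is
# transposed to put the items in the columns
ega_result <- EGA(t(embeddings_matrix))
print(ega_result)

# Plot the results
plot(ega_result)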