Other thoughts:
- Comparing same axes from different BERT variants
- Researching embeddings data structure/shape (linear, linear truncated, circular, rectangular?)

### Setup DF from CSV

In [22]:
# !pip install --upgrade accelerate transformers

In [23]:
# install OpenAI api
# note: the calls below use openai.ChatCompletion, which requires openai<1.0
# !pip install "openai<1.0"

In [24]:
import pandas as pd
import time
import openai
import json

In [25]:
# load api key from secrets.json
try:
    with open("secrets.json") as f:
        secrets = json.load(f)
    my_api_key = secrets["openai"]
    print("API key loaded.")
    openai.api_key = my_api_key
except FileNotFoundError:
    print("Secrets file not found. YOU NEED THEM TO RUN THIS.")

API key loaded.


In [26]:
# Open training_embeddings.csv and load it into a dataframe
df = pd.read_csv('training_embeddings.csv')
df.head()

Unnamed: 0,word,0,1,2,3,4,5,6,7,8,...,758,759,760,761,762,763,764,765,766,767
0,companion,0.128,-0.082,0.109,-0.012,0.317,0.262,-0.016,0.128,0.357,...,0.395,-0.107,-0.026,-0.061,0.358,-0.138,-0.067,0.236,0.385,-0.115
1,toast,0.209,0.411,0.026,-0.0,-0.243,-0.325,0.148,0.238,-0.12,...,0.373,-0.174,0.119,-0.036,0.445,-0.029,0.145,0.169,0.364,-0.097
2,lounge,0.759,-0.116,0.116,0.113,0.46,-0.212,0.115,-0.032,0.137,...,0.259,0.07,-0.059,-0.114,0.441,-0.096,-0.074,0.196,0.215,-0.254
3,watch,0.401,-0.003,-0.061,-0.406,0.933,-0.14,-0.186,0.286,-0.17,...,0.459,-0.067,-0.266,-0.318,0.173,-0.109,0.219,-0.01,0.404,-0.161
4,haul,0.58,0.031,0.311,0.048,-0.102,0.081,0.047,0.423,-0.187,...,0.505,0.096,0.095,-0.009,0.453,0.003,0.337,0.259,0.064,0.092


In [27]:
# select the embedding columns only (skip the "Unnamed: 0" and "word" columns)
# get min and max values across all embedding columns
df.iloc[:, 2:].min().min(), df.iloc[:, 2:].max().max()

(-5.21, 2.042)

In [28]:
# clip the embedding values to the range [-1, 1]
# assumption: most values already lie in this range, so clipping only changes the outliers
# (the unclipped range reported above is -5.21 to 2.042)
df.iloc[:, 2:] = df.iloc[:, 2:].clip(-1, 1)

# get min and max values across all embedding columns
df.iloc[:, 2:].min().min(), df.iloc[:, 2:].max().max()

(-1.0, 1.0)
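
The clipping step rests on the assumption that only a small share of values lies outside [-1, 1]. One way to check this before clipping is to count the values that clipping would change. The helper below is a minimal sketch: `clipped_fraction` is a hypothetical name that does not appear in the notebook, and the sample values are toy data (only the two outliers are taken from the min/max reported above).

```python
def clipped_fraction(values, lo=-1.0, hi=1.0):
    """Return the share of values that clipping to [lo, hi] would change."""
    values = list(values)
    if not values:
        return 0.0
    outside = sum(1 for v in values if v < lo or v > hi)
    return outside / len(values)


# Toy sample: -5.21 and 2.042 are the reported extremes, the rest are illustrative.
sample = [0.128, -0.082, -5.21, 2.042, 0.759, -0.116, 0.933, 0.401]
print(f"{clipped_fraction(sample):.2%} of the sample would be clipped")
```

On the real data this could be called as `clipped_fraction(df.iloc[:, 2:].to_numpy().ravel())` before the clip is applied, assuming the embedding values start at the third column. A large fraction would suggest that clipping changes the data more than expected.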

In [29]:
df.head()

Unnamed: 0,word,0,1,2,3,4,5,6,7,8,...,758,759,760,761,762,763,764,765,766,767
0,companion,0.128,-0.082,0.109,-0.012,0.317,0.262,-0.016,0.128,0.357,...,0.395,-0.107,-0.026,-0.061,0.358,-0.138,-0.067,0.236,0.385,-0.115
1,toast,0.209,0.411,0.026,-0.0,-0.243,-0.325,0.148,0.238,-0.12,...,0.373,-0.174,0.119,-0.036,0.445,-0.029,0.145,0.169,0.364,-0.097
2,lounge,0.759,-0.116,0.116,0.113,0.46,-0.212,0.115,-0.032,0.137,...,0.259,0.07,-0.059,-0.114,0.441,-0.096,-0.074,0.196,0.215,-0.254
3,watch,0.401,-0.003,-0.061,-0.406,0.933,-0.14,-0.186,0.286,-0.17,...,0.459,-0.067,-0.266,-0.318,0.173,-0.109,0.219,-0.01,0.404,-0.161
4,haul,0.58,0.031,0.311,0.048,-0.102,0.081,0.047,0.423,-0.187,...,0.505,0.096,0.095,-0.009,0.453,0.003,0.337,0.259,0.064,0.092


### Define functions for auto interpretation and evaluation

In [30]:
def get_x_embedding(df, x):
  """
  Return [words, values] for embedding axis x:
  the list of words and the list of their values on that axis
  """
  words = df['word'].tolist()
  # embedding columns are labelled with the axis number as a string
  emd_index = df[str(x)].tolist()

  # Pair the two parallel lists: data[0] holds the words, data[1] the axis values
  data = [words, emd_index]
  return data

In [31]:
def get_axis_interp(model_num, axis_emb):
  """
  Get the interpretation with the highest confidence score for the given axis embedding
  Get the positive and negative words for the interpretation with the highest confidence score
  """

  run_interp = True

  while run_interp:

    completion = openai.ChatCompletion.create(
      model=model_num,
      messages=[
        {"role": "system", "content": "You are an expert transformer embeddings labeller."},
        {"role": "user", "content": f"Below are two lists. The first list contains words that have been put into DistilBERT. DistilBERT creates an embedding with 768 dimensions or axes. The second list contains the embedding value from DistilBERT for one axis across the words. By carefully comparing and considering the embedding values for each word, please interpret the likely linguistic binary feature that this embedding axis encodes. This binary interpretation must be consistent across all the words and must be expressed as 'x vs y', where 'x' relates to words with positive embedding values and 'y' relates to words with negative embedding values. Words that relate to neither 'x' nor 'y' will have embedding values close to 0. \n\n  The output must be a Python dictionary with five items, the key is a string containing the possible binary interpretation of the axis. The value is a float, representing the confidence score from 0 to 1. \n\n The output must be the dictionary only, which can be eval-ed into code. Here is the output format: {{<first x vs first y>:<first interpretation confidence score>, <second x vs second y>:<second interpretation confidence score>, <third x vs third y>:<third interpretation confidence score>, <fourth x vs fourth y>:<fourth interpretation confidence score>, <fifth x vs fifth y>:<fifth interpretation confidence score>}} \n\n {axis_emb[0]}\n\n {axis_emb[1]}"}
      ]
    )

    print(completion.choices[0].message)

    # log the stringified output into a txt file by appending it to the end of the file
    with open("auto_output.txt", "a") as f:
      f.write(str(completion))

    # convert the output string into a dictionary
    # ast.literal_eval only parses Python literals, so it cannot run code from the model output
    import ast
    try:
      interp_dict = ast.literal_eval(completion.choices[0].message.content)
      print(interp_dict)

      # convert the dictionary keys into a list
      interp_keys = list(interp_dict.keys())

      # get the index of the key with the highest value
      interp_values = list(interp_dict.values())
      max_interp_index = interp_values.index(max(interp_values))
      print(max_interp_index)

      # get the key with the highest value
      max_interp_key = interp_keys[max_interp_index]
      print(max_interp_key)

      # split the key into its positive and negative labels using " vs " as the delimiter
      pos_neg = max_interp_key.split(" vs ")
      if len(pos_neg) != 2:
        raise ValueError("interpretation is not of the form 'x vs y'")
      print(pos_neg)

      run_interp = False

    except Exception:
      # catch Exception rather than everything, so that Ctrl-C can still stop the loop
      print("Error with interpreting axis. Trying again in 10 seconds...")
      time.sleep(10)
      continue
  
  with open("only_final_output.txt", "a") as f:
    f.write("\nChosen interpretation: " + str(max_interp_key) + "\n")
    f.write("Confidence score: " + str(max(interp_values)) + "\n")
  return max_interp_key, pos_neg
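
The parsing step inside `get_axis_interp` can also be written as a small standalone function that validates the reply before it is used. The sketch below is one possible shape, not the notebook's implementation: `parse_interpretation` is a hypothetical helper, and the checks it applies (string keys with exactly one " vs ", numeric confidences) are assumptions derived from the prompt's stated output format. The sample reply is the one logged for the first axis in the run output.

```python
import ast


def parse_interpretation(raw):
    """Parse a model reply into (best_label, [positive, negative], confidence).

    Raises ValueError if the reply is not a dict mapping 'x vs y' to a number.
    """
    try:
        # literal_eval accepts Python literals only, unlike eval
        parsed = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError) as err:
        raise ValueError(f"reply is not a Python literal: {err}") from err

    if not isinstance(parsed, dict) or not parsed:
        raise ValueError("reply is not a non-empty dictionary")

    for label, score in parsed.items():
        if not isinstance(label, str) or label.count(" vs ") != 1:
            raise ValueError(f"label is not of the form 'x vs y': {label!r}")
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            raise ValueError(f"confidence is not a number: {score!r}")

    # on ties, max returns the first label in the dictionary's order
    best_label = max(parsed, key=parsed.get)
    return best_label, best_label.split(" vs "), parsed[best_label]


reply = (
    '{"physical objects vs abstract concepts": 0.85, "action vs state": 0.7, '
    '"human activities vs non-human activities": 0.65, "concrete vs abstract": 0.75, '
    '"active undertakings vs passive states": 0.7}'
)
label, pos_neg, confidence = parse_interpretation(reply)
print(label, pos_neg, confidence)
```

Raising a single exception type for every malformed reply keeps the retry logic in the caller simple: it only has to catch `ValueError`.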

In [32]:
def get_score_list(model_num, axis_emb, max_interp_key, pos_neg):
  """
  Get the score list for the given axis embedding and interpretation
  """

  score_list = []
  step = 30
  # ceiling division, so a word count that is an exact multiple of step does not add an empty batch
  num_steps = -(-len(axis_emb[0]) // step)

  for i in range(num_steps):
    # take the next batch of up to `step` words
    trunc_word_list = axis_emb[0][i*step:(i+1)*step]

    is_length_correct = False

    while not is_length_correct:
      print("\n\nStep ", i + 1, " of ", num_steps)
      print(i+1, " Input length: ", len(trunc_word_list))

      completion = openai.ChatCompletion.create(
        model=model_num,
        messages=[
          {"role": "system", "content": "You are an expert word sense scorer."},
          {"role": "user", "content": f"For the list of words below, please assign it a score according to how much it relates to the following criteria: '{max_interp_key}' \n\n  The output must be a Python list of scores for each corresponding word in the provided list. The output must therefore have {len(trunc_word_list)} items, the same length as the provided list. The score is a float that ranges from -1 to 1. Positive scores suggest a strong relationship with the positive criterion, '{pos_neg[0]}', while negative scores suggest a strong relationship with the negative criterion, '{pos_neg[1]}'. Scores close to 0 suggest that the word is not related to both the positive and negative criteria. \n\n Here is an output sample: [<score for first word>, <score for second word>, ... , <score for second-last word>, <score for last word>] \n\n {trunc_word_list}"}
          # {"role": "user", "content": f"For the list of words below, please assign it a score according to how much it relates to the following criteria: {interp_keys[0]}  \n\n  The output must be a Python list of scores for each corresponding word in the provided list. The output must therefore have the same number of items as the provided list. The score is a float that ranges from -1 to 1. Positive scores suggest a high correlation to the criteria, while negative scores suggest a high opposite correlation. Scores closer to 0 suggest that the criterion is not applicable to the word. \n\n Here is an output sample: [<score for first word>, <score for second word>, ... , <score for second-last word>, <score for last word>] \n\n {axis_emb[0]}"}
        ]
      )

      # print(completion.choices[0].message)

      # log the stringified output into a txt file by appending it to the end of the file
      with open("auto_output.txt", "a") as f:
        f.write(str(completion))

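      # convert the output string into a list
      # ast.literal_eval only parses Python literals, so it cannot run code from the model output
      import ast
      try:
        scores = ast.literal_eval(completion.choices[0].message.content)
      except Exception:
        # catch Exception rather than everything, so that Ctrl-C can still stop the loop
        print(i+1, "Error: ", completion.choices[0].message.content)
        print("Trying again in 10 seconds...")
        time.sleep(10)
        continue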
      print(scores)

      # check if the length is correct
      print(i+1, "Output length: ", len(scores))
      print(i+1, "Are input output lengths the same? " ,len(scores) == len(trunc_word_list))

      if len(scores) == len(trunc_word_list):
        is_length_correct = True
      else:
        print("Input output lengths are not the same. Trying again...")
        continue

      # concatenate scores with score_list
      score_list += scores

      # giving it more time – does it lead to better results?
      time.sleep(10)

  with open("auto_output.txt", "a") as f:
    f.write("\nscore_list: " + str(score_list))
  with open("only_final_output.txt", "a") as f:
    f.write("\nscore_list: " + str(score_list))
  print("Length of embedding list: ", len(axis_emb[0]))
  print("Length of score list: ", len(score_list))

  return score_list
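
Batching the word list by a fixed step needs a ceiling division, so that a list whose length is an exact multiple of the step does not produce an empty final batch (which would send the model an empty word list). The sketch below isolates that step; `batch_words` is a hypothetical helper name and the words are placeholders.

```python
import math


def batch_words(words, step=30):
    """Split words into consecutive batches of at most `step` items."""
    if step <= 0:
        raise ValueError("step must be positive")
    n_batches = math.ceil(len(words) / step)
    return [words[i * step:(i + 1) * step] for i in range(n_batches)]


# 60 placeholder words split evenly into two batches, with no empty third batch
words = [f"w{i}" for i in range(60)]
batches = batch_words(words, step=30)
print(len(batches), [len(b) for b in batches])
```

Returning the batches as a list also makes the total step count available as `len(batches)` for progress messages.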

In [33]:
def get_diff_list(score_list, axis_emb):
  """
  Get the absolute difference between each score and the corresponding
  axis embedding value, rounded to 3 decimal places
  """

  # compare score_list with axis_emb[1] by subtracting them
  diff_list = [score_list[i] - axis_emb[1][i] for i in range(len(score_list))]
  print(diff_list[:5])

  # abs and round the difference list to 3 decimal places
  diff_list = [abs(round(diff, 3)) for diff in diff_list]

  # sum the difference list
  sum_diff = sum(diff_list)
  print("Sum of diff: ", sum_diff)

  # calculate the mean of the difference list
  mean_diff = sum(diff_list)/len(diff_list)
  print("Mean of diff: ", mean_diff)

  with open("auto_output.txt", "a") as f:
    f.write("\ndiff_list: " + str(diff_list) + "\nsum of diff: " + str(sum_diff) + "\nmean of diff: " + str(mean_diff) + "\n\n")
  with open("only_final_output.txt", "a") as f:
    f.write("\ndiff_list: " + str(diff_list) + "\nsum of diff: " + str(sum_diff) + "\nmean of diff: " + str(mean_diff) + "\n\n")

  return diff_list
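
The mean of the difference list is easier to read next to a reference value. One option, which is an assumption of this sketch and not part of the notebook, is to compare it with the mean difference obtained by a trivial scorer that assigns 0 to every word. If the model's scores do not beat that baseline, the interpretation explains little of the axis. The function names and the toy numbers below are hypothetical.

```python
def mean_abs_diff(predicted, actual):
    """Mean absolute difference between two equally long lists of numbers."""
    if len(predicted) != len(actual):
        raise ValueError("lists must have the same length")
    if not predicted:
        raise ValueError("lists must not be empty")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def compare_with_zero_baseline(scores, embedding):
    """Return (model_mean_diff, baseline_mean_diff) for one axis."""
    model = mean_abs_diff(scores, embedding)
    # baseline: a scorer that always answers 0 (hypothetical reference point)
    baseline = mean_abs_diff([0.0] * len(embedding), embedding)
    return model, baseline


# Toy values for illustration only
scores = [0.5, -0.5, 0.0, 1.0]
embedding = [0.25, -0.75, 0.0, 0.5]
model_diff, baseline_diff = compare_with_zero_baseline(scores, embedding)
print("model:", model_diff, "zero baseline:", baseline_diff)
```

In the notebook this could be called with `score_list` and `axis_emb[1]` after `get_score_list` returns.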

In [34]:
def create_df_csv(axis_emb, score_list, diff_list, axis):
  """
  Create a dataframe with the word, embedding, score, and difference values
  Save the dataframe as a csv
  """

  # create dataframe of words, embedding values, and scores
  df = pd.DataFrame({'word': axis_emb[0], 'embedding': axis_emb[1], 'score': score_list, 'diff': diff_list})
  print(df.head())

  # save dataframe as csv
  df.to_csv(f"llm-outputs/matte_auto_emb{axis}.csv", index=False)

### Set up and run automated OpenAI interpretation and evaluation

In [35]:
# selected_axes = [1,]
selected_axes = [2, 3, 4, 200, 300, 400, 500, 600, 700,]

model_num = "gpt-4"

In [36]:
# automatically run entire OpenAI interpretation and evaluation

for axis in selected_axes:
  with open("auto_output.txt", "a") as f:
    f.write(f"\n\n--------------------- AXIS {axis} ---------------------\n\n")
  with open("only_final_output.txt", "a") as f:
    f.write(f"\n\n--------------------- AXIS {axis} ---------------------\n\n")

  axis_emb = get_x_embedding(df, axis)

  max_interp_key, pos_neg = get_axis_interp(model_num, axis_emb)

  score_list = get_score_list(model_num, axis_emb, max_interp_key, pos_neg)

  diff_list = get_diff_list(score_list, axis_emb)

  create_df_csv(axis_emb, score_list, diff_list, axis)
  

{
  "role": "assistant",
  "content": "{\"physical objects vs abstract concepts\": 0.85, \"action vs state\": 0.7, \"human activities vs non-human activities\": 0.65, \"concrete vs abstract\": 0.75, \"active undertakings vs passive states\": 0.7}"
}
{'physical objects vs abstract concepts': 0.85, 'action vs state': 0.7, 'human activities vs non-human activities': 0.65, 'concrete vs abstract': 0.75, 'active undertakings vs passive states': 0.7}
0
physical objects vs abstract concepts
['physical objects', 'abstract concepts']


Step  1  of  13
1  Input length:  30
[0, 1, 0.8, 1, 0.5, -0.4, -0.7, 0.1, 0.2, 1, -0.1, 0.3, 0.9, 0, -0.9, 0.2, 0.2, -0.5, 0.4, 0.8, -0.3, 0.2, 1, 0.3, 0.6, 0.6, 0.2, 0.1, 0.8, 0.4]
1 Output length:  30
1 Are input output lengths the same?  True


Step  2  of  13
2  Input length:  30
[0.1, -0.1, -0.4, 0.3, 0.3, -0.4, 0.1, -0.1, 0.3, 0.3, 0.3, -0.3, 0.5, 0.3, 0.3, 0.3, -0.5, -0.2, 0.3, 0.1, 0.4, -0.6, 0.2, -0.3, -0.1, -0.2, -0.3, -0.4, 0.6, 0.2]
2 Output length:  3
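
Both model-calling functions retry until the reply parses (and, for the scores, until the length matches), with no upper bound on the number of attempts. Since every attempt is a paid API call, a bounded retry may be preferable. The sketch below shows one way to do this; `call_with_retries` and its parameters are hypothetical, and the flaky function stands in for a real API call.

```python
import time


def call_with_retries(fn, max_attempts=5, delay_seconds=10.0, sleep=time.sleep):
    """Call fn() until it returns without raising, at most max_attempts times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as err:  # deliberately broad for this sketch
            last_error = err
            print(f"Attempt {attempt} of {max_attempts} failed: {err}")
            if attempt < max_attempts:
                sleep(delay_seconds)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Stand-in for a model call that fails twice before returning usable scores
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ValueError("unparsable reply")
    return [0.1, -0.2]


result = call_with_retries(flaky, max_attempts=5, delay_seconds=0, sleep=lambda s: None)
print(result, "after", attempts["count"], "attempts")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. In the notebook, `fn` would wrap one request plus its parsing and length check.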

Problems / Opportunities with current approach
1. Problem with specificity: it is easy to come up with a catch-all general label and be confident in it. It is more difficult to find specific features.
2. Could the positive and negative labels be ambiguous because they are so short? Is some definition required?
3. (A) Could an overly diverse dataset introduce too much noise into the process? Perhaps a much more intentional, designed dataset is required to tease out specific features?
3. (B) Is it possible to do some basic (numeric / linear algebra) filtering to get basic categories / slices of the dataset, e.g. nouns and verbs?
    - variational learning approach
      - Using nouns vs verbs with very similar meanings? e.g. "person, personalize", "airplane, fly", "time, clock, timing, temporal, old, new, history"
      - Creating premade datasets to tease out similar axes to begin from? e.g. "red", "blue", "green".
      - Creating premade datasets to tease out polar axes (reducing sets to purely binary ones)? e.g. "good" vs "evil", "bright, light, white" vs "dark, darkness, black"
    - Identify similar axes through manual numeric functions (using a reduced dataset), then test them using OpenAI (using a larger dataset)? I.e. reduce to a set of hypotheses / candidate axes first.
      - Numeric functions could find the axes with the largest variance within a predetermined category, e.g. "color", "animal", "shape", "emotions". They could also find axes that stay the same within a category, to be used for cross-examination with other categories (see how they change across categories).
      - Use contextual words to force a specific word sense, e.g. "The artists used red paint because they like the color.", then test (big datasets)? Different then test?
4. Can there be an incremental algorithm that is better at detecting good candidates? Is there a less expensive and progressive way to **(1) identify, (2) interpret, and (3) evaluate** axes?
    - e.g. feeding the diff list back into the algorithm, so that the axis is re-interpreted in light of prior results?
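
The numeric pre-filtering idea in item 3 (B), finding the axes with the largest variance within a predetermined category, can be sketched without any model calls. The code below is an illustration under stated assumptions: `rank_axes_by_variance` is a hypothetical helper, the embeddings are passed as a plain word-to-vector dictionary, and the three-axis toy vectors are invented for the example and are not DistilBERT values.

```python
from statistics import pvariance


def rank_axes_by_variance(embeddings, category_words, top_k=3):
    """Rank axis indices by how much their values vary within one word category.

    embeddings: dict mapping word -> list of floats (one value per axis).
    Returns a list of (axis_index, variance), largest variance first.
    """
    vectors = [embeddings[w] for w in category_words if w in embeddings]
    if len(vectors) < 2:
        raise ValueError("need at least two known words to compute a variance")
    n_axes = len(vectors[0])
    variances = [pvariance([vec[axis] for vec in vectors]) for axis in range(n_axes)]
    ranked = sorted(range(n_axes), key=lambda axis: variances[axis], reverse=True)
    return [(axis, variances[axis]) for axis in ranked[:top_k]]


# Invented toy vectors with three axes, for illustration only
toy_embeddings = {
    "red": [0.9, 0.1, 0.0],
    "blue": [-0.8, 0.1, 0.1],
    "green": [0.1, 0.1, -0.1],
}
candidates = rank_axes_by_variance(toy_embeddings, ["red", "blue", "green"], top_k=2)
print(candidates)
```

Axes that rank high for a category such as "color" would be candidates to send to the model for interpretation. Axes with near-zero variance within the category could be kept for the cross-category comparison described in the same item.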

Other ideas
- Use token embedding without positional embedding?
- try on Word2Vec? 
