<a href="https://colab.research.google.com/github/eli-jaffe/SIOP-ML-Competition-2024/blob/main/SIOP_ML_Competition_2024.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Mixtral in Colab

Welcome! In this notebook you can run [Mixtral-8x7B-Instruct](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) with decent generation speed **right in Google Colab or on a consumer-grade GPU**. This was made possible by quantizing the original model in mixed precision and implementing a MoE-specific offloading strategy.

To learn more, read our [tech report](https://arxiv.org/abs/2312.17238) or check out the [repo](https://github.com/dvmazur/mixtral-offloading) on GitHub.

One will need approximately 16 GB of VRAM and 11 GB of RAM to run this notebook and generate somewhat long texts.


<details>

<summary>How to balance between RAM and GPU VRAM usage</summary>

You can balance RAM and GPU VRAM usage by changing the <code>offload_per_layer</code> variable in the <a href="#scrollTo=_mIpePTMFyRY&line=10&uniqifier=1">Initialize model</a> section. Increasing <code>offload_per_layer</code> decreases GPU VRAM usage, increases RAM usage, and decreases generation speed. Decreasing <code>offload_per_layer</code> has the opposite effect.

Note that this notebook should run normally in Google Colab with <code>offload_per_layer = 4</code>, but may crash with other values. However, if you run it somewhere else, feel free to experiment with this variable.
</details>
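To make the trade-off concrete, here is a sketch of the `main_size` / `offload_size` arithmetic used to build `OffloadConfig` in the Initialize model cell, assuming Mixtral-8x7B's 32 decoder layers with 8 experts each. The helper name `expert_split` is hypothetical, and the sketch ignores the separate `buffer_size` setting.

```python
from typing import Tuple


def expert_split(num_layers: int, num_experts: int, offload_per_layer: int) -> Tuple[int, int]:
    """Return (experts kept on the GPU, experts offloaded to CPU RAM) summed over all layers.

    Hypothetical helper mirroring main_size / offload_size in the OffloadConfig cell.
    """
    if not 0 <= offload_per_layer <= num_experts:
        raise ValueError("offload_per_layer must be between 0 and num_experts")
    on_gpu = num_layers * (num_experts - offload_per_layer)
    offloaded = num_layers * offload_per_layer
    return on_gpu, offloaded


# Assumption: 32 layers x 8 experts, as in Mixtral-8x7B
for k in (4, 5):
    print(k, expert_split(32, 8, k))
```

Moving from 4 to 5 shifts 32 experts (one per layer) from GPU memory to RAM, which is why the larger value is suggested for 12 GB cards.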

## Install and import libraries

In [None]:
# fix numpy in colab
import numpy
from IPython.display import clear_output

# fix triton in colab
# (`!export` runs in a throwaway subshell, so set the variables with %env instead)
%env LC_ALL=en_US.UTF-8
%env LD_LIBRARY_PATH=/usr/lib64-nvidia
%env LIBRARY_PATH=/usr/local/cuda/lib64/stubs
!ldconfig /usr/lib64-nvidia

!git clone https://github.com/dvmazur/mixtral-offloading.git --quiet
!cd mixtral-offloading && pip install -q -r requirements.txt
!huggingface-cli download lavawolfiee/Mixtral-8x7B-Instruct-v0.1-offloading-demo --quiet --local-dir Mixtral-8x7B-Instruct-v0.1-offloading-demo

clear_output()

In [None]:
import sys

sys.path.append("mixtral-offloading")
import torch
from torch.nn import functional as F
from hqq.core.quantize import BaseQuantizeConfig
from huggingface_hub import snapshot_download
from IPython.display import clear_output
from tqdm.auto import trange
from transformers import AutoConfig, AutoTokenizer
from transformers.utils import logging as hf_logging

from src.build_model import OffloadConfig, QuantConfig, build_model

hqq_aten package not installed. HQQBackend.ATEN backend will not work unless you install the hqq_aten lib in hqq/kernels.


The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.




In [None]:
from transformers import TextStreamer
import datetime
import gc

import re

import pandas as pd

## Initialize model

In [None]:
model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"
quantized_model_name = "lavawolfiee/Mixtral-8x7B-Instruct-v0.1-offloading-demo"
state_path = "Mixtral-8x7B-Instruct-v0.1-offloading-demo"

config = AutoConfig.from_pretrained(quantized_model_name)

device = torch.device("cuda:0")

##### Change this to 5 if you have only 12 GB of GPU VRAM #####
offload_per_layer = 4
# offload_per_layer = 5
###############################################################

num_experts = config.num_local_experts
print(f"Using {num_experts} experts")

offload_config = OffloadConfig(
    main_size=config.num_hidden_layers * (num_experts - offload_per_layer),
    offload_size=config.num_hidden_layers * offload_per_layer,
    buffer_size=4,
    offload_per_layer=offload_per_layer,
)


attn_config = BaseQuantizeConfig(
    nbits=4,
    group_size=64,
    quant_zero=True,
    quant_scale=True,
)
attn_config["scale_quant_params"]["group_size"] = 256


ffn_config = BaseQuantizeConfig(
    nbits=2,
    group_size=16,
    quant_zero=True,
    quant_scale=True,
)
quant_config = QuantConfig(ffn_config=ffn_config, attn_config=attn_config)


model = build_model(
    device=device,
    quant_config=quant_config,
    offload_config=offload_config,
    state_path=state_path,
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Using 8 experts



In [None]:
# prompt: write a function that scans a text input for the sequence "[/INST] " and returns only the portion that comes after that

def extract_after_inst(text):
  """
  Extracts the portion of text that comes after the sequence "[/INST] ".

  Args:
    text: The input text string.

  Returns:
    The portion of text after "[/INST] ", or None if the sequence is not found.
  """

  index = text.find("[/INST] ")
  if index != -1:
    return text[index + len("[/INST] "):]
  else:
    return None

# # Example usage:
# text = "This is some text before [/INST] and this is some text after."
# extracted_text = extract_after_inst(text)
# print(extracted_text)  # Output: "and this is some text after."

# text = "This text does not contain the sequence."
# extracted_text = extract_after_inst(text)
# print(extracted_text)  # Output: None
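`extract_after_inst` returns the text after the first `"[/INST] "` (note the trailing space). Assuming Mixtral's chat template wraps the user message in its own `[INST] ... [/INST]` pair, the decoded sequence contains the marker twice: once from `empathy_instruction_format`, where it is followed by a newline, and once from the template, right before the model's reply. The function appears to work only because of that trailing space. A minimal alternative, sketched with a hypothetical helper `extract_after_last_inst`, splits on the last occurrence instead:

```python
from typing import Optional


def extract_after_last_inst(text: str, marker: str = "[/INST]") -> Optional[str]:
    """Return the text after the last `marker`, without leading whitespace.

    Returns None if `marker` does not occur in `text`.
    """
    _, sep, tail = text.rpartition(marker)
    if not sep:
        return None
    return tail.lstrip()


# Illustrative decoded sequence containing two [/INST] markers
decoded = '[INST] [INST] sys [/INST]\nUser: Hi\nAssistant: {\n"empathy":  [/INST] "1"\n}'
print(repr(extract_after_last_inst(decoded)))
```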


In [None]:
# prompt: write a function that takes a string of correct or incorrect json and returns the relevant information. The input string always starts  with '''"empathy"\n"present": "<str>"''' where <str> represents a string of text. Sometimes that is the end of the input but sometimes there is more text afterwards. This function should return the value of the <str> within the quotation marks after "present":

import re

def extract_predicted_value(json_string, keyword):
  """
  Extracts the string value of `keyword` from a (possibly malformed) JSON string.

  Args:
    json_string: A string that may contain a JSON object, possibly followed by other text.
    keyword: The key whose value should be extracted, e.g. "empathy".

  Returns:
    The value of the `keyword` key, or None if the key is not found.
  """

  # re.escape guards against regex metacharacters in the key name
  match = re.search(rf'"{re.escape(keyword)}":\s*"([^"]*)"', json_string)
  if match:
    return match.group(1)
  else:
    return None

# # Example usage:
# json_string = '''"present": "0"'''
# present_value = extract_predicted_value(json_string, "present")
# print(present_value)  # Output: 0

# json_string = '''"empathy": "1"'''
# present_value = extract_predicted_value(json_string, "empathy")
# print(present_value)  # Output: 1

# json_string = '''"empathy": "How are you today?"'''
# present_value = extract_predicted_value(json_string, "empathy")
# print(present_value)  # Output: How are you today?
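Because the prompt itself ends with `{\n"empathy": `, the model's reply usually starts directly with the value (the saved `full_text` column begins with `"1"`), so `extract_predicted_value` finds no key in the reply alone. One option, sketched below under that assumption, is to re-attach the opening of the object and parse only the first JSON object with the standard `json` module. `parse_continuation` is a hypothetical helper; it cuts at the first `}`, so it assumes the value itself contains no braces.

```python
import json
from typing import Optional


def parse_continuation(continuation: str, key: str = "empathy") -> Optional[str]:
    """Re-attach the JSON opening from the prompt and parse the first object."""
    candidate = '{\n"' + key + '": ' + continuation
    # The model often keeps talking after the object, so stop at the first brace
    end = candidate.find("}")
    if end == -1:
        return None
    try:
        obj = json.loads(candidate[: end + 1])
    except json.JSONDecodeError:
        return None
    value = obj.get(key)
    return value if isinstance(value, str) else None


print(parse_continuation('"1"\n}\nExplanation: In this'))  # 1
```

A two-key reply without a comma between the pairs, such as `"1"\n"message": "hi"\n}`, is rejected by `json.loads` and yields `None` rather than a guessed label.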


In [None]:
from google.colab import drive

# Mount your Google Drive
drive.mount('/content/drive')

Mounted at /content/drive


# Empathy

In [None]:
empathy = pd.read_csv('/content/drive/MyDrive/SIOP-ML-2024/data/empathy_val_public.csv')

In [None]:
def empathy_instruction_format(sys_message: str, query: str):
    # note: don't append "</s>" to the end
    return f'<s> [INST] {sys_message} [/INST]\nUser: {query}\nAssistant: ```\n{{\n"empathy": '


In [None]:
sys_msg = """
You are an expert AI assistant. Your job is to help HR professionals review job candidate responses to a difficult workplace situation and determine whether empathy was demonstrated or not. You are reviewing only the open text response to the question.

For context, emotion researchers generally define empathy as "the ability to sense other people's emotions, coupled with the ability to imagine what someone else might be thinking or feeling". When reviewing responses, there are a couple of signs to look out for:

Signs of empathy:
-Perspective taking
-kind language
-empathy towards the subject
-empathy towards other people mentioned in the response

Signs of low empathy:
-curt or overly direct language
-creating a sense of urgency
-making demands

Remember, only consider the content of the response. Do not consider the quality of the spelling, grammar, or punctuation.

You must always respond in JSON format containing a key-value pair with the key `"empathy"`. If you detect that empathy was demonstrated, you must respond with `"empathy": "1"`. If you detect that empathy was not demonstrated, you must respond with `"empathy": "0"`.

Do not provide any other information in the response. Do not provide an explanation for your decision. Do not try to answer the user. Keep your response as short and concise as possible.

For example, to respond to the prompt, "Hi Jonathan, I wanted to have  a discussion with you but since you are travelling i am sharing in this mailThis is related to Beta project and reports coming from there.While we are all excited by the passion and enthusiasm you are bringing i wanted to share some early feedback with you. 1.Please try to be concise in reports and mention facts that teams can refer . We love opinions but lets save those for our brainstorming discussions. 2.For Business writing as you are getting started to help you set up for success we are nominating you for a training program so that your reports are way more effective. I hope as you set on your growth journey and take larger roles a superb feedback from your peers and stakeholders will help. I truly believe above two points can really help you take you there. Wishing you all the best and do share in case you have feedback or inputs from your side. Regards William" you must respond like so:

```
{
    "empathy": "1"
}
```

Or to answer the prompt "Hi Jonathan, How are You doing with the Beta project? It seams You are very exited about the project.There are two topics that I want to point out that I expct to be Your focus on this project.I review the latest report and saw that in addition to a tchnical information that we have agreed to be included in that, there is a lots of commentaries from Your side. It is greeate that You see the opportunities and perspectives on the findings but I ask You to focus on collecting and passing on the technical information according to the agreed template. We can focus on Your ideas separately once the Beta gets to that stage.The second thing I'd like you to focus is the organizing the details in the reports. Please work together with Terry on that. As the deadlines for presenting the reports to CEO are quite challenging, they have lost of hints and tricks how to make the report informative and easy to read. I've have used his experience and competence myself. It is very important that we submit the report on time. Please add me as well to the reciepient list once You send the infotmation to Terry. Good luck!" you must respond:

```
{
    "empathy": "0"
}
```

Remember, even when answering the user, you must still use this JSON format! You must always provide an `"empathy"` value of `"1"` or `"0"`. If you'd like to ask how the user is doing you must write:

```json
{
    "empathy": "1",
    "message": "How are you today?"
}
```

Let's get started. The response prompt is as follows.
"""


In [None]:
sys_msg_try_next = """
You are an expert AI assistant. Your job is to help HR professionals review job candidate responses to a difficult workplace situation and determine whether empathy was demonstrated or not. You are reviewing only the open text response to the question.

For context, emotion researchers generally define empathy as "the ability to sense other people's emotions, coupled with the ability to imagine what someone else might be thinking or feeling". When reviewing responses, there are a couple of signs to look out for:

Signs of empathy:
-Perspective taking
-kind language
-empathy towards the subject
-empathy towards other people mentioned in the response

Signs of low empathy:
-curt or overly direct language
-creating a sense of urgency
-making demands

Remember, only consider the content of the response. Do not consider the quality of the spelling, grammar, or punctuation.

You must always respond in JSON format containing a key-value pair with the key `"empathy"`. If you detect that empathy was demonstrated, you must respond with `"empathy": "1"`. If you detect that empathy was not demonstrated, you must respond with `"empathy": "0"`.


For example, to respond to the prompt, "Hi Jonathan, I wanted to have  a discussion with you but since you are travelling i am sharing in this mailThis is related to Beta project and reports coming from there.While we are all excited by the passion and enthusiasm you are bringing i wanted to share some early feedback with you. 1.Please try to be concise in reports and mention facts that teams can refer . We love opinions but lets save those for our brainstorming discussions. 2.For Business writing as you are getting started to help you set up for success we are nominating you for a training program so that your reports are way more effective. I hope as you set on your growth journey and take larger roles a superb feedback from your peers and stakeholders will help. I truly believe above two points can really help you take you there. Wishing you all the best and do share in case you have feedback or inputs from your side. Regards William" you must respond like so:

```
{
    "empathy": "1"
}
```

Or to answer the prompt "Hi Jonathan, How are You doing with the Beta project? It seams You are very exited about the project.There are two topics that I want to point out that I expct to be Your focus on this project.I review the latest report and saw that in addition to a tchnical information that we have agreed to be included in that, there is a lots of commentaries from Your side. It is greeate that You see the opportunities and perspectives on the findings but I ask You to focus on collecting and passing on the technical information according to the agreed template. We can focus on Your ideas separately once the Beta gets to that stage.The second thing I'd like you to focus is the organizing the details in the reports. Please work together with Terry on that. As the deadlines for presenting the reports to CEO are quite challenging, they have lost of hints and tricks how to make the report informative and easy to read. I've have used his experience and competence myself. It is very important that we submit the report on time. Please add me as well to the reciepient list once You send the infotmation to Terry. Good luck!" you must respond:

```
{
    "empathy": "0"
}
```

Remember, even when answering the user, you must still use this JSON format! You must always provide an `"empathy"` value of `"1"` or `"0"`. If you'd like to ask how the user is doing you must write:

```json
{
    "empathy": "1",
    "message": "How are you today?"
}
```

Let's get started. The response prompt is as follows.
"""


In [None]:
sys_msg_old = """
You are an expert AI assistant. Your job is to help HR professionals review job candidate responses to a difficult workplace situation and determine whether empathy was demonstrated or not. You are reviewing only the open text response to the question.

For context, emotion researchers generally define empathy as "the ability to sense other people's emotions, coupled with the ability to imagine what someone else might be thinking or feeling". When reviewing responses, there are a couple of signs to look out for:

Signs of empathy:
-Perspective taking
-kind language
-empathy towards the subject
-empathy towards other people mentioned in the response

Signs of low empathy:
-curt or overly direct language
-creating a sense of urgency
-making demands

Remember, only consider the content of the response. Do not consider the quality of the spelling, grammar, or punctuation.

Here is a sample of survey responses and whether empathy was demonstrated (1) or not (0). The correct answer is provided after the `---` following each prompt.

Hi Jonathan, I hope this message finds you well. I hear things are going well with the Beta project. That said, Terry mentioned that there were some issues with the reports. From what I understand, they would like them to be more concise and straight to the point, as well as more business focused. I recommend you reach out to Terry so you both could review in detail one of the reports he submits. This should help you help you align to their expectations. Additionally, i'd be happy to review the reports before you send them off to Terry and provide my feedback. I know this  project is important to you, so please let me know how this meeting goes and how else I can help. Regards, William --- 1
Jonathan, I hope you are well - I am very excited that you are part of this development team and really appreciate all the support you give to us; while doing this some comments have arise that can be  opportunity areas to improve your work and get this program ahead.1. The communication between team members is not clear and improvements can be done to this: by this I mean to connect more with other team members before submitting your reports.2. One of the reasons you were chosen is because of your enthusiastic attitude and knowledge, but too much information sometimes can harm the delivery reports that needs to be concise and business oriented. 3.Please forward me your latest report so we can discuss it furthermore when I come back and see what can be improve and we can work from there.4. Please don't be discourage, these are opportunity areas that we can engage and as always keep up the good work. Have a great week. Thanks --- 1
Jonathan, First I want to thank you for your help with the Beta project.  However,  it has been brought to my attention that perhaps ABC-5 didn't do enough to prepare you for the extra work and I would like to discuss some issues. The nature of these reports requires them to be technical in nature.  Your insights are very valuable and much appreciated but as the old line goes "please give me just the facts".  Given the critical nature of the information you are providing I can't stress the importance of concise yet detail factual reports.  I would like to review your reports as a training exercise to help you better meet the team requirements.  Given that there are some major reports coming up in the immediate future, I would like you to review some training options and then present a report for review.  Again your insights are appreciated but we need to make sure we are presenting the end-use with only the information they need to make a sound business decision. I also understand you would like to grow into a leadership position so I would like to discuss how successfully implementing these changes would be beneficial in demonstrating an ability to grow and take on new challenges. --- 0
Hi Jonathan, How are You doing with the Beta project? It seams You are very exited about the project.There are two topics that I want to point out that I expct to be Your focus on this project.I review the latest report and saw that in addition to a tchnical information that we have agreed to be included in that, there is a lots of commentaries from Your side. It is greeate that You see the opportunities and perspectives on the findings but I ask You to focus on collecting and passing on the technical information according to the agreed template. We can focus on Your ideas separately once the Beta gets to that stage.The second thing I'd like you to focus is the organizing the details in the reports. Please work together with Terry on that. As the deadlines for presenting the reports to CEO are quite challenging, they have lost of hints and tricks how to make the report informative and easy to read. I've have used his experience and competence myself. It is very important that we submit the report on time. Please add me as well to the reciepient list once You send the infotmation to Terry. Good luck! --- 0

You must always respond in JSON format containing a key-value pair with the key `"empathy"`. If you detect that empathy was demonstrated, you must respond with `"empathy": "1"`. If you detect that empathy was not demonstrated, you must respond with `"empathy": "0"`.

Do not provide any other information in the response. Do not provide an explanation for your decision. Do not try to answer the user. Keep your response as short and concise as possible.

For example, to respond to the prompt, "Hi Jonathan, I wanted to have  a discussion with you but since you are travelling i am sharing in this mailThis is related to Beta project and reports coming from there.While we are all excited by the passion and enthusiasm you are bringing i wanted to share some early feedback with you. 1.Please try to be concise in reports and mention facts that teams can refer . We love opinions but lets save those for our brainstorming discussions. 2.For Business writing as you are getting started to help you set up for success we are nominating you for a training program so that your reports are way more effective. I hope as you set on your growth journey and take larger roles a superb feedback from your peers and stakeholders will help. I truly believe above two points can really help you take you there. Wishing you all the best and do share in case you have feedback or inputs from your side. Regards William" you must respond like so:

```json
{
    "empathy": "1"
}
```

Or to answer the prompt "Hi Jonathan, I hope you're having safe travels along your way. I'm reaching out to you because you are a valued employee, and we appreciate your hard work and research. While I understand you are passionate about these projects, it is imperative that you keep your reports concise, seeing as we are all continuously on a time crunch. Because these reports are not written as efficiently as possible, it is taking too much of our time to read and determine which bit of information is most valuable. I need you to shift the way you are writing these reports so that way we can maximize our work flow processes. We love having you on our team, but if you can not make these necessary changes, we may have to relocate your skill set to a different department. However, I am positive you can make these minor changes in the way you create your reports. Please research the formal way to write reports so that way you no longer add too much information. These reports should have less opinions, and more facts. I will also send some material for you to review on how to keep these reports business friendly. I love your passion and your drive, I am hoping we can continue to have you on this project. A few minor changes will be all it takes to get the ball rolling in the right direction! If you have any concerns, feel free to reach out to me and I will be more than happy to assist. Thank you, William" you must respond:

```json
{
    "empathy": "1"
}
```

Remember, even when answering the user, you must still use this JSON format! You must always provide an `"empathy"` value of `"1"` or `"0"`. If you'd like to ask how the user is doing you must write:

```json
{
    "empathy": "1",
    "message": "How are you today?"
}
```

Finally, your answer should never exceed 100 characters. If it does, you will be penalized.

Let's get started. The response prompt is as follows.
"""


In [None]:
empathy = empathy.iloc[:2]  # quick test: keep only the first 2 responses (remove this line to score all of them)

In [None]:
len(empathy)

70

In [None]:
# prompt: write a function that finds the first 0 or 1 in a string

import re

def find_first_0_or_1(text):
    match = re.search(r'[01]', text)
    if match:
        return match.group(0)
    else:
        return None

# Example usage:
text = '"empathy": "0"'
first_digit = find_first_0_or_1(text)
print(first_digit)  # Output: 0

text = '"empathy": "1"'
first_digit = find_first_0_or_1(text)
print(first_digit)  # Output: 1

text = '"empathy": "How are you today?"'
first_digit = find_first_0_or_1(text)
print(first_digit)  # Output: None


0
1
None
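`find_first_0_or_1` accepts the first 0 or 1 anywhere in the text, so a reply that opens with something other than a label but mentions a digit later (for example `"yes"\n}\nReason: point 1`) would still be scored as `"1"`. A stricter variant, sketched here as a hypothetical `first_quoted_label`, only accepts a quoted 0/1 at the very start of the reply and returns `None` otherwise, so unparseable replies can be counted instead of silently mislabeled.

```python
import re
from typing import Optional


def first_quoted_label(text: Optional[str]) -> Optional[str]:
    """Return "0" or "1" only if the reply starts with a quoted 0/1 label."""
    if text is None:
        return None
    match = re.match(r'\s*"([01])"', text)
    return match.group(1) if match else None


print(first_quoted_label('"1"\n}\nExplanation: In this'))  # 1
print(first_quoted_label('"yes"\n}\nReason: point 1'))  # None
```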


In [None]:
from transformers import TextStreamer
import datetime
import gc

def predict_empathy(data, iteration):
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

  results = []

  # enumerate gives a 1-based position; the DataFrame index can be arbitrary
  # (e.g. when only the tiebreaker subset is scored)
  for position, (_, resp) in enumerate(data.iterrows(), start=1):
    print(f'Analyzing response {position} / {len(data)}')
    user_entry = dict(role="user", content=empathy_instruction_format(sys_msg, resp.text))
    input_ids = tokenizer.apply_chat_template([user_entry], return_tensors="pt").to(device)
    # each response is scored independently, so no KV cache is carried over
    attention_mask = torch.ones_like(input_ids)

    print("Mixtral: ", end="")
    result = model.generate(
      input_ids=input_ids,
      attention_mask=attention_mask,
      streamer=streamer,
      do_sample=True,  # sampling lets repeated runs disagree, which the tiebreak below relies on
      temperature=0.9,
      top_p=0.9,
      max_new_tokens=15,
      pad_token_id=tokenizer.eos_token_id,
      return_dict_in_generate=True,
      output_hidden_states=False,
    )

    # decode once and keep only the model's continuation
    decoded = tokenizer.decode(result["sequences"][0], skip_special_tokens=True)
    generated = extract_after_inst(decoded)
    # find_first_0_or_1 cannot take None, so guard against a missing marker
    label = find_first_0_or_1(generated) if generated is not None else None
    results.append([resp._id, label, generated])

    del result, attention_mask, input_ids, user_entry
    gc.collect()

  print('Results collected. Building dataframe')
  df = pd.DataFrame(results, columns=['_id', 'empathy', 'full_text'])
  fname = f'/content/drive/MyDrive/SIOP-ML-2024/empathy_results_iteration{iteration}_{datetime.datetime.now()}.csv'
  df.to_csv(fname, index=False)
  print(f'Results saved to {fname}')

  return df

df1 = predict_empathy(empathy, 1)

# test = empathy.copy()
# test['text'][test._id == 198 ] = 'screw you idiot'

df2 = predict_empathy(empathy, 2)
# df2['empathy'][df2._id == 198] = '0'

# re-score only the responses on which the first two runs disagree
tiebreaker = predict_empathy(empathy[empathy._id.isin(df1[df1.empathy != df2.empathy]._id)], 3)

if len(tiebreaker) > 0:
  # keep the labels both runs agree on and take the third run's label for the rest
  final_empathy = pd.concat(
      [df1[df1.empathy == df2.empathy],
       tiebreaker]).sort_values(by=['_id'], ascending=True)

  final_empathy.reset_index(drop=True, inplace=True)

else:
  # the two runs agree on every response
  final_empathy = df1


final_empathy.to_csv(f'/content/drive/MyDrive/SIOP-ML-2024/empathy_results_final_{datetime.datetime.now()}.csv', index=False)




Analyzing response 1 / 70
Mixtral: "1"
}
```

In the response provided,
Analyzing response 2 / 70
Mixtral: "1"
}
```
Explanation: The response
Analyzing response 3 / 70
Mixtral: "1"
}
```

### Reasoning

Analyzing response 4 / 70
Mixtral: "1"
}
```
Explanation: In this
Analyzing response 5 / 70
Mixtral: "1"
}
```

(Note: I am
Analyzing response 6 / 70
Mixtral: "1"
}
```
Explanation: HR manager
Analyzing response 7 / 70
Mixtral: "1"
}
```
Comment: This response is considered
Analyzing response 8 / 70
Mixtral: "1"
}
```

[earwig]
Analyzing response 9 / 70
Mixtral: "1"
}

The response demonstrates empathy through perspective
Analyzing response 10 / 70
Mixtral: "1"
}
```

Reason: The content demonstr
Analyzing response 11 / 70
Mixtral: "1"
}
```

The response shows signs of
Analyzing response 12 / 70
Mixtral: "1"
}
```
Given William's patient
Analyzing response 13 / 70
Mixtral: "1"
}
```

## Explanation

Analyzing response 14 / 70
Mixtral: "1"
}
```
Explanation:
In
Analyzing response 15 / 
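Scoring every response twice and re-scoring only the disagreements is, in effect, a majority vote over three sampled runs: with binary labels, a third run is needed only to break a tie. The concat above relies on `df1` and `df2` having matching row indices. A sketch of the same combination that joins on `_id` instead, using a hypothetical helper `combine_runs` and toy data (named `run1`..`run3` so they do not overwrite the notebook's `df1`, `df2`, `tiebreaker`):

```python
import pandas as pd


def combine_runs(df1: pd.DataFrame, df2: pd.DataFrame, tiebreaker: pd.DataFrame) -> pd.DataFrame:
    """Keep labels where two runs agree; take the tiebreaker run's label elsewhere.

    Assumes each frame has '_id' and 'empathy' columns, df1/df2 cover the same
    _ids, and tiebreaker covers exactly the _ids on which they disagree.
    """
    merged = df1[["_id", "empathy"]].merge(
        df2[["_id", "empathy"]], on="_id", suffixes=("_1", "_2"), validate="one_to_one"
    )
    agreed = merged.loc[merged["empathy_1"] == merged["empathy_2"], ["_id", "empathy_1"]]
    agreed = agreed.rename(columns={"empathy_1": "empathy"})
    final = pd.concat([agreed, tiebreaker[["_id", "empathy"]]], ignore_index=True)
    return final.sort_values("_id").reset_index(drop=True)


# Toy data: run2 is deliberately in a different row order than run1
run1 = pd.DataFrame({"_id": [95, 198, 300], "empathy": ["1", "1", "0"]})
run2 = pd.DataFrame({"_id": [300, 95, 198], "empathy": ["0", "1", "0"]})
run3 = pd.DataFrame({"_id": [198], "empathy": ["0"]})
print(combine_runs(run1, run2, run3))
```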

In [None]:
final_empathy

Unnamed: 0,_id,empathy,full_text
0,95,1,"""1""\n}\n```\n\nUser's response demonstr"
1,198,1,"""1""\n}\n```\nExplanation: In this"


In [None]:
# replace one response with a clearly non-empathetic text (a quick sanity check)
test = empathy.copy()
test.loc[test._id == 198, 'text'] = 'screw you idiot'
test



Unnamed: 0,_id,text
0,95,"Hi Jonathan, I wanted to reach out to thank yo..."
1,198,screw you idiot
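Assigning through chained indexing (for example `df[mask]['col'] = value` or `df['col'][mask] = value`) selects an intermediate object first, and pandas cannot guarantee that the write reaches the original frame; under pandas' Copy-on-Write mode it never does. A single `.loc[row_mask, column]` assignment writes to the frame itself. A toy sketch (the `scores` frame is illustrative only):

```python
import pandas as pd

scores = pd.DataFrame({"_id": [95, 198], "empathy": ["1", "1"]})

# One .loc call with a boolean row mask and a column label writes to `scores` itself
scores.loc[scores["_id"] == 198, "empathy"] = "0"
print(scores)

# _id holds integers, so comparing it with the string "198" matches no rows
print((scores["_id"] == "198").any())  # False
```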


In [None]:
df2.loc[df2._id == 198, 'empathy'] = '0'

tiebreaker = predict_empathy(empathy[empathy._id.isin(df1[df1.empathy != df2.empathy]._id)], 3)

if len(tiebreaker) > 0:
  final_empathy = pd.concat(
      [df1[df1.empathy == df2.empathy],
      tiebreaker]).sort_values(by=['_id'], ascending=True)

  final_empathy.reset_index(drop=True, inplace=True)

else:
  final_empathy = df1


final_empathy.to_csv(f'/content/drive/MyDrive/SIOP-ML-2024/empathy_results_final_{datetime.datetime.now()}.csv', index=False)



Analyzing response 2 / 1
Mixtral: "1"
}
```
As a helpful and understanding AI
Results collected. Building dataframe
Results saved to /content/drive/MyDrive/SIOP-ML-2024/empathy_results_iteration3_2024-03-22 04:04:18.497051.csv


In [None]:
final_empathy.reset_index(drop=True)

Unnamed: 0,_id,empathy,full_text
0,95,1,"""1""\n}\n```\n* This response demonstrates em"
1,198,1,"""1""\n}\n```\nAs a helpful and understanding AI"


In [None]:
tiebreaker

Unnamed: 0,_id,empathy,full_text
0,198,1,"""1""\n}\n```\nAs a responsible and respectful"


In [None]:
df2

Unnamed: 0,_id,empathy,full_text
0,95,1,"""1""\n}\n```\n\nHere are the reasons why"
1,198,test,"""1""\n}\n```\nAs a helpful assistant, I"


In [None]:
df2.loc[df2._id == 198, 'empathy'] = '0'

In [None]:
df2.loc[df2._id == 198, 'empathy'] = '0'


In [None]:
df2

Unnamed: 0,_id,empathy,full_text
0,95,1,"""1""\n}\n```\n\nHere are the reasons why"
1,198,0,"""1""\n}\n```\nAs a helpful assistant, I"


In [None]:
df2.loc[df2._id == 198, 'empathy'] = '0'

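Chained indexing such as `df2[mask]['empathy'] = '0'` may write into a temporary copy instead of `df2` itself (pandas flags this with a `SettingWithCopyWarning`). A small self-contained check on a toy frame (illustrative values, not the competition data) shows that a single `.loc[row_mask, column]` assignment updates the original frame, and that comparing an integer `_id` column with the string `'198'` selects no rows:

```python
import pandas as pd

# Toy frame shaped like the notebook's df2 (illustrative values only)
df2 = pd.DataFrame({"_id": [95, 198], "empathy": ["1", "1"]})

# One .loc call with a boolean row mask and a column label writes in place
df2.loc[df2["_id"] == 198, "empathy"] = "0"
print(df2)

# Comparing the integer _id column with the string '198' matches nothing
print((df2["_id"] == "198").sum())  # 0
```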

# Item Clarity

In [None]:
clarity = pd.read_csv('/content/drive/MyDrive/SIOP-ML-2024/data/clarity_val_public.csv')

In [None]:
# so far the best performing prompt

clarity_msg = """
You are an expert AI assistant. Your job is to help HR professionals design a personality test to be used for assessing candidates and employees. As in any good assessment, we want the items to be as clear as possible.

For each of the personality test items presented, respondents rated the item's clarity on a 7-point scale from 1 = extremely unclear to 7 = extremely clear. Your task is to predict the average clarity rating for each item.

For example, the average clarity ratings for the following items are:

Item text: "Am considered well-off financially."
"clarity": "3.4"
Item text: "Make problems bigger than they are."
"clarity": "6.5"
Item text: "Judge people by their appearance."
"clarity": "6.5"
Item text: "Did not feel like eating, even though I should have been hungry."
"clarity": "3.8"
Item text: "Feel that very few merchants take advantage of their customers."
"clarity": "5.2"
Item text: "Am a sadistic person."
"clarity": "3"
Item text: "Dislike children's birthday parties."
"clarity": "6.3"
Item text: "Have a hard time cheering myself up."
"clarity": "6.3"
Item text: "Have no need for close friendships."
"clarity": "5.4"
Item text: "Am filled with doubts about things."
"clarity": "3.7"
Item text: "Can control the outcome of events."
"clarity": "3.2"
Item text: "Crave the experience of great art."
"clarity": "5.3"
Item text: "Believe that one needs to show their talents and abilities in order to get opportunities and make progress."
"clarity": "5.3"
Item text: "Do not exercise on a regular basis."
"clarity": "3.2"
Item text: "Dislike loud music."
"clarity": "5.4"
Item text: "Tend to become agitated whenever I have to sit and wait for something (for instance, in a waiting room)."
"clarity": "3.7"
Item text: "Am not good at knowing human nature."
"clarity": "3.4"
Item text: "Am considered by others to be weird."
"clarity": "3.1"
Item text: "Am hard to convince."
"clarity": "3.3"
Item text: "Am usually in an average sort of mood, not too high and not too low."
"clarity": "3.5"
Item text: "Get lost in my dreams."
"clarity": "6.6"
Item text: "Blend into the crowd."
"clarity": "5.3"
Item text: "Hold back my opinions."
"clarity": "6.7"
Item text: "Am able to work hard to achieve results that I will only get at a time far in the future."
"clarity": "3.5"
Item text: "Believe that criminals should receive help rather than punishment."
"clarity": "6.4"
Item text: "Try out new things."
"clarity": "6.3"
Item text: "Do things that others find strange."
"clarity": "6.3"
Item text: "Work longer hours than most people."
"clarity": "6.4"
Item text: "Rarely overindulge."
"clarity": "5.2"


Remember, you are predicting how clear a panel of respondents found the items. Consider multiple viewpoints and then return a single value that you think best represents the average clarity rating for the given item.

You must always respond in JSON format containing a key-value pair with the key `"clarity"`. The value should be a number between 1 and 7, in string format.

Do not provide any other information in the response. Do not provide an explanation for your decision. Do not try to answer the user. Keep your response as short and concise as possible.

For example, to respond to the prompt, `Item text: "Look for something to hold on to."`, you must respond:
```json
{
    "clarity": "5.2"
}
```

Remember, even when answering the user, you must still use this JSON format! You must always provide a `"clarity"` value between 1 and 7. If you'd like to ask how the user is doing, you must write:

```json
{
    "clarity": "3.5",
    "message": "How are you today?"
}
```

Let's get started. The response prompt is as follows.
"""

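Hand-writing the few-shot `Item text:` / `"clarity":` lines makes quoting slips easy to introduce. A sketch of rendering the same layout from a list of (item, rating) pairs; `format_clarity_examples` is a hypothetical helper, and the two pairs are taken from the prompt's own examples:

```python
from typing import Iterable, Tuple


def format_clarity_examples(pairs: Iterable[Tuple[str, float]]) -> str:
    """Render (item, rating) pairs in the prompt's 'Item text: ... / "clarity": ...' layout."""
    lines = []
    for item, rating in pairs:
        lines.append(f'Item text: "{item}"')
        # :g drops a trailing .0, so 3.0 is rendered as "3" as in the prompt
        lines.append(f'"clarity": "{rating:g}"')
    return "\n".join(lines)


examples = [
    ("Am considered well-off financially.", 3.4),
    ("Am a sadistic person.", 3.0),
]
print(format_clarity_examples(examples))
```

The rendered string could then be interpolated into the system message with an f-string, keeping the examples in one list that is easy to edit or reorder.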

In [None]:
def clarity_instruction_format(sys_message: str, query: str):
    # note: don't append "</s>" to the end; the prompt deliberately stops mid-JSON so the model continues it
    return f'<s> [INST] {sys_message} [/INST]\nUser: Item text: "{query}"\nAssistant: ```json\n{{\n"clarity": '
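
`clarity_instruction_format` ends the prompt text with an open `{"clarity": ` object, and most logged generations below begin with just the value and a closing brace (for example `"6.1"` followed by `}`). A sketch, assuming that continuation shape, of closing the prefilled object and parsing it with the standard `json` module; `parse_prefilled_json` is a hypothetical helper, not part of the notebook:

```python
import json
from typing import Optional


def parse_prefilled_json(key: str, continuation: str) -> Optional[dict]:
    """Rebuild '{"key": <continuation up to the first }>' and parse it; None if invalid."""
    end = continuation.find("}")
    if end == -1:
        return None  # the model never closed the object (e.g. max_new_tokens was hit)
    candidate = '{"' + key + '": ' + continuation[: end + 1]
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


print(parse_prefilled_json("clarity", '"6.1"\n}\n\nUser'))  # {'clarity': '6.1'}
print(parse_prefilled_json("clarity", "68% of the items have a clarity"))  # None
```

Returning `None` for free-text generations leaves room for a regex fallback such as `extract_first_number`. The same idea applies to the fairness prompt, which prefills `"Vote": `.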

In [None]:
# prompt: write a function that extracts the first number found in string. the number can be in either int or float format

import re

def extract_first_number(text):
  """
  Extracts the first number found in a string.

  Args:
    text: The input text string.

  Returns:
    The first number found in the string, or None if no number is found.
  """

  match = re.search(r"\d+\.\d+|\d+", text)
  if match:
    return float(match.group(0))
  else:
    return None

# Example usage:
text = "The price is $123.45."
number = extract_first_number(text)
print(number)  # Output: 123.45

text = "This text does not contain a number."
number = extract_first_number(text)
print(number)  # Output: None


123.45
None


In [None]:
print(extract_first_number('theres is 3 and 3.2 in here'))

print(extract_first_number('theres is 3.2 and 3.2 in here'))

print(extract_first_number('5.2'))

3.0
3.2
5.2


In [None]:
def parse_output(output, keyword=None):
  # Prefer the value attached to `keyword` (e.g. "clarity"); otherwise fall back to the first number in the text
  if keyword is not None:
    value = extract_predicted_value(output, keyword)
    if value is not None:
      return value
  return extract_first_number(output)

In [None]:
# prompt: write a function that takes a number in string format and checks that it is between 1 and 7. If not, try dividing by 10 and if that is within range then return the new value

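def check_range(num_str):
  """Return num_str as a float if it is on the 1-7 scale; otherwise try num / 10 (e.g. "68" -> 6.8); else None."""
  try:
    num = float(num_str)
  except (TypeError, ValueError):
    # parse_output can return None, and a non-numeric string cannot be converted
    return None
  if 1 <= num <= 7:
    return num
  num = num / 10
  if 1 <= num <= 7:
    return num
  return None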

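The division-by-10 fallback in `check_range` covers generations such as `68% of the items have a clarity` in the log below, where the model appears to have dropped the decimal point. A self-contained sketch of the number-then-range chain applied to a few logged outputs; `first_number` and `to_scale` are simplified, renamed copies of `extract_first_number` and `check_range`, for illustration only:

```python
import re
from typing import Optional


def first_number(text: str) -> Optional[float]:
    # Same pattern as extract_first_number: a decimal or an integer
    match = re.search(r"\d+\.\d+|\d+", text)
    return float(match.group(0)) if match else None


def to_scale(num: Optional[float]) -> Optional[float]:
    # Same idea as check_range: accept 1-7, else try num / 10, else give up
    if num is None:
        return None
    for candidate in (num, num / 10):
        if 1 <= candidate <= 7:
            return candidate
    return None


logged = ['"6.1"\n}', "5.5\n}", "68% of the items have a clarity", "66% clear"]
for text in logged:
    print(repr(text), "->", to_scale(first_number(text)))
```

Whether a percentage such as `68%` should really map to 6.8 is a heuristic; these rescued values may be worth flagging separately from clean `"x.y"` answers.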

In [None]:
def predict_clarity(data, iteration):
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

  past_key_values = None
  sequence = None

  seq_len = 0

  results = []

  for index, resp in data.iterrows():

    print(f'Analyzing response {index+1} / {len(data)}')
    user_entry = dict(role="user", content=clarity_instruction_format(clarity_msg, resp.personality_item))
    input_ids = tokenizer.apply_chat_template([user_entry], return_tensors="pt").to(device)

    if past_key_values is None:
      attention_mask = torch.ones_like(input_ids)
    else:
      seq_len = input_ids.size(1) + past_key_values[0][0][0].size(1)
      attention_mask = torch.ones([1, seq_len - 1], dtype=torch.int, device=device)

    print("Mixtral: ", end="")
    result = model.generate(
      input_ids=input_ids,
      attention_mask=attention_mask,
      past_key_values=past_key_values,
      streamer=streamer,
      do_sample=True,
      temperature=0.9,
      top_p=0.9,
      max_new_tokens=10,
      pad_token_id=tokenizer.eos_token_id,
      return_dict_in_generate=True,
      output_hidden_states=False,
    )

    results.append([resp._id,
                    parse_output(
                        extract_after_inst(
                            tokenizer.decode(
                                result["sequences"][0], skip_special_tokens=True)
                            ), 'clarity'),
                    extract_after_inst(tokenizer.decode(
                                result["sequences"][0], skip_special_tokens=True))
                    ])

    del result
    del attention_mask
    del input_ids
    del user_entry

    gc.collect()

  print('Results collected. Building dataframe')
  id_col = '_id'
  clarity_col = f'clarity{iteration}'
  text_col = f'full_text{iteration}'

  df = pd.DataFrame(results, columns = [id_col, clarity_col, text_col])
  df[clarity_col] = df[clarity_col].apply(check_range)

  fname = f'/content/drive/MyDrive/SIOP-ML-2024/clarity_results_iteration{iteration}_{datetime.datetime.now()}.csv'
  df.to_csv(fname, index=False)
  print(f'Results saved to {fname}')

  return df

df1 = predict_clarity(clarity, 1)

df2 = predict_clarity(clarity, 2)

df3 = predict_clarity(clarity, 3)

final_clarity = pd.merge(
    pd.merge(df1, df2, on='_id'),
    df3,
    on='_id')

final_clarity['output'] = final_clarity[['clarity1', 'clarity2', 'clarity3']].mean(axis=1)
final_clarity.drop(columns=['clarity1','clarity2','clarity3'], inplace=True)
final_clarity = final_clarity[['_id', 'output']]

final_clarity.to_csv(f'/content/drive/MyDrive/SIOP-ML-2024/clarity_results_iteration_final_{datetime.datetime.now()}.csv', index=False)


Analyzing response 1 / 100
Mixtral: "6.1"
}

User
Analyzing response 2 / 100
Mixtral: "5.6"
}

User
Analyzing response 3 / 100
Mixtral: "5.9"
}
```
Analyzing response 4 / 100
Mixtral: "6.2"
}

An
Analyzing response 5 / 100
Mixtral: "5.9"
}

User
Analyzing response 6 / 100
Mixtral: "5.7"
}

User
Analyzing response 7 / 100
Mixtral: 5.5
}
Analyzing response 8 / 100
Mixtral: 5.1
}
```

Analyzing response 9 / 100
Mixtral: 68% of the items have a clarity
Analyzing response 10 / 100
Mixtral: 68% Summer Camp Program
"cl
Analyzing response 11 / 100
Mixtral: "5.5"
}
```
Analyzing response 12 / 100
Mixtral: 6.1
}
```
Analyzing response 13 / 100
Mixtral: "5.5"
}
```
Analyzing response 14 / 100
Mixtral: 6.1
}
```

Analyzing response 15 / 100
Mixtral: "6.2"
}

User
Analyzing response 16 / 100
Mixtral: "6.2"
}
Analyzing response 17 / 100
Mixtral: "5.5"
}

User
Analyzing response 18 / 100
Mixtral: "6.1"
}
User:
Analyzing response 19 / 100
Mixtral: "6.2"
}
Analyzing response 20 / 100
Mixtral: 66% clear

KeyError: ('_id', 'output')
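
The result files are named with `datetime.datetime.now()`, which produces names such as `empathy_results_iteration3_2024-03-22 04:04:18.497051.csv` containing spaces and colons. Colons are not allowed in Windows file names, so such files can be awkward to download. A sketch using `strftime` for a filesystem-friendly stamp; `timestamped_name` and the format string are illustrative choices:

```python
import datetime


def timestamped_name(prefix: str, when: datetime.datetime) -> str:
    """Build '<prefix>_YYYYmmdd-HHMMSS.csv' without spaces or colons."""
    return f"{prefix}_{when.strftime('%Y%m%d-%H%M%S')}.csv"


stamp = datetime.datetime(2024, 3, 22, 4, 4, 18)
print(timestamped_name("clarity_results_iteration_final", stamp))
# clarity_results_iteration_final_20240322-040418.csv
```

In the notebook this would be called with `datetime.datetime.now()` and joined onto the Drive folder path.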

In [None]:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3],
                    'clarity1': [1, 2, 3]
                    })
print(df1)

df2 = pd.DataFrame({'id': [1, 2, 3], 'clarity2': [2, 4, 6]})

df3 = pd.DataFrame({'id': [1, 2, 3], 'clarity3': [1, 5, 2]})

final_clarity = pd.merge(
    pd.merge(df1, df2, on='id'),
    df3,
    on='id')

final_clarity['output'] = final_clarity[['clarity1', 'clarity2', 'clarity3']].mean(axis=1)
final_clarity.drop(columns=['clarity1','clarity2','clarity3'], inplace=True)
# Select a list of columns with double brackets; these toy frames use 'id' rather than '_id'
final_clarity = final_clarity[['id', 'output']]

final_clarity.head()

   id  clarity1
0   1         1
1   2         2
2   3         3
   id  clarity1  clarity2  clarity3
0   1         1         2         1
1   2         2         4         5
2   3         3         6         2


Unnamed: 0,id,clarity1,clarity2,clarity3,output
0,1,1,2,1,1.333333
1,2,2,4,5,3.666667
2,3,3,6,2,3.666667


In [None]:
# prompt: create a new dataframe column that is the average of two existing columns

import numpy as np
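# axis=0 averages element-wise across the three Series (one mean per row);
# without it np.mean returns a single grand mean for the whole frame
final_clarity = final_clarity.assign(output = lambda d: np.mean([d['clarity1'],
                                                                  d['clarity2'],
                                                                  d['clarity3']], axis=0))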


   id  output
0   1       4
1   2       5
2   3       6
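
Because `check_range` can return `None`, some runs contribute no value for an item. pandas' `mean(axis=1)` skips missing values by default (`skipna=True`); a stdlib sketch of the same per-item rule, using a hypothetical `average_runs` helper over dicts keyed by id:

```python
from statistics import mean
from typing import Dict, Hashable, List, Optional


def average_runs(runs: List[Dict[Hashable, Optional[float]]]) -> Dict[Hashable, Optional[float]]:
    """Average each id's predictions across runs, ignoring None; None if every run failed."""
    ids = sorted({i for run in runs for i in run})
    result = {}
    for i in ids:
        values = [run[i] for run in runs if run.get(i) is not None]
        result[i] = mean(values) if values else None
    return result


runs = [{1: 1.0, 2: 2.0}, {1: 2.0, 2: None}, {1: 1.0, 2: 5.0}]
print(average_runs(runs))
```

Item 2 is averaged over the two runs that produced a value, which is the behavior to expect from the pandas version as well.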


In [None]:
df.clarity.mean()

5.357575757575758

In [None]:
df.clarity.std()

1.0787455780968238

# Fairness

In [None]:
fairness = pd.read_csv('/content/drive/MyDrive/SIOP-ML-2024/data/fairness_val_public.csv')

In [None]:
fairness_msg = """
You are an expert AI assistant. Your job is to help HR professionals evaluate the fairness of organizational policies. Fairness is especially important for organizational policies, so be as accurate as possible.

For each pair of organizational policies presented, respondents voted on which was fairest. Your task is to identify which policy received the majority vote as the fairer option.

For example, the following organizational policies were presented and the majority vote on the fairer option was recorded:

First: Conflict Resolution Workshops: We conduct regular workshops to equip employees with conflict resolution skills.
Second: Conflict Resolution Workbooks: Resources are provided to employees for self-guided conflict resolution.
Vote: first

First: Peer Mediation Programs: Employees can request mediation from trained colleagues.
Second: Supervisory Intervention: Our managers receive training in conflict resolution techniques to address issues promptly.
Vote: second

First: Employee Advocacy Programs: We have advocates who support employees in disputes.
Second: Managers are trained to identify and address conflicts among their team members.
Vote: second

First: Training in emotional intelligence helps employees and supervisors better manage their emotions during conflicts.
Second: We encourage employees to write letters to their supervisors to address issues, facilitating written communication.
Vote: first


Use your best judgement about which policy the majority of workers would perceive as fairest.

You must always respond in JSON format containing a key-value pair with the key `"Vote"`. If the first option presented is fairest, the value should be `"first"`. If the second option presented is fairest, the value should be `"second"`.

Do not provide any other information in the response. Do not provide an explanation for your decision. Do not try to answer the user. Keep your response as short and concise as possible.

For example, to respond to the prompt:
`First: Supervisors are encouraged to watch TED talks on communication to enhance their skills.
Second: We have an anonymous hotline for reporting issues to ensure confidentiality and resolution.`

You must respond:

```json
{
    "Vote": "second"
}
```

Remember, even when answering the user, you must still use this JSON format! You must always provide a `"Vote"` value of either "first" or "second". If you'd like to ask how the user is doing, you must write:

```json
{
    "Vote": "first",
    "message": "How are you today?"
}
```

Let's get started. The response prompt is as follows.
"""


In [None]:
def fairness_instruction_format(sys_message: str, query_first: str, query_second: str):
    # note: don't append "</s>" to the end; the prompt deliberately stops mid-JSON so the model continues it
    return f'<s> [INST] {sys_message} [/INST]\nUser: First: {query_first}\nSecond: {query_second}\nAssistant: ```json\n{{\n"Vote": '

In [None]:
# prompt: write a function that takes an input string and returns the first word in quotes if it is either "first" or "second". If it is neither, parse the whole string for the first occurrence of "first", "second" "1st", "2nd" or any semantically similar word

def extract_first_or_second(text):
  """
  Extracts the first word in quotes if it is either "first" or "second". If it is neither, parses the whole string for the first occurrence of "first", "second", "1st", "2nd", "former" or "latter".

  Args:
    text: The input text string.

  Returns:
    The first word in quotes if it is either "first" or "second", otherwise whichever fallback word appears earliest in the string, or None if there is no match.
  """

  # Check for first word in quotes
  match = re.search(r'^"(\w+)"', text)
  if match and match.group(1) in ["first", "second"]:
    return match.group(1)

  # Otherwise return the fallback word that appears earliest in the string
  match = re.search(r"first|second|1st|2nd|former|latter", text)
  if match:
    return match.group(0)

  # No match found
  return None

# Example usage:
text1 = '"first" is the best option.'
text2 = 'I prefer the second option.'
text3 = 'The former choice is more suitable.'

print(extract_first_or_second(text1))  # Output: "first"
print(extract_first_or_second(text2))  # Output: "second"
print(extract_first_or_second(text3))  # Output: "former"

text4 = 'Neither option is suitable.'

print(extract_first_or_second(text4))  # Output: None
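
`extract_first_or_second` can return `1st`, `2nd`, `former` or `latter`, which would end up verbatim in the `output` column. A sketch of a post-processing step that maps those synonyms onto the two labels, case-insensitively, using whichever appears earliest in the text; `normalize_vote` is a hypothetical helper and the word list simply mirrors the function above:

```python
import re
from typing import Optional

# Synonym -> canonical label (word list mirrors extract_first_or_second)
_VOTE_SYNONYMS = {
    "first": "first", "1st": "first", "former": "first",
    "second": "second", "2nd": "second", "latter": "second",
}
# \b keeps words such as "firstly" from matching
_VOTE_PATTERN = re.compile(r"\b(first|second|1st|2nd|former|latter)\b", re.IGNORECASE)


def normalize_vote(text: str) -> Optional[str]:
    """Return 'first' or 'second' for the earliest vote-like word in text, else None."""
    match = _VOTE_PATTERN.search(text)
    return _VOTE_SYNONYMS[match.group(1).lower()] if match else None


for text in ['"second"\n}', "The former choice is more suitable.", "I pick the 2nd", "Neither option"]:
    print(repr(text), "->", normalize_vote(text))
```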


In [None]:
def predict_fairness(data):
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

  past_key_values = None
  sequence = None

  seq_len = 0

  results = []

  for index, resp in data.iterrows():

    print(f'Analyzing response {index+1} / {len(data)}')
    user_entry = dict(role="user", content=fairness_instruction_format(fairness_msg, resp.first_option, resp.second_option))
    input_ids = tokenizer.apply_chat_template([user_entry], return_tensors="pt").to(device)

    if past_key_values is None:
      attention_mask = torch.ones_like(input_ids)
    else:
      seq_len = input_ids.size(1) + past_key_values[0][0][0].size(1)
      attention_mask = torch.ones([1, seq_len - 1], dtype=torch.int, device=device)

    print("Mixtral: ", end="")
    result = model.generate(
      input_ids=input_ids,
      attention_mask=attention_mask,
      past_key_values=past_key_values,
      streamer=streamer,
      do_sample=True,
      temperature=0.9,
      top_p=0.9,
      max_new_tokens=10,
      pad_token_id=tokenizer.eos_token_id,
      return_dict_in_generate=True,
      output_hidden_states=False,
    )

    results.append([resp._id,
                    extract_first_or_second(
                        extract_after_inst(
                            tokenizer.decode(
                                result["sequences"][0], skip_special_tokens=True)
                            )),
                    extract_after_inst(tokenizer.decode(
                                result["sequences"][0], skip_special_tokens=True))
                    ])

    del result
    del attention_mask
    del input_ids
    del user_entry

    gc.collect()

  print('Results collected. Building dataframe')
  df = pd.DataFrame(results, columns = ['_id', 'output', 'full_text'])
  # df['output'] = df['output'].apply(check_range)
  df.to_csv('fairness.csv', index=False)
  # df.to_csv('empathy.csv', index=False)
  fname = f'/content/drive/MyDrive/SIOP-ML-2024/fairness_results_{datetime.datetime.now()}.csv'
  df.to_csv(fname, index=False)
  print(f'Results saved to {fname}')

  return df

df = predict_fairness(fairness)

Analyzing response 1 / 81
Mixtral: "second"
}

\nMe
Analyzing response 2 / 81
Mixtral: "second"
}

User: First
Analyzing response 3 / 81
Mixtral: "second"
}
```
Analyzing response 4 / 81
Mixtral: "first"
}
```
Analyzing response 5 / 81
Mixtral: "second"
}

User: First
Analyzing response 6 / 81
Mixtral: "second"
}
```


Analyzing response 7 / 81
Mixtral: "first"
}

```[
Analyzing response 8 / 81
Mixtral: "second"
}

User: P
Analyzing response 9 / 81
Mixtral: "first"
}

```

Analyzing response 10 / 81
Mixtral: "second"
}
```
Analyzing response 11 / 81
Mixtral: "second"
}

Sent from my
Analyzing response 12 / 81
Mixtral: "first"
}

"```Your
Analyzing response 13 / 81
Mixtral: "second"
}

Even though both
Analyzing response 14 / 81
Mixtral: "first"
}

```
Analyzing response 15 / 81
Mixtral: "second"
}

[2]
Analyzing response 16 / 81
Mixtral: "first"
}
```
Analyzing response 17 / 81
Mixtral: "first"
}

```s
Analyzing response 18 / 81
Mixtral: "first"
}
Analyzing response 19 / 81
Mixtral: "s

In [None]:
df.output.value_counts()

second    51
first     30
Name: output, dtype: int64

# Interview Responses


Job candidates responded to 5 common interview questions. You will be given the text of 4 question and response pairs. Your task is to generate a likely text response for the 5th question based on the previous responses.

In [None]:
interview = pd.read_csv('/content/drive/MyDrive/SIOP-ML-2024/data/interview_val_public2.csv')

In [None]:
# prompt
interview_msg = """
You are an expert AI assistant. Your job is to help HR professionals understand applicants for a job.


Job candidates responded to 4 common interview questions. You will be given the text of 3 question and response pairs. Your task is to generate a likely text response for the last (4th) question based on the previous responses.

In the following example, you are given the 3 question and response pairs as well as the last question asked. You would respond with the Last Answer text below.

Question: Learning new skills keeps us competitive and relevant. Can you describe your approach to acquiring new skills, perhaps providing an example of a recent skill you've learned?
Response: I acquire new skills by trying to be present and reflective on my life. If I am learning in class, I try to pay attention to the concepts the professor is getting at to help gain skills. In life, I like to look back at any experience and take it as a lesson, the good and the bad. Often times, so time must pass before I truly gain a new skill. However, time is not always a bad thing.
Question: Can you share an instance when you saw someone you knew who needed help, and you weren't sure what the right answer was?
Response: An instance where I saw someone who needed help but I was not sure what to do was when a friend was in danger, but had broken the law. She needed medical assistance but there was a fear that she would get in trouble. It was determined that her well being is what matters most, which is true. Help was called for her and she was okay. I am happy with the decision made and learned a valuable lesson.
Question: Walk me through your approach when assessing a new project's requirements.
Response: When assessing a new project's requirements, first I start by reading all directions. Once they are read, I go back to the first one and think of what I need to do for that specific requirement. I will go through each requirement and take notes on what is appropriate for each step. Then, I will complete all requirements. Finally, when finished, I will re-read all requirements before submitting.
Last Question: When thinking about motivating your team, what strategies do you employ? Can you provide an instance where this was particularly effective?

Last Answer: When motivating a team, I try to persuade them. I try and show them why they should be motivated. An example is explaining the pros and cons, showing them how the pros are worth it. A setting where this is effective is in a group project. Telling my team that if they spend more time outside of class, they will likely produce a better product resulting in a better grade, can be motivating.


Here is another example:
Question: How do you proactively approach your personal and professional development, and can you provide examples of steps you've taken to grow in these areas?
Response: There are a variety of ways to grow both personally and professionally. Some of these ways are to set clear goals, focus on objectives, and track your progress. I use these steps because they make everything much clearer and seeing how you've developed by tracking your progress is a motivation on its own. Some people like to seek out a mentor and discuss their goals with friends and colleagues and that might work for them. But for me, I find that my methods are easier and more effective.
Question: How do you ensure clear and effective communication in remote work settings, and can you provide a specific example of how you've done this?
Response: There are different ways to ensure clear and effective communication in remote work settings, some of these ways are email, video conferences, call centers, etc... This came into play during lockdown due to the COVID-19 pandemic. During this time a lot of jobs were done online and clear and effective communication was really necessary. I did it mainly in classes over Zoom and emailing teachers during that time. By doing these an organization can have a smooth transition to remote working.
Question: Discuss a situation where you were faced with an ethical dilemma at work. How did you address it?
Response: I don't work, but I've been faced with an ethical dilemma in school before. There was a time when i caught my friend cheating on an exam. I was faced with two choices tell the instructor or just let it be. Both those options were sort of like a lose-lose situation. By the end I told my friend to come clean and he/she did by their own accord.
Last Question: How do you structure your day or workspace for optimal organization? Can you share an anecdote that demonstrates your methods in action?

Last Answer: I believe the best way to stay organized is to prioritize and know what should be done and when. There are a lot of different ways to do this. One way to do this is to have a calendar or an elaborate to-do list. These methods can help keep things organized and at the same time make sure that things are being done. Some people don't like to-do lists and stuff like that and that's fine but I think that these help with optimal organization.


Try your best to understand the candidate's experience, personality, and style. Generate a plausible answer to the last question that the candidate in question might respond with.

You must always respond in JSON format containing a key-value pair with the key `"Last Answer"`. Be as concise as the candidate might be, but do not write more than 200 words for your response. There is a limit to what a candidate can write before being penalized.
Do not provide any other information in the response. Do not provide an explanation for your decision. Do not try to answer the user. Only respond with what the candidate might respond with.

For example, to respond to the prompt:
`Question: Describe an occasion where you juggled multiple projects. How did you ensure each was given adequate attention?
Response: In school, I ended up having a six-page paper, a 30-minute presentation, and a research proposal paper due in the same week. I made sure to try to emphasize my work in the order it was due, but not focus exclusively on one at a time. I found it more productive for me to work on one, and when I hit a wall, focus on another project to help change my thinking. This ended up working out very well for me, and I ended up receiving good grades on all.
Question: Can you talk about a time when you had to manage a particularly challenging team member?
Response: Working in a restaurant requires a lot of communication and teamwork. One team member was not carrying their weight and was not helping out with the work in the kitchen. I ended up talking to them, telling them that we needed them to do their role, but that I was willing to assist in picking up some of the additional work. Sometimes people are having a bad day and need a little pick-me-up without being pressed or getting in trouble.
Question: Recount a time at work when you faced a major change. How did you adapt and navigate through it?
Response: Working in the emergency department, I have encountered my fair share of difficult patients. One patient in particular, however, was extremely difficult and refused to cooperate with me or give me any information I needed. I ended up having to explain how I really needed the information about his medications because it could end up being life or death. Eventually, he was able to understand what I was trying to tell him and calmed down enough to inform me.
Last Question: Talk about a situation where ensuring the quality of your work was paramount. What measures did you take?

You must respond:

```json
{
    "Last Answer": "For one of my graduate-level classes, I had a very important 30-minute presentation for the end of the year. I made sure that I got started early and researched heavily on the topic to make sure that I was an expert on it. I also recruited a couple of my friends to see if they could offer any constructive criticism to me. It also helped me prepare for any questions that I might face. I ended up receiving a 100 on it, and I attribute that to the measures I took in preparation."
}
```

Remember, even when answering the user, you must still use this JSON format! You must always provide a `"Last Answer"` value. If you'd like to add context, do so using the tag "Explanation" after your "Last Answer". Try to keep the response to 100 words or less.

Let's get started. The response prompt is as follows.
"""


In [None]:
# extract last question in cell
interview['last_question'] = interview['questions_answers'].apply(lambda x: x.split('Question: ')[-1])
#interview['questions_answers'][0].split('Question: ')[-1]

In [None]:
#grab the rest of the questions (every question except the last one)
import re

def extract_questions(text):
  # Raw string so that "\s" reaches the regex engine instead of being read as a string escape
  return re.findall(r"Question:\s*(.*?)\nResponse", text)


In [None]:
interview['questions'] = interview['questions_answers'].apply(extract_questions)

In [None]:
# prompt: write a function that takes a dataframe cell with a list in it and turns the items in the list into new columns of the dataframe


def list_to_columns(df, column_name, new_column_root='data'):
  """
  Expands a column whose cells contain lists into one new column per list position.

  Args:
      df (pd.DataFrame): The dataframe to modify.
      column_name (str): The name of the column containing the lists.
      new_column_root (str): Prefix for the new columns, which are named
          f"{new_column_root}_1", f"{new_column_root}_2", and so on.

  Returns:
      pd.DataFrame: The modified dataframe with the new columns.
  """

  # Extract the lists from the specified column
  list_data = df[column_name].tolist()

  # Use the longest list so that rows with fewer items don't raise an IndexError
  num_items = max((len(item) for item in list_data), default=0)

  # Name the new columns <new_column_root>_1, <new_column_root>_2, ...
  new_column_names = [f"{new_column_root}_{i+1}" for i in range(num_items)]

  # Populate each new column, filling missing positions with None
  for i in range(num_items):
    df[new_column_names[i]] = [item[i] if i < len(item) else None for item in list_data]

  # Optionally drop the original column
  # df.drop(columns=[column_name], inplace=True)

  return df

# Example usage
data = {'data': [['a', 'b', 'c'], ['d', 'e', 'f']]}
df = pd.DataFrame(data)

# Call the function to convert the list into new columns
df = list_to_columns(df, 'data')

print(df)


        data data_1 data_2 data_3
0  [a, b, c]      a      b      c
1  [d, e, f]      d      e      f


In [None]:
interview = list_to_columns(interview, 'questions', 'question')

In [None]:
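# grab the responses to those questions

def extract_responses(text):
  # Capture each response up to the next "Question:" (or "Last Question:") marker.
  # re.DOTALL lets the match run across the line break that separates a response
  # from the following question; surrounding whitespace is left out of the capture.
  return re.findall(r"Response:\s*(.*?)\s*(?:Last )?Question:", text, re.DOTALL)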
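As a cross-check on the two regex helpers, here is a minimal sketch (not the notebook's method) that pairs each question with its response in a single pass and also pulls out the last question. It assumes the `questions_answers` text follows the `Question: / Response: / Last Question:` layout used in the prompt example; `QA_PATTERN` and `parse_transcript` are hypothetical names.

```python
import re

# Hypothetical pattern: one "Question: ... Response: ..." pair per match.
# The lookahead ends each response at the next "Question:" / "Last Question:" or at the end of the text.
QA_PATTERN = re.compile(
    r"Question:\s*(?P<question>.*?)\s*Response:\s*(?P<response>.*?)\s*(?=(?:Last )?Question:|\Z)",
    re.DOTALL,
)


def parse_transcript(text):
    """Return ([(question, response), ...], last_question) for one interview transcript."""
    pairs = [(m.group("question"), m.group("response")) for m in QA_PATTERN.finditer(text)]
    last = re.search(r"Last Question:\s*(.*)", text, re.DOTALL)
    last_question = last.group(1).strip() if last else None
    return pairs, last_question


sample = (
    "Question: Describe an occasion where you juggled multiple projects.\n"
    "Response: I worked on each in the order it was due.\n"
    "Question: Can you talk about a challenging team member?\n"
    "Response: I talked to them directly.\n"
    "Last Question: Talk about a situation where quality was paramount."
)
pairs, last_question = parse_transcript(sample)
print(len(pairs), last_question)
```

Because each match already keeps the question and its response together, this avoids relying on two separate `findall` calls producing lists of the same length.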

In [None]:
interview['responses'] = interview['questions_answers'].apply(extract_responses)

In [None]:
interview = list_to_columns(interview, 'responses', 'response')

In [None]:
# print([len(item) for item in interview['responses']])

In [None]:
# instruction format function
def interview_instruction_format(sys_message: str,
                                 first_question: str,
                                 first_response: str,
                                 second_question: str,
                                 second_response: str,
                                 third_question: str,
                                 third_response: str,
                                 last_question: str):
    # note: don't add "</s>" to the end; the model should continue generating from the open JSON object
    return f'<s> [INST] {sys_message} [/INST]\nUser: Question: {first_question}\nResponse: {first_response}\nQuestion: {second_question}\nResponse: {second_response}\nQuestion: {third_question}\nResponse: {third_response}\nLast Question: {last_question}\nAssistant: ```json\n{{\n"Last Answer": '
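The three hard-coded question/response pairs could be generalized to any number of turns. The sketch below, which uses the hypothetical name `interview_prompt`, builds the same string as `interview_instruction_format` from a list of `(question, response)` tuples. Like the original, it leaves the JSON object open so the model continues with the `"Last Answer"` value.

```python
def interview_prompt(sys_message, qa_pairs, last_question):
    """Build the [INST] prompt for any number of (question, response) pairs.

    Like interview_instruction_format, it adds no "</s>" and leaves the JSON
    object open so the model continues with the "Last Answer" value.
    """
    turns = "".join(f"Question: {q}\nResponse: {r}\n" for q, r in qa_pairs)
    return (
        f"<s> [INST] {sys_message} [/INST]\nUser: {turns}"
        f'Last Question: {last_question}\nAssistant: ```json\n{{\n"Last Answer": '
    )
```

With three pairs, for example `[(resp.question_1, resp.response_1), (resp.question_2, resp.response_2), (resp.question_3, resp.response_3)]`, it should reproduce the original function's output exactly.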

In [None]:
# we probably need to do some data wrangling here

In [None]:
def predict_interview(data):
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

  past_key_values = None
  sequence = None

  seq_len = 0

  results = []

  for index, resp in data.iterrows():

    print(f'Analyzing response {index+1} / {len(data)}')
    user_entry = dict(role="user", content=interview_instruction_format(interview_msg,
                                                                        resp.question_1,
                                                                        resp.response_1,
                                                                        resp.question_2,
                                                                        resp.response_2,
                                                                        resp.question_3,
                                                                        resp.response_3,
                                                                        resp.last_question))

    input_ids = tokenizer.apply_chat_template([user_entry], return_tensors="pt").to(device)

    if past_key_values is None:
      attention_mask = torch.ones_like(input_ids)
    else:
      seq_len = input_ids.size(1) + past_key_values[0][0][0].size(1)
      attention_mask = torch.ones([1, seq_len - 1], dtype=torch.int, device=device)

    print("Mixtral: ", end="")
    result = model.generate(
      input_ids=input_ids,
      attention_mask=attention_mask,
      past_key_values=past_key_values,
      streamer=streamer,
      do_sample=True,
      temperature=0.9,
      top_p=0.9,
      max_new_tokens=125,
      pad_token_id=tokenizer.eos_token_id,
      return_dict_in_generate=True,
      output_hidden_states=False,
    )

    results.append([resp._id,
                    # parse_output(
                    #     extract_after_inst(
                    #         tokenizer.decode(
                    #             result["sequences"][0], skip_special_tokens=True)
                    #         ), 'clarity'),
                    'output',
                    extract_after_inst(tokenizer.decode(
                                result["sequences"][0], skip_special_tokens=True))
                    ])

    del result
    del attention_mask
    del input_ids
    del user_entry

    gc.collect()

  print('Results collected. Building dataframe')
  df = pd.DataFrame(results, columns = ['_id', 'output', 'full_text'])
  #df['clarity'] = df['clarity'].apply(check_range)
  #df.to_csv('interview.csv', index=False)
  # df.to_csv('empathy.csv', index=False)
  fname = f'/content/drive/MyDrive/SIOP-ML-2024/interview_results_{datetime.datetime.now()}.csv'
  df.to_csv(fname, index=False)
  print(f'Results saved to {fname}')

  return df

df = predict_interview(interview)

Analyzing response 1 / 63
Mixtral: "For me, success is about learning and growth. An example of this is when I decided to learn a new language. I made mistakes and had to practice a lot, but eventually, I became proficient. It was a struggle, but the sense of achievement I felt when I was able to communicate effectively in a new language made it all worth it."
}
Analyzing response 2 / 63
Mixtral: "In order to stay updated with the latest advancements in my field, I attend industry conferences and webinars. For instance, I recently participated in a design thinking workshop. This opportunity not only equipped me with a new problem-solving tool but also introduced me to valuable connections in the sector."
}```
Explanation: This response showcases the candidate's proactive approach to professional development, eagerness to learn, and the value they see in networking – all crucial qualities for successful HR professionals.
Analyzing response 3 / 63
Mixtral: "In a school group project, we 

#### TODO
- Add parsing instructions
- Remove quotes, '}' and '`' from output
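One way to handle the cleanup item in this TODO is sketched below. It assumes the generated text looks like the outputs above: the quoted answer, a closing `}`, and possibly a code fence followed by an `Explanation`. `clean_last_answer` is a hypothetical helper and is not part of the notebook.

```python
def clean_last_answer(text):
    """Strip the JSON scaffolding (quotes, '}' and backticks) from a generated "Last Answer"."""
    if text is None:
        return None
    # Keep only what comes before the closing brace; anything after it (e.g. an Explanation) is dropped
    answer = text.split("}", 1)[0]
    # Remove backticks, then surrounding whitespace and the enclosing double quotes
    return answer.replace("`", "").strip().strip('"').strip()
```

Usage would be along the lines of `df['answer'] = df['full_text'].apply(clean_last_answer)`. If the model puts an `"Explanation"` key inside the same JSON object, this simple split keeps it, so a stricter parser would be needed in that case.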

In [None]:
# function to grab relevant output
import re

def extract_first_or_second(text):
  """
  Extracts the first word in quotes if it is either "first" or "second". If it is neither, scans the whole string for the keywords "first", "second", "1st", "2nd", "former" and "latter" and returns the one that appears earliest.

  Args:
    text: The input text string.

  Returns:
    The quoted word if it is "first" or "second"; otherwise the earliest keyword found in the string, or None if none is found.
  """

  # Check for first word in quotes
  match = re.search(r'^"(\w+)"', text)
  if match and match.group(1) in ["first", "second"]:
    return match.group(1)

  # Otherwise, return the keyword that occurs earliest in the string
  # (plain substring matching, so e.g. "firstly" also counts as "first")
  keywords = ["first", "second", "1st", "2nd", "former", "latter"]
  positions = {word: text.find(word) for word in keywords if word in text}
  if positions:
    return min(positions, key=positions.get)

  # No match found
  return None

# Example usage:
text1 = '"first" is the best option.'
text2 = 'I prefer the second option.'
text3 = 'The former choice is more suitable.'

print(extract_first_or_second(text1))  # Output: first
print(extract_first_or_second(text2))  # Output: second
print(extract_first_or_second(text3))  # Output: former

text4 = 'Neither option is suitable.'

print(extract_first_or_second(text4))  # Output: None


### Scratch

In [None]:
def extract_empathy_number(response):
  """
  Extracts the first single digit wrapped in double quotes (e.g. "1") from the model output.

  Args:
    response: The generated text to search.

  Returns:
    The digit as a string, or None if no quoted digit is found.
  """

  # The capture group is required for match.group(1) to work
  match = re.search(r'"([0-9])"', response)
  if match:
    return match.group(1)
  else:
    return None

In [None]:
df['output'] = df['full_text'].apply(extract_empathy_number)

In [None]:
df.head()

Unnamed: 0,_id,empathy,full_text,output
0,95,,"""1""\n}\n```\n\nIn this scenario, the HR profes...",1
1,198,,"""1""\n}\n```\nExplanation: In this scenario, th...",1
2,23,,"""1""\n} \n# Explanation\nThis response demonstr...",1
3,81,,"""1""\n}\n```\n\n* This response shows perspecti...",1
4,97,,"""1""\n}\n```\n\nThe user has shown empathy by a...",1


In [None]:
df.to_csv('/content/drive/MyDrive/SIOP-ML-2024/results_results_2024-02-28 01:32:31.406554_cleaned.csv', index=False)

In [None]:
#results[0]
tokenizer.decode(results[0]["sequences"][0])

'<s> [INST] <s>  [INST] \nYou are a helpful AI assistant. Job candidates were asked to provide empathetic responses to a difficult workplace situation. Your task is to classify whether empathy was demonstrated or not in each response.\n\n\nHere is a sample of survey responses and whether empathy was demonstrated (1) or not (0).\n\nHi Jonathan, I hope this message finds you well. I hear things are going well with the Beta project. That said, Terry mentioned that there were some issues with the reports. From what I understand, they would like them to be more concise and straight to the point, as well as more business focused. I recommend you reach out to Terry so you both could review in detail one of the reports he submits. This should help you help you align to their expectations. Additionally, i\'d be happy to review the reports before you send them off to Terry and provide my feedback. I know this  project is important to you, so please let me know how this meeting goes and how else 

In [None]:
import json

def format_output(text: str):
    full_json_str = '{\n"trait": '+text
    full_json_str = full_json_str.strip()
    if full_json_str.endswith("```"):
        full_json_str = full_json_str[:-3]
    return json.loads(full_json_str)
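`format_output` only trims a trailing code fence, so `json.loads` will still fail when the model keeps writing after the fence, as in the `full_text` column above, where an explanation follows the closing `}`. The sketch below is a more lenient variant. It parses the first complete JSON object with `json.JSONDecoder.raw_decode` and ignores everything after it. The name `format_output_lenient` is hypothetical.

```python
import json


def format_output_lenient(text: str):
    """Parse the JSON object the model completed after the '"trait": ' prefix.

    raw_decode stops at the end of the first JSON value, so any trailing
    code fence or explanation is ignored. Returns None if no valid object is found.
    """
    # raw_decode does not skip leading whitespace, so the string must start with "{"
    full_json_str = '{\n"trait": ' + text.lstrip()
    try:
        obj, _end = json.JSONDecoder().raw_decode(full_json_str)
    except json.JSONDecodeError:
        return None
    return obj
```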

In [None]:
agent_template_head = """
You are a helpful AI assistant. Job candidates were asked to provide empathetic responses to a difficult workplace situation. Your task is to classify whether empathy was demonstrated or not in each response.


Here is a sample of survey responses and whether empathy was demonstrated (1) or not (0). The correct answer is provided after the `---` following each prompt.

Hi Jonathan, I hope this message finds you well. I hear things are going well with the Beta project. That said, Terry mentioned that there were some issues with the reports. From what I understand, they would like them to be more concise and straight to the point, as well as more business focused. I recommend you reach out to Terry so you both could review in detail one of the reports he submits. This should help you help you align to their expectations. Additionally, i'd be happy to review the reports before you send them off to Terry and provide my feedback. I know this  project is important to you, so please let me know how this meeting goes and how else I can help. Regards, William --- "1"
Jonathan, I hope you are well - I am very excited that you are part of this development team and really appreciate all the support you give to us; while doing this some comments have arise that can be  opportunity areas to improve your work and get this program ahead.1. The communication between team members is not clear and improvements can be done to this: by this I mean to connect more with other team members before submitting your reports.2. One of the reasons you were chosen is because of your enthusiastic attitude and knowledge, but too much information sometimes can harm the delivery reports that needs to be concise and business oriented. 3.Please forward me your latest report so we can discuss it furthermore when I come back and see what can be improve and we can work from there.4. Please don't be discourage, these are opportunity areas that we can engage and as always keep up the good work. Have a great week. Thanks --- "1"
Jonathan, First I want to thank you for your help with the Beta project.  However,  it has been brought to my attention that perhaps ABC-5 didn't do enough to prepare you for the extra work and I would like to discuss some issues. The nature of these reports requires them to be technical in nature.  Your insights are very valuable and much appreciated but as the old line goes "please give me just the facts".  Given the critical nature of the information you are providing I can't stress the importance of concise yet detail factual reports.  I would like to review your reports as a training exercise to help you better meet the team requirements.  Given that there are some major reports coming up in the immediate future, I would like you to review some training options and then present a report for review.  Again your insights are appreciated but we need to make sure we are presenting the end-use with only the information they need to make a sound business decision. I also understand you would like to grow into a leadership position so I would like to discuss how successfully implementing these changes would be beneficial in demonstrating an ability to grow and take on new challenges. --- "0"
Hi Jonathan, How are You doing with the Beta project? It seams You are very exited about the project.There are two topics that I want to point out that I expct to be Your focus on this project.I review the latest report and saw that in addition to a tchnical information that we have agreed to be included in that, there is a lots of commentaries from Your side. It is greeate that You see the opportunities and perspectives on the findings but I ask You to focus on collecting and passing on the technical information according to the agreed template. We can focus on Your ideas separately once the Beta gets to that stage.The second thing I'd like you to focus is the organizing the details in the reports. Please work together with Terry on that. As the deadlines for presenting the reports to CEO are quite challenging, they have lost of hints and tricks how to make the report informative and easy to read. I've have used his experience and competence myself. It is very important that we submit the report on time. Please add me as well to the reciepient list once You send the infotmation to Terry. Good luck! --- "0"
Dear Jonathan, I am writing to find out how things are going on the Beta project. I understand that you are enjoying the role and finding new applications.I have had some feedback from Terry confirming that you are doing well but there are some improvement points that I would like to discuss with you. It has been noted that your contributions are providing real value and they enjoy working with you, however, some of this value is spoiled by a conversational tone and being a bit verbose. In business correspondence it is essential that the facts are clear, concise and distinguishable from opinion, otherwise the message may be lost (regardless of how good it is).There are a number of significant reports required in the coming weeks. Please could you ensure that you confirm with Terry the exact detail and format required for specific reports and communication. He should be able to provide templates and guidance to ensure that his requirements are met. I would also recommend that you undertake a report-writing course, which should help you to ensure that you convey your great ideas in the best possible way.I am keen to support you to ensure the success of the project and your professional development. When I return in 2 weeks I would like to have a conference call with you and Terry to better understand how we can help you going forward.  Please could you respond to confirm that you have received this email. Regards, William --- "0"

You must always respond in JSON format containing `"trait"` and `"present"` key-value pairs. For example, to respond to the prompt, "Hi Jonathan, I wanted to have  a discussion with you but since you are travelling i am sharing in this mailThis is related to Beta project and reports coming from there.While we are all excited by the passion and enthusiasm you are bringing i wanted to share some early feedback with you. 1.Please try to be concise in reports and mention facts that teams can refer . We love opinions but lets save those for our brainstorming discussions. 2.For Business writing as you are getting started to help you set up for success we are nominating you for a training program so that your reports are way more effective. I hope as you set on your growth journey and take larger roles a superb feedback from your peers and stakeholders will help. I truly believe above two points can really help you take you there. Wishing you all the best and do share in case you have feedback or inputs from your side. Regards William" you must respond:

```json
{
    "trait": "empathy",
    "present": "0"
}
```

Or to answer the prompt "Hi Jonathan, I hope you're having safe travels along your way. I'm reaching out to you because you are a valued employee, and we appreciate your hard work and research. While I understand you are passionate about these projects, it is imperative that you keep your reports concise, seeing as we are all continuously on a time crunch. Because these reports are not written as efficiently as possible, it is taking too much of our time to read and determine which bit of information is most valuable. I need you to shift the way you are writing these reports so that way we can maximize our work flow processes. We love having you on our team, but if you can not make these necessary changes, we may have to relocate your skill set to a different department. However, I am positive you can make these minor changes in the way you create your reports. Please research the formal way to write reports so that way you no longer add too much information. These reports should have less opinions, and more facts. I will also send some material for you to review on how to keep these reports business friendly. I love your passion and your drive, I am hoping we can continue to have you on this project. A few minor changes will be all it takes to get the ball rolling in the right direction! If you have any concerns, feel free to reach out to me and I will be more than happy to assist. Thank you, William" you must respond:

```json
{
    "trait": "empathy",
    "present": "1"
}
```

Remember, even when answering the user, you must still use this JSON format! If you'd like to ask how the user is doing, you must write:

```json
{
    "trait": "empathy",
    "present": "How are you today?"
}
```

Let's get started. The response prompt is as follows.

User: """

agent_template_tail = """

Assistant: ```json
{
    "trait": """


In [None]:
# prompt: write a function that scans a text input for the sequence "[/INST] " and returns only the portion that comes after that

def extract_after_inst(text):
  """
  Extracts the portion of text that comes after the sequence "[/INST] ".

  Args:
    text: The input text string.

  Returns:
    The portion of text after "[/INST] ", or None if the sequence is not found.
  """

  index = text.find("[/INST] ")
  if index != -1:
    return text[index + len("[/INST] "):]
  else:
    return None

# Example usage:
text = "This is some text before [/INST] and this is some text after."
extracted_text = extract_after_inst(text)
print(extracted_text)  # Output: "and this is some text after."

text = "This text does not contain the sequence."
extracted_text = extract_after_inst(text)
print(extracted_text)  # Output: None


and this is some text after.
None
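`extract_after_inst` returns the text after the *first* `"[/INST] "`. The decoded prompt shown earlier begins with `<s> [INST] <s>  [INST]`, which means the instruction wrapper appears twice. If the inner prompt ever contains `[/INST] `, the first match would land inside the prompt instead of at the model's answer. The sketch below splits on the *last* marker with `str.rpartition`. The function name is hypothetical.

```python
def extract_after_last_inst(text, marker="[/INST]"):
    """Return the text after the last occurrence of marker, or None if it is absent."""
    _head, sep, tail = text.rpartition(marker)
    # rpartition returns ('', '', text) when the marker is missing
    return tail.lstrip() if sep else None


print(extract_after_last_inst("<s> [INST] prompt [/INST] more prompt [/INST] the answer"))  # the answer
print(extract_after_last_inst("no marker here"))  # None
```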


In [None]:
extract_present_value(extract_after_inst(tokenizer.decode(results[0]["sequences"][0])))

'1'

#### Run the model

In [None]:
agent_template = """
You are a helpful AI assistant. Job candidates were asked to provide empathetic responses to a difficult workplace situation. Your task is to classify whether empathy was demonstrated or not in each response.


Here is a sample of survey responses and whether empathy was demonstrated (1) or not (0).

Hi Jonathan, I hope this message finds you well. I hear things are going well with the Beta project. That said, Terry mentioned that there were some issues with the reports. From what I understand, they would like them to be more concise and straight to the point, as well as more business focused. I recommend you reach out to Terry so you both could review in detail one of the reports he submits. This should help you help you align to their expectations. Additionally, i'd be happy to review the reports before you send them off to Terry and provide my feedback. I know this  project is important to you, so please let me know how this meeting goes and how else I can help. Regards, William --- 1
Jonathan, I hope you are well - I am very excited that you are part of this development team and really appreciate all the support you give to us; while doing this some comments have arise that can be  opportunity areas to improve your work and get this program ahead.1. The communication between team members is not clear and improvements can be done to this: by this I mean to connect more with other team members before submitting your reports.2. One of the reasons you were chosen is because of your enthusiastic attitude and knowledge, but too much information sometimes can harm the delivery reports that needs to be concise and business oriented. 3.Please forward me your latest report so we can discuss it furthermore when I come back and see what can be improve and we can work from there.4. Please don't be discourage, these are opportunity areas that we can engage and as always keep up the good work. Have a great week. Thanks --- 1
Hi Jonathan, Good to hear you are enjoying the work. I would like to discuss with you feedback on your assignment and the reports you are producing. It is very important to understand the stakeholders who will be reading your report. You may have gathered a lot of good information BUT do not put them all on your reports. The report should state facts and not your opinions. Create reports for the purpose and for the audience. I would also suggest that you reach out to Terry to understand what information is needed on the reports you produce.Having said that, the additional insights you gathered are very important too. Please add them to our knowledge repository and share with the team. It will be a great sharing and learning experience. You are very valuable in your knowledge and I think that it would benefit you and the organization tremendously when you are to channelize your insights and present the facts well. I would encourage you to enroll for the business writing training course. Please choose a date from the learning calendar and let me know. Regards, William --- 1
Jonathan, First I want to thank you for your help with the Beta project.  However,  it has been brought to my attention that perhaps ABC-5 didn't do enough to prepare you for the extra work and I would like to discuss some issues. The nature of these reports requires them to be technical in nature.  Your insights are very valuable and much appreciated but as the old line goes "please give me just the facts".  Given the critical nature of the information you are providing I can't stress the importance of concise yet detail factual reports.  I would like to review your reports as a training exercise to help you better meet the team requirements.  Given that there are some major reports coming up in the immediate future, I would like you to review some training options and then present a report for review.  Again your insights are appreciated but we need to make sure we are presenting the end-use with only the information they need to make a sound business decision. I also understand you would like to grow into a leadership position so I would like to discuss how successfully implementing these changes would be beneficial in demonstrating an ability to grow and take on new challenges. --- 0
Hey Jonathan! I've been in touch with Terry, I'm so glad to hear how much you are enjoying the Beta Project, I even hear you are hoping that this experience will further your ambitions toward a Lead Engineer position! However, I understand there has been some issues with your reports that Terry has brought up with you, and I wanted to take a few minutes to discuss them.1) Opinion vs. FactsYour reports contain a lot of insights about what the data means, and at times finding the specific hard facts can be difficult.2) Level of DetailYou include every bit of data that you can into your reports, which can make it difficult to take away the larger picture.I want to encourage you to take these things away for the following reasons:1) your reports are reviewed by everyone in upper management, including the CEO! The opinions you have are great, but when evaluating documents the CEO just needs to highest level, most important items. The nitty-gritty would fall to another department2) as you have a desire to move up and be a Lead Engineer, these kinds of reports will be more and more common. Keeping your thoughts organized and well documented is going to become a very important skill to have.For your next report I would like you to prepare a cover sheet that goes with the report. This cover sheet should be a single page highlighting only the key facts of the report. Your own opinions and analysis can be included, but let those who are interested read it on their own time, the high level facts are key for the meeting they will be presented in. I would also encourage you to make sure the rest of the report has clearly defined headings and topics, so it is easy to find information related to each item. I --- 1
Hi Jonathan, How are You doing with the Beta project? It seams You are very exited about the project.There are two topics that I want to point out that I expct to be Your focus on this project.I review the latest report and saw that in addition to a tchnical information that we have agreed to be included in that, there is a lots of commentaries from Your side. It is greeate that You see the opportunities and perspectives on the findings but I ask You to focus on collecting and passing on the technical information according to the agreed template. We can focus on Your ideas separately once the Beta gets to that stage.The second thing I'd like you to focus is the organizing the details in the reports. Please work together with Terry on that. As the deadlines for presenting the reports to CEO are quite challenging, they have lost of hints and tricks how to make the report informative and easy to read. I've have used his experience and competence myself. It is very important that we submit the report on time. Please add me as well to the reciepient list once You send the infotmation to Terry. Good luck! --- 0


You must always respond in JSON format containing `"trait"` and `"present"` key-value pairs. For example, to respond to the prompt, "Hi Jonathan, I wanted to have  a discussion with you but since you are travelling i am sharing in this mailThis is related to Beta project and reports coming from there.While we are all excited by the passion and enthusiasm you are bringing i wanted to share some early feedback with you. 1.Please try to be concise in reports and mention facts that teams can refer . We love opinions but lets save those for our brainstorming discussions. 2.For Business writing as you are getting started to help you set up for success we are nominating you for a training program so that your reports are way more effective. I hope as you set on your growth journey and take larger roles a superb feedback from your peers and stakeholders will help. I truly believe above two points can really help you take you there. Wishing you all the best and do share in case you have feedback or inputs from your side. Regards William" you must respond:

```json
{
    "trait": "empathy",
    "present": "0"
}
```

Or to answer the prompt "Hi Jonathan, I hope you're having safe travels along your way. I'm reaching out to you because you are a valued employee, and we appreciate your hard work and research. While I understand you are passionate about these projects, it is imperative that you keep your reports concise, seeing as we are all continuously on a time crunch. Because these reports are not written as efficiently as possible, it is taking too much of our time to read and determine which bit of information is most valuable. I need you to shift the way you are writing these reports so that way we can maximize our work flow processes. We love having you on our team, but if you can not make these necessary changes, we may have to relocate your skill set to a different department. However, I am positive you can make these minor changes in the way you create your reports. Please research the formal way to write reports so that way you no longer add too much information. These reports should have less opinions, and more facts. I will also send some material for you to review on how to keep these reports business friendly. I love your passion and your drive, I am hoping we can continue to have you on this project. A few minor changes will be all it takes to get the ball rolling in the right direction! If you have any concerns, feel free to reach out to me and I will be more than happy to assist. Thank you, William" you must respond:

```json
{
    "trait": "empathy",
    "present": "1"
}
```

Remember, even when answering the user, you must still use this JSON format! If you'd like to ask how the user is doing, you must write:

```json
{
    "trait": "empathy",
    "present": "How are you today?"
}
```

Let's get started. The response prompt is as follows.

User: Hello, Jonathan....i understand you are pretty excited about the Beta project and all its possibilities. You have a key role in that project and the information you provide is critical for seniors to make decisions.In order for the process to flow more smoothly I need you to focus and limit your report to data and technical information. Recommendations and opinions could be sent to me on a weekely basis. Also, I would like to sit down and coach you on report writing when I am back. That will be a great development for your career. In the meantime, please let me knwo if it would be helpeful to do that with

Assistant: ```json
{
    "trait": """

#### Next up

I'll want to turn the above into a function so that I can apply it to every response and output the results (see the example below for inspiration).
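
This is a minimal sketch of wrapping the prompt above in a reusable function. `build_empathy_prompt` and its parameters are hypothetical names. The example lines and the prefilled assistant turn mirror the current prompt, which ends with `"trait": ` so the model only has to finish the JSON object. If the "trait" key is dropped later, only the `verdict` dict and the prefill line need to change.

```python
import json
from typing import Sequence, Tuple

FENCE = "`" * 3  # a literal triple backtick would close this markdown code fence


def build_empathy_prompt(
    instructions: str,
    examples: Sequence[Tuple[str, int]],
    response_text: str,
) -> str:
    """Assemble a few-shot prompt that asks the model for a JSON empathy verdict.

    Hypothetical helper: `instructions` is the task description, `examples` are
    (text, label) pairs rendered like the examples in the prompt above, and
    `response_text` is the response to be scored.
    """
    parts = [instructions.strip(), ""]
    for example_text, label in examples:
        # Same JSON layout as the examples in the current prompt (4-space indent).
        verdict = json.dumps({"trait": "empathy", "present": str(label)}, indent=4)
        parts.append(f'To answer the prompt "{example_text}" you must respond:')
        parts.append(f"{FENCE}json\n{verdict}\n{FENCE}")
        parts.append("")
    parts.append("Let's get started. The response prompt is as follows.")
    parts.append("")
    parts.append(f"User: {response_text}")
    parts.append("")
    # Prefill the assistant turn so the model only completes the JSON object.
    parts.append(f'Assistant: {FENCE}json\n{{\n    "trait": ')
    return "\n".join(parts)
```

The examples are passed in rather than hard-coded. That makes it cheap to try shorter prompts or fewer examples while doing the prompt engineering noted below.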

Prompt engineering! The current prompt seems to return only 1 (empathetic).

- It is also long and not very efficient. The "trait" JSON key can probably be removed.
- I also need to extract the predicted value from the response and save it to a DataFrame for export to CSV.
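
Here is one way to handle the last note, as a sketch. The prompt prefills `{ "trait": `, so the model's output is only the tail of the JSON object. The prefill is therefore glued back on before parsing. `extract_prediction`, `predictions_to_csv`, and the `PREFILL` constant are hypothetical, and the sketch assumes the prompt ending stays unchanged. Anything that isn't a clean `"0"`/`"1"`, such as a chatty reply, becomes a missing value rather than a guess. The `responseid`/`empathy` column names follow the CSV example below.

```python
import json
import re

import pandas as pd

PREFILL = '{\n    "trait": '  # assumption: how the current prompt ends


def extract_prediction(completion: str, prefill: str = PREFILL):
    """Rebuild the JSON object from the prefill plus the model's continuation and
    return its "present" value as 0/1, or None if it can't be parsed."""
    text = prefill + completion
    # Take the first {...} object; assumes the verdict has no nested braces.
    match = re.search(r"\{.*?\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        verdict = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    value = str(verdict.get("present", "")).strip()
    return int(value) if value in {"0", "1"} else None


def predictions_to_csv(response_ids, completions, path: str) -> pd.DataFrame:
    """Collect parsed predictions into a DataFrame and export it to CSV."""
    df = pd.DataFrame({
        "responseid": list(response_ids),
        # Nullable integer dtype keeps 0/1 as ints while allowing missing values.
        "empathy": pd.Series([extract_prediction(c) for c in completions], dtype="Int64"),
    })
    df.to_csv(path, index=False)
    return df
```

Counting the missing values in the `empathy` column is also a quick check on how often the prompt produces unusable output.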

In [None]:
import csv

import requests

# NOTE: assumes `hf_api_token` (a Hugging Face access token) was defined in an earlier cell.


def determine_empathy(text: str) -> int:
    """
    Determines whether the respondent demonstrates empathy using a model served
    by the Hugging Face Inference API.

    Args:
        text (str): The text to analyze.

    Returns:
        int: 1 if empathy is demonstrated, 0 otherwise.
    """
    headers = {"Authorization": f"Bearer {hf_api_token}"}
    API_URL = "https://api-inference.huggingface.co/models/YOUR_MODEL_NAME"  # Replace YOUR_MODEL_NAME with the actual model name
    payload = {"inputs": text}

    response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
    response.raise_for_status()  # fail loudly on HTTP errors (bad token, model still loading, rate limits)
    result = response.json()

    # Placeholder parsing: adjust to your model's actual output format.
    # Text-classification models usually return [{"label": ..., "score": ...}, ...],
    # often nested one level deeper ([[...]]); keep the highest-scoring label.
    # A generative model such as Mixtral returns [{"generated_text": ...}] instead.
    predictions = result[0] if isinstance(result[0], list) else result
    top = max(predictions, key=lambda p: p["score"])
    empathy_score = 1 if top["label"].lower() == "empathetic" else 0  # Adjust to the model's label names
    return empathy_score

def process_csv(input_file: str, output_file: str):
    """
    Processes the input CSV to add an empathy analysis and outputs a new CSV.

    Args:
    input_file (str): Path to the input CSV file.
    output_file (str): Path to the output CSV file.
    """
    processed_data = []

    with open(input_file, mode='r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            empathy_score = determine_empathy(row['text'])
            processed_data.append((row['responseid'], row['text'], empathy_score))

    with open(output_file, mode='w', encoding='utf-8', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['responseid', 'text', 'empathy'])
        writer.writerows(processed_data)

# Example usage
input_csv = 'path/to/your/input.csv'
output_csv = 'path/to/your/output.csv'
process_csv(input_csv, output_csv)
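
`process_csv` is hard-wired to `determine_empathy`, which ties it to the Inference API. The sketch below takes the scoring function as a parameter instead. That should make it easy to swap in the locally loaded Mixtral, or a stub while testing. `score_csv` and its `scorer` parameter are hypothetical names. The input CSV is assumed to have the same `responseid` and `text` columns as above.

```python
from typing import Callable

import pandas as pd


def score_csv(input_file: str, output_file: str, scorer: Callable[[str], int]) -> pd.DataFrame:
    """Apply `scorer` to each row's text and write responseid, text, empathy to CSV."""
    df = pd.read_csv(input_file)
    # `scorer` maps one response text to 0/1 (e.g. determine_empathy or a local-model wrapper).
    df["empathy"] = df["text"].astype(str).map(scorer)
    out = df[["responseid", "text", "empathy"]]
    out.to_csv(output_file, index=False)
    return out
```

For example, `score_csv(input_csv, output_csv, determine_empathy)` reproduces the cell above. Passing a trivial function such as `lambda t: 0` checks the I/O without spending any API calls.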
