### Instructions

This is a notebook for conducting thematic analysis. The prompts are described in the paper.

Before you start using this notebook, make sure your OpenAI API key is set in `chat_mods.py`.
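Rather than pasting the key into the file, you could have `chat_mods.py` read it from an environment variable. The sketch below assumes the variable is named `OPENAI_API_KEY` (the name the official `openai` Python library also reads by default); the helper `get_openai_api_key` is hypothetical, and how `chat_mods.py` actually passes the key to the client may differ.

```python
import os


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is missing.

    Hypothetical helper: chat_mods.py may store or pass the key differently.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the notebook."
        )
    return key
```

Set the variable in your shell (for example `export OPENAI_API_KEY=...`) before starting Jupyter, so the key never ends up in a file you might share.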

There are four steps:
1. Extract the codes
2. Group the codes into themes
3. Label the responses
4. Evaluate the results



Please adhere to the instructions provided in each step. 

Possible Errors:
1. Format error: GPT does not always return its result in the expected format, so you may occasionally encounter a format error. Re-run the cell to resolve it.
2. Maximum context length: if you have a large number of free-text responses, the total token count may exceed the model's maximum context length (16k tokens for GPT-3.5 and 32k for GPT-4). In that case, consider the solution we implemented for the password manager dataset, which is described in detail in our paper.
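One generic workaround for the context limit (not necessarily the solution described in the paper) is to split the responses into batches that stay under a token budget and code each batch separately. The sketch below approximates token counts as characters divided by 4, a rough rule of thumb for English text; for exact counts use a tokenizer such as `tiktoken`. The function names are hypothetical, and `max_tokens` should be set well below the model limit, because the task description, the exemplars, and the model's reply also consume tokens.

```python
from typing import List


def approx_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token for English text.
    return max(1, len(text) // 4)


def batch_responses(responses: List[str], max_tokens: int) -> List[List[str]]:
    """Greedily pack responses into batches whose approximate size stays under max_tokens.

    A single response larger than max_tokens still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for response in responses:
        cost = approx_tokens(response)
        # Close the current batch if adding this response would exceed the budget.
        if current and used + cost > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(response)
        used += cost
    if current:
        batches.append(current)
    return batches
```

For example, three responses of 40 characters each (about 10 tokens apiece) with `max_tokens=25` are packed into two batches: the first two responses together, then the third on its own.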

In [None]:
import os
import json
import chat_mods  # local helper module (chat_mods.py) that sends the prompts to the OpenAI API

### Step 1. Extract the codes

Provide ChatGPT with a few coded exemplars (the example below uses four).
ChatGPT then generates the codes for your responses.


<u>Please provide some exemplars (under Exemplars) in the prompt. We suggest 4 to 8 exemplars.</u>

Here is an example:
```
Task: The aim of this project is to investigate which aspects of the music listening experience are prioritized by people listening to music on their personal devices.

Here are 4 examples for how to generate the codes. For each example, you will see one response with the codes step by step.

These responses are the answer of the question: What’s the first thing that comes into your mind about this track?

Each generated code have the format like:
'quote' refers to /mentions 'definition of the code'. Therefore, we got a code: 'code'. 

Exemplars:

(1)
Response: Nostalgia, friends, sing along. 

Action: Let's code the response.
"Nostalgia, friends, sing along" mentions Nostalgia. Therefore, we got a code: Nostalgia.

(2)
Response: I don't actually know or remember it.
Action: Let's code the response.
"I don't actually know or remember it" refers to something not known or recognized. Therefore, we got a code: unfamiliar.


(3)
Response: Emotion I like the song and I really like the singers voice.

Action: Let's code the response.
"I like the song and I really like the singers" refers to “constructive, optimistic, or confident”. Therefore, we got a code: Positive.
"I like the song and I really like the singers voice" refers to the singer’s voice. Therefore, we got a code: vocals.


(4)
Response: I remember really liking this song when it came out and dancing to it.

Action: Let's code the response.
“I remember really liking this song when it came out and dancing to it” refers to “constructive, optimistic, or confident”. Therefore, we got a code: Positive
“I remember really liking this song when it came out and dancing to it” refers to “memory”. Therefore, we got a code: Memory
“I remember really liking this song when it came out and dancing to it” mentions “dancing”. Therefore, we got a code: Dancing

You need to generate the codes with the format:
'quote' refers to /mentions 'definition of the code'. Therefore, we got a code: 'code'. 
```
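If you keep your coded exemplars in a structured form, a small helper can render them in the exemplar layout shown above before you paste them into `exemplars_for_codes`. This is a hypothetical convenience function, not part of `chat_mods`; each coded span is assumed to be a `(quote, verb, definition, code)` tuple, where `verb` is "refers to" or "mentions".

```python
from typing import List, Tuple

# Hypothetical structure for one coded span: (quote, verb, definition, code)
CodedSpan = Tuple[str, str, str, str]


def format_exemplar(index: int, response: str, spans: List[CodedSpan]) -> str:
    """Render one exemplar in the '(n) / Response / Action' layout used in the prompt."""
    lines = [
        f"({index})",
        f"Response: {response}",
        "Action: Let's code the response.",
    ]
    for quote, verb, definition, code in spans:
        lines.append(f'"{quote}" {verb} {definition}. Therefore, we got a code: {code}.')
    return "\n".join(lines)
```

For instance, `format_exemplar(1, "Nostalgia, friends, sing along.", [("Nostalgia, friends, sing along", "mentions", "Nostalgia", "Nostalgia")])` reproduces exemplar (1) from the example prompt.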

In [12]:
exemplars_for_codes = '''Task: [Put your task description here]

Here are 4 examples for how to generate the codes. For each example, you will see one response with the codes step by step.

These responses are the answer of the question: What’s the first thing that comes into your mind about this track?

Each generated code have the format like:
'quote' refers to /mentions 'definition of the code'. Therefore, we got a code: 'code'. 

Exemplars:
[Put your exemplars here]

You need to generate the codes with the format:
'quote' refers to /mentions 'definition of the code'. Therefore, we got a code: 'code'. 
'''

In [None]:
prompt_for_codes, text_for_machine = chat_mods.load_data("your_free_text_responses.txt")

init_codes, msgs, content = chat_mods.init_code_gen(exemplars_for_codes, prompt_for_codes)

In [None]:
# This cell summarizes the generated codes. It may take a while to finish.
sum_codes, codes = chat_mods.sum_code_gen(init_codes, msgs)

#### Codes Review
You can review the generated codes below.

In [None]:
print(codes)
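Because each generated code ends with "Therefore, we got a code: ...", you can pull the labels out of raw model output with a regular expression, for example to spot near-duplicates such as "Positive" and "positive" before Step 2. This sketch assumes you have the raw output as a single string; the exact type of `codes` returned by `chat_mods.sum_code_gen` is not documented here, so adapt the input accordingly. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import List

# Captures the label after "we got a code:" up to an optional trailing period, one per line.
CODE_PATTERN = re.compile(r"we got a code:\s*(.+?)\.?\s*$", re.IGNORECASE | re.MULTILINE)


def extract_codes(raw_output: str) -> List[str]:
    """Return the code labels found in output that follows the exemplar format."""
    return [m.strip().strip("'\"") for m in CODE_PATTERN.findall(raw_output)]


def code_frequencies(raw_output: str) -> Counter:
    """Count codes case-insensitively so that 'Positive' and 'positive' are merged."""
    return Counter(code.lower() for code in extract_codes(raw_output))
```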

### Step 2. Group the codes into themes

This step groups the generated codes into themes.

In [None]:
question = ""  # Put the survey question here, e.g. "How do you manage your passwords?"
machine_themes, msgs_theme = chat_mods.codes_to_themes(sum_codes, codes, question)

The themes are stored in the file <b>revised_theme.json</b>. Open this file and edit it.

If the results are unsatisfactory, restart the kernel and begin again from Step 1. A fresh run may produce a better outcome.

You can edit the content freely: rename themes, add, delete, or merge themes, and move codes to another theme.
#### Please remember to save the file as "revised_theme.json" once you have finished.


Describe your edits in `action_description` (in the next cell).
Here are some examples:
```
action_description = """
1. I merged the "Relatability and Connection" into "Personal Connection" since they are similar.
2. I deleted "Specific Elements" since it is too specific.
3. I merged "Sensual and Sexual", "Catchiness and Appeal", and "Dream-like and Ethereal" into "Emotional Response" since they are all emotional responses.
4. I deleted "Political Activism" since it is too specific.
5. I merged "Association and Context" into "Familiarity and Memory" since it belongs to that theme.
"""
```
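The edits listed in `action_description` (merge, delete, move) can also be applied to `revised_theme.json` with a few lines of Python instead of by hand. This sketch assumes the file holds a JSON object mapping each theme name to a list of code names; check the actual layout produced by `chat_mods.codes_to_themes` before using it. All helper names are hypothetical.

```python
import json
from typing import Dict, Iterable, List

Themes = Dict[str, List[str]]  # assumed layout: theme name -> list of codes


def merge_themes(themes: Themes, sources: Iterable[str], target: str) -> Themes:
    """Return a copy where the codes of every source theme are moved into target."""
    merged = {name: list(codes) for name, codes in themes.items()}
    merged.setdefault(target, [])
    for name in sources:
        if name == target:
            continue
        for code in merged.pop(name, []):
            if code not in merged[target]:
                merged[target].append(code)
    return merged


def delete_theme(themes: Themes, name: str) -> Themes:
    """Return a copy of themes without the given theme."""
    return {k: list(v) for k, v in themes.items() if k != name}


def save_themes(themes: Themes, path: str = "revised_theme.json") -> None:
    """Write the edited themes back to disk as readable JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(themes, f, indent=2, ensure_ascii=False)
```

For example, the first action above would be `themes = merge_themes(themes, ["Relatability and Connection"], "Personal Connection")`, followed by `save_themes(themes)`. The helpers return copies, so the original dictionary stays available if you want to undo a step.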

Your task is to give feedback to the machine coder. Repeat the discussion loop until saturation is reached, i.e. until further rounds no longer produce new themes. The end result is the codebook.
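Saturation is ultimately a judgment call, but one simple signal is that a discussion round no longer changes the set of themes. The sketch below compares theme names between two consecutive rounds; it is a hypothetical aid, not a function of `chat_mods`, and it assumes you keep each round's theme names as a list.

```python
from typing import Iterable, Set, Tuple


def theme_changes(previous: Iterable[str], current: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) theme names between two rounds, ignoring case and spacing."""
    def normalise(names: Iterable[str]) -> Set[str]:
        return {" ".join(n.lower().split()) for n in names}

    prev, curr = normalise(previous), normalise(current)
    return curr - prev, prev - curr


def looks_saturated(previous: Iterable[str], current: Iterable[str]) -> bool:
    """True when a round neither added nor removed any theme."""
    added, removed = theme_changes(previous, current)
    return not added and not removed
```

A check like this only flags when the theme list stops changing; whether the themes are adequate still needs your review.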

##### Discussion Starting Point

In [None]:
action_description = """

"""
### Write your actions above.

After you have edited the themes and codes in revised_theme.json, run the cell below.

In [None]:
msgs_theme = chat_mods.discussion(machine_themes, msgs_theme, action_description)

### Continue or Complete?
#### -Continue

To start a new round of discussion, return to the <b>Discussion Starting Point</b> and repeat the process.


If you believe that saturation has been reached, please proceed to the <b>-Complete</b> step.

#### -Complete
Final codebook generation

Run the cell below.

In [None]:
machine_theme, msgs_theme = chat_mods.diss_process(msgs_theme)
final_code_book = chat_mods.final_codebook_gen("codebook.json", msgs_theme)

### Step 3. Label the responses based on the final codebook
Please code the free-text responses based on the final codebook, <u>codebook.json</u>.

In the codebook.json file, each code is labeled with a number. Label the responses using the corresponding code numbers. If a response contains multiple codes, separate them with a <b>space</b>.

For instance:
```
{
  "3": {
    "cultural": "the association of the song with a specific country"
  }
},
{
  "4": {
    "music genre": "the association of the song with a specific music genre"
  }
}
```

For example, to label a response with both "cultural" and "music genre", assign it the labels 3 and 4, written as `3 4`.
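The space-separated labels can be mapped back to code names with the codebook. This sketch assumes each codebook entry maps a code number to a `{code name: definition}` object, as in the excerpt above, and that the entries have been combined into one dictionary; if your `codebook.json` holds a list of such objects, `merge_codebook_entries` combines them. Both function names are hypothetical.

```python
from typing import Dict, List

# Assumed layout: {"3": {"cultural": "definition"}, "4": {"music genre": "definition"}}
Codebook = Dict[str, Dict[str, str]]


def merge_codebook_entries(entries: List[Codebook]) -> Codebook:
    """Combine a list of single-code objects into one dictionary keyed by code number."""
    merged: Codebook = {}
    for entry in entries:
        merged.update(entry)
    return merged


def decode_labels(label_string: str, codebook: Codebook) -> List[str]:
    """Turn a label string such as "3 4" into the corresponding code names."""
    names: List[str] = []
    for number in label_string.split():
        entry = codebook.get(number)
        if entry is None:
            raise KeyError(f"Code number {number!r} is not in the codebook")
        names.extend(entry.keys())
    return names
```

With the two entries above, `decode_labels("3 4", codebook)` returns `["cultural", "music genre"]`, and an unknown number raises an error instead of being silently dropped.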

In [None]:
code_result = chat_mods.label(final_code_book, text_for_machine)
print("Completed")

### Step 4. Evaluation
To calculate inter-annotator agreement (IAA), you can use the NLTK package (https://www.nltk.org/_modules/nltk/metrics/agreement.html).
The machine-coded result is stored in coded_result.json.
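NLTK's `AnnotationTask` (in `nltk.metrics.agreement`) provides agreement measures such as `kappa()` and `alpha()`. Because a response here can carry several codes, one simple alternative that needs only the standard library is to treat every code as a yes/no decision and compute Cohen's kappa per code between a human coder and the machine. This is a swapped-in sketch, not the evaluation used in the paper; it assumes both coders' labels are space-separated strings of code numbers, one per response, in the same order. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cohen_kappa_binary(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two coders' yes/no decisions on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("Both coders must label the same, non-empty list of items")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    # Chance agreement: both say yes by chance, or both say no by chance.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return math.nan  # both coders gave one constant answer, so kappa is undefined
    return (observed - expected) / (1 - expected)


def per_code_kappa(human: List[str], machine: List[str]) -> Dict[str, float]:
    """Kappa for every code number used by either coder (labels such as "3 4")."""
    human_sets = [set(s.split()) for s in human]
    machine_sets = [set(s.split()) for s in machine]
    codes = sorted(set().union(*human_sets, *machine_sets))
    return {
        code: cohen_kappa_binary(
            [code in h for h in human_sets], [code in m for m in machine_sets]
        )
        for code in codes
    }
```

For example, with human labels `["3 4", "3", "4", ""]` and machine labels `["3", "3", "4 3", ""]`, both codes 3 and 4 get a kappa of 0.5 (75% observed agreement against 50% expected by chance).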

### Finished
After completing the whole process, you should have the following files:

* codebook: codebook.json
* machine-coded result: coded_result.json
* edited themes (intermediate): revised_theme.json