# Installs

In [None]:
# Install package from PyPI
!pip install TA_using_LLMs

Collecting TA_using_LLMs
  Downloading ta_using_llms-0.0.4-py3-none-any.whl.metadata (1.8 kB)
Collecting chromadb==0.6.3 (from TA_using_LLMs)
  Downloading chromadb-0.6.3-py3-none-any.whl.metadata (6.8 kB)
Collecting datasets==3.3.2 (from TA_using_LLMs)
  Downloading datasets-3.3.2-py3-none-any.whl.metadata (19 kB)
Collecting fuzzywuzzy==0.18.0 (from TA_using_LLMs)
  Downloading fuzzywuzzy-0.18.0-py2.py3-none-any.whl.metadata (4.9 kB)
Collecting langchain-chroma==0.2.2 (from TA_using_LLMs)
  Downloading langchain_chroma-0.2.2-py3-none-any.whl.metadata (1.3 kB)
Collecting langchain-community==0.3.18 (from TA_using_LLMs)
  Downloading langchain_community-0.3.18-py3-none-any.whl.metadata (2.4 kB)
Collecting langchain-core==0.3.37 (from TA_using_LLMs)
  Downloading langchain_core-0.3.37-py3-none-any.whl.metadata (5.9 kB)
Collecting langchain-experimental==0.3.4 (from TA_using_LLMs)
  Downloading langchain_experimental-0.3.4-py3-none-any.whl.metadata (1.7 kB)
Collecting langchain-google-gen

In [None]:
# Clone the source repository to access the example data
import git

repo_url = "https://github.com/nbarnett19/Thematic_Analysis_using_LLMs"
local_path = "local_repo"

# Clone the repository
git.Repo.clone_from(repo_url, local_path)
print(f"Repository cloned to {local_path}")

# Access the folder you need
folder_path = f"{local_path}/data"

# List files in the folder
import os
files = os.listdir(folder_path)
print("Files:", files)

Repository cloned to local_repo
Files: ['focus_group_4.txt', 'focus_group_1.txt', 'focus_group_2.txt', 'reflections.txt', 'focus_group_3.txt', 'focus_group_5.txt']


# Setup LLM

The package works interchangeably with OpenAI or Gemini models through the ModelManager. This demo uses Gemini 1.5 Pro. You will need your own API keys to follow along.
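
If you prefer to set the keys once per session, a small helper along these lines could be used. This is a minimal sketch: `ensure_api_key` is a hypothetical helper, not part of the package, and whether ModelManager reads keys from environment variables is an assumption.

```python
import os
from getpass import getpass


def ensure_api_key(env_var: str) -> str:
    """Return the key stored in env_var, prompting for it only if it is missing."""
    if not os.environ.get(env_var):
        # getpass hides the typed value so the key never appears in notebook output.
        os.environ[env_var] = getpass(f"Provide your {env_var}: ")
    return os.environ[env_var]
```

Calling `ensure_api_key("OPENAI_API_KEY")` would prompt only when the variable is not already set. The variable name for the Gemini key is not shown in this notebook, so check the package documentation before relying on one.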

In [None]:
# Import ModelManager
from TA_using_LLMs.logic import ModelManager

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [None]:
# Initialize the ModelManager
model_manager = ModelManager(model_choice='gemini-1.5-pro', temperature=0.5, top_p=0.5)
# model_manager = ModelManager(model_choice='gpt-4o')

Provide your Google API Key: ··········
Provide your OpenAI API Key: ··········


# Load Data

The data to be analyzed should be stored as .txt files in a folder. Here we use the example data from the GitHub repo. The loader converts the files into LangChain documents and then splits them into smaller chunks for analysis.
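
To build intuition for how chunk size and overlap interact, here is a simplified character-level sketch. It is not the package's splitter (whose internals are not shown here); `sliding_chunks` is a hypothetical function that only illustrates the idea of overlapping windows.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each new window advances
    # Stop once the remaining text is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


demo = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo)
```

With a chunk size of 1000 and an overlap of 500, each window advances by 500 characters, so most of the text appears in two consecutive chunks.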

In [None]:
# Import FolderLoader
from TA_using_LLMs.logic import FolderLoader

In [None]:
loader = FolderLoader(folder_path)

In [None]:
docs = loader.load_txt()

100%|██████████| 6/6 [00:00<00:00, 12.58it/s]

['local-repo/data/focus_group_4.txt', 'local-repo/data/focus_group_1.txt', 'local-repo/data/focus_group_2.txt', 'local-repo/data/reflections.txt', 'local-repo/data/focus_group_3.txt', 'local-repo/data/focus_group_5.txt']





In [None]:
# Split documents into chunks
chunks = loader.split_text(docs, chunk_size=1000, chunk_overlap=500)

In [None]:
# Number of chunks generated
len(chunks)

341

In [None]:
# Check size of one chunk
chunks[256].page_content

'TN11: Well, I do believe that it can be useful. I think, yes, it simply requires that you look at the figures and think about them. And I think that\'s something important or. If you\'re not aware of it or you\'re not conscious of it, then you\'re not going to pay that much attention to it. Ehm, so I don\'t think it can have a harmful effect, I think it\'s more useful if you actively think about it. Of course, with us, we have a bit of background knowledge and everything, so it\'s a bit easier to perhaps find the numbers more exciting and everything and... prefer to observe and everything, but I think that just has to come with it, doesn\'t it? I think just sticking one on someone or somehow applying a sensor and then saying, "Yes, watch your numbers." I don\'t think that\'s enough. Exactly.\nGA: Yes. That means it should be accompanied by instruction and then also...\nTN11: Sure, yes. And that you then also discuss and evaluate it and everything, yes, I think that\'s part of it.'

In [None]:
# Chunks are stored as a list
type(chunks)

list

In [None]:
# See where each chunk originates
chunks[0].metadata.get("source")

'local-repo/data/focus_group_4.txt'
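
Because every chunk carries its source path in `metadata`, it is easy to check how the chunks are distributed across files. The sketch below uses stand-in objects rather than the real chunks; `chunks_per_source` is a hypothetical helper and the file names are illustrative.

```python
from collections import Counter
from types import SimpleNamespace


def chunks_per_source(chunks):
    """Count how many chunks came from each source file."""
    return Counter(chunk.metadata.get("source", "unknown") for chunk in chunks)


# Stand-in objects that mimic the .metadata attribute of the real chunks.
demo_chunks = [
    SimpleNamespace(metadata={"source": "data/focus_group_1.txt"}),
    SimpleNamespace(metadata={"source": "data/focus_group_1.txt"}),
    SimpleNamespace(metadata={"source": "data/reflections.txt"}),
]
counts = chunks_per_source(demo_chunks)
for source, n in counts.most_common():
    print(f"{n:4d}  {source}")
```

In the notebook, the same function could be called on `chunks` directly.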

# Summarize the Data

Before performing the analysis, create a summary of the data to understand the primary topics discussed.

In [None]:
# Import ThematicAnalysis
from TA_using_LLMs.logic import ThematicAnalysis

In [None]:
# Define the research questions of the study
rqs = """Explore and describe experiences of internal medicine doctors after wearing a
glucose sensor with focus on two research questions:
1. How can self-tracking with a glucose sensor influence residents’ understanding of glucose metabolism?
2. How can self-tracking with a glucose sensor improve residents’ awareness, appreciation, and
understanding of patients with diabetes?"""

In [None]:
prompt = ThematicAnalysis(llm=model_manager.llm, docs=docs, chunks=chunks, rqs=rqs)

In [None]:
# Generate a summary
summary = prompt.generate_summary()

In [None]:
# Create a function to display the summary in Markdown for easier readability
from IPython.display import Markdown
import textwrap

def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

In [None]:
to_markdown(summary)

> ## Overall Summary: Internal Medicine Doctors' Experiences Wearing Glucose Sensors
> 
> This qualitative study, encompassing five focus groups, explored the impact of self-tracking with glucose sensors on internal medicine doctors' understanding of glucose metabolism and their empathy for patients with diabetes. 
> 
> **Key Findings:**
> 
> **1. Enhanced Understanding of Glucose Metabolism:**
> 
> * **Dynamic Fluctuations:** Doctors gained firsthand experience with the dynamic nature of glucose levels, observing how factors like food, stress, sleep, exercise, and even illness influence blood sugar (FG1, FG2, FG3, FG4, FG5). 
> * **Individual Variability:**  The experience highlighted the significant individual variability in glucose responses to food and other stimuli, challenging assumptions about "normal" levels and emphasizing the need for personalized care (FG2, FG4).
> * **Beyond Textbook Knowledge:**  Wearing the sensor provided a practical understanding of glucose metabolism that extended beyond theoretical knowledge, deepening their appreciation for the complexity of diabetes management (FG1, FG3).
> 
> **2. Increased Awareness and Empathy for Diabetes Patients:**
> 
> * **Shared Experience Fosters Empathy:**  The act of wearing a sensor fostered a profound sense of empathy for the challenges faced by patients with diabetes, including the constant monitoring, dietary restrictions, potential for hypoglycemia, and social stigma associated with wearing a medical device (FG1, FG2, FG3, FG4, FG5).
> * **Improved Communication and Patient Care:**  Doctors felt better equipped to communicate with patients, anticipating their concerns, offering more relatable advice, and engaging in more empathetic and informed discussions about diabetes management (FG2, FG3, FG4).
> * **Patient-Centered Approach:**  The experience prompted a shift towards a more patient-centered approach, recognizing the importance of individual needs, shared decision-making, and patient empowerment in diabetes care (FG4).
> 
> **Additional Insights:**
> 
> * **Benefits for Medical Education:** Participants strongly advocated for incorporating glucose sensor experiences into medical training to enhance empathy, improve diabetes care, and bridge the gap between theoretical knowledge and lived experience (FG2, FG3, FG5).
> * **Potential Drawbacks:**  The study also acknowledged potential downsides of self-tracking, such as anxiety, over-monitoring, and the risk of feeling defined by the device, highlighting the need for patient education and responsible use of technology (FG2, FG3).
> * **Future Research:**  Future research could explore the long-term impact of this experience on clinical practice, patient interactions, and the integration of dietary tracking alongside sensor data.
> 
> **Overall Conclusion:**
> 
> This qualitative research suggests that self-tracking with glucose sensors can be a valuable tool for medical education, enhancing doctors' understanding of glucose metabolism and fostering empathy for patients with diabetes. This experience has the potential to translate into more empathetic, patient-centered, and effective diabetes care. 


# Perform a simple Thematic Analysis

This approach completes the entire thematic analysis in one step. The output can be saved as a JSON or CSV file.

In [None]:
# The prompt is printed after completion
simple_TA_analysis = prompt.zs_control_gemini(filename="simple_TA_analysis.json")

You are a qualitative researcher doing
        inductive (latent/semantic) reflexive Thematic analysis according to the
        book practical guide from Braun and Clark (2022). Review the given transcripts
        to identify excerpts (or quotes) that address the research questions.
        Generate codes that best represent each of the excerpts identified. Each
        code should represent the meaning in the excerpt. The excerpts must exactly
        match word for word the text in the transcripts.
        Based on the research questions provided, you must identify a maximum of 6 distinct themes.
        Each theme should include:
        1. A theme definition
        2. A sub-theme if needed
        3. Each sub-theme should have a definition
        4. Supporting codes for each sub-theme
        5. Each code should be supported with a word for word excerpt from the
        transcript and excerpt speaker from the text.
        When defining the themes and subthemes, please look for 

In [None]:
# Show results in a pandas dataframe
import pandas as pd
pd.json_normalize(simple_TA_analysis)

Unnamed: 0,theme,theme_definition,subthemes,subtheme_definitions,codes,supporting_quotes,speaker
0,Impact of Self-Tracking on Understanding Gluco...,This theme explores how self-tracking with a g...,[Personal Experiences with Glucose Fluctuation...,[This subtheme focuses on the residents' perso...,"[Unexpected Glucose Stability, Impact of Stres...","[But on a normal day, for example, I get up, I...","[TN17, TN19, TN17, TN17, TN9]"
1,Enhanced Empathy and Understanding of Diabetes...,This theme explores how self-tracking with a g...,[Appreciation for the Challenges of Diabetes M...,[This subtheme focuses on the residents' newfo...,"[Constant Monitoring and Decision-Making, Impa...",[It simply takes a lot of self-discipline. So ...,"[TN13, TN15, TN21, TN7, TN10]"
2,Ethical Considerations and Potential Risks of ...,This theme explores the ethical considerations...,"[Health Anxiety and Obsessive Tracking, Overre...",[This subtheme focuses on the potential for gl...,"[Risk of Health Obsession, Overemphasis on Dat...",[And that many more athletes are now always me...,"[TN13, TN9, TN7]"
3,Benefits and Limitations of Sensor Use in Medi...,This theme explores the potential benefits and...,"[Enhanced Learning and Understanding, Practica...",[This subtheme focuses on how personal experie...,"[Improved Understanding of Glucose Metabolism,...","[Yes, I think so. Especially with the endocrin...","[TN22, TN20, TN13, TN25]"
4,Importance of Patient-Centered Care and Shared...,This theme highlights the importance of patien...,"[Individualized Treatment Approaches, Importan...",[This subtheme emphasizes the need for individ...,"[Need for Personalized Care, Patient Education...",[I think it depends on the type of patient we ...,"[TN16, TN25, TN13]"
5,Technological Advancements and Future Directio...,This theme explores the technological advancem...,"[Evolution of Glucose Monitoring Technology, I...",[This subtheme focuses on the advancements in ...,"[Advancements in Sensor Technology, Closed-Loo...","[Well, that has already changed, there is now ...","[GA, TN7, TN10]"
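
Each theme in this output stores its codes, quotes, and speakers as lists, which can be awkward to read. The sketch below flattens such records into one row per quote. It assumes the three lists are positionally aligned, which the truncated output does not confirm; `flatten_themes` is a hypothetical helper and the demo values are placeholders.

```python
from itertools import zip_longest


def flatten_themes(themes):
    """Turn one record per theme into one row per (code, quote, speaker) triple."""
    rows = []
    for theme in themes:
        triples = zip_longest(
            theme.get("codes", []),
            theme.get("supporting_quotes", []),
            theme.get("speaker", []),
        )
        # zip_longest pads shorter lists with None instead of dropping entries.
        for code, quote, speaker in triples:
            rows.append(
                {"theme": theme.get("theme"), "code": code, "quote": quote, "speaker": speaker}
            )
    return rows


demo_themes = [
    {
        "theme": "Theme A",
        "codes": ["code 1", "code 2"],
        "supporting_quotes": ["quote 1", "quote 2"],
        "speaker": ["P1"],
    }
]
rows = flatten_themes(demo_themes)
print(rows)
```

The resulting list of dictionaries can be passed to `pd.DataFrame` for display.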


# In-Depth Thematic Analysis

## Code Generation

Alternatively, codes can be generated from the data in a separate step using one of three prompting styles: zero-shot, few-shot, or chain-of-thought. The codes are generated by iterating through each data chunk and asking the LLM to annotate the text as a qualitative researcher would. Below we use the chain-of-thought prompt.
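
The chunk-by-chunk loop can be pictured roughly as follows. This is a sketch under stated assumptions, not the package's implementation: `code_chunks` and `fake_annotate` are hypothetical, the LLM is replaced by a stub, and the retry behaviour is only one plausible way to handle transient API errors.

```python
def code_chunks(chunks, annotate, max_retries=1):
    """Apply `annotate` to every chunk and tag each returned code with its origin."""
    all_codes = []
    for index, chunk in enumerate(chunks, start=1):
        for attempt in range(max_retries + 1):
            try:
                codes = annotate(chunk["text"])
                break
            except RuntimeError as err:
                print(f"Error on chunk {index} (attempt {attempt + 1}): {err}")
        else:
            continue  # every attempt failed, so skip this chunk
        for code in codes:
            all_codes.append({**code, "chunk_analyzed": chunk["text"], "source": chunk["source"]})
    return all_codes


calls = {"n": 0}


def fake_annotate(text):
    """Stub standing in for the LLM call; fails once to exercise the retry path."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("temporary failure")
    return [{"code": "demo code", "excerpt": text[:20]}]


demo = [
    {"text": "First transcript chunk.", "source": "a.txt"},
    {"text": "Second transcript chunk.", "source": "b.txt"},
]
coded = code_chunks(demo, fake_annotate)
print(len(coded), "codes generated")
```

Keeping the analysed chunk and its source next to each code makes it possible to trace every code back to the transcript.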

In [None]:
# Import GenerateCodes
from TA_using_LLMs.logic import GenerateCodes

In [None]:
code = GenerateCodes(llm=model_manager.llm, docs=docs, chunks=chunks, rqs=rqs)

In [None]:
# Save the file in a csv or json format
# The prompt is printed at the end of the output
cot_codes = code.cot_coding(filename="cot_codes.json")

Processing chunk 1
Model output: [{'code': 'Technological advancements in diabetes management', 'code_description': 'The participant highlights the advancements in diabetes management, specifically mentioning continuous glucose monitoring and insulin pumps.', 'excerpt': "Ah yes, monitoring has simply become a bit different, hasn't it? You can now measure blood glucose almost continuously, you have pumps, you can actually replace basal insulin with a base that is injected continuously.", 'speaker': 'TN17'}, {'code': 'Continuous Glucose Monitoring (CGM)', 'code_description': 'The participant specifically mentions continuous glucose monitoring as a significant change in diabetes management.', 'excerpt': 'You can now measure blood glucose almost continuously', 'speaker': 'TN17'}, {'code': 'Insulin Pump Therapy', 'code_description': 'The participant mentions insulin pumps as a technological advancement in diabetes treatment.', 'excerpt': 'you have pumps, you can actually replace basal insul

In [None]:
# Convert to a pandas dataframe
pd.DataFrame(cot_codes)

Unnamed: 0,code,code_description,excerpt,speaker,chunk_analyzed,source
0,Technological advancements in diabetes management,The participant highlights the advancements in...,"Ah yes, monitoring has simply become a bit dif...",TN17,Focus Group 5\nInformation about the appointme...,/content/drive/MyDrive/Master_Thesis/data/focu...
1,Continuous Glucose Monitoring (CGM),The participant specifically mentions continuo...,You can now measure blood glucose almost conti...,TN17,Focus Group 5\nInformation about the appointme...,/content/drive/MyDrive/Master_Thesis/data/focu...
2,Insulin Pump Therapy,The participant mentions insulin pumps as a te...,"you have pumps, you can actually replace basal...",TN17,Focus Group 5\nInformation about the appointme...,/content/drive/MyDrive/Master_Thesis/data/focu...
3,Confirmation of Hypoglycemia,The participant highlights the importance of c...,"Yes, ha, to have confirmation.",GA,"GA: Ah, did she also have a sensor?\nTN17: Yes...",/content/drive/MyDrive/Master_Thesis/data/focu...
4,Sensor as an Early Indicator,The participant describes the glucose sensor a...,It's just a good indication of that. Yes.,TN17,"GA: Ah, did she also have a sensor?\nTN17: Yes...",/content/drive/MyDrive/Master_Thesis/data/focu...
...,...,...,...,...,...,...
1169,Delayed hypoglycemia after alcohol consumption,The participant experienced delayed hypoglycem...,Apparently the liver was unable to adequately ...,,"hadn't eaten for a long time, I could assume t...",/content/drive/MyDrive/Master_Thesis/data/refl...
1170,Alcohol's Impact on Glucose Levels with GLP-1,The participant observed unexpectedly low bloo...,"After the lecture, it got late: we went to a w...",,"lecture. After the lecture, it got late: we we...",/content/drive/MyDrive/Master_Thesis/data/refl...
1171,Objective Measurement vs. Subjective Feeling,The participant highlighted the discrepancy be...,I learned how little I subjectively feel the l...,,"Apart from this event, 99% of my BG value was ...",/content/drive/MyDrive/Master_Thesis/data/refl...
1172,Consistent Regulation and Ease of Use,The participant expressed surprise at the body...,I was amazed at the consistency of the regulat...,,"Apart from this event, 99% of my BG value was ...",/content/drive/MyDrive/Master_Thesis/data/refl...


## Theme Generation

Perform a detailed thematic analysis on the previously generated codes. This step also supports zero-shot, few-shot, and chain-of-thought prompting; this demo uses chain-of-thought.

In [None]:
# Import GenerateThemes
from TA_using_LLMs.logic import GenerateThemes

In [None]:
theme = GenerateThemes(llm=model_manager.llm, rqs=rqs, json_codes_list=cot_codes)

In [None]:
# Save the file in a csv or json format
# The prompt is printed at the end of the output
cot_themes = theme.cot_themes("cot_themes.json")


        Objective: You are a qualitative researcher and are doing inductive
        (latent/semantic) reflexive Thematic analysis according to the book practical
        guide from Braun and Clark (2022).

        Steps:
        1. Group codes into subthemes: Organize related codes into subthemes, if needed, that
        capture shared meanings across the codes based on the research questions provided.
        When subthemes are present, provide a definition for each subtheme.

        2. Group subthemes into themes: Organize related subthemes (if present) or codes into a
        maximum of 6 distinct themes that capture shared meanings across the subthemes
        based on the research questions provided. A subtheme sits under a theme.
        It focuses on one particular aspect of that theme; it brings analytic attention
        and emphasis on this aspect. Use subthemes only when they are needed to
        bring emphasis to one particular aspect of a theme.
                        

In [None]:
# Convert to a pandas dataframe
pd.json_normalize(cot_themes)

Unnamed: 0,theme,theme_definition,subthemes,subtheme_definitions,supporting_quotes
0,Enhanced Understanding of Glucose Metabolism,This theme captures the profound impact of sel...,"[Impact of Diet and Lifestyle, Individual Vari...",[This subtheme highlights the residents' obser...,"[Yes, in the beginning I looked quite closely ..."
1,Increased Empathy and Understanding of Patient...,This theme reflects the transformative effect ...,[Appreciation for the Burden of Diabetes Manag...,[This subtheme highlights the residents' reali...,"[I was already aware of it, but it's like, I s..."


# Thematic Analysis with RAG

Perform the thematic analysis with additional supporting material from a RAG (retrieval-augmented generation) database. The LLM pulls relevant information from the database before beginning the analysis. This package uses a Chroma database.
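
Conceptually, retrieval ranks stored passages by similarity to a query and places the best matches in the prompt. The sketch below shows that idea with hand-written two-dimensional vectors in place of real embeddings; `retrieve` and `build_prompt` are hypothetical names, and the package's actual prompt layout is not shown here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, store, k=2):
    """Return the texts of the k stored items most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item["vector"]), reverse=True)
    return [item["text"] for item in ranked[:k]]


def build_prompt(task, context_docs):
    """Place the retrieved passages ahead of the task description."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Context:\n{context}\n\nTask:\n{task}"


# Toy store: vectors are made up for illustration, not real embeddings.
store = [
    {"text": "passage about glucose sensors", "vector": [1.0, 0.0]},
    {"text": "passage about thematic analysis", "vector": [0.0, 1.0]},
    {"text": "passage about both topics", "vector": [0.7, 0.7]},
]
top_docs = retrieve([1.0, 0.1], store, k=2)
print(build_prompt("Generate codes for this transcript chunk.", top_docs))
```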

## Load the RAG files

Files should be in PDF format and stored in a folder. This demo uses the files found in the GitHub repo.

In [None]:
# Access the folder you need from the repo
folder_path = f"{local_path}/RAG_files"

In [None]:
# Load the files
loader = FolderLoader(folder_path)
rag_docs = loader.load_pdf()

['local_repo/RAG_files/Flash glucose monitoring (FGM) A clinical review on glycaemic outcomes and impact on quality of life.pdf', 'local_repo/RAG_files/Flash glucose monitoring (FGM) A clinical review on glycaemic outcomes and impact on quality of life.pdf', 'local_repo/RAG_files/Flash glucose monitoring (FGM) A clinical review on glycaemic outcomes and impact on quality of life.pdf', 'local_repo/RAG_files/Flash glucose monitoring (FGM) A clinical review on glycaemic outcomes and impact on quality of life.pdf', 'local_repo/RAG_files/Flash glucose monitoring (FGM) A clinical review on glycaemic outcomes and impact on quality of life.pdf', 'local_repo/RAG_files/Flash glucose monitoring (FGM) A clinical review on glycaemic outcomes and impact on quality of life.pdf', 'local_repo/RAG_files/Flash glucose monitoring (FGM) A clinical review on glycaemic outcomes and impact on quality of life.pdf', 'local_repo/RAG_files/Flash glucose monitoring (FGM) A clinical review on glycaemic outcomes and

In [None]:
# Divide docs into smaller chunks for easy retrieval
rag_chunks = loader.split_text(rag_docs, chunk_size=1024, chunk_overlap=512)
len(rag_chunks)

2057

For scanned PDFs, the text must first be extracted with OCR. For this, we use a separate loader, as shown below.
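
For orientation, OCR on a scanned PDF typically means rendering each page to an image and running a recognizer on it. The sketch below uses the `pdf2image` and `pytesseract` packages, which wrap poppler and Tesseract. Whether ScannedPDFLoader works this way internally is an assumption, and `ocr_pdf_pages` is a hypothetical function.

```python
def ocr_pdf_pages(pdf_path, dpi=300, lang="eng"):
    """Return one OCR text string per page of a scanned PDF."""
    # Imported lazily so the function can be defined before the packages are installed.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=dpi)  # one PIL image per page
    return [pytesseract.image_to_string(page, lang=lang) for page in pages]
```

A higher `dpi` usually improves recognition at the cost of speed and memory; 300 is a common starting point rather than a tuned value.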

In [None]:
# Import ScannedPDFLoader
from TA_using_LLMs.logic import ScannedPDFLoader

In [None]:
# Access the folder you need from the repo
folder_path = f"{local_path}/ScannedPDFs_for_RAG"

In [None]:
# If not already available, install:
!apt-get install poppler-utils
!sudo apt install tesseract-ocr
!sudo apt install libtesseract-dev

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  poppler-utils
0 upgraded, 1 newly installed, 0 to remove and 29 not upgraded.
Need to get 186 kB of archives.
After this operation, 696 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 poppler-utils amd64 22.02.0-2ubuntu0.6 [186 kB]
Fetched 186 kB in 1s (212 kB/s)
Selecting previously unselected package poppler-utils.
(Reading database ... 124947 files and directories currently installed.)
Preparing to unpack .../poppler-utils_22.02.0-2ubuntu0.6_amd64.deb ...
Unpacking poppler-utils (22.02.0-2ubuntu0.6) ...
Setting up poppler-utils (22.02.0-2ubuntu0.6) ...
Processing triggers for man-db (2.10.2-1) ...


In [None]:
# Create an instance of ScannedPDFLoader
pdf_loader = ScannedPDFLoader(folder_path)

# Load the scanned documents and extract their text with OCR
scanned_docs = pdf_loader.lazy_load()

In [None]:
len(scanned_docs)

44

In [None]:
# Check that documents were extracted properly
scanned_docs[5].page_content

"TIC ANALYSIS\n\nlevel. Sometimes it might be more conceptual\nor latent. Conceptual pattern themes from the\nchildfree dataset include it’s making a choice that’s\nimportant or compensatory kids. These themes are\nconceptual because they dig down below surface\n| are united around an idea that isn’t necessarily obviously evident in the data.\n>th of these themes further in this chapter, and in Chapters Five and Seven.\nrel at which you're exploring shared meaning can vary dramatically, some\nt contain data extracts that on the surface appear quite dissimilar. And indeed,\na contradiction or dichotomisation might form\noo, the basis for a theme itself, if the theme focuses\nINT A contradiction In, OF on that dichotomisation; if that contradiction is\non of meaning can form the pattern (see ‘But what about contradiction’\n, and Box 4.7 later in this chapter).\nUnderstanding that basic definition and\nt's time to move on to the theme development phases. In these phases, you\n{ alliances 

In [None]:
# Divide into smaller chunks
scanned_chunks = pdf_loader.split_text(scanned_docs, chunk_size=1024, chunk_overlap=512)
len(scanned_chunks)

148

## Set up ChromaDB

This demo uses an OpenAI embedding model to store the RAG documents.

In [65]:
# Set up the embedding model
import getpass
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import OpenAIEmbeddings
openai_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

In [None]:
# Import ChromaVectorStoreManager
from TA_using_LLMs.logic import ChromaVectorStoreManager

In [None]:
# Initialize the DB with specific parameters
openai_recursive_db = ChromaVectorStoreManager(
    collection_name="openai_collection",
    embeddings=openai_embeddings,
    persist_directory="ChromaDB" # choose a local directory to store DB
)

In [None]:
# Add the chunks from the text-based PDFs
openai_recursive_db.add_documents(rag_chunks, empty_db=True)

# Add the chunks from the scanned PDFs to the same collection
openai_recursive_db.add_documents(scanned_chunks, empty_db=False)

Added 2057 documents to the collection 'openai_collection'.
Added 148 documents to the collection 'openai_collection'.


## Run Analysis with RAG

The RAG pipeline can optionally apply MMR (maximal marginal relevance) ranking and/or query decomposition; this demo uses both. Otherwise, the thematic analysis prompt is used as the retrieval query against the RAG database.
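
MMR picks documents that are relevant to the query but not redundant with documents already selected. The following is a minimal sketch of the standard greedy formulation using toy vectors, not the vector store's own implementation; the `mmr` function is hypothetical, and its `lambda_mult` parameter is named after the option LangChain retrievers commonly expose.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily select k document indices, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 is less relevant but different.
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
query = [1.0, 0.2]
print(mmr(query, docs, k=2, lambda_mult=0.5))  # balanced relevance and diversity
print(mmr(query, docs, k=2, lambda_mult=1.0))  # relevance only
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking, so both near-duplicates are returned; lowering it swaps one of them for the more distinct document.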

In [62]:
# Decomposition Prompt which uses the research questions as the input
template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
Generate multiple search queries related to: {questions} \n
Output (3 queries):"""
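
Once the LLM answers a decomposition prompt like this, the sub-queries have to be pulled out of free text before each can be sent to the retriever. How the package does this is not shown, so the following is only a sketch: `parse_sub_queries` is a hypothetical helper that assumes the model returns a numbered or bulleted list, and the demo response is invented.

```python
import re


def parse_sub_queries(llm_text, limit=3):
    """Extract numbered or bulleted lines from an LLM response."""
    queries = []
    for line in llm_text.splitlines():
        # Accept "1. text", "1) text", "- text", or "* text"; ignore headings and prose.
        match = re.match(r"^\s*(?:\d+[.)]|[-*])\s+(.*\S)", line)
        if match:
            queries.append(match.group(1))
    return queries[:limit]


demo_response = """## Sub-Questions

Here are 3 sub-questions:
1. How do glucose sensors change daily routines?
2. What do doctors learn about glucose metabolism?
3. How does wearing a sensor affect empathy for patients?
"""
sub_queries = parse_sub_queries(demo_response)
print(sub_queries)
```

Responses that wrap the numbering in other markup (for example bold text) would need a more permissive pattern.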

### Code Generation

In [63]:
# Create the retriever with MMR (maximal marginal relevance) search
retriever = openai_recursive_db.vector_store.as_retriever(search_type="mmr")
# Initialize GenerateCodes instance for RAG
code_generator = GenerateCodes(
    llm=model_manager.llm,
    docs=docs,  # or a broader document structure if needed
    chunks=chunks,
    rqs=rqs,
    examples=None,  # You can pass example codes here for few-shot prompting
    vector_db=openai_recursive_db.vector_store,
    retriever=retriever
)

In [None]:
# The end of the printed output shows the prompt including the decomposition queries
RAG_cot_codes = code_generator.cot_coding(use_rag=True, rag_query=template, filename = "RAG_cot_codes.json")

Processing chunk 1
Retrieved documents: ['* To address the other part of the research questions asking about the resonance in science, an \ninternational literature review and analysis was conducted in September and October 2015 (search-ing the databases of PubMed, MedPilot and Greenpilot) aiming to identify life sciences, health or medical literature that refers to data or knowledge produced through QS activities. In addition, expert interviews were carried out in October and November 2015 with six German health scien-tists and medical experts on the quality, relevance and (potential) benefits of self-tracking data and knowledge for the respondents’ medical or scientific practice. Recruited through email or personal contact, they were selected as leading experts in the fields of digital medical practice, medical informatics, mobile health, sports science, sleep research and epidemiology.\nGiven the focus on the knowledge produced by self-tracking research, this article primarily [Sour



Retrieved documents: ['* 6. Has it changed your behaviour?\n7. Did you tell people about your experience?\nAttitude 8. What is your opinion of these devices?\n9. Do you feel anything negative about monitoring your health in this \nway?\n10. Do you feel these devices are reliable and safe?\n11. Do you perceive any stigma associated with these devices?TABLE 4\u2003Summary of interview guide \nfor focus group participants.\nTABLE 5\u2003Summary of interview guide for non-  focus group \nparticipants.\nSection Question\nAttitude 1. Are you familiar with activity-  tracking devices, \nhave you ever used one?\n2. What is your view about using activity-  tracking \ndevices to monitor health?\n3. What do you think about the idea of nurses \nwearing activity-  tracking devices to monitor \ntheir health; how practical  and acceptable  do \nyou think this would be? [Source: /content/drive/MyDrive/Master_Thesis/RAG_files/Wearable activity trackers for nurses  health  A qualitative acceptability st



Error occurred while processing chunk 167 in /content/drive/MyDrive/Master_Thesis/data/focus_group_3.txt: 500 An internal error has occurred. Please retry or report in https://developers.generativeai.google/guide/troubleshooting
Processing chunk 167




Retrieved documents: ['* PUBMED\n 30. Nittas V , Lun P, Ehrler F, Puhan MA, Mütsch M. Electronic patient-generated health data to facilitate \ndisease prevention and health promotion: scoping review. J Med Internet Res  2019;21(10):e13320. \nPUBMED | CROSSREF\n 31. Kim HS. Apprehensions about excessive belief in digital therapeutics: points of concern excluding merits. \nJ Korean Med Sci  2020;35(45):e373. \nPUBMED | CROSSREF\n 32. Kim SK, Kim HJ, Kim T, Hur KY, Kim SW , Lee MK, et al. Effectiveness of 3-day continuous glucose \nmonitoring for improving glucose control in type 2 diabetic patients in clinical practice. Diabetes Metab J  \n2014;38(6):449-55. \nPUBMED | CROSSREF\n 33. Cappon G, Vettoretti M, Sparacino G, Facchinetti A. Continuous glucose monitoring sensors for diabetes \nmanagement: a review of technologies and applications. Diabetes Metab J  2019;43(4):383-97. \nPUBMED | CROSSREF\n 34. Kim HS, Shin JA, Chang JS, Cho JH, Son HY, Yoon KH. Continuous glucose monitoring: cur



Model output: [{'code': 'Sensor Adaptation', 'code_description': 'The participant discussed the initial discomfort of wearing the sensor and the eventual acclimation to it.', 'excerpt': "I think you get used to wearing them over time, but I'm glad I don't have to.", 'speaker': None}, {'code': 'Enhanced Understanding of Diabetes Management', 'code_description': 'The participant acknowledged that wearing the sensor provided a clearer picture of the challenges faced by diabetic patients in managing their condition.', 'excerpt': 'The picture of a diabetic patient and the necessary measures for successful therapy become much clearer when you wear the sensor', 'speaker': None}, {'code': 'Scientific Detachment', 'code_description': 'The participant admitted to approaching the experience from a scientific perspective, which hindered their ability to fully empathize with the patient experience.', 'excerpt': "but I saw it from a more scientific point of view and therefore I didn't fully empathiz

In [None]:
# Convert to a pandas dataframe
pd.DataFrame(RAG_cot_codes).head()

Unnamed: 0,code,code_description,excerpt,speaker,chunk_analyzed,source,RAG_query,retrieved_documents
0,Technological advancements in diabetes care,The participant highlights the evolution of di...,"Ah yes, monitoring has simply become a bit dif...",TN17,Focus Group 5\nInformation about the appointme...,/content/drive/MyDrive/Master_Thesis/data/focu...,## Sub-Questions and Search Queries:\n\nHere a...,[* To address the other part of the research q...
1,Technological advancements in diabetes care,The participant describes the current state of...,"Ah yes, monitoring has simply become a bit dif...",TN17,"GA: Someone can start.\nTN17: Yes, simply by s...",/content/drive/MyDrive/Master_Thesis/data/focu...,Here are 3 sub-questions and related search qu...,[* post- it notes with the \nhelp of the group...
2,Experience with patients using glucose sensors,The participant confirms having experience tre...,"Yes, just now on the ward yes.",TN17,"GA: Someone can start.\nTN17: Yes, simply by s...",/content/drive/MyDrive/Master_Thesis/data/focu...,Here are 3 sub-questions and related search qu...,[* post- it notes with the \nhelp of the group...
3,Exposure to sensor technology prior to the study,The participant reveals their familiarity with...,"Even before the study, yes.",TN17,"GA: Someone can start.\nTN17: Yes, simply by s...",/content/drive/MyDrive/Master_Thesis/data/focu...,Here are 3 sub-questions and related search qu...,[* post- it notes with the \nhelp of the group...
4,Confirmation of clinical suspicion,The participant used the glucose sensor data t...,"Yes, and she was in a hypo beforehand. So and ...",TN17,"GA: Yes, and have you already looked after pat...",/content/drive/MyDrive/Master_Thesis/data/focu...,Here are 3 sub-questions and related search qu...,[* self-testing and the related risk factors i...


### Theme Generation

In [None]:
# Initialize GenerateThemes instance for RAG
theme_generator = GenerateThemes(
    llm=model_manager.llm,
    rqs=rqs,
    json_codes_list=RAG_cot_codes,
    examples=None,  # You can pass example codes here if needed
    vector_db=openai_recursive_db.vector_store,
    retriever=retriever
)

In [None]:
# The end of the printed output shows the prompt including the decomposition queries
RAG_cot_themes = theme_generator.cot_themes(use_rag=True,
                                       rag_query=template,
                                       filename="RAG_cot_themes.json")

Context: ['* research programmes. A genuine research subject are the (intended) effects of using QS technologies and users’ experiences in their everyday lives and in clinical practice (e.g. by doing self-monitoring). In addi-tion, one can observe the scientific use of QS data as well as the use of QS technologies and QS methods to collect data. Finally, there is a technical-scientific discourse about the development and quality checks \nof new sensors, systems, algorithms, platforms and apps in the QS field.\nReferences\nAbend P and Fuchs M (eds) (2016) Quantified selves and statistical bodies. Digital Culture & Society 2(1). \nBielefeld: Transcript.\nAltman LK (1987) Who Goes First? The Story of Self-Experimentation in Medicine. Berkeley, CA: University \nof California Press. [Source: /content/drive/MyDrive/Master_Thesis/RAG_files/heyen-2019-from-self-tracking-to-self-expertise-the-production-of-self-related-knowledge-by-doing-personal-science.pdf]', '* adherence, patient satisfactio

In [None]:
# Convert to a pandas dataframe
pd.json_normalize(RAG_cot_themes)

Unnamed: 0,theme,theme_definition,subthemes,subtheme_definitions,supporting_quotes
0,Self-Tracking Enhances Understanding of Glucos...,This theme captures the impact of self-trackin...,[Unexpected Glucose Fluctuations and Influenci...,[This subtheme highlights the residents' surpr...,"[But on a normal day, for example, I get up, I..."
1,Self-Tracking Fosters Empathy and Understandin...,This theme explores how self-tracking with a g...,"[Shared Experience and Increased Empathy, Real...",[This subtheme captures the increased empathy ...,[I could show the patient a lot more understan...
2,"Sensor Technology: Advantages, Limitations, an...",This theme encompasses the residents' perspect...,"[Convenience and Improved User Experience, Tec...",[This subtheme focuses on the perceived advant...,"[The advantage, at least as I perceived it, is..."
3,Patient-Specific Considerations and the Import...,This theme emphasizes the importance of consid...,"[Diverse Patient Needs and Preferences, Patien...",[This subtheme acknowledges the wide range of ...,[I think it depends on the type of patient we ...
4,"The Future of Diabetes Management: Technology,...",This theme explores the residents' perspective...,"[Advancements in Technology and Automation, Po...",[This subtheme focuses on the rapid advancemen...,[I think it's certainly getting easier. So I h...
5,Self-Tracking as a Learning Tool in Medical Ed...,This theme explores the potential benefits and...,[Experiential Learning and Enhanced Understand...,[This subtheme emphasizes the value of experie...,[So I also think it gives you a certain securi...
