https://yunfeixie233.github.io/MedTrinity-25M/ \
https://github.com/UCSC-VLAA/MedTrinity-25M/tree/master \
https://huggingface.co/yunfeixie/LLaVA-Tri-Pretrained \
https://huggingface.co/yunfeixie/LLaVA-Tri-PathVQA

In [1]:
import os
import pandas as pd
import json

#### Reading the Pathology Test Data

In [2]:
questions = pd.read_csv("../pathology_test_data/questions.csv")
questions

Unnamed: 0,image_name,question,original_responses
0,ck_PTGC_2x.jpeg,How can you best describe the low-power patter...,The lymph node shows a mixed follicular and in...
1,ck_PTGC_2x.jpeg,What are three main differential diagnostic co...,"Based on the low-power pattern, the primary co..."
2,ck_PTGC_5x.jpeg,What is the morphologic alteration being depic...,"The image shows a central enlarged, somewhat i..."
3,ck_PTGC_5x.jpeg,What is the expected immunoarchitecture of the...,This image shows an enlarged secondary follicl...
4,ck_serositis_4x.jpg,What is the most common source for the change ...,"The changes here show extensive serositis, wit..."
5,ck_serositis_4x.jpg,What is the specific anatomic region shown in ...,The right half shows the muscularis propria an...
6,ck_steatohepatitis_100x.jpg,If these histologic changes included conspicuo...,There is fatty liver disease and if concurrent...
7,ck_steatohepatitis_100x.jpg,What would the primary histologic feature to s...,Wilson's disease has many non-specific finding...
8,ck_steatohepatitis_200x.jpg,In an overweight adolescent with mildly increa...,"In the background of steatosis, there is incre..."


In [3]:
images_path = os.path.join(os.getcwd(), "../pathology_test_data/images")
images_list = os.listdir(images_path)
images_list

['ck_steatohepatitis_100x.jpg',
 'ck_PTGC_2x.jpeg',
 'ck_serositis_4x.jpg',
 'ck_steatohepatitis_200x.jpg',
 'ck_PTGC_5x.jpeg']

#### Creating a message structure

In [4]:
# One message per question: the row index doubles as the question id, and the
# question text is the first (and only) conversation turn.
messages = []
for index, row in questions.iterrows():
    message = {
        "id": index,
        "image": row["image_name"],
        "conversations": [{"value": row["question"]}],
    }
    messages.append(message)
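
Before running inference, it can help to confirm that every image referenced by a message is actually present in the images folder. The following is a minimal sketch: `find_missing_images` is a hypothetical helper (not part of the MedTrinity-25M code), and the sample data is illustrative only.

```python
def find_missing_images(messages, available_images):
    """Return the sorted image names referenced by messages but absent from available_images."""
    available = set(available_images)
    return sorted({m["image"] for m in messages if m["image"] not in available})


# Illustrative data only; in the notebook, pass `messages` and `images_list`.
example_messages = [
    {"id": 0, "image": "ck_PTGC_2x.jpeg", "conversations": [{"value": "q0"}]},
    {"id": 1, "image": "not_on_disk.jpg", "conversations": [{"value": "q1"}]},
]
example_images = ["ck_PTGC_2x.jpeg", "ck_PTGC_5x.jpeg"]

missing = find_missing_images(example_messages, example_images)
print(missing)
```

In the notebook, `find_missing_images(messages, images_list)` should return an empty list if the CSV and the images folder agree.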

#### Writing the question file, one entry per question/image pair

In [5]:
os.makedirs("./temp_files", exist_ok=True)
file_name = "medtrinity_llava_tri_8b.jsonl"
file_path = os.path.join("./temp_files", file_name)
with open(file_path, "w") as f:
    json.dump(messages, f, indent=4)
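
Note that `json.dump(messages, f, indent=4)` writes a single indented JSON array, not one object per line, so the file is JSON Lines in name only. Whether that matters depends on how the inference script parses the question file, which this notebook does not show. If a loader expects true JSON Lines, a writer could look like the sketch below. `write_jsonl` is a hypothetical helper, and the records and temporary path are illustrative.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one compact JSON object per line (JSON Lines)."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


# Illustrative records and a temporary path, so the sketch does not touch ./temp_files.
demo_records = [
    {"id": 0, "image": "a.jpg", "conversations": [{"value": "first question"}]},
    {"id": 1, "image": "b.jpg", "conversations": [{"value": "second question"}]},
]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
write_jsonl(demo_records, demo_path)

# Reading back line by line recovers the original records.
with open(demo_path) as f:
    round_trip = [json.loads(line) for line in f]
```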

#### Running the inference code from the command line

In [None]:
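# `!export` only affects a throwaway subshell; set the variable on the kernel so later `!` commands inherit it.
os.environ["PYTHONPATH"] = "/data/mn27889/path-open-data/MedTrinity-25M:" + os.environ.get("PYTHONPATH", "")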

#### LLaVA-Tri Pretrained

In [None]:
!CUDA_VISIBLE_DEVICES=0 python llava/eval/model_vqa.py \
--model-path yunfeixie/LLaVA-Tri-Pretrained \
--image-folder /data/mn27889/path-open-data/pathology_test_data/images \
--question-file /data/mn27889/path-open-data/vlm_pathology_test_data_responses/temp_files/medtrinity_llava_tri_8b.jsonl \
--answers-file /data/mn27889/path-open-data/vlm_pathology_test_data_responses/temp_files/medtrinity_llava_tri_pretrained_8b_resp.jsonl

#### Reading the Responses file

In [6]:
resp_file_name = "medtrinity_llava_tri_pretrained_8b_resp.jsonl"
resp_file_path = os.path.join("./temp_files", resp_file_name)

indices = []
llava_responses = []
with open(resp_file_path, "r") as f:
    for line in f:
        json_data = json.loads(line)
        indices.append(json_data["question_id"])
        llava_responses.append(json_data["text"])

In [7]:
responses = pd.Series(llava_responses, index=indices)
responses

0    The low-power pattern observed in this image o...
1    If the lymph node was sampled from a young tee...
2    The morphologic alteration depicted in the enl...
3    The expected immunoarchitecture of the mantle ...
4    The most common source for the change depicted...
5    The right half of the image shows the anatomic...
6    If the histologic changes included conspicuous...
7    The primary histologic feature to suggest a di...
8    The portal changes in the histology image sugg...
dtype: object
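
The reading cell is repeated verbatim for the second model, so a small helper can keep the two copies in step. This is a sketch only: `load_responses` is a hypothetical name, the `question_id` and `text` keys are the ones used in the cell above, and the demo answers file is made up for illustration.

```python
import json
import os
import tempfile


def load_responses(path):
    """Read a JSON Lines answers file into a {question_id: text} dict."""
    responses = {}
    with open(path, "r") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            responses[record["question_id"]] = record["text"]
    return responses


# Illustrative answers file; the real one is produced by the inference script.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_resp.jsonl")
with open(demo_path, "w") as f:
    f.write(json.dumps({"question_id": 0, "text": "fatty change"}) + "\n")
    f.write("\n")
    f.write(json.dumps({"question_id": 1, "text": "malignant lymphoma"}) + "\n")

demo_responses = load_responses(demo_path)
```

Wrapping the result as `pd.Series(load_responses(resp_file_path))` gives a Series indexed by question id, matching the one built above.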

In [8]:
os.makedirs("./responses", exist_ok=True)
questions['medtrinity-llava-tri-pretrained-8b-response'] = responses
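
Assigning a Series to a DataFrame column aligns on index labels rather than position, so a question whose id never appears in the answers file silently ends up as `NaN`. The sketch below illustrates that behaviour on made-up data (the frame, the Series, and the `response` column name are all illustrative) and shows one way to list unanswered ids.

```python
import pandas as pd

# Illustrative frame and responses; id 2 has no response on purpose,
# and the responses arrive out of order.
demo_questions = pd.DataFrame({"question": ["q0", "q1", "q2"]})
demo_responses = pd.Series(["answer 1", "answer 0"], index=[1, 0])

# Column assignment aligns on index labels, not on position.
demo_questions["response"] = demo_responses

# Rows whose id never appeared in the responses are left as NaN.
unanswered = demo_questions.index[demo_questions["response"].isna()].tolist()
print(unanswered)
```

An empty `unanswered` list on the real `questions` frame would indicate that every question received a response.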

#### LLaVA-Tri PathVQA

In [None]:
!CUDA_VISIBLE_DEVICES=0 python llava/eval/model_vqa.py \
--model-path yunfeixie/LLaVA-Tri-PathVQA \
--image-folder /data/mn27889/path-open-data/pathology_test_data/images \
--question-file /data/mn27889/path-open-data/vlm_pathology_test_data_responses/temp_files/medtrinity_llava_tri_8b.jsonl \
--answers-file /data/mn27889/path-open-data/vlm_pathology_test_data_responses/temp_files/medtrinity_llava_tri_pathvqa_8b_resp.jsonl

#### Reading the Responses file

In [9]:
resp_file_name = "medtrinity_llava_tri_pathvqa_8b_resp.jsonl"
resp_file_path = os.path.join("./temp_files", resp_file_name)

indices = []
llava_responses = []
with open(resp_file_path, "r") as f:
    for line in f:
        json_data = json.loads(line)
        indices.append(json_data["question_id"])
        llava_responses.append(json_data["text"])

In [10]:
responses = pd.Series(llava_responses, index=indices)
responses

0         merging capsules and large areas of necrosis
1                                   malignant lymphoma
2                          a possible precursor lesion
3                         a complete hydatidiform mole
4                             hyaline membrane disease
5                               the region of interest
6    the characteristic perisinusoidal chicken wire...
7    the characteristic perisinusoidal chicken wire...
8                                         fatty change
dtype: object

In [11]:
os.makedirs("./responses", exist_ok=True)
questions['medtrinity-llava-tri-pathvqa-8b-response'] = responses

In [12]:
questions.to_csv("./responses/medtrinity_llava_tri_8b_responses.csv", index=False)