Thank you so much for the excellent package. I cannot share the exact code and objective here, but the code below is almost the same, and the images reproduce the issue clearly.
I am trying to run batch inference on a PDF with an LLM (Mistral) and have run into an issue. My use case is to identify the tone (or some other objective) of each page of the PDF.
I referred to the links below to implement the code:
The code below should be easy to run to reproduce the issue.
The input is a sample PDF containing text (my PDFs have more than 200 pages).
The PDF that triggered the issue has 231 pages, as the logs show.
Code:
import os
import re
import gc
import ray
import time
import torch
import pdfplumber
import numpy as np
import pandas as pd
from tqdm import tqdm
from typing import Dict
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
def load_large_model():
    model_name_or_path = "mistralai/Mistral-7B-Instruct-v0.1"
    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        device_map="cuda:0",
        trust_remote_code=False,
        revision="main",
        load_in_4bit=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path,
                                              use_fast=True)
    return model, tokenizer

def get_data(pdf):
    all_pages_text = []
    for pg in range(len(pdf.pages)):
        result = {}
        page = pdf.pages[pg]
        text = page.extract_text(layout=False)
        text = text.strip("\n")
        prompt = "Some Random/Example Prompt here :{}"
        new_prompt = prompt.format(text)
        template = f'''<s>[INST] {new_prompt} '~'[/INST]'''
        all_pages_text.append(template)
    return all_pages_text

def document_data_style(pdf, model, tokenizer):
    output = []
    all_pages_text = get_data(pdf)
    ds = ray.data.from_numpy(np.asarray(all_pages_text))
    print("total items :", ds.count())
    print("printing dataset inspection execution stats:", ds.stats())
    BATCH_SIZE = 5  # maximum batch size that can fit into memory

    class LLM_classifier:
        def __init__(self):
            self.detector = pipeline(
                "text-generation",
                model=model,
                tokenizer=tokenizer,
                max_new_tokens=512,
                do_sample=True,
                temperature=0.3,
                top_p=0.95,
                top_k=40,
                repetition_penalty=1.1,
            )
            # padding for batching to same size
            self.detector.tokenizer.pad_token_id = model.config.eos_token_id

        def __call__(self, batch: Dict[str, np.ndarray]):
            outputs = self.detector(list(batch["data"]), batch_size=BATCH_SIZE)
            batch["result"] = [output[0]['generated_text'] for output in outputs]
            return batch  # returning dictionary

    predictions = ds.map_batches(
        LLM_classifier,
        # Use 4 GPUs so we will have 4 actors. Change this number based on the number of GPUs in your cluster.
        compute=ray.data.ActorPoolStrategy(size=4),
        num_gpus=1,  # Specify 1 GPU per model/actor replica.
        batch_size=BATCH_SIZE,  # Use the largest batch size that can fit on our GPUs
    )
    final_predictions = []
    for pg_no, prediction in enumerate(predictions.take_all()):
        response = prediction["result"]
        split_result = response.split('~')[-1]
        final_predictions.append(split_result)
    # removing ds ray variable from memory
    del ds
    return final_predictions

def identify_tone(pdf):
    ray.init(num_gpus=torch.cuda.device_count())
    model, tokenizer = load_large_model()
    doc_style = document_data_style(pdf, model, tokenizer)
    del model
    del tokenizer
    return doc_style

start = time.time()
pdf_path = 'sample_pdf.pdf' # I have pdfs of 300 to 500 pages.
pdf = pdfplumber.open(pdf_path)
doc_styles = identify_tone(pdf)
end = time.time()
print("time:", end-start)
print(doc_styles)
# List all PyTorch tensors and their sizes currently allocated on GPU
print(torch.cuda.memory_summary())
print(torch.cuda.memory_allocated())
print(torch.cuda.memory_reserved())
ray.shutdown()
gc.collect()
torch.cuda.empty_cache()
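A side note on the post-processing step in the code above: with the default settings of the text-generation pipeline, `generated_text` contains the prompt followed by the completion, so `response.split('~')[-1]` returns everything after the last `~`, which still includes the trailing `'[/INST]` of the template. A minimal sketch of an alternative that splits on the `[/INST]` marker instead is shown here; `extract_answer` is a hypothetical helper name, not part of the original code, and it assumes the page text itself never contains `[/INST]`.

```python
def extract_answer(generated_text: str, marker: str = "[/INST]") -> str:
    """Return the text after the last occurrence of `marker`, stripped.

    Hypothetical helper for illustration. If `marker` is absent, the whole
    text is returned (stripped), because rpartition then yields
    ("", "", generated_text).
    """
    _, _, tail = generated_text.rpartition(marker)
    return tail.strip()


# Example string shaped like the template used in get_data().
full_text = "<s>[INST] Some prompt: page text '~'[/INST] The tone is formal."

print(repr(full_text.split("~")[-1]))  # what the original split keeps
print(repr(extract_answer(full_text)))  # completion only
```

Splitting on the instruction-closing marker keeps only the completion, which may or may not be what the original `~` marker was meant to achieve.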
Logs:
Since the logs are a little long, I have put them in a text file and shared the Drive link here
Dependencies:
I am using all the required packages (transformers, accelerate, deepspeed, ray) at their most recent stable versions as of this date.
GPU Configuration:
At the time of the screenshot there was a small process running on cuda:0, but please treat all GPUs as empty.
Issue:
I am unable to run the per-page PDF check in parallel. I am pasting the first lines of the logs here, but please check the full logs above.
Initially it used all 4 GPUs, but later it used only one. Please check the logs for more detail, since the issue shows up at the end of the logs.
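One possibility worth checking (an assumption, not something the logs here confirm) is how the 231 prompts are divided into blocks. If a single array passed to `ray.data.from_numpy` ends up as one block, and a block is the unit of work handed to an actor, then one actor would process every batch while the other three sit idle after loading. The arithmetic can be sketched in plain Python; `split_into_blocks` and `batches_per_block` are hypothetical helpers, and the even split is only an illustration, not Ray's actual partitioning logic.

```python
import math
from typing import List


def split_into_blocks(n_items: int, n_blocks: int) -> List[int]:
    """Sizes of n_blocks near-equal blocks covering n_items (illustrative)."""
    base, extra = divmod(n_items, n_blocks)
    return [base + 1 if i < extra else base for i in range(n_blocks)]


def batches_per_block(block_sizes: List[int], batch_size: int) -> List[int]:
    """Number of batches each block yields at the given batch size."""
    return [math.ceil(size / batch_size) for size in block_sizes]


PAGES = 231      # page count reported in the issue
BATCH_SIZE = 5   # batch size used in the issue's code

one_block = batches_per_block(split_into_blocks(PAGES, 1), BATCH_SIZE)
four_blocks = batches_per_block(split_into_blocks(PAGES, 4), BATCH_SIZE)

print(one_block)    # [47]: all batches in a single unit of work
print(four_blocks)  # [12, 12, 12, 12]: work that four actors could share
```

Under that assumption, having at least as many blocks as actors is what lets the batches spread across all four GPUs.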
How can I run the code on all 4 GPUs in parallel? What am I missing, even though I followed the official docs?
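For comparison, here is a hedged sketch of a restructuring closer to the pattern in the Ray batch inference docs, where each actor builds its own model inside `__init__` instead of receiving one loaded in the driver process. It is not confirmed as the fix for this issue. `PageClassifier`, `run`, and `NUM_GPUS` are hypothetical names; the `repartition` call is an assumption meant to give every actor a block to work on; and the sketch relies on Ray exposing only the assigned GPU to each actor, so that `cuda:0` inside an actor refers to that actor's own device. Sampling parameters from the original code are omitted for brevity.

```python
from typing import Any, Dict, List

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.1"
BATCH_SIZE = 5
NUM_GPUS = 4  # assumption: one actor per GPU, as in the original code


class PageClassifier:
    """Callable class for map_batches; each actor creates one instance."""

    def __init__(self):
        # Imported and loaded inside the actor so that every replica gets
        # its own model copy on the GPU that Ray assigned to it.
        from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

        tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_NAME, device_map="cuda:0", load_in_4bit=True
        )
        tokenizer.pad_token_id = model.config.eos_token_id  # pad for batching
        self.detector = pipeline(
            "text-generation", model=model, tokenizer=tokenizer,
            max_new_tokens=512,
        )

    def __call__(self, batch: Dict[str, Any]) -> Dict[str, Any]:
        outputs = self.detector(list(batch["data"]), batch_size=BATCH_SIZE)
        batch["result"] = [out[0]["generated_text"] for out in outputs]
        return batch


def run(prompts: List[str]) -> List[str]:
    """Run the classifier over all prompts with one actor per GPU."""
    import numpy as np
    import ray

    # Assumption: splitting into NUM_GPUS blocks lets each actor take a share.
    ds = ray.data.from_numpy(np.asarray(prompts)).repartition(NUM_GPUS)
    predictions = ds.map_batches(
        PageClassifier,
        compute=ray.data.ActorPoolStrategy(size=NUM_GPUS),
        num_gpus=1,
        batch_size=BATCH_SIZE,
    )
    return [row["result"] for row in predictions.take_all()]
```

With this layout the driver never loads the model, so nothing large has to be serialized into the actors. If page order matters, it may be safer to carry a page index alongside each prompt than to rely on the order in which results come back.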
Does anyone have an idea, or can anyone help me understand or resolve the issue?
Thanks in advance.