for idx, sample in tqdm(enumerate(dataloader), total=len(dataloader)): for prediction #51

shayal01 · 2023-09-09T00:51:09Z

If we use just a single PDF, the sample is a list, but inference expects an image tensor, so the code below does not work. It should be passed sample[0] instead, where sample[0] is the tensor stored at index 0 of the list:
model_output = model.inference(image_tensors=sample)

This is a function where I pass a single PDF file and make predictions for each page:
from functools import partial

import torch
from tqdm import tqdm

from nougat import NougatModel
from nougat.postprocessing import markdown_compatible
from nougat.utils.dataset import LazyDataset


def predict():
    # Load the pretrained Nougat weights in bfloat16.
    model = NougatModel.from_pretrained("C:/Users/sshamsu/Documents/New folder/nougat weights").to(torch.bfloat16)
    if torch.cuda.is_available():
        model.to("cuda")

    # LazyDataset takes the file path of the PDF and a function that prepares each page image.
    dataset = LazyDataset(
        "C:/Users/sshamsu/Downloads/research paper for Nought.pdf",
        partial(model.encoder.prepare_input, random_padding=False),
    )
    dataloader = torch.utils.data.DataLoader(
        dataset,
        batch_size=1,
        shuffle=False,
        collate_fn=LazyDataset.ignore_none_collate,
    )

    prediction = []
    for page_num, page_as_tensor in tqdm(enumerate(dataloader), total=len(dataloader)):
        # page_as_tensor is a list here; index 0 holds the image tensor.
        model_output = model.inference(image_tensors=page_as_tensor[0])
        output = markdown_compatible(model_output["predictions"][0])
        prediction.append(output)

    # Join the per-page outputs into one .mmd string.
    final_mmd = "".join(prediction).strip()
    return final_mmd

lukas-blecher · 2023-09-09T10:29:38Z

What is the issue exactly?
Your code only works for batch size = 1
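One likely reason for the batch-size limitation is that the loop keeps only model_output["predictions"][0], so any further pages in the same batch are dropped. A minimal sketch of collecting every prediction in each batch follows; join_predictions and its postprocess parameter are hypothetical names used for illustration (postprocess stands in for markdown_compatible), and the dictionaries mimic only the "predictions" key seen in the code above.

```python
def join_predictions(model_outputs, postprocess=lambda text: text):
    """Concatenate the predictions of all batches, page by page.

    model_outputs: iterable of dicts, each with a "predictions" list
                   holding one string per page in the batch.
    postprocess:   callable applied to every page string (hypothetical
                   stand-in for markdown_compatible).
    """
    pages = []
    for model_output in model_outputs:
        # Iterate over the whole batch instead of taking only index 0.
        for page_text in model_output["predictions"]:
            pages.append(postprocess(page_text))
    return "".join(pages).strip()


# Example with two fake batches of sizes 2 and 1.
fake_outputs = [{"predictions": ["a", "b"]}, {"predictions": ["c "]}]
result = join_predictions(fake_outputs)
```

With batch_size=1 this behaves like the original loop; with larger batches it keeps every page in order.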

shayal01 · 2023-09-09T21:03:07Z

for page_num, page_as_tensor in tqdm(enumerate(dataloader)):
    model_output = model.inference(image_tensors=page_as_tensor[0])

If I don't index page_as_tensor with 0, an error is raised because page_as_tensor is a list. Maybe that is because I am doing it for just one paper, but predict.py and app.py do not use the index. So is this also an issue when using multiple PDFs?
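A defensive way to cope with both cases, whether the dataloader yields the tensor directly or wrapped in a list, could look like the sketch below. unwrap_sample is a hypothetical helper, not part of Nougat, and it assumes the image tensor is always the first element when the sample is a list or tuple.

```python
def unwrap_sample(sample):
    """Return the image tensor from a dataloader sample.

    Assumption: if the sample is a list or tuple, the tensor sits at
    index 0; otherwise the sample already is the tensor.
    """
    if isinstance(sample, (list, tuple)):
        return sample[0]
    return sample
```

It would be used as model.inference(image_tensors=unwrap_sample(page_as_tensor)), so the same loop works regardless of which form the collate function produces.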

lukas-blecher closed this as completed Sep 16, 2023
