Filling up RAM #244

Open
v-artur opened this issue Sep 22, 2023 · 0 comments
Comments


v-artur commented Sep 22, 2023

I am using keras-ocr to extract the number of words, and the words themselves, from images.

My dataset has around 58,000 images. I feed them one by one into the pipeline in a for loop, appending the results to two lists along the way. The images are stored on Colab's disk.

For some reason, after about 2,500-3,000 images, Colab's 12 GB of RAM is completely used up.

As a workaround, every 1000 iterations I dump a JSON file with the current lists to my Drive.

Then I restart the kernel and load the lists back from the JSON backups; that barely touches the RAM.
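(For reference, restoring the lists after a restart looks roughly like this; the filename/path are placeholders:)

import json

with open(output_path + 'kerasocr_1000.json') as f:
    backup = json.load(f)
list_of_wc = backup['num_of_words']
list_of_words = backup['the_words']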

It's quite annoying to restart the kernel after every 2.5k images, and gc.collect() does nothing.

Is it my code, or does the OCR pipeline fill up RAM despite any attempt to clear it?

Here are the important parts of the code:

import json
from time import time

import cv2
import keras_ocr

pipeline = keras_ocr.pipeline.Pipeline()

t0 = time()
list_of_wc = []
list_of_words = []

output_path = "..."

# image_df.image_name holds the image file names
for k, img in enumerate(image_df.image_name):
    if k % 500 == 0:
        print(time() - t0, k)

    # read the image from disk and convert BGR -> RGB
    img_cv = cv2.imread(r"..." + img, cv2.IMREAD_COLOR)
    image_rgb = cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB)

    # run OCR on the single image; returns one prediction group per image
    prediction_groups = pipeline.recognize([image_rgb])

    # word count and the recognized words (each prediction is a (word, box) pair)
    list_of_wc.append(len(prediction_groups[0]))
    list_of_words.append([word for word, box in prediction_groups[0]])

    # back up the lists to Drive (similar dumps happen every 1000 iterations)
    if k == 999:
        with open(output_path + 'kerasocr_1000.json', 'w') as f:
            json.dump({'num_of_words': list_of_wc, 'the_words': list_of_words}, f)
