RandomOverSampler().fit_resample seems to consume more memory in the new version #923

@Piecer-plc

Describe the bug

  • Hello, I have found that RandomOverSampler().fit_resample uses a different amount of memory in different versions. In my program, with imblearn 0.8.1 the peak memory of RandomOverSampler().fit_resample was 1232MB, but when I switched to imblearn 0.7.1, the peak memory dropped to 616MB.
  • I used tracemalloc to locate the API causing the memory increase and found that it was RandomOverSampler().fit_resample from imblearn. The other APIs use the same amount of memory from version to version.
    My question is: why does the newer version consume more memory, and is there a way to fix it?
| imblearn Version | Peak Memory (MB) | Python Version |
|------------------|------------------|----------------|
| 0.8.1            | 1232             | 3.7.10         |
| 0.7.1            | 616              | 3.7.10         |
| 0.9.1            | 1232             | 3.8.13         |
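
The tracemalloc measurement described above can be wrapped in a small helper so the same call is measured identically under each installed version. This is only a minimal sketch using the standard library; the name `peak_memory_mb` is hypothetical and not part of imblearn, and the `bytearray` workload is just a stand-in for the real call.

```python
import tracemalloc


def peak_memory_mb(func, *args, **kwargs):
    """Run func and return (result, peak MiB traced by tracemalloc during the call).

    Hypothetical helper: tracing starts right before the call, so memory
    allocated earlier (for example the loaded images) is not counted.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raises.
        tracemalloc.stop()
    return result, peak / 1024 / 1024


if __name__ == "__main__":
    # Stand-in workload: allocate roughly 10 MiB.
    _, peak = peak_memory_mb(bytearray, 10 * 1024 * 1024)
    print(f"peak: {peak:.1f} MiB")
```

With imblearn installed, the call from the report would presumably be measured as `peak_memory_mb(RandomOverSampler().fit_resample, data, df.has_cactus)`.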

Steps/Code to Reproduce

click me download dataset

import tracemalloc

import imageio
import numpy as np
import pandas as pd
from imblearn.over_sampling import RandomOverSampler

df = pd.read_csv('train.csv')

def load_images(df, folder):
    # Load each 32x32 RGB image into one float64 array and rescale the pixel values.
    images = np.zeros((len(df), 32, 32, 3), dtype=np.float64)
    for i, file in enumerate(df.id):
        images[i] = imageio.imread(folder + '/' + file)
    return (images - 128) / 64

images = load_images(df, 'train')
# Flatten each image into one row of 32 * 32 * 3 = 3072 features.
data = images.reshape(17_500, -1)

# Start tracing only here, so the numbers cover fit_resample and not the image loading.
tracemalloc.start()
data, target = RandomOverSampler().fit_resample(data, df.has_cactus)
current3, peak3 = tracemalloc.get_traced_memory()
print(f"fit_resample memory usage is {current3 / 1024 / 1024} MB; peak memory was {peak3 / 1024 / 1024} MB")

Expected Results

Memory usage should be the same across versions.

Actual Results

Versions 0.8.1 and 0.9.1 use more memory (1232MB peak versus 616MB on 0.7.1).
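
To judge whether a measured peak corresponds to one or several copies of the resampled data, it may help to estimate how large the output array should be. The sketch below assumes the default sampling strategy, where every class is oversampled up to the majority class size; the function name `oversampled_nbytes` and the class counts in the usage part are hypothetical, chosen only for illustration.

```python
def oversampled_nbytes(class_counts, n_features, itemsize=8):
    """Estimate the size in bytes of the array returned after random oversampling.

    Assumes every class ends up with as many rows as the majority class
    and that values are stored with `itemsize` bytes each (8 for float64).
    """
    majority = max(class_counts.values())
    n_rows = majority * len(class_counts)
    return n_rows * n_features * itemsize


if __name__ == "__main__":
    # Input array from the report: 17,500 rows of 32 * 32 * 3 float64 values,
    # which is about 410 MiB before resampling.
    input_mib = 17_500 * 32 * 32 * 3 * 8 / 1024 / 1024
    print(f"input: {input_mib:.1f} MiB")

    # Hypothetical class counts, not taken from the actual dataset.
    counts = {0: 1_000, 1: 3_000}
    output_mib = oversampled_nbytes(counts, 32 * 32 * 3) / 1024 / 1024
    print(f"estimated output: {output_mib:.1f} MiB")
```

A peak close to one multiple of the estimate would suggest a single copy of the output; a peak close to two multiples would be consistent with an extra temporary copy, though this estimate alone cannot confirm where such a copy is made.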

Versions

I tested this code on imblearn 0.9.1, 0.8.1 and 0.7.0.
