
Crashes my RAM #27

Closed
abhigenie92 opened this issue Oct 25, 2015 · 10 comments

@abhigenie92

I am using Oja's rule on a dataset of size 400x156300, and it seems to crash my RAM. I am not sure what is causing this. Please help.
I have 12 GB of RAM.
I tried using a memmap, but it still crashes!

Convert to a memmap and reduce precision:

from os import path
from tempfile import mkdtemp
import numpy as np

num_sample, num_feat = train_data.shape
filename = path.join(mkdtemp(), 'train_data.dat')
memmap_train = np.memmap(filename, dtype='float32', mode='w+', shape=(num_sample, num_feat))
memmap_train[:] = train_data[:]
del train_data, test_data

Apply Oja's rule:

ojanet = algorithms.Oja(minimized_data_size=1250, step=1e-10, verbose=True, show_epoch=1)
ojanet.train(memmap_train, epsilon=1e-3, epochs=10000)
red_train_data = ojanet.predict(memmap_train)
ojanet.plot_errors(logx=False)
pdb.set_trace()

@itdxer (Owner) commented Oct 26, 2015

Hi,
I think the problem could be the large number of outputs in the terminal. Could you please try setting show_epoch to 1000?

That is a good catch; I will add a blocker that controls the number of outputs per second, or something like that.

@itdxer itdxer added the bug label Oct 26, 2015
@itdxer (Owner) commented Oct 26, 2015

But on second thought, this operation shouldn't be so fast for such a big matrix. When I run some memory tests, I will write up the results.

@abhigenie92 (Author)

Thanks for the response!

I tried show_epoch=1000; it still seems to crash after a few iterations (~15). Are you sure that is the issue?
Speed:
Yeah, I know; in MATLAB my code takes 60 seconds per iteration, while this code takes 11 seconds per iteration, so it is relatively fast. Please let me know when you fix the bug.

@itdxer (Owner) commented Oct 26, 2015

I identified some memory leakage, but I'm not quite sure about the main source of the problem. I will notify you as soon as I fix it.

@itdxer (Owner) commented Oct 26, 2015

Python deallocates unused objects from memory after a while, and it seems it didn't have time to do this before the memory overflowed.
Could you please install the library from the bug-27-oja-mem-leakage branch and check it? The code below works fine for me.

import numpy as np
from neupy import algorithms

data = np.random.randn(400, 156300)
ojanet = algorithms.Oja(
    minimized_data_size=1250,
    step=1e-10,
    verbose=True,
    show_epoch=10,
    shuffle_data=False,
)
ojanet.train(data, epsilon=1e-3, epochs=300)

@itdxer itdxer self-assigned this Oct 26, 2015
@abhigenie92 (Author)

Thanks! It works now!
Though for the same dataset size, when I try minimized_data_size=4000, it seems to crash my RAM again. If possible, can you share some details regarding the memory requirements?

@itdxer (Owner) commented Oct 27, 2015

Here is some information about the object sizes in GB (byte counts divided by 1024 ** 3):

>>> import numpy as np
>>> data = np.random.random((400, 156300))
>>> data.shape
(400, 156300)
>>> import sys
>>> sys.getsizeof(data)
500160112
>>> sys.getsizeof(data) / 1024 ** 3
0.4658104032278061
>>> weights = np.random.random((156300, 4000))
>>> sys.getsizeof(weights) / 1024 ** 3
4.658103093504906

The 400 x 156300 matrix weighs 0.47 GB, and the weight matrix is 10 times bigger (because one dimension is 10 times bigger), which is approximately 4.7 GB.

Besides these two matrices, you also build a reconstructed matrix with the same size as the input data, so its memory footprint should be almost the same. And you also store a weight delta that is applied after each iteration; the weight delta has the same size as the weight matrix. From my point of view, a good rule of thumb for this algorithm is:

2 * (Input matrix memory size + Weight matrix memory size) = 2 * (0.47 GB + 4.7 GB) = 10.34 GB

It's not a super accurate formula, but it can give you some intuition about your memory limits.

P.S. Actually, the formula for the memory limits gives you the lowest possible bound, because there are more variables that are less significant. For example, there is one 400 x 4000 matrix that stores the minimized data (in this case it should be approximately 12 MB). For other data samples that have more rows than columns, this term becomes more significant.

A slightly more general formula would be:

2 * (Input matrix memory size + Weight matrix memory size) + Minimized matrix memory size
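The rule of thumb above can be sketched as a small helper. This is my own illustration, not neupy code: the function name is hypothetical, and it assumes float64 elements (8 bytes each) while ignoring the small per-array object overhead that sys.getsizeof includes.

```python
def oja_memory_estimate_gb(n_samples, n_features, minimized_size, itemsize=8):
    """Rough lower bound on Oja training memory, per the rule of thumb above."""
    gb = 1024 ** 3
    input_gb = n_samples * n_features * itemsize / gb        # input matrix
    weight_gb = n_features * minimized_size * itemsize / gb  # weight matrix
    minimized_gb = n_samples * minimized_size * itemsize / gb
    # input + reconstruction, weights + weight delta, plus the minimized output
    return 2 * (input_gb + weight_gb) + minimized_gb

print(round(oja_memory_estimate_gb(400, 156300, 4000), 2))  # 10.26
```

For the 400x156300 dataset with minimized_data_size=4000, this gives about 10.26 GB, matching the ~10.34 GB estimate above and explaining the crash on a 12 GB machine once other processes are counted.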

@itdxer (Owner) commented Oct 27, 2015

This problem can be solved by partial matrix multiplication, where you work with only part of the matrix at a time. That could be a good issue for the future. I will think about a memory-efficient solution for this task that helps compute big matrices with a smaller amount of memory.
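As a sketch of that idea (my own illustration, not the neupy implementation), a row-chunked product only ever computes one block of the output at a time:

```python
import numpy as np

def chunked_matmul(data, weights, chunk_size=100):
    # Compute data @ weights one block of rows at a time, so the
    # intermediate work per step covers only chunk_size rows.
    out = np.empty((data.shape[0], weights.shape[1]),
                   dtype=np.result_type(data, weights))
    for start in range(0, data.shape[0], chunk_size):
        out[start:start + chunk_size] = data[start:start + chunk_size] @ weights
    return out
```

Writing `out` into a disk-backed np.memmap instead of np.empty would further cap peak RAM at roughly one chunk plus the operands.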

@itdxer itdxer closed this as completed Oct 27, 2015
@itdxer (Owner) commented Oct 27, 2015

I will add this fix in v0.1.4.

@itdxer (Owner) commented Oct 27, 2015

A small clue to make your computation more efficient and reduce the matrices' memory size:

>>> import numpy as np
>>> import sys
>>> sys.getsizeof(np.random.random((400, 153600)).astype(np.float16)) / 1024 ** 3
0.11444102227687836
>>> sys.getsizeof(np.random.random((4000, 153600)).astype(np.float64)) / 1024 ** 3
4.577636823058128
>>> sys.getsizeof(np.random.random((4000, 153600)).astype(np.float16)) / 1024 ** 3
1.1444092839956284
>>> 4.58 / 1.14
4.017543859649123

As you can see, you now need 4 times less memory for this problem. One trade-off is that instead of an 11-bit exponent and 52-bit mantissa per float, you have only a 5-bit exponent and 10-bit mantissa.

Note: numpy.float32 should be half the size of numpy.float64, and it's faster as well. A numpy.float16 matrix has a smaller size in memory, but it would be very slow in matrix operations.
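The per-element sizes behind that note can be checked directly with standard NumPy dtypes (nothing neupy-specific); total array memory is roughly rows * cols * itemsize, ignoring the small ndarray object overhead:

```python
import numpy as np

# Bytes per element for each floating-point dtype.
for dtype in (np.float16, np.float32, np.float64):
    print(np.dtype(dtype).name, np.dtype(dtype).itemsize)
# float16 2
# float32 4
# float64 8
```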
