Crashes my RAM #27
Hi, that is a good issue. I will add some limiter that controls the number of outputs per second, or something like that. |
But on second thought, this operation shouldn't be so fast for such a big matrix. When I've run some memory tests, I will post the results. |
Thanks for the response! I tried with show_epoch=1000 and it still seems to crash after a few iterations (~15). Are you sure that is the issue? |
I identified some memory leakage, but I'm not quite sure about the main source of the problem. I will notify you as soon as I fix it. |
Python deallocates unused objects from memory after a while, and it seems it didn't have time to do this before the memory overflowed. Try this configuration:

import numpy as np
from neupy import algorithms
data = np.random.randn(400, 156300)
ojanet = algorithms.Oja(
minimized_data_size=1250,
step=1e-10,
verbose=True,
show_epoch=10,
shuffle_data=False,
)
ojanet.train(data, epsilon=1e-3, epochs=300) |
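As an aside on the garbage-collection point above: dropping references to large temporaries and forcing a collection inside the loop releases memory sooner than waiting for Python to get to it. This is only an illustrative sketch; the train_step helper is hypothetical, not neupy code:

```python
import gc
import numpy as np

# Hypothetical sketch (not neupy code): inside a long training loop,
# large temporary arrays can accumulate faster than Python's garbage
# collector reclaims them. Dropping references and forcing a collection
# frees the memory immediately instead of "after a while".
def train_step(data):
    # stand-ins for one iteration's large temporaries
    reconstructed = data @ np.eye(data.shape[1])
    delta = data - reconstructed
    error = float(np.abs(delta).mean())
    del reconstructed, delta  # drop the references explicitly
    gc.collect()              # reclaim the memory right away
    return error

error = train_step(np.random.randn(200, 50))
```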
Thanks! Works now! |
Here is some information about the object sizes in GB (numbers divided by 1024 ** 3):

>>> import numpy as np
>>> data = np.random.random((400, 156300))
>>> data.shape
(400, 156300)
>>> import sys
>>> sys.getsizeof(data)
500160112
>>> sys.getsizeof(data) / 1024 ** 3
0.4658104032278061
>>> weights = np.random.random((156300, 4000))
>>> sys.getsizeof(weights) / 1024 ** 3
4.658103093504906

The 400 x 156300 input matrix takes about 0.47 GB, and the weight matrix is 10 times bigger (because one of its dimensions is 10 times bigger), so approximately 4.7 GB. Besides these two matrices, the algorithm also builds a reconstructed matrix with the same shape as the input data, so it needs roughly the same amount of memory again. It also stores a weight delta that is applied after each iteration; the weight delta has the same shape as the weight matrix. A good rule of thumb for this algorithm is therefore:

2 * (input matrix memory size + weight matrix memory size) = 2 * (0.47 GB + 4.7 GB) = 10.34 GB

It's not a very accurate formula, but it gives some intuition about your memory limits.

P.S. This formula actually gives the lowest possible bound, because there are more variables that are less significant. For example, there is one 400 x 4000 matrix that stores the minimized data (in this case approximately 12 MB). For a data sample that has more rows than columns, this term would be more significant. A slightly more general formula is:

2 * (input matrix memory size + weight matrix memory size) + minimized matrix memory size |
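That rule of thumb is easy to check in code. This is only a sketch, not part of neupy; the function name and the float64 assumption (8 bytes per element) are mine:

```python
import numpy as np

# Sketch of the rule of thumb:
#   2 * (input matrix size + weight matrix size) + minimized matrix size,
# assuming float64 (8 bytes per element).
def oja_memory_estimate_gb(n_samples, n_features, minimized_size, itemsize=8):
    gb = 1024 ** 3
    input_gb = n_samples * n_features * itemsize / gb
    weight_gb = n_features * minimized_size * itemsize / gb
    minimized_gb = n_samples * minimized_size * itemsize / gb
    return 2 * (input_gb + weight_gb) + minimized_gb

# For the matrices discussed in this thread: roughly 10.3 GB as a lower bound
estimate = oja_memory_estimate_gb(400, 156300, 4000)
print(round(estimate, 2))
```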
This problem can be solved by partial matrix multiplication, where you work with only a part of the matrix at a time. That could be a good issue for the future. I will think about a memory-efficient solution for this task that will help to compute big matrices with a smaller amount of memory. |
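A minimal sketch of that partial-multiplication idea: multiply the input by the weights one row block at a time, so only one block-sized temporary exists at once. The helper name and block size are illustrative, not neupy API:

```python
import numpy as np

# Multiply data @ weights in row blocks so that only one block-sized
# slice of the input is touched at a time. The result still has to fit
# in memory, but the intermediate working set is much smaller.
def chunked_matmul(data, weights, block_rows=100):
    n = data.shape[0]
    out = np.empty((n, weights.shape[1]), dtype=np.result_type(data, weights))
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        out[start:stop] = data[start:stop] @ weights  # one block at a time
    return out

data = np.random.randn(400, 64)
weights = np.random.randn(64, 16)
result = chunked_matmul(data, weights)
```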
I will add this fix in v0.1.4 |
A small clue to make your computation more efficient and reduce the matrices' memory size:

>>> import numpy as np
>>> import sys
>>> sys.getsizeof(np.random.random((400, 153600)).astype(np.float16)) / 1024 ** 3
0.11444102227687836
>>> sys.getsizeof(np.random.random((4000, 153600)).astype(np.float64)) / 1024 ** 3
4.577636823058128
>>> sys.getsizeof(np.random.random((4000, 153600)).astype(np.float16)) / 1024 ** 3
1.1444092839956284
>>> 4.58 / 1.14
4.017543859649123

As you can see, you now need about 4 times less memory for this problem. The trade-off is precision: instead of an 11-bit exponent and a 52-bit mantissa per float, you have only a 5-bit exponent and a 10-bit mantissa. |
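The float16 trade-off can be checked directly; this snippet is just an illustration of the memory-versus-precision point above:

```python
import numpy as np

# float64 -> float16 cuts memory by 4x, but values in [0, 1) are rounded
# to a 10-bit mantissa, so the representation error grows to around 1e-4.
x64 = np.random.random(1000)
x16 = x64.astype(np.float16)

memory_ratio = x64.nbytes // x16.nbytes            # 4x smaller
max_error = float(np.abs(x64 - x16).max())          # float16 rounding error
```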
I am using Oja's rule on a dataset of size 400 x 156300. It seems to crash my RAM, and I am not sure what is causing this. Please help.
I have 12 GB of RAM.
I tried using memmap; it still crashes!
import pdb
from os import path
from tempfile import mkdtemp

import numpy as np
from neupy import algorithms

# train_data and test_data are loaded earlier (not shown)

# convert to a memmap and reduce precision
[num_sample, num_feat] = train_data.shape
filename = path.join(mkdtemp(), 'train_data.dat')
memmap_train = np.memmap(filename, dtype='float32', mode='w+', shape=(num_sample, num_feat))
memmap_train[:] = train_data[:]
del train_data, test_data

# apply Oja's rule
ojanet = algorithms.Oja(minimized_data_size=1250, step=1e-10, verbose=True, show_epoch=1)
ojanet.train(memmap_train, epsilon=1e-3, epochs=10000)
red_train_data = ojanet.predict(memmap_train)
ojanet.plot_errors(logx=False)
pdb.set_trace()