MemoryError when using .loc or .ix #4280
Comments
pretty sure this is fixed in 0.12; can u try on master?
BAM-BAM-BAM
commented
Jul 17, 2013
In [145]: pandas.__version__
Out[145]: '0.12.0rc1'
is rc1 not master?
yes can u provide your dataset (a link via dropbox or something like that) and specs: python, numpy versions, os (32/64 bit), memory size. thanks
BAM-BAM-BAM
commented
Jul 17, 2013
import sys
print(sys.version)
2.7 (r27:82500, Jan 10 2013, 09:03:02)
[GCC 4.4.5 20110214 (Red Hat 4.4.5-6)]
print numpy.__version__
1.7.1
!free -g
total used free shared buffers cached
Mem: 39 35 3 0 0 17
-/+ buffers/cache: 17 21
Swap: 11 10 1
!cat /etc/*-release
Scientific Linux release 6.1 (Carbon)
!uname -a
Linux analyticsdev1 2.6.32-220.4.1.el6.x86_64 #1 SMP Mon Jan 23 17:20:44 CST 2012 x86_64 x86_64 x86_64 GNU/Linux
Unfortunately I'm not sure I can provide a dataset; we have contracts with our clients which prevent us from sharing specifics. Will have to check on that.
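For anyone filing a similar report, the specs requested above can be gathered in one go. This is a minimal sketch using only the standard library; the function name `environment_report` is hypothetical, and total memory is left out because the standard library has no portable call for it (the `free -g` output above covers that on Linux).

```python
import platform
import struct


def environment_report():
    """Collect the interpreter and OS details usually asked for in bug reports."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        # Pointer size tells you whether the interpreter is a 32- or 64-bit build.
        "pointer_bits": struct.calcsize("P") * 8,
    }
    # numpy is optional here; record its version only if it is importable.
    try:
        import numpy
        report["numpy"] = numpy.__version__
    except ImportError:
        report["numpy"] = "not installed"
    return report


report = environment_report()
for key, value in sorted(report.items()):
    print("%s: %s" % (key, value))
```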
how much memory do you have in GB total?
@BAM-BAM-BAM You are using the commit marked @jreback This might be a general issue with using
or is that above in GB?
jreback
referenced
this issue
Jul 18, 2013
Merged
BUG: Fixed non-unique indexing memory allocation issue with .ix/.loc (GH4280) #4283
ok...all fixed up...this was a bug in the way memory was being allocated for determining non-unique indexers (which only showed up with a large frame and a large number of locations to index); closed by #4283. thanks for the report!
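To make "non-unique indexers" concrete, here is a toy model of the lookup in plain Python. It is only a sketch of the idea, not the pandas implementation, and the thread does not say how the allocation was changed in #4283. The function below borrows the name `get_indexer_non_unique` from the traceback, but its body and the sample labels are assumptions made for illustration.

```python
from collections import defaultdict


def get_indexer_non_unique(index_labels, target_labels):
    """Toy lookup: every position in index_labels matching each target label.

    Returns (indexer, missing). A label that is absent contributes a single -1
    to indexer, and its position in target_labels is recorded in missing.
    """
    positions = defaultdict(list)
    for pos, label in enumerate(index_labels):
        positions[label].append(pos)

    indexer, missing = [], []
    for i, label in enumerate(target_labels):
        hits = positions.get(label)
        if hits:
            # A repeated label yields one entry per occurrence in the index.
            indexer.extend(hits)
        else:
            indexer.append(-1)
            missing.append(i)
    return indexer, missing


index_labels = [10, 10, 20, 30, 30, 30]   # non-unique index
target_labels = [30, 10, 99]              # labels being looked up
indexer, missing = get_indexer_non_unique(index_labels, target_labels)
print(indexer)  # [3, 4, 5, 0, 1, -1]
print(missing)  # [2]
```

Three target labels produce six indexer entries here, so the size of the result cannot be read off the length of the target alone. That is why the way the result buffer is sized matters once both the frame and the number of requested locations are large.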
jreback
closed this
in #4283
Jul 18, 2013
BAM-BAM-BAM
commented
Jul 18, 2013
Great thanks!
if u can pull down master and give a try would be great
BAM-BAM-BAM
commented
Jul 17, 2013
from pandas import *
df = read_csv(open('mydata.csv.gz', 'r'), compression='gzip', index_col=False)
df = df[(df.land != 1)]
print df
#
# Int64Index: 977579 entries, 0 to 1100398
# Data columns (total 89 columns):
#
# sample 100,000 rows, only use some of the columns
rows = np.random.choice(df.index.values, 100000)
keep_cols = ['sq_ft', 'zip', 'year', 'bathrooms', 'bedrooms', 'floors']
sampled_df = df.ix[rows, keep_cols]
sampled_df.loc[sampled_df.year.notnull()].year # works fine
sampled_df.loc[sampled_df.year.notnull(),['year']] # MemoryError

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
 in ()
      1 #sampled_df.loc[sampled_df['year'].notnull(),['year']]
      2 sampled_df.loc[sampled_df.year.notnull()].year
----> 3 sampled_df.loc[sampled_df.year.notnull(),['year']]

/home/jprior/Scratch/VENV1/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
    695     def __getitem__(self, key):
    696         if type(key) is tuple:
--> 697             return self._getitem_tuple(key)
    698         else:
    699             return self._getitem_axis(key, axis=0)

/home/jprior/Scratch/VENV1/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    260         # ugly hack for GH #836
    261         if self._multi_take_opportunity(tup):
--> 262             return self._multi_take(tup)
    263
    264         # no shortcut needed

/home/jprior/Scratch/VENV1/lib/python2.7/site-packages/pandas/core/indexing.pyc in _multi_take(self, tup)
    300             index = self._convert_for_reindex(tup[0], axis=0)
    301             columns = self._convert_for_reindex(tup[1], axis=1)
--> 302             return self.obj.reindex(index=index, columns=columns)
    303         elif isinstance(self.obj, Panel4D):
    304             conv = [self._convert_for_reindex(x, axis=i)

/home/jprior/Scratch/VENV1/lib/python2.7/site-packages/pandas/core/frame.pyc in reindex(self, index, columns, method, level, fill_value, limit, copy, takeable)
   2623         if index is not None:
   2624             frame = frame._reindex_index(index, method, copy, level,
-> 2625                                          fill_value, limit, takeable)
   2626
   2627         return frame

/home/jprior/Scratch/VENV1/lib/python2.7/site-packages/pandas/core/frame.pyc in _reindex_index(self, new_index, method, copy, level, fill_value, limit, takeable)
   2703         new_index, indexer = self.index.reindex(new_index, method, level,
   2704                                                 limit=limit, copy_if_needed=True,
-> 2705                                                 takeable=takeable)
   2706         return self._reindex_with_indexers(new_index, indexer, None, None,
   2707                                            copy, fill_value)

/home/jprior/Scratch/VENV1/lib/python2.7/site-packages/pandas/core/index.pyc in reindex(self, target, method, level, limit, copy_if_needed, takeable)
    930                 raise ValueError("cannot reindex a non-unique index "
    931                                  "with a method or limit")
--> 932             indexer, _ = self.get_indexer_non_unique(target)
    933
    934         return target, indexer

/home/jprior/Scratch/VENV1/lib/python2.7/site-packages/pandas/core/index.pyc in get_indexer_non_unique(self, target, **kwargs)
    843             tgt_values = target.values
    844
--> 845         indexer, missing = self._engine.get_indexer_non_unique(tgt_values)
    846         return Index(indexer), missing
    847

/home/jprior/Scratch/VENV1/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_indexer_non_unique (pandas/index.c:5049)()

MemoryError:

Sorry I haven't figured out how to reproduce the error with a toy example.
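One detail in the snippet above may explain why the traceback ends in `get_indexer_non_unique`: `np.random.choice` samples with replacement by default (`replace=True`), so `rows` can contain repeated labels and `sampled_df` can end up with a non-unique index. The sketch below shows the effect with the standard library only. `random.choices` stands in for sampling with replacement and `random.sample` for sampling without, and the population and sample sizes are made-up values, not the reporter's data.

```python
import random

random.seed(0)  # fixed seed so repeated runs draw the same samples

population = range(1000)   # stand-in for df.index.values
sample_size = 500

# With replacement: the same label may be drawn more than once.
with_replacement = random.choices(population, k=sample_size)

# Without replacement: every drawn label is distinct.
without_replacement = random.sample(population, k=sample_size)

duplicates = sample_size - len(set(with_replacement))
print("duplicate labels when sampling with replacement:", duplicates)
print("distinct labels when sampling without replacement:",
      len(set(without_replacement)))
```

If a unique index is wanted in the sampled frame, passing `replace=False` to `np.random.choice` is the corresponding NumPy option.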