Where/mask methods for Series #2337

Closed
wants to merge 7 commits into from

@jreback
Contributor

commented Nov 23, 2012

add where and mask methods for Series, analogous to the DataFrame methods added for 0.9.1
passes all tests

where is equivalent to: s[cond].reindex_like(s).fillna(other)

In [7]: s = pd.Series(np.random.rand(5))

In [8]: s
Out[8]: 
0    0.638664
1    0.574688
2    0.460510
3    0.641840
4    0.044129

In [10]: s[0:2] = -s[0:2]

In [11]: s
Out[11]: 
0   -0.638664
1   -0.574688
2    0.460510
3    0.641840
4    0.044129

boolean selection

In [12]: s[s>0]
Out[12]: 
2    0.460510
3    0.641840
4    0.044129

In [13]: s.where(s>0)
Out[13]: 
0         NaN
1         NaN
2    0.460510
3    0.641840
4    0.044129

In [14]: s.where(s>0,-s)
Out[14]: 
0    0.638664
1    0.574688
2    0.460510
3    0.641840
4    0.044129

In [15]: s.mask(s<=0)
Out[15]: 
0         NaN
1         NaN
2    0.460510
3    0.641840
4    0.044129
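
The equivalence stated above can be checked directly. This is a minimal sketch, not part of the PR itself: it uses fixed values rather than np.random.rand so the result is reproducible, and the names cond, other, via_where, via_reindex and via_mask are illustrative only.

```python
import pandas as pd

# Fixed values instead of np.random.rand so the result is reproducible.
s = pd.Series([-0.638664, -0.574688, 0.460510, 0.641840, 0.044129])
cond = s > 0
other = -s

# where: keep values where cond is True, take `other` elsewhere.
via_where = s.where(cond, other)

# The spelled-out form from the description above.
via_reindex = s[cond].reindex_like(s).fillna(other)

# mask is the inverse: it replaces values where its condition is True.
via_mask = s.mask(~cond, other)

print(via_where.equals(via_reindex))
print(via_where.equals(via_mask))
```

Note that the spelled-out form only matches where when s has no NaN at positions where cond is True, since fillna would fill those positions as well.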

supports setting as well (though not used anywhere explicitly)

In [16]: s2 = s.copy()

In [17]: s2.where(s2>0,inplace=True)
Out[17]: 
0         NaN
1         NaN
2    0.460510
3    0.641840
4    0.044129

In [18]: s2
Out[18]: 
0         NaN
1         NaN
2    0.460510
3    0.641840
4    0.044129

jreback added 4 commits Nov 15, 2012
changes in pandas/io/pytables.py
  1. added __str__ (to do __repr__)
  2. row removal in tables is much faster if rows are consecutive
  3. added Term class, refactored Selection (this is backwards compatible)
     Term is a concise way of specifying conditions for queries, e.g.

        Term(dict(field = 'index', op = '>', value = '20121114'))
        Term('index', '20121114')
        Term('index', '>', '20121114')
        Term('index', ['20121114','20121114'])
        Term('index', datetime(2012,11,14))
        Term('index>20121114')

     updated tests for same

  this should close GH #1996
a store would fail if appending but a put had not been done before (see test_append)

this is the result of incompatibility testing on the index_kind
added create_table_index to index tables
  think about doing this automagically for tables
@thisch

commented on doc/source/io.rst in 0fcae82 Nov 15, 2012

it should be 'on-disk tables', right?

jreback added 2 commits Nov 15, 2012
added min_itemsize parameter and checks in pytables to allow setting of index columns minimum size

changed pytables version test for indexing around a bit
added Col class to manage the column conversions
added alias to the Term class; you can specify the nominal indexers (e.g. index in DataFrame, major_axis/minor_axis or alias in Panel)
updated docs for pytables to reflect these changes
updated docs for indexing to incorporate whatsnew 0.9.1 for where and mask
@jreback

Owner Author

commented on doc/source/conf.py in fadcdd1 Nov 16, 2012

IGNORE THIS.....makes my sphinx work!

add where and mask methods to Series. where returns a series evaluated for the cond with a shape like the original

@jreback jreback closed this Nov 23, 2012

@jreback jreback reopened this Nov 23, 2012

changhiskhan added a commit that referenced this pull request Nov 24, 2012
@changhiskhan

Contributor

commented Nov 24, 2012

I cherry-picked this.
I changed the implementation of where a little bit as _set_value would have failed for non-scalar other.
I also added a few more test cases.

Thanks for the PR!

@jreback

Contributor Author

commented Nov 24, 2012

added docs for these in commit: 2d57979

changhiskhan pushed a commit that referenced this pull request Nov 24, 2012
@changhiskhan

Contributor

commented Nov 24, 2012

cherry-picked. Thank you!

@durden

Contributor

commented Nov 28, 2012

Are the where and mask methods supposed to be included in 0.9.1? I'm not seeing them, but maybe I've got something setup incorrectly?

>>> pandas.__version__
'0.9.1'
>>> df['col2'].where
Traceback (most recent call last):
  File "<ipython-input-19-885524901cf2>", line 1, in <module>
    df['col2'].where
AttributeError: 'Series' object has no attribute 'where'
>>> dir(df['col2'])
['T', '_AXIS_ALIASES', '_AXIS_NAMES', '_AXIS_NUMBERS', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_wrap__', '__class__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__delslice__', '__dict__', '__div__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__hex__', '__iadd__', '__iand__', '__idiv__', '__ifloordiv__', '__ilshift__', '__imod__', '__imul__', '__index__', '__init__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__long__', '__lshift__', '__lt__', '__mod__', '__module__', '__mul__', '__ne__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setslice__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__weakref__', '__xor__', '_agg_by_level', '_binop', '_can_hold_na', '_check_bool_indexer', '_constructor', '_get_axis', '_get_axis_name', '_get_axis_number', '_get_repr', '_get_val_at', '_get_values', '_get_values_tuple', '_get_with', '_index', '_ix', '_reindex_indexer', '_repr_footer', '_set_labels', '_set_values', '_set_with', '_tidy_repr', 'abs', 'add', 'align', 'all', 'any', 'append', 'apply', 'argmax', 'argmin', 'argsort', 'asfreq', 'asof', 'astype', 'at_time', 'autocorr', 'base', 'between', 'between_time', 'byteswap', 'choose', 'clip', 'clip_lower', 'clip_upper', 'combine', 'combine_first', 'compress', 'conj', 'conjugate', 'copy', 'corr', 'count', 'cov', 'ctypes', 
'cummax', 'cummin', 'cumprod', 'cumsum', 'data', 'describe', 'diagonal', 'diff', 'div', 'dot', 'drop', 'dropna', 'dtype', 'dump', 'dumps', 'fill', 'fillna', 'first', 'first_valid_index', 'flags', 'flat', 'flatten', 'from_array', 'from_csv', 'get', 'get_value', 'getfield', 'groupby', 'head', 'hist', 'idxmax', 'idxmin', 'iget', 'iget_value', 'imag', 'index', 'interpolate', 'irow', 'isin', 'isnull', 'item', 'itemset', 'itemsize', 'iteritems', 'iterkv', 'ix', 'keys', 'kurt', 'last', 'last_valid_index', 'load', 'mad', 'map', 'max', 'mean', 'median', 'min', 'mul', 'name', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'notnull', 'nunique', 'order', 'pct_change', 'plot', 'prod', 'ptp', 'put', 'quantile', 'rank', 'ravel', 'real', 'reindex', 'reindex_like', 'rename', 'reorder_levels', 'repeat', 'replace', 'resample', 'reset_index', 'reshape', 'resize', 'round', 'save', 'searchsorted', 'select', 'set_value', 'setasflat', 'setfield', 'setflags', 'shape', 'shift', 'size', 'skew', 'sort', 'sort_index', 'sortlevel', 'squeeze', 'std', 'str', 'strides', 'sub', 'sum', 'swapaxes', 'swaplevel', 'tail', 'take', 'to_csv', 'to_dict', 'to_sparse', 'to_string', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'truncate', 'tshift', 'tz_convert', 'tz_localize', 'unique', 'unstack', 'update', 'valid', 'value_counts', 'values', 'var', 'view', 'weekday']
@jreback

Contributor Author

commented Nov 28, 2012

in 0.9.1 they only exist in a DataFrame
0.10.0 will support Series as well

docs will explain the slight difference

eg

if s is a Series and df is a DataFrame:
s[cond] returns only the rows where cond is True,
but df[cond] returns a frame of the same size
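
A small sketch of that difference, assuming a recent pandas where Series.where is available and assuming cond is an elementwise boolean condition such as s > 0 or df > 0. The data and variable names here are made up for illustration.

```python
import pandas as pd

s = pd.Series([-1.0, 2.0, -3.0, 4.0])
df = pd.DataFrame({"a": [-1.0, 2.0], "b": [3.0, -4.0]})

# Boolean selection on a Series drops the rows that fail the condition.
subset = s[s > 0]

# Series.where keeps the original shape and puts NaN where the condition fails.
same_size = s.where(s > 0)

# Indexing a DataFrame with a boolean DataFrame already keeps the shape.
frame = df[df > 0]

print(len(subset), len(same_size), frame.shape)
```

So s.where(cond) gives a Series the same-shape behaviour that df[cond] already has for an elementwise condition.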

@durden

Contributor

commented Nov 28, 2012

@jreback Ah, that clears up the confusion. Thanks!
