.. _whatsnew_0700:

v.0.7.0 (February 9, 2012)
--------------------------

New features
~~~~~~~~~~~~

- New unified :ref:`merge function <merging.join>` for efficiently performing
  the full gamut of database / relational-algebra operations. Refactored existing
  join methods to use the new infrastructure, resulting in substantial
  performance gains (:issue:`220`, :issue:`249`, :issue:`267`)
- New :ref:`unified concatenation function <merging.concat>` for concatenating
  Series, DataFrame or Panel objects along an axis. Can form union or
  intersection of the other axes. Improves performance of ``Series.append`` and
  ``DataFrame.append`` (:issue:`468`, :issue:`479`, :issue:`273`)
- :ref:`Can <merging.concatenation>` pass multiple DataFrames to
  ``DataFrame.append`` to concatenate (stack) and multiple Series to
  ``Series.append`` too
- :ref:`Can <basics.dataframe.from_list_of_dicts>` pass list of dicts (e.g., a
  list of JSON objects) to DataFrame constructor (:issue:`526`)
- You can now :ref:`set multiple columns <indexing.columns.multiple>` in a
  DataFrame via ``__setitem__``, useful for transformation (:issue:`342`)
- Handle differently-indexed output values in ``DataFrame.apply`` (:issue:`498`)

  .. ipython:: python

     from numpy.random import randn
     from pandas import DataFrame, Series
     df = DataFrame(randn(10, 4))
     df.apply(lambda x: x.describe())

- :ref:`Add <indexing.reorderlevels>` ``reorder_levels`` method to Series and
  DataFrame (:issue:`534`)
- :ref:`Add <indexing.dictionarylike>` dict-like ``get`` function to DataFrame
  and Panel (:issue:`521`)
- :ref:`Add <basics.iterrows>` ``DataFrame.iterrows`` method for efficiently
  iterating through the rows of a DataFrame
- :ref:`Add <dsintro.to_panel>` ``DataFrame.to_panel`` with code adapted from
  ``LongPanel.to_long``
- :ref:`Add <basics.reindexing>` ``reindex_axis`` method to DataFrame
- :ref:`Add <basics.stats>` ``level`` option to binary arithmetic functions on
  ``DataFrame`` and ``Series``
- :ref:`Add <indexing.advanced_reindex>` ``level`` option to the ``reindex``
  and ``align`` methods on Series and DataFrame for broadcasting values across
  a level (:issue:`542`, :issue:`552`, others)
- :ref:`Add <dsintro.panel_item_selection>` attribute-based item access to
  ``Panel`` and add IPython completion (:issue:`563`)
- :ref:`Add <visualization.basic>` ``logy`` option to ``Series.plot`` for
  log-scaling on the Y axis
- :ref:`Add <io.formatting>` ``index`` and ``header`` options to
  ``DataFrame.to_string``
- :ref:`Can <merging.multiple_join>` pass multiple DataFrames to
  ``DataFrame.join`` to join on index (:issue:`115`)
- :ref:`Can <merging.multiple_join>` pass multiple Panels to ``Panel.join``
  (:issue:`115`)
- :ref:`Added <io.formatting>` ``justify`` argument to ``DataFrame.to_string``
  to allow different alignment of column headers
- :ref:`Add <groupby.attributes>` ``sort`` option to GroupBy to allow disabling
  sorting of the group keys for potential speedups (:issue:`595`)
- :ref:`Can <basics.dataframe.from_series>` pass MaskedArray to Series
  constructor (:issue:`563`)
- :ref:`Add <dsintro.panel_item_selection>` Panel item access via attributes
  and IPython completion (:issue:`554`)
- Implement ``DataFrame.lookup``, fancy-indexing analogue for retrieving values
  given a sequence of row and column labels (:issue:`338`)
- Can pass a :ref:`list of functions <groupby.aggregate.multifunc>` to
  aggregate with groupby on a DataFrame, yielding an aggregated result with
  hierarchical columns (:issue:`166`)
- Can call ``cummin`` and ``cummax`` on Series and DataFrame to get cumulative
  minimum and maximum, respectively (:issue:`647`)
- ``value_range`` added as utility function to get min and max of a DataFrame
  (:issue:`288`)
- Added ``encoding`` argument to ``read_csv``, ``read_table``, ``to_csv`` and
  ``from_csv`` for non-ascii text (:issue:`717`)
- :ref:`Added <basics.stats>` ``abs`` method to pandas objects
- :ref:`Added <reshaping.pivot>` ``crosstab`` function for easily computing
  frequency tables
- :ref:`Added <indexing.set_ops>` ``isin`` method to index objects
- :ref:`Added <indexing.xs>` ``level`` argument to ``xs`` method of DataFrame
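
Several of the additions listed above can be tried together. The following is
a minimal sketch written against a recent pandas release rather than 0.7.0
itself, so the top-level ``pd.merge``, ``pd.concat`` and ``pd.crosstab``
spellings, the keyword arguments and the sample data are all assumptions made
for illustration, not taken from these notes:

.. code-block:: python

   import pandas as pd

   # A list of dicts (e.g. parsed JSON objects) passed to the constructor
   left = pd.DataFrame([{"key": "a", "x": 1}, {"key": "b", "x": 2}])
   right = pd.DataFrame([{"key": "a", "y": 10}, {"key": "c", "y": 30}])

   # Database-style joins through the unified merge function
   inner = pd.merge(left, right, on="key")               # keys in both frames
   outer = pd.merge(left, right, on="key", how="outer")  # union of the keys

   # Concatenate along the rows; join="inner" keeps only the shared columns
   stacked = pd.concat([left, right], join="inner", ignore_index=True)

   # A list of aggregations yields hierarchical (two-level) columns
   df = pd.DataFrame({"g": ["a", "a", "b"],
                      "v": [1.0, 3.0, 5.0],
                      "w": [2.0, 4.0, 6.0]})
   agg = df.groupby("g").agg(["min", "max"])

   # Cumulative maximum of a Series
   running_max = pd.Series([1, 3, 2, 5, 4]).cummax()

   # Frequency table of two factors
   rows = pd.Series(["u", "u", "v"], name="row")
   cols = pd.Series(["p", "q", "p"], name="col")
   table = pd.crosstab(rows, cols)

Here ``inner`` keeps only the key present in both inputs, while ``outer``
keeps every key and fills the missing side with NA.
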

API Changes to integer indexing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One of the potentially riskiest API changes in 0.7.0, but also one of the most
important, was a complete review of how **integer indexes** are handled with
regard to label-based indexing. Here is an example:

.. ipython:: python

   s = Series(randn(10), index=range(0, 20, 2))
   s
   s[0]
   s[2]
   s[4]

This is identical to the previous behavior. However, if you ask for a
key **not** contained in the Series, in versions 0.6.1 and prior, Series would
*fall back* on a location-based lookup. This now raises a ``KeyError``:

.. code-block:: ipython

   In [2]: s[1]
   KeyError: 1

This change also has the same impact on DataFrame:

.. code-block:: ipython

   In [3]: df = DataFrame(randn(8, 4), index=range(0, 16, 2))

   In [4]: df
             0       1       2        3
   0   0.88427  0.3363 -0.1787  0.03162
   2   0.14451 -0.1415  0.2504  0.58374
   4  -1.44779 -0.9186 -1.4996  0.27163
   6  -0.26598 -2.4184 -0.2658  0.11503
   8  -0.58776  0.3144 -0.8566  0.61941
   10  0.10940 -0.7175 -1.0108  0.47990
   12 -1.16919 -0.3087 -0.6049 -0.43544
   14 -0.07337  0.3410  0.0424 -0.16037

   In [5]: df.ix[3]
   KeyError: 3

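
The label-only lookup described above can be checked with a short script. This
is a sketch that assumes a recent pandas release, where ``[]`` on an integer
index is still label-based; the variable names are illustrative only:

.. code-block:: python

   import pandas as pd

   # Even integer labels 0, 2, ..., 18 holding the values 0..9
   s = pd.Series(range(10), index=range(0, 20, 2))

   # A key that is in the index is looked up by label:
   # label 4 sits at position 2, so the stored value is 2
   by_label = s[4]

   # A key that is absent from the index does not fall back to position 1
   try:
       s[1]
       fell_back = True
   except KeyError:
       fell_back = False

After running it, ``fell_back`` is ``False`` because the lookup raised
``KeyError`` instead of returning the element at position 1.
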
In order to support purely integer-based indexing, the following methods have
been added:

.. csv-table::
    :header: "Method","Description"
    :widths: 40,60

    ``Series.iget_value(i)``, Retrieve value stored at location ``i``
    ``Series.iget(i)``, Alias for ``iget_value``
    ``DataFrame.irow(i)``, Retrieve the ``i``-th row
    ``DataFrame.icol(j)``, Retrieve the ``j``-th column
    "``DataFrame.iget_value(i, j)``", Retrieve the value at row ``i`` and column ``j``


API tweaks regarding label-based slicing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Label-based slicing using ``ix`` now requires that the index be sorted
(monotonic) **unless** both the start and endpoint are contained in the index:

.. ipython:: python

   s = Series(randn(6), index=list('gmkaec'))
   s

Then this is OK:

.. ipython:: python

   s.ix['k':'e']

But this is not:

.. code-block:: ipython

   In [12]: s.ix['b':'h']
   KeyError: 'b'

If the index had been sorted, the "range selection" would have been possible:

.. ipython:: python

   s2 = s.sort_index()
   s2
   s2.ix['b':'h']

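
The three cases above can be reproduced in one script. Because ``ix`` was
removed from later pandas releases, this sketch uses the ``loc`` indexer as an
assumed stand-in with the same label-slicing rules; it is not the 0.7.0
spelling, and the integer values are placeholders for the random data:

.. code-block:: python

   import pandas as pd

   s = pd.Series(range(6), index=list("gmkaec"))

   # Both endpoints are in the index, so the unsorted index is acceptable
   ok = s.loc["k":"e"]

   # 'b' and 'h' are missing and the index is not monotonic
   try:
       s.loc["b":"h"]
       raised = False
   except KeyError:
       raised = True

   # After sorting, missing endpoints select the labels that fall between them
   s2 = s.sort_index()
   between = s2.loc["b":"h"]

On the unsorted index the slice follows the stored order (``k``, ``a``,
``e``); on the sorted index it returns ``c``, ``e`` and ``g``.
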

Changes to Series ``[]`` operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As a notational convenience, you can pass a sequence of labels or a label
slice to a Series when getting and setting values via ``[]`` (i.e. the
``__getitem__`` and ``__setitem__`` methods). The behavior will be the same as
passing similar input to ``ix`` **except in the case of integer indexing**:

.. ipython:: python

   s = Series(randn(6), index=list('acegkm'))
   s
   s[['m', 'a', 'c', 'e']]
   s['b':'l']
   s['c':'k']

In the case of integer indexes, the behavior will be exactly as before
(shadowing ``ndarray``):

.. ipython:: python

   s = Series(randn(6), index=range(0, 12, 2))
   s[[4, 0, 2]]
   s[1:5]

If you wish to do indexing with sequences and slicing on an integer index with
label semantics, use ``ix``.

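
The label-based forms of ``[]`` on a non-integer index can be sketched as
follows, covering both getting and setting. The example assumes a recent
pandas release and uses fixed integers in place of random data; it
deliberately leaves out the integer-index case, whose behavior depends on the
pandas version:

.. code-block:: python

   import pandas as pd

   s = pd.Series(range(6), index=list("acegkm"))

   picked = s[["m", "a", "c", "e"]]  # sequence of labels, in the order given
   sliced = s["b":"l"]               # endpoints need not be in a sorted index
   inclusive = s["c":"k"]            # both endpoints are included

   # Setting through a label slice goes through __setitem__
   s["c":"e"] = -1

Both slices select ``c``, ``e``, ``g`` and ``k``, and the assignment
overwrites only the values labelled ``c`` and ``e``.
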

Other API Changes
~~~~~~~~~~~~~~~~~

- The deprecated ``LongPanel`` class has been completely removed
- If ``Series.sort`` is called on a column of a DataFrame, an exception will
  now be raised. Before it was possible to accidentally mutate a DataFrame's
  column by doing ``df[col].sort()`` instead of the side-effect free method
  ``df[col].order()`` (:issue:`316`)
- Miscellaneous renames and deprecations which will (harmlessly) raise
  ``FutureWarning``
- ``drop`` added as an optional parameter to ``DataFrame.reset_index``
  (:issue:`699`)
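
The effect of the ``drop`` parameter on ``DataFrame.reset_index`` can be seen
in a small sketch. The sample frame is hypothetical and the snippet assumes a
recent pandas release:

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({"v": [1, 2, 3]}, index=["x", "y", "z"])

   kept = df.reset_index()              # old index becomes a column
   dropped = df.reset_index(drop=True)  # old index is discarded

With the default, the unnamed index is moved into a column called ``index``;
with ``drop=True`` only the original columns remain and the rows are
renumbered from 0.
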

Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- :ref:`Cythonized GroupBy aggregations <groupby.aggregate.cython>` no longer
  presort the data, thus achieving a significant speedup (:issue:`93`). GroupBy
  aggregations with Python functions significantly sped up by clever
  manipulation of the ndarray data type in Cython (:issue:`496`).
- Better error message in DataFrame constructor when passed column labels
  don't match data (:issue:`497`)
- Substantially improve performance of multi-GroupBy aggregation when a
  Python function is passed, reuse ndarray object in Cython (:issue:`496`)
- Can store objects indexed by tuples and floats in HDFStore (:issue:`492`)
- Don't print length by default in ``Series.to_string``, add ``length`` option
  (:issue:`489`)
- Improve Cython code for multi-groupby to aggregate without having to sort
  the data (:issue:`93`)
- Improve MultiIndex reindexing speed by storing tuples in the MultiIndex,
  test for backwards unpickling compatibility
- Improve column reindexing performance by using specialized Cython take
  function
- Further performance tweaking of ``Series.__getitem__`` for standard use cases
- Avoid Index dict creation in some cases (i.e. when getting slices, etc.),
  regression from prior versions
- Friendlier error message in setup.py if NumPy not installed
- Use common set of NA-handling operations (sum, mean, etc.) in Panel class
  also (:issue:`536`)
- Default name assignment when calling ``reset_index`` on DataFrame with a
  regular (non-hierarchical) index (:issue:`476`)
- Use Cythonized groupers when possible in Series/DataFrame stat ops with
  ``level`` parameter passed (:issue:`545`)
- Ported skiplist data structure to C to speed up ``rolling_median`` by about
  5-10x in most typical use cases (:issue:`374`)