=============
Release Notes
=============

This is the list of changes to pandas between each release. For full details,
see the commit logs at http://github.com/wesm/pandas

pandas 0.4.1
============

**Release date:** Not yet released

This is a bug fix release.

**New features / modules**

- Added new `DataFrame` method `get_dtype_counts` and property `dtypes`

**Bug fixes**

- Fixed DataFrame constructor bug causing downstream problems (e.g. .copy()
  failing) when passing a Series as the values along with a column name and
  index
- Fixed single-key groupby on DataFrame with as_index=False (GH #160)

**Improvements to existing features**

- Some speed enhancements with internal Index type-checking function

pandas 0.4.0
============

What is it
----------

**pandas** is a library of powerful labeled-axis data structures, statistical
tools, and general code for working with relational data sets, including time
series and cross-sectional data. It was designed with the practical needs of
statistical modeling and large, inhomogeneous data sets in mind. It is
particularly well suited for, among other things, financial data analysis
applications.

Where to get it
---------------

Source code: http://github.com/wesm/pandas
Binary installers on PyPI: http://pypi.python.org/pypi/pandas
Documentation: http://pandas.sourceforge.net

Release notes
-------------

**Release date:** 9/12/2011

**New features / modules**

- `pandas.core.sparse` module: "Sparse" (mostly-NA, or some other fill value)
  versions of `Series`, `DataFrame`, and `Panel`. For low-density data, this
  will result in significant performance boosts and a smaller memory
  footprint. Added `to_sparse` methods to `Series`, `DataFrame`, and
  `Panel`. See online documentation for more on these
- Fancy indexing operator on Series / DataFrame, e.g. via .ix operator. Both
  getting and setting of values are supported; however, setting values
  currently only works on homogeneously-typed DataFrame objects. Things like:

  * series.ix[[d1, d2, d3]]
  * frame.ix[5:10, ['C', 'B', 'A']], frame.ix[5:10, 'A':'C']
  * frame.ix[date1:date2]

- Significantly enhanced `groupby` functionality

  * Can groupby multiple keys, e.g. df.groupby(['key1', 'key2']). Iteration with
    multiple groupings produces a flattened tuple
  * "Nuisance" columns (non-aggregatable) will automatically be excluded from
    DataFrame aggregation operations
  * Added automatic "dispatching" to Series / DataFrame methods to more easily
    invoke methods on groups, e.g. s.groupby(crit).std() will work even though
    `std` is not implemented on the `GroupBy` class

- Hierarchical / multi-level indexing

  * New `MultiIndex` class. Integrated `MultiIndex` into `Series` and
    `DataFrame` fancy indexing, slicing, `__getitem__` and `__setitem__`,
    reindexing, etc. Added `level` keyword argument to `groupby` to enable
    grouping by a level of a `MultiIndex`

- New data reshaping functions: `stack` and `unstack` on DataFrame and Series

  * Integrated with MultiIndex to enable sophisticated reshaping of data

- `Index` objects (labels for axes) are now capable of holding tuples
- `Series.describe`, `DataFrame.describe`: produce an R-like table of summary
  statistics about each data column
- `DataFrame.quantile`, `Series.quantile` for computing sample quantiles of data
  across requested axis
- Added general `DataFrame.dropna` method to replace `dropIncompleteRows` and
  `dropEmptyRows`, which are now deprecated.
- `Series` arithmetic methods with optional fill_value for missing data,
  e.g. a.add(b, fill_value=0). If a location is missing for both it will still
  be missing in the result though.
- A fill_value option has been added to the `DataFrame`.{add, mul, sub, div}
  methods, similar to `Series`
- Boolean indexing with `DataFrame` objects: data[data > 0.1] = 0.1 or
  data[data > other] = 1.
- `pytz` / tzinfo support in `DateRange`

  * `tz_localize`, `tz_normalize`, and `tz_validate` methods added

- Added `ExcelFile` class to `pandas.io.parsers` for parsing multiple sheets out
  of a single Excel 2003 document
- `GroupBy` aggregations can now optionally *broadcast*, e.g. produce an object
  of the same size with the aggregated value propagated
- Added `select` method to all data structures: reindex axis based on
  arbitrary criterion (function returning boolean value),
  e.g. frame.select(lambda x: 'foo' in x, axis=1)
- `DataFrame.consolidate` method, API function relating to redesigned internals
- `DataFrame.insert` method for inserting column at a specified location rather
  than the default `__setitem__` behavior (which puts it at the end)
- `HDFStore` class in `pandas.io.pytables` has been largely rewritten using
  patches from Jeff Reback and others. It now supports mixed-type `DataFrame`
  and `Series` data and can store `Panel` objects. It also has the option to
  query `DataFrame` and `Panel` data. Loading data from legacy `HDFStore`
  files is supported explicitly in the code
- Added `set_printoptions` method to modify appearance of DataFrame tabular
  output
- `rolling_quantile` function: a moving version of `Series.quantile` /
  `DataFrame.quantile`
- Generic `rolling_apply` moving window function
- New `drop` method added to `Series`, `DataFrame`, etc. which can drop a set of
  labels from an axis, producing a new object
- `reindex` methods now sport a `copy` option so that data is not forced to be
  copied when the resulting object is indexed the same
- Added `sort_index` methods to Series and Panel. Renamed `DataFrame.sort`
  to `sort_index`. Leaving `DataFrame.sort` for now.
- Added ``skipna`` option to statistical instance methods on all the data
  structures
- `pandas.io.data` module providing a consistent interface for reading time
  series data from several different sources
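To make the `fill_value` semantics for `Series` arithmetic concrete, here is a minimal pure-Python sketch. It is not the pandas implementation. Each series is modeled as a dict from label to value, with `None` standing in for NA. `add_with_fill` and the `Labeled` alias are hypothetical names used only for illustration. A label missing on one side takes `fill_value` for that side, while a label missing on both sides stays missing.

```python
from typing import Dict, Hashable, Optional

# Hypothetical alias: label -> value, where None marks a missing (NA) value
Labeled = Dict[Hashable, Optional[float]]


def add_with_fill(a: Labeled, b: Labeled, fill_value: Optional[float] = None) -> Labeled:
    """Sketch of a.add(b, fill_value=...) on label-aligned data (not pandas code)."""
    result: Labeled = {}
    # align on the union of labels, in a deterministic order
    for label in sorted(set(a) | set(b), key=repr):
        x, y = a.get(label), b.get(label)
        if x is None and y is None:
            result[label] = None  # missing on both sides: stays missing
            continue
        if x is None:
            x = fill_value
        if y is None:
            y = fill_value
        # with no fill_value, a one-sided gap still propagates as missing
        result[label] = None if x is None or y is None else x + y
    return result


a = {"a": 1.0, "b": 2.0, "c": None}
b = {"b": 10.0, "c": None, "d": 4.0}
print(add_with_fill(a, b, fill_value=0))  # {'a': 1.0, 'b': 12.0, 'c': None, 'd': 4.0}
```

Calling it without `fill_value` reproduces plain NA propagation: any label missing on either side comes out missing.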

**Improvements to existing features**

* The 2-dimensional `DataFrame` and `DataMatrix` classes have been extensively
  redesigned internally into a single class `DataFrame`, preserving where
  possible their optimal performance characteristics. This should reduce
  confusion from users about which class to use.

  * Note that under the hood there is a new scheme, essentially "lazy
    evaluation", for adding columns to a DataFrame. During some
    operations, like-typed blocks will be "consolidated", but not before.

* Repeatedly accessing columns of a `DataFrame` is now significantly faster
  than it was with `DataMatrix` in 0.3.0, due to an internal Series caching
  mechanism (the cached Series are all views on the underlying data)
* Column ordering for mixed type data is now completely consistent in
  `DataFrame`. In prior releases, there was inconsistent column ordering in
  `DataMatrix`
* Improved console / string formatting of DataMatrix with negative numbers
* Improved tabular data parsing functions, `read_table` and `read_csv`:

  * Added `skiprows` and `na_values` arguments to `pandas.io.parsers` functions
    for more flexible IO
  * `parseCSV` / `read_csv` functions and others in `pandas.io.parsers` now can
    take a list of custom NA values, and also a list of rows to skip

* Can slice `DataFrame` and get a view of the data (when homogeneously typed),
  e.g. frame.xs(idx, copy=False) or frame.ix[idx]
* Many speed optimizations throughout `Series` and `DataFrame`
* Eager evaluation of groups when calling ``groupby`` functions, so if there is
  an exception with the grouping function it will be raised immediately rather
  than later on when the groups are needed
* `datetools.WeekOfMonth` offset can be parameterized with `n` other than 1
  or -1.
* Statistical methods on DataFrame like `mean`, `std`, `var`, `skew` will now
  ignore non-numerical data. Previously, an unhelpful error message was
  generated. A flag `numeric_only` has been added to `DataFrame.sum` and
  `DataFrame.count` to enable this behavior in those methods if so desired
  (disabled by default)
* `DataFrame.pivot` generalized to enable pivoting multiple columns into a
  `DataFrame` with hierarchical columns
* `DataFrame` constructor can accept structured / record arrays
* `Panel` constructor can accept a dict of DataFrame-like objects. There is no
  need to use `from_dict` anymore (`from_dict` is there to stay, though).
need to use `from_dict` anymore (`from_dict` is there to stay, though).
187
* The `DataMatrix` variable now refers to `DataFrame`, will be removed within
188
two releases
189
* `WidePanel` is now known as `Panel`. The `WidePanel` variable in the pandas
190
namespace now refers to the renamed `Panel` class
191
* `LongPanel` and `Panel` / `WidePanel` now no longer have a common
192
subclass. `LongPanel` is now a subclass of `DataFrame` having a number of
193
additional methods and a hierarchical index instead of the old
194
`LongPanelIndex` object, which has been removed. Legacy `LongPanel` pickles
195
may not load properly
196
* Cython is now required to build `pandas` from a development branch. This was
197
done to avoid continuing to check in cythonized C files into source
198
control. Builds from released source distributions will not require Cython
199
* Cython code has been moved up to a top level `pandas/src` directory. Cython
200
extension modules have been renamed and promoted from the `lib` subpackage to
201
the top level, i.e.
202
203
* `pandas.lib.tseries` -> `pandas._tseries`
204
* `pandas.lib.sparse` -> `pandas._sparse`
205
206
* `DataFrame` pickling format has changed. Backwards compatibility for legacy
207
pickles is provided, but it's recommended to consider PyTables-based
208
`HDFStore` for storing data with a longer expected shelf life
209
* A `copy` argument has been added to the `DataFrame` constructor to avoid
210
unnecessary copying of data. Data is no longer copied by default when passed
211
into the constructor
212
* Handling of boolean dtype in `DataFrame` has been improved to support storage
213
of boolean data with NA / NaN values. Before it was being converted to float64
214
so this should not (in theory) cause API breakage
215
* To optimize performance, Index objects now only check that their labels are
216
unique when uniqueness matters (i.e. when someone goes to perform a
217
lookup). This is a potentially dangerous tradeoff, but will lead to much
218
better performance in many places (like groupby).
219
* Boolean indexing using Series must now have the same indices (labels)
220
* Backwards compatibility support for begin/end/nPeriods keyword arguments in
221
DateRange class has been removed
222
* More intuitive / shorter filling aliases `ffill` (for `pad`) and `bfill` (for
223
`backfill`) have been added to the functions that use them: `reindex`,
224
`asfreq`, `fillna`.
225
* `pandas.core.mixins` code moved to `pandas.core.generic`
226
* `buffer` keyword arguments (e.g. `DataFrame.toString`) renamed to `buf` to
227
avoid using Python built-in name
228
* `DataFrame.rows()` removed (use `DataFrame.index`)
229
* Added deprecation warning to `DataFrame.cols()`, to be removed in next release
230
* `DataFrame` deprecations and de-camelCasing: `merge`, `asMatrix`,
231
`toDataMatrix`, `_firstTimeWithValue`, `_lastTimeWithValue`, `toRecords`,
232
`fromRecords`, `tgroupby`, `toString`
233
* `pandas.io.parsers` method deprecations
234
235
* `parseCSV` is now `read_csv` and keyword arguments have been de-camelCased
236
* `parseText` is now `read_table`
237
* `parseExcel` is replaced by the `ExcelFile` class and its `parse` method
238
239
* `fillMethod` arguments (deprecated in prior release) removed, should be
240
replaced with `method`
241
* `Series.fill`, `DataFrame.fill`, and `Panel.fill` removed, use `fillna`
242
instead
243
* `groupby` functions now exclude NA / NaN values from the list of groups. This
244
matches R behavior with NAs in factors e.g. with the `tapply` function
245
* Removed `parseText`, `parseCSV` and `parseExcel` from pandas namespace
246
* `Series.combineFunc` renamed to `Series.combine` and made a bit more general
247
with a `fill_value` keyword argument defaulting to NaN
248
* Removed `pandas.core.pytools` module. Code has been moved to
249
`pandas.core.common`
250
* Tacked on `groupName` attribute for groups in GroupBy renamed to `name`
251
* Panel/LongPanel `dims` attribute renamed to `shape` to be more conformant
252
* Slicing a `Series` returns a view now
253
* More Series deprecations / renaming: `toCSV` to `to_csv`, `asOf` to `asof`,
254
`merge` to `map`, `applymap` to `apply`, `toDict` to `to_dict`,
255
`combineFirst` to `combine_first`. Will print `FutureWarning`.
256
* `DataFrame.to_csv` does not write an "index" column label by default
257
anymore since the output file can be read back without it. However, there
258
is a new ``index_label`` argument. So you can do ``index_label='index'`` to
259
emulate the old behavior
260
* `datetools.Week` argument renamed from `dayOfWeek` to `weekday`
261
* `timeRule` argument in `shift` has been deprecated in favor of using the
262
`offset` argument for everything. So you can still pass a time rule string
263
to `offset`
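The rule that `groupby` excludes NA / NaN keys from the list of groups (as R's `tapply` does with NA factor levels) can be sketched in plain Python. `group_sums` and `is_na` are hypothetical helpers, not pandas code. Treating both `None` and float NaN as NA is an assumption made for this illustration.

```python
import math
from typing import Dict, Hashable, Iterable


def is_na(key: object) -> bool:
    """Treat None and float NaN as missing (assumption for this sketch)."""
    return key is None or (isinstance(key, float) and math.isnan(key))


def group_sums(keys: Iterable[Hashable], values: Iterable[float]) -> Dict[Hashable, float]:
    """Sum values per key, skipping rows whose key is NA so NA never forms a group."""
    sums: Dict[Hashable, float] = {}
    for key, value in zip(keys, values):
        if is_na(key):
            continue  # excluded from the list of groups
        sums[key] = sums.get(key, 0.0) + value
    return sums


keys = ["a", None, "b", "a", float("nan")]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(group_sums(keys, values))  # {'a': 5.0, 'b': 3.0}
```

The explicit `is_na` check matters because NaN never compares equal to itself. Distinct NaN objects used directly as dict keys would otherwise produce separate, meaningless groups.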

**Bug fixes**

* Column ordering in `pandas.io.parsers.parseCSV` will match CSV in the presence
  of mixed-type data
* Fixed handling of Excel 2003 dates in `pandas.io.parsers`
* `DateRange` caching was happening with high resolution `DateOffset` objects,
  e.g. `DateOffset(seconds=1)`. This has been fixed
* Fixed `__truediv__` issue in `DataFrame`
* Fixed `DataFrame.toCSV` bug preventing IO round trips in some cases
* Fixed bug in `Series.plot` causing matplotlib to barf in exceptional cases
* `Index` objects are no longer hashable, like ndarrays
* Added `__ne__` implementation to `Index` so that operations like ts[ts != idx]
  will work
* Added `__ne__` implementation to `DataFrame`
* Bug / unintuitive result when calling `fillna` on unordered labels
* Bug calling `sum` on boolean DataFrame
* Bug fix when creating a DataFrame from a dict with scalar values
* Series.{sum, mean, std, ...} now return NA/NaN when the whole Series is NA
* NumPy 1.4 through 1.6 compatibility fixes
* Fixed bug in bias correction in `rolling_cov`, which was also affecting
  `rolling_corr`
* R-square value was incorrect in the presence of fixed and time effects in
  the `PanelOLS` classes
* `HDFStore` can handle duplicates in table format, will take
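The reduction behavior noted for `Series.{sum, mean, std, ...}` is that NA values are skipped, but a Series that is entirely NA yields NA/NaN rather than a number. It can be sketched over plain float lists. `nan_sum` and `nan_mean` are hypothetical helpers that model only the behavior described in that entry, not pandas' own reductions.

```python
import math
from typing import List, Sequence


def _valid(values: Sequence[float]) -> List[float]:
    """Drop NaN entries (the skip-NA behavior)."""
    return [v for v in values if not math.isnan(v)]


def nan_sum(values: Sequence[float]) -> float:
    """Sum of non-NA values; NaN when every value is NA (or there are none)."""
    valid = _valid(values)
    return math.nan if not valid else math.fsum(valid)


def nan_mean(values: Sequence[float]) -> float:
    """Mean of non-NA values; NaN when every value is NA (or there are none)."""
    valid = _valid(values)
    return math.nan if not valid else math.fsum(valid) / len(valid)


print(nan_sum([1.0, math.nan, 2.0]), nan_mean([math.nan, math.nan]))  # 3.0 nan
```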

Thanks
------

- Joon Ro
- Michael Pennington
- Chris Uga
- Chris Withers
- Jeff Reback
- Ted Square
- Craig Austin
- William Ferreira
- Daniel Fortunov
- Tony Roberts
- Martin Felder
- John Marino
- Tim McNamara
- Justin Berka
- Dieter Vandenbussche
- Shane Conway
- Skipper Seabold
- Chris Jordan-Squire

pandas 0.3.0
============

This major release of pandas represents approximately 1 year of continuous
development work and brings with it many new features, bug fixes, speed
enhancements, and general quality-of-life improvements. The most significant
change from the 0.2 release has been the completion of a rigorous unit test
suite covering all of the core functionality.

What is it
----------

**pandas** is a library of labeled data structures, statistical models, and
general code for working with time series and cross-sectional data. It was
designed with the practical needs of statistical modeling and large,
inhomogeneous data sets in mind.

Where to get it
---------------

Source code: http://github.com/wesm/pandas
Binary installers on PyPI: http://pypi.python.org/pypi/pandas
Documentation: http://pandas.sourceforge.net

Release notes
-------------

**Release date:** February 20, 2011
339
340
**New features / modules**
341
Nov 24, 2010
342
* DataFrame / DataMatrix classes
Nov 24, 2010
344
* `corrwith` function to compute column- or row-wise correlations between two
345
objects
Feb 19, 2011
346
* Can boolean-index DataFrame objects, e.g. df[df > 2] = 2, px[px > last_px] = 0
347
* Added comparison magic methods (__lt__, __gt__, etc.)
348
* Flexible explicit arithmetic methods (add, mul, sub, div, etc.)
349
* Added `reindex_like` method
Nov 24, 2010
350
Feb 19, 2011
351
* WidePanel
Feb 19, 2011
353
* Added `reindex_like` method
354
355
* `pandas.io`: IO utilities
357
* `pandas.io.sql` module
358
359
* Convenience functions for accessing SQL-like databases
360
Feb 19, 2011
361
* `pandas.io.pytables` module
Feb 19, 2011
363
* Added (still experimental) HDFStore class for storing pandas data
364
structures using HDF5 / PyTables
365
366
* `pandas.core.datetools`
367
368
* Added WeekOfMonth date offset
Feb 19, 2011
369
370
* `pandas.rpy` (experimental) module created, provide some interfacing /
371
conversion between rpy2 and pandas
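A WeekOfMonth-style date offset pins a date to, for example, the third Friday of a month. The helper below is a hypothetical standard-library sketch of that date arithmetic, not the `datetools.WeekOfMonth` implementation. It assumes a 0-based `week` (0 = first occurrence) and a `weekday` numbered like Python's `date.weekday()` (0 = Monday).

```python
import datetime


def week_of_month(year: int, month: int, week: int, weekday: int) -> datetime.date:
    """Return the date of the (week + 1)-th given weekday in year/month.

    Hypothetical helper: week is 0-based, weekday uses 0 = Monday ... 6 = Sunday.
    Raises ValueError if that occurrence falls outside the month.
    """
    if not 0 <= weekday <= 6:
        raise ValueError("weekday must be in 0..6")
    first = datetime.date(year, month, 1)
    # days from the 1st to the first occurrence of the requested weekday
    offset = (weekday - first.weekday()) % 7
    # datetime.date itself raises ValueError if the day is out of range
    return datetime.date(year, month, 1 + offset + 7 * week)


print(week_of_month(2011, 9, 2, 4))  # third Friday of September 2011: 2011-09-16
```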

**Improvements to existing features**

* Unit test coverage: 100% line coverage of core data structures

* Speed enhancement to rolling_{median, max, min}

* Column ordering between DataFrame and DataMatrix is now consistent: previously,
  DataFrame would not respect column order

* Improved {Series, DataFrame}.plot methods to be more flexible (can pass
  matplotlib Axis arguments, plot DataFrame columns in multiple subplots, etc.)
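Rolling order statistics such as `rolling_{median, max, min}` need a window structure that supports insertion, deletion, and lookup by rank. These notes mention an `IndexableSkiplist` in that role. The sketch below is a simpler stand-in and explicitly not a skiplist: it keeps the current window in a sorted list maintained with `bisect`. That costs O(window) per step, rather than the expected O(log window) an indexable skiplist targets. It assumes NaN-free input and emits NaN until the window is full.

```python
import bisect
import math
from typing import List, Sequence


def rolling_median(values: Sequence[float], window: int) -> List[float]:
    """Moving median over a fixed-size window, using a sorted list as the window.

    Sketch only: assumes NaN-free input; returns NaN until the window is full.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[float] = []
    ordered: List[float] = []  # current window contents, kept sorted
    for i, x in enumerate(values):
        bisect.insort(ordered, x)
        if i >= window:
            # drop the observation that just left the window
            old = values[i - window]
            del ordered[bisect.bisect_left(ordered, old)]
        if len(ordered) < window:
            out.append(math.nan)
        elif window % 2:
            out.append(float(ordered[window // 2]))
        else:
            mid = window // 2
            out.append((ordered[mid - 1] + ordered[mid]) / 2.0)
    return out


print(rolling_median([5, 1, 3, 2, 4], 3))  # [nan, nan, 3.0, 2.0, 3.0]
```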

**API Changes**

* Exponentially-weighted moment functions in `pandas.stats.moments`
  have a more consistent API and accept a min_periods argument like
  their regular moving counterparts.

* **fillMethod** argument in Series, DataFrame changed to **method**,
  `FutureWarning` added.

* **fill** method in Series, DataFrame/DataMatrix, WidePanel renamed to
  **fillna**, `FutureWarning` added to **fill**

* Renamed **DataFrame.getXS** to **xs**, `FutureWarning` added

* Removed **cap** and **floor** functions from DataFrame, renamed to
  **clip_upper** and **clip_lower** for consistency with NumPy
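The `min_periods` argument of the exponentially-weighted moment functions can be sketched for the simplest case, an exponentially weighted moving average. This is an illustration under stated assumptions, not `pandas.stats.moments`. The hypothetical `ewma_sketch` uses the recursive update avg = alpha * x + (1 - alpha) * avg with the center-of-mass convention alpha = 1 / (1 + com). It reports NaN until at least `min_periods` non-NaN observations have been seen. The pandas function may weight early observations differently.

```python
import math
from typing import List, Optional, Sequence


def ewma_sketch(values: Sequence[float], com: float, min_periods: int = 0) -> List[float]:
    """Hypothetical recursive EWMA; NaN inputs are skipped and the average carries forward."""
    if com < 0:
        raise ValueError("com must be non-negative")
    alpha = 1.0 / (1.0 + com)
    avg: Optional[float] = None
    seen = 0  # number of non-NaN observations so far
    out: List[float] = []
    for x in values:
        if not math.isnan(x):
            avg = x if avg is None else alpha * x + (1.0 - alpha) * avg
            seen += 1
        # suppress output until enough observations have accumulated
        out.append(avg if avg is not None and seen >= min_periods else math.nan)
    return out


print(ewma_sketch([1.0, 2.0, 3.0], com=1.0, min_periods=2))  # [nan, 1.5, 2.25]
```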

**Bug fixes**

* Fixed bug in IndexableSkiplist Cython code that was breaking
  rolling_max function

* Numerous numpy.int64-related indexing fixes

* Several NumPy 1.4.0 NaN-handling fixes

* Bug fixes to pandas.io.parsers.parseCSV

* Fixed `DateRange` caching issue with unusual date offsets

* Fixed bug in `DateRange.union`

* Fixed corner case in `IndexableSkiplist` implementation