=============
Release Notes
=============

This is the list of changes to pandas between each release. For full details,
see the commit logs at http://github.com/wesm/pandas

What is it
----------

pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, real world data analysis in Python. Additionally, it has the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.

Where to get it
---------------

* Source code: http://github.com/wesm/pandas
* Binary installers on PyPI: http://pypi.python.org/pypi/pandas
* Documentation: http://pandas.sourceforge.net

pandas 0.6.1
============

**Release date:** Not yet released

**API Changes**

- Rename `names` argument in DataFrame.from_records to `columns`. Add
  deprecation warning
- Boolean get/set operations on Series with boolean Series will reindex
  instead of requiring that the indexes be exactly equal (GH #429)

**New features / modules**

- Can pass Series to DataFrame.append with ignore_index=True for appending a
  single row (GH #430)
- Add Spearman and Kendall correlation options to Series.corr and
  DataFrame.corr (GH #428)
- Add new `get_value` and `set_value` methods to Series, DataFrame, and Panel
  for very low-overhead access to scalar elements. df.get_value(row, column)
  is about 3x faster than df[column][row] by handling fewer cases (GH #437,
  #438). Add similar methods to sparse data structures for compatibility
- Add Qt table widget to sandbox (PR #435)
- DataFrame.align can accept Series arguments, add axis keyword (GH #461)
- Implement new SparseList and SparseArray data structures. SparseSeries now
  derives from SparseArray (GH #463)
- max_columns / max_rows options in set_printoptions (PR #453)
- Implement Series.rank and DataFrame.rank, fast versions of
  scipy.stats.rankdata (GH #428)
- Implement DataFrame.from_items alternate constructor (GH #444)
- DataFrame.convert_objects method for inferring better dtypes for object
  columns (GH #302)
- Add rolling_corr_pairwise function for computing Panel of correlation
  matrices (GH #189)
- Add `margins` option to `pivot_table` for computing subgroup aggregates (GH
  #114)

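
The rank and Spearman entries above are closely related: a Spearman
correlation is a Pearson correlation computed on ranks. The following is a
minimal plain-Python sketch of that relationship, assuming tied values receive
the average of their ranks. The helper names `average_rank`, `pearson`, and
`spearman` are illustrative only and are not part of the pandas API.

```python
from math import sqrt


def average_rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_rank(x), average_rank(y))


ranks = average_rank([10, 20, 10, 30])
rho = spearman([1, 2, 3, 4], [1, 4, 9, 16])
```

Because the second sequence is a monotonic function of the first, the ranks
agree exactly and the Spearman coefficient is 1 even though the relationship
is not linear.
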
**Improvements to existing features**

- Improve memory usage of `DataFrame.describe` (do not copy data
  unnecessarily) (PR #425)
- Use same formatting function for outputting floating point Series to console
  as in DataFrame (PR #420)
- DataFrame.delevel will try to infer better dtype for new columns (GH #440)
- Exclude non-numeric types in DataFrame.{corr, cov}
- Override Index.astype to enable dtype casting (GH #412)
- Use same float formatting function for Series.__repr__ (PR #420)
- Use available console width to output DataFrame columns (PR #453)
- Accept ndarrays when setting items in Panel (GH #452)
- Infer console width when printing __repr__ of DataFrame to console (PR
  #453)
- Optimize scalar value lookups in the general case by 25% or more in Series
  and DataFrame
- Can pass DataFrame/DataFrame and DataFrame/Series to
  rolling_corr/rolling_cov (GH #462)
- Fix performance regression in cross-sectional count in DataFrame, affecting
  DataFrame.dropna speed
- Column deletion in DataFrame copies no data (computes views on blocks) (GH
  #158)
- MultiIndex.get_level_values can take the level name

**Bug fixes**

- Fix O(K^2) memory leak caused by inserting many columns without
  consolidating, had been present since 0.4.0 (GH #467)
- `DataFrame.count` should return Series with zero instead of NA with length-0
  axis (GH #423)
- Fix Yahoo! Finance API usage in pandas.io.data (GH #419, PR #427)
- Fix upstream bug causing failure in Series.align with empty Series (GH #434)
- Function passed to DataFrame.apply can return a list, as long as it's the
  right length. Regression from 0.4 (GH #432)
- Don't "accidentally" upcast scalar values when indexing using .ix (GH #431)
- Fix groupby exception raised with as_index=False and single column selected
  (GH #421)
- Implement DateOffset.__ne__, whose absence was causing a downstream bug
  (GH #456)
- Fix __doc__-related issue when converting py -> pyo with py2exe
- Bug fix in left join Cython code with duplicate monotonic labels
- Fix bug when unstacking multiple levels described in #451
- Exclude NA values in dtype=object arrays, regression from 0.5.0 (GH #469)
- Use Cython map_infer function in DataFrame.applymap to properly infer
  output type, handle tuple return values and other things that were breaking
  (GH #465)
- Handle floating point index values in HDFStore (GH #454)
- Fixed stale column reference bug (cached Series object) caused by type
  change / item deletion in DataFrame (GH #473)
- Index.get_loc should always raise Exception when there are duplicates
- Handle differently-indexed Series input to DataFrame constructor (GH #475)
- Omit nuisance columns in multi-groupby with Python function
- Buglet in handling of single grouping in general apply

Thanks
------

- Ralph Bean
- Luca Beltrame
- Marius Cobzarenco
- Andreas Hilboll
- Jev Kuznetsov
- Adam Lichtenstein
- Fernando Perez
- Christian Prinoth
- Alex Reyfman

pandas 0.6.0
============

**Release date:** 11/25/2011

**API Changes**

- Arithmetic methods like `sum` will attempt to sum dtype=object values by
  default instead of excluding them (GH #382)

**New features / modules**

- Add `melt` function to `pandas.core.reshape`
- Add `level` parameter to group by level in Series and DataFrame
  descriptive statistics (PR #313)
- Add `head` and `tail` methods to Series, analogous to DataFrame (PR
  #296)
- Add `Series.isin` function which checks if each value is contained in a
  passed sequence (GH #289)
- Add `float_format` option to `Series.to_string`
- Add `skip_footer` (GH #291) and `converters` (GH #343) options to
  `read_csv` and `read_table`
- Add proper, tested weighted least squares to standard and panel OLS (GH
  #303)
- Add `drop_duplicates` and `duplicated` functions for removing duplicate
  DataFrame rows and checking for duplicate rows, respectively (GH #319)
- Implement logical (boolean) operators &, |, ^ on DataFrame (GH #347)
- Add `Series.mad`, mean absolute deviation, matching DataFrame
- Add `QuarterEnd` DateOffset (PR #321)
- Add matrix multiplication function `dot` to DataFrame (GH #65)
- Add `orient` option to `Panel.from_dict` to ease creation of mixed-type
- Add `DataFrame.from_dict` with similar `orient` option
- Can now pass list of tuples or list of lists to `DataFrame.from_records`
  for fast conversion to DataFrame (GH #357)
- Can pass multiple levels to groupby, e.g. `df.groupby(level=[0, 1])` (GH
  #103)
- Can sort by multiple columns in `DataFrame.sort_index` (GH #92, PR #362)
- Add fast `get_value` and `put_value` methods to DataFrame and
  micro-performance tweaks (GH #360)
- Add `cov` instance methods to Series and DataFrame (GH #194, PR #362)
- Add bar plot option to `DataFrame.plot` (PR #348)
- Add `idxmin` and `idxmax` functions to Series and DataFrame for computing
  index labels achieving minimum and maximum values (PR #286)
- Add `read_clipboard` function for parsing DataFrame from OS clipboard,
  should work across platforms (GH #300)
- Add `nunique` function to Series for counting unique elements (GH #297)
- DataFrame constructor will use Series name if no columns passed (GH #373)
- Support regular expressions and longer delimiters in read_table/read_csv,
  but does not handle quoted strings yet (GH #364)
- Add `DataFrame.to_html` for formatting DataFrame to HTML (PR #387)
- MaskedArray can be passed to DataFrame constructor and masked values will be
  converted to NaN (PR #396)
- Add `DataFrame.boxplot` function (GH #368, others)
- Can pass extra args, kwds to DataFrame.apply (GH #376)

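
As a plain-Python sketch of what `duplicated` and `drop_duplicates` (GH #319)
compute, assume a row counts as a duplicate when an identical row appeared
earlier, so the first occurrence is kept. Rows are modeled as tuples here, and
the two helper functions only illustrate the semantics; they are not the
pandas implementation.

```python
def duplicated(rows):
    """Return a list of booleans: True where an identical row was seen earlier."""
    seen = set()
    flags = []
    for row in rows:
        key = tuple(row)
        flags.append(key in seen)
        seen.add(key)
    return flags


def drop_duplicates(rows):
    """Keep only the first occurrence of each distinct row, preserving order."""
    return [row for row, dup in zip(rows, duplicated(rows)) if not dup]


rows = [("a", 1), ("b", 2), ("a", 1), ("a", 3)]
flags = duplicated(rows)
unique_rows = drop_duplicates(rows)
```

Only the third row is flagged, because it matches the first row in every
field; `("a", 3)` shares a key with it but is a different row.
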
**Improvements to existing features**

- Raise more helpful exception if date parsing fails in DateRange (GH #298)
- Vastly improved performance of GroupBy on axes with a MultiIndex (GH #299)
- Print level names in hierarchical index in Series repr (GH #305)
- Return DataFrame when performing GroupBy on selected column and
  as_index=False (GH #308)
- Can pass vector to `on` argument in `DataFrame.join` (GH #312)
- Don't show Series name if it's None in the repr, also omit length for short
  Series (GH #317)
- Show legend by default in `DataFrame.plot`, add `legend` boolean flag (GH
  #324)
- Significantly improved performance of `Series.order`, which also makes
  np.unique called on a Series faster (GH #327)
- Faster cythonized count by level in Series and DataFrame (GH #341)
- Raise exception if dateutil 2.0 installed on Python 2.x runtime (GH #346)
- Significant GroupBy performance enhancement with multiple keys with many
  "empty" combinations
- New Cython vectorized function `map_infer` speeds up `Series.apply` and
  `Series.map` significantly when passed elementwise Python function,
  motivated by PR #355
- Cythonized `cache_readonly`, resulting in substantial micro-performance
  enhancements throughout the codebase (GH #361)
- Special Cython matrix iterator for applying arbitrary reduction operations
  with 3-5x better performance than `np.apply_along_axis` (GH #309)
- Add `raw` option to `DataFrame.apply` for getting better performance when
  the passed function only requires an ndarray (GH #309)
- Improve performance of `MultiIndex.from_tuples`
- Can pass multiple levels to `stack` and `unstack` (GH #370)
- Can pass multiple values columns to `pivot_table` (GH #381)
- Can call `DataFrame.delevel` with standard Index with name set (GH #393)
- Use Series name in GroupBy for result index (GH #363)
- Refactor Series/DataFrame stat methods to use common set of NaN-friendly
  functions
- Handle NumPy scalar integers at C level in Cython conversion routines

**Bug fixes**

- Fix bug in `DataFrame.to_csv` when writing a DataFrame with an index
  name (GH #290)
- DataFrame should clear its Series caches on consolidation, was causing
  "stale" Series to be returned in some corner cases (GH #304)
- DataFrame constructor failed if a column had a list of tuples (GH #293)
- Ensure that `Series.apply` always returns a Series and implement
  `Series.round` (GH #314)
- Support boolean columns in Cythonized groupby functions (GH #315)
- `DataFrame.describe` should not fail if there are no numeric columns,
  instead return categorical describe (GH #323)
- Fixed bug which could cause columns to be printed in wrong order in
  `DataFrame.to_string` if specific list of columns passed (GH #325)
- Fix legend plotting failure if DataFrame columns are integers (GH #326)
- Shift start date back by one month for Yahoo! Finance API in pandas.io.data
  (GH #329)
- Fix `DataFrame.join` failure on unconsolidated inputs (GH #331)
- DataFrame.min/max will no longer fail on mixed-type DataFrame (GH #337)
- Fix `read_csv` / `read_table` failure when passing list to index_col that is
  not in ascending order (GH #349)
- Fix failure passing Int64Index to Index.union when both are monotonic
- Fix error when passing SparseSeries to (dense) DataFrame constructor
- Added missing bang at top of setup.py (GH #352)
- Change `is_monotonic` on MultiIndex so it properly compares the tuples
- Fix MultiIndex outer join logic (GH #351)
- Set index name attribute with single-key groupby (GH #358)
- Bug fix in reflexive binary addition in Series and DataFrame for
  non-commutative operations (like string concatenation) (GH #353)
- setupegg.py will invoke Cython (GH #192)
- Fix block consolidation bug after inserting column into MultiIndex (GH #366)
- Fix bug in join operations between Index and Int64Index (GH #367)
- Handle min_periods=0 case in moving window functions (GH #365)
- Fixed corner cases in DataFrame.apply/pivot with empty DataFrame (GH #378)
- Fixed repr exception when Series name is a tuple
- Always return DateRange from `asfreq` (GH #390)
- Pass level names to `swaplevel` (GH #379)
- Don't lose index names in `MultiIndex.droplevel` (GH #394)
- Infer more proper return type in `DataFrame.apply` when no columns or rows
  depending on whether the passed function is a reduction (GH #389)
- Always return NA/NaN from Series.min/max and DataFrame.min/max when all of a
  row/column/values are NA (GH #384)
- Enable partial setting with .ix / advanced indexing (GH #397)
- Handle mixed-type DataFrames correctly in unstack, do not lose type
  information (GH #403)
- Fix integer name formatting bug in Index.format and in Series.__repr__
- Handle label types other than string passed to groupby (GH #405)
- Fix bug in .ix-based indexing with partial retrieval when a label is not
  contained in a level
- Index name was not being pickled (GH #408)
- Level name should be passed to result index in GroupBy.apply (GH #416)

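
The reflexive-addition fix (GH #353) matters because `x + obj` and `obj + x`
differ for non-commutative operations such as string concatenation. The
sketch below uses a hypothetical `Labels` container (not a pandas class) to
show the operand order that a reflexive method has to preserve.

```python
class Labels:
    """Hypothetical elementwise container, used only for illustration."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # labels + other: each element is the left operand.
        return Labels(v + other for v in self.values)

    def __radd__(self, other):
        # other + labels: `other` must stay on the left.
        # Delegating to __add__ here would silently swap the operands.
        return Labels(other + v for v in self.values)


labels = Labels(["a", "b"])
suffixed = (labels + "_x").values
prefixed = ("x_" + labels).values
```

For numeric addition the two methods could share one implementation; it is
only with operations like string concatenation that the swapped order becomes
visible.
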
Thanks
------

- Craig Austin
- Marius Cobzarenco
- Jeff Hammerbacher
- Adam Klein
- Jev Kuznetsov
- Kieran O'Mahony
- Wouter Overmeire
- Nathan Pinger
- Christian Prinoth
- Skipper Seabold
- Chang She
- Ted Square
- Aman Thakral
- Dieter Vandenbussche
- carljv

pandas 0.5.0
============

**Release date:** 10/24/2011

This release of pandas includes a number of API changes (see below) and cleanup
of deprecated APIs from pre-0.4.0 releases. There are also bug fixes, new
features, numerous significant performance enhancements, and a new IPython
completer hook that enables tab completion of DataFrame column accesses as
attributes (itself a new feature).

In addition to the changes listed here from 0.4.3 to 0.5.0, the minor releases
0.4.1, 0.4.2, and 0.4.3 brought some significant new functionality and
performance improvements that are worth taking a look at.

Thanks to all for bug reports, contributed patches and generally providing
feedback on the library.

**API Changes**

- The default `index_col` argument of `read_table`, `read_csv`, and
  `ExcelFile.parse` is now None. To use one or more of the columns as the
  resulting DataFrame's index, these must be explicitly specified now
- Parsing functions like `read_csv` no longer parse dates by default (GH
  #225)
- Removed `weights` option in panel regression which was not doing anything
  principled (GH #155)
- Changed `buffer` argument name in `Series.to_string` to `buf`
- `Series.to_string` and `DataFrame.to_string` now return strings by default
  instead of printing to sys.stdout
- Deprecated `nanRep` argument in various `to_string` and `to_csv` functions
  in favor of `na_rep`. Will be removed in 0.6 (GH #275)
- Renamed `delimiter` to `sep` in `DataFrame.from_csv` for consistency
- Changed order of `Series.clip` arguments to match those of `numpy.clip` and
  added (unimplemented) `out` argument so `numpy.clip` can be called on a
  Series (GH #272)
- Series functions renamed (and thus deprecated) in 0.4 series have been
  removed:

  * `asOf`, use `asof`
  * `toDict`, use `to_dict`
  * `toString`, use `to_string`
  * `toCSV`, use `to_csv`
  * `merge`, use `map`
  * `applymap`, use `apply`
  * `combineFirst`, use `combine_first`
  * `_firstTimeWithValue`, use `first_valid_index`
  * `_lastTimeWithValue`, use `last_valid_index`

- DataFrame functions renamed / deprecated in 0.4 series have been removed:

  * `asMatrix` method, use `as_matrix` or `values` attribute
  * `combineFirst`, use `combine_first`
  * `getXS`, use `xs`
  * `merge`, use `join`
  * `fromRecords`, use `from_records`
  * `fromcsv`, use `from_csv`
  * `toRecords`, use `to_records`
  * `toDict`, use `to_dict`
  * `toString`, use `to_string`
  * `toCSV`, use `to_csv`
  * `_firstTimeWithValue`, use `first_valid_index`
  * `_lastTimeWithValue`, use `last_valid_index`
  * `toDataMatrix` is no longer needed
  * `rows()` method, use `index` attribute
  * `cols()` method, use `columns` attribute
  * `dropEmptyRows()`, use `dropna(how='all')`
  * `dropIncompleteRows()`, use `dropna()`
  * `tapply(f)`, use `apply(f, axis=1)`
  * `tgroupby(keyfunc, aggfunc)`, use `groupby` with `axis=1`

- Other outstanding deprecations have been removed:

  * `indexField` argument in `DataFrame.from_records`
  * `missingAtEnd` argument in `Series.order`. Use `na_last` instead
  * `Series.fromValue` classmethod, use regular `Series` constructor instead
  * Functions `parseCSV`, `parseText`, and `parseExcel` in
    `pandas.io.parsers` have been removed
  * `Index.asOfDate` function
  * `Panel.getMinorXS` (use `minor_xs`) and `Panel.getMajorXS` (use
    `major_xs`)
  * `Panel.toWide`, use `Panel.to_wide` instead

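
For code migrating from the removed `dropEmptyRows()` and
`dropIncompleteRows()`, the difference between `dropna(how='all')` and
`dropna()` can be sketched in plain Python. Rows are modeled as lists with
`None` standing in for NA. The `dropna` helper below illustrates the
row-filtering rule only and is not the pandas method.

```python
def dropna(rows, how="any"):
    """Drop rows containing missing values (None).

    how="any": drop a row if any value is missing (old dropIncompleteRows).
    how="all": drop a row only if every value is missing (old dropEmptyRows).
    """
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    test = any if how == "any" else all
    return [row for row in rows if not test(v is None for v in row)]


rows = [[1.0, 2.0], [None, 3.0], [None, None]]
complete_rows = dropna(rows)
non_empty_rows = dropna(rows, how="all")
```

The partially filled row survives `how="all"` but not the default, which is
the practical distinction between the two removed methods.
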
**New features / modules**

- Added `DataFrame.align` method with standard join options
- Added `parse_dates` option to `read_csv` and `read_table` methods to
  optionally try to parse dates in the index columns
- Add `nrows`, `chunksize`, and `iterator` arguments to `read_csv` and
  `read_table`. The last two return a new `TextParser` class capable of
  lazily iterating through chunks of a flat file (GH #242)
- Added ability to join on multiple columns in `DataFrame.join` (GH #214)
- Added private `_get_duplicates` function to `Index` for identifying
  duplicate values more easily
- Added column attribute access to DataFrame, e.g. df.A equivalent to df['A']
  if 'A' is a column in the DataFrame (PR #213)
- Added IPython tab completion hook for DataFrame columns. (PR #233, GH #230)
- Implement `Series.describe` for Series containing objects (PR #241)
- Add inner join option to `DataFrame.join` when joining on key(s) (GH #248)
- Can select set of DataFrame columns by passing a list to `__getitem__` (GH
  #253)
- Can use & and | to intersect / union Index objects, respectively (GH
  #261)
- Added `pivot_table` convenience function to pandas namespace (GH #234)
- Implemented `Panel.rename_axis` function (GH #243)
- DataFrame will show index level names in console output
- Add `set_eng_float_format` function for setting alternate DataFrame
  floating point string formatting
- Add convenience `set_index` function for creating a DataFrame index from
  its existing columns

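
The `chunksize` and `iterator` options above describe lazy, chunk-at-a-time
reading of a flat file. As a rough sketch of that idea using only the standard
library (the `iter_chunks` helper is hypothetical and is not the `TextParser`
API):

```python
import csv
import io
from itertools import islice


def iter_chunks(handle, chunksize):
    """Yield lists of up to `chunksize` parsed rows, reading lazily."""
    reader = csv.reader(handle)
    header = next(reader)
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            break
        # Pair each row with the header so chunks are self-describing.
        yield [dict(zip(header, row)) for row in chunk]


data = io.StringIO("key,value\na,1\nb,2\nc,3\n")
chunk_sizes = [len(chunk) for chunk in iter_chunks(data, chunksize=2)]
```

Because the function is a generator, no more than one chunk of rows is held in
memory at a time, which is the point of reading in chunks.
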
**Improvements to existing features**

- Major performance improvements in file parsing functions `read_csv` and
  `read_table`
- Added Cython function for converting tuples to ndarray very fast. Speeds up
  many MultiIndex-related operations
- File parsing functions like `read_csv` and `read_table` will explicitly
  check if a parsed index has duplicates and raise a more helpful exception
  rather than deferring the check until later
- Refactored merging / joining code into a tidy class and disabled unnecessary
  computations in the float/object case, thus getting about 10% better
  performance (GH #211)
- Improved speed of `DataFrame.xs` on mixed-type DataFrame objects by about
  5x, regression from 0.3.0 (GH #215)
- With new `DataFrame.align` method, speeding up binary operations between
  differently-indexed DataFrame objects by 10-25%.
- Significantly sped up conversion of nested dict into DataFrame (GH #212)
- Can pass hierarchical index level name to `groupby` instead of the level
  number if desired (GH #223)
- Add support for different delimiters in `DataFrame.to_csv` (PR #244)
- Add more helpful error message when importing pandas post-installation from
  the source directory (GH #250)
- Significantly speed up DataFrame `__repr__` and `count` on large mixed-type
  DataFrame objects
- Better handling of pyx file dependencies in Cython module build (GH #271)

**Bug fixes**

- `read_csv` / `read_table` fixes

  - Be less aggressive about converting float->int in cases of floating point
    representations of integers like 1.0, 2.0, etc.
  - "True"/"False" will now get correctly converted to boolean
  - Index name attribute will get set when specifying an index column
  - Passing column names should force `header=None` (GH #257)
  - Don't modify passed column names when `index_col` is not
    None (GH #258)
  - Can sniff CSV separator in zip file (since seek is not supported, was
    failing before)

- Worked around matplotlib "bug" in which series[:, np.newaxis] fails. Should
  be reported upstream to matplotlib (GH #224)
- DataFrame.iteritems was not returning Series with the name attribute
  set. Also neither was DataFrame._series
- Can store datetime.date objects in HDFStore (GH #231)
- Index and Series names are now stored in HDFStore
- Fixed problem in which data would get upcasted to object dtype in
  GroupBy.apply operations (GH #237)
- Fixed outer join bug with empty DataFrame (GH #238)
- Can create empty Panel (GH #239)
- Fix join on single key when passing list with 1 entry (GH #246)
- Don't raise Exception on plotting DataFrame with an all-NA column (GH #251,
  PR #254)
- Fix min/max errors when called on integer DataFrames (PR #241)
- `DataFrame.iteritems` and `DataFrame._series` not assigning name attribute
- Panel.__repr__ raised exception on length-0 major/minor axes
- `DataFrame.join` on key with empty DataFrame produced incorrect columns
- Implemented `MultiIndex.diff` (GH #260)
- `Int64Index.take` and `MultiIndex.take` lost name field, fix downstream
  issue GH #262
- Can pass list of tuples to `Series` (GH #270)
- Can pass level name to `DataFrame.stack`
- Support set operations between MultiIndex and Index
- Fix many corner cases in MultiIndex set operations
- Fix MultiIndex-handling bug with GroupBy.apply when returned groups are not
- Fix corner case bugs in DataFrame.apply
- Setting DataFrame index did not cause Series cache to get cleared
- Various int32 -> int64 platform-specific issues
- Don't be too aggressive converting to integer when parsing file with
  MultiIndex (GH #285)
- Fix bug when slicing Series with negative indices before beginning

Thanks
------

- Thomas Kluyver
- Daniel Fortunov
- Aman Thakral
- Luca Beltrame
- Wouter Overmeire

492
pandas 0.4.3
493
============
494
Oct 19, 2011
495
Release notes
496
-------------
497
498
**Release date:** 10/9/2011
500
This is largely a bugfix release from 0.4.2 but also includes a handful of new
501
and enhanced features. Also, pandas can now be installed and used on Python 3
502
(thanks Thomas Kluyver!).
504
**New features / modules**
505
506
- Python 3 support using 2to3 (PR #200, Thomas Kluyver)
507
- Add `name` attribute to `Series` and added relevant logic and tests. Name
508
now prints as part of `Series.__repr__`
509
- Add `name` attribute to standard Index so that stacking / unstacking does
510
not discard names and so that indexed DataFrame objects can be reliably
511
round-tripped to flat files, pickle, HDF5, etc.
512
- Add `isnull` and `notnull` as instance methods on Series (PR #209, GH #203)
514
**Improvements to existing features**
515
516
- Skip xlrd-related unit tests if not installed
517
- `Index.append` and `MultiIndex.append` can accept a list of Index objects to
518
concatenate together
519
- Altered binary operations on differently-indexed SparseSeries objects to use
520
the integer-based (dense) alignment logic which is faster with a larger
521
number of blocks (GH #205)
522
- Refactored `Series.__repr__` to be a bit more clean and consistent
524
**API Changes**
525
526
- `Series.describe` and `DataFrame.describe` now bring the 25% and 75%
527
quartiles instead of the 10% and 90% deciles. The other outputs have not
528
changed
529
- `Series.toString` will print deprecation warning, has been de-camelCased to
530
`to_string`
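
To make the `describe` change concrete, here is a small sketch of the 25% and
75% quartiles computed with linear interpolation between order statistics.
The interpolation rule and the `percentile` helper are assumptions made for
illustration; these notes do not state which rule `describe` uses.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between sorted neighbors."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of empty data")
    pos = (len(data) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + (data[upper] - data[lower]) * frac


sample = [1, 2, 3, 4, 5]
q25 = percentile(sample, 25)
q75 = percentile(sample, 75)
```
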
**Bug fixes**

- Fix broken interaction between `Index` and `Int64Index` when calling
  intersection. Implement `Int64Index.intersection`
- `MultiIndex.sortlevel` discarded the level names (GH #202)
- Fix bugs in groupby, join, and append due to improper concatenation of
  `MultiIndex` objects (GH #201)
- Fix regression from 0.4.1, `isnull` and `notnull` ceased to work on other
  kinds of Python scalar objects like `datetime.datetime`
- Raise more helpful exception when attempting to write empty DataFrame or
  LongPanel to `HDFStore` (GH #204)
- Use stdlib csv module to properly escape strings with commas in
  `DataFrame.to_csv` (PR #206, Thomas Kluyver)
- Fix Python ndarray access in Cython code for sparse blocked index integrity
  check
- Fix bug writing Series to CSV in Python 3 (PR #209)
- Miscellaneous Python 3 bugfixes

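
The `DataFrame.to_csv` fix (PR #206) relies on the quoting rules of the
standard library's `csv` module. A short standalone illustration of how
`csv.writer` handles a field that contains a comma (the sample data is made
up):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["name", "city"])
# The comma inside the second field forces the writer to quote it.
writer.writerow(["Ada", "London, UK"])

output = buf.getvalue()
# Reading the text back recovers the original two fields.
parsed = list(csv.reader(io.StringIO(output)))
```

Joining fields with a bare `",".join(...)` would instead produce three
columns for the second row, which is the class of bug the fix addresses.
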
Thanks
------

- Thomas Kluyver
- rsamson

pandas 0.4.2
============

Release notes
-------------

**Release date:** 10/3/2011

This is a performance optimization release with several bug fixes. The new
Int64Index and new merging / joining Cython code and related Python
infrastructure are the main new additions

**New features / modules**

- Added fast `Int64Index` type with specialized join, union,
  intersection. Will result in significant performance enhancements for
  int64-based time series (e.g. using NumPy's datetime64 one day) and also
  faster operations on DataFrame objects storing record array-like data.
- Refactored `Index` classes to have a `join` method and associated data
  alignment routines throughout the codebase to be able to leverage optimized
  joining / merging routines.
- Added `Series.align` method for aligning two series with choice of join
  method
- Wrote faster Cython data alignment / merging routines resulting in
  substantial speed increases
- Added `is_monotonic` property to `Index` classes with associated Cython
  code to evaluate the monotonicity of the `Index` values
- Add method `get_level_values` to `MultiIndex`
- Implemented shallow copy of `BlockManager` object in `DataFrame` internals

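
As a plain-Python sketch of what aligning two labeled series with a join
method means, the example below uses dicts as stand-ins for Series and `None`
for missing values. The `align` helper, its `join` values, and the choice to
sort the resulting labels are illustrative assumptions; see the `Series.align`
documentation for the actual signature and ordering.

```python
def align(left, right, join="outer"):
    """Reindex two label->value dicts onto a common, sorted set of labels."""
    if join == "outer":
        labels = set(left) | set(right)
    elif join == "inner":
        labels = set(left) & set(right)
    elif join == "left":
        labels = set(left)
    elif join == "right":
        labels = set(right)
    else:
        raise ValueError("unknown join method: %r" % (join,))
    index = sorted(labels)
    # Labels absent from one side are filled with None (standing in for NA).
    return ({k: left.get(k) for k in index},
            {k: right.get(k) for k in index})


a = {"x": 1, "y": 2}
b = {"y": 20, "z": 30}
outer_a, outer_b = align(a, b, join="outer")
inner_a, inner_b = align(a, b, join="inner")
```

After alignment both results share the same labels, so elementwise operations
can proceed position by position.
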
**Improvements to existing features**

- Improved performance of `isnull` and `notnull`, a regression from v0.3.0
  (GH #187)
- Wrote templating / code generation script to auto-generate Cython code for
  various functions which need to be available for the 4 major data types
  used in pandas (float64, bool, object, int64)
- Refactored code related to `DataFrame.join` so that intermediate aligned
  copies of the data in each `DataFrame` argument do not need to be
  created. Substantial performance increases result (GH #176)
- Substantially improved performance of generic `Index.intersection` and
  `Index.union`
- Improved performance of `DateRange.union` with overlapping ranges and
  non-cacheable offsets (like Minute). Implemented analogous fast
  `DateRange.intersection` for overlapping ranges.
- Implemented `BlockManager.take` resulting in significantly faster `take`
  performance on mixed-type `DataFrame` objects (GH #104)
- Improved performance of `Series.sort_index`
- Significant groupby performance enhancement: removed unnecessary integrity
  checks in DataFrame internals that were slowing down slicing operations to
  retrieve groups
- Added informative Exception when passing dict to DataFrame groupby
  aggregation with axis != 0

**API Changes**

None

**Bug fixes**

- Fixed minor unhandled exception in Cython code implementing fast groupby
  aggregation operations
- Fixed bug in unstacking code manifesting with more than 3 hierarchical
  levels
- Throw exception when step specified in label-based slice (GH #185)
- Fix isnull to correctly work with np.float32. Fix upstream bug described in
  GH #182
- Finish implementation of as_index=False in groupby for DataFrame
  aggregation (GH #181)
- Raise SkipTest for pre-epoch HDFStore failure. Real fix will be sorted out
  via datetime64 dtype

Thanks
------

- Uri Laserson
- Scott Sinclair

pandas 0.4.1
============

Release notes
-------------

**Release date:** 9/25/2011

This is primarily a bug fix release but includes some new features and
improvements

**New features / modules**

- Added new `DataFrame` methods `get_dtype_counts` and property `dtypes`
- Setting of values using ``.ix`` indexing attribute in mixed-type DataFrame
  objects has been implemented (fixes GH #135)
- `read_csv` can read multiple columns into a `MultiIndex`. DataFrame's
  `to_csv` method will properly write out a `MultiIndex` which can be read
  back (PR #151, thanks to Skipper Seabold)
- Wrote fast time series merging / joining methods in Cython. Will be
  integrated later into DataFrame.join and related functions
- Added `ignore_index` option to `DataFrame.append` for combining unindexed
  records stored in a DataFrame

**Improvements to existing features**

- Some speed enhancements with internal Index type-checking function
- `DataFrame.rename` has a new `copy` parameter which can rename a DataFrame
  in place
- Enable unstacking by level name (PR #142)
- Enable sortlevel to work by level name (PR #141)
- `read_csv` can automatically "sniff" other kinds of delimiters using
  `csv.Sniffer` (PR #146)
- Improved speed of unit test suite by about 40%
- Exception will not be raised calling `HDFStore.remove` on non-existent node
  with where clause
- Optimized `_ensure_index` function resulting in performance savings in
  type-checking Index objects

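
The delimiter "sniffing" mentioned above is provided by the standard library.
A minimal standalone example of `csv.Sniffer` detecting a semicolon-separated
sample (the sample text and the candidate delimiter list are made up for
illustration, and this is not the `read_csv` code path itself):

```python
import csv

sample = "a;b;c\n1;2;3\n4;5;6\n"
# Restrict the candidates so the sniffer only considers plausible separators.
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t|")
rows = list(csv.reader(sample.splitlines(), dialect))
```

Sniffing is a heuristic: it needs a sample with a consistent number of
delimiters per line, and it can fail on very short or irregular input.
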
**API Changes**

None

**Bug fixes**

- Fixed DataFrame constructor bug causing downstream problems (e.g. .copy()
  failing) when passing a Series as the values along with a column name and
  index
- Fixed single-key groupby on DataFrame with as_index=False (GH #160)
- `Series.shift` was failing on integer Series (GH #154)
- `unstack` methods were producing incorrect output in the case of duplicate
  hierarchical labels. An exception will now be raised (GH #147)
- Calling `count` with level argument caused reduceat failure or segfault in
  earlier NumPy (GH #169)
- Fixed `DataFrame.corrwith` to automatically exclude non-numeric data (GH
  #144)
- Unicode handling bug fixes in `DataFrame.to_string` (GH #138)
- Excluding OLS degenerate unit test case that was causing platform specific
  failure (GH #149)
- Skip blosc-dependent unit tests for PyTables < 2.2 (PR #137)
- Calling `copy` on `DateRange` did not copy over attributes to the new object
  (GH #168)
- Fix bug in `HDFStore` in which Panel data could be appended to a Table with
  different item order, thus resulting in an incorrect result read back

Thanks
------

- Yaroslav Halchenko
- Jeff Reback
- Skipper Seabold
- Dan Lovell
- Nick Pentreath

pandas 0.4.0
============

**Release date:** 9/12/2011

**New features / modules**

- `pandas.core.sparse` module: "Sparse" (mostly-NA, or some other fill value)
  versions of `Series`, `DataFrame`, and `Panel`. For low-density data, this
  will result in significant performance boosts, and smaller memory
  footprint. Added `to_sparse` methods to `Series`, `DataFrame`, and
  `Panel`. See online documentation for more on these
- Fancy indexing operator on Series / DataFrame, e.g. via .ix operator. Both
  getting and setting of values is supported; however, setting values will only
  currently work on homogeneously-typed DataFrame objects. Things like:

  * series.ix[[d1, d2, d3]]
  * frame.ix[5:10, ['C', 'B', 'A']], frame.ix[5:10, 'A':'C']
  * frame.ix[date1:date2]

- Significantly enhanced `groupby` functionality

  * Can groupby multiple keys, e.g. df.groupby(['key1', 'key2']). Iteration with
    multiple groupings produces a flattened tuple
  * "Nuisance" columns (non-aggregatable) will automatically be excluded from
    DataFrame aggregation operations
  * Added automatic "dispatching" to Series / DataFrame methods to more easily
    invoke methods on groups. e.g. s.groupby(crit).std() will work even though
    `std` is not implemented on the `GroupBy` class

- Hierarchical / multi-level indexing

  * New `MultiIndex` class. Integrated `MultiIndex` into `Series` and
    `DataFrame` fancy indexing, slicing, `__getitem__` and `__setitem__`,
    reindexing, etc. Added `level` keyword argument to `groupby` to enable
    grouping by a level of a `MultiIndex`

- New data reshaping functions: `stack` and `unstack` on DataFrame and Series

  * Integrate with MultiIndex to enable sophisticated reshaping of data

- `Index` objects (labels for axes) are now capable of holding tuples
- `Series.describe`, `DataFrame.describe`: produces an R-like table of summary
  statistics about each data column
- `DataFrame.quantile`, `Series.quantile` for computing sample quantiles of data
  across requested axis
- Added general `DataFrame.dropna` method to replace `dropIncompleteRows` and
  `dropEmptyRows`, deprecated those.
- `Series` arithmetic methods with optional fill_value for missing data,
  e.g. a.add(b, fill_value=0). If a location is missing for both it will still
  be missing in the result though.
- fill_value option has been added to `DataFrame`.{add, mul, sub, div} methods
  similar to `Series`
- Boolean indexing with `DataFrame` objects: data[data > 0.1] = 0.1 or
  data[data > other] = 1.

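
The `fill_value` rule described above (a value missing on one side is
replaced by the fill value, while a value missing on both sides stays missing)
can be sketched in plain Python. Dicts stand in for Series and `None` for NA;
the `add` helper is illustrative only and is not the pandas method.

```python
def add(a, b, fill_value=None):
    """Add two label->value dicts; None marks a missing value."""
    result = {}
    for label in sorted(set(a) | set(b)):
        x = a.get(label)
        y = b.get(label)
        if x is None and y is None:
            # Missing on both sides: stays missing regardless of fill_value.
            result[label] = None
            continue
        if fill_value is not None:
            x = fill_value if x is None else x
            y = fill_value if y is None else y
        result[label] = None if x is None or y is None else x + y
    return result


a = {"r1": 1.0, "r2": None, "r3": None}
b = {"r1": 10.0, "r2": 5.0, "r3": None}
plain = add(a, b)
filled = add(a, b, fill_value=0)
```

Without a fill value, `r2` is missing in the result because one operand is
missing; with `fill_value=0` it becomes 5.0, while `r3` stays missing in both
cases.
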