
<a id='whatsnew-0700'></a>

# v.0.7.0 (February 9, 2012)


## New features

- New unified [merge function](user_guide/merging.ipynb#merging-join) for efficiently performing
  the full gamut of database / relational-algebra operations. Refactored existing
  join methods to use the new infrastructure, resulting in substantial
  performance gains ([GH220](https://github.com/pandas-dev/pandas/issues/220), [GH249](https://github.com/pandas-dev/pandas/issues/249), [GH267](https://github.com/pandas-dev/pandas/issues/267))  
- New [unified concatenation function](user_guide/merging.ipynb#merging-concat) for concatenating
  Series, DataFrame or Panel objects along an axis. Can form union or
  intersection of the other axes. Improves performance of `Series.append` and
  `DataFrame.append` ([GH468](https://github.com/pandas-dev/pandas/issues/468), [GH479](https://github.com/pandas-dev/pandas/issues/479), [GH273](https://github.com/pandas-dev/pandas/issues/273))  
- [Can](user_guide/merging.ipynb#merging-concatenation) pass multiple DataFrames to
  `DataFrame.append` to concatenate (stack) and multiple Series to
  `Series.append` too  
- [Can](getting_started/dsintro.ipynb#basics-dataframe-from-list-of-dicts) pass list of dicts (e.g., a
  list of JSON objects) to DataFrame constructor ([GH526](https://github.com/pandas-dev/pandas/issues/526))  
- You can now [set multiple columns](user_guide/indexing.ipynb#indexing-columns-multiple) in a
  DataFrame via `__setitem__`, useful for transformation ([GH342](https://github.com/pandas-dev/pandas/issues/342))  
- Handle differently-indexed output values in `DataFrame.apply` ([GH498](https://github.com/pandas-dev/pandas/issues/498))  


- [Add](user_guide/advanced.ipynb#advanced-reorderlevels) `reorder_levels` method to Series and
  DataFrame ([GH534](https://github.com/pandas-dev/pandas/issues/534))  
- [Add](user_guide/indexing.ipynb#indexing-dictionarylike) dict-like `get` function to DataFrame
  and Panel ([GH521](https://github.com/pandas-dev/pandas/issues/521))  
- [Add](getting_started/basics.ipynb#basics-iterrows) `DataFrame.iterrows` method for efficiently
  iterating through the rows of a DataFrame  
- Add `DataFrame.to_panel` with code adapted from
  `LongPanel.to_long`  
- [Add](getting_started/basics.ipynb#basics-reindexing) `reindex_axis` method to DataFrame  
- [Add](getting_started/basics.ipynb#basics-stats) `level` option to binary arithmetic functions on
  `DataFrame` and `Series`  
- [Add](user_guide/advanced.ipynb#advanced-advanced-reindex) `level` option to the `reindex`
  and `align` methods on Series and DataFrame for broadcasting values across
  a level ([GH542](https://github.com/pandas-dev/pandas/issues/542), [GH552](https://github.com/pandas-dev/pandas/issues/552), others)  
- Add attribute-based item access to
  `Panel` and add IPython completion ([GH563](https://github.com/pandas-dev/pandas/issues/563))  
- [Add](user_guide/visualization.ipynb#visualization-basic) `logy` option to `Series.plot` for
  log-scaling on the Y axis  
- [Add](user_guide/io.ipynb#io-formatting) `index` and `header` options to
  `DataFrame.to_string`  
- [Can](user_guide/merging.ipynb#merging-multiple-join) pass multiple DataFrames to
  `DataFrame.join` to join on index ([GH115](https://github.com/pandas-dev/pandas/issues/115))  
- [Can](user_guide/merging.ipynb#merging-multiple-join) pass multiple Panels to `Panel.join`
  ([GH115](https://github.com/pandas-dev/pandas/issues/115))  
- [Added](user_guide/io.ipynb#io-formatting) `justify` argument to `DataFrame.to_string`
  to allow different alignment of column headers  
- [Add](user_guide/groupby.ipynb#groupby-attributes) `sort` option to GroupBy to allow disabling
  sorting of the group keys for potential speedups ([GH595](https://github.com/pandas-dev/pandas/issues/595))  
- [Can](getting_started/dsintro.ipynb#basics-dataframe-from-series) pass MaskedArray to Series
  constructor ([GH563](https://github.com/pandas-dev/pandas/issues/563))  
- Add Panel item access via attributes
  and IPython completion ([GH554](https://github.com/pandas-dev/pandas/issues/554))  
- Implement `DataFrame.lookup`, fancy-indexing analogue for retrieving values
  given a sequence of row and column labels ([GH338](https://github.com/pandas-dev/pandas/issues/338))  
- Can pass a [list of functions](user_guide/groupby.ipynb#groupby-aggregate-multifunc) to
  aggregate with groupby on a DataFrame, yielding an aggregated result with
  hierarchical columns ([GH166](https://github.com/pandas-dev/pandas/issues/166))  
- Can call `cummin` and `cummax` on Series and DataFrame to get cumulative
  minimum and maximum, respectively ([GH647](https://github.com/pandas-dev/pandas/issues/647))  
- `value_range` added as utility function to get min and max of a DataFrame
  ([GH288](https://github.com/pandas-dev/pandas/issues/288))  
- Added `encoding` argument to `read_csv`, `read_table`, `to_csv` and
  `from_csv` for non-ascii text ([GH717](https://github.com/pandas-dev/pandas/issues/717))  
- [Added](getting_started/basics.ipynb#basics-stats) `abs` method to pandas objects  
- [Added](user_guide/reshaping.ipynb#reshaping-pivot) `crosstab` function for easily computing frequency tables  
- [Added](user_guide/indexing.ipynb#indexing-set-ops) `isin` method to index objects  
- [Added](user_guide/advanced.ipynb#advanced-xs) `level` argument to `xs` method of DataFrame.  
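
Several of the additions listed above can be sketched together. This is a minimal illustration written against a current pandas release rather than the 0.7.0 code base; the frames, column names, and values are invented for the example.

```python
import pandas as pd

# Hypothetical input frames sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

# Unified merge function: database-style joins selected with `how`.
inner = pd.merge(left, right, on="key", how="inner")
outer = pd.merge(left, right, on="key", how="outer")

# Unified concatenation function: stack objects along an axis.
stacked = pd.concat([left, left], ignore_index=True)

# A list of dicts (for example, parsed JSON objects) passed to the constructor.
records = pd.DataFrame([{"x": 1, "y": 2}, {"x": 3, "y": 4}])

# Cumulative minimum and maximum.
s = pd.Series([3, 1, 4, 1, 5])
running_min = s.cummin()
running_max = s.cummax()

# A list of functions passed to a DataFrame groupby gives hierarchical columns.
df = pd.DataFrame({"g": ["u", "u", "v"], "val": [1.0, 3.0, 5.0]})
agg = df.groupby("g").agg(["min", "max"])

# Frequency table with crosstab.
table = pd.crosstab(
    pd.Series(["u", "u", "v"], name="g"),
    pd.Series(["p", "q", "p"], name="h"),
)

# Membership test on an index object.
mask = pd.Index(["a", "b", "c"]).isin(["b", "z"])
```
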

## API changes to integer indexing

One of the potentially riskiest API changes in 0.7.0, but also one of the most
important, was a complete review of how **integer indexes** are handled with
regard to label-based indexing. Here is an example:
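
The sketch below assumes a Series labeled with the even integers 0 through 18 (the variable names and values are illustrative). Every lookup uses a label that exists in the index:

```python
import numpy as np
import pandas as pd

# Integer labels 0, 2, 4, ... deliberately differ from positions 0, 1, 2, ...
s = pd.Series(np.arange(10.0), index=range(0, 20, 2))

# Each key below is a label present in the index, so the lookup is label-based.
first = s[0]   # label 0 sits at position 0
second = s[2]  # label 2 sits at position 1
third = s[4]   # label 4 sits at position 2
print(first, second, third)
```
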

This is all exactly identical to the behavior before. However, if you ask for a
key **not** contained in the Series, in versions 0.6.1 and prior, Series would
*fall back* on a location-based lookup. This now raises a `KeyError`:

```ipython
In [2]: s[1]
KeyError: 1
```


This change also has the same impact on DataFrame:

```ipython
In [3]: df = pd.DataFrame(np.random.randn(8, 4), index=range(0, 16, 2))

In [4]: df
    0        1       2       3
0   0.88427  0.3363 -0.1787  0.03162
2   0.14451 -0.1415  0.2504  0.58374
4  -1.44779 -0.9186 -1.4996  0.27163
6  -0.26598 -2.4184 -0.2658  0.11503
8  -0.58776  0.3144 -0.8566  0.61941
10  0.10940 -0.7175 -1.0108  0.47990
12 -1.16919 -0.3087 -0.6049 -0.43544
14 -0.07337  0.3410  0.0424 -0.16037

In [5]: df.ix[3]
KeyError: 3
```


In order to support purely integer-based indexing, the following methods have
been added:

|Method|Description|
|:--------------------------------------:|:----------------------------------------------------------:|
|Series.iget_value(i)|Retrieve value stored at location i|
|Series.iget(i)|Alias for iget_value|
|DataFrame.irow(i)|Retrieve the i-th row|
|DataFrame.icol(j)|Retrieve the j-th column|
|DataFrame.iget_value(i, j)|Retrieve the value at row i and column j|
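
The methods in this table belong to the 0.7.0 API. Later pandas releases spell positional access with the `.iloc` and `.iat` indexers instead, so a sketch that runs on a current installation has to assume those replacements. The frame below and its values are illustrative only.

```python
import numpy as np
import pandas as pd

# Integer labels 0, 2, ..., 14, so labels and positions disagree.
df = pd.DataFrame(np.arange(32.0).reshape(8, 4), index=range(0, 16, 2))
s = df[0]

# Assumed modern counterparts of the 0.7.0 methods in the table above.
value = s.iloc[3]     # cf. Series.iget_value(3): value at position 3
row = df.iloc[3]      # cf. DataFrame.irow(3): fourth row, whose label is 6
col = df.iloc[:, 1]   # cf. DataFrame.icol(1): second column
cell = df.iat[3, 1]   # cf. DataFrame.iget_value(3, 1): single cell
```
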

## API tweaks regarding label-based slicing

Label-based slicing using `ix` now requires that the index be sorted
(monotonic) **unless** both the start and endpoint are contained in the index:

```ipython
In [1]: s = pd.Series(np.random.randn(6), index=list('gmkaec'))

In [2]: s
Out[2]:
g   -1.182230
m   -0.276183
k   -0.243550
a    1.628992
e    0.073308
c   -0.539890
dtype: float64
```

Then this is OK:

```ipython
In [3]: s.ix['k':'e']
Out[3]:
k   -0.243550
a    1.628992
e    0.073308
dtype: float64
```

But this is not:

```ipython
In [12]: s.ix['b':'h']
KeyError: 'b'
```


If the index had been sorted, the “range selection” would have been possible:

```ipython
In [4]: s2 = s.sort_index()

In [5]: s2
Out[5]:
a    1.628992
c   -0.539890
e    0.073308
g   -1.182230
k   -0.243550
m   -0.276183
dtype: float64

In [6]: s2.ix['b':'h']
Out[6]:
c   -0.539890
e    0.073308
g   -1.182230
dtype: float64
```
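
The same rule can be checked on a current pandas release, with the caveat that `ix` is no longer available there. The sketch below assumes `.loc` as its label-based stand-in and uses made-up integer values in place of the random data.

```python
import pandas as pd

# Unsorted string index, same labels as in the session above.
s = pd.Series(range(6), index=list("gmkaec"))

# Both endpoints exist in the index, so the slice works without sorting.
inside = s.loc["k":"e"]

# Neither 'b' nor 'h' is a label, and the index is not monotonic.
try:
    s.loc["b":"h"]
    unsorted_ok = True
except KeyError:
    unsorted_ok = False

# After sorting, the same range selection succeeds.
sorted_slice = s.sort_index().loc["b":"h"]
```
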

## Changes to Series `[]` operator

As a notational convenience, you can pass a sequence of labels or a label
slice to a Series when getting and setting values via `[]` (i.e. the
`__getitem__` and `__setitem__` methods). The behavior will be the same as
passing similar input to `ix` **except in the case of integer indexing**:
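
A minimal sketch of the label-based case, assuming a sorted string index and made-up values:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60], index=list("acegkm"))

# A sequence of labels returns the values in the order requested.
picked = s[["m", "a", "c", "e"]]

# A label slice on a sorted index; the endpoints need not be labels themselves.
between = s["b":"l"]

# When the endpoints are labels, both are included in the result.
closed = s["c":"k"]
```
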

In the case of integer indexes, the behavior will be exactly as before
(shadowing `ndarray`).

If you wish to do indexing with sequences and slicing on an integer index with
label semantics, use `ix`.
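
The contrast can be sketched as follows on a current pandas release. Because `ix` no longer exists there, `.loc` is assumed as the label-based equivalent; the values are illustrative.

```python
import pandas as pd

# Integer labels 0, 2, ..., 10, so labels and positions disagree.
s = pd.Series([10, 20, 30, 40, 50, 60], index=range(0, 12, 2))

# An integer slice inside [] is positional, as with an ndarray: positions 1 to 4.
by_position = s[1:5]

# Label semantics: every label between 1 and 5, endpoints included.
by_label = s.loc[1:5]
```
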

## Other API changes

- The deprecated `LongPanel` class has been completely removed  
- If `Series.sort` is called on a column of a DataFrame, an exception will
  now be raised. Before it was possible to accidentally mutate a DataFrame’s
  column by doing `df[col].sort()` instead of the side-effect free method
  `df[col].order()` ([GH316](https://github.com/pandas-dev/pandas/issues/316))  
- Miscellaneous renames and deprecations which will (harmlessly) raise
  `FutureWarning`  
- `drop` added as an optional parameter to `DataFrame.reset_index` ([GH699](https://github.com/pandas-dev/pandas/issues/699))  
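
As a small illustration of the `drop` parameter, the sketch below uses a hypothetical three-row frame on a current pandas release:

```python
import pandas as pd

df = pd.DataFrame({"val": [1, 2, 3]}, index=["x", "y", "z"])

# By default the old index is kept as a regular column.
kept = df.reset_index()

# With drop=True the old index is discarded instead.
dropped = df.reset_index(drop=True)
```
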

## Performance improvements

- [Cythonized GroupBy aggregations](user_guide/groupby.ipynb#groupby-aggregate-cython) no longer
  presort the data, thus achieving a significant speedup ([GH93](https://github.com/pandas-dev/pandas/issues/93)).  GroupBy
  aggregations with Python functions significantly sped up by clever
  manipulation of the ndarray data type in Cython ([GH496](https://github.com/pandas-dev/pandas/issues/496)).  
- Better error message in DataFrame constructor when passed column labels
  don’t match data ([GH497](https://github.com/pandas-dev/pandas/issues/497))  
- Substantially improve performance of multi-GroupBy aggregation when a
  Python function is passed, reuse ndarray object in Cython ([GH496](https://github.com/pandas-dev/pandas/issues/496))  
- Can store objects indexed by tuples and floats in HDFStore ([GH492](https://github.com/pandas-dev/pandas/issues/492))  
- Don’t print length by default in Series.to_string, add length option ([GH489](https://github.com/pandas-dev/pandas/issues/489))  
- Improve Cython code for multi-groupby to aggregate without having to sort
  the data ([GH93](https://github.com/pandas-dev/pandas/issues/93))  
- Improve MultiIndex reindexing speed by storing tuples in the MultiIndex,
  test for backwards unpickling compatibility  
- Improve column reindexing performance by using specialized Cython take
  function  
- Further performance tweaking of `Series.__getitem__` for standard use cases  
- Avoid Index dict creation in some cases (i.e. when getting slices, etc.),
  regression from prior versions  
- Friendlier error message in setup.py if NumPy not installed  
- Use common set of NA-handling operations (sum, mean, etc.) in Panel class
  also ([GH536](https://github.com/pandas-dev/pandas/issues/536))  
- Default name assignment when calling `reset_index` on DataFrame with a
  regular (non-hierarchical) index ([GH476](https://github.com/pandas-dev/pandas/issues/476))  
- Use Cythonized groupers when possible in Series/DataFrame stat ops with
  `level` parameter passed ([GH545](https://github.com/pandas-dev/pandas/issues/545))  
- Ported skiplist data structure to C to speed up `rolling_median` by about
  5-10x in most typical use cases ([GH374](https://github.com/pandas-dev/pandas/issues/374))  



<a id='whatsnew-0-7-0-contributors'></a>

## Contributors

A total of 18 people contributed patches to this release.  People with a
“+” by their names contributed a patch for the first time.


- Adam Klein  
- Bayle Shanks +  
- Chris Billington +  
- Dieter Vandenbussche  
- Fabrizio Pollastri +  
- Graham Taylor +  
- Gregg Lind +  
- Josh Klein +  
- Luca Beltrame  
- Olivier Grisel +  
- Skipper Seabold  
- Thomas Kluyver  
- Thomas Wiecki +  
- Wes McKinney  
- Wouter Overmeire  
- Yaroslav Halchenko  
- fabriziop +  
- theandygross +  