
<a id='whatsnew-0152'></a>

# v0.15.2 (December 12, 2014)

{{ header }}

This is a minor release from 0.15.1 and includes a large number of bug fixes
along with several new features, enhancements, and performance improvements.
A small number of API changes were necessary to fix existing bugs.
We recommend that all users upgrade to this version.

- [Enhancements](#whatsnew-0152-enhancements)  
- [API Changes](#whatsnew-0152-api)  
- [Performance Improvements](#whatsnew-0152-performance)  
- [Bug Fixes](#whatsnew-0152-bug-fixes)  



<a id='whatsnew-0152-api'></a>

## API changes

- Indexing in `MultiIndex` beyond lex-sort depth is now supported, though
  a lexically sorted index will perform better. ([GH2646](https://github.com/pandas-dev/pandas/issues/2646))  
- Bug in `unique` of Series with `category` dtype, which returned all categories regardless
of whether they were “used” or not (see [GH8559](https://github.com/pandas-dev/pandas/issues/8559) for the discussion).
Previous behaviour was to return all categories:  

```ipython
In [3]: cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c'])

In [4]: cat
Out[4]:
[a, b, a]
Categories (3, object): [a < b < c]

In [5]: cat.unique()
Out[5]: array(['a', 'b', 'c'], dtype=object)
```


Now, only the categories that actually occur in the array are returned.  
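
As a hedged illustration of the new behavior, the sketch below rebuilds the same categorical and checks which values `unique()` reports. The container type that `unique()` returns has varied across pandas versions, so the example only compares the values; treat the exact return type as something to verify against your installed version.

```python
import pandas as pd

# Same categorical as above: 'c' is a declared category that never occurs.
cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c'])

# Compare values only; the returned container differs between pandas versions.
unique_values = list(cat.unique())
print(unique_values)
```
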
- `Series.all` and `Series.any` now support the `level` and `skipna` parameters. `Series.all`, `Series.any`, `Index.all`, and `Index.any` no longer support the `out` and `keepdims` parameters, which existed for compatibility with ndarray. Various index types no longer support the `all` and `any` aggregation functions and will now raise `TypeError`. ([GH8302](https://github.com/pandas-dev/pandas/issues/8302)).  
- Allow equality comparisons of Series with a categorical dtype and object dtype; previously these would raise `TypeError` ([GH8938](https://github.com/pandas-dev/pandas/issues/8938))  
- Bug in `NDFrame`: conflicting attribute/column names now behave consistently between getting and setting. Previously, when both a column and attribute named `y` existed, `data.y` would return the attribute, while `data.y = z` would update the column ([GH8994](https://github.com/pandas-dev/pandas/issues/8994))  
Old behavior:  

```ipython
In [6]: data.y
Out[6]: 2

In [7]: data['y'].values
Out[7]: array([5, 5, 5])
```


New behavior:  
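
A minimal sketch of the new behavior, assuming a small hypothetical frame `data` that carries both a plain attribute and a column named `y` (the setup is illustrative and not taken from the text above). Setting `data.y` now updates the attribute, which matches what getting `data.y` returns, and the column is left alone.

```python
import pandas as pd

data = pd.DataFrame({'x': [1, 2, 3]})

data.y = 2              # plain Python attribute, not a column
data['y'] = [2, 4, 6]   # a column that shares the attribute's name

data.y = 5              # updates the attribute, consistent with getting

attribute_value = data.y
column_values = data['y'].tolist()
print(attribute_value, column_values)
```
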
- `Timestamp('now')` is now equivalent to `Timestamp.now()` in that it returns the local time rather than UTC. Also, `Timestamp('today')` is now equivalent to `Timestamp.today()` and both have `tz` as a possible argument. ([GH9000](https://github.com/pandas-dev/pandas/issues/9000))  
- Fix negative step support for label-based slices ([GH8753](https://github.com/pandas-dev/pandas/issues/8753))  
Old behavior:  

```ipython
In [1]: s = pd.Series(np.arange(3), ['a', 'b', 'c'])

In [2]: s
Out[2]:
a    0
b    1
c    2
dtype: int64

In [3]: s.loc['c':'a':-1]
Out[3]:
c    2
dtype: int64
```


New behavior:  
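
A short sketch of the fixed behavior, reusing the same series as in the old-behavior example. With a negative step, the label-based slice should walk from `'c'` back to `'a'`, including both endpoints.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(3), index=['a', 'b', 'c'])

# Label-based slices include both endpoints; step -1 reverses the order.
reversed_slice = s.loc['c':'a':-1]
print(reversed_slice)
```
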



<a id='whatsnew-0152-enhancements'></a>

## Enhancements

`Categorical` enhancements:

- Added ability to export Categorical data to Stata ([GH8633](https://github.com/pandas-dev/pandas/issues/8633)).  See [here](user_guide/io.ipynb#io-stata-categorical) for limitations of categorical variables exported to Stata data files.  
- Added flag `order_categoricals` to `StataReader` and `read_stata` to select whether to order imported categorical data ([GH8836](https://github.com/pandas-dev/pandas/issues/8836)).  See [here](user_guide/io.ipynb#io-stata-categorical) for more information on importing categorical variables from Stata data files.  
- Added ability to export Categorical data to/from HDF5 ([GH7621](https://github.com/pandas-dev/pandas/issues/7621)). Queries work the same as if it was an object array. However, the `category` dtyped data is stored in a more efficient manner. See [here](user_guide/io.ipynb#io-hdf5-categorical) for an example and caveats w.r.t. prior versions of pandas.  
- Added support for `searchsorted()` on Categorical class ([GH8420](https://github.com/pandas-dev/pandas/issues/8420)).  


Other enhancements:

- Added the ability to specify the SQL type of columns when writing a DataFrame
to a database ([GH8778](https://github.com/pandas-dev/pandas/issues/8778)).
For example, to use the SQLAlchemy `String` type instead of the
default `Text` type for string columns:  

```python
from sqlalchemy.types import String

# `data` is assumed to be an existing DataFrame and `engine` a SQLAlchemy engine.
data.to_sql('data_dtype', engine, dtype={'Col_1': String})
```

- `Series.all` and `Series.any` now support the `level` and `skipna` parameters ([GH8302](https://github.com/pandas-dev/pandas/issues/8302)).  
- `Panel` now supports the `all` and `any` aggregation functions ([GH8302](https://github.com/pandas-dev/pandas/issues/8302)):  

```python
>>> p = pd.Panel(np.random.rand(2, 5, 4) > 0.1)
>>> p.all()
       0      1      2     3
0   True   True   True  True
1   True  False   True  True
2   True   True   True  True
3  False   True  False  True
4   True   True   True  True
```

- Added support for `utcfromtimestamp()`, `fromtimestamp()`, and `combine()` on Timestamp class ([GH5351](https://github.com/pandas-dev/pandas/issues/5351)).  
- Added Google Analytics (pandas.io.ga) basic documentation ([GH8835](https://github.com/pandas-dev/pandas/issues/8835)). See [here](http://pandas.pydata.org/pandas-docs/version/0.15.2/remote_data.html#remote-data-ga).  
- `Timedelta` arithmetic returns `NotImplemented` in unknown cases, allowing extensions by custom classes ([GH8813](https://github.com/pandas-dev/pandas/issues/8813)).  
- `Timedelta` now supports arithmetic with `numpy.ndarray` objects of the appropriate dtype (numpy 1.8 or newer only) ([GH8884](https://github.com/pandas-dev/pandas/issues/8884)).  
- Added `Timedelta.to_timedelta64()` method to the public API ([GH8884](https://github.com/pandas-dev/pandas/issues/8884)).  
- Added `gbq.generate_bq_schema()` function to the gbq module ([GH8325](https://github.com/pandas-dev/pandas/issues/8325)).  
- `Series` now works with map objects the same way as generators ([GH8909](https://github.com/pandas-dev/pandas/issues/8909)).  
- Added context manager to `HDFStore` for automatic closing ([GH8791](https://github.com/pandas-dev/pandas/issues/8791)).  
- `to_datetime` gains an `exact` keyword. When it is `False`, the input does not have to match the provided format string exactly. `exact` defaults to `True`, so exact matching is still the default ([GH8904](https://github.com/pandas-dev/pandas/issues/8904))  
- Added `axvlines` boolean option to the `parallel_coordinates` plot function, which determines whether vertical lines are drawn; the default is `True`  
- Added ability to read table footers to read_html ([GH8552](https://github.com/pandas-dev/pandas/issues/8552))  
- `to_sql` now infers data types of non-NA values for columns that contain NA values and have dtype `object` ([GH8778](https://github.com/pandas-dev/pandas/issues/8778)).  
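
The `exact` keyword of `to_datetime` listed above can be sketched as follows. The input string and format are made up for illustration; the point is only that `exact=False` lets the format match part of the string, while the default still rejects trailing text.

```python
import pandas as pd

text = '12/12/2014 release'   # hypothetical input with trailing text

# exact=False: the format may match only part of the string.
loose = pd.to_datetime(text, format='%d/%m/%Y', exact=False)

# Default (exact=True): the whole string must match the format.
try:
    pd.to_datetime(text, format='%d/%m/%Y')
    strict_failed = False
except ValueError:
    strict_failed = True

print(loose, strict_failed)
```
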



<a id='whatsnew-0152-performance'></a>

## Performance

- Reduce memory usage when skiprows is an integer in read_csv ([GH8681](https://github.com/pandas-dev/pandas/issues/8681))  
- Performance boost for `to_datetime` conversions when `format=` is passed and `exact=False` ([GH8904](https://github.com/pandas-dev/pandas/issues/8904))  



<a id='whatsnew-0152-bug-fixes'></a>

## Bug fixes

- Bug in `concat` of Series with `category` dtype, which was coercing to `object` ([GH8641](https://github.com/pandas-dev/pandas/issues/8641))  
- Bug in Timestamp-Timestamp not returning a Timedelta type and datelike-datelike ops with timezones ([GH8865](https://github.com/pandas-dev/pandas/issues/8865))  
- Made the timezone mismatch exception consistent (either a tz operated with `None` or with an incompatible timezone); it now raises `TypeError` rather than `ValueError` (a couple of edge cases only) ([GH8865](https://github.com/pandas-dev/pandas/issues/8865))  
- Bug in using a `pd.Grouper(key=...)` with no level/axis or level only ([GH8795](https://github.com/pandas-dev/pandas/issues/8795), [GH8866](https://github.com/pandas-dev/pandas/issues/8866))  
- Report a `TypeError` when invalid/no parameters are passed in a groupby ([GH8015](https://github.com/pandas-dev/pandas/issues/8015))  
- Bug in packaging pandas with `py2app/cx_Freeze` ([GH8602](https://github.com/pandas-dev/pandas/issues/8602), [GH8831](https://github.com/pandas-dev/pandas/issues/8831))  
- Bug in `groupby` signatures that didn’t include `*args` or `**kwargs` ([GH8733](https://github.com/pandas-dev/pandas/issues/8733)).  
- `io.data.Options` now raises `RemoteDataError` when no expiry dates are available from Yahoo and when it receives no data from Yahoo ([GH8761](https://github.com/pandas-dev/pandas/issues/8761)), ([GH8783](https://github.com/pandas-dev/pandas/issues/8783)).  
- Unclear error message in csv parsing when passing dtype and names and the parsed data is a different data type ([GH8833](https://github.com/pandas-dev/pandas/issues/8833))  
- Bug in slicing a MultiIndex with an empty list and at least one boolean indexer ([GH8781](https://github.com/pandas-dev/pandas/issues/8781))  
- `Timedelta` kwargs may now be numpy ints and floats ([GH8757](https://github.com/pandas-dev/pandas/issues/8757)).  
- Fixed several outstanding bugs for `Timedelta` arithmetic and comparisons ([GH8813](https://github.com/pandas-dev/pandas/issues/8813), [GH5963](https://github.com/pandas-dev/pandas/issues/5963), [GH5436](https://github.com/pandas-dev/pandas/issues/5436)).  
- `sql_schema` now generates dialect appropriate `CREATE TABLE` statements ([GH8697](https://github.com/pandas-dev/pandas/issues/8697))  
- `slice` string method now takes step into account ([GH8754](https://github.com/pandas-dev/pandas/issues/8754))  
- Bug in `BlockManager` where setting values with different type would break block integrity ([GH8850](https://github.com/pandas-dev/pandas/issues/8850))  
- Bug in `DatetimeIndex` when using `time` object as key ([GH8667](https://github.com/pandas-dev/pandas/issues/8667))  
- Bug in `merge` where `how='left'` and `sort=False` would not preserve left frame order ([GH7331](https://github.com/pandas-dev/pandas/issues/7331))  
- Bug in `MultiIndex.reindex` where reindexing at level would not reorder labels ([GH4088](https://github.com/pandas-dev/pandas/issues/4088))  
- Bug in certain operations with dateutil timezones, manifesting with dateutil 2.3 ([GH8639](https://github.com/pandas-dev/pandas/issues/8639))  
- Regression in DatetimeIndex iteration with a Fixed/Local offset timezone ([GH8890](https://github.com/pandas-dev/pandas/issues/8890))  
- Bug in `to_datetime` when parsing nanoseconds using the `%f` format ([GH8989](https://github.com/pandas-dev/pandas/issues/8989))  
- Bug where the font size was only set on the x axis for vertical plots or on the y axis for horizontal plots ([GH8765](https://github.com/pandas-dev/pandas/issues/8765))  
- Fixed division by 0 when reading big csv files in python 3 ([GH8621](https://github.com/pandas-dev/pandas/issues/8621))  
- Bug in outputting a MultiIndex with `to_html(index=False)`, which would add an extra column ([GH8452](https://github.com/pandas-dev/pandas/issues/8452))  
- Imported categorical variables from Stata files retain the ordinal information in the underlying data ([GH8836](https://github.com/pandas-dev/pandas/issues/8836)).  
- Defined `.size` attribute across `NDFrame` objects to provide compat with numpy >= 1.9.1; buggy with `np.array_split` ([GH8846](https://github.com/pandas-dev/pandas/issues/8846))  
- Skip testing of histogram plots for matplotlib <= 1.2 ([GH8648](https://github.com/pandas-dev/pandas/issues/8648)).  
- Bug where `get_data_google` returned object dtypes ([GH3995](https://github.com/pandas-dev/pandas/issues/3995))  
- Bug in `DataFrame.stack(..., dropna=False)` when the DataFrame’s `columns` is a `MultiIndex`
  whose `labels` do not reference all its `levels`. ([GH8844](https://github.com/pandas-dev/pandas/issues/8844))  
- Bug in that Option context applied on `__enter__` ([GH8514](https://github.com/pandas-dev/pandas/issues/8514))  
- Bug in resample that causes a ValueError when resampling across multiple days
  and the last offset is not calculated from the start of the range ([GH8683](https://github.com/pandas-dev/pandas/issues/8683))  
- Bug where `DataFrame.plot(kind='scatter')` fails when checking if an np.array is in the DataFrame ([GH8852](https://github.com/pandas-dev/pandas/issues/8852))  
- Bug in `pd.infer_freq/DataFrame.inferred_freq` that prevented proper sub-daily frequency inference when the index contained DST days ([GH8772](https://github.com/pandas-dev/pandas/issues/8772)).  
- Bug where index name was still used when plotting a series with `use_index=False` ([GH8558](https://github.com/pandas-dev/pandas/issues/8558)).  
- Bugs when trying to stack multiple columns, when some (or all) of the level names are numbers ([GH8584](https://github.com/pandas-dev/pandas/issues/8584)).  
- Bug in `MultiIndex` where `__contains__` returns wrong result if index is not lexically sorted or unique ([GH7724](https://github.com/pandas-dev/pandas/issues/7724))  
- Bug in CSV parsing with trailing whitespace in skipped rows ([GH8679](https://github.com/pandas-dev/pandas/issues/8679), [GH8661](https://github.com/pandas-dev/pandas/issues/8661), [GH8983](https://github.com/pandas-dev/pandas/issues/8983))  
- Regression where `Timestamp` did not parse the ‘Z’ zone designator for UTC ([GH8771](https://github.com/pandas-dev/pandas/issues/8771))  
- Bug in `StataWriter` that wrote strings with 244 characters irrespective of actual size ([GH8969](https://github.com/pandas-dev/pandas/issues/8969))  
- Fixed ValueError raised by cummin/cummax when datetime64 Series contains NaT. ([GH8965](https://github.com/pandas-dev/pandas/issues/8965))  
- Bug where `DataReader` returned object dtype if there were missing values ([GH8980](https://github.com/pandas-dev/pandas/issues/8980))  
- Bug in plotting where, if `sharex` was enabled and the index was a time series, labels would be shown on multiple axes ([GH3964](https://github.com/pandas-dev/pandas/issues/3964)).  
- Bug where passing a unit to the `TimedeltaIndex` constructor applied the to-nanosecond conversion twice ([GH9011](https://github.com/pandas-dev/pandas/issues/9011)).  
- Bug in plotting of a period-like array ([GH9012](https://github.com/pandas-dev/pandas/issues/9012))  
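
Two of the fixes listed above are easy to check by hand. The sketch below uses made-up data: it shows the `slice` string method honouring `step`, and a left merge with `sort=False` keeping the row order of the left frame. The frame and column names are hypothetical.

```python
import pandas as pd

# `slice` string method with a step: take every second character.
letters = pd.Series(['abcdef'])
stepped = letters.str.slice(start=0, stop=5, step=2).tolist()

# Left merge with sort=False: rows follow the order of the left frame.
left = pd.DataFrame({'key': ['c', 'a', 'b'], 'lv': [1, 2, 3]})
right = pd.DataFrame({'key': ['a', 'b', 'c'], 'rv': [10, 20, 30]})
merged = left.merge(right, how='left', on='key', sort=False)

merged_keys = merged['key'].tolist()
merged_rv = merged['rv'].tolist()
print(stepped, merged_keys, merged_rv)
```
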



<a id='whatsnew-0-15-2-contributors'></a>

## Contributors

A total of 49 people contributed patches to this release.  People with a
“+” by their names contributed a patch for the first time.


- Aaron Staple  
- Angelos Evripiotis +  
- Artemy Kolchinsky  
- Benoit Pointet +  
- Brian Jacobowski +  
- Charalampos Papaloizou +  
- Chris Warth +  
- David Stephens  
- Fabio Zanini +  
- Francesc Via +  
- Henry Kleynhans +  
- Jake VanderPlas +  
- Jan Schulz  
- Jeff Reback  
- Jeff Tratner  
- Joris Van den Bossche  
- Kevin Sheppard  
- Matt Suggit +  
- Matthew Brett  
- Phillip Cloud  
- Rupert Thompson +  
- Scott E Lasley +  
- Stephan Hoyer  
- Stephen Simmons +  
- Sylvain Corlay +  
- Thomas Grainger +  
- Tiago Antao +  
- Tom Augspurger  
- Trent Hauck  
- Victor Chaves +  
- Victor Salgado +  
- Vikram Bhandoh +  
- WANG Aiyong  
- Will Holmgren +  
- behzad nouri  
- broessli +  
- charalampos papaloizou +  
- immerrr  
- jnmclarty  
- jreback  
- mgilbert +  
- onesandzeroes  
- peadarcoyle +  
- rockg  
- seth-p  
- sinhrks  
- unutbu  
- wavedatalab +  
- Åsmund Hjulstad +  