Merge commit 'v0.7.0rc1-73-g69d5bd8' into debian
* commit 'v0.7.0rc1-73-g69d5bd8': (44 commits)
  BUG: integer slices should never access label-indexing, GH pandas-dev#700
  BUG: pandas-dev#680 clean up with check for py3compat
  BUG: pandas-dev#680 rears again. cut off another hydra head
  ENH: change to tree-like MultiIndex output with > 2 levels, GH pandas-dev#689
  TST: added a test related to pandas-dev#680
  BUG: related to closes pandas-dev#691, removed cruft
  BUG: closes pandas-dev#691, assignment with ix and mixed dtypes
  BUG: handle incomparable values when creating Factor, caused bug in py3
  TST: Fixes for tests on Python 3.
  BUG: pandas-dev#680, print consistently when dataframe is empty
  TST: unit test for PR pandas-dev#684
  ENH: Allow Series.to_csv to ignore the index.
  BUG: raise exception in DateRange with MonthEnd(0) instead of infinite loop, GH pandas-dev#683
  BUG: unbox 0-dimensional arrays in map_infer, GH pandas-dev#690
  updated license and credits for overview
  ENH: cythonize timestamp conversion in HDFStore
  TST: ok, this appears to work GH pandas-dev#680
  TST: even more woes GH pandas-dev#680
  TST: unicode woes on windoze GH pandas-dev#680
  TST: unicode codec test issue, GH pandas-dev#680
  ...
yarikoptic committed Jan 27, 2012
2 parents ea66f06 + 69d5bd8 commit b4ff285
Showing 34 changed files with 761 additions and 154 deletions.
51 changes: 49 additions & 2 deletions LICENSE
@@ -1,7 +1,14 @@
Copyright (c) 2008-2011 AQR Capital Management, LLC
======================
PANDAS LICENSING TERMS
======================

pandas is licensed under the BSD 3-Clause (also known as "BSD New" or
"BSD Simplified"), as follows:

Copyright (c) 2011-2012, Lambda Foundry, Inc. and PyData Development Team
All rights reserved.

Copyright (c) 2011 Wes McKinney and pandas developers
Copyright (c) 2008-2011 AQR Capital Management, LLC
All rights reserved.

Redistribution and use in source and binary forms, with or without
@@ -31,3 +38,43 @@ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

About the Copyright Holders
===========================

AQR Capital Management began pandas development in 2008. Development was
led by Wes McKinney. AQR released the source under this license in 2009.
Wes is now an employee of Lambda Foundry, and remains the pandas project
lead.

The PyData Development Team is the collection of developers of the PyData
project. This includes all of the PyData sub-projects, including pandas. The
core team that coordinates development on GitHub can be found here:
http://github.com/pydata.

Full credits for pandas contributors can be found in the documentation.

Our Copyright Policy
====================

PyData uses a shared copyright model. Each contributor maintains copyright
over their contributions to PyData. However, it is important to note that
these contributions are typically only changes to the repositories. Thus,
the PyData source code, in its entirety, is not the copyright of any single
person or institution. Instead, it is the collective copyright of the
entire PyData Development Team. If individual contributors want to maintain
a record of what changes/contributions they have specific copyright on,
they should indicate their copyright in the commit message of the change
when they commit the change to one of the PyData repositories.

With this in mind, the following banner should be used in any source code
file to indicate the copyright and license terms:

#-----------------------------------------------------------------------------
# Copyright (c) 2012, PyData Development Team
# All rights reserved.
#
# Distributed under the terms of the BSD Simplified License.
#
# The full license is in the LICENSE file, distributed with this software.
#-----------------------------------------------------------------------------
12 changes: 12 additions & 0 deletions RELEASE.rst
@@ -162,6 +162,7 @@ pandas 0.7.0
  yourself) to ``groupby`` in some cases (GH #659)
- Use ``kind`` argument to Series.order for selecting different sort kinds
  (GH #668)
- Add option to Series.to_csv to omit the index (PR #684)

**Bug fixes**

@@ -231,6 +232,15 @@ pandas 0.7.0
- Fix bugs preventing SparseDataFrame and SparseSeries working with groupby
  (GH #666)
- Use sort kind in Series.sort / argsort (GH #668)
- Fix DataFrame operations on non-scalar, non-pandas objects (GH #672)
- Don't convert DataFrame column to integer type when passing integer to
  __setitem__ (GH #669)
- Fix downstream bug in pivot_table caused by integer level names in
  MultiIndex (GH #678)
- Fix SparseSeries.combine_first when passed a dense Series (GH #687)
- Fix performance regression in HDFStore loading when DataFrame or Panel
  stored in table format with datetimes
- Raise Exception in DateRange when offset with n=0 is passed (GH #683)

Thanks
------
@@ -253,13 +263,15 @@ Thanks
- Sam Reckoner
- Craig Reeson
- Jan Schulz
- Skipper Seabold
- Ted Square
- Graham Taylor
- Aman Thakral
- Chris Uga
- Dieter Vandenbussche
- Texas P.
- Pinxing Ye
- ... and everyone I forgot

pandas 0.6.1
============
84 changes: 83 additions & 1 deletion doc/source/gotchas.rst
@@ -27,13 +27,94 @@ general, we were given the difficult choice between either
- Using a special sentinel value, bit pattern, or set of sentinel values to
  denote ``NA`` across the dtypes

For many reasons we chose the latter. After years of production use it has
proven, at least in my opinion, to be the best decision given the state of
affairs in NumPy and Python in general. The special value ``NaN``
(Not-A-Number) is used everywhere as the ``NA`` value, and there are API
functions ``isnull`` and ``notnull`` which can be used across the dtypes to
detect NA values.

However, it comes with a couple of trade-offs, which I most certainly have
not ignored.
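
As a minimal sketch of the detection functions mentioned above, the snippet
below applies ``isnull`` and ``notnull`` to a float Series and an object
Series. It assumes a current pandas installation imported as ``pd``; the
alias, the variable names, and the sample values are illustrative and not
part of the original text.

```python
import numpy as np
import pandas as pd

# Illustrative data: one float Series and one object Series, each with a gap.
floats = pd.Series([1.5, np.nan, 3.0])
objects = pd.Series(["a", None, "c"], dtype=object)

# isnull / notnull work the same way regardless of the underlying dtype.
float_na = pd.isnull(floats).tolist()
object_na = pd.isnull(objects).tolist()
float_ok = pd.notnull(floats).tolist()

print(float_na, object_na, float_ok)
```

The same two functions are used for both dtypes, which is the point of
having a single ``NA`` convention.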

Support for integer ``NA``
~~~~~~~~~~~~~~~~~~~~~~~~~~

In the absence of high performance ``NA`` support being built into NumPy from
the ground up, the primary casualty is the ability to represent NAs in integer
arrays. For example:

.. ipython:: python

   s = Series([1, 2, 3, 4, 5], index=list('abcde'))
   s
   s.dtype

   s2 = s.reindex(['a', 'b', 'c', 'f', 'u'])
   s2
   s2.dtype

This trade-off is made largely for memory and performance reasons, and also so
that the resulting Series continues to be "numeric". One possibility is to use
``dtype=object`` arrays instead.

``NA`` type promotions
~~~~~~~~~~~~~~~~~~~~~~

When introducing NAs into an existing Series or DataFrame via ``reindex`` or
some other means, boolean and integer types will be promoted to a different
dtype in order to store the NAs. These are summarized by this table:

.. csv-table::
   :header: "Typeclass","Promotion dtype for storing NAs"
   :widths: 40,60

   ``floating``, no change
   ``object``, no change
   ``integer``, cast to ``float64``
   ``boolean``, cast to ``object``

While this may seem like a heavy trade-off, I have found very few cases where
it is an issue in practice. The motivation is explained in the next section.
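
The promotion rules in the table can be checked with a short sketch like the
one below. It assumes a current pandas installation imported as ``pd``; the
label ``"z"`` and all variable names are made up for illustration. Each
Series is reindexed against a label it does not contain, which forces an NA
into the result.

```python
import pandas as pd

# "z" is absent from every Series below, so reindexing introduces one NA.
new_index = ["a", "b", "z"]

samples = {
    "floating": pd.Series([1.0, 2.0], index=["a", "b"]),
    "object": pd.Series(["x", "y"], index=["a", "b"], dtype=object),
    "integer": pd.Series([1, 2], index=["a", "b"]),
    "boolean": pd.Series([True, False], index=["a", "b"]),
}

# Record the dtype of each Series after the NA has been introduced.
promoted = {
    name: str(series.reindex(new_index).dtype)
    for name, series in samples.items()
}

print(promoted)
```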

Why not make NumPy like R?
~~~~~~~~~~~~~~~~~~~~~~~~~~

Many people have suggested that NumPy should simply emulate the ``NA`` support
present in the more domain-specific statistical programming language `R
<http://r-project.org>`__. Part of the reason this is difficult is the NumPy
type hierarchy:

.. csv-table::
   :header: "Typeclass","Dtypes"
   :widths: 30,70
   :delim: |

   ``numpy.floating`` | ``float16, float32, float64, float128``
   ``numpy.integer`` | ``int8, int16, int32, int64``
   ``numpy.unsignedinteger`` | ``uint8, uint16, uint32, uint64``
   ``numpy.object_`` | ``object_``
   ``numpy.bool_`` | ``bool_``
   ``numpy.character`` | ``string_, unicode_``

The R language, by contrast, only has a handful of built-in data types:
``integer``, ``numeric`` (floating-point), ``character``, and ``logical``
(boolean). ``NA`` types are implemented by reserving special bit patterns for
each type to be used as the missing value. While doing this with the full NumPy
type hierarchy would be possible, it would be a more substantial trade-off
(especially for the 8- and 16-bit data types) and implementation undertaking.

An alternate approach is that of using masked arrays. A masked array is an
array of data with an associated boolean *mask* denoting whether each value
should be considered ``NA`` or not. I am personally not in love with this
approach as I feel that overall it places a fairly heavy burden on the user and
the library implementer. Additionally, it exacts a fairly high performance cost
when working with numerical data compared with the simple approach of using
``NaN``. Thus, I have chosen the Pythonic "practicality beats purity" approach
and traded integer ``NA`` capability for a much simpler approach of using a
special value in float and object arrays to denote ``NA``, and promoting
integer arrays to floating when NAs must be introduced.
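
To make the comparison concrete, here is a small sketch of the two
approaches using NumPy only. It is an illustration under stated assumptions,
not the pandas implementation: the arrays and variable names are invented,
and ``numpy.ma`` stands in for the masked-array approach described above.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])
missing = np.array([False, True, False, False])  # second value is NA

# Masked-array approach: the data and a separate boolean mask travel together.
masked = np.ma.masked_array(values, mask=missing)
masked_mean = float(masked.mean())

# Sentinel approach: overwrite the missing slot with NaN in a float array.
with_nan = values.copy()
with_nan[missing] = np.nan
nan_mean = float(np.nanmean(with_nan))

# The trade-off: a mask lets integers stay integers, whereas the NaN
# sentinel requires promoting the integer array to floating point first.
int_values = np.array([1, 2, 3])
int_masked = np.ma.masked_array(int_values, mask=[False, True, False])
int_promoted = int_values.astype(float)
int_promoted[1] = np.nan

print(masked_mean, nan_mean, int_masked.dtype.kind, int_promoted.dtype.kind)
```

Both approaches give the same mean here; they differ in how much bookkeeping
the mask requires and in whether the integer dtype survives.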

Integer indexing
----------------

@@ -71,7 +152,8 @@ index can be somewhat complicated. For example, the following does not work:
s.ix['c':'e'+1]

A very common use case is to limit a time series to start and end at two
specific dates. To enable this, we made the design decision to make label-based slicing include both endpoints:
specific dates. To enable this, we made the design decision to make label-based
slicing include both endpoints:

.. ipython:: python
45 changes: 33 additions & 12 deletions doc/source/overview.rst
@@ -24,11 +24,6 @@ Package overview
* Static and moving window linear and `panel regression
  <http://en.wikipedia.org/wiki/Panel_data>`__

License
-------

pandas is released under a standard 3-clause BSD license

Data structures at a glance
---------------------------

@@ -82,16 +77,42 @@ but, for example, columns can be inserted into a DataFrame. However, the vast
majority of methods produce new objects and leave the input data untouched. In
general, though, we like to **favor immutability** where sensible.

Development Team
----------------

pandas is a part of the PyData project. The PyData Development Team is a
collection of developers focused on the improvement of Python's data
libraries. The core team that coordinates development can be found on `Github
<http://github.com/pydata>`__. If you're interested in contributing, please
visit the `project website <http://pandas.pydata.org>`__.

History
Getting Support
---------------

Users and developers are encouraged to join the `pystatsmodels mailing list
<http://groups.google.com/group/pystatsmodels>`__ or to contact Wes McKinney
directly at wesmckinn (-at-) gmail (-dot-) com.

For commercial support, training, or consulting, contact Wes at wes (-at-)
lambdafoundry (-dot-) com.

Credits
-------

pandas development began at `AQR Capital Management <http://www.aqr.com>`__ in
April 2008. It was open-sourced at the end of 2009 and continues to be actively
used and maintained.

Contact
April 2008. It was open-sourced at the end of 2009. AQR continued to provide
resources for development through the end of 2011, and continues to contribute
bug reports today.

Since January 2012, `Lambda Foundry <http://www.lambdafoundry.com>`__ has
been providing development resources, as well as commercial support,
training, and consulting for pandas.

pandas is only made possible by a group of people around the world like you
who have contributed new code, bug reports, fixes, comments and ideas. A
complete list can be found `on Github <http://www.github.com/pydata/pandas/contributors>`__.

License
-------

Please feel free to send comments or questions directly to Wes McKinney at
wesmckinn (-at-) gmail (-dot-) com or the pystatsmodels mailing list.
.. literalinclude:: ../../LICENSE
21 changes: 19 additions & 2 deletions doc/source/themes/agogo/static/agogo.css_t
@@ -32,7 +32,9 @@ div.header, div.content, div.footer {

div.header-wrapper {
background: {{ theme_headerbg }};
padding: 1em 1em 0;
border-bottom: 3px solid #2e3436;
min-height: 0px;
}


@@ -105,6 +107,11 @@ img {
border: 0;
}

pre {
background-color: #EEE;
padding: 0.5em;
}

div.admonition {
margin-top: 10px;
margin-bottom: 10px;
@@ -123,10 +130,14 @@ dt:target, .highlighted {

/* Header */

/*
div.header {
padding-top: 10px;
padding-bottom: 10px;
}
*/

div.header {}

div.header h1 {
font-family: {{ theme_headerfont }};
@@ -140,13 +151,16 @@ div.header h1 a {
}

div.header div.rel {
margin-top: 1em;
text-decoration: none;
}
/* margin-top: 1em; */

div.header div.rel a {
margin-top: 1em;
color: {{ theme_headerlinkcolor }};
letter-spacing: .1em;
text-transform: uppercase;
padding: 3px 1em;
}

p.logo {
@@ -161,9 +175,12 @@ img.logo {
/* Content */
div.content-wrapper {
background-color: white;
padding: 1em;
}
/*
padding-top: 20px;
padding-bottom: 20px;
}
*/

/* float: left; */
