Merge commit 'v0.7.0rc1-73-g69d5bd8' into debian

* commit 'v0.7.0rc1-73-g69d5bd8': (44 commits)
  BUG: integer slices should never access label-indexing, GH pandas-dev#700
  BUG: pandas-dev#680 clean up with check for py3compat
  BUG: pandas-dev#680 rears again. cut off another hydra head
  ENH: change to tree-like MultiIndex output with > 2 levels, GH pandas-dev#689
  TST: added a test related to pandas-dev#680
  BUG: related to closes pandas-dev#691, removed cruft
  BUG: closes pandas-dev#691, assignment with ix and mixed dtypes
  BUG: handle incomparable values when creating Factor, caused bug in py3
  TST: Fixes for tests on Python 3.
  BUG: pandas-dev#680, print consistently when dataframe is empty
  TST: unit test for PR pandas-dev#684
  ENH: Allow Series.to_csv to ignore the index.
  BUG: raise exception in DateRange with MonthEnd(0) instead of infinite loop, GH pandas-dev#683
  BUG: unbox 0-dimensional arrays in map_infer, GH pandas-dev#690
  updated license and credits for overview
  ENH: cythonize timestamp conversion in HDFStore
  TST: ok, this appears to work GH pandas-dev#680
  TST: even more woes GH pandas-dev#680
  TST: unicode woes on windoze GH pandas-dev#680
  TST: unicode codec test issue, GH pandas-dev#680
  ...
neurodebian · Jan 27, 2012 · b4ff285
2 parents ea66f06 + 69d5bd8
commit b4ff285
Showing 34 changed files with 761 additions and 154 deletions.
diff --git a/LICENSE b/LICENSE
@@ -1,7 +1,14 @@
-Copyright (c) 2008-2011 AQR Capital Management, LLC
+======================
+PANDAS LICENSING TERMS
+======================
+
+pandas is licensed under the BSD 3-Clause (also known as "BSD New" or 
+"BSD Simplified"), as follows:
+
+Copyright (c) 2011-2012, Lambda Foundry, Inc. and PyData Development Team
 All rights reserved.
 
-Copyright (c) 2011 Wes McKinney and pandas developers
+Copyright (c) 2008-2011 AQR Capital Management, LLC
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
@@ -31,3 +38,43 @@ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+About the Copyright Holders
+===========================
+
+AQR Capital Management began pandas development in 2008. Development was
+led by Wes McKinney. AQR released the source under this license in 2009.
+Wes is now an employee of Lambda Foundry, and remains the pandas project
+lead.
+
+The PyData Development Team is the collection of developers of the PyData
+project. This includes all of the PyData sub-projects, including pandas. The
+core team that coordinates development on GitHub can be found here:
+http://github.com/pydata.
+
+Full credits for pandas contributors can be found in the documentation.
+
+Our Copyright Policy
+====================
+
+PyData uses a shared copyright model. Each contributor maintains copyright
+over their contributions to PyData. However, it is important to note that
+these contributions are typically only changes to the repositories. Thus,
+the PyData source code, in its entirety, is not the copyright of any single
+person or institution. Instead, it is the collective copyright of the
+entire PyData Development Team. If individual contributors want to maintain
+a record of what changes/contributions they have specific copyright on,
+they should indicate their copyright in the commit message of the change
+when they commit the change to one of the PyData repositories.
+
+With this in mind, the following banner should be used in any source code 
+file to indicate the copyright and license terms:
+
+#-----------------------------------------------------------------------------
+# Copyright (c) 2012, PyData Development Team
+# All rights reserved.
+#
+# Distributed under the terms of the BSD Simplified License.
+#
+# The full license is in the LICENSE file, distributed with this software.
+#-----------------------------------------------------------------------------
diff --git a/RELEASE.rst b/RELEASE.rst
@@ -162,6 +162,7 @@ pandas 0.7.0
     yourself) to ``groupby`` in some cases (GH #659)
   - Use ``kind`` argument to Series.order for selecting different sort kinds
     (GH #668)
+  - Add option to Series.to_csv to omit the index (PR #684)
 
 **Bug fixes**
 
@@ -231,6 +232,15 @@ pandas 0.7.0
   - Fix bugs preventing SparseDataFrame and SparseSeries working with groupby
     (GH #666)
   - Use sort kind in Series.sort / argsort (GH #668)
+  - Fix DataFrame operations on non-scalar, non-pandas objects (GH #672)
+  - Don't convert DataFrame column to integer type when passing integer to
+    __setitem__ (GH #669)
+  - Fix downstream bug in pivot_table caused by integer level names in
+    MultiIndex (GH #678)
+  - Fix SparseSeries.combine_first when passed a dense Series (GH #687)
+  - Fix performance regression in HDFStore loading when DataFrame or Panel
+    stored in table format with datetimes
+  - Raise Exception in DateRange when offset with n=0 is passed (GH #683)
 
 Thanks
 ------
@@ -253,13 +263,15 @@ Thanks
 - Sam Reckoner
 - Craig Reeson
 - Jan Schulz
+- Skipper Seabold
 - Ted Square
 - Graham Taylor
 - Aman Thakral
 - Chris Uga
 - Dieter Vandenbussche
 - Texas P.
 - Pinxing Ye
+- ... and everyone I forgot
 
 pandas 0.6.1
 ============

diff --git a/doc/source/gotchas.rst b/doc/source/gotchas.rst
@@ -27,13 +27,94 @@ general, we were given the difficult choice between either
 - Using a special sentinel value, bit pattern, or set of sentinel values to
   denote ``NA`` across the dtypes
 
+For many reasons we chose the latter. After years of production use it has
+proven, at least in my opinion, to be the best decision given the state of
+affairs in NumPy and Python in general. The special value ``NaN``
+(Not-A-Number) is used everywhere as the ``NA`` value, and there are API
+functions ``isnull`` and ``notnull`` which can be used across the dtypes to
+detect NA values.
+
+However, it comes with a couple of trade-offs which I most certainly have
+not ignored.
 
 Support for integer ``NA``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+In the absence of high performance ``NA`` support being built into NumPy from
+the ground up, the primary casualty is the ability to represent NAs in integer
+arrays. For example:
+
+.. ipython:: python
+
+   s = Series([1, 2, 3, 4, 5], index=list('abcde'))
+   s
+   s.dtype
+
+   s2 = s.reindex(['a', 'b', 'c', 'f', 'u'])
+   s2
+   s2.dtype
+
+This trade-off is made largely for memory and performance reasons, and also so
+that the resulting Series continues to be "numeric". One possibility is to use
+``dtype=object`` arrays instead.
+
 ``NA`` type promotions
 ~~~~~~~~~~~~~~~~~~~~~~
 
+When introducing NAs into an existing Series or DataFrame via ``reindex`` or
+some other means, boolean and integer types will be promoted to a different
+dtype in order to store the NAs. These are summarized by this table:
+
+.. csv-table::
+   :header: "Typeclass","Promotion dtype for storing NAs"
+   :widths: 40,60
+
+   ``floating``, no change
+   ``object``, no change
+   ``integer``, cast to ``float64``
+   ``boolean``, cast to ``object``
+
+While this may seem like a heavy trade-off, I have found very few cases where
+this is an issue in practice. The motivation for this choice is explained in
+the next section.
+
+Why not make NumPy like R?
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Many people have suggested that NumPy should simply emulate the ``NA`` support
+present in the more domain-specific statistical programming language `R
+<http://r-project.org>`__. Part of the reason it does not is the NumPy type hierarchy:
+
+.. csv-table::
+   :header: "Typeclass","Dtypes"
+   :widths: 30,70
+   :delim: |
+
+   ``numpy.floating`` | ``float16, float32, float64, float128``
+   ``numpy.integer`` | ``int8, int16, int32, int64``
+   ``numpy.unsignedinteger`` | ``uint8, uint16, uint32, uint64``
+   ``numpy.object_`` | ``object_``
+   ``numpy.bool_`` | ``bool_``
+   ``numpy.character`` | ``string_, unicode_``
+
+The R language, by contrast, only has a handful of built-in data types:
+``integer``, ``numeric`` (floating-point), ``character``, and ``logical``
+(boolean). ``NA`` types are implemented by reserving special bit patterns for
+each type to be used as the missing value. While doing this with the full NumPy
+type hierarchy would be possible, it would be a more substantial trade-off
+(especially for the 8- and 16-bit data types) and implementation undertaking.
+
+An alternate approach is that of using masked arrays. A masked array is an
+array of data with an associated boolean *mask* denoting whether each value
+should be considered ``NA`` or not. I am personally not in love with this
+approach as I feel that overall it places a fairly heavy burden on the user and
+the library implementer. Additionally, it exacts a fairly high performance cost
+when working with numerical data compared with the simple approach of using
+``NaN``. Thus, I have chosen the Pythonic "practicality beats purity" approach
+and traded integer ``NA`` capability for a much simpler approach of using a
+special value in float and object arrays to denote ``NA``, and promoting
+integer arrays to floating when NAs must be introduced.
+
 Integer indexing
 ----------------
 
@@ -71,7 +152,8 @@ index can be somewhat complicated. For example, the following does not work:
     s.ix['c':'e'+1]
 
 A very common use case is to limit a time series to start and end at two
-specific dates. To enable this, we made the design design to make label-based slicing include both endpoints:
+specific dates. To enable this, we made the design decision to make label-based
+slicing include both endpoints:
 
 .. ipython:: python
 

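The promotion rules added to gotchas.rst above can be sketched in plain Python. This is only an illustration under stated assumptions, not pandas code: `reindex_values` and `isnull` below are hypothetical stand-ins written for this example, and ordinary lists stand in for typed arrays.

```python
import math

NA = math.nan  # NaN is the sentinel for a missing value, as in the text above


def reindex_values(data, new_labels):
    """Hypothetical helper: look up each label, filling gaps with NA.

    Follows the promotion table: integers are cast to float only when an NA
    has to be stored; booleans and other values are kept as plain objects.
    """
    missing = any(label not in data for label in new_labels)
    out = []
    for label in new_labels:
        if label not in data:
            out.append(NA)
            continue
        value = data[label]
        # bool is a subclass of int, so exclude it before casting.
        if missing and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        out.append(value)
    return out


def isnull(value):
    """Hypothetical NA check: NaN is the only float not equal to itself."""
    return isinstance(value, float) and value != value


ints = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
promoted = reindex_values(ints, ["a", "b", "c", "f", "u"])   # labels f, u missing
unchanged = reindex_values(ints, ["a", "b", "c"])            # nothing missing
flags = reindex_values({"a": True, "b": False}, ["a", "b", "z"])
```

Integers stay integers when every label is found and become floats only when an NA must be stored, which is the trade-off the gotchas.rst hunk describes.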
diff --git a/doc/source/overview.rst b/doc/source/overview.rst
@@ -24,11 +24,6 @@ Package overview
  * Static and moving window linear and `panel regression
    <http://en.wikipedia.org/wiki/Panel_data>`__
 
-License
--------
-
-pandas is released under a standard 3-clause BSD license
-
 Data structures at a glance
 ---------------------------
 
@@ -82,16 +77,42 @@ but, for example, columns can be inserted into a DataFrame. However, the vast
 majority of methods produce new objects and leave the input data untouched. In
 general, though, we like to **favor immutability** where sensible.
 
+Development Team
+----------------
+
+pandas is a part of the PyData project. The PyData Development Team is a
+collection of developers focused on the improvement of Python's data
+libraries. The core team that coordinates development can be found on `GitHub
+<http://github.com/pydata>`__. If you're interested in contributing, please
+visit the `project website <http://pandas.pydata.org>`__.
 
-History
+Getting Support
+---------------
+
+Users and developers are encouraged to join the `pystatsmodels mailing list
+<http://groups.google.com/group/pystatsmodels>`__ or to contact Wes McKinney
+directly at wesmckinn (-at-) gmail (-dot-) com.
+
+For commercial support, training, or consulting, contact Wes at wes (-at-)
+lambdafoundry (-dot-) com.
+
+Credits
 -------
 
 pandas development began at `AQR Capital Management <http://www.aqr.com>`__ in
-April 2008. It was open-sourced at the end of 2009 and continues to be actively
-used and maintained.
-
-Contact
+April 2008. It was open-sourced at the end of 2009. AQR continued to provide
+resources for development through the end of 2011, and continues to contribute
+bug reports today.
+
+Since January 2012, `Lambda Foundry <http://www.lambdafoundry.com>`__ has
+been providing development resources, as well as commercial support, 
+training, and consulting for pandas.
+
+pandas is only made possible by a group of people around the world like you
+who have contributed new code, bug reports, fixes, comments and ideas. A
+complete list can be found `on GitHub <http://www.github.com/pydata/pandas/contributors>`__.
+
+License
 -------
 
-Please feel free to send comments or questions directly to Wes McKinney at
-wesmckinn (-at-) gmail (-dot-) com or the pystatsmodels mailing list.
+.. literalinclude:: ../../LICENSE
diff --git a/doc/source/themes/agogo/static/agogo.css_t b/doc/source/themes/agogo/static/agogo.css_t
@@ -32,7 +32,9 @@ div.header, div.content, div.footer {
 
 div.header-wrapper {
   background: {{ theme_headerbg }};
+  padding: 1em 1em 0;
   border-bottom: 3px solid #2e3436;
+  min-height: 0px;
 }
 
 
@@ -105,6 +107,11 @@ img {
   border: 0;
 }
 
+pre {
+  background-color: #EEE;
+  padding: 0.5em;
+}
+
 div.admonition {
   margin-top: 10px;
   margin-bottom: 10px;
@@ -123,10 +130,14 @@ dt:target, .highlighted {
 
 /* Header */
 
+/*
 div.header {
   padding-top: 10px;
   padding-bottom: 10px;
 }
+*/
+
+div.header {}
 
 div.header h1 {
   font-family: {{ theme_headerfont }};
@@ -140,13 +151,16 @@ div.header h1 a {
 }
 
 div.header div.rel {
-  margin-top: 1em;
+  text-decoration: none;
 }
+/*  margin-top: 1em; */
 
 div.header div.rel a {
+  margin-top: 1em;
   color: {{ theme_headerlinkcolor }};
   letter-spacing: .1em;
   text-transform: uppercase;
+  padding: 3px 1em;
 }
 
 p.logo {
@@ -161,9 +175,12 @@ img.logo {
 /* Content */
 div.content-wrapper {
   background-color: white;
+  padding: 1em;
+}
+/*
   padding-top: 20px;
   padding-bottom: 20px;
-}
+*/
 
 /*  float: left; */