DOC: more miscellaneous docs about new 0.5 features
wesm committed Oct 23, 2011
1 parent f838ff9 commit 8ec1c97
Showing 6 changed files with 92 additions and 25 deletions.
1 change: 1 addition & 0 deletions RELEASE.rst
@@ -197,6 +197,7 @@ Thanks
- Daniel Fortunov
- Aman Thakral
- Luca Beltrame
- Wouter Overmeire

pandas 0.4.3
============
6 changes: 2 additions & 4 deletions TODO.rst
@@ -33,13 +33,11 @@ TODO docs
- DONE & and | for intersection / union
- DONE Update to reflect Python 3 support in intro
- DONE Index / MultiIndex names

- Unstack / stack by level name
- name attribute on Series
- DONE Unstack / stack by level name
- DONE name attribute on Series

- Inner join on key
- Multi-key joining

- align functions
- df[col_list]
- Panel.rename_axis
14 changes: 14 additions & 0 deletions doc/source/dsintro.rst
@@ -181,6 +181,20 @@ tools for working with labeled data.
of course have the option of dropping labels with missing data via the
**dropna** function.

Name attribute
~~~~~~~~~~~~~~

Series can also have a ``name`` attribute:

.. ipython:: python

   s = Series(np.random.randn(5), name='something')
   s
   s.name

The Series ``name`` will be assigned automatically in many cases, in particular
when taking 1D slices of DataFrame as you will see below.
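A minimal sketch of the ``name`` behavior described above. It assumes a current pandas release and uses the ``pd.`` and ``np.`` namespaces rather than the bare ``Series`` and ``DataFrame`` names the 0.5 docs import; the column labels ``'A'`` and ``'B'`` are illustrative only.

```python
import numpy as np
import pandas as pd

# A Series carries an optional ``name`` alongside its values and index.
s = pd.Series(np.random.randn(5), name='something')
print(s.name)

# Selecting a single column of a DataFrame yields a Series whose
# ``name`` is filled in automatically with the column label.
df = pd.DataFrame({'A': np.random.randn(3), 'B': np.random.randn(3)})
col = df['A']
print(col.name)
```

This is the "1D slice of a DataFrame" case the paragraph mentions: no ``name`` is passed explicitly, yet the resulting Series knows which column it came from.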

.. _basics.dataframe:

DataFrame
73 changes: 59 additions & 14 deletions doc/source/indexing.rst
@@ -302,6 +302,58 @@ values, though setting arbitrary vectors is not yet supported:
   df2.ix[3] = np.nan
   df2

.. _indexing.class:

Index objects
-------------

The pandas Index class and its subclasses can be viewed as implementing an
*ordered set* in addition to providing the support infrastructure necessary for
lookups, data alignment, and reindexing. The easiest way to create one directly
is to pass a list or other sequence to ``Index``:

.. ipython:: python

   index = Index(['e', 'd', 'a', 'b'])
   index
   'd' in index

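A short sketch of the ordered-set behavior described above, assuming a current pandas where the class is reached as ``pd.Index``. ``get_loc`` is the label-to-position lookup that alignment and reindexing build on; the missing label ``'z'`` is only for illustration.

```python
import pandas as pd

# Build an Index directly from a list; the label order is preserved.
index = pd.Index(['e', 'd', 'a', 'b'])

# Membership is tested against the labels, as with a set.
print('d' in index)
print('z' in index)

# get_loc maps a label to its integer position in the Index.
print(index.get_loc('a'))
```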
You can also pass a ``name`` to be stored in the index:

.. ipython:: python

   index = Index(['e', 'd', 'a', 'b'], name='something')
   index.name

Starting with pandas 0.5, the name, if set, will be shown in the console
display:

.. ipython:: python

   index = Index(range(5), name='rows')
   columns = Index(['A', 'B', 'C'], name='cols')
   df = DataFrame(np.random.randn(5, 3), index=index, columns=columns)
   df
   df['A']

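A sketch of where the names end up, assuming a current pandas (``pd.Index`` and ``pd.DataFrame``). The axis names live on the Index objects themselves, so a column selection keeps the row index, name included, while the column label becomes the Series name.

```python
import numpy as np
import pandas as pd

index = pd.Index(range(5), name='rows')
columns = pd.Index(['A', 'B', 'C'], name='cols')
df = pd.DataFrame(np.random.randn(5, 3), index=index, columns=columns)

# The names are attributes of the two axis Index objects.
print(df.index.name, df.columns.name)

# Selecting one column keeps the named row index; the column label
# becomes the name of the resulting Series.
col = df['A']
print(col.index.name, col.name)
```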
Set operations on Index objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The three main operations are ``union (|)``, ``intersection (&)``, and ``diff
(-)``. These can be directly called as instance methods or used via overloaded
operators:

.. ipython:: python

   a = Index(['c', 'b', 'a'])
   b = Index(['c', 'e', 'd'])
   a.union(b)
   a | b
   a & b
   a - b

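The same three set operations, sketched with the named methods under the assumption of a current pandas. Later pandas releases renamed ``diff`` to ``difference`` and stopped treating ``-`` between Index objects as a set difference, so the method spellings are the portable ones; the results are compared in sorted form to avoid depending on each method's ordering rules.

```python
import pandas as pd

a = pd.Index(['c', 'b', 'a'])
b = pd.Index(['c', 'e', 'd'])

union = a.union(b)          # labels in either Index
common = a.intersection(b)  # labels in both
only_a = a.difference(b)    # labels in a but not in b

print(sorted(union))
print(list(common))
print(sorted(only_a))
```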
.. _indexing.hierarchical:

Hierarchical indexing (MultiIndex)
@@ -558,14 +610,15 @@ attribute. These will get automatically assigned in various places where
Some gory internal details
~~~~~~~~~~~~~~~~~~~~~~~~~~

Internally, the ``MultiIndex`` consists of two things: the **levels** and the
**labels**:
Internally, the ``MultiIndex`` consists of a few things: the **levels**, the
integer **labels**, and the level **names**:

.. ipython:: python

   index
   index.levels
   index.labels
   index.names

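To make the levels/labels split concrete, here is a pure-Python sketch that does not touch pandas. ``decode_multiindex`` is a hypothetical helper, not a pandas API: at each position it uses every level's integer label to pick the unique value from that level. The example levels mirror the ``bar``/``baz``/``foo``/``qux`` and ``one``/``two`` data used elsewhere in these docs. Note that later pandas releases renamed ``MultiIndex.labels`` to ``MultiIndex.codes``.

```python
# Hypothetical helper (not a pandas API): rebuild the label tuples a
# MultiIndex represents from its unique ``levels`` and integer ``labels``.
def decode_multiindex(levels, labels):
    n = len(labels[0])
    if any(len(codes) != n for codes in labels):
        raise ValueError('every level needs one code per position')
    # At position i, level k contributes levels[k][labels[k][i]].
    return [tuple(level[codes[i]] for level, codes in zip(levels, labels))
            for i in range(n)]


levels = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
labels = [[0, 0, 1, 1, 2, 2, 3, 3],
          [0, 1, 0, 1, 0, 1, 0, 1]]

tuples = decode_multiindex(levels, labels)
print(tuples[:3])
```

Storing small integer labels plus one copy of each unique level value is what keeps the representation compact when the same outer label repeats many times.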
You can probably guess that the labels determine which unique element is
identified with that location at each layer of the index. It's important to
@@ -584,16 +637,6 @@ To do this, use the ``swaplevels`` function:
   df
   df.swaplevels(0, 1)

Index methods
-------------

The pandas Index class and its subclasses can be viewed as implementing an
*ordered set* in addition to providing the support infrastructure necessary for
lookups, data alignment, and reindexing.

Set operations on Index objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Indexing internal details
-------------------------

@@ -603,13 +646,15 @@ Indexing internal details
codebase. And the source code is still the best place to look at the
specifics of how things are implemented.

In pandas there are 3 distinct objects which can serve as valid containers for
the axis labels:
In pandas there are a few objects implemented which can serve as valid
containers for the axis labels:

- ``Index``: the generic "ordered set" object, an ndarray of object dtype
  assuming nothing about its contents. The labels must be hashable (and
  likely immutable) and unique. Populates a dict of label to location in
  Cython to do :math:`O(1)` lookups.
- ``Int64Index``: a version of ``Index`` highly optimized for 64-bit integer
  data, such as time stamps
- ``MultiIndex``: the standard hierarchical index object
- ``DateRange``: fixed frequency date range generated from a time rule or
  DateOffset. An ndarray of Python datetime objects
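The "dict of label to location" idea behind ``Index`` can be sketched in plain Python. ``OrderedLabelIndex`` is a hypothetical, simplified stand-in, not pandas code (the text above describes pandas building this mapping in Cython). It keeps the labels in order, rejects duplicates, and answers membership and position queries with an average-case constant-time dict lookup; unhashable labels fail with ``TypeError`` because they cannot be dict keys, which mirrors the hashability requirement.

```python
class OrderedLabelIndex:
    """Hypothetical sketch: ordered, unique, hashable labels plus a
    label -> position dict for fast lookups."""

    def __init__(self, labels):
        self.labels = list(labels)
        self._positions = {}
        for pos, label in enumerate(self.labels):
            # The dict lookup raises TypeError for unhashable labels.
            if label in self._positions:
                raise ValueError(f'duplicate label: {label!r}')
            self._positions[label] = pos

    def __len__(self):
        return len(self.labels)

    def __contains__(self, label):
        # Average O(1): a dict membership test, not a scan of the labels.
        return label in self._positions

    def get_loc(self, label):
        # Raises KeyError if the label is absent.
        return self._positions[label]


idx = OrderedLabelIndex(['e', 'd', 'a', 'b'])
print(idx.get_loc('a'), 'd' in idx, 'z' in idx)
```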
22 changes: 15 additions & 7 deletions doc/source/reshaping.rst
@@ -11,9 +11,9 @@
   randn = np.random.randn
   np.set_printoptions(precision=4, suppress=True)

**********************
Reshaping fundamentals
**********************
**************************
Reshaping and Pivot Tables
**************************

Reshaping by pivoting DataFrame objects
---------------------------------------
@@ -113,7 +113,7 @@ take a prior example data set from the hierarchical indexing section:
                   'foo', 'foo', 'qux', 'qux'],
                  ['one', 'two', 'one', 'two',
                   'one', 'two', 'one', 'two']])
   index = MultiIndex.from_tuples(tuples)
   index = MultiIndex.from_tuples(tuples, names=['first', 'second'])
   df = DataFrame(randn(8, 2), index=index, columns=['A', 'B'])
   df2 = df[:4]
   df2
@@ -142,6 +142,13 @@ unstacks the **last level**:
   stacked.unstack(1)
   stacked.unstack(0)

If the indexes have names, you can use the level names instead of specifying
the level numbers:

.. ipython:: python

   stacked.unstack('second')

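A sketch checking that unstacking by level name and by level number agree, assuming a current pandas. It rebuilds a four-row version of the named ``first``/``second`` index from these docs (``list(zip(...))`` is needed on Python 3); recent pandas may emit a ``FutureWarning`` from ``stack()`` about its reimplementation, which does not change this result.

```python
import numpy as np
import pandas as pd

tuples = list(zip(*[['bar', 'bar', 'baz', 'baz'],
                    ['one', 'two', 'one', 'two']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=['A', 'B'])

# stack() moves the columns into a third (unnamed) index level.
stacked = df.stack()

# Referring to the level by name is equivalent to its position (1).
by_name = stacked.unstack('second')
by_number = stacked.unstack(1)
print(by_name.equals(by_number))
print(by_name.columns.name)
```

Using names makes the call independent of level order, which matters once levels have been swapped or reordered.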
These functions are very intelligent about handling missing data and do not
expect each subgroup within the hierarchical index to have the same set of
labels. They also can handle the index being unsorted (but you can make it
@@ -150,7 +157,8 @@ sorted by calling ``sortlevel``, of course). Here is a more complex example:
.. ipython:: python

   columns = MultiIndex.from_tuples([('A', 'cat'), ('B', 'dog'),
                                     ('B', 'cat'), ('A', 'dog')])
                                     ('B', 'cat'), ('A', 'dog')],
                                    names=['exp', 'animal'])
   df = DataFrame(randn(8, 4), index=index, columns=columns)
   df2 = df.ix[[0, 1, 2, 4, 5, 7]]
   df2
@@ -160,8 +168,8 @@ which level in the columns to stack:

.. ipython:: python

   df2.stack(1)
   df2.stack(0)
   df2.stack('exp')
   df2.stack('animal')

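A sketch of stacking by a named column level, assuming a current pandas. It uses a four-row frame rather than the eight-row ``df2`` subset above, but the same ``exp``/``animal`` column names; stacking ``'exp'`` should match stacking level ``0`` and should move that level into the row index, leaving ``animal`` as the columns.

```python
import numpy as np
import pandas as pd

tuples = list(zip(*[['bar', 'bar', 'baz', 'baz'],
                    ['one', 'two', 'one', 'two']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
columns = pd.MultiIndex.from_tuples([('A', 'cat'), ('B', 'dog'),
                                     ('B', 'cat'), ('A', 'dog')],
                                    names=['exp', 'animal'])
df = pd.DataFrame(np.random.randn(4, 4), index=index, columns=columns)

# Stacking by name: the 'exp' column level becomes the innermost row level.
by_exp = df.stack('exp')
print(by_exp.equals(df.stack(0)))
print(list(by_exp.index.names))
print(by_exp.columns.name)
```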
Unstacking when the columns are a ``MultiIndex`` is also careful about doing
the right thing:
1 change: 1 addition & 0 deletions pandas/core/common.py
@@ -16,6 +16,7 @@
# XXX: HACK for NumPy 1.5.1 to suppress warnings
try:
    np.seterr(all='ignore')
    np.set_printoptions(suppress=True)
except Exception: # pragma: no cover
    pass
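The ``common.py`` hunk changes two NumPy settings globally at import time. For reference, here is a sketch of what those settings do, applied in a scoped way with the ``np.printoptions`` and ``np.errstate`` context managers (available in reasonably recent NumPy); this is an alternative illustration, not what the commit itself does, and the sample array is arbitrary.

```python
import numpy as np

arr = np.array([1e-10, 1.5])

# By default, this mix of magnitudes is printed in scientific notation.
default_repr = str(arr)

# suppress=True prints values in fixed-point notation instead, so tiny
# values round to zero at the current precision. The context manager
# restores the previous print options on exit.
with np.printoptions(suppress=True):
    suppressed_repr = str(arr)

# seterr(all='ignore') silences floating-point warnings globally;
# np.errstate applies the same policy only inside the block.
with np.errstate(all='ignore'):
    ratio = np.array([1.0]) / np.array([0.0])

print(default_repr)
print(suppressed_repr)
print(ratio)
```

Scoping the settings avoids changing printing and error behavior for every other library that shares the process, which is the trade-off the "HACK" comment in the hunk hints at.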
