In [None]:
import numpy as np
import pandas as pd

In [None]:
df = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],
    'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-30', '2015-12-30'],
    'open': [426.23, 427.81, 42.3, 101.65, 35.53]
})

## Multi-Axis Selection

The most common situation is logical indexing on the rows and
label indexing on the columns using `loc`.

In [None]:
idx = [True, True, False, True, True]
df.loc[idx, ['date', 'open']]
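
In practice the boolean mask is usually computed from a column rather than typed by hand. A minimal sketch of that pattern, using a small stand-in frame (`prices` and its values are illustrative, not the notebook's `df`):

```python
import pandas as pd

# Illustrative stand-in data; not the notebook's `df`.
prices = pd.DataFrame({
    'ticker': ['AAA', 'BBB', 'CCC'],
    'date': ['2015-12-30', '2015-12-30', '2015-12-31'],
    'open': [10.0, 250.0, 75.0],
})

# Boolean Series with one entry per row of `prices`.
mask = prices['open'] > 50

# Rows where the mask is True, restricted to two labelled columns.
subset = prices.loc[mask, ['ticker', 'open']]
```

Because the mask is a `Series`, `loc` matches it to the rows by index label, so it must share the frame's index.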

### Add an index

With a labelled index in place, we can select by _label_ on both rows and columns.

We haven't set an index on `df`, so it has the default integer index.
Let's set one now.

In [None]:
df

In [None]:
# `set_index` returns a new DataFrame; `df` itself is unchanged
df1 = df.set_index('ticker')

### Select by row index

In [None]:
df1

The tickers are now the row index and no longer part of the values of the `DataFrame`.

Consequently, we can use them for index lookups.

### Safely select rows

In [None]:
# A list of labels always returns a DataFrame; `:` explicitly selects all columns
df1.loc[['MSFT'], :]

### Unsafely select rows

In [None]:
# A scalar label returns a Series when the label is unique in the index
df1.loc['MSFT', :]

`loc` defaults to all columns when the column selector is omitted, but I prefer explicit selection:
it is easier to figure out what your code is doing.

In [None]:
df1.loc['MSFT'] # same result but confusing

### Always return rows as a `DataFrame`

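With a scalar label, `loc` returns a `Series` if the label is unique in the index and a `DataFrame` if it is duplicated. `AAPL` appears twice in the index, so this returns a `DataFrame`: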

In [None]:
df1.loc['AAPL', :]

Passing a list of labels always returns a `DataFrame`:

In [None]:
df1.loc[['AAPL'], :]
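
The return type depends on whether the label is unique, which is easy to check directly. A small sketch on a stand-in frame with one duplicated label (`frame` and its labels are illustrative):

```python
import pandas as pd

# Illustrative frame: 'AAA' is duplicated in the index, 'BBB' is unique.
frame = pd.DataFrame(
    {'open': [1.0, 2.0, 3.0]},
    index=['AAA', 'AAA', 'BBB'],
)

dup_scalar = frame.loc['AAA', :]    # duplicated scalar label -> DataFrame
uniq_scalar = frame.loc['BBB', :]   # unique scalar label -> Series
uniq_list = frame.loc[['BBB'], :]   # list of labels -> DataFrame
```

Wrapping the label in a list keeps downstream code from having to handle both return types.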

## Modifying a DataFrame

**Goals**: 

* Create a new column `close` with the same values as `open`
* Set `close` to `5000.00` when `date == '2015-12-31'`

### Use `loc` or `iloc` for in-place modification

In [None]:
df1a = df1.copy() # don't modify df1
df1a['close'] = df1a['open']
df1a.loc[df1a.date == '2015-12-31', 'close'] = 5000.
df1a

### How not to modify a DataFrame

* Selecting a subset with a boolean mask returns a new `DataFrame`; in general `pandas` does not guarantee whether a subset is a "view" or a copy of the original
* Assigning to such a subset therefore **cannot** be relied on to modify the original

In [None]:
df1b = df1.copy()
df1b['close'] = df1b['open']
df1b_view = df1b[df1b.date == '2015-12-31']
df1b_view

### How not to modify a DataFrame (cont)

Assigning to the subset changes only the subset; depending on the `pandas` version, it also emits a `SettingWithCopyWarning`.

In [None]:
df1b_view['close'] = 5000

In [None]:
df1b_view

### The original `DataFrame` is unchanged.

In [None]:
df1b
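
The same behaviour can be reproduced on a tiny stand-in frame. This is a sketch under the assumption of default `pandas` options; `orig` and `subset` are illustrative names, and the warning is suppressed only because whether it appears depends on the `pandas` version:

```python
import warnings

import pandas as pd

# Illustrative stand-in data; not the notebook's `df1b`.
orig = pd.DataFrame(
    {'date': ['2015-12-30', '2015-12-31'], 'open': [1.0, 2.0]},
    index=['AAA', 'BBB'],
)

# Boolean-mask selection produces a separate object.
subset = orig[orig['date'] == '2015-12-31']

with warnings.catch_warnings():
    # Some pandas versions emit SettingWithCopyWarning on the next line.
    warnings.simplefilter('ignore')
    subset['open'] = 5000.0

# Only `subset` carries the new value; `orig` keeps 1.0 and 2.0.
```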

### Subsets without warnings

If you want to work with the subset, make a copy:

In [None]:
# What you want is
df2 = df1[df1.date != '2015-12-31'].copy()
df2

In [None]:
df2['close'] = 1
df2

## Assignment Gotchas

TODO: Maybe show new Int64 type here?

### Types of NaNs

In [None]:
# May create NAs with `np.nan`, None, or float('nan')
df2['close'] = np.nan

In [None]:
df2['close'] = None
df2.close.dtype
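
The dtype you end up with depends on what the missing value sits next to. A short sketch comparing the cases on stand-alone `Series` (the names are illustrative; the nullable `Int64` dtype is included only for comparison):

```python
import numpy as np
import pandas as pd

as_nan = pd.Series([1.0, np.nan])                # float64, missing value is NaN
as_none = pd.Series([1.0, None])                 # None is converted to NaN -> float64
as_int64 = pd.Series([1, None], dtype='Int64')   # nullable integer, missing value is pd.NA

# `isna` treats all three representations as missing.
missing_flags = [s.isna().tolist() for s in (as_nan, as_none, as_int64)]
```

Using `isna` rather than comparing against `np.nan` or `None` keeps checks independent of which representation a column happens to hold.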

### Silently drop data when adding columns

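Assigning a `Series` to a `DataFrame` column aligns on the index, like an implicit left join: rows whose label is missing from the `Series` get `NaN`, and `Series` entries whose label is not in the `DataFrame` (here `SP5`) are silently dropped.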

In [None]:
closes = pd.Series({'AAPL': 430.0, 'MSFT': 43.5, 'SP5': 1263.5})
df2['close'] = closes
df2
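
One way to make the alignment visible is to compare it with an explicit `reindex` and to list the labels that would be discarded. A minimal sketch on stand-in data (`frame`, `closes`, and the labels are illustrative):

```python
import pandas as pd

# Illustrative stand-in data.
frame = pd.DataFrame({'open': [1.0, 2.0, 3.0]}, index=['AAA', 'BBB', 'CCC'])
closes = pd.Series({'AAA': 1.5, 'BBB': 2.5, 'ZZZ': 9.9})

# Column assignment aligns `closes` to the frame's index.
frame['close'] = closes

# The same alignment, written out explicitly.
explicit = closes.reindex(frame.index)

# Labels in `closes` that have no matching row and are therefore dropped.
dropped = closes.index.difference(frame.index)
```

Checking `dropped` before assigning is a cheap guard against losing data without noticing.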

### Avoiding automatic alignment

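Use a sequence (`list`, `tuple`) or an array (`x.to_numpy()`, `x.array`) if you don't
want automatic alignment. Values are then assigned by position, so the length must match the number of rows.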

In [None]:
x = pd.Series([1, 2, 3, 4], index=list('abcd'))
df2['close'] = x
df2

In [None]:
df2['close'] = x.array
df2
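
Positional assignment trades the alignment step for a length check. The sketch below, on illustrative stand-in data, shows both paths and what happens when the lengths disagree:

```python
import pandas as pd

# Illustrative stand-in data.
frame = pd.DataFrame({'open': [1.0, 2.0, 3.0]}, index=['AAA', 'BBB', 'CCC'])
x = pd.Series([10, 20, 30], index=list('abc'))

# Series assignment aligns on labels; none match, so every row is NaN.
frame['aligned'] = x

# A NumPy array has no labels, so values are matched by position.
frame['positional'] = x.to_numpy()

# An array of the wrong length is rejected rather than padded.
try:
    frame['too_short'] = x.to_numpy()[:2]
    length_error = None
except ValueError as exc:
    length_error = exc
```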

## Sorting DataFrames

In [None]:
df2.sort_index()

In [None]:
df2.sort_values('open')