# CHAPTER 5 
---
# Getting Started With pandas

In [4]:
import pandas as pd
from pandas import DataFrame, Series

In [7]:
import numpy as np
import matplotlib.pyplot as plt

## 5.1 Introduction to pandas Data Structures 

### Series
- Attributes
    - Series.values
    - Series.index
    - Series.name
    - Series.index.name
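
As a quick illustration of the attributes listed above, here is a minimal sketch; the state names and numbers are made-up sample data, not taken from these notes.

```python
import pandas as pd

# Illustrative data (hypothetical values)
pop = pd.Series([35000, 71000, 16000], index=["Ohio", "Texas", "Oregon"])

# name and index.name are plain attributes that can be assigned directly
pop.name = "population"
pop.index.name = "state"

values = pop.values        # underlying NumPy array
labels = list(pop.index)   # index labels as a plain list

print(values)
print(labels)
print(pop.name, pop.index.name)
```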

### DataFrame
- Attributes
    - DataFrame.T
- Possible data inputs to DataFrame constructor
    - 2D ndarray
    - dict of arrays, lists, or tuples
    - NumPy structured/record array
    - dict of Series
    - dict of dict
    - List of dicts or Series
    - List of list or tuples
    - Another DataFrame
    - NumPy MaskedArray
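
A few of the constructor inputs listed above can be sketched as follows. The column names and values are invented for illustration only.

```python
import numpy as np
import pandas as pd

# dict of lists: each key becomes a column
df_lists = pd.DataFrame({"state": ["Ohio", "Texas"], "pop": [1.5, 2.4]})

# dict of dicts: outer keys become columns, inner keys become row labels;
# a missing inner key (2002 for Nevada) becomes NaN
df_nested = pd.DataFrame({"Ohio": {2001: 1.7, 2002: 3.6}, "Nevada": {2001: 2.4}})

# list of dicts: each dict is one row; missing keys become NaN
df_rows = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3}])

# 2D ndarray with explicit row and column labels
df_array = pd.DataFrame(np.arange(6).reshape(2, 3),
                        index=["r1", "r2"],
                        columns=["x", "y", "z"])

# .T swaps rows and columns
transposed = df_array.T
print(transposed)
```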

### Index Objects
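- Some Index methods and properties
    - append | Concatenate with another Index object, producing a new Index
    - difference | Compute the set difference as an Index
    - intersection | Compute the set intersection
    - union | Compute the set union
    - isin | Compute a boolean array indicating whether each value is contained in the passed collection
    - delete | Compute a new Index with the element at position i deleted
    - drop | Compute a new Index by deleting the passed values
    - insert | Compute a new Index by inserting an element at position i
    - is_monotonic | Returns True if each element is greater than or equal to the previous element (named `is_monotonic_increasing` in recent pandas versions)
    - is_unique | Returns True if the Index has no duplicate values
    - unique | Compute the array of unique values in the Index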
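
The Index methods above can be tried on two small, made-up indexes. This is only a sketch; every call returns a new object because Index objects are immutable.

```python
import pandas as pd

left = pd.Index(["a", "b", "c"])
right = pd.Index(["b", "c", "d"])

appended = left.append(right)      # concatenation, duplicates are kept
diff = left.difference(right)      # labels in left but not in right
inter = left.intersection(right)   # labels present in both
union = left.union(right)          # labels present in either
mask = left.isin(["a", "c"])       # boolean array, one entry per label
deleted = left.delete(0)           # remove the element at position 0
dropped = left.drop(["b"])         # remove by value
inserted = left.insert(1, "z")     # insert "z" at position 1

print(list(appended), appended.is_unique)
print(list(appended.unique()))
print(left.is_monotonic_increasing)
```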

## 5.2 Essential Functionality

### Reindexing
- `reindex` function arguments
    - `index=`
    - `method=`
    - `fill_value=`
    - `limit=`
    - `tolerance=`
    - `level=`
    - `copy=`
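
A minimal sketch of two of the arguments above, `method=` and `fill_value=`, on small made-up Series:

```python
import pandas as pd

obj = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

# method="ffill" carries the last valid value forward into the new labels
filled = obj.reindex(range(6), method="ffill")

nums = pd.Series([1.0, 2.0], index=["a", "c"])

# fill_value replaces the NaN that a new label ("b") would otherwise get
padded = nums.reindex(["a", "b", "c"], fill_value=0.0)

print(filled)
print(padded)
```

`reindex` returns a new object in both cases; `obj` and `nums` are left unchanged.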

### Dropping Entries from an Axis
- `drop` function arguments
    - `axis=`
    - `inplace=`
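
The difference `inplace=` makes can be sketched on a throwaway Series (the values are illustrative):

```python
import pandas as pd

obj = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Default: drop returns a new object and leaves obj untouched
new = obj.drop("a")

# inplace=True: obj itself is modified and the call returns None
result = obj.drop("b", inplace=True)

print(list(new.index))   # labels of the returned copy
print(list(obj.index))   # labels of the modified original
print(result)
```

Because `inplace=True` destroys the dropped data, the default form is usually the safer choice.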

### Indexing, Selection, and Filtering
- Indexing options with DataFrame
    - `df[val]`
    - `df.loc[val]`
    - `df.loc[:, val]`
    - `df.iloc[where]`
    - `df.iloc[:, where]`
    - `df.iloc[where_i, where_j]`
    - `df.at[label_i, label_j]`
    - `df.iat[i, j]`
    - `reindex()`
    - `get_value(), set_value()` (deprecated and removed in pandas 1.0; use `at` / `iat` instead)
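
The indexing options above, applied to a small made-up frame (labels `r1`, `r2` and `x`, `y`, `z` are placeholders):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=["r1", "r2"],
                  columns=["x", "y", "z"])

col = df["y"]                    # single column by name
row = df.loc["r2"]               # single row by label
col_by_label = df.loc[:, "z"]    # single column by label
row_by_pos = df.iloc[0]          # single row by integer position
sub = df.iloc[1, [0, 2]]         # row 1, columns 0 and 2 by position
scalar_label = df.at["r2", "y"]  # single scalar by row and column label
scalar_pos = df.iat[0, 2]        # single scalar by row and column position

print(scalar_label, scalar_pos)
```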
    

### Integer Indexes
To keep things consistent, if an axis index contains integers, data selection with `[]`
is always label-oriented.  
For unambiguous handling, use `loc` (for labels) or `iloc` (for integer positions).
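
A small sketch of why this matters, using a made-up Series whose integer labels deliberately differ from the positions:

```python
import pandas as pd

# Labels are 2, 1, 0 while positions are 0, 1, 2
ser = pd.Series([10.0, 20.0, 30.0], index=[2, 1, 0])

by_label = ser.loc[0]       # label 0 -> 30.0
by_position = ser.iloc[0]   # position 0 -> 10.0
plain = ser[0]              # integer index, so [] is label-oriented -> 30.0
last = ser.iloc[-1]         # negative positions only work through iloc

# With an integer index, -1 is looked up as a label and is not found
try:
    ser[-1]
    negative_lookup_failed = False
except KeyError:
    negative_lookup_failed = True

print(by_label, by_position, plain, last, negative_lookup_failed)
```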

## Exercises

### Integer Indexes

In [151]:
ser = pd.Series(np.arange(3.), index=['a', 'b', 'c']) ; ser

a    0.0
b    1.0
c    2.0
dtype: float64

In [152]:
ser[-1]

2.0

### Reindexing

In [132]:
frame = pd.DataFrame(np.arange(9).reshape((3,3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

In [133]:
frame.reindex(index = ['a', 'b', 'c', 'd'], columns = ['Texas', 'Utah', 'California'])

|   | Texas | Utah | California |
|---|---|---|---|
| a | 1.0 | NaN | 2.0 |
| b | NaN | NaN | NaN |
| c | 4.0 | NaN | 5.0 |
| d | 7.0 | NaN | 8.0 |


In [134]:
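# Many users prefer to use loc exclusively for this. Note that since pandas 1.0, loc raises
# a KeyError when any listed label is missing, so the output below comes from an older version: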
frame.loc[['a', 'b', 'c', 'd'], ['Texas', 'Utah', 'California']]

|   | Texas | Utah | California |
|---|---|---|---|
| a | 1.0 | NaN | 2.0 |
| b | NaN | NaN | NaN |
| c | 4.0 | NaN | 5.0 |
| d | 7.0 | NaN | 8.0 |


### Dropping Entries from an Axis

In [135]:
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns = ['one', 'two', 'three', 'four'])

In [136]:
obj.drop('b', axis='index')

d    4.5
a   -5.3
c    3.6
dtype: float64

In [137]:
data.drop(['Colorado', 'Ohio'])

|   | one | two | three | four |
|---|---|---|---|---|
| Utah | 8 | 9 | 10 | 11 |
| New York | 12 | 13 | 14 | 15 |


In [138]:
data.drop(['one', 'two'], axis='columns')

|   | three | four |
|---|---|---|
| Ohio | 2 | 3 |
| Colorado | 6 | 7 |
| Utah | 10 | 11 |
| New York | 14 | 15 |


### Indexing, Selection, and Filtering

In [139]:
obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
obj[[1, 3]]

b    1.0
d    3.0
dtype: float64

In [140]:
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns = ['one', 'two', 'three', 'four'])

In [141]:
data.loc[data['three'] > 5, 'three'] = 5 ; data

|   | one | two | three | four |
|---|---|---|---|---|
| Ohio | 0 | 1 | 2 | 3 |
| Colorado | 4 | 5 | 5 | 7 |
| Utah | 8 | 9 | 5 | 11 |
| New York | 12 | 13 | 5 | 15 |


In [142]:
data.loc['Colorado', ['two', 'three']]

two      5
three    5
Name: Colorado, dtype: int32

In [143]:
data.iloc[[1, 2], [3, 0, 1]]

|   | four | one | two |
|---|---|---|---|
| Colorado | 7 | 4 | 5 |
| Utah | 11 | 8 | 9 |


In [144]:
data.loc[:'Utah', 'two']

Ohio        1
Colorado    5
Utah        9
Name: two, dtype: int32

In [145]:
data.iloc[:, :3][data.two > 5]

|   | one | two | three |
|---|---|---|---|
| Utah | 8 | 9 | 5 |
| New York | 12 | 13 | 5 |
