## Indexing and Selecting Data in Series and DataFrames
Pandas User Guide

https://pandas.pydata.org/docs/user_guide/dsintro.html

https://pandas.pydata.org/docs/user_guide/indexing.html#indexing

| Operation                      | Syntax          | Result    |
|--------------------------------|-----------------|-----------|
| Select column                  | df[col]         | Series    |
| Select row by label            | df.loc[label]   | Series    |
| Select row by integer location | df.iloc[loc]    | Series    |
| Slice rows                     | df[5:10]        | DataFrame |
| Select rows by boolean vector  | df[bool_vector] | DataFrame |
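
As a rough illustration of the table above, the sketch below applies each operation to a small frame and prints the type of the result. The frame `demo` and its labels are made up for this example and do not come from the guide.

```python
import pandas as pd

# Hypothetical frame, used only to illustrate the table of operations.
demo = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, 4.0], "two": [10.0, 20.0, 30.0, 40.0]},
    index=["a", "b", "c", "d"],
)

col = demo["one"]              # select column -> Series
row_by_label = demo.loc["b"]   # select row by label -> Series
row_by_pos = demo.iloc[2]      # select row by integer location -> Series
row_slice = demo[1:3]          # slice rows by position -> DataFrame
mask = demo["one"] > 2.0       # boolean vector with one entry per row
filtered = demo[mask]          # select rows by boolean vector -> DataFrame

for name, result in [
    ("demo['one']", col),
    ("demo.loc['b']", row_by_label),
    ("demo.iloc[2]", row_by_pos),
    ("demo[1:3]", row_slice),
    ("demo[mask]", filtered),
]:
    print(name, "->", type(result).__name__)
```

Note that `demo[1:3]` slices rows by position even though the index holds string labels.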

In [9]:
import numpy as np
import pandas as pd

In [10]:
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

In [11]:
df = pd.DataFrame(d)
df

,one,two
a,1.0,1.0
b,2.0,2.0
c,3.0,3.0
d,,4.0


In [12]:
df.loc["b"]

one    2.0
two    2.0
Name: b, dtype: float64

In [13]:
df.iloc[2]

one    3.0
two    3.0
Name: c, dtype: float64

### Basics

| Object Type | Indexers                            |
|-------------|-------------------------------------|
| Series      | s.loc[indexer]                      |
| DataFrame   | df.loc[row_indexer, column_indexer] |

| Object Type | Selection      | Return Value Type               |
|-------------|----------------|---------------------------------|
| Series      | series[label]  | Scalar value                    |
| DataFrame   | frame[colname] | Series corresponding to colname |
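
The two tables can be tried out on a small frame. This is only a sketch: the names `frame`, `series`, and the labels are invented for the example, and the comments state the return types expected from the tables above.

```python
import pandas as pd

# Hypothetical data for illustration only.
frame = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

series = frame["one"]                    # frame[colname] -> Series
scalar = series["b"]                     # series[label] -> scalar value

cell = frame.loc["b", "two"]             # row label + column label -> scalar
sub = frame.loc[["a", "c"], ["two"]]     # lists on both axes -> DataFrame
col_slice = frame.loc["a":"b", "one"]    # label slice includes both endpoints

print(type(series).__name__, scalar, cell)
print(sub)
print(col_slice)
```

Unlike positional slices, a label slice passed to `.loc` includes its stop label, so `"a":"b"` returns two rows.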

In [17]:
dates = pd.date_range('1/1/2000', periods=8)
dates

DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08'],
              dtype='datetime64[ns]', freq='D')

In [16]:
df = pd.DataFrame(np.random.randn(8, 4),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df

,A,B,C,D
2000-01-01,-1.326514,-0.064915,0.150867,0.733479
2000-01-02,-1.154847,0.272794,-1.074502,2.591011
2000-01-03,0.743008,-0.607824,-1.66604,1.449187
2000-01-04,-0.901412,-0.507857,1.046743,1.081515
2000-01-05,-1.412363,-1.512432,-0.67748,-0.271114
2000-01-06,-1.076963,2.445488,-1.469793,1.234474
2000-01-07,0.39009,-1.377107,0.724461,-0.260275
2000-01-08,-0.908638,1.797682,1.395009,-1.469394


In [18]:
s = df['A']

In [19]:
s[dates[5]]

-1.0769629343352927

In [20]:
df

,A,B,C,D
2000-01-01,-1.326514,-0.064915,0.150867,0.733479
2000-01-02,-1.154847,0.272794,-1.074502,2.591011
2000-01-03,0.743008,-0.607824,-1.66604,1.449187
2000-01-04,-0.901412,-0.507857,1.046743,1.081515
2000-01-05,-1.412363,-1.512432,-0.67748,-0.271114
2000-01-06,-1.076963,2.445488,-1.469793,1.234474
2000-01-07,0.39009,-1.377107,0.724461,-0.260275
2000-01-08,-0.908638,1.797682,1.395009,-1.469394


In [21]:
df[['B', 'A']] = df[['A', 'B']]
df

,A,B,C,D
2000-01-01,-0.064915,-1.326514,0.150867,0.733479
2000-01-02,0.272794,-1.154847,-1.074502,2.591011
2000-01-03,-0.607824,0.743008,-1.66604,1.449187
2000-01-04,-0.507857,-0.901412,1.046743,1.081515
2000-01-05,-1.512432,-1.412363,-0.67748,-0.271114
2000-01-06,2.445488,-1.076963,-1.469793,1.234474
2000-01-07,-1.377107,0.39009,0.724461,-0.260275
2000-01-08,1.797682,-0.908638,1.395009,-1.469394


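Assigning with plain `[]` as above swaps the values of columns A and B. Assigning through `.loc` behaves differently: pandas aligns all axes when setting a Series or DataFrame through `.loc`.
The `.loc` assignment below therefore does not modify df, because the columns are aligned by label before the values are assigned, so each column receives its own values back.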

In [22]:
df[['A','B']]

,A,B
2000-01-01,-0.064915,-1.326514
2000-01-02,0.272794,-1.154847
2000-01-03,-0.607824,0.743008
2000-01-04,-0.507857,-0.901412
2000-01-05,-1.512432,-1.412363
2000-01-06,2.445488,-1.076963
2000-01-07,-1.377107,0.39009
2000-01-08,1.797682,-0.908638


In [24]:
df.loc[:, ['B', 'A']] = df[['A', 'B']]

In [25]:
df[['A','B']]

,A,B
2000-01-01,-0.064915,-1.326514
2000-01-02,0.272794,-1.154847
2000-01-03,-0.607824,0.743008
2000-01-04,-0.507857,-0.901412
2000-01-05,-1.512432,-1.412363
2000-01-06,2.445488,-1.076963
2000-01-07,-1.377107,0.39009
2000-01-08,1.797682,-0.908638


The correct way to swap column values through `.loc` is to assign the raw values, which carry no labels for pandas to align on:

In [26]:
df.loc[:, ['B', 'A']] = df[['A', 'B']].to_numpy()

In [27]:
df[['A','B']]

,A,B
2000-01-01,-1.326514,-0.064915
2000-01-02,-1.154847,0.272794
2000-01-03,0.743008,-0.607824
2000-01-04,-0.901412,-0.507857
2000-01-05,-1.412363,-1.512432
2000-01-06,-1.076963,2.445488
2000-01-07,0.39009,-1.377107
2000-01-08,-0.908638,1.797682
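
The difference between the aligned and the raw-value assignment is easier to see on fixed data than on random numbers. The following is a minimal sketch; the frame `demo` and the copies `aligned` and `swapped` are hypothetical names introduced for this example.

```python
import pandas as pd

# Hypothetical frame with easily recognizable values.
demo = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

# Right-hand side is a DataFrame: .loc aligns on column labels first,
# so A is assigned from A and B from B, and nothing changes.
aligned = demo.copy()
aligned.loc[:, ["B", "A"]] = aligned[["A", "B"]]

# Right-hand side is a NumPy array: there are no labels to align on,
# so values are assigned by position and the columns are swapped.
swapped = demo.copy()
swapped.loc[:, ["B", "A"]] = swapped[["A", "B"]].to_numpy()

print(aligned)
print(swapped)
```

Working on copies keeps `demo` intact, so the two results can be compared side by side.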


### Attribute access

In [30]:
sa = pd.Series([1, 2, 3], index=list('abc'))
dfa = df.copy()

In [32]:
sa.b

2

In [33]:
dfa.A

2000-01-01   -1.326514
2000-01-02   -1.154847
2000-01-03    0.743008
2000-01-04   -0.901412
2000-01-05   -1.412363
2000-01-06   -1.076963
2000-01-07    0.390090
2000-01-08   -0.908638
Freq: D, Name: A, dtype: float64

In [35]:
sa.a = 5
sa

a    5
b    2
c    3
dtype: int64

In [36]:
dfa.A = list(range(len(dfa.index)))  # ok if A already exists
dfa

,A,B,C,D
2000-01-01,0,-0.064915,0.150867,0.733479
2000-01-02,1,0.272794,-1.074502,2.591011
2000-01-03,2,-0.607824,-1.66604,1.449187
2000-01-04,3,-0.507857,1.046743,1.081515
2000-01-05,4,-1.512432,-0.67748,-0.271114
2000-01-06,5,2.445488,-1.469793,1.234474
2000-01-07,6,-1.377107,0.724461,-0.260275
2000-01-08,7,1.797682,1.395009,-1.469394


In [37]:
dfa['E'] = list(range(len(dfa.index)))  # use this form to create a new column
dfa

,A,B,C,D,E
2000-01-01,0,-0.064915,0.150867,0.733479,0
2000-01-02,1,0.272794,-1.074502,2.591011,1
2000-01-03,2,-0.607824,-1.66604,1.449187,2
2000-01-04,3,-0.507857,1.046743,1.081515,3
2000-01-05,4,-1.512432,-0.67748,-0.271114,4
2000-01-06,5,2.445488,-1.469793,1.234474,5
2000-01-07,6,-1.377107,0.724461,-0.260275,6
2000-01-08,7,1.797682,1.395009,-1.469394,7


In [38]:
x = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})
x

,x,y
0,1,3
1,2,4
2,3,5


In [39]:
x.iloc[1] = {'x': 9, 'y': 99}
x

,x,y
0,1,3
1,9,99
2,3,5


In [40]:
df = pd.DataFrame({'one': [1., 2., 3.]})
df.two = [4, 5, 6]  # emits a UserWarning: this sets an attribute and does not create a column


In [41]:
df['two'] = [4.,5.,6.]
df

,one,two
0,1.0,4.0
1,2.0,5.0
2,3.0,6.0
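
The attribute-assignment pitfall shown in the last two cells can be checked explicitly. This sketch uses Python's standard `warnings` module to capture the warning; the variable names `frame`, `caught`, and `created_column` are made up for the example.

```python
import warnings

import pandas as pd

frame = pd.DataFrame({"one": [1.0, 2.0, 3.0]})

# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    frame.two = [4, 5, 6]  # sets a plain Python attribute, not a column

created_column = "two" in frame.columns
print("column created by attribute assignment:", created_column)
print("warning categories:", [w.category.__name__ for w in caught])

# Item assignment is the form that creates a new column.
frame["two"] = [4.0, 5.0, 6.0]
print(list(frame.columns))
```

Attribute assignment only updates a column when a column of that name already exists, which is why `dfa.A = ...` worked earlier while `df.two = ...` did not.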


### Slicing ranges

In [42]:
s[:5]

2000-01-01   -1.326514
2000-01-02   -1.154847
2000-01-03    0.743008
2000-01-04   -0.901412
2000-01-05   -1.412363
Freq: D, Name: A, dtype: float64

In [43]:
s[::2]

2000-01-01   -1.326514
2000-01-03    0.743008
2000-01-05   -1.412363
2000-01-07    0.390090
Freq: 2D, Name: A, dtype: float64

In [44]:
s[::-1]

2000-01-08   -0.908638
2000-01-07    0.390090
2000-01-06   -1.076963
2000-01-05   -1.412363
2000-01-04   -0.901412
2000-01-03    0.743008
2000-01-02   -1.154847
2000-01-01   -1.326514
Freq: -1D, Name: A, dtype: float64
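
The same slice syntax can be tried on fixed data, where the results are easy to verify by eye. This is a small sketch with made-up names (`ser`, `frame`); it also applies a slice to a DataFrame, which selects rows.

```python
import pandas as pd

# Hypothetical Series with string labels; integer slices act on positions.
ser = pd.Series([10, 20, 30, 40], index=list("abcd"))

first_two = ser[:2]       # first two elements
every_other = ser[::2]    # every second element
reversed_ser = ser[::-1]  # all elements in reverse order

# With a DataFrame, slicing inside [] selects rows.
frame = pd.DataFrame({"x": [1, 2, 3, 4]}, index=list("abcd"))
middle_rows = frame[1:3]

print(first_two)
print(every_other)
print(reversed_ser)
print(middle_rows)
```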