# Pandas
### Python data analysis library

# Series and DataFrames
- a pandas Series can be viewed as a one-dimensional, vertical array with labeled rows. We call these row labels the index.
- a pandas DataFrame can be viewed as a collection of Series (its columns), all sharing the same index
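The "collection of Series" picture can be made concrete by building a DataFrame from a dict of Series that share one index. This is a minimal sketch; the names `p` and `q` and their values are made up for illustration.

```python
import pandas as pd

# Two Series that share the same index (the labels 'a', 'b', 'c')
p = pd.Series([0, 1, 2], index=list('abc'))
q = pd.Series([10, 20, 30], index=list('abc'))

# Each dict key becomes a column label; each Series becomes a column
df = pd.DataFrame({'P': p, 'Q': q})
print(df)

# Pulling a column back out gives a Series carrying the shared index
print(df['Q'])
```

If the Series had different indexes, pandas would align them on the union of their labels and fill the gaps with NaN, which is one reason the shared index matters.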

In [2]:
import numpy as np
import pandas as pd

pd.Series(range(3))

0    0
1    1
2    2
dtype: int32

In [3]:
pd.Series(range(3), index=list('abc'))

a    0
b    1
c    2
dtype: int32
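The index is what lets you address a Series by label as well as by position. A small sketch reusing the Series from the cell above (the variable names are our own):

```python
import pandas as pd

s = pd.Series(range(3), index=list('abc'))

# Label-based access goes through the index
by_label = s.loc['b']
# Position-based access ignores the labels
by_position = s.iloc[1]

print(by_label, by_position)
```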

In [4]:
pd.DataFrame(np.arange(6).reshape(2,3), index = [2014, 2015], columns=list("PQR"))

| | P | Q | R |
|---|---|---|---|
| 2014 | 0 | 1 | 2 |
| 2015 | 3 | 4 | 5 |


# Columns first
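In contrast to plain Python and NumPy, columns come first in pandas DataFrames: for a 2-D NumPy array `mx`, `mx[0]` is the first _row_, but for a DataFrame `df`, `df[0]` is the _column_ labelled `0`.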

Once you understand this, things start to fall into place!

In [5]:
import numpy as np
import pandas as pd

mx = np.arange(6).reshape(2,3)
df = pd.DataFrame(mx)

print(df)
print('-' * 12)
print(mx[0])
print('-' * 12)
print(df[0])
print('-' * 12)
print(np.all(mx[0] == [0]))    # will work, not recommended
print('-' * 12)

df.index, df.columns = [2014, 2015], 'PSV Feyenoord Ajax'.split(' ')

print(df['PSV'])
print('-' * 12)
np.all(mx[0] == df.iloc[0])

   0  1  2
0  0  1  2
1  3  4  5
------------
[0 1 2]
------------
0    0
1    3
Name: 0, dtype: int32
------------
False
------------
2014    0
2015    3
Name: PSV, dtype: int32
------------


True
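The "columns first" rule also shows up outside of `[]`. A short sketch, assuming the PSV/Feyenoord/Ajax DataFrame from the cell above: iterating over a NumPy array yields its rows, whereas iterating over a DataFrame, or testing membership with `in`, works on the column labels.

```python
import numpy as np
import pandas as pd

mx = np.arange(6).reshape(2, 3)
df = pd.DataFrame(mx, index=[2014, 2015], columns=['PSV', 'Feyenoord', 'Ajax'])

# Iterating over a NumPy array yields its rows...
rows = [row.tolist() for row in mx]
# ...but iterating over a DataFrame yields its column labels
labels = [label for label in df]
print(rows)
print(labels)

# Membership with `in` also checks the column labels, not the index
print('PSV' in df)        # a column label
print(2014 in df)         # a row label, not found this way
print(2014 in df.index)   # ask the index explicitly
```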

# Addressing
Addressing (indexing) DataFrames can give you a real headache: there are so many possibilities. Remember:
1. columns come first in DataFrames
2. getting a single column returns a Series
3. getting more than one column returns a DataFrame
4. getting a single row (e.g. `df.iloc[1]`) returns a Series; getting several rows returns a DataFrame
5. free yourself from headaches and try to stick to `iloc[]` (by position) and `loc[]` (by label) to get NumPy-like addressing. The older `ix[]` indexer was deprecated and has been removed in pandas 1.0. You may want to read [this thorough explanation](http://goo.gl/w6dajh) as well.
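Checking the type of each result is the quickest way to keep these rules straight. A minimal sketch on the same DataFrame as the reference example that follows (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=range(2014, 2017), columns=list("PQRS"))

one_col = df['P']            # single column label -> Series
two_cols = df[['P', 'Q']]    # list of column labels -> DataFrame
one_row = df.iloc[1]         # single row position -> Series
one_row_df = df.iloc[[1]]    # list holding one row position -> DataFrame
two_rows = df.iloc[1:3]      # slice of row positions -> DataFrame

for name, obj in [('one_col', one_col), ('two_cols', two_cols),
                  ('one_row', one_row), ('one_row_df', one_row_df),
                  ('two_rows', two_rows)]:
    print(name, type(obj).__name__)
```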

Below is our reference DataFrame for the examples to follow.

In [6]:
df = pd.DataFrame(np.arange(12).reshape(3,4), index=range(2014, 2017), columns=list("PQRS"))
df

| | P | Q | R | S |
|---|---|---|---|---|
| 2014 | 0 | 1 | 2 | 3 |
| 2015 | 4 | 5 | 6 | 7 |
| 2016 | 8 | 9 | 10 | 11 |


### Get the first column of a DataFrame
This returns a Series with the DataFrame index as index.

In [7]:
df['P']

2014    0
2015    4
2016    8
Name: P, dtype: int32

In [8]:
df.P

2014    0
2015    4
2016    8
Name: P, dtype: int32
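Attribute access (`df.P`) is convenient, but it only works when the column name is a valid Python identifier that doesn't clash with an existing DataFrame attribute or method. A sketch with two made-up column names, `count` and `my col`, that show where it breaks down:

```python
import pandas as pd

df = pd.DataFrame({'P': [0, 4, 8], 'count': [1, 2, 3], 'my col': [5, 6, 7]})

# Attribute access works for a plain identifier-like column name
print(df.P.equals(df['P']))

# 'count' clashes with the DataFrame.count method, so df.count is the method
print(callable(df.count))
print(df['count'].tolist())

# 'my col' contains a space, so only bracket access can reach it
print(df['my col'].tolist())
```

Bracket access works for every column name, which is why it is the safer habit.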

### Get the first two columns of a DataFrame
This returns a DataFrame.

In [9]:
df[['P', 'Q']]

| | P | Q |
|---|---|---|
| 2014 | 0 | 1 |
| 2015 | 4 | 5 |
| 2016 | 8 | 9 |


In [10]:
df.iloc[:, 0:2]

| | P | Q |
|---|---|---|
| 2014 | 0 | 1 |
| 2015 | 4 | 5 |
| 2016 | 8 | 9 |
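The `ix[]` indexer mixed label and position semantics; it was deprecated in pandas 0.20 and removed in 1.0. A sketch of the two explicit replacements on the reference DataFrame: `iloc[]` slices by position (end excluded) and `loc[]` slices by label (end included), and here both select columns P and Q.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=range(2014, 2017), columns=list("PQRS"))

by_position = df.iloc[:, 0:2]    # columns at positions 0 and 1 (stop 2 excluded)
by_label = df.loc[:, 'P':'Q']    # columns labelled 'P' through 'Q' (stop included)

print(by_position.equals(by_label))
```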


### Get the second row of a DataFrame
This returns a Series with the column names as index.
Use `loc[]` instead if you want to select the row by its index label rather than its position, e.g. `df.loc[2015]`.

In [11]:
df.iloc[1]

P    4
Q    5
R    6
S    7
Name: 2015, dtype: int32
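The same row can be fetched by its label as well. A minimal sketch on the reference DataFrame, which also shows that passing a list of positions (double brackets) keeps the result a DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=range(2014, 2017), columns=list("PQRS"))

row_by_position = df.iloc[1]   # second row, by position -> Series
row_by_label = df.loc[2015]    # same row, by its index label -> Series
print(row_by_position.equals(row_by_label))

# Wrapping the position in a list keeps the result a one-row DataFrame
print(type(df.iloc[[1]]).__name__)
```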

### Get the second and third row of a DataFrame
This returns a DataFrame.

In [12]:
df.iloc[1:3]

| | P | Q | R | S |
|---|---|---|---|---|
| 2015 | 4 | 5 | 6 | 7 |
| 2016 | 8 | 9 | 10 | 11 |


### Get the DataFrame segment consisting of the first two rows and the first two columns
This returns a DataFrame.

In [13]:
df.iloc[:2,:2]

| | P | Q |
|---|---|---|
| 2014 | 0 | 1 |
| 2015 | 4 | 5 |


In [14]:
df.iloc[:2].loc[:,'P':'Q']

| | P | Q |
|---|---|---|
| 2014 | 0 | 1 |
| 2015 | 4 | 5 |
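The chained expression above first selects rows by position and then columns by label. A sketch of a single-step alternative, which uses `df.index[:2]` to turn the first two row positions into labels so that one `loc[]` call does both:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=range(2014, 2017), columns=list("PQRS"))

# Chained version: positions first, then labels
chained = df.iloc[:2].loc[:, 'P':'Q']

# Single-step version: translate row positions into labels via df.index
single = df.loc[df.index[:2], 'P':'Q']

print(chained.equals(single))
```

Both give the same result when reading. The single-step form matters mostly when assigning: writing through a chain such as `df.iloc[:2].loc[:, 'P'] = 0` may not update `df` itself, and pandas usually warns about it.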


In [15]:
df.loc[2014:2015, 'P':'Q']

| | P | Q |
|---|---|---|
| 2014 | 0 | 1 |
| 2015 | 4 | 5 |


So keep in mind: when you slice by label (with `loc[]`) rather than by position, slicing is _up-to-and-including_: the end label is part of the result!
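A small sketch contrasting the two kinds of slicing on the reference DataFrame: the positional stop is excluded, the label stop is included.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=range(2014, 2017), columns=list("PQRS"))

# Positional slice: stop position 2 is excluded -> positions 0 and 1
pos_rows = df.iloc[0:2].index.tolist()

# Label slice: stop label 2016 is included -> all three years
label_rows = df.loc[2014:2016].index.tolist()

# The same holds for columns: 'R' is part of the label slice
label_cols = df.loc[:, 'P':'R'].columns.tolist()

print(pos_rows, label_rows, label_cols)
```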

# Word of advice (... or wisdom, I dare say)
This is only a really, really brief introduction to pandas (and NumPy alike). Even after having done all the exercises (which is gonna take you time!), you're still a pandas noob. An educated noob, that is. And the adjective makes all the difference: once you've laid the foundation, however small, you've set yourself on the path of ... helping yourself.

Struggling through the exercises (as your instructors did) and looking things up in the manuals gives you a basic feel for what the package has to offer and shapes your expectations of what other functionality might be available. In short: you start developing an **intuition** for what pandas (and Python, NumPy, sklearn, etc.) can do.

It's exactly like ... learning.