Unless your data source defines its columns explicitly, as a database table does, **the first row in your data will be used as the column headers.**

Therefore, if your data starts on the first line and you don't actually have a header row, pass the **names** parameter (a list of column header names) when you call the .read_*() method.

Pandas will then use the names you provide as the column headers and keep your first line as data instead of consuming it as a header.
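As a minimal sketch of the **names** parameter, the cell below reads a headerless CSV from an in-memory string. The sample rows and the column names `a`, `b`, `c` are made up for illustration, and `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical headerless data: the first line is a real record, not a header.
raw = "1.5,2.5,3.5\n4.0,5.0,6.0\n"

# Without `names`, pandas treats the first line as the column headers.
wrong = pd.read_csv(io.StringIO(raw))

# With `names`, both lines are kept as data and the columns get our labels.
right = pd.read_csv(io.StringIO(raw), names=["a", "b", "c"])

print(len(wrong))            # rows left after the first line became the header
print(len(right))            # rows kept when names are supplied
print(list(right.columns))
```

If the file does have a header row that you want to discard, passing `header=0` together with `names` should replace it rather than keep it as data.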

If your dataset already has column titles but you wish to rename them, assign a new list to the **.columns** property:

my_dataframe.columns = ['new', 'column', 'header', 'labels']
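A small runnable sketch of the same idea follows. The frame contents and the original labels are placeholders; `rename()` is shown as an alternative for when only some labels need to change.

```python
import pandas as pd

# Small illustrative frame; the values and starting labels are placeholders.
my_dataframe = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                            columns=["a", "b", "c", "d"])

# Assigning to .columns replaces every label at once. The list length
# must match the number of columns, otherwise pandas raises a ValueError.
my_dataframe.columns = ["new", "column", "header", "labels"]

# rename() changes only the labels listed in the mapping and returns a new frame.
renamed = my_dataframe.rename(columns={"new": "first"})

print(list(my_dataframe.columns))
print(list(renamed.columns))
```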

## Indexing

(Uses Module 2 files)

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv(r"C:\Users\aaaaaaaa\Desktop\edX ML\Module 2\Module2\Datasets\tutorial.csv")

In [4]:
df

       col0      col1      col2      col3
0 -0.722876 -1.330682  1.309208  0.232378
1  1.160396 -0.730879  0.677368  1.044722
2 -1.062870 -0.503704 -0.238536 -1.417937
3  0.437078  0.362640 -0.111228 -1.649853


In [5]:
df.col0

0   -0.722876
1    1.160396
2   -1.062870
3    0.437078
Name: col0, dtype: float64

In [6]:
df["col0"]

0   -0.722876
1    1.160396
2   -1.062870
3    0.437078
Name: col0, dtype: float64

#### df['col0'] returns a Series, while df[['col0']] returns a DataFrame. With the latter you can pass in additional comma-separated column names to select several columns at once.

In [7]:
df[["col0"]]

       col0
0 -0.722876
1  1.160396
2 -1.062870
3  0.437078
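To make the single-bracket versus double-bracket difference concrete, here is a sketch that rebuilds the tutorial frame from the values printed earlier (so it runs without the CSV file) and checks the type of each result.

```python
import pandas as pd

# Rebuild the tutorial frame from the values shown in the notebook output.
df = pd.DataFrame({
    "col0": [-0.722876, 1.160396, -1.062870, 0.437078],
    "col1": [-1.330682, -0.730879, -0.503704, 0.362640],
    "col2": [1.309208, 0.677368, -0.238536, -0.111228],
    "col3": [0.232378, 1.044722, -1.417937, -1.649853],
})

single = df["col0"]               # a string key returns a Series
one_col = df[["col0"]]            # a list key returns a DataFrame, even for one column
two_cols = df[["col0", "col2"]]   # a list key can name several columns

print(type(single).__name__)
print(type(one_col).__name__)
print(two_cols.shape)
```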


#### The .loc[ ] selector is used to select by index label (here, the column name)

In [8]:
df.loc[:,"col0"]

0   -0.722876
1    1.160396
2   -1.062870
3    0.437078
Name: col0, dtype: float64

In [9]:
df.loc[:,["col0"]]

       col0
0 -0.722876
1  1.160396
2 -1.062870
3  0.437078
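Label-based selection is easier to see when the rows carry non-integer labels. The sketch below uses the first two rows of the tutorial data with hypothetical row labels `"a"` and `"b"`, which are not part of the original file.

```python
import pandas as pd

df = pd.DataFrame(
    {"col0": [-0.722876, 1.160396], "col1": [-1.330682, -0.730879]},
    index=["a", "b"],  # hypothetical string row labels, for illustration only
)

# Row label and column label together return a single value.
cell = df.loc["b", "col1"]

# Label slices include both endpoints, unlike ordinary Python slices.
rows = df.loc["a":"b", ["col0"]]

print(cell)
print(rows.shape)
```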


#### The .iloc[ ] selector is used to select by integer index position

In [10]:
df.iloc[:,0]

0   -0.722876
1    1.160396
2   -1.062870
3    0.437078
Name: col0, dtype: float64

In [11]:
df.iloc[:,[0]]

       col0
0 -0.722876
1  1.160396
2 -1.062870
3  0.437078
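A few more positional selections, sketched on a frame rebuilt from the tutorial values. Positions are zero-based, and slices are half-open as in ordinary Python.

```python
import pandas as pd

# Rebuild the tutorial frame from the values shown in the notebook output.
df = pd.DataFrame({
    "col0": [-0.722876, 1.160396, -1.062870, 0.437078],
    "col1": [-1.330682, -0.730879, -0.503704, 0.362640],
    "col2": [1.309208, 0.677368, -0.238536, -0.111228],
    "col3": [0.232378, 1.044722, -1.417937, -1.649853],
})

block = df.iloc[0:2, 0:2]          # first two rows, first two columns
last_row = df.iloc[-1, :]          # negative positions count from the end
picked = df.iloc[[0, 2], [1, 3]]   # explicit lists of row and column positions

print(block.shape)
print(last_row.name)
print(list(picked.columns))
```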


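#### .ix[ ] was a hybrid selector that accepted either labels or integer positions. It was deprecated in pandas 0.20 and removed in pandas 1.0, so the two cells that follow only run on older versions; use .loc[ ] or .iloc[ ] in current code.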

In [12]:
df.ix[:,0]

0   -0.722876
1    1.160396
2   -1.062870
3    0.437078
Name: col0, dtype: float64

In [13]:
df.ix[:,[0]]

       col0
0 -0.722876
1  1.160396
2 -1.062870
3  0.437078
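On pandas versions where `.ix` is unavailable, one way to get the same mixed label and position behaviour is to translate one side into the other and then use `.loc` or `.iloc`. This is a sketch of that approach, not the only option; the row labels `"a"` and `"b"` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"col0": [-0.722876, 1.160396], "col1": [-1.330682, -0.730879]},
    index=["a", "b"],  # hypothetical string row labels, for illustration only
)

# Row by label, column by position: turn the position into a label for .loc.
by_label = df.loc["b", df.columns[0]]

# Row by position, column by label: turn the label into a position for .iloc.
by_pos = df.iloc[0, df.columns.get_loc("col1")]

print(by_label)
print(by_pos)
```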


In [15]:
df[0:1]

       col0      col1      col2      col3
0 -0.722876 -1.330682  1.309208  0.232378


In [16]:
df.iloc[0:1,:]

       col0      col1      col2      col3
0 -0.722876 -1.330682  1.309208  0.232378


### Boolean Indexing

In [17]:
df.col0 < 0 

0     True
1    False
2     True
3    False
Name: col0, dtype: bool

In [19]:
df[df.col0 < 0]

       col0      col1      col2      col3
0 -0.722876 -1.330682  1.309208  0.232378
2 -1.062870 -0.503704 -0.238536 -1.417937


#### Finer boolean indexing: combine conditions with | (or) and & (and), wrapping each condition in parentheses

In [20]:
df[(df.col0<0)|(df.col1<0)]

       col0      col1      col2      col3
0 -0.722876 -1.330682  1.309208  0.232378
1  1.160396 -0.730879  0.677368  1.044722
2 -1.062870 -0.503704 -0.238536 -1.417937
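The same pattern extends to the other element-wise operators. The sketch below rebuilds the two relevant columns from the tutorial values and compares `|` (or), `&` (and) and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than `<`.

```python
import pandas as pd

# The two columns used in the conditions, taken from the tutorial values.
df = pd.DataFrame({
    "col0": [-0.722876, 1.160396, -1.062870, 0.437078],
    "col1": [-1.330682, -0.730879, -0.503704, 0.362640],
})

either = df[(df.col0 < 0) | (df.col1 < 0)]       # at least one condition holds
both = df[(df.col0 < 0) & (df.col1 < 0)]         # both conditions hold
neither = df[~((df.col0 < 0) | (df.col1 < 0))]   # negation of `either`

print(list(either.index))
print(list(both.index))
print(list(neither.index))
```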


#### Writing to a Slice

**Take care when doing this, as you may run into problems with non-homogeneous dataframes (those that mix data types across columns). It is far safer, and generally makes more sense, to do this sort of operation on a per-column basis rather than across your entire dataframe.**

In [21]:
df[df<0]=100

In [22]:
df

         col0        col1        col2        col3
0  100.000000  100.000000    1.309208    0.232378
1    1.160396  100.000000    0.677368    1.044722
2  100.000000  100.000000  100.000000  100.000000
3    0.437078    0.362640  100.000000  100.000000
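A per-column version of the same write might look like the sketch below. It uses `col0` from the tutorial data plus a hypothetical text column `name`, added only to show that other columns are left untouched when the write is limited to one column.

```python
import pandas as pd

df = pd.DataFrame({
    "col0": [-0.722876, 1.160396, -1.062870, 0.437078],
    "name": ["w", "x", "y", "z"],  # hypothetical text column, not in the tutorial data
})

# Build the mask from one numeric column and write back to that column only.
df.loc[df["col0"] < 0, "col0"] = 100

print(df["col0"].tolist())
print(df["name"].tolist())
```

Passing the row mask and the column label to `.loc` in a single call keeps the comparison on a numeric column and writes to the original frame rather than to a temporary copy.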
