# Data Selection 

pandas offers several flexible ways to select data from a DataFrame: by column name, by index label, by integer position, and by boolean condition.

### Selecting Columns

There are several ways to select columns in pandas:

 1. Dot notation. This is not recommended because it fails for column names that contain spaces or that clash with an existing DataFrame attribute or method.
 
 **`Ex: data.column_name`**
 
 
 2. Square brackets with the column name as a string.
 
 **`Ex: data['column_name']`**
 
 
 3. Numeric indexing with the `iloc` selector.
 
 **`Ex: data.iloc[:, column_index_number]`**
 
 
 4. Square brackets with a list of column names, to select multiple columns.
 
 **`Ex: data[['column_name1', 'column_name2', ...]]`**
 
 
 5. Numeric indexing with the `iloc` selector and a list of column numbers.
 
 **`Ex: data.iloc[:, [col_index_number2, col_index_number1, col_index_number3, ...]]`**
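The five approaches above can be compared side by side on a small frame. This is a minimal sketch: `df` and `clash` are hypothetical toy frames (not the tennis file), built only so the example runs on its own. The last lines illustrate one reason dot notation is discouraged.

```python
import pandas as pd

# Hypothetical toy frame mirroring the first rows of the tennis data.
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rainy"],
    "temp": ["hot", "hot", "hot", "mild"],
    "humidity": ["high", "high", "high", "high"],
    "windy": [False, True, False, False],
    "play": ["no", "no", "yes", "yes"],
})

by_dot = df.outlook                      # 1. dot notation
by_name = df["outlook"]                  # 2. brackets + column name
by_pos = df.iloc[:, 0]                   # 3. integer position of the column
many_by_name = df[["outlook", "temp"]]   # 4. list of column names
many_by_pos = df.iloc[:, [2, 0]]         # 5. list of positions, in the order given

# Dot notation breaks when a column name clashes with a DataFrame method.
clash = pd.DataFrame({"count": [1, 2, 3]})
dot_gives_method = callable(clash.count)  # the method DataFrame.count, not the column
col = clash["count"]                      # brackets always return the column
```

Selecting a single column returns a `Series`; selecting with a list (approaches 4 and 5) returns a `DataFrame`, even if the list has one element.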

In [1]:
import numpy as np
import pandas as pd

### Import tennis data file

In [2]:
data = pd.read_csv('tennis.csv')

In [3]:
data

Unnamed: 0,outlook,temp,humidity,windy,play
0,sunny,hot,high,False,no
1,sunny,hot,high,True,no
2,overcast,hot,high,False,yes
3,rainy,mild,high,False,yes
4,rainy,cool,normal,False,yes
5,rainy,cool,normal,True,no
6,overcast,cool,normal,True,yes
7,sunny,mild,high,False,no
8,sunny,cool,normal,False,yes
9,rainy,mild,normal,False,yes


In [6]:
data.columns

Index(['outlook', 'temp', 'humidity', 'windy', 'play'], dtype='object')

In [5]:
data.index

RangeIndex(start=0, stop=14, step=1)

In [4]:
data['outlook'] # Select / View only "outlook" column

0        sunny
1        sunny
2     overcast
3        rainy
4        rainy
5        rainy
6     overcast
7        sunny
8        sunny
9        rainy
10       sunny
11    overcast
12    overcast
13       rainy
Name: outlook, dtype: object

In [32]:
data.outlook # select / view only "outlook" column using dot operator

0        sunny
1        sunny
2     overcast
3        rainy
4        rainy
5        rainy
6     overcast
7        sunny
8        sunny
9        rainy
10       sunny
11    overcast
12    overcast
13       rainy
Name: outlook, dtype: object

In [33]:
data.iloc[:,0] # Select / view all rows of the "outlook" column by its integer position

0        sunny
1        sunny
2     overcast
3        rainy
4        rainy
5        rainy
6     overcast
7        sunny
8        sunny
9        rainy
10       sunny
11    overcast
12    overcast
13       rainy
Name: outlook, dtype: object

In [34]:
data.iloc[:,0:3] # all rows of the first 3 columns

Unnamed: 0,outlook,temp,humidity
0,sunny,hot,high
1,sunny,hot,high
2,overcast,hot,high
3,rainy,mild,high
4,rainy,cool,normal
5,rainy,cool,normal
6,overcast,cool,normal
7,sunny,mild,high
8,sunny,cool,normal
9,rainy,mild,normal


In [35]:
data[['outlook','temp','humidity']]

Unnamed: 0,outlook,temp,humidity
0,sunny,hot,high
1,sunny,hot,high
2,overcast,hot,high
3,rainy,mild,high
4,rainy,cool,normal
5,rainy,cool,normal
6,overcast,cool,normal
7,sunny,mild,high
8,sunny,cool,normal
9,rainy,mild,normal


In [36]:
data.iloc[:,[2,3,4]]

Unnamed: 0,humidity,windy,play
0,high,False,no
1,high,True,no
2,high,False,yes
3,high,False,yes
4,normal,False,yes
5,normal,True,no
6,normal,True,yes
7,high,False,no
8,normal,False,yes
9,normal,False,yes


### Selecting rows

Rows in a DataFrame are typically selected with the `iloc` and `loc` selectors, or with logical selectors (a boolean condition on the values of a column or variable).

The basic methods to get your head around are:

 1. Numeric row selection using the `iloc` selector.
 
   **`Ex : data.iloc[0:10, :] # select the first 10 rows of all columns`**
   
   
 2. Label-based row selection using the `loc` selector. Labels come from the DataFrame's index: with the default `RangeIndex` they are the integers 0, 1, 2, and so on; after you set a custom index they are the labels you assigned.
 
   **`Ex : data.loc[10, :]`**
   
   
 3. Logical row selection using evaluated statements.
 
   **`Ex : data[data['outlook'] == 'sunny'] # select the rows where outlook is 'sunny'`**
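As a rough illustration of the three row-selection styles listed above, the sketch below uses a hypothetical four-row frame `df` rather than the tennis file, so that it runs without any external data.

```python
import pandas as pd

# Hypothetical toy frame; the default index gives the labels 0, 1, 2, 3.
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rainy"],
    "windy": [False, True, False, False],
    "play": ["no", "no", "yes", "yes"],
})

first_two = df.iloc[0:2, :]            # 1. positions 0 and 1 (stop is exclusive)
row_3 = df.loc[3, :]                   # 2. the row whose index label is 3
sunny = df[df["outlook"] == "sunny"]   # 3. rows where the condition is True

# Conditions combine with & (and), | (or), ~ (not); each part needs parentheses.
sunny_calm = df[(df["outlook"] == "sunny") & (~df["windy"])]
```

The comparison `df["outlook"] == "sunny"` produces a boolean `Series` aligned with the index, and the brackets keep only the rows where it is `True`.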

In [42]:
data.iloc[ 0:10, : ]

Unnamed: 0,outlook,temp,humidity,windy,play
0,sunny,hot,high,False,no
1,sunny,hot,high,True,no
2,overcast,hot,high,False,yes
3,rainy,mild,high,False,yes
4,rainy,cool,normal,False,yes
5,rainy,cool,normal,True,no
6,overcast,cool,normal,True,yes
7,sunny,mild,high,False,no
8,sunny,cool,normal,False,yes
9,rainy,mild,normal,False,yes


In [43]:
data.iloc[0:10,1:4]

Unnamed: 0,temp,humidity,windy
0,hot,high,False
1,hot,high,True
2,hot,high,False
3,mild,high,False
4,cool,normal,False
5,cool,normal,True
6,cool,normal,True
7,mild,high,False
8,cool,normal,False
9,mild,normal,False


In [44]:
data.iloc[0:10:2,2:5]

Unnamed: 0,humidity,windy,play
0,high,False,no
2,high,False,yes
4,normal,False,yes
6,normal,True,yes
8,normal,False,yes


In [45]:
data.loc[2]

outlook     overcast
temp             hot
humidity        high
windy          False
play             yes
Name: 2, dtype: object

In [46]:
data.loc[9] # Select / view the row labeled 9 (the 10th row, since labels start at 0)

outlook      rainy
temp          mild
humidity    normal
windy        False
play           yes
Name: 9, dtype: object

In [47]:
data.loc[[2,3,9]] # select the rows labeled 2, 3 and 9

Unnamed: 0,outlook,temp,humidity,windy,play
2,overcast,hot,high,False,yes
3,rainy,mild,high,False,yes
9,rainy,mild,normal,False,yes


In [48]:
data.loc[[2,3,9],['outlook','play']]

Unnamed: 0,outlook,play
2,overcast,yes
3,rainy,yes
9,rainy,yes


### data.iloc

Purely integer-location based indexing for selection by position.

``.iloc[]`` is primarily integer position based (from ``0`` to
``length-1`` of the axis), but may also be used with a boolean
array.
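A short sketch of `.iloc[]` with a boolean array, assuming a hypothetical frame `df` with string row labels. Note that `.iloc[]` expects a plain boolean array or list, so the boolean `Series` is converted with `to_numpy()` here.

```python
import pandas as pd

# Hypothetical toy frame with custom row labels.
df = pd.DataFrame(
    {"outlook": ["sunny", "sunny", "overcast", "rainy"],
     "play": ["no", "no", "yes", "yes"]},
    index=["R1", "R2", "R3", "R4"],
)

# Boolean mask as a NumPy array: one True/False per row position.
mask = (df["play"] == "yes").to_numpy()
played = df.iloc[mask]     # keeps the positions where mask is True

last_row = df.iloc[-1]     # negative positions count from the end
```

Positions ignore the index labels entirely, which is why `iloc` keeps working unchanged after the index is replaced.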

### data.loc

Access a group of rows and columns by label(s) or a boolean array.

``.loc[]`` is primarily label based, but may also be used with a
boolean array.
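One difference worth keeping in mind is that a `.loc[]` label slice includes its end label, whereas an `.iloc[]` position slice excludes its stop position. The sketch below shows this on a hypothetical frame `df`, along with a boolean selection combined with a column label.

```python
import pandas as pd

# Hypothetical toy frame with custom row labels.
df = pd.DataFrame(
    {"outlook": ["sunny", "sunny", "overcast", "rainy"],
     "play": ["no", "no", "yes", "yes"]},
    index=["R1", "R2", "R3", "R4"],
)

by_label = df.loc["R1":"R3"]   # label slice: R1, R2 and R3 (end included)
by_pos = df.iloc[0:3]          # position slice: 0, 1 and 2 (stop excluded)

# A boolean Series selects rows; the second argument selects a column by label.
yes_outlook = df.loc[df["play"] == "yes", "outlook"]
```

Both slices return the same three rows here, even though the `iloc` stop value is one past the last row selected.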

In [49]:
data

Unnamed: 0,outlook,temp,humidity,windy,play
0,sunny,hot,high,False,no
1,sunny,hot,high,True,no
2,overcast,hot,high,False,yes
3,rainy,mild,high,False,yes
4,rainy,cool,normal,False,yes
5,rainy,cool,normal,True,no
6,overcast,cool,normal,True,yes
7,sunny,mild,high,False,no
8,sunny,cool,normal,False,yes
9,rainy,mild,normal,False,yes


In [50]:
data.index = ['R1','R2','R3','R4','R5','R6','R7','R8','R9','R10','R11','R12','R13','R14']

In [51]:
data

Unnamed: 0,outlook,temp,humidity,windy,play
R1,sunny,hot,high,False,no
R2,sunny,hot,high,True,no
R3,overcast,hot,high,False,yes
R4,rainy,mild,high,False,yes
R5,rainy,cool,normal,False,yes
R6,rainy,cool,normal,True,no
R7,overcast,cool,normal,True,yes
R8,sunny,mild,high,False,no
R9,sunny,cool,normal,False,yes
R10,rainy,mild,normal,False,yes


In [52]:
data.iloc[0:10]

Unnamed: 0,outlook,temp,humidity,windy,play
R1,sunny,hot,high,False,no
R2,sunny,hot,high,True,no
R3,overcast,hot,high,False,yes
R4,rainy,mild,high,False,yes
R5,rainy,cool,normal,False,yes
R6,rainy,cool,normal,True,no
R7,overcast,cool,normal,True,yes
R8,sunny,mild,high,False,no
R9,sunny,cool,normal,False,yes
R10,rainy,mild,normal,False,yes


In [53]:
data.loc['R1':'R10']

Unnamed: 0,outlook,temp,humidity,windy,play
R1,sunny,hot,high,False,no
R2,sunny,hot,high,True,no
R3,overcast,hot,high,False,yes
R4,rainy,mild,high,False,yes
R5,rainy,cool,normal,False,yes
R6,rainy,cool,normal,True,no
R7,overcast,cool,normal,True,yes
R8,sunny,mild,high,False,no
R9,sunny,cool,normal,False,yes
R10,rainy,mild,normal,False,yes


In [54]:
data.iloc[0:10,0:2]

Unnamed: 0,outlook,temp
R1,sunny,hot
R2,sunny,hot
R3,overcast,hot
R4,rainy,mild
R5,rainy,cool
R6,rainy,cool
R7,overcast,cool
R8,sunny,mild
R9,sunny,cool
R10,rainy,mild


In [55]:
data.iloc[0:5,0:3]

Unnamed: 0,outlook,temp,humidity
R1,sunny,hot,high
R2,sunny,hot,high
R3,overcast,hot,high
R4,rainy,mild,high
R5,rainy,cool,normal


In [56]:
data.iloc[[2,1,3],[2,1]]

Unnamed: 0,humidity,temp
R3,high,hot
R2,high,hot
R4,high,mild


In [57]:
data.loc[['R3','R2','R4'],['humidity','temp']]

Unnamed: 0,humidity,temp
R3,high,hot
R2,high,hot
R4,high,mild
