# How do I select multiple rows and columns from a pandas DataFrame?


In [2]:
import pandas as pd

In [5]:
ufo = pd.read_csv('http://bit.ly/uforeports')

In [6]:
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


The .head() and .tail() methods return the first few and last few rows. However, loc offers more flexibility.

In [7]:
ufo.loc[0,:]

City                       Ithaca
Colors Reported               NaN
Shape Reported           TRIANGLE
State                          NY
Time               6/1/1930 22:00
Name: 0, dtype: object

loc is a DataFrame indexer for filtering rows and selecting columns by label. Row labels are the index values and column labels are the column names.

The format for loc is rows first, then columns. In the example above, 0 selects the row labeled 0 (the first row) and : selects all columns.

In [8]:
ufo.loc[[0,1,2],:]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00


To select multiple rows, pass a list of labels. In the example above, three rows and all columns are selected.
However, listing every row label is not required; a range expresses the same selection more concisely. Below is the example.

In [9]:
ufo.loc[0:2,:]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00


With loc, a range such as 0:2 is inclusive at both ends: the rows labeled 0 and 2 are both returned.
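The inclusive behaviour of a loc range can be contrasted with ordinary Python slicing. The sketch below is only an illustration: `ufo_sample` is a small hand-built stand-in for the first rows of the ufo data (an assumption, not the downloaded file), so it runs without network access.

```python
import pandas as pd

# Hand-built stand-in for the first rows of the ufo data (not the full dataset).
ufo_sample = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene"],
    "State": ["NY", "NJ", "CO", "KS"],
})

# Label-based range: both endpoints (labels 0 and 2) are included.
by_label = ufo_sample.loc[0:2, :]

# Plain Python list slicing for comparison: the stop position is excluded.
by_python = ufo_sample["City"].tolist()[0:2]

print(len(by_label))  # 3
print(by_python)      # ['Ithaca', 'Willingboro']
```

The same `0:2` therefore yields three rows with loc but only two items from a Python list.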

In [10]:
ufo.loc[:,'City'].head()

0                  Ithaca
1             Willingboro
2                 Holyoke
3                 Abilene
4    New York Worlds Fair
Name: City, dtype: object

The code above does the opposite: it extracts all rows of the City column.

In [11]:
ufo.loc[:,['City','State']].head()

Unnamed: 0,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO
3,Abilene,KS
4,New York Worlds Fair,NY


Here all rows of the City and State columns are selected. As with rows, a range of columns can also be selected.

Below is the example.

In [12]:
ufo.loc[0:2,'City':'State'].head()

Unnamed: 0,City,Colors Reported,Shape Reported,State
0,Ithaca,,TRIANGLE,NY
1,Willingboro,,OTHER,NJ
2,Holyoke,,OVAL,CO


A similar result can be achieved in a different way:

In [13]:
ufo.head(3).drop('Time',axis=1)

Unnamed: 0,City,Colors Reported,Shape Reported,State
0,Ithaca,,TRIANGLE,NY
1,Willingboro,,OTHER,NJ
2,Holyoke,,OVAL,CO


##### Using loc with boolean condition

In [14]:
ufo[ufo.City=='Holyoke']

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
2,Holyoke,,OVAL,CO,2/15/1931 14:00
771,Holyoke,,DISK,MA,1/20/1963 22:00
3221,Holyoke,,DISK,MA,6/5/1976 23:00
7982,Holyoke,,LIGHT,MA,4/1/1995 18:00
18211,Holyoke,,DIAMOND,MA,12/28/2000 18:00


Previously, the method above was used for filtering the DataFrame. A similar result can be achieved using loc.

In [15]:
ufo.loc[ufo.City=='Holyoke',:]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
2,Holyoke,,OVAL,CO,2/15/1931 14:00
771,Holyoke,,DISK,MA,1/20/1963 22:00
3221,Holyoke,,DISK,MA,6/5/1976 23:00
7982,Holyoke,,LIGHT,MA,4/1/1995 18:00
18211,Holyoke,,DIAMOND,MA,12/28/2000 18:00


Using loc is more flexible. In the example above, rows are filtered where City equals 'Holyoke', and the selection can be narrowed further, for instance to just the State column. Below is a sample.

In [20]:
ufo.loc[ufo.City=='Holyoke','State']

2        CO
771      MA
3221     MA
7982     MA
18211    MA
Name: State, dtype: object

Admittedly, a similar result could have been achieved using the previous method, as in the example below.

In [22]:
ufo[ufo.City=='Holyoke'].State

2        CO
771      MA
3221     MA
7982     MA
18211    MA
Name: State, dtype: object

The method above delivers the same result, but the code executes differently. Filtering rows with brackets and then reading .State from the result is known as chained indexing, which causes issues in certain scenarios, particularly when assigning values.

When a boolean selection of rows is followed by a column selection, .loc performs a single indexing operation, whereas the chained form requires two.
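One scenario where the difference matters is assignment. The sketch below is a minimal illustration, assuming a small hand-built `ufo_sample` frame rather than the real dataset: a single .loc call selects rows and column together, so the assignment lands on the original DataFrame.

```python
import pandas as pd

# Hand-built stand-in data (hypothetical, not the real ufo file).
ufo_sample = pd.DataFrame({
    "City": ["Ithaca", "Holyoke", "Holyoke"],
    "State": ["NY", "CO", "MA"],
})

# One .loc call picks the rows and the column in a single operation,
# so the assignment is applied to ufo_sample itself.
ufo_sample.loc[ufo_sample.City == "Holyoke", "State"] = "MA"

# The chained form, ufo_sample[ufo_sample.City == "Holyoke"].State = "MA",
# assigns into an intermediate object and is not guaranteed to change ufo_sample.

print(ufo_sample.State.tolist())  # ['NY', 'MA', 'MA']
```

For read-only selections both forms return the same values; the single .loc call is simply the safer habit when values are written.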

#### iloc - iloc is for filtering rows and selecting columns by integer position.

In [24]:
ufo.iloc[:,[0,3]].head()

Unnamed: 0,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO
3,Abilene,KS
4,New York Worlds Fair,NY


Like loc, iloc takes rows first and then columns. iloc also accepts ranges. Example below.

In [25]:
ufo.iloc[:,0:4].head()

Unnamed: 0,City,Colors Reported,Shape Reported,State
0,Ithaca,,TRIANGLE,NY
1,Willingboro,,OTHER,NJ
2,Holyoke,,OVAL,CO
3,Abilene,,DISK,KS
4,New York Worlds Fair,,LIGHT,NY


The output above contains the columns at positions 0 through 3, which shows that in an iloc range the start is inclusive and the stop is exclusive, whereas with .loc the stop is inclusive as well.

In [26]:
ufo.iloc[0:4,:].head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00


The same behaviour applies when rows are selected by range: 0:4 returns the rows at positions 0 through 3.

#### A few major differences between .loc and .iloc

1. loc selects by label; iloc selects by integer position.
2. loc ranges are inclusive of both the start and the stop; iloc ranges exclude the stop.
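Both differences are easier to see when the row labels do not coincide with the positions. The sketch below assumes a hand-built frame with the hypothetical index labels 10, 20, 30 and 40; it is an illustration, not part of the ufo data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene"],
        "State": ["NY", "NJ", "CO", "KS"],
    },
    index=[10, 20, 30, 40],  # hypothetical labels, chosen to differ from positions
)

# loc: labels 10 through 30 and columns City through State, all inclusive.
by_label = df.loc[10:30, "City":"State"]

# iloc: positions 0, 1, 2 and columns 0, 1 (the stop is excluded).
by_position = df.iloc[0:3, 0:2]

print(by_label.equals(by_position))  # True
print(df.iloc[0:2].index.tolist())   # [10, 20]
```

Here `df.loc[0]` would raise a KeyError because no row carries the label 0, while `df.iloc[0]` returns the first row.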

### PTR

In [30]:
ufo[['City','State']].head()

Unnamed: 0,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO
3,Abilene,KS
4,New York Worlds Fair,NY


The code above is one way to select columns from a DataFrame. The inner brackets are a list of strings; when a list of strings is passed to the outer brackets, pandas selects those columns. The other way to do this is with .loc.

In [32]:
ufo.loc[:,['City','State']].head()

Unnamed: 0,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO
3,Abilene,KS
4,New York Worlds Fair,NY


In [33]:
ufo[0:2]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00


The code above selects the first two rows and all columns, but bare brackets can be confusing because the same brackets select columns when given column names. To avoid the confusion, use iloc, which states both the rows and the columns explicitly. Below, iloc is used to select two rows and all columns from the DataFrame.

In [34]:
ufo.iloc[0:2,:]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00


### ix

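ix allows mixing labels and integer positions in one selection; it is a blend of loc and iloc. Note that ix is deprecated, as the warning in the output below states, and it was removed entirely in pandas 1.0, so the ix call below only runs on older versions.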

In [38]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry',index_col='country')

In [40]:
drinks.head(3)

country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
Afghanistan,0,0,0,0.0,Asia
Albania,89,132,54,4.9,Europe
Algeria,25,0,14,0.7,Africa


In [41]:
drinks.ix['Albania',0]

.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
  """Entry point for launching an IPython kernel.


89
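Since the warning recommends .loc and .iloc, the same mixed lookup (a row label with a column position) can be written with either one by translating one side first. The sketch below is one possible approach, not the only one; `drinks_sample` is a hand-built stand-in for the three rows shown above rather than the downloaded file.

```python
import pandas as pd

# Hand-built stand-in for the first three rows of the drinks data.
drinks_sample = pd.DataFrame(
    {
        "beer_servings": [0, 89, 25],
        "spirit_servings": [0, 132, 0],
        "wine_servings": [0, 54, 14],
    },
    index=pd.Index(["Afghanistan", "Albania", "Algeria"], name="country"),
)

# Option 1: turn the column position into a label, then use loc.
value_loc = drinks_sample.loc["Albania", drinks_sample.columns[0]]

# Option 2: turn the row label into a position, then use iloc.
row_pos = drinks_sample.index.get_loc("Albania")
value_iloc = drinks_sample.iloc[row_pos, 0]

print(value_loc, value_iloc)  # 89 89
```

Both lookups return 89, matching the result of the deprecated `drinks.ix['Albania',0]` call.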