# How do I select multiple rows and columns from a pandas DataFrame?

In [1]:
import pandas as pd

In [2]:
ufo = pd.read_csv('http://bit.ly/uforeports')

In [3]:
ufo.head()

,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


Using head() is the most straightforward way to look at the first few rows, but loc is also handy and far more flexible, so we will try that out.

## loc is a DataFrame indexer

loc is an indexer attribute of the DataFrame rather than a regular method, so we can do something like this:

In [4]:
ufo.loc[0, :]

City                       Ithaca
Colors Reported               NaN
Shape Reported           TRIANGLE
State                          NY
Time               6/1/1930 22:00
Name: 0, dtype: object

loc is for filtering rows and selecting columns by label. For rows, the label is the index value; for columns, it is the column name.

**`loc` is for selecting things by label**

One more thing: with loc you use square brackets, not parentheses.

In the output above, the command `ufo.loc[0, :]` returns the first row as a Series.
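To make the Series-versus-DataFrame point concrete, here is a minimal sketch. It assumes a small hand-built frame (`ufo_sample`, a hypothetical stand-in for the first rows of the UFO data) so that it runs without downloading anything.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of the UFO data
ufo_sample = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL"],
    "State": ["NY", "NJ", "CO"],
})

# A single row label gives back a Series
row_as_series = ufo_sample.loc[0, :]

# A list containing one row label gives back a one-row DataFrame
row_as_frame = ufo_sample.loc[[0], :]

print(type(row_as_series).__name__)  # Series
print(type(row_as_frame).__name__)   # DataFrame
```

Wrapping the label in a list is a handy trick when later code expects a DataFrame rather than a Series.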

Suppose I wanted the first 3 rows. I can pass a list of index labels, or a slice of labels:

In [5]:
ufo.loc[[0,1,2], :]

,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00


In [6]:
ufo.loc[0:2, :]

,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
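Notice that `0:2` returned three rows. Slices in loc are by label and include both endpoints, unlike ordinary Python slicing. A minimal sketch, assuming a small hand-built frame (`ufo_sample` is a hypothetical stand-in for the real data):

```python
import pandas as pd

# Hypothetical stand-in for the first five rows of the UFO data
ufo_sample = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "New York Worlds Fair"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "DISK", "LIGHT"],
    "State": ["NY", "NJ", "CO", "KS", "NY"],
})

# Label slice: rows labelled 0, 1 and 2 are all included
by_label = ufo_sample.loc[0:2, :]

# Ordinary Python slice for comparison: the end position is excluded
plain_list = list(range(5))[0:2]

# Column labels can be sliced too, again including both endpoints
by_cols = ufo_sample.loc[0:2, "City":"State"]

print(len(by_label))          # 3
print(len(plain_list))        # 2
print(list(by_cols.columns))  # ['City', 'Shape Reported', 'State']
```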


In [9]:
ufo.loc[0:2, ['City', 'State']]

,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO


## Using loc with boolean conditions

If I want to select all rows where the city equals 'Ithaca', I would do this:

In [10]:
ufo.loc[ufo.City == "Ithaca", :]

,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
4068,Ithaca,,CIGAR,NY,6/1/1979 19:00
5631,Ithaca,,OTHER,MI,6/1/1987 17:00
6961,Ithaca,,OTHER,NY,1/10/1993 0:30
7573,Ithaca,RED GREEN,LIGHT,NY,10/15/1994 18:00
9088,Ithaca,,,NY,2/16/1996 21:45
16537,Ithaca,,FLASH,MI,6/3/2000 22:35
17049,Ithaca,,TEARDROP,NY,7/30/2000 20:20
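What loc receives here is a boolean Series, one True/False per row. The sketch below shows that intermediate step and how conditions can be combined with a column selection. It assumes a tiny hand-built frame (`ufo_sample`, hypothetical) with simple 0 to 4 index labels rather than the real data.

```python
import pandas as pd

# Hypothetical stand-in with a few Ithaca sightings mixed in
ufo_sample = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Ithaca", "Ithaca"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "CIGAR", "OTHER"],
    "State": ["NY", "NJ", "CO", "NY", "MI"],
})

# The comparison produces a boolean Series aligned with the rows
is_ithaca = ufo_sample.City == "Ithaca"

# Passing it to loc keeps only the rows marked True
ithaca_rows = ufo_sample.loc[is_ithaca, :]

# Conditions combine with & (each side in parentheses),
# and the second argument still selects columns by label
ithaca_ny_shapes = ufo_sample.loc[
    is_ithaca & (ufo_sample.State == "NY"), "Shape Reported"
]

print(is_ithaca.tolist())         # [True, False, False, True, True]
print(ithaca_ny_shapes.tolist())  # ['TRIANGLE', 'CIGAR']
```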


## iloc is for integer position (that's what the i stands for)

I don't like to use iloc unless I have to. Period.
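For the cases where you do have to, here is a minimal sketch of how iloc behaves. It assumes a small hand-built frame (`ufo_sample`, hypothetical) and sorts it so that index labels and positions no longer match.

```python
import pandas as pd

# Hypothetical stand-in for the first five rows of the UFO data
ufo_sample = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene", "New York Worlds Fair"],
    "Shape Reported": ["TRIANGLE", "OTHER", "OVAL", "DISK", "LIGHT"],
    "State": ["NY", "NJ", "CO", "KS", "NY"],
})

# Position slice: the end position is excluded, so this is 2 rows
first_two = ufo_sample.iloc[0:2, :]

# Columns by position: first and third column
city_state = ufo_sample.iloc[:, [0, 2]]

# After sorting, labels and positions diverge
sorted_sample = ufo_sample.sort_values("City")
first_by_position = sorted_sample.iloc[0]["City"]   # whatever row comes first
row_labelled_zero = sorted_sample.loc[0, "City"]    # the row whose label is 0

print(len(first_two))             # 2
print(list(city_state.columns))   # ['City', 'State']
print(first_by_position)          # Abilene
print(row_labelled_zero)          # Ithaca
```

The last two lines are the main reason to be careful: iloc follows the current row order, while loc follows the labels.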

## ix allows you to mix integers and labels

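When doing selection, ix allows you to mix integers and labels. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the examples in this section only run on older versions.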

In [11]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry', index_col='country')

In [12]:
drinks.head()

country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
Afghanistan,0,0,0,0.0,Asia
Albania,89,132,54,4.9,Europe
Algeria,25,0,14,0.7,Africa
Andorra,245,138,312,12.4,Europe
Angola,217,57,45,5.9,Africa


Let's say you want to select the value 89, which is Albania's beer_servings:

In [13]:
drinks.ix['Albania', 0]

89

In this case, you are using a label to refer to the row and an integer position to refer to the column. This may sometimes be useful. The reverse also works:

In [14]:
drinks.ix[1, 'beer_servings']

89

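Basically, ix figures out whether you are referring to a position or a label, which gives you flexibility in retrieving data from the DataFrame. It also works with slices. In the example below, the label slice includes 'Angola', while the integer slice 0:2 excludes position 2.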

In [15]:
drinks.ix['Albania':'Angola', 0:2]

country,beer_servings,spirit_servings
Albania,89,132
Algeria,25,0
Andorra,245,138
Angola,217,57
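The same three selections can also be written with loc alone, which is the route to take on pandas 1.0 and later, where ix has been removed. This is one possible translation rather than the only one; it assumes a small hand-built frame (`drinks_sample`, hypothetical) copied from the first five rows shown above.

```python
import pandas as pd

# Hypothetical stand-in for the first five rows of the drinks data
drinks_sample = pd.DataFrame(
    {
        "beer_servings": [0, 89, 25, 245, 217],
        "spirit_servings": [0, 132, 0, 138, 57],
        "wine_servings": [0, 54, 14, 312, 45],
        "continent": ["Asia", "Europe", "Africa", "Europe", "Africa"],
    },
    index=pd.Index(
        ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
        name="country",
    ),
)

# Row by label, column by position: look the column label up first
beer_by_label = drinks_sample.loc["Albania", drinks_sample.columns[0]]

# Row by position, column by label: look the row label up first
beer_by_position = drinks_sample.loc[drinks_sample.index[1], "beer_servings"]

# Label slice for rows, positional slice for columns
block = drinks_sample.loc["Albania":"Angola", drinks_sample.columns[0:2]]

print(beer_by_label)     # 89
print(beer_by_position)  # 89
print(block.shape)       # (4, 2)
```

Converting positions to labels through `.columns` and `.index` keeps every selection explicit about which axis is addressed by label and which by position.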


Summary: Prefer loc, and don't use iloc or ix unless you have to.