[Pandas Data school](https://www.youtube.com/watch?v=yzIMircGU5I&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y)

In [1]:
import pandas as pd

## 1. How do I read tabular data into pandas?

In [2]:
# read_table can read directly from a url where the data file has been put.
# By default read_table assumes that the data is tab-separated.
# By default read_table assumes that the first row is a header row.
orders = pd.read_table('http://bit.ly/chiporders')

In [3]:
orders.head()

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98


Trying with another file that uses '|' as the separator

In [4]:
k = pd.read_table('http://bit.ly/movieusers', sep = '|')
k.head(10)

Unnamed: 0,1,24,M,technician,85711
0,2,53,F,other,94043
1,3,23,M,writer,32067
2,4,24,M,technician,43537
3,5,33,F,other,15213
4,6,42,M,executive,98101
5,7,57,M,administrator,91344
6,8,36,M,administrator,5201
7,9,29,M,student,1002
8,10,53,M,lawyer,90703
9,11,39,F,other,30329


Observe above that the first row of data (`1, 24, M, technician, 85711`) has been interpreted as the header row, which is wrong: the column labels are actually data values.

To tell pandas not to use the first row as the header row, pass `header = None`. The columns are then labelled with default integer names (0, 1, 2, ...).

In [5]:
k = pd.read_table('http://bit.ly/movieusers', sep = '|', header = None)
k.head(10)

Unnamed: 0,0,1,2,3,4
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213
5,6,42,M,executive,98101
6,7,57,M,administrator,91344
7,8,36,M,administrator,5201
8,9,29,M,student,1002
9,10,53,M,lawyer,90703


Now to give column names to the dataframe, we can pass a list of names to the `names` parameter of the function.

In [6]:
user_cols = ['user_id', 'age', 'gender', 'occupation', 'zipcode']
movie_users = pd.read_table('http://bit.ly/movieusers', sep = '|', header = None, names = user_cols)
movie_users.head(10)

Unnamed: 0,user_id,age,gender,occupation,zipcode
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213
5,6,42,M,executive,98101
6,7,57,M,administrator,91344
7,8,36,M,administrator,5201
8,9,29,M,student,1002
9,10,53,M,lawyer,90703


### Tips using read_table()

If a file contains notes about the dataset that need not be used for analysis, we can skip them by using :

- `skiprows` : to skip notes at the beginning of the file.
- `skipfooter` : to skip notes at the end of the file.
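
A minimal sketch of both parameters, assuming a made-up '|'-separated file with two note lines at the top and one at the bottom (the file contents and the name `users` are hypothetical, not from the datasets above):

```python
import io
import pandas as pd

# Hypothetical file contents: two note lines, a header row, three data rows, one footer note.
lines = [
    "Exported for demo purposes",
    "Source: made-up sample",
    "user_id|age|gender",
    "1|24|M",
    "2|53|F",
    "3|23|M",
    "End of file: 3 records",
]
raw = "\n".join(lines)

users = pd.read_table(
    io.StringIO(raw),
    sep="|",
    skiprows=2,       # skip the two note lines at the top
    skipfooter=1,     # skip the one note line at the bottom
    engine="python",  # skipfooter is only supported by the Python parsing engine
)
print(users)
```

Passing `engine="python"` explicitly avoids the warning pandas raises when it has to fall back from the default C engine.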


## 2. How do I select a pandas series from a dataframe?

In [7]:
ufo = pd.read_csv('http://bit.ly/uforeports')

In [8]:
type(ufo)

pandas.core.frame.DataFrame

In [9]:
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


#### METHOD 1 : Using bracket notation

In [10]:
type(ufo['City'])  # It is case-sensitive

pandas.core.series.Series

#### Method 2 : Dot notation
Every time a series is added to a df, its name automatically becomes an attribute of that df.

This method doesn't work in all circumstances.

In [11]:
type(ufo.City) # Also case sensitive

pandas.core.series.Series

How to select series `Colors Reported`?

Dot notation doesn't work when the name of the series contains a space. So use bracket notation.

In [12]:
type(ufo['Colors Reported'])

pandas.core.series.Series

With dot notation, if a series name clashes with a built-in attribute or method name (eg : _shape_), python will use the built-in attribute instead of the series. So use bracket notation for such series too.

In [13]:
ufo.shape

(18241, 5)
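
To see the clash in isolation, here is a small sketch on a made-up df (the name `df` and its values are hypothetical) that has a column literally called `shape`:

```python
import pandas as pd

# Hypothetical df with a column whose name collides with the built-in `shape` attribute.
df = pd.DataFrame({"city": ["Ithaca", "Holyoke"], "shape": ["TRIANGLE", "OVAL"]})

attr = df.shape    # dot notation returns the built-in attribute: the (rows, columns) tuple
col = df["shape"]  # bracket notation returns the column as a Series

print(attr)
print(type(col))
```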

#### To create a new series in a dataframe

Always use bracket notation on the left-hand side when assigning a new series to a dataframe.

In [14]:
ufo['Location'] = ufo.City + ', ' + ufo.State

In [15]:
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time,Location
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00,"Ithaca, NY"
1,Willingboro,,OTHER,NJ,6/30/1930 20:00,"Willingboro, NJ"
2,Holyoke,,OVAL,CO,2/15/1931 14:00,"Holyoke, CO"
3,Abilene,,DISK,KS,6/1/1931 13:00,"Abilene, KS"
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00,"New York Worlds Fair, NY"
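
A short sketch of why bracket notation matters for assignment, using a tiny made-up df (`df` and its rows are illustrative only). Assigning through dot notation only sets a Python attribute on the object and does not add a column:

```python
import warnings
import pandas as pd

df = pd.DataFrame({"City": ["Ithaca", "Holyoke"], "State": ["NY", "CO"]})

# Dot notation on the left-hand side: pandas typically emits a UserWarning here,
# which is silenced so the sketch runs quietly.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.Location = df.City + ", " + df.State
created_by_dot = "Location" in df.columns

# Bracket notation on the left-hand side actually creates the column.
df["Location"] = df.City + ", " + df.State
created_by_bracket = "Location" in df.columns

print(created_by_dot, created_by_bracket)
```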


## 3. Why do some pandas commands end with parentheses, and other commands don't?

In [16]:
movies = pd.read_csv('http://bit.ly/imdbratings')

In [17]:
movies.head()   # head ends with parenthesis

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
3,9.0,The Dark Knight,PG-13,Action,152,"[u'Christian Bale', u'Heath Ledger', u'Aaron E..."
4,8.9,Pulp Fiction,R,Crime,154,"[u'John Travolta', u'Uma Thurman', u'Samuel L...."


In [18]:
movies.describe()    # ends with parenthesis

Unnamed: 0,star_rating,duration
count,979.0,979.0
mean,7.889785,120.979571
std,0.336069,26.21801
min,7.4,64.0
25%,7.6,102.0
50%,7.8,117.0
75%,8.1,134.0
max,9.3,242.0


In [19]:
movies.shape       # does not end with parenthesis

(979, 6)

In [20]:
movies.dtypes      # tells data types of each column and does not end with parenthesis

star_rating       float64
title              object
content_rating     object
genre              object
duration            int64
actors_list        object
dtype: object

In [21]:
type(movies)

pandas.core.frame.DataFrame

`movies` is a df, and as a df it has certain methods and attributes simply because it is a df.
 
 - Methods    ----> have parentheses ----> Eg : head and describe 
 - Attributes ----> have no parentheses ---> Eg :  shape and dtypes
 
 Think of _methods_ as action-oriented and _attributes_ as descriptions, eg : who you are.
 
 So if I were a python object `jeswin`, my methods/actions would be called as : `jeswin.eat()`, `jeswin.talk()`.
 
 Some attributes might be : `jeswin.name` , `jeswin.height`.
 
 Just like the function `pd.read_csv()`, a method may accept additional optional arguments, eg : `describe()`

In [22]:
type(movies.content_rating)

pandas.core.series.Series

In [23]:
 movies.describe(include=['object'])  # It will describe only columns with type object

Unnamed: 0,title,content_rating,genre,actors_list
count,979,976,979,979
unique,975,12,16,969
top,The Girl with the Dragon Tattoo,R,Drama,"[u'Daniel Radcliffe', u'Emma Watson', u'Rupert..."
freq,2,460,278,6


Tip : In Jupyter Notebook, click anywhere inside a method's parentheses and press Shift + Tab (1, 2, 3 or 4 times) to see its documentation and arguments.

## 4. How to rename columns in pandas df?

In [24]:
# To just look at the column names, use the columns attribute of a df
# It returns an Index object holding the column names of the df
ufo.columns

Index(['City', 'Colors Reported', 'Shape Reported', 'State', 'Time',
       'Location'],
      dtype='object')

In [25]:
ufo.columns[1]

'Colors Reported'

### Method 1
- Rename using the method `rename()`
- Pass the `columns` argument as a dictionary : each key is an old_name and its value is the new_name
- Use `inplace = True` ----> the operation occurs in place, i.e. the underlying df itself is modified

In [26]:
# Change two column names
ufo.rename(columns = {'Colors Reported' : 'Colors_Reported', 'Shape Reported' : 'Shape_Reported'}, inplace= True)
ufo.columns

Index(['City', 'Colors_Reported', 'Shape_Reported', 'State', 'Time',
       'Location'],
      dtype='object')
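
For comparison, a small sketch of what happens without `inplace = True`, on a made-up one-row df (`df` and `renamed` are hypothetical names): `rename()` returns a new df and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({"Colors Reported": ["RED"], "Shape Reported": ["OVAL"]})

# Without inplace=True the result must be captured in a variable.
renamed = df.rename(columns={"Colors Reported": "Colors_Reported"})

print(list(df.columns))       # original column names are unchanged
print(list(renamed.columns))  # only the returned df carries the new name
```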

### Method 2

What if we could avoid all the typing in Method 1? We can just specify all the new names, in order, and assign them to the columns attribute.

In [27]:
# Create a new list with all new names 
ufo_cols = ['city', 'colors reported', 'shape reported', 'state', 'time', 'location']

In [28]:
ufo.columns = ufo_cols   # this will overwrite the columns attribute with the 6 strings in the list
ufo.columns

Index(['city', 'colors reported', 'shape reported', 'state', 'time',
       'location'],
      dtype='object')

### Method 3

- Rename the columns while reading in the file.
- `names = ufo_cols` means to take column names from ufo_cols list
- `header = 0` means that row 0 of the file holds the existing column names, so the new column names replace them instead of the old names being read as data.

In [29]:
ufo_cols = ['city', 'colors reported', 'shape reported', 'state', 'time']
ufo = pd.read_csv('http://bit.ly/uforeports', names = ufo_cols, header = 0)
ufo.head()

Unnamed: 0,city,colors reported,shape reported,state,time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


In [30]:
# If there are many columns and we only want to replace the spaces in their names with '_'
ufo.columns = ufo.columns.str.replace(' ', '_')
ufo.columns

Index(['city', 'colors_reported', 'shape_reported', 'state', 'time'], dtype='object')

## 5. How to remove columns from a pandas df?

In [31]:
ufo = pd.read_csv('http://bit.ly/uforeports')
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00


In [32]:
ufo.shape

(18241, 5)

- To delete the `Colors Reported` column
- Use the `drop()` method with `axis = 1` (columns)

In [33]:
ufo.drop('Colors Reported', axis = 1, inplace = True)

In [34]:
ufo.head()  # That column is gone

Unnamed: 0,City,Shape Reported,State,Time
0,Ithaca,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,OTHER,NJ,6/30/1930 20:00
2,Holyoke,OVAL,CO,2/15/1931 14:00
3,Abilene,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,LIGHT,NY,4/18/1933 19:00


- To remove the `City` and `State` columns.
- Instead of passing a string we pass a list of strings.

In [35]:
ufo.drop(['City', 'State'], axis = 1, inplace = True)
ufo.head()

Unnamed: 0,Shape Reported,Time
0,TRIANGLE,6/1/1930 22:00
1,OTHER,6/30/1930 20:00
2,OVAL,2/15/1931 14:00
3,DISK,6/1/1931 13:00
4,LIGHT,4/18/1933 19:00


### To remove rows from a df
- Use `axis = 0` instead of `axis = 1`.
- Pass the index labels of the rows to be removed as a list.
- `axis = 0` is the default.

In [36]:
ufo.drop([0,1] , axis = 0, inplace = True)
ufo.head()

Unnamed: 0,Shape Reported,Time
2,OVAL,2/15/1931 14:00
3,DISK,6/1/1931 13:00
4,LIGHT,4/18/1933 19:00
5,DISK,9/15/1934 15:30
6,CIRCLE,6/15/1935 0:00


In [37]:
ufo.shape

(18239, 2)
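
As an alternative to remembering which axis is which, `drop()` also accepts the `columns` and `index` keywords. A minimal sketch on a made-up three-row df (the names `df`, `without_state` and `without_first` are hypothetical); without `inplace = True` each call returns a new df:

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Ithaca", "Holyoke", "Abilene"],
    "State": ["NY", "CO", "KS"],
    "Time": ["6/1/1930 22:00", "2/15/1931 14:00", "6/1/1931 13:00"],
})

without_state = df.drop(columns=["State"])  # same effect as df.drop(['State'], axis=1)
without_first = df.drop(index=[0])          # same effect as df.drop([0], axis=0)

print(list(without_state.columns))
print(list(without_first.index))
print(df.shape)  # the original df is unchanged
```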

## 6. How do I sort a pandas dataframe or series?

In [38]:
movies = pd.read_csv('http://bit.ly/imdbratings')
movies.head()

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
3,9.0,The Dark Knight,PG-13,Action,152,"[u'Christian Bale', u'Heath Ledger', u'Aaron E..."
4,8.9,Pulp Fiction,R,Crime,154,"[u'John Travolta', u'Uma Thurman', u'Samuel L...."


### To sort series

- To sort the `title` series
- Use the method `sort_values()`, which sorts in ascending order by default. Change to descending using `ascending = False`.
- `sort_values()` returns a sorted copy and doesn't affect the underlying data.

In [39]:
movies['title'].sort_values()

542                   (500) Days of Summer
5                             12 Angry Men
201                       12 Years a Slave
698                              127 Hours
110                  2001: A Space Odyssey
910                                   2046
596                               21 Grams
624                              25th Hour
708                       28 Days Later...
60                                3 Idiots
225                                 3-Iron
570                                    300
555                           3:10 to Yuma
427           4 Months, 3 Weeks and 2 Days
824                                     42
597                                  50/50
203                                  8 1/2
170                       A Beautiful Mind
941                       A Bridge Too Far
571                           A Bronx Tale
266                      A Christmas Story
86                      A Clockwork Orange
716                         A Few Good Men
750        

In [40]:
movies['title'].sort_values(ascending = False)[0:10]

864                 [Rec]
526                  Zulu
615            Zombieland
677                Zodiac
955      Zero Dark Thirty
535                 Zelig
280    Young Frankenstein
96                Yojimbo
235               Yip Man
403            Ying xiong
Name: title, dtype: object
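
A tiny sketch of the "doesn't affect underlying data" point, using a made-up series of three titles (`titles` is a hypothetical name): the sorted result is a new series that keeps the original index labels, while the original order is untouched.

```python
import pandas as pd

titles = pd.Series(["Zulu", "Alien", "Jaws"])

sorted_titles = titles.sort_values()

print(sorted_titles.tolist())       # ['Alien', 'Jaws', 'Zulu']
print(sorted_titles.index.tolist()) # [1, 2, 0] : each value keeps its original label
print(titles.tolist())              # ['Zulu', 'Alien', 'Jaws'] : original is unchanged
```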

### To sort a df by a series

In [41]:
movies.sort_values('title').head(10)  # Titles are in the same order as when the series was sorted, but now we can see the entire df.
                                    # Each title value comes with its related data.

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
542,7.8,(500) Days of Summer,PG-13,Comedy,95,"[u'Zooey Deschanel', u'Joseph Gordon-Levitt', ..."
5,8.9,12 Angry Men,NOT RATED,Drama,96,"[u'Henry Fonda', u'Lee J. Cobb', u'Martin Bals..."
201,8.1,12 Years a Slave,R,Biography,134,"[u'Chiwetel Ejiofor', u'Michael Kenneth Willia..."
698,7.6,127 Hours,R,Adventure,94,"[u'James Franco', u'Amber Tamblyn', u'Kate Mara']"
110,8.3,2001: A Space Odyssey,G,Mystery,160,"[u'Keir Dullea', u'Gary Lockwood', u'William S..."
910,7.5,2046,R,Drama,129,"[u'Tony Chiu Wai Leung', u'Ziyi Zhang', u'Faye..."
596,7.7,21 Grams,R,Crime,124,"[u'Sean Penn', u'Benicio Del Toro', u'Naomi Wa..."
624,7.7,25th Hour,R,Crime,135,"[u'Edward Norton', u'Barry Pepper', u'Philip S..."
708,7.6,28 Days Later...,R,Horror,113,"[u'Cillian Murphy', u'Naomie Harris', u'Christ..."
60,8.5,3 Idiots,PG-13,Comedy,170,"[u'Aamir Khan', u'Madhavan', u'Mona Singh']"


In [42]:
movies.sort_values('duration', ascending=False).head(10)

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
476,7.8,Hamlet,PG-13,Drama,242,"[u'Kenneth Branagh', u'Julie Christie', u'Dere..."
157,8.2,Gone with the Wind,G,Drama,238,"[u'Clark Gable', u'Vivien Leigh', u'Thomas Mit..."
78,8.4,Once Upon a Time in America,R,Crime,229,"[u'Robert De Niro', u'James Woods', u'Elizabet..."
142,8.3,Lagaan: Once Upon a Time in India,PG,Adventure,224,"[u'Aamir Khan', u'Gracy Singh', u'Rachel Shell..."
445,7.9,The Ten Commandments,APPROVED,Adventure,220,"[u'Charlton Heston', u'Yul Brynner', u'Anne Ba..."
85,8.4,Lawrence of Arabia,PG,Adventure,216,"[u""Peter O'Toole"", u'Alec Guinness', u'Anthony..."
204,8.1,Ben-Hur,G,Adventure,212,"[u'Charlton Heston', u'Jack Hawkins', u'Stephe..."
17,8.7,Seven Samurai,UNRATED,Drama,207,"[u'Toshir\xf4 Mifune', u'Takashi Shimura', u'K..."
767,7.6,"It's a Mad, Mad, Mad, Mad World",APPROVED,Action,205,"[u'Spencer Tracy', u'Milton Berle', u'Ethel Me..."
630,7.7,Malcolm X,PG-13,Biography,202,"[u'Denzel Washington', u'Angela Bassett', u'De..."


### To sort a df by multiple columns
Pass a list of strings holding the column names, in order of sorting priority.

In [43]:
movies.sort_values(['content_rating', 'duration']).head(10)  # sorts by 'content_rating' first, then by 'duration' within each rating

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
713,7.6,The Jungle Book,APPROVED,Animation,78,"[u'Phil Harris', u'Sebastian Cabot', u'Louis P..."
513,7.8,Invasion of the Body Snatchers,APPROVED,Horror,80,"[u'Kevin McCarthy', u'Dana Wynter', u'Larry Ga..."
272,8.1,The Killing,APPROVED,Crime,85,"[u'Sterling Hayden', u'Coleen Gray', u'Vince E..."
703,7.6,Dracula,APPROVED,Horror,85,"[u'Bela Lugosi', u'Helen Chandler', u'David Ma..."
612,7.7,A Hard Day's Night,APPROVED,Comedy,87,"[u'John Lennon', u'Paul McCartney', u'George H..."
58,8.5,Paths of Glory,APPROVED,Drama,88,"[u'Kirk Douglas', u'Ralph Meeker', u'Adolphe M..."
210,8.1,Laura,APPROVED,Film-Noir,88,"[u'Gene Tierney', u'Dana Andrews', u'Clifton W..."
656,7.7,Snow White and the Seven Dwarfs,APPROVED,Animation,88,"[u'Adriana Caselotti', u'Harry Stockwell', u'L..."
844,7.5,Pinocchio,APPROVED,Animation,88,"[u'Dickie Jones', u'Christian Rub', u'Mel Blanc']"
233,8.1,The Night of the Hunter,APPROVED,Crime,92,"[u'Robert Mitchum', u'Shelley Winters', u'Lill..."


##  7. How do I filter rows of a pandas DataFrame by column value?

In [44]:
movies = pd.read_csv('http://bit.ly/imdbratings')
movies.head()

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
3,9.0,The Dark Knight,PG-13,Action,152,"[u'Christian Bale', u'Heath Ledger', u'Aaron E..."
4,8.9,Pulp Fiction,R,Crime,154,"[u'John Travolta', u'Uma Thurman', u'Samuel L...."


In [45]:
movies.shape

(979, 6)

To filter the movies df so that we see all the columns but only those rows which have a duration of 200 or more.

### Method 1: Long way using for-loop

In [46]:
booleans = []
for length in movies['duration']:
    if length >= 200:
        booleans.append(True)
    else:
        booleans.append(False)

booleans[0:5]

[False, False, True, False, False]

In [47]:
len(booleans)

979

In [48]:
# Convert the python list to a pandas series
is_long = pd.Series(booleans)
is_long.head()

0    False
1    False
2     True
3    False
4    False
dtype: bool

In [49]:
movies[is_long]  # pass is_long using square bracket notation
# Usually [] square bracket notation is used to pull out a given column from a df
# But here we passed a series of True/False values, so it shows only the rows corresponding to True

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
7,8.9,The Lord of the Rings: The Return of the King,PG-13,Adventure,201,"[u'Elijah Wood', u'Viggo Mortensen', u'Ian McK..."
17,8.7,Seven Samurai,UNRATED,Drama,207,"[u'Toshir\xf4 Mifune', u'Takashi Shimura', u'K..."
78,8.4,Once Upon a Time in America,R,Crime,229,"[u'Robert De Niro', u'James Woods', u'Elizabet..."
85,8.4,Lawrence of Arabia,PG,Adventure,216,"[u""Peter O'Toole"", u'Alec Guinness', u'Anthony..."
142,8.3,Lagaan: Once Upon a Time in India,PG,Adventure,224,"[u'Aamir Khan', u'Gracy Singh', u'Rachel Shell..."
157,8.2,Gone with the Wind,G,Drama,238,"[u'Clark Gable', u'Vivien Leigh', u'Thomas Mit..."
204,8.1,Ben-Hur,G,Adventure,212,"[u'Charlton Heston', u'Jack Hawkins', u'Stephe..."
445,7.9,The Ten Commandments,APPROVED,Adventure,220,"[u'Charlton Heston', u'Yul Brynner', u'Anne Ba..."
476,7.8,Hamlet,PG-13,Drama,242,"[u'Kenneth Branagh', u'Julie Christie', u'Dere..."


### Method 2 : Without using for-loop

In [50]:
is_long2 = movies['duration'] >= 200
is_long2.head()

0    False
1    False
2     True
3    False
4    False
Name: duration, dtype: bool

In [51]:
movies[is_long2]

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
7,8.9,The Lord of the Rings: The Return of the King,PG-13,Adventure,201,"[u'Elijah Wood', u'Viggo Mortensen', u'Ian McK..."
17,8.7,Seven Samurai,UNRATED,Drama,207,"[u'Toshir\xf4 Mifune', u'Takashi Shimura', u'K..."
78,8.4,Once Upon a Time in America,R,Crime,229,"[u'Robert De Niro', u'James Woods', u'Elizabet..."
85,8.4,Lawrence of Arabia,PG,Adventure,216,"[u""Peter O'Toole"", u'Alec Guinness', u'Anthony..."
142,8.3,Lagaan: Once Upon a Time in India,PG,Adventure,224,"[u'Aamir Khan', u'Gracy Singh', u'Rachel Shell..."
157,8.2,Gone with the Wind,G,Drama,238,"[u'Clark Gable', u'Vivien Leigh', u'Thomas Mit..."
204,8.1,Ben-Hur,G,Adventure,212,"[u'Charlton Heston', u'Jack Hawkins', u'Stephe..."
445,7.9,The Ten Commandments,APPROVED,Adventure,220,"[u'Charlton Heston', u'Yul Brynner', u'Anne Ba..."
476,7.8,Hamlet,PG-13,Drama,242,"[u'Kenneth Branagh', u'Julie Christie', u'Dere..."
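
To confirm that the comparison in Method 2 builds the same booleans as the loop in Method 1, here is a sketch on a made-up series holding the first five durations shown above (`durations`, `loop_mask` and `vector_mask` are hypothetical names):

```python
import pandas as pd

durations = pd.Series([142, 175, 200, 152, 154])

# Method 1 style: build the booleans one element at a time
loop_mask = pd.Series([d >= 200 for d in durations])

# Method 2 style: compare the whole series at once
vector_mask = durations >= 200

same = bool((loop_mask == vector_mask).all())
print(same)
print(vector_mask.tolist())  # [False, False, True, False, False]
```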


### Method 3 : Without using for-loop and an intermediate variable
Use _movies df's series directly_

In [52]:
movies[movies['duration'] >= 200]

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
7,8.9,The Lord of the Rings: The Return of the King,PG-13,Adventure,201,"[u'Elijah Wood', u'Viggo Mortensen', u'Ian McK..."
17,8.7,Seven Samurai,UNRATED,Drama,207,"[u'Toshir\xf4 Mifune', u'Takashi Shimura', u'K..."
78,8.4,Once Upon a Time in America,R,Crime,229,"[u'Robert De Niro', u'James Woods', u'Elizabet..."
85,8.4,Lawrence of Arabia,PG,Adventure,216,"[u""Peter O'Toole"", u'Alec Guinness', u'Anthony..."
142,8.3,Lagaan: Once Upon a Time in India,PG,Adventure,224,"[u'Aamir Khan', u'Gracy Singh', u'Rachel Shell..."
157,8.2,Gone with the Wind,G,Drama,238,"[u'Clark Gable', u'Vivien Leigh', u'Thomas Mit..."
204,8.1,Ben-Hur,G,Adventure,212,"[u'Charlton Heston', u'Jack Hawkins', u'Stephe..."
445,7.9,The Ten Commandments,APPROVED,Adventure,220,"[u'Charlton Heston', u'Yul Brynner', u'Anne Ba..."
476,7.8,Hamlet,PG-13,Drama,242,"[u'Kenneth Branagh', u'Julie Christie', u'Dere..."


Now, instead of pulling out the whole dataframe, we select only a specific column from the rows that satisfy the condition.

In [53]:
movies[movies['duration'] >= 200]['genre']
# or
# movies[movies['duration'] >= 200].genre

2          Crime
7      Adventure
17         Drama
78         Crime
85     Adventure
142    Adventure
157        Drama
204    Adventure
445    Adventure
476        Drama
630    Biography
767       Action
Name: genre, dtype: object

In [54]:
# movies[movies['duration'] >= 200]['title','genre']   # raises KeyError: to select several columns pass a list, i.e. [['title', 'genre']]

### Method 4 : Use loc
A better way is to use __loc__ : it selects the rows and the column in a single operation instead of chaining two selections.

In [55]:
movies.loc[movies.duration >= 200, 'genre']

2          Crime
7      Adventure
17         Drama
78         Crime
85     Adventure
142    Adventure
157        Drama
204    Adventure
445    Adventure
476        Drama
630    Biography
767       Action
Name: genre, dtype: object
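
`loc` also accepts a list of column names, which covers the multi-column case. A minimal sketch, assuming a made-up four-row df that reuses a few titles from the dataset (this small `movies` df and the name `long_movies` are illustrative only):

```python
import pandas as pd

movies = pd.DataFrame({
    "title": ["The Godfather", "The Godfather: Part II", "Hamlet", "Pulp Fiction"],
    "genre": ["Crime", "Crime", "Drama", "Crime"],
    "duration": [175, 200, 242, 154],
})

# Rows: duration >= 200. Columns: a list of the column names to keep.
long_movies = movies.loc[movies["duration"] >= 200, ["title", "genre"]]

print(long_movies)
```

The result is a df (not a series) because a list of columns was passed, even if the list holds a single name.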