# The DataFrame

### Objectives
After this lesson you should be able to...
+ Know a DataFrame is composed of **columns**, an **index** and **values**
+ The columns are formally a column index, the same type of object as the row index
+ Most analysis will take place in a DataFrame
+ Get descriptive statistics and metadata on DataFrames after import
+ Know the most common column datatypes
+ Change display settings in the notebook
+ Create a Series by selecting a single column of a DataFrame
+ Select a subset of columns from a DataFrame
+ Select rows and columns from a DataFrame by label and integer location with **`.loc`** and **`.iloc`**

### Prepare for this lesson by...
[ALWAYS READ THE DOCUMENTATION BEFORE A LESSON!](http://pandas.pydata.org/pandas-docs/stable/)
+ Read the [Intro to Data Structures](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe) - only the DataFrame section
+ Read all of the [Options and Settings](http://pandas.pydata.org/pandas-docs/stable/options.html) section
+ Read the section on [descriptive statistics](http://pandas.pydata.org/pandas-docs/stable/basics.html#descriptive-statistics)

### Introduction

The DataFrame is the most common object you will be working with during your analysis, and it is important to grasp all of its parts and use the correct terminology. There are three components to a DataFrame: the **index**, the **columns** and the **data**. 

![dataframe anatomy](images/dataframe_anatomy.png)

* The index labels the rows and the columns label the columns
* An individual element of the index is an index label
* An individual element of the columns is a column name
* The index and the columns are always in bold font
* Collectively the index and the columns are known as axes
* pandas also refers to each axis by an integer. 0 for the index and 1 for the columns. This is borrowed directly from NumPy
* The actual data is always in normal font
* Data is also referred to as values
* Missing values are displayed as NaN (not a number)

Operations on DataFrames can be applied to all elements, row by row, or column by column. The technical term **axis** refers to the horizontal and vertical components of the frame. The row axis (the index) is numbered 0 and the column axis is numbered 1, a convention borrowed from NumPy, where ndarrays can have many axes numbered from 0. The **`axis`** argument shows up in nearly all DataFrame methods and lets you choose whether an operation runs down the rows or across the columns.
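A minimal sketch of the **`axis`** argument on a small made-up DataFrame (`df` and its numbers are invented for illustration): `axis=0` collapses the rows and returns one result per column, while `axis=1` collapses the columns and returns one result per row.

```python
import pandas as pd

# A small, made-up DataFrame for illustration
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]},
                  index=['x', 'y', 'z'])

# axis=0 (the index): aggregate down the rows -> one value per column
col_totals = df.sum(axis=0)    # a: 6, b: 60

# axis=1 (the columns): aggregate across the columns -> one value per row
row_totals = df.sum(axis=1)    # x: 11, y: 22, z: 33

# The string aliases 'index' and 'columns' mean the same as 0 and 1
assert row_totals.equals(df.sum(axis='columns'))
```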

Just as with Series, index alignment silently takes place behind the scenes for DataFrames (on both the rows and the columns), so take care when operating on two different DataFrames at the same time.
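A sketch of that alignment with two made-up DataFrames whose labels only partly overlap. Addition lines up rows and columns by label first, so every cell that lacks a partner in the other frame becomes NaN; the `fill_value` argument of the **`add`** method is one way to treat a missing partner as 0.

```python
import pandas as pd

# Made-up frames: rows 'x','y' vs 'y','z'; columns 'a','b' vs 'a','c'
df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['x', 'y'])
df2 = pd.DataFrame({'a': [10, 20], 'c': [30, 40]}, index=['y', 'z'])

# Rows and columns are matched by label; the result holds the union of both.
# Only row 'y', column 'a' exists in both frames, so it is the only non-NaN cell.
total = df1 + df2

# Treat a missing partner as 0; cells missing from BOTH frames stay NaN
filled = df1.add(df2, fill_value=0)
```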

### The Index Object

Both the index and the columns are stored as pandas **`Index`** objects. pandas has many specialized Index objects, each with the word **`Index`** as part of its name, and any of them can be used to label the rows or columns: **`Index, DatetimeIndex, CategoricalIndex, MultiIndex, IntervalIndex, RangeIndex, Int64Index, Float64Index, PeriodIndex, TimedeltaIndex, UInt64Index`**. They are all very similar objects and share the same base functionality. (**`Int64Index`**, **`UInt64Index`** and **`Float64Index`** were removed in pandas 2.0, which uses a plain **`Index`** with a numeric dtype instead.)
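A quick check, with arbitrary example labels, that several of these specialized index types are all subclasses of **`pd.Index`**:

```python
import pandas as pd

# Arbitrary example labels for a few specialized index types
indexes = [
    pd.RangeIndex(start=0, stop=5),                     # the default row index
    pd.Index(['a', 'b', 'c']),                          # a plain Index of strings
    pd.date_range('2017-01-01', periods=3),             # DatetimeIndex
    pd.CategoricalIndex(['low', 'high', 'low']),        # CategoricalIndex
    pd.MultiIndex.from_tuples([('a', 1), ('b', 2)]),    # MultiIndex
]

for idx in indexes:
    # each specialized class inherits from pd.Index
    print(type(idx).__name__, isinstance(idx, pd.Index))
```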

### MultiIndex DataFrames
pandas allows an index to have more than one level. This is one of the more confusing aspects of pandas, and MultiIndex DataFrames will be covered in a different notebook.

### More info on the DataFrame
When a Series is displayed, the data type of its values is printed to the screen. This doesn't happen with a DataFrame. Let's read in the employee dataset and use the **`info`** method to get the data type and number of non-missing values for each column.

The dataset was downloaded from the [city of Houston](http://data.ohouston.org/dataset/city-of-houston-current-employee-roster). It provides information on 2000 employees, such as position title, department, salary, race and more.

In [1]:
import pandas as pd
import numpy as np

employee = pd.read_csv('data/employee.csv')
employee.head()

,UNIQUE_ID,POSITION_TITLE,DEPARTMENT,BASE_SALARY,RACE,EMPLOYMENT_TYPE,GENDER,EMPLOYMENT_STATUS,HIRE_DATE,JOB_DATE
0,0,ASSISTANT DIRECTOR (EX LVL),Municipal Courts Department,121862.0,Hispanic/Latino,Full Time,Female,Active,2006-06-12,2012-10-13
1,1,LIBRARY ASSISTANT,Library,26125.0,Hispanic/Latino,Full Time,Female,Active,2000-07-19,2010-09-18
2,2,POLICE OFFICER,Houston Police Department-HPD,45279.0,White,Full Time,Male,Active,2015-02-03,2015-02-03
3,3,ENGINEER/OPERATOR,Houston Fire Department (HFD),63166.0,White,Full Time,Male,Active,1982-02-08,1991-05-25
4,4,ELECTRICIAN,General Services Department,56347.0,White,Full Time,Male,Active,1989-06-19,1994-10-22


In [2]:
employee.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 10 columns):
UNIQUE_ID            2000 non-null int64
POSITION_TITLE       2000 non-null object
DEPARTMENT           2000 non-null object
BASE_SALARY          1886 non-null float64
RACE                 1965 non-null object
EMPLOYMENT_TYPE      2000 non-null object
GENDER               2000 non-null object
EMPLOYMENT_STATUS    2000 non-null object
HIRE_DATE            2000 non-null object
JOB_DATE             1997 non-null object
dtypes: float64(1), int64(1), object(8)
memory usage: 156.3+ KB


### Lots of info above
+ the object type (DataFrame)
+ the row index type (a RangeIndex)
+ the column names and their types ('object' is used for strings)
+ Number of non-missing values
+ A summary of the datatypes (8 object, 1 integer, 1 float) 
+ the memory usage
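Most of these pieces are also available on their own. A sketch on a small made-up frame: the **`dtypes`** attribute gives each column's type and the **`count`** method gives the number of non-missing values per column, matching what **`info`** reports.

```python
import numpy as np
import pandas as pd

# Made-up data with a couple of missing values
df = pd.DataFrame({
    'name': ['Ann', 'Bob', None],
    'salary': [50000.0, np.nan, 62000.0],
    'id': [1, 2, 3],
})

print(df.dtypes)    # salary is float64, id is int64, name holds strings
print(df.count())   # non-missing values per column: name 2, salary 2, id 3
```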

## DataFrame Dimensions
+ **`shape`** attribute - tuple of the number of rows and columns
+ **`size`** attribute - total number of elements (rows times columns)
+ **`len`** function - number of rows
+ **`ndim`** attribute - number of dimensions, always 2 for a DataFrame

In [3]:
employee.shape

(2000, 10)

In [4]:
employee.size

20000

In [5]:
len(employee)

2000

In [6]:
employee.ndim

2

### DataFrame by Pieces
Each DataFrame has both a column **Index** and a row **Index**. The **values** (the actual data) are represented by a two-dimensional NumPy array. Let's look at those pieces.

In [7]:
# Row index is retrieved by .index
employee.index

RangeIndex(start=0, stop=2000, step=1)

In [8]:
# column index is just .columns
employee.columns

Index(['UNIQUE_ID', 'POSITION_TITLE', 'DEPARTMENT', 'BASE_SALARY', 'RACE',
       'EMPLOYMENT_TYPE', 'GENDER', 'EMPLOYMENT_STATUS', 'HIRE_DATE',
       'JOB_DATE'],
      dtype='object')

In [9]:
# the values are a numpy array
employee.values

array([[0, 'ASSISTANT DIRECTOR (EX LVL)', 'Municipal Courts Department',
        ..., 'Active', '2006-06-12', '2012-10-13'],
       [1, 'LIBRARY ASSISTANT', 'Library', ..., 'Active', '2000-07-19',
        '2010-09-18'],
       [2, 'POLICE OFFICER', 'Houston Police Department-HPD', ...,
        'Active', '2015-02-03', '2015-02-03'],
       ..., 
       [1997, 'POLICE OFFICER', 'Houston Police Department-HPD', ...,
        'Active', '2014-10-13', '2015-10-13'],
       [1998, 'POLICE OFFICER', 'Houston Police Department-HPD', ...,
        'Active', '2009-01-20', '2011-07-02'],
       [1999, 'FIRE FIGHTER', 'Houston Fire Department (HFD)', ...,
        'Active', '2009-01-12', '2010-07-12']], dtype=object)
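Because a NumPy array holds a single data type, pandas must find one type that fits every column, which is why the array above has `dtype=object`. A sketch with made-up data: integer and float columns combine into a `float64` array, while adding a string column forces `object`. In newer versions of pandas, the documentation recommends **`to_numpy`** over **`values`**.

```python
import numpy as np
import pandas as pd

# Made-up numeric frame: one int column and one float column
nums = pd.DataFrame({'i': [1, 2], 'f': [0.5, 1.5]})
print(nums.to_numpy().dtype)     # float64: the ints are upcast

# Adding a string column leaves object as the only type that fits everything
mixed = nums.assign(s=['x', 'y'])
print(mixed.to_numpy().dtype)    # object
```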

### Get summary statistics for numeric columns
Basic summary statistics for numeric columns can be retrieved through the **`describe`** method. By default, **`describe`** ignores non-numeric columns.

In [10]:
employee.describe()

,UNIQUE_ID,BASE_SALARY
count,2000.0,1886.0
mean,999.5,55767.931601
std,577.494589,21693.706679
min,0.0,24960.0
25%,499.75,40170.0
50%,999.5,54461.0
75%,1499.25,66614.0
max,1999.0,275000.0


## Different summary statistics for non-numeric columns
Use the **`include`** parameter to choose which data types **`describe`** summarizes.

In [11]:
# get summary statistics on only the 'objects'
employee.describe(include=['object'])

,POSITION_TITLE,DEPARTMENT,RACE,EMPLOYMENT_TYPE,GENDER,EMPLOYMENT_STATUS,HIRE_DATE,JOB_DATE
count,2000,2000,1965,2000,2000,2000,2000,1997
unique,330,24,6,5,2,2,999,947
top,SENIOR POLICE OFFICER,Houston Police Department-HPD,Black or African American,Full Time,Male,Active,2016-03-28,2002-01-05
freq,220,638,700,1954,1397,1991,11,34
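**`describe`** also accepts `include='all'` to summarize every column at once, and an **`exclude`** parameter that works the other way around. A sketch on a made-up two-column frame (the department names are invented):

```python
import numpy as np
import pandas as pd

# Made-up data: one text column and one numeric column
df = pd.DataFrame({
    'dept': ['Library', 'Police', 'Police'],
    'salary': [26125.0, 45279.0, 63166.0],
})

# Every column; statistics that do not apply to a column show as NaN
summary_all = df.describe(include='all')

# Everything except numeric columns, so only 'dept' remains
summary_text = df.describe(exclude=[np.number])
print(summary_text)
```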


# Column Data Types
Each column of data in your DataFrame has a specific data type. It is important to know what each data type is and how to refer to it. You may refer to data types in pandas by either their NumPy object name or their string name. The old NumPy aliases **`np.int`**, **`np.float`** and **`np.object`** were removed in NumPy 1.24, so the table below uses the explicit names. Here is a complete list of [all numpy data types](https://docs.scipy.org/doc/numpy/user/basics.types.html). 

<table>
<thead>
<tr>
<td>Common Data Type Name</td>
<td>NumPy/pandas Object</td>
<td>String Name</td>
<td>Notes</td>
</tr>
</thead>
<tr>
<td>Boolean</td>
<td>np.bool_</td>
<td>bool</td> 
<td>Stored as a single byte</td>
</tr>
<tr>
<td>Integer</td> 
<td>np.int64</td>
<td>int</td>
<td>Defaults to 64 bits. Unsigned integers (np.uint64 etc.) are also available</td>
</tr>
<tr>
<td>Float</td> 
<td>np.float64</td>
<td>float</td>
<td>Defaults to 64 bits</td>
</tr>
<tr>
<td>Object</td> 
<td>np.object_</td>
<td>O, object</td> 
<td>Typically strings but is a catchall for columns with multiple different types or other Python objects (tuples, lists, dicts, etc...)</td> 
</tr>
<tr>
<td>Datetime</td> 
<td>np.datetime64, pd.Timestamp</td> 
<td>datetime64 </td> 
<td>A specific moment in time with nanosecond precision</td> 
</tr>
<tr>
<td>Timedelta </td> 
<td>np.timedelta64, pd.Timedelta</td> 
<td>timedelta64 </td> 
<td>Represents an amount of time from days to nanoseconds</td> 

</tr>
<tr>
<td>Categorical </td> 
<td>pd.Categorical </td> 
<td>category </td> 
<td>Specific only to pandas. Useful
for object columns with
relatively few unique values</td>
</tr>
</table>
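A short sketch, using made-up Series, of how the string names and the object names in the table refer to the same types, plus the pandas-only `'category'` type:

```python
import numpy as np
import pandas as pd

nums = pd.Series([1, 2, 3])

# A string name and the matching NumPy object give the same dtype
as_str = nums.astype('float64')
as_obj = nums.astype(np.float64)
print(as_str.dtype == as_obj.dtype)    # True

# 'category' has no NumPy equivalent; it is pandas-only
levels = pd.Series(['low', 'high', 'low', 'low']).astype('category')
print(levels.dtype)                    # category
print(list(levels.cat.categories))     # ['high', 'low']
```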

### Using either the string or object name
pandas accepts a data type either in string form or as the exact object.

In [12]:
employee.describe(include=['float'])

,BASE_SALARY
count,1886.0
mean,55767.931601
std,21693.706679
min,24960.0
25%,40170.0
50%,54461.0
75%,66614.0
max,275000.0


In [13]:
employee.describe(include=[np.float64])

,BASE_SALARY
count,1886.0
mean,55767.931601
std,21693.706679
min,24960.0
25%,40170.0
50%,54461.0
75%,66614.0
max,275000.0


#### Use a list of data types you would like descriptions on
The default is to select all numeric columns - those that are either integer or float.

In [14]:
employee.describe(include=[np.float64, 'int'])

,UNIQUE_ID,BASE_SALARY
count,2000.0,1886.0
mean,999.5,55767.931601
std,577.494589,21693.706679
min,0.0,24960.0
25%,499.75,40170.0
50%,999.5,54461.0
75%,1499.25,66614.0
max,1999.0,275000.0


### Hidden Columns
Let's load the movie dataset and inspect the head. Scroll to the right and you will notice that not all the columns are displayed to the screen.

In [15]:
movie = pd.read_csv('data/movie.csv', index_col='movie_title')
movie.head()

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0


### Changing Display Settings

pandas comes with default values for a couple dozen display settings that control output. One of these settings is the number of columns displayed to the screen. The options can all be found under **`pd.options.display`**, and all the display settings that can be changed are listed below.

In [16]:
dir(pd.options.display)

['chop_threshold',
 'colheader_justify',
 'column_space',
 'date_dayfirst',
 'date_yearfirst',
 'encoding',
 'expand_frame_repr',
 'float_format',
 'height',
 'html',
 'large_repr',
 'latex',
 'line_width',
 'max_categories',
 'max_columns',
 'max_colwidth',
 'max_info_columns',
 'max_info_rows',
 'max_rows',
 'max_seq_items',
 'memory_usage',
 'mpl_style',
 'multi_sparse',
 'notebook_repr_html',
 'pprint_nest_depth',
 'precision',
 'show_dimensions',
 'unicode',
 'width']

In [17]:
# Let's see the current setting
pd.options.display.max_columns

20

Only 20 of the 27 columns are printed out in the notebook. We can adjust this setting easily by reassigning its value. More info on [pandas options settings](http://pandas.pydata.org/pandas-docs/stable/options.html).

In [18]:
# reassign max_columns
pd.options.display.max_columns = 40

In [19]:
# and now inspect the dataframe again
movie.head()

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,886204,4834,Wes Studi,0.0,avatar|future|marine|native|paraplegic,http://www.imdb.com/title/tt0499549/?ref_=fn_t...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,Johnny Depp,471220,48350,Jack Davenport,0.0,goddess|marriage ceremony|marriage proposal|pi...,http://www.imdb.com/title/tt0449088/?ref_=fn_t...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,Christoph Waltz,275868,11700,Stephanie Sigman,1.0,bomb|espionage|sequel|spy|terrorist,http://www.imdb.com/title/tt2379713/?ref_=fn_t...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,Tom Hardy,1144337,106759,Joseph Gordon-Levitt,0.0,deception|imprisonment|lawlessness|police offi...,http://www.imdb.com/title/tt1345836/?ref_=fn_t...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,Doug Walker,8,143,,0.0,,http://www.imdb.com/title/tt5289954/?ref_=fn_t...,,,,,,,12.0,7.1,,0
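Reassigning the attribute changes the setting for the rest of the session. pandas also exposes the same options through functions; the sketch below uses **`pd.set_option`**, **`pd.get_option`**, **`pd.reset_option`** and the **`pd.option_context`** context manager, which changes a setting only inside a `with` block.

```python
import pandas as pd

# Equivalent to: pd.options.display.max_columns = 40
pd.set_option('display.max_columns', 40)
print(pd.get_option('display.max_columns'))        # 40

# Change a setting temporarily; the old value comes back after the with-block
with pd.option_context('display.max_columns', 5):
    inside = pd.get_option('display.max_columns')   # 5
after = pd.get_option('display.max_columns')        # 40 again

# Return the option to its library default
pd.reset_option('display.max_columns')
```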


### Transpose a DataFrame
Occasionally it's useful to transpose a DataFrame - swap the columns and the rows. Here, the transposed summary is much more readable. Use the **`.T`** attribute.

In [20]:
employee.describe(include=['object']).T

,count,unique,top,freq
POSITION_TITLE,2000,330,SENIOR POLICE OFFICER,220
DEPARTMENT,2000,24,Houston Police Department-HPD,638
RACE,1965,6,Black or African American,700
EMPLOYMENT_TYPE,2000,5,Full Time,1954
GENDER,2000,2,Male,1397
EMPLOYMENT_STATUS,2000,2,Active,1991
HIRE_DATE,2000,999,2016-03-28,11
JOB_DATE,1997,947,2002-01-05,34


### Always know your Index
It's important to be aware of the (row) index of your DataFrame. The column index is usually obvious, as it's just the column names.

In [21]:
# when read_csv is not given index_col, the default index is integers starting from 0
employee.index

RangeIndex(start=0, stop=2000, step=1)

### RangeIndex
A **`RangeIndex`** is the default index pandas gives you if you don't explicitly set one. It is similar to the built-in **`range`** object. It is a lazy object, meaning that only the minimal information needed to describe it fully is stored in memory: the **start**, **stop** and **step**. It's mainly used to save space, as there is no need to store every value.
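A sketch comparing memory use (exact byte counts vary by platform and pandas version): a **`RangeIndex`** of one million labels versus an ordinary **`Index`** holding the same one million integers.

```python
import numpy as np
import pandas as pd

# The default kind of index: only start, stop and step are stored
lazy = pd.RangeIndex(start=0, stop=1_000_000, step=1)

# The same labels with every value stored as a 64-bit integer
materialized = pd.Index(np.arange(1_000_000, dtype=np.int64))

print(lazy.memory_usage())          # a few hundred bytes at most
print(materialized.memory_usage())  # roughly 8,000,000 bytes
print(lazy.equals(materialized))    # True: same labels, different storage
```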

### Underlying NumPy Array
Underneath every index and the data lies a NumPy array. Use the **`values`** attribute of the index to return that array.

In [22]:
# See the values of a RangeIndex
employee.index.values

array([   0,    1,    2, ..., 1997, 1998, 1999])

### Columns are an Index Object
The columns and the index must each be one of the pandas Index objects. By default, the columns are of type **`pd.Index`**. Again, use the **`values`** attribute to get the underlying NumPy array.

In [23]:
employee.columns

Index(['UNIQUE_ID', 'POSITION_TITLE', 'DEPARTMENT', 'BASE_SALARY', 'RACE',
       'EMPLOYMENT_TYPE', 'GENDER', 'EMPLOYMENT_STATUS', 'HIRE_DATE',
       'JOB_DATE'],
      dtype='object')

In [24]:
employee.columns.values

array(['UNIQUE_ID', 'POSITION_TITLE', 'DEPARTMENT', 'BASE_SALARY', 'RACE',
       'EMPLOYMENT_TYPE', 'GENDER', 'EMPLOYMENT_STATUS', 'HIRE_DATE',
       'JOB_DATE'], dtype=object)

## The [ ] is completely different for DataFrames than for Series
The primary use of the brackets is to retrieve one or more columns from a DataFrame. Write the name of a column inside the brackets and a Series is returned. This is completely different from a Series, where **`[ ]`** retrieves values by their index labels.
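A small sketch of the contrast, using a made-up frame with string row labels: on a DataFrame the brackets look up a column name, and on the resulting Series they look up an index label.

```python
import pandas as pd

# Made-up data: row labels 'a' and 'b', two columns
df = pd.DataFrame({'GENDER': ['Female', 'Male'], 'RACE': ['White', 'Asian']},
                  index=['a', 'b'])

gender = df['GENDER']    # DataFrame[...] with a column name -> that column as a Series
value = gender['b']      # Series[...] with an index label -> a single value, 'Male'

try:
    df['b']              # 'b' is a row label, not a column name
except KeyError:
    print("KeyError: [] on a DataFrame looks up columns, not rows")
```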

In [25]:
employee['DEPARTMENT'].head(10)

0       Municipal Courts Department
1                           Library
2     Houston Police Department-HPD
3     Houston Fire Department (HFD)
4       General Services Department
5     Houston Police Department-HPD
6    Public Works & Engineering-PWE
7      Houston Airport System (HAS)
8    Public Works & Engineering-PWE
9      Houston Airport System (HAS)
Name: DEPARTMENT, dtype: object

### Use a list to access multiple columns

In [26]:
# get three columns - put names in a list
# Returns a DataFrame

employee[['DEPARTMENT', 'GENDER', 'RACE']].head()

,DEPARTMENT,GENDER,RACE
0,Municipal Courts Department,Female,Hispanic/Latino
1,Library,Female,Hispanic/Latino
2,Houston Police Department-HPD,Male,White
3,Houston Fire Department (HFD),Male,White
4,General Services Department,Male,White


### Use a one item list to select a single column as a DataFrame

In [27]:
# If you want to return a single column as a dataframe use a list with one column name
employee[['DEPARTMENT']].head(10)

,DEPARTMENT
0,Municipal Courts Department
1,Library
2,Houston Police Department-HPD
3,Houston Fire Department (HFD)
4,General Services Department
5,Houston Police Department-HPD
6,Public Works & Engineering-PWE
7,Houston Airport System (HAS)
8,Public Works & Engineering-PWE
9,Houston Airport System (HAS)


### You have a Series when you access a single column
It's important to know that when you use the **`[ ]`** operator on a DataFrame and pass it a single column name, you are back to having a Series. All the Series methods work as before.

In [28]:
# use the same Series functions as before

# use the Series max method
employee['BASE_SALARY'].max()

275000.0

In [29]:
employee['BASE_SALARY'].mean()

55767.93160127253

In [30]:
# find the standard deviation in salaries
employee['BASE_SALARY'].std()

21693.706679449504

## Selecting data by integer location or by index label
DataFrames use the **`.iloc`** and **`.loc`** indexers to select data. We will demonstrate the usage of the indexers by first making selections based upon rows then by columns and finally by both rows and columns simultaneously.

### Using .iloc for DataFrames
**`.iloc`** accepts integers, slices and lists of integers, and returns the rows at those integer locations.
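A sketch of the three kinds of input on a tiny made-up stand-in for the movie data (the scores are copied from the rows shown earlier; the shortened titles are invented). As in plain Python, a slice excludes its end position.

```python
import pandas as pd

# Tiny made-up stand-in for the movie data (short titles invented)
df = pd.DataFrame({'imdb_score': [7.9, 7.1, 6.8, 8.5]},
                  index=['Avatar', 'Pirates', 'Spectre', 'Dark Knight'])

third = df.iloc[2]           # single integer -> a Series for the row at position 2
middle = df.iloc[1:3]        # slice -> DataFrame with positions 1 and 2 (3 excluded)
chosen = df.iloc[[0, 3]]     # list of integers -> DataFrame with exactly those rows
last = df.iloc[-1]           # negative integers count back from the end
```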

#### Single integer with .iloc
A single integer returns a Series. The DataFrame's column names become the index of this Series, and its name is the row's index label.

In [31]:
movie.iloc[10].head()

color                            Color
director_name              Zack Snyder
num_critic_for_reviews             673
duration                           183
director_facebook_likes              0
Name: Batman v Superman: Dawn of Justice, dtype: object

#### Slicing with .iloc

In [32]:
movie.iloc[10:15]

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Batman v Superman: Dawn of Justice,Color,Zack Snyder,673.0,183.0,0.0,2000.0,Lauren Cohan,15000.0,330249062.0,Action|Adventure|Sci-Fi,Henry Cavill,371639,24450,Alan D. Purwin,0.0,based on comic book|batman|sequel to a reboot|...,http://www.imdb.com/title/tt2975590/?ref_=fn_t...,3018.0,English,USA,PG-13,250000000.0,2016.0,4000.0,6.9,2.35,197000
Superman Returns,Color,Bryan Singer,434.0,169.0,0.0,903.0,Marlon Brando,18000.0,200069408.0,Action|Adventure|Sci-Fi,Kevin Spacey,240396,29991,Frank Langella,0.0,crystal|epic|lex luthor|lois lane|return to earth,http://www.imdb.com/title/tt0348150/?ref_=fn_t...,2367.0,English,USA,PG-13,209000000.0,2006.0,10000.0,6.1,2.35,0
Quantum of Solace,Color,Marc Forster,403.0,106.0,395.0,393.0,Mathieu Amalric,451.0,168368427.0,Action|Adventure,Giancarlo Giannini,330784,2023,Rory Kinnear,1.0,action hero|attempted rape|bond girl|official ...,http://www.imdb.com/title/tt0830515/?ref_=fn_t...,1243.0,English,UK,PG-13,200000000.0,2008.0,412.0,6.7,2.35,0
Pirates of the Caribbean: Dead Man's Chest,Color,Gore Verbinski,313.0,151.0,563.0,1000.0,Orlando Bloom,40000.0,423032628.0,Action|Adventure|Fantasy,Johnny Depp,522040,48486,Jack Davenport,2.0,box office hit|giant squid|heart|liar's dice|m...,http://www.imdb.com/title/tt0383574/?ref_=fn_t...,1832.0,English,USA,PG-13,225000000.0,2006.0,5000.0,7.3,2.35,5000
The Lone Ranger,Color,Gore Verbinski,450.0,150.0,563.0,1000.0,Ruth Wilson,40000.0,89289910.0,Action|Adventure|Western,Johnny Depp,181792,45757,Tom Wilkinson,1.0,horse|outlaw|texas|texas ranger|train,http://www.imdb.com/title/tt1210819/?ref_=fn_t...,711.0,English,USA,PG-13,215000000.0,2013.0,2000.0,6.5,2.35,48000


In [33]:
movie.iloc[100:300:50]

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
The Fast and the Furious,Color,Rob Cohen,187.0,106.0,357.0,4000.0,Vin Diesel,23000.0,144512310.0,Action|Crime|Thriller,Paul Walker,272223,45327,Jordana Brewster,2.0,eighteen wheeler|illegal street racing|truck|t...,http://www.imdb.com/title/tt0232500/?ref_=fn_t...,988.0,English,USA,PG-13,38000000.0,2001.0,14000.0,6.7,2.35,14000
Armageddon,Color,Michael Bay,167.0,153.0,0.0,537.0,Steve Buscemi,13000.0,201573391.0,Action|Adventure|Sci-Fi|Thriller,Bruce Willis,322395,26029,Will Patton,0.0,asteroid|astronaut|bomb|meteorite|outer space,http://www.imdb.com/title/tt0120591/?ref_=fn_t...,1171.0,English,USA,PG-13,140000000.0,1998.0,12000.0,6.6,2.35,11000
Harry Potter and the Sorcerer's Stone,Color,Chris Columbus,258.0,159.0,0.0,645.0,Fiona Shaw,11000.0,317557891.0,Adventure|Family|Fantasy,Daniel Radcliffe,444683,13191,Verne Troyer,4.0,based on novel|birthday|evil wizard|quidditch|...,http://www.imdb.com/title/tt0241527/?ref_=fn_t...,1571.0,English,UK,PG,125000000.0,2001.0,687.0,7.5,2.35,16000
The Patriot,Color,Roland Emmerich,192.0,142.0,776.0,1000.0,Adam Baldwin,13000.0,113330342.0,Action|Drama|History|War,Heath Ledger,207613,19454,Tom Wilkinson,1.0,american revolution|british|french|hero|standoff,http://www.imdb.com/title/tt0187393/?ref_=fn_t...,1144.0,English,USA,R,110000000.0,2000.0,2000.0,7.1,2.35,4000


#### Lists of integers with .iloc

In [34]:
movie.iloc[[1,100,1000]]

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,Johnny Depp,471220,48350,Jack Davenport,0.0,goddess|marriage ceremony|marriage proposal|pi...,http://www.imdb.com/title/tt0449088/?ref_=fn_t...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
The Fast and the Furious,Color,Rob Cohen,187.0,106.0,357.0,4000.0,Vin Diesel,23000.0,144512310.0,Action|Crime|Thriller,Paul Walker,272223,45327,Jordana Brewster,2.0,eighteen wheeler|illegal street racing|truck|t...,http://www.imdb.com/title/tt0232500/?ref_=fn_t...,988.0,English,USA,PG-13,38000000.0,2001.0,14000.0,6.7,2.35,14000
The Life Aquatic with Steve Zissou,Color,Wes Anderson,259.0,119.0,0.0,639.0,Anjelica Huston,13000.0,24006726.0,Adventure|Comedy|Drama,Bill Murray,139535,15757,Matthew Gray Gubler,8.0,expedition|oceanographer|sea|shark|team,http://www.imdb.com/title/tt0362270/?ref_=fn_t...,632.0,English,USA,R,50000000.0,2004.0,1000.0,7.3,2.35,0


In [35]:
# single item list returns a dataframe
movie.iloc[[10]]

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Batman v Superman: Dawn of Justice,Color,Zack Snyder,673.0,183.0,0.0,2000.0,Lauren Cohan,15000.0,330249062.0,Action|Adventure|Sci-Fi,Henry Cavill,371639,24450,Alan D. Purwin,0.0,based on comic book|batman|sequel to a reboot|...,http://www.imdb.com/title/tt2975590/?ref_=fn_t...,3018.0,English,USA,PG-13,250000000.0,2016.0,4000.0,6.9,2.35,197000


### Using .loc
**`.loc`** works only with index labels, so you must know the exact label (a string here, although labels can also be integers).
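Integer labels are where **`.loc`** and **`.iloc`** are easiest to confuse. A sketch with a made-up frame whose integer labels differ from the positions:

```python
import pandas as pd

# Made-up frame: integer labels 10, 20, 30 at positions 0, 1, 2
df = pd.DataFrame({'title': ['a', 'b', 'c']}, index=[10, 20, 30])

by_label = df.loc[10]       # the row labeled 10
by_position = df.iloc[0]    # the row at position 0 (the same row here)

try:
    df.loc[0]               # no row is *labeled* 0
except KeyError:
    print('KeyError: .loc looks up labels, not positions')
```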

#### Single String with .loc

In [36]:
movie.loc['The Fast and the Furious'].head()

color                          Color
director_name              Rob Cohen
num_critic_for_reviews           187
duration                         106
director_facebook_likes          357
Name: The Fast and the Furious, dtype: object

#### Slicing with .loc

In [37]:
movie.index[200]

"Harry Potter and the Sorcerer's Stone"

In [38]:
movie.index[600]

'Wall Street: Money Never Sleeps'

In [39]:
movie.loc["Harry Potter and the Sorcerer's Stone":'Wall Street: Money Never Sleeps'].head()

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Harry Potter and the Sorcerer's Stone,Color,Chris Columbus,258.0,159.0,0.0,645.0,Fiona Shaw,11000.0,317557891.0,Adventure|Family|Fantasy,Daniel Radcliffe,444683,13191,Verne Troyer,4.0,based on novel|birthday|evil wizard|quidditch|...,http://www.imdb.com/title/tt0241527/?ref_=fn_t...,1571.0,English,UK,PG,125000000.0,2001.0,687.0,7.5,2.35,16000
R.I.P.D.,Color,Robert Schwentke,208.0,96.0,124.0,1000.0,Jeff Bridges,16000.0,33592415.0,Action|Comedy|Fantasy,Ryan Reynolds,91640,31549,Stephanie Szostak,2.0,drug dealer|gold|partner|police|undead,http://www.imdb.com/title/tt0790736/?ref_=fn_t...,210.0,English,USA,PG-13,130000000.0,2013.0,12000.0,5.6,2.35,20000
Pirates of the Caribbean: The Curse of the Black Pearl,Color,Gore Verbinski,271.0,143.0,563.0,1000.0,Orlando Bloom,40000.0,305388685.0,Action|Adventure|Fantasy,Johnny Depp,809474,48184,Jack Davenport,3.0,caribbean|curse|governor|pirate|undead,http://www.imdb.com/title/tt0325980/?ref_=fn_t...,2113.0,English,USA,PG-13,140000000.0,2003.0,5000.0,8.1,2.35,10000
Harry Potter and the Deathly Hallows: Part I,Color,Matt Birch,4.0,,0.0,1000.0,Toby Jones,10000.0,,Fantasy,Rupert Grint,252,14719,Alfred Enoch,1.0,,http://www.imdb.com/title/tt1571403/?ref_=fn_t...,2.0,English,UK,,,2010.0,2000.0,6.4,,25
The Hunger Games: Mockingjay - Part 1,Color,Francis Lawrence,403.0,123.0,508.0,14000.0,Philip Seymour Hoffman,34000.0,337103873.0,Adventure|Sci-Fi|Thriller,Jennifer Lawrence,305008,81385,Josh Hutcherson,1.0,based on young adult novel|manipulation|rebell...,http://www.imdb.com/title/tt1951265/?ref_=fn_t...,591.0,English,USA,PG-13,125000000.0,2014.0,22000.0,6.7,2.35,52000
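
Label slices with `.loc` include **both** endpoints, which is different from integer slices with `.iloc` and from ordinary Python slicing. A minimal sketch on a small hypothetical DataFrame (the titles and durations are stand-ins, not rows read from `movie.csv`):

```python
import pandas as pd

# Hypothetical mini dataset standing in for movie.csv (values are illustrative)
df = pd.DataFrame(
    {"duration": [178, 169, 148, 164]},
    index=["Avatar", "Pirates", "Spectre", "Rises"],
)

# .loc slices by label and includes BOTH the start and the stop label
by_label = df.loc["Pirates":"Rises"]

# .iloc slices by position and, like a Python list, excludes the stop position
by_position = df.iloc[1:3]

print(list(by_label.index))     # ['Pirates', 'Spectre', 'Rises']
print(list(by_position.index))  # ['Pirates', 'Spectre']
```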


In [40]:
# slice to the end
movie.loc['The Hunger Games: Mockingjay - Part 1':].head()

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
The Hunger Games: Mockingjay - Part 1,Color,Francis Lawrence,403.0,123.0,508.0,14000.0,Philip Seymour Hoffman,34000.0,337103873.0,Adventure|Sci-Fi|Thriller,Jennifer Lawrence,305008,81385,Josh Hutcherson,1.0,based on young adult novel|manipulation|rebell...,http://www.imdb.com/title/tt1951265/?ref_=fn_t...,591.0,English,USA,PG-13,125000000.0,2014.0,22000.0,6.7,2.35,52000
The Da Vinci Code,Color,Ron Howard,294.0,174.0,2000.0,362.0,Seth Gabel,15000.0,217536138.0,Mystery|Thriller,Tom Hanks,314253,16008,Jürgen Prochnow,2.0,based on supposedly true story|holy grail|mary...,http://www.imdb.com/title/tt0382625/?ref_=fn_t...,1966.0,English,USA,PG-13,125000000.0,2006.0,574.0,6.6,2.35,0
Rio 2,Color,Carlos Saldanha,159.0,101.0,107.0,56.0,Rachel Crow,688.0,131536019.0,Adventure|Animation|Comedy|Family|Musical,Miguel Ferrer,58498,1031,Jeffrey Garcia,0.0,amazon|bird|father in law|jungle|no opening cr...,http://www.imdb.com/title/tt2357291/?ref_=fn_t...,99.0,English,USA,G,103000000.0,2014.0,237.0,6.4,2.35,0
X-Men 2,Color,Bryan Singer,289.0,134.0,0.0,346.0,Bruce Davison,20000.0,214948780.0,Action|Adventure|Fantasy|Sci-Fi|Thriller,Hugh Jackman,405973,20952,Aaron Stanford,4.0,mutant|prison|professor|school|x men,http://www.imdb.com/title/tt0290334/?ref_=fn_t...,1055.0,English,Canada,PG-13,110000000.0,2003.0,505.0,7.5,2.35,0
Fast Five,Color,Justin Lin,342.0,132.0,681.0,12000.0,Vin Diesel,23000.0,209805005.0,Action|Crime|Thriller,Paul Walker,284792,55345,Dwayne Johnson,3.0,drug lord|drugs|federal agent|heist|police,http://www.imdb.com/title/tt1596343/?ref_=fn_t...,366.0,English,USA,PG-13,125000000.0,2011.0,14000.0,7.3,2.35,54000
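
Open-ended label slices depend on pandas being able to locate the bound label. The sketch below, on a hypothetical three-row frame, shows what I understand to be the behaviour in current pandas: on an unsorted index a bound label that does not exist raises `KeyError`, while on a sorted index pandas uses the sort order to find where that label would fall. Check the indexing documentation for your pandas version before relying on the sorted-index case.

```python
import pandas as pd

# Hypothetical unsorted index, like movie titles kept in file order
df = pd.DataFrame({"duration": [178, 148, 93]}, index=["Avatar", "Spectre", "Brave"])

# Open-ended slice: from 'Spectre' through the last row
tail = df.loc["Spectre":]

# On an unsorted index, a bound label that does not exist raises KeyError
try:
    df.loc["Zorro":]
    raised = False
except KeyError:
    raised = True

# After sorting, pandas can place a missing bound using the sort order
from_b = df.sort_index().loc["B":]

print(list(tail.index))    # ['Spectre', 'Brave']
print(raised)              # True
print(list(from_b.index))  # ['Brave', 'Spectre']
```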


#### Lists with .loc

In [41]:
movie_list = ['Fast Five', 'Avatar', 'My Big Fat Greek Wedding']
movie.loc[movie_list]

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Fast Five,Color,Justin Lin,342.0,132.0,681.0,12000.0,Vin Diesel,23000.0,209805005.0,Action|Crime|Thriller,Paul Walker,284792,55345,Dwayne Johnson,3.0,drug lord|drugs|federal agent|heist|police,http://www.imdb.com/title/tt1596343/?ref_=fn_t...,366.0,English,USA,PG-13,125000000.0,2011.0,14000.0,7.3,2.35,54000
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,886204,4834,Wes Studi,0.0,avatar|future|marine|native|paraplegic,http://www.imdb.com/title/tt0499549/?ref_=fn_t...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
My Big Fat Greek Wedding,Color,Joel Zwick,168.0,95.0,38.0,249.0,Louis Mandylor,567.0,241437427.0,Comedy|Family|Romance,Nia Vardalos,102071,1495,Lainie Kazan,3.0,baptism|greek|greek restaurant|prejudice|trave...,http://www.imdb.com/title/tt0259446/?ref_=fn_t...,756.0,English,USA,PG,5000000.0,2002.0,312.0,6.6,1.85,5000
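
Two details about passing a list to `.loc`: rows come back in the order of the list, and since pandas 1.0 a list containing any label that is not in the index raises `KeyError`. If you want missing labels to produce empty rows instead, `reindex` is the tolerant alternative. A short sketch with a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"duration": [178, 148, 93]}, index=["Avatar", "Spectre", "Brave"])

# Rows come back in the order of the list, not the order of the DataFrame
picked = df.loc[["Brave", "Avatar"]]

# Any missing label in the list raises KeyError (pandas 1.0 and later)
try:
    df.loc[["Avatar", "Zorro"]]
    raised = False
except KeyError:
    raised = True

# reindex fills rows for missing labels with NaN instead of raising
tolerant = df.reindex(["Avatar", "Zorro"])

print(list(picked.index))                 # ['Brave', 'Avatar']
print(raised)                             # True
print(tolerant.loc["Zorro", "duration"])  # nan
```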


In [42]:
# a single-item list returns a DataFrame, not a Series
movie.loc[['Avatar']]

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,886204,4834,Wes Studi,0.0,avatar|future|marine|native|paraplegic,http://www.imdb.com/title/tt0499549/?ref_=fn_t...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
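
The difference between a bare label and a one-item list is easy to check with `type`. In this sketch (hypothetical two-row frame), the bare label returns the row as a Series whose index is the column names, while the list returns a one-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"director_name": ["James Cameron", "Sam Mendes"], "duration": [178, 148]},
    index=["Avatar", "Spectre"],
)

# A single label returns one row as a Series (column names become its index)
row_series = df.loc["Avatar"]

# The same label wrapped in a list returns a one-row DataFrame
row_frame = df.loc[["Avatar"]]

print(type(row_series).__name__)  # Series
print(type(row_frame).__name__)   # DataFrame
print(row_frame.shape)            # (1, 2)
```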


## Selecting columns with .iloc and .loc
We can also select columns with the .iloc and .loc indexers. Both indexers use a **comma** to separate the row selection from the column selection. To select only columns, put a **colon** before the comma (the colon selects all rows) and put the usual integer location or label selection after the comma.
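
As a quick sanity check, selecting one column by label with `.loc`, by position with `.iloc`, or with plain brackets should give the same Series. A sketch on a hypothetical two-column frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"color": ["Color", "Color"], "duration": [178, 148]},
    index=["Avatar", "Spectre"],
)

# The colon before the comma means "all rows"; the part after it selects columns
by_label = df.loc[:, "duration"]   # column by name
by_position = df.iloc[:, 1]        # column by integer position (0-based)
by_brackets = df["duration"]       # plain bracket selection, for comparison

print(by_label.equals(by_position) and by_position.equals(by_brackets))  # True
```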

### Select all rows with some columns using .iloc

In [43]:
# select all rows and the column at integer position 5 (the sixth column)
movie.iloc[:, 5].head()

movie_title
Avatar                                          855.0
Pirates of the Caribbean: At World's End       1000.0
Spectre                                         161.0
The Dark Knight Rises                         23000.0
Star Wars: Episode VII - The Force Awakens        NaN
Name: actor_3_facebook_likes, dtype: float64
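
Integer positions in `.iloc` may also be negative, counting back from the last column exactly as with Python lists. A sketch using a hypothetical one-row frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"color": ["Color"], "duration": [178], "gross": [760505847.0]},
    index=["Avatar"],
)

# -1 is the last column, -2 the one before it, and so on
last_col = df.iloc[:, -1]

# Negative and non-negative positions can be mixed in a list
first_and_last = df.iloc[:, [0, -1]]

print(last_col.name)                 # gross
print(list(first_and_last.columns))  # ['color', 'gross']
```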

In [44]:
# select all rows and the columns at integer positions 3, 6 and 10
movie.iloc[:, [3, 6, 10]].head()

movie_title,duration,actor_2_name,actor_1_name
Avatar,178.0,Joel David Moore,CCH Pounder
Pirates of the Caribbean: At World's End,169.0,Orlando Bloom,Johnny Depp
Spectre,148.0,Rory Kinnear,Christoph Waltz
The Dark Knight Rises,164.0,Christian Bale,Tom Hardy
Star Wars: Episode VII - The Force Awakens,,Rob Walker,Doug Walker
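
Because positions are hard to read, it can help to look up which column names a list of positions refers to by indexing `df.columns` directly. The columns are returned in the order the positions are given. A sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"color": ["Color"], "director_name": ["James Cameron"], "duration": [178]},
    index=["Avatar"],
)

# Indexing the columns Index with a list of positions reveals their names
names = df.columns[[2, 0]]

# .iloc returns the columns in the same order as the list of positions
subset = df.iloc[:, [2, 0]]

print(list(names))           # ['duration', 'color']
print(list(subset.columns))  # ['duration', 'color']
```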


In [45]:
# select all rows and the columns at positions 5 up to, but not including, 10
movie.iloc[:, 5:10].head()

movie_title,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres
Avatar,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi
Pirates of the Caribbean: At World's End,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy
Spectre,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller
The Dark Knight Rises,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller
Star Wars: Episode VII - The Force Awakens,,Rob Walker,131.0,,Documentary
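
Integer slices in `.iloc` follow the full Python `start:stop:step` syntax, so a step can pick every n-th column. A sketch on a hypothetical frame with five single-letter columns:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["a", "b", "c", "d", "e"])

# A step of 2 keeps positions 0, 2 and 4
every_other = df.iloc[:, ::2]

# Stop position 4 is excluded, so this keeps positions 1, 2 and 3
middle = df.iloc[:, 1:4]

print(list(every_other.columns))  # ['a', 'c', 'e']
print(list(middle.columns))       # ['b', 'c', 'd']
```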


### Select all rows with some columns using .loc

In [46]:
movie.loc[:, 'director_name'].head()

movie_title
Avatar                                            James Cameron
Pirates of the Caribbean: At World's End         Gore Verbinski
Spectre                                              Sam Mendes
The Dark Knight Rises                         Christopher Nolan
Star Wars: Episode VII - The Force Awakens          Doug Walker
Name: director_name, dtype: object

In [47]:
movie.loc[:, ['director_name', 'actor_1_name', 'actor_2_name']].head()

movie_title,director_name,actor_1_name,actor_2_name
Avatar,James Cameron,CCH Pounder,Joel David Moore
Pirates of the Caribbean: At World's End,Gore Verbinski,Johnny Depp,Orlando Bloom
Spectre,Sam Mendes,Christoph Waltz,Rory Kinnear
The Dark Knight Rises,Christopher Nolan,Tom Hardy,Christian Bale
Star Wars: Episode VII - The Force Awakens,Doug Walker,Doug Walker,Rob Walker


In [48]:
# slice up to and including actor_1_name
movie.loc[:, :'actor_1_name'].head()

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,Johnny Depp
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,Christoph Waltz
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,Tom Hardy
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,Doug Walker
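
To see how an inclusive label slice maps onto an exclusive positional slice, `Index.get_loc` can turn the stop label into a position; adding one makes the `.iloc` stop behave inclusively. A sketch with hypothetical column names borrowed from the movie data:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["color", "director_name", "duration", "gross"])

# Label slice: includes 'duration' itself
by_label = df.loc[:, :"duration"]

# Same columns by position: get_loc finds the position, +1 includes it in the slice
stop = df.columns.get_loc("duration") + 1
by_position = df.iloc[:, :stop]

print(list(by_label.columns))        # ['color', 'director_name', 'duration']
print(by_label.equals(by_position))  # True
```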


## Selecting Rows and Columns simultaneously with .iloc and .loc
Put a selection both before and after the comma to select rows and columns at the same time.
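
What you get back depends on the kind of selection on each side of the comma: a scalar on both sides gives a single value, a scalar on exactly one side gives a Series, and a list or slice on both sides gives a DataFrame. A sketch with a hypothetical two-row frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"director_name": ["James Cameron", "Sam Mendes"], "duration": [178, 148]},
    index=["Avatar", "Spectre"],
)

# scalar row, scalar column -> a single value
one_value = df.iloc[0, 1]
# slice of rows, scalar column -> a Series
one_column = df.iloc[:2, 1]
# scalar row, list of columns -> a Series holding one row
one_row = df.iloc[0, [0, 1]]
# list or slice on both sides -> a DataFrame
block = df.iloc[[0, 1], 0:2]

print(one_value)                  # 178
print(type(one_column).__name__)  # Series
print(type(one_row).__name__)     # Series
print(block.shape)                # (2, 2)
```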

### Selecting rows and columns simultaneously with .iloc

In [49]:
movie.iloc[4, 10]

'Doug Walker'

In [50]:
movie.iloc[4:8, 10]

movie_title
Star Wars: Episode VII - The Force Awakens     Doug Walker
John Carter                                   Daryl Sabara
Spider-Man 3                                  J.K. Simmons
Tangled                                       Brad Garrett
Name: actor_1_name, dtype: object

In [51]:
movie.iloc[4, 10:15]

actor_1_name                 Doug Walker
num_voted_users                        8
cast_total_facebook_likes            143
actor_3_name                         NaN
facenumber_in_poster                   0
Name: Star Wars: Episode VII - The Force Awakens, dtype: object

In [52]:
movie.iloc[:4, 10:15]

movie_title,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster
Avatar,CCH Pounder,886204,4834,Wes Studi,0.0
Pirates of the Caribbean: At World's End,Johnny Depp,471220,48350,Jack Davenport,0.0
Spectre,Christoph Waltz,275868,11700,Stephanie Sigman,1.0
The Dark Knight Rises,Tom Hardy,1144337,106759,Joseph Gordon-Levitt,0.0


In [53]:
movie.iloc[[1, 100, 50], [10, 5, 17]]

movie_title,actor_1_name,actor_3_facebook_likes,num_user_for_reviews
Pirates of the Caribbean: At World's End,Johnny Depp,1000.0,1238.0
The Fast and the Furious,Paul Walker,4000.0,988.0
The Great Gatsby,Leonardo DiCaprio,77.0,753.0


In [54]:
movie.iloc[[2], 6:10]

movie_title,actor_2_name,actor_1_facebook_likes,gross,genres
Spectre,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller


### Selecting rows and columns with .loc

In [55]:
movie.loc['My Big Fat Greek Wedding', 'director_name']

'Joel Zwick'

In [56]:
movie_list = ['The Rock', 'Spider-Man']
col_list = ['director_name', 'title_year']
movie.loc[movie_list, col_list]

movie_title,director_name,title_year
The Rock,Michael Bay,1996.0
Spider-Man,Sam Raimi,2002.0


In [57]:
# every 100th row from the first row up to and including 'The Rock'
movie.loc[:'The Rock':100, 'duration']

movie_title
Avatar                                   178.0
The Fast and the Furious                 106.0
Harry Potter and the Sorcerer's Stone    159.0
Epic                                     102.0
102 Dalmatians                           100.0
Pompeii                                  105.0
Name: duration, dtype: float64

In [58]:
movie.loc[movie_list, 'duration':]

movie_title,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
The Rock,136.0,0.0,904.0,Michael Biehn,12000.0,134006721.0,Action|Adventure|Thriller,Nicolas Cage,259492,15999,Bokeem Woodbine,1.0,alcatraz|fbi|general|hostage|rocket,http://www.imdb.com/title/tt0117500/?ref_=fn_t...,415.0,English,USA,R,75000000.0,1996.0,2000.0,7.4,2.35,51000
Spider-Man,121.0,0.0,4000.0,James Franco,24000.0,403706375.0,Action|Adventure|Fantasy|Romance,J.K. Simmons,544665,40484,Kirsten Dunst,0.0,evil|goblin|spider|spider man|superhero,http://www.imdb.com/title/tt0145487/?ref_=fn_t...,2012.0,English,USA,PG-13,139000000.0,2002.0,11000.0,7.3,1.85,5000


## Selections to Avoid
It was stated earlier that only column names go inside the brackets of a DataFrame. This isn't entirely true: passing a slice to the brackets returns rows. I strongly recommend against doing this. Use the brackets only for column names, **.loc** for label-based selection and **.iloc** for integer-position selection.

In [59]:
# select the first 5 rows with brackets - bad idea!
movie[:5]

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,886204,4834,Wes Studi,0.0,avatar|future|marine|native|paraplegic,http://www.imdb.com/title/tt0499549/?ref_=fn_t...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,Johnny Depp,471220,48350,Jack Davenport,0.0,goddess|marriage ceremony|marriage proposal|pi...,http://www.imdb.com/title/tt0449088/?ref_=fn_t...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,Christoph Waltz,275868,11700,Stephanie Sigman,1.0,bomb|espionage|sequel|spy|terrorist,http://www.imdb.com/title/tt2379713/?ref_=fn_t...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,Tom Hardy,1144337,106759,Joseph Gordon-Levitt,0.0,deception|imprisonment|lawlessness|police offi...,http://www.imdb.com/title/tt1345836/?ref_=fn_t...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,Doug Walker,8,143,,0.0,,http://www.imdb.com/title/tt5289954/?ref_=fn_t...,,,,,,,12.0,7.1,,0
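
The problem with plain brackets is that the same syntax selects along different axes depending on what you pass. This sketch, on a hypothetical three-row frame, shows a slice picking rows while a string picks a column, and the explicit indexers that say the same thing unambiguously:

```python
import pandas as pd

df = pd.DataFrame({"duration": [178, 169, 148]}, index=["Avatar", "Pirates", "Spectre"])

# A slice inside plain brackets selects ROWS by position...
rows = df[:2]
# ...while a string inside plain brackets selects a COLUMN
col = df["duration"]

# The explicit indexers state the axis and the kind of lookup
print(rows.equals(df.iloc[:2]))           # True
print(col.equals(df.loc[:, "duration"]))  # True
```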


### Using **`.ix`** to mix both integer and label indexing (THIS IS DEPRECATED: DO NOT USE)
The **`.ix`** indexer provided some flexibility to select by both integer position and index label simultaneously. This was a bad idea: it was deprecated in pandas 0.20 and removed entirely in pandas 1.0. You will still see it in older answers online, especially on Stack Overflow. Do not be tempted to use it!
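
If you genuinely need to combine an integer position with a label, one approach (a suggestion, not an official replacement) is to convert one of them so that a single indexer can be used. A sketch with a hypothetical three-row frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"director_name": ["James Cameron", "Sam Mendes", "Ron Howard"],
     "duration": [178, 148, 174]},
    index=["Avatar", "Spectre", "The Da Vinci Code"],
)

# Goal: the row at integer position 2 and the column labelled 'duration'
# Option 1: turn the position into a label, then use .loc
via_loc = df.loc[df.index[2], "duration"]
# Option 2: turn the label into a position, then use .iloc
via_iloc = df.iloc[2, df.columns.get_loc("duration")]

print(via_loc, via_iloc)  # 174 174
```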

### Selecting single elements with `.at` and `.iat`
In the rare case that you want to select exactly one cell of data, you can use **`.at`** for label-based selection and **`.iat`** for integer-based selection. They work like **`.loc`** and **`.iloc`** but provide no extra functionality; they are simply faster at retrieving a single value. Technically, a single element is called a **scalar** value.

In [60]:
movie.iat[5, 10]

'Daryl Sabara'

In [61]:
movie.at['My Big Fat Greek Wedding', 'actor_1_name']

'Nia Vardalos'
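
`.at` and `.iat` return the same scalar as the equivalent `.loc` and `.iloc` call, and they can also be used to assign a single cell. A sketch on a hypothetical two-row frame (the replacement string is a placeholder, not real data):

```python
import pandas as pd

df = pd.DataFrame(
    {"actor_1_name": ["CCH Pounder", "Nia Vardalos"]},
    index=["Avatar", "My Big Fat Greek Wedding"],
)

# .at mirrors .loc and .iat mirrors .iloc for a single cell
same_by_label = df.at["Avatar", "actor_1_name"] == df.loc["Avatar", "actor_1_name"]
same_by_position = df.iat[1, 0] == df.iloc[1, 0]

# Assigning one cell by label
df.at["Avatar", "actor_1_name"] = "(edited)"

print(same_by_label, same_by_position)  # True True
print(df.iat[0, 0])                     # (edited)
```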

# Your Turn

### Begin by running the code below
This ensures that you are using the correct data.

In [62]:
import pandas as pd
movie = pd.read_csv('data/movie.csv', index_col='movie_title')

### Problem 1
<span  style="color:green; font-size:16px">Use the **`describe`** method on the movie dataset and include only the object columns. Transpose the output.</span>

In [63]:
# your code here

### Problem 2
<span  style="color:green; font-size:16px">Use the **`type`** function to output the object type for both the index and the columns.</span>

In [64]:
# your code here

### Problem 3
<span  style="color:green; font-size:16px">Select all three actor name columns.</span>

In [65]:
# your code here

### Problem 4
<span  style="color:green; font-size:16px">Select the content rating column as a Series and then as a DataFrame.</span>

In [66]:
# your code here

### Problem 5
<span  style="color:green; font-size:16px">Select the 3rd and 5th rows from the movie dataset.</span>

In [67]:
# your code here

### Problem 6
<span  style="color:green; font-size:16px">Select the 3rd and 5th columns from the movie dataset.</span>

In [68]:
# your code here

### Problem 7
<span  style="color:green; font-size:16px">Select the first 5 rows and the last 5 columns.</span>

In [69]:
# your code here

### Problem 8
<span  style="color:green; font-size:16px">Select the movie 'The Dark Knight Rises'.</span>

In [70]:
# your code here

### Problem 9
<span  style="color:green; font-size:16px">The values of the index are stored in a NumPy array. NumPy arrays have no labels, so you select from them by integer location. Output the movie titles from position 50 to 100 using the values of the index.</span>

In [71]:
# your code here

### Problem 10
<span  style="color:green; font-size:16px">Select the first 3 rows and 3 columns using **`.loc`**.</span>

In [72]:
# your code here

### Problem 11
<span  style="color:green; font-size:16px">Select a single scalar value using **`.loc`**, then do the same thing with **`.at`**. Use `%timeit` to see the speed difference.</span>

In [73]:
# your code here