# Data Manipulation

## Introduction 



One reason Pandas has become such a popular tool for data analysts over the last few years is that it makes data transformation and manipulation much faster and easier. In this lesson, we will take an introductory look at how to rename, reorder, filter, and select data as we prepare it for analysis.



For this lesson, we will be using the same vehicles data set that we practiced importing and exporting in the Import and Export lesson. Let's go ahead and import the CSV version of the data set and see what it actually looks like.

In [1]:
import pandas as pd

In [2]:
# read 'vehicles/vehicles.csv' as data
data = pd.read_csv('vehicles/vehicles.csv')
data.head()

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


## Renaming Columns 



Data will often come either without column names or with column names that are not as intuitive as they could be. When this is the case, we want to assign descriptive names to the columns so that we remember what the values in each column represent. Intuitively naming your columns before diving in and analyzing your data is a good habit to develop.

Pandas provides us with a couple of different ways to modify column names. For example, the **.columns attribute** returns an Index containing all the column names in the data set.

In [5]:
# display columns
data.columns

Index(['Make', 'Model', 'Year', 'Engine Displacement', 'Cylinders',
       'Transmission', 'Drivetrain', 'Vehicle Class', 'Fuel Type',
       'Fuel Barrels/Year', 'City MPG', 'Highway MPG', 'Combined MPG',
       'CO2 Emission Grams/Mile', 'Fuel Cost/Year'],
      dtype='object')

If you want to set the column names for every column in the data set, or change the names of multiple columns, you can just **assign the columns attribute a list *with the same number of column names* as the data has columns, and Pandas will update all the column names.**

In the example below, we are updating the **Make column name to Manufacturer** and the **Engine Displacement column name to Displacement** using this method.

In [6]:
# Rename columns
data.columns = ['Manufacturer','Model','Year','Displacement',
                'Cylinders','Transmission','Drivetrain',
                'Vehicle Class','Fuel Type','Fuel Barrels/Year',
                'City MPG','Highway MPG','Combined MPG',
                'CO2 Emission Grams/Mile','Fuel Cost/Year']
 
data.columns

Index(['Manufacturer', 'Model', 'Year', 'Displacement', 'Cylinders',
       'Transmission', 'Drivetrain', 'Vehicle Class', 'Fuel Type',
       'Fuel Barrels/Year', 'City MPG', 'Highway MPG', 'Combined MPG',
       'CO2 Emission Grams/Mile', 'Fuel Cost/Year'],
      dtype='object')
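
The same-length requirement can be checked on a small example. The sketch below uses a hypothetical three-column frame (`df` is an illustration, not the vehicles data) and shows that a list of the wrong length is rejected with a `ValueError` instead of being applied partially.

```python
import pandas as pd

# Hypothetical three-column frame standing in for the vehicles data.
df = pd.DataFrame({"Make": ["Ford", "BMW"],
                   "Model": ["Aerostar Van", "5 Series"],
                   "Year": [1986, 1986]})

# A list with one name per column renames the columns by position.
df.columns = ["Manufacturer", "Model", "Year"]

# A list of the wrong length raises an error and leaves the names as they were.
try:
    df.columns = ["Manufacturer", "Model"]
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(list(df.columns))
print(type(mismatch_error).__name__)
```

Because names are matched by position, it is worth printing `data.columns` first and copying the list from there before editing individual names.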

If you want to rename just a single column, or just a few columns, you can use the **.rename method** and pass a dictionary that maps existing column names to new column names to the columns parameter. Below, we change the Manufacturer column from the previous example back to Make using the rename method; the Displacement column is left as it is. Note that rename returns a new data frame rather than modifying the original, so you need to assign the result to a variable if you want to keep it.

In [7]:
# rename columns using .rename
data.rename(columns = {'Manufacturer' : 'Make'}) 

Unnamed: 0,Make,Model,Year,Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.437500,2550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,243.000000,1100
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,246.000000,1100
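
A minimal sketch of how `.rename` behaves, using a hypothetical one-row frame rather than the vehicles data: the call returns a renamed copy, and the original keeps its names until the result is assigned back.

```python
import pandas as pd

# Hypothetical frame with the two renamed columns from the example above.
df = pd.DataFrame({"Manufacturer": ["Ford"], "Displacement": [3.0]})

# Map old names to new names; columns not mentioned keep their names.
renamed = df.rename(columns={"Manufacturer": "Make",
                             "Displacement": "Engine Displacement"})

print(list(df.columns))       # the original frame is unchanged
print(list(renamed.columns))  # the returned frame carries the new names
```

To keep the change under the original variable name, assign the result back, for example `df = df.rename(columns={...})`.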


## Changing Column Order



You can also **reorder columns** in a data frame. To do this, create a list containing the data frame's column names in the order you would like them, then select those columns from the data frame as follows. Any column left out of the list is dropped from the result.

In [9]:
data = pd.read_csv('vehicles/vehicles.csv')
data.columns

Index(['Make', 'Model', 'Year', 'Engine Displacement', 'Cylinders',
       'Transmission', 'Drivetrain', 'Vehicle Class', 'Fuel Type',
       'Fuel Barrels/Year', 'City MPG', 'Highway MPG', 'Combined MPG',
       'CO2 Emission Grams/Mile', 'Fuel Cost/Year'],
      dtype='object')

In [11]:
# change column order: swap Year and Model, move Vehicle Class before Transmission, etc.
# note: 'City MPG' is not in the list, so it is dropped from the result
column_order = ['Make','Year','Model','Vehicle Class',
                'Transmission','Drivetrain','Fuel Type',
                'Cylinders','Engine Displacement','Fuel Barrels/Year',
                'Highway MPG','Combined MPG',
                'CO2 Emission Grams/Mile','Fuel Cost/Year']


In [16]:
data = data[column_order]
#data[['Year','Make']]

In [17]:
data.columns

Index(['Make', 'Year', 'Model', 'Vehicle Class', 'Transmission', 'Drivetrain',
       'Fuel Type', 'Cylinders', 'Engine Displacement', 'Fuel Barrels/Year',
       'Highway MPG', 'Combined MPG', 'CO2 Emission Grams/Mile',
       'Fuel Cost/Year'],
      dtype='object')
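
Selecting with a list keeps only the listed columns, which is why City MPG no longer appears above. The sketch below, on a hypothetical four-column frame, shows that behavior and one possible way to move some columns to the front without losing the rest; the names `front` and `rest` are illustrative.

```python
import pandas as pd

# Hypothetical frame standing in for the vehicles data.
df = pd.DataFrame({"Make": ["Acura"], "Model": ["TSX"],
                   "Year": [2010], "City MPG": [21]})

# Only the listed columns are returned, in the order given.
subset = df[["Make", "Year", "Model"]]

# To reorder without dropping anything, append every column not yet listed.
front = ["Make", "Year"]
rest = [col for col in df.columns if col not in front]
reordered = df[front + rest]

print(list(subset.columns))
print(list(reordered.columns))
```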

## Filtering Records



When working with data, analysts often need to **filter the data based on one or more conditional statements**. This is similar to adding a WHERE clause to a query in SQL. 

For example, suppose we needed to filter our data set for all Ford vehicles that had 6 or more cylinders and a combined MPG of less than 18. We could enter our conditions inside square brackets to subset the data set for just the records that meet the conditions we've specified.

In [19]:
data

Unnamed: 0,Make,Year,Model,Vehicle Class,Transmission,Drivetrain,Fuel Type,Cylinders,Engine Displacement,Fuel Barrels/Year,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,1984,DJ Po Vehicle 2WD,Special Purpose Vehicle 2WD,Automatic 3-spd,2-Wheel Drive,Regular,4.0,2.5,19.388824,17,17,522.764706,1950
1,AM General,1984,FJ8c Post Office,Special Purpose Vehicle 2WD,Automatic 3-spd,2-Wheel Drive,Regular,6.0,4.2,25.354615,13,13,683.615385,2550
2,AM General,1985,Post Office DJ5 2WD,Special Purpose Vehicle 2WD,Automatic 3-spd,Rear-Wheel Drive,Regular,4.0,2.5,20.600625,17,16,555.437500,2100
3,AM General,1985,Post Office DJ8 2WD,Special Purpose Vehicle 2WD,Automatic 3-spd,Rear-Wheel Drive,Regular,6.0,4.2,25.354615,13,13,683.615385,2550
4,ASC Incorporated,1987,GNX,Midsize Cars,Automatic 4-spd,Rear-Wheel Drive,Premium,6.0,3.8,20.600625,21,16,555.437500,2550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,2013,fortwo coupe,Two Seaters,Auto(AM5),Rear-Wheel Drive,Premium,3.0,1.0,9.155833,38,36,244.000000,1100
35948,smart,2014,fortwo coupe,Two Seaters,Auto(AM5),Rear-Wheel Drive,Premium,3.0,1.0,9.155833,38,36,243.000000,1100
35949,smart,2015,fortwo coupe,Two Seaters,Auto(AM5),Rear-Wheel Drive,Premium,3.0,1.0,9.155833,38,36,244.000000,1100
35950,smart,2016,fortwo coupe,Two Seaters,Auto(AM6),Rear-Wheel Drive,Premium,3.0,0.9,9.155833,39,36,246.000000,1100


In [21]:
# filter data where Make == 'Ford' AND Cylinders >= 6 AND 'Combined MPG' < 18
new_data = data[(data['Make'] == 'Ford') & (data['Cylinders'] >= 6) & (data['Combined MPG'] < 18)]
new_data.head()

Unnamed: 0,Make,Year,Model,Vehicle Class,Transmission,Drivetrain,Fuel Type,Cylinders,Engine Displacement,Fuel Barrels/Year,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
11442,Ford,1986,Aerostar Van,Vans,Automatic 4-spd,Rear-Wheel Drive,Regular,6.0,2.8,19.388824,21,17,522.764706,1950
11450,Ford,1988,Aerostar Van,Vans,Automatic 4-spd,Rear-Wheel Drive,Regular,6.0,3.0,19.388824,20,17,522.764706,1950
11452,Ford,1989,Aerostar Van,Vans,Automatic 4-spd,Rear-Wheel Drive,Regular,6.0,3.0,19.388824,21,17,522.764706,1950
11456,Ford,1990,Aerostar Van,Vans,Automatic 4-spd,Rear-Wheel Drive,Regular,6.0,4.0,19.388824,20,17,522.764706,1950
11459,Ford,1991,Aerostar Van,Vans,Automatic 4-spd,Rear-Wheel Drive,Regular,6.0,4.0,19.388824,20,17,522.764706,1950


There are a couple of important things to note here: 

- First, when you want to apply **multiple conditions**, you need to use an "and" operator (&) or an "or" operator (|) between your conditions. The "and" operator will return records where both of the conditions surrounding it are true, and the "or" operator will return records where at least one of the conditions surrounding it is true. 
- The second thing to note is that all our conditional statements are **enclosed in parentheses**. This is easy to forget, but necessary: Python evaluates & and | before comparisons such as == and >=, so without the parentheses the expression will typically raise an error or select the wrong records.
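
The two points above can be sketched on a small, hypothetical frame. The example also uses `Series.isin`, a pandas method that is not covered elsewhere in this lesson, as a shorthand for several "or" comparisons on the same column.

```python
import pandas as pd

# Hypothetical frame standing in for the vehicles data.
df = pd.DataFrame({"Make": ["BMW", "Audi", "Ford", "BMW"],
                   "Year": [2016, 2015, 2016, 2014]})

# Each comparison is wrapped in parentheses before combining with & or |.
mask = ((df["Make"] == "BMW") | (df["Make"] == "Audi")) & (df["Year"] >= 2016)

# isin expresses the same "BMW or Audi" test in a single call.
mask_isin = df["Make"].isin(["BMW", "Audi"]) & (df["Year"] >= 2016)

selected = df[mask]
print(selected)
print(mask.equals(mask_isin))
```

Storing the condition in a named variable such as `mask` also makes long filters easier to read and reuse.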

In [25]:
new_data_bmw_audi = data[((data['Make'] == 'BMW') & (data['Year'] >= 2016)) | ((data['Make'] == 'Audi') & (data['Year'] >= 2016))]
new_data_bmw_audi

Unnamed: 0,Make,Year,Model,Vehicle Class,Transmission,Drivetrain,Fuel Type,Cylinders,Engine Displacement,Fuel Barrels/Year,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
638,Audi,2016,A3,Subcompact Cars,Auto(AM-S6),Front-Wheel Drive,Regular,4.0,1.8,12.207778,33,27,328.0,1250
640,Audi,2016,A3 Cabriolet,Subcompact Cars,Auto(AM-S6),Front-Wheel Drive,Regular,4.0,1.8,11.771786,35,28,314.0,1200
642,Audi,2016,A3 Cabriolet quattro,Subcompact Cars,Auto(AM-S6),All-Wheel Drive,Regular,4.0,2.0,12.677308,32,26,334.0,1300
643,Audi,2016,A3 e-tron,Compact Cars,Auto(AM-S6),Front-Wheel Drive,Premium and Electricity,4.0,1.4,5.863280,37,35,158.0,1150
644,Audi,2016,A3 e-tron ultra,Compact Cars,Auto(AM-S6),Front-Wheel Drive,Premium and Electricity,4.0,1.4,5.115503,41,39,138.0,1050
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3038,BMW,2016,Z4 sDrive28i,Two Seaters,Manual 6-spd,Rear-Wheel Drive,Premium,4.0,2.0,12.677308,34,26,337.0,1550
3058,BMW,2016,Z4 sDrive35i,Two Seaters,Auto(AM-S7),Rear-Wheel Drive,Premium,6.0,3.0,16.480500,24,20,454.0,2000
3064,BMW,2016,Z4 sDrive35is,Two Seaters,Auto(AM-S7),Rear-Wheel Drive,Premium,6.0,3.0,16.480500,24,20,454.0,2000
3071,BMW,2016,i3 REX,Subcompact Cars,Automatic (A1),Rear-Wheel Drive,Premium Gas or Electricity,2.0,0.6,1.563190,37,39,37.0,1050


In [28]:
new_data_bmw_audi = data[((data['Make'] == 'BMW') | (data['Make'] == 'Audi')) & (data['Year'] >= 2016)]
new_data_bmw_audi

Unnamed: 0,Make,Year,Model,Vehicle Class,Transmission,Drivetrain,Fuel Type,Cylinders,Engine Displacement,Fuel Barrels/Year,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
638,Audi,2016,A3,Subcompact Cars,Auto(AM-S6),Front-Wheel Drive,Regular,4.0,1.8,12.207778,33,27,328.0,1250
640,Audi,2016,A3 Cabriolet,Subcompact Cars,Auto(AM-S6),Front-Wheel Drive,Regular,4.0,1.8,11.771786,35,28,314.0,1200
642,Audi,2016,A3 Cabriolet quattro,Subcompact Cars,Auto(AM-S6),All-Wheel Drive,Regular,4.0,2.0,12.677308,32,26,334.0,1300
643,Audi,2016,A3 e-tron,Compact Cars,Auto(AM-S6),Front-Wheel Drive,Premium and Electricity,4.0,1.4,5.863280,37,35,158.0,1150
644,Audi,2016,A3 e-tron ultra,Compact Cars,Auto(AM-S6),Front-Wheel Drive,Premium and Electricity,4.0,1.4,5.115503,41,39,138.0,1050
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3038,BMW,2016,Z4 sDrive28i,Two Seaters,Manual 6-spd,Rear-Wheel Drive,Premium,4.0,2.0,12.677308,34,26,337.0,1550
3058,BMW,2016,Z4 sDrive35i,Two Seaters,Auto(AM-S7),Rear-Wheel Drive,Premium,6.0,3.0,16.480500,24,20,454.0,2000
3064,BMW,2016,Z4 sDrive35is,Two Seaters,Auto(AM-S7),Rear-Wheel Drive,Premium,6.0,3.0,16.480500,24,20,454.0,2000
3071,BMW,2016,i3 REX,Subcompact Cars,Automatic (A1),Rear-Wheel Drive,Premium Gas or Electricity,2.0,0.6,1.563190,37,39,37.0,1050


## Retrieving Information from the Dataframe



You can also retrieve "high-level" information from the data frame using the dtypes attribute and the describe method.



In [29]:
# Inspect the data type of each column
data.dtypes

Make                        object
Year                         int64
Model                       object
Vehicle Class               object
Transmission                object
Drivetrain                  object
Fuel Type                   object
Cylinders                  float64
Engine Displacement        float64
Fuel Barrels/Year          float64
Highway MPG                  int64
Combined MPG                 int64
CO2 Emission Grams/Mile    float64
Fuel Cost/Year               int64
dtype: object

In [30]:
# Use the describe method to get summary statistics for the numeric columns
data.describe()

Unnamed: 0,Year,Cylinders,Engine Displacement,Fuel Barrels/Year,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
count,35952.0,35952.0,35952.0,35952.0,35952.0,35952.0,35952.0,35952.0
mean,2000.7164,5.765076,3.338493,17.609056,23.880646,19.929322,475.316339,1892.598465
std,10.08529,1.755268,1.359395,4.467283,5.890876,5.112409,119.060773,506.958627
min,1984.0,2.0,0.6,0.06,9.0,7.0,37.0,600.0
25%,1991.0,4.0,2.2,14.699423,20.0,16.0,395.0,1500.0
50%,2001.0,6.0,3.0,17.347895,24.0,19.0,467.736842,1850.0
75%,2010.0,6.0,4.3,20.600625,27.0,23.0,555.4375,2200.0
max,2017.0,16.0,8.4,47.087143,61.0,56.0,1269.571429,5800.0


In [31]:
data.describe().transpose()   

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Year,35952.0,2000.7164,10.08529,1984.0,1991.0,2001.0,2010.0,2017.0
Cylinders,35952.0,5.765076,1.755268,2.0,4.0,6.0,6.0,16.0
Engine Displacement,35952.0,3.338493,1.359395,0.6,2.2,3.0,4.3,8.4
Fuel Barrels/Year,35952.0,17.609056,4.467283,0.06,14.699423,17.347895,20.600625,47.087143
Highway MPG,35952.0,23.880646,5.890876,9.0,20.0,24.0,27.0,61.0
Combined MPG,35952.0,19.929322,5.112409,7.0,16.0,19.0,23.0,56.0
CO2 Emission Grams/Mile,35952.0,475.316339,119.060773,37.0,395.0,467.736842,555.4375,1269.571429
Fuel Cost/Year,35952.0,1892.598465,506.958627,600.0,1500.0,1850.0,2200.0,5800.0
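
As the output above shows, `describe` summarizes only the numeric columns by default. A minimal sketch on a hypothetical frame shows two related options: the `shape` attribute for the row and column counts, and `describe(include="all")` to summarize text columns as well.

```python
import pandas as pd

# Hypothetical frame with one text column and two numeric columns.
df = pd.DataFrame({"Make": ["Ford", "Ford", "BMW"],
                   "Year": [1986, 1988, 2016],
                   "Combined MPG": [17, 17, 26]})

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape

# By default, describe covers the numeric columns only.
numeric_summary = df.describe()

# include="all" adds count/unique/top/freq rows for the text columns.
full_summary = df.describe(include="all")

print(n_rows, n_cols)
print(list(numeric_summary.columns))
print(full_summary.loc[["unique", "top"], "Make"])
```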


## .loc and .iloc

We saw that we were able to select and retrieve specific information in our data frame using techniques such as conditional filtering. We also saw that we could select any number of columns from a data frame by passing a list of column names inside square brackets. However, now suppose that we want to select not only specific columns, but also one or more specific rows. In that case, we can use the .loc and .iloc indexers.

.loc is short for location and lets you select a subset of the data based on the row index label and column name. Consider the following example. 

In [34]:
data.head(20)

Unnamed: 0,Make,Year,Model,Vehicle Class,Transmission,Drivetrain,Fuel Type,Cylinders,Engine Displacement,Fuel Barrels/Year,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,1984,DJ Po Vehicle 2WD,Special Purpose Vehicle 2WD,Automatic 3-spd,2-Wheel Drive,Regular,4.0,2.5,19.388824,17,17,522.764706,1950
1,AM General,1984,FJ8c Post Office,Special Purpose Vehicle 2WD,Automatic 3-spd,2-Wheel Drive,Regular,6.0,4.2,25.354615,13,13,683.615385,2550
2,AM General,1985,Post Office DJ5 2WD,Special Purpose Vehicle 2WD,Automatic 3-spd,Rear-Wheel Drive,Regular,4.0,2.5,20.600625,17,16,555.4375,2100
3,AM General,1985,Post Office DJ8 2WD,Special Purpose Vehicle 2WD,Automatic 3-spd,Rear-Wheel Drive,Regular,6.0,4.2,25.354615,13,13,683.615385,2550
4,ASC Incorporated,1987,GNX,Midsize Cars,Automatic 4-spd,Rear-Wheel Drive,Premium,6.0,3.8,20.600625,21,16,555.4375,2550
5,Acura,1997,2.2CL/3.0CL,Subcompact Cars,Automatic 4-spd,Front-Wheel Drive,Regular,4.0,2.2,14.982273,26,22,403.954545,1500
6,Acura,1997,2.2CL/3.0CL,Subcompact Cars,Manual 5-spd,Front-Wheel Drive,Regular,4.0,2.2,13.73375,28,24,370.291667,1400
7,Acura,1997,2.2CL/3.0CL,Subcompact Cars,Automatic 4-spd,Front-Wheel Drive,Regular,6.0,3.0,16.4805,26,20,444.35,1650
8,Acura,1998,2.3CL/3.0CL,Subcompact Cars,Automatic 4-spd,Front-Wheel Drive,Regular,4.0,2.3,14.982273,27,22,403.954545,1500
9,Acura,1998,2.3CL/3.0CL,Subcompact Cars,Manual 5-spd,Front-Wheel Drive,Regular,4.0,2.3,13.73375,29,24,370.291667,1400


In [32]:
data.loc[20, 'Year']

1998

In [None]:
data.loc[20, ['Year', 'Model']]

In [35]:
data.loc[20]

Make                                   Acura
Year                                    1998
Model                            2.5TL/3.2TL
Vehicle Class                   Compact Cars
Transmission                 Automatic 4-spd
Drivetrain                 Front-Wheel Drive
Fuel Type                            Premium
Cylinders                                  6
Engine Displacement                      3.2
Fuel Barrels/Year                    17.3479
Highway MPG                               22
Combined MPG                              19
CO2 Emission Grams/Mile              467.737
Fuel Cost/Year                          2150
Name: 20, dtype: object

**Important:** Keep in mind that passing a single label (single brackets) returns a pd.Series, whereas passing a list of labels (double brackets) returns a pd.DataFrame object, even when the list holds only one label. 

In [38]:
data.loc[[20]]

Unnamed: 0,Make,Year,Model,Vehicle Class,Transmission,Drivetrain,Fuel Type,Cylinders,Engine Displacement,Fuel Barrels/Year,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
20,Acura,1998,2.5TL/3.2TL,Compact Cars,Automatic 4-spd,Front-Wheel Drive,Premium,6.0,3.2,17.347895,22,19,467.736842,2150


In [39]:
data.loc[[2034]] 

Unnamed: 0,Make,Year,Model,Vehicle Class,Transmission,Drivetrain,Fuel Type,Cylinders,Engine Displacement,Fuel Barrels/Year,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
2034,BMW,1986,5 Series,Compact Cars,Automatic 4-spd,Rear-Wheel Drive,Regular,6.0,3.4,20.600625,20,16,555.4375,2100
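
The difference between single and double brackets can be confirmed by checking the type of each result. The sketch below uses a hypothetical two-row frame whose index labels mirror the rows selected above.

```python
import pandas as pd

# Hypothetical frame; the index labels 20 and 2034 mirror the examples above.
df = pd.DataFrame({"Make": ["Acura", "BMW"], "Year": [1998, 1986]},
                  index=[20, 2034])

row_as_series = df.loc[20]     # single label -> Series
row_as_frame = df.loc[[20]]    # list of labels -> DataFrame with one row
col_as_series = df["Year"]     # single column name -> Series
col_as_frame = df[["Year"]]    # list of column names -> DataFrame

print(type(row_as_series).__name__, type(row_as_frame).__name__)
print(type(col_as_series).__name__, type(col_as_frame).__name__)
```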


The .loc indexer can also be used to retrieve a specific subset of the data based on a condition, such as a column matching a particular value. 

In [40]:
data.loc[data['Make'] == "BMW"] 

Unnamed: 0,Make,Year,Model,Vehicle Class,Transmission,Drivetrain,Fuel Type,Cylinders,Engine Displacement,Fuel Barrels/Year,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
1398,BMW,2010,128ci Convertible,Subcompact Cars,Automatic (S6),Rear-Wheel Drive,Premium,6.0,3.0,15.695714,27,21,423.190476,1950
1399,BMW,2010,128ci Convertible,Subcompact Cars,Manual 6-spd,Rear-Wheel Drive,Premium,6.0,3.0,14.982273,28,22,403.954545,1850
1400,BMW,2011,128ci Convertible,Subcompact Cars,Manual 6-spd,Rear-Wheel Drive,Premium,6.0,3.0,14.982273,28,22,403.954545,1850
1401,BMW,2011,128ci Convertible,Subcompact Cars,Automatic (S6),Rear-Wheel Drive,Premium,6.0,3.0,15.695714,27,21,423.190476,1950
1402,BMW,2012,128ci Convertible,Subcompact Cars,Automatic (S6),Rear-Wheel Drive,Premium,6.0,3.0,15.695714,27,21,423.190476,1950
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3070,BMW,2015,i3 REX,Subcompact Cars,Automatic (A1),Rear-Wheel Drive,Premium Gas or Electricity,2.0,0.6,1.563190,37,39,40.000000,1050
3071,BMW,2016,i3 REX,Subcompact Cars,Automatic (A1),Rear-Wheel Drive,Premium Gas or Electricity,2.0,0.6,1.563190,37,39,37.000000,1050
3072,BMW,2014,i8,Subcompact Cars,Automatic 6-spd,All-Wheel Drive,Premium and Electricity,3.0,1.5,7.356924,29,28,198.000000,1450
3073,BMW,2015,i8,Subcompact Cars,Automatic 6-spd,All-Wheel Drive,Premium and Electricity,3.0,1.5,7.356924,29,28,198.000000,1450


**Important:** This is equivalent to the following. 

In [41]:
data[data['Make'] == "BMW"]

Unnamed: 0,Make,Year,Model,Vehicle Class,Transmission,Drivetrain,Fuel Type,Cylinders,Engine Displacement,Fuel Barrels/Year,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
1398,BMW,2010,128ci Convertible,Subcompact Cars,Automatic (S6),Rear-Wheel Drive,Premium,6.0,3.0,15.695714,27,21,423.190476,1950
1399,BMW,2010,128ci Convertible,Subcompact Cars,Manual 6-spd,Rear-Wheel Drive,Premium,6.0,3.0,14.982273,28,22,403.954545,1850
1400,BMW,2011,128ci Convertible,Subcompact Cars,Manual 6-spd,Rear-Wheel Drive,Premium,6.0,3.0,14.982273,28,22,403.954545,1850
1401,BMW,2011,128ci Convertible,Subcompact Cars,Automatic (S6),Rear-Wheel Drive,Premium,6.0,3.0,15.695714,27,21,423.190476,1950
1402,BMW,2012,128ci Convertible,Subcompact Cars,Automatic (S6),Rear-Wheel Drive,Premium,6.0,3.0,15.695714,27,21,423.190476,1950
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3070,BMW,2015,i3 REX,Subcompact Cars,Automatic (A1),Rear-Wheel Drive,Premium Gas or Electricity,2.0,0.6,1.563190,37,39,40.000000,1050
3071,BMW,2016,i3 REX,Subcompact Cars,Automatic (A1),Rear-Wheel Drive,Premium Gas or Electricity,2.0,0.6,1.563190,37,39,37.000000,1050
3072,BMW,2014,i8,Subcompact Cars,Automatic 6-spd,All-Wheel Drive,Premium and Electricity,3.0,1.5,7.356924,29,28,198.000000,1450
3073,BMW,2015,i8,Subcompact Cars,Automatic 6-spd,All-Wheel Drive,Premium and Electricity,3.0,1.5,7.356924,29,28,198.000000,1450
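
One practical difference is that `.loc` also accepts a column selection alongside the condition. A minimal sketch, assuming a small hypothetical frame in place of the vehicles data:

```python
import pandas as pd

# Hypothetical frame standing in for the vehicles data.
df = pd.DataFrame({"Make": ["BMW", "Audi", "BMW"],
                   "Year": [2010, 2016, 2016],
                   "Model": ["128ci Convertible", "A3", "i3 REX"]})

is_bmw = df["Make"] == "BMW"

# Both forms return the same rows.
same_rows = df.loc[is_bmw].equals(df[is_bmw])

# With .loc, a list of column names can follow the row condition.
bmw_models = df.loc[is_bmw, ["Year", "Model"]]

print(same_rows)
print(bmw_models)
```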


Now suppose that we want to retrieve information based on the integer **position** of the row and/or column rather than its label. In that case, we can use the .iloc indexer. Positions start at 0, and the end of a slice is excluded.

In [43]:
# Retrieve rows at positions 7 through 17 (the end of the slice, 18, is excluded)
data.iloc[7:18]  

Unnamed: 0,Make,Year,Model,Vehicle Class,Transmission,Drivetrain,Fuel Type,Cylinders,Engine Displacement,Fuel Barrels/Year,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
7,Acura,1997,2.2CL/3.0CL,Subcompact Cars,Automatic 4-spd,Front-Wheel Drive,Regular,6.0,3.0,16.4805,26,20,444.35,1650
8,Acura,1998,2.3CL/3.0CL,Subcompact Cars,Automatic 4-spd,Front-Wheel Drive,Regular,4.0,2.3,14.982273,27,22,403.954545,1500
9,Acura,1998,2.3CL/3.0CL,Subcompact Cars,Manual 5-spd,Front-Wheel Drive,Regular,4.0,2.3,13.73375,29,24,370.291667,1400
10,Acura,1998,2.3CL/3.0CL,Subcompact Cars,Automatic 4-spd,Front-Wheel Drive,Regular,6.0,3.0,16.4805,26,20,444.35,1650
11,Acura,1999,2.3CL/3.0CL,Subcompact Cars,Automatic 4-spd,Front-Wheel Drive,Regular,4.0,2.3,14.982273,27,22,403.954545,1500
12,Acura,1999,2.3CL/3.0CL,Subcompact Cars,Manual 5-spd,Front-Wheel Drive,Regular,4.0,2.3,13.73375,29,24,370.291667,1400
13,Acura,1999,2.3CL/3.0CL,Subcompact Cars,Automatic 4-spd,Front-Wheel Drive,Regular,6.0,3.0,16.4805,26,20,444.35,1650
14,Acura,1995,2.5TL,Compact Cars,Automatic 4-spd,Front-Wheel Drive,Premium,5.0,2.5,16.4805,23,20,444.35,2000
15,Acura,1996,2.5TL/3.2TL,Compact Cars,Automatic 4-spd,Front-Wheel Drive,Premium,5.0,2.5,16.4805,23,20,444.35,2000
16,Acura,1996,2.5TL/3.2TL,Compact Cars,Automatic 4-spd,Front-Wheel Drive,Premium,6.0,3.2,17.347895,22,19,467.736842,2150


In [49]:
data['Transmission']

0        Automatic 3-spd
1        Automatic 3-spd
2        Automatic 3-spd
3        Automatic 3-spd
4        Automatic 4-spd
              ...       
35947          Auto(AM5)
35948          Auto(AM5)
35949          Auto(AM5)
35950          Auto(AM6)
35951       Manual 5-spd
Name: Transmission, Length: 35952, dtype: object

In [50]:
data.iloc[:, 4]

0        Automatic 3-spd
1        Automatic 3-spd
2        Automatic 3-spd
3        Automatic 3-spd
4        Automatic 4-spd
              ...       
35947          Auto(AM5)
35948          Auto(AM5)
35949          Auto(AM5)
35950          Auto(AM6)
35951       Manual 5-spd
Name: Transmission, Length: 35952, dtype: object

In [51]:
# Retrieve rows at positions 7 through 17 from the column at position 4
data.iloc[7:18, 4]

7     Automatic 4-spd
8     Automatic 4-spd
9        Manual 5-spd
10    Automatic 4-spd
11    Automatic 4-spd
12       Manual 5-spd
13    Automatic 4-spd
14    Automatic 4-spd
15    Automatic 4-spd
16    Automatic 4-spd
17    Automatic 4-spd
Name: Transmission, dtype: object

In [52]:
# Same selection, but passing the column position as a list returns a DataFrame
data.iloc[7:18, [4]]

Unnamed: 0,Transmission
7,Automatic 4-spd
8,Automatic 4-spd
9,Manual 5-spd
10,Automatic 4-spd
11,Automatic 4-spd
12,Manual 5-spd
13,Automatic 4-spd
14,Automatic 4-spd
15,Automatic 4-spd
16,Automatic 4-spd


In [53]:
# Retrieve rows at positions 291 through 299 from the columns at positions 4, 8 and 2
data.iloc[291:300, [4,8,2]]

Unnamed: 0,Transmission,Engine Displacement,Model
291,Manual 6-spd,2.4,TSX
292,Automatic (S5),3.5,TSX
293,Automatic (S5),2.4,TSX Wagon
294,Automatic (S5),2.4,TSX Wagon
295,Automatic (S5),2.4,TSX Wagon
296,Automatic (S5),2.4,TSX Wagon
297,Automatic 4-spd,2.5,Vigor
298,Manual 5-spd,2.5,Vigor
299,Automatic 4-spd,2.5,Vigor
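
In the vehicles data the index labels happen to equal the row positions, which hides the difference between `.loc` and `.iloc`. The sketch below uses a hypothetical frame whose labels start at 10 to show that `.iloc` slices by position and excludes the end, while `.loc` slices by label and includes it.

```python
import pandas as pd

# Hypothetical frame whose index labels (10..13) differ from positions (0..3).
df = pd.DataFrame({"Model": ["TSX", "TSX Wagon", "Vigor", "GNX"]},
                  index=[10, 11, 12, 13])

by_position = df.iloc[1:3]   # positions 1 and 2; the end position is excluded
by_label = df.loc[11:13]     # labels 11, 12 and 13; the end label is included

print(by_position.index.tolist())
print(by_label.index.tolist())
```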


## Summary 

In this lesson, we learned a variety of ways to manipulate data frames. We started by covering how to change the names of a data frame's columns and the order in which those columns appear. We then filtered records using one or more conditions. After that, we looked at how we could easily obtain useful summary information about the data frame as a whole. At the end of this lesson, we looked at the .loc and .iloc indexers. 

In the following lesson, we will look at more advanced data frame operations. 