# Dataframe Calculations

## Introduction

There are cases where we might want to augment our Pandas DataFrame with calculated columns. Pandas enables us to perform these calculations and easily store them in a new column.

## Working with Constants 



We can add new calculated columns using an existing column and a constant value.

In [46]:
import numpy as np
import pandas as pd

In [47]:
# Read animals 
animals = pd.read_csv('https://raw.githubusercontent.com/loukjsmalbil/datasets_ws/master/animals.csv')
animals

Unnamed: 0,brainwt,bodywt,animal
0,3.385,44.500,Arctic_fox
1,0.480,15.499,Owl_monkey
2,1.350,8.100,Beaver
3,464.983,423.012,Cow
4,36.328,119.498,Gray_wolf
...,...,...,...
57,160.004,169.000,Brazilian_tapir
58,0.900,2.600,Tenrec
59,1.620,11.400,Phalanger
60,0.104,2.500,Tree_shrew


In [48]:
# Multiply animals['brainwt'] (in pounds) by 0.45359237 to convert it to kilograms and store the result in the brainwtkg column
animals['brainwtkg'] = animals['brainwt']  * 0.45359237

animals.head()

Unnamed: 0,brainwt,bodywt,animal,brainwtkg
0,3.385,44.5,Arctic_fox,1.53541
1,0.48,15.499,Owl_monkey,0.217724
2,1.35,8.1,Beaver,0.61235
3,464.983,423.012,Cow,210.912741
4,36.328,119.498,Gray_wolf,16.478104


Note that we used the head method to look at the first 5 rows of the DataFrame. We do this to confirm that the changes we made worked as expected.

## Combining Two (Or More) Columns



We can perform calculations using a combination of two or more columns. We write an expression that refers to the columns in the DataFrame and assign the result to a new column.



For example, we can compute the ratio of body weight to brain weight for all animals in our data and assign this value to a new column.

In [49]:
# Compute the ratio animals['bodywt'] / animals['brainwt'] and store it as animals['wtratio']
animals['wtratio'] = animals['bodywt'] / animals['brainwt'] 
animals.head()


Unnamed: 0,brainwt,bodywt,animal,brainwtkg,wtratio
0,3.385,44.5,Arctic_fox,1.53541,13.146233
1,0.48,15.499,Owl_monkey,0.217724,32.289583
2,1.35,8.1,Beaver,0.61235,6.0
3,464.983,423.012,Cow,210.912741,0.909736
4,36.328,119.498,Gray_wolf,16.478104,3.289419


## Conditional Calculations



It is possible to perform more complex calculations. For example, you may have noticed that we used division in the previous example without checking whether the denominator is zero. Pandas does not raise an error in that case; it silently produces inf (or NaN for 0/0), which can distort later calculations.

Therefore, we can introduce a condition in our assignment. 

We can create conditional columns using the *where function* in NumPy. We pass 3 arguments to the function. The *first* argument is the **condition**, the *second* is the value in case the condition is **true**, and the *third* is the value in case the condition is **false**.

In [51]:
# Make a new column that is 'High' where brainwtkg lies between 5 and 10 (exclusive) and 'Low' otherwise

#                    arguments: condition, value if true, value if false
animals['Low/High'] = np.where((animals['brainwtkg'] > 5) & (animals['brainwtkg'] < 10), 'High', 'Low')
animals

Unnamed: 0,brainwt,bodywt,animal,brainwtkg,wtratio,Low/High
0,3.385,44.500,Arctic_fox,1.535410,13.146233,Low
1,0.480,15.499,Owl_monkey,0.217724,32.289583,Low
2,1.350,8.100,Beaver,0.612350,6.000000,Low
3,464.983,423.012,Cow,210.912741,0.909736,Low
4,36.328,119.498,Gray_wolf,16.478104,3.289419,Low
...,...,...,...,...,...,...
57,160.004,169.000,Brazilian_tapir,72.576594,1.056224,Low
58,0.900,2.600,Tenrec,0.408233,2.888889,Low
59,1.620,11.400,Phalanger,0.734820,7.037037,Low
60,0.104,2.500,Tree_shrew,0.047174,24.038462,Low


In [13]:
# With only a condition, np.where returns the positions where the condition is True
np.where(animals['brainwtkg'] > 5)

(array([ 3,  4,  5,  6, 18, 20, 21, 27, 28, 29, 31, 32, 35, 41, 43, 44, 45,
        48, 55, 57]),)
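
The zero-denominator concern raised at the start of this section can be handled with the same three-argument form of np.where. The following is a minimal sketch on a small, made-up DataFrame (the `toy` frame and its values are assumptions for illustration, not part of the animals data): it keeps the ratio where the denominator is non-zero and stores NaN elsewhere.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the second row has a zero denominator
toy = pd.DataFrame({
    'bodywt': [10.0, 5.0, 8.0],
    'brainwt': [2.0, 0.0, 4.0],
})

# Unguarded division does not fail; it silently yields inf for the second row
unguarded = toy['bodywt'] / toy['brainwt']

# Guarded version: keep the ratio where the denominator is non-zero, NaN elsewhere
toy['wtratio'] = np.where(toy['brainwt'] != 0, unguarded, np.nan)

print(toy)
```

Storing NaN rather than inf is one possible design choice; it lets methods such as isnull() find the affected rows later.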

## Calculations Using Functions 


As we have learned in a previous lesson, Pandas DataFrames have 3 components: 

- rows; 
- columns; 
- data. 

The rows and columns are also called axes. Axis zero is the row axis and axis one is the column axis. A function applied along axis one runs across the columns, so it combines the values of several columns into a single result for each row.



Let's say we want to add up the numeric columns of the animals DataFrame row by row. We can do this by selecting those columns, calling the sum method, and passing axis=1 as an argument.

In [14]:
animals = pd.read_csv('https://raw.githubusercontent.com/loukjsmalbil/datasets_ws/master/animals.csv')

animals.head()

Unnamed: 0,brainwt,bodywt,animal
0,3.385,44.5,Arctic_fox
1,0.48,15.499,Owl_monkey
2,1.35,8.1,Beaver
3,464.983,423.012,Cow
4,36.328,119.498,Gray_wolf


In [26]:
# Sum weight values
animals['sum'] = animals[['brainwt', 'bodywt']].sum(axis=1)
animals.head()

Unnamed: 0,brainwt,bodywt,animal,sum
0,3.385,44.5,Arctic_fox,47.885
1,0.48,15.499,Owl_monkey,15.979
2,1.35,8.1,Beaver,9.45
3,464.983,423.012,Cow,887.995
4,36.328,119.498,Gray_wolf,155.826


In [24]:
# Alternatively, add the two columns directly and store the result in the column Total
animals['Total'] = animals['brainwt'] + animals['bodywt']
animals.head() 

Unnamed: 0,brainwt,bodywt,animal,sum,Total
0,3.385,44.5,Arctic_fox,47.885,47.885
1,0.48,15.499,Owl_monkey,15.979,15.979
2,1.35,8.1,Beaver,9.45,9.45
3,464.983,423.012,Cow,887.995,887.995
4,36.328,119.498,Gray_wolf,155.826,155.826
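
To make the role of the axis argument concrete, here is a small sketch on a made-up DataFrame (the `toy` frame and its column names are assumptions for illustration). It contrasts axis=0, which gives one total per column, with axis=1, which gives one total per row.

```python
import pandas as pd

# Hypothetical toy data with two numeric columns
toy = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 moves down the rows: one total per column
col_totals = toy.sum(axis=0)

# axis=1 moves across the columns: one total per row
row_totals = toy.sum(axis=1)

print(col_totals)
print(row_totals)
```

The axis=1 result has one entry per row, which is why it can be assigned directly as a new column.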


## Part II: Other Useful Functionalities



Apart from the basic operations we can carry out, Pandas also offers useful functionality to make our calculations more efficient and effective.

In [27]:
# Read csv cars
cars = pd.read_csv('vehicles/vehicles.csv')
cars.head()

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


In [32]:
# Group the data by 'Make' and compute the mean of Fuel Barrels/Year for each manufacturer
cars[['Make', 'Fuel Barrels/Year']].groupby('Make').mean()

Unnamed: 0_level_0,Fuel Barrels/Year
Make,Unnamed: 1_level_1
AM General,22.674670
ASC Incorporated,20.600625
Acura,15.673371
Alfa Romeo,17.208234
American Motors Corporation,18.758092
...,...
Volkswagen,14.594784
Volvo,16.186996
Wallace Environmental,24.404196
Yugo,13.206218
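
The grouping step can be sketched on a tiny, made-up DataFrame so the arithmetic is easy to check by hand (the `toy` frame and its values are assumptions; only the column names are borrowed from the cars data). The sketch selects the column after grouping and also shows agg, which computes several summaries in one call.

```python
import pandas as pd

# Hypothetical toy data: two rows for make 'A', one row for make 'B'
toy = pd.DataFrame({
    'Make': ['A', 'A', 'B'],
    'Fuel Barrels/Year': [10.0, 20.0, 30.0],
})

# Split the rows by Make, then average the selected column within each group
mean_by_make = toy.groupby('Make')['Fuel Barrels/Year'].mean()

# agg accepts a list of aggregation names and returns one column per statistic
summary = toy.groupby('Make')['Fuel Barrels/Year'].agg(['mean', 'max', 'count'])

print(mean_by_make)
print(summary)
```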


## Isin method



The isin method checks, row by row, whether a value appears in a given list. The resulting Boolean mask makes it easy to select a subset of the DataFrame.

In [34]:
# Year is stored as integers (int64), so we pass integers rather than strings to isin
cars_years = cars[cars['Year'].isin([2004, 2017])]
cars_years.head()  

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
38,Acura,3.5RL,2004,3.5,6.0,Automatic 4-spd,Front-Wheel Drive,Midsize Cars,Premium,18.311667,16,22,18,493.722222,2250
46,Acura,ILX,2017,2.4,4.0,Auto(AM-S8),Front-Wheel Drive,Compact Cars,Premium,11.365862,25,35,29,309.0,1400
126,Acura,MDX 4WD,2004,3.5,6.0,Automatic 5-spd,4-Wheel or All-Wheel Drive,Sport Utility Vehicle - 4WD,Premium,19.388824,15,21,17,522.764706,2400
140,Acura,MDX AWD,2017,3.5,6.0,Automatic (S9),All-Wheel Drive,Small Sport Utility Vehicle 4WD,Premium,14.982273,19,26,22,404.0,1850
141,Acura,MDX AWD,2017,3.5,6.0,Automatic (S9),All-Wheel Drive,Small Sport Utility Vehicle 4WD,Premium,15.695714,18,26,21,424.0,1950


In [41]:
# Select the cars made by Acura
cars[cars['Make'].isin(['Acura'])]

# Select the cars made by Audi or Lexus (only this last expression is displayed)
cars[cars['Make'].isin(['Audi', 'Lexus'])]

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
503,Audi,100,1989,2.3,5.0,Automatic 3-spd,Front-Wheel Drive,Midsize Cars,Regular,18.311667,17,20,18,493.722222,1850
504,Audi,100,1989,2.3,5.0,Manual 5-spd,Front-Wheel Drive,Midsize Cars,Regular,17.347895,16,23,19,467.736842,1750
505,Audi,100,1990,2.3,5.0,Automatic 3-spd,Front-Wheel Drive,Midsize Cars,Regular,18.311667,16,20,18,493.722222,1850
506,Audi,100,1990,2.3,5.0,Automatic 4-spd,Front-Wheel Drive,Midsize Cars,Regular,18.311667,16,22,18,493.722222,1850
507,Audi,100,1991,2.3,5.0,Automatic 4-spd,Front-Wheel Drive,Midsize Cars,Regular,18.311667,16,22,18,493.722222,1850
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21520,Lexus,SC 430,2006,4.3,8.0,Automatic (S6),Rear-Wheel Drive,Minicompact Cars,Premium,17.347895,17,23,19,467.736842,2150
21521,Lexus,SC 430,2007,4.3,8.0,Automatic (S6),Rear-Wheel Drive,Minicompact Cars,Premium,17.347895,16,23,19,467.736842,2150
21522,Lexus,SC 430,2008,4.3,8.0,Automatic (S6),Rear-Wheel Drive,Minicompact Cars,Premium,17.347895,16,23,19,467.736842,2150
21523,Lexus,SC 430,2009,4.3,8.0,Automatic (S6),Rear-Wheel Drive,Minicompact Cars,Premium,17.347895,16,23,19,467.736842,2150


In [69]:
# Use the tilde (~) to negate the mask: select every car that is not an Acura
cars[~cars['Make'].isin(['Acura'])]

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.437500,2550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,243.000000,1100
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,246.000000,1100


In [70]:
# Use the tilde (~) again: select every car whose Fuel Type is not Regular
cars[~cars['Fuel Type'].isin(['Regular'])]

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.437500,2550
14,Acura,2.5TL,1995,2.5,5.0,Automatic 4-spd,Front-Wheel Drive,Compact Cars,Premium,16.480500,18,23,20,444.350000,2000
15,Acura,2.5TL/3.2TL,1996,2.5,5.0,Automatic 4-spd,Front-Wheel Drive,Compact Cars,Premium,16.480500,18,23,20,444.350000,2000
16,Acura,2.5TL/3.2TL,1996,3.2,6.0,Automatic 4-spd,Front-Wheel Drive,Compact Cars,Premium,17.347895,17,22,19,467.736842,2150
17,Acura,2.5TL/3.2TL,1997,2.5,5.0,Automatic 4-spd,Front-Wheel Drive,Compact Cars,Premium,16.480500,18,23,20,444.350000,2000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,243.000000,1100
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,246.000000,1100
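
The mask that isin produces can be inspected on its own, which helps when a filter returns unexpected rows. Below is a minimal sketch on a made-up DataFrame (the `toy` frame and its values are assumptions for illustration). The list passed to isin uses integers, matching the type of the column.

```python
import pandas as pd

# Hypothetical toy data; Year holds integers, as in the cars data
toy = pd.DataFrame({
    'Make': ['Acura', 'Audi', 'Lexus', 'Audi'],
    'Year': [2004, 2010, 2017, 2004],
})

# isin returns a Boolean Series: True where the value appears in the list
in_years = toy['Year'].isin([2004, 2017])

# Use the mask to keep matching rows, or negate it with ~ to keep the rest
selected = toy[in_years]
excluded = toy[~in_years]

print(in_years.tolist())
print(excluded)
```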


## Sort Values



The sort_values method sorts the rows of a DataFrame by the values in one or more columns.

In [54]:
# Sort by year in ascending order using the by argument of .sort_values
cars.sort_values(by = 'Year', ascending = True)

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
16337,GMC,T15 (S15) Pickup 4WD,1984,2.8,6.0,Manual 4-spd,4-Wheel or All-Wheel Drive,Standard Pickup Trucks 4WD,Regular,19.388824,15,21,17,522.764706,1950
16336,GMC,T15 (S15) Pickup 4WD,1984,2.8,6.0,Manual 5-spd,4-Wheel or All-Wheel Drive,Standard Pickup Trucks 4WD,Regular,18.311667,15,22,18,493.722222,1850
5636,Chevrolet,El Camino Pickup 2WD,1984,3.8,6.0,Automatic 3-spd,2-Wheel Drive,Standard Pickup Trucks 2WD,Regular,19.388824,15,19,17,522.764706,1950
5637,Chevrolet,El Camino Pickup 2WD,1984,3.8,6.0,Automatic 4-spd,2-Wheel Drive,Standard Pickup Trucks 2WD,Regular,18.311667,15,22,18,493.722222,1850
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1968,BMW,340i xDrive,2017,3.0,6.0,Automatic (S8),All-Wheel Drive,Compact Cars,Premium,13.184400,21,31,25,357.000000,1600
3137,Bentley,Continental GT,2017,4.0,8.0,Automatic (S8),All-Wheel Drive,Compact Cars,Premium,17.347895,15,25,19,475.000000,2150
1965,BMW,340i,2017,3.0,6.0,Manual 6-spd,Rear-Wheel Drive,Compact Cars,Premium,14.330870,19,29,23,395.000000,1750
21091,Land Rover,Range Rover Evoque Convertible,2017,2.0,4.0,Automatic (S9),4-Wheel Drive,Small Sport Utility Vehicle 4WD,Premium,14.330870,20,28,23,391.000000,1750
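
sort_values also accepts a list of columns, with a matching list of sort directions. The sketch below uses a made-up DataFrame (the `toy` frame and its values are assumptions; only the column names are borrowed from the cars data) to sort by year first and, within each year, by City MPG from highest to lowest.

```python
import pandas as pd

# Hypothetical toy data in no particular order
toy = pd.DataFrame({
    'Year': [2017, 1984, 2017, 1984],
    'City MPG': [21, 18, 15, 13],
})

# Sort by Year ascending, then by City MPG descending within each year
sorted_toy = toy.sort_values(by=['Year', 'City MPG'], ascending=[True, False])

print(sorted_toy)
```

sort_values returns a new DataFrame and leaves `toy` unchanged, so the result has to be assigned to a name to be kept.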


## Count



Another useful method is the .count method, which returns the number of non-null values. Combined with groupby, it tells us how many entries each group has in our DataFrame.

In [55]:
# Count the Model entries recorded for each year (group by 'Year')
cars[['Model', 'Year']].groupby('Year').count()

Unnamed: 0_level_0,Model
Year,Unnamed: 1_level_1
1984,645
1985,1581
1986,1188
1987,1198
1988,1119
1989,1127
1990,1068
1991,1122
1992,1107
1993,1077


In [56]:
# Use value_counts to count how many rows each manufacturer in cars['Make'] has
cars['Make'].value_counts()

Chevrolet                           3643
Ford                                2946
Dodge                               2360
GMC                                 2347
Toyota                              1836
                                    ... 
Aurora Cars Ltd                        1
Volga Associated Automobile            1
Environmental Rsch and Devp Corp       1
Lambda Control Systems                 1
Mahindra                               1
Name: Make, Length: 127, dtype: int64
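
Because count tallies entries rather than distinct values, it can help to compare it with nunique and value_counts side by side. The following sketch uses a made-up DataFrame (the `toy` frame and its values are assumptions for illustration) with a repeated model and one missing value.

```python
import pandas as pd

# Hypothetical toy data: model 'X' appears twice in 1984 and one model is missing
toy = pd.DataFrame({
    'Year': [1984, 1984, 1984, 1985],
    'Model': ['X', 'X', None, 'Y'],
})

# count: number of non-null Model entries per year (the missing value is skipped)
entries_per_year = toy.groupby('Year')['Model'].count()

# nunique: number of distinct non-null models per year
distinct_per_year = toy.groupby('Year')['Model'].nunique()

# value_counts: number of rows for each value of a single column
rows_per_year = toy['Year'].value_counts()

print(entries_per_year)
print(distinct_per_year)
print(rows_per_year)
```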

## Info() and isnull()



The info method gives us a useful overview of the DataFrame: how many rows and columns there are, how many non-null values each column holds, and the type of each column.

In [57]:
# Info 
cars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35952 entries, 0 to 35951
Data columns (total 15 columns):
Make                       35952 non-null object
Model                      35952 non-null object
Year                       35952 non-null int64
Engine Displacement        35952 non-null float64
Cylinders                  35952 non-null float64
Transmission               35952 non-null object
Drivetrain                 35952 non-null object
Vehicle Class              35952 non-null object
Fuel Type                  35952 non-null object
Fuel Barrels/Year          35952 non-null float64
City MPG                   35952 non-null int64
Highway MPG                35952 non-null int64
Combined MPG               35952 non-null int64
CO2 Emission Grams/Mile    35952 non-null float64
Fuel Cost/Year             35952 non-null int64
dtypes: float64(4), int64(5), object(6)
memory usage: 4.1+ MB


Another very useful tool is the .isnull() method, which flags missing values (NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike arrays) by returning True wherever a value is missing.

In [58]:
# Read housing df
housing = pd.read_csv('https://raw.githubusercontent.com/loukjsmalbil/datasets_ws/master/housing_prices.csv')
housing

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1455,1456,60,RL,62.0,7917,Pave,,Reg,Lvl,AllPub,...,0,,,,0,8,2007,WD,Normal,175000
1456,1457,20,RL,85.0,13175,Pave,,Reg,Lvl,AllPub,...,0,,MnPrv,,0,2,2010,WD,Normal,210000
1457,1458,70,RL,66.0,9042,Pave,,Reg,Lvl,AllPub,...,0,,GdPrv,Shed,2500,5,2010,WD,Normal,266500
1458,1459,20,RL,68.0,9717,Pave,,Reg,Lvl,AllPub,...,0,,,,0,4,2010,WD,Normal,142125


In [59]:
# Check for null values using isnull() and value_counts() for Alley
housing['Alley'].isnull().value_counts()

True     1369
False      91
Name: Alley, dtype: int64

In [60]:
# isna() is an alias of isnull(), so it gives the same result for Alley
housing['Alley'].isna().value_counts()

True     1369
False      91
Name: Alley, dtype: int64
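
To check every column at once rather than one at a time, the Boolean result of isnull can be summed, since True counts as 1. This is a minimal sketch on a made-up DataFrame (the `toy` frame and its values are assumptions; only the column names are borrowed from the housing data).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values in both columns
toy = pd.DataFrame({
    'Alley': ['Grvl', None, None],
    'LotArea': [8450.0, 9600.0, np.nan],
})

# isnull gives a Boolean DataFrame; summing it counts missing values per column
missing_per_column = toy.isnull().sum()

print(missing_per_column)
```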

## Datetime



Sometimes we want to convert numeric values, such as years, to datetime objects.

In [61]:
# Reload the cars data and display the first rows
cars = pd.read_csv('vehicles/vehicles.csv')
cars.head()

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


In [62]:
cars.dtypes

Make                        object
Model                       object
Year                         int64
Engine Displacement        float64
Cylinders                  float64
Transmission                object
Drivetrain                  object
Vehicle Class               object
Fuel Type                   object
Fuel Barrels/Year          float64
City MPG                     int64
Highway MPG                  int64
Combined MPG                 int64
CO2 Emission Grams/Mile    float64
Fuel Cost/Year               int64
dtype: object

In [63]:
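# Convert Year to datetime using pd.to_datetime; format='%Y' makes pandas read each value as a calendar year
# (without a format, the integers would be interpreted as nanoseconds since 1970-01-01)
cars['Year'] = pd.to_datetime(cars['Year'].astype(str), format='%Y')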
cars.dtypes

Make                               object
Model                              object
Year                       datetime64[ns]
Engine Displacement               float64
Cylinders                         float64
Transmission                       object
Drivetrain                         object
Vehicle Class                      object
Fuel Type                          object
Fuel Barrels/Year                 float64
City MPG                            int64
Highway MPG                         int64
Combined MPG                        int64
CO2 Emission Grams/Mile           float64
Fuel Cost/Year                      int64
dtype: object
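
The dtype alone does not show whether the conversion produced the intended dates, so it is worth checking the values as well. The sketch below, on a made-up Series of years (the `years` Series is an assumption for illustration), contrasts a conversion without a format, where plain integers are read as nanoseconds since 1970-01-01, with one that parses the values as calendar years.

```python
import pandas as pd

# Hypothetical Series of integer years
years = pd.Series([1984, 1985, 2017])

# Without a format, integers are treated as nanoseconds since the epoch
as_nanoseconds = pd.to_datetime(years)

# Converting to strings and passing format='%Y' parses each value as a year
as_years = pd.to_datetime(years.astype(str), format='%Y')

print(as_nanoseconds.dt.year.tolist())
print(as_years.dt.year.tolist())
```

Both results have a datetime dtype, which is why inspecting a few values is a more reliable check than looking at dtypes.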

## Correlation



We can also compute the correlation coefficient between each pair of numeric columns.

In [64]:
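# Compute the correlation on the animals set using Pearson's method (the default).
# numeric_only=True skips text columns such as 'animal', which pandas 2.0+ would otherwise reject.
animals_corr = animals.corr(numeric_only=True)  # OR animals.corr(method='pearson', numeric_only=True)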

animals_corr

Unnamed: 0,brainwt,bodywt,brainwtkg,wtratio
brainwt,1.0,0.934154,1.0,-0.206684
bodywt,0.934154,1.0,0.934154,-0.199439
brainwtkg,1.0,0.934154,1.0,-0.206684
wtratio,-0.206684,-0.199439,-0.206684,1.0


In [65]:
# numeric_only=True restricts the calculation to the numeric columns
car_corr = cars.corr(numeric_only=True)
car_corr

Unnamed: 0,Engine Displacement,Cylinders,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
Engine Displacement,1.0,0.901858,0.789752,-0.740317,-0.715039,-0.746782,0.80352,0.769678
Cylinders,0.901858,1.0,0.739517,-0.703866,-0.650287,-0.698648,0.752393,0.778153
Fuel Barrels/Year,0.789752,0.739517,1.0,-0.877752,-0.909664,-0.909743,0.986189,0.916208
City MPG,-0.740317,-0.703866,-0.877752,1.0,0.923856,0.985457,-0.894139,-0.858645
Highway MPG,-0.715039,-0.650287,-0.909664,0.923856,1.0,0.969392,-0.926405,-0.851404
Combined MPG,-0.746782,-0.698648,-0.909743,0.985457,0.969392,1.0,-0.926229,-0.875185
CO2 Emission Grams/Mile,0.80352,0.752393,0.986189,-0.894139,-0.926405,-0.926229,1.0,0.930865
Fuel Cost/Year,0.769678,0.778153,0.916208,-0.858645,-0.851404,-0.875185,0.930865,1.0
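
To get a feel for how to read such a matrix, it can help to run corr on data whose relationships are known in advance. This sketch uses a made-up DataFrame (the `toy` frame and its column names are assumptions for illustration) in which one column rises in step with x and another falls in step with it.

```python
import pandas as pd

# Hypothetical toy data: 'up' is 2 * x, 'down' decreases linearly as x increases
toy = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'up': [2, 4, 6, 8],
    'down': [8, 6, 4, 2],
})

# Pearson correlation between every pair of columns
corr = toy.corr()

# Series.corr gives the coefficient for a single pair of columns
x_up = toy['x'].corr(toy['up'])

print(corr)
```

A perfectly increasing linear relationship gives a coefficient of 1, a perfectly decreasing one gives -1, and every column correlates perfectly with itself, which explains the diagonal of ones.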


## Unique method



Lastly, we can obtain all unique values in a dataframe column. 

In [66]:
# Select all unique car manufacturers
cars['Make'].unique() 

array(['AM General', 'ASC Incorporated', 'Acura', 'Alfa Romeo',
       'American Motors Corporation', 'Aston Martin', 'Audi',
       'Aurora Cars Ltd', 'Autokraft Limited', 'BMW', 'BMW Alpina',
       'Bentley', 'Bertone', 'Bill Dovell Motor Car Company',
       'Bitter Gmbh and Co. Kg', 'Bugatti', 'Buick', 'CCC Engineering',
       'CX Automotive', 'Cadillac', 'Chevrolet', 'Chrysler',
       'Consulier Industries Inc', 'Dabryan Coach Builders Inc', 'Dacia',
       'Daewoo', 'Daihatsu', 'Dodge', 'E. P. Dutton, Inc.', 'Eagle',
       'Environmental Rsch and Devp Corp', 'Evans Automobiles',
       'Excalibur Autos', 'Federal Coach', 'Ferrari', 'Fiat', 'Fisker',
       'Ford', 'GMC', 'General Motors', 'Genesis', 'Geo', 'Goldacre',
       'Grumman Allied Industries', 'Grumman Olson', 'Honda', 'Hummer',
       'Hyundai', 'Import Foreign Auto Sales Inc',
       'Import Trade Services', 'Infiniti', 'Isis Imports Ltd', 'Isuzu',
       'J.K. Motors', 'JBA Motorcars, Inc.', 'Jaguar', 'Jeep', 'Ki

## Summary 


In this lesson we learned different ways to create calculated columns. We computed a new column by combining existing data with a constant. We also computed a calculated column using two existing columns as well as using a conditional function to create a calculated column. Finally, we looked at some additional functionalities such as correlation matrices and methods to detect missing values. 