# Pandas: grouping

In [118]:
import pandas as pd
import numpy as np

In [119]:
cars = pd.read_csv("vehicles.csv")

In [120]:
cars[0:500]

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.437500,2550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,Aston Martin,Vantage GT,2016,4.7,8.0,Auto(AM7),Rear-Wheel Drive,Two Seaters,Premium,20.600625,14,21,16,552.000000,2550
496,Aston Martin,Virage,2012,5.9,12.0,Automatic (S6),Rear-Wheel Drive,Minicompact Cars,Premium,21.974000,13,18,15,592.466667,2700
497,Aston Martin,Virage Saloon,1991,5.3,8.0,Automatic 3-spd,Rear-Wheel Drive,Subcompact Cars,Premium,27.467500,10,14,12,740.583333,3400
498,Aston Martin,Virage Saloon,1991,5.3,8.0,Manual 5-spd,Rear-Wheel Drive,Subcompact Cars,Premium,25.354615,11,16,13,683.615385,3100


In [121]:
cars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35952 entries, 0 to 35951
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Make                     35952 non-null  object 
 1   Model                    35952 non-null  object 
 2   Year                     35952 non-null  int64  
 3   Engine Displacement      35952 non-null  float64
 4   Cylinders                35952 non-null  float64
 5   Transmission             35952 non-null  object 
 6   Drivetrain               35952 non-null  object 
 7   Vehicle Class            35952 non-null  object 
 8   Fuel Type                35952 non-null  object 
 9   Fuel Barrels/Year        35952 non-null  float64
 10  City MPG                 35952 non-null  int64  
 11  Highway MPG              35952 non-null  int64  
 12  Combined MPG             35952 non-null  int64  
 13  CO2 Emission Grams/Mile  35952 non-null  float64
 14  Fuel Cost/Year        

First exploration of the dataset:

- How many observations does it have?
- Look at all the columns: do you understand what they mean?
- Look at the raw data: do you see anything weird?
- Look at the data types: are they the expected ones for the information the column contains?

In [122]:
cars.shape

(35952, 15)

### Cleaning and wrangling data

- Some car brand names refer to the same brand. Replace all brand names that contain the word "Dutton" with simply "Dutton". If you find similar examples, clean their names too. Use `loc` with boolean indexing.

- Convert CO2 emissions from Grams/Mile to Grams/Km.

- Create a binary column that indicates only whether a car's transmission is automatic or manual. Use `pandas.Series.str.startswith`.

- Convert the MPG columns to km per liter.
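The first and third tasks above can be sketched on a tiny toy frame. This is only an illustration under stated assumptions: the frame `toy` and the column name `Is Automatic` are made up for the example, and the make names are borrowed from the `value_counts()` output of this dataset.

```python
import pandas as pd

# Toy data (hypothetical frame; not the real vehicles.csv).
toy = pd.DataFrame({
    "Make": ["E. P. Dutton, Inc.", "S and S Coach Company  E.p. Dutton", "Ford"],
    "Transmission": ["Automatic 3-spd", "Manual 5-spd", "Auto(AM7)"],
})

# Boolean mask: True where the make contains the literal text "Dutton".
is_dutton = toy["Make"].str.contains("Dutton", regex=False)

# loc with boolean indexing: overwrite only the matching rows.
toy.loc[is_dutton, "Make"] = "Dutton"

# Binary column (name is an assumption): True for automatic, False for manual.
toy["Is Automatic"] = toy["Transmission"].str.startswith("Auto")

print(toy)
```

Using `loc` with a mask changes the column in place and avoids rebuilding the whole column with `map`.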

In [123]:
cars['Make'].value_counts()

Chevrolet                             3643
Ford                                  2946
Dodge                                 2360
GMC                                   2347
Toyota                                1836
                                      ... 
Fisker                                   1
S and S Coach Company  E.p. Dutton       1
Environmental Rsch and Devp Corp         1
Lambda Control Systems                   1
London Taxi                              1
Name: Make, Length: 127, dtype: int64

In [124]:
list(cars['Make'].unique())

['AM General',
 'ASC Incorporated',
 'Acura',
 'Alfa Romeo',
 'American Motors Corporation',
 'Aston Martin',
 'Audi',
 'Aurora Cars Ltd',
 'Autokraft Limited',
 'BMW',
 'BMW Alpina',
 'Bentley',
 'Bertone',
 'Bill Dovell Motor Car Company',
 'Bitter Gmbh and Co. Kg',
 'Bugatti',
 'Buick',
 'CCC Engineering',
 'CX Automotive',
 'Cadillac',
 'Chevrolet',
 'Chrysler',
 'Consulier Industries Inc',
 'Dabryan Coach Builders Inc',
 'Dacia',
 'Daewoo',
 'Daihatsu',
 'Dodge',
 'E. P. Dutton, Inc.',
 'Eagle',
 'Environmental Rsch and Devp Corp',
 'Evans Automobiles',
 'Excalibur Autos',
 'Federal Coach',
 'Ferrari',
 'Fiat',
 'Fisker',
 'Ford',
 'GMC',
 'General Motors',
 'Genesis',
 'Geo',
 'Goldacre',
 'Grumman Allied Industries',
 'Grumman Olson',
 'Honda',
 'Hummer',
 'Hyundai',
 'Import Foreign Auto Sales Inc',
 'Import Trade Services',
 'Infiniti',
 'Isis Imports Ltd',
 'Isuzu',
 'J.K. Motors',
 'JBA Motorcars, Inc.',
 'Jaguar',
 'Jeep',
 'Kia',
 'Laforza Automobile Inc',
 'Lambda Control

In [125]:
cars['Make'].unique()

array(['AM General', 'ASC Incorporated', 'Acura', 'Alfa Romeo',
       'American Motors Corporation', 'Aston Martin', 'Audi',
       'Aurora Cars Ltd', 'Autokraft Limited', 'BMW', 'BMW Alpina',
       'Bentley', 'Bertone', 'Bill Dovell Motor Car Company',
       'Bitter Gmbh and Co. Kg', 'Bugatti', 'Buick', 'CCC Engineering',
       'CX Automotive', 'Cadillac', 'Chevrolet', 'Chrysler',
       'Consulier Industries Inc', 'Dabryan Coach Builders Inc', 'Dacia',
       'Daewoo', 'Daihatsu', 'Dodge', 'E. P. Dutton, Inc.', 'Eagle',
       'Environmental Rsch and Devp Corp', 'Evans Automobiles',
       'Excalibur Autos', 'Federal Coach', 'Ferrari', 'Fiat', 'Fisker',
       'Ford', 'GMC', 'General Motors', 'Genesis', 'Geo', 'Goldacre',
       'Grumman Allied Industries', 'Grumman Olson', 'Honda', 'Hummer',
       'Hyundai', 'Import Foreign Auto Sales Inc',
       'Import Trade Services', 'Infiniti', 'Isis Imports Ltd', 'Isuzu',
       'J.K. Motors', 'JBA Motorcars, Inc.', 'Jaguar', 'Jeep', 'Ki

In [126]:
cars.loc[cars['Make'].str.contains("BMW", regex=False), 'Make'] = "BMW"

In [127]:
cars.loc[cars['Make'].str.contains("AM", regex=False), 'Make'] = "AMG"

In [128]:
cars.loc[cars['Make'].str.contains("ASC ", regex=False), 'Make'] = "ASC"

In [129]:
cars.loc[cars['Make'].str.contains("Grumman ", regex=False), 'Make'] = "Grumman"

In [130]:
cars.loc[cars['Make'].str.contains("PAS ", regex=False), 'Make'] = "PAS, Inc"

In [131]:
cars['Make'].value_counts()

Chevrolet                 3643
Ford                      2946
Dodge                     2360
GMC                       2347
Toyota                    1836
                          ... 
Lambda Control Systems       1
ASC                          1
Goldacre                     1
Mahindra                     1
London Taxi                  1
Name: Make, Length: 124, dtype: int64

Converting Grams/Mile to Grams/Km

1 Mile = 1.60934 Km

Grams/Mile * Mile/Km -> Grams/Mile * 1 Mile / 1.60934 Km

$$ \frac{\text{Grams}}{\text{Mile}} \times \frac{\text{Mile}}{\text{Km}} $$

$$ \frac{\text{Grams}}{\text{Mile}} \times \frac{1\ \text{Mile}}{1.60934\ \text{Km}} $$
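As a quick check of the formula, here is a minimal sketch that applies it to the first two emission values shown in the table at the top of the notebook. The names `KM_PER_MILE`, `co2_g_per_mile` and `co2_g_per_km` are made up for this example.

```python
import pandas as pd

KM_PER_MILE = 1.60934  # 1 Mile = 1.60934 Km

# First two "CO2 Emission Grams/Mile" values from the raw data.
co2_g_per_mile = pd.Series([522.764706, 683.615385])

# Dividing a Series by a scalar converts every row at once.
co2_g_per_km = co2_g_per_mile / KM_PER_MILE

print(co2_g_per_km)
```

Because pandas arithmetic is vectorized, no `map` or loop is needed for this kind of unit conversion.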

In [132]:
list(cars.columns)

['Make',
 'Model',
 'Year',
 'Engine Displacement',
 'Cylinders',
 'Transmission',
 'Drivetrain',
 'Vehicle Class',
 'Fuel Type',
 'Fuel Barrels/Year',
 'City MPG',
 'Highway MPG',
 'Combined MPG',
 'CO2 Emission Grams/Mile',
 'Fuel Cost/Year']

In [133]:
cars['CO2 Emission Grams/Km'] = cars['CO2 Emission Grams/Mile'] / 1.60934

In [134]:
list(cars.columns)

['Make',
 'Model',
 'Year',
 'Engine Displacement',
 'Cylinders',
 'Transmission',
 'Drivetrain',
 'Vehicle Class',
 'Fuel Type',
 'Fuel Barrels/Year',
 'City MPG',
 'Highway MPG',
 'Combined MPG',
 'CO2 Emission Grams/Mile',
 'Fuel Cost/Year',
 'CO2 Emission Grams/Km']

In [135]:
cars = cars.drop(columns="CO2 Emission Grams/Mile")
#cars.drop(columns="CO2 Emission Grams/Mile", inplace=True)

In [136]:
list(cars.columns)

['Make',
 'Model',
 'Year',
 'Engine Displacement',
 'Cylinders',
 'Transmission',
 'Drivetrain',
 'Vehicle Class',
 'Fuel Type',
 'Fuel Barrels/Year',
 'City MPG',
 'Highway MPG',
 'Combined MPG',
 'Fuel Cost/Year',
 'CO2 Emission Grams/Km']

Replacing the values of the column `Transmission` with either Automatic or Manual

In [137]:
cars['Transmission'].head()

0    Automatic 3-spd
1    Automatic 3-spd
2    Automatic 3-spd
3    Automatic 3-spd
4    Automatic 4-spd
Name: Transmission, dtype: object

In [138]:
cars['Transmission'].unique()

array(['Automatic 3-spd', 'Automatic 4-spd', 'Manual 5-spd',
       'Automatic (S5)', 'Manual 6-spd', 'Automatic 5-spd', 'Auto(AM8)',
       'Auto(AM-S8)', 'Auto(AV-S7)', 'Automatic (S6)', 'Automatic (S9)',
       'Automatic (S4)', 'Auto(AM-S9)', 'Automatic (S7)', 'Auto(AM7)',
       'Auto(AM-S7)', 'Auto(AM6)', 'Automatic 6-spd', 'Manual 4-spd',
       'Automatic (S8)', 'Manual(M7)', 'Auto(AM-S6)',
       'Automatic (variable gear ratios)', 'Automatic (AV)',
       'Auto(AV-S8)', 'Automatic (AM6)', 'Automatic 8-spd', 'Auto(A1)',
       'Automatic (A1)', 'Automatic (A6)', 'Auto(AV-S6)', 'Manual 3-spd',
       'Manual 7-spd', 'Automatic 9-spd', 'Auto (AV)', 'Automatic 6spd',
       'Auto(L4)', 'Auto(L3)', 'Auto (AV-S6)', 'Auto (AV-S8)',
       'Automatic (AV-S6)', 'Automatic 7-spd', 'Manual 5 spd',
       'Auto(AM5)', 'Automatic (AM5)'], dtype=object)

In [139]:
# Every automatic label contains "Auto"; everything else is treated as manual
cars['Transmission'] = np.where(cars['Transmission'].str.contains("Auto", regex=False), "Automatic", "Manual")

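Convert the MPG columns to km per liter

MPG = Miles/Gallon -> Km/Liter

1 Mile = 1.60934 Km

1 Gallon = 3.78541 Liters

$$ \frac{\text{Miles}}{\text{Gallon}} \rightarrow \frac{\text{Miles}}{\text{Gallon}} \times \frac{\text{Km}}{\text{Miles}} \times \frac{\text{Gallon}}{\text{Liters}} $$

$$ \frac{\text{Miles}}{\text{Gallon}} \rightarrow \frac{\text{Miles}}{\text{Gallon}} \times \frac{1.60934\ \text{Km}}{1\ \text{Mile}} \times \frac{1\ \text{Gallon}}{3.78541\ \text{Liters}} $$

So each MPG value is multiplied by 1.60934 / 3.78541 ≈ 0.4251.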
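A small sketch of the conversion, using the City MPG values of the first two rows of the raw data (18 and 13). The names `MPG_TO_KM_PER_LITER`, `city_mpg` and `city_km_per_liter` are assumptions made for the example.

```python
import pandas as pd

KM_PER_MILE = 1.60934
LITERS_PER_GALLON = 3.78541

# Conversion factor derived above: (Km per Mile) / (Liters per Gallon).
MPG_TO_KM_PER_LITER = KM_PER_MILE / LITERS_PER_GALLON

# City MPG of the first two rows in the raw data.
city_mpg = pd.Series([18, 13])

# Vectorized multiplication converts the whole column at once.
city_km_per_liter = city_mpg * MPG_TO_KM_PER_LITER

print(city_km_per_liter)
```

The results should agree with the `City Km/Liter` values that appear for those rows later in the notebook.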


In [140]:
list(cars.columns)

['Make',
 'Model',
 'Year',
 'Engine Displacement',
 'Cylinders',
 'Transmission',
 'Drivetrain',
 'Vehicle Class',
 'Fuel Type',
 'Fuel Barrels/Year',
 'City MPG',
 'Highway MPG',
 'Combined MPG',
 'Fuel Cost/Year',
 'CO2 Emission Grams/Km']

In [141]:
cars['City Km/Liter'] = cars['City MPG'] * (1.60934 / 3.78541)

In [142]:
cars.drop(columns="City MPG", inplace=True)

In [143]:
cars['Highway Km/Liter'] = cars['Highway MPG'] * (1.60934 / 3.78541)
cars.drop(columns="Highway MPG", inplace=True)

In [144]:
cars['Combined Km/Liter'] = cars['Combined MPG'] * (1.60934 / 3.78541)
cars.drop(columns="Combined MPG", inplace=True)

### Gathering insights:

- How many car makers are there? How many models? Which car maker has the most cars in the dataset?

- When were these cars made?

- How big is the engine of these cars?

- What's the frequency of different transmissions, drivetrains and fuel types?

- What's the car that consumes the least/most fuel?

How many makes are there?

In [145]:
cars['Make'].nunique()

124

In [146]:
cars['Make'].value_counts()

Chevrolet                 3643
Ford                      2946
Dodge                     2360
GMC                       2347
Toyota                    1836
                          ... 
Lambda Control Systems       1
ASC                          1
Goldacre                     1
Mahindra                     1
London Taxi                  1
Name: Make, Length: 124, dtype: int64

How many models are there?

In [147]:
cars['Model'].nunique()

3608

In [148]:
cars['Model'].value_counts()

F150 Pickup 2WD            197
F150 Pickup 4WD            179
Truck 2WD                  173
Mustang                    170
Jetta                      169
                          ... 
FX37 AWD                     1
AMG S65 Coupe                1
Magnus                       1
Metris (Cargo Van)           1
550 Maranello/Barchetta      1
Name: Model, Length: 3608, dtype: int64

Which car maker has the most cars?

In [149]:
make = cars['Make'].value_counts().index[0]
make

'Chevrolet'

Group the data by Make and apply the count function

In [150]:
cars.count()

Make                     35952
Model                    35952
Year                     35952
Engine Displacement      35952
Cylinders                35952
Transmission             35952
Drivetrain               35952
Vehicle Class            35952
Fuel Type                35952
Fuel Barrels/Year        35952
Fuel Cost/Year           35952
CO2 Emission Grams/Km    35952
City Km/Liter            35952
Highway Km/Liter         35952
Combined Km/Liter        35952
dtype: int64

In [151]:
cars.groupby('Make')['Model'].count().describe()

count     124.000000
mean      289.935484
std       593.639001
min         1.000000
25%         2.000000
50%        15.000000
75%       343.500000
max      3643.000000
Name: Model, dtype: float64

In [152]:
cars.groupby('Make').count()['Model']

Make
AMG                               4
ASC                               1
Acura                           302
Alfa Romeo                       41
American Motors Corporation      22
                               ... 
Volkswagen                     1047
Volvo                           717
Wallace Environmental            32
Yugo                              8
smart                            20
Name: Model, Length: 124, dtype: int64
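Counting rows per group can be written in a few ways that give the same numbers when there are no missing values. The sketch below uses a made-up frame `toy`; the variable names are assumptions for illustration only.

```python
import pandas as pd

# Toy data (hypothetical; not the real dataset).
toy = pd.DataFrame({
    "Make": ["Ford", "Ford", "Audi", "Ford", "Audi", "Fiat"],
    "Model": ["a", "b", "c", "d", "e", "f"],
})

# count(): non-null values of "Model" within each make.
by_count = toy.groupby("Make")["Model"].count()

# size(): number of rows in each group, nulls included.
by_size = toy.groupby("Make").size()

# value_counts(): same counts, already sorted from most to least frequent.
by_value_counts = toy["Make"].value_counts()

# idxmax() returns the index label (the make) with the largest count.
top_make = by_value_counts.idxmax()

print(by_count, by_size, by_value_counts, top_make, sep="\n\n")
```

`count()` and `size()` only differ when the counted column contains missing values, which is not the case for this dataset according to `cars.info()`.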

When were the cars of the make with the most cars built?

In [153]:
cars[ cars['Make'] == "Chevrolet" ][['Make','Model','Year','Engine Displacement']] 

Unnamed: 0,Make,Model,Year,Engine Displacement
4275,Chevrolet,Astro 2WD (cargo),1985,2.5
4276,Chevrolet,Astro 2WD (cargo),1985,4.3
4277,Chevrolet,Astro 2WD (cargo),1985,4.3
4278,Chevrolet,Astro 2WD (cargo),1985,4.3
4279,Chevrolet,Astro 2WD (cargo),1985,2.5
...,...,...,...,...
7913,Chevrolet,Volt,2013,1.4
7914,Chevrolet,Volt,2014,1.4
7915,Chevrolet,Volt,2015,1.4
7916,Chevrolet,Volt,2016,1.5


In [154]:
cars['Transmission'].value_counts()

Automatic    24290
Manual       11662
Name: Transmission, dtype: int64

In [155]:
cars.columns

Index(['Make', 'Model', 'Year', 'Engine Displacement', 'Cylinders',
       'Transmission', 'Drivetrain', 'Vehicle Class', 'Fuel Type',
       'Fuel Barrels/Year', 'Fuel Cost/Year', 'CO2 Emission Grams/Km',
       'City Km/Liter', 'Highway Km/Liter', 'Combined Km/Liter'],
      dtype='object')

In [156]:
cars['Drivetrain'].value_counts()

Front-Wheel Drive             13044
Rear-Wheel Drive              12726
4-Wheel or All-Wheel Drive     6503
All-Wheel Drive                2039
4-Wheel Drive                  1058
2-Wheel Drive                   423
Part-time 4-Wheel Drive         158
2-Wheel Drive, Front              1
Name: Drivetrain, dtype: int64

In [157]:
cars['Fuel Type'].value_counts()

Regular                        23587
Premium                         9921
Gasoline or E85                 1195
Diesel                           911
Premium or E85                   121
Midgrade                          74
CNG                               60
Gasoline or natural gas           20
Premium and Electricity           20
Premium Gas or Electricity        17
Regular Gas and Electricity       16
Gasoline or propane                8
Regular Gas or Electricity         2
Name: Fuel Type, dtype: int64

Cars that consume the most (max) or the least (min) fuel per year.

Fuel Barrels/Year

In [158]:
cars['Fuel Barrels/Year'].max()

47.08714285714285

In [159]:
cars[ cars['Fuel Barrels/Year'] == cars['Fuel Barrels/Year'].max()]

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,Fuel Cost/Year,CO2 Emission Grams/Km,City Km/Liter,Highway Km/Liter,Combined Km/Liter
20894,Lamborghini,Countach,1986,5.2,12.0,Manual,Rear-Wheel Drive,Two Seaters,Premium,47.087143,5800,788.877073,2.550857,4.251429,2.976
20895,Lamborghini,Countach,1987,5.2,12.0,Manual,Rear-Wheel Drive,Two Seaters,Premium,47.087143,5800,788.877073,2.550857,4.251429,2.976
20896,Lamborghini,Countach,1988,5.2,12.0,Manual,Rear-Wheel Drive,Two Seaters,Premium,47.087143,5800,788.877073,2.550857,4.251429,2.976
20897,Lamborghini,Countach,1989,5.2,12.0,Manual,Rear-Wheel Drive,Two Seaters,Premium,47.087143,5800,788.877073,2.550857,4.251429,2.976
20898,Lamborghini,Countach,1990,5.2,12.0,Manual,Rear-Wheel Drive,Two Seaters,Premium,47.087143,5800,788.877073,2.550857,4.251429,2.976


In [160]:
cars[ cars['Fuel Barrels/Year'] == cars['Fuel Barrels/Year'].min()]

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,Fuel Cost/Year,CO2 Emission Grams/Km,City Km/Liter,Highway Km/Liter,Combined Km/Liter
17395,Honda,Civic Natural Gas,2012,1.8,4.0,Automatic,Front-Wheel Drive,Compact Cars,CNG,0.06,1000,142.104437,11.478857,16.155428,13.179428
17396,Honda,Civic Natural Gas,2013,1.8,4.0,Automatic,Front-Wheel Drive,Compact Cars,CNG,0.06,1000,135.459257,11.478857,16.155428,13.179428
17397,Honda,Civic Natural Gas,2014,1.8,4.0,Automatic,Front-Wheel Drive,Compact Cars,CNG,0.06,1000,135.459257,11.478857,16.155428,13.179428
17398,Honda,Civic Natural Gas,2015,1.8,4.0,Automatic,Front-Wheel Drive,Compact Cars,CNG,0.06,1000,135.459257,11.478857,16.155428,13.179428
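An alternative to filtering with `== max()` is `idxmax()` / `idxmin()`, which return the index label of the extreme value. A minimal sketch on a made-up frame (the names `toy`, `thirstiest` and `thriftiest` are assumptions; the values are rounded from the outputs above):

```python
import pandas as pd

# Toy data with values rounded from the notebook outputs.
toy = pd.DataFrame({
    "Model": ["Countach", "Civic Natural Gas", "DJ Po Vehicle 2WD"],
    "Fuel Barrels/Year": [47.09, 0.06, 19.39],
})

# idxmax()/idxmin() give the row label of the largest/smallest value.
thirstiest = toy.loc[toy["Fuel Barrels/Year"].idxmax()]
thriftiest = toy.loc[toy["Fuel Barrels/Year"].idxmin()]

print(thirstiest["Model"], thriftiest["Model"])
```

Note that `idxmax()` returns only the first matching row, whereas the boolean filter used above returns every row tied for the maximum, which is why it lists several Countach model years.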


Drop the column "Combined Km/Liter" (the converted "Combined MPG")

In [161]:
cars.drop(columns="Combined Km/Liter",inplace=True)

In [162]:
cars.columns

Index(['Make', 'Model', 'Year', 'Engine Displacement', 'Cylinders',
       'Transmission', 'Drivetrain', 'Vehicle Class', 'Fuel Type',
       'Fuel Barrels/Year', 'Fuel Cost/Year', 'CO2 Emission Grams/Km',
       'City Km/Liter', 'Highway Km/Liter'],
      dtype='object')

In [163]:
# Change column names to these ones:
col_names = ["Brand", "Model", "Year", "Engine_cc", "Cyl", "Trans", "Drivetrain", "Class", "Fuel_type", "Barrels_per_year", "City_MPG", "Highway_MPG", "CO2_grams_per_km", "Fuel_cost_per_year"]

In [164]:
col_names = [ item.replace(" ","_") for item in cars.columns ]
cars.columns = col_names

In [165]:
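# Note: spaces in the column names were already replaced with underscores in the
# previous cell, so keys that still contain spaces (e.g. "Engine Displacement")
# no longer match any column and are silently ignored by rename().
conversion = {"Make": "Brand", "Model":"Model","Year": "Year", "Engine Displacement": "Engine_cc", 
 "Cylinders":"Cyl", "Transmission":"Trans", "Drivetrain": "Drivetrain", "Vehicle Class":"Class",
 "Fuel Type":"Fuel_Type", "Fuel Barrels/Year": "Barrels_per_year"}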

In [166]:
cars.rename(columns=conversion, inplace = True)

In [167]:
cars.columns

Index(['Brand', 'Model', 'Year', 'Engine_Displacement', 'Cyl', 'Trans',
       'Drivetrain', 'Vehicle_Class', 'Fuel_Type', 'Fuel_Barrels/Year',
       'Fuel_Cost/Year', 'CO2_Emission_Grams/Km', 'City_Km/Liter',
       'Highway_Km/Liter'],
      dtype='object')
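The output above still shows `Engine_Displacement` and `Vehicle_Class` because `rename` skips keys that do not match an existing column. A small sketch of that behaviour on a made-up empty frame (`toy`, `renamed` and `raised` are names invented for the example):

```python
import pandas as pd

# Toy frame with one column that matches a key and one that does not.
toy = pd.DataFrame(columns=["Make", "Engine_Displacement"])

# "Engine Displacement" (with a space) is not a column, so it is ignored.
renamed = toy.rename(columns={"Make": "Brand", "Engine Displacement": "Engine_cc"})
print(list(renamed.columns))

# errors="raise" turns the silent miss into a KeyError.
try:
    toy.rename(columns={"Engine Displacement": "Engine_cc"}, errors="raise")
    raised = False
except KeyError:
    raised = True

print(raised)
```

Passing `errors="raise"` while developing is one way to catch a mapping whose keys have gone stale.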

What brand has the most cars?

In [168]:
cars\
.groupby("Brand")\
.count()\
.sort_values("Model", ascending=False)\
.reset_index()\
.iloc[0,0]

'Chevrolet'

In [179]:
cars

Unnamed: 0,Brand,Model,Year,Engine_Displacement,Cyl,Trans,Drivetrain,Vehicle_Class,Fuel_Type,Fuel_Barrels/Year,Fuel_Cost/Year,CO2_Emission_Grams/Km,City_Km/Liter,Highway_Km/Liter
0,AMG,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,1950,324.831736,7.652571,7.227428
1,AMG,FJ8c Post Office,1984,4.2,6.0,Automatic,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,2550,424.779962,5.526857,5.526857
2,AMG,Post Office DJ5 2WD,1985,2.5,4.0,Automatic,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,2100,345.133719,6.802286,7.227428
3,AMG,Post Office DJ8 2WD,1985,4.2,6.0,Automatic,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,2550,424.779962,5.526857,5.526857
4,ASC,GNX,1987,3.8,6.0,Automatic,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,2550,345.133719,5.952000,8.928000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Automatic,Rear-Wheel Drive,Two Seaters,Premium,9.155833,1100,151.614948,14.454857,16.155428
35948,smart,fortwo coupe,2014,1.0,3.0,Automatic,Rear-Wheel Drive,Two Seaters,Premium,9.155833,1100,150.993575,14.454857,16.155428
35949,smart,fortwo coupe,2015,1.0,3.0,Automatic,Rear-Wheel Drive,Two Seaters,Premium,9.155833,1100,151.614948,14.454857,16.155428
35950,smart,fortwo coupe,2016,0.9,3.0,Automatic,Rear-Wheel Drive,Two Seaters,Premium,9.155833,1100,152.857693,14.454857,16.580571


In [170]:
cars.sort_values(by='Brand') 

Unnamed: 0,Brand,Model,Year,Engine_Displacement,Cyl,Trans,Drivetrain,Vehicle_Class,Fuel_Type,Fuel_Barrels/Year,Fuel_Cost/Year,CO2_Emission_Grams/Km,City_Km/Liter,Highway_Km/Liter
0,AMG,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,1950,324.831736,7.652571,7.227428
1,AMG,FJ8c Post Office,1984,4.2,6.0,Automatic,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,2550,424.779962,5.526857,5.526857
2,AMG,Post Office DJ5 2WD,1985,2.5,4.0,Automatic,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,2100,345.133719,6.802286,7.227428
3,AMG,Post Office DJ8 2WD,1985,4.2,6.0,Automatic,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,2550,424.779962,5.526857,5.526857
4,ASC,GNX,1987,3.8,6.0,Automatic,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,2550,345.133719,5.952000,8.928000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35933,smart,fortwo cabriolet,2011,1.0,3.0,Automatic,Rear-Wheel Drive,Two Seaters,Premium,9.155833,1100,153.392764,14.029714,17.430857
35932,smart,fortwo cabriolet,2010,1.0,3.0,Automatic,Rear-Wheel Drive,Two Seaters,Premium,9.155833,1100,153.392764,14.029714,17.430857
35950,smart,fortwo coupe,2016,0.9,3.0,Automatic,Rear-Wheel Drive,Two Seaters,Premium,9.155833,1100,152.857693,14.454857,16.580571
35940,smart,fortwo convertible,2008,1.0,3.0,Automatic,Rear-Wheel Drive,Two Seaters,Premium,9.155833,1100,153.392764,14.029714,17.430857


What brand has the worst CO2 emissions on average?

Hint: use the function `sort_values()`

In [219]:
grouped = cars.groupby('Brand')['CO2_Emission_Grams/Km'].mean()


cars.groupby('Brand')['CO2_Emission_Grams/Km'].mean()

Brand
AMG                            379.881345
ASC                            345.133719
Acura                          262.583000
Alfa Romeo                     288.287195
American Motors Corporation    314.264744
                                  ...    
Volkswagen                     244.038998
Volvo                          270.796572
Wallace Environmental          408.857065
Yugo                           221.251107
smart                          153.498052
Name: CO2_Emission_Grams/Km, Length: 124, dtype: float64

In [221]:
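# numeric_only=True keeps mean() from failing on text columns in recent pandas versions
cars.groupby('Brand').mean(numeric_only=True).count()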
#.sort_values("Model", ascending=False)\
#.reset_index()\
#.iloc[0,0]

#grouped.describe()

Year                     124
Engine_Displacement      124
Cyl                      124
Fuel_Barrels/Year        124
Fuel_Cost/Year           124
CO2_Emission_Grams/Km    124
City_Km/Liter            124
Highway_Km/Liter         124
dtype: int64

In [216]:
import pandas as pd
import numpy as np

rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)},
                   columns = ['key', 'data1', 'data2'])
L = [0, 1, 0, 1, 2, 0]

df.groupby(L).sum()
# The `groups` attribute maps each group label to the row labels in that group
df.groupby(L).groups

{0: [0, 2, 5], 1: [1, 3], 2: [4]}

Show the average `CO2_Emission_Grams/Km` by Brand

In [172]:
cars.groupby("Brand")["CO2_Emission_Grams/Km"].mean()

Show the average `CO2_Emission_Grams/Km` by Brand, sorted

In [173]:
### your code is here

In [174]:
### your code is here
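One possible shape for a grouped, sorted average, sketched on made-up data so it does not stand in for the exercise itself. The frame `toy`, the brands "A", "B", "C" and the emission values are all invented.

```python
import pandas as pd

# Toy data (hypothetical brands and values).
toy = pd.DataFrame({
    "Brand": ["A", "A", "B", "B", "C"],
    "CO2_Emission_Grams/Km": [300.0, 320.0, 150.0, 170.0, 400.0],
})

# Mean per brand, then sort the resulting Series from highest to lowest.
avg_co2 = (
    toy.groupby("Brand")["CO2_Emission_Grams/Km"]
    .mean()
    .sort_values(ascending=False)
)

print(avg_co2)
```

Since the grouped mean is a Series, `sort_values()` needs no column argument; the first index label is then the brand with the highest average.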

Use `pd.cut` or `pd.qcut` to create 4 groups (bins) of cars, by Year. We want to explore how cars have evolved decade by decade.

In [175]:
cars['Year'].describe()

count    35952.00000
mean      2000.71640
std         10.08529
min       1984.00000
25%       1991.00000
50%       2001.00000
75%       2010.00000
max       2017.00000
Name: Year, dtype: float64

In [176]:
## your code here

In [177]:
cars[['Year','Decade']]

KeyError: "['Decade'] not in index"

In [None]:
cars.loc[:, ['Year', 'Decade']]  # raises the same KeyError until a 'Decade' column is created

In [None]:
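# Bins are right-inclusive: (1980, 1989], (1989, 1999], (1999, 2009], (2009, 2019]
cars["Year_range"] = pd.cut(cars["Year"],
                            bins=[1980, 1989, 1999, 2009, 2019],
                            labels=["80s", "90s", "00s", "10s"])

# 'Decade' was never created, so only the existing columns are selected
cars.loc[:, ['Year', 'Year_range']]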
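For comparison, `pd.cut` uses bin edges that you choose, while `pd.qcut` picks edges so that each bin holds roughly the same number of rows. A minimal sketch on a made-up Series of years (`years`, `fixed` and `quartiles` are illustrative names, as are the labels "q1" to "q4"):

```python
import pandas as pd

# Toy years (hypothetical sample within the dataset's 1984-2017 range).
years = pd.Series([1984, 1991, 2001, 2010, 2017, 1988, 1995, 2005])

# pd.cut: fixed edges, so bins follow calendar decades regardless of counts.
fixed = pd.cut(years,
               bins=[1980, 1989, 1999, 2009, 2019],
               labels=["80s", "90s", "00s", "10s"])

# pd.qcut: edges come from the quartiles, so bins have similar sizes.
quartiles = pd.qcut(years, q=4, labels=["q1", "q2", "q3", "q4"])

print(pd.DataFrame({"Year": years, "cut": fixed, "qcut": quartiles}))
```

For a decade-by-decade comparison, `pd.cut` is the more natural fit because the bin edges carry meaning; `pd.qcut` is handier when balanced group sizes matter more than the edges.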

### Did cars consume more gas in the eighties?

In [None]:
cars.columns

Show the average `City_Km/Liter` by `Year_range`

In [None]:
### your code is here

Which brands are more environmentally friendly?

In [None]:
### your code is here

Does the drivetrain affect fuel consumption?

In [None]:
# We can also sort by 2 columns
# (the second column only matters when there's a tie in the first one)
cars.groupby("Drivetrain")[["Highway_Km/Liter","City_Km/Liter"]].mean().sort_values(["City_Km/Liter","Highway_Km/Liter"],ascending=False)

Do cars with automatic transmission consume more fuel than cars with manual transmission?

In [None]:
cars.columns

In [None]:
cars.groupby("Trans")[["City_Km/Liter"]].mean().sort_values("City_Km/Liter",ascending=False)

Use `groupby` and `aggregate` with different aggregation measures for different columns:

Aggregate with the average of `City_Km/Liter` and the count of `Trans`

In [None]:
## your code is here

Aggregate with the average of `City_Km/Liter` and the minimum of `Trans`

In [None]:
### your code is here
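A sketch of how different aggregation measures can be applied to different columns in one `agg` call. The data is made up, and grouping by `Drivetrain` is an assumption, since the exercise does not say which column to group by; the output column names `avg_city`, `n_trans` and `min_trans` are also invented.

```python
import pandas as pd

# Toy data (hypothetical values; column names follow the renamed notebook columns).
toy = pd.DataFrame({
    "Drivetrain": ["FWD", "FWD", "RWD", "RWD", "RWD"],
    "Trans": ["Automatic", "Manual", "Automatic", "Automatic", "Manual"],
    "City_Km/Liter": [9.0, 11.0, 6.0, 7.0, 8.0],
})

# Named aggregation: output_name=(input_column, function).
summary = toy.groupby("Drivetrain").agg(
    avg_city=("City_Km/Liter", "mean"),   # average city consumption
    n_trans=("Trans", "count"),           # number of non-null Trans values
    min_trans=("Trans", "min"),           # alphabetically first Trans value
)

print(summary)
```

The dictionary form, `agg({"City_Km/Liter": "mean", "Trans": "count"})`, does the same job when one function per column is enough; named aggregation additionally lets the same column appear more than once.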