# Data Preparation Basics
## Part 5 - Grouping and data aggregation

In [1]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

### Grouping data by column index

The mtcars dataset describes 32 car models from the 1974 *Motor Trend* magazine road tests. It records fuel efficiency, engine characteristics, weight, and performance-related features, and it is widely used as an example dataset for comparing car models.

Here's a description of each column in the dataset:

**car_names**: The name of each car model. Each row represents one car in the dataset.


**mpg**: Fuel efficiency in miles per (US) gallon: how many miles the car travels on one gallon of fuel. Higher values indicate better fuel efficiency.


**cyl**: The number of cylinders in the car's engine, a rough measure of engine size and power. In this dataset the values are 4, 6, or 8.


**disp**: Engine displacement in cubic inches: the total volume of air/fuel mixture the engine can draw in during one complete engine cycle. Larger displacement generally goes with more power.


**hp**: The gross horsepower rating of the car's engine, a measure of its power output. Higher values indicate more powerful engines.


**drat**: The rear axle ratio: the number of driveshaft revolutions per revolution of the rear wheels. It affects the car's acceleration and fuel efficiency.


**wt**: The weight of the car in thousands of pounds, so a value of 2.62 means 2,620 lb. Weight affects handling, fuel efficiency, and performance.


**qsec**: The quarter-mile time: how many seconds the car takes to cover a quarter mile from a standing start. It's a performance metric, and lower values indicate faster acceleration.


**vs**: The engine configuration, coded as 0 for a V-shaped engine (such as a V6 or V8) and 1 for a straight (inline) engine.


**am**: The transmission type, coded as 0 for automatic and 1 for manual.


**gear**: The number of forward gears in the car's transmission. It can affect the car's performance and driving experience.


**carb**: The number of carburetors in the car's engine. Carburetors mix air and fuel for combustion in older vehicles, and their number can affect engine performance and fuel delivery.
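
The 0/1 codes in `vs` and `am` are easy to misread in summaries, so it can help to attach readable labels before grouping. The following is a minimal sketch, not part of the original notebook: the three rows are copied from the first rows of the dataset, and the new column names `engine` and `transmission` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Three rows copied from the head of the mtcars data (only the coded columns).
sample = pd.DataFrame({
    "car_names": ["Mazda RX4", "Datsun 710", "Hornet 4 Drive"],
    "vs": [0, 1, 1],
    "am": [1, 1, 0],
})

# Map the numeric codes to labels; `engine` and `transmission` are
# illustrative column names, not columns of the original dataset.
labeled = sample.assign(
    engine=sample["vs"].map({0: "V-shaped", 1: "straight"}),
    transmission=sample["am"].map({0: "automatic", 1: "manual"}),
)

print(labeled)
```

Keeping the original `vs` and `am` columns alongside the labels means numeric summaries, such as the share of manual cars per group, still work.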




In [2]:
address = '/content/sample_data/mtcars.csv'

cars = pd.read_csv(address)

cars.columns = ['car_names', 'mpg', 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']
cars.head(10)

Unnamed: 0,car_names,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
0,Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
1,Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
2,Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
3,Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
4,Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
5,Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
6,Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
7,Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
8,Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
9,Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
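
Assigning to `cars.columns` renames columns by position, so the list must have exactly one name per column in the file. The sketch below illustrates this on an in-memory CSV rather than the real file. It assumes the file's first header cell is blank, as in common exports of mtcars; that layout is an assumption, and the two data rows are copied from the output above.

```python
import io

import pandas as pd

# Assumed layout: the first header cell (the car name column) is blank.
csv_text = """,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
"""

sample = pd.read_csv(io.StringIO(csv_text))

# pandas gives a blank header a placeholder name.
original_first = sample.columns[0]

names = ['car_names', 'mpg', 'cyl', 'disp', 'hp', 'drat',
         'wt', 'qsec', 'vs', 'am', 'gear', 'carb']

# Positional renaming only works when the counts match, so check first.
assert len(names) == sample.shape[1]
sample.columns = names

print(original_first)
print(sample[['car_names', 'mpg', 'cyl']])
```

If the counts differ, pandas raises a `ValueError` on assignment, which is a useful signal that the file layout is not what the code expects.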


In [None]:
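# Group the rows by number of cylinders.
cars_groups = cars.groupby('cyl')
# numeric_only=True skips the text column car_names, which recent pandas versions refuse to average.
cars_groups.mean(numeric_only=True)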

cyl,mpg,disp,hp,drat,wt,qsec,vs,am,gear,carb
4,26.663636,105.136364,82.636364,4.070909,2.285727,19.137273,0.909091,0.727273,4.090909,1.545455
6,19.742857,183.314286,122.285714,3.585714,3.117143,17.977143,0.571429,0.428571,3.857143,3.428571
8,15.1,353.1,209.214286,3.229286,3.999214,16.772143,0.0,0.142857,3.285714,3.5
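
To see what the grouped mean does on a scale small enough to check by hand, here is a minimal sketch using six rows copied from the head of the dataset. It also shows `agg` with named aggregations as one way to compute a different summary per column. The output column names `mpg_mean`, `hp_max`, and `n_cars` are hypothetical names chosen for this example.

```python
import pandas as pd

# Six rows copied from the first rows of the mtcars data.
sample = pd.DataFrame({
    "car_names": ["Mazda RX4", "Datsun 710", "Hornet Sportabout",
                  "Valiant", "Duster 360", "Merc 240D"],
    "mpg": [21.0, 22.8, 18.7, 18.1, 14.3, 24.4],
    "cyl": [6, 4, 8, 6, 8, 4],
    "hp": [110, 93, 175, 105, 245, 62],
})

# Split the rows into one group per distinct cyl value.
by_cyl = sample.groupby("cyl")

# Average every numeric column within each group; the text column is skipped.
means = by_cyl.mean(numeric_only=True)

# Named aggregation: a different summary for each column.
summary = by_cyl.agg(
    mpg_mean=("mpg", "mean"),
    hp_max=("hp", "max"),
    n_cars=("car_names", "count"),
)

print(means)
print(summary)
```

The result is indexed by the grouping key, so a single group's value can be read with `means.loc[4, "mpg"]`.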


In [None]:
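# Without parentheses this returns the bound method itself, whose repr prints the whole frame.
# Call cars.describe() to compute summary statistics instead.
cars.describe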

<bound method NDFrame.describe of               car_names   mpg  cyl   disp   hp  drat     wt   qsec  vs  am  \
0             Mazda RX4  21.0    6  160.0  110  3.90  2.620  16.46   0   1   
1         Mazda RX4 Wag  21.0    6  160.0  110  3.90  2.875  17.02   0   1   
2            Datsun 710  22.8    4  108.0   93  3.85  2.320  18.61   1   1   
3        Hornet 4 Drive  21.4    6  258.0  110  3.08  3.215  19.44   1   0   
4     Hornet Sportabout  18.7    8  360.0  175  3.15  3.440  17.02   0   0   
5               Valiant  18.1    6  225.0  105  2.76  3.460  20.22   1   0   
6            Duster 360  14.3    8  360.0  245  3.21  3.570  15.84   0   0   
7             Merc 240D  24.4    4  146.7   62  3.69  3.190  20.00   1   0   
8              Merc 230  22.8    4  140.8   95  3.92  3.150  22.90   1   0   
9              Merc 280  19.2    6  167.6  123  3.92  3.440  18.30   1   0   
10            Merc 280C  17.8    6  167.6  123  3.92  3.440  18.90   1   0   
11           Merc 450SE  16.4 