# Basic Column Analysis

The basic column analysis includes the following:

1. Viewing all the columns in a dataset
2. Viewing info and statistical details of the columns
3. Renaming columns
4. Rearranging columns
5. Selecting specific columns

In [1]:
import pandas as pd
data = pd.read_csv('../00_Datasets/Toyota.csv')

## Viewing the Columns of the Dataset

In [2]:
cols = list(data.columns)
cols

['Unnamed: 0',
 'Price',
 'Age',
 'KM',
 'FuelType',
 'HP',
 'MetColor',
 'Automatic',
 'CC',
 'Doors',
 'Weight']
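
A column named `Unnamed: 0` usually appears when a CSV was saved together with its row index, so the header field for that column is empty. The sketch below uses a small hypothetical in-memory CSV (not the Toyota file itself) to show how `index_col=0` in `pd.read_csv` would treat that column as the index instead of as data. Whether this applies to `Toyota.csv` is an assumption.

```python
import io
import pandas as pd

# Hypothetical three-row sample that mimics a CSV saved with its index:
# the empty first header field is what pandas labels 'Unnamed: 0'.
csv_text = """,Price,Age
0,13500,23.0
1,13750,23.0
2,13950,24.0
"""

# Default read: the saved index comes back as an ordinary column.
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column as the row index instead.
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(without_extra.columns))
```

The rest of this section keeps the column and renames it to `id`, which is an equally valid choice when the values are needed as an identifier.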

## Viewing Statistical Details of Columns

In [3]:
# Viewing general info about the dataset:

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1436 entries, 0 to 1435
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  1436 non-null   int64  
 1   Price       1436 non-null   int64  
 2   Age         1336 non-null   float64
 3   KM          1436 non-null   object 
 4   FuelType    1336 non-null   object 
 5   HP          1436 non-null   object 
 6   MetColor    1286 non-null   float64
 7   Automatic   1436 non-null   int64  
 8   CC          1436 non-null   int64  
 9   Doors       1436 non-null   object 
 10  Weight      1436 non-null   int64  
dtypes: float64(2), int64(5), object(4)
memory usage: 123.5+ KB
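
`KM` and `HP` are reported as `object` even though they hold numbers. The preview rows later in this section show placeholder strings such as `??` and `????` in those columns, which is the likely cause. A minimal sketch, assuming those two markers are the only placeholders, passes them to `na_values` so that pandas reads them as missing values. The sample data below is hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample; the '??' and '????' markers mirror the preview rows.
csv_text = """KM,HP,Price
46986,90,13500
??,????,16900
71138,????,12950
"""

# Without na_values the placeholders force KM and HP to a non-numeric dtype.
raw = pd.read_csv(io.StringIO(csv_text))

# With na_values the placeholders become NaN and the columns parse as numbers.
clean = pd.read_csv(io.StringIO(csv_text), na_values=["??", "????"])

print(raw.dtypes)
print(clean.dtypes)
print(clean.isnull().sum())
```

This would not fix `Doors`, where the preview shows a spelled-out value (`three`); that column needs an explicit replacement before it can be converted.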


In [4]:
# Viewing statistics of numerical columns:

data.describe().transpose()

|  | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Unnamed: 0 | 1436.0 | 717.5 | 414.681806 | 0.0 | 358.75 | 717.5 | 1076.25 | 1435.0 |
| Price | 1436.0 | 10730.824513 | 3626.964585 | 4350.0 | 8450.0 | 9900.0 | 11950.0 | 32500.0 |
| Age | 1336.0 | 55.672156 | 18.589804 | 1.0 | 43.0 | 60.0 | 70.0 | 80.0 |
| MetColor | 1286.0 | 0.674961 | 0.468572 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| Automatic | 1436.0 | 0.05571 | 0.229441 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| CC | 1436.0 | 1566.827994 | 187.182436 | 1300.0 | 1400.0 | 1600.0 | 1600.0 | 2000.0 |
| Weight | 1436.0 | 1072.45961 | 52.64112 | 1000.0 | 1040.0 | 1070.0 | 1085.0 | 1615.0 |
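
By default `describe()` summarises only the numeric columns, which is why `KM`, `FuelType`, `HP` and `Doors` are absent from the output. A minimal sketch on a hypothetical four-row frame shows how `include="all"` would also report `count`, `unique`, `top` and `freq` for text columns.

```python
import pandas as pd

# Hypothetical sample with one numeric and one text column.
sample = pd.DataFrame({
    "Price": [13500, 13750, 21500, 12950],
    "FuelType": ["Diesel", "Diesel", "Petrol", None],
})

# include="all" summarises every column; statistics that do not apply
# to a column (for example the mean of a text column) are shown as NaN.
summary = sample.describe(include="all").transpose()
print(summary)
```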


In [5]:
# Viewing memory usage by each column:

data.memory_usage()

Index           128
Unnamed: 0    11488
Price         11488
Age           11488
KM            11488
FuelType      11488
HP            11488
MetColor      11488
Automatic     11488
CC            11488
Doors         11488
Weight        11488
dtype: int64
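
Every column reports the same 11488 bytes (1436 rows × 8 bytes), including the `object` columns. For `object` columns the default figure counts only the references to the Python objects, not the strings themselves. A minimal sketch on a hypothetical frame compares the default with `deep=True`, which asks pandas to inspect the stored objects as well.

```python
import pandas as pd

# Hypothetical sample with one integer and one text column.
sample = pd.DataFrame({
    "Price": [13500, 13750, 13950],
    "FuelType": ["Diesel", "Petrol", "CNG"],
})

# Default: fixed-width estimate per column.
shallow = sample.memory_usage()

# deep=True: also measures the contents of object (string) columns.
deep = sample.memory_usage(deep=True)

print(shallow)
print(deep)
```

The numeric column reports the same size either way; only the text column is expected to grow.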

In [6]:
# Viewing datatype of each column:

data.dtypes

Unnamed: 0      int64
Price           int64
Age           float64
KM             object
FuelType       object
HP             object
MetColor      float64
Automatic       int64
CC              int64
Doors          object
Weight          int64
dtype: object

In [14]:
# Viewing number of nulls present in each column:

data.isnull().sum()

Unnamed: 0      0
Price           0
Age           100
KM              0
FuelType      100
HP              0
MetColor      150
Automatic       0
CC              0
Doors           0
Weight          0
dtype: int64
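
Raw null counts are easier to compare when expressed as a share of the rows. A minimal sketch on a hypothetical four-row frame: because `isnull()` returns booleans, taking the mean of each column gives the fraction of missing values.

```python
import pandas as pd

# Hypothetical sample with missing values in two columns.
sample = pd.DataFrame({
    "Age": [23.0, None, 24.0, 26.0],
    "FuelType": ["Diesel", "Diesel", None, None],
    "Price": [13500, 13750, 13950, 14950],
})

# Mean of the boolean mask = fraction of nulls; scale to a percentage.
null_pct = sample.isnull().mean().mul(100).round(1)

# Sorting puts the most incomplete columns first.
print(null_pct.sort_values(ascending=False))
```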

## Renaming Columns

In [8]:
# Creating a list of new column names, in the same order as the existing columns:

renamed_cols = [
 'id',
 'price',
 'age',
 'km',
 'fuel',
 'hp',
 'metalcolor',
 'auto',
 'cc',
 'doors',
 'weight'
]

# Renaming columns:

data.columns = renamed_cols

# Viewing the dataset after renaming:
data.head(10)

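|  | id | price | age | km | fuel | hp | metalcolor | auto | cc | doors | weight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 13500 | 23.0 | 46986 | Diesel | 90 | 1.0 | 0 | 2000 | three | 1165 |
| 1 | 1 | 13750 | 23.0 | 72937 | Diesel | 90 | 1.0 | 0 | 2000 | 3 | 1165 |
| 2 | 2 | 13950 | 24.0 | 41711 | Diesel | 90 | NaN | 0 | 2000 | 3 | 1165 |
| 3 | 3 | 14950 | 26.0 | 48000 | Diesel | 90 | 0.0 | 0 | 2000 | 3 | 1165 |
| 4 | 4 | 13750 | 30.0 | 38500 | Diesel | 90 | 0.0 | 0 | 2000 | 3 | 1170 |
| 5 | 5 | 12950 | 32.0 | 61000 | Diesel | 90 | 0.0 | 0 | 2000 | 3 | 1170 |
| 6 | 6 | 16900 | 27.0 | ?? | Diesel | ???? | NaN | 0 | 2000 | 3 | 1245 |
| 7 | 7 | 18600 | 30.0 | 75889 | NaN | 90 | 1.0 | 0 | 2000 | 3 | 1245 |
| 8 | 8 | 21500 | 27.0 | 19700 | Petrol | 192 | 0.0 | 0 | 1800 | 3 | 1185 |
| 9 | 9 | 12950 | 23.0 | 71138 | Diesel | ???? | NaN | 0 | 1900 | 3 | 1105 |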
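
Assigning a list to `data.columns` relies on the new names being in exactly the same order as the existing columns. A minimal sketch of an alternative, using a hypothetical three-column frame, is `DataFrame.rename` with an explicit old-to-new mapping, which does not depend on column order and returns a new frame.

```python
import pandas as pd

# Hypothetical sample using three of the original column names.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Price": [13500, 13750],
    "FuelType": ["Diesel", "Petrol"],
})

# Explicit mapping from old name to new name.
name_map = {"Unnamed: 0": "id", "Price": "price", "FuelType": "fuel"}

# rename returns a new DataFrame; the original is left unchanged.
renamed = sample.rename(columns=name_map)

# The mapping still works if the columns arrive in a different order.
shuffled = sample[["FuelType", "Price", "Unnamed: 0"]].rename(columns=name_map)

print(list(renamed.columns))
print(list(shuffled.columns))
```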


## Rearranging Columns

In [9]:
# Creating the rearranged column list:

rearranged_cols = [
 'id',
 'fuel',
 'price',
 'age',
 'km',
 'hp',
 'metalcolor',
 'auto',
 'cc',
 'doors',
 'weight'
]

# Rearranging columns:

data = data.reindex(columns = rearranged_cols)

# Viewing the dataset post rearrangement:

data.head(10)

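|  | id | fuel | price | age | km | hp | metalcolor | auto | cc | doors | weight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | Diesel | 13500 | 23.0 | 46986 | 90 | 1.0 | 0 | 2000 | three | 1165 |
| 1 | 1 | Diesel | 13750 | 23.0 | 72937 | 90 | 1.0 | 0 | 2000 | 3 | 1165 |
| 2 | 2 | Diesel | 13950 | 24.0 | 41711 | 90 | NaN | 0 | 2000 | 3 | 1165 |
| 3 | 3 | Diesel | 14950 | 26.0 | 48000 | 90 | 0.0 | 0 | 2000 | 3 | 1165 |
| 4 | 4 | Diesel | 13750 | 30.0 | 38500 | 90 | 0.0 | 0 | 2000 | 3 | 1170 |
| 5 | 5 | Diesel | 12950 | 32.0 | 61000 | 90 | 0.0 | 0 | 2000 | 3 | 1170 |
| 6 | 6 | Diesel | 16900 | 27.0 | ?? | ???? | NaN | 0 | 2000 | 3 | 1245 |
| 7 | 7 | NaN | 18600 | 30.0 | 75889 | 90 | 1.0 | 0 | 2000 | 3 | 1245 |
| 8 | 8 | Petrol | 21500 | 27.0 | 19700 | 192 | 0.0 | 0 | 1800 | 3 | 1185 |
| 9 | 9 | Diesel | 12950 | 23.0 | 71138 | ???? | NaN | 0 | 1900 | 3 | 1105 |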
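
One caveat with `reindex`: a name that does not exist in the frame is not an error. pandas adds it as a new all-`NaN` column, and any existing column left out of the list is dropped. A minimal sketch on a hypothetical frame contrasts this with plain list indexing, which raises `KeyError` for an unknown name. The misspelling `fule` is deliberate.

```python
import pandas as pd

# Hypothetical sample using three of the renamed columns.
sample = pd.DataFrame({
    "id": [0, 1],
    "price": [13500, 13750],
    "fuel": ["Diesel", "Petrol"],
})

# 'fule' is a deliberate typo for 'fuel'.
order_with_typo = ["id", "fule", "price"]

# reindex accepts the unknown name and fills the new column with NaN.
reindexed = sample.reindex(columns=order_with_typo)

# List indexing rejects the unknown name instead.
try:
    sample[order_with_typo]
    raised = False
except KeyError:
    raised = True

# With correct names, list indexing reorders the columns just as well.
reordered = sample[["id", "fuel", "price"]]

print(reindexed)
print("KeyError raised:", raised)
```

Either approach works when the list is correct; list indexing simply fails earlier when it is not.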


## Selecting Specific Columns

In [11]:
# Columns to include:

required_cols = [
 'id',
 'fuel',
 'price',
]

# Creating a subset of the data:

chunk = data[required_cols]

# Viewing the subset:

chunk.head(10)

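|  | id | fuel | price |
| --- | --- | --- | --- |
| 0 | 0 | Diesel | 13500 |
| 1 | 1 | Diesel | 13750 |
| 2 | 2 | Diesel | 13950 |
| 3 | 3 | Diesel | 14950 |
| 4 | 4 | Diesel | 13750 |
| 5 | 5 | Diesel | 12950 |
| 6 | 6 | Diesel | 16900 |
| 7 | 7 | NaN | 18600 |
| 8 | 8 | Petrol | 21500 |
| 9 | 9 | Diesel | 12950 |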
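
Selecting by an explicit list of names is one of several options. A minimal sketch on a hypothetical frame shows three others that pandas provides: selecting by dtype with `select_dtypes`, selecting a contiguous range of labels with `.loc`, and keeping everything except a few columns with `drop`.

```python
import pandas as pd

# Hypothetical sample using four of the renamed columns.
sample = pd.DataFrame({
    "id": [0, 1, 2],
    "fuel": ["Diesel", "Diesel", "Petrol"],
    "price": [13500, 13750, 21500],
    "age": [23.0, 23.0, 27.0],
})

# Keep only the numeric columns.
numeric_part = sample.select_dtypes(include="number")

# Label slice: both endpoints are included.
label_slice = sample.loc[:, "fuel":"price"]

# Keep every column except the ones listed.
without_id = sample.drop(columns=["id"])

print(list(numeric_part.columns))
print(list(label_slice.columns))
print(list(without_id.columns))
```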
