# 1. What is Data Exploration?

Data exploration is the first step in any data analysis task.

It helps you understand the structure, quality, patterns, and distribution of your dataset before performing deeper analysis or model building.

# 2. Checking Data Info

.info()

Shows the summary of the DataFrame.

Includes number of rows, columns, data types, non-null counts, and memory usage.

Useful for detecting missing values and understanding data types.

.shape

Returns a tuple (rows, columns).

Helps understand how big your dataset is.

.columns

Shows the names of all columns.

Helps confirm that the data loaded correctly and that the column names are as expected.

These methods give a quick structural overview.

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('Toyota.csv')

# Shows full summary: columns, types, non-null values
df.info()

# Shape of dataset (rows, columns)
print(df.shape)

# List of column names
print(df.columns)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1436 entries, 0 to 1435
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  1436 non-null   int64  
 1   Price       1436 non-null   int64  
 2   Age         1336 non-null   float64
 3   KM          1436 non-null   object 
 4   FuelType    1336 non-null   object 
 5   HP          1436 non-null   object 
 6   MetColor    1286 non-null   float64
 7   Automatic   1436 non-null   int64  
 8   CC          1436 non-null   int64  
 9   Doors       1436 non-null   object 
 10  Weight      1436 non-null   int64  
dtypes: float64(2), int64(5), object(4)
memory usage: 123.5+ KB
(1436, 11)
Index(['Unnamed: 0', 'Price', 'Age', 'KM', 'FuelType', 'HP', 'MetColor',
       'Automatic', 'CC', 'Doors', 'Weight'],
      dtype='object')
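
The `.info()` output above hints at two things worth checking right away: columns with fewer non-null values than rows (`Age`, `FuelType`, `MetColor`), and columns such as `KM`, `HP`, and `Doors` that were read as `object` even though they look numeric. The sketch below shows one way to surface both. It uses a small made-up DataFrame named `sample` (an assumption for illustration, not the real `Toyota.csv`) so it can run on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset, for illustration only
sample = pd.DataFrame({
    "Price": [13500, 13750, 13950, 14950],
    "Age": [23.0, np.nan, 24.0, 26.0],
    "KM": ["46986", "??", "41711", "48000"],   # stored as text because of "??"
    "FuelType": ["Diesel", "Diesel", None, "Petrol"],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Columns that were not read as numbers
text_columns = sample.select_dtypes(exclude="number").columns.tolist()
print(text_columns)
```

Note that `"??"` is not counted as missing: pandas treats it as an ordinary string, which is why `KM` shows no nulls in `.info()` yet is not numeric.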


# 3. Previewing Data

.head()

Displays the top 5 rows by default.

Helps view sample data, check formatting, or confirm loading.

.tail()

Shows the last 5 rows by default.

Helpful for checking end-of-file anomalies or data ordering.

These help you visually inspect the dataset.

In [None]:
# First 5 rows
print(df.head())

# Last 5 rows
print(df.tail())

   Unnamed: 0  Price   Age     KM FuelType  HP  MetColor  Automatic    CC  \
0           0  13500  23.0  46986   Diesel  90       1.0          0  2000   
1           1  13750  23.0  72937   Diesel  90       1.0          0  2000   
2           2  13950  24.0  41711   Diesel  90       NaN          0  2000   
3           3  14950  26.0  48000   Diesel  90       0.0          0  2000   
4           4  13750  30.0  38500   Diesel  90       0.0          0  2000   

   Doors  Weight  
0  three    1165  
1      3    1165  
2      3    1165  
3      3    1165  
4      3    1170  
      Unnamed: 0  Price   Age     KM FuelType   HP  MetColor  Automatic    CC  \
1431        1431   7500   NaN  20544   Petrol   86       1.0          0  1300   
1432        1432  10845  72.0     ??   Petrol   86       0.0          0  1300   
1433        1433   8500   NaN  17016   Petrol   86       0.0          0  1300   
1434        1434   7250  70.0     ??      NaN   86       1.0          0  1300   
1435        1435  

# 4. Descriptive Statistics

.describe()

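Provides a statistical summary of the numerical columns.

Includes count, mean, standard deviation (std), min, max, and the 25th, 50th, and 75th percentiles.

Columns stored as object (here KM, FuelType, HP, and Doors) are left out by default.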


In [None]:
# Statistical summary of numerical columns
df.describe()

       Unnamed: 0         Price        Age  MetColor  Automatic           CC      Weight
count      1436.0        1436.0     1336.0    1286.0     1436.0       1436.0      1436.0
mean        717.5  10730.824513  55.672156  0.674961    0.05571  1566.827994  1072.45961
std    414.681806   3626.964585  18.589804  0.468572   0.229441   187.182436    52.64112
min           0.0        4350.0        1.0       0.0        0.0       1300.0      1000.0
25%        358.75        8450.0       43.0       0.0        0.0       1400.0      1040.0
50%         717.5        9900.0       60.0       1.0        0.0       1600.0      1070.0
75%       1076.25       11950.0       70.0       1.0        0.0       1600.0      1085.0
max        1435.0       32500.0       80.0       1.0        1.0       2000.0      1615.0
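
Because `.describe()` skips non-numerical columns by default, a text column needs to be summarised separately. A minimal sketch, again on a made-up DataFrame called `sample` (hypothetical data, not the real file): calling `.describe()` on a text column reports `count`, `unique`, `top` (most frequent value), and `freq` (how often it occurs), and `include="all"` combines both kinds of summary in one table.

```python
import pandas as pd

# Hypothetical data for illustration only
sample = pd.DataFrame({
    "Price": [13500, 13750, 13950, 14950],
    "FuelType": ["Diesel", "Diesel", "Petrol", "Diesel"],
})

# Default behaviour: numerical columns only
numeric_summary = sample.describe()
print(numeric_summary)

# Summary of a single text column: count, unique, top, freq
text_summary = sample["FuelType"].describe()
print(text_summary)

# Both numerical and text columns in one table (cells that do not apply are NaN)
full_summary = sample.describe(include="all")
print(full_summary)
```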


# 5. Value Counts

.value_counts()

Counts how many times each unique value appears in a column.

Very useful for:

categorical data

class distribution

detecting imbalance (classification tasks)

Helps you understand frequency and distribution of categories.

In [None]:
# For a specific column, example: 'FuelType'
df['FuelType'].value_counts()

# With sorting disabled
df['FuelType'].value_counts(sort=False)


FuelType
Diesel     144
Petrol    1177
CNG         15
Name: count, dtype: int64
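
The counts above add up to 1336, not 1436, because `.value_counts()` leaves out missing values by default. Two optional parameters are often helpful when checking class balance: `normalize=True` returns proportions instead of counts, and `dropna=False` adds a row for missing values. The following is a small sketch on a made-up Series named `fuel` (hypothetical values, not taken from the dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical column for illustration only
fuel = pd.Series(
    ["Petrol", "Petrol", "Diesel", np.nan, "Petrol", "CNG"],
    name="FuelType",
)

# Proportion of each category among the non-missing values
shares = fuel.value_counts(normalize=True)
print(shares)

# Counts including a separate row for missing values
with_missing = fuel.value_counts(dropna=False)
print(with_missing)
```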

# 6. Checking Unique Values

.unique()

Returns an array of all unique values in a column, including NaN if any value is missing.

Useful for exploring categories or identifying inconsistent entries.

.nunique()

Returns the number of unique values, not counting NaN by default.

Helps determine variety or cardinality of a column.

Great for categorical data exploration.

In [None]:
# Unique values in a column
print(df['FuelType'].unique())

# Number of unique values
print(df['FuelType'].nunique())


['Diesel' nan 'Petrol' 'CNG']
3
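
The output lists four entries from `.unique()` but `.nunique()` reports 3. The difference is the missing value: `.unique()` keeps `nan`, while `.nunique()` ignores it unless `dropna=False` is passed. A short sketch on a made-up Series named `fuel` (hypothetical data) shows both behaviours side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical column for illustration only
fuel = pd.Series(["Diesel", np.nan, "Petrol", "CNG", "Petrol"], name="FuelType")

# Array of distinct values; the missing value is kept
values = fuel.unique()
print(values)

# Distinct values, ignoring missing entries (default)
n_without_nan = fuel.nunique()
print(n_without_nan)

# Distinct values, counting the missing entry as its own value
n_with_nan = fuel.nunique(dropna=False)
print(n_with_nan)
```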


# 7. Correlation and Covariance

.corr()

Shows correlation between numerical variables.

Values range from -1 to +1.

Helps identify:

relationships

patterns

multicollinearity

A value close to +1 or -1 → strong linear relationship; a value close to 0 → little or no linear relationship.

.cov()

Shows covariance between variables.

Indicates how two variables change together.

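Unlike correlation, covariance is not normalized, so its magnitude depends on the units of the variables and is harder to compare across column pairs.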

Used for feature selection, statistical analysis, and modelling decisions.
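
To make the link between the two measures concrete: the (Pearson) correlation of two columns is their covariance divided by the product of their standard deviations. The sketch below checks this on a small made-up DataFrame named `sample` (hypothetical numbers, not the real dataset) and also shows that rescaling a column changes the covariance but not the correlation.

```python
import pandas as pd

# Hypothetical data for illustration only
sample = pd.DataFrame({
    "Age": [23.0, 30.0, 45.0, 60.0, 72.0],
    "Price": [13500.0, 12900.0, 10500.0, 9000.0, 7500.0],
})

# Covariance and correlation computed directly
cov_age_price = sample["Age"].cov(sample["Price"])
corr_direct = sample["Age"].corr(sample["Price"])

# Correlation rebuilt from covariance and standard deviations
corr_from_cov = cov_age_price / (sample["Age"].std() * sample["Price"].std())
print(corr_direct, corr_from_cov)

# Express Price in thousands: covariance shrinks by 1000, correlation is unchanged
price_k = sample["Price"] / 1000
cov_k = sample["Age"].cov(price_k)
corr_k = sample["Age"].corr(price_k)
print(cov_age_price, cov_k, corr_k)
```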

In [None]:
# Correlation matrix
print(df.corr(numeric_only=True))

# Covariance matrix
print(df.cov(numeric_only=True))


            Unnamed: 0     Price       Age  MetColor  Automatic        CC  \
Unnamed: 0    1.000000 -0.738289  0.907090 -0.078616   0.066299 -0.184490   
Price        -0.738289  1.000000 -0.878407  0.112041   0.033081  0.165067   
Age           0.907090 -0.878407  1.000000 -0.099659   0.032573 -0.120706   
MetColor     -0.078616  0.112041 -0.099659  1.000000  -0.013973  0.029189   
Automatic     0.066299  0.033081  0.032573 -0.013973   1.000000 -0.069321   
CC           -0.184490  0.165067 -0.120706  0.029189  -0.069321  1.000000   
Weight       -0.414577  0.581198 -0.464299  0.057142   0.057249  0.651450   

              Weight  
Unnamed: 0 -0.414577  
Price       0.581198  
Age        -0.464299  
MetColor    0.057142  
Automatic   0.057249  
CC          0.651450  
Weight      1.000000  
              Unnamed: 0         Price           Age    MetColor  Automatic  \
Unnamed: 0  1.719610e+05 -1.110414e+06   6976.838709  -15.300569   6.308014   
Price      -1.110414e+06  1.315487e+07 -5