## Min, Max and Range of Data

#### Data range for Continuous Variable

* Minimum Value of the variable
* Maximum value of the variable
* Range of the variable

#### Minimum Value

Minimum value is the smallest observation of the data.

E.g. What is the minimum current balance for a customer across all customers?


#### Maximum Value
Maximum value is the largest observation of the data.

E.g. What is the maximum current month debit for a customer across all customers?


#### Range
Range is the difference between the maximum and minimum values of the data; every data point lies in the interval [minimum, maximum].

Range is a numerical indication of the span of our data.

E.g. The difference between the ages of the eldest and youngest customers.
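The three quantities defined above can be sketched on a tiny example. The ages below are made up for illustration and are not taken from the churn dataset.

```python
import pandas as pd

# Hypothetical ages, for illustration only (not from the churn dataset)
ages = pd.Series([23, 45, 31, 67, 52])

minimum = ages.min()              # smallest observation
maximum = ages.max()              # largest observation
value_range = maximum - minimum   # span between the two extremes

print(minimum, maximum, value_range)
```

Here the range is a single number (44), while the interval [23, 67] describes where all the observations lie.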

In [1]:
#import libraries
import pandas as pd
import numpy as np

This is the dataset for the __Customer Churn Problem__.


In [2]:
# importing dataset
data = pd.read_csv('churn_prediction.csv')

Identification of __Datatypes__

In [9]:
data.dtypes

customer_id                         int64
vintage                             int64
age                                 int64
gender                             object
dependents                        float64
occupation                         object
city                              float64
customer_nw_category                int64
branch_code                         int64
current_balance                   float64
previous_month_end_balance        float64
average_monthly_balance_prevQ     float64
average_monthly_balance_prevQ2    float64
current_month_credit              float64
previous_month_credit             float64
current_month_debit               float64
previous_month_debit              float64
current_month_balance             float64
previous_month_balance            float64
churn                               int64
last_transaction                   object
dtype: object

## Isolating numerical columns

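Storing the names of the __integer and float__ columns in numerical_cols, because we are dealing with __numerical variables__. Note that the output below lists only the float64 columns: the int64 columns (customer_id, vintage, age, customer_nw_category, branch_code, churn) were not selected, most likely because the 'int' alias resolved to a different integer width on the machine that produced this output. Passing include='number' to select_dtypes selects every numeric dtype.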

In [10]:
# storing the names of all numerical columns in numerical_cols
numerical_cols = data.select_dtypes(include=['int', 'float']).columns

# checking
numerical_cols

Index(['dependents', 'city', 'current_balance', 'previous_month_end_balance',
       'average_monthly_balance_prevQ', 'average_monthly_balance_prevQ2',
       'current_month_credit', 'previous_month_credit', 'current_month_debit',
       'previous_month_debit', 'current_month_balance',
       'previous_month_balance'],
      dtype='object')
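A minimal sketch of a width-independent selection, assuming a small made-up frame (the column names mirror the dataset, the values are hypothetical). The `'number'` selector is the one pandas documents for picking all numeric dtypes.

```python
import pandas as pd

# Hypothetical frame mixing integer, float and text columns
toy = pd.DataFrame({
    "age": [25, 40],                      # integer column
    "current_balance": [1200.5, -30.0],   # float column
    "gender": ["Male", "Female"],         # text column
})

# 'number' matches every numeric dtype, whatever its bit width
numeric_cols = toy.select_dtypes(include="number").columns

print(list(numeric_cols))
```

With this selector both the integer and the float columns are kept, and the text column is dropped.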

### Min observation

In [5]:
# observation with minimum current balance
data[data['current_balance'] == data['current_balance'].min()]

| | customer_id | vintage | age | gender | dependents | occupation | city | customer_nw_category | branch_code | current_balance | ... | average_monthly_balance_prevQ | average_monthly_balance_prevQ2 | current_month_credit | previous_month_credit | current_month_debit | previous_month_debit | current_month_balance | previous_month_balance | churn | last_transaction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12608 | 13467 | 2140 | 80 | Male | 0.0 | retired | 1096.0 | 1 | 27 | -5503.96 | ... | 1694.57 | 868.26 | 9471.01 | 2680.04 | 15229.44 | 7859.37 | 1050.17 | 2002.97 | 1 | 2019-12-26 |


* The customer's id is 13467
* This customer has the __minimum current balance__, -5503.96
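An alternative way to fetch the row holding the minimum is `idxmin()`, sketched here on made-up data (the customers and balances are hypothetical). Unlike the boolean mask used above, which returns every row tied for the minimum, `idxmin()` returns only the first matching index label.

```python
import pandas as pd

# Hypothetical customers, for illustration only
toy = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "current_balance": [2500.0, -120.5, 830.25],
})

# idxmin() gives the index label of the first minimum value
min_label = toy["current_balance"].idxmin()

# Look up individual fields of that row by label
min_customer = toy.loc[min_label, "customer_id"]
min_balance = toy.loc[min_label, "current_balance"]

print(min_label, min_customer, min_balance)
```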


### Max observation

In [6]:
# observation with maximum current month debit
data[data['current_month_debit'] == data['current_month_debit'].max()]

| | customer_id | vintage | age | gender | dependents | occupation | city | customer_nw_category | branch_code | current_balance | ... | average_monthly_balance_prevQ | average_monthly_balance_prevQ2 | current_month_credit | previous_month_credit | current_month_debit | previous_month_debit | current_month_balance | previous_month_balance | churn | last_transaction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24095 | 25712 | 1902 | 90 | Male | 0.0 | retired | 1020.0 | 2 | 5 | 46.5 | ... | 11728.39 | 111617.41 | 12269845.39 | 0.21 | 7637857.36 | 0.21 | 8399.62 | 24270.54 | 1 | 2019-12-13 |


* The customer's id is 25712
* This customer has the __maximum current month debit__, 7637857.36
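To judge whether a maximum is an isolated extreme or one of several similar values, it can help to look at the few largest observations rather than only the top one. A minimal sketch using `nlargest()` on made-up data (customer ids and debits are hypothetical):

```python
import pandas as pd

# Hypothetical customers, for illustration only
toy = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "current_month_debit": [50.0, 9000.0, 0.01, 400.0],
})

# The two rows with the largest debits, sorted from largest to smallest
top_debits = toy.nlargest(2, "current_month_debit")

print(top_debits)
```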


### Range 

__Range of Age__ in our dataset, indicating the difference in age between the oldest and youngest customers

In [7]:
# Range of Age 

print(data['age'].min(),  data['age'].max())

1 90


* Oldest Customer Age is 90
* Youngest Customer Age is 1
* Ages span the interval [1, 90], so the range is 90 - 1 = 89

### Max, Min, Range for each column

In [11]:
# Printing max of every numerical column
data[numerical_cols].max()

dependents                              52.00
city                                  1649.00
current_balance                    5905904.03
previous_month_end_balance         5740438.63
average_monthly_balance_prevQ      5700289.57
average_monthly_balance_prevQ2     5010170.10
current_month_credit              12269845.39
previous_month_credit              2361808.29
current_month_debit                7637857.36
previous_month_debit               1414168.06
current_month_balance              5778184.77
previous_month_balance             5720144.50
dtype: float64

* Maximum value of vintage for a customer is 12899 (vintage is an integer column, so it does not appear in the output above).
* Maximum age of a customer in our dataset is 90 (age is also an integer column and is absent from the output above).
* Maximum number of dependents in our dataset is 52.
* Maximum days since last transaction is 365 (this value is not computed in this section).
* Maximum values for __current_balance, previous_month_end_balance, average_monthly_balance_prevQ, current_month_balance, previous_month_balance__ are roughly between 57 and 59 lakhs.
* Maximum value for current_month_credit is 12269845.39.
* Maximum value for previous_month_credit is 2361808.29.
* Maximum values for current_month_debit and previous_month_debit are 7637857.36 and 1414168.06, respectively.
* Features like __customer_id, city, customer_nw_category, branch_code, churn__ should be treated as categorical variables, so their maximum values carry no numerical significance.


In [15]:
# printing min of every numerical column
data[numerical_cols].min()

dependents                            0.00
city                                  0.00
current_balance                   -5503.96
previous_month_end_balance        -3149.57
average_monthly_balance_prevQ      1428.69
average_monthly_balance_prevQ2   -16506.10
current_month_credit                  0.01
previous_month_credit                 0.01
current_month_debit                   0.01
previous_month_debit                  0.01
current_month_balance             -3374.18
previous_month_balance            -5171.92
dtype: float64

In [12]:
# print the [min, max] interval of every numerical column
for col in numerical_cols:
    print(f"range of {col}: [{data[col].min()}, {data[col].max()}]")

range of dependents: [0.0, 52.0]
range of city: [0.0, 1649.0]
range of current_balance: [-5503.96, 5905904.03]
range of previous_month_end_balance: [-3149.57, 5740438.63]
range of average_monthly_balance_prevQ: [1428.69, 5700289.57]
range of average_monthly_balance_prevQ2: [-16506.1, 5010170.1]
range of current_month_credit: [0.01, 12269845.39]
range of previous_month_credit: [0.01, 2361808.29]
range of current_month_debit: [0.01, 7637857.36]
range of previous_month_debit: [0.01, 1414168.06]
range of current_month_balance: [-3374.18, 5778184.77]
range of previous_month_balance: [-5171.92, 5720144.5]


* Range of current_month_credit is highest among all features.
* Range of days_since_last_transaction is 1 year (this column is not part of the output above).
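To compare ranges across columns directly, the range can be computed as a single number per column and the widest one picked out. A minimal sketch on made-up data (the column names mirror the dataset, the values are hypothetical):

```python
import pandas as pd

# Hypothetical values, for illustration only
toy = pd.DataFrame({
    "current_month_credit": [0.01, 500.0, 1200.0],
    "current_month_debit": [0.01, 300.0, 700.0],
})

# Min and max of every column in one small table (rows: 'min', 'max')
summary = toy.agg(["min", "max"])

# Range per column = max - min
col_range = summary.loc["max"] - summary.loc["min"]

# Name of the column with the widest range
widest = col_range.idxmax()

print(col_range)
print(widest)
```

Sorting `col_range` with `sort_values(ascending=False)` would give the full ordering rather than only the widest column.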