### Data Manipulation on Loan DataSet

* Below are the operations I am going to perform
    * Connecting to AWS Database
    * Get all existing tables from database
    * Load database tables into pandas dataframe
    * Check whether all columns have right datatypes
    * Merging tables (customers, loans, transactions)
    * Get month from date

* Answering Business Questions
    * Q1. Total Loan amount disbursed by DisbursedMonth, Gender, Tenor
        * Get month name from the given date
        * Use groupby on multiple cols and agg
    * Q2. Average days to disburse loan from application date ( Loan creation date, Loan Disbursement Date)
        * Calculate difference between dates
        * Calculate mean on above based on condition given on specific value of a col
    * Q3. Average Loan amount disbursed to Female for Business Purpose
    * Q4. Average Age of Customers requesting loan for Educational Purpose
        * Calculating age of customers based on DateOfBirth column- from (now-DoB)
        * Calculate mean on above based on condition given on specific value of a col
    * Q5. Average tenor for the loan disbursed for customers with the max credit score
    * Q5. Customers with the max credit score, average tenor for the loan disbursed to them
        * Create multiple filters based on given criteria


In [2]:
import pandas as pd
from sqlalchemy import create_engine

In [3]:
#install pymssql
!pip install pymssql





In [4]:
import pymssql

### Connecting to AWS Database

In [7]:
uri = "redacted"
engine = create_engine(uri)

In [8]:
conn = engine.connect()

In [None]:
# Loans data, customer demographics

### Get all existing tables from Database

In [115]:
# All tables from the DataBase
pd.read_sql('SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES', conn)


Unnamed: 0,TABLE_NAME
0,LoanAccounts
1,TransactionsLog
2,Customers
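
The same list-then-load pattern can be tried without database credentials. The sketch below is only a stand-in, under these assumptions: it uses an in-memory SQLite database (standard-library `sqlite3`) instead of the SQL Server instance used in this notebook, so the catalog query goes to `sqlite_master` rather than `INFORMATION_SCHEMA.TABLES`. The table contents and the `demo_*` names are made up for illustration.

```python
import sqlite3
import pandas as pd

# In-memory SQLite stand-in for the real database (assumption: illustration only)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE LoanAccounts (Id INTEGER, LoanAmount REAL)")
demo_conn.execute("CREATE TABLE Customers (Id INTEGER, Gender TEXT)")
demo_conn.execute("INSERT INTO LoanAccounts VALUES (1, 40000.0), (2, 45000.0)")
demo_conn.commit()

# SQLite keeps its catalog in sqlite_master; SQL Server exposes INFORMATION_SCHEMA.TABLES
demo_tables = pd.read_sql(
    "SELECT name AS TABLE_NAME FROM sqlite_master WHERE type = 'table' ORDER BY name",
    demo_conn,
)

# Load every listed table into a dict of DataFrames keyed by table name
demo_frames = {
    name: pd.read_sql(f'SELECT * FROM "{name}"', demo_conn)
    for name in demo_tables["TABLE_NAME"]
}

print(demo_tables)
print(demo_frames["LoanAccounts"])
```

Keeping the frames in a dict avoids writing one `queryN` variable per table, which may be convenient when the database has more than three tables.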


### Load database tables into pandas dataframe

In [10]:
#Analysing LoanAccounts Data

In [11]:
query1 = 'SELECT * FROM [LOANAccounts]'

loans = pd.read_sql(query1, conn)

In [12]:
#Exporting to csv just for my reference
loans.to_csv('loans.csv')

In [116]:
loans.head()

Unnamed: 0,Id,CreatedDate,CustomerId,LoanAmount,ProductId,LoanReason,Installment,Tenor,Rate,LoanStatus,FirstRepaymentDate,MaturityDate,TotalOutstandingPrincipal,TotalOutstandingInterest,TotalOutstandingBalance,IsEmployed,IsHomeOwner,RepaymentType,LoanPurpose
0,1,2021-06-08 21:54:53.830,2,40000.0,1,,54933.33,3,312.0,Rejected,2021-07-07,2021-08-09 21:54:53.820,40000.0,124800.0,164800.0,True,,1,Business
1,2,2021-06-09 16:17:15.367,2,45000.0,1,,30900.0,6,312.0,Rejected,2021-07-08,2021-11-09 16:17:15.357,45000.0,140400.0,185400.0,True,,1,SpecialEvents
2,3,2021-06-09 18:01:55.690,2,45000.0,1,,30900.0,6,312.0,Rejected,2021-07-08,2021-11-09 18:01:55.690,45000.0,140400.0,185400.0,True,,1,Investment
3,4,2021-06-09 18:16:59.903,2,25000.0,1,,34333.33,3,312.0,Rejected,2021-07-08,2021-08-09 18:16:59.900,25000.0,78000.0,103000.0,True,,1,Investment
4,5,2021-06-09 18:49:50.483,2,25000.0,1,,10774.98,4,312.0,Settled,2021-07-08,2021-09-09 18:49:50.483,25000.0,18099.93,43099.93,True,,1,Rent


In [14]:
#Check whether all columns are having right datatypes
loans.dtypes

Id                                    int64
CreatedDate                  datetime64[ns]
CustomerId                            int64
LoanAmount                          float64
ProductId                             int64
LoanReason                           object
Installment                         float64
Tenor                                 int64
Rate                                float64
LoanStatus                           object
FirstRepaymentDate           datetime64[ns]
MaturityDate                 datetime64[ns]
TotalOutstandingPrincipal           float64
TotalOutstandingInterest            float64
TotalOutstandingBalance             float64
IsEmployed                             bool
IsHomeOwner                          object
RepaymentType                         int64
LoanPurpose                          object
dtype: object

In [15]:
query2 = 'SELECT * FROM Customers'

customers = pd.read_sql(query2, conn)

In [16]:
#Exporting to csv just for my reference
customers.to_csv('customers.csv')

In [17]:
loans['LoanReason'].unique()

array(['NA'], dtype=object)

In [18]:
customers.head(2)

Unnamed: 0,Id,DateOfBirth,Gender,MaritalStatus,CreditScore,LastLoginTime,IsEmailConfirmed,EmailConfirmationDate,PhoneNumberConfirmationDate,IsPhoneNumberConfirmed,EmployerSector,EmploymentStatus,IsVerified,IsBasicProfileComplete
0,1,1997-06-20 00:00:00,Female,Single,0,,True,2021-06-08 20:04:32.227,,False,Private,Employed,,True
1,2,1983-03-25 00:00:00,Male,Married,0,,True,2021-06-08 20:50:22.583,,False,Private,Employed,,True


In [19]:
customers.dtypes

Id                                      int64
DateOfBirth                            object
Gender                                 object
MaritalStatus                          object
CreditScore                            object
LastLoginTime                          object
IsEmailConfirmed                         bool
EmailConfirmationDate          datetime64[ns]
PhoneNumberConfirmationDate            object
IsPhoneNumberConfirmed                   bool
EmployerSector                         object
EmploymentStatus                       object
IsVerified                             object
IsBasicProfileComplete                   bool
dtype: object
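
The dtypes above show `DateOfBirth` and `CreditScore` as `object`, although they are used later as a date and a number. A possible way to convert them is sketched below on made-up rows. How the real values are stored (strings or Python objects) is not visible in the output, so the string inputs and the `demo_customers` name here are assumptions.

```python
import pandas as pd

# Made-up rows shaped like the Customers table (illustration only)
demo_customers = pd.DataFrame({
    "Id": [1, 2],
    "DateOfBirth": ["1997-06-20 00:00:00", "1983-03-25 00:00:00"],
    "CreditScore": ["0", "811"],
})

# errors="coerce" turns unparseable values into NaT/NaN instead of raising
demo_customers["DateOfBirth"] = pd.to_datetime(demo_customers["DateOfBirth"], errors="coerce")
demo_customers["CreditScore"] = pd.to_numeric(demo_customers["CreditScore"], errors="coerce")

print(demo_customers.dtypes)
```

With numeric scores, `CreditScore.max()` compares numbers rather than strings, which matters for the max credit score question later on.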

In [20]:
#Get the TransactionLog data from database and store it in dataframe
query3 = 'SELECT * FROM [TransactionsLog]'

transactions = pd.read_sql(query3, conn)

In [117]:
transactions.head(2)

Unnamed: 0,Id,CreatedDate,Amount,PaymentType,LoanAccountId,RepaymentDate,RepaymentDeduction
0,1,2021-06-09 18:51:36.960,25000.0,LoanDisbursement,5,2021-06-09 18:51:36.960,
1,2,2021-06-09 19:03:07.937,45000.0,LoanDisbursement,7,2021-06-09 19:03:07.937,


In [26]:
transactions.PaymentType.value_counts()

CPA                        5919
BankTransfer               5077
LoanDisbursement           2798
CardPayment                2566
Liquidation                1357
Withdrawal_Cancellation     315
CashDeposit                  15
GoodwillCredit                5
Name: PaymentType, dtype: int64

### Merging Tables 

* To merge the three tables, I do it in two steps.
* First I merge two tables (loans and transactions), and then I merge the result with the third table (customers).
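
The two-step merge can be sketched on tiny made-up frames as follows. The `demo_*` frames and the `CustomerKey` column name are hypothetical, and `validate=` is an optional check on the expected relationship, not something this notebook uses.

```python
import pandas as pd

# Made-up miniature versions of the three tables (illustration only)
demo_loans = pd.DataFrame({"Id": [5, 7], "CustomerId": [2, 2], "LoanAmount": [25000.0, 45000.0]})
demo_tx = pd.DataFrame({
    "Id": [1, 2, 3],
    "LoanAccountId": [5, 7, 5],
    "PaymentType": ["LoanDisbursement", "LoanDisbursement", "CardPayment"],
})
demo_cust = pd.DataFrame({"Id": [2], "Gender": ["Male"]})

# Step 1: loans -> transactions (one loan can have many transactions)
step1 = demo_loans.merge(
    demo_tx, left_on="Id", right_on="LoanAccountId",
    suffixes=("_loans", "_transactions"), validate="one_to_many",
)

# Step 2: attach customer attributes (many rows can share one customer)
# The customer key is renamed (hypothetical name) so it does not clash with other Id columns
step2 = step1.merge(
    demo_cust.rename(columns={"Id": "CustomerKey"}),
    left_on="CustomerId", right_on="CustomerKey", validate="many_to_one",
)

print(step2)
```

Because the result has one row per transaction, `LoanAmount` repeats for every transaction of the same loan. That is why the questions below filter on `PaymentType == 'LoanDisbursement'` before summing or averaging loan amounts.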

In [22]:
# Merging loans and transactions tables; the suffixes distinguish the overlapping Id and CreatedDate columns
loan_transaction = loans.merge(transactions, left_on="Id", right_on="LoanAccountId", suffixes=["_loans", "_transactions"])


In [23]:
loan_transaction.head(2)

Unnamed: 0,Id_loans,CreatedDate_loans,CustomerId,LoanAmount,ProductId,LoanReason,Installment,Tenor,Rate,LoanStatus,...,IsHomeOwner,RepaymentType,LoanPurpose,Id_transactions,CreatedDate_transactions,Amount,PaymentType,LoanAccountId,RepaymentDate,RepaymentDeduction
0,5,2021-06-09 18:49:50.483,2,25000.0,1,,10774.98,4,312.0,Settled,...,,1,Rent,1,2021-06-09 18:51:36.960,25000.0,LoanDisbursement,5,2021-06-09 18:51:36.960,
1,5,2021-06-09 18:49:50.483,2,25000.0,1,,10774.98,4,312.0,Settled,...,,1,Rent,3,2021-06-09 19:19:53.857,25000.0,CardPayment,5,2021-06-09 19:19:53.857,TotalReceived


In [118]:
#Merging loan_transaction and customers tables(Getting only specific columns from customers table)
customer_loan_transactions = loan_transaction.merge(customers[["Id","Gender","DateOfBirth","CreditScore"]], left_on="CustomerId", right_on="Id")


* customer_loan_transactions is the final dataframe, with the loans, customers and transactions tables joined
* For simplicity I am renaming it as df (df is another name for the same object, not a copy)

In [119]:
df = customer_loan_transactions

In [120]:
df.head(2)

Unnamed: 0,Id_loans,CreatedDate_loans,CustomerId,LoanAmount,ProductId,LoanReason,Installment,Tenor,Rate,LoanStatus,...,CreatedDate_transactions,Amount,PaymentType,LoanAccountId,RepaymentDate,RepaymentDeduction,Id,Gender,DateOfBirth,CreditScore
0,5,2021-06-09 18:49:50.483,2,25000.0,1,,10774.98,4,312.0,Settled,...,2021-06-09 18:51:36.960,25000.0,LoanDisbursement,5,2021-06-09 18:51:36.960,,2,Male,1983-03-25 00:00:00,0
1,5,2021-06-09 18:49:50.483,2,25000.0,1,,10774.98,4,312.0,Settled,...,2021-06-09 19:19:53.857,25000.0,CardPayment,5,2021-06-09 19:19:53.857,TotalReceived,2,Male,1983-03-25 00:00:00,0


In [121]:
df.PaymentType.value_counts()

CPA                        5919
BankTransfer               5077
LoanDisbursement           2798
CardPayment                2566
Liquidation                1357
Withdrawal_Cancellation     315
CashDeposit                  15
GoodwillCredit                5
Name: PaymentType, dtype: int64

In [122]:
df.columns

Index(['Id_loans', 'CreatedDate_loans', 'CustomerId', 'LoanAmount',
       'ProductId', 'LoanReason', 'Installment', 'Tenor', 'Rate', 'LoanStatus',
       'FirstRepaymentDate', 'MaturityDate', 'TotalOutstandingPrincipal',
       'TotalOutstandingInterest', 'TotalOutstandingBalance', 'IsEmployed',
       'IsHomeOwner', 'RepaymentType', 'LoanPurpose', 'Id_transactions',
       'CreatedDate_transactions', 'Amount', 'PaymentType', 'LoanAccountId',
       'RepaymentDate', 'RepaymentDeduction', 'Id', 'Gender', 'DateOfBirth',
       'CreditScore'],
      dtype='object')

### Q1. Total Loan amount disbursed by DisbursedMonth, Gender, Tenor

In [123]:
# Get the month name from the transaction date
df['DisbursedMonth'] = df.CreatedDate_transactions.dt.month_name()

In [125]:
df.columns

Index(['Id_loans', 'CreatedDate_loans', 'CustomerId', 'LoanAmount',
       'ProductId', 'LoanReason', 'Installment', 'Tenor', 'Rate', 'LoanStatus',
       'FirstRepaymentDate', 'MaturityDate', 'TotalOutstandingPrincipal',
       'TotalOutstandingInterest', 'TotalOutstandingBalance', 'IsEmployed',
       'IsHomeOwner', 'RepaymentType', 'LoanPurpose', 'Id_transactions',
       'CreatedDate_transactions', 'Amount', 'PaymentType', 'LoanAccountId',
       'RepaymentDate', 'RepaymentDeduction', 'Id', 'Gender', 'DateOfBirth',
       'CreditScore', 'DisbursedMonth'],
      dtype='object')

In [126]:
df[df['PaymentType'] == 'LoanDisbursement']

Unnamed: 0,Id_loans,CreatedDate_loans,CustomerId,LoanAmount,ProductId,LoanReason,Installment,Tenor,Rate,LoanStatus,...,Amount,PaymentType,LoanAccountId,RepaymentDate,RepaymentDeduction,Id,Gender,DateOfBirth,CreditScore,DisbursedMonth
0,5,2021-06-09 18:49:50.483,2,25000.0,1,,10774.98,4,312.0,Settled,...,25000.0,LoanDisbursement,5,2021-06-09 18:51:36.960,,2,Male,1983-03-25 00:00:00,0,June
3,7,2021-06-09 18:59:22.983,2,45000.0,1,,15598.05,6,312.0,Settled,...,45000.0,LoanDisbursement,7,2021-06-09 19:03:07.937,,2,Male,1983-03-25 00:00:00,0,June
5,16,2021-06-11 09:59:25.723,2,25000.0,1,,8665.58,6,312.0,Settled,...,25000.0,LoanDisbursement,16,2021-06-11 21:53:00.627,,2,Male,1983-03-25 00:00:00,0,June
12,5454,2021-09-10 09:57:07.190,2,30000.0,1,,10398.70,6,312.0,Settled,...,30000.0,LoanDisbursement,5454,2021-09-10 10:19:02.540,,2,Male,1983-03-25 00:00:00,0,September
26,5703,2021-09-14 16:59:59.220,2,25000.0,1,,12997.56,3,312.0,Settled,...,25000.0,LoanDisbursement,5703,2021-09-14 17:08:03.737,,2,Male,1983-03-25 00:00:00,0,September
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17999,5924,2021-09-19 09:07:55.967,47138,30000.0,1,,13357.74,3,192.0,Settled,...,30000.0,LoanDisbursement,5924,2021-09-21 14:18:04.933,,47138,Female,1986-10-11 00:00:00,811,September
18012,5949,2021-09-19 15:09:21.450,5333,50000.0,1,,22262.89,3,192.0,Due,...,50000.0,LoanDisbursement,5949,2021-09-21 18:48:15.437,,5333,Male,1979-07-17 00:00:00,789,September
18046,5955,2021-09-19 18:27:03.580,47553,25000.0,1,,12997.56,3,312.0,Due,...,25000.0,LoanDisbursement,5955,2021-09-21 13:24:03.267,,47553,Male,1984-12-20 00:00:00,623,September
18050,5961,2021-09-19 20:07:28.450,47609,45000.0,1,,15598.05,6,312.0,Due,...,45000.0,LoanDisbursement,5961,2021-09-20 18:20:15.610,,47609,Male,1966-10-30 00:00:00,599,September


In [127]:
#Total Loan amount disbursed by DisbursedMonth, Gender, Tenor
#Solution1
df[df['PaymentType'] == 'LoanDisbursement'].groupby(['DisbursedMonth','Gender','Tenor']).agg({'LoanAmount':'sum'})

DisbursedMonth,Gender,Tenor,LoanAmount
August,Female,3,5025000.0
August,Female,4,1365000.0
August,Female,5,860000.0
August,Female,6,13998000.0
August,Male,3,15076800.0
August,Male,4,5007000.0
August,Male,5,2961000.0
August,Male,6,30420500.0
July,Female,3,4752500.0
July,Female,4,1522000.0


In [128]:
#Total Loan amount disbursed by DisbursedMonth, Gender, Tenor
#Solution2
df[df['PaymentType'] == 'LoanDisbursement'].groupby(['DisbursedMonth','Gender','Tenor'])['LoanAmount'].sum()

DisbursedMonth  Gender  Tenor
August          Female  3         5025000.0
                        4         1365000.0
                        5          860000.0
                        6        13998000.0
                Male    3        15076800.0
                        4         5007000.0
                        5         2961000.0
                        6        30420500.0
July            Female  3         4752500.0
                        4         1522000.0
                        5         1028000.0
                        6        12431000.0
                Male    3        13804000.0
                        4         3989000.0
                        5         1085000.0
                        6        22903000.0
June            Female  3          680000.0
                        4          190000.0
                        5          120000.0
                        6         1405000.0
                Male    3         2310000.0
                        4          457000.0
  

### Q2. Average days to disburse loan from application date ( Loan creation date, Loan Disbursement Date)

In [129]:
df.columns

Index(['Id_loans', 'CreatedDate_loans', 'CustomerId', 'LoanAmount',
       'ProductId', 'LoanReason', 'Installment', 'Tenor', 'Rate', 'LoanStatus',
       'FirstRepaymentDate', 'MaturityDate', 'TotalOutstandingPrincipal',
       'TotalOutstandingInterest', 'TotalOutstandingBalance', 'IsEmployed',
       'IsHomeOwner', 'RepaymentType', 'LoanPurpose', 'Id_transactions',
       'CreatedDate_transactions', 'Amount', 'PaymentType', 'LoanAccountId',
       'RepaymentDate', 'RepaymentDeduction', 'Id', 'Gender', 'DateOfBirth',
       'CreditScore', 'DisbursedMonth'],
      dtype='object')

In [130]:
df.PaymentType.value_counts()

CPA                        5919
BankTransfer               5077
LoanDisbursement           2798
CardPayment                2566
Liquidation                1357
Withdrawal_Cancellation     315
CashDeposit                  15
GoodwillCredit                5
Name: PaymentType, dtype: int64

In [132]:
# Note: the 'days' column is created in cell In [136] below; that cell must run first
df['days'].value_counts()

90 days 19:51:42.290000     6
86 days 16:13:49.687000     4
30 days 11:34:04.776000     4
125 days 03:41:47.446000    3
29 days 06:39:16.730000     3
                           ..
0 days 17:13:36.950000      1
29 days 18:03:54.477000     1
0 days 00:19:12.173000      1
30 days 03:43:01.620000     1
0 days 01:59:12.613000      1
Name: days, Length: 9236, dtype: int64

In [136]:
df['days'] = df.CreatedDate_transactions-df.CreatedDate_loans

In [137]:
df.head(2)

Unnamed: 0,Id_loans,CreatedDate_loans,CustomerId,LoanAmount,ProductId,LoanReason,Installment,Tenor,Rate,LoanStatus,...,PaymentType,LoanAccountId,RepaymentDate,RepaymentDeduction,Id,Gender,DateOfBirth,CreditScore,DisbursedMonth,days
0,5,2021-06-09 18:49:50.483,2,25000.0,1,,10774.98,4,312.0,Settled,...,LoanDisbursement,5,2021-06-09 18:51:36.960,,2,Male,1983-03-25 00:00:00,0,June,00:01:46.477000
1,5,2021-06-09 18:49:50.483,2,25000.0,1,,10774.98,4,312.0,Settled,...,CardPayment,5,2021-06-09 19:19:53.857,TotalReceived,2,Male,1983-03-25 00:00:00,0,June,00:30:03.374000


In [135]:
# Q2. Average days to disburse loan from application date ( Loan creation date, Loan Disbursement Date)
# solution1
df[df['PaymentType']=='LoanDisbursement']['days'].mean()

Timedelta('0 days 10:06:43.305943')

In [187]:
# Q2. Average days to disburse loan from application date ( Loan creation date, Loan Disbursement Date)
# solution1 (variant): double brackets select a one-column DataFrame, so mean() returns a Series
df[df['PaymentType']=='LoanDisbursement'][['days']].mean()

days   10:06:43.305943
dtype: timedelta64[ns]

In [139]:
# Q2. Average days to disburse loan from application date ( Loan creation date, Loan Disbursement Date)
# solution2
df['days'].loc[df['PaymentType'] == 'LoanDisbursement'].mean()

Timedelta('0 days 10:06:43.305943')

In [197]:
# Q2. Average days to disburse loan from application date ( Loan creation date, Loan Disbursement Date)
# solution3
df['days'][df['PaymentType'] == 'LoanDisbursement'].mean()

Timedelta('0 days 10:06:43.305943')
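
The answer above is a `Timedelta` of about 10 hours, which is roughly 0.42 days. If a plain number of days is preferred, the timedeltas can be converted before averaging. This is a small sketch on made-up rows, where `demo`, `is_disbursement` and `avg_days` are illustrative names.

```python
import pandas as pd

# Made-up rows: two disbursements taking 0.5 and 1.5 days, plus one repayment
demo = pd.DataFrame({
    "PaymentType": ["LoanDisbursement", "CardPayment", "LoanDisbursement"],
    "CreatedDate_loans": pd.to_datetime(
        ["2021-06-09 00:00", "2021-06-09 00:00", "2021-06-10 00:00"]
    ),
    "CreatedDate_transactions": pd.to_datetime(
        ["2021-06-09 12:00", "2021-07-09 00:00", "2021-06-11 12:00"]
    ),
})
demo["days"] = demo["CreatedDate_transactions"] - demo["CreatedDate_loans"]

is_disbursement = demo["PaymentType"] == "LoanDisbursement"

# total_seconds() / 86400 expresses each Timedelta as a fractional number of days
avg_days = (demo.loc[is_disbursement, "days"].dt.total_seconds() / 86400).mean()

print(avg_days)
```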

### Q3. Average Loan amount disbursed to Female for Business Purpose

In [163]:
df.columns

Index(['Id_loans', 'CreatedDate_loans', 'CustomerId', 'LoanAmount',
       'ProductId', 'LoanReason', 'Installment', 'Tenor', 'Rate', 'LoanStatus',
       'FirstRepaymentDate', 'MaturityDate', 'TotalOutstandingPrincipal',
       'TotalOutstandingInterest', 'TotalOutstandingBalance', 'IsEmployed',
       'IsHomeOwner', 'RepaymentType', 'LoanPurpose', 'Id_transactions',
       'CreatedDate_transactions', 'Amount', 'PaymentType', 'LoanAccountId',
       'RepaymentDate', 'RepaymentDeduction', 'Id', 'Gender', 'DateOfBirth',
       'CreditScore', 'DisbursedMonth', 'days', 'age'],
      dtype='object')

In [164]:
df.LoanPurpose.value_counts() #Gender=female

Business             8726
Emergency            2454
Rent                 1342
MedicalBills         1283
Bills                1024
Education            1002
Investment            756
CarExpense            451
Food_Provisions       444
SpecialEvents         252
DebtConsideration     149
PocketMoney           113
Entertainment          56
Name: LoanPurpose, dtype: int64

In [198]:
df.PaymentType.value_counts()

CPA                        5919
BankTransfer               5077
LoanDisbursement           2798
CardPayment                2566
Liquidation                1357
Withdrawal_Cancellation     315
CashDeposit                  15
GoodwillCredit                5
Name: PaymentType, dtype: int64

In [166]:
#Q3. Average Loan amount disbursed to Female for Business Purpose
#Solution1
filter1 = df.PaymentType=='LoanDisbursement'
filter2 = df['Gender'] == 'Female'
filter3 =  df['LoanPurpose'] == 'Business'


customer_loan_transactions[filter1 & filter2 & filter3]['LoanAmount'].mean()


74692.69776876268

In [179]:
#Q3. Average Loan amount disbursed to Female for Business Purpose
#Solution2
filter1 = df.PaymentType=='LoanDisbursement'
filter2 = df['Gender'] == 'Female'
filter3 =  df['LoanPurpose'] == 'Business'


df['LoanAmount'][filter1 & filter2 & filter3].mean()

74692.69776876268

In [168]:
#Q3. Average Loan amount disbursed to Female for Business Purpose
#Solution3
df[(df.PaymentType=='LoanDisbursement') & 
   (df.Gender=='Female') & 
   (df.LoanPurpose=='Business')]['LoanAmount'].mean() 

74692.69776876268

In [217]:
#Q3. Average Loan amount disbursed to Female for Business Purpose
#Solution4
df.loc[(df.PaymentType=='LoanDisbursement') & 
   (df.Gender=='Female') & 
   (df.LoanPurpose=='Business')]['LoanAmount'].mean() 

74692.69776876268
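
All four solutions filter the rows first and then pick the column in a second indexing step. A single `.loc[rows, column]` call does both at once. Here is a sketch on made-up rows (`demo`, `mask` and `avg_amount` are illustrative names).

```python
import pandas as pd

# Made-up rows (illustration only)
demo = pd.DataFrame({
    "PaymentType": ["LoanDisbursement", "LoanDisbursement", "CardPayment", "LoanDisbursement"],
    "Gender": ["Female", "Female", "Female", "Male"],
    "LoanPurpose": ["Business", "Business", "Business", "Business"],
    "LoanAmount": [50000.0, 100000.0, 999.0, 70000.0],
})

# Each condition needs its own parentheses because & binds tighter than ==
mask = (
    (demo["PaymentType"] == "LoanDisbursement")
    & (demo["Gender"] == "Female")
    & (demo["LoanPurpose"] == "Business")
)

# Row mask and column label in one .loc call
avg_amount = demo.loc[mask, "LoanAmount"].mean()

print(avg_amount)
```

For reading values, both styles give the same result. The single `.loc` call mainly helps when assigning, where chained indexing can end up writing to a temporary copy.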

In [226]:
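# df.loc[] was left without a row label; label 0 is an assumption (the truncated output below shows a single row as a Series)
df.loc[0]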

Id_loans                                              5
CreatedDate_loans            2021-06-09 18:49:50.483000
CustomerId                                            2
LoanAmount                                        25000
ProductId                                             1
LoanReason                                           NA
Installment                                       10775
Tenor                                                 4
Rate                                                312
LoanStatus                                      Settled
FirstRepaymentDate                  2021-07-08 00:00:00
MaturityDate                 2021-09-09 18:49:50.483000
TotalOutstandingPrincipal                         25000
TotalOutstandingInterest                        18099.9
TotalOutstandingBalance                         43099.9
IsEmployed                                         True
IsHomeOwner                                        None
RepaymentType                                   

### Q4. Average Age of Customers requesting loan for Educational Purpose


In [169]:
df.LoanPurpose.value_counts()

Business             8726
Emergency            2454
Rent                 1342
MedicalBills         1283
Bills                1024
Education            1002
Investment            756
CarExpense            451
Food_Provisions       444
SpecialEvents         252
DebtConsideration     149
PocketMoney           113
Entertainment          56
Name: LoanPurpose, dtype: int64

In [170]:
# Calculating the age of customers in whole years from the DateOfBirth column (now - DoB)
now = pd.Timestamp('now')
df['age'] = (now - df['DateOfBirth']).astype('<m8[Y]')  
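
The `astype('<m8[Y]')` conversion depends on year-unit timedeltas, which newer pandas releases may not accept. An alternative that only compares calendar fields is sketched below on made-up rows. The fixed reference date `ref` is an assumption chosen so the example is reproducible (the cell above uses `pd.Timestamp('now')`).

```python
import pandas as pd

# Made-up rows; DateOfBirth starts as strings and is parsed first
demo = pd.DataFrame({"DateOfBirth": ["1997-06-20 00:00:00", "1983-03-25 00:00:00"]})
dob = pd.to_datetime(demo["DateOfBirth"])

# Fixed reference date (assumption) so the result does not change from day to day
ref = pd.Timestamp("2021-10-01")

# True where the birthday has already occurred in the reference year
had_birthday = (dob.dt.month < ref.month) | (
    (dob.dt.month == ref.month) & (dob.dt.day <= ref.day)
)

# Completed years: subtract one when the birthday is still to come
demo["age"] = ref.year - dob.dt.year - (~had_birthday).astype(int)

print(demo)
```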

In [223]:
# Q4. Average Age of Customers requesting loan for Educational Purpose
# Solution1
df[df.LoanPurpose == 'Education']['age'].agg('mean')

36.77445109780439

In [222]:
# Q4. Average Age of Customers requesting loan for Educational Purpose
# Solution2
df[df.LoanPurpose == 'Education']['age'].mean()

36.77445109780439

In [221]:
# Q4. Average Age of Customers requesting loan for Educational Purpose
# Solution3
df.loc[df.LoanPurpose == 'Education']['age'].mean()

36.77445109780439

### Q5. Average tenor for the loan disbursed for customers with the max credit score

In [262]:
filter1 = (df.CreditScore == df.CreditScore.max())

In [263]:
filter2 = (df.PaymentType == 'LoanDisbursement')

In [264]:
df[filter1 & filter2]['Tenor'].mean()

5.333333333333333

### Q5. List of customers with the max credit score, average tenor for the loan disbursed to them.

In [173]:
df.columns

Index(['Id_loans', 'CreatedDate_loans', 'CustomerId', 'LoanAmount',
       'ProductId', 'LoanReason', 'Installment', 'Tenor', 'Rate', 'LoanStatus',
       'FirstRepaymentDate', 'MaturityDate', 'TotalOutstandingPrincipal',
       'TotalOutstandingInterest', 'TotalOutstandingBalance', 'IsEmployed',
       'IsHomeOwner', 'RepaymentType', 'LoanPurpose', 'Id_transactions',
       'CreatedDate_transactions', 'Amount', 'PaymentType', 'LoanAccountId',
       'RepaymentDate', 'RepaymentDeduction', 'Id', 'Gender', 'DateOfBirth',
       'CreditScore', 'DisbursedMonth', 'days', 'age'],
      dtype='object')

In [174]:
customers.columns

Index(['Id', 'DateOfBirth', 'Gender', 'MaritalStatus', 'CreditScore',
       'LastLoginTime', 'IsEmailConfirmed', 'EmailConfirmationDate',
       'PhoneNumberConfirmationDate', 'IsPhoneNumberConfirmed',
       'EmployerSector', 'EmploymentStatus', 'IsVerified',
       'IsBasicProfileComplete'],
      dtype='object')

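In [175]:
# Bug: without parentheses this is the bound method `max`, not a boolean mask.
# The intended mask is df.CreditScore == df.CreditScore.max()
filter1 = df.CreditScore.max

In [176]:
# Bug: this is a single float (the mean tenor), not a boolean mask
filter2 = df.Tenor.mean()

In [177]:
filter3 = (df.PaymentType == 'LoanDisbursement')

In [248]:
# solution1 - `.unique` without parentheses returns the bound method, so the output is the
# method's repr (the filtered CustomerId Series, 493 rows), not the unique IDs.
# Note: with filter1 and filter2 as defined above, `filter1 & filter2 & filter3` would raise a TypeError.
# The outputs of the following cells come from the Q3 masks (LoanDisbursement, Female, Business)
# re-created in cell In [179], so they do not list the customers with the max credit score.
df.loc[filter1 & filter2 & filter3].CustomerId.unique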

<bound method Series.unique of 97           1
100          1
129         22
171         25
388         97
         ...  
17881    10909
17891     2693
17959    42810
17963    43172
17999    47138
Name: CustomerId, Length: 493, dtype: int64>

In [250]:
#solution2 - gives numpy array as result with list of unique customerIDs
df.loc[filter1 & filter2 & filter3].CustomerId.unique()

array([    1,    22,    25,    97,   372,   222,   542,   783,   665,
         609,  1064,  1063,   559,  1293,  1309,  1521,  1525,  1775,
        1778,   421,  1304,  1838,  1158,  1865,  1371,  1042,  1196,
        2030,  2086,  1608,  1124,  2118,  2096,  2016,  2166,  2107,
        1943,  1369,  1360,  2307,  2142,  1295,  2329,  2336,  1297,
        2363,  2445,  2456,  2536,  2516,   526,  2262,  1955,  2665,
        2551,   902,  2387,  2303,  2225,  2507,  2566,  2838,  2924,
         934,  1126,  2887,  3051,  1303,  1732,   487,  1666,  3157,
        3307,  3287,  2936,  2539,  2694,  3302,  3249,  2085,  2446,
        2801,  3489,  2797,  2193,  2656,  2861,  1096,  1773,  2411,
        2955,  2155,  3612,  3206,   633,  3725,  2076,  2596,  2793,
         417,  1153,  4121,  2561,  3848,  2692,  4226,  2345,  4251,
        2669,  4315,  1450,  2181,  4414,  2650,  1628,  4386,  3070,
        4404,  4391,  2235,  4702,  4527,  4956,  4026,  2745,  2197,
        1465,  3745,

In [231]:
# solution3 - gives dataframe as result with CustomerId(but not unique)
df[filter1 & filter2 & filter3][['CustomerId']]

Unnamed: 0,CustomerId
97,1
100,1
129,22
171,25
388,97
...,...
17881,10909
17891,2693
17959,42810
17963,43172


In [228]:
#Solution4- gives dataframe with all details of customers(but not unique)
df.loc[filter1 & filter2 & filter3].head(2)

Unnamed: 0,Id_loans,CreatedDate_loans,CustomerId,LoanAmount,ProductId,LoanReason,Installment,Tenor,Rate,LoanStatus,...,LoanAccountId,RepaymentDate,RepaymentDeduction,Id,Gender,DateOfBirth,CreditScore,DisbursedMonth,days,age
97,6,2021-06-09 18:51:18.263,1,25000.0,1,,9459.86,6,360.0,Settled,...,6,2021-06-09 19:48:56.540,,1,Female,1997-06-20 00:00:00,0,June,00:57:38.277000,24.0
100,15,2021-06-11 09:34:33.620,1,25000.0,1,,9459.86,6,360.0,Settled,...,15,2021-06-11 09:43:30.263,,1,Female,1997-06-20 00:00:00,0,June,00:08:56.643000,24.0


In [251]:
df.loc[df['CreditScore'] == df['CreditScore'].max()]['Tenor'].mean()

4.774193548387097
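
A possible way to answer this question with proper boolean masks is sketched below on made-up rows. It lists the customers holding the maximum credit score and computes the average tenor of their disbursed loans, both per customer and overall. The `demo` frame and the variable names are illustrative, and the numbers are not results from the real data.

```python
import pandas as pd

# Made-up rows (illustration only)
demo = pd.DataFrame({
    "CustomerId": [1046, 1046, 2000, 3000],
    "CreditScore": [850, 850, 850, 600],
    "PaymentType": ["LoanDisbursement", "CardPayment", "LoanDisbursement", "LoanDisbursement"],
    "Tenor": [6, 6, 4, 3],
})

# Both filters are boolean Series, so they can be combined with &
has_max_score = demo["CreditScore"] == demo["CreditScore"].max()
is_disbursement = demo["PaymentType"] == "LoanDisbursement"
top = demo.loc[has_max_score & is_disbursement]

top_customers = top["CustomerId"].unique()                      # distinct customer IDs
tenor_per_customer = top.groupby("CustomerId")["Tenor"].mean()  # average tenor per customer
overall_tenor = top["Tenor"].mean()                             # single average over all of them

print(top_customers)
print(tenor_per_customer)
print(overall_tenor)
```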


In [203]:
df.head()

Unnamed: 0,Id_loans,CreatedDate_loans,CustomerId,LoanAmount,ProductId,LoanReason,Installment,Tenor,Rate,LoanStatus,...,LoanAccountId,RepaymentDate,RepaymentDeduction,Id,Gender,DateOfBirth,CreditScore,DisbursedMonth,days,age
0,5,2021-06-09 18:49:50.483,2,25000.0,1,,10774.98,4,312.0,Settled,...,5,2021-06-09 18:51:36.960,,2,Male,1983-03-25 00:00:00,0,June,00:01:46.477000,38.0
1,5,2021-06-09 18:49:50.483,2,25000.0,1,,10774.98,4,312.0,Settled,...,5,2021-06-09 19:19:53.857,TotalReceived,2,Male,1983-03-25 00:00:00,0,June,00:30:03.374000,38.0
2,5,2021-06-09 18:49:50.483,2,25000.0,1,,10774.98,4,312.0,Settled,...,5,2021-06-09 19:32:34.473,TotalReceived,2,Male,1983-03-25 00:00:00,0,June,00:42:43.990000,38.0
3,7,2021-06-09 18:59:22.983,2,45000.0,1,,15598.05,6,312.0,Settled,...,7,2021-06-09 19:03:07.937,,2,Male,1983-03-25 00:00:00,0,June,00:03:44.954000,38.0
4,7,2021-06-09 18:59:22.983,2,45000.0,1,,15598.05,6,312.0,Settled,...,7,2021-06-09 19:28:19.360,TotalReceived,2,Male,1983-03-25 00:00:00,0,June,00:28:56.377000,38.0


In [213]:
# Passing a boolean Series to the DataFrame returns a DataFrame
df[df["CreditScore"] == df["CreditScore"].max()].head(2)


Unnamed: 0,Id_loans,CreatedDate_loans,CustomerId,LoanAmount,ProductId,LoanReason,Installment,Tenor,Rate,LoanStatus,...,LoanAccountId,RepaymentDate,RepaymentDeduction,Id,Gender,DateOfBirth,CreditScore,DisbursedMonth,days,age
2160,739,2021-07-07 12:11:38.450,1046,70000.0,1,,18997.29,6,192.0,Due,...,739,2021-07-07 13:25:15.850,,1046,Male,1970-09-24 00:00:00,850,July,0 days 01:13:37.400000,51.0
2161,739,2021-07-07 12:11:38.450,1046,70000.0,1,,18997.29,6,192.0,Due,...,739,2021-08-06 20:54:15.780,TotalReceived,1046,Male,1970-09-24 00:00:00,850,August,30 days 08:42:37.330000,51.0
