# L1: Analyzing Customer Spending Habits

Created by Machine Learning Department, NUS Fintech Society

In this lab, you will perform basic data analysis on a credit card dataset to get a first look at customer spending habits. The dataset describes the usage behavior of about 9000 active credit card holders. The file is at the customer level, with 18 behavioral variables. Your goal is to understand the dataset by applying the processing techniques you learned in the week 3 training session.

This analysis is needed to develop a customer segmentation that can inform a marketing strategy. The meaning of each column in the dataset is as follows:

* `CUST_ID`: Identification of Credit Card holder (Categorical)
* `BALANCE`: Balance amount left in their account to make purchases
* `BALANCE_FREQUENCY`: How frequently the Balance is updated, score between 0 and 1 (1 = frequently updated, 0 = not frequently updated)
* `PURCHASES`: Amount of purchases made from account
* `ONEOFF_PURCHASES`: Maximum purchase amount done in one-go
* `INSTALLMENTS_PURCHASES`: Amount of purchases made in installments
* `CASH_ADVANCE`: Cash in advance given by the user
* `PURCHASES_FREQUENCY`: How frequently the Purchases are being made, score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased)
* `ONEOFF_PURCHASES_FREQUENCY`: How frequently Purchases are happening in one-go (1 = frequently purchased, 0 = not frequently purchased)
* `PURCHASES_INSTALLMENTS_FREQUENCY`: How frequently purchases in installments are being done (1 = frequently done, 0 = not frequently done)
* `CASH_ADVANCE_FREQUENCY`: How frequently the cash in advance is being paid
* `CASH_ADVANCE_TRX`: Number of transactions made with "Cash in Advance"
* `PURCHASES_TRX`: Number of purchase transactions made
* `CREDIT_LIMIT`: Limit of Credit Card for user
* `PAYMENTS`: Amount of Payment done by user
* `MINIMUM_PAYMENTS`: Minimum amount of payments made by user
* `PRC_FULL_PAYMENT`: Percent of full payment paid by user
* `TENURE`: Tenure of credit card service for user

In [None]:
# Download the dataset from the repository (please run this first without any modification)
!wget -N https://raw.githubusercontent.com/oadultradeepfield/nus-fintech-society-ml-training-ay24-25-sem1/main/week3/data/credit_card.csv

# Lab Tasks and Questions

## Loading the Data and Initial Exploration

### Instructions:
* Load the dataset into a Pandas DataFrame.
* Display the first few rows to get an initial look at the data.
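
The two steps above can be sketched on a small made-up table. The DataFrame `toy` and its columns `member_id` and `visits` are hypothetical and are not part of `credit_card.csv`; the same calls should work on the DataFrame you load in this lab.

```python
import pandas as pd

# Hypothetical toy data, NOT taken from credit_card.csv
toy = pd.DataFrame(
    {
        "member_id": ["M1", "M2", "M3", "M4", "M5", "M6"],
        "visits": [3, 7, 1, 9, 4, 6],
    }
)

# head() returns the first five rows by default; pass n to change that
first_rows = toy.head()
first_two = toy.head(2)

# Boolean indexing with .loc looks up a value for one specific row
m4_visits = toy.loc[toy["member_id"] == "M4", "visits"].iloc[0]

print(first_rows)
print(m4_visits)
```

If the row you need is not among the first few, filtering by an identifier as in the last step is usually more reliable than scrolling through the output.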

### Q1: What is the maximum amount of purchases done in one-go for the customer with an identification number of C10004? (Answer this in Google Form)

In [None]:
# These are the libraries you will use in this lab
import pandas as pd
import seaborn as sns

In [None]:
# Load the dataset from the downloaded file
df = pd.read_csv('credit_card.csv')

In [None]:
# Display the first five rows of the dataset
# your code goes here

## Understanding the Dataset Structure

### Instructions:

* Use `df.info()` to get an overview of the dataset's structure.
* Use `df.describe()` to generate summary statistics for the numerical columns.
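
As a rough illustration of what these two methods report, here is a sketch on a made-up table. The DataFrame `toy` and its columns are hypothetical and unrelated to the lab dataset.

```python
import pandas as pd

# Hypothetical toy data, NOT taken from credit_card.csv
toy = pd.DataFrame(
    {
        "member_id": ["M1", "M2", "M3", "M4"],
        "visits": [3, 7, 1, 9],
        "spend": [10.0, 25.5, 4.0, 40.5],
    }
)

# info() prints column names, non-null counts, and dtypes; it returns None
toy.info()

# shape gives (number of rows, number of columns) as a tuple
n_rows, n_cols = toy.shape

# describe() summarizes the numeric columns only by default
summary = toy.describe()

# Individual statistics can be read from the summary by row and column label
max_spend = summary.loc["max", "spend"]

print(summary)
print(n_rows, n_cols, max_spend)
```

Note that `member_id` does not appear in the `describe()` output because it is not numeric.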

### Q2: How many rows and columns does the dataset have?  (Answer this in Google Form)

In [None]:
# Display dataset information
# your code goes here

### Q3: What is the maximum amount of cash in advance ever given by a user?  (Answer this in Google Form)

In [None]:
# Generate summary statistics for numerical columns
# your code goes here

## Slicing and Sorting Data

### Instructions:

* Select only the columns `CUST_ID` and `PURCHASES_TRX`.
* Sort the data by `PURCHASES_TRX` to identify the highest numbers of purchases.
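
A minimal sketch of selecting columns and sorting, using a made-up table. The DataFrame `toy` and its columns are hypothetical; substitute the column names given in the instructions when working on the lab dataset.

```python
import pandas as pd

# Hypothetical toy data, NOT taken from credit_card.csv
toy = pd.DataFrame(
    {
        "member_id": ["M1", "M2", "M3", "M4"],
        "visits": [3, 7, 1, 9],
        "spend": [10.0, 25.5, 4.0, 40.5],
    }
)

# A list of column names inside the brackets selects several columns at once
subset = toy[["member_id", "visits"]]

# sort_values() sorts ascending by default; ascending=False puts the largest first
ranked = subset.sort_values("visits", ascending=False)

# After a descending sort, head(n) gives the n largest rows
top2 = ranked.head(2)
print(top2)
```

Sorting returns a new DataFrame and leaves `subset` unchanged, so assign the result to a variable if you want to reuse it.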

### Q4: What are the top 3 highest numbers of purchase transactions? For each, provide the customer's identification number and the number of purchase transactions. (Answer this in Google Form)

In [None]:
# Select the specified columns
# your code goes here

# Sort the data by 'PURCHASES_TRX' in descending order
# your code goes here

## Data Visualization with Seaborn

### Instructions:

* Create a histogram to visualize the distribution of the amount of payment done by users (`PAYMENTS`).
* Create a scatter plot to explore the relationship between the amount of payment done by users ($Y$, `PAYMENTS`) and their credit limits ($X$, `CREDIT_LIMIT`).

### Q5: Based on your observations from the scatter plot, what can you infer about the relationship between a customer's credit limit and their spending behavior? (Answer this in Google Form)

In [None]:
# Histogram of the amount of payment done by users
# your code goes here

In [None]:
# Scatter plot for amount of payment done by users vs. their credit limits
# your code goes here

<h2 style="text-align: center;"><hr>END OF THE LAB<hr></h2>