<a href="https://colab.research.google.com/github/kamalkant9928/recomeder-system/blob/main/Untitled48.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [None]:
df=pd.read_csv('/content/transactions.csv')
df.info()

In [None]:
df['Region']=df['Region'].ffill()  # forward-fill missing regions; fillna(method='ffill') is deprecated in recent pandas
df.head()
df.info()

Data Cleaning & Preparation

Clean Transaction Amount so all values are floats (remove $, $$, etc.).

Convert Transaction Date to a proper datetime column, handling mixed formats.

Create a new column Transaction Month and Transaction Weekday.

Fill missing values in Region with "Unknown".

Create a new feature Amount per Quantity = Transaction Amount / Quantity.

In [None]:
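# Strip currency symbols and thousands separators ($, $$, commas) before converting to float
df['Transaction Amount']=pd.to_numeric(df['Transaction Amount'].astype(str).str.replace(r'[$,]','',regex=True),errors='coerce')
# format='mixed' (pandas >= 2.0) parses each value separately; unparseable dates become NaT
df['Transaction Date']=pd.to_datetime(df['Transaction Date'],format='mixed',errors='coerce')
df['Month']=df['Transaction Date'].dt.month
df['Weekday']=df['Transaction Date'].dt.weekday  # 0 = Monday
df['Region']=df['Region'].fillna('Unknown')
df['Amount per Quantity']=df['Transaction Amount']/df['Quantity'].replace(0,np.nan)  # zero quantity becomes NaN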
df.head()
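
The cleaning steps listed above are easier to check on a handful of rows than on the full file. The sketch below uses made-up values (the `toy` frame is not from transactions.csv) and parses each date on its own, which also works on pandas versions that lack `format='mixed'`.

```python
import pandas as pd

# Made-up rows standing in for transactions.csv
toy = pd.DataFrame({
    "Transaction Amount": ["$120.50", "$$75", "1,200.00", "abc"],
    "Transaction Date": ["2024-01-15", "02/15/2024", "March 3, 2024", "not a date"],
    "Quantity": [2, 1, 4, 0],
})

# Drop everything except digits, the decimal point and a minus sign, then convert
cleaned = toy["Transaction Amount"].astype(str).str.replace(r"[^0-9.\-]", "", regex=True)
toy["Transaction Amount"] = pd.to_numeric(cleaned, errors="coerce")

# Parse each value on its own so one format does not dictate the rest
toy["Transaction Date"] = pd.to_datetime(
    toy["Transaction Date"].apply(lambda s: pd.to_datetime(s, errors="coerce"))
)
toy["Month"] = toy["Transaction Date"].dt.month
toy["Weekday"] = toy["Transaction Date"].dt.day_name()

# A zero quantity would divide by zero, so treat it as missing
toy["Amount per Quantity"] = toy["Transaction Amount"] / toy["Quantity"].replace(0, float("nan"))
print(toy)
```

Values that cannot be parsed end up as NaN or NaT rather than raising, so it is worth counting them with `toy.isna().sum()` before moving on.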

For each Product Category, calculate average amount per transaction and fraud percentage.

Create a pivot table showing total revenue by Region and Product Category.

In [None]:
x=df.groupby(['Region','Product Category'])['Transaction Amount'].sum()
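
The per-category averages and fraud percentages asked for above can be computed with one named aggregation. This is a minimal sketch on made-up rows; it assumes `Is Fraudulent` is coded 0/1, so its mean is the fraud rate.

```python
import pandas as pd

# Made-up rows; column names follow the notebook's dataset
toy = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "North", "South"],
    "Product Category": ["toys", "toys", "toys", "toys", "books", "books"],
    "Transaction Amount": [10.0, 30.0, 20.0, 20.0, 5.0, 15.0],
    "Is Fraudulent": [0, 1, 0, 0, 1, 0],
})

# Average amount per transaction and fraud percentage for each category
category_stats = toy.groupby("Product Category").agg(
    avg_amount=("Transaction Amount", "mean"),
    fraud_pct=("Is Fraudulent", "mean"),
)
category_stats["fraud_pct"] *= 100  # mean of a 0/1 flag, expressed in percent

# Total revenue by Region (rows) and Product Category (columns)
revenue = toy.pivot_table(
    index="Region", columns="Product Category",
    values="Transaction Amount", aggfunc="sum", fill_value=0,
)
print(category_stats)
print(revenue)
```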


In [None]:
pivot_table = df.pivot_table(index='Customer ID', columns='Product Category', values=['Transaction Amount','Transaction Hour','Quantity'], aggfunc=['sum','mean','count'])
display(pivot_table)

In [None]:
df.info()

In [None]:
df['Revenue']=df['Transaction Amount']*df['Quantity']

df.groupby('Weekday')['Revenue'].max().plot(color='red',kind='bar')


In [None]:
df.groupby('Month')['Revenue'].max().plot(color='green',linestyle='--')

In [None]:
df.groupby('Transaction Hour')['Revenue'].max().plot(color='blue',linestyle='--')
df.groupby('Transaction Hour')['Revenue'].max().plot(color='blue',linestyle='--',kind='bar')

In [None]:
df.groupby('Region')['Payment Method'].apply(lambda x: x.mode())

Identify customers whose average transaction amount increased month over month.

In [None]:
df=df.sort_values(by=['Customer ID','Transaction Date','Month'])
df.groupby(['Customer ID','Month'])['Transaction Amount'].mean()


In [None]:
# Calculate the average transaction amount per customer per month
monthly_avg_transaction = df.groupby(['Customer ID', 'Month'])['Transaction Amount'].mean().reset_index()

# Calculate the difference in average transaction amount from the previous month for each customer
monthly_avg_transaction['Avg_Transaction_Change'] = monthly_avg_transaction.groupby('Customer ID')['Transaction Amount'].diff()

# Identify customers where the average transaction amount increased in the next month
# We can filter for changes greater than 0
customers_with_increase = monthly_avg_transaction[monthly_avg_transaction['Avg_Transaction_Change'] > 0]['Customer ID'].unique()

print("Customers with month-over-month increase in average transaction amount:")
display(customers_with_increase)
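
The cell above flags a customer after any single increase, and it groups by month number, so December of one year sorts after January of the next. A stricter reading of the task is sketched below on made-up rows: group by year-month periods and keep only customers whose average rose in every observed month. The names `YearMonth` and `steady_growers` are illustrative.

```python
import pandas as pd

# Made-up rows; customer A spans a year boundary
toy = pd.DataFrame({
    "Customer ID": ["A", "A", "A", "B", "B", "B", "C"],
    "Transaction Date": pd.to_datetime([
        "2023-12-10", "2024-01-05", "2024-02-07",
        "2024-01-03", "2024-02-11", "2024-03-02",
        "2024-01-20",
    ]),
    "Transaction Amount": [10.0, 20.0, 30.0, 50.0, 40.0, 60.0, 99.0],
})

# Year-month periods sort chronologically across years
toy["YearMonth"] = toy["Transaction Date"].dt.to_period("M")
monthly = (
    toy.groupby(["Customer ID", "YearMonth"])["Transaction Amount"]
    .mean()
    .sort_index()
)

# Change versus the customer's previous observed month (first month is NaN)
change = monthly.groupby(level="Customer ID").diff().dropna()

# Keep customers whose smallest change is still positive
smallest_change = change.groupby(level="Customer ID").min()
steady_growers = smallest_change[smallest_change > 0].index.tolist()
print(steady_growers)
```

Customers with a single month of data drop out because they have no change to compare. Gaps between months are not treated specially here; the comparison is always with the previous month that has data.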

In [None]:
df.pivot_table(index='Region',columns=['Customer ID','Product Category'],values='Transaction Amount',aggfunc='sum')

In [None]:
plt.figure(figsize=(10,10))
sns.heatmap(df.select_dtypes(include=np.number).corr(), annot=True, fmt=".2f")
plt.show()

In [None]:
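def superReducedString(s):
    """Repeatedly delete adjacent equal characters from s; return "Empty String" if nothing remains."""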
    stack = []
    for char in s:
        if stack and stack[-1] == char:
            stack.pop()
        else:
            stack.append(char)

    result = "".join(stack)
    return result if result else "Empty String"

superReducedString.__doc__

In [None]:
# Row with the largest transaction amount (pd.merge() without arguments raises a TypeError, so it is dropped)
df.loc[df["Transaction Amount"].idxmax(), ['Customer ID','Transaction ID', 'Transaction Amount']]

In [None]:
a=df.groupby('Product Category')['Transaction Amount'].sum()

a.agg(['max','min','mean','sum']).T

In [None]:
max(a)

In [None]:
x=df.groupby('Customer ID')['Transaction Amount'].sum().sort_values(ascending=False)
y=df.groupby('Customer ID')['Quantity'].sum().sort_values(ascending=False)
display(pd.merge(x,y,on='Customer ID'))
x.nlargest(3).iloc[2]  # third-highest total spend; .iloc selects by position

In [None]:
df.info()

Find patterns in fraudulent transactions based on transaction hour, payment method, and account age.

Build a heatmap showing high-risk time periods for fraud.

In [None]:
# Select only numeric columns for correlation calculation
numeric_df = df[['Is Fraudulent','Transaction Hour','Account Age Days']]
correlation_matrix = numeric_df.corr()
sns.heatmap(correlation_matrix, annot=True, fmt=".5f")
plt.title('Correlation Heatmap of Fraudulent Transactions')
plt.show()
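
A correlation matrix measures linear association but does not show when fraud happens. One way to get at high-risk time periods is a weekday by hour table of fraud rates, sketched here on made-up rows. Passing the resulting `fraud_rate` table to `sns.heatmap(fraud_rate, annot=True, cmap='Reds')` would draw it; the plotting call is left out so the example runs without a display.

```python
import pandas as pd

# Made-up rows; 2024-01-01 and 2024-01-08 are Mondays, 2024-01-02 is a Tuesday
toy = pd.DataFrame({
    "Transaction Date": pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02", "2024-01-08"]
    ),
    "Transaction Hour": [2, 2, 14, 2, 2],
    "Is Fraudulent": [1, 0, 0, 1, 1],
})
toy["Weekday"] = toy["Transaction Date"].dt.day_name()

# Mean of the 0/1 flag = share of fraudulent transactions in each cell
fraud_rate = toy.pivot_table(
    index="Weekday", columns="Transaction Hour",
    values="Is Fraudulent", aggfunc="mean",
)
print(fraud_rate)
```

Cells with no transactions stay NaN, which a heatmap leaves blank. Cells with very few transactions can show extreme rates, so a matching table of counts (`aggfunc='count'`) is useful next to it.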

Group customers by average transaction amount, frequency, and product category.

Identify top 5 customer segments with the highest lifetime value.

In [None]:
df.info()

In [None]:
df.head()  # df[] is a syntax error; preview the frame instead

In [None]:
customer_agg = df.groupby('Customer ID').agg({'Transaction Amount': 'mean', 'Product Category': lambda x: x.unique().tolist()})
display(customer_agg.sort_values(by=['Transaction Amount']))
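
The aggregation above stops at per-customer averages. A possible next step toward the top 5 segments is sketched below on made-up rows. It rests on two assumptions: lifetime value is taken as total spend, and a segment is defined as a spend band (above or below the median average amount) combined with the customer's most frequent category.

```python
import pandas as pd

# Made-up rows; column names follow the notebook's dataset
toy = pd.DataFrame({
    "Customer ID": ["A", "A", "B", "C", "C", "C", "D"],
    "Product Category": ["toys", "toys", "books", "toys", "books", "toys", "books"],
    "Transaction Amount": [100.0, 300.0, 20.0, 50.0, 70.0, 60.0, 500.0],
})

# One row per customer: average amount, frequency, total spend, favourite category
customers = toy.groupby("Customer ID").agg(
    avg_amount=("Transaction Amount", "mean"),
    frequency=("Transaction Amount", "count"),
    lifetime_value=("Transaction Amount", "sum"),
    top_category=("Product Category", lambda s: s.mode().iloc[0]),
)

# Assumption: split customers at the median average amount
is_high = customers["avg_amount"] >= customers["avg_amount"].median()
customers["spend_band"] = is_high.map({True: "high", False: "low"})

# Rank segments by the total lifetime value of their customers
segments = (
    customers.groupby(["spend_band", "top_category"])
    .agg(n_customers=("lifetime_value", "count"), total_value=("lifetime_value", "sum"))
    .sort_values("total_value", ascending=False)
    .head(5)
)
print(segments)
```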

Detect unusual transactions where amount is much higher than the customer’s average.

Mark them as “Potential Fraud” if they occur during odd hours.

In [None]:
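# Compare each amount with that customer's own average (Transaction Amount is already numeric here)
customer_avg=df.groupby('Customer ID')['Transaction Amount'].transform('mean')
# Assumptions: "much higher" means more than 2x the customer's average; "odd hours" means midnight to 5 a.m.
x=df[df['Transaction Amount']>2*customer_avg].copy()
x['fraud']=x['Transaction Hour'].apply(lambda h:'Potential Fraud' if h<6 else 'Not Potential Fraud')
x.head()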

Calculate the total revenue per product category and suggest the top 3 categories to focus on for marketing campaigns.

In [None]:
df['revenue']=df['Quantity']*df['Transaction Amount']
df.groupby('Product Category')['revenue'].sum().nlargest(3)

Check if fraudulent transactions are concentrated in certain locations or on specific devices.

RFM (Recency, Frequency, Monetary) Analysis

Build RFM scores for each customer and classify them into Champions, Loyal, At Risk, and Lost.

Predictive Feature Engineering

Create new features like Average_Transaction_Per_Day, High_Risk_Hours, Age_Category for modeling fraud.

In [None]:
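# Rough row-level proxy: sums amount, quantity and account age; this is not a per-customer recency/frequency/monetary score
df['RFM scores']=df['Transaction Amount']+df['Quantity']+df['Account Age Days']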
df['class']=df['RFM scores'].apply(lambda x:'champions' if x>5000 else ('loyal' if x>2000 else ('At Risk' if x>1000 else 'Lost')))
display(df.head())
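
RFM is usually computed per customer, not per row: days since the last purchase, number of purchases, and total spend. The sketch below uses made-up rows and scores each dimension in quartiles. The helper `quartile_score` and the cut-offs that map total scores to Champions, Loyal, At Risk and Lost are assumptions chosen for illustration, and the quartile step needs at least four customers.

```python
import pandas as pd

# Made-up rows; column names follow the notebook's dataset
toy = pd.DataFrame({
    "Customer ID": ["A", "A", "A", "B", "B", "C", "D"],
    "Transaction Date": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-03-28",
        "2024-01-15", "2024-03-01", "2023-11-02", "2024-02-20",
    ]),
    "Transaction Amount": [100.0, 150.0, 250.0, 80.0, 120.0, 40.0, 300.0],
})

# Measure recency from the day after the last transaction in the data
snapshot = toy["Transaction Date"].max() + pd.Timedelta(days=1)
rfm = toy.groupby("Customer ID").agg(
    last_purchase=("Transaction Date", "max"),
    frequency=("Transaction Date", "count"),
    monetary=("Transaction Amount", "sum"),
)
rfm["recency_days"] = (snapshot - rfm["last_purchase"]).dt.days

def quartile_score(series, ascending=True):
    """Score 1 (worst) to 4 (best) by quartile of rank; ties are broken by order."""
    ranks = series.rank(method="first", ascending=ascending)
    return pd.qcut(ranks, 4, labels=[1, 2, 3, 4]).astype(int)

rfm["R"] = quartile_score(rfm["recency_days"], ascending=False)  # fewer days is better
rfm["F"] = quartile_score(rfm["frequency"])
rfm["M"] = quartile_score(rfm["monetary"])
rfm["rfm_total"] = rfm["R"] + rfm["F"] + rfm["M"]

# Assumed cut-offs on the 3..12 total score
rfm["segment"] = pd.cut(
    rfm["rfm_total"], bins=[2, 4, 6, 9, 12],
    labels=["Lost", "At Risk", "Loyal", "Champions"],
)
print(rfm[["recency_days", "frequency", "monetary", "rfm_total", "segment"]])
```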

In [None]:
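# Share of fraudulent transactions by location and by device
fraud=df[df['Is Fraudulent']==1]
display(fraud['Customer Location'].value_counts(normalize=True).head(10), fraud['Device Used'].value_counts(normalize=True))
# Treat zero-day-old accounts as missing to avoid dividing by zero
df['Avg_Transaction_Per_Day']=df['Transaction Amount']/df['Account Age Days'].replace(0,np.nan)
# Assumption: midnight to 5 a.m. counts as high-risk; adjust after inspecting fraud rate by hour
df['High_Risk_Hours']=df['Transaction Hour'].apply(lambda h:'High Risk' if h<6 else 'Not High Risk')
df['Age_Category']=df['Account Age Days'].apply(lambda x:'Young' if x<100 else 'Old')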


Find out which product category + payment method combinations have the highest fraud rates.

Visualize as a heatmap.

In [None]:
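a=df.groupby(['Product Category','Payment Method'])['Is Fraudulent'].mean().unstack()  # rows: category, columns: payment method
sns.heatmap(a, annot=True, fmt=".3f", cmap='Reds')
plt.title('Fraud Rate by Product Category and Payment Method')
plt.show()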


2. Customer Lifetime Value (CLV)

Calculate total revenue per customer.

Rank top 10 customers and visualize in a bar chart.

In [None]:
df.groupby('Customer ID')['revenue'].sum().nlargest(10).plot(kind='bar')

3. Fraud Trend Over Time

Convert Transaction Date to datetime.

Plot daily fraud counts to identify spikes and patterns.

In [None]:
df['Transaction Date']=pd.to_datetime(df['Transaction Date'])
df=df.sort_values(by='Transaction Date')
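# Sum the 0/1 flag per calendar day; count() would count every transaction, fraudulent or not
df.groupby(df['Transaction Date'].dt.date)['Is Fraudulent'].sum().plot()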
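
Spikes are easier to list than to spot by eye. The sketch below builds made-up transactions, counts fraud per day with `resample`, and flags days above the mean plus two standard deviations. That threshold is an assumption, not something the task prescribes.

```python
import pandas as pd

# Made-up data: six transactions per day for ten days, with one bad day
dates = pd.date_range("2024-01-01", periods=10, freq="D")
fraud_per_day = [0, 1, 0, 1, 0, 6, 0, 1, 0, 1]
rows = [
    {"Transaction Date": day + pd.Timedelta(hours=h), "Is Fraudulent": int(h < k)}
    for day, k in zip(dates, fraud_per_day)
    for h in range(6)
]
toy = pd.DataFrame(rows)

# Daily fraud counts; days with no transactions would show up as 0
daily = toy.set_index("Transaction Date")["Is Fraudulent"].resample("D").sum()

# Assumed rule: a spike is a day above mean + 2 standard deviations
threshold = daily.mean() + 2 * daily.std()
spikes = daily[daily > threshold]
print(spikes)
```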

4. High-Risk Device + Hour Combination

Analyze fraud probability by Device Used and Transaction Hour.

Suggest risk levels for each device-hour combination.

In [None]:
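device_hour=df.groupby(['Device Used','Transaction Hour'])['Is Fraudulent'].mean().rename('fraud_rate').reset_index()
# Assumption: risk levels are terciles of the observed fraud rate per device-hour pair
device_hour['risk_levels']=pd.qcut(device_hour['fraud_rate'].rank(method='first'),3,labels=['Low Risk','Medium Risk','High Risk'])
device_hour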

5. Regional Revenue Loss

Compare revenue lost due to fraudulent transactions vs. genuine revenue in each region.

In [None]:
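x=df[df['Is Fraudulent']==0].groupby('Region')['revenue'].sum().rename('genuine_revenue')
y=df[df['Is Fraudulent']==1].groupby('Region')['revenue'].sum().rename('fraud_revenue')
pd.concat([x,y],axis=1).fillna(0)  # keeps regions that have no fraudulent revenue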

Build a correlation matrix using:

Transaction Amount Clean

Transaction Hour

Account Age Days

Avg_Transaction_Per_Day

Identify strongest predictors of fraud.

In [None]:
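corr_cols=['Transaction Amount','Transaction Hour','Account Age Days','Avg_Transaction_Per_Day','Is Fraudulent']
corr=df[corr_cols].replace([np.inf,-np.inf],np.nan).corr()  # infinities from zero-day accounts would distort the result
corr['Is Fraudulent'].drop('Is Fraudulent').sort_values(key=abs,ascending=False)  # strongest linear predictors first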

7. Customer Loyalty Index

Combine:

Frequency (number of transactions)

Monetary (total amount)

Recency (last transaction date)

Assign a loyalty score and segment customers.

In [None]:
df.groupby('Customer ID').agg(Monetary=('Transaction Amount','sum'),Frequency=('Transaction ID','count'),Recency=('Transaction Date','max'),f1=('Account Age Days','mean'))
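
The aggregation above produces the three ingredients but no score. One simple way to combine them is a weighted sum of min-max scaled values, sketched below on a made-up per-customer table. The weights (0.4, 0.4, 0.2), the three equal-width bands and the helper `min_max` are all assumptions for illustration.

```python
import pandas as pd

# Made-up per-customer table; recency is expressed as days since the last transaction
customers = pd.DataFrame(
    {
        "frequency": [12, 3, 1],
        "monetary": [900.0, 300.0, 50.0],
        "recency_days": [5, 40, 200],
    },
    index=pd.Index(["A", "B", "C"], name="Customer ID"),
)

def min_max(s):
    """Scale a column to the 0..1 range; a constant column maps to 0."""
    span = s.max() - s.min()
    return (s - s.min()) / span if span else s * 0.0

# Assumed weights; recency is inverted because fewer days is better
customers["loyalty_score"] = (
    0.4 * min_max(customers["frequency"])
    + 0.4 * min_max(customers["monetary"])
    + 0.2 * (1 - min_max(customers["recency_days"]))
).clip(0, 1)

# Assumed segmentation: three equal-width bands of the score
customers["segment"] = pd.cut(
    customers["loyalty_score"], bins=[0, 1 / 3, 2 / 3, 1],
    labels=["Low", "Medium", "High"], include_lowest=True,
)
print(customers)
```

Min-max scaling is sensitive to outliers: one very large spender compresses everyone else toward zero, so rank-based scores may be a better fit for skewed data.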