<a href="https://colab.research.google.com/github/Dikshasingh2004/fraud_detection/blob/main/fraud_detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### **Fraud detection in financial transactions.**
##### In this project, we will use exact inference to predict whether a transaction is fraudulent based on certain observed features.



### **Step 1: Install Dependencies**

In [4]:
%pip install pgmpy




### **Step 2: Implement the Bayesian Network**

We build a small Bayesian Network that predicts whether a financial transaction is fraudulent from three binary features: whether the amount is unusually high, whether the transaction comes from a new location, and whether it occurs at an unusual time.

We will use four variables in this network:

F (Fraud): Whether the transaction is fraudulent.

A (Amount): Whether the transaction amount is unusually high.

L (Location): Whether the transaction occurs from a new location.

T (Time): Whether the transaction happens at an unusual time (e.g., late at night).

In [5]:
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

#### **from pgmpy.models import BayesianNetwork**


BayesianNetwork: This class is used to represent the structure of a Bayesian Network, which is a probabilistic graphical model. A Bayesian Network consists of nodes (random variables) and directed edges (dependencies between variables). Each node has an associated probability distribution (either a prior or a conditional probability distribution).

#### **from pgmpy.factors.discrete import TabularCPD**

TabularCPD: Short for tabular conditional probability distribution. This class defines the probability table of a single variable in the Bayesian Network: a prior distribution for a node with no parents, or a conditional distribution given its parents for every other node. Each column of the table corresponds to one combination of parent states and must sum to 1.

#### **from pgmpy.inference import VariableElimination**

VariableElimination: An exact inference algorithm for answering probabilistic queries on the Bayesian Network. It sums out the variables that are neither queried nor observed, one at a time, instead of building the full joint distribution. Because the inference is exact, the resulting probabilities carry no sampling or approximation error, although the cost can still grow quickly on large, densely connected networks.
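As a rough sketch in general notation (the symbols $Q$, $E$, and $H$ are introduced here for illustration and are not part of pgmpy): for query variables $Q$, observed evidence $E = e$, and the remaining hidden variables $H$, exact inference computes

```latex
P(Q \mid E = e) \;=\; \frac{\sum_{h} P(Q,\, H = h,\, E = e)}{\sum_{q} \sum_{h} P(Q = q,\, H = h,\, E = e)}
```

Variable elimination evaluates these sums one hidden variable at a time, moving each sum inside the product of the factors that mention that variable, so the full joint table never has to be materialized.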

#### **Define the structure of the Bayesian Network**

In [7]:
model = BayesianNetwork([('A', 'F'), ('L', 'F'), ('T', 'F')])
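The three edges make A, L, and T parentless nodes with F as their common child. Under that structure, the joint distribution factorizes as follows (standard Bayesian network chain rule, written out here for this particular graph):

```latex
P(A, L, T, F) \;=\; P(A)\, P(L)\, P(T)\, P(F \mid A, L, T)
```

This is why the model needs exactly four CPDs: three priors and one conditional table for F.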

#### **Define the CPDs (Conditional Probability Distributions)**

In [8]:
# Unusual Amount (A)
cpd_A = TabularCPD(variable='A', variable_card=2, values=[[0.95], [0.05]])

In [17]:
print(cpd_A)

+------+------+
| A(0) | 0.95 |
+------+------+
| A(1) | 0.05 |
+------+------+


In [9]:
# New Location (L)
cpd_L = TabularCPD(variable='L', variable_card=2, values=[[0.9], [0.1]])


In [18]:
print(cpd_L)

+------+-----+
| L(0) | 0.9 |
+------+-----+
| L(1) | 0.1 |
+------+-----+


In [10]:
# Unusual Time (T)
cpd_T = TabularCPD(variable='T', variable_card=2, values=[[0.8], [0.2]])

In [19]:
print(cpd_T)

+------+-----+
| T(0) | 0.8 |
+------+-----+
| T(1) | 0.2 |
+------+-----+


In [11]:
# Fraud (F) depends on Amount (A), Location (L), and Time (T).
# Rows are F=0 and F=1; columns follow the evidence order, with T varying fastest.
cpd_F = TabularCPD(variable='F', variable_card=2,
                   values=[[0.9, 0.7, 0.7, 0.5, 0.6, 0.4, 0.4, 0.1],
                           [0.1, 0.3, 0.3, 0.5, 0.4, 0.6, 0.6, 0.9]],
                   evidence=['A', 'L', 'T'], evidence_card=[2, 2, 2])

In [20]:
print(cpd_F)

+------+------+------+------+------+------+------+------+------+
| A    | A(0) | A(0) | A(0) | A(0) | A(1) | A(1) | A(1) | A(1) |
+------+------+------+------+------+------+------+------+------+
| L    | L(0) | L(0) | L(1) | L(1) | L(0) | L(0) | L(1) | L(1) |
+------+------+------+------+------+------+------+------+------+
| T    | T(0) | T(1) | T(0) | T(1) | T(0) | T(1) | T(0) | T(1) |
+------+------+------+------+------+------+------+------+------+
| F(0) | 0.9  | 0.7  | 0.7  | 0.5  | 0.6  | 0.4  | 0.4  | 0.1  |
+------+------+------+------+------+------+------+------+------+
| F(1) | 0.1  | 0.3  | 0.3  | 0.5  | 0.4  | 0.6  | 0.6  | 0.9  |
+------+------+------+------+------+------+------+------+------+
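The printed table shows how the eight columns of `values` line up with the parent states. A minimal plain-Python sketch of that mapping, without pgmpy, is given below; `F_VALUES` and `column_index` are illustrative names introduced here, not pgmpy API.

```python
from itertools import product

# Values copied from cpd_F above: row 0 is P(F=0 | ...), row 1 is P(F=1 | ...).
F_VALUES = [
    [0.9, 0.7, 0.7, 0.5, 0.6, 0.4, 0.4, 0.1],
    [0.1, 0.3, 0.3, 0.5, 0.4, 0.6, 0.6, 0.9],
]


def column_index(a, l, t):
    """Column of the F table for parent states (A=a, L=l, T=t); T varies fastest."""
    return a * 4 + l * 2 + t


# Enumerating parent states in evidence order reproduces the column order.
for col, (a, l, t) in enumerate(product([0, 1], repeat=3)):
    assert col == column_index(a, l, t)
    print(f"A={a} L={l} T={t} -> P(F=1) = {F_VALUES[1][col]}")
```

Getting this ordering wrong silently assigns probabilities to the wrong parent combinations, so it is worth comparing the printed CPD against the intended table.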


#### **Add the CPDs to the model**

In [12]:
model.add_cpds(cpd_A, cpd_L, cpd_T, cpd_F)

#### **Validate the model**

In [13]:
assert model.check_model(), "The model is invalid!"
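One of the conditions `check_model()` verifies is that every CPD column is a valid distribution, that is, it sums to 1. The sketch below reproduces only that one condition in plain Python so it can be seen in isolation; `columns_sum_to_one` is a hypothetical helper, not a pgmpy function.

```python
import math


def columns_sum_to_one(values, tol=1e-9):
    """Return True if every column of a row-major CPD table sums to 1."""
    return all(math.isclose(sum(col), 1.0, abs_tol=tol) for col in zip(*values))


# The fraud CPD from the notebook: rows are F=0 and F=1.
cpd_F_values = [
    [0.9, 0.7, 0.7, 0.5, 0.6, 0.4, 0.4, 0.1],
    [0.1, 0.3, 0.3, 0.5, 0.4, 0.6, 0.6, 0.9],
]
print(columns_sum_to_one(cpd_F_values))

# A deliberately broken table: the first column sums to 1.1.
broken_values = [[0.9, 0.7], [0.2, 0.3]]
print(columns_sum_to_one(broken_values))
```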


#### **Perform Variable Elimination for Exact Inference**

In [14]:
infer = VariableElimination(model)

#### Query 1: What is the probability of fraud if the transaction has a high amount, is from a new location, and occurs at an unusual time?

In [15]:
result = infer.query(variables=['F'], evidence={'A': 1, 'L': 1, 'T': 1})
print("Probability of fraud given high amount, new location, and unusual time:\n", result)

Probability of fraud given high amount, new location, and unusual time:
 +------+----------+
| F    |   phi(F) |
+======+==========+
| F(0) |   0.1000 |
+------+----------+
| F(1) |   0.9000 |
+------+----------+


#### Query 2: What is the probability of fraud if the transaction is not from a new location, but has a high amount and occurs at an unusual time?

In [16]:
result = infer.query(variables=['F'], evidence={'A': 1, 'L': 0, 'T': 1})
print("Probability of fraud given high amount, no new location, and unusual time:\n", result)

Probability of fraud given high amount, no new location, and unusual time:
 +------+----------+
| F    |   phi(F) |
+======+==========+
| F(0) |   0.4000 |
+------+----------+
| F(1) |   0.6000 |
+------+----------+
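Both queries above observe all three parents, so the answers are simply columns of the F table. Inference becomes more interesting when some parents are unobserved and must be summed out. The sketch below does this by brute-force enumeration of the joint distribution in plain Python, which is only feasible because the network is tiny; it is not how pgmpy implements `VariableElimination`, and `joint` and `posterior_fraud` are illustrative names introduced here.

```python
from itertools import product

# Priors and the fraud CPD, copied from the cells above.
P_A = [0.95, 0.05]
P_L = [0.9, 0.1]
P_T = [0.8, 0.2]
P_F1 = [0.1, 0.3, 0.3, 0.5, 0.4, 0.6, 0.6, 0.9]  # P(F=1 | A, L, T), T fastest


def joint(a, l, t, f):
    """P(A=a, L=l, T=t, F=f) as the product of the four CPD entries."""
    p_f1 = P_F1[a * 4 + l * 2 + t]
    return P_A[a] * P_L[l] * P_T[t] * (p_f1 if f == 1 else 1.0 - p_f1)


def posterior_fraud(evidence):
    """P(F=1 | evidence), summing the joint over every consistent assignment."""
    totals = [0.0, 0.0]  # unnormalized mass for F=0 and F=1
    for a, l, t, f in product([0, 1], repeat=4):
        state = {"A": a, "L": l, "T": t}
        if all(state[name] == value for name, value in evidence.items()):
            totals[f] += joint(a, l, t, f)
    return totals[1] / sum(totals)


print(round(posterior_fraud({"A": 1}), 4))                  # 0.462
print(round(posterior_fraud({}), 4))                        # 0.1751
print(round(posterior_fraud({"A": 1, "L": 0, "T": 1}), 4))  # 0.6
```

The last line matches Query 2. With the notebook's `infer` object, the first case should correspond to `infer.query(variables=['F'], evidence={'A': 1})`.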
