{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dataset Overview - Fraud Detection Analysis\n",
    "\n",
    "**Objective**: Comprehensive exploration of the fraud detection dataset\n",
    "**Author**: Data Science Team\n",
    "**Date**: 2025-10-17\n",
    "**Version**: 1.0\n",
    "\n",
    "## Table of Contents\n",
    "1. [Environment Setup](#1-environment-setup)\n",
    "2. [Data Loading](#2-data-loading)\n",
    "3. [Basic Statistics](#3-basic-statistics)\n",
    "4. [Data Quality Assessment](#4-data-quality-assessment)\n",
    "5. [Fraud Distribution Analysis](#5-fraud-distribution-analysis)\n",
    "6. [Initial Insights](#6-initial-insights)\n",
    "7. [Next Steps](#7-next-steps)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Environment Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import required libraries\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import plotly.express as px\n",
    "import plotly.graph_objects as go\n",
    "from plotly.subplots import make_subplots\n",
    "\n",
    "# Statistical libraries\n",
    "from scipy import stats\n",
    "from scipy.stats import chi2_contingency, normaltest\n",
    "\n",
    "# Data quality libraries\n",
    "import pandas_profiling as pp\n",
    "import missingno as msno\n",
    "\n",
    "# Utility libraries\n",
    "import warnings\n",
    "import sys\n",
    "import os\n",
    "from datetime import datetime, timedelta\n",
    "from typing import Tuple, List, Dict, Any\n",
    "\n",
    "# Custom utilities\n",
    "sys.path.append('../utils')\n",
    "from data_utils import load_fraud_data, clean_data\n",
    "from viz_utils import plot_distribution, plot_correlation_matrix\n",
    "\n",
    "# Configuration\n",
    "warnings.filterwarnings('ignore')\n",
    "plt.style.use('seaborn-v0_8')\n",
    "sns.set_palette('husl')\n",
    "\n",
    "# Set random seed for reproducibility\n",
    "np.random.seed(42)\n",
    "\n",
    "print(f\"üìä Environment setup complete!\")\n",
    "print(f\"Python version: {sys.version}\")\n",
    "print(f\"Pandas version: {pd.__version__}\")\n",
    "print(f\"NumPy version: {np.__version__}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Data Loading"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define data paths\n",
    "DATA_PATH = '../data/'\n",
    "TRANSACTION_DATA = f'{DATA_PATH}/transactions.csv'\n",
    "USER_DATA = f'{DATA_PATH}/users.csv'\n",
    "MERCHANT_DATA = f'{DATA_PATH}/merchants.csv'\n",
    "\n",
    "# Load datasets\n",
    "print(\"üîÑ Loading datasets...\")\n",
    "\n",
    "try:\n",
    "    # Load main transaction dataset\n",
    "    df_transactions = pd.read_csv(TRANSACTION_DATA, parse_dates=['timestamp'])\n",
    "    print(f\"‚úÖ Transactions loaded: {df_transactions.shape}\")\n",
    "    \n",
    "    # Load user data\n",
    "    df_users = pd.read_csv(USER_DATA)\n",
    "    print(f\"‚úÖ Users loaded: {df_users.shape}\")\n",
    "    \n",
    "    # Load merchant data\n",
    "    df_merchants = pd.read_csv(MERCHANT_DATA)\n",
    "    print(f\"‚úÖ Merchants loaded: {df_merchants.shape}\")\n",
    "    \n",
    "except FileNotFoundError as e:\n",
    "    print(f\"‚ùå File not found: {e}\")\n",
    "    print(\"üìù Creating synthetic data for demonstration...\")\n",
    "    \n",
    "    # Generate synthetic data for demonstration\n",
    "    from utils.data_utils import generate_synthetic_fraud_data\n",
    "    \n",
    "    df_transactions, df_users, df_merchants = generate_synthetic_fraud_data(\n",
    "        n_transactions=100000,\n",
    "        n_users=10000,\n",
    "        n_merchants=1000,\n",
    "        fraud_rate=0.02\n",
    "    )\n",
    "    \n",
    "    print(f\"‚úÖ Synthetic data generated\")\n",
    "    print(f\"   - Transactions: {df_transactions.shape}\")\n",
    "    print(f\"   - Users: {df_users.shape}\")\n",
    "    print(f\"   - Merchants: {df_merchants.shape}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display basic information about the datasets\n",
    "print(\"üìã Dataset Overview:\")\n",
    "print(\"=\"*50)\n",
    "\n",
    "print(\"\\nüè¶ TRANSACTIONS DATASET:\")\n",
    "print(f\"Shape: {df_transactions.shape}\")\n",
    "print(f\"Columns: {list(df_transactions.columns)}\")\n",
    "print(f\"Date range: {df_transactions['timestamp'].min()} to {df_transactions['timestamp'].max()}\")\n",
    "print(f\"Memory usage: {df_transactions.memory_usage(deep=True).sum() / 1024**2:.2f} MB\")\n",
    "\n",
    "print(\"\\nüë• USERS DATASET:\")\n",
    "print(f\"Shape: {df_users.shape}\")\n",
    "print(f\"Columns: {list(df_users.columns)}\")\n",
    "\n",
    "print(\"\\nüè™ MERCHANTS DATASET:\")\n",
    "print(f\"Shape: {df_merchants.shape}\")\n",
    "print(f\"Columns: {list(df_merchants.columns)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Basic Statistics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display first few rows\n",
    "print(\"üîç First 5 transactions:\")\n",
    "display(df_transactions.head())\n",
    "\n",
    "print(\"\\nüìä Data Types:\")\n",
    "print(df_transactions.dtypes)\n",
    "\n",
    "print(\"\\nüìà Descriptive Statistics:\")\n",
    "display(df_transactions.describe(include='all'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fraud distribution analysis\n",
    "fraud_counts = df_transactions['is_fraud'].value_counts()\n",
    "fraud_rate = df_transactions['is_fraud'].mean()\n",
    "\n",
    "print(f\"üö® FRAUD ANALYSIS:\")\n",
    "print(f\"Total transactions: {len(df_transactions):,}\")\n",
    "print(f\"Fraudulent transactions: {fraud_counts[1]:,}\")\n",
    "print(f\"Legitimate transactions: {fraud_counts[0]:,}\")\n",
    "print(f\"Fraud rate: {fraud_rate:.2%}\")\n",
    "print(f\"Class imbalance ratio: {fraud_counts[0] / fraud_counts[1]:.1f}:1\")\n",
    "\n",
    "# Visualize fraud distribution\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))\n",
    "\n",
    "# Pie chart\n",
    "fraud_counts.plot(kind='pie', ax=ax1, autopct='%1.2f%%', \n",
    "                  labels=['Legitimate', 'Fraudulent'],\n",
    "                  colors=['lightgreen', 'salmon'])\n",
    "ax1.set_title('Transaction Distribution', fontsize=14, fontweight='bold')\n",
    "ax1.set_ylabel('')\n",
    "\n",
    "# Bar chart with counts\n",
    "fraud_counts.plot(kind='bar', ax=ax2, color=['lightgreen', 'salmon'])\n",
    "ax2.set_title('Transaction Counts', fontsize=14, fontweight='bold')\n",
    "ax2.set_xlabel('Transaction Type')\n",
    "ax2.set_ylabel('Count')\n",
    "ax2.set_xticklabels(['Legitimate', 'Fraudulent'], rotation=0)\n",
    "\n",
    "# Add count annotations\n",
    "for i, v in enumerate(fraud_counts.values):\n",
    "    ax2.text(i, v + len(df_transactions)*0.01, f'{v:,}', \n",
    "             ha='center', va='bottom', fontweight='bold')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Data Quality Assessment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Missing values analysis\n",
    "print(\"üîç MISSING VALUES ANALYSIS:\")\n",
    "print(\"=\"*40)\n",
    "\n",
    "missing_values = df_transactions.isnull().sum()\n",
    "missing_percentage = (missing_values / len(df_transactions)) * 100\n",
    "\n",
    "missing_df = pd.DataFrame({\n",
    "    'Column': missing_values.index,\n",
    "    'Missing_Count': missing_values.values,\n",
    "    'Missing_Percentage': missing_percentage.values\n",
    "})\n",
    "\n",
    "missing_df = missing_df[missing_df['Missing_Count'] > 0].sort_values('Missing_Count', ascending=False)\n",
    "\n",
    "if len(missing_df) > 0:\n",
    "    display(missing_df)\n",
    "    \n",
    "    # Visualize missing values\n",
    "    plt.figure(figsize=(12, 6))\n",
    "    msno.bar(df_transactions)\n",
    "    plt.title('Missing Values by Column', fontsize=14, fontweight='bold')\n",
    "    plt.show()\n",
    "else:\n",
    "    print(\"‚úÖ No missing values found in the dataset!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Data quality checks\n",
    "print(\"üîç DATA QUALITY CHECKS:\")\n",
    "print(\"=\"*30)\n",
    "\n",
    "# Check for duplicates\n",
    "duplicate_count = df_transactions.duplicated().sum()\n",
    "print(f\"Duplicate transactions: {duplicate_count}\")\n",
    "\n",
    "# Check for negative amounts\n",
    "negative_amounts = (df_transactions['amount'] < 0).sum()\n",
    "print(f\"Negative amounts: {negative_amounts}\")\n",
    "\n",
    "# Check for future dates\n",
    "future_dates = (df_transactions['timestamp'] > datetime.now()).sum()\n",
    "print(f\"Future timestamps: {future_dates}\")\n",
    "\n",
    "# Check data ranges\n",
    "print(f\"\\nüìä DATA RANGES:\")\n",
    "print(f\"Amount range: ${df_transactions['amount'].min():.2f} - ${df_transactions['amount'].max():,.2f}\")\n",
    "print(f\"Unique users: {df_transactions['user_id'].nunique():,}\")\n",
    "print(f\"Unique merchants: {df_transactions['merchant_id'].nunique():,}\")\n",
    "print(f\"Unique payment methods: {df_transactions['payment_method'].nunique()}\")\n",
    "print(f\"Payment methods: {df_transactions['payment_method'].unique()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Fraud Distribution Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fraud by amount analysis\n",
    "print(\"üí∞ FRAUD BY AMOUNT ANALYSIS:\")\n",
    "print(\"=\"*35)\n",
    "\n",
    "# Amount statistics by fraud status\n",
    "amount_stats = df_transactions.groupby('is_fraud')['amount'].agg([\n",
    "    'count', 'mean', 'median', 'std', 'min', 'max'\n",
    "]).round(2)\n",
    "\n",
    "amount_stats.index = ['Legitimate', 'Fraudulent']\n",
    "display(amount_stats)\n",
    "\n",
    "# Visualize amount distributions\n",
    "fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))\n",
    "\n",
    "# Box plot\n",
    "df_transactions.boxplot(column='amount', by='is_fraud', ax=ax1)\n",
    "ax1.set_title('Amount Distribution by Fraud Status')\n",
    "ax1.set_xlabel('Fraud Status (0=Legitimate, 1=Fraudulent)')\n",
    "ax1.set_ylabel('Amount ($)')\n",
    "\n",
    "# Histogram\n",
    "df_transactions[df_transactions['is_fraud'] == 0]['amount'].hist(\n",
    "    bins=50, alpha=0.7, label='Legitimate', ax=ax2, density=True)\n",
    "df_transactions[df_transactions['is_fraud'] == 1]['amount'].hist(\n",
    "    bins=50, alpha=0.7, label='Fraudulent', ax=ax2, density=True)\n",
    "ax2.set_title('Amount Distribution Histogram')\n",
    "ax2.set_xlabel('Amount ($)')\n",
    "ax2.set_ylabel('Density')\n",
    "ax2.legend()\n",
    "\n",
    "# Log scale histogram\n",
    "legitimate_amounts = df_transactions[df_transactions['is_fraud'] == 0]['amount']\n",
    "fraudulent_amounts = df_transactions[df_transactions['is_fraud'] == 1]['amount']\n",
    "\n",
    "ax3.hist(legitimate_amounts, bins=50, alpha=0.7, label='Legitimate', density=True)\n",
    "ax3.hist(fraudulent_amounts, bins=50, alpha=0.7, label='Fraudulent', density=True)\n",
    "ax3.set_yscale('log')\n",
    "ax3.set_title('Amount Distribution (Log Scale)')\n",
    "ax3.set_xlabel('Amount ($)')\n",
    "ax3.set_ylabel('Log Density')\n",
    "ax3.legend()\n",
    "\n",
    "# Cumulative distribution\n",
    "legitimate_sorted = np.sort(legitimate_amounts)\n",
    "fraudulent_sorted = np.sort(fraudulent_amounts)\n",
    "\n",
    "ax4.plot(legitimate_sorted, np.linspace(0, 1, len(legitimate_sorted)), \n",
    "         label='Legitimate', linewidth=2)\n",
    "ax4.plot(fraudulent_sorted, np.linspace(0, 1, len(fraudulent_sorted)), \n",
    "         label='Fraudulent', linewidth=2)\n",
    "ax4.set_title('Cumulative Distribution of Amounts')\n",
    "ax4.set_xlabel('Amount ($)')\n",
    "ax4.set_ylabel('Cumulative Probability')\n",
    "ax4.legend()\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "# Statistical test for amount differences\n",
    "stat, p_value = stats.mannwhitneyu(legitimate_amounts, fraudulent_amounts)\n",
    "print(f\"\\nüìä Mann-Whitney U test for amount differences:\")\n",
    "print(f\"Statistic: {stat:.2f}\")\n",
    "print(f\"P-value: {p_value:.2e}\")\n",
    "print(f\"Result: {'Significant difference' if p_value < 0.05 else 'No significant difference'}\")"
   ]
  },
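  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Effect size for the amount comparison (illustrative sketch).** With a sample this large, the Mann-Whitney U test above will flag even very small shifts as significant, so the p-value alone says little about how different the two amount distributions are. The cell below is a minimal sketch rather than part of the original analysis. It reuses `legitimate_amounts` and `fraudulent_amounts` from the previous cell and assumes SciPy 1.7 or later, where `mannwhitneyu` returns the U statistic of its first argument. The names `u_fraud`, `prob_fraud_larger`, and `rank_biserial` are introduced here for illustration only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch (not part of the original analysis): effect size for the\n",
    "# Mann-Whitney U test above. Assumes SciPy >= 1.7, where the returned statistic\n",
    "# is the U value of the first sample passed in.\n",
    "from scipy import stats\n",
    "\n",
    "n_fraud = len(fraudulent_amounts)\n",
    "n_legit = len(legitimate_amounts)\n",
    "\n",
    "u_fraud, _ = stats.mannwhitneyu(\n",
    "    fraudulent_amounts, legitimate_amounts, alternative='two-sided'\n",
    ")\n",
    "\n",
    "# Probability that a random fraudulent amount exceeds a random legitimate one\n",
    "# (ties count as one half); 0.5 means the groups are indistinguishable by rank.\n",
    "prob_fraud_larger = u_fraud / (n_fraud * n_legit)\n",
    "\n",
    "# The rank-biserial correlation rescales that probability to the range [-1, 1].\n",
    "rank_biserial = 2 * prob_fraud_larger - 1\n",
    "\n",
    "print(f\"P(fraud amount > legitimate amount): {prob_fraud_larger:.3f}\")\n",
    "print(f\"Rank-biserial correlation: {rank_biserial:+.3f}\")"
   ]
  },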
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fraud by payment method\n",
    "print(\"üí≥ FRAUD BY PAYMENT METHOD:\")\n",
    "print(\"=\"*30)\n",
    "\n",
    "payment_fraud = df_transactions.groupby('payment_method')['is_fraud'].agg([\n",
    "    'count', 'sum', 'mean'\n",
    "]).round(4)\n",
    "\n",
    "payment_fraud.columns = ['Total_Transactions', 'Fraud_Count', 'Fraud_Rate']\n",
    "payment_fraud['Fraud_Percentage'] = payment_fraud['Fraud_Rate'] * 100\n",
    "payment_fraud = payment_fraud.sort_values('Fraud_Rate', ascending=False)\n",
    "\n",
    "display(payment_fraud)\n",
    "\n",
    "# Visualize payment method fraud rates\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))\n",
    "\n",
    "# Fraud rate by payment method\n",
    "payment_fraud['Fraud_Percentage'].plot(kind='bar', ax=ax1, color='coral')\n",
    "ax1.set_title('Fraud Rate by Payment Method', fontsize=14, fontweight='bold')\n",
    "ax1.set_xlabel('Payment Method')\n",
    "ax1.set_ylabel('Fraud Rate (%)')\n",
    "ax1.tick_params(axis='x', rotation=45)\n",
    "\n",
    "# Transaction volume by payment method\n",
    "payment_fraud['Total_Transactions'].plot(kind='bar', ax=ax2, color='lightblue')\n",
    "ax2.set_title('Transaction Volume by Payment Method', fontsize=14, fontweight='bold')\n",
    "ax2.set_xlabel('Payment Method')\n",
    "ax2.set_ylabel('Number of Transactions')\n",
    "ax2.tick_params(axis='x', rotation=45)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
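  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Testing the payment-method association (illustrative sketch).** The table and charts above compare fraud rates across payment methods but do not indicate whether the differences exceed what chance would produce. One possible check, sketched below under the assumption that `payment_method` has at least two categories and `is_fraud` contains both classes, is a chi-square test of independence using `chi2_contingency` (already imported in the setup cell), with Cramér's V as an effect size. The names `contingency`, `p_value_pm`, and `cramers_v` are hypothetical and not part of the original notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch (not part of the original analysis): chi-square test of\n",
    "# independence between payment method and the fraud label.\n",
    "import pandas as pd\n",
    "from scipy.stats import chi2_contingency\n",
    "\n",
    "# Rows: payment methods, columns: is_fraud values, cells: transaction counts\n",
    "contingency = pd.crosstab(df_transactions['payment_method'], df_transactions['is_fraud'])\n",
    "chi2, p_value_pm, dof, expected = chi2_contingency(contingency)\n",
    "\n",
    "# Cramer's V rescales the chi-square statistic to a 0-1 effect size.\n",
    "# Assumes at least two payment methods and both fraud classes are present.\n",
    "n_obs = contingency.to_numpy().sum()\n",
    "cramers_v = (chi2 / (n_obs * (min(contingency.shape) - 1))) ** 0.5\n",
    "\n",
    "print(f\"Chi-square: {chi2:.2f} (dof={dof}), p-value: {p_value_pm:.2e}\")\n",
    "print(f\"Cramer's V: {cramers_v:.3f}\")\n",
    "\n",
    "# The chi-square approximation becomes unreliable when expected counts are small.\n",
    "print(f\"Smallest expected cell count: {expected.min():.1f}\")"
   ]
  },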
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Initial Insights"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Time-based fraud analysis\n",
    "print(\"‚è∞ TIME-BASED FRAUD ANALYSIS:\")\n",
    "print(\"=\"*32)\n",
    "\n",
    "# Extract time features\n",
    "df_transactions['hour'] = df_transactions['timestamp'].dt.hour\n",
    "df_transactions['day_of_week'] = df_transactions['timestamp'].dt.day_name()\n",
    "df_transactions['month'] = df_transactions['timestamp'].dt.month\n",
    "\n",
    "# Fraud by hour of day\n",
    "hourly_fraud = df_transactions.groupby('hour')['is_fraud'].mean() * 100\n",
    "\n",
    "# Fraud by day of week\n",
    "daily_fraud = df_transactions.groupby('day_of_week')['is_fraud'].mean() * 100\n",
    "daily_fraud = daily_fraud.reindex(['Monday', 'Tuesday', 'Wednesday', 'Thursday', \n",
    "                                  'Friday', 'Saturday', 'Sunday'])\n",
    "\n",
    "# Visualize temporal patterns\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))\n",
    "\n",
    "# Hourly patterns\n",
    "hourly_fraud.plot(kind='line', marker='o', ax=ax1, color='red', linewidth=2, markersize=6)\n",
    "ax1.set_title('Fraud Rate by Hour of Day', fontsize=14, fontweight='bold')\n",
    "ax1.set_xlabel('Hour of Day')\n",
    "ax1.set_ylabel('Fraud Rate (%)')\n",
    "ax1.grid(True, alpha=0.3)\n",
    "\n",
    "# Daily patterns\n",
    "daily_fraud.plot(kind='bar', ax=ax2, color='orange')\n",
    "ax2.set_title('Fraud Rate by Day of Week', fontsize=14, fontweight='bold')\n",
    "ax2.set_xlabel('Day of Week')\n",
    "ax2.set_ylabel('Fraud Rate (%)')\n",
    "ax2.tick_params(axis='x', rotation=45)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "print(f\"\\nüîç Key Temporal Insights:\")\n",
    "print(f\"Peak fraud hour: {hourly_fraud.idxmax()}:00 ({hourly_fraud.max():.2f}% fraud rate)\")\n",
    "print(f\"Lowest fraud hour: {hourly_fraud.idxmin()}:00 ({hourly_fraud.min():.2f}% fraud rate)\")\n",
    "print(f\"Highest fraud day: {daily_fraud.idxmax()} ({daily_fraud.max():.2f}% fraud rate)\")\n",
    "print(f\"Lowest fraud day: {daily_fraud.idxmin()} ({daily_fraud.min():.2f}% fraud rate)\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Summary insights\n",
    "print(\"\\nüéØ KEY INSIGHTS AND RECOMMENDATIONS:\")\n",
    "print(\"=\"*50)\n",
    "\n",
    "print(f\"\\nüìä Dataset Overview:\")\n",
    "print(f\"   ‚Ä¢ Total transactions: {len(df_transactions):,}\")\n",
    "print(f\"   ‚Ä¢ Fraud rate: {fraud_rate:.2%}\")\n",
    "print(f\"   ‚Ä¢ Class imbalance: {fraud_counts[0] / fraud_counts[1]:.1f}:1\")\n",
    "\n",
    "print(f\"\\nüí∞ Financial Impact:\")\n",
    "total_fraud_amount = df_transactions[df_transactions['is_fraud'] == 1]['amount'].sum()\n",
    "avg_fraud_amount = df_transactions[df_transactions['is_fraud'] == 1]['amount'].mean()\n",
    "print(f\"   ‚Ä¢ Total fraudulent amount: ${total_fraud_amount:,.2f}\")\n",
    "print(f\"   ‚Ä¢ Average fraud transaction: ${avg_fraud_amount:.2f}\")\n",
    "\n",
    "print(f\"\\nüö® Risk Factors Identified:\")\n",
    "print(f\"   ‚Ä¢ Higher risk payment method: {payment_fraud.index[0]}\")\n",
    "print(f\"   ‚Ä¢ Peak fraud time: {hourly_fraud.idxmax()}:00\")\n",
    "print(f\"   ‚Ä¢ Riskiest day: {daily_fraud.idxmax()}\")\n",
    "\n",
    "print(f\"\\nüîß Data Quality:\")\n",
    "print(f\"   ‚Ä¢ Missing values: {'Yes' if missing_df.shape[0] > 0 else 'None detected'}\")\n",
    "print(f\"   ‚Ä¢ Duplicates: {duplicate_count}\")\n",
    "print(f\"   ‚Ä¢ Data anomalies: {negative_amounts + future_dates}\")\n",
    "\n",
    "print(f\"\\nüìà Next Steps:\")\n",
    "print(f\"   1. Deep dive into transaction patterns (Notebook 02)\")\n",
    "print(f\"   2. Engineer velocity and behavioral features\")\n",
    "print(f\"   3. Develop baseline models with current features\")\n",
    "print(f\"   4. Focus on class imbalance handling strategies\")\n",
    "print(f\"   5. Investigate high-risk payment methods and times\")"
   ]
  },
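  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**A possible class-weighted baseline (illustrative sketch).** The summary above lists baseline models and class imbalance handling as follow-up work. The cell below sketches one way to combine the two: a logistic regression with `class_weight='balanced'`, scored with average precision because accuracy is uninformative at a fraud rate of a few percent. This is an assumption-laden example, not the project's chosen approach. scikit-learn is an extra dependency that the setup cell does not import, the feature list `baseline_features` is a placeholder, and a random split is used for brevity even though a time-based split may be more appropriate for transaction data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch (not part of the original analysis): class-weighted\n",
    "# logistic regression baseline. scikit-learn is an assumed extra dependency.\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.metrics import average_precision_score\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "baseline_features = ['amount', 'hour']  # hypothetical minimal feature set\n",
    "model_df = df_transactions[baseline_features + ['is_fraud']].dropna()\n",
    "X = model_df[baseline_features]\n",
    "y = model_df['is_fraud'].astype(int)\n",
    "\n",
    "# Stratify so the train and test splits keep the original fraud rate\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, test_size=0.25, stratify=y, random_state=42\n",
    ")\n",
    "\n",
    "# class_weight='balanced' weights each class inversely to its frequency\n",
    "baseline_model = make_pipeline(\n",
    "    StandardScaler(),\n",
    "    LogisticRegression(class_weight='balanced', max_iter=1000),\n",
    ")\n",
    "baseline_model.fit(X_train, y_train)\n",
    "\n",
    "# Average precision summarizes the precision-recall curve; a random ranking\n",
    "# scores roughly the fraud rate of the test split\n",
    "fraud_scores = baseline_model.predict_proba(X_test)[:, 1]\n",
    "pr_auc = average_precision_score(y_test, fraud_scores)\n",
    "print(f\"Baseline average precision: {pr_auc:.3f}\")\n",
    "print(f\"Chance level (test fraud rate): {y_test.mean():.3f}\")"
   ]
  },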
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Next Steps\n",
    "\n",
    "Based on this initial exploration, the recommended next steps are:\n",
    "\n",
    "### Immediate Actions\n",
    "1. **Data Quality**: Address any identified data quality issues\n",
    "2. **Feature Engineering**: Create velocity, behavioral, and network features\n",
    "3. **Class Imbalance**: Implement SMOTE, undersampling, or cost-sensitive learning\n",
    "\n",
    "### Analysis Deep Dives\n",
    "1. **Transaction Patterns** (`02_transaction_analysis.ipynb`)\n",
    "2. **User Behavior Analysis** (`03_fraud_patterns.ipynb`)\n",
    "3. **Merchant Risk Assessment** (`04_data_quality_assessment.ipynb`)\n",
    "\n",
    "### Model Development\n",
    "1. Start with simple baseline models (Logistic Regression, Decision Trees)\n",
    "2. Progress to ensemble methods (Random Forest, XGBoost)\n",
    "3. Explore neural networks and anomaly detection approaches\n",
    "\n",
    "### Business Impact\n",
    "1. Calculate fraud prevention ROI\n",
    "2. Estimate false positive costs\n",
    "3. Develop business-friendly model interpretation\n",
    "\n",
    "---\n",
    "\n",
    "**Next Notebook**: `02_transaction_analysis.ipynb` - Deep dive into transaction patterns and user behavior"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}