# Retail Transactions Data Exploration & Neo4j Import

This notebook explores the Kaggle retail transactions data, cleans it, and exports CSV files that Cypher queries can then load into Neo4j.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load the retail transactions data
df = pd.read_csv('../data/retail_transactions.csv')
df.head()

## Data Overview
- Check for missing values, data types, and basic statistics.

In [None]:
df.info()
df.describe(include='all')
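
To see where values are missing at a glance, a per-column summary can complement `df.info()`. The following is a minimal sketch: `missing_summary` is a hypothetical helper, and the small sample frame is made up so the block runs without the real CSV.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts, percentages, and dtypes, worst first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
        "dtype": frame.dtypes.astype(str),
    })
    return summary.sort_values("missing", ascending=False)


# Made-up rows standing in for the real `df` (assumption, not actual data)
sample = pd.DataFrame({
    "Invoice": ["489434", "489434", "489435"],
    "Customer ID": [13085.0, None, 13085.0],
    "Description": ["WHITE MUG", None, None],
})
print(missing_summary(sample))
```

In the notebook itself, calling `missing_summary(df)` would show which columns the cleaning step needs to address.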

## Data Cleaning
- Remove rows with a missing `Customer ID` or `Description`.
- Remove rows with a negative or zero `Quantity`.
- Remove duplicates.

In [None]:
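# Drop rows that lack a customer or a product description
df = df.dropna(subset=['Customer ID', 'Description'])
# Keep only rows with a positive quantity
df = df[df['Quantity'] > 0]
# Drop exact duplicate rows, then renumber the index
df = df.drop_duplicates().reset_index(drop=True)
df.head()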
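
Before trusting the cleaned frame, it can help to know how many rows each step discards. The sketch below applies the same three steps in the same order and records the row count removed by each. `clean_with_report` is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd


def clean_with_report(frame: pd.DataFrame):
    """Apply the three cleaning steps and count the rows each one removes."""
    removed = {}

    before = len(frame)
    frame = frame.dropna(subset=["Customer ID", "Description"])
    removed["missing_id_or_description"] = before - len(frame)

    before = len(frame)
    frame = frame[frame["Quantity"] > 0]
    removed["non_positive_quantity"] = before - len(frame)

    before = len(frame)
    frame = frame.drop_duplicates()
    removed["duplicates"] = before - len(frame)

    return frame.reset_index(drop=True), removed


# Made-up rows: two with missing fields, one negative quantity, one duplicate
sample = pd.DataFrame({
    "Invoice": ["A", "A", "B", "A", "C"],
    "Customer ID": [1.0, None, 2.0, 1.0, 3.0],
    "Description": ["MUG", "MUG", "PLATE", "MUG", None],
    "Quantity": [2, 1, -1, 2, 5],
})
cleaned, removed = clean_with_report(sample)
print(removed)
print(cleaned)
```

A large count for any single step is a cue to inspect those rows before dropping them.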

## Prepare Data for Neo4j Import
- Extract unique Customers, Products, and Transactions.
- Prepare Cypher queries or CSVs for import.

In [None]:
# Unique customers
customers = df[['Customer ID', 'Country']].drop_duplicates().rename(columns={'Customer ID': 'customer_id', 'Country': 'country'})
# Unique products
products = df[['StockCode', 'Description']].drop_duplicates().rename(columns={'StockCode': 'product_id', 'Description': 'description'})
# Transactions
transactions = df[['Invoice', 'InvoiceDate', 'Customer ID']].drop_duplicates().rename(columns={'Invoice': 'invoice_id', 'InvoiceDate': 'date', 'Customer ID': 'customer_id'})
# Transaction-Product relationships
transaction_products = df[['Invoice', 'StockCode', 'Quantity', 'Price']].rename(columns={'Invoice': 'invoice_id', 'StockCode': 'product_id', 'Quantity': 'quantity', 'Price': 'price'})
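# Preview each table on its own; a bare tuple of DataFrames renders as one hard-to-read repr
for name, table in [('customers', customers), ('products', products), ('transactions', transactions), ('transaction_products', transaction_products)]:
    print(f'{name}: {len(table)} rows')
    print(table.head(), end='\n\n')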
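
Deduplicating on a pair of columns does not guarantee that each key appears only once: a `product_id` can still occur with several descriptions, or a `customer_id` with several countries. Whether this happens in the real data is not known here, so the sketch below only shows how one might check. `ambiguous_keys` is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd


def ambiguous_keys(frame: pd.DataFrame, key: str, attribute: str) -> list:
    """Return key values that map to more than one distinct attribute value."""
    distinct = frame.groupby(key)[attribute].nunique()
    return distinct[distinct > 1].index.tolist()


# Made-up product rows: one stock code carries two different descriptions
sample_products = pd.DataFrame({
    "product_id": ["85048", "85048", "22041"],
    "description": ["15CM CHRISTMAS GLASS BALL", "GLASS BALL 15CM", "RECORD FRAME"],
})
print(ambiguous_keys(sample_products, "product_id", "description"))
```

In the notebook, `ambiguous_keys(products, 'product_id', 'description')` and `ambiguous_keys(customers, 'customer_id', 'country')` would list any keys that need a decision, such as keeping the first or the most frequent value, before import.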

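## (Optional) Export CSVs for Neo4j Import
You can export these DataFrames as CSVs and load them with Neo4j's [LOAD CSV](https://neo4j.com/docs/cypher-manual/current/clauses/load-csv/) clause. By default, `file:///` URLs in `LOAD CSV` resolve against the Neo4j `import` directory, so copy the exported files there before loading.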

In [None]:
customers.to_csv('../data/customers.csv', index=False)
products.to_csv('../data/products.csv', index=False)
transactions.to_csv('../data/transactions.csv', index=False)
transaction_products.to_csv('../data/transaction_products.csv', index=False)

## Next: Cypher Scripts for Neo4j Import
- Use the exported CSVs and write Cypher queries to create nodes and relationships in Neo4j.
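
As a starting point for those scripts, the sketch below builds `LOAD CSV` statements as Python strings from the exported file and column names. The graph model is an assumption: the labels `Customer`, `Product`, and `Transaction`, the relationship types `MADE` and `CONTAINS`, and the helper `node_query` are all hypothetical, and the files are assumed to sit in the Neo4j `import` directory.

```python
# Hypothetical graph model: file name -> (label, key column, other columns).
# The notebook does not prescribe these labels; adjust them to your design.
NODE_SPECS = {
    "customers.csv": ("Customer", "customer_id", ["country"]),
    "products.csv": ("Product", "product_id", ["description"]),
    "transactions.csv": ("Transaction", "invoice_id", ["date"]),
}


def node_query(filename, label, key, properties):
    """Build a LOAD CSV statement that MERGEs one node per distinct key."""
    lines = [
        f"LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS row",
        f"MERGE (n:{label} {{{key}: row.{key}}})",
    ]
    if properties:
        assignments = ", ".join(f"n.{p} = row.{p}" for p in properties)
        lines.append(f"SET {assignments}")
    return "\n".join(lines) + ";"


RELATIONSHIP_QUERIES = [
    # Customer -> Transaction, via the customer_id column of transactions.csv
    "LOAD CSV WITH HEADERS FROM 'file:///transactions.csv' AS row\n"
    "MATCH (c:Customer {customer_id: row.customer_id})\n"
    "MATCH (t:Transaction {invoice_id: row.invoice_id})\n"
    "MERGE (c)-[:MADE]->(t);",
    # Transaction -> Product, one relationship per (transaction, product) pair.
    # LOAD CSV reads every field as a string, hence the explicit conversions.
    "LOAD CSV WITH HEADERS FROM 'file:///transaction_products.csv' AS row\n"
    "MATCH (t:Transaction {invoice_id: row.invoice_id})\n"
    "MATCH (p:Product {product_id: row.product_id})\n"
    "MERGE (t)-[r:CONTAINS]->(p)\n"
    "SET r.quantity = toInteger(row.quantity), r.price = toFloat(row.price);",
]

# Nodes first, then relationships, so every MATCH can find its endpoints
all_queries = [node_query(f, *spec) for f, spec in NODE_SPECS.items()]
all_queries += RELATIONSHIP_QUERIES
print("\n\n".join(all_queries))
```

The printed statements can be pasted into Neo4j Browser or `cypher-shell`. On larger files, creating a uniqueness constraint or index on each key property first typically makes the `MERGE` and `MATCH` lookups much faster.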