# Ensuring Consistency

**Activity Overview**: Ensure consistency by identifying and resolving conflicting values across datasets.

## Title: Customer Address Discrepancies

**Task**: Resolve customer address mismatches between the CRM and marketing databases.

**Steps**:
1. Compare customer addresses in the CRM with those in the marketing database.
2. Identify records with conflicting address information.
3. Propose a method to consolidate records with verified addresses.
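
Steps 1 and 2 can be sketched on a small in-memory sample before pointing the code at real files. This is a minimal illustration, not the reference solution: the sample rows are made up, the `normalize` helper is a hypothetical addition, and the column names `crm_address` and `marketing_address` are assumed to match the configuration used in the cell that follows. Normalizing whitespace and case first keeps purely cosmetic differences from being reported as conflicts.

```python
import pandas as pd

# Hypothetical sample data; real files and values will differ.
crm = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "crm_address": ["12 Oak St", "5 Elm Rd ", "9 Pine Ave", "3 Ash Ct"],
})
marketing = pd.DataFrame({
    "customer_id": [1, 2, 3, 5],
    "marketing_address": ["12 Oak St", "5 elm rd", "90 Pine Ave", "77 Birch Ln"],
})

def normalize(addr: pd.Series) -> pd.Series:
    """Trim, collapse repeated whitespace, and lowercase (hypothetical helper)."""
    return addr.str.strip().str.replace(r"\s+", " ", regex=True).str.lower()

# An outer merge with indicator=True also exposes customers that exist
# in only one of the two systems.
merged = crm.merge(marketing, on="customer_id", how="outer", indicator=True)

in_both = merged["_merge"] == "both"
differs = normalize(merged["crm_address"]) != normalize(merged["marketing_address"])

conflicts = merged[in_both & differs]   # same customer, different address
only_one_system = merged[~in_both]      # customer missing from one database

print(conflicts[["customer_id", "crm_address", "marketing_address"]])
print(only_one_system[["customer_id", "_merge"]])
```

In this sample, customer 2 differs only in case and trailing whitespace, so it is not treated as a conflict, while customer 3 is. Whether such cosmetic differences should count as conflicts is a policy decision for the data owners.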

In [1]:
import pandas as pd

print("\n--- Customer Address Mismatch Identification ---")

# --- Configuration ---
CRM_DATA_PATH = 'crm_customers.csv'         # Path to your CRM customer data
MARKETING_DATA_PATH = 'marketing_customers.csv' # Path to your marketing customer data
CUSTOMER_ID_COLUMN = 'customer_id'         # Common column to identify customers
ADDRESS_COLUMN_CRM = 'crm_address'          # Address column in CRM data
ADDRESS_COLUMN_MARKETING = 'marketing_address' # Address column in marketing data

try:
    # Load the datasets
    crm_data = pd.read_csv(CRM_DATA_PATH)
    marketing_data = pd.read_csv(MARKETING_DATA_PATH)

    # Merge the two datasets on the customer ID
    merged_df = pd.merge(crm_data, marketing_data, on=CUSTOMER_ID_COLUMN, suffixes=('_crm', '_marketing'), how='inner')

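    # Identify records with conflicting address information.
    # A plain != comparison treats NaN != NaN as True, so rows where both
    # addresses are missing are excluded explicitly.
    crm_addr = merged_df[ADDRESS_COLUMN_CRM]
    marketing_addr = merged_df[ADDRESS_COLUMN_MARKETING]
    both_missing = crm_addr.isna() & marketing_addr.isna()
    conflicting_addresses_df = merged_df[(crm_addr != marketing_addr) & ~both_missing]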

    if conflicting_addresses_df.empty:
        print("No conflicting customer addresses found between the CRM and marketing databases (for matching customer IDs).")
    else:
        print("\nCustomers with conflicting address information:")
        print(conflicting_addresses_df[[CUSTOMER_ID_COLUMN, ADDRESS_COLUMN_CRM, ADDRESS_COLUMN_MARKETING]])

    print("\n--- Proposed Method to Consolidate Records with Verified Addresses ---")
    print("\n1. **Identify a 'Verified Address' Source:** Determine which database (CRM or marketing) is more likely to contain the most accurate and up-to-date address information. This might involve checking for timestamps, data entry protocols, or specific data quality checks performed on either database.")
    print("\n2. **Prioritize Verified Addresses:**")
    print("   - If one database is deemed the 'verified' source, prioritize its address during consolidation.")
    print("   - If there's no single verified source, consider implementing a rule-based system:")
    print("     - Prioritize addresses that have been recently updated.")
    print("     - Prioritize addresses that have more complete information (e.g., fewer missing fields).")
    print("     - Potentially use a third-party address verification service to validate and standardize addresses from both sources.")
    print("\n3. **Consolidation Process:**")
    print("   - For records with matching 'customer_id' and conflicting addresses:")
    print("     - If a verified address source is identified, update the address in the non-verified system with the address from the verified source.")
    print("     - If using a rule-based system, apply the rules to determine the most appropriate address to retain or use for updating.")
    print("\n4. **Logging and Auditing:** Maintain a log of all address conflicts and the resolution steps taken. This helps in understanding the extent of the issue and tracking the impact of the consolidation process.")
    print("\n5. **Data Governance:** Implement clear data governance policies and procedures for address management in both the CRM and marketing databases to prevent future inconsistencies. This might include standardized address formats, validation rules at the point of entry, and regular data quality checks.")

except FileNotFoundError as e:
    print(f"Error: One or both of the CSV files ('{CRM_DATA_PATH}', '{MARKETING_DATA_PATH}') were not found: {e}")
except KeyError as e:
    print(f"Error: Column {e} was not found in the CSV files. Expected columns: '{CUSTOMER_ID_COLUMN}', '{ADDRESS_COLUMN_CRM}', '{ADDRESS_COLUMN_MARKETING}'")
except Exception as e:
    print(f"An unexpected error occurred: {e}")


--- Customer Address Mismatch Identification ---
Error: One or both of the CSV files ('crm_customers.csv', 'marketing_customers.csv') were not found: [Errno 2] No such file or directory: 'crm_customers.csv'
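
The rule-based consolidation described in the proposal (prefer the more recently updated address, fall back to whichever side is not missing, and log the decision) could look roughly like the sketch below. It rests on assumptions not present in the original data: the `crm_updated_at` and `marketing_updated_at` timestamp columns, the sample rows, and the `consolidate` function are all hypothetical, and ties are broken in favor of the CRM purely for illustration.

```python
import pandas as pd

# Hypothetical merged frame; the *_updated_at columns are assumed to exist.
merged = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "crm_address": ["12 Oak St", "5 Elm Rd", None],
    "marketing_address": ["14 Oak St", "5 Elm Road", "9 Pine Ave"],
    "crm_updated_at": pd.to_datetime(["2024-03-01", "2023-01-15", "2024-02-01"]),
    "marketing_updated_at": pd.to_datetime(["2023-11-20", "2024-05-02", "2022-07-09"]),
})

def consolidate(row: pd.Series) -> tuple:
    """Return (resolved_address, resolution_note) for one customer row."""
    crm_addr, mkt_addr = row["crm_address"], row["marketing_address"]
    if pd.isna(crm_addr) and pd.isna(mkt_addr):
        return (None, "unresolved: both addresses missing")
    if pd.isna(crm_addr):
        return (mkt_addr, "marketing: CRM address missing")
    if pd.isna(mkt_addr):
        return (crm_addr, "crm: marketing address missing")
    # Assumed tie-break: CRM wins when both timestamps are equal.
    if row["crm_updated_at"] >= row["marketing_updated_at"]:
        return (crm_addr, "crm: more recently updated")
    return (mkt_addr, "marketing: more recently updated")

# The resolution column doubles as a simple audit log of each decision.
merged[["resolved_address", "resolution"]] = merged.apply(
    consolidate, axis=1, result_type="expand"
)

print(merged[["customer_id", "resolved_address", "resolution"]])
```

Keeping the reason next to each resolved address makes it possible to review or reverse individual decisions later. A third-party address verification service, if one is available, could replace the timestamp rule without changing the surrounding structure.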
