# 🔐 Encrypto-Anonymizer

**A simple Google Colab-based tool by Elizabeth Shamblin to anonymize and encrypt CSV data securely.**

## 🔒 About the Encrypto-Anonymizer
- Uploaded files are stored **temporarily** on the virtual machine running your Colab session.
- Only **you** (the logged-in Google account) have access to them during that session.
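- Files are **not shared** with the public or with other users unless you explicitly share them, although the virtual machine itself runs on Google's infrastructure.
- Colab does **not store** these files persistently: they are deleted when the runtime shuts down, for example after it has been idle for a while (commonly about 90 minutes) or when you choose *Disconnect and delete runtime*.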


## 📁 Step 1: Upload a CSV file

In [None]:
from google.colab import files
import pandas as pd
import io

uploaded = files.upload()
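# Read every column as text so leading zeros, IDs and empty cells are kept exactly as in the file
df = pd.read_csv(io.BytesIO(list(uploaded.values())[0]), dtype=str, keep_default_na=False)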
df.head()

## 🎯 Step 2: Select columns to anonymize

In [None]:
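columns_to_anonymize = ['Name', 'Email', 'ID']  # Change as needed
missing = [c for c in columns_to_anonymize if c not in df.columns]  # catch typos early
assert not missing, f"Columns not found in the CSV: {missing}"
print("Selected columns to anonymize:", columns_to_anonymize)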

## 🔀 Step 3: Anonymize and create encrypted mapping

**Options:**
- `use_composite_key = True`: Creates a single anonymous ID for all selected columns in a row (linked anonymization)
- `use_composite_key = False`: Creates one anonymous ID per distinct value; the same value gets the same ID wherever it appears in the selected columns
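
To make the difference between the two options concrete, here is a toy sketch. It uses plain Python lists of dicts instead of a DataFrame and a simple counter instead of `uuid.uuid4()`, so the IDs are easy to read, and it skips the encryption step entirely. The function name `assign_ids` and the sample rows are hypothetical.

```python
from itertools import count


def assign_ids(rows, columns, use_composite_key):
    """Return copies of rows with the selected columns replaced by anonymous IDs.

    rows: list of dicts (column -> value). IDs come from a counter ("ID-1", ...)
    purely for readability; the notebook uses uuid.uuid4() instead.
    """
    counter = count(1)
    reverse_mapping = {}  # original value (or tuple of values) -> anonymous ID
    result = [dict(row) for row in rows]  # copy so the input is not modified

    for out, row in zip(result, rows):
        if use_composite_key:
            # One ID for the whole combination of selected values in this row
            key = tuple(row[col] for col in columns)
            if key not in reverse_mapping:
                reverse_mapping[key] = f"ID-{next(counter)}"
            for col in columns:
                out[col] = reverse_mapping[key]
        else:
            # One ID per distinct value, reused wherever that value reappears
            for col in columns:
                key = row[col]
                if key not in reverse_mapping:
                    reverse_mapping[key] = f"ID-{next(counter)}"
                out[col] = reverse_mapping[key]
    return result


rows = [
    {"Name": "Ana", "Email": "ana@x.org"},
    {"Name": "Ana", "Email": "ana@y.org"},
    {"Name": "Ana", "Email": "ana@x.org"},
]
print(assign_ids(rows, ["Name", "Email"], use_composite_key=True))
print(assign_ids(rows, ["Name", "Email"], use_composite_key=False))
```

With composite keys, the first two rows get different IDs in every column, so the `Name` column no longer shows that they share a name. With separate IDs, `Name` maps to the same ID in all three rows, which keeps that link visible.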

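This step also:
- Reuses the existing anonymous ID when a value (or combination of values) appears again, so identical inputs always map to the same ID
- Supports composite keys that link several fields under one ID
- Keeps a reverse mapping (original value → ID) in memory for fast lookups; only the encrypted mapping is saved to disk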
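
Why the format of a composite key matters: if a row's values are joined with a plain separator such as `"|"`, two different rows can produce the same key whenever a value itself contains that separator, and those rows would then wrongly share one anonymous ID. Encoding the values as a JSON list keeps the boundaries between values intact. The helper names below are hypothetical and only illustrate the idea.

```python
import json


def joined_key(values):
    # Naive key: ambiguous when a value contains the separator
    return "|".join(values)


def json_key(values):
    # JSON list encoding quotes and escapes each value, so value boundaries survive
    return json.dumps(list(values))


row_a = ["a|b", "c"]
row_b = ["a", "b|c"]

print(joined_key(row_a) == joined_key(row_b))  # True: both become "a|b|c"
print(json_key(row_a) == json_key(row_b))      # False: keys stay distinct
```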

In [None]:
import json
import uuid

from cryptography.fernet import Fernet

# Generate a fresh symmetric encryption key for this run
fernet_key = Fernet.generate_key()
fernet = Fernet(fernet_key)

# Anonymous ID -> encrypted original value(s); saved to mapping.json
mapping = {}
# Original value (or value combination) -> anonymous ID, so repeated values reuse
# the same ID; kept in memory only and never written to disk
reverse_mapping = {}

anon_df = df.copy()

# True: one shared anonymous ID per row across all selected columns
# False: one anonymous ID per distinct value
use_composite_key = True

if use_composite_key and len(columns_to_anonymize) > 1:
    print("Using composite key for multiple columns...")
    row_ids = []
    for row in df[columns_to_anonymize].itertuples(index=False):
        composite_parts = [str(val) for val in row]
        # Encode as a JSON list so values containing a separator such as "|"
        # cannot collide, e.g. ("a|b", "c") vs ("a", "b|c")
        composite_key = json.dumps(composite_parts)

        if composite_key in reverse_mapping:
            anon_id = reverse_mapping[composite_key]
        else:
            anon_id = str(uuid.uuid4())
            # Encrypt each column's value separately, stored under the shared ID
            mapping[anon_id] = {
                col: fernet.encrypt(part.encode()).decode()
                for col, part in zip(columns_to_anonymize, composite_parts)
            }
            reverse_mapping[composite_key] = anon_id
        row_ids.append(anon_id)

    # Replace whole columns at once (avoids dtype warnings from per-cell .loc writes)
    for col in columns_to_anonymize:
        anon_df[col] = row_ids
else:
    print("Using separate mappings per value...")
    for col in columns_to_anonymize:
        new_ids = []
        for val in df[col]:
            str_val = str(val)
            # Reuse the existing ID if this value was already seen in any selected column
            if str_val in reverse_mapping:
                anon_id = reverse_mapping[str_val]
            else:
                anon_id = str(uuid.uuid4())
                mapping[anon_id] = fernet.encrypt(str_val.encode()).decode()
                reverse_mapping[str_val] = anon_id
            new_ids.append(anon_id)
        anon_df[col] = new_ids

anon_df.to_csv("anonymized.csv", index=False)
with open("mapping.json", "w") as f:
    json.dump(mapping, f)
with open("key.key", "wb") as f:
    f.write(fernet_key)
print("✅ Done anonymizing.")

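## 💾 Step 4: Download anonymized and encrypted files

Anyone who has both `mapping.json` and `key.key` can recover the original values, so keep the key somewhere separate from the mapping and the anonymized CSV.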

In [None]:
files.download("anonymized.csv")
files.download("mapping.json")
files.download("key.key")

## 🔓 Step 5: Upload anonymized file + mapping + key to de-anonymize

In [None]:
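import json
import pandas as pd
from cryptography.fernet import Fernet
from google.colab import files  # re-imported so this step also works in a new session

uploaded = files.upload()  # select anonymized.csv, mapping.json and key.key
# Read every column as text so IDs, leading zeros and empty cells round-trip unchanged
anon_df = pd.read_csv("anonymized.csv", dtype=str, keep_default_na=False)
with open("mapping.json") as f:
    mapping = json.load(f)
with open("key.key", "rb") as f:
    fernet = Fernet(f.read())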
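
Before restoring, it can help to confirm that the uploaded `key.key` really is a Fernet key rather than a truncated or re-saved file. Per the Fernet specification, a key is 32 random bytes encoded with URL-safe base64 (44 characters). The function `looks_like_fernet_key` below is a hypothetical, standard-library-only sanity check; it cannot tell whether the key is the *right* one, which only a successful decrypt can show.

```python
import base64
import binascii


def looks_like_fernet_key(key_bytes):
    """Return True if key_bytes is URL-safe base64 that decodes to exactly 32 bytes."""
    key_bytes = key_bytes.strip()  # tolerate a trailing newline added by an editor
    try:
        raw = base64.urlsafe_b64decode(key_bytes)
    except (binascii.Error, ValueError):
        # Bad padding or non-ASCII input: not a base64-encoded key
        return False
    return len(raw) == 32
```

For example, pass it the bytes from `open("key.key", "rb").read()` and stop if it returns `False`.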

## 🔁 Step 6: De-anonymize and restore original values

In [None]:
restored_df = anon_df.copy()

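# columns_to_anonymize is defined in Step 2; re-run that cell first if this is a new session
# Each mapping entry is either a dict of per-column values (composite key) or one encrypted string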
for col in columns_to_anonymize:
    original_values = []
    for anon_id in anon_df[col]:
        if isinstance(mapping[anon_id], dict):
            # Composite key mapping - extract value for specific column
            encrypted_val = mapping[anon_id][col]
            decrypted_val = fernet.decrypt(encrypted_val.encode()).decode()
        else:
            # Simple mapping - decrypt direct value
            decrypted_val = fernet.decrypt(mapping[anon_id].encode()).decode()
        original_values.append(decrypted_val)
    restored_df[col] = original_values

restored_df.to_csv("restored.csv", index=False)
print("✅ Restoration complete.")
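
If restoration stops with a `KeyError`, the anonymized CSV and `mapping.json` probably come from different runs, since each run generates new random IDs and a new key. A quick check like the hypothetical `find_unmapped_ids` below (standard library only) lists the IDs that have no usable mapping entry, so a mismatched file can be spotted before anything is decrypted.

```python
def find_unmapped_ids(anon_columns, mapping):
    """Return {column: sorted IDs with no usable mapping entry}, omitting clean columns.

    anon_columns: dict of column name -> iterable of anonymous IDs
    (for example {col: anon_df[col] for col in columns_to_anonymize}).
    mapping: the dict loaded from mapping.json.
    """
    problems = {}
    for col, ids in anon_columns.items():
        missing = sorted({
            anon_id
            for anon_id in ids
            # Unknown ID, or a composite entry that has no value for this column
            if anon_id not in mapping
            or (isinstance(mapping[anon_id], dict) and col not in mapping[anon_id])
        })
        if missing:
            problems[col] = missing
    return problems
```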

## 📥 Step 7: Download the restored file

In [None]:
files.download("restored.csv")
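
Uploaded and generated files stay on the runtime until it shuts down. If you would rather not leave the original data, key, and mapping there after downloading, you can delete them yourself. The sketch below uses only `os`; the file list is an assumption based on the names used in this notebook, so add the name of your original upload as needed. This is ordinary file deletion, not secure erasure.

```python
import os


def remove_working_files(paths):
    """Delete each path that exists as a regular file; return the paths actually removed."""
    removed = []
    for path in paths:
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed


# File names written by this notebook; add your original CSV's name too
working_files = ["anonymized.csv", "mapping.json", "key.key", "restored.csv"]
print("Removed:", remove_working_files(working_files))
```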