# Data Anonymization

## Introduction

In our increasingly data-centric world, organizations handle a wide variety of sensitive customer information, such as personal identifiers, financial records, and demographic variables. While this information provides valuable insights, it also raises significant privacy risks. **Data anonymization** addresses these risks by transforming personal or sensitive values in a dataset so that individuals can no longer be readily identified. The goal is to remove identifying information, both to protect user privacy and to comply with regulations such as the GDPR and CCPA, while preserving as much of the dataset's analytical utility as possible.

The dataset used for this case study, `mobile_customers (1).xlsx`, contains customer profiles with sensitive details such as postal addresses, email addresses, credit card information, and job roles. This report walks through the anonymization of this dataset step by step. The aim is to minimize the risk of exposing sensitive information while keeping the non-sensitive fields usable for analysis.

## Objectives and Tasks
The goal of this project is to:
1. **Understand and clean the dataset**: Identify sensitive columns and remove unnecessary ones.
2. **Anonymize personal identifiers**: Mask or replace direct identifiers such as usernames, emails, and names with pseudonymous, hashed, or fictitious values.
3. **Mask sensitive numeric data**: Implement data masking for sensitive numbers (e.g., credit card numbers).
4. **Introduce generalization**: Group numeric values like age and salary into meaningful categories for indirect anonymization.
5. **Add noise**: Slightly alter timestamps while retaining their chronological utility.
6. **Generate fictitious data**: Replace personal data such as names and addresses with fictitious but realistic values.
7. **Export data**: Save the anonymized dataset to ensure it can be safely shared for downstream analysis.

Below is a step-by-step description of what was performed and why.

### Importing Libraries

In [1]:
import openpyxl  # not called directly; pandas uses it as the engine for reading .xlsx files
import pandas as pd
import numpy as np
import hashlib
import random
import string


### 1. **Dataset Overview and Loading**
The dataset `mobile_customers (1).xlsx` was loaded into a pandas DataFrame. Initial inspection shows 18 data columns holding personal information such as dates of birth, email addresses, credit card numbers, residence information, and job details. There are also two unnamed columns (`Unnamed: 0.1` and `Unnamed: 0`) that appear to be row indexes left over from an earlier export.


In [2]:
data = pd.read_excel('mobile_customers (1).xlsx')
data.head()

Unnamed: 0.1,Unnamed: 0,customer_id,date_registered,username,name,gender,address,email,birthdate,current_location,residence,employer,job,age,salary,credit_card_provider,credit_card_number,credit_card_security_code,credit_card_expire
0,0,24c9d2d0-d0d3-4a90-9a3a-e00e4aac99bd,2021-09-29,robertsbryan,Jonathan Snyder,M,"24675 Susan Valley\nNorth Dianabury, MO 02475",marcus58@hotmail.com,1978-03-11,"['78.937112', '71.260464']","195 Brandi Junctions\nNew Julieberg, NE 63410","Byrd, Welch and Holt",Chief Technology Officer,49,53979,VISA 19 digit,38985874269846,994,2023-10-27 00:00:00
1,1,7b2bc220-0296-4914-ba46-d6cc6a55a62a,2019-08-17,egarcia,Susan Dominguez,F,"4212 Cheryl Inlet\nPort Davidmouth, NC 54884",alexanderkathy@hotmail.com,1970-11-29,"['-24.1692185', '100.746122']","58272 Brown Isle Apt. 698\nPort Michael, HI 04693",Hurst PLC,Data scientist,43,81510,Discover,6525743622515979,163,2023-07-30 00:00:00
2,2,06febdf9-07fb-4a1b-87d7-a5f97d9a5faf,2019-11-01,turnermegan,Corey Hebert,M,"07388 Coleman Prairie\nLake Amy, IA 78695",vwood@gmail.com,2009-04-23,"['8.019908', '-19.603269']","36848 Jones Lane Suite 282\nMarquezbury, ID 26822","Mora, Caldwell and Guerrero",Chief Operating Officer,47,205345,VISA 16 digit,4010729915028682247,634,2023-04-26 00:00:00
3,3,23df88e5-5dd3-46af-ac0d-0c6bd92e4b96,2021-12-31,richardcampbell,Latasha Griffin,F,"PSC 6217, Box 2610\nAPO AA 53585",kathleen36@gmail.com,1992-07-27,"['62.497506', '2.717198']","317 Lamb Cape Apt. 884\nLake Amy, DC 79074",Patel PLC,Counselling psychologist,34,116095,VISA 16 digit,4854862659569207844,7957,2023-10-31 00:00:00
4,4,6069c2d7-7905-4993-a155-64f6aba143b1,2020-08-09,timothyjackson,Colleen Wheeler,F,"0325 Potter Roads\nLake Lisashire, NM 77502",johnbest@hotmail.com,1989-09-16,"['73.7924695', '-80.314720']","21936 Mary Islands\nMendozafort, TN 37124",Smith-Mejia,Mining engineer,57,107529,JCB 16 digit,213152724828217,72,2023-05-28 00:00:00


### 2. **Data Cleaning**
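- `customer_id` (a UUID that directly identifies each customer) and `current_location` (apparently a latitude/longitude pair) were removed. Neither is needed for the analysis, and both would otherwise identify or locate individual customers.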


In [3]:
# Drop the direct identifier and the raw location coordinates
data = data.drop(['customer_id', 'current_location'], axis=1)
data.head()

Unnamed: 0.1,Unnamed: 0,date_registered,username,name,gender,address,email,birthdate,residence,employer,job,age,salary,credit_card_provider,credit_card_number,credit_card_security_code,credit_card_expire
0,0,2021-09-29,robertsbryan,Jonathan Snyder,M,"24675 Susan Valley\nNorth Dianabury, MO 02475",marcus58@hotmail.com,1978-03-11,"195 Brandi Junctions\nNew Julieberg, NE 63410","Byrd, Welch and Holt",Chief Technology Officer,49,53979,VISA 19 digit,38985874269846,994,2023-10-27 00:00:00
1,1,2019-08-17,egarcia,Susan Dominguez,F,"4212 Cheryl Inlet\nPort Davidmouth, NC 54884",alexanderkathy@hotmail.com,1970-11-29,"58272 Brown Isle Apt. 698\nPort Michael, HI 04693",Hurst PLC,Data scientist,43,81510,Discover,6525743622515979,163,2023-07-30 00:00:00
2,2,2019-11-01,turnermegan,Corey Hebert,M,"07388 Coleman Prairie\nLake Amy, IA 78695",vwood@gmail.com,2009-04-23,"36848 Jones Lane Suite 282\nMarquezbury, ID 26822","Mora, Caldwell and Guerrero",Chief Operating Officer,47,205345,VISA 16 digit,4010729915028682247,634,2023-04-26 00:00:00
3,3,2021-12-31,richardcampbell,Latasha Griffin,F,"PSC 6217, Box 2610\nAPO AA 53585",kathleen36@gmail.com,1992-07-27,"317 Lamb Cape Apt. 884\nLake Amy, DC 79074",Patel PLC,Counselling psychologist,34,116095,VISA 16 digit,4854862659569207844,7957,2023-10-31 00:00:00
4,4,2020-08-09,timothyjackson,Colleen Wheeler,F,"0325 Potter Roads\nLake Lisashire, NM 77502",johnbest@hotmail.com,1989-09-16,"21936 Mary Islands\nMendozafort, TN 37124",Smith-Mejia,Mining engineer,57,107529,JCB 16 digit,213152724828217,72,2023-05-28 00:00:00
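The output above still has the two unnamed index columns (`Unnamed: 0.1` and `Unnamed: 0`). Their values simply repeat the row number and contain no customer information. The sketch below drops every such column by name pattern, assuming they all share pandas' `Unnamed:` prefix. It runs on a toy frame rather than the real file, and the helper name `drop_unnamed_columns` is illustrative:

```python
import pandas as pd

def drop_unnamed_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return df without columns whose names start with 'Unnamed:'.

    pandas gives this name to header cells that were empty in the source
    file, which typically happens when a DataFrame was saved with its index.
    """
    keep = [col for col in df.columns if not str(col).startswith('Unnamed:')]
    return df[keep]

# Toy frame mimicking the layout of the real dataset (values are made up)
toy = pd.DataFrame({
    'Unnamed: 0.1': [0, 1],
    'Unnamed: 0': [0, 1],
    'username': ['xxxxryan', 'xxxxrcia'],
})
cleaned = drop_unnamed_columns(toy)
print(list(cleaned.columns))  # ['username']
```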


### 3. **Anonymizing Identifiers**
- **Usernames**: Each username was replaced with the string "xxxx" followed by its last four characters (e.g., `robertsbryan` → `xxxxryan`).
  
- **Emails**: Email addresses were partially masked, revealing only the first two characters and the last character before the domain (e.g., `example@gmail.com` → `ex***e@gmail.com`).


In [4]:
data['username'] = data['username'].apply(lambda x: "xxxx" + x[-4:])
data.head()

Unnamed: 0.1,Unnamed: 0,date_registered,username,name,gender,address,email,birthdate,residence,employer,job,age,salary,credit_card_provider,credit_card_number,credit_card_security_code,credit_card_expire
0,0,2021-09-29,xxxxryan,Jonathan Snyder,M,"24675 Susan Valley\nNorth Dianabury, MO 02475",marcus58@hotmail.com,1978-03-11,"195 Brandi Junctions\nNew Julieberg, NE 63410","Byrd, Welch and Holt",Chief Technology Officer,49,53979,VISA 19 digit,38985874269846,994,2023-10-27 00:00:00
1,1,2019-08-17,xxxxrcia,Susan Dominguez,F,"4212 Cheryl Inlet\nPort Davidmouth, NC 54884",alexanderkathy@hotmail.com,1970-11-29,"58272 Brown Isle Apt. 698\nPort Michael, HI 04693",Hurst PLC,Data scientist,43,81510,Discover,6525743622515979,163,2023-07-30 00:00:00
2,2,2019-11-01,xxxxegan,Corey Hebert,M,"07388 Coleman Prairie\nLake Amy, IA 78695",vwood@gmail.com,2009-04-23,"36848 Jones Lane Suite 282\nMarquezbury, ID 26822","Mora, Caldwell and Guerrero",Chief Operating Officer,47,205345,VISA 16 digit,4010729915028682247,634,2023-04-26 00:00:00
3,3,2021-12-31,xxxxbell,Latasha Griffin,F,"PSC 6217, Box 2610\nAPO AA 53585",kathleen36@gmail.com,1992-07-27,"317 Lamb Cape Apt. 884\nLake Amy, DC 79074",Patel PLC,Counselling psychologist,34,116095,VISA 16 digit,4854862659569207844,7957,2023-10-31 00:00:00
4,4,2020-08-09,xxxxkson,Colleen Wheeler,F,"0325 Potter Roads\nLake Lisashire, NM 77502",johnbest@hotmail.com,1989-09-16,"21936 Mary Islands\nMendozafort, TN 37124",Smith-Mejia,Mining engineer,57,107529,JCB 16 digit,213152724828217,72,2023-05-28 00:00:00
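Because only four characters survive, distinct usernames can collapse to the same masked value (for instance, any two usernames ending in `ryan`). This matters if the column is later used to link records. The sketch below measures the effect on a toy Series. The helper names are illustrative; on the real data, the same check would compare the `username` column before and after masking:

```python
import pandas as pd

def suffix_mask(username: str) -> str:
    # Same rule as the notebook: 'xxxx' followed by the last four characters
    return "xxxx" + username[-4:]

def count_collisions(original: pd.Series) -> int:
    """Number of distinct originals lost because they share a masked value."""
    masked = original.map(suffix_mask)
    return original.nunique() - masked.nunique()

usernames = pd.Series(['robertsbryan', 'bryan', 'egarcia', 'timothyjackson'])
print(count_collisions(usernames))  # 1: 'robertsbryan' and 'bryan' both become 'xxxxryan'
```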


In [5]:
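def mask_email(email):
    # Split on the last '@' so the domain is kept intact
    local, domain = email.rsplit('@', 1)
    # Keep the first two and the last character of the local part, e.g. example -> ex***e
    return local[:2] + '***' + local[-1] + '@' + domain

data['email'] = data['email'].apply(mask_email)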
data.head()

Unnamed: 0.1,Unnamed: 0,date_registered,username,name,gender,address,email,birthdate,residence,employer,job,age,salary,credit_card_provider,credit_card_number,credit_card_security_code,credit_card_expire
0,0,2021-09-29,xxxxryan,Jonathan Snyder,M,"24675 Susan Valley\nNorth Dianabury, MO 02475",ma***8@hotmail.com,1978-03-11,"195 Brandi Junctions\nNew Julieberg, NE 63410","Byrd, Welch and Holt",Chief Technology Officer,49,53979,VISA 19 digit,38985874269846,994,2023-10-27 00:00:00
1,1,2019-08-17,xxxxrcia,Susan Dominguez,F,"4212 Cheryl Inlet\nPort Davidmouth, NC 54884",al***y@hotmail.com,1970-11-29,"58272 Brown Isle Apt. 698\nPort Michael, HI 04693",Hurst PLC,Data scientist,43,81510,Discover,6525743622515979,163,2023-07-30 00:00:00
2,2,2019-11-01,xxxxegan,Corey Hebert,M,"07388 Coleman Prairie\nLake Amy, IA 78695",vw***d@gmail.com,2009-04-23,"36848 Jones Lane Suite 282\nMarquezbury, ID 26822","Mora, Caldwell and Guerrero",Chief Operating Officer,47,205345,VISA 16 digit,4010729915028682247,634,2023-04-26 00:00:00
3,3,2021-12-31,xxxxbell,Latasha Griffin,F,"PSC 6217, Box 2610\nAPO AA 53585",ka***6@gmail.com,1992-07-27,"317 Lamb Cape Apt. 884\nLake Amy, DC 79074",Patel PLC,Counselling psychologist,34,116095,VISA 16 digit,4854862659569207844,7957,2023-10-31 00:00:00
4,4,2020-08-09,xxxxkson,Colleen Wheeler,F,"0325 Potter Roads\nLake Lisashire, NM 77502",jo***t@hotmail.com,1989-09-16,"21936 Mary Islands\nMendozafort, TN 37124",Smith-Mejia,Mining engineer,57,107529,JCB 16 digit,213152724828217,72,2023-05-28 00:00:00
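`mask_email` assumes the local part is long enough to hide something. For a local part such as `ab`, it reveals every character (`ab***b@...`), and a value without an `@` raises an error. Below is a sketch of a more defensive variant. It assumes that keeping only the first character is acceptable for local parts of three characters or fewer. The name `mask_email_safe` is illustrative:

```python
def mask_email_safe(email: str) -> str:
    """Mask the local part of an email address, keeping the domain.

    Local parts longer than three characters keep their first two and last
    character (the notebook's rule); shorter ones keep only the first.
    """
    local, sep, domain = email.rpartition('@')
    if not sep:
        # No '@' at all: treat the whole value as sensitive
        return '***'
    if len(local) > 3:
        masked = local[:2] + '***' + local[-1]
    else:
        masked = local[:1] + '***'
    return masked + '@' + domain

for address in ['example@gmail.com', 'ab@hotmail.com', 'not-an-email']:
    print(address, '->', mask_email_safe(address))
```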


### 4. **Adding Noise to Sensitive Dates**
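- Zero-mean Gaussian noise with a standard deviation of one day (`mean=0`, `std=1`) was added to each `date_registered` and `birthdate` value. This shifts most dates by no more than a day or two, which obscures the exact dates while keeping their approximate chronology.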


In [6]:
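def add_noise(date, mean=0, std=1):
    # Shift a timestamp by a Gaussian offset measured in days (fractional days add a time of day)
    noise = np.random.normal(mean, std)
    return date + pd.Timedelta(days=noise)

# Series.apply passes each Timestamp to add_noise; this works on pandas versions without DataFrame.map
data['date_registered'] = data['date_registered'].apply(add_noise)
data['birthdate'] = data['birthdate'].apply(add_noise)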
data.head()

Unnamed: 0.1,Unnamed: 0,date_registered,username,name,gender,address,email,birthdate,residence,employer,job,age,salary,credit_card_provider,credit_card_number,credit_card_security_code,credit_card_expire
0,0,2021-09-29 16:02:51.910266180,xxxxryan,Jonathan Snyder,M,"24675 Susan Valley\nNorth Dianabury, MO 02475",ma***8@hotmail.com,1978-03-09 10:33:20.339336707,"195 Brandi Junctions\nNew Julieberg, NE 63410","Byrd, Welch and Holt",Chief Technology Officer,49,53979,VISA 19 digit,38985874269846,994,2023-10-27 00:00:00
1,1,2019-08-17 00:55:52.145228905,xxxxrcia,Susan Dominguez,F,"4212 Cheryl Inlet\nPort Davidmouth, NC 54884",al***y@hotmail.com,1970-11-28 23:39:11.800717110,"58272 Brown Isle Apt. 698\nPort Michael, HI 04693",Hurst PLC,Data scientist,43,81510,Discover,6525743622515979,163,2023-07-30 00:00:00
2,2,2019-10-30 11:51:05.626300584,xxxxegan,Corey Hebert,M,"07388 Coleman Prairie\nLake Amy, IA 78695",vw***d@gmail.com,2009-04-23 12:15:08.990432954,"36848 Jones Lane Suite 282\nMarquezbury, ID 26822","Mora, Caldwell and Guerrero",Chief Operating Officer,47,205345,VISA 16 digit,4010729915028682247,634,2023-04-26 00:00:00
3,3,2021-12-31 06:58:52.128024719,xxxxbell,Latasha Griffin,F,"PSC 6217, Box 2610\nAPO AA 53585",ka***6@gmail.com,1992-07-28 04:57:17.719330116,"317 Lamb Cape Apt. 884\nLake Amy, DC 79074",Patel PLC,Counselling psychologist,34,116095,VISA 16 digit,4854862659569207844,7957,2023-10-31 00:00:00
4,4,2020-08-09 01:09:37.538258296,xxxxkson,Colleen Wheeler,F,"0325 Potter Roads\nLake Lisashire, NM 77502",jo***t@hotmail.com,1989-09-18 03:20:12.101412079,"21936 Mary Islands\nMendozafort, TN 37124",Smith-Mejia,Mining engineer,57,107529,JCB 16 digit,213152724828217,72,2023-05-28 00:00:00
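With a standard deviation of one day, about 95% of the shifted dates land within ±2 days of the original. As the output shows, the fractional offset also adds arbitrary times of day. The vectorized sketch below draws all offsets at once from a seeded generator, which makes the result reproducible, and rounds them to whole days. `add_date_noise`, `std_days`, and `seed` are illustrative names and values, not parameters from the original analysis:

```python
import numpy as np
import pandas as pd

def add_date_noise(dates: pd.Series, std_days: float = 1.0, seed: int = 0) -> pd.Series:
    """Shift each datetime by a zero-mean Gaussian offset rounded to whole days."""
    # Local generator: reproducible, and independent of NumPy's global random state
    rng = np.random.default_rng(seed)
    offsets = np.round(rng.normal(loc=0.0, scale=std_days, size=len(dates)))
    deltas = pd.Series(pd.to_timedelta(offsets, unit='D'), index=dates.index)
    return dates + deltas

dates = pd.Series(pd.to_datetime(['2021-09-29', '2019-08-17', '2019-11-01']))
noisy = add_date_noise(dates, std_days=3.0, seed=42)
print(noisy)
```

A larger `std_days` gives stronger protection at the cost of chronological accuracy. Rounding keeps the values as plain dates.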


### 5. **Binning Numerical Columns**
- **Ages**:
  - Original numeric ages were grouped into defined bins representing specific age ranges (e.g., '18-25', '26-30').
 
- **Salaries**:
  - Salaries were grouped into predefined bins representing salary ranges (e.g., '<30000', '30001-40000').

Both columns were converted into categorical variables.


In [7]:
data[['salary','age']].describe()

| | salary | age |
|---|---|---|
| count | 10000.0 | 10000.0 |
| mean | 133657.8028 | 41.618 |
| std | 64635.054802 | 13.858812 |
| min | 20014.0 | 18.0 |
| 25% | 77959.5 | 30.0 |
| 50% | 133583.5 | 42.0 |
| 75% | 189975.5 | 54.0 |
| max | 244926.0 | 65.0 |


In [8]:
age_range = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]
age_labels = ['18-25', '26-30', '31-35', '36-40', '41-45', '46-50', '51-55', '56-60', '61-65']
data['age'] = pd.cut(data['age'], bins=age_range, labels=age_labels, include_lowest=True)  # include age 18 (the minimum) in '18-25'

salary_range = [20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 110000, 120000, 150000, 200000, 245000]
salary_labels = ['<30000', '30001-40000', '40001-50000', '50001-60000', '60001-70000', '70001-80000', '80001-90000', '90001-100000', '100000-110000', '110000-120000', '120000-150000', '150000-200000', ">200000"]
data['salary'] = pd.cut(data['salary'], bins=salary_range, labels=salary_labels)

data

Unnamed: 0.1,Unnamed: 0,date_registered,username,name,gender,address,email,birthdate,residence,employer,job,age,salary,credit_card_provider,credit_card_number,credit_card_security_code,credit_card_expire
0,0,2021-09-29 16:02:51.910266180,xxxxryan,Jonathan Snyder,M,"24675 Susan Valley\nNorth Dianabury, MO 02475",ma***8@hotmail.com,1978-03-09 10:33:20.339336707,"195 Brandi Junctions\nNew Julieberg, NE 63410","Byrd, Welch and Holt",Chief Technology Officer,46-50,50001-60000,VISA 19 digit,38985874269846,994,2023-10-27 00:00:00
1,1,2019-08-17 00:55:52.145228905,xxxxrcia,Susan Dominguez,F,"4212 Cheryl Inlet\nPort Davidmouth, NC 54884",al***y@hotmail.com,1970-11-28 23:39:11.800717110,"58272 Brown Isle Apt. 698\nPort Michael, HI 04693",Hurst PLC,Data scientist,41-45,80001-90000,Discover,6525743622515979,163,2023-07-30 00:00:00
2,2,2019-10-30 11:51:05.626300584,xxxxegan,Corey Hebert,M,"07388 Coleman Prairie\nLake Amy, IA 78695",vw***d@gmail.com,2009-04-23 12:15:08.990432954,"36848 Jones Lane Suite 282\nMarquezbury, ID 26822","Mora, Caldwell and Guerrero",Chief Operating Officer,46-50,>200000,VISA 16 digit,4010729915028682247,634,2023-04-26 00:00:00
3,3,2021-12-31 06:58:52.128024719,xxxxbell,Latasha Griffin,F,"PSC 6217, Box 2610\nAPO AA 53585",ka***6@gmail.com,1992-07-28 04:57:17.719330116,"317 Lamb Cape Apt. 884\nLake Amy, DC 79074",Patel PLC,Counselling psychologist,31-35,110000-120000,VISA 16 digit,4854862659569207844,7957,2023-10-31 00:00:00
4,4,2020-08-09 01:09:37.538258296,xxxxkson,Colleen Wheeler,F,"0325 Potter Roads\nLake Lisashire, NM 77502",jo***t@hotmail.com,1989-09-18 03:20:12.101412079,"21936 Mary Islands\nMendozafort, TN 37124",Smith-Mejia,Mining engineer,56-60,100000-110000,JCB 16 digit,213152724828217,72,2023-05-28 00:00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9995,2021-09-11 21:25:57.781292318,xxxxrick,Courtney Li,F,Unit 5088 Box 8191\nDPO AA 18373,jl***n@yahoo.com,1975-07-13 10:39:06.918666491,"53290 Jessica Pike Suite 808\nNorth Patricia, ...",Palmer Inc,Press photographer,56-60,>200000,Diners Club / Carte Blanche,4522452929113284,448,2023-06-27 00:00:00
9996,9996,2020-08-13 09:38:18.046859497,xxxxinez,Terri Hawkins,F,"38388 Joseph Drive Apt. 442\nPort Thomas, NH 1...",re***z@gmail.com,1940-10-19 09:47:02.910630178,"21271 Anthony Ports Apt. 438\nButlerport, ND 7...","Adams, Mckee and Maldonado",Phytotherapist,31-35,<30000,VISA 13 digit,4647267952761431,227,2023-12-31 00:00:00
9997,9997,2019-10-23 18:25:06.030813853,xxxxelps,Crystal Patterson,F,"4880 Newman Square\nColeberg, AK 41248",br***a@gmail.com,1994-12-25 09:36:45.398627243,"7949 Simmons Cape\nNew Glendaborough, NJ 90980",Fernandez Group,Chartered accountant,18-25,>200000,Discover,2286190388812018,877,2023-06-24 00:00:00
9998,9998,2020-08-14 07:06:51.864800998,xxxxavis,Jeff Edwards,M,"68770 Wright Plaza\nStevemouth, NM 11812",wr***d@gmail.com,1961-03-07 05:15:13.243782645,"3421 Katherine Wall\nEast Markside, NE 95730","Warner, Munoz and Franklin",Oceanographer,41-45,>200000,American Express,3582143232994788,92,2023-04-28 00:00:00
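By default, `pd.cut` builds right-closed intervals, so the first bin `(18, 25]` excludes 18 itself. `describe()` reports a minimum age of 18.0, so customers aged exactly 18 receive `NaN` instead of a label unless `include_lowest=True` is passed. The check below shows the effect on toy values:

```python
import pandas as pd

bins = [18, 25, 30]
labels = ['18-25', '26-30']
ages = pd.Series([18, 25, 26, 30])

default_cut = pd.cut(ages, bins=bins, labels=labels)
inclusive_cut = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

print(default_cut.isna().tolist())    # [True, False, False, False]: 18 falls outside every bin
print(inclusive_cut.isna().tolist())  # [False, False, False, False]
```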


### 6. **Generate Tokens for Anonymous Fields**
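Hash-based tokens were generated to replace sensitive fields:
- Each value in `credit_card_provider`, `credit_card_expire`, `job`, and `employer` was prefixed with a freshly generated 10-character random salt and hashed with SHA-256. The salt is discarded, so the tokens cannot be reversed. Because every cell gets its own salt, however, identical values (e.g., two customers with the same job) receive different tokens, so these columns no longer support grouping or counting.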


In [9]:
def generate_token(value):
    # A new 10-character salt per call: tokens are irreversible but differ even for equal inputs
    salt = ''.join(random.choices(string.ascii_uppercase + string.digits, k=10))
    token = hashlib.sha256((salt + str(value)).encode()).hexdigest()
    return token

for col in ['credit_card_provider', 'credit_card_expire', 'job', 'employer']:
    data[col] = data[col].apply(generate_token)
data

Unnamed: 0.1,Unnamed: 0,date_registered,username,name,gender,address,email,birthdate,residence,employer,job,age,salary,credit_card_provider,credit_card_number,credit_card_security_code,credit_card_expire
0,0,2021-09-29 16:02:51.910266180,xxxxryan,Jonathan Snyder,M,"24675 Susan Valley\nNorth Dianabury, MO 02475",ma***8@hotmail.com,1978-03-09 10:33:20.339336707,"195 Brandi Junctions\nNew Julieberg, NE 63410",fa474989ab3cbb8b3e5578fb639ecb275566be748717cf...,780ef1cddab0acc07c1861142f4e0d2557b4a8b7713719...,46-50,50001-60000,309e509b32f8f0cf45c1c5084b68aff8d4475d596ff93e...,38985874269846,994,bfa5db102a184db6afff6c5c614f38bd77855244a43a0d...
1,1,2019-08-17 00:55:52.145228905,xxxxrcia,Susan Dominguez,F,"4212 Cheryl Inlet\nPort Davidmouth, NC 54884",al***y@hotmail.com,1970-11-28 23:39:11.800717110,"58272 Brown Isle Apt. 698\nPort Michael, HI 04693",b0328276a35af46ff84b19662162d17b1f9fe5d14f9be2...,8ec51f27df97d71e25e893d7c28b5bb0eb69d643be2c5a...,41-45,80001-90000,f4409b815648beed14eb18d269c1128378dcea42edd27b...,6525743622515979,163,fecac3b31b566ae64a5fe0c5e8cf5da6178d76ad5abf0b...
2,2,2019-10-30 11:51:05.626300584,xxxxegan,Corey Hebert,M,"07388 Coleman Prairie\nLake Amy, IA 78695",vw***d@gmail.com,2009-04-23 12:15:08.990432954,"36848 Jones Lane Suite 282\nMarquezbury, ID 26822",48e7a3ce26a0741766409002bfbf507c5f64e63664d055...,a56f2edec7eaf4f3ed21487c275e164327bea886d4bc68...,46-50,>200000,3f6c7a72ea2c7e24d92042dcbb00c0f535431581904a8c...,4010729915028682247,634,ec2b9fe5b83d5931730562b45a396a115e3248f41b07f7...
3,3,2021-12-31 06:58:52.128024719,xxxxbell,Latasha Griffin,F,"PSC 6217, Box 2610\nAPO AA 53585",ka***6@gmail.com,1992-07-28 04:57:17.719330116,"317 Lamb Cape Apt. 884\nLake Amy, DC 79074",8a6b93a1708b7ac2e732d9702a4b7513a044533806a27d...,f856a6f9e8ee26bf4454d998f898460c8c59224a5edd4f...,31-35,110000-120000,4346a0e6f53e47181e32bc4791711e0241d7eb17bb9eb9...,4854862659569207844,7957,a77c43ed9488f42d69c3b6f0c8766089d344478a9ba3ed...
4,4,2020-08-09 01:09:37.538258296,xxxxkson,Colleen Wheeler,F,"0325 Potter Roads\nLake Lisashire, NM 77502",jo***t@hotmail.com,1989-09-18 03:20:12.101412079,"21936 Mary Islands\nMendozafort, TN 37124",7559b5c39bd50b4b8a7a1bc0402694081d359d20979447...,bdc3983a4aa8746ee5bc2762c6722faa926ae06e079f3d...,56-60,100000-110000,a72c8d53d74f55014f43737a955aa7c44eb1d77f8bb456...,213152724828217,72,3f815be9492cf03a39884dd8711af289dceea4880b6912...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9995,2021-09-11 21:25:57.781292318,xxxxrick,Courtney Li,F,Unit 5088 Box 8191\nDPO AA 18373,jl***n@yahoo.com,1975-07-13 10:39:06.918666491,"53290 Jessica Pike Suite 808\nNorth Patricia, ...",b938372c57043c3a59fdf0de13f476f6067f5bc3e492be...,af11adeabc5e0e25fabbbb9f6608bba4db41c1203e76fe...,56-60,>200000,ff53a793d9b84e8313f686b8e3cb4d7b9463e9cad14bed...,4522452929113284,448,89153eb5916ccb74ded5022bb94a86d51affa7bd2033c4...
9996,9996,2020-08-13 09:38:18.046859497,xxxxinez,Terri Hawkins,F,"38388 Joseph Drive Apt. 442\nPort Thomas, NH 1...",re***z@gmail.com,1940-10-19 09:47:02.910630178,"21271 Anthony Ports Apt. 438\nButlerport, ND 7...",b98dbaefcf87a986a968e01976a038098fc979b556f865...,89ab56b603812ae51a660de6ff4f60d4019d8a28f18f99...,31-35,<30000,1273f1eb6335e0e36690c85768be0ca5ba74ed76b6c339...,4647267952761431,227,20e47399d1180c957a08658ee96b70d30c98508a5d8028...
9997,9997,2019-10-23 18:25:06.030813853,xxxxelps,Crystal Patterson,F,"4880 Newman Square\nColeberg, AK 41248",br***a@gmail.com,1994-12-25 09:36:45.398627243,"7949 Simmons Cape\nNew Glendaborough, NJ 90980",484369430816ddc7967c0a66ee63b7c6fe68598c065e7b...,afe5939172ea55d43f7a4d1ac1a16d827fb45fe1f45a1f...,18-25,>200000,c77f20647df43e260f67210634a11ff6a2457461346490...,2286190388812018,877,dd541b921d3c3c6ae3ba6896457b0a602fc1fd15cd0bad...
9998,9998,2020-08-14 07:06:51.864800998,xxxxavis,Jeff Edwards,M,"68770 Wright Plaza\nStevemouth, NM 11812",wr***d@gmail.com,1961-03-07 05:15:13.243782645,"3421 Katherine Wall\nEast Markside, NE 95730",42b4fd7d24b6f4321f9720c80565452cfd422d45b2e815...,0679d0cdf79e1dc10d5cc60afa13551418ab12421c0395...,41-45,>200000,9212f64ab05fe06173339f56ddd952a509b84bf0365e21...,3582143232994788,92,088b75bdab822bdd07a06ae0d20fc3f97ac4847a0ce9c3...
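If analysts need to group or count by these fields, a keyed hash can give each distinct value one stable token. The sketch below uses HMAC-SHA256 with a single secret key stored separately from the shared data. This is a swapped-in technique, not what the notebook does. The helper name `keyed_token` and the use of `secrets.token_bytes` for key generation are illustrative assumptions. Anyone who obtains the key can still reverse low-cardinality fields such as `credit_card_provider` by hashing every candidate value.

```python
import hashlib
import hmac
import secrets

import pandas as pd

def keyed_token(value, key: bytes) -> str:
    """Deterministic pseudonym: equal inputs give equal tokens under the same key."""
    return hmac.new(key, str(value).encode('utf-8'), hashlib.sha256).hexdigest()

# One secret key for the whole export; keep it out of the shared dataset
key = secrets.token_bytes(32)

jobs = pd.Series(['Data scientist', 'Mining engineer', 'Data scientist'])
tokens = jobs.apply(keyed_token, key=key)
print(tokens.nunique())  # 2: both 'Data scientist' rows share one token
```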


### 7. **Masking Credit Card Details**
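- **Credit Card Numbers**: Each number was reduced to eight digits: four pseudo-random digits followed by the original last four (e.g., `38985874269846` became `54079846`). The four prefix digits come from a generator seeded with the last four digits, so the same card always yields the same masked value. Because the prefix is derived entirely from those digits, it adds no information beyond them. The original length is not preserved.
- **Security Codes**: The same function was applied to security codes. These codes are at most four digits long, so `number[-4:]` keeps the entire original code. This step therefore only prepends four digits and does not conceal the value (e.g., `994` became `8240994`).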


In [10]:
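def mask_numbers(number):
    # Seed a local generator with the last four digits so identical inputs give identical output.
    # RandomState(seed) reproduces the draws of np.random.seed(seed) without resetting NumPy's global RNG.
    rng = np.random.RandomState(int(number[-4:]))
    mask = ''.join(rng.choice(list(string.digits), size=4, replace=False))
    return mask + number[-4:]

# Convert the integers read from Excel to strings before masking
data['credit_card_number'] = data['credit_card_number'].astype(str).apply(mask_numbers)
data['credit_card_security_code'] = data['credit_card_security_code'].astype(str).apply(mask_numbers)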
data

Unnamed: 0.1,Unnamed: 0,date_registered,username,name,gender,address,email,birthdate,residence,employer,job,age,salary,credit_card_provider,credit_card_number,credit_card_security_code,credit_card_expire
0,0,2021-09-29 16:02:51.910266180,xxxxryan,Jonathan Snyder,M,"24675 Susan Valley\nNorth Dianabury, MO 02475",ma***8@hotmail.com,1978-03-09 10:33:20.339336707,"195 Brandi Junctions\nNew Julieberg, NE 63410",fa474989ab3cbb8b3e5578fb639ecb275566be748717cf...,780ef1cddab0acc07c1861142f4e0d2557b4a8b7713719...,46-50,50001-60000,309e509b32f8f0cf45c1c5084b68aff8d4475d596ff93e...,54079846,8240994,bfa5db102a184db6afff6c5c614f38bd77855244a43a0d...
1,1,2019-08-17 00:55:52.145228905,xxxxrcia,Susan Dominguez,F,"4212 Cheryl Inlet\nPort Davidmouth, NC 54884",al***y@hotmail.com,1970-11-28 23:39:11.800717110,"58272 Brown Isle Apt. 698\nPort Michael, HI 04693",b0328276a35af46ff84b19662162d17b1f9fe5d14f9be2...,8ec51f27df97d71e25e893d7c28b5bb0eb69d643be2c5a...,41-45,80001-90000,f4409b815648beed14eb18d269c1128378dcea42edd27b...,89455979,9506163,fecac3b31b566ae64a5fe0c5e8cf5da6178d76ad5abf0b...
2,2,2019-10-30 11:51:05.626300584,xxxxegan,Corey Hebert,M,"07388 Coleman Prairie\nLake Amy, IA 78695",vw***d@gmail.com,2009-04-23 12:15:08.990432954,"36848 Jones Lane Suite 282\nMarquezbury, ID 26822",48e7a3ce26a0741766409002bfbf507c5f64e63664d055...,a56f2edec7eaf4f3ed21487c275e164327bea886d4bc68...,46-50,>200000,3f6c7a72ea2c7e24d92042dcbb00c0f535431581904a8c...,75422247,7618634,ec2b9fe5b83d5931730562b45a396a115e3248f41b07f7...
3,3,2021-12-31 06:58:52.128024719,xxxxbell,Latasha Griffin,F,"PSC 6217, Box 2610\nAPO AA 53585",ka***6@gmail.com,1992-07-28 04:57:17.719330116,"317 Lamb Cape Apt. 884\nLake Amy, DC 79074",8a6b93a1708b7ac2e732d9702a4b7513a044533806a27d...,f856a6f9e8ee26bf4454d998f898460c8c59224a5edd4f...,31-35,110000-120000,4346a0e6f53e47181e32bc4791711e0241d7eb17bb9eb9...,56147844,29477957,a77c43ed9488f42d69c3b6f0c8766089d344478a9ba3ed...
4,4,2020-08-09 01:09:37.538258296,xxxxkson,Colleen Wheeler,F,"0325 Potter Roads\nLake Lisashire, NM 77502",jo***t@hotmail.com,1989-09-18 03:20:12.101412079,"21936 Mary Islands\nMendozafort, TN 37124",7559b5c39bd50b4b8a7a1bc0402694081d359d20979447...,bdc3983a4aa8746ee5bc2762c6722faa926ae06e079f3d...,56-60,100000-110000,a72c8d53d74f55014f43737a955aa7c44eb1d77f8bb456...,78948217,041972,3f815be9492cf03a39884dd8711af289dceea4880b6912...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9995,2021-09-11 21:25:57.781292318,xxxxrick,Courtney Li,F,Unit 5088 Box 8191\nDPO AA 18373,jl***n@yahoo.com,1975-07-13 10:39:06.918666491,"53290 Jessica Pike Suite 808\nNorth Patricia, ...",b938372c57043c3a59fdf0de13f476f6067f5bc3e492be...,af11adeabc5e0e25fabbbb9f6608bba4db41c1203e76fe...,56-60,>200000,ff53a793d9b84e8313f686b8e3cb4d7b9463e9cad14bed...,54203284,3254448,89153eb5916ccb74ded5022bb94a86d51affa7bd2033c4...
9996,9996,2020-08-13 09:38:18.046859497,xxxxinez,Terri Hawkins,F,"38388 Joseph Drive Apt. 442\nPort Thomas, NH 1...",re***z@gmail.com,1940-10-19 09:47:02.910630178,"21271 Anthony Ports Apt. 438\nButlerport, ND 7...",b98dbaefcf87a986a968e01976a038098fc979b556f865...,89ab56b603812ae51a660de6ff4f60d4019d8a28f18f99...,31-35,<30000,1273f1eb6335e0e36690c85768be0ca5ba74ed76b6c339...,12871431,6157227,20e47399d1180c957a08658ee96b70d30c98508a5d8028...
9997,9997,2019-10-23 18:25:06.030813853,xxxxelps,Crystal Patterson,F,"4880 Newman Square\nColeberg, AK 41248",br***a@gmail.com,1994-12-25 09:36:45.398627243,"7949 Simmons Cape\nNew Glendaborough, NJ 90980",484369430816ddc7967c0a66ee63b7c6fe68598c065e7b...,afe5939172ea55d43f7a4d1ac1a16d827fb45fe1f45a1f...,18-25,>200000,c77f20647df43e260f67210634a11ff6a2457461346490...,03782018,7628877,dd541b921d3c3c6ae3ba6896457b0a602fc1fd15cd0bad...
9998,9998,2020-08-14 07:06:51.864800998,xxxxavis,Jeff Edwards,M,"68770 Wright Plaza\nStevemouth, NM 11812",wr***d@gmail.com,1961-03-07 05:15:13.243782645,"3421 Katherine Wall\nEast Markside, NE 95730",42b4fd7d24b6f4321f9720c80565452cfd422d45b2e815...,0679d0cdf79e1dc10d5cc60afa13551418ab12421c0395...,41-45,>200000,9212f64ab05fe06173339f56ddd952a509b84bf0365e21...,10964788,705192,088b75bdab822bdd07a06ae0d20fc3f97ac4847a0ce9c3...
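The sketch below shows a more conventional alternative. It keeps the original length and replaces every digit except the last four with a fixed character. It also removes the security code entirely, since keeping any digits of a three- or four-digit code exposes most of it, and the code is unlikely to be needed for analysis. The function name `mask_card_number` and the `'X'` mask character are illustrative choices:

```python
import pandas as pd

def mask_card_number(number, visible: int = 4, mask_char: str = 'X') -> str:
    """Keep the last `visible` digits and replace the rest, preserving length."""
    digits = str(number)
    if len(digits) <= visible:
        # Too short to reveal a suffix safely: mask everything
        return mask_char * len(digits)
    return mask_char * (len(digits) - visible) + digits[-visible:]

cards = pd.DataFrame({
    'credit_card_number': [4010729915028682247, 6525743622515979],
    'credit_card_security_code': [634, 163],
})
cards['credit_card_number'] = cards['credit_card_number'].apply(mask_card_number)
# Drop the security code column rather than masking it
cards = cards.drop(columns=['credit_card_security_code'])
print(cards)
```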


### 8. **Fictitious Personal Information**
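- **Names**: Each name was replaced with a first name and a last name drawn at random from two hard-coded lists of eight entries each (`John`, `Jane`, etc. and `Doe`, `Smith`, etc.), giving 64 possible combinations. The draw ignores both the original value and the `gender` column, so a first name may not match the recorded gender (e.g., `Carol Harris` for a customer marked `M`).
- **Residence and Address**: Each value was replaced with one of eight fixed, fictitious entries. The two columns are drawn at random and independently of each other.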
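Replacing addresses with fictitious values removes all geographic signal. An alternative in the spirit of the binning step is to generalize each address to its two-letter state or region code. The sketch below assumes that addresses end in a two-letter code followed by a five-digit ZIP code, as in the sample rows shown earlier (including military addresses such as `APO AA 53585`). The regular expression and the helper name `address_to_state` are assumptions, and values that don't match become missing:

```python
import pandas as pd

# Two capital letters, whitespace, and a 5-digit ZIP code at the very end of the string
STATE_PATTERN = r'\b([A-Z]{2})\s+\d{5}$'

def address_to_state(addresses: pd.Series) -> pd.Series:
    """Generalize full postal addresses to their trailing state/region code."""
    return addresses.str.extract(STATE_PATTERN, expand=False)

sample = pd.Series([
    '24675 Susan Valley\nNorth Dianabury, MO 02475',
    'PSC 6217, Box 2610\nAPO AA 53585',
    'no recognizable state',
])
print(address_to_state(sample).tolist())
```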


In [11]:
# Define sample first names and last names lists
first_names = ['John', 'Jane', 'Alice', 'Bob', 'Carol', 'Eve', 'Frank', 'Grace']
last_names = ['Doe', 'Smith', 'Johnson', 'Brown', 'Taylor', 'Anderson', 'Harris', 'Clark']

def random_name(name):
    first_name = random.choice(first_names)
    last_name = random.choice(last_names)
    return first_name + ' ' + last_name

# Replace every name; the original value is ignored
data['name'] = data['name'].apply(random_name)
data

Unnamed: 0.1,Unnamed: 0,date_registered,username,name,gender,address,email,birthdate,residence,employer,job,age,salary,credit_card_provider,credit_card_number,credit_card_security_code,credit_card_expire
0,0,2021-09-29 16:02:51.910266180,xxxxryan,Carol Harris,M,"24675 Susan Valley\nNorth Dianabury, MO 02475",ma***8@hotmail.com,1978-03-09 10:33:20.339336707,"195 Brandi Junctions\nNew Julieberg, NE 63410",fa474989ab3cbb8b3e5578fb639ecb275566be748717cf...,780ef1cddab0acc07c1861142f4e0d2557b4a8b7713719...,46-50,50001-60000,309e509b32f8f0cf45c1c5084b68aff8d4475d596ff93e...,54079846,8240994,bfa5db102a184db6afff6c5c614f38bd77855244a43a0d...
1,1,2019-08-17 00:55:52.145228905,xxxxrcia,Grace Doe,F,"4212 Cheryl Inlet\nPort Davidmouth, NC 54884",al***y@hotmail.com,1970-11-28 23:39:11.800717110,"58272 Brown Isle Apt. 698\nPort Michael, HI 04693",b0328276a35af46ff84b19662162d17b1f9fe5d14f9be2...,8ec51f27df97d71e25e893d7c28b5bb0eb69d643be2c5a...,41-45,80001-90000,f4409b815648beed14eb18d269c1128378dcea42edd27b...,89455979,9506163,fecac3b31b566ae64a5fe0c5e8cf5da6178d76ad5abf0b...
2,2,2019-10-30 11:51:05.626300584,xxxxegan,Carol Doe,M,"07388 Coleman Prairie\nLake Amy, IA 78695",vw***d@gmail.com,2009-04-23 12:15:08.990432954,"36848 Jones Lane Suite 282\nMarquezbury, ID 26822",48e7a3ce26a0741766409002bfbf507c5f64e63664d055...,a56f2edec7eaf4f3ed21487c275e164327bea886d4bc68...,46-50,>200000,3f6c7a72ea2c7e24d92042dcbb00c0f535431581904a8c...,75422247,7618634,ec2b9fe5b83d5931730562b45a396a115e3248f41b07f7...
3,3,2021-12-31 06:58:52.128024719,xxxxbell,Bob Clark,F,"PSC 6217, Box 2610\nAPO AA 53585",ka***6@gmail.com,1992-07-28 04:57:17.719330116,"317 Lamb Cape Apt. 884\nLake Amy, DC 79074",8a6b93a1708b7ac2e732d9702a4b7513a044533806a27d...,f856a6f9e8ee26bf4454d998f898460c8c59224a5edd4f...,31-35,110000-120000,4346a0e6f53e47181e32bc4791711e0241d7eb17bb9eb9...,56147844,29477957,a77c43ed9488f42d69c3b6f0c8766089d344478a9ba3ed...
4,4,2020-08-09 01:09:37.538258296,xxxxkson,Carol Clark,F,"0325 Potter Roads\nLake Lisashire, NM 77502",jo***t@hotmail.com,1989-09-18 03:20:12.101412079,"21936 Mary Islands\nMendozafort, TN 37124",7559b5c39bd50b4b8a7a1bc0402694081d359d20979447...,bdc3983a4aa8746ee5bc2762c6722faa926ae06e079f3d...,56-60,100000-110000,a72c8d53d74f55014f43737a955aa7c44eb1d77f8bb456...,78948217,041972,3f815be9492cf03a39884dd8711af289dceea4880b6912...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9995,2021-09-11 21:25:57.781292318,xxxxrick,Bob Johnson,F,Unit 5088 Box 8191\nDPO AA 18373,jl***n@yahoo.com,1975-07-13 10:39:06.918666491,"53290 Jessica Pike Suite 808\nNorth Patricia, ...",b938372c57043c3a59fdf0de13f476f6067f5bc3e492be...,af11adeabc5e0e25fabbbb9f6608bba4db41c1203e76fe...,56-60,>200000,ff53a793d9b84e8313f686b8e3cb4d7b9463e9cad14bed...,54203284,3254448,89153eb5916ccb74ded5022bb94a86d51affa7bd2033c4...
9996,9996,2020-08-13 09:38:18.046859497,xxxxinez,Bob Brown,F,"38388 Joseph Drive Apt. 442\nPort Thomas, NH 1...",re***z@gmail.com,1940-10-19 09:47:02.910630178,"21271 Anthony Ports Apt. 438\nButlerport, ND 7...",b98dbaefcf87a986a968e01976a038098fc979b556f865...,89ab56b603812ae51a660de6ff4f60d4019d8a28f18f99...,31-35,<30000,1273f1eb6335e0e36690c85768be0ca5ba74ed76b6c339...,12871431,6157227,20e47399d1180c957a08658ee96b70d30c98508a5d8028...
9997,9997,2019-10-23 18:25:06.030813853,xxxxelps,Alice Harris,F,"4880 Newman Square\nColeberg, AK 41248",br***a@gmail.com,1994-12-25 09:36:45.398627243,"7949 Simmons Cape\nNew Glendaborough, NJ 90980",484369430816ddc7967c0a66ee63b7c6fe68598c065e7b...,afe5939172ea55d43f7a4d1ac1a16d827fb45fe1f45a1f...,18-25,>200000,c77f20647df43e260f67210634a11ff6a2457461346490...,03782018,7628877,dd541b921d3c3c6ae3ba6896457b0a602fc1fd15cd0bad...
9998,9998,2020-08-14 07:06:51.864800998,xxxxavis,Frank Johnson,M,"68770 Wright Plaza\nStevemouth, NM 11812",wr***d@gmail.com,1961-03-07 05:15:13.243782645,"3421 Katherine Wall\nEast Markside, NE 95730",42b4fd7d24b6f4321f9720c80565452cfd422d45b2e815...,0679d0cdf79e1dc10d5cc60afa13551418ab12421c0395...,41-45,>200000,9212f64ab05fe06173339f56ddd952a509b84bf0365e21...,10964788,705192,088b75bdab822bdd07a06ae0d20fc3f97ac4847a0ce9c3...
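Two refinements help if the fake names should look plausible next to the other columns. First, pick the first name from a list that matches the `gender` value. Second, use a seeded generator so the substitution is reproducible. The sketch below applies both. The name lists reuse the notebook's names, split by gender, and the `M`/`F` coding follows the sample rows. The function name `fake_names` and the seed are illustrative:

```python
import random

import pandas as pd

FIRST_NAMES = {
    'M': ['John', 'Bob', 'Frank'],
    'F': ['Jane', 'Alice', 'Carol', 'Eve', 'Grace'],
}
LAST_NAMES = ['Doe', 'Smith', 'Johnson', 'Brown', 'Taylor', 'Anderson', 'Harris', 'Clark']
ALL_FIRST = FIRST_NAMES['M'] + FIRST_NAMES['F']

def fake_names(gender: pd.Series, seed: int = 0) -> pd.Series:
    """Draw a fictitious full name per row, matching the first name to gender when known."""
    rng = random.Random(seed)  # private generator: reproducible, leaves the global state alone
    names = [
        rng.choice(FIRST_NAMES.get(g, ALL_FIRST)) + ' ' + rng.choice(LAST_NAMES)
        for g in gender
    ]
    return pd.Series(names, index=gender.index)

genders = pd.Series(['M', 'F', 'F', 'M'])
print(fake_names(genders, seed=1).tolist())
```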


In [12]:
import random

# Define fictitious residence and address lists
residences = [
    '123 Main St, Springfield, IL',
    '456 Elm St, Oakville, CA',
    '789 Maple St, Rivertown, NY',
    '321 Pine St, Lakeview, TX',
    '654 Cedar St, Mountainview, CO',
    '987 Birch St, Sunnyvale, FL',
    '210 Wood St, Greenfield, WI',
    '305 Ash St, Riverbend, WA'
]

addresses = [
    'Suite 101, 123 Main St, Springfield, IL',
    'Apt 14B, 456 Elm St, Oakville, CA',
    'Unit 5C, 789 Maple St, Rivertown, NY',
    'Building 3, 321 Pine St, Lakeview, TX',
    'Room 12, 654 Cedar St, Mountainview, CO',
    'Floor 7, 987 Birch St, Sunnyvale, FL',
    'Suite 20, 210 Wood St, Greenfield, WI',
    'Penthouse, 305 Ash St, Riverbend, WA'
]

def random_residence(_original):
    """Return a random fictitious residence; the original value is ignored."""
    return random.choice(residences)

def random_address(_original):
    """Return a random fictitious address; the original value is ignored."""
    return random.choice(addresses)

# Replace every original value with an independently drawn fictitious one
data['residence'] = data['residence'].apply(random_residence)
data['address'] = data['address'].apply(random_address)
data

Unnamed: 0.1,Unnamed: 0,date_registered,username,name,gender,address,email,birthdate,residence,employer,job,age,salary,credit_card_provider,credit_card_number,credit_card_security_code,credit_card_expire
0,0,2021-09-29 16:02:51.910266180,xxxxryan,Carol Harris,M,"Penthouse, 305 Ash St, Riverbend, WA",ma***8@hotmail.com,1978-03-09 10:33:20.339336707,"123 Main St, Springfield, IL",fa474989ab3cbb8b3e5578fb639ecb275566be748717cf...,780ef1cddab0acc07c1861142f4e0d2557b4a8b7713719...,46-50,50001-60000,309e509b32f8f0cf45c1c5084b68aff8d4475d596ff93e...,54079846,8240994,bfa5db102a184db6afff6c5c614f38bd77855244a43a0d...
1,1,2019-08-17 00:55:52.145228905,xxxxrcia,Grace Doe,F,"Building 3, 321 Pine St, Lakeview, TX",al***y@hotmail.com,1970-11-28 23:39:11.800717110,"210 Wood St, Greenfield, WI",b0328276a35af46ff84b19662162d17b1f9fe5d14f9be2...,8ec51f27df97d71e25e893d7c28b5bb0eb69d643be2c5a...,41-45,80001-90000,f4409b815648beed14eb18d269c1128378dcea42edd27b...,89455979,9506163,fecac3b31b566ae64a5fe0c5e8cf5da6178d76ad5abf0b...
2,2,2019-10-30 11:51:05.626300584,xxxxegan,Carol Doe,M,"Suite 20, 210 Wood St, Greenfield, WI",vw***d@gmail.com,2009-04-23 12:15:08.990432954,"456 Elm St, Oakville, CA",48e7a3ce26a0741766409002bfbf507c5f64e63664d055...,a56f2edec7eaf4f3ed21487c275e164327bea886d4bc68...,46-50,>200000,3f6c7a72ea2c7e24d92042dcbb00c0f535431581904a8c...,75422247,7618634,ec2b9fe5b83d5931730562b45a396a115e3248f41b07f7...
3,3,2021-12-31 06:58:52.128024719,xxxxbell,Bob Clark,F,"Floor 7, 987 Birch St, Sunnyvale, FL",ka***6@gmail.com,1992-07-28 04:57:17.719330116,"654 Cedar St, Mountainview, CO",8a6b93a1708b7ac2e732d9702a4b7513a044533806a27d...,f856a6f9e8ee26bf4454d998f898460c8c59224a5edd4f...,31-35,110000-120000,4346a0e6f53e47181e32bc4791711e0241d7eb17bb9eb9...,56147844,29477957,a77c43ed9488f42d69c3b6f0c8766089d344478a9ba3ed...
4,4,2020-08-09 01:09:37.538258296,xxxxkson,Carol Clark,F,"Suite 101, 123 Main St, Springfield, IL",jo***t@hotmail.com,1989-09-18 03:20:12.101412079,"210 Wood St, Greenfield, WI",7559b5c39bd50b4b8a7a1bc0402694081d359d20979447...,bdc3983a4aa8746ee5bc2762c6722faa926ae06e079f3d...,56-60,100000-110000,a72c8d53d74f55014f43737a955aa7c44eb1d77f8bb456...,78948217,041972,3f815be9492cf03a39884dd8711af289dceea4880b6912...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9995,2021-09-11 21:25:57.781292318,xxxxrick,Bob Johnson,F,"Building 3, 321 Pine St, Lakeview, TX",jl***n@yahoo.com,1975-07-13 10:39:06.918666491,"654 Cedar St, Mountainview, CO",b938372c57043c3a59fdf0de13f476f6067f5bc3e492be...,af11adeabc5e0e25fabbbb9f6608bba4db41c1203e76fe...,56-60,>200000,ff53a793d9b84e8313f686b8e3cb4d7b9463e9cad14bed...,54203284,3254448,89153eb5916ccb74ded5022bb94a86d51affa7bd2033c4...
9996,9996,2020-08-13 09:38:18.046859497,xxxxinez,Bob Brown,F,"Room 12, 654 Cedar St, Mountainview, CO",re***z@gmail.com,1940-10-19 09:47:02.910630178,"210 Wood St, Greenfield, WI",b98dbaefcf87a986a968e01976a038098fc979b556f865...,89ab56b603812ae51a660de6ff4f60d4019d8a28f18f99...,31-35,<30000,1273f1eb6335e0e36690c85768be0ca5ba74ed76b6c339...,12871431,6157227,20e47399d1180c957a08658ee96b70d30c98508a5d8028...
9997,9997,2019-10-23 18:25:06.030813853,xxxxelps,Alice Harris,F,"Penthouse, 305 Ash St, Riverbend, WA",br***a@gmail.com,1994-12-25 09:36:45.398627243,"456 Elm St, Oakville, CA",484369430816ddc7967c0a66ee63b7c6fe68598c065e7b...,afe5939172ea55d43f7a4d1ac1a16d827fb45fe1f45a1f...,18-25,>200000,c77f20647df43e260f67210634a11ff6a2457461346490...,03782018,7628877,dd541b921d3c3c6ae3ba6896457b0a602fc1fd15cd0bad...
9998,9998,2020-08-14 07:06:51.864800998,xxxxavis,Frank Johnson,M,"Unit 5C, 789 Maple St, Rivertown, NY",wr***d@gmail.com,1961-03-07 05:15:13.243782645,"987 Birch St, Sunnyvale, FL",42b4fd7d24b6f4321f9720c80565452cfd422d45b2e815...,0679d0cdf79e1dc10d5cc60afa13551418ab12421c0395...,41-45,>200000,9212f64ab05fe06173339f56ddd952a509b84bf0365e21...,10964788,705192,088b75bdab822bdd07a06ae0d20fc3f97ac4847a0ce9c3...
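The cell above draws each replacement independently with `random.choice`, so two customers who shared an address in the raw data will usually receive different fake ones. If that linkage matters for analysis, a deterministic, keyed mapping is one alternative. The sketch below is an assumption, not the method used in this notebook: the pools, `SECRET_KEY`, and the helpers `_pick`, `fake_residence`, and `fake_address` are all hypothetical. It uses HMAC-SHA256 from Python's standard library to pick pool entries, so the same original value always maps to the same fictitious one.

```python
import hashlib
import hmac
from typing import Sequence

# Hypothetical fictitious pools, in the same spirit as the notebook's lists
FAKE_UNITS = ['Suite 101', 'Apt 14B', 'Unit 5C', 'Room 12']
FAKE_STREETS = ['123 Main St', '456 Elm St', '789 Maple St', '321 Pine St']
FAKE_CITIES = ['Springfield, IL', 'Oakville, CA', 'Rivertown, NY', 'Lakeview, TX']

# Assumption: a secret key stored separately from the shared dataset
SECRET_KEY = b'replace-with-a-real-secret'


def _pick(value: str, pool: Sequence[str], label: str) -> str:
    """Deterministically choose an item from `pool` using a keyed hash of `value`."""
    message = f'{label}:{value}'.encode('utf-8')
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).digest()
    # Interpret the first 8 bytes of the digest as an integer index into the pool
    return pool[int.from_bytes(digest[:8], 'big') % len(pool)]


def fake_residence(original: str) -> str:
    """Map an original residence to a fictitious 'street, city' string."""
    street = _pick(original, FAKE_STREETS, 'street')
    city = _pick(original, FAKE_CITIES, 'city')
    return f'{street}, {city}'


def fake_address(original: str) -> str:
    """Map an original address to a fictitious 'unit, street, city' string."""
    unit = _pick(original, FAKE_UNITS, 'unit')
    return f'{unit}, {fake_residence(original)}'
```

With pandas, the helpers could be applied as `data['residence'] = data['residence'].astype(str).map(fake_residence)`, and likewise for `address`. Consistency is a trade-off: it preserves duplicate counts and joins, but it also preserves frequency patterns, so the key must stay secret and the pools should not be too small.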


### 9. **Data Export**
The fully anonymized dataset was saved as `mobile_customers_anon.csv` for downstream use. Writing with `index=False` prevents pandas from adding the DataFrame index as an extra column.


In [13]:
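data = data.drop(columns=['Unnamed: 0.1', 'Unnamed: 0'], errors='ignore')  # drop leftover index columns from earlier CSV round-trips
data.to_csv('mobile_customers_anon.csv', index=False)  # index=False: do not write the pandas index as a column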
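A quick automated scan of the data before it is shared can catch values that slipped through unmasked. The sketch below is a hypothetical helper, not part of the original notebook: `find_leaks` takes rows as dictionaries (for example from `data.to_dict('records')`) and flags cells that look like full, unmasked email addresses or bare 13 to 19 digit card numbers. Both regular expressions are simplifying assumptions and will miss other formats, such as card numbers written with spaces or dashes.

```python
import re

# Assumed pattern: a full email whose local part contains no '*' mask characters
EMAIL_RE = re.compile(r'^[^@\s*]+@[^@\s]+\.[A-Za-z]{2,}$')
# Assumed pattern: a bare run of 13 to 19 digits, the usual payment card length range
CARD_RE = re.compile(r'^\d{13,19}$')


def find_leaks(rows, columns):
    """Return (row_index, column, reason) for cells that still look like raw identifiers."""
    leaks = []
    for i, row in enumerate(rows):
        for col in columns:
            value = str(row.get(col, ''))
            if EMAIL_RE.match(value):
                leaks.append((i, col, 'unmasked email'))
            elif CARD_RE.match(value):
                leaks.append((i, col, 'card-like number'))
    return leaks
```

Called as `find_leaks(data.to_dict('records'), data.columns)`, an empty list means neither pattern matched anywhere; it does not prove the data is free of identifiers.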

## Final Outcome

This project demonstrates a workflow for anonymizing sensitive customer information while retaining analytic utility. Masking, hashing, binning, noise injection, and fictitious-value generation transformed the direct identifiers, while the non-sensitive fields remain available for analysis.

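The anonymized dataset is far better suited for sharing and research than the raw data. Before any public release, the remaining quasi-identifiers (for example gender, age band, salary band, and the noised registration and birth dates) should still be assessed for re-identification risk.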
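One common way to quantify re-identification risk among quasi-identifiers is k-anonymity: every combination of quasi-identifier values should be shared by at least k records. The sketch below is illustrative and not part of the original project; the helper names `k_anonymity` and `small_groups` are hypothetical, and choosing which columns count as quasi-identifiers is itself a judgment call.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def _group_sizes(rows: Iterable[Mapping], quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)


def k_anonymity(rows: Iterable[Mapping], quasi_identifiers: Sequence[str]) -> int:
    """Return k, the size of the smallest group (0 for an empty dataset)."""
    sizes = _group_sizes(rows, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def small_groups(rows: Iterable[Mapping], quasi_identifiers: Sequence[str], k: int) -> Dict[Tuple, int]:
    """Return the value combinations shared by fewer than k rows, with their counts."""
    sizes = _group_sizes(rows, quasi_identifiers)
    return {combo: n for combo, n in sizes.items() if n < k}
```

On the anonymized frame this could be called as `k_anonymity(data.to_dict('records'), ['gender', 'age', 'salary'])`. Including near-unique columns such as the noised `birthdate` would typically drive k toward 1, a sign that those columns may need coarser generalization before public release.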

___

*This is part of a job simulation project proposed by Commonwealth Bank on The Forage platform.*
