# Anonymisation
---

There are different ways to anonymise sensitive data. In this example we use **Faker**, a Python library that generates realistic fake data for a variety of purposes. You can find the documentation **[here](https://faker.readthedocs.io/en/master/)**.

Faker allows you to:
- Generate test data\
Faker can be used to create large volumes of realistic test data for software development and testing purposes. Whether you need fake names, addresses, phone numbers, or other types of data, Faker can generate it for you.

- Anonymise data\
When sharing data for research or analysis purposes, privacy concerns may arise. Faker can be used to anonymise sensitive data by replacing personal information with fake but realistic alternatives. This reduces the risk of exposing individuals when you share datasets.

In [None]:
# Remember to have the Faker library installed
# !pip install faker

import pandas as pd
from faker import Faker
from datetime import datetime

faker = Faker('en_GB')

def anonymise_data(input_file, output_file):
    df = pd.read_csv(input_file)
    
    # Overwrite Name with sequential labels Patient1..PatientN (one per row)
    df['Name'] = [f'Patient{i}' for i in range(1, len(df) + 1)]
    
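    # Convert the Date_of_Birth column to datetime and calculate the age in completed years
    df['Date_of_Birth'] = pd.to_datetime(df['Date_of_Birth'])
    today = datetime.now()
    dob = df['Date_of_Birth']
    # Subtract one year for anyone whose birthday has not yet occurred this year
    birthday_pending = (dob.dt.month > today.month) | (
        (dob.dt.month == today.month) & (dob.dt.day > today.day)
    )
    df['Age'] = today.year - dob.dt.year - birthday_pending.astype(int)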
    
    # Replace each postcode with a randomly generated fake UK postcode from Faker
    df['Postcode'] = [faker.postcode() for _ in range(len(df))]
       
    # Keep only the relevant columns (Date_of_Birth is dropped) and export them to CSV
    df = df[['Name', 'Gender', 'Age', 'Postcode']]
    df.to_csv(output_file, index=False)
    
anonymise_data('./datasets/hospital_data.csv','./datasets/anonymised_hospital_data.csv')
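
Age can be derived from a date of birth in more than one way. As a minimal sketch using only the standard library, the hypothetical helper `age_on` below counts completed years by checking whether the birthday has already occurred in the reference year. The dates are made up for illustration.

```python
from datetime import date

def age_on(date_of_birth, today):
    """Return the number of completed years between date_of_birth and today."""
    # True when this year's birthday has not happened yet on the reference date
    birthday_pending = (today.month, today.day) < (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - int(birthday_pending)

print(age_on(date(1990, 6, 15), date(2024, 6, 14)))  # 33 (day before the birthday)
print(age_on(date(1990, 6, 15), date(2024, 6, 15)))  # 34 (on the birthday)
```

Passing the reference date explicitly, rather than reading the clock inside the function, keeps the result repeatable.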