# Anonymizing Patient IDs in a Pandas DataFrame

This notebook replaces the original patient IDs in a Pandas DataFrame with anonymous integer IDs.
Each original ID maps to exactly one anonymous ID, and the mapping is kept so that it can be reused later.

### 1. Importing Necessary Libraries


In [1]:
import os

import pandas as pd

from src.settings import ROOT_DIR

### 2. Loading the Original DataFrame

We load the original DataFrame into "data". It is assumed to contain a 'patient_id' column holding the original patient IDs.

In [9]:
data: pd.DataFrame = pd.read_pickle(ROOT_DIR / 'data' / 'raw' / 'imu_data_time_series.pkl')

### 3. Creating a Mapping Between Original and Anonymous IDs

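We build a dictionary (id_map) from original to anonymous IDs by enumerating the unique values of 'patient_id'. The anonymous IDs are consecutive integers starting at 1, assigned in the order in which each patient first appears in the data.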

In [3]:
id_map = {id_original: nuevo_id for nuevo_id, id_original in enumerate(data['patient_id'].unique(), start=1)}
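
To make the result of the comprehension concrete, here is a minimal sketch on a toy DataFrame. The patient IDs ('P-17', 'P-03', 'P-42') and the names toy and toy_id_map are made up for illustration and do not come from the real dataset.

```python
import pandas as pd

# Toy data: these patient IDs are hypothetical, not taken from the real dataset.
toy = pd.DataFrame({'patient_id': ['P-17', 'P-03', 'P-17', 'P-42', 'P-03']})

# unique() preserves order of first appearance, so patients are numbered
# in the order they first occur in the column.
toy_id_map = {
    original: new
    for new, original in enumerate(toy['patient_id'].unique(), start=1)
}
print(toy_id_map)  # {'P-17': 1, 'P-03': 2, 'P-42': 3}

# One-to-one check: no two original IDs share an anonymous ID.
assert len(set(toy_id_map.values())) == len(toy_id_map)
```

Because the numbering follows row order, re-running the comprehension on a reordered or extended dataset can assign different numbers to the same patient.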

### 4. Applying the Mapping to Create Anonymous IDs

We add a new column 'anon_id' to the DataFrame using the mapping.
We then reorder the columns so that 'anon_id' comes first, and drop the original 'patient_id' column so that it does not appear in the anonymized data.

In [10]:
data['anon_id'] = data['patient_id'].map(id_map)
anon_data: pd.DataFrame = data[['anon_id'] + [col for col in data.columns if col != 'anon_id']]
anon_data = anon_data.drop(columns=['patient_id'])
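
The same three steps can be sanity-checked on toy data. This is a sketch under the assumption that every ID in the column is present in the mapping. The values and the names toy, toy_id_map and toy_anon are hypothetical.

```python
import pandas as pd

# Toy data with hypothetical IDs and a single measurement column.
toy = pd.DataFrame({
    'patient_id': ['P-17', 'P-03', 'P-17'],
    'time_stamp': [0.0, 0.0, 41.0],
})
toy_id_map = {'P-17': 1, 'P-03': 2}

toy['anon_id'] = toy['patient_id'].map(toy_id_map)

# Series.map() yields NaN for any ID missing from the dictionary,
# so this check catches an incomplete mapping early.
assert toy['anon_id'].notna().all()

# Put 'anon_id' first, then remove the original identifier.
toy_anon = toy[['anon_id'] + [col for col in toy.columns if col != 'anon_id']]
toy_anon = toy_anon.drop(columns=['patient_id'])

print(list(toy_anon.columns))  # ['anon_id', 'time_stamp']
```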

### 5. Saving the Anonymized DataFrame and Mapping for Future Reference

We save the anonymized DataFrame to a pickle file and the mapping to a CSV file.

In [12]:
anon_data.to_pickle(os.path.join(ROOT_DIR, 'data', 'raw', 'anon_imu_data_time_series.pkl'))
id_map_df = pd.DataFrame(list(id_map.items()), columns=['patient_id', 'anon_id'])
id_map_df.to_csv(os.path.join(ROOT_DIR, 'data', 'raw', 'id_map.csv'), index=False)

Now 'anon_id' is the first column of the anonymized DataFrame (anon_imu_data_time_series.pkl). The mapping file (id_map.csv) records which anonymous ID belongs to which original ID, so it can be used to look up patients later or to apply the same anonymization to other datasets.
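
Reusing the mapping on another dataset could look roughly like the sketch below. The helper extend_and_apply is hypothetical and not part of this notebook. It assumes the other dataset also has a 'patient_id' column, and it gives previously unseen patients the next free integers. The mapping is built in memory here to keep the example self-contained. In practice it would be loaded from id_map.csv with pd.read_csv.

```python
import pandas as pd

def extend_and_apply(df, id_map_df):
    """Hypothetical helper: anonymize df with an existing mapping,
    assigning new anonymous IDs to patients not yet in the mapping."""
    id_map = dict(zip(id_map_df['patient_id'], id_map_df['anon_id']))
    next_id = max(id_map.values(), default=0) + 1
    for original in df['patient_id'].unique():
        if original not in id_map:
            id_map[original] = next_id
            next_id += 1
    out = df.assign(anon_id=df['patient_id'].map(id_map)).drop(columns=['patient_id'])
    out = out[['anon_id'] + [col for col in out.columns if col != 'anon_id']]
    updated = pd.DataFrame(list(id_map.items()), columns=['patient_id', 'anon_id'])
    return out, updated

# Stand-in for the saved mapping (hypothetical IDs).
saved_map = pd.DataFrame({'patient_id': ['P-17', 'P-03'], 'anon_id': [1, 2]})
other = pd.DataFrame({'patient_id': ['P-03', 'P-99'], 'value': [0.5, 0.7]})

other_anon, updated_map = extend_and_apply(other, saved_map)
print(other_anon['anon_id'].tolist())  # [2, 3]
```

Known patients keep their existing anonymous ID, and the updated mapping would need to be saved again so that later datasets stay consistent. When the mapping is read back from CSV, the dtype of 'patient_id' may differ from the one in the DataFrame (for example integers versus strings), which would make the lookup miss. Checking the dtypes before mapping is advisable.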

In [13]:
anon_data.head()

| | anon_id | date_measure | time_stamp | imu_gyroX_right | imu_gyroY_right | imu_gyroZ_right | imu_accX_right | imu_accY_right | imu_accZ_right | imu_gyroX_left | ... | imu_angleZ_spine | imu_angularX_left | imu_angularY_left | imu_angularZ_left | imu_angularX_right | imu_angularY_right | imu_angularZ_right | imu_angularX_spine | imu_angularY_spine | imu_angularZ_spine |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | | 0.0 | -1214.134378 | -706.898961 | -342.133395 | 8.902096 | 3.048967 | -2.707115 | -1379.007652 | ... | -80.931437 | -10.786784 | -0.30033 | -0.366041 | -10.036721 | 0.142365 | -0.399272 | -1.69761 | -0.28367 | -0.027008 |
| 1 | 1 | | 41.0 | -528.943208 | -481.033408 | -829.426549 | 8.826445 | 3.244322 | -2.375892 | -444.959365 | ... | -146.977613 | -7.828853 | -0.231623 | -0.163033 | -6.197811 | 0.012145 | -0.175718 | 5.924031 | -0.119951 | 0.407286 |
| 2 | 1 | | 82.0 | 42.937833 | -328.464835 | -937.806656 | 8.769737 | 3.341177 | -2.128276 | 443.239462 | ... | 849.016319 | -1.793131 | -0.104697 | 0.184659 | -0.136697 | -0.158921 | 0.153815 | 3.962417 | -0.201369 | 0.10835 |
| 3 | 1 | | 123.0 | 386.718612 | -305.92821 | -531.510961 | 8.782168 | 3.298552 | -2.002942 | 1101.526862 | ... | 1113.97992 | 6.683067 | 0.056346 | 0.567452 | 5.757335 | -0.251148 | 0.422339 | 6.045454 | 0.162363 | -0.06153 |
| 4 | 1 | | 164.0 | 399.637302 | -438.99705 | 195.116809 | 8.929922 | 3.152229 | -1.978899 | 1220.818122 | ... | 963.51133 | 15.39478 | 0.213584 | 0.848198 | 9.446619 | -0.194006 | 0.50762 | -3.624151 | 0.095794 | -0.360073 |
