# DATA ANONYMIZATION

## Introduction

This notebook shows how to use the anonymization feature with a small example. We will start by setting up the notebook, then create a dummy dataset and its metadata, and finally `model` and `sample` the data, checking the differences in both the data and the internal state of the objects.

## Notebook preparation

In [2]:
import numpy as np
import pandas as pd

from sdv.data_navigator import DataNavigator
from sdv.modeler import Modeler
from sdv.sampler import Sampler

## Creating dataset and metadata

We are going to create a dataset with a single table containing three columns (`primary_key`, `name` and `credit_card_number`) and two versions of its metadata: one that uses anonymization and one that does not.

In [5]:
table_data = pd.DataFrame([
    {
        'primary_key': 1,
        'name': 'John',
        'credit_card_number': '1111222233334444'
    },
    {
        'primary_key': 2,
        'name': 'Mike',
        'credit_card_number': '0000999988887777'
    }
])
table_data

| | credit_card_number | name | primary_key |
|---|---|---|---|
| 0 | 1111222233334444 | John | 1 |
| 1 | 0000999988887777 | Mike | 2 |


Now we are going to generate the metadata. Apart from the anonymization parameters, there are two major differences from the metadata used in the other examples:

- Since we will pass already formatted `sdv.data_navigator.Table` instances to the `DataNavigator` instead of loading the data from disk, the values of the `name` and `path` keys do not matter, because they will not be used.

- Since we will pass the metadata directly to the `DataNavigator` as a Python dict, we will write it using Python syntax, not JSON.
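To make the difference between the two notations concrete, here is a minimal sketch using only the standard `json` module. The one-field metadata below is a shortened, illustrative fragment and not the full metadata used in this notebook. JSON writes `true` and `null` where Python writes `True` and `None`, and once parsed both describe the same dict.

```python
import json

# Illustrative JSON fragment (shortened; not the notebook's full metadata).
json_text = '''
{
    "fields": [
        {"name": "name", "type": "categorical"}
    ],
    "headers": true,
    "name": null,
    "path": null,
    "primary_key": "primary_key",
    "use": true
}
'''

# The same content written directly as a Python dict.
python_dict = {
    'fields': [
        {'name': 'name', 'type': 'categorical'}
    ],
    'headers': True,
    'name': None,
    'path': None,
    'primary_key': 'primary_key',
    'use': True,
}

# json.loads maps true -> True and null -> None, so both forms compare equal.
loaded = json.loads(json_text)
print(loaded == python_dict)
```

Writing the dict by hand simply skips the parsing step that would otherwise happen when the metadata is read from a JSON file.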

In [7]:
normal_table_metadata = {
    'fields': [
        {
            'name': 'name', 
            'type': 'categorical', 
        },
        {
            'name': 'credit_card_number',
            'type': 'categorical',
        },
        {
            'name': 'primary_key',
            'subtype': 'integer',
            'type': 'number',
            'regex': '^[0-9]{10}$'
        },

    ],
    'headers': True,
    'name': None,
    'path': None,
    'primary_key': 'primary_key',
    'use': True
}
normal_table_metadata

{'fields': [{'name': 'name', 'type': 'categorical'},
  {'name': 'credit_card_number', 'type': 'categorical'},
  {'name': 'primary_key',
   'subtype': 'integer',
   'type': 'number',
   'regex': '^[0-9]{10}$'}],
 'headers': True,
 'name': None,
 'path': None,
 'primary_key': 'primary_key',
 'use': True}

In [8]:
anon_table_metadata = {
    'fields': [
        {
            'name': 'name', 
            'type': 'categorical',
            'pii': True,
            'pii_category': 'first_name'
        },
        {
            'name': 'credit_card_number',
            'type': 'categorical',
            'pii': True,
            'pii_category': 'credit_card_number'
        },
        {
            'name': 'primary_key',
            'subtype': 'integer',
            'type': 'number',
            'regex': '^[0-9]{10}$'
        },

    ],
    'headers': True,
    'name': None,
    'path': None,
    'primary_key': 'primary_key',
    'use': True
}
anon_table_metadata

{'fields': [{'name': 'name',
   'type': 'categorical',
   'pii': True,
   'pii_category': 'first_name'},
  {'name': 'credit_card_number',
   'type': 'categorical',
   'pii': True,
   'pii_category': 'credit_card_number'},
  {'name': 'primary_key',
   'subtype': 'integer',
   'type': 'number',
   'regex': '^[0-9]{10}$'}],
 'headers': True,
 'name': None,
 'path': None,
 'primary_key': 'primary_key',
 'use': True}
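Before handing metadata like this to the library, it can help to confirm which fields are flagged for anonymization and that every field name matches a column of the table. The sketch below is plain Python and makes no SDV calls. The helpers `pii_fields` and `unknown_fields`, and the small `example_metadata` dict, are hypothetical names introduced here for illustration only.

```python
def pii_fields(table_metadata):
    """Map each field flagged with 'pii': True to its 'pii_category'."""
    return {
        field['name']: field['pii_category']
        for field in table_metadata['fields']
        if field.get('pii')
    }


def unknown_fields(table_metadata, columns):
    """Return the field names that do not appear among the table columns."""
    return [
        field['name']
        for field in table_metadata['fields']
        if field['name'] not in columns
    ]


# Illustrative metadata using the same keys as the cells above.
example_metadata = {
    'fields': [
        {'name': 'name', 'type': 'categorical',
         'pii': True, 'pii_category': 'first_name'},
        {'name': 'credit_card_number', 'type': 'categorical',
         'pii': True, 'pii_category': 'credit_card_number'},
        {'name': 'primary_key', 'type': 'number', 'subtype': 'integer'},
    ],
    'primary_key': 'primary_key',
}

# Column names of the dummy table created earlier.
columns = ['primary_key', 'name', 'credit_card_number']

flagged = pii_fields(example_metadata)
missing = unknown_fields(example_metadata, columns)
print(flagged)
print(missing)
```

An empty `missing` list means every field in the metadata has a matching column. Any name that shows up there points to a mismatch between the metadata and the table.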

Now we will create all the objects needed to model and sample our dataset.

In [None]:
normal_dn = DataNavigator