### What You'll Build

* A table derived from sample data containing fake (but realistic) PII
* RBAC protections for that table
* Dynamic Data Masking policies to protect the PII table
* Row Access Policies to protect the PII table

Normally, we would create multiple users, each assigned to a different role. 

To keep things simple and to avoid having a lot of users and passwords exposed in this tutorial, we're just going to create the roles and assign them all to our user.

This way, we can switch between roles and see how they work without logging out and logging back in.

In [None]:
use warehouse PII_WH;
use database PII_EXAMPLE;
use schema DATA;

In [None]:
-- create the roles for the demo
use role accountadmin;

create or replace role admin;
create or replace role marketing;
create or replace role infosec;
create or replace role executive;


-- Note: bharris is a hard-coded user name; substitute your own user or skip this statement.
REVOKE ROLE admin FROM USER bharris;

In [None]:
from snowflake.snowpark.context import get_active_session
session = get_active_session()

user_name = session.sql("SELECT current_user()").collect()[0][0]

In [None]:
grant role admin to user {{user_name}};
grant role marketing to user {{user_name}};
grant role infosec to user {{user_name}};
grant role executive to user {{user_name}};

Now we can check which roles have been granted to our user.

In [None]:
show grants to user {{user_name}};

At this point, we have access to all the roles, but they're not really doing much.

Let's create some fake data so we can start to work with roles.

In [None]:
-- Here we grab 200 rows of fake but realistic PII from the sample data in the TPCDS testing set to use for our walkthrough. 
-- Also note that the C_BIRTH_COUNTRY and OPTIN columns will be populated at random with one of three values.
use schema PII_EXAMPLE.DATA;

create or replace table CUSTOMERS as (
    SELECT 
        a.C_SALUTATION,
        a.C_FIRST_NAME,
        a.C_LAST_NAME,
        CASE UNIFORM(1,3,RANDOM()) WHEN 1 THEN 'UK' WHEN 2 THEN 'US' ELSE 'FRANCE' END AS C_BIRTH_COUNTRY,
        a.C_EMAIL_ADDRESS,
        b.CD_GENDER,
        b.CD_CREDIT_RATING,
        CASE UNIFORM(1,3,RANDOM()) WHEN 1 THEN 'YES' WHEN 2 THEN 'NO' ELSE NULL END AS OPTIN
    FROM 
        SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CUSTOMER a,
        SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CUSTOMER_DEMOGRAPHICS b
    WHERE
        a.C_CUSTOMER_SK = b.CD_DEMO_SK and 
        a.C_SALUTATION is not null and
        a.C_FIRST_NAME is not null and
        a.C_LAST_NAME is not null and
        a.C_BIRTH_COUNTRY is not null and
        a.C_EMAIL_ADDRESS is not null and 
        b.CD_GENDER is not null and
        b.CD_CREDIT_RATING is not null
    LIMIT 200 )
;

grant ownership on table PII_EXAMPLE.DATA.CUSTOMERS to role admin;

Now we're going to grant rights to roles for our table above.
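
As a rough mental model of the grants in the next cell: a role can query the table only if it holds USAGE on the database, the schema, and a warehouse, plus SELECT on (or ownership of) the table itself. The sketch below is a simplified illustration in plain Python, not Snowflake's implementation; the `grants` dictionary and the `can_select` helper are hypothetical names used only here.

```python
# Simplified model of the privilege check behind a SELECT.
# Illustration only; this is not how Snowflake implements RBAC.
DB = "PII_EXAMPLE"
SCHEMA = "PII_EXAMPLE.DATA"
TABLE = "PII_EXAMPLE.DATA.CUSTOMERS"
WH = "PII_WH"


def base_grants():
    """USAGE grants that every demo role receives in this walkthrough."""
    return {("USAGE", DB), ("USAGE", SCHEMA), ("USAGE", WH)}


# (privilege, object) pairs held by each role after the grants are run.
grants = {
    "MARKETING": base_grants() | {("SELECT", TABLE)},
    "EXECUTIVE": base_grants() | {("SELECT", TABLE)},
    "INFOSEC": base_grants(),
    "ADMIN": base_grants() | {("OWNERSHIP", TABLE)},
}


def can_select(role):
    """True if the role can reach the table and read it."""
    held = grants.get(role, set())
    has_path = base_grants() <= held
    has_table = ("SELECT", TABLE) in held or ("OWNERSHIP", TABLE) in held
    return has_path and has_table


visibility = {role: can_select(role) for role in grants}
print(visibility)
```

In this model marketing, executive, and admin can read the table, while infosec cannot because it has no SELECT grant on it.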

In [None]:
use role accountadmin;

-- grant rights to roles for the demo objects
grant usage on database PII_EXAMPLE to role marketing;
grant usage on database PII_EXAMPLE to role executive;
grant usage on database PII_EXAMPLE to role infosec;
grant usage on database PII_EXAMPLE to role admin;

grant usage on schema PII_EXAMPLE.DATA to role admin;
grant usage on schema PII_EXAMPLE.DATA to role marketing;
grant usage on schema PII_EXAMPLE.DATA to role executive;
grant usage on schema PII_EXAMPLE.DATA to role infosec;

grant select on table PII_EXAMPLE.DATA.CUSTOMERS to role marketing;
grant select on table PII_EXAMPLE.DATA.CUSTOMERS to role executive;


-- We also have to give these roles permission to use our warehouse

grant usage on warehouse PII_WH to role admin;
grant usage on warehouse PII_WH to role marketing;
grant usage on warehouse PII_WH to role infosec;
grant usage on warehouse PII_WH to role executive;

In [None]:
-- show that the current role (accountadmin) cannot currently see the data
use role accountadmin;
select * from PII_EXAMPLE.DATA.CUSTOMERS limit 50;

In [None]:
-- But our marketing role can see it 
use role marketing;

use warehouse PII_WH;
select * from PII_EXAMPLE.DATA.CUSTOMERS limit 50;

In [None]:
-- And our infosec role can't see it.
use role infosec;

use warehouse PII_WH;
select * from PII_EXAMPLE.DATA.CUSTOMERS limit 50;

Now we start putting controls in place on the data itself. To do this, we need to grant the rights to create and apply policies. Since these rights can be granted on their own to encourage separation of duties, we will grant policy control to the fictional infosec role, kept separate from the admin role that owns the data objects.

In [None]:
use role accountadmin;

grant CREATE ROW ACCESS POLICY on schema PII_EXAMPLE.DATA to role infosec;
create or replace table PII_EXAMPLE.DATA.ROW_ACCESS_MAPPING (
  role_name varchar,
  national_letter varchar,
  allowed varchar
);
grant ownership on table PII_EXAMPLE.DATA.ROW_ACCESS_MAPPING to role infosec;
grant create masking policy on schema PII_EXAMPLE.DATA to role infosec;

The first control we will apply is a row access policy to ensure only authorized people get any information at all. The most common form of this policy reads from a table where the rules are maintained: a mapping table.

In [None]:
use role infosec;

insert into PII_EXAMPLE.DATA.ROW_ACCESS_MAPPING
  values
  ('ACCOUNTADMIN','','FALSE'),
  ('ADMIN','','FALSE'),
  ('MARKETING','UK','TRUE'),
  ('INFOSEC','','FALSE'),
  ('EXECUTIVE','FRANCE','TRUE');

We now have the rules in our mapping table, but nothing is applying them yet. To do that, we need to create the policy itself.
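
Before creating it, here is a minimal sketch of what a mapping-table policy computes for each row: look for a mapping entry that matches the current role and the row's country and is marked allowed, and deny by default otherwise. This is plain Python for illustration, not Snowflake code. It assumes an exact match on the country value (the policy in the next cell uses `like` with no wildcards, which behaves the same way), and `sample_rows` is made-up data.

```python
# Rows of the mapping table: (role_name, national_letter, allowed).
ROW_ACCESS_MAPPING = [
    ("ACCOUNTADMIN", "", "FALSE"),
    ("ADMIN", "", "FALSE"),
    ("MARKETING", "UK", "TRUE"),
    ("INFOSEC", "", "FALSE"),
    ("EXECUTIVE", "FRANCE", "TRUE"),
]


def row_visible(current_role, c_birth_country):
    """Return True only if an allowed mapping matches role and country."""
    return any(
        role_name == current_role
        and national_letter == c_birth_country
        and allowed == "TRUE"
        for role_name, national_letter, allowed in ROW_ACCESS_MAPPING
    )  # any() over no matches is False, which gives the default deny


# Hypothetical customers: (first name, birth country).
sample_rows = [("Ann", "UK"), ("Bob", "US"), ("Chloe", "FRANCE")]


def visible_rows(current_role):
    """Filter the sample rows the way the policy would for this role."""
    return [row for row in sample_rows if row_visible(current_role, row[1])]


for role in ("MARKETING", "EXECUTIVE", "ADMIN"):
    print(role, visible_rows(role))
```

Note that a role with no allowed entry sees nothing at all, which is why the default-deny branch matters.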


In [None]:
use role infosec;

create or replace row access policy PII_EXAMPLE.DATA.CONTROL_BY_COUNTRY as (C_BIRTH_COUNTRY varchar) returns boolean ->
  case
      -- check for full read access
      when exists ( 
            select 1 from PII_EXAMPLE.DATA.ROW_ACCESS_MAPPING
              where role_name = current_role()
                and C_BIRTH_COUNTRY like national_letter
                and allowed = 'TRUE'
          ) then true
      -- always default deny
      else false
  end
;

Now we grant the right to apply this policy to the admin for the data set. It's normal for the security and governance folks to maintain the policy logic, while the people closer to the data apply the policies, since they know which tables hold the data that needs protecting. Of course, this is even better when automated through governance and security solutions that take the human element out entirely.

In [None]:
use role accountadmin;
use warehouse PII_WH;

grant apply on row access policy PII_EXAMPLE.DATA.CONTROL_BY_COUNTRY to role admin;

-- start doing this as the admin role
use role admin;
alter table PII_EXAMPLE.DATA.CUSTOMERS add row access policy PII_EXAMPLE.DATA.CONTROL_BY_COUNTRY on (C_BIRTH_COUNTRY);

In [None]:
-- Now let's see who can see this data.
-- Marketing is mapped to 'UK', so only UK rows should come back.

-- insert into PII_EXAMPLE.DATA.ROW_ACCESS_MAPPING
--   values
--   ('ACCOUNTADMIN','','FALSE'),
--   ('ADMIN','','FALSE'),
--   ('MARKETING','UK','TRUE'),
--   ('INFOSEC','','FALSE'),
--   ('EXECUTIVE','FRANCE','TRUE');

use role marketing;
select * from  PII_EXAMPLE.DATA.CUSTOMERS limit 50;

In [None]:
-- Admin has no allowed mapping, so no rows should come back, even though admin owns the table.

-- insert into PII_EXAMPLE.DATA.ROW_ACCESS_MAPPING
--   values
--   ('ACCOUNTADMIN','','FALSE'),
--   ('ADMIN','','FALSE'),
--   ('MARKETING','UK','TRUE'),
--   ('INFOSEC','','FALSE'),
--   ('EXECUTIVE','FRANCE','TRUE');

use role admin;
select * from  PII_EXAMPLE.DATA.CUSTOMERS limit 50;

In [None]:
-- Executive is mapped to 'FRANCE', so only FRANCE rows should come back.

-- insert into PII_EXAMPLE.DATA.ROW_ACCESS_MAPPING
--   values
--   ('ACCOUNTADMIN','','FALSE'),
--   ('ADMIN','','FALSE'),
--   ('MARKETING','UK','TRUE'),
--   ('INFOSEC','','FALSE'),
--   ('EXECUTIVE','FRANCE','TRUE');

use role executive;
select * from  PII_EXAMPLE.DATA.CUSTOMERS limit 50;

In [None]:
-- Marketing is mapped to 'UK' only, so filtering on FRANCE should return no rows.

-- insert into PII_EXAMPLE.DATA.ROW_ACCESS_MAPPING
--   values
--   ('ACCOUNTADMIN','','FALSE'),
--   ('ADMIN','','FALSE'),
--   ('MARKETING','UK','TRUE'),
--   ('INFOSEC','','FALSE'),
--   ('EXECUTIVE','FRANCE','TRUE');

use role marketing;
select * from  PII_EXAMPLE.DATA.CUSTOMERS where C_BIRTH_COUNTRY = 'FRANCE' limit 50;

Now we will lock things down at the column level. In this step, we will use conditional masking, where the decision to mask one column depends on the value of another column in the same row.
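
The logic of the policy created in the next cell can be sketched in a few lines of plain Python: show the email only when the customer has opted in, and mask it in every other case, including a missing opt-in value. This is an illustration only; `hide_optouts` mirrors the policy name but is a hypothetical Python helper, and the email addresses are made up.

```python
from typing import Optional

MASK = "***MASKED***"


def hide_optouts(col_value: str, optin: Optional[str]) -> str:
    """Return the value only for opted-in rows; mask everything else.

    None stands in for SQL NULL here: it is not equal to 'YES',
    so the row falls through to the masked branch.
    """
    return col_value if optin == "YES" else MASK


# Hypothetical rows with the two columns the policy looks at.
rows = [
    {"C_EMAIL_ADDRESS": "ann@example.com", "OPTIN": "YES"},
    {"C_EMAIL_ADDRESS": "bob@example.com", "OPTIN": "NO"},
    {"C_EMAIL_ADDRESS": "chloe@example.com", "OPTIN": None},
]

# Apply the mask to the email column, leaving OPTIN untouched.
masked = [
    dict(row, C_EMAIL_ADDRESS=hide_optouts(row["C_EMAIL_ADDRESS"], row["OPTIN"]))
    for row in rows
]

for row in masked:
    print(row)
```

Only the first row keeps its email address; the other two come back as `***MASKED***`.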

In [None]:
use role infosec;

-- conditional masking version
create masking policy PII_EXAMPLE.DATA.HIDE_OPTOUTS as
(col_value varchar, optin string) returns varchar ->
  case
    when optin = 'YES' then col_value
    else '***MASKED***'
  end;

In [None]:
-- Grant the rights to apply the policy (replace this with the alternate policy if that's what you've used).

use role accountadmin;

grant apply on masking policy PII_EXAMPLE.DATA.HIDE_OPTOUTS to role admin;

In [None]:
-- Apply the policy to the table.

use role admin;
alter table PII_EXAMPLE.DATA.CUSTOMERS modify column C_EMAIL_ADDRESS
    set masking policy PII_EXAMPLE.DATA.HIDE_OPTOUTS using (C_EMAIL_ADDRESS, OPTIN);

In [None]:
use role marketing;
select * from PII_EXAMPLE.DATA.CUSTOMERS limit 50;

Now we will use another feature, Object Tagging. This allows you to apply important metadata right at the level where the information is stored. 

First we need to grant the rights to use the tagging feature to our users' roles.

In [None]:
use role accountadmin;

grant create tag on schema PII_EXAMPLE.DATA to role infosec;
grant apply tag on account to role admin;

In [None]:
-- The infosec role will create the tags which can be applied. Tags themselves are best managed centrally to avoid namespace explosion.
use role infosec;

create tag PII_EXAMPLE.DATA.GDPR;
create tag PII_EXAMPLE.DATA.FROM_SOURCE;

Like the policies, the tag values will be applied to specific information by the admins who are closer to the actual data. Here we apply the tags and set their values for these objects.

In [None]:
use role admin;

alter table PII_EXAMPLE.DATA.CUSTOMERS set tag 
    PII_EXAMPLE.DATA.GDPR = 'TRUE', 
    PII_EXAMPLE.DATA.FROM_SOURCE = 'HARRIS';

select system$get_tag('PII_EXAMPLE.DATA.FROM_SOURCE', 'PII_EXAMPLE.DATA.CUSTOMERS', 'table') as TAGGING;

Here, we can also take a look at this information via Snowsight.

### What is Data Classification?
Data classification, also called entity recognition or PII detection, is the process of labeling data with its semantic type after inferring the meaning of the data. For example, you may have a table named customers with a field named email: after data classification, that field could be labeled with a semantic category (email address) or a privacy category (direct identifier).

Data Classification is implemented through a single new function, EXTRACT_SEMANTIC_CATEGORIES, and a new stored procedure, ASSOCIATE_SEMANTIC_CATEGORY_TAGS. The function takes an object (a table, view, etc.) and analyzes up to 10,000 cells in each field before returning a single JSON object with the classification results and additional result metadata. The stored procedure parses the JSON object returned by the function and creates a tag with the semantic and privacy category on the column in the original object.
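
To make the two-step flow concrete, here is a small plain-Python sketch of the second step: reading a classification result and deriving the tag values for each column. It is an assumption-laden illustration, not Snowflake's implementation. The JSON below is a hand-written sample whose keys (`recommendation`, `semantic_category`, `privacy_category`, `coverage`, `alternates`) follow the query used later in this walkthrough; the values and the `tags_from_classification` helper are hypothetical.

```python
import json

# Hand-written sample in the shape the later flatten query expects.
# The category values are made up for illustration.
sample_json = """
{
  "C_EMAIL_ADDRESS": {
    "recommendation": {
      "semantic_category": "EMAIL",
      "privacy_category": "IDENTIFIER",
      "coverage": 1.0
    },
    "alternates": []
  },
  "CD_CREDIT_RATING": {
    "alternates": []
  }
}
"""


def tags_from_classification(result):
    """Map each classified column to the tag values it would receive."""
    tags = {}
    for column, info in result.items():
        rec = info.get("recommendation")
        if not rec:
            # No recommendation for this column, so nothing to tag.
            continue
        tags[column] = {
            "SEMANTIC_CATEGORY": rec.get("semantic_category"),
            "PRIVACY_CATEGORY": rec.get("privacy_category"),
        }
    return tags


tags = tags_from_classification(json.loads(sample_json))
print(tags)
```

Columns without a recommendation are simply skipped in this sketch, so only `C_EMAIL_ADDRESS` ends up with tag values.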

### EXTRACT_SEMANTIC_CATEGORIES
Returns a set of categories (semantic and privacy) for each supported column in the specified table or view. To return the categories for a column, the column must use a data type that supports classification and does not contain all NULL values.

The categories are derived from the metadata and data contained in the columns, as well as the metadata about the columns and data. The privacy categories rely on the generated semantic categories, if any.

In [None]:
-- Now we will use another feature, Classification.
-- It examines the contents of the data and uses out-of-the-box intelligence to classify the information into categories.

-- First, check what the admin role can see in the table we've been using.
use role admin;
select * from PII_EXAMPLE.DATA.CUSTOMERS;

So, small problem here: our admin role can't actually see the data because of the row-level security that we applied earlier.

I'm going to go ahead and remove our row-level security for now.

In [None]:
use role admin;

alter table PII_EXAMPLE.DATA.CUSTOMERS drop row access policy PII_EXAMPLE.DATA.CONTROL_BY_COUNTRY;

In [None]:
-- Now we can try again.
-- Run the classification function on the table we've been using.
use role admin;
select extract_semantic_categories('PII_EXAMPLE.DATA.CUSTOMERS');


In [None]:
select
    t.key::varchar as column_name,
    t.value as val,
    t.value:"recommendation":"privacy_category"::varchar as privacy_category,
    t.value:"recommendation":"semantic_category"::varchar as semantic_category,
    t.value:"recommendation":"coverage"::number(10,2) as probability,
    t.value:"alternates"::variant as alternates
from table(
        flatten(
            extract_semantic_categories(
                'PII_EXAMPLE.DATA.CUSTOMERS'
            )::variant
        )
    ) as t;

To assign the system tags automatically, call the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure. Note:

* The fully-qualified name of the table and the output of the function from the first step are the arguments for the stored procedure.
* The stored procedure reruns the EXTRACT_SEMANTIC_CATEGORIES function. If you want to preserve the results from the first step or make changes, save the results to a table prior to calling the stored procedure.

In [None]:
CALL ASSOCIATE_SEMANTIC_CATEGORY_TAGS(
   'PII_EXAMPLE.DATA.CUSTOMERS',
    EXTRACT_SEMANTIC_CATEGORIES('PII_EXAMPLE.DATA.CUSTOMERS')
);

In [None]:
select *
  from table(PII_EXAMPLE.information_schema.tag_references_all_columns('PII_EXAMPLE.DATA.CUSTOMERS', 'table'));