# **Governance, Privacy, Security & Solutions**
Responsible data management in Big Data environments involves data governance, privacy, security, and cloud analytics. IT governance frameworks, usually aligned with regulations such as LGPD and GDPR, ensure proper handling, integrity, and accountability of data. Key practices include anonymization, informed consent, data minimization, access control, auditing, and cybersecurity measures such as encryption, VPNs, and firewalls. Real-world cases such as Cambridge Analytica highlight the risks of poor governance and privacy violations. Cloud tools like BigQuery can serve as a secure and efficient solution for managing, auditing, and analyzing large datasets while maintaining compliance and cost control.

In [15]:
# Importing all the necessary libraries and resources:
import hashlib
import logging
from cryptography.fernet import Fernet
import pandas as pd

## **Example: Data Management for Cloud Analytics**
The following illustrative Python example ties these concepts together: responsible data management, privacy, security, auditing, and cloud analytics.


In [16]:
# Auditing: access logging
logging.basicConfig(
    filename='data_access.log',
    level=logging.INFO,
    format='%(asctime)s - %(message)s',
    force=True  # replace handlers pre-installed by the notebook environment (Python 3.8+)
)

def log_access(user):
    logging.info(f'Dataset accessed by user: {user}')

log_access(user='data_analyst_01')
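
The introduction lists access control alongside auditing, but the cell above only records that an access happened. As a minimal sketch, the two can be combined so that every allow or deny decision is logged. The `ROLE_PERMISSIONS` mapping, the `audit_sketch` logger name, and the `access_dataset` function are hypothetical names introduced for illustration; a real deployment would likely delegate permissions to an identity provider or policy store.

```python
import logging

# Hypothetical role-to-permission mapping (an assumption for this sketch).
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
}

# Records propagate to whatever handlers are configured on the root logger,
# for example the file handler set up with logging.basicConfig.
audit_logger = logging.getLogger("audit_sketch")
audit_logger.setLevel(logging.INFO)


def access_dataset(user, role, action, dataset):
    """Return True if `role` may perform `action`; log the decision either way."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    outcome = "ALLOWED" if allowed else "DENIED"
    audit_logger.info(
        "user=%s role=%s action=%s dataset=%s outcome=%s",
        user, role, action, dataset, outcome,
    )
    return allowed


# Example calls: a permitted read and a rejected write.
can_read = access_dataset("data_analyst_01", "analyst", "read", "customers")
can_write = access_dataset("data_analyst_01", "analyst", "write", "customers")
```

Logging denied attempts as well as successful ones is a deliberate choice here: rejected requests are often the more interesting entries in an audit trail.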

In [17]:
# Sample data:
data = {
    'name': ['Alice', 'James', 'Morten'],
    'email': ['alice@email.com', 'jimmy@email.com', 'morten@email.com'],
    'age': [25, 32, 41],
    'country': ['US', 'UK', 'DK']
}

df = pd.DataFrame(data)

In [18]:
# Data minimization (removing unnecessary columns):
df = df[['name', 'email', 'country']]
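
Data minimization is easier to review when the retained columns are tied to a stated purpose rather than hard-coded at each call site. The sketch below assumes a hypothetical purpose-to-columns allow-list; the purpose names, the `ALLOWED_COLUMNS` mapping, and the `columns_for` helper are illustrative and not part of the original notebook.

```python
# Hypothetical allow-list: which columns each processing purpose may use.
ALLOWED_COLUMNS = {
    "regional_reporting": ["country"],
    "account_linking": ["email", "country"],
}


def columns_for(purpose, available_columns):
    """Return the allow-listed columns for `purpose` that exist in the data."""
    if purpose not in ALLOWED_COLUMNS:
        # Unknown purposes get no data at all rather than a default set.
        raise ValueError(f"No columns are approved for purpose: {purpose!r}")
    return [col for col in ALLOWED_COLUMNS[purpose] if col in available_columns]


available = ["name", "email", "age", "country"]
selected = columns_for("account_linking", available)
```

With a DataFrame, the result could be applied as `df[columns_for("account_linking", df.columns)]`.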

In [19]:
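# Pseudonymization (hashing PII). Note: an unsalted hash of an email can be
# reversed by hashing candidate addresses, so this is not full anonymization: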
def anonymize(value):
    return hashlib.sha256(value.encode()).hexdigest()

df['user_id'] = df['email'].apply(anonymize)
df = df.drop(columns=['name', 'email'])
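
A plain SHA-256 digest of an email address can be matched by anyone who hashes a list of candidate addresses. One common mitigation is a keyed hash (HMAC), sketched below with the standard library only. The `pseudonymize` function and `PSEUDONYM_KEY` are hypothetical names, and generating the key inline is an assumption made to keep the example runnable; in practice the key would be stored separately from the data.

```python
import hashlib
import hmac
import secrets

# Assumption: a real key would come from a secrets manager, not be
# generated next to the data it protects.
PSEUDONYM_KEY = secrets.token_bytes(32)


def pseudonymize(value, key=PSEUDONYM_KEY):
    """Return a keyed SHA-256 digest (HMAC) of a normalized string value."""
    # Normalize so that "Alice@Email.com " and "alice@email.com" map to one ID.
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


token = pseudonymize("alice@email.com")
```

The same input and key always yield the same identifier, so joins across tables still work, while someone without the key cannot rebuild the mapping from a list of known emails.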

In [20]:
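# Encryption (the key must be stored securely, e.g. in a secrets manager;
# if it is lost, the encrypted values cannot be recovered):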
key = Fernet.generate_key()
cipher = Fernet(key)

df['user_id_encrypted'] = df['user_id'].apply(
    lambda x: cipher.encrypt(x.encode()).decode()
)

df = df.drop(columns=['user_id'])
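
The encryption cell above generates a key but never persists it, so it helps to see what the key is needed for. The following sketch, which uses separate `demo_` names to avoid overwriting the notebook's `key` and `cipher`, shows a Fernet round trip and what happens when a different key is used. It assumes the `cryptography` package is installed, as in the imports cell.

```python
from cryptography.fernet import Fernet, InvalidToken

# Demo-only key; in practice it would be loaded from secure storage.
demo_key = Fernet.generate_key()
demo_cipher = Fernet(demo_key)

# Encrypt and then decrypt with the same key to recover the original value.
demo_token = demo_cipher.encrypt("alice@email.com".encode())
recovered = demo_cipher.decrypt(demo_token).decode()

# A cipher built from a different key cannot decrypt the token:
# Fernet authenticates the ciphertext and raises InvalidToken.
other_cipher = Fernet(Fernet.generate_key())
try:
    other_cipher.decrypt(demo_token)
    decrypted_with_wrong_key = True
except InvalidToken:
    decrypted_with_wrong_key = False
```

Because Fernet tokens include a timestamp and a random IV, encrypting the same value twice produces different tokens, so the encrypted column cannot be used as a join key.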

In [21]:
# Data is ready for Cloud Analytics (to be uploaded to BigQuery):
print('Secure, compliant dataset:')
print(df)

Secure, compliant dataset:
  country                                  user_id_encrypted
0      US  gAAAAABpWGvdMII--vfPl8w1we3VBqaUVgnxKPm5AuK48k...
1      UK  gAAAAABpWGvd3RlWJEeGnI2F3GocszrSixaskLKhjaLKtT...
2      DK  gAAAAABpWGvd0ESSbpmT8LDzMSoon0TBQJpw14jjKx714O...
