# Module 10 Part 3: Data Privacy and Security

# Introduction

With the distributed nature of big data systems, it has become more challenging to maintain the privacy and security of user data. In this notebook, we will look at some of the legislation governing data usage and storage. It is the responsibility of the company or organization holding or processing others' data to follow all applicable privacy rules and regulations.

This module consists of 3 parts:

- **Part 1** - Databases and SQL Basics
- **Part 2** - Tools for Moderately Sized Datasets
- **Part 3** - Data Privacy and Security

Each part is provided in a separate notebook file. The notebooks can be reviewed in any order.

# Learning Outcomes  

In this notebook, learners will:

* Understand the difference between privacy and security
* Examine data privacy requirements and regulations
* Be introduced to risk mitigation strategies

# Readings and Resources

We invite you to further supplement this notebook with the following recommended texts/resources:

- CSA (Cloud Security Alliance). (2013). CSA Releases the Expanded Top Ten Big Data Security & Privacy Challenges. https://cloudsecurityalliance.org/articles/csa-releases-the-expanded-top-ten-big-data-security-privacy-challenges/


- Government of Canada. (2017). Canada’s Anti-Spam Legislation (CASL). http://www.fightspam.gc.ca/eic/site/030.nsf/eng/home


- Government of Canada Justice Laws Website. (2019). PIPEDA (Personal Information Protection and Electronic Documents Act). https://laws-lois.justice.gc.ca/eng/acts/P-8.6/page-1.html


- Office of the Information and Privacy Commissioner of Alberta. (2019). PIPA (Personal Information Protection Act). https://www.oipc.ab.ca/legislation/pipa.aspx


- Office of the Privacy Commissioner of Canada. (2018). The Privacy Act. https://www.priv.gc.ca/en/privacy-topics/privacy-laws-in-canada/the-privacy-act/


- Queen’s Printer, British Columbia. (2019). Freedom of Information and Protection of Privacy Act. http://www.bclaws.ca/EPLibraries/bclaws_new/document/ID/freeside/96165_00


- Mather, T., Kumaraswamy, S., & Latif, S. (2009). Cloud Security and Privacy: An Enterprise Perspective on Risks and Compliance. O’Reilly Media Inc.


- McMillan LLP. (2015). Bell Gets a Bad Rap for its RAP (Relevant Advertising Program). https://mcmillan.ca/insights/bell-gets-a-bad-rap-for-its-rap-relevant-advertising-program/


- Andrew Henderson. (2019). The 10 steps to achieving a data privacy compliance framework.  https://insights.redflaggroup.com/articles/10-steps-to-achieving-a-data-privacy-compliance-framework


- Juliana De Groot. (2019). What is the General Data Protection Regulation? Understanding & Complying with GDPR Requirements in 2019. https://digitalguardian.com/blog/what-gdpr-general-data-protection-regulation-understanding-and-complying-gdpr-data-protection

<h1>Table of Contents<span class="tocSkip"></span></h1>
<br>
<div class="toc">
<ul class="toc-item">
<li><span><a href="#Module-10-Part-3:-Data-Privacy-and-Security" data-toc-modified-id="Module-10-Part-3:-Data-Privacy-and-Security">Module 10 Part 3: Data Privacy and Security</a></span>
</li>
<li><span><a href="#Introduction" data-toc-modified-id="Introduction">Introduction</a></span>
</li>
<li><span><a href="#Learning-Outcomes" data-toc-modified-id="Learning-Outcomes">Learning Outcomes</a></span>
</li>
<li><span><a href="#Readings-and-Resources" data-toc-modified-id="Readings-and-Resources">Readings and Resources</a></span>
</li>
<li><span><a href="#Table-of-Contents" data-toc-modified-id="Table-of-Contents">Table of Contents</a></span>
<ul class="toc-item">
<li><span><a href="#What-Does-Privacy-and-Security-for-Big-Data-Mean?" data-toc-modified-id="What-Does-Privacy-and-Security-for-Big-Data-Mean?">What Does Privacy and Security for Big Data Mean?</a></span>
</li>
<li><span><a href="#Privacy-and-Big-Data" data-toc-modified-id="Privacy-and-Big-Data">Privacy and Big Data</a></span>
</li>
<li><span><a href="#Security-and-Big-Data" data-toc-modified-id="Security-and-Big-Data">Security and Big Data</a></span>
</li>
<li><span><a href="#Top-10-Security-and-Privacy-Challenges-in-Big-Data-Ecosystem" data-toc-modified-id="Top-10-Security-and-Privacy-Challenges-in-Big-Data-Ecosystem">Top 10 Security and Privacy Challenges in Big Data Ecosystem</a></span>
</li>
<li><span><a href="#Security-and-Privacy-Risk" data-toc-modified-id="Security-and-Privacy-Risk">Security and Privacy Risk</a></span>
</li>
<li><span><a href="#Privacy-Regulations-in-Canada" data-toc-modified-id="Privacy-Regulations-in-Canada">Privacy Regulations in Canada</a></span>
</li>
</ul>
</li>
<li><span><a href="#Risk-Mitigation-Strategies" data-toc-modified-id="Risk-Mitigation-Strategies">Risk Mitigation Strategies</a></span>
<ul class="toc-item">
<li><span><a href="#Governance" data-toc-modified-id="Governance">Governance</a></span>
</li>
</ul>
</li>
<li><span><a href="#References" data-toc-modified-id="References">References</a></span>
</li>
</ul>
</div>

## What Does Privacy and Security for Big Data Mean?

**Data privacy** focuses on the use and governance of individuals’ data. An example is setting up policies to ensure that consumers’ personal information is collected, shared and used in appropriate ways. Different aspects of privacy that must be considered include:

- What data should be collected?
- What are the permissible uses of the data?
- With whom might it be shared?
- How long should the data be retained?
- What granular access control model is appropriate?
- The consumer’s right to safeguard their information from any other party, and the laws and regulations requiring companies to protect data (Priyank et al., 2016)
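The questions above can be captured in a machine-readable policy record, so that code can check a proposed use or a retention deadline before touching the data. The following is a minimal sketch under stated assumptions: the `DataPolicy` class, its field names, and the example values are hypothetical and do not come from any regulation or library.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical policy record for one category of personal data."""
    category: str               # what data is collected
    permitted_uses: frozenset   # permissible uses of the data
    shared_with: frozenset      # parties the data may be shared with
    retention_days: int         # how long the data may be retained

    def allows_use(self, purpose: str) -> bool:
        """Return True only if the purpose was explicitly permitted."""
        return purpose in self.permitted_uses

    def is_expired(self, collected_on: date, today: date) -> bool:
        """Return True once the retention period has passed."""
        return today > collected_on + timedelta(days=self.retention_days)


# Illustrative values only
email_policy = DataPolicy(
    category="email_address",
    permitted_uses=frozenset({"order_confirmation", "account_recovery"}),
    shared_with=frozenset({"payment_processor"}),
    retention_days=365,
)

print(email_policy.allows_use("order_confirmation"))  # True
print(email_policy.allows_use("marketing"))           # False
print(email_policy.is_expired(date(2023, 1, 1), date(2024, 6, 1)))  # True
```

Keeping the permitted uses as an explicit allow-list means any purpose that was never agreed to is rejected by default.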

**Security** concentrates on protecting data from malicious attacks and the misuse of data for profit. While security is fundamental for protecting data, it’s not sufficient for addressing privacy. Different aspects of security that must be considered include:

- Technical means to protect the data
- Protection against unauthorized access
- Confidentiality
- Integrity and availability of data
- Security controls that are put in place to manage who can access information
- Ensuring the security system protects the enterprise/agency and that the security measures instill confidence that decisions are respected
- Awareness that it is difficult to have good privacy practices without proper data security

## Privacy and Big Data

In the big data world, it becomes important to protect the privacy of individuals from composable design risk, as well as general disclosure. Composability is a system design principle which focuses on building applications using different components. Since data is stored and maintained across many components, data security becomes more difficult.

When analyzing data, it is essential to be transparent: confirm that you have obtained the user's consent, and adhere to the agreed-upon uses of the data. Big data systems face heightened privacy risks because of the granularity of the data they hold and the difficulty of identifying personally identifiable information (PII). These characteristics make big data systems a target for identity thieves and other malicious actors.

Business analysts work with many different aspects of big data systems. Consider an e-commerce website and the challenge of identifying PII in its data. There can be a huge amount of data related to customer orders, transactions, locations, and so forth. Some analyses, such as assessing whether a user is creditworthy, are difficult to carry out without PII. Any time PII is exposed, there is a risk that it might be leaked.
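To illustrate why identifying PII is hard, the sketch below scans a single order record for values that look like an email address or a phone number. It is a minimal example, not a complete detector: the `PII_PATTERNS` dictionary, the `find_pii` function, and the sample record are all hypothetical, and the regular expressions are deliberately simple.

```python
import re

# Deliberately simple, illustrative patterns; real PII detection needs far more care.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(record: dict) -> dict:
    """Return {field_name: [pii_type, ...]} for fields whose values match a pattern."""
    hits = {}
    for field, value in record.items():
        matched = [name for name, pattern in PII_PATTERNS.items()
                   if pattern.search(str(value))]
        if matched:
            hits[field] = matched
    return hits


# Hypothetical order record
order = {
    "order_id": 10482,
    "item": "USB cable",
    "notes": "Call 416-555-0199 before delivery",
    "contact": "jane.doe@example.com",
}

print(find_pii(order))  # {'notes': ['phone'], 'contact': ['email']}
```

Note that the phone number sits in a free-text `notes` field rather than in a dedicated column, which is one reason PII can be difficult to locate in large, loosely structured datasets.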

A recent example of an alleged data breach involved Facebook and Cambridge Analytica, in which Cambridge Analytica acquired the profile data of around 87 million users from a U.K.-based professor.

Recent legislation such as the General Data Protection Regulation (GDPR) is gaining importance. GDPR is the primary law regulating how companies protect the personal data of EU citizens.

## Security and Big Data

After data analysis is complete, an organization must protect the intellectual capital it has derived from the analytics. This derived data, which is likely more vulnerable and less controlled than data in its native repositories, becomes a risk. When information assets are concentrated, the fiduciary and due diligence risks are higher than normally encountered. New, open source and possibly less reliable technologies increase the risk of loss or non-availability.

## Top 10 Security and Privacy Challenges in Big Data Ecosystem

The Cloud Security Alliance (CSA)’s Big Data Working Group published its top 10 security and privacy challenges in 2013. They are:

1. Secure computations in distributed programming frameworks
2. Security practices for non-relational data stores
3. Secure data storage and transaction logs
4. End-point input validation/filtering
5. Real-time security monitoring
6. Scalable and composable privacy-preserving data mining and analytics
7. Cryptographically enforced data centric security
8. Granular access control
9. Granular audits
10. Data provenance

**Source**: CSA, 2013

As you can see from the list above, the top challenges revolve around carrying out computations in distributed programming frameworks in a secure way. Tools such as relational databases have been on the market for decades. Because they are traditionally non-distributed, handling security for them is comparatively easy, whereas for non-relational data stores, where data is distributed, it becomes a challenge. Securing data storage and transaction logs, end-point input validation/filtering, and monitoring are further challenges. Data mining and analytics require data from various sources, so preserving the privacy of that data in a scalable manner is another challenge. Using cryptographic tools to secure data, enforcing granular access controls, and keeping detailed audits are extremely important. Data provenance, which refers to keeping a historical record of data and its origins, is important because it makes it possible to replay any dataflow that uses the data.
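As an illustration of data provenance, the sketch below keeps a simple log of processing steps in which each entry stores a hash of its own contents together with the hash of the previous entry, so that later changes to the history can be detected. This is a minimal sketch under stated assumptions rather than a production design: the function names `add_step` and `verify`, the entry fields, and the file and step names are all hypothetical.

```python
import hashlib
import json


def _digest(step: str, source: str, prev_hash: str) -> str:
    """Hash the contents of one provenance entry."""
    body = {"step": step, "source": source, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def add_step(log: list, step: str, source: str) -> None:
    """Append a provenance entry linked to the previous entry by its hash."""
    prev_hash = log[-1]["hash"] if log else ""
    log.append({
        "step": step,
        "source": source,
        "prev_hash": prev_hash,
        "hash": _digest(step, source, prev_hash),
    })


def verify(log: list) -> bool:
    """Recompute every hash; return False if any entry was altered or reordered."""
    prev_hash = ""
    for entry in log:
        expected = _digest(entry["step"], entry["source"], prev_hash)
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical dataflow: each step names the step or file it read from
log = []
add_step(log, "ingest", "orders.csv")
add_step(log, "remove_duplicates", "ingest")
add_step(log, "aggregate_by_region", "remove_duplicates")

print(verify(log))          # True
log[1]["source"] = "other"  # tamper with the recorded history
print(verify(log))          # False
```

Because the log records each step and its source in order, it could also serve as the starting point for replaying the dataflow, although replay itself is not shown here.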

The diagram below shows how these challenges can be categorized.

![Top10ChallengesClassification.png](attachment:Top10ChallengesClassification.png)

**Source**: CSA, 2013

## Security and Privacy Risk

Information security, privacy, and internal controls are key considerations in the management of organizations due to financial, regulatory, reputational, and compliance issues. With respect to big data, all the standard risks and concerns apply, as well as additional ones arising from the concentration of data, organizational and technological factors, and the newness of the processes.

Security and privacy risk can arise in:

- Networking and communications
- Servers and operating systems
- Databases
- Applications
- Mobile devices
- Electronic data interchange (EDI) and electronic funds transfer (EFT) 
- Desktops/endpoints
- Provisioning and identity and access management (IAM)
- Logging
- Policies, procedures and enforcement
- Organizational design and segregation of duties
- Training and awareness
- Physical security

## Privacy Regulations in Canada

CASL (Canada’s Anti-Spam Legislation) regulates commercial electronic messages. It requires senders to obtain the recipient’s consent before sending such messages. It has no automatic penalties, but the CRTC (Canadian Radio-television and Telecommunications Commission) can enforce it with fines of up to one million dollars for individuals and ten million dollars for businesses. Because it is a Canadian law, it is difficult in practice to enforce against spammers operating outside Canada.
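The consent requirement described above can be sketched as a filter applied before any message is sent. This is a simplified illustration only: the `Contact` class, the `has_consented` flag, and the `recipients_with_consent` function are hypothetical, and modelling consent as a single boolean is an assumption that leaves out how and when the consent was obtained.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    email: str
    has_consented: bool  # hypothetical flag recorded when consent was obtained


def recipients_with_consent(contacts: list) -> list:
    """Return only the addresses whose owners have recorded consent."""
    return [c.email for c in contacts if c.has_consented]


# Hypothetical mailing list
contacts = [
    Contact("a@example.com", True),
    Contact("b@example.com", False),
    Contact("c@example.com", True),
]

print(recipients_with_consent(contacts))  # ['a@example.com', 'c@example.com']
```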

Some examples of privacy regulations are listed below:

- Privacy Act
- PIPEDA
- PIPA (Alberta, British Columbia)
- Act Respecting the Protection of Personal Information in the Private Sector (Quebec)
- Personal Health Information Protection Act (Ontario)

# Risk Mitigation Strategies

Risk mitigation strategies involve the application of appropriate governance frameworks, internal control frameworks, and internal controls. These strategies focus on processes that include, but are not limited to:

- Policies and procedures
- Information security and privacy technology
- Design for information security and privacy
- Monitoring, audit and continuous auditing
- Organizational design and allocation of duties
- Intelligent hiring, training and enforcement

## Governance

> Data governance (DG) is the overall management of the availability, usability, integrity and security of data used in an enterprise. A sound data governance program includes a governing body or council, a defined set of procedures and a plan to execute those procedures. (TechTarget, 2019)

Governance provides a structure or framework through which:

- The objectives of the organization are set
- The means of attaining these objectives are determined
- Performance measurement and monitoring are prescribed

Governance processes implement governance objectives. Some governance standards and legislation are listed below:

- Standards set by the Organisation for Economic Co-operation and Development (OECD)
- Sarbanes-Oxley Act (SOX/SOA)
- Canada's Bill 198 (C-SoX)
- Enterprise Risk Management Integrated Framework, Committee of Sponsoring Organizations (COSO) of the Treadway Commission
- Control Objectives for Information and Related Technologies (COBIT)

For example, COBIT is an IT management framework developed by ISACA to help businesses develop, organize and implement strategies around information management and governance (White, 2019).

**End of Module**

You have reached the end of this module.

If you have any questions, please reach out to your peers using the discussion boards. If you and your peers are unable to come to a suitable conclusion, do not hesitate to reach out to your instructor on the designated discussion board.

# References

- CSA (Cloud Security Alliance). (2013). CSA Releases the Expanded Top Ten Big Data Security & Privacy Challenges. https://cloudsecurityalliance.org/articles/csa-releases-the-expanded-top-ten-big-data-security-privacy-challenges/


- McMillan LLP. (2015). Bell Gets a Bad Rap for its RAP (Relevant Advertising Program). https://mcmillan.ca/insights/bell-gets-a-bad-rap-for-its-rap-relevant-advertising-program/


- Priyank J., Manasi G. & Nilay K. (2016). Big data privacy: a technological perspective and review. Journal of Big Data, 3(25). https://doi.org/10.1186/s40537-016-0059-y


- TechTarget. (2019). What is data governance and why does it matter? https://searchdatamanagement.techtarget.com/definition/data-governance


- White, S. K. (2019). What is COBIT? A framework for alignment and governance. https://www.cio.com/article/3243684/methodology-frameworks/what-is-cobit-a-framework-for-alignment-and-governance.html