# **Introduction to Cybersecurity and Data Science**
Data Science and Cybersecurity are distinct but interconnected fields. Data Science focuses on collecting, analyzing, and interpreting large datasets, while Cybersecurity protects systems, networks, and data from threats. The two meet in anomaly detection: spotting unusual patterns that may signal an attack.

Common cyber threats include malware, phishing, denial-of-service (DoS) attacks, and ransomware. Defenses include encryption, firewalls, antivirus tools, backups, regular updates, monitoring, and user training. Data analysis supports these defenses by helping to identify, predict, and prevent attacks, which depends on careful data collection, cleaning, visualization, and interpretation.

Real incidents such as WannaCry, Equifax, and SolarWinds show the consequences of poor security. Effective incident response involves identifying and containing the attack, assessing the damage, eradicating the threat, recovering systems, and communicating transparently. In Brazil, incidents involving personal data must be reported to the ANPD under the LGPD. Ultimately, Data Science enhances threat detection, while Cybersecurity protects the data being analyzed, and together they support business continuity.

In [1]:
# Importing all the necessary libraries and resources:
import numpy as np
from sklearn.ensemble import IsolationForest
import hashlib

## **Anomaly Detection with Z-score**

In [2]:
# Example network traffic data (requests per minute):
traffic = np.array([100, 102, 98, 105, 110, 99, 500])  # 500 is abnormal
mean = np.mean(traffic)
std = np.std(traffic)

# Calculating Z-scores:
z_scores = (traffic - mean) / std

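# Mark anomalies. With n samples and the population standard deviation,
# |z| can never exceed sqrt(n - 1) (about 2.45 for n = 7), so the usual
# |z| > 3 rule cannot flag anything in a sample this small. Use 2 instead:
threshold = 2
anomalies = np.where(np.abs(z_scores) > threshold)

print('Z-scores:', z_scores)
print('Anomalies at positions:', anomalies)
print('Anomalous values:', traffic[anomalies])

Z-scores: [-0.42485746 -0.4104903  -0.43922462 -0.38893956 -0.35302166 -0.43204104
  2.44857465]
Anomalies at positions: (array([6]),)
Anomalous values: [500]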
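
A z-score built on the mean and standard deviation is sensitive to the very outliers it is meant to find: an extreme value pulls the mean toward itself and inflates the standard deviation, which shrinks its own score. One common alternative is the modified z-score, which uses the median and the median absolute deviation (MAD) instead. The sketch below reuses the same traffic sample; the 0.6745 scaling constant and the 3.5 cutoff are conventional choices and are assumptions here, not values taken from this notebook.

```python
import numpy as np

# Same illustrative traffic sample (requests per minute).
traffic = np.array([100, 102, 98, 105, 110, 99, 500])

median = np.median(traffic)
# Median absolute deviation: a spread estimate that a single extreme value barely moves.
mad = np.median(np.abs(traffic - median))

# 0.6745 rescales the MAD so the score is comparable to a standard z-score
# for normally distributed data. The 3.5 cutoff is a convention, not a rule.
modified_z = 0.6745 * (traffic - median) / mad
robust_anomalies = traffic[np.abs(modified_z) > 3.5]

print('Modified z-scores:', np.round(modified_z, 2))
print('Anomalous values:', robust_anomalies)
```

Because the median and MAD are computed from the middle of the data, the value 500 no longer hides itself, and it is the only value that crosses the cutoff.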


## **Detecting Anomalies Using Isolation Forest**

In [3]:
# Example dataset (In this case, login attempts per user per hour):
data = np.array([[10], [12], [11], [13], [15], [300]]) # 300 is suspicious

model = IsolationForest(contamination=0.1, random_state=42)
model.fit(data)

predictions = model.predict(data)

# -1 = anomaly, 1 = normal:
anomalies = data[predictions == -1]

print('Predictions:', predictions)
print('Detected anomalies:', anomalies)

Predictions: [ 1  1  1  1  1 -1]
Detected anomalies: [[300]]
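
The `predict` labels hide how strongly each point was flagged. As a hedged illustration, the sketch below refits the same model and prints the underlying scores using scikit-learn's `score_samples` and `decision_function` methods. The variable names `scores`, `margins`, and `most_anomalous` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Same illustrative dataset: login attempts per user per hour.
data = np.array([[10], [12], [11], [13], [15], [300]])

model = IsolationForest(contamination=0.1, random_state=42)
model.fit(data)

# score_samples: the lower the score, the easier the point was to isolate.
scores = model.score_samples(data)
# decision_function: the same score shifted by the fitted offset, so
# negative values are exactly the points that predict() labels -1.
margins = model.decision_function(data)

for value, score, margin in zip(data.ravel(), scores, margins):
    print(f'{value:>4}  score={score:.3f}  margin={margin:.3f}')

# The point with the lowest score is the strongest anomaly candidate.
most_anomalous = data[np.argmin(scores)]
print('Most anomalous sample:', most_anomalous)
```

Ranking by score is useful when the `contamination` value is only a guess, since it lets an analyst review the top candidates rather than rely on a fixed cutoff.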


## **Hashing for Data Integrity**

In [4]:
def hash_file_contents(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()

data = 'Sensitive system configuration'
hash_value = hash_file_contents(data)

print('SHA-256 hash:', hash_value)

SHA-256 hash: 24de032096616f57b69ce0b10ac357c95e240d4f7dc253cda16a70d5ce906f12
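
A hash only helps with integrity once it is compared against a trusted reference value. The sketch below shows that comparison step under simple assumptions: the helper names `sha256_hex` and `is_unmodified` are hypothetical, and the "tampered" string is an invented example.

```python
import hashlib
import hmac


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as hexadecimal text."""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


def is_unmodified(text: str, expected_hash: str) -> bool:
    """Check content against a previously recorded reference hash."""
    # compare_digest performs a constant-time comparison of the two digests.
    return hmac.compare_digest(sha256_hex(text), expected_hash)


original = 'Sensitive system configuration'
reference_hash = sha256_hex(original)  # recorded while the content is trusted

# Invented example of a modified copy: one extra character.
tampered = 'Sensitive system configuration!'

print('Original intact:', is_unmodified(original, reference_hash))
print('Tampered intact:', is_unmodified(tampered, reference_hash))
```

Even a one-character change produces a completely different digest, so the second check fails. The reference hash itself must be stored somewhere an attacker cannot rewrite it.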


## **Basic Example of Simulated Login Authentication**

In [5]:
def hash_password(password: str) -> str:
    # SHA-256 is a one-way hash, not encryption: the digest cannot be decrypted.
    # Demo only: real systems should use a salted, deliberately slow algorithm.
    return hashlib.sha256(password.encode()).hexdigest()

# Stored password hash:
stored_hash = hash_password('MySecurePassword123')

# User attempt:
user_input = 'MySecurePassword123'
input_hash = hash_password(user_input)

if input_hash == stored_hash:
    print('Access granted.')
else:
    print('Access denied.')

Access granted.
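
A plain SHA-256 digest is fast to compute and identical for identical passwords, which makes stolen hashes cheap to brute-force. A common hardening step is a per-user random salt combined with a slow key-derivation function. The sketch below uses `hashlib.pbkdf2_hmac` from the standard library. The function names and the iteration count are assumptions chosen for illustration, not a recommendation from this notebook.

```python
import hashlib
import hmac
import os

# Assumed work factor for this sketch; tune it for your own hardware and policy.
ITERATIONS = 200_000


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a key from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, ITERATIONS)


def register(password: str):
    """Create a random salt and return (salt, derived key) for storage."""
    salt = os.urandom(16)
    return salt, derive_key(password, salt)


def verify(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key from the attempt and compare in constant time."""
    return hmac.compare_digest(derive_key(password, salt), stored_key)


salt, stored_key = register('MySecurePassword123')

print('Correct password:', verify('MySecurePassword123', salt, stored_key))
print('Wrong password:', verify('guess123', salt, stored_key))
```

The salt is stored next to the derived key; it is not secret, but it ensures that two users with the same password end up with different stored values.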


## **Log Monitoring for Suspicious IPs**

In [6]:
logs = [
    {'ip': '192.168.0.10', 'status': 200},
    {'ip': '192.168.0.10', 'status': 403},
    {'ip': '10.0.0.20', 'status': 403},
    {'ip': '10.0.0.20', 'status': 403},
]

# Flagging IPs with repeated failures:
threshold = 2
failed_logins = {}

for log in logs:
    if log['status'] == 403:  # 403 Forbidden: the server refused the request
        ip = log['ip']
        failed_logins[ip] = failed_logins.get(ip, 0) + 1

suspicious_ips = [ip for ip, count in failed_logins.items() if count >= threshold]

print('Suspicious IPs:', suspicious_ips)

Suspicious IPs: ['10.0.0.20']
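
The same tally can be written more compactly with `collections.Counter`. The sketch below also counts 401 (Unauthorized) responses as failures, which is an assumption added for illustration; the original cell counts only 403. The constant names are hypothetical.

```python
from collections import Counter

# Same illustrative log entries as in the cell above.
logs = [
    {'ip': '192.168.0.10', 'status': 200},
    {'ip': '192.168.0.10', 'status': 403},
    {'ip': '10.0.0.20', 'status': 403},
    {'ip': '10.0.0.20', 'status': 403},
]

# Assumption: treat both 401 and 403 responses as failed access attempts.
FAILURE_STATUSES = {401, 403}
THRESHOLD = 2

# Count failed requests per source IP in a single pass.
failures = Counter(
    entry['ip'] for entry in logs if entry['status'] in FAILURE_STATUSES
)

# Keep the IPs that reached the threshold, sorted for stable output.
suspicious = sorted(ip for ip, count in failures.items() if count >= THRESHOLD)

print('Failures per IP:', dict(failures))
print('Suspicious IPs:', suspicious)
```

A fixed count over the whole log is only a starting point; production monitoring usually counts failures within a time window, which would need timestamps that this sample data does not include.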
