> /work/network_traffic_sample.csv

In [None]:
import pandas as pd

# Load the CSV file
file_path = '/work/network_traffic_sample.csv'
network_data = pd.read_csv(file_path)
network_data

It seems there was an issue with the file path or loading the file. Let me adjust and try again.

In [None]:
import pandas as pd

# Retry loading the CSV file using a relative path
file_path = 'network_traffic_sample.csv'
network_data = pd.read_csv(file_path)
network_data

The network traffic data has been successfully loaded. Let me know if you need any specific analysis or processing.
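The failed first attempt above came down to where the file actually lives. Here is a minimal sketch of one way to guard against that, assuming the file may sit at either of the two paths tried in this session. The helper name `first_existing` is hypothetical and is not part of pandas.

```python
from pathlib import Path

def first_existing(candidates):
    """Return the first candidate path that exists as a file, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    return None

# The two locations tried in this session
candidate_paths = ['/work/network_traffic_sample.csv', 'network_traffic_sample.csv']
csv_path = first_existing(candidate_paths)
if csv_path is None:
    print('File not found in any candidate location:', candidate_paths)
else:
    print('Loading from', csv_path)
```

If a path is found, it can be passed straight to `pd.read_csv`.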

> Visualize Protocol by BytesTransferred using a bar chart.

In [None]:
import matplotlib.pyplot as plt

# Group data by Protocol and sum BytesTransferred
protocol_data = network_data.groupby('Protocol')['BytesTransferred'].sum().sort_values()

# Plot the data
plt.figure(figsize=(10, 6))
protocol_data.plot(kind='bar', color='skyblue')
plt.title('Total Bytes Transferred by Protocol')
plt.xlabel('Protocol')
plt.ylabel('Bytes Transferred')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

The bar chart visualizing the total Bytes Transferred by Protocol has been displayed.

The bar chart compares data usage across various network protocols. HTTP shows the highest data transfer, followed closely by SSH and DNS, suggesting these protocols are heavily used for web access, secure communications, and domain name resolution. FTP and SMTP follow with moderate levels of data transfer, typically associated with file transfers and email communication. HTTPS, despite being the secure version of HTTP, records the lowest data transfer, possibly indicating either fewer secure transactions or a focus on unencrypted traffic in this dataset. The visualization highlights how different protocols contribute to overall network traffic.

In [None]:
# Check the data types of the columns to identify the issue
network_data.dtypes

In [None]:
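# Clean and parse the timestamps (leaving the original column untouched), then group and plot.
parsed_time = pd.to_datetime(network_data['Timestamp'].astype(str).str.replace('+AC0-', '-', regex=False))
time_protocol_data = network_data.groupby([parsed_time, 'Protocol'])['BytesTransferred'].sum().unstack()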

# Plot the data
time_protocol_data.plot(figsize=(12, 8), marker='o')
plt.title('Traffic Volume Over Time by Protocol')
plt.xlabel('Time')
plt.ylabel('Bytes Transferred')
plt.legend(title='Protocol')
plt.grid(True)
plt.tight_layout()
plt.show()

The line chart visualizing traffic volume over time by protocol has been successfully displayed.

The chart illustrates how network traffic volume (in bytes transferred) varies over time across six different protocols: DNS, FTP, HTTP, HTTPS, SMTP, and SSH. Each protocol shows fluctuating traffic levels, indicating dynamic usage patterns over time.

    HTTP and FTP frequently spike, suggesting bursts of activity, possibly from web browsing or file transfers.

    DNS and SMTP show relatively consistent activity but with occasional drops and peaks.

    HTTPS appears less frequent but has some high-volume moments, indicating sporadic but significant secure web activity.

    SSH traffic maintains moderate volume across the timeline, which may reflect consistent remote access or administrative actions.

The timestamp labels on the x-axis are densely packed, suggesting high-frequency logging, but could benefit from formatting or rotation for clarity.
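One way to thin out the densely packed x-axis is to aggregate the events into coarser time bins before plotting. The sketch below assumes a parsed datetime `Timestamp` column and uses a tiny made-up sample, not the real dataset. The 60-minute bin width is an arbitrary choice for illustration.

```python
import pandas as pd

# Tiny made-up sample for illustration only; not the real dataset
sample = pd.DataFrame({
    'Timestamp': pd.to_datetime([
        '2024-01-01 09:05:00', '2024-01-01 09:40:00',
        '2024-01-01 09:50:00', '2024-01-01 10:15:00',
    ]),
    'Protocol': ['HTTP', 'HTTP', 'DNS', 'HTTP'],
    'BytesTransferred': [100, 50, 10, 20],
})

# Bucket timestamps into 60-minute bins, then sum bytes per protocol in each bin
hourly = (
    sample
    .groupby([pd.Grouper(key='Timestamp', freq='60min'), 'Protocol'])['BytesTransferred']
    .sum()
    .unstack(fill_value=0)
)
print(hourly)
```

Calling `hourly.plot(...)` in place of the per-event frame would give one point per protocol per bin, so there are far fewer tick labels to render.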

In [None]:
# Count the occurrences of each Status (ALLOWED vs. BLOCKED)
status_counts = network_data['Status'].value_counts()

# Plot the pie chart
plt.figure(figsize=(8, 8))
status_counts.plot(kind='pie', autopct='%1.1f%%', startangle=90, colors=['lightgreen', 'salmon'])
plt.title('Proportion of Traffic Events: ALLOWED vs. BLOCKED')
plt.ylabel('')  # Remove the y-label for better aesthetics
plt.show()

The pie chart visualizing the proportion of traffic events (ALLOWED vs. BLOCKED) has been displayed. It indicates that 86.0% of events are ALLOWED and 14.0% are BLOCKED.
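The percentages shown on the pie chart can also be computed directly rather than read off the figure. This is a minimal sketch using made-up status values, assuming a column of `ALLOWED`/`BLOCKED` strings like the `Status` column used above.

```python
import pandas as pd

# Made-up status values for illustration only
status = pd.Series(['ALLOWED', 'ALLOWED', 'ALLOWED', 'BLOCKED'])

# normalize=True returns fractions instead of raw counts
status_pct = status.value_counts(normalize=True).mul(100).round(1)
print(status_pct)
```

Applying the same call to `network_data['Status']` would give the figures to quote alongside the chart.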

In [None]:
# Filter data for BLOCKED traffic and group by SourceIP to count occurrences
blocked_traffic = network_data[network_data['Status'] == 'BLOCKED']
top_blocked_ips = blocked_traffic['SourceIP'].value_counts().head(5)

# Create a DataFrame for better visualization
top_blocked_ips_df = top_blocked_ips.reset_index()
top_blocked_ips_df.columns = ['SourceIP', 'BlockedCount']
top_blocked_ips_df

The table displaying the top 5 Source IPs by BLOCKED traffic has been created. Let me know if you need further analysis or adjustments.

In [None]:
import matplotlib.pyplot as plt

# Group data by Country and sum BytesTransferred
country_traffic = network_data.groupby('Country')['BytesTransferred'].sum().sort_values(ascending=False)

# Plot the data
plt.figure(figsize=(12, 8))
country_traffic.plot(kind='bar', color='orange')
plt.title('Total Bytes Transferred by Country')
plt.xlabel('Country')
plt.ylabel('Bytes Transferred')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

The bar chart visualizing the total Bytes Transferred by Country has been displayed. Let me know if you need further analysis or adjustments.

In [None]:
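# Fix the timestamp format issue: '+AC0-' is the UTF-7 encoding of '-'
# regex=False treats the pattern literally (a leading '+' is not a valid regular expression)
network_data['Timestamp'] = network_data['Timestamp'].str.replace('+AC0-', '-', regex=False)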

# Convert 'Timestamp' to datetime and extract hour
network_data['Hour'] = pd.to_datetime(network_data['Timestamp']).dt.hour

# Group by hour and sum BytesTransferred
traffic_by_hour = network_data.groupby('Hour')['BytesTransferred'].sum()

# Plot the data
plt.figure(figsize=(12, 6))
traffic_by_hour.plot(kind='bar', color='purple')
plt.title('Traffic Volume by Hour of Day')
plt.xlabel('Hour of Day')
plt.ylabel('Bytes Transferred')
plt.xticks(rotation=0)
plt.grid(True)
plt.tight_layout()
plt.show()

The analysis of traffic events by time of day has been visualized, identifying peak usage hours.

The bar chart shows traffic volume by hour of day. Usage peaks at 9 AM, 10 AM, and 11 AM, each with around 20,000 bytes transferred. Traffic is lower at 8 AM (approximately 15,000 bytes) and 12 PM (around 18,000 bytes).
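The `+AC0-` sequence replaced in the timestamp-cleaning cell above is how UTF-7 encodes a hyphen. The sketch below illustrates this with a hypothetical raw value; the actual contents of the `Timestamp` column are not shown in this session.

```python
# Hypothetical raw value showing the kind of debris seen in the Timestamp column
raw_timestamp = '2024+AC0-01+AC0-15 09:30:00'

# Decoding as UTF-7 turns '+AC0-' back into '-'
decoded = raw_timestamp.encode('ascii').decode('utf-7')

# The targeted replacement used in the cell above gives the same result here
replaced = raw_timestamp.replace('+AC0-', '-')

print(decoded)
print(replaced == decoded)
```

A full UTF-7 decode would also reinterpret any other `+` in the value (a timezone offset, for example), so the targeted replacement is likely the safer choice when hyphens are the only affected character.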