In [13]:
import pandas as pd

df = pd.read_csv("group05_http_input.csv")
df.head(10)

   msg_id  app_protocol         src_app         dst_app          message  timestamp
0       1          HTTP  client_browser      web_server  GET /index.html      0.001
1       2          HTTP  client_browser      web_server       GET /login      0.003
2       3          HTTP  client_browser      web_server      POST /login      0.005
3       4          HTTP      web_server  client_browser           200 OK      0.007
4       5          HTTP      web_server  client_browser     302 Redirect      0.009
5       6          HTTP  client_browser      web_server  GET /index.html      0.011
6       7          HTTP  client_browser      web_server       GET /login      0.013
7       8          HTTP  client_browser      web_server      POST /login      0.015
8       9          HTTP      web_server  client_browser           200 OK      0.017
9      10          HTTP      web_server  client_browser     302 Redirect      0.019


In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   msg_id        100 non-null    int64  
 1   app_protocol  100 non-null    object 
 2   src_app       100 non-null    object 
 3   dst_app       100 non-null    object 
 4   message       100 non-null    object 
 5   timestamp     100 non-null    float64
dtypes: float64(1), int64(1), object(4)
memory usage: 4.8+ KB


In [24]:
df.describe(include="all")

           msg_id  app_protocol         src_app     dst_app          message  timestamp
count       100.0           100             100         100              100      100.0
unique        NaN             1               2           2                5        NaN
top           NaN          HTTP  client_browser  web_server  GET /index.html        NaN
freq          NaN           100              60          60               20        NaN
mean         50.5           NaN             NaN         NaN              NaN        0.1
std     29.011492           NaN             NaN         NaN              NaN   0.058023
min           1.0           NaN             NaN         NaN              NaN      0.001
25%         25.75           NaN             NaN         NaN              NaN     0.0505
50%          50.5           NaN             NaN         NaN              NaN        0.1
75%         75.25           NaN             NaN         NaN              NaN     0.1495


In [25]:
df.isnull().sum()

msg_id          0
app_protocol    0
src_app         0
dst_app         0
message         0
timestamp       0
dtype: int64

We verified the total number of HTTP messages in the dataset.

In [26]:
len(df)

100

In [27]:
df['timestamp'].describe()

count    100.000000
mean       0.100000
std        0.058023
min        0.001000
25%        0.050500
50%        0.100000
75%        0.149500
max        0.199000
Name: timestamp, dtype: float64

We confirmed that every record uses HTTP and analyzed the direction of
communication between client and server to confirm that traffic flows both ways.

In [28]:
df['app_protocol'].value_counts()

app_protocol
HTTP    100
Name: count, dtype: int64

In [29]:
df[['src_app', 'dst_app']].value_counts()

src_app         dst_app       
client_browser  web_server        60
web_server      client_browser    40
Name: count, dtype: int64
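
The direction check can also be expressed as an explicit test instead of reading the counts by eye. The following is a minimal sketch, assuming a small hand-built sample that mirrors the first five rows of the log rather than the full CSV; `sample`, `pairs`, `bidirectional`, and `shares` are illustrative names that do not appear in the original notebook.

```python
import pandas as pd

# Hand-built sample mirroring the first five rows of the log (not the full CSV).
sample = pd.DataFrame({
    "src_app": ["client_browser", "client_browser", "client_browser",
                "web_server", "web_server"],
    "dst_app": ["web_server", "web_server", "web_server",
                "client_browser", "client_browser"],
})

# Collect the distinct (source, destination) pairs seen in the sample.
pairs = set(zip(sample["src_app"], sample["dst_app"]))

# Traffic is bidirectional if the reverse of every observed pair also appears.
bidirectional = all((dst, src) in pairs for src, dst in pairs)

# Fraction of messages travelling in each direction.
shares = sample[["src_app", "dst_app"]].value_counts(normalize=True)

print(bidirectional)
print(shares)
```

On the full dataset the same `value_counts(normalize=True)` call would give the share of client-to-server and server-to-client messages directly.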

We verified that the timestamps never decrease from one row to the next,
which is consistent with a chronologically ordered capture.

In [30]:
df['timestamp'].is_monotonic_increasing

True
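
`is_monotonic_increasing` allows ties, so a stricter check may be useful when duplicate timestamps would indicate a problem. The sketch below, again on a hand-built sample of the first five rows rather than the real file, also computes the gap between consecutive messages; `sample`, `strictly_ordered`, and `gaps` are illustrative names.

```python
import pandas as pd

# Hand-built sample mirroring the first five rows of the log (not the full CSV).
sample = pd.DataFrame({
    "msg_id": [1, 2, 3, 4, 5],
    "timestamp": [0.001, 0.003, 0.005, 0.007, 0.009],
})

# True when each timestamp is >= the previous one (ties are allowed).
ordered = bool(sample["timestamp"].is_monotonic_increasing)

# True only when the timestamps are ordered and contain no duplicates.
strictly_ordered = ordered and bool(sample["timestamp"].is_unique)

# Gap between consecutive messages; the first row has no predecessor,
# so its NaN difference is dropped.
gaps = sample["timestamp"].diff().dropna()

print(ordered, strictly_ordered)
print(gaps.describe())
```

A constant gap, as in this sample, suggests evenly spaced synthetic messages rather than a capture of real traffic.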

In [31]:
df['message'].value_counts()

message
GET /index.html    20
GET /login         20
POST /login        20
200 OK             20
302 Redirect       20
Name: count, dtype: int64

In [33]:
pd.crosstab(df['src_app'], df['message'])

message         200 OK  302 Redirect  GET /index.html  GET /login  POST /login
src_app
client_browser       0             0               20          20           20
web_server          20            20                0           0            0
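
The crosstab shows that requests originate only from the client and status lines only from the server. That property can be tested directly. The sketch below assumes that any message starting with a three-digit status code is a response and everything else is a request; this rule, the `kind` column, and the names `sample`, `roles`, and `consistent` are assumptions made for illustration.

```python
import pandas as pd

# Hand-built sample mirroring the first five rows of the log (not the full CSV).
sample = pd.DataFrame({
    "src_app": ["client_browser", "client_browser", "client_browser",
                "web_server", "web_server"],
    "message": ["GET /index.html", "GET /login", "POST /login",
                "200 OK", "302 Redirect"],
})

# Assumption: a message that starts with a 3-digit status code is a response;
# any other message is treated as a request.
is_response = sample["message"].str.match(r"\d{3}\s")
sample["kind"] = is_response.map({True: "response", False: "request"})

# Count requests and responses per sender.
roles = pd.crosstab(sample["src_app"], sample["kind"])

# The log is role-consistent if the client never sends a response
# and the server never sends a request.
consistent = bool(
    roles.loc["client_browser", "response"] == 0
    and roles.loc["web_server", "request"] == 0
)

print(roles)
print(consistent)
```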


## Part 1 – Data Exploration

In this part we analyzed the HTTP communication log provided as a CSV file.

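The following steps were performed:
- Loaded the CSV file into Jupyter using pandas
- Verified the number of messages (100) and confirmed that no values are missing
- Checked the communication direction between client and server (60 client-to-server, 40 server-to-client)
- Verified that timestamps are ordered correctly
- Compared message types by sender (requests come only from the client, status responses only from the server)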
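
The loading and basic sanity steps can be bundled into one helper. This is a minimal sketch rather than the notebook's actual code: it reads an inline CSV string that stands in for the real file, and `summarize_log`, `EXPECTED_COLUMNS`, and `df_sample` are hypothetical names introduced here.

```python
import io

import pandas as pd

# Inline CSV standing in for the real file; rows copied from the first
# records of the log.
csv_text = """msg_id,app_protocol,src_app,dst_app,message,timestamp
1,HTTP,client_browser,web_server,GET /index.html,0.001
2,HTTP,client_browser,web_server,GET /login,0.003
3,HTTP,client_browser,web_server,POST /login,0.005
4,HTTP,web_server,client_browser,200 OK,0.007
5,HTTP,web_server,client_browser,302 Redirect,0.009
"""

# Column names as reported by df.info() in the notebook.
EXPECTED_COLUMNS = ["msg_id", "app_protocol", "src_app",
                    "dst_app", "message", "timestamp"]


def summarize_log(df):
    """Return basic sanity facts about a message log (hypothetical helper)."""
    return {
        "columns_ok": list(df.columns) == EXPECTED_COLUMNS,
        "n_messages": len(df),
        "n_missing": int(df.isnull().sum().sum()),
        "ids_unique": bool(df["msg_id"].is_unique),
    }


df_sample = pd.read_csv(io.StringIO(csv_text))
summary = summarize_log(df_sample)
print(summary)
```

Passing the DataFrame loaded from the real CSV to `summarize_log` would apply the same checks to all 100 messages.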

In addition, network traffic was captured using Wireshark.
The capture shows real packets across the TCP/IP layers,
including DNS queries and other application-layer traffic.


### Bonus – Wireshark Observation

During the capture we observed background network traffic such as
DNS and multicast DNS (mDNS) packets.
This shows that applications and system services communicate over the network
even when the user takes no explicit action.