# Personal Information
Name: **Mees Apeldoorn**

StudentID: **13224069**

Email: [**Mees.Apeldoorn@student.uva.nl**](youremail@student.uva.nl)

Submitted on: **21.03.2024**

# Data

### The Next Generation: a 5G Dataset with Channel and Context Metrics

 This dataset is generated from two mobility patterns (static and car), and across two application patterns(video streaming and file download). The dataset is composed of client-side cellular key performance indicators (KPIs) comprised of channel-related metrics, context-related metrics, cell-related metrics and throughput information. These metrics are generated from a well-known non-rooted Android network monitoring application, G-NetTrack Pro. To the best of our knowledge, this is the first publicly available dataset that contains throughput, channel and context information for 5G networks.

dataset:
 https://github.com/uccmisl/5Gdataset

paper:
https://cora.ucc.ie/items/4574b0c6-f441-4323-a0dd-5deda43453ec

### A 4G LTE dataset with channel and context metrics

A 4G trace dataset composed of client-side cellular key performance indicators (KPIs) collected from two major Irish mobile operators, across different mobility patterns (static, pedestrian, car, bus and train). The 4G trace dataset contains 135 traces, with an average duration of fifteen minutes per trace, with viewable throughput ranging from 0 to 173 Mbit/s at a granularity of one sample per second. Our traces are generated from a well-known non-rooted Android network monitoring application, GNetTrack Pro. This tool enables capturing various channel related KPIs, context-related metrics, downlink and uplink throughput, and also cell-related information.

dataset:
https://www.kaggle.com/datasets/aeryss/lte-dataset

paper:
https://cora.ucc.ie/items/b86aca50-bc76-4c68-8626-f50a43df944e

# Data Description


In [4]:
# Imports
import datetime
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import re
import scipy
from directory_tree import display_tree
%matplotlib inline
plt.style.use('bmh')

### Data Loading

In [11]:
display_tree('Data/',max_depth=2)

Data/
├── 4G-LTE-dataset/
│   ├── bus/
│   ├── car/
│   ├── pedestrian/
│   ├── static/
│   └── train/
└── 5G-production-dataset/
    ├── Amazon_Prime/
    ├── Download/
    └── Netflix/


We can combine all seperate files in this 'Data' directory into one big CSV, since ther share a lot of columns

In [35]:
def combine_files_in_directory(directory):
    dfs = []

    for filename in os.listdir(directory):
        file_path = os.path.join(directory, filename)
        
        if os.path.isfile(file_path):
            df = pd.read_csv(file_path)  # Adjust parameters if files are not CSV
            # Add a column to identify the type of data (based on the directory name)
            df['Type'] = os.path.basename(directory)
            dfs.append(df)
            
        # Check if the item is a directory
        elif os.path.isdir(file_path):
            dfs.extend(combine_files_in_directory(file_path))

    return dfs



base_dir = 'Data'
# Combine files from all subdirectories in the base directory
combined_dfs = combine_files_in_directory(base_dir)

# Concatenate all DataFrames into one
combined_df = pd.concat(combined_dfs, ignore_index=True)
combined_df

Unnamed: 0.1,Timestamp,Longitude,Latitude,Speed,Operatorname,CellID,NetworkMode,RSRP,RSRQ,SNR,...,Unnamed: 0,PINGAVG,PINGMIN,PINGMAX,PINGSTDEV,PINGLOSS,CELLHEX,NODEHEX,LACHEX,RAWCELLID
0,2017.11.30_16.48.26,-8.501373,51.893359,0.0,A,2.0,LTE,-102.0,-12,10.0,...,,,,,,,,,,
1,2017.11.30_16.48.26,-8.501291,51.893462,1.0,A,2.0,LTE,-102.0,-12,10.0,...,,,,,,,,,,
2,2017.11.30_16.48.27,-8.501291,51.893462,1.0,A,2.0,LTE,-102.0,-12,7.0,...,,,,,,,,,,
3,2017.11.30_16.48.28,-8.501291,51.893462,1.0,A,2.0,LTE,-102.0,-12,7.0,...,,,,,,,,,,
4,2017.11.30_16.48.29,-8.501291,51.893462,1.0,A,2.0,LTE,-102.0,-13,8.0,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
363229,2019.12.16_13.36.25,-8.394601,51.886139,0.0,B,11.0,5G,-101.0,-13,-7.0,...,,-,-,-,-,-,B,A4DF,9CBA,10805003.0
363230,2019.12.16_13.36.27,-8.394601,51.886139,0.0,B,11.0,5G,-101.0,-13,-7.0,...,,-,-,-,-,-,B,A4DF,9CBA,10805003.0
363231,2019.12.16_13.36.28,-8.394601,51.886139,0.0,B,11.0,5G,-103.0,-12,5.0,...,,-,-,-,-,-,B,A4DF,9CBA,10805003.0
363232,2019.12.16_13.36.29,-8.394601,51.886139,0.0,B,11.0,5G,-103.0,-12,5.0,...,,-,-,-,-,-,B,A4DF,9CBA,10805003.0


In [38]:
combined_df['Type']

0                            bus
1                            bus
2                            bus
3                            bus
4                            bus
                   ...          
363229    Season3-StrangerThings
363230    Season3-StrangerThings
363231    Season3-StrangerThings
363232    Season3-StrangerThings
363233    Season3-StrangerThings
Name: Type, Length: 363234, dtype: object

In [36]:
#Select only relavent columns
#RSRP, RSRQ, SNR, CQI, RSSI are all key performance metrics (KPI)
combined_df = combined_df[['Timestamp', 'Speed', 'CellID', 'NetworkMode', 'RSRP', 'RSRQ',
               'SNR', 'CQI', 'RSSI', 'DL_bitrate', 'State', 'download_type', 'trace_mobility', 'TraceID']]

# force null values for empty entries
combined_df = combined_df.applymap(lambda x: x if x != "-" else np.NaN) 

#Convert types for all collumns
combined_df = combined_df.convert_dtypes()
combined_df['NetworkMode'] = combined_df['NetworkMode'].astype("string")
combined_df[['RSRP', 'RSRQ', 'SNR', 'CQI', 'RSSI', 'DL_bitrate']] =\
    combined_df[['RSRP', 'RSRQ', 'SNR', 'CQI', 'RSSI', 'DL_bitrate']].apply(pd.to_numeric)

# combine dataset naming for driving
combined_df.loc[combined_df['trace_mobility'] == 'driving', 'trace_mobility'] = 'car' 

#turn netflix and amazon_prime into a single streaming category
combined_df.loc[combined_df['download_type'].isin(['netflix', 'amazon_prime']), 'download_type'] = 'streaming'

combined_df

KeyError: "['download_type', 'trace_mobility', 'TraceID'] not in index"

### Analysis 2: 

In [None]:
# ...

### Analysis n:

In [None]:
# ...