<h2><strong>UDP traffic</strong></h2>
<h3>Prediction by Linear Regression for UDP Throughput</h3>
<p>The aim of this test is to predict the UDP throughput of Network device №3 from the data measured on Network devices №1 and №2.</p>



First, let's import the required libraries and create the dataset.

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# create the dataset (five measurements per device) and save it as a CSV file
columns = ['Device', 'Clock frequency', 'UDP Throughput', 'Utilization of CPU', 'Temperature of CPU']
dataUDP = pd.DataFrame([
    [1, 1500, 600, 50, 38], [1, 1500, 650, 55, 45], [1, 1500, 700, 60, 52],
    [1, 1500, 750, 65, 60], [1, 1500, 900, 70, 67],
    [2, 2600, 750, 35, 30], [2, 2600, 800, 40, 31], [2, 2600, 850, 45, 32],
    [2, 2600, 900, 50, 33], [2, 2600, 600, 30, 28],
], columns=columns)
dataUDP.to_csv('dataUDP.csv', index=False)
# load the dataset back from the CSV file
df = pd.read_csv('dataUDP.csv')
df.head(10)

,Device,Clock frequency,UDP Throughput,Utilization of CPU,Temperature of CPU
0,1,1500,600,50,38
1,1,1500,650,55,45
2,1,1500,700,60,52
3,1,1500,750,65,60
4,1,1500,900,70,67
5,2,2600,750,35,30
6,2,2600,800,40,31
7,2,2600,850,45,32
8,2,2600,900,50,33
9,2,2600,600,30,28


Summary statistics of the dataset:

In [5]:
df.describe()

,Device,Clock frequency,UDP Throughput,Utilization of CPU,Temperature of CPU
count,10.0,10.0,10.0,10.0,10.0
mean,1.5,2050.0,750.0,50.0,41.6
std,0.527046,579.750904,113.038833,12.909944,13.801771
min,1.0,1500.0,600.0,30.0,28.0
25%,1.0,1500.0,662.5,41.25,31.25
50%,1.5,2050.0,750.0,50.0,35.5
75%,2.0,2600.0,837.5,58.75,50.25
max,2.0,2600.0,900.0,70.0,67.0


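<p>The next step is to fit a linear regression model of the form <i>y = a<sub>0</sub> + a<sub>1</sub>*x<sub>1</sub> + a<sub>2</sub>*x<sub>2</sub> + ... + a<sub>n</sub>*x<sub>n</sub></i>, where <i>y</i> is the UDP throughput and <i>x<sub>i</sub></i> are the clock frequency, the CPU utilization, and the CPU temperature.</p>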

In [6]:
x = df[['Clock frequency', 'Utilization of CPU', 'Temperature of CPU']]
y = df['UDP Throughput']
regressor = LinearRegression().fit(x, y)
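
To make the fitting step less of a black box, the sketch below solves the same least-squares problem directly with NumPy. It assumes the same ten measurements as in the dataset above; the names `X`, `A`, and `a` are illustrative and not part of the original notebook. The resulting values should agree with the intercept and coefficients that scikit-learn reports.

```python
import numpy as np

# Same ten measurements as in dataUDP:
# clock frequency, CPU utilization, CPU temperature.
X = np.array([
    [1500, 50, 38], [1500, 55, 45], [1500, 60, 52], [1500, 65, 60], [1500, 70, 67],
    [2600, 35, 30], [2600, 40, 31], [2600, 45, 32], [2600, 50, 33], [2600, 30, 28],
], dtype=float)
y = np.array([600, 650, 700, 750, 900, 750, 800, 850, 900, 600], dtype=float)

# Prepend a column of ones so that the first coefficient plays the role of a0.
A = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: find a that minimizes ||A @ a - y||^2.
a, *_ = np.linalg.lstsq(A, y, rcond=None)
a0, a1, a2, a3 = a
print('a0 =', a0)
print('a1, a2, a3 =', a1, a2, a3)
```

The column of ones is what turns the intercept into an ordinary coefficient; `LinearRegression` handles this internally when `fit_intercept=True`, which is its default.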

Now we can read off the fitted intercept <i>a<sub>0</sub></i> and the coefficients <i>a<sub>i</sub></i>:

In [7]:
coeff_df = pd.DataFrame(regressor.coef_, x.columns, columns=['ai'])
Intercept_df = pd.DataFrame(regressor.intercept_, ['Coef. of intercept'], columns=['a0'])
Intercept_df 

,a0
Coef. of intercept,-585.279786


In [8]:
coeff_df

,ai
Clock frequency,0.309987
Utilization of CPU,13.817938
Temperature of CPU,0.21419


Once the coefficients have been computed, we verify the model with the coefficient of determination R<sup>2</sup>. Note that it is evaluated here on the same data the model was fitted to:

In [9]:
r_2 = regressor.score(x, y)
print(r_2)

0.9305092835108553
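
For reference, the score can be recomputed by hand as 1 - SS_res / SS_tot. The sketch below is only an illustration: it uses the measured values from the dataset and the rounded predictions listed in the comparison table later in this notebook, and the variable names are hypothetical.

```python
# Measured throughput (Mbit/s) and the model's predictions,
# copied as rounded values from the comparison table in this notebook.
measured = [600, 650, 700, 750, 900, 750, 800, 850, 900, 600]
predicted = [578.736278, 649.325301, 719.914324, 790.717537, 861.30656,
             710.738956, 780.042838, 849.34672, 918.650602, 641.220884]

mean_measured = sum(measured) / len(measured)

# Residual sum of squares: what the model fails to explain.
ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
# Total sum of squares: spread of the measurements around their mean.
ss_tot = sum((m - mean_measured) ** 2 for m in measured)

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))  # 0.9305
```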


<p>Since <i>R<sup>2</sup> &asymp; 0.93</i>, the model explains about 93% of the variance in the measured UDP throughput, so we consider it accurate enough to predict UDP throughput.</p>
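
Because this score is computed on the training data and the dataset has only ten rows, a leave-one-out check can serve as a rough additional sanity test. The sketch below is an assumption-laden illustration rather than part of the original analysis: it refits the model with NumPy on nine rows at a time and predicts the row that was left out. No expected numbers are quoted here.

```python
import numpy as np

# Same ten measurements as in dataUDP:
# clock frequency, CPU utilization, CPU temperature.
X = np.array([
    [1500, 50, 38], [1500, 55, 45], [1500, 60, 52], [1500, 65, 60], [1500, 70, 67],
    [2600, 35, 30], [2600, 40, 31], [2600, 45, 32], [2600, 50, 33], [2600, 30, 28],
], dtype=float)
y = np.array([600, 650, 700, 750, 900, 750, 800, 850, 900, 600], dtype=float)
A = np.column_stack([np.ones(len(X)), X])  # column of ones models the intercept

loo_errors = []
for i in range(len(y)):
    keep = np.arange(len(y)) != i                      # every row except row i
    coeffs, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
    prediction = A[i] @ coeffs                         # predict the held-out row
    loo_errors.append(abs(prediction - y[i]) / y[i] * 100)

print(np.round(loo_errors, 2))
print('Mean leave-one-out deviation, %:', round(float(np.mean(loo_errors)), 2))
```

If the leave-one-out deviations turn out much larger than the in-sample ones, the high R<sup>2</sup> should be read with caution.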
<p>Substitute the values of the independent variables <i>x<sub>i</sub></i> and the coefficients <i>a<sub>i</sub></i> into the regression equation:</p>

In [15]:

Task_1 = {'Clock frequency': 2000, 'Utilization of CPU': 40, 'Temperature of CPU': 54}
y_pred_TP = regressor.intercept_ + Task_1['Clock frequency']*regressor.coef_[0] + Task_1['Utilization of CPU']*regressor.coef_[1] + Task_1['Temperature of CPU']*regressor.coef_[2]
print('Predicted UDP Throughput:', y_pred_TP)

Predicted UDP Throughput: 598.9772423025435


To double-check the result, let's also predict all of the measured throughputs and compare them with the measurements:

In [12]:
y_pred = regressor.predict(x)

In [13]:
Throughput_comparison = pd.DataFrame({
    'Predicted Throughput, Mbit/s': y_pred,
    'Measured Throughput, Mbit/s': y,
    'Deviation, %': abs((y_pred - y) / y * 100),
})
Throughput_comparison

,"Predicted Throughput, Mbit/s","Measured Throughput, Mbit/s","Deviation, %"
0,578.736278,600,3.543954
1,649.325301,650,0.1038
2,719.914324,700,2.844903
3,790.717537,750,5.429005
4,861.30656,900,4.299271
5,710.738956,750,5.234806
6,780.042838,800,2.494645
7,849.34672,850,0.076856
8,918.650602,900,2.072289
9,641.220884,600,6.870147


In [14]:
mean_deviation = np.round(np.mean(Throughput_comparison['Deviation, %']), 2)
print(f'Mean deviation of predicted values from measured ones is {mean_deviation}%')

Mean deviation of predicted values from measured ones is 3.3%
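
The deviation used here is the absolute percentage error, |predicted - measured| / measured * 100, averaged over all rows. As a minimal sketch without pandas, assuming the rounded values from the comparison table above (variable names are illustrative), the mean and the worst-case deviation can be recomputed as follows:

```python
# Values copied from the comparison table (Mbit/s).
measured = [600, 650, 700, 750, 900, 750, 800, 850, 900, 600]
predicted = [578.736278, 649.325301, 719.914324, 790.717537, 861.30656,
             710.738956, 780.042838, 849.34672, 918.650602, 641.220884]

# Absolute percentage error for every measurement.
deviations = [abs(p - m) / m * 100 for p, m in zip(predicted, measured)]

mean_deviation = round(sum(deviations) / len(deviations), 2)
max_deviation = round(max(deviations), 2)
print(mean_deviation)  # 3.3
print(max_deviation)   # 6.87
```

The worst-case value is worth reporting next to the mean, since a single prediction can be off by roughly twice the average deviation.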


<p><strong>Conclusion:</strong> Given the mean deviation of 3.3% and <i>R<sup>2</sup> &asymp; 0.93</i>, the predicted UDP throughput for the given input values is <i>&asymp;600 Mbit/s</i>.</p>