## This looks good! First, we will apply our models and algorithms to the dataset as it is and see how they perform. Then, as an experiment, we will apply SMOTE to balance the classes of our target variable y and check whether it improves the performance of our models or not.

<u>Moving on to our models and algorithms: our main goals and potential experiments</u>

Some potential goals I would pursue are:

1. Predict Video Streaming QoE
- Build machine learning models to predict Mean Opinion Score (MOS) based on the various influence factors
- Identify the most important factors impacting QoE and their thresholds
- Evaluate different ML algorithms and compare performance

2. Analyze Impact of Factors on QoE
- Statistical analysis to quantify the correlation and effect size of each factor (video bitrate, buffering, etc.) on MOS
- Identify interactions between factors that influence QoE

3. Compare QoE Across Conditions
- Benchmark QoE for different network types (4G vs 3G), video types, devices
- Analyze if some conditions make QoE more sensitive to certain factors

4. Segment Users by QoE
- Apply clustering techniques to group users based on their QoE patterns
- Develop user profiles most likely to perceive poor/good QoE

Before applying any algorithm, one should start with a thorough exploratory data analysis to understand the distribution of the data, the correlations between variables, and the underlying trends and relationships, which I have done above. This shapes the specific direction and methodology for the modeling and analysis. The key is tying the analysis back to meaningful, actionable insights for improving quality of experience.
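As a small illustration of goal 2, here is a minimal sketch of ranking factors by their rank correlation with MOS. The data is synthetic and the column names (`bitrate_kbps`, `stall_count`) are hypothetical stand-ins, not the columns of the actual dataset. Spearman correlation is used on the assumption that MOS is an ordinal 1 to 5 score.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; column names are hypothetical, not the real dataset's.
rng = np.random.default_rng(0)
n = 200
bitrate = rng.uniform(200, 5000, n)   # higher bitrate should raise MOS
stalls = rng.integers(0, 6, n)        # more stalls should lower MOS
latent = 0.0006 * bitrate - 0.5 * stalls + rng.normal(0, 0.3, n)
mos = np.clip(np.round(3 + latent), 1, 5).astype(int)

demo = pd.DataFrame({"bitrate_kbps": bitrate, "stall_count": stalls, "MOS": mos})

# Spearman works on ranks, which suits an ordinal target such as MOS.
corr_with_mos = demo.corr(method="spearman")["MOS"].drop("MOS").sort_values()
print(corr_with_mos)
```

On the real data, the same call on the full DataFrame would give a first ranking of factors, which can then be checked against model-based feature importances.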

In [64]:
# let's train a model on the train set and evaluate it on the test set

#let's import RandomForestRegressor
from sklearn.ensemble import RandomForestRegressor

#let's instantiate the model
model = RandomForestRegressor(random_state=42)

#let's separate the target variable from the train set
X_train = train.drop(['MOS'],axis=1)
y_train = train['MOS']

#let's separate the target variable from the test set
X_test = test.drop(['MOS'],axis=1)
y_test = test['MOS']


#let's fit the model on the train set
model.fit(X_train,y_train)



In [65]:
#let's predict on the test set
y_pred = model.predict(X_test)

#let's see the first 5 predictions
y_pred[:5]

array([1.35, 3.73, 4.  , 4.  , 3.91])

# We start with simple baseline models first and then move on to more complex models and algorithms

In [66]:
#let's see the first 5 actual values
y_test[:5]

1493    1
1156    4
1253    4
561     4
1098    4
Name: MOS, dtype: int64

In [67]:
#let's see the shape of the predictions
y_pred.shape

(309,)

In [68]:
#let's see the shape of the actual values
y_test.shape

(309,)

In [69]:
#let's see the distribution of the predictions
pd.Series(y_pred).value_counts()


4.000000    54
5.000000    33
3.990000    12
4.020000    10
4.010000     6
            ..
2.952500     1
2.728333     1
3.220000     1
2.070833     1
4.350000     1
Length: 160, dtype: int64

In [70]:
# let's evaluate the model
#let's import mean_absolute_error
from sklearn.metrics import mean_absolute_error

#let's evaluate the model on the test set
mean_absolute_error(y_test,y_pred)



0.2712556015541452

In [71]:
#let's see our model score (for a regressor, score() returns R^2, not classification accuracy)
print("Our model R^2 score is: {}%".format(round(model.score(X_test,y_test)*100,2)))

Our model R^2 score is: 81.14%


## The model's score is 81.14%, which is not bad at all for a start! Note that for a regressor, `score` returns R² (the coefficient of determination), not classification accuracy. Next we will try a classifier and compare.
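Since `RandomForestRegressor.score` reports R² rather than accuracy, the number above is not directly comparable with a classifier's accuracy. The sketch below, on synthetic data with hypothetical `*_demo` names, shows that `score` matches `r2_score`, and how one could get an accuracy figure from a regressor by rounding its predictions to the nearest MOS class. The rounding step is an assumption for illustration, not something the notebook does.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score, r2_score

# Synthetic 1..5 target; all names here are illustrative only.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 4))
y_demo = np.clip(np.round(3 + X_demo[:, 0] + 0.5 * X_demo[:, 1]), 1, 5).astype(int)
X_tr, X_te = X_demo[:200], X_demo[200:]
y_tr, y_te = y_demo[:200], y_demo[200:]

reg = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_tr, y_tr)
pred = reg.predict(X_te)

# score() on a regressor is the coefficient of determination R^2.
r2_from_score = reg.score(X_te, y_te)
r2_explicit = r2_score(y_te, pred)

# To compare with a classifier, round predictions to the nearest class first.
pred_labels = np.clip(np.rint(pred), 1, 5).astype(int)
acc = accuracy_score(y_te, pred_labels)

print("R^2:", round(r2_from_score, 4), "| accuracy after rounding:", round(acc, 4))
```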

In [72]:
# let's try logistic regression as well since we want the model to classify the predictions labels into 5 classes rather than giving us a continuous value

#let's import LogisticRegression
from sklearn.linear_model import LogisticRegression

#let's instantiate the model
model = LogisticRegression(random_state=42)

#let's fit the model on the train set
model.fit(X_train,y_train)

In [73]:
#let's predict on the test set
y_pred = model.predict(X_test)

#let's see the first 5 predictions
y_pred[:5]

array([3, 4, 4, 4, 4])

In [74]:
#let's see the first 5 actual values
y_test[:5]

1493    1
1156    4
1253    4
561     4
1098    4
Name: MOS, dtype: int64

In [75]:
#let's see the shape of the predictions
y_pred.shape

(309,)

In [76]:
#let's see the shape of the actual values
y_test.shape

(309,)

In [77]:
#let's see the distribution of the predictions
pd.Series(y_pred).value_counts()

4    193
5     60
3     29
2     15
1     12
dtype: int64

In [78]:
#let's evaluate the model on the test set
mean_absolute_error(y_test,y_pred)


0.3592233009708738

In [79]:
#let's see our model accuracy
print("Our model accuracy is: {}%".format(round(model.score(X_test,y_test)*100,2)))

Our model accuracy is: 69.9%


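# Logistic regression reaches 69.9% accuracy. This is not directly comparable with the 81.14% of the random forest regressor, because that figure is R² and not accuracy. Since the class imbalance is not that big, we probably don't even need SMOTE (which oversamples the minority classes) to balance the classes, but we will try it as an experiment.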

In [82]:
# let's apply SMOTE here as an experiment

#let's import SMOTE
from imblearn.over_sampling import SMOTE

#let's instantiate SMOTE
smote = SMOTE(random_state=42)

#let's resample the train set only (the test set stays untouched)
#we use the seeded instance created above so the resampling is reproducible
X_train, y_train = smote.fit_resample(X_train, y_train)

#let's see the shape of the train set
X_train.shape

#let's see the distribution of the target variable
pd.Series(y_train).value_counts()






4    622
1    622
3    622
5    622
2    622
Name: MOS, dtype: int64
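To make clear what the resampling does: SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal NumPy sketch of that idea only. It is not imblearn's implementation, and the function name `smote_like_oversample` is hypothetical.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Illustrative sketch of the SMOTE idea, not imblearn's implementation.
    Requires k <= len(X_min) - 1.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)              # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)            # sample to start from
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one of its k neighbours
    gap = rng.random((n_new, 1))                 # position along the connecting segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like_oversample(X_minority, n_new=6)
print(synthetic.shape)  # (6, 2)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. This is also why only the train set should be resampled: the test set must keep the real class distribution.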

In [85]:
#this has balanced the target variable
# let's train a model on the train set and evaluate it on the test set

#let's instantiate the model
model = RandomForestRegressor(random_state=42)

#let's fit the model on the train set
model.fit(X_train,y_train)

#let's predict on the test set
y_pred = model.predict(X_test)

#let's see the first 5 predictions
y_pred[:5]




array([1.  , 3.95, 4.  , 4.  , 3.74])

In [86]:
#let's see the first 5 actual values
y_test[:5]

1493    1
1156    4
1253    4
561     4
1098    4
Name: MOS, dtype: int64

In [87]:
#let's see the shape of the predictions
y_pred.shape

(309,)

In [88]:
#let's see the shape of the actual values
y_test.shape

(309,)

In [89]:
#let's see the distribution of the predictions
pd.Series(y_pred).value_counts()

4.000000    44
5.000000    38
1.000000     8
4.030000     8
4.060000     8
            ..
4.240000     1
4.210000     1
4.619667     1
4.150000     1
4.660000     1
Length: 153, dtype: int64

In [90]:
#let's evaluate the model on the test set
mean_absolute_error(y_test,y_pred)


0.2926870817514021

In [91]:
#let's see our model score (for a regressor, score() returns R^2, not classification accuracy)
print("Our model R^2 score is: {}%".format(round(model.score(X_test,y_test)*100,2)))

Our model R^2 score is: 78.46%


In [92]:
# let's try logistic regression as well since we want the model to classify the predictions labels into 5 classes rather than giving us a continuous value

#let's instantiate the model
model = LogisticRegression(random_state=42)

In [93]:
#let's fit the model on the train set
model.fit(X_train,y_train)

In [94]:
#let's predict on the test set
y_pred = model.predict(X_test)

In [95]:
#let's see the first 5 predictions
y_pred[:5]

array([1, 3, 4, 4, 4])

In [96]:
#let's see the first 5 actual values
y_test[:5]

1493    1
1156    4
1253    4
561     4
1098    4
Name: MOS, dtype: int64

In [97]:
#let's see the shape of the predictions
y_pred.shape

(309,)

In [98]:
#let's see the shape of the actual values
y_test.shape

(309,)

In [99]:
#let's see the distribution of the predictions
pd.Series(y_pred).value_counts()

4    129
5     80
3     49
2     36
1     15
dtype: int64

In [100]:
#let's evaluate the model on the test set
mean_absolute_error(y_test,y_pred)

0.36245954692556637

In [102]:
# let's see our model accuracy
print("Our model accuracy is: {}%".format(round(model.score(X_test,y_test)*100,2)))

Our model accuracy is: 68.93%


## As expected from the beginning: since the class imbalance is not that big, balancing the classes with SMOTE did not help. The random forest score dropped from 81.14% to 78.46% (MAE 0.271 to 0.293) and the logistic regression accuracy dropped from 69.9% to 68.93% (MAE 0.359 to 0.362), so the difference is small. Still, it's good to learn the impact of class imbalance on model performance and how to deal with it!
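One caveat when judging the effect of class balancing: overall accuracy can hide what happens on the minority classes. A minimal sketch with hand-made labels (not the notebook's predictions) shows how per-class recall and balanced accuracy expose this.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Hand-made toy labels for illustration; class 1 is the minority class.
y_true_demo = [4, 4, 4, 4, 4, 4, 1, 1, 5, 5]
y_pred_demo = [4, 4, 4, 4, 4, 4, 4, 4, 5, 5]   # every class-1 sample is missed

accuracy = accuracy_score(y_true_demo, y_pred_demo)
# Recall per class, in the order given by `labels`.
per_class_recall = recall_score(y_true_demo, y_pred_demo, labels=[1, 4, 5], average=None)
# Balanced accuracy is the mean of the per-class recalls.
balanced = balanced_accuracy_score(y_true_demo, y_pred_demo)

print("accuracy:", accuracy)
print("recall per class [1, 4, 5]:", per_class_recall)
print("balanced accuracy:", round(balanced, 4))
```

Here accuracy looks fine at 0.8 even though class 1 is never predicted. Running the same metrics on `y_test` and `y_pred` before and after SMOTE would show whether the minority MOS classes gained anything from the resampling.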