## Random Forest Algorithm

A simple random forest regression example that predicts a video game's completion time from the `level` and `ammo` features.

### Import Python Packages

In [1]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import pandas as pd

### Reading the dataset

In [2]:

my_df = pd.read_csv("video_game_data.csv")

In [3]:
my_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   level            30 non-null     int64
 1   ammo             30 non-null     int64
 2   completion_time  30 non-null     int64
dtypes: int64(3)
memory usage: 848.0 bytes


In [5]:
my_df.columns

Index(['level', 'ammo', 'completion_time'], dtype='object')

In [6]:
my_df

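|   | level | ammo | completion_time |
|---|-------|------|-----------------|
| 0 | 2 | 8 | 129 |
| 1 | 28 | 47 | 65 |
| 2 | 5 | 3 | 103 |
| 3 | 1 | 3 | 145 |
| 4 | 17 | 14 | 78 |
| 5 | 3 | 3 | 148 |
| 6 | 2 | 4 | 125 |
| 7 | 4 | 14 | 143 |
| 8 | 2 | 15 | 120 |
| 9 | 14 | 35 | 84 |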


### Splitting the data into input and output objects

In [7]:
X = my_df.drop(["completion_time"], axis = 1)  # axis=1 for dropping column
y = my_df["completion_time"]


In [8]:
X

|   | level | ammo |
|---|-------|------|
| 0 | 2 | 8 |
| 1 | 28 | 47 |
| 2 | 5 | 3 |
| 3 | 1 | 3 |
| 4 | 17 | 14 |
| 5 | 3 | 3 |
| 6 | 2 | 4 |
| 7 | 4 | 14 |
| 8 | 2 | 15 |
| 9 | 14 | 35 |


In [9]:
y

0     129
1      65
2     103
3     145
4      78
5     148
6     125
7     143
8     120
9      84
10    146
11     86
12    101
13    139
14     94
15    141
16    121
17     74
18     95
19     77
20    113
21    132
22    101
23     88
24     93
25    147
26     69
27    104
28    133
29    103
Name: completion_time, dtype: int64

### Splitting the data into training and test sets

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


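A `test_size` of 0.2 holds out 20% of the rows for testing and uses the remaining 80% for training. With 30 rows, that is 6 test rows and 24 training rows. Fixing `random_state` makes the split reproducible.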
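The split sizes can be checked on a small stand-in frame. This is a minimal sketch: `demo_df` and its values are hypothetical placeholders that only mirror the column names and row count of the dataset, not the real data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 30-row frame with the same column names as the dataset.
demo_df = pd.DataFrame({
    "level": range(1, 31),
    "ammo": range(31, 61),
    "completion_time": range(100, 130),
})
X_demo = demo_df.drop(columns=["completion_time"])
y_demo = demo_df["completion_time"]

# Same arguments as the notebook cell above.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(len(X_tr), len(X_te))  # 24 6

# Repeating the call with the same random_state selects the same rows.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
same_split = list(X_te.index) == list(X_te2.index)
print(same_split)
```

Because the shuffle depends only on the number of rows and `random_state`, rerunning the notebook evaluates the model on the same held-out rows each time.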

### Instantiating the model object

In [11]:
regressor = RandomForestRegressor(random_state = 42)

### Training the model

In [12]:
regressor.fit(X_train, y_train)

RandomForestRegressor(random_state=42)
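A fitted `RandomForestRegressor` predicts by averaging the predictions of its individual decision trees, which are exposed through the `estimators_` attribute. The sketch below illustrates this on synthetic data; `X_demo`, `y_demo`, and the formula used to generate them are assumptions made for the example, not the video game dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic data standing in for the level/ammo features.
rng = np.random.default_rng(0)
X_demo = rng.integers(1, 50, size=(24, 2)).astype(float)
y_demo = 150 - 2.0 * X_demo[:, 0] - 0.5 * X_demo[:, 1] + rng.normal(0, 5, size=24)

forest = RandomForestRegressor(random_state=42).fit(X_demo, y_demo)

# Collect each tree's prediction for the first five rows, then average by hand.
per_tree = np.stack([tree.predict(X_demo[:5]) for tree in forest.estimators_])
manual_mean = per_tree.mean(axis=0)

# The forest's own prediction should match the hand-computed average.
forest_pred = forest.predict(X_demo[:5])
print(np.allclose(manual_mean, forest_pred))
```

Averaging many trees, each fitted on a bootstrap sample of the training rows, is what smooths out the high variance of a single decision tree.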

### Assessing Model Accuracy

In [13]:
y_pred = regressor.predict(X_test)

prediction_comparison = pd.DataFrame({"actual" : y_test,"prediction" : y_pred})

prediction_comparison


|    | actual | prediction |
|----|--------|------------|
| 27 | 104 | 124.551667 |
| 15 | 141 | 125.173333 |
| 23 | 88 | 86.19 |
| 17 | 74 | 93.13 |
| 8 | 120 | 122.683333 |
| 9 | 84 | 81.9 |


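#### Model Accuracy Score

For a regression model, `r2_score` reports the coefficient of determination (R²) rather than a classification-style accuracy: 1.0 is a perfect fit, and 0 corresponds to always predicting the mean of `y_test`.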

In [14]:
r2_score(y_test, y_pred)

0.6657934159750488
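The score can be reproduced by hand from the six test rows, since R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the actual values. The sketch below copies the actual and predicted values from the comparison table; the predictions are rounded to six decimals there, so the result agrees with the notebook output only up to that rounding.

```python
import numpy as np
from sklearn.metrics import r2_score

# Actual and predicted values copied from the comparison table above
# (predictions are rounded to six decimals in that table).
actual = np.array([104, 141, 88, 74, 120, 84], dtype=float)
predicted = np.array([124.551667, 125.173333, 86.19, 93.13, 122.683333, 81.9])

# Residual sum of squares: squared errors of the model.
ss_res = np.sum((actual - predicted) ** 2)
# Total sum of squares: squared errors of always predicting the mean.
ss_tot = np.sum((actual - actual.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
print(round(r2_manual, 4))  # 0.6658

# Cross-check against scikit-learn's implementation.
print(np.isclose(r2_manual, r2_score(actual, predicted)))
```

A value of about 0.67 means the model's squared error on these six rows is roughly a third of what a constant mean prediction would give. With only six test rows, the score is sensitive to which rows land in the test set.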