# **Problem: Random forest for human activity prediction.**

This notebook trains a random forest classifier to predict human activity from the UCI HAR Dataset and tunes its hyperparameters with a grid search.

**Examples:**

Set the variable `url` to the Google Drive sharing URL of the zip file that you want to download.

E.g.: url = 'https://drive.google.com/file/d/1K7izykrla-qEuekekLayfGddml17calY/view?usp=sharing'

Run all the cells. After the last cell executes, the accuracy score of the model is printed.

**Notes:**

Check the following before running the program:
 1. The scikit-learn (`sklearn`), pandas, and gdown modules must be installed in the notebook environment.
 2. The URL must point to the correct location of your zip file.
 3. You must have access to the file in Google Drive.
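
The module check in the list above can be sketched with the standard library alone. This is a minimal sketch, not part of the original notebook: the helper name `missing_modules` is hypothetical, and the list of names is taken from the third-party imports in the import cell.

```python
from importlib.util import find_spec

def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]

# Third-party modules used by this notebook (import names, not package names).
required = ['pandas', 'sklearn', 'gdown']

missing = missing_modules(required)
if missing:
    print('Missing modules: ' + ', '.join(missing))
else:
    print('All required modules are available.')
```

Note that scikit-learn is imported as `sklearn`, so that is the name to check for.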


# **Import Modules**

In [1]:
# Import pandas
import pandas as pd

# Import RandomForestClassifier since we are using random forest for this problem
from sklearn.ensemble import RandomForestClassifier

# Import GridSearchCV
from sklearn.model_selection import GridSearchCV

# Import accuracy_score to calculate accuracy
from sklearn.metrics import accuracy_score

# Import gdown module to download files from google drive
import gdown

# Import zip file module to open the zip file
from zipfile import ZipFile

# **Get the file location from google drive**

In [None]:
# Change the URL as needed (make sure you have access to the file)

url = 'https://drive.google.com/file/d/1z_zn7vv-Sk60fdoQuPN3h9wkyP8H-FQR/view?usp=sharing'

# Derive the file id from the URL
file_id = url.split('/')[-2]

# Derive the download URL of the file
download_url = 'https://drive.google.com/uc?id=' + file_id

# Give the location you want to save
file_location = 'UCI_HAR_Dataset.zip'

# Download the file from drive
gdown.download(download_url, file_location, quiet=False)
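
The `url.split('/')[-2]` step above relies on the sharing link ending in `/view?...`. A slightly more defensive variant is sketched below, assuming the link follows the `file/d/<id>/view` layout used in this notebook; the function name `drive_download_url` is hypothetical and not part of gdown.

```python
from urllib.parse import urlparse

def drive_download_url(share_url):
    """Turn a Drive 'file/d/<id>/view' sharing link into a direct-download URL."""
    parts = urlparse(share_url).path.strip('/').split('/')
    # Expected path layout: file/d/<file_id>[/view]
    if len(parts) < 3 or parts[0] != 'file' or parts[1] != 'd':
        raise ValueError('Not a Drive file sharing link: ' + share_url)
    return 'https://drive.google.com/uc?id=' + parts[2]

share_url = 'https://drive.google.com/file/d/1z_zn7vv-Sk60fdoQuPN3h9wkyP8H-FQR/view?usp=sharing'
download_url = drive_download_url(share_url)
```

The result can be passed to `gdown.download` exactly as in the cell above. Links in any other layout raise a `ValueError` instead of silently producing a wrong file id.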

# **Unzip the zip dataset**

In [None]:
!unzip /content/UCI_HAR_Dataset.zip -d "/content/unzipped_folder/"
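
The `!unzip` command only works where a shell with `unzip` is available, as in Colab. The `ZipFile` class imported earlier offers a portable alternative, sketched below. The helper name `extract_zip` is hypothetical; the paths in the usage comment mirror the cell above.

```python
from pathlib import Path
from zipfile import ZipFile

def extract_zip(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return the member names."""
    target = Path(target_dir)
    # Create the destination folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()

# Usage in the notebook (same paths as the shell command above):
# extract_zip('/content/UCI_HAR_Dataset.zip', '/content/unzipped_folder/')
```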

# **Begin Activity prediction operation**

In [None]:
# Read train and test file using pandas
# sep=r'\s+' splits on any run of whitespace; it replaces delim_whitespace=True, which recent pandas releases deprecate
xtrain=pd.read_table(r'/content/unzipped_folder/UCI HAR Dataset/train/X_train.txt',sep=r'\s+',header=None)

xtest=pd.read_table(r'/content/unzipped_folder/UCI HAR Dataset/test/X_test.txt',sep=r'\s+',header=None)


ytrain=pd.read_table(r'/content/unzipped_folder/UCI HAR Dataset/train/y_train.txt',header=None)


ytest=pd.read_table(r'/content/unzipped_folder/UCI HAR Dataset/test/y_test.txt',header=None)

# Return the first 5 rows of the xtrain dataframe
xtrain.head()

# Initialize randomforest classifier
classifier = RandomForestClassifier()

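# Define the parameter grid for the GridSearchCV method below
# ('sqrt' replaces 'auto', which meant the same thing for classifiers and was removed in scikit-learn 1.3)
parameters = {'n_estimators': [10, 100, 1000], 'max_depth': [3, 6, 9], 'max_features' : ['sqrt', 'log2']}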

# Derive the model
model=GridSearchCV(classifier,parameters,n_jobs=-1,cv=4,scoring='accuracy',verbose=4)

# Fit training data
model.fit(xtrain.to_numpy(),ytrain.to_numpy().ravel())

# Get the predictions (pass a NumPy array, matching the input type used for fitting)
ypred=model.predict(xtest.to_numpy())

# Calculate the accuracy of the model
accuracy=accuracy_score(ytest,ypred)

# Print accuracy score
print('Accuracy Score: '+ str(accuracy*100) + ' %')
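
To get a feel for how much work the grid search above does, the parameter grid can be expanded by hand. The sketch below uses only the standard library and is an illustration, not how GridSearchCV is implemented; the helper name `expand_grid` is hypothetical. It uses `'sqrt'` for `max_features`, which is what `'auto'` meant for classifiers in older scikit-learn releases.

```python
from itertools import product

# Same grid shape as in the notebook: 3 x 3 x 2 values.
parameters = {
    'n_estimators': [10, 100, 1000],
    'max_depth': [3, 6, 9],
    'max_features': ['sqrt', 'log2'],
}
cv_folds = 4  # matches cv=4 in the GridSearchCV call

def expand_grid(grid):
    """Return one dict per combination of the listed parameter values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

candidates = expand_grid(parameters)
total_fits = len(candidates) * cv_folds
print(len(candidates), 'candidates,', total_fits, 'cross-validation fits')
# prints: 18 candidates, 72 cross-validation fits
```

With the default `refit=True`, GridSearchCV fits the best candidate once more on the full training set, so the forests with 1000 trees account for most of the run time.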