
This repository is my take on implementing a Decision Tree with Dask and applying hyperparameter optimization, to showcase my ability to use machine learning with Dask parallel computing.


rifqialf/hpo-decisiontree


Introduction

In this project, I performed supervised Land Use and Land Cover (LULC) classification on the provided EuroSAT dataset using a simple Decision Tree algorithm, leveraging Dask parallel computing and hyperparameter optimization to find the best parameters for the task.
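As a rough illustration of the hyperparameter optimization step, here is a minimal sketch using scikit-learn's `GridSearchCV` around a `DecisionTreeClassifier`. The function name `tune_decision_tree`, the search space in `PARAM_GRID`, the scoring metric, and the number of folds are all assumptions made for this example; the notebook may use a different search strategy or grid. Routing the search through Dask is shown with joblib's `"dask"` backend, which assumes a `dask.distributed.Client` has already been created.

```python
import joblib
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical search space; the grid used in the notebook is not stated here.
PARAM_GRID = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 20],
    "criterion": ["gini", "entropy"],
}


def tune_decision_tree(X, y, param_grid=PARAM_GRID, cv=3, use_dask=False):
    """Grid-search Decision Tree parameters and return the fitted search object."""
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid=param_grid,
        cv=cv,
        scoring="accuracy",
    )
    if use_dask:
        # Assumes a dask.distributed.Client was created beforehand;
        # joblib then sends the individual fits to the Dask workers.
        with joblib.parallel_backend("dask"):
            search.fit(X, y)
    else:
        search.fit(X, y)
    return search


if __name__ == "__main__":
    # Synthetic stand-in data: one row per image, four feature columns.
    rng = np.random.default_rng(0)
    X_demo = rng.normal(size=(120, 4))
    y_demo = (X_demo[:, 3] > 0).astype(int)
    result = tune_decision_tree(X_demo, y_demo)
    print(result.best_params_, result.best_score_)
```

After fitting, `result.best_estimator_.feature_importances_` gives the per-feature importances, and `sklearn.metrics.classification_report` can be applied to held-out predictions to get per-class scores.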

I used the Decision Tree algorithm because I extracted only 4 features from the dataset (Mean, Median, Range, and NDVI), which I assumed to be simple enough not to require a more sophisticated algorithm such as Random Forest or ANN. These statistical features were extracted using Rasterio and stored in a Dask dataframe. Hyperparameter optimization was performed to find the best combination of Decision Tree parameters for the task. The outputs are the prediction scores after validation, the feature importances visualized as a histogram, and a classification report for each LULC type.
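The four features can be sketched as follows for a single image. This is only an illustration under stated assumptions: the function name `extract_features` is hypothetical, the statistics are assumed to be taken over all pixels of all bands, NDVI is assumed to be averaged per image, and the red and near-infrared bands are assumed to sit at 0-based positions 3 and 7 (Sentinel-2 B04 and B08). The notebook may compute them differently.

```python
import numpy as np

# Assumed 0-based positions of the red (B04) and near-infrared (B08) bands.
RED_BAND = 3
NIR_BAND = 7


def extract_features(image):
    """Compute mean, median, range, and mean NDVI for one image.

    `image` is an array shaped (bands, height, width), which is the layout
    returned by reading all bands of a raster with Rasterio.
    """
    pixels = np.asarray(image, dtype="float64")
    red = pixels[RED_BAND]
    nir = pixels[NIR_BAND]
    total = nir + red
    # NDVI = (NIR - red) / (NIR + red); pixels with a zero denominator get 0.
    ndvi = np.divide(nir - red, total, out=np.zeros_like(total), where=total != 0)
    return {
        "mean": float(pixels.mean()),
        "median": float(np.median(pixels)),
        "range": float(pixels.max() - pixels.min()),
        "ndvi": float(ndvi.mean()),
    }


if __name__ == "__main__":
    # Tiny synthetic 13-band image standing in for a EuroSAT tile.
    demo = np.zeros((13, 2, 2))
    demo[RED_BAND] = 1.0
    demo[NIR_BAND] = 3.0
    print(extract_features(demo))
```

In practice the input array would come from something like `rasterio.open(path).read()`, and the per-image dictionaries, together with a class label, could be collected into a pandas dataframe and converted with `dask.dataframe.from_pandas`.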

Implementation

Download the Jupyter Notebook 'hpo_assignment.ipnyb'.

Upload the file to CRIB, where the data is located (the dataset is too large to move conveniently).

Run the cells in order.
