mfl28/MachineLearning

Machine Learning

This repo contains a collection of machine learning projects in the form of Jupyter notebooks. Some notebooks need additional data, such as bounding box annotation files; these files can be found in the data folder. PyTorch is used as the underlying library for projects involving deep learning.

mltools Library

This is a Python library that contains useful classes and functions for machine learning and data science tasks, such as feature exploration, object detection, classification, and semantic segmentation using PyTorch.

How to open notebooks using Docker

Requirements: Docker, docker-compose

The repo provides a Dockerfile and docker-compose.yml to create a Docker container that starts a Jupyter Notebook server (using docker-stacks) and allows you to open the notebooks without having to install the requirements on your system. The steps to do this are:

  1. Clone the repo:
    git clone https://github.com/mfl28/MachineLearning.git
    cd MachineLearning
  2. Build the image and start the container using docker-compose up.
  3. Copy the URL shown in the terminal to your browser's address bar and replace the internal port (8888) with the mapped host port 10000.
  4. When you are done, you can shut down the server from the terminal using CTRL-C and remove the created Docker container using docker-compose down.
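Step 3 can be done by hand, but as a minimal sketch, the port swap looks like this in Python. The helper name `map_to_host_port` and the example token are hypothetical and not part of this repo; only the ports 8888 and 10000 come from the steps above.

```python
from urllib.parse import urlsplit, urlunsplit


def map_to_host_port(url: str, host_port: int = 10000) -> str:
    """Return `url` with its port replaced by the mapped host port.

    Hypothetical helper for illustration; assumes an IPv4 address or
    hostname (IPv6 literals would need brackets re-added).
    """
    parts = urlsplit(url)
    host = parts.hostname or "localhost"
    # Rebuild the network location with the host port, keeping path and token.
    return urlunsplit(parts._replace(netloc=f"{host}:{host_port}"))


# Example URL in the shape Jupyter prints; the token value is made up.
printed_url = "http://127.0.0.1:8888/?token=abc123"
browser_url = map_to_host_port(printed_url)
print(browser_url)
```

The path and the token query string are left untouched, so the resulting URL can be pasted into the browser directly.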

Notebooks

Semantic Segmentation

Kaggle Competition: Dstl Satellite Imagery Feature Detection (notebook, nbviewer, Open In Colab)

A notebook showing how to perform semantic segmentation using a fully convolutional neural network. Our aim is to locate buildings in satellite images from the Kaggle Dstl Satellite Imagery Feature Detection Challenge.

Object Detection

Humpback Whale Fluke Detection (notebook, nbviewer, Open In Colab)

A notebook showing how to perform object detection with a custom dataset using a pre-trained and subsequently fine-tuned neural network. Specifically, the aim is to detect and locate humpback whale flukes in images from the Kaggle Humpback Whale Identification Challenge. The ground truth bounding box labels for a selection of 800 images from the training dataset provided by the challenge were created using Bounding Box Editor.

VOCXMLDataset Demo (notebook, nbviewer)

A notebook demonstrating the VOCXMLDataset class from mltools.detection.datasets, using images and annotations from the VOC2012 dataset.
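For readers unfamiliar with the annotation format, the following is a minimal sketch of reading bounding boxes from a Pascal VOC style XML file using only the standard library. It is not the VOCXMLDataset implementation; the function name `parse_voc_annotation` and the sample annotation are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up annotation following the Pascal VOC XML layout (not from VOC2012).
SAMPLE_ANNOTATION = """<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>370</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Return (filename, [(label, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Coordinates are stored as text; some tools write them as floats.
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"), coords))
    return root.findtext("filename"), boxes


filename, boxes = parse_voc_annotation(SAMPLE_ANNOTATION)
print(filename, boxes)
```

A dataset class would typically pair each parsed annotation with its image and convert the boxes to tensors; that part is omitted here.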

Classification

Kaggle Competition: Humpback Whale Identification (notebook, nbviewer, Open In Colab)

In this notebook we'll train a classifier to identify humpback whales in images according to the Kaggle Humpback Whale Identification Challenge. We'll use the fast.ai deep learning library to perform this task.

Kaggle Competition: MNIST Digit Recognizer (notebook, nbviewer)

A notebook showing how to train a convolutional neural network classifier on the MNIST dataset from the Kaggle MNIST Digit Recognizer competition. The aim is to classify handwritten digits in images as accurately as possible.

Kaggle Competition: Titanic - Machine Learning from Disaster (notebook, nbviewer)

The aim of this notebook is to build a model that predicts the survival of passengers of the Titanic. The problem and data come from the Kaggle Titanic: Machine Learning from Disaster competition. We start with an exploration and visualization of the provided features, then proceed to build a feature engineering Pipeline using scikit-learn. Finally, we'll experiment with several machine learning approaches to solve the prediction problem.
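As a rough sketch of what a scikit-learn feature engineering Pipeline for this kind of data can look like (not the notebook's actual pipeline), the example below imputes and scales numeric columns, imputes and one-hot encodes categorical columns, and feeds the result to a classifier. The column names follow the Kaggle Titanic CSV; the rows are toy values for illustration only, and the choice of logistic regression is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows for illustration only; not the competition data.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Embarked": ["S", "C", "S", "S", np.nan, "S"],
    "Survived": [0, 1, 1, 1, 0, 0],
})
X, y = toy.drop(columns="Survived"), toy["Survived"]

# Numeric features: fill missing values with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical features: fill missing values with the mode, then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
features = ColumnTransformer([
    ("num", numeric, ["Age", "Fare"]),
    ("cat", categorical, ["Sex", "Embarked"]),
])

# Preprocessing and model in one estimator, so both are fit together.
model = Pipeline([
    ("features", features),
    ("classify", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
predictions = model.predict(X)
```

Bundling the preprocessing with the estimator means the same transformations are applied at prediction time, and the classifier step can be swapped out when comparing approaches.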
