This repository contains the implementation of three IoT use cases on AWS together with deliverables for the practicum project in collaboration with the Center for Deep Learning (CDL) at Northwestern University.
The Center for Deep Learning’s mission is to act as a resource for companies seeking to establish or improve access to artificial intelligence (AI) by providing technical capacity and expertise. Its recent work includes serving for deep learning, model architecture redesign, AI for IoT and general streaming, and prediction or scoring confidence. Please refer to the following resource for more information regarding CDL.
The Center for Deep Learning is developing REFIT, a novel system built to consume and capitalize on IoT infrastructure by ingesting device data and employing modern machine learning approaches to infer the status of various components of the IoT system. It is built upon several open-source components with state-of-the-art artificial intelligence, and it is notably distinguished from other IoT systems in several regards.
- Develop and implement three IoT use cases based on public data.
- Build an end-to-end solution for each use case on AWS, mimicking the general architecture leveraged in REFIT.
- Assess the potential pros and cons of implementing a streaming-based solution in AWS versus REFIT.
- A comprehensive final report detailing the three IoT use cases, the end-to-end solution implemented in AWS, and a preliminary comparison between AWS and REFIT.
- Source code and thorough documentation as provided in this GitHub repository.
- Point of Contact - Borchuluun Yadamsuren
- Technical Adviser - Diego Klabjan
- Supporting Staff - Raman Khurana
The project was completed by the following MLDS students at Northwestern University: Yi (Betty) Chen, Henry Liang, Sharika Mahadevan, Ruben Nakano, Riu Sakaguchi, Sam Swain, and Yumeng (Rena) Zhang.
A Chicago-based bike share system, Divvy Bikes provides an affordable and convenient mode of transportation throughout the city. The raw dataset provided publicly by Divvy contains information at the trip level, including the starting and ending station and time. The business objective revolves around predicting the number of trips at various stations for the next hour to facilitate resourceful restocking of bikes. The Divvy Bikes use case leverages an LSTM model that accounts for long-term seasonal dependencies to predict demand.
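The preprocessing implied above — aggregating trip-level records into hourly counts per station and slicing them into supervised windows for the LSTM — could be sketched as follows. The field names (`start_station`, `start_hour`) are assumptions for illustration, not the actual Divvy schema.

```python
from collections import Counter

def hourly_counts(trips, station_id):
    """Count trips starting at one station per hour.

    `trips` is a list of dicts with hypothetical keys
    'start_station' and 'start_hour' (e.g. '2023-05-01T14').
    """
    counts = Counter(t["start_hour"] for t in trips
                     if t["start_station"] == station_id)
    return [counts[h] for h in sorted(counts)]

def make_windows(series, lookback):
    """Build (input window, next-hour target) pairs for an LSTM."""
    return [(series[i:i + lookback], series[i + lookback])
            for i in range(len(series) - lookback)]
```

Each `(window, target)` pair maps the previous `lookback` hours of demand to the next hour, which is the prediction target described above.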
Servers comprise hard disk drives aggregated together to form a storage pod. In particular, hard drives serve as the foundation for both the storage and retrieval of data through rotating disks. The relevant data are amassed by Backblaze through the monitoring of various sensors in select hard disk drives. The ultimate objective is to identify hard drives that are close to failure, facilitating efficient predictive maintenance of server centers. More specifically, this particular use case capitalizes on an XGBoost framework to predict the useful lifetime of hard drives.
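Framing "useful lifetime" as a regression target requires labeling each daily sensor snapshot with the days remaining until that drive's recorded failure. A minimal sketch of that labeling step, assuming records for a single drive are sorted by date and the final record is the failure day (a simplification, not the project's exact labeling rule):

```python
def label_remaining_life(daily_records):
    """Attach a remaining-useful-life (RUL) target, in days, to each
    daily snapshot of a single drive. Assumes the list is sorted by
    date and the last record corresponds to the failure day."""
    n = len(daily_records)
    return [dict(rec, rul_days=n - 1 - i)
            for i, rec in enumerate(daily_records)]
```

An XGBoost regressor would then be trained on the sensor columns with `rul_days` as the target.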
The MotionSense data originates from an experiment involving 24 participants performing 6 activities across 15 trials in the same environment with fixed conditions. The activities comprise moving upstairs, going downstairs, walking, jogging, sitting, and standing. The dataset consists of accelerometer and gyroscope measurements generated by sensors in the devices carried by the participants during the experiment. The MotionSense use case also implements an LSTM model for the primary objective: to predict the type of activity from the sensor readings.
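For sequence classification like this, the continuous sensor stream is typically cut into fixed-length windows, each labeled with its majority activity before being fed to the LSTM. A minimal sketch of that segmentation step (window length, step size, and the per-timestep layout are assumptions, not the project's exact settings):

```python
from collections import Counter

def segment(readings, labels, window, step):
    """Slice a multichannel sensor stream into fixed-length windows,
    labeling each window with its majority activity. Assumes
    readings[i] is one timestep's channel values (acc/gyro) and
    labels[i] is that timestep's activity."""
    segments = []
    for start in range(0, len(readings) - window + 1, step):
        window_labels = labels[start:start + window]
        majority = Counter(window_labels).most_common(1)[0][0]
        segments.append((readings[start:start + window], majority))
    return segments
```

Each `(window, activity)` pair is then one training example for the classifier.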
The instructions to run and test the end-to-end AWS solution for the use cases are provided here.
Data Sources
Data Ingestion
- Kinesis Data Streams
  - `divvy-stream`
  - `harddrive-stream`
  - `motionsense-stream`
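A producer feeding one of these streams (for example `divvy-stream`) would serialize each record and call the Kinesis `put_record` API. The helper below builds the call's keyword arguments; the commented `boto3` call is how a producer would typically use it, but the partition-key choice here is an assumption.

```python
import json

def build_kinesis_record(stream_name, payload, partition_key):
    """Build the keyword arguments for a Kinesis put_record call."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": partition_key,
    }

# With boto3 available, a producer would then do roughly:
#   kinesis = boto3.client("kinesis")
#   kinesis.put_record(**build_kinesis_record("divvy-stream", trip, trip["station"]))
```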
Data Preparation
- AWS Glue
  - `divvy_static_etl`
- AWS Lambda
Data Storage
- AWS Lambda
  - `transform_and_stream_to_S3` (Divvy Bikes)
  - `motionsense-streamtoS3`
  - `harddrive-streamtoS3`
- Amazon S3 (stores raw streaming data)
  - `divvy-stream-data`
  - `harddrive-stream-data`
  - `motionsense-stream-data`
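The transform Lambdas above receive base64-encoded Kinesis records and write them to the raw-data buckets. The sketch below shows the decode-and-key-building portion using the standard Kinesis-to-Lambda event shape; the object-key layout is an assumption, and a real handler would follow with `s3.put_object(Bucket=..., Key=key, Body=body)` for each pair.

```python
import base64
import json
from datetime import datetime, timezone

def transform_records(event):
    """Decode base64-encoded Kinesis records from a Lambda event and
    return (s3_key, body) pairs for upload to the raw-data bucket."""
    objects = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        ts = datetime.now(timezone.utc).strftime("%Y/%m/%d/%H%M%S%f")
        objects.append((f"raw/{ts}.json", json.dumps(payload)))
    return objects
```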
Model Inference
- Amazon EC2 (hosts model endpoint)
  - `divvy_api`
  - `harddrive_api`
  - `motionsense_api`
- AWS Lambda (calls model API and sends prediction to WebSocket)
  - `divvybikes-getprediction-send2websocket`
  - `lambda-getprediction-send2websocket` (Motion Sense)
  - `harddrive-getprediction-send2websocket`
- AWS Lambda (calls model API and saves prediction to S3)
  - `divvybikes-getprediction-savetoS3`
  - `motionsense-getprediction-savetoS3`
  - `harddrive-getprediction-savetoS3`
- Amazon S3
  - `divvy-predictions`
  - `harddrive-predictions`
  - `motionsense-predictions`
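The inference Lambdas above call the EC2-hosted model API over HTTP and then forward the result to a WebSocket or S3. A minimal sketch of both steps — the endpoint path, request schema, and record fields are assumptions, not the project's actual API contract:

```python
import json
from urllib import request

def fetch_prediction(api_url, features):
    """POST a feature payload to the EC2-hosted model API and return
    its JSON reply (URL and payload schema are hypothetical)."""
    req = request.Request(
        api_url,
        data=json.dumps({"features": features}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

def prediction_record(use_case, entity_id, prediction):
    """Shape one prediction for the WebSocket push or the S3 save."""
    return {"use_case": use_case, "id": entity_id, "prediction": prediction}
```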
Display Predictions
- Amazon EventBridge
- AWS Lambda
- WebSocket
  - `websocket-1`
- DynamoDB
  - `websocket-connections`
  - `websocket-connections-divvybikes`
  - `websocket-connections-harddrive`
Model Retraining
- Amazon S3
  - `divvy-retraining`
  - `harddrive-retraining`
  - `motionsense-retraining`
- Amazon EventBridge
- AWS Lambda
  - `trigger-motionsense-retrain`
  - `trigger-harddrive-retrain`
  - `trigger-divvy-retrain`
  - `stop-motionsense-retrain`
  - `stop-harddrive-retrain`
  - `stop-divvy-retrain`
- Amazon EC2
  - `motionsense_retrain`
  - `divvy_retrain`
  - `harddrive_retrain`
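The trigger/stop Lambdas above pair an EventBridge rule with the EC2 retraining instance it controls. One way to sketch that dispatch is to parse the rule name from the event's resource ARN and map it to a start or stop action; the instance map and the boto3 call in the comment are assumptions about how the handlers are wired.

```python
def retrain_action(rule_arn, instance_map):
    """Map an EventBridge rule (e.g. trigger-divvy-retrain or
    stop-divvy-retrain) to the EC2 API verb and instance it controls.
    `instance_map` maps instance names to instance ids (hypothetical)."""
    rule_name = rule_arn.rsplit("/", 1)[-1]
    action, use_case, _ = rule_name.split("-", 2)
    if action not in ("trigger", "stop"):
        raise ValueError(f"unexpected rule: {rule_name}")
    verb = "start_instances" if action == "trigger" else "stop_instances"
    return verb, instance_map[f"{use_case}_retrain"]

# Inside the Lambda handler, with boto3, this would become roughly:
#   verb, instance_id = retrain_action(event["resources"][0], INSTANCE_MAP)
#   getattr(boto3.client("ec2"), verb)(InstanceIds=[instance_id])
```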
The combined cost of the end-to-end AWS solution for the three use cases is estimated to reach an annual total of $3,207.22 USD or, equivalently, $267.27 USD per month. Amazon API Gateway and AWS Glue are two of the more costly AWS services employed as part of the comprehensive solution. A detailed breakdown of the cost estimate by service can be found here.
- Low latency streaming
- Real-time predictions
- Simplified process to start up the EC2 instances and stop retraining modules
- Strong model performance across all three use cases
  - Hard Drives: $R^2$ score of 0.96
  - Motion Sense: 95.2% test accuracy
  - Divvy Bikes: MAPE of 22.86%
- Cost-effective cloud implementation
- Visualization
  - Currently, the predictions are generated as raw values.
  - Adding a service to visualize past and current predictions could further improve the AWS solution.
- Throughput
  - Throughput appeared to decrease in inverse proportion to the stream size.
  - The Lambda function sending records to EC2 was identified as the likely bottleneck limiting the maximum potential throughput.
  - Increasing its compute power and memory could serve as a potential solution.
The final scope and objectives of the project shifted slightly from the original proposal, which also included implementing the three use cases on REFIT and designing a model-agnostic feature selection algorithm for time series data. These items could serve as potential avenues for future projects with CDL.
The final report detailing the entire 8-month practicum project can be found here.