Purpose

The main reason for building and configuring this cluster is to use the distributed storage and processing capabilities of Hadoop and Spark to build, train, test, and assess machine learning models in parallel for different projects at work. A second goal is to demonstrate that you may not need to invest in multiple full-powered computers or servers to build a cluster.

I had to train about 300 XGBoost models on time-series financial data, which would take about 7 hours in total. Yep, we don't have powerful computers or servers at work. The goal here is therefore to at least reduce the time it takes to train all the models. Furthermore, with a separate cluster I no longer have to use my work computer for training, so I can continue with other tasks while the cluster does the job for me.
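To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes all 300 models take about the same time to train, that the jobs are fully independent, and that scheduling and data-transfer overhead is zero, none of which will hold exactly in practice. The worker counts and the `slowdown` parameter (how much slower one cluster worker is than the machine that needed 7 hours) are hypothetical values for illustration, not measurements from this cluster.

```python
import math


def estimated_wall_time_hours(n_models, serial_hours, n_workers, slowdown=1.0):
    """Idealized wall-clock estimate for independent training jobs.

    Assumes every model takes the same time, jobs are spread evenly across
    workers, and there is no scheduling or data-transfer overhead.
    `slowdown` is a hypothetical factor: how many times slower one worker is
    than the machine on which `serial_hours` was measured.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    per_model_hours = serial_hours / n_models
    # Each "round" trains one model per worker at the same time.
    rounds = math.ceil(n_models / n_workers)
    return rounds * per_model_hours * slowdown


# Average time per model from the figures above: 7 hours for 300 models.
per_model_seconds = 7 * 3600 / 300

if __name__ == "__main__":
    print(f"average per model: {per_model_seconds:.0f} s")
    # Hypothetical worker counts, purely for illustration.
    for workers in (1, 4, 8, 12):
        hours = estimated_wall_time_hours(300, 7.0, workers)
        print(f"{workers:>2} workers -> about {hours:.2f} h (ideal case)")
```

Passing a `slowdown` greater than 1 gives a more conservative estimate if a single worker is slower than the original machine; the real speedup depends on how well the workload parallelizes on the cluster.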

Yes, of course, a full computer may have much more power in terms of CPU speed, RAM, and other hardware. However, not everyone has that much money to spend (myself included). That is where Raspberry Pis come in.

Raspberry Pi 4B - Main Specs:

Broadcom BCM2711, Quad core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5GHz
1GB, 2GB or 4GB LPDDR4-3200 SDRAM (depending on model)
2.4 GHz and 5.0 GHz IEEE 802.11ac wireless, Bluetooth 5.0, BLE
Gigabit Ethernet
2 × micro-HDMI ports (up to 4kp60 supported)
4-pole stereo audio and composite video port
H.265 (4kp60 decode), H.264 (1080p60 decode, 1080p30 encode)
OpenGL ES 3.0 graphics
Micro-SD card slot for loading operating system and data storage
5V DC via USB-C connector (minimum 3A*)
5V DC via GPIO header (minimum 3A*)
Power over Ethernet (PoE) enabled (requires separate PoE HAT)

Full Specifications: https://www.raspberrypi.org/products/raspberry-pi-4-model-b/specifications/
