Named after one of the first programs I ever wrote as a child, Richard started out as a personal effort to learn more about machine learning. The original Richard was meant to be a "virus", but the most malicious thing I could do on my Psion Series 3 personal organiser was print the phrase "Richard is gaining power" in an infinite loop.
The new version of Richard is strictly benevolent.
In its current form, Richard is a CLI application that performs classification using a neural network. Supported layer types include dense, convolutional, and max pooling, but there will likely be others in the future.
GPU acceleration is supported with Vulkan compute shaders.
Windows:
- cmake
- vcpkg
- Vulkan SDK
- Visual Studio 17 2022
macOS:
- cmake
- vcpkg
- Vulkan SDK
- Xcode
Linux:
- cmake
- vcpkg
- Vulkan SDK
To build, just run the relevant workflow from the project root.
To see the list of workflows:
cmake --workflow --list-presets
For example, to make a debug build on Linux:
cmake --workflow --preset=linux-debug
You can also run the configure and build steps separately:
cmake --preset=linux-debug
cmake --build --preset=linux-debug
To see the usage information:
./richardcli/richardcli -h
All examples are run from the build directory, e.g. build/linux/release, and assume your datasets are located under data/ in the project root.
{
"data": {
"classes": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
"shape": [784, 1, 1],
"normalization": {
"min": 0,
"max": 255
}
},
"dataLoader": {
"fetchSize": 512
},
"classifier": {
"network": {
"hyperparams": {
"epochs": 30,
"batchSize": 1024,
"miniBatchSize": 32
},
"hiddenLayers": [
{
"type": "dense",
"size": 320,
"learnRate": 0.1,
"learnRateDecay": 1.0,
"dropoutRate": 0.0
},
{
"type": "dense",
"size": 64,
"learnRate": 0.1,
"learnRateDecay": 1.0,
"dropoutRate": 0.0
}
],
"outputLayer": {
"size": 10,
"learnRate": 0.1,
"learnRateDecay": 1.0
}
}
}
}
./richardcli/richardcli --train \
--samples ../../../data/ocr/train.csv \
--config ../../../data/ocr/config.json \
--network ../../../data/ocr/network \
--gpu
./richardcli/richardcli --eval \
--samples ../../../data/ocr/test.csv \
--network ../../../data/ocr/network \
--gpu
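As a rough sanity check on the OCR config above, the sketch below counts the trainable parameters and confirms that the output layer has one neuron per class. It assumes each dense layer stores one weight per input-output pair plus one bias per neuron, and that the output size is expected to equal the number of classes; neither is confirmed by Richard itself. The helper names are hypothetical and not part of Richard.

```python
import json

# Trimmed copy of the OCR config; only the fields this sketch reads are kept.
CONFIG = json.loads("""
{
  "data": {
    "classes": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
    "shape": [784, 1, 1]
  },
  "classifier": {
    "network": {
      "hiddenLayers": [
        {"type": "dense", "size": 320},
        {"type": "dense", "size": 64}
      ],
      "outputLayer": {"size": 10}
    }
  }
}
""")


def dense_parameter_count(config):
    """Count weights and biases, assuming every layer is fully connected."""
    width, height, depth = config["data"]["shape"]
    inputs = width * height * depth
    network = config["classifier"]["network"]
    total = 0
    for layer in network["hiddenLayers"] + [network["outputLayer"]]:
        # One weight per (input, neuron) pair plus one bias per neuron (assumed).
        total += inputs * layer["size"] + layer["size"]
        inputs = layer["size"]
    return total


def output_matches_classes(config):
    """True when the output layer has exactly one neuron per class label."""
    output_size = config["classifier"]["network"]["outputLayer"]["size"]
    return output_size == len(config["data"]["classes"])


print(dense_parameter_count(CONFIG))   # 272394
print(output_matches_classes(CONFIG))  # True
```

Under these assumptions the first hidden layer accounts for 251200 of the 272394 parameters, so the width of that layer dominates the model size.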
{
"data": {
"classes": ["cat", "dog"],
"shape": [100, 100, 3],
"normalization": {
"min": 0,
"max": 255
}
},
"dataLoader": {
"fetchSize": 512
},
"classifier": {
"network": {
"hyperparams": {
"epochs": 10,
"batchSize": 1024,
"miniBatchSize": 32
},
"hiddenLayers": [
{
"type": "convolutional",
"depth": 32,
"kernelSize": [3, 3],
"learnRate": 0.01,
"learnRateDecay": 1.0,
"dropoutRate": 0.0
},
{
"type": "maxPooling",
"regionSize": [2, 2]
},
{
"type": "convolutional",
"depth": 64,
"kernelSize": [4, 4],
"learnRate": 0.01,
"learnRateDecay": 1.0,
"dropoutRate": 0.0
},
{
"type": "maxPooling",
"regionSize": [2, 2]
},
{
"type": "dense",
"size": 64,
"learnRate": 0.01,
"learnRateDecay": 1.0,
"dropoutRate": 0.0
}
],
"outputLayer": {
"size": 2,
"learnRate": 0.01,
"learnRateDecay": 1.0
}
}
}
}
./richardcli/richardcli --train \
--samples ../../../data/catdog/train \
--config ../../../data/catdog/config.json \
--network ../../../data/catdog/network \
--gpu
./richardcli/richardcli --eval \
--samples ../../../data/catdog/test \
--network ../../../data/catdog/network \
--gpu
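To see how the layer sizes in the cat/dog config fit together, the sketch below traces the data shape through the hidden layers. It assumes the shape is ordered [width, height, depth], that convolutions use stride 1 with no padding, and that max pooling uses non-overlapping regions that must tile the input exactly. These are guesses that happen to fit the numbers in the config, not documented behaviour, and the function names are hypothetical.

```python
def layer_output_shape(shape, layer):
    """Return the [width, height, depth] produced by one hidden layer."""
    width, height, depth = shape
    if layer["type"] == "convolutional":
        # Assumption: stride 1 and no padding, so each side shrinks by kernel - 1.
        kernel_w, kernel_h = layer["kernelSize"]
        return [width - kernel_w + 1, height - kernel_h + 1, layer["depth"]]
    if layer["type"] == "maxPooling":
        # Assumption: non-overlapping regions that must tile the input exactly.
        region_w, region_h = layer["regionSize"]
        if width % region_w or height % region_h:
            raise ValueError(f"{shape} is not divisible by region {layer['regionSize']}")
        return [width // region_w, height // region_h, depth]
    if layer["type"] == "dense":
        # A dense layer flattens its input; written as [size, 1, 1] like the OCR input.
        return [layer["size"], 1, 1]
    raise ValueError(f"unknown layer type: {layer['type']}")


def trace_shapes(input_shape, layers):
    """Return the input shape followed by the output shape of every layer."""
    shapes = [list(input_shape)]
    for layer in layers:
        shapes.append(layer_output_shape(shapes[-1], layer))
    return shapes


# Hidden layers copied from the cat/dog config, minus the training parameters.
HIDDEN_LAYERS = [
    {"type": "convolutional", "depth": 32, "kernelSize": [3, 3]},
    {"type": "maxPooling", "regionSize": [2, 2]},
    {"type": "convolutional", "depth": 64, "kernelSize": [4, 4]},
    {"type": "maxPooling", "regionSize": [2, 2]},
    {"type": "dense", "size": 64},
]

for shape in trace_shapes([100, 100, 3], HIDDEN_LAYERS):
    print(shape)
# [100, 100, 3]
# [98, 98, 32]
# [49, 49, 32]
# [46, 46, 64]
# [23, 23, 64]
# [64, 1, 1]
```

If the assumptions hold, this would explain the 4x4 kernel in the second convolutional layer: a 3x3 kernel would produce a 47x47 output, which a 2x2 pooling region cannot tile evenly.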
Install Google perftools:
sudo apt install google-perftools
Build the linux-cpuprof preset:
cmake --workflow --preset=linux-cpuprof
Set the CPUPROFILE environment variable to the file that should receive the profile data, then run as usual, e.g.
CPUPROFILE=./prof.out ./richardcli/richardcli --train \
--samples ../../../data/ocr/train.csv \
--config ../../../data/ocr/config_cnn.json \
--network ../../../data/ocr/network
For text output:
google-pprof --text ./richardcli/richardcli ./prof.out > ./prof.txt
For graphical output:
google-pprof --gv ./richardcli/richardcli ./prof.out
The text file should contain something like this:
Total: 2823 samples
1166 41.3% 41.3% 1277 45.2% richard::computeCrossCorrelation
1039 36.8% 78.1% 1145 40.6% richard::computeFullCrossCorrelation
199 7.0% 85.2% 199 7.0% richard::Kernel::at (inline)
...
The first column is the number of profiling samples taken while execution was inside the function itself, excluding any functions it calls.
The second column is this same number expressed as a percentage of the total samples taken. So in this case, we spent 41.3% of the time executing computeCrossCorrelation.
The third column is a running total of the second column over all the functions listed so far. In this example, 85.2% of the execution time is accounted for by these top 3 functions.
The fourth and fifth columns give the number and percentage of samples in which the function appeared anywhere on the call stack. In other words, they include time spent executing child calls. So in this example, we spent 40.6% of the time inside computeFullCrossCorrelation (including child calls), but only 36.8% actually within the computeFullCrossCorrelation function itself.
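The column arithmetic can be checked against the sample output with a short script. This is only an illustrative sketch: it parses the three lines shown above, recomputes each percentage from the raw sample counts, and uses hypothetical helper names that are not part of pprof or Richard.

```python
TOTAL_SAMPLES = 2823

PROFILE_LINES = """\
1166 41.3% 41.3% 1277 45.2% richard::computeCrossCorrelation
1039 36.8% 78.1% 1145 40.6% richard::computeFullCrossCorrelation
199 7.0% 85.2% 199 7.0% richard::Kernel::at (inline)
"""


def parse_line(line):
    """Split one line of text output into its six fields."""
    own, own_pct, running_pct, stack, stack_pct, name = line.split(None, 5)
    return {
        "own_samples": int(own),                       # column 1
        "own_pct": float(own_pct.rstrip("%")),         # column 2
        "running_pct": float(running_pct.rstrip("%")), # column 3
        "stack_samples": int(stack),                   # column 4
        "stack_pct": float(stack_pct.rstrip("%")),     # column 5
        "name": name.strip(),
    }


def recompute_percentages(rows, total):
    """Derive columns 2, 3 and 5 from the raw sample counts."""
    running = 0
    derived = []
    for row in rows:
        running += row["own_samples"]
        derived.append({
            "own_pct": round(100 * row["own_samples"] / total, 1),
            "running_pct": round(100 * running / total, 1),
            "stack_pct": round(100 * row["stack_samples"] / total, 1),
        })
    return derived


rows = [parse_line(line) for line in PROFILE_LINES.splitlines()]
for row, derived in zip(rows, recompute_percentages(rows, TOTAL_SAMPLES)):
    print(row["name"], derived["own_pct"], derived["running_pct"], derived["stack_pct"])
```

The recomputed values agree with the printed ones to one decimal place. For example, 1166 + 1039 = 2205 samples out of 2823 gives the 78.1% shown in the third column of the second row.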