Implemented small Metal abstractions and make the FFT POC usable #193

Merged
merged 147 commits into from
Apr 11, 2023
147 commits
bd6dc56
Added FFTMetalState
xqft Feb 21, 2023
6b4dabc
Added device field to FFTMetalState
xqft Feb 21, 2023
bfe02c9
Added WIP twiddle factors generator in GPU
xqft Feb 21, 2023
bbf982f
Small fixes in WIP twiddle GPU generator
xqft Feb 22, 2023
575b619
Changed twiddle generator function constants to buffers, formatting a…
xqft Feb 22, 2023
6640b52
Merge branch 'main' into gpu-fft
xqft Feb 22, 2023
a51258b
Removed unused helper, formatting
xqft Feb 22, 2023
40b7048
[WIP] Workaround for result buffer, completed gen_twiddles, tests
xqft Feb 24, 2023
0ad0335
Update compiled metal lib
xqft Feb 24, 2023
088e25d
Small fixes in gen_twiddles
xqft Feb 24, 2023
0155ec9
Changed gen_twiddles_cpu loop
xqft Feb 24, 2023
2ec5b07
Merge branch 'main' of github.com:lambdaclass/lambdaworks into gpu-fft
mpaulucci Mar 2, 2023
d96d76c
Merge branch 'main' of github.com:lambdaclass/lambdaworks into gpu-fft
mpaulucci Mar 2, 2023
d432c52
Fix twiddle generation on gpu.
mpaulucci Mar 2, 2023
5ad9296
Merge branch 'main' into gpu-fft
xqft Mar 10, 2023
b361aca
Merge branch 'main' into gpu-fft
xqft Mar 10, 2023
b6a287c
Merge branch 'main' into gpu-fft
xqft Mar 10, 2023
cab941e
Fix artifacts from broken main
xqft Mar 10, 2023
883afed
Fixed threadgroup dispatching from twiddles
xqft Mar 10, 2023
07bf869
[WIP] POC Parallel FFT
xqft Mar 10, 2023
550dbd1
Move all gpu related features to a new crate
IAvecilla Mar 10, 2023
88a907e
[WIP] Fix metal kernel
xqft Mar 13, 2023
8ca7783
Merge branch 'gpu_crate' into gpu-fft
xqft Mar 14, 2023
46217d4
Merge branch 'main' into gpu-fft
xqft Mar 14, 2023
d878cf4
[WIP] Various changes related to Metal FFT
xqft Mar 14, 2023
7c4bc49
[WIP] For testing purposes
xqft Mar 14, 2023
c336fdb
Use real U32 field.
mpaulucci Mar 15, 2023
d54a3a4
[WIP] Fixed undefined behaviours
xqft Mar 15, 2023
da6b2ab
Finished FFT in Metal POC
xqft Mar 15, 2023
f23d63e
Removed unused line
xqft Mar 15, 2023
0e538b5
Added comments for unsafe lines
xqft Mar 15, 2023
601f676
Changed storage mode for metal buffers
xqft Mar 15, 2023
d1eb48c
Replaced static array with unsafe vector from pointer
xqft Mar 15, 2023
92f47ea
Proptest for Metal FFT
xqft Mar 16, 2023
5d1f8f9
Simplified stage loop
xqft Mar 17, 2023
c81a759
Split Metal setup from execution
xqft Mar 17, 2023
d0b559f
Fixed basetype size
xqft Mar 17, 2023
6690fc1
Changed FFT Metal test name
xqft Mar 17, 2023
9885cd7
Small fix
xqft Mar 17, 2023
7919037
Merge branch 'main' into gpu-fft
xqft Mar 17, 2023
df59f72
Fixed test memory leaks
xqft Mar 17, 2023
6529fb8
Added benchmarks
xqft Mar 17, 2023
5270d67
Fixed leaks on benchmarks
xqft Mar 17, 2023
1762348
Moved and revamped parallel twiddle generation
xqft Mar 17, 2023
021660f
Formatting
xqft Mar 20, 2023
978158f
Add build-essential to try to fix dependency problem
klaus993 Mar 20, 2023
fe27896
Add sudo to apt install
klaus993 Mar 20, 2023
ec8f978
Swap apt install to g++
klaus993 Mar 20, 2023
d2951f2
Try other solutions to dependency problem
klaus993 Mar 20, 2023
6561da3
Try reinstall g++
klaus993 Mar 20, 2023
f59c23c
Try installing gobjc++-mingw
xqft Mar 20, 2023
bbeef25
Try gobjc++
klaus993 Mar 20, 2023
a120a50
Try gobjc
xqft Mar 20, 2023
6011f64
Try all
xqft Mar 20, 2023
86a0af8
Add gnustep-base-common
xqft Mar 20, 2023
c015531
Added metal abstraction
xqft Mar 20, 2023
7f9d43a
Removed unnecesarry function
xqft Mar 20, 2023
0524248
[WIP] Added first iteration of FFT metal API, changed proptests
xqft Mar 20, 2023
7462883
Try cargo check as command
klaus993 Mar 21, 2023
8a85b27
Try CFLAGS
klaus993 Mar 21, 2023
23e70f8
Update CFLAGS
klaus993 Mar 21, 2023
16778ce
Export CFLAGS
klaus993 Mar 21, 2023
6e9afd7
Update CFLAGS
klaus993 Mar 21, 2023
59aa0a1
Abstracted buffer content retreiving
xqft Mar 21, 2023
e04d633
Try cross
klaus993 Mar 21, 2023
17220ef
Fix syntax
klaus993 Mar 21, 2023
8f26258
Try CFLAGS
klaus993 Mar 21, 2023
b5a0495
Fix syntax
klaus993 Mar 21, 2023
d1edd6a
Added gen_twiddles(), removed FFTMetal and FFTMetalState
xqft Mar 21, 2023
2a8aa4a
Try macos runners
klaus993 Mar 21, 2023
9e49cd3
Docs
xqft Mar 21, 2023
35221e5
Swap other CI steps to macos runner
klaus993 Mar 21, 2023
7d5e7de
Changed twiddles proptest for clarity
xqft Mar 21, 2023
a047d35
Updated benchmarks
xqft Mar 21, 2023
f0f3b85
Replaced group_size buffer with set_bytes()
xqft Mar 21, 2023
12f7562
Fix fft.metal identation
xqft Mar 21, 2023
555a0c1
Replaced gen_twiddles' result_buffer for set_bytes
xqft Mar 21, 2023
cc23271
Delete unused function
IAvecilla Mar 21, 2023
ebb4cee
Merge branch 'main' into gpu-fft
IAvecilla Mar 21, 2023
eda6dde
Add benchmark for twiddles to the general benchmark function
IAvecilla Mar 21, 2023
8cac9e1
Add conditional compilation for objc crate
IAvecilla Mar 21, 2023
69a1ec8
Swap runners back to ubuntu
klaus993 Mar 21, 2023
6b713fb
Add conditional compilation for metal crate to compile only in macos
IAvecilla Mar 21, 2023
7c819da
Revert conditional compilation for metal and objc crates
IAvecilla Mar 21, 2023
f5e5e3b
Deleted operations.rs
xqft Mar 22, 2023
b3b6faa
Added newline at end of fp.h.metal
xqft Mar 22, 2023
509890f
Merge branch 'main' into gpu-fft
xqft Mar 22, 2023
0c6f056
Reverted name change of U64TestField
xqft Mar 22, 2023
f2f5ecd
Merge branch 'gpu-fft' into metal-fft-api
xqft Mar 22, 2023
43fd83e
Change runners in CI to run in macos
IAvecilla Mar 22, 2023
d18d6d6
Ignore metal tests
IAvecilla Mar 22, 2023
f23ccbb
Removed some unsafe blocks, added comments
xqft Mar 22, 2023
cca1077
Removed metal dependency from math crate
xqft Mar 23, 2023
fac74e7
Removed GPU from default members
MauroToscano Mar 22, 2023
cf0599f
Reverted to use ubuntu on CI
xqft Mar 23, 2023
d444e64
Changed coverage host to macos
xqft Mar 23, 2023
2f7816c
Merge branch 'gpu-fft' into metal-fft-api
xqft Mar 23, 2023
a603f5b
Removed unsafe block from alloc_buffer_data
xqft Mar 23, 2023
5ea3d8a
Changed setup_pipeline to accept Option<> and buffer index
xqft Mar 23, 2023
8192adc
Docs
xqft Mar 23, 2023
ca24bdb
gpu crate filetree rethinking
xqft Mar 23, 2023
6ff2d00
poly
xqft Mar 23, 2023
fd4eb79
Add trait for polynomials using fft with gpu
IAvecilla Mar 23, 2023
3bb0a15
Implemented all configs for twiddle gen in Metal
xqft Mar 24, 2023
56cd27a
Merge branch 'metal-fft-api' into metal-fft-api-experimental
xqft Mar 24, 2023
9277f3d
Remove ignore in tests, borrow MetalState
xqft Mar 24, 2023
e90173b
Implemented polynomial interpolation in Metal and tests
xqft Mar 24, 2023
d4cd8f2
Merge branch 'main' into metal-fft-api
xqft Mar 24, 2023
905b077
Reestructured metal shaders, added target to Makefile
xqft Mar 24, 2023
3209c51
Added parallel bitrev permutation, reorg shaders
xqft Mar 26, 2023
0e3e055
Use parallel bitrev permutation in metal fft
xqft Mar 26, 2023
0d388b4
Removed shaders.metal
xqft Mar 26, 2023
beeec02
Allow only default members on coverage job
xqft Mar 27, 2023
30e8377
Added metal bitrev permutation benchmark
xqft Mar 27, 2023
ab07f9f
Added benchmarks for comparing bitrev perm. and twiddle gen.
xqft Mar 27, 2023
f12e9fc
Better names for benchmarks
xqft Mar 27, 2023
fc295fa
Added metal feature for conditional compilation
xqft Mar 27, 2023
e059b59
Made objc an optional dependency
xqft Mar 27, 2023
48bbed0
Better benchmark groups for comparing functions
xqft Mar 27, 2023
c1bea08
Removed workspace flag from fmt
xqft Mar 27, 2023
ad6b599
Revert removing default members
xqft Mar 27, 2023
6a99c9c
Implemented solution for conditional compilation using only features
xqft Mar 28, 2023
078ab00
Merge branch 'main' into metal-fft-api
xqft Mar 28, 2023
0825a38
Merge branch 'main' of github.com:lambdaclass/lambdaworks into metal-…
mpaulucci Mar 29, 2023
eacf20b
Improve fft proptests with random coefficients values
IAvecilla Mar 29, 2023
1c99faf
Adapt new metal functions to U256 Fields.
mpaulucci Mar 30, 2023
1fb2b34
Fixed sequential FFT tests
xqft Mar 30, 2023
87cdbe2
Fix comments
xqft Mar 30, 2023
c2b103e
Fix format and clippy errors
IAvecilla Mar 30, 2023
1fa7737
Delete unnecesary slice patterns for coefficients arrays
IAvecilla Mar 30, 2023
a7ee965
Add file for test helpers and DFT function
IAvecilla Mar 31, 2023
d90e634
Delete unnecesary proptest macro uses
IAvecilla Mar 31, 2023
2f476f7
Merge branch 'fft_proptests_improvement' into metal-fft-api
xqft Mar 31, 2023
4d1d031
Merge branch 'main' into metal-fft-api
xqft Mar 31, 2023
0f019ab
Added more random vec proptest, fixed metal FFT
xqft Mar 31, 2023
fd08c10
Merge branch 'fft_proptests_improvement' into metal-fft-api
xqft Mar 31, 2023
c167766
Fixed poly non_pow_of_two proptest
xqft Mar 31, 2023
e674a18
Implemented From trait to FFTMetalError
xqft Mar 31, 2023
27f5581
Implemented FFT with blowup in Metal
xqft Mar 31, 2023
00c654a
Fix gen_twiddles doc
xqft Mar 31, 2023
3d3a2b9
Implement evaluate_offset_fft_metal for Polynomial
xqft Mar 31, 2023
67b971c
Reorganized Metal FFT proptests
xqft Mar 31, 2023
9d3511f
Merge branch 'main' into metal-fft-api
xqft Apr 10, 2023
5768de5
Fix docs
xqft Apr 10, 2023
c8c61a4
Remove useless map_err
xqft Apr 10, 2023
5e4242c
Make Metal GPU poly trait public
IAvecilla Apr 11, 2023
2394609
Add new line in shader file
IAvecilla Apr 11, 2023
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -65,7 +65,7 @@ jobs:
- name: Install cargo-llvm-cov
uses: taiki-e/install-action@cargo-llvm-cov
- name: Run tests and generate code coverage
run: cargo llvm-cov --all-features --workspace --lcov --output-path lcov.info
run: cargo llvm-cov --lcov --output-path lcov.info
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
with:
6 changes: 0 additions & 6 deletions Cargo.toml
@@ -1,10 +1,4 @@
[workspace]
-default-members = [
-"math",
-"crypto",
-"proving_system/stark",
-"proving_system/plonk"
-]
members = [
"math",
"crypto",
6 changes: 5 additions & 1 deletion Makefile
@@ -4,7 +4,7 @@ test:
cargo test

clippy:
- cargo clippy --all-targets --all-features -- -D warnings
+ cargo clippy --all-targets -- -D warnings

docker-shell:
docker build -t rust-curves .
@@ -20,5 +20,9 @@ benchmarks:
benchmark:
cargo criterion --bench ${BENCH}

METALPATH = gpu/src/metal/shaders
build-metal:
xcrun -sdk macosx metal $(METALPATH)/all.metal -o $(METALPATH)/lib.metallib

docs:
cd docs && mdbook serve --open
19 changes: 16 additions & 3 deletions gpu/Cargo.toml
@@ -7,15 +7,28 @@ edition = "2021"

[dependencies]
lambdaworks-math = { path = "../math" }
lambdaworks-crypto = { path = "../crypto"}
thiserror = "1.0.38"
rand = "0.8.5"
metal = "0.24.0"
metal = { version = "0.24.0", optional = true }
objc = { version = "0.2.7", optional = true }

[features]
metal = ["dep:metal", "dep:objc"]
cuda = []

# Some features activate compilation of code which isn't
# supported in all machines (e.g. metal, cuda), so we won't
# use `--all-features` in any case, instead every feature
# that should compile in all cases will require to be added
# as default. If you don't want to compile with all of these
# use `--no-default-features`.
default = []

[dev-dependencies]
proptest = "1.1.0"
criterion = "0.4.0"
objc = "0.2.7"

[[bench]]
name = "metal_benchmarks"
harness = false
required-features = ["metal"]
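
Note on the `metal` feature above: a minimal sketch of how Rust code can be gated on such a feature, so that Metal-only code is compiled only when the crate is built with `--features metal`. The function `backend_name` is hypothetical and is not part of lambdaworks; only the feature name `metal` comes from the `[features]` table in this diff.

```rust
// Sketch only: `backend_name` is a hypothetical function used for illustration.
// The feature name `metal` matches the `[features]` table in gpu/Cargo.toml.

/// Compiled only when the crate is built with `--features metal`.
#[cfg(feature = "metal")]
pub fn backend_name() -> &'static str {
    "metal"
}

/// Fallback compiled when the `metal` feature is off, which is the
/// default here because `default = []`.
#[cfg(not(feature = "metal"))]
pub fn backend_name() -> &'static str {
    "cpu"
}

fn main() {
    println!("active backend: {}", backend_name());
}
```

Because exactly one of the two definitions survives conditional compilation, callers never need their own `cfg` checks. This is also why `--all-features` is avoided in the CI and Makefile changes: enabling `metal` on a machine without Metal support would pull in code that cannot build there.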
11 changes: 0 additions & 11 deletions gpu/benches/all_benchmarks.rs

This file was deleted.

60 changes: 38 additions & 22 deletions gpu/benches/benchmarks/metal.rs
@@ -1,7 +1,6 @@
use criterion::Criterion;
use lambdaworks_gpu::fft::fft_metal::*;
use lambdaworks_gpu::metal::{abstractions::state::MetalState, fft::ops::*};
use lambdaworks_math::{
fft::bit_reversing::in_place_bit_reverse_permute,
field::{element::FieldElement, traits::IsTwoAdicField},
field::{test_fields::u32_test_field::U32TestField, traits::RootsConfig},
};
@@ -10,42 +9,35 @@ use rand::random;
type F = U32TestField;
type FE = FieldElement<F>;

fn gen_coeffs(pow: usize) -> Vec<FE> {
let mut result = Vec::with_capacity(1 << pow);
fn gen_coeffs(order: usize) -> Vec<FE> {
let mut result = Vec::with_capacity(1 << order);
for _ in 0..result.capacity() {
result.push(FE::new(random()));
}
result
}

pub fn metal_fft_benchmarks(c: &mut Criterion) {
let mut group = c.benchmark_group("metal_fft");
let mut group = c.benchmark_group("FFT");

for order in 20..=24 {
for order in 21..=24 {
let coeffs = gen_coeffs(order);
group.throughput(criterion::Throughput::Elements(1 << order)); // info for criterion

// the objective is to bench ordered FFT, including twiddles generation and Metal setup
group.bench_with_input(
format!("parallel_nr_2radix_2^{order}_coeffs"),
format!("Parallel NR radix2 FFT for 2^{order} elements"),
&coeffs,
|bench, coeffs| {
bench.iter(|| {
// TODO: autoreleaspool hurts perf. by 2-3%. Search for an alternative
objc::rc::autoreleasepool(|| {
let coeffs = coeffs.clone();
let metal_state = MetalState::new(None).unwrap();
let twiddles =
F::get_twiddles(order as u64, RootsConfig::BitReverse).unwrap();
let fft_metal = FFTMetalState::new(None).unwrap();
let command_buff_encoder = fft_metal
.setup_fft("radix2_dit_butterfly", &twiddles)
.unwrap();

let mut result = fft_metal
.execute_fft(&coeffs, command_buff_encoder)
.unwrap();

in_place_bit_reverse_permute(&mut result);
fft(&coeffs, &twiddles, &metal_state).unwrap();
});
});
},
@@ -56,22 +48,46 @@ pub fn metal_fft_benchmarks(c: &mut Criterion) {
}

pub fn metal_fft_twiddles_benchmarks(c: &mut Criterion) {
let mut group = c.benchmark_group("metal_fft");
let mut group = c.benchmark_group("FFT twiddles generation");
group.sample_size(10); // it becomes too slow with the default of 100

for order in 2..=4 {
for order in 21..=24 {
group.throughput(criterion::Throughput::Elements(1 << order)); // info for criterion

// the objective is to bench ordered FFT, including twiddles generation and Metal setup
group.bench_with_input(
format!("parallel_twiddle_factors_2^({order}-1)_elems"),
format!("Parallel twiddles generation for 2^({order}-1) elements"),
&order,
|bench, order| {
bench.iter(|| {
// TODO: autoreleaspool hurts perf. by 2-3%. Search for an alternative
objc::rc::autoreleasepool(|| {
let metal_state = FFTMetalState::new(None).unwrap();
let _gpu_twiddles = metal_state.gen_twiddles::<F>(*order).unwrap();
let metal_state = MetalState::new(None).unwrap();
gen_twiddles::<F>(*order, RootsConfig::Natural, &metal_state).unwrap();
});
});
},
);
}

group.finish();
}

pub fn metal_bitrev_permutation_benchmarks(c: &mut Criterion) {
let mut group = c.benchmark_group("Bit-reverse permutation");

for order in 21..=24 {
let coeffs = gen_coeffs(order);
group.throughput(criterion::Throughput::Elements(1 << order)); // info for criterion

group.bench_with_input(
format!("Parallel bitrev permutation for 2^{order} elements"),
&coeffs,
|bench, coeffs| {
bench.iter(|| {
// TODO: autoreleaspool hurts perf. by 2-3%. Search for an alternative
objc::rc::autoreleasepool(|| {
let metal_state = MetalState::new(None).unwrap();
bitrev_permutation(coeffs, &metal_state).unwrap();
});
});
},
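
For readers unfamiliar with the operation measured by `metal_bitrev_permutation_benchmarks`, here is a plain sequential CPU sketch of a bit-reverse permutation. It is not the Metal kernel from this PR, and the names `reverse_low_bits` and `bit_reverse_permute` are hypothetical. It assumes the input length is a power of two.

```rust
/// Reverses the lowest `bits` bits of `i` (hypothetical helper).
fn reverse_low_bits(i: usize, bits: u32) -> usize {
    if bits == 0 {
        return 0;
    }
    // Reverse the whole word, then shift the reversed bits back down.
    i.reverse_bits() >> (usize::BITS - bits)
}

/// Sequential in-place bit-reverse permutation.
/// Panics unless `data.len()` is a power of two.
fn bit_reverse_permute<T>(data: &mut [T]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let bits = n.trailing_zeros(); // log2(n)
    for i in 0..n {
        let j = reverse_low_bits(i, bits);
        // Swap each pair only once.
        if i < j {
            data.swap(i, j);
        }
    }
}

fn main() {
    let mut v: Vec<u32> = (0..8).collect();
    bit_reverse_permute(&mut v);
    println!("{:?}", v); // prints [0, 4, 2, 6, 1, 5, 3, 7]
}
```

Each index swap is independent of the others, which is presumably what makes the permutation a good fit for a GPU kernel: one thread per index, with no shared state.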
1 change: 1 addition & 0 deletions gpu/benches/metal_benchmarks.rs
@@ -3,6 +3,7 @@ use criterion::{criterion_group, criterion_main, Criterion};
mod benchmarks;

fn run_metal_benchmarks(c: &mut Criterion) {
benchmarks::metal::metal_bitrev_permutation_benchmarks(c);
benchmarks::metal::metal_fft_twiddles_benchmarks(c);
benchmarks::metal::metal_fft_benchmarks(c);
}
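
As background for the twiddles benchmark, the following is a sequential CPU sketch of twiddle-factor generation in natural order. It assumes a toy prime field with modulus 17 and generator 3, not the `U32TestField` used in the benchmarks, and the function `gen_twiddles_natural_cpu` is hypothetical. The count of `2^(order-1)` values follows the benchmark label in the diff. Nothing here reflects how the PR's Metal `gen_twiddles` is implemented.

```rust
// Toy prime modulus, an assumption for illustration (not the PR's field).
const P: u64 = 17;

/// Modular exponentiation by squaring in F_17.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Returns w^0, w^1, ..., w^(2^(order-1) - 1), where w is a primitive
/// 2^order-th root of unity in F_17. Hypothetical helper.
fn gen_twiddles_natural_cpu(order: u32) -> Vec<u64> {
    assert!((1..=4).contains(&order), "F_17 only has 2^k-th roots for k <= 4");
    // 3 generates the multiplicative group of F_17, which has order 16,
    // so 3^(16 / 2^order) has multiplicative order exactly 2^order.
    let root = pow_mod(3, 16 >> order);
    let count = 1usize << (order - 1);
    let mut twiddles = Vec::with_capacity(count);
    let mut current = 1;
    for _ in 0..count {
        twiddles.push(current);
        current = current * root % P;
    }
    twiddles
}

fn main() {
    println!("{:?}", gen_twiddles_natural_cpu(3)); // prints [1, 9, 13, 15]
}
```

Only half of the powers are needed because the remaining ones are the negations of the first half (here `9^4 = 16 = -1 mod 17`).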