Add a Cargo Feature for Enabling SSE #77

Merged
phil-opp merged 11 commits into rust-osdev:master on Sep 21, 2019

@ethindp
Contributor

commented Aug 23, 2019

SSE and AVX support are now enabled if found. As a result, SSE, SSE2, and XSAVE support are now required, and the boot process halts if they are not found. AVX support is enabled at stage 4 and is not required.

…found.
@64

Contributor

commented Aug 24, 2019

To my knowledge, is_x86_feature_detected! isn't available in core.

My personal opinion (which I think Phil shares) is that we should try and keep as many things in pure Rust as possible, dipping into inline assembly if needed. The x86_64 crate provides abstractions for modifying control registers, and core provides abstractions to access cpuid, so you should be able to get quite far in pure Rust.

Finally, it might be a good idea to pass some information in the BootInfo structure letting the kernel know which features were enabled. SSE/SSE2 can unconditionally be enabled, but the kernel should probably be informed about whether XSAVE/SSE3/SSE4/AVX/AVX2/AVX512 were enabled. An opt-in or opt-out mechanism for enabling these things might be nice too.
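
To illustrate the point about `cpuid` being reachable from `core`, here is a minimal sketch of feature detection without `is_x86_feature_detected!`. The names `CpuFeatures`, `decode_leaf1`, and `detect` are hypothetical and not part of this PR; the bit positions are those documented for CPUID leaf 1.

```rust
// Sketch only: `CpuFeatures`, `decode_leaf1`, and `detect` are hypothetical names.

/// Feature flags reported by CPUID leaf 1 that this discussion cares about.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CpuFeatures {
    pub sse: bool,
    pub sse2: bool,
    pub xsave: bool,
    pub avx: bool,
}

/// Decodes the ECX and EDX values returned by CPUID leaf 1.
pub fn decode_leaf1(ecx: u32, edx: u32) -> CpuFeatures {
    CpuFeatures {
        sse: edx & (1 << 25) != 0,   // EDX bit 25: SSE
        sse2: edx & (1 << 26) != 0,  // EDX bit 26: SSE2
        xsave: ecx & (1 << 26) != 0, // ECX bit 26: XSAVE
        avx: ecx & (1 << 28) != 0,   // ECX bit 28: AVX
    }
}

/// Queries the running CPU. Only compiled for x86_64 targets.
#[cfg(target_arch = "x86_64")]
pub fn detect() -> CpuFeatures {
    // `__cpuid` lives in `core::arch`, so it is usable from `no_std` code.
    // The `unsafe` block is required on older toolchains and harmless on newer ones.
    #[allow(unused_unsafe)]
    let leaf1 = unsafe { core::arch::x86_64::__cpuid(1) };
    decode_leaf1(leaf1.ecx, leaf1.edx)
}
```

A bootloader could call `detect()` once and copy the result into whatever field the BootInfo structure ends up providing; that field is an assumption here, since the thread does not define one.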

@ethindp

Contributor Author

commented Aug 24, 2019

src/main.rs Outdated
@@ -87,6 +89,9 @@ extern "C" {

#[no_mangle]
pub unsafe extern "C" fn stage_4() -> ! {
if is_x86_feature_detected!("avx") {

@bjorn3
@phil-opp

Member

commented Aug 24, 2019

Thanks for the pull request!

Unconditionally enabling SSE for kernels is a bad idea since it considerably increases the state that the kernel has to save on each context switch, thereby decreasing performance. For that reason, we should only add this feature behind an optional cargo feature.

Regarding the implementation: Is there a reason for enabling SSE before switching to 64-bit mode? Otherwise I would prefer to keep the bootstrap assembly unchanged and implement it as a normal Rust function instead (sprinkled with inline assembly), as @64 proposed. If there are missing features in x86_64, we can of course add them.

Further, I would like to keep the work of the bootloader to a minimum and move all tasks that can be done in the kernel itself to separate crates instead. Enabling SSE is something that might be useful in the bootloader because it allows compiling the kernel for an SSE target. However, dynamically enabling AVX is not useful for the kernel, since the kernel still needs a dynamic check of whether AVX was enabled before using it. So it can just enable AVX itself if desired, or call into a crate that performs the initialization.

@ethindp

Contributor Author

commented Aug 24, 2019

@64

Contributor

commented Aug 24, 2019

If you don’t know how to do something, hop on gitter and I’ll be happy to help out.

But yeah, I agree with Phil here. Better to have an option to enable SSE/SSE2 as a cargo feature, and better to stick to pure Rust with inline asm where possible.

ethindp added 2 commits Aug 24, 2019
@ethindp

Contributor Author

commented Aug 24, 2019

I've moved the SSE code into Rust (in stage 4) and made SSE and AVX cargo features. AVX requires the SSE feature to be enabled (though only in the cargo manifest). The bootloader will also check before enabling SSE and AVX to ensure they're actually supported. AVX still calls ASM code (I submitted a gitter msg about the problem). In summary, the Intel SDMs say the following on XGETBV:

Reads the contents of the extended control register (XCR) specified in the ECX register into registers EDX:EAX. (On processors that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.) The EDX register is loaded with the high-order 32 bits of the XCR and the EAX register is loaded with the low-order 32 bits. (On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are cleared.) If fewer than 64 bits are implemented in the XCR being read, the values returned to EDX:EAX in unimplemented bit locations are undefined.

And this on XSETBV:

Writes the contents of registers EDX:EAX into the 64-bit extended control register (XCR) specified in the ECX register. (On processors that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.) The contents of the EDX register are copied to high-order 32 bits of the selected XCR and the contents of the EAX register are copied to low-order 32 bits of the XCR. (On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are ignored.) Undefined or reserved bits in an XCR should be set to values previously read.

I know, that's a lot. But I'm not really sure how to translate this:

enable_avx:
    push rax
    push rcx
    xor rcx, rcx
    xgetbv
    or eax, 7
    xsetbv
    pop rcx
    pop rax
    ret

into Rust. I'll definitely need to use inline ASM either way.
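
For reference, one possible translation of the `enable_avx` routine above is sketched below. It assumes the current stable `core::arch::asm!` syntax rather than the LLVM-style `asm!` available in 2019, and every function name in it is hypothetical. The EDX:EAX pairing follows the SDM text quoted above.

```rust
// Sketch only: all names are hypothetical, and the asm! syntax is the modern
// stable one, not the 2019 nightly syntax used elsewhere in this PR.

/// XCR0 bits set by `or eax, 7`: x87 state (bit 0), SSE state (bit 1), AVX state (bit 2).
pub const XCR0_X87_SSE_AVX: u64 = 0b111;

/// Joins the EDX:EAX pair returned by XGETBV into one 64-bit value.
pub fn join_edx_eax(edx: u32, eax: u32) -> u64 {
    ((edx as u64) << 32) | eax as u64
}

/// Splits a 64-bit value into the (EDX, EAX) pair consumed by XSETBV.
pub fn split_edx_eax(value: u64) -> (u32, u32) {
    ((value >> 32) as u32, value as u32)
}

/// Reads XCR0 (ECX = 0). XGETBV faults unless CR4.OSXSAVE is set.
#[cfg(target_arch = "x86_64")]
pub unsafe fn read_xcr0() -> u64 {
    let eax: u32;
    let edx: u32;
    unsafe {
        core::arch::asm!(
            "xgetbv",
            in("ecx") 0u32,
            out("eax") eax,
            out("edx") edx,
            options(nomem, nostack, preserves_flags),
        );
    }
    join_edx_eax(edx, eax)
}

/// Writes XCR0 (ECX = 0). XSETBV is privileged, so this only works in ring 0.
#[cfg(target_arch = "x86_64")]
pub unsafe fn write_xcr0(value: u64) {
    let (edx, eax) = split_edx_eax(value);
    unsafe {
        core::arch::asm!(
            "xsetbv",
            in("ecx") 0u32,
            in("eax") eax,
            in("edx") edx,
            options(nostack, preserves_flags),
        );
    }
}

/// Read-modify-write of XCR0, mirroring the assembly routine.
#[cfg(target_arch = "x86_64")]
pub unsafe fn enable_avx() {
    unsafe { write_xcr0(read_xcr0() | XCR0_X87_SSE_AVX) }
}
```

Declaring the registers as asm operands lets the compiler handle saving and restoring them, so the explicit `push`/`pop` pairs from the assembly version are not needed.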

ethindp added 6 commits Aug 24, 2019
… of time)
@ethindp

Contributor Author

commented Sep 12, 2019

I have updated the SSE support and removed AVX. I use bit_field for now to set the bits for CR0 and CR4, but anyone is free to change this. I've tested that code though in my kernel and it does work!
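
As a rough illustration of the CR0 and CR4 changes involved, here is a sketch of the usual SSE enable sequence. It uses plain bit masks instead of `bit_field` so that it stays dependency-free, uses the current stable `asm!` syntax, and all names are hypothetical; the code merged in this PR may differ in detail.

```rust
// Sketch only: names are hypothetical and this is not the PR's actual code.

const CR0_MP: u64 = 1 << 1; // monitor coprocessor
const CR0_EM: u64 = 1 << 2; // x87 emulation; must be clear for SSE
const CR4_OSFXSR: u64 = 1 << 9; // OS supports FXSAVE/FXRSTOR
const CR4_OSXMMEXCPT: u64 = 1 << 10; // OS handles unmasked SIMD FP exceptions

/// Returns the CR0 value with EM cleared and MP set.
pub fn cr0_for_sse(cr0: u64) -> u64 {
    (cr0 & !CR0_EM) | CR0_MP
}

/// Returns the CR4 value with OSFXSR and OSXMMEXCPT set.
pub fn cr4_for_sse(cr4: u64) -> u64 {
    cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT
}

/// Applies both updates. Control-register access is privileged (ring 0 only).
#[cfg(target_arch = "x86_64")]
pub unsafe fn enable_sse() {
    unsafe {
        let cr0: u64;
        core::arch::asm!("mov {}, cr0", out(reg) cr0, options(nomem, nostack, preserves_flags));
        core::arch::asm!("mov cr0, {}", in(reg) cr0_for_sse(cr0), options(nostack, preserves_flags));

        let cr4: u64;
        core::arch::asm!("mov {}, cr4", out(reg) cr4, options(nomem, nostack, preserves_flags));
        core::arch::asm!("mov cr4, {}", in(reg) cr4_for_sse(cr4), options(nostack, preserves_flags));
    }
}
```

Keeping the bit arithmetic in pure functions means it can be unit-tested on a host machine, while only the thin `enable_sse` wrapper needs to run in the bootloader.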

@ethindp ethindp closed this Sep 12, 2019
@ethindp ethindp reopened this Sep 12, 2019
@ethindp

Contributor Author

commented Sep 12, 2019

Accidentally closed the PR. I'd love more tests; mine can't be the only ones... :)

Member

left a comment

Thanks for the update, looks much better now!

I left a few comments below, otherwise this looks good to me.

Cargo.toml Outdated
@@ -17,6 +17,7 @@ xmas-elf = { version = "0.6.2", optional = true }
x86_64 = { version = "0.7.2", optional = true }
usize_conversions = { version = "0.2.0", optional = true }
fixedvec = { version = "0.2.4", optional = true }
bit_field = "*"

@phil-opp

phil-opp Sep 13, 2019

Member

Wildcard dependencies are not recommended because cargo might select an incompatible version. For example, it can lead to compilation failures when another dependency requires a very old version of bit_field. Just set it to the latest version (0.10.0) instead.

Cargo.toml Outdated
@@ -17,6 +17,7 @@ xmas-elf = { version = "0.6.2", optional = true }
x86_64 = { version = "0.7.2", optional = true }
usize_conversions = { version = "0.2.0", optional = true }
fixedvec = { version = "0.2.4", optional = true }
bit_field = "*"

@phil-opp

phil-opp Sep 13, 2019

Member

Also, please make this dependency optional like the other dependencies above.

Cargo.toml Outdated
@@ -34,6 +35,7 @@ binary = ["xmas-elf", "x86_64", "usize_conversions", "fixedvec", "llvm-tools", "
vga_320x200 = ["font8x8"]
recursive_page_table = []
map_physical_memory = []
sse = []

@phil-opp

phil-opp Sep 13, 2019

Member

After making the bit_field dependency optional, you need to change this line to

Suggested change
- sse = []
+ sse = ["bit_field"]
src/main.rs Outdated
asm!("mov $0, %cr4" :: "r" (cr4) : "memory");
}
}

@phil-opp

phil-opp Sep 13, 2019

Member

Could you move this into an enable_sse function in a new sse module? Also, it should be called from main, not stage_4.

@@ -6,3 +6,4 @@ edition = "2018"

[dependencies]
x86_64 = "0.3.4"
bootloader = {path = "..", features=["sse"]}

@phil-opp

phil-opp Sep 13, 2019

Member

I'm not sure if this is a good way to test it because we no longer test it without the sse feature this way. Instead we should create multiple test kernels with different feature combinations like we do for bootimage. This does not need to be part of this PR, though.

I think the best way forward is to merge this without tests (undoing the modifications to the test-kernel) and add proper tests in a follow-up pull request.

@ethindp

Contributor Author

commented Sep 19, 2019

…emoved botloader dep from test kernel.
@phil-opp

Member

commented Sep 20, 2019

Thanks for the updates! Looks good now.

> I am confused; where, exactly, is the main function defined?

Oh sorry, I meant the read_elf function (we should really rename it to something more fitting). The stage_4 function is just a thin wrapper that sets the ss segment and reads the addresses from the extern statics, so extending it doesn't seem fitting.

I think a good place for the call to enable_sse is somewhere at the end of the read_elf function, e.g. right before the let entry_point = … line.

@ethindp

Contributor Author

commented Sep 20, 2019

Member

left a comment

Thanks! Let's get this merged.

@phil-opp phil-opp merged commit 537fc71 into rust-osdev:master Sep 21, 2019
4 checks passed
rust-osdev.bootloader Build #20190920.1 succeeded
rust-osdev.bootloader (Job linux) Job linux succeeded
rust-osdev.bootloader (Job mac) Job mac succeeded
rust-osdev.bootloader (Job windows) Job windows succeeded
@phil-opp phil-opp changed the title from "SSE and AVX support (untested)" to "Add a Cargo Feature for Enabling SSE" on Sep 21, 2019
phil-opp added a commit that referenced this pull request Sep 21, 2019
@phil-opp

Member

commented Sep 21, 2019

Published as version 0.8.1
