
Add a Cargo Feature for Enabling SSE #77

Merged · 11 commits · Sep 21, 2019

Conversation

ethindp (Contributor) commented Aug 23, 2019

SSE and AVX support are now enabled if detected. As a result, SSE, SSE2, and XSAVE support are now required, and the boot process will halt if they are not found. AVX support is enabled in stage 4 and is not required.

64 (Contributor) commented Aug 24, 2019

To my knowledge, is_x86_feature_detected! isn't available in core.

My personal opinion (which I think Phil shares) is that we should try to keep as many things as possible in pure Rust, dipping into inline assembly only where needed. The x86_64 crate provides abstractions for modifying control registers, and core provides access to cpuid, so you should be able to get quite far in pure Rust.

Finally, it might be a good idea to pass some information in the BootInfo structure letting the kernel know which features were enabled. SSE/SSE2 can be enabled unconditionally, but the kernel should probably be informed about whether XSAVE/SSE3/SSE4/AVX/AVX2/AVX512 were enabled. An opt-in or opt-out mechanism for enabling these things might be nice too.
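
For illustration, a minimal sketch of that pure-Rust route: core's cpuid intrinsic stands in for is_x86_feature_detected!, and the x86_64 crate's control-register types set the SSE bits. (An assumption-laden sketch, not this PR's code: the flag names are from a recent x86_64 release and may differ in 0.7.x; the bit positions are from CPUID leaf 1 in the Intel SDM.)

// Sketch only: no_std feature detection via core's cpuid intrinsic,
// then CR0/CR4 setup via the x86_64 crate instead of raw assembly.
use core::arch::x86_64::__cpuid;
use x86_64::registers::control::{Cr0, Cr0Flags, Cr4, Cr4Flags};

fn sse_supported() -> bool {
    // CPUID leaf 1: EDX bit 25 = SSE, bit 26 = SSE2.
    let edx = unsafe { __cpuid(1) }.edx;
    edx & (1 << 25) != 0 && edx & (1 << 26) != 0
}

unsafe fn enable_sse() {
    Cr0::update(|cr0| {
        cr0.remove(Cr0Flags::EMULATE_COPROCESSOR); // clear CR0.EM
        cr0.insert(Cr0Flags::MONITOR_COPROCESSOR); // set CR0.MP
    });
    Cr4::update(|cr4| {
        // CR4.OSFXSR + CR4.OSXMMEXCPT: enable FXSAVE/FXRSTOR and SSE exceptions
        cr4.insert(Cr4Flags::OSFXSR | Cr4Flags::OSXMMEXCPT_ENABLE);
    });
}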

ethindp (Contributor, Author) commented Aug 24, 2019 via email

src/main.rs (Outdated)
@@ -87,6 +89,9 @@ extern "C" {

#[no_mangle]
pub unsafe extern "C" fn stage_4() -> ! {
if is_x86_feature_detected!("avx") {
Review comment (Contributor): indentation

phil-opp (Member) commented

Thanks for the pull request!

Unconditionally enabling SSE for kernels is a bad idea, since it considerably increases the state that the kernel has to save on each context switch, thereby decreasing performance. For that reason, we should only enable SSE behind an optional cargo feature.

Regarding the implementation: Is there a reason for enabling SSE before switching to 64-bit mode? Otherwise I would prefer to keep the bootstrap assembly unchanged and implement it as a normal Rust function instead (sprinkled with inline assembly), as @64 proposed. If there are missing features in x86_64, we can of course add them.

Further, I would like to keep the work of the bootloader to a minimum and move all tasks that can be done in the kernel itself into separate crates. Enabling SSE might be useful in the bootloader because it allows compiling the kernel for an SSE target. However, dynamically enabling AVX is not useful for the kernel: the kernel still needs a runtime check for whether AVX was enabled before using it, so it can just enable AVX itself if desired, or call into a crate that performs the initialization.

ethindp (Contributor, Author) commented Aug 24, 2019 via email

64 (Contributor) commented Aug 24, 2019

If you don't know how to do something, hop on Gitter and I'll be happy to help out.

But yeah, I agree with Phil here. Better to have an option to enable SSE/SSE2 as a cargo feature, and better to stick to pure Rust with inline asm where possible.

ethindp (Contributor, Author) commented Aug 24, 2019

I've moved the SSE code into Rust (in stage 4) and made SSE and AVX cargo features. AVX requires the SSE feature to be enabled (though only in the cargo manifest). The bootloader will also check before enabling SSE and AVX to ensure they're actually supported. AVX still calls asm code (I sent a Gitter message about the problem). In summary, the Intel SDM says the following about XGETBV:

Reads the contents of the extended control register (XCR) specified in the ECX register into registers EDX:EAX. (On processors that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.) The EDX register is loaded with the high-order 32 bits of the XCR and the EAX register is loaded with the low-order 32 bits. (On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are cleared.) If fewer than 64 bits are implemented in the XCR being read, the values returned to EDX:EAX in unimplemented bit locations are undefined.

And this on XSETBV:

Writes the contents of registers EDX:EAX into the 64-bit extended control register (XCR) specified in the ECX register. (On processors that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.) The contents of the EDX register are copied to high-order 32 bits of the selected XCR and the contents of the EAX register are copied to low-order 32 bits of the XCR. (On processors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are ignored.) Undefined or reserved bits in an XCR should be set to values previously read.

I know, that's a lot. But I'm not really sure how to translate this:

enable_avx:
    push rax
    push rcx
    xor rcx, rcx
    xgetbv
    or eax, 7
    xsetbv
    pop rcx
    pop rax
    ret

into Rust. I'll definitely need to use inline asm either way.
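
For reference, one plausible Rust translation (a sketch, not the PR's final code): it uses today's core::arch::asm! syntax, which was stabilized long after this 2019 thread, and assumes CPUID has already confirmed XSAVE support and that CR4.OSXSAVE is set.

// Hypothetical translation of the enable_avx stub above.
use core::arch::asm;

unsafe fn enable_avx() {
    let mut eax: u32;
    let edx: u32;
    // XGETBV: ECX selects the XCR (0 = XCR0); the result lands in EDX:EAX.
    asm!("xgetbv", in("ecx") 0u32, out("eax") eax, out("edx") edx);
    // Set XCR0 bits 0-2 (x87, SSE, AVX state), like the `or eax, 7` above.
    eax |= 0b111;
    // XSETBV: write EDX:EAX back, preserving the previously read bits.
    asm!("xsetbv", in("ecx") 0u32, in("eax") eax, in("edx") edx);
}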

ethindp (Contributor, Author) commented Sep 12, 2019

I have updated the SSE support and removed AVX. For now I use bit_field to set the bits in CR0 and CR4, but anyone is free to change this. I've tested the code in my kernel, though, and it does work!
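
A sketch of what that bit_field approach can look like (assumptions: modern asm! syntax rather than the 2019 dialect this PR used; bit positions from the Intel SDM; not the PR's exact code):

use bit_field::BitField;
use core::arch::asm;

unsafe fn enable_sse() {
    let mut cr0: u64;
    asm!("mov {}, cr0", out(reg) cr0);
    cr0.set_bit(1, true);  // CR0.MP: monitor coprocessor
    cr0.set_bit(2, false); // CR0.EM: disable x87 emulation
    asm!("mov cr0, {}", in(reg) cr0);

    let mut cr4: u64;
    asm!("mov {}, cr4", out(reg) cr4);
    cr4.set_bit(9, true);  // CR4.OSFXSR: enable FXSAVE/FXRSTOR and SSE
    cr4.set_bit(10, true); // CR4.OSXMMEXCPT: unmasked SIMD FP exceptions
    asm!("mov cr4, {}", in(reg) cr4);
}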

ethindp closed this Sep 12, 2019
ethindp reopened this Sep 12, 2019
ethindp (Contributor, Author) commented Sep 12, 2019

Accidentally closed the PR. I'd love more tests; mine can't be the only ones... :)

phil-opp (Member) left a review

Thanks for the update, looks much better now!

I left a few comments below; otherwise this looks good to me.

Cargo.toml (Outdated)
@@ -17,6 +17,7 @@ xmas-elf = { version = "0.6.2", optional = true }
x86_64 = { version = "0.7.2", optional = true }
usize_conversions = { version = "0.2.0", optional = true }
fixedvec = { version = "0.2.4", optional = true }
bit_field = "*"
phil-opp (Member):
Wildcard dependencies are not recommended because cargo might select an incompatible version. For example, it can lead to compilation failures when another dependency requires a very old version of bit_field. Just set it to the latest version (0.10.0) instead.

Cargo.toml (Outdated), same lines as above

phil-opp (Member): Also, please make this dependency optional like the other dependencies above.

Cargo.toml (Outdated)
@@ -34,6 +35,7 @@ binary = ["xmas-elf", "x86_64", "usize_conversions", "fixedvec", "llvm-tools", "
vga_320x200 = ["font8x8"]
recursive_page_table = []
map_physical_memory = []
sse = []
phil-opp (Member):
After making the bit_field dependency optional, you need to change this line to:

Suggested change:
- sse = []
+ sse = ["bit_field"]
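
Taken together with the wildcard-version comment above, the resulting Cargo.toml entries would look roughly like this (a sketch of the combined effect of both review comments, not the merged file):

[dependencies]
bit_field = { version = "0.10.0", optional = true }

[features]
sse = ["bit_field"]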

src/main.rs (Outdated)
asm!("mov $0, %cr4" :: "r" (cr4) : "memory");
}
}

phil-opp (Member):
Could you move this into an enable_sse function in a new sse module? Also, it should be called from main, not stage_4.
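
The requested structure could look roughly like this (hypothetical sketch; the names match the review request, not necessarily the merged code):

// src/sse.rs: gate everything on the cargo feature so call sites
// don't need cfg attributes of their own.
#[cfg(feature = "sse")]
pub fn enable_sse() {
    // CR0/CR4 setup as sketched earlier in the thread
}

#[cfg(not(feature = "sse"))]
pub fn enable_sse() {}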

@@ -6,3 +6,4 @@ edition = "2018"

[dependencies]
x86_64 = "0.3.4"
bootloader = {path = "..", features=["sse"]}
phil-opp (Member):
I'm not sure this is a good way to test it, because this way we no longer test without the sse feature. Instead, we should create multiple test kernels with different feature combinations, as we do for bootimage. This doesn't need to be part of this PR, though.

I think the best way forward is to merge this without tests (undoing the modifications to the test-kernel) and add proper tests in a follow-up pull request.

ethindp (Contributor, Author) commented Sep 19, 2019 via email

phil-opp (Member) commented

Thanks for the updates! Looks good now.

> I am confused; where, exactly, is the main function defined?

Oh sorry, I meant the read_elf function (we should really rename it to something more fitting). The stage_4 function is just a thin wrapper that sets the ss segment and reads the addresses from the extern statics, so extending it doesn't seem fitting.

I think a good place for the call to enable_sse is somewhere at the end of the read_elf function, e.g. right before the let entry_point = … line.

ethindp (Contributor, Author) commented Sep 20, 2019 via email

phil-opp (Member) left a review

Thanks! Let's get this merged.

phil-opp merged commit 537fc71 into rust-osdev:master on Sep 21, 2019
phil-opp changed the title from "SSE and AVX support (untested)" to "Add a Cargo Feature for Enabling SSE" on Sep 21, 2019
phil-opp added a commit that referenced this pull request on Sep 21, 2019
phil-opp (Member) commented

Published as version 0.8.1
