Skip to content

Conversation

maarquitos14
Copy link
Contributor

This commit adds a design document for the implementation of a Device Configuration File required by Device Aspect Traits and sycl-aspect-filter.

maarquitos14 and others added 6 commits May 4, 2023 14:26
First iteration of design doc for Device Configuration File.
Signed-off-by: Maronas, Marcos <marcos.maronas@intel.com>
Signed-off-by: Maronas, Marcos <marcos.maronas@intel.com>
@maarquitos14 maarquitos14 requested a review from a team as a code owner May 9, 2023 14:31
@maarquitos14 maarquitos14 marked this pull request as draft May 9, 2023 14:31
def AspectExt_intel_device_id : AspectEntry<37>;
def AspectExt_intel_memory_clock_rate : AspectEntry<38>;
def AspectExt_intel_memory_bus_width : AspectEntry<39>;
def AspectEmulated : AspectEntry<40>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, this further intensifies my worries about using the aspect values in the original config file design. By adding this, we suddenly need to keep this non-runtime file in sync with the runtime, which we have previously tried our best to not need.

Would it be possible to make this completely oblivious to existing aspects and just require the users to provide the specific names or ID? The IR should contain a mapping of names and IDs, so either should be enough to work with in the end, at least for IR passes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, this further intensifies my worries about using the aspect values in the original config file design. By adding this, we suddenly need to keep this non-runtime file in sync with the runtime, which we have previously tried our best to not need.

To add more context here, see #5892. Essentially, we would like our compiler implementation to be acceptable in LLVM upstream and that means that it should be generic. Compiler should not be dependent on exact SYCL RT/Headers and instead should provide some interface for SYCL-specific things, so different SYCL implementations can be built on top of the compiler.

By hardcoding those numbers here, we have two problems:

  1. It is easy to get them de-synced with our header files, which could cause obscure errors to happen
  2. It makes this feature impossible to upstream, because it is now locked-in to a certain SYCL headers implementation

From that point of view, having default config file is actually better, because it is not a part of the compiler, but it is a part of our bigger implementation. However: I still think that embedded defaults are way better from performance (and technical) point of view; even separate config file which requires exact numbers associated with aspects is error-prone and doomed to contain bugs

Tagging @bader for awareness here.

Would it be possible to make this completely oblivious to existing aspects and just require the users to provide the specific names or ID? The IR should contain a mapping of names and IDs, so either should be enough to work with in the end, at least for IR passes.

We could operate solely on names for the purpose of filtering device images when doing AOT compilation. However, I think that actual numbers are also required for other uses of this config, such as: diagnostic messages from -fsycl-fixed-targets or defining a macro for any_device_has/all_devices_have.

Possible solution here, which should allow us to proceed forward and make the interface more or less generic is to make the mapping configurable as well.

The idea is to declare that:

def AspectExt_intel_device_id : AspectEntry<37>;
def AspectExt_intel_memory_clock_rate : AspectEntry<38>;
def AspectExt_intel_memory_bus_width : AspectEntry<39>;
def AspectEmulated : AspectEntry<40>;

To be a default mapping, which is hardcoded for our own implementation and that is perfectly fine for own fork, since we target our own headers. However, for the purposes of upstream, the same mapping should be configurable as the rest of default config through a command line option and/or config file. That kind of renders those defaults useless, but still provides some kind of infrastructure, which can be used by downstream projects to set their own defaults accordingly.

The .td file should of course be extended with string literals corresponding to those numbers to be able to read user-provided config file, which would look like:

target:
- aspects: fp16, fp16, ...

This does not resolve a problem of the default mapping being de-synced with our own headers, but perhaps we can come up with some integration test for that matter to ensure that we haven't missed any aspect and that the mapping is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I think that actual numbers are also required for other uses of this config, such as: diagnostic messages from -fsycl-fixed-targets or defining a macro for any_device_has/all_devices_have.

I don't think those strictly need the values. For the fixed-targets I would assume it would have access to the mappings either way, otherwise it would not be able to produce any useful diagnostics to the user about what aspects are problematic. For the any_device_has/all_devices_have, the macros they generate only use the value because that fit with the original design of the configuration file using values. We could change the macros to use the aspect names instead. It may even make the macros more readable, albeit more verbose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with @steffenlarsen that the config file could contain only the names of the aspects, and not their numerical values. Let's consider all the places in the compiler that will use the config file:

  • The driver needs the config file to know the set of legal values for -fsycl-targets. This logic doesn't need the aspects at all, only the target name.

  • The driver needs the config file to macros for any_device_has and all_devices_have. As @steffenlarsen says, this design can be changed, so the driver emits macros using the name of the aspect (e.g. __SYCL_ALL_DEVICES_HAVE_cpu__) rather than the aspect number (e.g. __SYCL_ALL_DEVICES_HAVE_1__). I think this is an improvement in the design anyways.

  • The LLVM IR pass that propagates aspects needs the config file to diagnose errors when compiling with --fsycl-fixed-targets. This pass can use the !sycl_aspects metadata to translate aspect names to numbers. Therefore, it is OK if the config file contains the aspect names.

  • I intend to change the design for AOT optional features, so we no longer have the sycl-aspect-filter tool. Instead, sycl-post-link will read the config file and use that information to filter the kernels that are included in each device image. This case requires a little more research. Does the !sycl_aspects metadata get propagated through llvm-link, so that it is available to sycl-post-link? Looking at a simple example, I think it does. If so, then sycl-post-link can use that metadata to translate between aspect names and numbers.

BTW, I think it would be good to expand the "Requirements" section above to list these 4 specific places in the compiler that will use the config file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the !sycl_aspects metadata get propagated through llvm-link, so that it is available to sycl-post-link?

Linker shouldn't drop the metadata. It may be altered a bit if several input modules contain the same metadata, i.e. they will be merged - we probably need to check if that would still be recoverable for our needs in sycl-post-link, but likely it will be.

I intend to change the design for AOT optional features, so we no longer have the sycl-aspect-filter tool. Instead, sycl-post-link will read the config file and use that information to filter the kernels that are included in each device image.

BTW, @gmlueck, could you please share more details here? I though that the main motivation for sycl-aspect-filter tool is to overcome clang driver limitation of having statically known list of commands. Are we going to emit "reduced" modules for some of targets, i.e. remove kernels which are not supported there?

Also, to state this here explicitly: we had an offline sync some time ago and I don't have objections against using strings instead of integers as aspects identification.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, @gmlueck, could you please share more details here? I though that the main motivation for sycl-aspect-filter tool is to overcome clang driver limitation of having statically known list of commands. Are we going to emit "reduced" modules for some of targets, i.e. remove kernels which are not supported there?

Yes, this was my thought. The current POR is for sycl-post-link to emit all the kernels and then filter out the incompatible ones with a separate tool sycl-aspect-filter. Rather than doing this, I think sycl-post-link can read the config file and filter out the incompatible kernels. My proposed design for "if_device_has" requires the sycl-post-link tool to read the config file anyways, so it may as well also do the filtering.

I think the list of commands is still known statically because the set of AOT targets is known statically (from the -fsycl-targets option). For example, if the user compiles for two Intel GPU AOT targets with -fsycl-targets=intel_gpu_pvc,intel_gpu_dg1, then the compiler driver can invoke sycl-post-link like:

sycl-post-link [....] -o intel_gpu_pvc=<table1> -o intel_gpu_dg1=<table2>

In this case file table <table1> will contain the device images for PVC and <table2> will contain the device images for DG1. Each file table will then be processed, as normal, through llvm-for-each to convert LLVM bitcode to SPIR-V and then compiling the SPIR-V via ocloc.

Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a section about testing to this design document. There is a danger that the config file will get out-of-sync with the actual device capabilities, so we need some testing to validate that it is not out-of-sync.

Can we have a unit test that generates C++ code from the TD file and then compares the aspects listed in the TD file with the aspects reported via device::has? We would need to run this unit test on each device that is listed in the TD file.

@maarquitos14
Copy link
Contributor Author

We should add a section about testing to this design document. There is a danger that the config file will get out-of-sync with the actual device capabilities, so we need some testing to validate that it is not out-of-sync.

Can we have a unit test that generates C++ code from the TD file and then compares the aspects listed in the TD file with the aspects reported via device::has? We would need to run this unit test on each device that is listed in the TD file.

I have been thinking about this. I guess we could have a simple Python script that could do what you suggest. @AlexeySachkov what do you think?

@maarquitos14 maarquitos14 marked this pull request as ready for review June 5, 2023 13:56
@mdtoguchi
Copy link
Contributor

@maarquitos14, there is some discussion in #8658 (comment) around using an ID version number for specific device handling. Could that be integrated into the config file?

@maarquitos14
Copy link
Contributor Author

maarquitos14 commented Jun 8, 2023

@maarquitos14, there is some discussion in #8658 (comment) around using an ID version number for specific device handling. Could that be integrated into the config file?

@mdtoguchi From what I understand, there is an ID per target/device, so I think it would be enough if we added a new (int?) field to the TargetInfo class to represent the GMDID.

Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default config part looks good to me, but it is unclear how exactly the mechanism for extending that config will be implemented

bits<1> maySupportOtherAspects = otherAspects;
list<int> subGroupSizes = listSubGroupSizes;
string aotToolchain = toolchain;
string aotToolchain-options = option;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this field be list<string> to make sure that we can represent several options with it?

Copy link
Contributor Author

@maarquitos14 maarquitos14 Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me how we are going to be using this value, but if it's just something that we will simply pass on to the aot command, couldn't we simply add as many options as required in a single string? Like string aotToolchainOptions ="-option1 val1, -option2 val2". Is there any scenario where we would need different strings?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I don't think we really want to parse AOT toolchain options and can store them as a single string. In that case, however, .td file should contain something like -device skl instead of just skl.

Additionally, we need to set dependencies adequately so that this command is
run before any of the consumers need it.

### Changes to the DPC++ Frontend
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is a bit unclear: I understand that clang driver may read the option, open a file and modify the map we have with default config, but what about other tools, like sycl-post-link? Will that option be propagated there, so the tool does the same handling?

Will we copy-paste the code opening a file and modifying default config, or will we put into helper header (one which contains TargetInfo struct definition)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not super familiar with how different processes are invoked from the driver, so I was assuming here that options are propagated to all the processes. Thus, yes, my original idea was having the different processes to do the same handling. I think handling the map is easier, while modifying files is more error-prone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not super familiar with how different processes are invoked from the driver, so I was assuming here that options are propagated to all the processes.

-### would help here. It prints commands which compiler driver would execute, without actually invoking them. Just add this option to any compilation command (even the simplest one), to explore the flow.

Generally, driver options are not propagated down to tools automatically and each option is handled on case by case bases. Not all options are applicable for all tools, not all options have 1:1 equivalents, etc. A few examples here: -fsycl-device-code-split is only needed for sycl-post-link and it looks like -split there. -g is being transformed into several options like -dwarf-version=5 -debug-info-kind=....

Thus, yes, my original idea was having the different processes to do the same handling. I think handling the map is easier, while modifying files is more error-prone.

Then we should document that every tool which is supposed to work with the config, should be extended with this new option and it should be propagated from the driver to a tool.

may_support_other_aspects: true/false
sub-group-sizes: [1, 2, 4, 8]
aot-toolchain: ocloc
aot-toolchain-ocloc-device: skl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this key should contain ocloc in its name, because it kind of obvious that this is ocloc option based on aot-toolchain value.

We also probably want to be able to pass several different flags in here for some targets and therefore need to come up with YAML syntax which will allow us to do so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, during prototype implementation I realized this name is not great. I think aot-toolchain-options works best.

information.

## Testing
There is a danger that the device configuration file will get out-of-sync with the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a test already for any_device_has, which goes over each item in aspects.def file and tries to instantiate any_device_has with that enumerator.

We could do the same for default device config. if_architecture_is could be used to detect which exact device we are running on and .inc file with default config can be included into SYCL unit-tests as well (perhaps we may want to select a different location for it, so it is clear that this test requires some compiler internals to be present), i.e. that is not exactly a runtime test.

However, I still don't think that such approach will allow us to be sure that nothing went out-of-sync, because it requires us to run the test on every different target we have in our config file. We most likely won't have all of them available to us at any time even for manual run of the test, not to say about automated launch of it.

Not that we shouldn't run this test in our regular testing, but just highlighting that it won't automatically give us 100% coverage over all known targets.

An idea for an extra test: we need a mechanism to compare list of aspects known to SYCL RT (aspects.def) with list of aspects available in .td file with default config description. This way, if someone adds a new aspect, they won't forget to register it in a .td file and hopefully, will not forget to add it to descriptions of known targets in that .td file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An idea for an extra test: we need a mechanism to compare list of aspects known to SYCL RT (aspects.def) with list of aspects available in .td file with default config description. This way, if someone adds a new aspect, they won't forget to register it in a .td file and hopefully, will not forget to add it to descriptions of known targets in that .td file.

This would be just to make sure that new aspects added to aspects.def are also added to the .td file list of aspects, if I understand correctly, but it doesn't really check whether a device is returning the correct set of aspects. Is my understanding correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An idea for an extra test: we need a mechanism to compare list of aspects known to SYCL RT (aspects.def) with list of aspects available in .td file with default config description. This way, if someone adds a new aspect, they won't forget to register it in a .td file and hopefully, will not forget to add it to descriptions of known targets in that .td file.

This would be just to make sure that new aspects added to aspects.def are also added to the .td file list of aspects, if I understand correctly, but it doesn't really check whether a device is returning the correct set of aspects. Is my understanding correct?

Right, that is a separate test, which is intended to at least highlight that default config maybe outdated, because it is not even aware of that new aspect. It is not 100% bulletproof, because it doesn't check that you actually used the aspect in description of any target. However, such test doesn't require any actual HW to be present on a machine and therefore could also be useful during development when new aspects are added.

@gmlueck
Copy link
Contributor

gmlueck commented Jun 13, 2023

From what I understand, there is an ID per target/device, so I think it would be enough if we added a new (int?) field to the TargetInfo class to represent the GMDID.

Yes, the GMDID is just an integer. It's often displayed as a triple like "12.60.7" (PVC) but stored as a single hex number "0x030f0007".

The GMDID is specific to Intel GPUs, so we'll need to create some other unique integer identifier for other device types.

The goal is to have some unique identifier for each AOT device image which specifies its corresponding device. I was thinking this could be a pair of numbers (device category, device id). We'd probably have an enumeration of device categories (Intel GPU, Nvidia GPU, AMD GPU, etc.) Each category would decide the meaning of "device id". For Intel GPUs, the "device id" will be the GMDID.

@maarquitos14
Copy link
Contributor Author

From what I understand, there is an ID per target/device, so I think it would be enough if we added a new (int?) field to the TargetInfo class to represent the GMDID.

Yes, the GMDID is just an integer. It's often displayed as a triple like "12.60.7" (PVC) but stored as a single hex number "0x030f0007".

The GMDID is specific to Intel GPUs, so we'll need to create some other unique integer identifier for other device types.

The goal is to have some unique identifier for each AOT device image which specifies its corresponding device. I was thinking this could be a pair of numbers (device category, device id). We'd probably have an enumeration of device categories (Intel GPU, Nvidia GPU, AMD GPU, etc.) Each category would decide the meaning of "device id". For Intel GPUs, the "device id" will be the GMDID.

Okay, so that seems doable within the device config file.

@maarquitos14 maarquitos14 requested a review from a team as a code owner June 15, 2023 10:48
@maarquitos14
Copy link
Contributor Author

Friendly ping @asudarsa @mdtoguchi.

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Let's do it! 🚀

Signed-off-by: Maronas, Marcos <marcos.maronas@intel.com>
@maarquitos14 maarquitos14 requested a review from a team as a code owner June 19, 2023 15:08
@AlexeySachkov AlexeySachkov merged commit 3f4b778 into intel:sycl Jun 20, 2023
steffenlarsen pushed a commit that referenced this pull request Jun 29, 2023
Prototype implementation for default Device Config File designed in
#9371. Part of the testing is not present yet, nor is the ability to
extend the default config file.

---------

Signed-off-by: Maronas, Marcos <marcos.maronas@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants