Include support for Windows on Arm on BUILD.bazel along with proper Volterra detection #220

Merged
2 commits merged on Mar 17, 2024


everton1984
Contributor

This PR adds support for building with Bazel on the `arm64_windows` CPU. I also tested it on my Volterra (Windows Dev Kit) and noticed that the chip name string it reports differs from the one defined in the current source code. I don't know whether this is because my hardware is slightly different.

I ran the tests, with the following results:

[==========] Running 132 tests from 28 test suites.
[----------] Global test environment set-up.
[----------] 1 test from PROCESSORS_COUNT
[ RUN      ] PROCESSORS_COUNT.non_zero
[       OK ] PROCESSORS_COUNT.non_zero (0 ms)
[----------] 1 test from PROCESSORS_COUNT (0 ms total)

[----------] 1 test from PROCESSORS
[ RUN      ] PROCESSORS.non_null
[       OK ] PROCESSORS.non_null (0 ms)
[----------] 1 test from PROCESSORS (0 ms total)

[----------] 13 tests from PROCESSOR
[ RUN      ] PROCESSOR.non_null
[       OK ] PROCESSOR.non_null (0 ms)
[ RUN      ] PROCESSOR.valid_smt_id
[       OK ] PROCESSOR.valid_smt_id (0 ms)
[ RUN      ] PROCESSOR.valid_core
[       OK ] PROCESSOR.valid_core (0 ms)
[ RUN      ] PROCESSOR.consistent_core
[       OK ] PROCESSOR.consistent_core (0 ms)
[ RUN      ] PROCESSOR.valid_cluster
[       OK ] PROCESSOR.valid_cluster (0 ms)
[ RUN      ] PROCESSOR.consistent_cluster
[       OK ] PROCESSOR.consistent_cluster (0 ms)
[ RUN      ] PROCESSOR.valid_package
[       OK ] PROCESSOR.valid_package (0 ms)
[ RUN      ] PROCESSOR.consistent_package
[       OK ] PROCESSOR.consistent_package (0 ms)
[ RUN      ] PROCESSOR.consistent_l1i
[       OK ] PROCESSOR.consistent_l1i (0 ms)
[ RUN      ] PROCESSOR.consistent_l1d
[       OK ] PROCESSOR.consistent_l1d (0 ms)
[ RUN      ] PROCESSOR.consistent_l2
[       OK ] PROCESSOR.consistent_l2 (0 ms)
[ RUN      ] PROCESSOR.consistent_l3
[       OK ] PROCESSOR.consistent_l3 (0 ms)
[ RUN      ] PROCESSOR.consistent_l4
[       OK ] PROCESSOR.consistent_l4 (0 ms)
[----------] 13 tests from PROCESSOR (7 ms total)

[----------] 1 test from CORES_COUNT
[ RUN      ] CORES_COUNT.within_bounds
[       OK ] CORES_COUNT.within_bounds (0 ms)
[----------] 1 test from CORES_COUNT (0 ms total)

[----------] 1 test from CORES
[ RUN      ] CORES.non_null
[       OK ] CORES.non_null (0 ms)
[----------] 1 test from CORES (0 ms total)

[----------] 10 tests from CORE
[ RUN      ] CORE.non_null
[       OK ] CORE.non_null (0 ms)
[ RUN      ] CORE.non_zero_processors
[       OK ] CORE.non_zero_processors (0 ms)
[ RUN      ] CORE.consistent_processors
[       OK ] CORE.consistent_processors (0 ms)
[ RUN      ] CORE.valid_core_id
[       OK ] CORE.valid_core_id (0 ms)
[ RUN      ] CORE.valid_cluster
[       OK ] CORE.valid_cluster (0 ms)
[ RUN      ] CORE.consistent_cluster
[       OK ] CORE.consistent_cluster (0 ms)
[ RUN      ] CORE.valid_package
[       OK ] CORE.valid_package (0 ms)
[ RUN      ] CORE.consistent_package
[       OK ] CORE.consistent_package (0 ms)
[ RUN      ] CORE.known_vendor
[       OK ] CORE.known_vendor (0 ms)
[ RUN      ] CORE.known_uarch
[       OK ] CORE.known_uarch (0 ms)
[----------] 10 tests from CORE (5 ms total)

[----------] 1 test from CLUSTERS_COUNT
[ RUN      ] CLUSTERS_COUNT.within_bounds
[       OK ] CLUSTERS_COUNT.within_bounds (0 ms)
[----------] 1 test from CLUSTERS_COUNT (0 ms total)

[----------] 1 test from CLUSTERS
[ RUN      ] CLUSTERS.non_null
[       OK ] CLUSTERS.non_null (0 ms)
[----------] 1 test from CLUSTERS (0 ms total)

[----------] 14 tests from CLUSTER
[ RUN      ] CLUSTER.non_null
[       OK ] CLUSTER.non_null (0 ms)
[ RUN      ] CLUSTER.non_zero_processors
[       OK ] CLUSTER.non_zero_processors (0 ms)
[ RUN      ] CLUSTER.valid_processors
[       OK ] CLUSTER.valid_processors (0 ms)
[ RUN      ] CLUSTER.consistent_processors
[       OK ] CLUSTER.consistent_processors (0 ms)
[ RUN      ] CLUSTER.non_zero_cores
[       OK ] CLUSTER.non_zero_cores (0 ms)
[ RUN      ] CLUSTER.valid_cores
[       OK ] CLUSTER.valid_cores (0 ms)
[ RUN      ] CLUSTER.consistent_cores
[       OK ] CLUSTER.consistent_cores (0 ms)
[ RUN      ] CLUSTER.valid_cluster_id
[       OK ] CLUSTER.valid_cluster_id (0 ms)
[ RUN      ] CLUSTER.valid_package
[       OK ] CLUSTER.valid_package (0 ms)
[ RUN      ] CLUSTER.consistent_package
[       OK ] CLUSTER.consistent_package (0 ms)
[ RUN      ] CLUSTER.consistent_vendor
[       OK ] CLUSTER.consistent_vendor (0 ms)
[ RUN      ] CLUSTER.consistent_uarch
[       OK ] CLUSTER.consistent_uarch (0 ms)
[ RUN      ] CLUSTER.consistent_midr
[       OK ] CLUSTER.consistent_midr (0 ms)
[ RUN      ] CLUSTER.consistent_frequency
[       OK ] CLUSTER.consistent_frequency (0 ms)
[----------] 14 tests from CLUSTER (7 ms total)

[----------] 1 test from PACKAGES_COUNT
[ RUN      ] PACKAGES_COUNT.within_bounds
[       OK ] PACKAGES_COUNT.within_bounds (0 ms)
[----------] 1 test from PACKAGES_COUNT (0 ms total)

[----------] 1 test from PACKAGES
[ RUN      ] PACKAGES.non_null
[       OK ] PACKAGES.non_null (0 ms)
[----------] 1 test from PACKAGES (0 ms total)

[----------] 10 tests from PACKAGE
[ RUN      ] PACKAGE.non_null
[       OK ] PACKAGE.non_null (0 ms)
[ RUN      ] PACKAGE.non_zero_processors
[       OK ] PACKAGE.non_zero_processors (0 ms)
[ RUN      ] PACKAGE.valid_processors
[       OK ] PACKAGE.valid_processors (0 ms)
[ RUN      ] PACKAGE.consistent_processors
[       OK ] PACKAGE.consistent_processors (0 ms)
[ RUN      ] PACKAGE.non_zero_cores
[       OK ] PACKAGE.non_zero_cores (0 ms)
[ RUN      ] PACKAGE.valid_cores
[       OK ] PACKAGE.valid_cores (0 ms)
[ RUN      ] PACKAGE.consistent_cores
[       OK ] PACKAGE.consistent_cores (0 ms)
[ RUN      ] PACKAGE.non_zero_clusters
[       OK ] PACKAGE.non_zero_clusters (0 ms)
[ RUN      ] PACKAGE.valid_clusters
[       OK ] PACKAGE.valid_clusters (0 ms)
[ RUN      ] PACKAGE.consistent_cluster
[       OK ] PACKAGE.consistent_cluster (0 ms)
[----------] 10 tests from PACKAGE (5 ms total)

[----------] 1 test from UARCHS_COUNT
[ RUN      ] UARCHS_COUNT.within_bounds
[       OK ] UARCHS_COUNT.within_bounds (0 ms)
[----------] 1 test from UARCHS_COUNT (0 ms total)

[----------] 1 test from UARCHS
[ RUN      ] UARCHS.non_null
[       OK ] UARCHS.non_null (0 ms)
[----------] 1 test from UARCHS (0 ms total)

[----------] 5 tests from UARCH
[ RUN      ] UARCH.non_null
[       OK ] UARCH.non_null (0 ms)
[ RUN      ] UARCH.non_zero_processors
[       OK ] UARCH.non_zero_processors (0 ms)
[ RUN      ] UARCH.valid_processors
[       OK ] UARCH.valid_processors (0 ms)
[ RUN      ] UARCH.non_zero_cores
[       OK ] UARCH.non_zero_cores (0 ms)
[ RUN      ] UARCH.valid_cores
[       OK ] UARCH.valid_cores (0 ms)
[----------] 5 tests from UARCH (2 ms total)

[----------] 1 test from L1I_CACHES_COUNT
[ RUN      ] L1I_CACHES_COUNT.within_bounds
[       OK ] L1I_CACHES_COUNT.within_bounds (0 ms)
[----------] 1 test from L1I_CACHES_COUNT (0 ms total)

[----------] 1 test from L1I_CACHES
[ RUN      ] L1I_CACHES.non_null
[       OK ] L1I_CACHES.non_null (0 ms)
[----------] 1 test from L1I_CACHES (0 ms total)

[----------] 13 tests from L1I_CACHE
[ RUN      ] L1I_CACHE.non_null
[       OK ] L1I_CACHE.non_null (0 ms)
[ RUN      ] L1I_CACHE.non_zero_size
[       OK ] L1I_CACHE.non_zero_size (0 ms)
[ RUN      ] L1I_CACHE.valid_size
[       OK ] L1I_CACHE.valid_size (0 ms)
[ RUN      ] L1I_CACHE.non_zero_associativity
[       OK ] L1I_CACHE.non_zero_associativity (0 ms)
[ RUN      ] L1I_CACHE.non_zero_partitions
[       OK ] L1I_CACHE.non_zero_partitions (0 ms)
[ RUN      ] L1I_CACHE.non_zero_line_size
[       OK ] L1I_CACHE.non_zero_line_size (0 ms)
[ RUN      ] L1I_CACHE.power_of_2_line_size
[       OK ] L1I_CACHE.power_of_2_line_size (0 ms)
[ RUN      ] L1I_CACHE.reasonable_line_size
[       OK ] L1I_CACHE.reasonable_line_size (0 ms)
[ RUN      ] L1I_CACHE.valid_flags
[       OK ] L1I_CACHE.valid_flags (0 ms)
[ RUN      ] L1I_CACHE.non_inclusive
[       OK ] L1I_CACHE.non_inclusive (0 ms)
[ RUN      ] L1I_CACHE.non_zero_processors
[       OK ] L1I_CACHE.non_zero_processors (0 ms)
[ RUN      ] L1I_CACHE.valid_processors
[       OK ] L1I_CACHE.valid_processors (0 ms)
[ RUN      ] L1I_CACHE.consistent_processors
[       OK ] L1I_CACHE.consistent_processors (0 ms)
[----------] 13 tests from L1I_CACHE (7 ms total)

[----------] 1 test from L1D_CACHES_COUNT
[ RUN      ] L1D_CACHES_COUNT.within_bounds
[       OK ] L1D_CACHES_COUNT.within_bounds (0 ms)
[----------] 1 test from L1D_CACHES_COUNT (0 ms total)

[----------] 1 test from L1D_CACHES
[ RUN      ] L1D_CACHES.non_null
[       OK ] L1D_CACHES.non_null (0 ms)
[----------] 1 test from L1D_CACHES (0 ms total)

[----------] 13 tests from L1D_CACHE
[ RUN      ] L1D_CACHE.non_null
[       OK ] L1D_CACHE.non_null (0 ms)
[ RUN      ] L1D_CACHE.non_zero_size
[       OK ] L1D_CACHE.non_zero_size (0 ms)
[ RUN      ] L1D_CACHE.valid_size
[       OK ] L1D_CACHE.valid_size (0 ms)
[ RUN      ] L1D_CACHE.non_zero_associativity
[       OK ] L1D_CACHE.non_zero_associativity (0 ms)
[ RUN      ] L1D_CACHE.non_zero_partitions
[       OK ] L1D_CACHE.non_zero_partitions (0 ms)
[ RUN      ] L1D_CACHE.non_zero_line_size
[       OK ] L1D_CACHE.non_zero_line_size (0 ms)
[ RUN      ] L1D_CACHE.power_of_2_line_size
[       OK ] L1D_CACHE.power_of_2_line_size (0 ms)
[ RUN      ] L1D_CACHE.reasonable_line_size
[       OK ] L1D_CACHE.reasonable_line_size (0 ms)
[ RUN      ] L1D_CACHE.valid_flags
[       OK ] L1D_CACHE.valid_flags (0 ms)
[ RUN      ] L1D_CACHE.non_inclusive
[       OK ] L1D_CACHE.non_inclusive (0 ms)
[ RUN      ] L1D_CACHE.non_zero_processors
[       OK ] L1D_CACHE.non_zero_processors (0 ms)
[ RUN      ] L1D_CACHE.valid_processors
[       OK ] L1D_CACHE.valid_processors (0 ms)
[ RUN      ] L1D_CACHE.consistent_processors
[       OK ] L1D_CACHE.consistent_processors (0 ms)
[----------] 13 tests from L1D_CACHE (7 ms total)

[----------] 1 test from L2_CACHES_COUNT
[ RUN      ] L2_CACHES_COUNT.within_bounds
[       OK ] L2_CACHES_COUNT.within_bounds (0 ms)
[----------] 1 test from L2_CACHES_COUNT (0 ms total)

[----------] 1 test from L2_CACHES
[ RUN      ] L2_CACHES.non_null
[       OK ] L2_CACHES.non_null (0 ms)
[----------] 1 test from L2_CACHES (0 ms total)

[----------] 12 tests from L2_CACHE
[ RUN      ] L2_CACHE.non_null
[       OK ] L2_CACHE.non_null (0 ms)
[ RUN      ] L2_CACHE.non_zero_size
[       OK ] L2_CACHE.non_zero_size (0 ms)
[ RUN      ] L2_CACHE.valid_size
[       OK ] L2_CACHE.valid_size (0 ms)
[ RUN      ] L2_CACHE.non_zero_associativity
[       OK ] L2_CACHE.non_zero_associativity (0 ms)
[ RUN      ] L2_CACHE.non_zero_partitions
[       OK ] L2_CACHE.non_zero_partitions (0 ms)
[ RUN      ] L2_CACHE.non_zero_line_size
[       OK ] L2_CACHE.non_zero_line_size (0 ms)
[ RUN      ] L2_CACHE.power_of_2_line_size
[       OK ] L2_CACHE.power_of_2_line_size (0 ms)
[ RUN      ] L2_CACHE.reasonable_line_size
[       OK ] L2_CACHE.reasonable_line_size (0 ms)
[ RUN      ] L2_CACHE.valid_flags
[       OK ] L2_CACHE.valid_flags (0 ms)
[ RUN      ] L2_CACHE.non_zero_processors
[       OK ] L2_CACHE.non_zero_processors (0 ms)
[ RUN      ] L2_CACHE.valid_processors
[       OK ] L2_CACHE.valid_processors (0 ms)
[ RUN      ] L2_CACHE.consistent_processors
[       OK ] L2_CACHE.consistent_processors (0 ms)
[----------] 12 tests from L2_CACHE (6 ms total)

[----------] 1 test from L3_CACHES_COUNT
[ RUN      ] L3_CACHES_COUNT.within_bounds
[       OK ] L3_CACHES_COUNT.within_bounds (0 ms)
[----------] 1 test from L3_CACHES_COUNT (0 ms total)

[----------] 12 tests from L3_CACHE
[ RUN      ] L3_CACHE.non_null
[       OK ] L3_CACHE.non_null (0 ms)
[ RUN      ] L3_CACHE.non_zero_size
[       OK ] L3_CACHE.non_zero_size (0 ms)
[ RUN      ] L3_CACHE.valid_size
[       OK ] L3_CACHE.valid_size (0 ms)
[ RUN      ] L3_CACHE.non_zero_associativity
[       OK ] L3_CACHE.non_zero_associativity (0 ms)
[ RUN      ] L3_CACHE.non_zero_partitions
[       OK ] L3_CACHE.non_zero_partitions (0 ms)
[ RUN      ] L3_CACHE.non_zero_line_size
[       OK ] L3_CACHE.non_zero_line_size (0 ms)
[ RUN      ] L3_CACHE.power_of_2_line_size
[       OK ] L3_CACHE.power_of_2_line_size (0 ms)
[ RUN      ] L3_CACHE.reasonable_line_size
[       OK ] L3_CACHE.reasonable_line_size (0 ms)
[ RUN      ] L3_CACHE.valid_flags
[       OK ] L3_CACHE.valid_flags (0 ms)
[ RUN      ] L3_CACHE.non_zero_processors
[       OK ] L3_CACHE.non_zero_processors (0 ms)
[ RUN      ] L3_CACHE.valid_processors
[       OK ] L3_CACHE.valid_processors (0 ms)
[ RUN      ] L3_CACHE.consistent_processors
[       OK ] L3_CACHE.consistent_processors (0 ms)
[----------] 12 tests from L3_CACHE (6 ms total)

[----------] 1 test from L4_CACHES_COUNT
[ RUN      ] L4_CACHES_COUNT.within_bounds
[       OK ] L4_CACHES_COUNT.within_bounds (0 ms)
[----------] 1 test from L4_CACHES_COUNT (0 ms total)

[----------] 12 tests from L4_CACHE
[ RUN      ] L4_CACHE.non_null
[       OK ] L4_CACHE.non_null (0 ms)
[ RUN      ] L4_CACHE.non_zero_size
[       OK ] L4_CACHE.non_zero_size (0 ms)
[ RUN      ] L4_CACHE.valid_size
[       OK ] L4_CACHE.valid_size (0 ms)
[ RUN      ] L4_CACHE.non_zero_associativity
[       OK ] L4_CACHE.non_zero_associativity (0 ms)
[ RUN      ] L4_CACHE.non_zero_partitions
[       OK ] L4_CACHE.non_zero_partitions (0 ms)
[ RUN      ] L4_CACHE.non_zero_line_size
[       OK ] L4_CACHE.non_zero_line_size (0 ms)
[ RUN      ] L4_CACHE.power_of_2_line_size
[       OK ] L4_CACHE.power_of_2_line_size (0 ms)
[ RUN      ] L4_CACHE.reasonable_line_size
[       OK ] L4_CACHE.reasonable_line_size (0 ms)
[ RUN      ] L4_CACHE.valid_flags
[       OK ] L4_CACHE.valid_flags (0 ms)
[ RUN      ] L4_CACHE.non_zero_processors
[       OK ] L4_CACHE.non_zero_processors (0 ms)
[ RUN      ] L4_CACHE.valid_processors
[       OK ] L4_CACHE.valid_processors (0 ms)
[ RUN      ] L4_CACHE.consistent_processors
[       OK ] L4_CACHE.consistent_processors (0 ms)
[----------] 12 tests from L4_CACHE (6 ms total)

[----------] Global test environment tear-down
[==========] 132 tests from 28 test suites ran. (93 ms total)
[  PASSED  ] 132 tests.

with `cpu-info.exe` returning:

Packages:
        0: Snapdragon (TM) 8cx Gen 3
Microarchitectures:
        4x Cortex-A78
        4x Cortex-X1
Cores:
        0: 1 processor (0), ARM Cortex-A78
        1: 1 processor (1), ARM Cortex-A78
        2: 1 processor (2), ARM Cortex-A78
        3: 1 processor (3), ARM Cortex-A78
        4: 1 processor (4), ARM Cortex-X1
        5: 1 processor (5), ARM Cortex-X1
        6: 1 processor (6), ARM Cortex-X1
        7: 1 processor (7), ARM Cortex-X1
Logical processors:
        0
        1
        2
        3
        4
        5
        6
        7

and `isa-info.exe` returning:

Instruction sets:
        ARM v8.1 atomics: yes
        ARM v8.1 SQRDMLxH: yes
        ARM v8.2 FP16 arithmetics: yes
        ARM v8.2 FHM: no
        ARM v8.2 BF16: no
        ARM v8.2 Int8 dot product: yes
        ARM v8.2 Int8 matrix multiplication: no
        ARM v8.3 JS conversion: no
        ARM v8.3 complex: no
SIMD extensions:
        ARM SVE: no
        ARM SVE 2: no
Cryptography extensions:
        AES: yes
        SHA1: yes
        SHA2: yes
        PMULL: yes
        CRC32: yes

@everton1984
Contributor Author

@Maratyszcza Sorry to ping you here. Is there something I need to do to get this reviewed, like creating an issue?

@Maratyszcza
Collaborator

I no longer maintain this project; deferring to @malfet.

Include detection for Volterra, Windows Dev Kit.
@everton1984
Contributor Author

@malfet Any chance you could take a look at this please?

@malfet
Contributor

malfet commented Mar 15, 2024

@everton1984 please fix clang-formatting, otherwise LGTM

@everton1984
Contributor Author

> @everton1984 please fix clang-formatting, otherwise LGTM

@malfet Done. Hopefully it worked. Thanks a lot for reviewing!

@malfet merged commit 6543fec into pytorch:main on Mar 17, 2024
11 checks passed
@ozanMSFT
Contributor

@malfet, @everton1984:

First of all, thanks for the contribution.

However, we believe this PR breaks detection of the Ampere(R) Altra(R) Processor.

We have therefore opened an issue with the details, along with a PR proposing a fix.

malfet pushed a commit that referenced this pull request Apr 17, 2024
**Summary:**

Resolves #236

Also related to the change in [PR 220](#220).

```
"Unknown chip model name 'Ampere(R) Altra(R) Processor'.
Please add new Windows on Arm SoC/chip support to arm/windows/init.c!"
```

---

**Previous error details:**

The cause was that the `woa_chip_name` enum (`windows-arm-init.h`) defined only 4 known chips, a count stored in `woa_chip_name_last`:

```c
enum woa_chip_name {
	woa_chip_name_microsoft_sq_1 = 0,
	woa_chip_name_microsoft_sq_2 = 1,
	woa_chip_name_microsoft_sq_3 = 2,
	woa_chip_name_ampere_altra = 3,
	woa_chip_name_unknown = 4,
	woa_chip_name_last = woa_chip_name_unknown
};
```

However, after [PR 220](#220), `woa_chips[]` (`init.c`) has two entries for `woa_chip_name_microsoft_sq_3`, because the same chip is reported under two different name strings:

> Strings are `Snapdragon (TM) 8cx Gen 3` and `Snapdragon Compute Platform`

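As a result, the following `for` loop (`init.c`) no longer checked every element of `woa_chips[]`: the array now held five entries, but the loop stops at `woa_chip_name_last` (4), so the last entry, the Ampere Altra, was never compared.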

```c
for (uint32_t i = 0; i < (uint32_t)woa_chip_name_last; i++) {
	size_t compare_length = wcsnlen(woa_chips[i].chip_name_string, CPUINFO_PACKAGE_NAME_MAX);
	int compare_result = wcsncmp(text_buffer, woa_chips[i].chip_name_string, compare_length);
	if (compare_result == 0) {
		chip_info = woa_chips + i;
		break;
	}
}
```

---

**Fix Details:**

We added `woa_chip_name_microsoft_sq_3_devkit` to restore a **one-to-one** relationship between the `woa_chip_name` enum (`windows-arm-init.h`) and the `woa_chips[]` array (`init.c`).

We also specified the array indexes explicitly using the `enum` values, to prevent future duplication and to make the relationship between the two easier to read.