Support for AMD Family 17h Processors #16

Closed
sarnex opened this Issue Mar 10, 2017 · 48 comments


sarnex commented Mar 10, 2017

Hi,

The new AMD Ryzen chips are Family 17h, and are unsupported by lm-sensors.

Please add support for them, and let me know if you need any information, as I have one.

Thanks,
Sarnex

xpander69 commented Mar 11, 2017

Any Updates?

sjug commented Mar 12, 2017

Is this tool actively maintained?

Owner

groeck commented Mar 12, 2017

Not paid, not 24/7.

This was always supposed to be a shadow repository. Unfortunately the main repository is gone.

sjug commented Mar 12, 2017

@groeck How much work is it to add sensor compatibility with new processors and motherboards? Ryzen is a pretty big deal, maybe you can crowdfund something if that would help? What can we do to help short of learning C?

Owner

groeck commented Mar 12, 2017

Money isn't an issue. Time is.
Question though is what you are asking for. Support for the processor with sensors-detect ? That would be a couple of lines of Perl code, but doesn't really mean much. Support for the processors with the "sensors" command ? That has nothing to do with this repository; it would have to be added with a hardware monitoring driver, probably by extending the existing k10temp and fam15h_power drivers. That might require a few lines of C code if the functionality is the same as with family 15/16 chips, or it might require new drivers. Patches for all those changes are welcome.

Thanks for explaining.

Owner

groeck commented Mar 22, 2017

Datasheet for Family 17h CPUs is not published by AMD. Adding temperature sensor support to the kernel driver will require additional technical information which is not currently available. Specifically, it is not known in which PCI device the temperature sensor resides, and if the registers are identical to earlier chips.

sarnex commented Apr 21, 2017

https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf

Does that datasheet contain anything useful? Thanks

Owner

groeck commented Apr 21, 2017

@sarnex: Unfortunately not. The required information is traditionally in the "BIOS and Kernel Developer’s Guide" which has not been published.

BeerSerc commented Apr 21, 2017

Please ignore my comment, I had a cached version opened which didn't have the latest comments, my bad!

Owner

groeck commented May 7, 2017

@sarnex @BeerSerc : Some additional information: I was told that the document needed is in fact the PPR. Problem is that there are two versions of this document, one that is public and one that is available only under NDA. The version available under NDA supposedly includes the required information.

sjug commented May 7, 2017

Great news, how do we sign the NDA and get the document?

sarnex commented May 8, 2017

Owner

groeck commented May 8, 2017

@sarnex: Yes, that is exactly the problem.

Skaronator referenced this issue in OpenMediaVault-Plugin-Developers/openmediavault-sensors Jun 26, 2017

Open

No graphs appear in OMV sensors, only a sad graph icon. #7

tomreyn commented Jun 30, 2017

Related:

At the time of writing, the latest Processor Programming Reference (PPR) for AMD Family 17h Models 00h-0Fh Processors (publication #54945) is revision 1.14, released 5/3/2017.

And it seems the information buyers of this CPU require to reliably make use of the product is still missing.

Anybody here contacted AMD to ask for such info?

johnbridgman commented Jul 29, 2017

Is lack of information still an issue ? If so I can ask around internally, either to get it released publicly or under NDA with an exemption allowing use of the knowledge in open source driver development (we had to do that at least once for sensitive areas of the GPU).

I am on the GPU side of AMD so will probably have to ask a few dumb questions re: exactly what we are looking for... what I'm a bit fuzzy on is "on die" temperature logic vs "on motherboard" logic which I believe are what are being discussed in the following thread:

https://linuxconfig.org/monitor-amd-ryzen-temperatures-in-linux-with-latest-kernel-modules

The reason for my confusion is that from what I remember all of the temperature measurement for our CPUs on Linux has been via motherboard logic (which I thought was hooked up to one or more sensors on the CPU) rather than on-die logic, but that is only what I have picked up from casual browsing while setting up my own systems at home.

If the "motherboard sensors" are not reporting temperature information from the CPU chip then what temperatures are they reporting ?

Last dumb question - what is the relationship between the modules (e.g. it87) in Guenter's repo and the corresponding copies in upstream kernel drivers/hwmon ? Upstream seems to be a couple of months older at first glance - guessing Guenter's repo is the development tree and changes there eventually go upstream ?

Thanks,
John

Owner

groeck commented Aug 3, 2017

@johnbridgman: Sure, if you can get an exception, that would help. Which information is needed: The PCI ID and location of the REPORTED_TEMP_CTRL_OFFSET register (or whatever it is called on family 17h; the name is from family 15h model 0x60 and 0x70) as well as the location of the index register to read it. The index register on family 15h is 0xb8, the offset is 0xd8200ca4. Also, it would help to know if there is a means to read the temperature offset from the CPU (20 degrees C for 1700X and 1800X), or if it is necessary to calculate the offset from the CPU type.

Re drivers here - yes, those are development versions, and sometimes experimental. Normally I keep drivers in sync with upstream, but the it87 driver has deviated so much that I'll need several weeks to bring the drivers in sync. Unfortunately I don't have that time right now.
Since someone is going to ask: No, I can not just copy the drivers over, even though I am the upstream hwmon maintainer. The rule of "one logical change per patch" also applies to maintainers.

echo $((0x`setpci -s 00:0.0 60.l=0x59800 && setpci -s 00:0.0 64.l`/2097152*5/4))
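
The scaling step in the one-liner above can be sketched in Python. The command writes 0x59800 to config register 0x60 of device 00:0.0, reads the result back from register 0x64, and then scales it. The sketch below only reproduces that last step on a value that has already been read. The function name and the sample value are made up for illustration, and reading the result as tenths of a degree Celsius is an assumption, since the thread does not state the unit.

```python
def scale_raw_reading(raw):
    """Apply the same integer arithmetic as the shell expression:
    raw / 2097152 * 5 / 4, evaluated left to right with integer division."""
    units = raw // 2097152      # 2097152 == 2**21, so this drops the low 21 bits
    return units * 5 // 4


# Hypothetical raw value: 320 in the upper bits, lower 21 bits zero.
sample = 320 << 21
print(scale_raw_reading(sample))  # prints 400
```

If the tenths-of-a-degree assumption holds, 400 would correspond to 40.0 degrees C before any per-model offset is subtracted.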

Owner

groeck commented Sep 5, 2017

@rozhuk-im: That helped, thanks. Patch for the k10temp driver submitted upstream. Too late for the v4.14 kernel, but it will be available in v4.15.

groeck closed this Sep 5, 2017

cemeyer commented Sep 5, 2017

@groeck Re: https://lkml.org/lkml/2017/9/4/503 , FYI At least some (1950X and I think 1920X) Threadripper CPU models have a 27°C offset too, if you want to try to correct a few more models automatically.

Owner

groeck commented Sep 5, 2017

cemeyer commented Sep 5, 2017

My system isn't running Linux, but FreeBSD reports "AMD Ryzen Threadripper 1950X 16-Core Processor" in the early boot messages. x86info also reports: "Processor name string (BIOS programmed): AMD Ryzen Threadripper 1950X 16-Core Processor".

For info the Ryzen 5 1600 model name string as reported by /proc/cpuinfo under Linux is AMD Ryzen 5 1600 Six-Core Processor.

Robyer referenced this issue in openhardwaremonitor/openhardwaremonitor Sep 5, 2017

Open

AMD Ryzen Support for reading CPU voltage and Temperature #957

Owner

groeck commented Sep 5, 2017

Ryzen 5 should not have temperature offsets.

Owner

groeck commented Sep 5, 2017

cemeyer commented Sep 5, 2017

Yeah, it's awful. I don't know what AMD is thinking. In FreeBSD we just punt on the issue and have the user configure an offset if they want to.

ar1111 commented Sep 11, 2017

@groeck Here's what I see in /proc/cpuinfo for the Threadripper 1950x:

vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD Ryzen Threadripper 1950X 16-Core Processor
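
For anyone wanting to pull these fields out programmatically, here is a minimal sketch that parses /proc/cpuinfo-style text. It also shows why "cpu family : 23" is the same thing as "Family 17h": 23 decimal is 0x17. The function name and the embedded sample are illustrative only; on a real system the text would come from reading /proc/cpuinfo.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    processors, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                processors.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        processors.append(current)
    return processors


sample = """vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD Ryzen Threadripper 1950X 16-Core Processor
"""

cpu = parse_cpuinfo(sample)[0]
family = int(cpu["cpu family"])
print(hex(family), cpu["model name"])
# prints: 0x17 AMD Ryzen Threadripper 1950X 16-Core Processor
```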

Goddard commented Sep 22, 2017

Here is output of cat /proc/cpuinfo

sudo cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD Ryzen Threadripper 1950X 16-Core Processor
stepping : 1
microcode : 0x8001129
cpu MHz : 2200.000
cache size : 512 KB
physical id : 0
siblings : 32
core id : 0
cpu cores : 16
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov succor smca
bugs : fxsave_leak sysret_ss_attrs null_seg
bogomips : 6787.11
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate eff_freq_ro [13] [14]

processor : 1
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD Ryzen Threadripper 1950X 16-Core Processor
stepping : 1
microcode : 0x8001129
cpu MHz : 2200.000
cache size : 512 KB
physical id : 0
siblings : 32
core id : 1
cpu cores : 16
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov succor smca
bugs : fxsave_leak sysret_ss_attrs null_seg
bogomips : 6787.11
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate eff_freq_ro [13] [14]

and repeats 32 times of course

Is it currently possible to use sensors to read 1950X cpu temps? I've tried rebasing up to here onto Linux v4.13.5, and am unable to see cpu temps. Do I need to do more than just patch my kernel?

Owner

groeck commented Oct 14, 2017

linux-next has support for it. For convenience I created a repository named 'k10temp' which supports it.

I had a bunch of issues with v4.14rc4 that make me fairly sure linux-next won't work either, but I'll give it a shot. As for the k10temp repo, I believe that's no better than rebasing the commits I rebased, right?

Owner

groeck commented Oct 14, 2017

Hard for me to say; I don't know what you mean by "rebase". Do you mean you cherry-picked the patches on top of v4.13.5 ? That is pretty much what I did as well, though I only have Ryzen systems, not Threadripper. Threadripper does have PCI device 1022:1463 (two of them, actually), so the driver should work. Questions then would be if dmesg shows anything, if the driver is loaded, and if "lspci -vv -d 1022:1463" shows anything useful.
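
The same check as "lspci -d 1022:1463" can be sketched by walking sysfs, where each PCI device directory has "vendor" and "device" files containing hex IDs. This is a rough sketch, not part of lm-sensors or the driver; the function name and the "root" parameter (there so the code can be pointed at a test directory) are my own.

```python
from pathlib import Path


def find_pci_devices(vendor, device, root="/sys/bus/pci/devices"):
    """Return the PCI addresses under `root` whose IDs match vendor:device."""
    matches = []
    base = Path(root)
    if not base.is_dir():
        return matches
    for entry in sorted(base.iterdir()):
        try:
            found_vendor = int((entry / "vendor").read_text().strip(), 16)
            found_device = int((entry / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # Skip entries that cannot be read or parsed.
        if (found_vendor, found_device) == (vendor, device):
            matches.append(entry.name)
    return matches


if __name__ == "__main__":
    # 1022:1463 is the device ID mentioned in the comment above.
    print(find_pci_devices(0x1022, 0x1463))
```

An empty list would suggest the device is not present (or sysfs is not mounted), in which case the driver would have nothing to bind to.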

The driver does seem to work for those two pci devices. But is that representative of the CPU temp?

cemeyer commented Oct 15, 2017

Yes? It's the documented temperature sensor for the CPU.
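
To see what the driver actually exposes, one option is to read the hwmon sysfs files directly instead of going through "sensors". The sketch below assumes the usual hwmon layout (a "name" file per chip and "temp1_input" in millidegrees Celsius). The function name and the "root" parameter are illustrative. It returns one reading per hwmon chip, which is why a Threadripper shows two values rather than one per core.

```python
from pathlib import Path


def read_hwmon_temps(driver="k10temp", root="/sys/class/hwmon"):
    """Return {hwmon_dir_name: degrees_celsius} for chips whose name matches `driver`."""
    readings = {}
    base = Path(root)
    if not base.is_dir():
        return readings
    for chip in sorted(base.iterdir()):
        try:
            if (chip / "name").read_text().strip() != driver:
                continue
            # hwmon reports temperatures in millidegrees Celsius.
            millidegrees = int((chip / "temp1_input").read_text().strip())
        except (OSError, ValueError):
            continue  # Chip has no readable name or temp1_input.
        readings[chip.name] = millidegrees / 1000.0
    return readings


if __name__ == "__main__":
    print(read_hwmon_temps())
```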

OH geez that's embarrassing. Sorry, I'm really out of my element here. I thought there'd be one per core, like on all my other machines.

e-dard commented Oct 19, 2017

@groeck not sure if this is useful or not since there are already comments for the 1950X above, but here is /proc/cpuinfo for the 1920X. It's as you might expect:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 23
model		: 1
model name	: AMD Ryzen Threadripper 1920X 12-Core Processor

Owner

groeck commented Oct 19, 2017

Okay, so I'm a noob to the whole Ryzen shebang. After all of the aforementioned updates, would sensors return readings for a Ryzen 3 1200 on the latest Ubuntu out of the box, or would I have to mess with it?

Owner

groeck commented Nov 3, 2017

The changes are not yet upstream, so the latest Ubuntu won't be sufficient. Upstream kernel 4.15 will support Ryzen (and Threadripper) temperature sensors.

Goddard commented Nov 3, 2017

There are some rumblings that 18.04 will have it, so I assume people on older versions like 16.04 might get it through HWE as well?
https://www.phoronix.com/scan.php?page=news_item&px=Ubuntu-18.04-LTS-Linux-4.15

tomreyn commented Nov 3, 2017

Maybe, but the issue tracker of a piece of software that happens to be available in your favorite Linux distribution is definitely not the right place to discuss whether a certain kernel version will be available in a certain release of that distribution.

Owner

groeck commented Nov 4, 2017

I am not involved in Canonical's decisions about Linux kernel releases, sorry.

Goddard commented Nov 4, 2017

It would be cool if we had a way to private message people on GitHub for instances like this. Sorry, totally off-topic I know.

whompy commented Nov 13, 2017

I just noticed a bug in the present code:

  • { 0x17, "AMD Ryzen 7 1600X", 20000 },
    1600X is a Ryzen 5 model as shown in cpuinfo:
    AMD Ryzen 5 1600X Six-Core Processor

Not sure if there is a better place to report bugs on this, so I threw it here (Sorry!)
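
To illustrate why the wrong name matters, here is a rough Python sketch of a name-based offset lookup. The table shape follows the quoted entry (family, model name, offset in millidegrees), and the other offsets are the ones mentioned earlier in this thread. Matching by substring against the cpuinfo model name is my assumption about how such a table would be used, not a description of the actual driver code.

```python
# (family, model-name substring, offset in millidegrees Celsius)
# The 1600X entry is the quoted line with the name corrected; the other
# offsets are the values mentioned earlier in this thread.
OFFSET_TABLE = [
    (0x17, "AMD Ryzen 5 1600X", 20000),
    (0x17, "AMD Ryzen 7 1700X", 20000),
    (0x17, "AMD Ryzen 7 1800X", 20000),
    (0x17, "AMD Ryzen Threadripper 1950X", 27000),
]


def temp_offset(family, model_name, table=OFFSET_TABLE):
    """Return the offset for the first matching entry, or 0 if none matches."""
    for entry_family, name, offset in table:
        if family == entry_family and name in model_name:
            return offset
    return 0


buggy_table = [(0x17, "AMD Ryzen 7 1600X", 20000)]
cpuinfo_name = "AMD Ryzen 5 1600X Six-Core Processor"

print(temp_offset(0x17, cpuinfo_name, buggy_table))  # prints 0: entry never matches
print(temp_offset(0x17, cpuinfo_name))               # prints 20000
```

With the "Ryzen 7 1600X" string the entry can never match a real 1600X, so the reported temperature would stay uncorrected.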

Owner

groeck commented Nov 13, 2017

Thanks for the k10temp report. It might be better to report k10temp issues in the k10temp repository or upstream.

For what it's worth, HWInfo64 can read the core temp, and on my 1950X (Aorus Gaming 7) the offset does indeed appear to be 27 degrees between Tdie and Tcng, or whatever the shorthand for "reported temps" is...

Now, I am not good enough to find logfiles or whatever, but if you all can explain the steps to me, I am more than willing to provide whatever data I can, since I love me some OHWM.

Owner

groeck commented Dec 16, 2017

Not sure what the (remaining) problem is. sensors-detect from ToT here supports reporting family 17h sensors, and k10temp in the ToT kernel does as well. The problem reported by whompy@ has been fixed as well. If there is still a problem, please open a separate issue and provide details.
