Wrong temp reading on MT7915_phy0 #729

Open
Sandokan71 opened this issue Jan 12, 2023 · 25 comments

@Sandokan71

I ran some temperature tests. With the board idle (no devices connected over WiFi) and a room temperature of 20C, the internal sensors report:

phy0 (2.4 GHz) -> 68C
phy1 (5 GHz) -> 43C
The SoC's internal sensor reads 45C.

The temperatures measured with a thermal scanner (which I estimate reads 3-4C low) are:
2.4 GHz chip -> 37.1C
5 GHz chip -> 40.5C
On the SoC I get 45.5C.

It seems to me that the reading from the sensor on the 2.4 GHz chip is not correct.
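
To put numbers on the gap, the difference between each internal sensor and the thermal scanner can be computed from the readings in this comment. This is only a sketch: the dictionary layout and function name are illustrative, and the scanner's suspected 3-4C bias is not corrected for.

```python
# Readings from this comment, in degrees Celsius: (internal sensor, thermal scanner).
readings = {
    "phy0 (2.4 GHz)": (68.0, 37.1),
    "phy1 (5 GHz)": (43.0, 40.5),
    "SoC": (45.0, 45.5),
}


def sensor_minus_scanner(pairs):
    """Return sensor reading minus scanner reading per device, rounded to 0.1 C."""
    return {
        name: round(sensor - scanner, 1)
        for name, (sensor, scanner) in pairs.items()
    }


deltas = sensor_minus_scanner(readings)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f} C")
```

Only phy0 stands out, at roughly 31C above the scanner; the other two sensors agree with the scanner to within a few degrees.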

@dangowrt
Member

As this issue was reported first in BananaPi forum to occur on BPi-R3, let me add some details:
This is MT7986A with MT7975PN and MT7975N front-ends. The wrong temperature readings correspond to the MT7975N chip in charge of 2.4 GHz.

@Sandokan71
Author

Adding some information about a test I made.
The attached graph covers the first 2 hours from a cold start (measurements were taken with a single heatsink shared by both chips). It shows that the 2.4 GHz front-end MT7975N (phy0) reads about 27C above what is expected, assuming that the 2.4 GHz and 5 GHz chips behave similarly.

[Attached graph: 2023-01-17]

@frank-w

frank-w commented Jan 27, 2023

can confirm the difference

root@bpi-r3:~# cat /sys/class/ieee80211/phy*/hwmon*/temp1_input
66000
45000

Measured the chips with an infrared thermometer:

2g4: 47°C
5G: 44°C
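
For anyone who wants to log these values next to external measurements, here is a minimal sketch that reads the same sysfs files and converts them to degrees Celsius. It relies on the hwmon convention that `temp*_input` files report millidegrees Celsius; the function name and the returned dictionary layout are my own choices, not part of mt76.

```python
import glob

# Same path pattern as the cat command in this comment.
HWMON_GLOB = "/sys/class/ieee80211/phy*/hwmon*/temp1_input"


def read_phy_temps(pattern=HWMON_GLOB):
    """Return {path: temperature in C} for every matching hwmon file."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            # hwmon temp*_input values are in millidegrees Celsius.
            temps[path] = int(f.read().strip()) / 1000.0
    return temps


if __name__ == "__main__":
    for path, celsius in read_phy_temps().items():
        print(f"{path}: {celsius:.1f} C")
```

On a machine without these sysfs entries the function simply returns an empty dictionary.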

@Sandokan71
Author

Yes, and judging by the graph, it is implausible that a few seconds after the device starts the 2.4 GHz chip is already at 55C while the 5 GHz chip is at 27C.

@ryderlee1110
Contributor

Here is the output of my MT7986 reference board. Looks normal.

root@OpenWrt:/# cat /sys/class/ieee80211/phy*/hwmon*/temp1_input
44000
48000

@Sandokan71
Author

After many tests and measurements, I can confirm that the temperature sensor of the 2.4 GHz chip on my board reads incorrectly. Maybe the chip has a fault that affects only the temperature reading? The chip's real temperature seems normal and it works fine.

@ryderlee1110
Contributor

BPI R3?

@frank-w

frank-w commented Feb 2, 2023

@ryderlee1110 does your reference board also use the MT7975N for 2.4 GHz?

@ryderlee1110
Contributor

MT7976 for 2/6g

@frank-w

frank-w commented Feb 2, 2023

So maybe we need a different offset or calculation for this chip.

https://github.com/openwrt/mt76/blob/master/mt7915/init.c#L55

https://github.com/openwrt/mt76/blob/master/mt7915/mcu.c#L3108

Looking at the graph above, the offset/command seems right, but the value itself does not seem to be in millidegrees Celsius, or it needs some other calibration data?

@Sandokan71
Author

Here is a graph comparing the MT7975N and MT7975PN over three days, with and without fan cooling, to cover a wider temperature range. The calculation looks correct to me. It is probably only an offset issue.

[Attached graph: 2023-02-03]
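
One way to check the "offset only" idea against logged data is a least-squares line through paired samples, treating the 5 GHz reading as the reference as assumed earlier in this thread. A slope close to 1 with a nonzero intercept points to a constant offset, while a slope far from 1 would suggest a scaling or unit problem. The sketch below is illustrative only: the function name is mine, and the sample values are hypothetical placeholders, not measurements from this issue.

```python
def fit_line(reference, reported):
    """Least-squares fit of reported = slope * reference + intercept."""
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(reported) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, reported))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical samples in degrees Celsius, NOT data from this issue:
# reference = 5 GHz chip reading, reported = 2.4 GHz chip reading.
reference_c = [30.0, 35.0, 40.0, 45.0]
reported_c = [57.0, 62.0, 67.0, 72.0]

slope, intercept = fit_line(reference_c, reported_c)
print(f"slope={slope:.2f}, intercept={intercept:.1f} C")
```

With real logs, the reference series needs at least two distinct values, otherwise the slope is undefined.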

@frank-w

frank-w commented Feb 7, 2023

I guess it is more likely that the EEPROM (which maybe sets the temperature offset) is wrong...

I see the function mt7915_eeprom_name in mt7915/eeprom.c, which selects the EEPROM, but it does not seem to be called on my R3, as I do not see the printks I added there...

I am trying to debug further, but this function seems to be called only if there is no EEPROM... wait, we have added an EEPROM in the DTS, both in my repo and in OpenWrt... maybe this is the wrong one for our front-end chips.

@frank-w

frank-w commented Feb 7, 2023

same output with disabled eeprom-data in dts

root@bpi-r3:~# cat /sys/class/ieee80211/phy*/hwmon*/temp1_input
43000
23000

My debug output now shows that the MT7975_DUAL_ADIE (MT7986_EEPROM_MT7975_DUAL_DEFAULT) option is used: in mt7915_eeprom_init, the first EEPROM load (mt7915_eeprom_load) now fails with ret=-22, and the second one (mt7915_eeprom_load_default) returns 0.

https://elixir.bootlin.com/linux/v6.2-rc6/source/drivers/net/wireless/mediatek/mt76/mt7915/eeprom.c#L60
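
As a side note on ret=-22: kernel functions conventionally return negative errno values, and 22 is EINVAL in the Linux errno numbering. A small helper like the following can decode such codes when reading debug output. It is a sketch with a made-up function name, and it uses the errno table of the host running Python, which matches the kernel's numbering on Linux.

```python
import errno
import os


def describe_kernel_ret(ret):
    """Map a kernel-style return code to its errno name and message."""
    if ret >= 0:
        return "success"
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


print(describe_kernel_ret(-22))
print(describe_kernel_ret(0))
```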

@dangowrt
Member

dangowrt commented Feb 7, 2023

Maybe this is a bug in the EEPROM data supplied by SinoVoip and we should actually just fix that...

@frank-w

frank-w commented Feb 7, 2023

I loaded the eeprom which is available in linux-firmware git

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/mediatek

But yes, it can be wrong.

@frank-w

frank-w commented Feb 27, 2023

@ryderlee1110 any idea how to get further here?

@Sandokan71
Author

The issue is still there. Any idea how to solve it?

@codingtony

I can confirm the issue on a Banana Pi R3 with OpenWrt r22537-32f134fbdf. Using a thermometer gun I measure at most 40C, while the sensor reports 63C.

@Sandokan71
Author

I purchased a second BPI-R3, and on this one the detected temperature is correct.
Something differs between the two boards.

@frank-w

frank-w commented Apr 15, 2023

What is the hardware revision, and can you check on the front-end chip whether it is still an MT7975?

@Sandokan71
Author

Both have the same revision v1.1 and the same IC.

@dangowrt
Member

dangowrt commented Apr 15, 2023

board assembly process...

> I purchased a second BPI-R3, and on this one the detected temperature is correct.
> Something differs between the two boards.

It could be that the efuse inside the MT7975 ICs doesn't come with valid thermal calibration, which should have been done by the board vendor...

@Sandokan71
Author

I agree with you. It would be useful to know whether the efuse can be set properly.
After monitoring for a long time, I can confirm that on my original board the 2.4 GHz chip reads with a +27C offset.
It is not nice to see temperatures of 60-75C at 20C ambient, but since they are actually 33-48C I am not too worried.
However, I hope this will not cause strange behavior, such as thermal protection engaging, if temperatures rise further when the ambient temperature reaches 30C and above.
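
Until the root cause is fixed, the offset can at least be handled in user space when logging or plotting. The sketch below subtracts the +27C offset observed on this one board and estimates what the sensor would report after an ambient rise, under the simplifying assumption that the chip warms one-for-one with ambient. The constant and function names are hypothetical, the offset is specific to the affected board, and nothing here says where the driver or firmware would actually trigger thermal protection.

```python
# Offset seen on one affected board in this thread; not a general constant.
PHY0_OFFSET_C = 27.0


def corrected_temp_c(raw_millidegrees, offset_c=PHY0_OFFSET_C):
    """Convert a raw temp1_input value to degrees C and remove the offset."""
    return raw_millidegrees / 1000.0 - offset_c


def projected_reported_c(actual_c, ambient_rise_c, offset_c=PHY0_OFFSET_C):
    """Estimate the reported value after an ambient rise.

    Assumes the chip temperature rises one-for-one with ambient,
    which is a simplification.
    """
    return actual_c + ambient_rise_c + offset_c


print(corrected_temp_c(66000))           # raw 66000 with the offset removed
print(projected_reported_c(48.0, 10.0))  # actual 48C, ambient 20C -> 30C
```

Under these assumptions, an actual 48C at 20C ambient would show up as about 85C on the affected sensor at 30C ambient.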

@frank-w

frank-w commented Apr 15, 2023

Or at least detect the problematic firmware (or invalid calibration data) in the driver and handle it there (maybe out-of-tree for affected boards, to keep the mainline driver clean)?

@skramstad

Sorry to bump this issue again, but I have the opposite of what's posted earlier.
Rev 1.1

root@bpi:~# cat /sys/class/ieee80211/phy*/hwmon*/temp1_input
49000 <-- 2g
66000 <-- 5g
