Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mlnx-sfp-plugin] enhancement to support transceiver sensor monitoring #1839

Merged
merged 4 commits into from
Aug 2, 2018
Merged

[mlnx-sfp-plugin] enhancement to support transceiver sensor monitoring #1839

merged 4 commits into from
Aug 2, 2018

Conversation

keboliu
Copy link
Collaborator

@keboliu keboliu commented Jul 3, 2018

- What I did
enhancement to support tranceiver sensor monitoring

- How I did it
* Modify the eeprom folder to make it accessably from pmon container
* implement the get_transceiver_change_event API

- How to verify it
run test on mlnx platform.

- Description for the changelog
file change list
modified: device/mellanox/x86_64-mlnx_lssn2700-r0/plugins/sfputil.py
modified: device/mellanox/x86_64-mlnx_msn2100-r0/plugins/sfputil.py
modified: device/mellanox/x86_64-mlnx_msn2410-r0/plugins/sfputil.py
modified: device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py
modified: device/mellanox/x86_64-mlnx_msn2740-r0/plugins/sfputil.py

- A picture of a cute animal (not mandatory but encouraged)

* Modify the eeprom folder to make it accessably from pmon container
* implement the get_transceiver_change_event API

file change list

	modified:   device/mellanox/x86_64-mlnx_lssn2700-r0/plugins/sfputil.py
	modified:   device/mellanox/x86_64-mlnx_msn2100-r0/plugins/sfputil.py
	modified:   device/mellanox/x86_64-mlnx_msn2410-r0/plugins/sfputil.py
	modified:   device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py
	modified:   device/mellanox/x86_64-mlnx_msn2740-r0/plugins/sfputil.py

signed-off-by Liu Kebo kebol@mellanox.com
@@ -64,6 +92,22 @@ def get_presence(self, port_num):
return True

return False

def get_presence_status_file_path(self, port_num):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the code added in this PR is duplicated among all the plugins. Is there a reason this functionality can't live in sfputilbase.py? Will it not be shared by other platforms?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the file path is specific for each platform, it's even various among platforms from one vendor, so this has to be specific for each platform.


INSIDE_DOCKER = determine_in_docker()

if INSIDE_DOCKER:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only connect to the Redis DB if running inside a Docker container? If this is to determine whether or not the plugin is being used by xcvrd, it's not an intuitive check. I think it would be better to create a function to connect to the State DB and call that function in get_transceiver_change_event().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sfputil plugin is also called by 'show interface transceiver' CLI on the host, but the swsscommon is not available on the host, to make it work both on the host and the docker, need to conditionally import swsscommon.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised.


try:
path = "/sys/class/i2c-adapter/i2c-2/2-0048/hwmon/hwmon7/qsfp%d_status" % (port_num+1)
#path = "/bsp/qsfp/qsfp%d_status" % (port_num+1)
Copy link
Collaborator

@qiluo-msft qiluo-msft Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#path = [](start = 12, length = 7)

remove commented code. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

@@ -149,3 +193,21 @@ def reset(self, port_num):
return False

return False

def get_transceiver_change_event(self, timeout=0):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_transceiver_change_event [](start = 8, length = 28)

This is vendor independent code. Let's move it out of plugin.
Who is using it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft this function is intended to let each vendor have their own implementation, because each vendor may have different ways to get the SFP plug in/out notification. Here is the implementation on mlnx platform.

Copy link
Collaborator

@qiluo-msft qiluo-msft Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue https://github.com/Azure/SONiC/blob/gh-pages/doc/transceiver-monitor-hld.md is not a good design, at least for Mellanox platform.

Mellanox syncd docker could directly get SFP notification and write to STATE_DB. There is no benefit to write to redis once, blocking waiting on it in another container (pmon) and write redis again. The disadvantages are new dependencies (like in this file), fragile system, and hard to debug.

If this is true for Mellanox, how could it benefit other vendors?


In reply to: 200867307 [](ancestors = 200867307)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree with you if we only consider mlnx platform, and at the very beginning, my design is to post the transceiver info to DB from syncd since mlnx can get the sfp change event from there. But after discussing with Guohan and community, we should have a common solution for all the vendors, design as common sfp change event API and left it to be implemented by each vendor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally understood. This is a perfect to time to revisit the design.


In reply to: 201230542 [](ancestors = 201230542)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking in -- are we or are we not revisiting the design? I believe the current design is the common solution for all vendors that @keboliu mentioned above. Therefore, I don't believe there is a need to revisit the design.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jleveque Yes, this the common solution for all vendors that have discussed.
@qiluo-msft would like to get your view on this.
more background information about this, for other vendors, may possible to get this event from sysfs, which will be very easy for them to implement this common API since sysfs is accessible from pmon. you may check with Hui who have some investigation on other platforms.

Copy link
Collaborator

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comments.


# Subscribe to state table for SFP change notifications
sel = swsscommon.Select()
sel_tbl = swsscommon.NotificationConsumer(state_db, 'TRANSCEIVER_NOTIFY')
Copy link
Collaborator

@qiluo-msft qiluo-msft Jul 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TRANSCEIVER_NOTIFY [](start = 57, length = 18)

Just curious which component provide the notification? As https://github.com/Azure/SONiC/blob/gh-pages/doc/OIDsforSensorandTransciver.MD, why let the provider write STATE DB directly? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft on top of the previous comments, on mlnx platform, this notification is exposed from another daemon inside syncd(##1841), and that daemon post the notification to the DB. For other vendors, they may have their implementation, for example, get this notification directly from sysfs.

@@ -64,6 +92,21 @@ def get_presence(self, port_num):
return True

return False

def get_presence_status_file_path(self, port_num):
Copy link
Collaborator

@qiluo-msft qiluo-msft Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_presence_status_file_path [](start = 8, length = 29)

Who is calling this function? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, not called anymore, call in the original solution. removed.

INSIDE_DOCKER = determine_in_docker()

if INSIDE_DOCKER:
from swsscommon import swsscommon
Copy link
Collaborator

@qiluo-msft qiluo-msft Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

swsscommon [](start = 9, length = 10)

Changing behaviors based on library availability or docker environment is not a good idea. #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised. the swsscommon only imported in the new add function 'get_transceiver_change_event', since it will only be called from pmon container.

@keboliu
Copy link
Collaborator Author

keboliu commented Jul 16, 2018

the check on mellanox build failure seems not related to this PR, can we trigger the build check again w/o push a new commit?

+ wget https://apt.dockerproject.org/repo/pool/main/d/docker-engine/docker-engine_1.11.1-0~jessie_amd64.deb -qO /tmp/tmp.b93r6vOAU2
+ sudo mkdir -p ./fsroot/etc/systemd/system/docker.service.d/
+ sudo cp files/docker/docker.service.conf ./fsroot/etc/systemd/system/docker.service.d/
+ sudo LANG=C chroot ./fsroot useradd -G sudo,docker admin -c 'Default admin user,,,' -m -s /bin/bash
useradd: group 'docker' does not exist
+ rm -f /tmp/tmp.b93r6vOAU2
+ sudo umount ./fsroot/proc
+ true
[  FAIL LOG END  ] [ target/sonic-mellanox.bin ]
slave.mk:458: recipe for target 'target/sonic-mellanox.bin' failed
make: *** [target/sonic-mellanox.bin] Error 1
Makefile:81: recipe for target 'all' failed

@keboliu
Copy link
Collaborator Author

keboliu commented Jul 18, 2018

check fail due to Jenkins issue?

Build was aborted
ERROR: Step ‘Archive the artifacts’ failed: no workspace for mellanox/buildimage-mlnx-all-pr #2958
Adding one-line test results to commit status...

@qiluo-msft
Copy link
Collaborator

qiluo-msft commented Jul 18, 2018

Yes, the Jenkins server stuck on 7/17, and we restart the service. You could always add a new Github comment "Retest this please" to trigger the build check without committing code. #Closed

@andriymoroz-mlnx andriymoroz-mlnx changed the title [mlnx-sfp-plugin] enhancement to support tranceiver sensor monitoring [mlnx-sfp-plugin] enhancement to support transceiver sensor monitoring Jul 18, 2018
@lguohan lguohan dismissed qiluo-msft’s stale review August 2, 2018 06:17

let's merge and test and kebo will do new iterations

@lguohan
Copy link
Collaborator

lguohan commented Aug 2, 2018

@jleveque , any pending issue on your side? if not, can you approve and let's test.

@lguohan lguohan merged commit a215bcd into sonic-net:master Aug 2, 2018
@keboliu keboliu deleted the sfputil-plugin-enhance branch August 22, 2018 10:30
lguohan pushed a commit that referenced this pull request Sep 6, 2018
#1839)

* [mlnx-sfpplugin] enhancement to support tranceiver sensor monitoring

* Modify the eeprom folder to make it accessably from pmon container
* implement the get_transceiver_change_event API

file change list

	modified:   device/mellanox/x86_64-mlnx_lssn2700-r0/plugins/sfputil.py
	modified:   device/mellanox/x86_64-mlnx_msn2100-r0/plugins/sfputil.py
	modified:   device/mellanox/x86_64-mlnx_msn2410-r0/plugins/sfputil.py
	modified:   device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py
	modified:   device/mellanox/x86_64-mlnx_msn2740-r0/plugins/sfputil.py

signed-off-by Liu Kebo kebol@mellanox.com

* remove commented code

* revise the get_transceiver_change_event implementation and remove unused function

* remove blank
vdahiya12 added a commit to vdahiya12/sonic-buildimage that referenced this pull request Sep 29, 2021
0b5f90b (HEAD -> 202012, origin/202012) [show techsupport] fix bash errors in generate_dump script (sonic-net#1844)
388c50c [202012][warmboot] Add new preboot health check: verify db integrity (sonic-net#1839)
d73dc98 [config] support for configuring muxcable to standby mode of operation (sonic-net#1837)

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
lguohan pushed a commit that referenced this pull request Sep 30, 2021
0b5f90b (HEAD -> 202012, origin/202012) [show techsupport] fix bash errors in generate_dump script (#1844)
388c50c [202012][warmboot] Add new preboot health check: verify db integrity (#1839)
d73dc98 [config] support for configuring muxcable to standby mode of operation (#1837)

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants