PMON enhancements for Chassis HLD #646

Merged
merged 18 commits into from
May 17, 2021

Conversation

mprabhu-nokia
Contributor

This is a design document proposal for Chassis support and PMON enhancements for Chassis from the Nokia-SONiC team

@ghost

ghost commented Jul 7, 2020

CLA assistant check
All CLA requirements met.

@mprabhu-nokia mprabhu-nokia changed the title Mprabhu chassis pmon PMON enhacements for Chassis HLD Jul 7, 2020
@mprabhu-nokia mprabhu-nokia changed the title PMON enhacements for Chassis HLD PMON enhancements for Chassis HLD Jul 7, 2020
@shyam77git
Contributor

Comment-1)
In Sec 2, point 4: One instance of PMON will run on each line card and also on the control card. PMON will communicate with each via the redis instance running on the control card.
a) Is the redis instance DB on the control card's PMON a globalDB?
i.e. will it get sensor data (temp, voltage, current) from all LCs and store it in the RP's PMON redis DB?
b) If so, when and in which release would this support be available?
c) The CC (control card, aka supervisor) would be interested in displaying all cards (including LCs) in its show platform inventory. Would LC inventory info be synced from LC to CC over this PMON and stored in the CC's PMON redis DB?
d) How would the CC come to know that an LC is down or not operational?
Should LC operational state be synced as well? This would help the CC remove LC info if the LC is shut down or removed.

Comment-2)
In Sec 2 point 6: I believe this gRPC IPC mechanism is in addition to current model (which is python plugin API supported by vendor). Can you please comment/confirm?
Depending upon the use-case platform/vendor can choose either of them? or is there any guideline to choose/prefer?
Can some use-case(s) be cited w.r.t. gRPC usage?

Comment-3)
Sec 2.2.1 mentions "In a line card, SONiC would start multiple instance of SONiC containers such as database, swss, syncd and bgp per forwarding ASIC. This determination is done using HWSKU and data file under device////asic.conf".
whereas, SONiC_multi_asic_hld sec 2.4.1 mentions: The file asic.conf is present in the directory /usr/share/sonic/device//
Can you clarify the right location/placement of "asic.conf" file?
IMO, it should be under .../// as asic.conf may vary from one hwsku to another under the same platform.

Comment-4)
Follow-on question of above:
asic.conf provides max NPUs possible on a given hwsku.
CC caters to cpu-less Fabric cards (FC) NPUs.
However, not all FCs are present always or FC may go-down/shutdown.
Does SONiC expect namespaces (docker containers) for all FC NPUs to be spawned even if some FCs are not present in the chassis? Or should platform bootstrap on the CC determine FC presence and then spawn docker containers only for the NPUs on those FCs?

Comment-5)
Would Fault Management/Handling section come later?
use-cases like - LC/FC/CC shutdown/reload; chassis reload; LC/FC/CC/FT/PSU OIR

@shyam77git
Contributor

shyam77git commented Aug 13, 2020

Comment-6) Section 2.2.3 (Chassis monitoring daemon)
Is the platform vendor always expected to determine remote card (LC) state and update it to the CC's chassisd daemon?
Or would there be any card state update based on an LC REDIS-DB to CC REDIS-DB sync?

Comment-7) Section 2.2.4 Chassis Midplane Connectivity
#1 Exchanging monitoring information between line-cards and control-cards
Would SONiC layer (PMON/chassisd) be running some Heartbeats/keepalives between LC and CC?
or platform-vendor to run at its end and detect/report failure?
or both?

Comment-8) Does SONiC support or need/plan to support config shutdown of a card (CC/LC/FC)?
i.e. user/operator plans to config shutdown a card?
In that case, suggest showing this information in show platform details command

Comment-9) Is SONiC planning to support periodic punching of the HW watchdog?
That is, some process/daemon in SONiC would periodically punch the HW watchdog (via platform support).
Failing to do so would mean the NOS software (SONiC) is not in a good state, and action is required to collect a kernel core and reset/reload the card.

mprabhu-nokia and others added 3 commits August 13, 2020 00:46
This commit restructures the document to include requirements and a detailed workflow describing chassis-specific call flows, etc.
@minionatwork
Contributor

@shyam77git

Some inline comments.

Comment1:
a) As a design choice, we have 2 options for PMON-to-PMON communication. I have updated the pull request with more information. We could use the redis DB on the control card (aka supervisor) for communication between the supervisor PMON and the LC PMON.
b) We are planning to send the PR out soon. Multiple PRs will be raised to support VOQ chassis.
c) PMON 2.0 APIs have been added to get the status of a linecard, and it is up to the platform vendor to implement the logic.
d) Same as c.

Comment 2:
PMON 2.0 already supports user space drivers and kernel space drivers. It is the platform vendor's choice how to implement the APIs; however, our recommendation is to use user space with an RPC mechanism (gRPC or thrift).

Comment 3:
Good point. We will update asic.conf location and its details.

Comment 4:
This question will be covered as part of the fabric design. Right now, we assume all FCs are available at startup time. If there is a change, asic.conf needs to be updated and the system rebooted (one of the choices).

Comment 5:
We will add to this document.

Comment 6:
That's a good option to have. DB-to-DB sync is also possible. Our proposal is either to subscribe directly to all LC PMON redis-dbs, or to have all LC PMONs write directly to the Supervisor redis-db.

Comment-7) Both to be implemented.

Comment-8) That's right. SONiC will support per-linecard operations such as shutdown and reboot wherever the platform driver supports them.

Comment-9) SONiC planning to support periodic punching of HW watchdog?
This is part of PMON 2.0 API abstraction. Control card driver will have HW watchdog and also LC driver will do HW watchdog in kernel.

@mprabhu-nokia
Contributor Author

In doc/pmon/pmon-chassis-images/pmon-chassis-psu.png, in order to compute power budget, need to determine power consumption of remote cards (LCs).
how is power_per_LC determined? This info to come from LC.
Assuming this variable in diagram referring to power consumption at LC

This would be the maximum consumed power by the LC. So, would remain constant per LC-type.

@mprabhu-nokia
Contributor Author

Following two CLIs are at both cards (Supervisor and LC):
show environment
show platform temperature

Can you please update/confirm on the following:

1. LC could cater to its local card info and local sensors only so that's what's expected out of these CLIs

That is correct. Each LC is running a separate SONiC instance. 'show platform temperature' will cater to the local card information.

2. These CLIs on the Supervisor (CC) are expected to cater to self (local) + FCs.
   a) So, what's the display output format? Is it like the following?

   ```
   sensor A - info
   ...
   ...
   sensor Z - info
     <FC0>
     sensor A - info
     ...
     ...
     sensor Z - info
   ```

b) Would there be an extension of these CLIs to support location option ?
like show platform loc <> ; show environment loc <>
The supervisor or control-card will show its own temperature sensors. Since the management planes are independent for each of these cards, there has been no requirement to extend the CLI on control-cards.

c) Would these CLIs display LC sensors info (show environment) and LC card/platform info (show platform) to have holistic view on Supervisor (CC)?

@mprabhu-nokia
Contributor Author

In doc/pmon/pmon-chassis-images/pmon-chassis-distributed-db.png, which option is being planned?
In option1, is supervisor going to pull data from LC at regular intervals?

In that case, I'd suggest option 2, as the LC would update the Supervisor about any runtime change at the LC's end, whereas in option 1 the Supervisor has to come and pull after a certain interval. Sensor data and thermal conditions on a board mostly vary only when something affects that board, such as bandwidth, packet size, or ASIC usage. So it is better for the remote end (LC) to notify the Supervisor of such changes, which saves CPU cycles on the Supervisor.

Another thing: option 2 populates a Global Redis-DB on the Supervisor/CC.
a) This may help show environment (on the CC) provide a holistic view of all remote card sensor data.
b) With this option, would thermal sensor data from remote LCs go to thermalctld's local DB on the CC or to this GlobalDB (on the CC)? I'd think GlobalDB would be preferred.
Can you please share your thoughts?

As concluded in our last HLD review, we will go with Option-2. We will introduce a "GLOBAL_STATE_DB" where all the LCs will push their information to.
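
The push model agreed on here (Option-2) can be sketched as follows. This is only an illustration under stated assumptions: `GlobalStateDb` is an in-memory stand-in for the shared DB on the Supervisor, `LineCardPublisher` and the `TEMPERATURE_INFO_<slot>|<sensor>` key layout are hypothetical names, and the "write only on change" rule is one possible reading of the reviewer's suggestion, not the behavior of any SONiC daemon.

```python
class GlobalStateDb:
    """In-memory stand-in (assumption) for the shared DB hosted on the Supervisor."""

    def __init__(self):
        self.table = {}
        self.writes = 0  # counts how many times a line card pushed data

    def set(self, key, fields):
        self.table[key] = dict(fields)
        self.writes += 1


class LineCardPublisher:
    """Hypothetical line-card side helper that pushes sensor readings upstream."""

    def __init__(self, slot, db):
        self.slot = slot
        self.db = db
        self._last = {}  # last value pushed per sensor

    def publish(self, sensor, temperature):
        # Push only when the reading changed, so the Supervisor never has to poll.
        if self._last.get(sensor) == temperature:
            return False
        self._last[sensor] = temperature
        key = "TEMPERATURE_INFO_{}|{}".format(self.slot, sensor)  # hypothetical key layout
        self.db.set(key, {"temperature": str(temperature)})
        return True


db = GlobalStateDb()
lc1 = LineCardPublisher(slot=1, db=db)
results = [
    lc1.publish("asic0", 52.0),  # first reading is pushed
    lc1.publish("asic0", 52.0),  # unchanged reading is skipped
    lc1.publish("asic0", 55.5),  # changed reading is pushed
]
```

In this sketch the Supervisor does no periodic work at all; it only reads `db.table` when a CLI or daemon needs the chassis-wide view.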

@mprabhu-nokia
Contributor Author

In doc/pmon/pmon-chassis-images/pmon-chassis-layout.png, is Device Manager a platform-owned process? Can each platform design and implement it as it sees fit?
And does it always interface with PMON's Monitoring process via IPC?
In that case, are the platform plugins from the SONiC layer towards the platform a separate, independent path, used primarily for get/set purposes from PMON's psud, syseepromd, etc. towards the platform?

That is correct- pmon is mainly used for monitoring. Device-manager is a Nokia specific place holder for platform code. A vendor could have user-space or kernel drivers or a mix of both. If sysfs cannot be used, any IPC could be used to get/set the information.

@mprabhu-nokia
Contributor Author

Maybe I am missing it, but I don't see info on the following three sub-infrastructures under Platform management:
FPD
LED
OBFL
Can you please share the link/pointer? or plan to add/discuss them?

Unfortunately, this is out of scope of this HLD at present. We have listed them in "Future Items" section for tracking.

@minionatwork
Contributor

Went through the doc and these revised changes. Can you please update/confirm on following understanding?
ChassisD is a daemon and part of PMON container.
From platform stack standpoint, there would be total 2 DBs on every card (CC aka supervisor, LC).
One DB is referred as local REDIS-DB and another one as global REDIS-DB.

local REDIS-DB content/ownership:
The below-mentioned state DB and config DB, both are part of local REDIS-DB on CC/Supervisor.
show platform output would come from local REDIS-DB of respective card (CC/Supervisor , LC)

Only one global REDIS-DB and it would reside on CC/Supervisor. Its content/ownership:
local environmental sensors of CC (and FCs on it) + sensors of all remote LCs

'Yes' for all of the questions.

@minionatwork
Contributor

Follow-on question:

  • Is the 'show platform psustatus' and 'show platform fan' data stored in the local REDIS-DB or the global REDIS-DB of PMON/ChassisD?

It is stored in the local redis DB on the CC, using the existing SONiC state-db schema.

@minionatwork
Contributor

I believe it means monitoring LCs (line-cards) status too on Supervisor/CC to ensure all cards(present in the chassis) shows up in show platform.
Can you please confirm?
In this document, noticed following APIs:
get_num_linecards() ; get_all_linecards (); get_linecard_presence ()
I don't see them yet on the master branch of https://github.com/Azure/sonic-platform-common/blob/master/sonic_platform_base/chassis_base.py
So, would they be introduced later? Tentatively when, or on which branch?

As chassis_mgr/chassis owner, chassisD to detect FCs too - i.e. how many max FCs slots, which all Fabric slots have FC present etc.
Recommend looking into adding the following to chassis_base.py:
get_num_fabriccards() ; get_all_fabriccards (); get_fabriccard_presence ()

This will be provided by the existing module_base class. Each module has a type that says what kind of module it is; look for get_type().
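
As a rough illustration of that answer, fabric cards can be picked out of a generic module list by type instead of through dedicated `get_*_fabriccards()` calls. Everything below is a self-contained sketch: `FakeModule`, `modules_of_type`, and `present_names` are hypothetical helpers, and the type strings are placeholders standing in for whatever constants the platform API's module base class defines.

```python
from dataclasses import dataclass

# Placeholder type strings (assumption); the real values come from the module base class.
MODULE_TYPE_SUPERVISOR = "SUPERVISOR"
MODULE_TYPE_LINE = "LINE-CARD"
MODULE_TYPE_FABRIC = "FABRIC-CARD"


@dataclass
class FakeModule:
    """Hypothetical stand-in for a vendor's module object."""
    name: str
    module_type: str
    present: bool = True

    def get_name(self):
        return self.name

    def get_type(self):
        return self.module_type

    def get_presence(self):
        return self.present


def modules_of_type(modules, module_type):
    """Return every module whose get_type() matches module_type."""
    return [m for m in modules if m.get_type() == module_type]


def present_names(modules, module_type):
    """Return the names of modules of the given type that are physically present."""
    return [m.get_name() for m in modules_of_type(modules, module_type) if m.get_presence()]


modules = [
    FakeModule("SUPERVISOR0", MODULE_TYPE_SUPERVISOR),
    FakeModule("LINE-CARD0", MODULE_TYPE_LINE),
    FakeModule("FABRIC-CARD0", MODULE_TYPE_FABRIC),
    FakeModule("FABRIC-CARD1", MODULE_TYPE_FABRIC, present=False),
]

fabric_slots = len(modules_of_type(modules, MODULE_TYPE_FABRIC))
fabric_present = present_names(modules, MODULE_TYPE_FABRIC)
```

With this shape, the number of fabric slots and the set of populated slots both fall out of one module list, which appears to be the intent of reusing the module abstraction.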

jleveque pushed a commit to sonic-net/sonic-platform-daemons that referenced this pull request Nov 10, 2020
Introducing chassisd to monitor status of cards on a modular chassis

HLD: sonic-net/SONiC#646

**-What I did**
Introducing a new process to monitor status of control, line and fabric cards.

**-How I did it**
Support of monitoring of line-cards and fabric-cards. This runs in the main thread periodically.
It updates the STATE_DB with the status information. 'show platform chassis-modules' will read from the STATE_DB

Support of handling configuration of moving the cards to administratively up/down state. The handling happens as part
of a separate thread that waits on select() for config event from a CHASSIS_MODULE table in CONFIG_DB.
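
The two-thread structure described in this commit message can be sketched roughly as below. This is not the chassisd code: a `queue.Queue` stands in for the `select()` on CONFIG_DB events, a plain dict stands in for STATE_DB, and `FakeModule`, the `CHASSIS_MODULE_TABLE|<name>` key layout, and the `Online`/`Offline` strings are assumptions made for illustration.

```python
import queue
import threading


class FakeModule:
    """Hypothetical module whose operational state follows its admin state."""

    def __init__(self, name):
        self.name = name
        self.admin_up = True

    def get_oper_status(self):
        return "Online" if self.admin_up else "Offline"

    def set_admin_state(self, up):
        self.admin_up = up


def monitor_once(modules, state_db):
    """One pass of the main-thread loop: record every module's status."""
    for module in modules.values():
        key = "CHASSIS_MODULE_TABLE|" + module.name  # hypothetical key layout
        state_db[key] = {"oper_status": module.get_oper_status()}


def config_handler(modules, events):
    """Config thread: block until an admin-status event arrives, then apply it."""
    while True:
        event = events.get()  # blocks, like waiting on select()
        if event is None:     # sentinel used only to stop this demo
            break
        name, admin_status = event
        if name in modules:
            modules[name].set_admin_state(admin_status == "up")


def run_demo():
    state_db = {}
    events = queue.Queue()
    modules = {n: FakeModule(n) for n in ("LINE-CARD0", "FABRIC-CARD0")}

    worker = threading.Thread(target=config_handler, args=(modules, events))
    worker.start()

    monitor_once(modules, state_db)
    before = state_db["CHASSIS_MODULE_TABLE|LINE-CARD0"]["oper_status"]

    events.put(("LINE-CARD0", "down"))  # simulated admin-down config event
    events.put(None)
    worker.join()

    monitor_once(modules, state_db)
    after = state_db["CHASSIS_MODULE_TABLE|LINE-CARD0"]["oper_status"]
    return before, after


demo_result = run_demo()
```

A real daemon would call the monitoring pass on a timer rather than twice by hand; the demo only shows that a config event handled on the second thread becomes visible on the next monitoring pass.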
jleveque pushed a commit to sonic-net/sonic-platform-daemons that referenced this pull request Nov 11, 2020
PSUd changes to compute power-budget for Modular chassis

HLD: sonic-net/SONiC#646

PSUd will introduce power requirement calculations. Platform APIs are introduced to provide the consumers and their total consumed power. The number of PSUs helps determine the total supplied power.

**Output of STATE-DB:**
```
  "CHASSIS_INFO|chassis_power_budget 1": {
    "expireat": 1603182970.639244,
    "ttl": -0.001,
    "type": "hash",
    "value": {
      "SUPERVISOR consumed_power": "80.0",
      "FABRIC-CARD consumed_power": "185.0",
      "FAN consumed_power": "999",
      "LINE-CARD consumed_power": "1000.0",
      "PSU supplied_power": "9000.0"
    }
  },
```
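
As a worked example of the arithmetic implied by this output, the sketch below sums the `consumed_power` and `supplied_power` fields and takes the difference. The helper name `remaining_power` is hypothetical, and treating the budget as "supplied minus consumed" is an assumption about how the fields are meant to be combined, not a description of what psud does with the result.

```python
def remaining_power(entry):
    """Return (supplied, consumed, headroom) in the same units as the input fields."""
    supplied = sum(float(v) for k, v in entry.items() if k.endswith("supplied_power"))
    consumed = sum(float(v) for k, v in entry.items() if k.endswith("consumed_power"))
    return supplied, consumed, supplied - consumed


# Field values copied from the STATE-DB output above.
budget = {
    "SUPERVISOR consumed_power": "80.0",
    "FABRIC-CARD consumed_power": "185.0",
    "FAN consumed_power": "999",
    "LINE-CARD consumed_power": "1000.0",
    "PSU supplied_power": "9000.0",
}

supplied, consumed, headroom = remaining_power(budget)
# consumed = 80 + 185 + 999 + 1000 = 2264.0, so headroom = 9000 - 2264 = 6736.0
```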
jleveque pushed a commit to sonic-net/sonic-platform-daemons that referenced this pull request Nov 11, 2020
Enhance thermalctld to write to chassis state-DB on a modular chassis

HLD: sonic-net/SONiC#646

In a modular chassis, the thermal information from all line-cards
will be updated to the chassis state-DB in the control-card.

Additionally, minimum and maximum temperatures will be recorded.
The fan control algorithm used by certain vendors will require
this information.
jleveque pushed a commit to sonic-net/sonic-platform-common that referenced this pull request Nov 12, 2020
sonic-platform-base: Changes to introduce APIs for modular chassis for power-consumption and supplied

HLD: sonic-net/SONiC#646

PSUd APIs for power requirement calculations

get_maximum_supplied_power() - per PSU
get_status_master_led() - get master psu led status. Class method.
set_status_master_led() - set master psu led status. Class method.

get_maximum_consumed_power(self) - per consumer API. Consumers are modules, Fans
jleveque pushed a commit to sonic-net/sonic-platform-common that referenced this pull request Nov 12, 2020
sonic-platform-base: Changes to introduce APIs for modular chassis for thermalctld

HLD: sonic-net/SONiC#646

Introducing thermal APIs to get min and max temperatures of each sensor
  - get_minimum_recorded()
  - get_maximum_recorded()
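
One way a vendor might back these two calls is to track the extremes as readings arrive. The class below is a minimal sketch, assuming a hypothetical `RecordingThermal` class with a `record()` method that is not part of the platform API; only the two getter names are taken from the commit message.

```python
class RecordingThermal:
    """Hypothetical thermal sensor wrapper that remembers its extremes."""

    def __init__(self, name):
        self.name = name
        self._minimum = None
        self._maximum = None

    def record(self, temperature):
        """Feed one reading (hypothetical helper, called by whatever polls the sensor)."""
        if self._minimum is None or temperature < self._minimum:
            self._minimum = temperature
        if self._maximum is None or temperature > self._maximum:
            self._maximum = temperature

    def get_minimum_recorded(self):
        """Lowest temperature seen so far, or None before any reading."""
        return self._minimum

    def get_maximum_recorded(self):
        """Highest temperature seen so far, or None before any reading."""
        return self._maximum


sensor = RecordingThermal("asic0")
for reading in (41.5, 38.0, 47.25):
    sensor.record(reading)
```

Returning `None` before the first reading is a choice made for this sketch; a real implementation may prefer a sentinel or an exception.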

@paulmenzel paulmenzel left a comment

I am adding some minor style comments found while reading through it.

@gechiang
Contributor

gechiang commented Dec 8, 2020

In doc/pmon/pmon-chassis-images/pmon-chassis-distributed-db.png, which option is being planned?
In option1, is supervisor going to pull data from LC at regular intervals?
In that case, I'd suggest option 2 as it would update Supervisor from LC about any runtime change happening at LCs' end, vs option 1, where Supervisor has to come and pull after certain interval. Mostly the sensors data / thermal conditions at the board varies only when there is change/impact to it w.r.t bandwidth, pkt size, ASICs usage etc. on that board. So, better remote end (LC) notify Supervisor of such changes to save cpu-cycles on supervisor.
Another thing, option 2 populating Global Redis-DB on Supervisor/CC.
a) This may help show environment (on CC) provide holistic view of all remote card sensors data.
b) with this option, would Thermal sensors data from remote LCs to go to ThermalCtl-d of CC's local DB or this GlobalDB (on CC)? I'd think GlobalDB would be preferred.
Can you please share your thoughts?

As concluded in our last HLD review, we will go with Option-2. We will introduce a "GLOBAL_STATE_DB" where all the LCs will push their information to.

Noticed that in this morning community presentation the doc did not indicate which option was picked even though the agreement is option 2. Please update the doc to reflect this.

@mprabhu-nokia
Contributor Author

Done. thanks.

jleveque pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Dec 16, 2020
HLD: sonic-net/SONiC#646

Introducing chassisd process to monitor status of the control, line and fabric cards in a modular chassis.

- Why I did it
Modular Chassis has control-cards, line-cards and fabric-cards along with other peripherals. Chassisd will be a central entity that has visibility of the entire chassis.

- How I did it
Chassisd process will monitor cards in the main thread. Another configuation_handling_task is created to listen to CONFIG_DB for admin_status up/down events. The monitored status is persisted in REDIS-DB.
judyjoseph pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Dec 16, 2020
HLD: sonic-net/SONiC#646

In modular chassis, add CHASSIS_STATE_DB on control card

Why I did it
Modular Chassis has control-cards, line-cards and fabric-cards along with other peripherals. Control-Card CHASSIS_STATE_DB will be the central DB to maintain any state information of cards that is accessible to the control-card.

How I did it
Adding another DB on an existing REDIS instance running on port 6380.
jleveque pushed a commit to sonic-net/sonic-platform-daemons that referenced this pull request Dec 16, 2020
Enhance chassisd to monitor midplane status of the cards in modular chassis

HLD: sonic-net/SONiC#646

-What I did
Add monitoring of the midplane or internal ethernet network between supervisor and line-card modules.

-How I did it
Along with status monitoring, also monitor the midplane reachability between supervisor and modules.
It updates the STATE_DB with the status information. 'show chassis-modules midplane-status' will read from the STATE_DB
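
The midplane check described here can be pictured as a loop that probes each module's midplane address and records the result. The sketch below is an assumption-laden illustration: `collect_midplane_status`, the `CHASSIS_MIDPLANE_TABLE|<name>` key layout, the field names, and the addresses (taken from the documentation range 192.0.2.0/24) are all hypothetical, and the reachability probe is injected as a function so the example runs without a network.

```python
def collect_midplane_status(module_ips, is_reachable):
    """Build one status entry per module from a reachability probe.

    module_ips   -- mapping of module name to midplane IP address
    is_reachable -- callable taking an IP string and returning True/False
    """
    table = {}
    for name, ip in module_ips.items():
        key = "CHASSIS_MIDPLANE_TABLE|" + name  # hypothetical key layout
        table[key] = {"ip_address": ip, "access": str(is_reachable(ip))}
    return table


# Hypothetical midplane addresses for two line cards.
module_ips = {
    "LINE-CARD0": "192.0.2.1",
    "LINE-CARD1": "192.0.2.2",
}

# Fake probe for the demo: only the first card answers.
reachable_now = {"192.0.2.1"}
status = collect_midplane_status(module_ips, lambda ip: ip in reachable_now)
```

On a real system the probe would be supplied by the platform (for example, a vendor-specific reachability check), and the resulting table is what a command such as 'show chassis-modules midplane-status' would read back.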
sujinmkang pushed a commit to sujinmkang/sonic-platform-daemons that referenced this pull request Jan 16, 2021
@judyjoseph
Contributor

judyjoseph commented Mar 22, 2021

@mprabhu-nokia, I see most of the comments resolved, Thanks.
In the xcvrd/SFP section I see details on spawning multiple threads per namespace. We don't do that now; we connect to the different databases in other namespaces from the same xcvrd/ledd daemon process. Spawning a thread per namespace resulted in a larger number of threads. Could you update the doc to describe this as the suggested alternative?

@mprabhu-nokia
Contributor Author

Fixed and referenced the preferred approach document in there.


@paulmenzel paulmenzel left a comment

Looks good, but some minor nits.

Contributor

@judyjoseph judyjoseph left a comment

LGTM,

@judyjoseph
Contributor

judyjoseph commented Apr 9, 2021

@jleveque Would you take a quick look as well ..thanks

@jleveque
Contributor

jleveque commented Apr 9, 2021

@Staphylo, @keboliu, @Junchao-Mellanox: Please review, as well.

@anshuv-mfst
Collaborator

Chassis subgroup meeting 5/12:
@judyjoseph - could you please help with merge.

@judyjoseph judyjoseph merged commit d237dd4 into sonic-net:master May 17, 2021