Add TACACS server monitor design document. #1467

Open
wants to merge 11 commits into
base: master
88 changes: 88 additions & 0 deletions doc/aaa/TACACS+ Server Monitor.md
@@ -0,0 +1,88 @@
# TACACS+ server monitor design

## Overview

A SONiC device is usually configured with multiple TACACS+ servers. When a server is unreachable, the device tries the next TACACS+ server.

A SONiC device communicates with a TACACS+ server in the following scenarios:
1. A remote user logs in to the SONiC device.
2. A remote user runs commands on the SONiC device.

Each server has a connection timeout, which defaults to 5 seconds. If the first server is unreachable, the SONiC device stalls for the full timeout whenever a user logs in or runs a command.

To mitigate this, SONiC will add a TACACS+ server monitor that adjusts server priority: a server that is unreachable or slow to respond will be downgraded.

### Functional Requirement
- Monitor TACACS+ server unreachable events from COUNTER_DB.
- Monitor TACACS+ server slow response events from COUNTER_DB.

> Contributor: Which component writes to COUNTER_DB? It is not clear from the design doc.
>
> Contributor Author: The Monit service will write to COUNTER_DB. Details added to the design doc.

- Change server priority based on unreachable events and slow response events.
- Do not change any other server attribute.
- Do not change any other TACACS+ config.

### Counter DB schema
#### TACPLUS_SERVER_LATENCY Table schema
```
; Key
server_key = IPAddress ; TACACS+ server’s address
; Attributes
latency = 1*10DIGIT ; server network latency in ms, -1 if the connection to the server timed out
```
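
As an illustration of how a value for this table might be produced, here is a minimal Python sketch. It assumes latency is measured as the time to open a TCP connection to the server on port 49, which the design does not specify. The helper names `measure_latency_ms` and `latency_entry` and the `connect` parameter are hypothetical and not part of the proposal.

```python
import socket
import time

TACACS_PORT = 49  # standard TACACS+ TCP port


def measure_latency_ms(host, port=TACACS_PORT, timeout_s=5.0,
                       connect=socket.create_connection):
    """Return the TCP connect time in ms, or -1 if the connection fails.

    Assumption: latency is approximated by TCP connect time. `connect` is
    injectable so the function can be exercised without a real server.
    """
    start = time.monotonic()
    try:
        conn = connect((host, port), timeout=timeout_s)
    except OSError:  # includes socket.timeout and connection refused
        return -1
    conn.close()
    return int((time.monotonic() - start) * 1000)


def latency_entry(host, latency_ms):
    """Build (table, key, fields) matching the TACPLUS_SERVER_LATENCY schema."""
    return "TACPLUS_SERVER_LATENCY", host, {"latency": str(latency_ms)}
```

This sketch maps every connection failure to -1, not only timeouts. Whether a refused connection should be treated the same way as a timeout is left open by the schema.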

### Config DB schema
#### TACPLUS_MONITOR Table schema

> Collaborator: How can TACPLUS_MONITOR be disabled? For example, if the table is not defined, does that mean the TACPLUS monitor is disabled, i.e. no monitoring, and the effective priorities are the same as the configured priorities?
>
> Contributor Author: Added an 'enable' flag to disable this feature. When the feature is disabled, the effective priorities are the same as the configured priorities.

```
; Key
config_key = 'config' ; The configuration key
; Attributes
time_window = 1*5DIGIT ; Monitor time window in minutes, default is 5
high_latency_threshold = 1*5DIGIT ; High latency threshold in ms, default is 20
```

> Contributor: The YANG model design is missing.
>
> Contributor Author: Fixed, YANG model added.
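
To make the two attributes concrete, here is a minimal Python sketch of a sliding-window average and a threshold check. The class `LatencyWindow` and function `is_degraded` are hypothetical names. The handling of timeout samples (excluded from the average, with -1 reported only when no probe in the window succeeded) is an assumption, since the design does not say how -1 samples enter the average.

```python
from collections import deque

DEFAULT_TIME_WINDOW_MIN = 5             # time_window default from the schema
DEFAULT_HIGH_LATENCY_THRESHOLD_MS = 20  # high_latency_threshold default


class LatencyWindow:
    """Keeps latency samples for one server over the recent time window."""

    def __init__(self, time_window_min=DEFAULT_TIME_WINDOW_MIN):
        self.window_s = time_window_min * 60
        self.samples = deque()  # (timestamp_s, latency_ms) pairs, oldest first

    def add(self, timestamp_s, latency_ms):
        self.samples.append((timestamp_s, latency_ms))
        # Drop samples that have fallen out of the window.
        while self.samples and self.samples[0][0] <= timestamp_s - self.window_s:
            self.samples.popleft()

    def average(self):
        # Assumption: timeouts (-1) are excluded from the average; -1 is
        # reported only if no probe in the window succeeded.
        ok = [lat for _, lat in self.samples if lat >= 0]
        if not ok:
            return -1
        return sum(ok) // len(ok)


def is_degraded(latency_ms, threshold_ms=DEFAULT_HIGH_LATENCY_THRESHOLD_MS):
    """True if the server is unreachable or slower than the threshold."""
    return latency_ms == -1 or latency_ms > threshold_ms
```

With a probe once per minute and the default 5-minute window, the average covers roughly the five most recent samples.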

# 3 Limitation

- A server priority change can be delayed by up to 1 minute, because the Monit service runs the profile once every minute.

# 4 Design

```
       +------------+
       |   Monit    |
       +-----+------+
             |
+------------v--------------+       +---------------------+
|                           |       |                     |
|                           |       |                     |
|      TACACS+ Monitor      |------>|     COUNTER_DB      |
|                           |       |                     |
|                           |       |                     |
+------------+--------------+       +---------------------+
             |
   +---------v---------+               +-------+--------+
   |                   |               |                |
   | TACACS config file+--------------->  config file   |
   |  generate script  |               |                |
   +-------------------+               +-------+--------+

```
- The TACACS+ monitor is a Monit profile.
- The TACACS+ monitor periodically checks TACACS+ server latency and writes the latency to COUNTER_DB.
    - The latency in the COUNTER_DB TACPLUS_SERVER_LATENCY table is the average latency over the recent time window.
    - The time window size is defined in the CONFIG_DB TACPLUS_MONITOR table.
- The TACACS+ monitor also writes a warning message to syslog when either of the following events occurs:
    - Any server latency is -1, which means the server is unreachable.
    - Any server latency is greater than high_latency_threshold.
- hostcfgd monitors the TACPLUS_SERVER_LATENCY table and regenerates the TACACS config file when either of the following events occurs:

> Contributor: What is the threshold that separates high latency from normal latency, and how do we choose it?
>
> Contributor Author: The threshold is 20 ms, which is based on experience handling TACACS server latency and unreachable issues. I can share more detail in the review meeting, but it may not be necessary to write that in a public doc.

    - Any server latency is -1, which means the server is unreachable.
    - Any server latency is greater than high_latency_threshold.
- When hostcfgd generates the TACACS config file, server priority is calculated according to the following rules:

> Contributor: We need an option to maintain backward compatibility.
>
> Contributor Author: Fixed. Added an 'enable' flag; the feature can be disabled with this flag.

    - Get server priority info from the CONFIG_DB TACPLUS_SERVER table.
    - Set the priority of high-latency and unreachable servers to 1, because 1 is the lowest priority and the SONiC device uses higher-priority servers first.
    - If another server also has priority 1 in CONFIG_DB, change its priority to 2.
    - If another server's priority is not 1, keep its original priority from CONFIG_DB.
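
The priority rules above can be sketched in Python as follows. This is an illustrative sketch, not the hostcfgd implementation. The function name `effective_priorities` and its dictionary inputs are hypothetical, and two behaviours are assumptions: servers with no latency entry are treated as healthy, and configured priorities are returned unchanged when no server is degraded.

```python
def effective_priorities(configured, latency_ms, threshold_ms=20):
    """Compute the priorities to write into the TACACS config file.

    configured: {server_address: priority} from CONFIG_DB TACPLUS_SERVER.
    latency_ms: {server_address: latency} from COUNTER_DB
                TACPLUS_SERVER_LATENCY (-1 means unreachable).
    """
    def degraded(server):
        latency = latency_ms.get(server)
        # Assumption: a server without a latency entry is treated as healthy.
        return latency is not None and (latency == -1 or latency > threshold_ms)

    # Assumption: with no degraded server, nothing is changed.
    if not any(degraded(server) for server in configured):
        return dict(configured)

    result = {}
    for server, priority in configured.items():
        if degraded(server):
            result[server] = 1         # lowest priority, tried last
        elif priority == 1:
            result[server] = 2         # stay ahead of degraded servers
        else:
            result[server] = priority  # keep the configured priority
    return result
```

For example, with configured priorities `{"10.0.0.1": 10, "10.0.0.2": 1, "10.0.0.3": 5}` and latencies of -1, 3, and 50 ms respectively, the first and third servers drop to priority 1 and the second moves to priority 2.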

# 5 References

## TACACS+ Authentication
https://github.com/sonic-net/SONiC/blob/master/doc/aaa/TACACS%2B%20Authentication.md
## SONiC TACACS+ improvement
https://github.com/sonic-net/SONiC/blob/master/doc/aaa/TACACS%2B%20Design.md