Add TACACS server monitor design document. #1467

Open
wants to merge 11 commits into
base: master
86 changes: 86 additions & 0 deletions doc/aaa/TACACS+ Server Monitor.md
@@ -0,0 +1,86 @@
# TACACS+ server monitor design

## Overview

SONiC devices are usually configured with multiple TACACS+ servers. When a server is unreachable, the SONiC device tries to connect to the next TACACS+ server.

A SONiC device communicates with a TACACS+ server in the following scenarios:
1. A remote user logs in to the SONiC device.
2. A remote user runs commands on the SONiC device.

Each server has a timeout, 5 seconds by default. If the first server is unreachable, the SONiC device blocks for the full timeout whenever a user logs in or runs a command.

To mitigate this issue, SONiC will add a TACACS+ server monitor that changes server priority: a server that is unreachable or slow to respond will be downgraded.

### Functional Requirement
- Monitor TACACS+ server unreachable events from COUNTER_DB.
- Monitor TACACS+ server slow-response events from COUNTER_DB.

Contributor: Which component writes to COUNTER_DB? It is not clear from the design doc.

Contributor Author: The Monit service will write to COUNTER_DB; I'll add the detail to the design doc.

- Change server priority based on unreachable events and slow-response events.
- Do not change any other server attribute.
- Do not change any other TACACS+ config.

### Counter DB schema
#### TACPLUS_SERVER_LATENCY Table schema
```
; Key
server_key = IPAddress ; TACACS+ server’s address
; Attributes
latency = 1*10DIGIT ; server network latency in ms; -1 means the connection to the server timed out
```
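The design does not specify how the monitor measures latency. As a minimal sketch, assuming latency means the time to open a TCP connection to the server on port 49 (the standard TACACS+ port), one row of this table could be produced as follows. The function names `measure_latency_ms` and `latency_row` are hypothetical, and the actual write to COUNTER_DB is left out.

```python
import socket
import time
from typing import Dict, Tuple

TACACS_PORT = 49   # standard TACACS+ TCP port
UNREACHABLE = -1   # value the schema uses when the connection times out


def measure_latency_ms(host: str, port: int = TACACS_PORT, timeout_s: float = 5.0) -> int:
    """Return the TCP connect time in whole milliseconds, or -1 on failure.

    Assumption: "latency" is approximated by the TCP connect time.
    Any connect error (timeout, refused, unreachable) is reported as -1.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    except OSError:
        return UNREACHABLE
    return int((time.monotonic() - start) * 1000)


def latency_row(host: str, latency_ms: int) -> Tuple[str, str, Dict[str, str]]:
    """Build (table, key, fields) for one TACPLUS_SERVER_LATENCY entry."""
    return "TACPLUS_SERVER_LATENCY", host, {"latency": str(latency_ms)}
```

A caller would pass the result of `measure_latency_ms` to `latency_row` and hand the tuple to whatever database client the monitor uses.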

### Config DB schema
#### TACPLUS_SERVER_MONITOR Table schema
```
; Key
config_key = 'config' ; The configuration key
; Attributes
time_window = 1*5DIGIT ; Monitor time window in minutes, default is 5
high_latency_threshold = 1*5DIGIT ; High latency threshold in ms, default is 20
```

Contributor: The YANG model design is missing.

Contributor Author: Fixed, YANG model added.
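To illustrate how the defaults in this schema might be applied, here is a minimal sketch that parses the `config` entry of TACPLUS_SERVER_MONITOR from a plain field map. The names `MonitorConfig` and `parse_monitor_config` are hypothetical; reading the entry from CONFIG_DB is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MonitorConfig:
    time_window_min: int = 5             # schema default: 5 minutes
    high_latency_threshold_ms: int = 20  # schema default: 20 ms


def _parse_field(entry: Mapping[str, str], name: str, default: int) -> int:
    """Return the field as an int, or the default when the field is absent."""
    raw = entry.get(name)
    if raw is None:
        return default
    # The schema allows 1 to 5 decimal digits (1*5DIGIT).
    if not (raw.isascii() and raw.isdigit() and 1 <= len(raw) <= 5):
        raise ValueError(f"{name} must be 1 to 5 digits, got {raw!r}")
    return int(raw)


def parse_monitor_config(entry: Mapping[str, str]) -> MonitorConfig:
    """Build a MonitorConfig from the 'config' entry, filling in defaults."""
    defaults = MonitorConfig()
    return MonitorConfig(
        time_window_min=_parse_field(entry, "time_window", defaults.time_window_min),
        high_latency_threshold_ms=_parse_field(
            entry, "high_latency_threshold", defaults.high_latency_threshold_ms
        ),
    )
```

Rejecting values outside `1*5DIGIT` up front keeps a malformed entry from silently changing the threshold.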

# 3 Limitation

- A server priority change can be delayed by up to 1 minute, because the Monit service runs the profile once every minute.

# 4 Design

```
       +------------+
       |   Monit    |
       +-----+------+
             |
+------------v--------------+       +---------------------+
|                           |       |                     |
|                           |       |                     |
|      TACACS+ Monitor      |------>|     COUNTER_DB      |
|                           |       |                     |
|                           |       |                     |
+------------+--------------+       +---------------------+
             |
   +---------v---------+               +-------+--------+
   |                   |               |                |
   | TACACS config file+--------------->  config file   |
   |  generate script  |               |                |
   +-------------------+               +-------+--------+
```
- TACACS+ monitor is a Monit profile.
- TACACS+ monitor periodically checks TACACS+ server latency and writes the latency to COUNTER_DB.
- The latency in the COUNTER_DB TACPLUS_SERVER_LATENCY table is the average latency over the most recent time window.
- The time window size is defined in the CONFIG_DB TACPLUS_SERVER_MONITOR table.
- TACACS+ monitor also writes a warning message to syslog and re-generates the TACACS server config file when either of the following events appears in COUNTER_DB:
  - Any server latency is -1, which means the server is unreachable.
  - Any server latency is greater than high_latency_threshold.
- TACACS+ monitor does not change the TACACS server config in CONFIG_DB; it only re-generates the TACACS config file based on CONFIG_DB and COUNTER_DB.
- The code that generates the TACACS config file will move to a new script file; both hostcfgd and TACACS+ monitor will use this script to re-generate the TACACS config file.
- When generating the TACACS config file, server priority is calculated according to the following rules:
  - Set the priority of high-latency and unreachable servers to 1, because 1 is the lowest priority and the SONiC device uses higher-priority servers first.
  - If another server also has priority 1 in CONFIG_DB, change its priority to 2.
  - If another server's priority is not 1, use its original priority from CONFIG_DB.
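The priority rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the actual generate script: the function names are hypothetical, a server with no latency entry is treated as healthy, and healthy servers configured with priority 1 are moved to 2 only when at least one server is degraded.

```python
from typing import Dict, Optional

LOWEST_PRIORITY = 1  # per the design, 1 is the lowest priority
UNREACHABLE = -1     # latency value meaning the server timed out


def is_degraded(latency_ms: Optional[int], threshold_ms: int) -> bool:
    """True when the server is unreachable or slower than the threshold."""
    if latency_ms is None:  # assumption: no measurement yet means healthy
        return False
    return latency_ms == UNREACHABLE or latency_ms > threshold_ms


def effective_priorities(
    configured: Dict[str, int],
    latencies: Dict[str, int],
    threshold_ms: int,
) -> Dict[str, int]:
    """Compute the priorities to write into the generated config file.

    `configured` maps server address to its CONFIG_DB priority and is
    never modified; `latencies` maps server address to its COUNTER_DB latency.
    """
    degraded = {
        server for server in configured
        if is_degraded(latencies.get(server), threshold_ms)
    }
    result = {}
    for server, priority in configured.items():
        if server in degraded:
            result[server] = LOWEST_PRIORITY
        elif degraded and priority == LOWEST_PRIORITY:
            # Assumption: only bump healthy priority-1 servers when needed.
            result[server] = 2
        else:
            result[server] = priority
    return result
```

Because the function returns a new mapping, the values in CONFIG_DB stay untouched, which matches the requirement that only the generated config file changes.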

# 5 References

## TACACS+ Authentication
https://github.com/sonic-net/SONiC/blob/master/doc/aaa/TACACS%2B%20Authentication.md
## SONiC TACACS+ improvement
https://github.com/sonic-net/SONiC/blob/master/doc/aaa/TACACS%2B%20Design.md