Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gateway Managed Storage support telemetry #51

Closed
3 tasks done
nlynch-nasa opened this issue Sep 20, 2022 · 16 comments · Fixed by #61
Closed
3 tasks done

Gateway Managed Storage support telemetry #51

nlynch-nasa opened this issue Sep 20, 2022 · 16 comments · Fixed by #61
Assignees
Milestone

Comments

@nlynch-nasa
Copy link

nlynch-nasa commented Sep 20, 2022

Checklist (Please check before submitting)

  • I reviewed the Contributing Guide.
  • I reviewed the README file to see if the feature is in the major future work.
  • I performed a cursory search to see if the feature request is relevant, not redundant, nor in conflict with other tickets.

Is your feature request related to a problem? Please describe.
We need to maintain current usage tracking on a directory-by-directory bases. If FM is used to modify the contents of our directories, our tracking would not be accurate which could lead to a loss of data.

For most commands to FM, we can gain the needed information by subscribing to the FM commands and querying OSAL directly. However, FM_DELETE_CC, FM_DELETE_ALL_CC, and FM_DECOMPRESS_CC commands are a problem because we would not have foreknowledge of the files size before the command is executed by FM. This would cause us to traverse an entire directory to correct the usage tracking.

We will have multiple remote drives. With potentially Terabyte drives, traversing the directories on a remote drives is too CPU and network intensive.

Describe the solution you'd like
FM_DELETE_CC:
typedef struct
{
CFE_MSG_TelemetryHeader_t TlmHeader; /< \brief Telemetry Header */
uint32 FileSize; /
< \brief Size of the file deleted*/
char Filename[OS_MAX_PATH_LEN]; /**< \brief Delete filename */

} FM_DeleteFileTel_t;

FM_DELETE_CC:
typedef struct
{
CFE_MSG_TelemetryHeader_t TlmHeader; /< \brief Telemetry Header */
uint64 Freed; /
< \brief Bytes freed in directory*/
char Directory[OS_MAX_PATH_LEN]; /**< \brief Directory name */

} FM_DeleteAllTel_t;

FM_DECOMPRESS_CC:
typedef struct
{
CFE_MSG_TelemetryHeader_t TlmHeader; /< \brief Telemetry Header */
uint32 SourceSize; /
< \brief Size of the compressed file*/
uint32 TagetSize; /< \brief Size of the uncompressed file*/
char Source[OS_MAX_PATH_LEN]; /
< \brief Source filename */
char Target[OS_MAX_PATH_LEN]; /**< \brief Target filename */
} FM_DecompressTel_t;

Describe alternatives you've considered
Similar feedback telemetry on following would be nice but not required
FM_COPY_CC
FM_MOVE_CC
FM_CONCAT_CC

Maybe there a way to have a common telemetry packet that include function code.

Additional context
Add any other context about the feature request here.

Requester Info
Nathan Lynch JSC-ER611

@skliper
Copy link
Contributor

skliper commented Sep 21, 2022

I'm not following why you need FM to tell you what STRM should already know. If STRM is already tracking the files, wouldn't STRM already know the size and based on the current contents of the command to FM there's no need to traverse the entire file system. Even if STRM didn't know the size of the file, it can still just update the impacted directory statistics. Alternatively, if STRM is really what's being used to manage/track the file system, why use FM at all? If STRM "owned" those transactions there wouldn't be a disconnect. Analogous to typical file system operations... FM really is just the ops interface to rm, cp, mv, etc. If the intent is to really control/manage, maybe STRM should be explicitly in the loop to protect and manage directories or whatever, vs just getting downstream notification of what something else already did.

If STRM really needs knowledge of every file manipulation call, I strongly recommend not modifying every single app that could possibly manipulate files. Instead get notifications from the OSAL layer for all open, create, delete (maybe write?) related calls. We've already got a pattern established for registering callbacks on actions (like task create), and adding callbacks on file system calls should be straight forward and much less impact compared to changing/adding tlm messages from numerous apps.

@skliper
Copy link
Contributor

skliper commented Sep 21, 2022

Note we did add tlm for CF and DS creating files since that can happen autonomously. That said, there's files that get written all over cFS that are done by command (write table to file, write app data to file, write SB routing to file, write syslog to file, write events to file, etc)... modifying every single app to add a notification of file system actions would be a big impact across many elements. Getting a notification from OSAL on file API calls is trivial and 0 impact on apps.

@nlynch-nasa
Copy link
Author

nlynch-nasa commented Sep 21, 2022 via email

@skliper
Copy link
Contributor

skliper commented Sep 21, 2022

If you are already writing a "handler" for the FM telemetry to do things STRM needs to do, why not just make that a command handler and do the delete or delete all action from within STRM instead of relying on FM? If you really plan to decompress from within that managed directory, just set it up as a library call? You don't need to use FM to delete a file... it's a single call to OSAL. STRM could completely own all the actions in it's managed area which would reduce the need to modify common apps, and operationally all other apps could just stay out of the managed area.

My hesitancy is relative to implementing in open source on a common app what seems to me to be an inconsistent pattern that may only ever be needed by one stakeholder to handle off-nominal conditions in a very specific use case. What I'm suggesting is maybe this use case is sufficiently complex and unique enough such that it would be better off handled via local/custom additions or modifications giving you the freedom to do whatever is necessary in your context and truly optimize the behavior per your requirements.

@nlynch-nasa
Copy link
Author

nlynch-nasa commented Sep 21, 2022 via email

@skliper
Copy link
Contributor

skliper commented Sep 22, 2022

I'm betting it's obvious on your side I really don't understand your use case 😄

Apologies for any frustrations it may have caused. Although I encourage project ownership of the design, implementation, and test of unique project requirements, I'll let those more informed (and responsible) negotiate the path forward.

@skliper
Copy link
Contributor

skliper commented Oct 5, 2022

FYI - if your last suggestion for FM to provide directory disk usage is acceptable it should be a trivial change (although would have performance impacts with having to walk through an entire directory querying file sizes) There's a command already to request the directory file list here:

FM/fsw/src/fm_msg.h

Lines 244 to 259 in adc716e

/**
* \brief Get DIR List to Packet command packet structure
*
* For command details see #FM_GET_DIR_PKT_CC
*/
typedef struct
{
CFE_MSG_CommandHeader_t CmdHeader; /**< \brief Command header */
char Directory[OS_MAX_PATH_LEN]; /**< \brief Directory name */
uint32 DirListOffset; /**< \brief Index of 1st dir entry to put in packet */
uint8 GetSizeTimeMode; /**< \brief Option to query size, time, and mode of files (CPU intensive) */
uint8 Spare01[3]; /**< \brief Padding to 32 bit boundary */
} FM_GetDirPktCmd_t;

and the associated telemetry is here for the main packet as well as each element (that can already include size):

FM/fsw/src/fm_msg.h

Lines 317 to 337 in adc716e

typedef struct
{
char EntryName[OS_MAX_PATH_LEN]; /**< \brief Directory Listing Filename */
uint32 EntrySize; /**< \brief Directory Listing File Size */
uint32 ModifyTime; /**< \brief Directory Listing File Last Modification Times */
uint32 Mode; /**< \brief Mode of the file (Permissions from #OS_FILESTAT_MODE) */
} FM_DirListEntry_t;
/**
* \brief Get Directory Listing telemetry packet
*/
typedef struct
{
CFE_MSG_TelemetryHeader_t TlmHeader; /**< \brief Telemetry Header */
char DirName[OS_MAX_PATH_LEN]; /**< \brief Directory Name */
uint32 TotalFiles; /**< \brief Number of files in the directory */
uint32 PacketFiles; /**< \brief Number of files in this packet */
uint32 FirstFile; /**< \brief Index into directory files of first packet file */
FM_DirListEntry_t FileList[FM_DIR_LIST_PKT_ENTRIES]; /**< \brief Directory listing file data */
} FM_DirListPkt_t;

Could just add a total size to the main packet, and it would probably just be a few lines of code changes to query every file in the directory instead of just the ones reported in the packet. Note you've got the potential for a race w/ those numbers (if something is writing/deleting/moving while you are calculating) but at least you could get a snapshot.

@nlynch-nasa
Copy link
Author

FM_GET_DIR_PKT_CC only gets the size if GetSizeTimeMode is set. It is also meant to be called multiple times to get the entire directory list.

Would you really want to traverse the entire directory on every call?
Or - Would you add a specific request parameter on the command?

You might be better off making a new command that doesn’t create a file or the or the directory list just gets the sizes. That way it does not interfere with the existing commands. I think it would be simple copy/paste and remove what’s not needed. (Just don't remove the sleep) However, you would need a specific command and telemetry.

@skliper
Copy link
Contributor

skliper commented Oct 6, 2022

A new command and new telemetry is certainly within the trade space.

DirListPktCmd already traverses the entire directory to get TotalFiles, it just doesn't currently get the size for the files not getting included in the packet. Could interpret uint8 GetSizeTimeMode as packed bits instead of a bool and only calculate TotalSize if a bit for it is set. If you just want TotalSize you could just set the request bit and call it once, if instead the full list of file names is desired you wouldn't need to set the request bit for every call. Although not definitive, it could be used as a hint there's currently file actions going on in that directory (at the performance cost of doing that calculation more than once on a traversal).

Or could just add a tlm packt with just a summary (TotalFiles, TotalSize) to be generated when a command for DirListFileCmd completes it's processing... since it does already query the size of every file when GetSizeTimeMode is set. Although it would add the overhead of creating the file, but you could just overwrite the same file each time if you don't care about the list.

Lots of options.

@nlynch-nasa
Copy link
Author

Either way would suit our needs. Is it correct assume that if it has not completed the previous call you would not be able to call the function on different directory?

@skliper
Copy link
Contributor

skliper commented Oct 6, 2022

You can queue up commands for the child task and it will process them in order, it only processes one at a time. See FM_CHILD_QUEUE_DEPTH for the depth.

@jphickey
Copy link
Contributor

jphickey commented Oct 6, 2022

Looks like I'll be pulled into the mix here to actually implement this thing....

My interpretation of the original request is that it is necessary to track the overall filesystem usage, hence why the request is to add a notification every time something adds, moves, or deletes a file.

I see two major problems with that naive approach:

  1. File operations could occur outside of FM. These will not be reported.
  2. The size of the file being moved/copied/deleted does NOT directly affect the filesystem usage/available space.

Item (2) above might seem odd but there are lots of ways that these two become disjointed:

  • Filesystems are organized into allocation units (blocks and inodes) and thus your usable space depends on the filesystem block size.
  • Sparse files may occupy fewer blocks than their reported size
  • Fragmented files may occupy more blocks than their reported size
  • Hard Links (effectively duplicate directory entries referring to the same single file)
  • Files that are open by another task or process will not be actually removed until it is closed (that is, the directory entry is unlinked, but the data blocks are still used until the references are gone)

Given all the issues/limitations of tracking file system use by single file ops, my inclination is (strongly) toward an FM command that allows one to directly determine the actual filesystem usage by querying the filesystem for that info, not by making assumptions about how file ops might affect the value. OSAL already has such a call (statfs) which reports the usage and doesn't require going through all the directory entries or anything.

@nlynch-nasa
Copy link
Author

Our App has specific directories that we want to track. There may be usage outside of those directories but we don't care about that usage.

  1. Nominally, all file transaction to those directories go through our app. However, we cannot prevent an operator from using FM to delete, move file in and out of those directories. etc. If that should happen we want to know how it has affected the directory usage. File operation in these directories will not occur from any other apps on the processor.
  2. That is true for the entire volume's usage but we need usage on a directory by directory basis. Moving/copying/deleting would affect what we care about. We don't particularly care if a block is split between directories we would however want to know how it is split between them.

Doesn't the OSAL call just give the total usage for the entire volume? In our case that doesn't help. If it however can be on directories, perhaps both Size and Side on Disk would be useful.

@jphickey
Copy link
Contributor

jphickey commented Oct 6, 2022

OK, that's interesting. Not knowing the background, I don't fully understand how the notion of "directory usage" matters, because the underlying file systems simply don't work that way. Directories don't have the concept of "usage" at all - they are just logical constructs that determine how the file names are presented to the user, nothing more. In fact, files can be in more than one directory at the same time, and sparse files and what not could very much throw off any attempt to determine how much disk space a single directory is actually using.

In short - fundamentally "disk space usage" is only a concept at the volume scope.

That being said, we can certainly estimate such a thing by naively adding up all the file sizes. Of course that will not actually be accurate by any means (at least in terms of overall volume usage) but whether that matters really depends on what you are trying to accomplish. We could add some sort of command to FM that internally loops through all the files in a dir and adds up the sizes, thereby reducing the amount of CMD/TLM traffic that normally would be required to estimate such a thing.

@nlynch-nasa
Copy link
Author

Limiting the impact on the COM traffic to get this information is the main reason for this request so a command to do that would be ideal for us.

@jphickey jphickey self-assigned this Oct 24, 2022
@jphickey
Copy link
Contributor

Digging deeper into this now.

There is already a facility to monitor the free space at the volume scope, and it involves these items:

  • A Free space "monitor list" which is table of filesystem locations of interest, which has up to 8 entries by default
  • A command FM_GET_FREE_SPACE_CC that causes the FSW to poll all the locations from the table above
  • This generates a TLM message identified by FM_FREE_SPACE_TLM_MID that contains a the response from the CC above. This has the same number of entries as the table, contains the path and a "block count" which indicates the free blocks on that volume.

Two options for the path forward:

  1. Extend the existing paradigm to also cover the concept "space used" and not just "space free"
  2. Introduce an alternate table, CC, and TLM to cover this case

My preference is for (1) because it seems like a reasonable fit and avoids the clutter and complexity of introducing a whole additional table and TLM MSGID for this, but it will break compatibility with the existing table and TLM message definition as some additional fields will need to be added. If updating the existing FM_FreeSpacePkt_t TLM message is OK and will meet the needs here, then this is what I would recommend doing.

Conversely, if there is a requirement not to change the existing TLM, then this would necessitate adding a separate TLM definition to report used space vs. the existing free space report.

Please confirm, but otherwise I'm going to assume (1) is OK and pursue that route.

jphickey added a commit to jphickey/FM that referenced this issue Oct 27, 2022
Replace the "free space" table with a more general disk monitoring
table, which can have entries for volume free space (which is what was
there) as well as an estimate of directory usage.

Table and TLM was renamed accordingly.  This changes the definition of
the TLM report to include the extra info, so it requires a ground system
update.
@jphickey jphickey mentioned this issue Oct 27, 2022
2 tasks
dzbaker added a commit that referenced this issue Nov 14, 2022
@dmknutsen dmknutsen added this to the Draco milestone Jan 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants