CMonitor : Added inotify support to cmonitor. #49

Open
wants to merge 29 commits into
base: master
4 changes: 2 additions & 2 deletions .github/workflows/main.yml
@@ -13,9 +13,9 @@ jobs:
     steps:
     - uses: actions/checkout@v2
     - name: install debian-packaged dependencies
-      run: sudo apt install -y libgtest-dev libbenchmark-dev libfmt-dev tidy git python3 python3-dateutil python3-pip
+      run: sudo apt install -y libgtest-dev libbenchmark-dev libfmt-dev tidy git python3 python3-dateutil python3-pip inotify-tools
     - name: install pypi-packaged dependencies
-      run: sudo pip3 install pytest black conan
+      run: sudo pip3 install pytest black conan inotify

     # NOTE: since we later run "make" using the normal "builder" user, we must use Conan using that same user (so avoid the "sudo"!!)
     - name: install Conan-packaged dependencies
2 changes: 1 addition & 1 deletion Makefile
@@ -21,7 +21,7 @@ all:

 centos_install_prereq:
 	# this is just the list present in "BuildRequires" field of the RPM spec file:
-	yum install gcc-c++ make gtest-devel fmt-devel git
+	yum install gcc-c++ make gtest-devel fmt-devel git inotify-tools

 test:
 	$(MAKE) -C collector test
23 changes: 22 additions & 1 deletion README.md
@@ -428,7 +428,6 @@ cmonitor_collector \
--output-filename=pod-performances.json
```


### Connecting with InfluxDB and Grafana

The `cmonitor_collector` can be connected to an [InfluxDB](https://www.influxdata.com/) deployment to store collected data (this can happen
@@ -469,6 +468,28 @@ cmonitor_collector \
The Prometheus instance can then be used as data source for graphing tools like [Grafana](https://grafana.com/)
which allow you to create nice interactive dashboards (see examples in InfluxDB section).

### CMonitor helper tool

The `cmonitor_launcher` tool can be used to automate the monitoring of Kubernetes Pods.

It performs the following steps:

- Watches all files below a directory and raises an event when a Pod is restarted or a new Pod is created.
- Checks the process name against the white-list given in the filter list.
- Executes the command to launch CMonitor if the process name matches the filter.

Example:

```
cmonitor_launcher --path /sys/fs/cgroup/memory/kubepods/burstable/ \
   --filter process_1 process_2 \
   --ip-port 172.0.0.1:9090 172.0.0.2:9090 \
   --command "./cmonitor_collector --num-samples=until-cgroup-alive \
      --deep-collect --collect=cgroup_threads,cgroup_cpu,cgroup_memory,cgroup_network \
      --score-threshold=0 --sampling-interval=3 --output-directory=/home \
      --allow-multiple-instances --remote prometheus" \
   --log /home \
   --timeout 20
```
In the above example, `cmonitor_collector` is launched automatically for process_1 and process_2, with the Prometheus instances at 172.0.0.1:9090 and 172.0.0.2:9090 respectively.
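
The positional pairing described here (first `--filter` entry with first `--ip-port` entry, and so on) can be sketched as below. This is only an illustration of the idea, not the actual `cmonitor_launcher` code: the helper name `pair_filters_with_endpoints` is hypothetical, and rejecting lists of different lengths is an assumption.

```python
def pair_filters_with_endpoints(filters, ip_ports):
    """Map each process name to the ip:port at the same position (hypothetical helper)."""
    if len(filters) != len(ip_ports):
        # Assumption: one endpoint is expected per filter entry.
        raise ValueError("expected exactly one ip:port per filter entry")
    return dict(zip(filters, ip_ports))


endpoints = pair_filters_with_endpoints(
    ["process_1", "process_2"],
    ["172.0.0.1:9090", "172.0.0.2:9090"],
)
print(endpoints)
```

With such a mapping, the launcher could look up the endpoint by the process name that matched the filter.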

### Reference Manual

7 changes: 5 additions & 2 deletions tools/Makefile
@@ -12,11 +12,13 @@ include $(ROOT_DIR)/Constants.mk
 TOOLS = \
 	chart/cmonitor_chart.py \
 	filter/cmonitor_filter.py \
-	statistics/cmonitor_statistics.py
+	statistics/cmonitor_statistics.py \
+	launcher/cmonitor_launcher.py
 SYMLINKS = \
 	chart/cmonitor_chart \
 	filter/cmonitor_filter \
-	statistics/cmonitor_statistics
+	statistics/cmonitor_statistics \
+	launcher/cmonitor_launcher

# cmonitor_version.py has to be listed explicitly because it does not exist yet
# when the $(wilcard) runs (after a clean checkout)
@@ -55,6 +57,7 @@ endif
 test:
 	$(MAKE) -C filter test
 	$(MAKE) -C statistics test
+	$(MAKE) -C launcher test
 # FIXME:
 #	$(MAKE) -C chart test

196 changes: 196 additions & 0 deletions tools/common-code/cmonitor_watcher.py
@@ -0,0 +1,196 @@
#!/usr/bin/python3

#
# cmonitor_watcher.py
#
# Author: Satyabrata Bharati
# Created: April 2022
#
import inotify.adapters
import queue
import os
import time
import logging
from datetime import datetime

exit_flag = False
Owner

this flag should not be here... this class should not care about the whole app being requested to exit: it must just process the inotify events and fill a queue as fast as possible...that's it
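
One way to read this comment: the shutdown request lives in the caller, and the watcher only fills the queue. A minimal sketch under that assumption follows; `consume_events`, `stop_requested`, and the queued item are hypothetical stand-ins and not code from this PR.

```python
import queue
import threading


def consume_events(event_queue, stop_requested, handle):
    """Caller-side loop: process queued events until the application asks to stop."""
    while not stop_requested.is_set():
        try:
            item = event_queue.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing queued yet; re-check the stop request
        handle(item)
        event_queue.task_done()


event_queue = queue.Queue()
stop_requested = threading.Event()  # owned by the application, not by the watcher
handled = []

worker = threading.Thread(target=consume_events, args=(event_queue, stop_requested, handled.append))
worker.start()

# Stand-in for what the watcher would enqueue (path and process name are made up).
event_queue.put({"/hypothetical/cgroup/tasks": "process_1"})
event_queue.join()    # wait until the consumer has handled the item
stop_requested.set()  # application-level shutdown request
worker.join()
print(handled)
```

The watcher thread never inspects `stop_requested`; only the code that owns the application lifecycle does.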

# =======================================================================================================
# CgroupWatcher : Basic inotify class
Owner

this class looks good except for one thing: the timeout / sleep logic. It should not be included in this class; it doesn't really belong here. The caller may want to sleep for whatever reason, but watching cgroups should be fast and imply no sleeps. So please remove the "timeout" field.

# =======================================================================================================
class CgroupWatcher:
    """
    - Watch all files below a directory and notify an event for changes.
    - Retrieve all the processes and extract each process name from "/proc/<pid>/stat".
    - Check the process name against the white-list given in the filter list.
    - Store the events in a queue.
    """

    def __init__(self, path, filter, timeout):
        """Initialize CgroupWatcher
        Args:
            path: path to watch for events.
            filter: white-list against which the process-event is filtered.
            timeout: seconds to sleep before processing the files of a new event.

        """
        self.path = path
        self.filter = filter
Owner

from the documentation it's not clear what is the format of self.filter honestly. Is it a list of strings? please document
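
A possible way to document the format, assuming the filter is a list of plain process-name strings (which is what the README example `--filter process_1 process_2` and the `for e in self.filter` loop suggest). The class name `CgroupWatcherSketch` and the type check are hypothetical additions, not part of this PR.

```python
from typing import List


class CgroupWatcherSketch:
    """Hypothetical, trimmed-down constructor that documents the filter format."""

    def __init__(self, path: str, filter: List[str]):
        """
        Args:
            path: directory to watch for events.
            filter: list of process names as plain strings,
                e.g. ["process_1", "process_2"].
        """
        if isinstance(filter, str):
            # A bare string would be iterated character by character.
            raise TypeError("filter must be a list of strings, not a single string")
        self.path = path
        self.filter = list(filter)


watcher = CgroupWatcherSketch(
    "/sys/fs/cgroup/memory/kubepods/burstable/", ["process_1", "process_2"]
)
print(watcher.filter)
```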

        self.timeout = timeout
Owner

as mentioned before, remove self.timeout

        self.myFileList = {}
Owner

self.myFileList can be removed as variable


    def __get_cgroup_version(self):
        """
        Detect the cgroup version.
        """
        proc_self_mount = "/proc/self/mounts"
        ncgroup_v1 = 0
        ncgroup_v2 = 0
        with open(proc_self_mount) as file:
            for line in file:
                row = line.split()
                fs_spec = row[0]
                fs_file = row[1]
                fs_vfstype = row[2]
                if (fs_spec == "cgroup" or fs_spec == "cgroup2") and fs_vfstype == "cgroup2":
                    ncgroup_v2 += 1
                elif fs_vfstype == "cgroup":
                    # count only cgroup v1 mounts; unrelated filesystems must not count as v1
                    ncgroup_v1 += 1

        if ncgroup_v1 == 0 and ncgroup_v2 > 0:
            cgroup_version = "v2"
            return cgroup_version
        else:
            cgroup_version = "v1"
            return cgroup_version

    def __get_process_name(self, pid):
        """Returns the process name for the process id.
        Args:
            pid: process id.

        Returns:
            The process name.

        """
        cgroup_version = self.__get_cgroup_version()
        if cgroup_version == "v1":
            proc_filename = "/proc" + "/" + pid + "/stat"
        else:
            proc_filename = "/proc" + "/" + pid + "/cgroup.procs"
        with open(proc_filename) as file:
            for line in file:
                parts = line.split()
                process_name = parts[1].strip("()")
                return process_name

    def __get_pid_list(self, filename):
Owner

rename to __get_pid_list_from_cgroup_procs

        """Get the list of the process IDs belonging to a tasks file.
        Args:
            filename: the tasks file.

        Returns:
            The list of PIDs within the tasks file.

        """
        # "pids" instead of "list" to avoid shadowing the builtin
        pids = []
        with open(filename) as file:
            for line in file:
                pids.append(line.strip())
        return pids

    def __get_list_of_files(self, dir):
Owner

rename to __get_files_recursively
and remove mention of "event" from the description... this function doesn't deal with any event by itself... also remove "watched dir"... this function is generic
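
A generic version of this helper could lean on `os.walk` instead of hand-written recursion. The sketch below is a suggestion only, written as a free function named `get_files_recursively` for illustration. One behavioral difference to be aware of: `os.walk` does not descend into symlinked directories by default, while the `os.path.isdir` check in the PR follows them.

```python
import os
import tempfile


def get_files_recursively(directory):
    """Return the full paths of all non-directory entries below directory."""
    all_files = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            all_files.append(os.path.join(root, name))
    return all_files


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "pod1", "container1"))
    for parts in (("pod1", "tasks"), ("pod1", "container1", "tasks")):
        with open(os.path.join(top, *parts), "w") as handle:
            handle.write("1234\n")
    found = sorted(os.path.relpath(p, top) for p in get_files_recursively(top))
    print(found)
```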

        """Returns the list of the files created for the event within the watched dir.
        Args:
            dir: dir to be watched.

        Returns:
            The list of files created within the watched dir.

        """
        listOfFiles = os.listdir(dir)
        allFiles = list()
        for entry in listOfFiles:
            fullpath = os.path.join(dir, entry)
            if os.path.isdir(fullpath):
                allFiles = allFiles + self.__get_list_of_files(fullpath)
            else:
                allFiles.append(fullpath)

        return allFiles

    def __process_task_files(self, dir):
Owner

rename to __check_new_cgroup_against_filter(self, cgroup_dir) and provide an example of "cgroup_dir" that this function expects

        """Process all the files for the triggered event within the watched dir.
        Finds the process IDs and filters the process names against the
        provided white-list. If a process name matches the white-list given on the
        command line, the file is stored and returned along with the process name.
        The process name will later be used to get the IP and port from the
        command line for the specific process.
        Args:
            dir: dir to be watched.

        Returns:
            The file along with the process name which will be used to launch cmonitor.

        """
        # time.sleep(20)
        time.sleep(self.timeout)
Owner

remove this sleep as mentioned before

        logging.info(f"watcher process file sleep: {self.timeout}")
        allFiles = self.__get_list_of_files(dir)
        for file in allFiles:
            if file.endswith("tasks"):
                pids = self.__get_pid_list(file)
                if pids:
                    for pid in pids:
                        process_name = self.__get_process_name(pid)
                        logging.info(f"processing task file: {file} with pid: {pid}, process name: {process_name}")
                        match = self.__check_filter(process_name)
                        if match is True:
                            logging.info(f"Found match: {process_name}")
                            self.myFileList = {file: process_name}
                            return self.myFileList

    def __check_filter(self, process_name):
        """Check process name against the white-list.
        Args:
            process_name: process name to be matched against the white-list from the command line.

        Returns:
            True if process_name matches the white-list, False otherwise.

        """
        for e in self.filter:
            if process_name in e:
                return True
        return False

    def inotify_events(self, queue):
        """Main thread function for notifying events.
        Monitored events that match the provided white-list are stored in this queue.
        The events from this queue will be processed by the cMonitorLauncher threading function to
        launch cMonitor with the appropriate command input.
        Args:
            queue: monitored events will be stored in this queue.

        Returns:
            None.

        """
        logging.info("CgroupWatcher calling inotify_event")
        i = inotify.adapters.Inotify()
Owner

it's a waste of resource to allocate the Inotify() class every time the inotify_events() method is invoked. this instance "i" should be allocated in the ctor and stored into "self.inotify_instance" member.

        i.add_watch(self.path)
        try:
            for event in i.event_gen():
                if event is not None:
Owner

add a comment like "if event is None, it means the Inotify() system has no more events that need to be processed... give control back to the caller"
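
The behavior this comment describes could look like the sketch below, assuming `event_gen()` yields `None` whenever no event is pending (the package's `yield_nones` behavior, as far as is known). The function `drain_pending_events` and the fake event tuples are hypothetical. Unlike the loop in the PR, which skips `None` and keeps iterating, this version returns to the caller.

```python
def drain_pending_events(event_iter, on_create):
    """Handle events until the generator yields None, then return to the caller."""
    handled = 0
    for event in event_iter:
        if event is None:
            # None means no more events are pending right now:
            # give control back to the caller.
            break
        (_header, type_names, path, filename) = event
        if "IN_CREATE" in type_names:
            on_create(path, filename)
            handled += 1
    return handled


created = []
# Fake events shaped like the (header, type_names, path, filename) tuples used in the PR.
fake_events = iter([
    (None, ["IN_CREATE", "IN_ISDIR"], "/hypothetical/watched", "pod1"),
    None,
    (None, ["IN_CREATE", "IN_ISDIR"], "/hypothetical/watched", "pod2"),
])
count = drain_pending_events(fake_events, lambda path, filename: created.append((path, filename)))
print(count, created)
```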

                    if "IN_CREATE" in event[1]:
                        (header, type_names, path, filename) = event
                        logging.info(f"CgroupWatcher event triggered:{path},{filename}")
                        # os.path.join works whether or not "path" ends with a slash
                        dir = os.path.join(path, filename)
                        logging.info(f"CgroupWatcher event created:{filename}")
                        fileList = self.__process_task_files(dir)
                        if fileList:
                            logging.info(f"CgroupWatcher event in Queue:{fileList}")
                            queue.put(fileList)
                # global exit_flag
                if exit_flag is True:
Owner

remove this "if"

                    logging.info(f"CgroupWatcher exit_flag {exit_flag}")
                    exit(1)

        finally:
            # use self.path: the local "path" is undefined if no IN_CREATE event was seen
            i.remove_watch(self.path)
Owner

this remove_watch() should be moved into the dtor of this class
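
Python does not guarantee when, or whether, `__del__` runs, so the sketch below swaps in a different technique: an explicit `close()` plus context-manager support, with `__del__` kept only as a fallback. `WatchGuard` and `_FakeInotify` are hypothetical names; the fake mimics only the `add_watch()` and `remove_watch()` calls used in this PR.

```python
class _FakeInotify:
    """Test stand-in that records add_watch()/remove_watch() calls."""

    def __init__(self):
        self.watched = []
        self.removed = []

    def add_watch(self, path):
        self.watched.append(path)

    def remove_watch(self, path):
        self.removed.append(path)


class WatchGuard:
    """Hypothetical wrapper that pairs add_watch() with exactly one remove_watch()."""

    def __init__(self, inotify_instance, path):
        self._active = False  # set first so close() is safe even if add_watch() fails
        self.inotify_instance = inotify_instance
        self.path = path
        self.inotify_instance.add_watch(self.path)
        self._active = True

    def close(self):
        if self._active:
            self.inotify_instance.remove_watch(self.path)
            self._active = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def __del__(self):
        # Fallback only: Python does not promise when (or whether) __del__ runs.
        self.close()


fake = _FakeInotify()
with WatchGuard(fake, "/hypothetical/watched/dir"):
    pass  # events would be processed here
print(fake.removed)
```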

11 changes: 11 additions & 0 deletions tools/launcher/Makefile
@@ -0,0 +1,11 @@
ROOT_DIR:=$(shell readlink -f ../..)
PYTHON_COMMON_CODE=$(ROOT_DIR)/tools/common-code

run:
	export PYTHONPATH=$(PYTHON_COMMON_CODE) ; \
	./cmonitor_launcher.py $(ARGS)

test:
	cd tests && \
	export PYTHONPATH=$(PYTHON_COMMON_CODE) && \
	pytest --capture=no -vv