Feature: Add a new container Logs API #36
Comments
The progress of this feature can be tracked here: rancher/cattle#138
A few questions @ibuildthecloud
The frame format is `0 1 TS Content`, where 0 is the version number (always 0 for now), 1 is either 1 or 2 depending on stdout/stderr, and TS is the timestamp in the format that Docker returns.
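As a sketch of how a client might consume the frame format described above, the following Go snippet splits a frame into its four space-delimited parts. The `LogFrame` type and `parseFrame` function are hypothetical names for illustration; only the wire layout comes from the comment.

```go
package main

import (
	"fmt"
	"strings"
)

// LogFrame represents one message in the proposed frame format:
// "<version> <stream> <timestamp> <content>". Field names here are
// assumptions; the issue only specifies the wire layout.
type LogFrame struct {
	Version   string // always "0" for now
	Stream    string // "1" for stdout, "2" for stderr
	Timestamp string // timestamp in the format Docker returns
	Content   string
}

// parseFrame splits a raw frame into its four space-delimited fields.
// The content field may itself contain spaces, so we split at most 4 ways.
func parseFrame(raw string) (LogFrame, error) {
	parts := strings.SplitN(raw, " ", 4)
	if len(parts) != 4 {
		return LogFrame{}, fmt.Errorf("malformed frame: %q", raw)
	}
	return LogFrame{
		Version:   parts[0],
		Stream:    parts[1],
		Timestamp: parts[2],
		Content:   parts[3],
	}, nil
}

func main() {
	f, err := parseFrame("0 1 2015-01-06T22:08:51.000000000Z hello from stdout")
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Stream, f.Content) // → 1 hello from stdout
}
```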
Closing as #40 is closed.
Overview
Logs are always useful for figuring out what is going on in a distributed system, especially when something goes wrong. Docker currently supports getting logs from a container that logs to stdout/stderr: everything the process running in the container writes to stdout or stderr, Docker converts to JSON and stores in a file on the host machine's disk, which can then be retrieved with the `docker logs` command. This feature aims to provide a way to monitor those logs from the Rancher UI.
What does Docker already support?
The current way of fetching the logs of a container is to run the `docker logs` command, or to call the corresponding endpoint of the Docker remote API v1.16 (https://docs.docker.com/reference/api/docker_remote_api_v1.16/#get-container-logs), which accepts five possible options -
Query Parameters:
follow – 1/True/true or 0/False/false, return stream. Default false
stdout – 1/True/true or 0/False/false, show stdout log. Default false
stderr – 1/True/true or 0/False/false, show stderr log. Default false
timestamps – 1/True/true or 0/False/false, print timestamps for every log line. Default false
tail – Output specified number of lines at the end of logs: all or a number. Default all
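For reference, these options end up as a query string on the remote API's container-logs endpoint. The Go sketch below builds such a URL; the host address and helper name are assumptions for illustration, not part of any existing client library.

```go
package main

import (
	"fmt"
	"net/url"
)

// logsURL builds the Docker remote API (v1.16) URL for fetching a
// container's logs. The parameter names mirror the API documentation
// quoted above; the daemon address is a placeholder.
func logsURL(host, containerID string, follow, stdout, stderr, timestamps bool, tail string) string {
	q := url.Values{}
	q.Set("follow", fmt.Sprintf("%t", follow))
	q.Set("stdout", fmt.Sprintf("%t", stdout))
	q.Set("stderr", fmt.Sprintf("%t", stderr))
	q.Set("timestamps", fmt.Sprintf("%t", timestamps))
	q.Set("tail", tail)
	// url.Values.Encode sorts parameters alphabetically by key.
	return fmt.Sprintf("%s/containers/%s/logs?%s", host, containerID, q.Encode())
}

func main() {
	fmt.Println(logsURL("http://localhost:2375", "abc123", true, true, false, true, "100"))
	// → http://localhost:2375/containers/abc123/logs?follow=true&stderr=false&stdout=true&tail=100&timestamps=true
}
```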
Need for this feature
Currently, there is no way to monitor Docker container logs from the Rancher UI. This feature therefore aims to collect the logs from each container and ship them through this new API to the Rancher UI, so that the user can monitor them in a console there. It gives the end user an alternative to the existing ways of fetching logs, namely the Docker remote API or the Docker CLI.
Tentative Design
The result of this new action (API) should return a URL to a websocket on the host, along with a JWT token. The websocket should connect to the host-api Go process that runs on the host. (I'll update this section as I progress further.)
Implementation Details
The implementation of this new API would probably be based on how the stats action and the exec action on instance have been implemented in Rancher. Currently, the host-api Go process creates a secure websocket that proxies information; Rancher uses it today to grab information from cAdvisor. In this new action API, I plan to call the `docker logs` command with the appropriate input parameters required for fetching the logs of a container.
Query Parameters Format -
Name - Format
"follow" - boolean
"lines" - integer (corresponds to "tail" in Docker's remote API); default value: 100
"stdOut" - boolean
"stdErr" - boolean
"timestamps" - boolean
Example API usage -
http://<Cattle server IP>:8080/v1/containers/<container ID>/?action=logs&lines=10&stdOut=true&follow=true
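Putting the parameter table together, a server-side handler would need to translate the Rancher-side names into Docker's remote API parameters (notably `lines` to `tail`). The Go sketch below illustrates that mapping; the type and function names are hypothetical, not the actual Cattle implementation.

```go
package main

import "fmt"

// LogsOpts mirrors the proposed Rancher-side parameters from the table
// above; "lines" corresponds to "tail" in Docker's remote API.
type LogsOpts struct {
	Follow     bool
	Lines      int
	StdOut     bool
	StdErr     bool
	Timestamps bool
}

// toDockerQuery translates the Rancher parameters into Docker remote API
// query parameters, applying the proposed default of 100 lines when no
// line count is given. Illustrative sketch only.
func toDockerQuery(o LogsOpts) map[string]string {
	lines := o.Lines
	if lines <= 0 {
		lines = 100 // default value from the parameter table above
	}
	return map[string]string{
		"follow":     fmt.Sprintf("%t", o.Follow),
		"tail":       fmt.Sprintf("%d", lines),
		"stdout":     fmt.Sprintf("%t", o.StdOut),
		"stderr":     fmt.Sprintf("%t", o.StdErr),
		"timestamps": fmt.Sprintf("%t", o.Timestamps),
	}
}

func main() {
	q := toDockerQuery(LogsOpts{Follow: true, Lines: 10, StdOut: true})
	fmt.Println(q["tail"], q["follow"], q["stdout"], q["stderr"])
	// → 10 true true false
}
```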
Use Cases to be Explored/Supported