[proposal] QoS supports of Hadoop YARN task in koordlet/slo-controller #1727

Closed
6 of 10 tasks
zwzhang0107 opened this issue Oct 30, 2023 · 2 comments
Labels
area/koord-manager area/koordlet area/YARN colocate yarn with k8s help wanted Extra attention is needed kind/proposal Create a report to help us improve lifecycle/stale

zwzhang0107 commented Oct 30, 2023

What is your proposal:
Koordinator is almost ready to support running Hadoop YARN tasks as Batch+BE QoS in K8s. Here is the detailed design.

Since YARN tasks run under a specified cgroup path (kubepods/besteffort/hadoop-yarn), Koordinator should take them into account during Batch resource calculation and QoS management.

Why is this needed:
If koordlet does not collect the resource usage of YARN tasks independently, that usage is counted as system usage. This leads to a miscalculated Batch Allocatable and leaves YARN tasks out of QoS management.

Is there a suggested solution, if so, please add it:
Define YARN tasks as an out-of-band Host Application; see the API design for more details.

  host-application-config: |
    {
      "applications": [
        {
          "name": "yarn-task",
          "priority": "koord-batch",
          "qos": "BE",
          "cgroupPath": {
            "base": "CgroupBaseTypeKubeBesteffort",
            "parentDir": "./",
            "relativePath": "hadoop-yarn/"
          }
        }
      ]
    }
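
As a rough illustration of how such a config could map to the cgroup directory mentioned above, here is a minimal Python sketch. It is not koordlet code: the `CGROUP_BASE_DIRS` mapping and the function names are hypothetical, and the only base directory shown is the one implied by the path in this proposal (kubepods/besteffort/hadoop-yarn). Real layouts depend on the cgroup driver.

```python
import json
import posixpath

# Hypothetical mapping from the "base" value to a directory under the cgroup
# root. Only this entry is implied by the proposal text; it is an assumption.
CGROUP_BASE_DIRS = {
    "CgroupBaseTypeKubeBesteffort": "kubepods/besteffort",
}

HOST_APPLICATION_CONFIG = """
{
  "applications": [
    {
      "name": "yarn-task",
      "priority": "koord-batch",
      "qos": "BE",
      "cgroupPath": {
        "base": "CgroupBaseTypeKubeBesteffort",
        "parentDir": "./",
        "relativePath": "hadoop-yarn/"
      }
    }
  ]
}
"""


def resolve_cgroup_dir(cgroup_path: dict) -> str:
    """Join base, parentDir and relativePath into one normalized relative path."""
    base = CGROUP_BASE_DIRS[cgroup_path["base"]]
    # Strip leading slashes so posixpath.join does not discard earlier parts.
    parent = cgroup_path.get("parentDir", "").lstrip("/")
    relative = cgroup_path.get("relativePath", "").lstrip("/")
    return posixpath.normpath(posixpath.join(base, parent, relative))


def host_application_dirs(raw_config: str) -> dict:
    """Return {application name: cgroup directory} for every configured app."""
    config = json.loads(raw_config)
    return {
        app["name"]: resolve_cgroup_dir(app["cgroupPath"])
        for app in config["applications"]
    }


if __name__ == "__main__":
    print(host_application_dirs(HOST_APPLICATION_CONFIG))
```
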
  • Add a plugin to the koordlet metricAdvisor that collects the resource usage of host applications by cgroup path. (@zwzhang0107)
  • koordlet statesInformer reports host application resource usage in NodeMetric.
  • slo-controller treats the resource usage of YARN tasks as batch priority when calculating batch allocatable.
  • slo-controller excludes the YARN allocated amount recorded in the node annotation when calculating the K8s batch allocatable.
  • The CPUSuppress strategy should exclude YARN tasks when calculating system usage.
  • Add a plugin to statesInformer that collects YARN task info from yarn-copilot, to be used by strategies such as memory eviction.
  • The MemoryEvict strategy should consider YARN tasks during eviction. Since the strategy may kill batch pods and YARN tasks at the same time, koordlet should consider the request/usage of Hadoop tasks for fairness.
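
To make the accounting in these tasks concrete, here is a simplified Python sketch. The formula is an assumption for illustration only (batch allocatable = capacity - reserved - system usage - usage of non-batch pods); the real slo-controller calculation has more inputs, and all field and function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeCPU:
    """Hypothetical per-node CPU figures, in cores."""
    capacity: float         # total CPU on the node
    reserved: float         # safety margin / node reservation
    node_usage: float       # total measured usage on the node
    prod_pod_usage: float   # usage of non-batch (higher priority) pods
    batch_pod_usage: float  # usage of K8s batch pods
    yarn_usage: float       # usage of YARN tasks, collected by cgroup path
    yarn_allocated: float   # amount already allocated to YARN


def system_usage(n: NodeCPU, collect_yarn: bool = True) -> float:
    """Usage not attributed to any pod or host application."""
    attributed = n.prod_pod_usage + n.batch_pod_usage
    if collect_yarn:
        # YARN usage is treated as batch priority, not as system usage.
        attributed += n.yarn_usage
    return max(n.node_usage - attributed, 0.0)


def batch_allocatable(n: NodeCPU, collect_yarn: bool = True) -> float:
    """Resources reclaimable for batch workloads (K8s batch pods plus YARN)."""
    free = n.capacity - n.reserved - system_usage(n, collect_yarn) - n.prod_pod_usage
    return max(free, 0.0)


def k8s_batch_allocatable(n: NodeCPU) -> float:
    """Batch allocatable left for K8s pods after excluding YARN allocated."""
    return max(batch_allocatable(n) - n.yarn_allocated, 0.0)


if __name__ == "__main__":
    node = NodeCPU(capacity=32, reserved=2, node_usage=20, prod_pod_usage=8,
                   batch_pod_usage=4, yarn_usage=6, yarn_allocated=10)
    print(batch_allocatable(node, collect_yarn=False))  # YARN counted as system
    print(batch_allocatable(node))                      # YARN counted as batch
    print(k8s_batch_allocatable(node))
```

With these example numbers, counting YARN usage as system usage shrinks batch allocatable by exactly the YARN usage, which is the miscalculation described in the motivation.
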

Support static QoS strategy for host applications (#1639)

  • LLC/MBA isolation should consider the pids of YARN tasks.
  • The group identity plugin should consider the YARN cgroup dir.
  • Set memory QoS for the YARN cgroup dir.
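
One way the pids of YARN tasks could be gathered for LLC/MBA isolation is by reading the `cgroup.procs` files under the YARN cgroup directory. The Python sketch below shows that idea only; it assumes each container has its own sub-cgroup under the YARN dir, the function names are hypothetical, and attaching the pids to a resctrl group is out of scope.

```python
import os
import tempfile
from typing import List


def collect_cgroup_pids(cgroup_dir: str) -> List[int]:
    """Return the sorted pids listed in every cgroup.procs under cgroup_dir."""
    pids = set()
    for root, _dirs, files in os.walk(cgroup_dir):
        if "cgroup.procs" in files:
            with open(os.path.join(root, "cgroup.procs")) as f:
                # One pid per line; skip blank lines.
                pids.update(int(line) for line in f if line.strip())
    return sorted(pids)


def demo() -> List[int]:
    """Build a fake cgroup tree in a temp dir and collect its pids."""
    with tempfile.TemporaryDirectory() as root:
        yarn_dir = os.path.join(root, "hadoop-yarn")
        # Hypothetical container sub-cgroups and their pids.
        layout = {"container_1": "101\n102\n", "container_2": "203\n"}
        for name, procs in layout.items():
            container_dir = os.path.join(yarn_dir, name)
            os.makedirs(container_dir)
            with open(os.path.join(container_dir, "cgroup.procs"), "w") as f:
                f.write(procs)
        return collect_cgroup_pids(yarn_dir)


if __name__ == "__main__":
    print(demo())
```
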

stale bot commented Mar 5, 2024

This issue has been automatically marked as stale because it has not had recent activity.
This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close

Thank you for your contributions.

@stale stale bot added the lifecycle/stale label Mar 5, 2024

stale bot commented Apr 9, 2024

This issue has been automatically closed because it has not had recent activity.
This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Reopen this PR with /reopen

Thank you for your contributions.

@stale stale bot closed this as completed Apr 9, 2024