Drift Detection #3245

Open
jamengual opened this issue Mar 20, 2023 · 23 comments · May be fixed by #4114
Labels
feature New functionality/enhancement help wanted Good feature for contributors needs discussion Large change that needs review from community/maintainers rfc Requirements for Comment

Comments

@jamengual
Contributor

jamengual commented Mar 20, 2023

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Describe the user story

As a user, I would like to detect drift in my infrastructure automatically. Atlantis could detect drift by running a plan for every project defined in my atlantis.yaml file against the main branch and creating a PR for each project that has pending changes.

Describe the solution you'd like
I would like to be able to enable drift detection in Atlantis, using a configurable schedule or cron job that runs plan against every project defined in my atlantis.yaml on the main branch and creates one or more PRs for the projects where changes were found. I would also like to configure whether those drift PRs are applied automatically or require human intervention, and to have Slack alerting.

Describe the drawbacks of your solution
The API might need some adjustments to make this possible.
https://www.runatlantis.io/docs/api-endpoints.html

It will need to be compatible with GitHub, GitLab, and Bitbucket, but support can be released incrementally.

Atlantis does not currently create PRs, so that will have to be implemented to make this work, or something could be added to the UI to manage the drift feature.

Describe alternatives you've considered
There is already a GitHub Action implementation of this, which I have tested and which works:
https://github.com/cresta/atlantis-drift-detection

It requires two actions and depends on actions that are not well known, so it would be ideal to implement this inside Atlantis instead of relying on separate actions to do the job.

@jamengual jamengual added the feature New functionality/enhancement label Mar 20, 2023
@motatoes

```
The API might need some adjustments to make this possible.
https://www.runatlantis.io/docs/api-endpoints.html
```

Why would it be needed?

From my understanding, this adds extra configuration flags to atlantis.yaml. When Atlantis picks them up, it can run the jobs on a cron schedule in its own backend and create a PR every time there is drift.

Interaction with the PR itself then follows the usual Atlantis flow, right?

@nitrocode
Member

nitrocode commented Mar 20, 2023

why would it be needed?

I believe you're correct. If drift detection is built-in then Atlantis won't need to hit its own API.


Features I'd like to see


I kind of like having this feature outside of Atlantis since it's less to maintain. It would be cool to maintain a GitHub Action like the one you linked to and make better use of the API.

I wonder if we could take advantage of renovatebot when hitting the Atlantis API?

https://github.com/renovatebot/renovate

If we went this route, then API changes might be needed.


If we were to build this in, then it would be nice to have a couple of settings available in the server configuration

```
ATLANTIS_DRIFT_DETECTION=true
# daily at 8am
ATLANTIS_DRIFT_DETECTION_CRON="0 8 * * *"
```

and in the atlantis.yaml config

```
# repo global to override server config
drift_detection:
  enabled: true
  cron: "0 9 * * *"
projects:
  - name: ue1-dev-ecs-service-titan
    dir: components/terraform/ecs-service
    workspace: ue1-dev-ecs-service-titan
    # per project override
    drift_detection:
      enabled: false

  - name: ue1-dev-ecs-service-metro
    dir: components/terraform/ecs-service
    workspace: ue1-dev-ecs-service-metro
    drift_detection:
      enabled: true
      cron: "0 9 * * *"
```
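To make the override order concrete, here is a minimal sketch of how the three levels might be merged. Nothing here exists in Atlantis: the `DriftSettings` type and the `resolve` helper are hypothetical, and the precedence (project over repo over server) is only inferred from the comments in the proposed config.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DriftSettings:
    """One layer of drift settings; None means "not set at this level"."""
    enabled: Optional[bool] = None
    cron: Optional[str] = None


def resolve(server: DriftSettings, repo: DriftSettings, project: DriftSettings) -> DriftSettings:
    """Merge the layers; the most specific layer that sets a field wins (assumed order)."""
    def pick(name: str):
        for layer in (project, repo, server):
            value = getattr(layer, name)
            if value is not None:
                return value
        return None

    return DriftSettings(enabled=pick("enabled"), cron=pick("cron"))


# Values mirror the proposed server and atlantis.yaml settings.
server = DriftSettings(enabled=True, cron="0 8 * * *")
repo = DriftSettings(enabled=True, cron="0 9 * * *")
titan = resolve(server, repo, DriftSettings(enabled=False))
metro = resolve(server, repo, DriftSettings(enabled=True, cron="0 9 * * *"))
print(titan)
print(metro)
```

With this order, a project that only sets `enabled: false` still inherits the repo-level cron, which keeps re-enabling it a one-line change.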

Atlantis would need to skip locking while it runs plans for each directory, or it may unintentionally block developer flow.

If drift was detected (the plan contains changes), then for Atlantis to open a PR it would have to modify a file in the directory with some commented metadata.

Perhaps a drift.tf or similar per directory, which could be appended to or overwritten whenever drift is detected

```
# atlantis detected changes on 2023-03-20T12:42:14+00:00
```

Once the file is modified or added, a PR can be created.
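As a rough sketch of the marker-file idea, the helper below overwrites a per-directory file with a timestamped comment. The function name `write_drift_marker` is hypothetical, `drift.tf` is just the file name floated above, and the overwrite-rather-than-append behaviour is an assumption.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

MARKER_NAME = "drift.tf"  # file name from the suggestion above, not an Atlantis convention


def write_drift_marker(project_dir: Path, now: Optional[datetime] = None) -> Path:
    """Overwrite the marker file with a comment recording when drift was seen."""
    now = now or datetime.now(timezone.utc)
    # Drop microseconds so the stamp looks like 2023-03-20T12:42:14+00:00.
    stamp = now.replace(microsecond=0).isoformat()
    marker = project_dir / MARKER_NAME
    marker.write_text(f"# atlantis detected changes on {stamp}\n", encoding="utf-8")
    return marker
```

Because the file contains only a comment, committing it changes the directory's contents without changing the Terraform configuration itself.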

@nitrocode
Member

This may be a duplicate of #1035

@jamengual
Contributor Author

As @nitrocode explained, there are two paths for this. Initially I thought about doing it internally first, so we can get that working and stable, and then adding changes to the API (PR with no changes, no locking, etc.) so that any webhook-type system can trigger the drift detection and the user can decide how to create and handle the reconcile PRs. That way users have more control over how to deal with changes. The reason is that I can see many users preferring to trigger this by other means because of control policies, auditing, security scanning, etc.

@motatoes

Thanks @nitrocode @jamengual. I'm going to implement an initial version based on the config example above and will not worry about API changes for this one. Then we can iterate from there.

server

```
ATLANTIS_DRIFT_DETECTION=true
# daily at 8am
ATLANTIS_DRIFT_DETECTION_CRON="0 8 * * *"
```

and atlantis.yml

```
# repo global to override server config
drift_detection:
  enabled: true
  cron: "0 9 * * *"
projects:
  - name: ue1-dev-ecs-service-titan
    dir: components/terraform/ecs-service
    workspace: ue1-dev-ecs-service-titan
    # per project override
    drift_detection:
      enabled: false

  - name: ue1-dev-ecs-service-metro
    dir: components/terraform/ecs-service
    workspace: ue1-dev-ecs-service-metro
    drift_detection:
      enabled: true
      cron: "0 9 * * *"
```

@jamengual
Contributor Author

sounds good to me

@motatoes motatoes mentioned this issue Mar 26, 2023
11 tasks
@nitrocode
Member

I'm thinking that if this is built into Atlantis, it may overload this single-threaded machine, so I'm for the option to do drift detection as a helper outside of Atlantis.

If we can support a single atlantis plan run from the CLI and comment somewhere (like Slack or any webhook), then we can do the following

Use a workflow or k8s cron to

  1. Get all projects/dirs and start a loop
  2. Run atlantis plan locally for the current project/dir
    • then it should run the plan for a specific project
  3. If a plan shows changes, hit the webhook (which could be Slack or other) with some templated text
  4. Repeat from step 2

I believe the above is basically https://github.com/cresta/atlantis-drift-detection
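The loop above could be sketched roughly as follows. Since Atlantis has no local `atlantis plan` CLI today, this stand-in calls `terraform plan -detailed-exitcode` directly, where exit code 2 means the plan succeeded with changes pending. The function names and the message text are hypothetical, and the notifier is injected so it can be a Slack call, a generic webhook, or just `print`.

```python
import subprocess
from typing import Callable, Iterable, List


def plan_has_changes(project_dir: str) -> bool:
    """Return True when `terraform plan` reports pending changes (exit code 2)."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=project_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode == 1:  # exit code 1 means the plan itself failed
        raise RuntimeError(f"plan failed in {project_dir}: {result.stderr.strip()}")
    return result.returncode == 2


def detect_drift(
    project_dirs: Iterable[str],
    has_changes: Callable[[str], bool] = plan_has_changes,
    notify: Callable[[str], None] = print,
) -> List[str]:
    """Plan each project in turn and notify once per drifted project."""
    drifted = []
    for project_dir in project_dirs:
        if has_changes(project_dir):
            notify(f"Drift detected in {project_dir}")
            drifted.append(project_dir)
    return drifted
```

Running this from a scheduled workflow or a k8s CronJob keeps the planning load off the Atlantis server, which is the point of the external-helper option.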

@jamengual
Contributor Author

jamengual commented Apr 1, 2023 via email

@jukie
Contributor

jukie commented Apr 1, 2023

I recently built a similar version of https://github.com/cresta/atlantis-drift-detection for GitLab, with the intention of making the VCS client bits pluggable and reducing lock-in to a specific service. I need to clean it up before making it public, but if that helps, maybe we could create a separate repo for a drift-detector service under Atlantis which could run alongside.

Based on a scheduled pipeline it will:

  1. Run an API plan against all projects (supports explicitly including/excluding projects)
  2. If drift is found, opens a PR with a change to a dummy file (currently commits a timestamp to file)
  3. Comments atlantis plan project1|project2|project3 based on drifted projects
  4. Has support for adding assignees/reviewers

Personally I like having it as a scheduled GitLab pipeline, but that could easily be extended to a long-running service or another form of trigger. We can start with an opinionated design but still allow for user choice.
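A minimal sketch of the include/exclude and comment steps might look like this. The helper names are hypothetical and the glob-style patterns are an assumption about how filtering could work; `atlantis plan -p <project>` is the existing Atlantis comment command for planning a named project, used here with one comment per drifted project.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence


def select_projects(
    projects: Iterable[str],
    include: Sequence[str] = ("*",),
    exclude: Sequence[str] = (),
) -> List[str]:
    """Keep projects matching any include pattern and no exclude pattern."""
    return [
        name
        for name in projects
        if any(fnmatchcase(name, pattern) for pattern in include)
        and not any(fnmatchcase(name, pattern) for pattern in exclude)
    ]


def plan_comments(drifted: Iterable[str]) -> List[str]:
    """Build one Atlantis plan comment per drifted project."""
    return [f"atlantis plan -p {name}" for name in drifted]


projects = ["ue1-dev-ecs-service-titan", "ue1-dev-ecs-service-metro"]
checked = select_projects(projects, exclude=["*-titan"])
print(plan_comments(checked))
```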

@jamengual
Contributor Author

jamengual commented Apr 1, 2023 via email

@nitrocode
Member

nitrocode commented Apr 2, 2023

I don't know if I'd want it to create a pull request. I'd rather it hit a webhook (e.g. mention the drift in Slack with a link) with custom data. The link could navigate to the project in the UI and show a status of drifted. The UI could expose an apply button to run terraform apply to resolve the drift. This is how other SaaS platforms do it.
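For the webhook option, a sketch using only the standard library could look like the following. Slack incoming webhooks accept a JSON body with a `text` field; the function names, the message wording, and the idea of linking to a project page in the UI are all assumptions for illustration.

```python
import json
import urllib.request


def build_drift_request(webhook_url: str, project: str, link: str) -> urllib.request.Request:
    """Build (but do not send) a Slack-style webhook POST announcing drift."""
    payload = {"text": f"Drift detected in project {project}: {link}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes the payload easy to template and to check without touching the network.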

Ideally each run of Atlantis would be in a separate run so the server isn't overloaded.

@jukie
Contributor

jukie commented Apr 15, 2023

Still WIP, but I've made this, which leaves the orchestration up to the user: https://github.com/jukie/atlantis-drift-detection

@GenPage GenPage added needs discussion Large change that needs review from community/maintainers rfc Requirements for Comment labels Apr 16, 2023
@sadminriley
Copy link

Any updates on this one? Would be super nice to have this in Atlantis. I noticed this has been outstanding for quite some time - #3269

@jamengual
Contributor Author

Sadly, both developers who volunteered to build this feature never replied, so it is on pause.

We need committed community contributors to make this happen and hopefully supported by their companies to do so.

@jamengual jamengual added the help wanted Good feature for contributors label Sep 11, 2023
@marcportabellaclotet-mt

I have also been working on a custom tool to manage drift.
It is mainly focused on GitHub and works by auto-discovering the repositories. The list of repositories to check can be passed as an argument if needed.

  • Uses the GitHub API to discover the repositories where the GitHub app is installed.
  • Compares the list of repositories with the repos.yaml config file to decide which repositories to clone.
  • Clones these repos into a temporary folder.
  • Gets the pre-workflow-hook steps from the repos.yaml config file and executes them in the downloaded folder for each git repo.
  • In my use case, we generate the yaml dynamically using atlantis-yaml-generator, so the hook step is needed to mimic the Atlantis server behavior.
  • Later it parses the atlantis.yaml file, gets the correct workflow, and executes the plan workflow in the downloaded folder for each repo.
  • Runs an additional tfshow command to read the result, and checks for a marker which identifies the drift or no-drift result.
  • If there is drift, it optionally sends a Slack notification with drift details or workflow plan error details.
  • Projects to be checked can be included or excluded.
  • Each action is logged, so it is easy to use a logging system to generate monitors.
  • It runs on the same Atlantis server, via a cron job. I have created a custom Atlantis container image with dcron enabled.

It works quite well, but it lacks some functionality like checking for atlantis locks. I will work on this.

But as shared here, maybe it would be a better approach to integrate this feature into Atlantis core. The related PR in this thread seemed promising, but unfortunately it was discontinued.
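One way to implement the "read the plan result" step is sketched below. It does not reproduce the marker check described above; instead it swaps in a parse of the JSON that `terraform show -json <planfile>` prints, where each entry in `resource_changes` carries a `change.actions` list. Treating `["no-op"]` and `["read"]` as "no drift" is an assumption, and the resource addresses in the sample are made up.

```python
import json
from typing import List

# Action lists treated as "nothing to reconcile" (an assumption for this sketch).
IGNORED_ACTIONS = (["no-op"], ["read"])


def drifted_addresses(plan_json: str) -> List[str]:
    """Return addresses of resources whose planned actions would change something."""
    plan = json.loads(plan_json)
    return [
        change["address"]
        for change in plan.get("resource_changes", [])
        if change["change"]["actions"] not in IGNORED_ACTIONS
    ]


# Hypothetical, heavily trimmed plan output for illustration only.
sample = json.dumps({
    "resource_changes": [
        {"address": "aws_ecs_service.titan", "change": {"actions": ["update"]}},
        {"address": "aws_ecs_service.metro", "change": {"actions": ["no-op"]}},
    ]
})
print(drifted_addresses(sample))
```

Returning the addresses rather than a boolean gives the notification step something specific to report.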

@motatoes

Hi guys, I'm sorry for the silence here over the last couple of weeks. I got caught up with work and couldn't give the PR much attention. I'm going to take a look at it over the next couple of days during the New Year holidays, so I hope to make good progress on it :)

@motatoes motatoes linked a pull request Jan 1, 2024 that will close this issue
11 tasks
@gaurav517

It would be so nice to have this feature.

@PScoriae

doing my part to show interest :)

@seifrajhi

+1
Looking forward to seeing this feature implemented soon! :)

@djsingh23

Will this feature be available anytime soon?

@jamengual
Contributor Author

jamengual commented May 2, 2024 via email

@nitrocode
Member

Another way to set this up is similar to how atmos does it: it runs the plans across root dirs and opens GitHub issues when drift is found.

https://atmos.tools/integrations/github-actions/atmos-terraform-drift-detection/

@raghulkrishna

+1
