Renew SCEP certificate for hosts w/ old (non-Fleet) enrollment profile #19800

Closed
9 tasks
roperzh opened this issue Jun 17, 2024 · 30 comments
Assignees
Labels
customer-rosner customer-starchik #g-mdm MDM product group P2 Prioritize as urgent :product Product Design department (shows up on 🦢 Drafting board) story A user story defining an entire feature
Milestone

Comments

@roperzh
Member

roperzh commented Jun 17, 2024

Goal

User story
As an organization that automatically migrated my workstations (#19387) from my old MDM solution to Fleet,
I want to renew the SCEP certificates on my hosts
so that I know MDM features (commands, configuration profiles, etc.) will work for these hosts.

Context

To renew SCEP certificates, we send an InstallProfile command with Fleet's enrollment profile to the devices.

Hosts that migrated using "Process for self-hosted macOS MDM migration to Fleet" (#19387), will have a different enrollment profile (one from the old MDM solution), so the InstallProfile command will fail and the SCEP certificate won't be renewed.
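For readers unfamiliar with the command, here is a minimal sketch of what an InstallProfile command plist looks like when built by hand. The function name is hypothetical and this is not Fleet's implementation; it only illustrates the standard MDM command shape (`CommandUUID`, `RequestType`, `Payload`).

```python
import plistlib
import uuid


def build_install_profile_command(profile_bytes: bytes) -> bytes:
    """Wrap a raw enrollment profile in an MDM InstallProfile command plist.

    Hypothetical helper for illustration only, not Fleet's code.
    """
    command = {
        "CommandUUID": str(uuid.uuid4()),
        "Command": {
            "RequestType": "InstallProfile",
            # plistlib serializes bytes values as base64 <data> elements.
            "Payload": profile_bytes,
        },
    }
    return plistlib.dumps(command, fmt=plistlib.FMT_XML)
```

The device installs whatever profile is carried in `Payload`, which is why the contents of the enrollment profile matter for hosts that still have the old MDM's profile.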

Changes

Product

  • Fleet server changes: Add a job that checks for hosts with SCEP certs that are due to expire in the next 30 days (same interval as the existing job for sending a new Fleet enrollment profile). The job delivers a new enrollment profile to these hosts, which tells each host to get a new SCEP cert.
    • The enrollment profile (XML) will be a hidden (documented in config for contributors) environment variable.
  • Research:
    • Confirm we can point hosts to Fleet's built-in certificate authority to issue the new SCEP certificates. The simplest/preferred approach is to update the SCEP URL in the enrollment profile to point to Fleet.
    • ✅ Get the customer's current enrollment profile
  • Migration script: Get SCEP expiration date for each device
  • REST API changes: No API changes
  • Outdated documentation changes: Document the hidden config in the configuration for contributors docs
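
As a rough sketch of the selection step in the job described above, assuming the expiration dates collected by the migration script are available per host. `HostCert`, `host_uuid`, and `hosts_due_for_renewal` are hypothetical names, not Fleet's schema or code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# From the issue description: certs "due to expire in the next 30 days".
RENEWAL_WINDOW = timedelta(days=30)


@dataclass
class HostCert:
    host_uuid: str             # hypothetical identifier for the host
    not_valid_after: datetime  # SCEP certificate expiration


def hosts_due_for_renewal(certs, now, window=RENEWAL_WINDOW):
    """Return host UUIDs whose cert expires at or before now + window.

    Already-expired certs are included too; a real job may want to
    treat those differently.
    """
    cutoff = now + window
    return [c.host_uuid for c in certs if c.not_valid_after <= cutoff]
```

Each host returned here would then receive the configured enrollment profile via an InstallProfile command.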

Engineering

  • Database schema migrations: TODO
  • Load testing: TODO

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

QA

Risk assessment

  • Requires load testing: TODO
  • Risk level: Low / High TODO
  • Risk description: TODO

Manual testing steps

  1. Step 1
  2. Step 2
  3. Step 3

Testing notes

Confirmation

  1. Engineer (@____): Added comment to user story confirming successful completion of QA.
  2. QA (@____): Added comment to user story confirming successful completion of QA.
@roperzh roperzh added story A user story defining an entire feature :product Product Design department (shows up on 🦢 Drafting board) customer-rosner labels Jun 17, 2024
@dherder
Contributor

dherder commented Jun 18, 2024

@roperzh do we know what the SCEP certificate lifespan is for the customer devices? I do know that some MDM systems will set this to a long lived value like 2099, so in those cases it would not be an issue. If the lifespan of the certificate is short lived, I would say that this would be a P2 blocker issue.

@dherder dherder added customer-starchik P2 Prioritize as urgent labels Jun 18, 2024
@roperzh
Member Author

roperzh commented Jun 18, 2024

@dherder good point! we should check with them, I know that micromdm/scep uses 1 year by default (-crtvalid flag) so unless they provided a custom value there, it's 1 year

@zayhanlon
Contributor

zayhanlon commented Jun 18, 2024

@lukeheath @noahtalerman per the process, letting you know that we have this as workflow/migration blocking and added the p2 label. let me know if anything else needs to be done to escalate

@georgekarrv georgekarrv added the #g-mdm MDM product group label Jun 18, 2024
@lukeheath
Member

@zayhanlon P2 makes sense to me. Our response for P2 is:

Response: Issue is prioritized at the top of the next sprint. If the opportunity cost of waiting for the next sprint is too high, it may be considered for the current sprint.

We'll prioritize this for next sprint, which is scheduled to ship 7/15. Is that soon enough?

@noahtalerman @georgekarrv

@zayhanlon
Contributor

zayhanlon commented Jun 18, 2024

@lukeheath @georgekarrv @noahtalerman - there's a thread going in #g-customer-success https://fleetdm.slack.com/archives/C062D0THVV1/p1718733547340419?thread_ts=1718384351.332159&cid=C062D0THVV1

This new issue was surfaced by Roberto this week but is also migration blocking. I don't think 7/15 will work - any way to get it faster or patched sooner? 
 

@zwass @dherder FYI

@roperzh
Member Author

roperzh commented Jun 18, 2024

I made it a story so it gets product feedback. The reason is that I personally only see three ways to accomplish this:

  1. We change how certificate renewals work to account for hosts with custom enrollment profiles
  2. We build some product feature that allows them/us to build a flow to renew certificates (eg: a webhook, a config in the UI)
  3. We build a script that issues cert renewals that lives outside Fleet

@noahtalerman noahtalerman self-assigned this Jun 19, 2024
@noahtalerman noahtalerman changed the title handle device SCEP renewals for self-hosted macOS MDM migration Renew SCEP certificate for hosts w/ old (non-Fleet) enrollment profile Jun 19, 2024
@noahtalerman
Member

Thanks @roperzh!

I threw some time on your calendar to dig into the options.

@roperzh
Member Author

roperzh commented Jun 19, 2024

we met with @noahtalerman and decided to do option 3 as a first baby step:

We build a script that issues cert renewals that lives outside Fleet

I think this requires 3 action items:

  1. Investigate if we can use Fleet's built-in CA to issue the new SCEP certificates (@roperzh)
  2. Get required information from the customer:
    a. Expiration date for each device. This could be part of the export script (cc: @zwass)
    b. Current enrollment profile (cc: @zwass @dherder)
  3. Define where this service will be hosted, could probably live alongside the proxy? (cc: @zwass @dherder)

cc: @zayhanlon

@noahtalerman
Member

Thanks @roperzh!

Define where this service will be hosted, could probably live alongside the proxy?

I think we decided to go with fleetdm.com instead of standing up a separate service. Why? So we can reduce surface area and improve understandability for Fleet contributors.

If this doesn't work please let me know.

I think this means that the enrollment profile (XML) will live as an environment variable in Heroku. We'll probably need @eashaw's help to add that variable.

I updated the issue description to reflect this.

@zwass
Member

zwass commented Jun 19, 2024

I have several questions:

  1. Who at Fleet will write the code? Who will maintain the code?
  2. Where will the server be hosted? Who will be responsible for maintaining it? Alongside the proxy does not sound like a good option as we are currently doing that in the solutions consulting AWS and the server is intended to only live for a couple weeks while the migration is completed (see Process for self-hosted macOS MDM migration to Fleet #19387). This server seems like it needs to run indefinitely (until we make a more long-term feature for this?)
  3. How will the scripts be triggered? Is this something that the server becomes responsible for?

@zwass
Member

zwass commented Jun 19, 2024

I think we decided to go with fleetdm.com instead of standing up a separate service. Why? So we can reduce surface area and improve understandability for Fleet contributors.

Does this mean we would be putting customer SCEP cert/keys into fleetdm.com? That sounds pretty risky to me as I'm not aware that fleetdm.com has been designed/audited for storage of customer data (let alone important customer secrets).

Or maybe we are just talking about using fleetdm.com to trigger script execution for the hosts that are expiring? That seems potentially less risky but still something that would need to be well-understood. Would it require API keys for customer Fleet servers?

@noahtalerman
Member

Does this mean we would be putting customer SCEP cert/keys into fleetdm.com?

@zwass I don't think so. The enrollment profile would be an environment variable in Heroku. Once the enrollment profile is delivered the host will get the new SCEP cert from the Fleet server

Would it require API keys for customer Fleet servers?

I think so, yes. We need the API key to deliver the enrollment profile via the Fleet API. This can be stored as an environment variable in Heroku.

@roperzh please correct me if I'm wrong.

@roperzh
Member Author

roperzh commented Jun 19, 2024

Who at Fleet will write the code? Who will maintain the code?
Where will the server be hosted? Who will be responsible for maintaining it?

who's the right person to answer this? don't want it to get lost in the convo

How will the scripts be triggered? Is this something that the server becomes responsible for?

some process needs to run at an interval and send commands, we were thinking this separate server (let's say fleetdm.com) would do it

the challenge of building the functionality directly into Fleet is related to crafting the right enrollment profile, we thought that having a separate service gives us freedom to hardcode the profile to the customer's needs.

@noahtalerman maybe the profile could be provided to Fleet itself as a hidden config?


@zwass another option I just thought of: what if the proxy enqueues the command (using Fleet's API) to renew the SCEP certificate the first time it redirects a host to Fleet? this gives us 1 year to properly solve this problem.

@noahtalerman
Member

noahtalerman commented Jun 19, 2024

Who at Fleet will write the code? Who will maintain the code?

It's on the drafting board w/ the #g-mdm label. I think let's treat this as all other user stories at Fleet: bring it through estimation and into the next sprint.

Since it sounds like the next release (2024-07-15) isn't fast enough, I started a thread in #g-mdm in Slack here (internal) to chat about priority.

maybe the profile could be provided to Fleet itself as a hidden config?

@roperzh good idea. But is this because of a limitation of Heroku? If not, in order to move quickly, I think let's move forward with the current plan in the issue description.

If folks disagree, please jump in to tomorrow's MDM design review to discuss.

Once we know what the enrollment profile will look like, we can get @eashaw's help to test. If we learn that using fleetdm.com won't work due to a Heroku limitation then I think we come back to other options.

@roperzh
Member Author

roperzh commented Jun 19, 2024

@roperzh good idea. But is this because of a limitation of Heroku? If not, in order to move quickly, I think let's move forward with the current plan in the issue description.

If folks disagree, please jump in to tomorrow's MDM design review to discuss.

@noahtalerman sounds good! yeah, not a limitation with Heroku, but it might be simpler to run the cron in Fleet because:

  1. It's a single server we have to worry about
  2. It's very easy to add the expiration of the certs to the Fleet DB during the initial migration (vs having a separate db in fleetdm.com)

@zwass
Member

zwass commented Jun 19, 2024

what if the proxy enqueues the command (using Fleet's API) to renew the SCEP certificate the first time it redirects a host to Fleet?

This seems possible. Currently there is no state maintained within the migration proxy, but state could be added.
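
A minimal sketch of what "state could be added" might look like, assuming the proxy can identify the host on each request. The class name and the `enqueue_renewal` callback are hypothetical; the paths mirror the ones used in the research described later in this thread, and the in-memory set stands in for whatever persistent store a real proxy would need.

```python
# Paths the proxy redirects, and the Fleet endpoint they map to.
REDIRECTED_PATHS = {"/mdm/checkin", "/mdm/connect"}
FLEET_MDM_PATH = "/mdm/apple/mdm"


class MigrationProxyState:
    """Tracks which hosts have already had a renewal enqueued."""

    def __init__(self, enqueue_renewal):
        # Hypothetical callback that would call Fleet's API to enqueue
        # the renewal command for a host.
        self._enqueue_renewal = enqueue_renewal
        self._seen = set()  # in-memory only; lost on restart

    def route(self, path, host_uuid):
        """Return the upstream path, enqueuing a renewal on first redirect."""
        if path not in REDIRECTED_PATHS:
            return path
        if host_uuid not in self._seen:
            self._seen.add(host_uuid)
            self._enqueue_renewal(host_uuid)
        return FLEET_MDM_PATH
```

With this shape the renewal is enqueued exactly once per host, no matter how many times the host checks in through the proxy.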

@georgekarrv
Member

Please add your planning poker estimate with Zenhub @jahzielv

@JoStableford
Contributor

@georgekarrv georgekarrv added :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. and removed :product Product Design department (shows up on 🦢 Drafting board) labels Jun 24, 2024
@roperzh roperzh assigned roperzh and unassigned georgekarrv Jun 24, 2024
@georgekarrv georgekarrv added this to the 4.54.0-tentative milestone Jun 25, 2024
@roperzh
Member Author

roperzh commented Jun 27, 2024

As part of the research for this ticket I:

  1. set up a MicroMDM server behind ngrok using https://roperzh-micromdm.ngrok.io
  2. executed the database statements from the SQL generated by tweaks for SCEP renewals #20035
  3. changed my ngrok config to point https://roperzh-micromdm.ngrok.io to a small proxy that redirects requests from /mdm/checkin and /mdm/connect to my Fleet server /mdm/apple/mdm
  4. downloaded an enrollment profile from my Fleet server, and made the following changes:
    1. change ServerURL to be https://roperzh-micromdm.ngrok.io/mdm/connect (keep the old MicroMDM server URL)
    2. add a CheckInURL next to ServerURL with the value https://roperzh-micromdm.ngrok.io/mdm/checkin
    3. change the root PayloadIdentifier of the profile to be com.github.micromdm.micromdm.enroll
  5. sent an InstallProfile command using the enrollment profile payload
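
The profile changes in step 4 can be sketched with Python's `plistlib`, assuming the downloaded profile is an unsigned XML plist (a signed profile would need to be unwrapped first). The function name and parameters are hypothetical; the keys (`PayloadIdentifier`, `PayloadContent`, `ServerURL`, `CheckInURL`) and the `com.apple.mdm` payload type are the standard configuration profile ones.

```python
import plistlib

MDM_PAYLOAD_TYPE = "com.apple.mdm"


def adapt_enrollment_profile(profile_xml, server_url, checkin_url, root_identifier):
    """Rewrite a Fleet enrollment profile to keep the old MDM's URLs and identifier.

    Hypothetical helper for illustration only.
    """
    profile = plistlib.loads(profile_xml)
    # Step 4.3: the root identifier must match the already-installed profile.
    profile["PayloadIdentifier"] = root_identifier
    for payload in profile.get("PayloadContent", []):
        if payload.get("PayloadType") == MDM_PAYLOAD_TYPE:
            # Steps 4.1 and 4.2: keep the old server and check-in URLs.
            payload["ServerURL"] = server_url
            payload["CheckInURL"] = checkin_url
    return plistlib.dumps(profile, fmt=plistlib.FMT_XML)
```

Other payloads in the profile, including the SCEP payload that points at Fleet's CA, are left untouched.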

I verified that:

  1. The SCEP cert was renewed (🎉)
  2. The new certificate was issued by Fleet's CA (so the customer doesn't need to keep their old CA around)
  3. The enrollment profile in System Settings > Profiles now shows the host as enrolled by Fleet
image

Action items and stuff to coordinate on:

@zwass
Member

zwass commented Jun 27, 2024

@roperzh are you saying you got the enrollment profile replaced without user intervention? I'm not sure I understand how this experiment is connected with the touchless migration experience we are working on with customers.

@roperzh
Member Author

roperzh commented Jun 27, 2024

@zwass sorry for not being clear. This is to renew SCEP certificates for migrated devices (which is done by re-delivering the enrollment profile)

The enrollment profile was almost replaced, but three things need to be kept in our particular case:

1. change ServerURL to be https://roperzh-micromdm.ngrok.io/mdm/connect (keep the old MicroMDM server URL)
2. add a CheckInURL next to ServerURL with the value https://roperzh-micromdm.ngrok.io/mdm/checkin
3. change the root PayloadIdentifier of the profile to be com.github.micromdm.micromdm.enroll

@zwass
Member

zwass commented Jun 27, 2024

Ah, so enrollment profiles can be redelivered without user intervention as long as the ServerURL and CheckInURL don't change?

@roperzh
Member Author

roperzh commented Jun 27, 2024

@zwass exactly! in my notes I have this as the full list of things that can't change:

  • PushTopic
  • ServerURL
  • CheckInURL
  • Enrollment type
  • Access rights
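
A quick way to sanity-check a candidate profile against this list is sketched below. It assumes both MDM payloads have already been parsed into dicts; `changed_immutable_keys` is a hypothetical helper. The push topic is stored under the `Topic` key in the MDM payload, and enrollment type is not a single key, so it is not covered here.

```python
# MDM payload keys that must match the installed profile for a silent
# re-install. "Topic" is the payload key holding the push topic.
IMMUTABLE_MDM_KEYS = ("Topic", "ServerURL", "CheckInURL", "AccessRights")


def changed_immutable_keys(old_mdm, new_mdm):
    """Return the immutable keys whose values differ between two MDM payloads."""
    return [k for k in IMMUTABLE_MDM_KEYS if old_mdm.get(k) != new_mdm.get(k)]
```

An empty result means none of the checked keys changed; a key present in only one payload counts as a change.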

I think the really important findings for us are:

  1. We can switch to a different CA for SCEP certs (in this case Fleet's built-in CA)
  2. We can renew SCEP certificates for migrated devices seamlessly

@zayhanlon
Contributor

@roperzh how are we doing on target ETA to get this in a patch next week? thanks :D

@roperzh
Member Author

roperzh commented Jun 28, 2024

@zayhanlon thanks for checking, still on track! but please note that the issue w/profiles is probably a bigger blocker. This one is mainly a blocker for the prod deploy, while the profiles issue is limiting their testing in staging prior to any production changes.

@zayhanlon
Contributor

@roperzh yup! i'm on it - discussing with Noah today

roperzh added a commit that referenced this issue Jul 3, 2024
for #19800

the motivation behind these changes is to support certificate renewals
for hosts that were migrated by inserting enrollment records via a
database migration.

those hosts still have their old enrollment profile installed, so SCEP
renewals need to be handled carefully.


# Checklist for submitter

If some of the following don't apply, delete the relevant line.

<!-- Note that API documentation changes are now addressed by the
product design team. -->

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://fleetdm.com/docs/contributing/committing-changes#changes-files)
for more information.
- [x] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (using placeholders for values in statements)
- [x] Added/updated tests
- [x] If database migrations are included, checked table schema to
confirm autoupdate
- For database migrations:
- [x] Checked schema for all modified table for columns that will
auto-update timestamps during migration.
- [x] Confirmed that updating the timestamps is acceptable, and will not
cause unwanted side effects.
- [x] Ensured the correct collation is explicitly set for character
columns (`COLLATE utf8mb4_unicode_ci`).
- [x] Manual QA for all new/changed functionality
@georgekarrv georgekarrv added :demo and removed :demo labels Jul 12, 2024
@PezHub
Contributor

PezHub commented Jul 15, 2024

Paired w/ Roberto to test on his local MicroMDM server setup to ensure the workflow succeeded.

@lukeheath lukeheath added :product Product Design department (shows up on 🦢 Drafting board) and removed :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. labels Jul 17, 2024
@noahtalerman
Member

Hey @zayhanlon, this customer-rosner request was shipped in Fleet 4.54.

No docs needed. Customers expect the MDM solution to handle SCEP renewal behind the scenes.

@fleet-release
Contributor

Old profiles renew,
Fleet's secure touch in the clouds,
Peace in tech ensues.


10 participants