Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: O3DE Deprecation Strategy #50

Closed
Tracked by #39 ...
hultonha opened this issue Apr 14, 2022 · 13 comments
Closed
Tracked by #39 ...

RFC: O3DE Deprecation Strategy #50

hultonha opened this issue Apr 14, 2022 · 13 comments
Labels
rfc-feature Request for Comments for a Feature

Comments

@hultonha
Copy link
Contributor

hultonha commented Apr 14, 2022

Summary

O3DE needs a deprecation strategy to allow for APIs to be safely changed over time as well as entire features or systems to be removed.

Today there is no specific guidance around how to handle deprecation (current or future breaking changes).

To ensure stability and long-term support (LTS), clear definitions for how deprecation should be performed need to be created.

What is the relevance of this feature?

"Change is the only constant in life."

Heraclitus

Why is this important?

Having a policy in place for handling deprecation is incredibly important to help ensure necessary updates and changes to O3DE do not needlessly break downstream developers. Not doing this could hurt adoption and cause attrition over time.

What are the use cases?

Changes to public APIs can be manged more deliberately and guidance can be provided around how to remove larger features (how to perform this, how to communicate this etc...).

What will it do once completed?

Give developers guidance about how to change and remove features without negatively impacting customers. There should be a framework/run-book to follow for a variety of different situations to help steer developers down the right path.

Feature design description

A deprecation guide/policy will be created for developers to follow when making changes to O3DE. This guide will most likely exist on the O3DE wiki and will be a living document that grows and evolves overtime to meet the needs of O3DE and its customers.

Technical design description

Note: The technical design comes in part from PR-23 which was a first attempt at setting out guidance around how to handle deprecation.

Process Owner: sig-release

How to request changes
  - For a small change (e.g. spelling/grammar/formatting), please create a GitHub issue (or create a PR against the repo with the fix).
  - For a large change (e.g. changes to process), please create an RFC with the proposed change and reasons why (see RFC template for more details).

Deprecation Types
  - API Deprecation
  - Feature/System Deprecation

API Deprecation
  - Examples
    - Renaming a function
    - Renaming a class/EBus
    - Changing a function signature
    - Removing a function
  - Steps
    - Identify the code to be deprecated (something resembling one of the API deprecation examples outlined in the Overview)
    - Create a new GitHub issue with the name "Deprecate <Function/Type Name>"
      - Add the "kind/deprecation" label
      - In the Description, add a preliminary Release Note (what do you want to say to the customer about this change)
      - Add a milestone label for which release this deprecation will be part of
    - Add an `O3DE_DEPRECATION_NOTICE(<GHI Number>)` comment right above the API call
      - e.g.
        // O3DE_DEPRECATION_NOTICE(GHI-1234)
        AZ::Entity* CreateEditorEntity(const char* name) = 0;
    - Add an announcement to the Discord Impactful Change channel
  - (<additional guidance to follow, see https://github.com/o3de/sig-release/pull/23 for more details at time of publication>)
  - Tracking
    - sig-release can filter GHIs by date and label (kind/deprecation) to author Release Notes
  - Advice
    - Use of the `AZ_DEPRECATED` macro is not allowed when first deprecating an API
    - It is required to first use the comment identifier `O3DE_DEPRECATION_NOTICE`, and a second pass to add `AZ_DEPRECATED`
  - Note: In C++ code, `@deprecated` may be used to highlight deprecation changes in the API reference. This is not widely supported though and is not a requirement at this time

- System/Feature Deprecation
  - Create an RFC to notify the system to be deprecated
    - Include why it's being removed and identify alternatives if customers depend on X feature
    - As part of the RFC determine a deprecation schedule
    - Add who (e.g. sig responsible) to contact
    - Add an announcement to the Discord Impactful Change or relevant sig channel (when publishing the RFC and when it is either approved or rejected)
      - Include an announcement at initial proposal time and at the time of removal
    - Deprecations should also form part of the release notes and be highlighted in a features documentation
    - (<additional guidance to follow>)

- Philosophy
  - When modifying existing code that external customers rely on we should strive wherever possible to not break them when making changes.
  - If code is to be changed or removed, we should let customers know ahead of time with a reasonable notice period (approximately 1 releases (2-3 months))

- Migrations/Backwards Compatibility
  - For changes to underlying data formats, it is the developers responsibility (where possible) to provide a migration/upgrade process to allow customers to move to the new format with as little overhead as possible (e.g. Version Converters used with the Serialize Context and Object Stream in C++).
  - Note: This advice applies for changes as opposed to total removals. If it is not practical or possible to provide an automated conversion, guidance/steps should be documented to help users adjust to the change manually.
  - For more information on data migrations, please see this document `<GUIDE ON DATA MIGRATIONS - TBD>`

- Stable vs Volatile APIs
  - For new code with few users depending on it, much weaker guarantees can be made about breaking changes.
  - The source of truth for this is the SIG Feature Matrix for a given feature. If a new feature is under heavy development and no promises have yet been made about API stability, the deprecation policy does not apply. If however an API is mature and widely used, an unannounced breaking change is not acceptable. The policy of first documenting/announcing a deprecation, and then following-up with a change in future to give customers time to upgrade is required. 
  - Note: For improved visibility, owners of the SIG feature matrix should include file paths in the notes section that are explicitly excluded from the deprecation policy if the API is volatile.

What are the advantages of the feature?

We clearly define how developers should handle deprecation and set expectations from customers about the stability (in terms of change) of the engine they're using.

What are the disadvantages of the feature?

Being more careful and deliberate around deprecation does come with a cost to feature delivery. It will slow things down and potentially make making changes more difficult. This impacts internal O3DE development, and to an extent customers looking for future updates.

It is possible time may be wasted on unnecessary care given to deprecation that has minimal impact (if their are no downstream customers of a particular feature or API simply changing/deleting the code and sidestepping this whole process may be simpler/smarter).

Given the current lifetime of O3DE (less than a year after initial preview release) it may be too soon to commit to a deprecation policy (as few systems have fully settled/matured). This might be a policy that we define, but do not put into practice until churn on engine systems is a lot less.

How will this be implemented or integrated into the O3DE environment?

N/A

Are there any alternatives to this feature?

An alternative is to simply do nothing, but long term this will cause more harm than good (even factoring in the potential impact to velocity that comes with being more careful about making breaking changes).

It is likely the impact of constant breaking changes for customers will lead to frustration and them eventually leaving O3DE.

How will users learn this feature?

The guide will be highlighted on the O3DE wiki and publicized in Release Notes

Are there any open questions?

  • We still need to better define the deprecation steps (see Technical design description section)
  • When should we instate this policy
  • How does deprecation align with releases (especially given a slow release cadence)
@hultonha hultonha added the rfc-feature Request for Comments for a Feature label Apr 14, 2022
@lsemp3d
Copy link
Contributor

lsemp3d commented Apr 14, 2022

There are cases in which it may be necessary to maintain some backwards compatibility, in this cases it's necessary to implement version conversion to allow for data migration between versions. The deprecation strategy may need to provide a documentation on the deprecation mechanics (i.e. version converters). This is most often required when deprecating code that is reflected for scripting, but other elements require data migration across versions as well.

@lmbr-pip
Copy link
Contributor

I think we need some slightly clearer guidance around time frame for applying O3DE_DEPRECATION_NOTICE. For example, how long can an API be marked as deprecated? Reading between the lines is seems that we are saying you mark in one release and remove in the next, but its unclear if thats the expectation. In addion, devs currently do not know the release schedule for O3DE, solvable but we do need a clear source of truth here. Also is there a minimum deprecation notice period?

We should also probably call out that Feature deprecation should also be announced in the TSC channel and if cross-cutting its responsibility to get cross SIG sign-off.

@cgalvan
Copy link
Contributor

cgalvan commented Apr 14, 2022

This will hopefully be helpful in identifying the scope of users impacted by the deprecation of APIs/systems. In the case of newer APIs/systems, we would probably be able to iterate quicker (as you stated). Whereas with the older systems there is probably a mix of things that are simply now unused vs. ones that our users still rely on. This should give us the data to make informed decisions, rather than guessing at what things users are/aren't using.

@AMZN-daimini
Copy link
Contributor

Tangential to this, but I think we should also have a better way to handle editor versions as requirements for Gems activation. Right now, if an API is available up until (for example) v21.11 but has then been removed in 22.05.0, where is no way for Gems to enforce or even recommend a minimum or compatible version that could guide the user in understanding which version of a Gem will work with which version of the Editor. Guidelines for deprecation should likely fit into this picture as well.

@willihay
Copy link
Contributor

willihay commented Apr 14, 2022

The guide will be highlighted on the O3DE wiki and publicized in Release Notes

We should also reference this guide in the "code" section of the Contributor Guide.

Also, documentation should be addressed in this strategy. Considerations include:

  1. When should developers highlight a deprecation using Doxygen-style comments so that it appears in the API reference? What should be included? Should we have a standard format for this?
  2. Should we require deprecation notice banners or callouts in the documentation, where applicable - on feature docs and tutorials, for example? If we use deprecation notices in documentation, these should be regularly reviewed for accuracy, perhaps as part of the release process.
  3. If we do add deprecation notices to documentation, what happens when the status changes from a "notice" to a fully deprecated API or feature? Does the documentation just get updated and the banner or callout removed? Does this guidance change when we start adding branches in the website repo for released versions, in support of LTS?

@srikappa-amzn
Copy link

It'd be nice to have some more clarity on how long to wait between each step for the deprecation. For example, if I want to deprecate a functionA and move its dependencies to functionB, how long should I wait between different steps. If we decide releases to be the deciding factor, the deprecation schedule would look like:

  1. Add O3DE_DEPRECATION_NOTICE to functionA for ReleaseInMonthX (This will just be a comment on the function)
  2. Use AZ_DEPRECATED for functionA and direct functionality to functionB for ReleaseInMonthY(This can change expected behavior)
  3. Remove the code for functionA for ReleaseInMonthZ

Waiting for releases to remove code could be a lot of wait time particularly if we want to move to step 2 sooner because sometimes its necessary to change behavior to unblock other features. So maybe the recommended wait time between each step should be something like "'n' number of days or next release(whichever comes sooner)" ?

@hultonha
Copy link
Contributor Author

Hello everyone!

Sorry I'm just now responding to the RFC feedback, I really appreciate everyone who has so far taken a look and given feedback. I'll go through each person in turn...


@lsemp3d - backwards compatibility

That's a great call-out, it might not be something the deprecation doc itself covers, but there should be a link to a 'data migration' document that discusses how to do this (as data migration could happen independently of deprecation, they are orthogonal concepts).

Unfortunately I do not think such a system or document yet exist (at least for JSON, we could still mention the ObjectStream version converters). I'll include a missing link section to address this and work with the team owning this to add such a guide when possible.


@lmbr-pip - time-frame for O3DE_DEPRECATION_NOTICE

Also a good point, this came up in a recent TSC meeting and the minimum window for deprecation was agreed to be around the 6 month mark. This would be time from announcing, to time from removing, and would most likely coincide with a release. Anything longer than this starts to impact velocity.

Fully agree we should include a recommendation to announce the deprecation (in the form of an RFC as well as Discord and TSC/SIG involvement).


@cgalvan - scope of impact

Great point, I think we need clearer ways to identify stable code vs volatile code. The mechanism we have for this right now (not mentioned in the RFC as of yet) is each SIG's feature matrix. This should be kept up to date with the volatility of an API/system so people are aware breaking changes may occur without a full deprecation path being followed. I'll try and make this clearer in the RFC and also mention adding code paths (a little like what the .codeowners file now does) could be helpful to let people know more precisely what the code areas to be mindful of are.


@AMZN-daimini - editor versions

This is a good call-out but I fear potentially out of scope of the RFC and overall policy. I'd be grateful for guidance from sig-core here who I believe own the Gem architecture and dependencies. I'll make a mention of this in the RFC, but I think a run-book or additional information should be kept with the Gem documentation itself rather than at the broader deprecation policy level (though I accept there is an overlap).


@willihay - documentation and contributor guide

I totally agree we want to reference the deprecation policy/guidance in the contributor guide. Once it's created I will be sure to include a link there.

For 1, this is addressed with the O3DE_DEPRECATION_NOTICE. For C++ code specifically we can mention the @deprecated markup, but I did not want to include that until we're further along with our use of Doxygen itself. I think this can be added later when we have the full API from Doxygen published (hopefully soon! 🙂). For 2, I think yes for larger features (or even some sub-systems) but not for smaller API changes (that might impact developers but not users). Definitely worth mentioning for larger deprecation changes. And with 3, I imagine the text would just be deleted if applicable as it would not longer be relevant (this is why tracking when code should be and is removed is so important - we can hopefully do this with GHIs for tracking). We would likely want a release note to mention the removal, but the docs should just no longer include any mention of the deprecated system/feature (similar to what happens in the code).


@srikappa-amzn - timings for deprecation

This point was also raised by @lmbr-pip. The general consensus on the TSC was a minimum of 6 months from announcing a deprecation to removing it (this means it would likely straddle two releases in practice). If we take the three pronged approach you describe (which was the policy for Lumberyard) it means we could be looking at > 1 year for a deprecation to happen. So removing something across two releases may be acceptable (depending on the feature or API being deprecated). So yes you're right, the deciding factor will ultimately be calendar time, not releases.


I hope that covers the feedback so far - I haven't yet updated the body of the text to reflect the suggestions but I'll do this very soon. If anyone else has any thoughts please let me know! Thank you again for your time! 👍

@greerdv
Copy link

greerdv commented Jun 28, 2022

As well as considering how stable/volatile the API is, I think it could be useful to think about how big the impact will be in terms of blast radius. For example, if you changed something in AzCore, the blast radius would be huge, but some changes may be much more contained in their potential impact. It might be nice to be able to expedite the deprecation process if the blast radius is small enough (maybe a sig chair could make a ruling on that).

Another idea is that for changes with low impact, if it's possible to provide a script that can perform any necessary migration, that might be another case where the deprecation policy could be relaxed a bit.

@hultonha
Copy link
Contributor Author

As well as considering how stable/volatile the API is, I think it could be useful to think about how big the impact will be in terms of blast radius. For example, if you changed something in AzCore, the blast radius would be huge, but some changes may be much more contained in their potential impact. It might be nice to be able to expedite the deprecation process if the blast radius is small enough (maybe a sig chair could make a ruling on that).

Another idea is that for changes with low impact, if it's possible to provide a script that can perform any necessary migration, that might be another case where the deprecation policy could be relaxed a bit.

This is a great callout @greerdv, I'll update the text in the code block to reflect this and will try and discuss this with sig-release soon. Cheers!

@tonybalandiuk
Copy link
Contributor

I'm supportive of this (I won't be in the meeting next week). I think we need to find the right place for this to live. I'm less concerned about getting this perfect right now, and more concerned about getting something in place. We need to think of our processes as living, breathing processes. Adding a bit of feedback on the issue itself.
I would like to see a sentence at the top of the process explaining which SIG owns this process and how one can go about requesting a change to the process (especially if it doesn't live in a wiki).

@hultonha
Copy link
Contributor Author

I'm supportive of this (I won't be in the meeting next week). I think we need to find the right place for this to live. I'm less concerned about getting this perfect right now, and more concerned about getting something in place. We need to think of our processes as living, breathing processes. Adding a bit of feedback on the issue itself. I would like to see a sentence at the top of the process explaining which SIG owns this process and how one can go about requesting a change to the process (especially if it doesn't live in a wiki).

Thank you very much for the feedback @tonybalandiuk, much appreciated. I can definitely add the section about the sig that owns the process (sig-release) and how changes can be requested (small - issue on sig-release repo, large - RFC on sig-release repo).

Thanks!

@sptramer
Copy link

sptramer commented Jul 19, 2022

Discussing from an implementation angle in sig-release on 7/19 - My experience shows that the best way to do regression testing to ensure lack of API breakage (which is only part of making sure deprecation practices are followed). These tests don't do much other than:

  • Configure preconditions for calling an API
  • Call the API in all declared, stable forms
  • Check postconditions and return values for completeness rather than correctness (correctness tests are part of unit tests outside of API compliance, or part of integration tests that examine side-effects and interactions.)

Normally this is done with a mocking layer behind the API interface, but I'm unsure if that's the best approach for O3DE.

@Ulrick28
Copy link
Contributor

Ulrick28 commented Jul 19, 2022

This has been reviewed and approved by sig-release. Next step is to create a PR in the community repo. Also for next steps is to discuss with TSC making this required reading for reviewers and maintainers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfc-feature Request for Comments for a Feature
Projects
None yet
Development

No branches or pull requests