Update the instructions for upgrade of LLVM/Clang. #1146

Merged: 8 commits, Aug 10, 2021

sulekhark
Contributor

This PR updates the instructions to perform an LLVM/Clang upgrade. It also contains a new file called LLVM-Upgrade-Notes.md to track important information related to upgrades.

@sulekhark requested review from kkjeer and mgrang on August 5, 2021, 18:32
You can now branch your baseline branches to create a new master branch:

git checkout -b updated-master
git merge updated_baseline
Contributor

Suggested change
- git merge updated_baseline
+ git merge --ff-only updated_baseline

We expect updated_baseline to be ahead of baseline. If it's not, we should flag the problem for investigation rather than generate a merge commit.
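
A minimal sketch of the behavior this suggestion relies on, run in a throwaway repository. The branch name `diverged` and the empty commits are stand-ins for illustration and are not part of the upgrade instructions. It shows `git merge --ff-only` succeeding when the merged branch is strictly ahead, and refusing (without creating a merge commit) once the histories have diverged.

```shell
set -eu

# Throwaway repository so nothing here touches a real checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/baseline
git config user.name "Example User"
git config user.email "user@example.invalid"
git config commit.gpgsign false

git commit -q --allow-empty -m "baseline"

# updated_baseline is strictly ahead of baseline.
git checkout -q -b updated_baseline
git commit -q --allow-empty -m "upgrade"

# Case 1: the current branch is an ancestor of updated_baseline,
# so a fast-forward is possible.
git checkout -q baseline
if git merge -q --ff-only updated_baseline; then
  ahead_result="fast-forwarded"
else
  ahead_result="refused"
fi

# Case 2: a branch with its own commit has diverged from updated_baseline,
# so --ff-only must refuse instead of generating a merge commit.
git checkout -q -b diverged HEAD~1
git commit -q --allow-empty -m "unrelated work"
head_before=$(git rev-parse HEAD)
if git merge -q --ff-only updated_baseline 2>/dev/null; then
  diverged_result="fast-forwarded"
else
  diverged_result="refused"
fi
head_after=$(git rev-parse HEAD)

echo "ahead: $ahead_result, diverged: $diverged_result"
```

In the refused case the branch tip is left where it was, which is the signal to stop and investigate.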

branch into `master` and push to the remote repository.
```
git checkout master
git merge updated_baseline_master_12
Contributor

Suggested change
- git merge updated_baseline_master_12
+ git merge --ff-only updated_baseline_master_12

You probably also want to make sure that this command advances master exactly to updated_baseline_master_12 (which was tested in the previous step) rather than generating a merge commit (which could fail the tests if Checked C feature development on master interfered somehow with the baseline upgrade).

which was merged into `updated_baseline_master_12`.
- The pristine LLVM/Clang source at <branch-point-of-11-on-main>
- The pristine LLVM/Clang source at <branch-point-of-12-on-main>

Contributor

@mattmccutchen-cci, Aug 6, 2021

I wholeheartedly agree with the value of being able to see all three versions of a file involved in a merge. However, the description of the three versions here is accurate only for the initial merge from LLVM 11 to 12 in phase 1 step 6. For a merge of additional Checked C feature development in phase 1 step 8, the relevant versions are master, the merge of the previous master with the baseline LLVM 12 source that was completed in step 6 or an earlier iteration of step 8, and the previous master that was merged.

We are already assuming that `git merge` correctly identifies the three file versions to merge for files without conflicts, so for files that have pending conflicts, it is probably best to view the three file versions already identified by Git rather than try to look them up manually (which could be prone to mistakes/misunderstandings). I know of a few ways to do that, which might be worth documenting in this file:

  1. Best: Run `git mergetool`, assuming you have a suitable merge tool installed and configured. I currently use kdiff3, but I don't know if it works on Windows or if other tools are better.
  2. Set `git config merge.conflictstyle diff3` before running `git merge` so that the conflict markup written to the file shows all three versions rather than just two. I recommend leaving this option on all the time; I've found that the extra information never hurts.
  3. Run `git show :1:clang/path/to/some/File.cpp >clang/path/to/some/File.1.cpp` (and similarly with 2 or 3 in place of 1) to write out the three versions for inspection with other tools.

```
9. After all the merge conflicts and test failures on the local machine are
fixed, push the `baseline` and `updated_baseline_master_12` branches to the
remote repository and excute the ADO tests on `updated_baseline_master_12`.
Contributor

Suggested change
- remote repository and excute the ADO tests on `updated_baseline_master_12`.
+ remote repository and execute the ADO tests on `updated_baseline_master_12`.

**Phase 1**: We upgrade the `master` branch of the `checkedc-clang` repository
up to the commit hash on the `main` branch of the LLVM/Clang repository at which
the branch `release/12.x` is created - we shall refer to this commit hash as
<branch-point-of-12-on-main>. Note that <branch-point-of-12-on-main>
Contributor

Current LLVM practice seems to be to create a tag `llvmorg-13-init` referring to the commit after the branch point for the LLVM 12 release, so you should be able to use `llvmorg-13-init^` as a commit expression referring to the branch point (`^` means "previous commit") rather than looking up the commit ID manually.

Contributor Author

We are not looking up the commit ID manually: we used the command `git merge-base main release/12.x` in a clone of the pristine LLVM/Clang repository. I have updated this in the document (I had missed it earlier).

Additionally,

  1. the commands `git merge-base main release/12.x` and `git show llvmorg-12-init^` show different commit hashes, and
  2. we want `<branch-point-of-12-on-main>` to be a commit hash on the `main` branch.

Therefore, the document currently does not recommend the use of `llvmorg-12-init^` as `<branch-point-of-12-on-main>`. We may investigate this further during the next LLVM/Clang upgrade.

All other review comments have been incorporated.

Contributor

> 1. the commands `git merge-base main release/12.x` and `git show llvmorg-12-init^` show different commit hashes, and

That should be `git show llvmorg-13-init^`, which is indeed the same as `git merge-base main release/12.x`. Still, `git merge-base main release/12.x` may be a clearer way to express what we want; good idea.
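
The equivalence discussed here can be sketched in a throwaway repository. The tag `next-init` is a hypothetical stand-in for `llvmorg-13-init`, and the sketch assumes (as the comment above describes) that the tag sits on the first `main` commit after the release branch was created. Under that assumption, the tag's parent and the merge base of `main` and `release/12.x` are the same commit, and that commit is on `main`.

```shell
set -eu

# Throwaway repository; the history below is a toy model, not LLVM's.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.invalid"
git config commit.gpgsign false

git commit -q --allow-empty -m "last commit shared by main and the release"
git branch release/12.x

git commit -q --allow-empty -m "first commit on main after the branch point"
git tag next-init   # hypothetical stand-in for llvmorg-13-init

# Both branches keep moving after the split.
git checkout -q release/12.x
git commit -q --allow-empty -m "release-only fix"
git checkout -q main
git commit -q --allow-empty -m "later work on main"

# Two ways to name the branch point.
branch_point=$(git merge-base main release/12.x)
tag_parent=$(git rev-parse 'next-init^')

echo "merge-base: $branch_point"
echo "tag parent: $tag_parent"
```

The `git merge-base` form does not depend on any tagging convention, which may be why it reads as the clearer way to state the intent.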

Contributor Author

Okay! Thanks for the clarification.

Comment on lines 147 to 150
the 12.0.0 release. For this, we need to get the commit hash associated with the
12.0.0 release from https://github.com/llvm/llvm-project/releases. Let this
commit hash be <12.0.0-commit-hash>. Note that it should be the full
commit hash. Resolve merge conflicts and test failures if any.
Contributor

Suggested change
- the 12.0.0 release. For this, we need to get the commit hash associated with the
- 12.0.0 release from https://github.com/llvm/llvm-project/releases. Let this
- commit hash be <12.0.0-commit-hash>. Note that it should be the full
- commit hash. Resolve merge conflicts and test failures if any.
+ the 12.0.0 release, given by the `llvmorg-12.0.0` tag. Resolve
+ merge conflicts and test failures if any.

Another case where you can use a Git ref provided by LLVM rather than manually looking up a commit ID.
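
If a full commit hash is still wanted for the record, it can be derived from a tag rather than copied from a web page. The sketch below uses a throwaway repository and a hypothetical annotated tag `example-1.0.0` standing in for `llvmorg-12.0.0`. It illustrates that, for an annotated tag, plain `git rev-parse <tag>` names the tag object, while `<tag>^{commit}` peels it to the commit.

```shell
set -eu

# Throwaway repository with one commit and one annotated tag.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.invalid"
git config commit.gpgsign false
git config tag.gpgsign false

git commit -q --allow-empty -m "release commit"
git tag -a example-1.0.0 -m "Example release"   # stand-in for llvmorg-12.0.0

head_commit=$(git rev-parse HEAD)
tag_object=$(git rev-parse example-1.0.0)
release_commit=$(git rev-parse 'example-1.0.0^{commit}')

echo "tag object:     $tag_object"
echo "release commit: $release_commit"
```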

commit hash. Resolve merge conflicts and test failures if any.
```
git remote add upstream https://github.com/llvm/llvm-project.git
git pull upstream <12.0.0-commit-hash>
Contributor

Suggested change
- git pull upstream <12.0.0-commit-hash>
+ git pull upstream llvmorg-12.0.0

Contributor

@mattmccutchen-cci left a comment

Thanks. This is good work! I identified a few more issues.

```
git checkout master
git merge updated_baseline_master_12
git merge -ff-only updated_baseline_master_12
Contributor

Suggested change
- git merge -ff-only updated_baseline_master_12
+ git merge --ff-only updated_baseline_master_12

Comment on lines 218 to 222
Checked C feature development in phase 1 step 8, the relevant versions are a
snapshot of branch `updated_baseline_master_12` after step 6 or an earlier
iteration of step 8 (i.e. the common ancestor), a snapshot of the current
`master` branch that is being merged, and a snapshot of the current
`updated_baseline_master_12` branch before starting the merge.
Contributor

@mattmccutchen-cci, Aug 9, 2021

I don't think this description of the relevant versions is correct. See this diagram of two successive merges:

LLVM 11 -> master (step 6)      -------------------------> master (step 8)
   |              |                                               |
   v              v                                               v
LLVM 12 -> ubm12 (after step 6) -> ubm12 (after step 7) -> ubm12 (after step 8)

IIUC, the versions in the document correspond to "ubm12 (after step 6)", "master (step 8)", and "ubm12 (after step 7)". But from the diagram, it seems clear that the common ancestor is "master (step 6)" and the other two versions are "master (step 8)" and "ubm12 (after step 7)".

Perhaps this illustrates the point that we should use the three versions identified by Git rather than trying to determine them manually and potentially making mistakes. (Update: In fact, I made a mistake above! I think I fixed it now.) I realize now that it's possible to get the commit IDs during the merge via `$(git merge-base HEAD MERGE_HEAD)`, `HEAD`, and `MERGE_HEAD`. So we can still "use source snapshots" but query the commit IDs from the in-progress merge rather than determine them manually.
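
As a rough illustration of querying an in-progress merge, the sketch below builds a throwaway repository with a deliberate conflict. The branch names `ours` and `theirs` and the file `File.txt` are made up for the example. It reads the three commit IDs from the merge state and the three file versions from the index stages (1 is the common ancestor, 2 is `HEAD`, 3 is `MERGE_HEAD`), with `merge.conflictstyle` set to `diff3` so the conflict markup also carries the ancestor text.

```shell
set -eu

# Throwaway repository with one file changed differently on two branches.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/ours
git config user.name "Example User"
git config user.email "user@example.invalid"
git config commit.gpgsign false
git config merge.conflictstyle diff3

printf 'base\n' > File.txt
git add File.txt
git commit -q -m "common ancestor"

git checkout -q -b theirs
printf 'theirs\n' > File.txt
git commit -q -am "change on theirs"

git checkout -q ours
printf 'ours\n' > File.txt
git commit -q -am "change on ours"

# The merge stops with a conflict and a non-zero exit status.
git merge theirs >/dev/null 2>&1 || true

# Commit IDs of the three versions, taken from the in-progress merge.
base_commit=$(git merge-base HEAD MERGE_HEAD)
ours_commit=$(git rev-parse HEAD)
theirs_commit=$(git rev-parse MERGE_HEAD)

# File contents of the three versions, taken from the index stages.
base_text=$(git show :1:File.txt)
ours_text=$(git show :2:File.txt)
theirs_text=$(git show :3:File.txt)

echo "base=$base_text ours=$ours_text theirs=$theirs_text"
```

Because everything is read from Git's own merge state, the same commands apply unchanged to the initial upgrade merge and to later merges of feature work.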

Contributor Author

Very true! I have incorporated your review comments. Thanks for the review!

Contributor

@mattmccutchen-cci left a comment

Two more minor issues I noticed. Otherwise, the discussion of all the topics that I'm familiar with (generally, the checkedc-clang branching strategy and related procedures) looks good. There are some topics specific to the subject matter of checkedc-clang development that I'm not qualified to review.

Comment on lines 191 to 194
CONFLICT (rename/delete): ... in master renamed to ... in HEAD. Version HEAD ...
left in tree.
CONFLICT (rename/delete): ... in master renamed to ... in HEAD. Version master
... left in tree.
Contributor

Format this as a code block rather than letting the two lines run together in the paragraph?

repository.
```
git fetch upstream <cherry-pick-branch>
git cherry-pick <commit-to-cherry-pick>
Contributor

Suggested change
- git cherry-pick <commit-to-cherry-pick>
+ git cherry-pick -x <commit-to-cherry-pick>

Recording (cherry picked from commit X) may be helpful. I think your team has done this in other contexts.

Contributor

@mgrang left a comment

LGTM.

Contributor

@kkjeer left a comment

Two small typos - otherwise LGTM!

LLVM/Clang repository:

- Commit hash 0024efc69ea6cd0b630cd11cef5991b7edb73ffc on the `main` branch
of the upstream LLVM/Clang repo (See LLVm/Clang bug
Contributor

Nit: LLVm => LLVM

git remote add upstream https://github.com/llvm/llvm-project.git
git pull upstream <branch-point-of-12-on-main>
```
3. Execute LLVM/Clang tests on the branch `updated_baseline` on your local
Contributor

Nit: extra space between "Execute" and "LLVM/Clang"

@sulekhark merged commit 363760d into master on Aug 10, 2021
@sulekhark deleted the llmv_upgrade_doc_12 branch on August 10, 2021, 20:25
arbipher added a commit that referenced this pull request Aug 26, 2021
* Revert "[BoundsWidening] Determine checked scope specifier per statement (#1139)" (#1141)

This reverts commit 980321d.

* Determine checked scopes per statement (#1142)

We introduce a 2-bit field called CheckedScopeSpecifier in the Stmt class.
During parsing when a compound statement is created we iterate the elements
(statements) of the compound statement and set the checked scope specifier for
each element to the checked scope specifier of the compound statement.

We can get the checked scope specifier for a statement by calling the
getCheckedScopeSpecifier method on the statement.

* Update the instructions for upgrade of LLVM/Clang. (#1146)

* Updated the instructions for upgrade of LLVM/Clang.
Also added a new file LLVM-Upgrade-Notes.md to track important
information related to upgrades.

* Fixed typos.

* Addressed review comments.

* Fixed an inadvertent deletion.

* Addressed review comments.

* Incorporated review comments.

* Fixed minor typos.

* Fixed typos.

* Add new flags for available facts analysis

* Add the analysis into the build script and the sema bounds

* Add utility functions to check whether a var is used in a Expr and a BoundsExpr

* Add AbstractFact as a basic available fact;
Add InferredFact and adjust WhereClauseFact to be a subclass of AbstractFact

* Add data structures used in the analysis

* Add print and dump functions

* Add utility functions which are also used by BoundsWideningAnalysis.

* Add other utility functions.

`IsSwitchCaseBlock`: use `dyn_cast_or_null` to cover the null pointer
case.

`ConditionOnEdge`: do not test if there is no edge between
pred to curr since it will only be called if there is an edge.

`GetModifiedVars`: use `TranspareCasts` to bypass some casting.
The feature to deal with membership access and the array indexing is
still TODO.

* Add fact comparison and fact-related set operations (contains TODO).

* Add test cases (one covers basic features and the other is converted
from the previous available facts analysis)

* Dataflow analysis: Add statement-based Gen/Kill.

* Dataflow analysis: Add block-edge-based Gen set.

* Dataflow analysis: Add function to compute In and Out set.

* Dataflow analysis: Add worklist algorithm.

* Add destructors to release the memory

* Fix: modify the Gen/Kill rules to match the design doc;
It also fixes a bug to visit dead blocks.

* Cleanup comments

* Fix: use the existing functions to find a `VarDecl` in an expr

* Change the equal check on fact collections to equal size check

* Update the testcases with the updated Gen/Kill

* Remove debug flag for available facts.

* Use lexco-compare for `EqualityOpFact` and `InferredFact`.

* Add a map to store the comparison results of facts.

* Change the source location of a fact to its near expr.

* Use a dedicated list to collect created facts and clean them finally.

* Verify if an expr contains errors before checking invertibility (#1154)

The community has introduced a new annotation called "contains-errors" on AST
nodes that contain semantic errors. As a result, after the upgrade of Checked C
sources to LLVM 12 we need to check if an expr contains errors before operating
on the expr. One such place is in InverseUtil::IsInvertible where we need to
check if the input modifying expr contains errors.

* Added containsErrors checks to InverseUtil::Inverse

* [BoundsWidening] Handle complex conditionals in bounds widening (#1149)

Support bounds widening in presence of complex conditionals like:
  "if (*p != 0)", "if ((c = *p) == 'a')", etc.

* Don't record temporary equality between expressions such as x and x + 1 in TargetSrcEquality (#1162)

* Add AllowTempEquality parameter to RecordEqualityWithTarget

* Use a ModifiedSameValue variable to determine the return value for UpdateSameValueAfterAssignment

* Rename ModifiedSameValue to RemovedAnyExprs and clean up comments

* Treat address-of array subscripts the same way as address-of dereferences (#1163)

* In CheckAddressOfOperand, add case for address-of array subscripts to C99-specific logic

* Move address-of array subscript check after other checks such as taking the address of an lvalue

* Adjust expected AST output to account for different types of address-of array subscripts

* Restore deleted comment about checking for array subscript expressions

* Add comment explaining the placement of the address-of array subscript logic

* Put &e1[e2] typing rules under a Checked C flag

* Update the available facts analysis.

Co-authored-by: Mandeep Singh Grang <magrang@microsoft.com>
Co-authored-by: Sulekha Kulkarni <Sulekha.Kulkarni@microsoft.com>
Co-authored-by: Katherine Kjeer <6687333+kkjeer@users.noreply.github.com>
4 participants