Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unused variables linter #1159

Merged
merged 11 commits into from
Jun 3, 2022
Merged

Conversation

larsk21
Copy link
Contributor

@larsk21 larsk21 commented May 19, 2022

Add a linter for reporting unused variables in Lean code using Lean's linter API. A variable is reported as unused if it has a declaration, but no uses, except when

  • its name starts with an underscore
  • it is defined with the variable command
  • it appears in the signature of a constructor or structure field
  • it appears in the signature of a constant without a definition or an axiom
  • it appears in the signature of a function that is defined elsewhere (extern, implementedBy)

@larsk21
Copy link
Contributor Author

larsk21 commented May 19, 2022

Examples of unused variables in this repo

Whether all of these should be reported as unused is open for debate.

unnecessary variables:

-- src/Lean/Elab/PreDefinition/Structural/IndPred.lean
def mkIndPredBRecOn (recFnName : Name) (recArgInfo : RecArgInfo) (value : Expr) : M Expr := do
  ...
  forallBoundedTelescope FType (some 1) fun below _ => do
    let main ← mkFreshExprSyntheticOpaqueMVar FType
     -- ^
    let below := below[0]
    let valueNew     ← replaceIndPredRecApps recFnName recArgInfo motive value
    let Farg         ← mkLambdaFVars (recArgInfo.indIndices ++ #[major, below] ++ otherArgs) valueNew
    let brecOn       := mkApp brecOn Farg
    return mkAppN brecOn otherArgs

unused pattern variables:

-- src/Lean/Data/Name.lean
def updatePrefix : Name → Name → Name
  | anonymous, newP => anonymous
            -- ^
  | str p s _, newP => Name.mkStr newP s
     -- ^
  | num p s _, newP => Name.mkNum newP s
     -- ^

unused loop variables:

-- src/Lean/Elab/BuiltinCommand.lean
private def popScopes (numScopes : Nat) : CommandElabM Unit :=
  for i in [0:numScopes] do
   -- ^
    popScope

unused function arguments:

-- src/Lean/Elab/Tactic/Simp.lean
def dsimpLocation (ctx : Simp.Context) (fvarIdToSimp : FVarIdToLemmaId := {}) (loc : Location) : TacticM Unit := do
  ...
where
  go (fvarIdsToSimp : Array FVarId) (simplifyTarget : Bool) (fvarIdToSimp : Lean.Meta.FVarIdToLemmaId) : TacticM Unit := do
                                                          -- ^
    let mvarId ← getMainGoal
    let result? ← dsimpGoal mvarId ctx (simplifyTarget := simplifyTarget) (fvarIdsToSimp := fvarIdsToSimp)
    match result? with
    | none => replaceMainGoal []
    | some mvarId => replaceMainGoal [mvarId]

variables that were meant to be used:

-- src/Lean/Data/Json/Basic.lean
def setObjVal! : Json → String → Json → Json
  | obj kvs, k, v => obj <| kvs.insert compare k v
  | j      , _, _ => panic! "Json.setObjVal!: not an object: {j}"
 -- ^                                                         ^ (missing `s!`)

-- src/Lean/Data/JsonRpc.lean
instance : ToString RequestID where
  toString
  | RequestID.str s => s!"\"s\""
               -- ^         ^ (probably missing `{}`)
  | RequestID.num n => toString n
  | RequestID.null => "null"

variables that are used in a structure constructor, but actually unnecessary:

-- src/Lean/Compiler/IR/Sorry.lean
def updateSorryDep (decls : Array Decl) : CompilerM (Array Decl) := do
  let (_, s) ← Sorry.collect decls |>.run {}
  return decls.map fun decl =>
    match decl with
    | Decl.fdecl f xs t b info =>
                       -- ^
      match s.localSorryMap.find? f with
      | some g => Decl.fdecl f xs t b { info with sorryDep? := some g }
                                     -- ^ (`sorryDep?` is the only field of the structure)
      | _ => decl
    | _ => decl

variables that are used as documentation in dependent arrows:

-- src/Lean/Elab/PreDefinition/WF/Fix.lean
private partial def processSumCasesOn (x F val : Expr) (k : (x : Expr) → (F : Expr) → (val : Expr) → TermElabM Expr) : TermElabM Expr := do
                                                          -- ^            ^            ^
  ...

variables that are used as documentation in Syntax patterns ($_ is not allowed):

-- src/Lean/Elab/Match.lean
private def getMatchGeneralizing? : Syntax → Option Bool
  | `(match (generalizing := true)  $[$motive]? $discrs,* with $alts:matchAlt*) => some true
                                    -- ^         ^              ^
  | `(match (generalizing := false) $[$motive]? $discrs,* with $alts:matchAlt*) => some false
                                    -- ^         ^              ^
  | _ => none

Copy link
Member

@Kha Kha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. Apart from the dependent arrows that we discussed before, to me all the examples are either clear wins or very reasonable to change.

src/Lean/Linter/Basic.lean Outdated Show resolved Hide resolved
src/Lean/Linter/Basic.lean Outdated Show resolved Hide resolved
src/Lean/Linter/Basic.lean Outdated Show resolved Hide resolved
src/Lean/Linter/Util.lean Outdated Show resolved Hide resolved
src/Lean/Linter/Basic.lean Outdated Show resolved Hide resolved
@larsk21 larsk21 force-pushed the linter/unused-variables branch 2 times, most recently from 3dca07e to f8b1b21 Compare May 23, 2022 16:21
@larsk21 larsk21 requested a review from Kha May 25, 2022 20:19
@Kha
Copy link
Member

Kha commented May 26, 2022

Leo pointed out the following special case to me, where at least currently there is no connection between the different h₂s in the info tree:

theorem foo (h₁ : x = 0) (h₂ : y = x) : y = 0 := by
  subst h₁
  -- We now have `h₂ : y = 0`
  exact h₂

This is probably an issue for quite a few (and potentially user-written) tactics, so I'm wondering if we should look for references on the Expr level after all. Perhaps in combination with the References module so as to avoid edge cases like

let mut x := 0  -- technically dead, but perhaps not helpful as an "unused variable" lint
if b then x := 1 else x := 2

In case we expect that there may be further issues in practice, there is also the option of merging the lint as opt-in for now so that people can test-drive it (which I assume can be done very easily with Lake's moreLean/ServerArgs) and we can further improve it without annoying too many people at the same time. In that case I'm fine with merging it as is.

src/Lean/Linter/Basic.lean Outdated Show resolved Hide resolved
@gebner
Copy link
Member

gebner commented May 26, 2022

This is probably an issue for quite a few (and potentially user-written) tactics, so I'm wondering if we should look for references on the Expr level after all.

Oh I didn't realize this didn't even look at the expression. Then there's of course lots of obvious corner cases, here are a few off the top of my head that I haven't seen mentioned so far:

def f (x : Nat) := ‹_›  -- false positive
def g [Inhabited α] := 42  -- false negative
def h :=
  let inst : Inhabited Nat := ⟨42-- false positive
  (default : Nat)

@larsk21
Copy link
Contributor Author

larsk21 commented May 27, 2022

Oh I didn't realize this didn't even look at the expression. Then there's of course lots of obvious corner cases, here are a few off the top of my head that I haven't seen mentioned so far:

def f (x : Nat) := ‹_›  -- false positive
def g [Inhabited α] := 42  -- false negative
def h :=
  let inst : Inhabited Nat := ⟨42-- false positive
  (default : Nat)

I agree that the α should be marked as an unused variable. For the other two, I'm unsure if these are false positives, since we could instead write:

def f (_ : Nat) := ‹_›
def h :=
  have : Inhabited Nat := ⟨42⟩
  (default : Nat)

The first example is similar to this test where HQ is also marked as unused. Currently the unused marker doesn't always mean that the value of the variable is not needed, but that variable is never referenced by name.

theorem implicitlyUsedVariable : P ∧ Q → Q := by
intro HPQ
have HQ : Q := by exact And.right HPQ
assumption

One problem is that this interpretation doesn't work when a variable is never referenced by name, but the name is still relevant:

theorem foo (h₁ : x = 0) (h₂ : y = x) : y = 0 := by
 subst h₁
 -- We now have `h₂ : y = 0`
 exact h₂

@gebner
Copy link
Member

gebner commented May 27, 2022

[...] we could instead write:

def f (_ : Nat) := ‹_›

This is a semantic change, with the previous version you could write f (x := 42), with this one you can't. Argument names are, for better or for worse, part of the public API.

def h :=
  have : Inhabited Nat := ⟨42⟩
  (default : Nat)

Usually you need let instead of have for local instances. Unfortunately let : Inhabited Nat := ⟨42⟩ doesn't work, but that's fixable.

@larsk21
Copy link
Contributor Author

larsk21 commented May 27, 2022

Argument names are, for better or for worse, part of the public API.

I agree. As it is now, the linter reports function arguments as unused like any other variable - maybe I should change this.

@Kha
Copy link
Member

Kha commented May 27, 2022

Right, there are two slightly different potential lints: a) "this binder can be safely removed (it is unused)" and b) "this binder can be safely made anonymous (its name is unused)". b) is, as discussed, problematic in public signatures, and not really solvable for some tactics currently. So we should probably focus on a) using a combination of lexical and Expr-level references.

@Kha
Copy link
Member

Kha commented May 27, 2022

I'm wondering if we should look for references on the Expr level after all.

Easier said than done, of course. There is no direct association between the free variables from the info tree and the bound variables in the final term. However, since the above issues are mostly about tactics, it might be sufficient to go through all the terms assigned to TacticInfo mvars. We should make sure not to visit any mvar twice though to avoid quadratic overhead.

That would leave the instance binders. I don't immediately see a better solution than just whitelisting them.

@Kha
Copy link
Member

Kha commented Jun 1, 2022

Excellent work! Now I really think we should just merge it and then await feedback by users. Even for the many warnings in the stdlib, I can't yet tell if I would be annoyed enough by them that we should turn them off there, so let's just try and see I'd suggest.

If they do tend to be a bit too much in e.g. the stdlib (until we can, say, fix them automatically), we could look into suppressing them in patterns only, which seems to be the vast majority of cases.

@Kha
Copy link
Member

Kha commented Jun 1, 2022

If we do anticipate some degree of annoyance, we could also evaluate it first on other projects, such as mathlib4.

Btw, with the tactic fix in place, it should be safe to re-enable the lint for parameters, no? Because if a parameter is truly unused, that is definitely worth a warning.

@gebner
Copy link
Member

gebner commented Jun 1, 2022

Is there something like the @[nolint unused_arguments] we have in mathlib to disable the lint? There's 84 definitions in mathlib where one of the arguments is intentionally unused.

@Kha
Copy link
Member

Kha commented Jun 1, 2022

Yes, I assume set_option linter.unusedVariables false (in) would work for that. I see that you ported the mathlib 3 infrastructure using an attribute, but using options has the distinct advantage that we can apply them at many different levels: packages (via moreLeanArgs), modules, section, declarations, and even subterms. On the other hand, the attribute sure can be more compact, which we could emulate using a no_lint command macro desugaring to the set_option code.

@Kha
Copy link
Member

Kha commented Jun 1, 2022

Then there is probably a larger conversation to be had about how the Lean and mathlib approaches to linters should be correlated/integrated/coexist, but maybe not in this PR.

@gebner
Copy link
Member

gebner commented Jun 1, 2022

I see that you ported the mathlib 3 infrastructure using an attribute

I didn't want to make any big refactorings yet. Most of the mathlib linters are 1) per declaration and 2) non-local. An attribute makes perfect sense here, since we need to run the linters with all of mathlib imported (when an option would have long been discarded).

but using options has the distinct advantage that we can apply them at many different levels:

That this is impossible is a feature, not a bug. Opting out of best practices should be annoying, and not happen accidentally. We could of course add a "no nolint on sections/modules" linter. 😄

Then there is probably a larger conversation to be had about how the Lean and mathlib approaches to linters should be correlated/integrated/coexist, but maybe not in this PR.

Indeed, it would be great if e.g. disabling a linter had the same syntax in both cases. But I don't see either of the two APIs subsume the other. Non-local checks (like checking for simp-normal form, or for typeclass diamonds) clearly don't fit into Lean concept of "lint the command syntax right here and now". Syntactical checks (like for line-length, comments, deprecated syntax, etc.) clearly don't fit into the mathlib concept of "check the full mathlib environment".

Even the unused variable linter doesn't neatly fit into either category. You need to look at the AST to flag by have x := 42; exact 1※. But on the other hand we also want to check declarations generated by metaprograms, which do not have an input syntax.

※ I just noticed that have behaves differently in Lean 4 than it did in Lean 3. In Lean 3 the have disappeared, so you could use it to declare local instances without affecting the elaborated term. But now it produces let_fun.

@larsk21
Copy link
Contributor Author

larsk21 commented Jun 2, 2022

With these fine-grained options to disable linters, do we still want the unused variable pattern _x, i.e. variables starting with an underscore are ignored by the linter?

@Kha
Copy link
Member

Kha commented Jun 2, 2022

It's still useful for other kinds of variables, no? And I assume most code will (eventually) not make use of these options.

@larsk21
Copy link
Contributor Author

larsk21 commented Jun 2, 2022

Sure, it can be used on every variable detected by the linter.

@larsk21
Copy link
Contributor Author

larsk21 commented Jun 2, 2022

Some statistics from a stage2 build of the Lean repo:

category number of unused variables percentage of all unused variables
all 1337 100%
function arguments 353 26.4%
pattern variables 571 42.7%

@Kha
Copy link
Member

Kha commented Jun 2, 2022

Interesting, I would have thought it would be more pattern variables. In that case I'm fine with merging it as is, and I believe @leodemoura as well. @gebner Do you have anything in mind that should be addressed right now?

Opting out of best practices should be annoying, and not happen accidentally.

I don't feel that strongly about opting out btw. Clippy/Rust also supports opt-outs at the crate & module level. And it's still not really accidental, is it?

@leodemoura
Copy link
Member

Interesting, I would have thought it would be more pattern variables. In that case I'm fine with merging it as is, and I believe @leodemoura as well. @gebner Do you have anything in mind that should be addressed right now?

Yes, I am fine with merging as is. I will be happy to address the unused variable warnings.

@gebner
Copy link
Member

gebner commented Jun 2, 2022

Do you have anything in mind that should be addressed right now?

I don't think there's anything which needs to be done right now. The PR doesn't really break anything, and set_option linter.unusedVariables false is always an option.

But we should have a plan on how to fix all the new warnings (almost every file has warnings now).

Clippy/Rust also supports opt-outs at the crate & module level. And it's still not really accidental, is it?

I'm not particularly concerned about configuring linters at the crate/package level. Linters need to be configured somewhere after all. However a section-opt-out can easily affect more declarations than originally intended, e.g. if new declarations are added to the section.

The set_option is even easier to accidentally misuse, all you need to forget are two characters and the linters are disabled for the rest of the file.

@Kha Kha force-pushed the linter/unused-variables branch from b1cfd2d to 301e8fb Compare June 3, 2022 11:03
@Kha Kha merged commit caa8804 into leanprover:master Jun 3, 2022
@larsk21 larsk21 deleted the linter/unused-variables branch June 3, 2022 11:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants