Try to implement copy-paste protection checks #64

Open
Martoon-00 opened this issue Apr 23, 2021 · 3 comments · May be fixed by #238, #245 or #246
Labels
feature New functionality

Comments

@Martoon-00
Member

Martoon-00 commented Apr 23, 2021

Clarification and motivation

Imagine the following list of links:

  • [file](files/file.out)
  • [file2](files/file2.out)
  • [file3](files/file3.out)
  • [Another file](files/another-file.out)
  • [And a file once again](files/and-a-file-once-again.out)

It is easy to make a mistake here when copy-pasting: the link text gets updated but the link target does not. I think we can use some heuristics to spot such mistakes (while avoiding false positives at all costs):

  • If this check is enabled in config (it should be by default);
  • And there are two links [T1](L1) and [T2](L1) in a file, where T1 is a substring of L1 modulo casing and all non-alphanumeric characters, while T2 is not a substring of L1 modulo the same things;

then report an error at the position of [T2](L1), mentioning that it could be a bad copy-paste of [T1](L1). A similar check should apply to [T1](L2).
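
The heuristic above could be sketched roughly as follows. This is only an illustration in Python, not xrefcheck's actual implementation: the names `normalize`, `find_copy_paste_suspects` and `Suspect` are made up for this sketch, the regex handles only simple inline links, and the symmetric [T1](L2) case is left out.

```python
import re
from collections import defaultdict
from typing import NamedTuple

# Simplified inline-link pattern: [text](target). Assumption: no nested
# brackets, no titles, no reference-style links.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


class Suspect(NamedTuple):
    line: int             # 1-based line of the suspicious link
    text: str             # its link text (T2)
    target: str           # the shared link target (L1)
    likely_original: str  # text of a link that does match the target (T1)


def normalize(s: str) -> str:
    """Lowercase and drop every non-alphanumeric character."""
    return "".join(ch for ch in s.lower() if ch.isalnum())


def find_copy_paste_suspects(markdown: str) -> list[Suspect]:
    # Group all links in the file by their target.
    by_target = defaultdict(list)
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        for m in LINK_RE.finditer(line):
            by_target[m.group(2)].append((line_no, m.group(1)))

    suspects = []
    for target, links in by_target.items():
        norm_target = normalize(target)
        # Texts that "fit" the target: non-empty after normalization and
        # contained in the normalized target.
        matching = [
            text for _, text in links
            if normalize(text) and normalize(text) in norm_target
        ]
        if not matching:
            continue  # no T1 for this target, so nothing to compare against
        for line_no, text in links:
            if normalize(text) not in norm_target:
                suspects.append(Suspect(line_no, text, target, matching[0]))
    return suspects
```

Calling `find_copy_paste_suspects` on a file where `[file2]` was pasted with the target of `[file]` left unchanged would flag the `[file2]` link and name `file` as the likely original. A target is only examined when at least one of its links has a matching text, which keeps the check conservative with respect to false positives.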

Acceptance criteria

  • The check is implemented.
  • It can be disabled both in the config and in place via some <!-- xrefcheck: no duplication check in {file/paragraph/link} -->.
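
As a rough illustration of the in-place switch, recognizing such an annotation could look like the Python sketch below. It assumes the annotation is written as a standard HTML comment with exactly the wording proposed here; the final syntax is not settled in this issue, and `ANNOTATION_RE` and `annotation_scope` are hypothetical names.

```python
import re
from typing import Optional

# Assumed annotation shape, taken from the wording proposed in this issue:
#   <!-- xrefcheck: no duplication check in file -->
ANNOTATION_RE = re.compile(
    r"<!--\s*xrefcheck:\s*no duplication check in (file|paragraph|link)\s*-->"
)


def annotation_scope(line: str) -> Optional[str]:
    """Return 'file', 'paragraph' or 'link' if the line carries the
    annotation, otherwise None."""
    m = ANNOTATION_RE.search(line)
    return m.group(1) if m else None
```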
@Martoon-00
Member Author

If possible, we should check reference-style links ([text][link-id]) too.

However, I'm not sure how feasible that is; AFAIR such links are automatically inlined by our markdown parser.

@YuriRomanowski YuriRomanowski self-assigned this Dec 12, 2022
YuriRomanowski added a commit that referenced this issue Dec 14, 2022
Problem: Current implementation of the markdown scanner is hard
to extend, so we need to refactor it to add support for new annotations.

Solution: Refactor; improve handling annotations, remove IMSAll state
as it's not required, rename functions.
YuriRomanowski added a commit that referenced this issue Dec 14, 2022
Problem: Current implementation of the markdown scanner is hard
to extend, so we need to refactor it to add support for new annotations.

Solution: Refactor; isolated processing annotations for different
types of nodes.
YuriRomanowski added a commit that referenced this issue Dec 14, 2022
Problem: Current implementation of the markdown scanner is hard
to extend, so we need to refactor it to add support for new annotations.

Solution: Refactor; isolate processing annotations for different
types of nodes.
@YuriRomanowski YuriRomanowski linked a pull request Dec 14, 2022 that will close this issue
13 tasks
@YuriRomanowski
Contributor

YuriRomanowski commented Dec 14, 2022

Do we want to check only links within one list? How about checking all the links within a given file?

YuriRomanowski added a commit that referenced this issue Dec 14, 2022
Problem: Currently xrefcheck is not able to detect possibly bad
copy-pastes, when some links are referring the same file, but
from the link name it seems that one of
that links should refer other file.

Solution: Implement check, add support for related annotations
for `.md` files, add corresponding settings to the config.
@Martoon-00
Member Author

That's a good question. On the one hand, this increases the probability of getting a false positive. On the other, checking through the entire file may be more useful and will be a more transparent behaviour for the user.

Let's indeed go with checking across the entire file.

Over time we will collect some statistics on how this check works on real-life repositories and will revise the behaviour then.

YuriRomanowski added a commit that referenced this issue Dec 15, 2022
Remove extra parameters in md scanner
YuriRomanowski added a commit that referenced this issue Dec 16, 2022
Problem: Currently xrefcheck is not able to detect possible bad
copy-pastes, when some links are referring the same file, but
from the link names it seems that one of
those links should refer other file.

Solution: Implement check, add corresponding settings to the config.
@YuriRomanowski YuriRomanowski linked a pull request Dec 16, 2022 that will close this issue
13 tasks
YuriRomanowski added a commit that referenced this issue Dec 16, 2022
Problem: Currently xrefcheck is able to detect possibly bad
copy-pastes, but there is no way to disable those checks
locally for a file/paragraph/link.

Solution: Add support for related annotations for `.md` files.
@YuriRomanowski YuriRomanowski linked a pull request Dec 16, 2022 that will close this issue
13 tasks
YuriRomanowski added a commit that referenced this issue Dec 16, 2022
YuriRomanowski added a commit that referenced this issue Dec 23, 2022
YuriRomanowski added a commit that referenced this issue Dec 23, 2022
Review: fix config, README, CHANGES
YuriRomanowski added a commit that referenced this issue Dec 23, 2022
3 participants