Convert t2t to markdown, then to HTML #15945

Merged: seanbudd merged 23 commits into beta from convertT2tToMarkdown on Jan 9, 2024

@seanbudd seanbudd commented Dec 20, 2023

Link to issue number:

Closes #8734
Part of #15014
Related nvaccess/nvda-misc-deps#30, #16002, #15950, #15939, #15981

Summary of the issue:

In order to migrate to Crowdin, we must convert txt2tags to markdown.
In order to transition safely, beta will build docs from t2t to markdown to HTML.
Eventually the t2t will be removed and the markdown will become the source of truth.

Description of user facing changes

The User Guide no longer has numbered sections.

Translators and documentation writers now use extended markdown syntax rather than txt2tags.

Description of development approach

The build system performs certain pre-processing and post-processing when converting t2t to HTML.
An equivalent system should be retained, i.e. the translator and documentation contribution experience should remain the same for markdown to HTML.
Additionally, when converting t2t to markdown, new processing rules had to be created.

There is no universal standard for custom anchors in markdown.
As such, txt2tags doesn't have default rules for this. To retain our custom anchors, I added rules for a common markdown extended syntax.
Similarly, a shortcut is used for generating the table of contents.
See nvaccess/nvda-misc-deps#30

The language code is set from the user_docs folder name, and right-to-left (RTL) text direction is set manually for the 3 languages that currently require it: Persian, Arabic and Hebrew.
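
As a rough illustration of the rule above, here is a minimal sketch that derives a language code and text direction from a path under user_docs. The function name, the RTL_LANGS set, and the underscore-to-hyphen normalisation are assumptions made for this example and are not taken from the PR. The folder names fa, ar and he are assumed to be the ISO 639-1 codes for Persian, Arabic and Hebrew.

```python
from pathlib import PurePath
from typing import Tuple

# Assumption: these folder names identify Persian, Arabic and Hebrew.
RTL_LANGS = frozenset({"fa", "ar", "he"})


def get_lang_and_dir(doc_path: str) -> Tuple[str, str]:
    """Return (lang, dir) for a file located under user_docs/<lang>/."""
    parts = PurePath(doc_path).parts
    # The language folder is the path component right after "user_docs".
    folder = parts[parts.index("user_docs") + 1]
    # Assumption: folders such as "pt_BR" map to the HTML lang value "pt-BR".
    lang = folder.replace("_", "-")
    base_lang = folder.split("_")[0]
    direction = "rtl" if base_lang in RTL_LANGS else "ltr"
    return lang, direction


if __name__ == "__main__":
    for path in ("user_docs/fa/userGuide.md", "user_docs/en/userGuide.md"):
        lang, direction = get_lang_and_dir(path)
        print(f'<html lang="{lang}" dir="{direction}">')
```

Keeping the RTL languages in one explicit set mirrors the "manually set" approach described above: adding another RTL language would be a one-line change.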

The special Catalan processing for adding hreflang attributes is converted to a markdown extension syntax of {hreflang=en}.
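
To make the {hreflang=en} syntax more concrete, the following is a hypothetical sketch of how such a marker could end up as an HTML attribute. It assumes the marker directly follows an inline markdown link, and it uses a plain regular expression rather than the markdown extension the build actually relies on. The function name and pattern are invented for illustration.

```python
import html
import re

# Assumption: the marker directly follows an inline link, e.g.
# [text](https://example.org/){hreflang=en}
_LINK_WITH_HREFLANG = re.compile(
    r"\[(?P<text>[^\]]+)\]\((?P<url>[^)\s]+)\)\{hreflang=(?P<lang>[A-Za-z-]+)\}"
)


def render_hreflang_links(markdown_text: str) -> str:
    """Replace marked links with <a> elements that carry an hreflang attribute."""

    def _to_anchor(match):
        url = html.escape(match.group("url"), quote=True)
        lang = html.escape(match.group("lang"), quote=True)
        text = html.escape(match.group("text"))
        return f'<a href="{url}" hreflang="{lang}">{text}</a>'

    # Links without the marker are left untouched for the normal converter.
    return _LINK_WITH_HREFLANG.sub(_to_anchor, markdown_text)
```

For example, `render_hreflang_links("See [NV Access](https://www.nvaccess.org/){hreflang=en}.")` would yield an anchor with `hreflang="en"`, which tells browsers and assistive technology that the link target is in English.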

The key commands file is generated using a custom-made Python markdown extension.

Testing strategy:

Tested by running scons user_docs and checking the resulting files.

Known issues with pull request:

None

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

@seanbudd seanbudd requested a review from a team as a code owner December 20, 2023 07:22
@seanbudd seanbudd marked this pull request as draft December 20, 2023 07:22
@seanbudd seanbudd force-pushed the convertT2tToMarkdown branch 3 times, most recently from 79448ac to 584031f Compare December 20, 2023 07:39
@seanbudd seanbudd changed the title Build t2t as markdown Convert t2t to markdown, then to HTML Dec 22, 2023
@seanbudd seanbudd linked an issue Dec 28, 2023 that may be closed by this pull request
@seanbudd
Member Author

seanbudd commented Jan 5, 2024

Thanks @lukaszgo1, all feedback should now be resolved.

@seanbudd
Member Author

seanbudd commented Jan 5, 2024

@hwf1324

Yes, I have another proposal, maybe we can add linting and formatting tools.
For cases where file translation is used.

I would suggest opening up a separate issue.
PyMarkdown looks promising: https://github.com/jackdewinter/pymarkdown
It's far from complete and doesn't support diffs, but it could work as a tool for manually checking rather than CI/CD automated checking and failing builds.

@lukaszgo1
Contributor

No, I did not build from this branch before. The procedure you've proposed didn't help, as there were no .md files to remove. This can be reproduced as follows:

  • Perform a clean checkout of the repository
  • On master execute scons user_docs
  • Switch to convertT2tToMarkdown, making sure to update submodules
  • Execute scons user_docs again

As long as the documentation was built before, this is going to fail. Removing keyCommands.t2t from each language folder seems to be a workaround. If this proves difficult to fix, I'd suggest at least adding a note to the known issues of this PR, as developers will certainly encounter this.
Also, even when the documentation builds successfully, many .t2t files are marked as modified in git. Again, if this is difficult to fix, a note should be added to the known issues, though obviously a proper fix would be nicer.

@seanbudd
Member Author

seanbudd commented Jan 8, 2024

Ah I see the problem. I have removed keyCommands.t2t from the .gitignore, along with build.t2tconf. This is to make them more visible to developers to ensure they remove them. During the transition period developers will have to perform git clean -f user_docs/*/keyCommands.t2t. There seems to be no easy way to do this and the transition period should be relatively short. This build error will only occur while we have t2t to md running.
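
For developers who prefer to check what would be removed first, here is a small sketch that lists (and optionally deletes) the leftover files matching user_docs/*/keyCommands.t2t. The function names and the dry_run parameter are hypothetical. Unlike git clean, this does not check whether a file is tracked by git; it simply matches the path pattern.

```python
from pathlib import Path


def find_stale_key_commands(repo_root):
    """List user_docs/<lang>/keyCommands.t2t files left over from an earlier build."""
    return sorted(Path(repo_root).glob("user_docs/*/keyCommands.t2t"))


def remove_stale_key_commands(repo_root, dry_run=True):
    """Return the stale files; delete them only when dry_run is False."""
    stale = find_stale_key_commands(repo_root)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale


if __name__ == "__main__":
    # Dry run from the repository root: print what would be deleted.
    for path in remove_stale_key_commands("."):
        print(path)
```

Defaulting to a dry run is a deliberate choice here, so that nothing is deleted until the list has been reviewed.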

The white space diffs in .t2t files could be avoided - however I plan to just commit these whitespace changes to beta once this PR is merged so that they are not annoying to devs.

@seanbudd seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label Jan 8, 2024
@zstanecic
Contributor

Will this be merged soon? I am just curious as a translator.

@seanbudd
Member Author

seanbudd commented Jan 9, 2024

Merging this alone won't affect translators - this is just a step towards getting the documents ready for translators

@seanbudd seanbudd merged commit dac5aa2 into beta Jan 9, 2024
1 check passed
@seanbudd seanbudd deleted the convertT2tToMarkdown branch January 9, 2024 22:50
seanbudd added a commit that referenced this pull request Jan 10, 2024
Follow up to #15945

Summary of the issue:
locale.t2tconf files are no longer required. The language code and text direction will be set by the custom rules in #15945 when converting markdown to HTML.

Description of user facing changes
None once #15945 is merged

Description of development approach
locale.t2tconf files were removed.
References were also removed.
When saving documents, VS Code automatically fixed whitespace issues like mixed line endings.
The diff should be viewed with "ignore whitespace" on.
seanbudd added a commit that referenced this pull request Jan 11, 2024
Clean up of #15945

Summary of the issue:
The key commands doc generator script is in the base folder of the repository - it should be moved to a folder to keep the base repo tidy.
I propose it is moved to user_docs - particularly because of its relation to the docs, and the fact that the t2tconf files will be removed.

The documentation for key commands is lacking in the wiki: https://github.com/nvaccess/nvda/wiki/TranslatingUserGuide
The inline documentation in the module covers the syntax more thoroughly and should be moved to project docs.
seanbudd added a commit that referenced this pull request Jan 15, 2024
Follow up to #15945

Summary of the issue:
Translators can add arbitrary HTML to markdown translation files.
This is a stored XSS risk.

Description of user facing changes
Should be none - however see known issues

Description of development approach
Used the Python bindings for the Rust Ammonia sanitization library.

Testing strategy:
Tested building docs, diffed HTML results between this PR and beta.

Known issues with pull request:
The sanitization deletes any HTML tags that are not recognized.
This includes words wrapped in angle brackets, e.g. <minor>.
If these are wrapped in code formatting, they are correctly escaped, e.g. `<minor>`.
While English files are mostly correct, many translated files do not wrap angle-bracketed text in code formatting.
This means certain parts of translated documentation will be stripped, e.g. <major>.<minor>.<patch> becomes "..".
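
One way translated files could be checked for this problem is sketched below. It is a hypothetical helper, not part of the PR and not the Ammonia sanitizer itself: it reports angle-bracketed words on a line that are not inside a backtick code span. The function name and both patterns are assumptions made for illustration.

```python
import re

# Inline code spans delimited by single backticks.
_CODE_SPAN = re.compile(r"`[^`]*`")
# A word wrapped in angle brackets, e.g. <minor>.
_ANGLE_WORD = re.compile(r"<[A-Za-z][\w-]*>")


def find_unprotected_placeholders(line: str) -> list:
    """Return <word> tokens on a line that sit outside backtick code spans."""
    # Drop code spans first, so anything already wrapped is ignored.
    without_code = _CODE_SPAN.sub("", line)
    return _ANGLE_WORD.findall(without_code)


if __name__ == "__main__":
    print(find_unprotected_placeholders("Version <major>.<minor>.<patch>"))
    print(find_unprotected_placeholders("Version `<major>.<minor>.<patch>`"))
```

Note that this simple pattern would also flag intentional HTML tags, so its output is a list of candidates to review rather than a list of definite errors.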
seanbudd pushed a commit that referenced this pull request Jan 19, 2024
Related #15945, #16024

Summary of the issue:
The documentation folder in the NVDA directory contains files that the user does not need.

For example: keyCommandsDoc.py

Description of user facing changes
The structure of the subfolders in documentation is the same as in the release version, with documentation/styles.css removed.

Description of development approach
In setup.py, exclude the corresponding file.