Add World of Warcraft TOC file lexer #2244

Merged
merged 11 commits into from
Sep 21, 2022

t-mart
Contributor

@t-mart t-mart commented Sep 21, 2022

This PR adds lexer support for World of Warcraft TOC files, which are metadata files that describe addons that augment the game. See 1, 2 for examples of TOC files.

This is a widely developed and distributed file format: on CurseForge, an addon index, there are roughly 10,000 addons. The top addons have hundreds of millions of downloads (see 1, 2, 3).

This PR has support for all constructs in a TOC file:

  • Tags, including determination of tag type: official (including locale), user-defined, or non-conforming tags
  • Addon files
  • Comments
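
For readers unfamiliar with the format, here is a rough sketch of how the three constructs above can be told apart line by line. It is not the lexer from this PR: the `classify_toc_line` helper, its regex, and the category names are hypothetical, and it assumes that tags look like `## Key: Value`, that other lines starting with `#` are comments, and that remaining non-blank lines are addon file paths. It does not attempt to determine the tag type.

```python
import re

# Hypothetical pattern (an assumption for illustration): "##", a key, a colon, a value.
TAG_RE = re.compile(r"^##\s*([^:\s][^:]*?)\s*:\s*(.*)$")


def classify_toc_line(line: str) -> str:
    """Return a coarse category for one line of a TOC file."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if TAG_RE.match(stripped):
        return "tag"
    if stripped.startswith("#"):
        # Anything else starting with "#" is treated as a comment,
        # including "##" lines that have no "Key: Value" shape.
        return "comment"
    # Assumed to be a path to an addon file.
    return "file"


sample = """\
## Interface: 90205
## Title: MyAddon
# a plain comment
Core.lua
"""

for sample_line in sample.splitlines():
    print(classify_toc_line(sample_line), "|", sample_line)
```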

I based my work on nebularg/language-toc-wow, which serves as GitHub's Linguist reference, and the Wowpedia TOC Format article.

Example files have been included, and their --golden output looks good to me. All tests pass.

@t-mart
Contributor Author

t-mart commented Sep 21, 2022

Oh, also, the standard file suffix for TOCs (.toc) conflicts with an existing TeX suffix. I'm not sure what to do about that.

Contributor

@jeanas jeanas left a comment

Some nits, generally looks OK.

For the conflict with TeX on *.toc, you need to write an analyse_text method that returns a score indicating how probable it is that the file is a Warcraft TOC file. Are there any recurring patterns occurring in those files?
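
To illustrate the kind of scoring meant here, below is a minimal standalone sketch. In Pygments, `analyse_text` is defined on the lexer class and returns a float between 0.0 and 1.0; the function here only mimics that contract without importing Pygments. The name `score_wow_toc`, the patterns, and the weights are all assumptions made for illustration, not the heuristic that was eventually merged.

```python
import re

# Hypothetical tells for a Warcraft TOC file (assumed, not taken from the PR).
_INTERFACE_RE = re.compile(r"^##\s*Interface\s*:\s*\d+", re.MULTILINE)
_TITLE_RE = re.compile(r"^##\s*Title\s*:", re.MULTILINE)
# LaTeX-generated .toc files are made of \contentsline entries.
_TEX_RE = re.compile(r"\\contentsline")


def score_wow_toc(text: str) -> float:
    """Return a score in [0.0, 1.0]; higher means more likely a Warcraft TOC."""
    if _TEX_RE.search(text):
        # A strong hint that this is a TeX table of contents instead.
        return 0.0
    score = 0.0
    if _INTERFACE_RE.search(text):
        score += 0.5  # illustrative weight
    if _TITLE_RE.search(text):
        score += 0.25  # illustrative weight
    return min(score, 1.0)


wow_sample = "## Interface: 90205\n## Title: MyAddon\nCore.lua\n"
tex_sample = "\\contentsline {section}{Introduction}{1}\n"
print(score_wow_toc(wow_sample), score_wow_toc(tex_sample))
```

When two lexers claim the same filename pattern, the one whose score is higher for the given text is preferred, so the weights only need to be meaningful relative to what the competing lexer returns.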

t-mart and others added 3 commits September 21, 2022 10:22
Co-authored-by: Jean Abou-Samra <jean@abou-samra.fr>
Co-authored-by: Jean Abou-Samra <jean@abou-samra.fr>
@t-mart
Contributor Author

t-mart commented Sep 21, 2022

Are there any recurring patterns occurring in those files?

There are definitely tells, yes. I'll have to look into how to implement this method and how this TOC format differs from TeX's.

Working on this today.

@t-mart
Contributor Author

t-mart commented Sep 21, 2022

@jean-abou-samra, thank you for your feedback earlier. Since then, I have

  • added an analyse_text method to this lexer
  • incorporated your suggestions into my regex patterns
  • made some minor reformats/refactors

I also fixed some other small things I found that are unrelated to this lexer, but that impeded my development workflow:

  • The Makefile specifies the python3 executable, but this was unavailable on my Windows machine, where I only have python. Since only Python 3 is supported, dropping the 3 is probably safe, so I have removed it.
  • Speaking of supported versions, the URL in Contributing.md for supported versions was broken. I have replaced it with a working one.
  • Another Windows quirk: the default encoding for file I/O is, unfortunately, cp1252 rather than UTF-8 as on most other platforms. As a result, scripts/count_token_references.py was unable to decode some of the lexer source files. To fix this, I added an explicit UTF-8 encoding to the read_text call. (Note: this does not affect importing source files or anything -- it only manifests if you're trying to read arbitrary files. See this for more info.)
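
The encoding fix in the last bullet can be sketched as follows. This is not the actual code of scripts/count_token_references.py: the `read_source` helper and the temporary sample file are hypothetical and only show the pattern of passing an explicit encoding to `Path.read_text` so the result does not depend on the platform default.

```python
import tempfile
from pathlib import Path


def read_source(path: Path) -> str:
    """Read a source file as UTF-8 regardless of the platform's default encoding."""
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file containing non-ASCII characters, written as UTF-8.
    sample = Path(tmp) / "sample.py"
    sample.write_text("# caf\u00e9 \u2192 ok\n", encoding="utf-8")
    content = read_source(sample)

print(repr(content))
```

Without the `encoding` argument, `read_text` uses the locale's preferred encoding, which is where the Windows behaviour described above comes from.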

Are these other (minor?) changes acceptable in this PR, or should I make new ones?

@jeanas
Contributor

jeanas commented Sep 21, 2022

encoding="utf-8" is good (I can't wait for Python 3.15 where it will be the default, it will fix more bugs than it will create…).

The URL fix is good.

The python3 change is more debatable. There are still systems where python refers to Python 2. You can just run PYTHON=python make (or whatever the Windows shell syntax is) to use python.

@t-mart
Contributor Author

t-mart commented Sep 21, 2022

The python3 change is more debatable.

Yeah, I know what you mean. I don't want to get stuck on it though, so I will revert.

@t-mart
Contributor Author

t-mart commented Sep 21, 2022

I have committed the reversion to python3 and removed the or "". Awaiting your feedback.

@jeanas jeanas merged commit d48686d into pygments:master Sep 21, 2022
@jeanas
Contributor

jeanas commented Sep 21, 2022

LGTM.

@t-mart
Contributor Author

t-mart commented Sep 21, 2022

Thanks @jean-abou-samra! This was a very pleasant PR. I really appreciate your help.

@t-mart t-mart deleted the wowtoc-lexer branch September 22, 2022 15:43
@Anteru Anteru added this to the 2.14.0 milestone Dec 30, 2022