Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python recognized as Makefile #1976

Closed
kernc opened this issue Feb 17, 2019 · 15 comments · Fixed by #2278
Closed

Python recognized as Makefile #1976

kernc opened this issue Feb 17, 2019 · 15 comments · Fixed by #2278
Labels
auto-detect Issue with auto detection of language type

Comments

@kernc
Copy link

kernc commented Feb 17, 2019

(Originally reported in pdoc3/pdoc#32.)

The following <pre><code> is recognized as Makefile instead of Python (live example).

<!DOCTYPE html>
<html>
<head><meta charset="utf-8">
  
  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.14.2/styles/default.min.css">
  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.14.2/highlight.min.js"></script>
  <script>hljs.initHighlightingOnLoad();</script>
  
</head>
<body>

<pre><code>

#!python
import constraintanalysis as ca
import atmospheres as at
import unitconversions as co

designbrief = {'stloadfactor': 2, 'turnalt_m': 3050, 'turnspeed_ktas': 140}

etap = {'turn': 0.85}

designperformance = {'CLmaxclean': 1.45, 'CDminclean': 0.02541,
                     'etaprop': etap}

designdef = {'aspectratio': 10, 'sweep_le_deg': 2,
             'sweep_mt_deg': 0, 'bpr': -1}

TOW_kg = 1500

designatm = at.Atmosphere()
concept = ca.AircraftConcept(designbrief, designdef,
                             designperformance, designatm)

wingloading_pa = 1000

twreq, _, _ = concept.twrequired_trn(wingloading_pa)

turnspeed_mpstas = co.kts2mps(designbrief['turnspeed_ktas'])

pw_trn_wpn = ca.tw2pw(twreq, turnspeed_mpstas, etap['turn'])
pw_trn_hpkg = co.wn2hpkg(pw_trn_wpn)
p_trn_hp = pw_trn_hpkg * TOW_kg

print(p_trn_hp)

</code></pre>

  </body>
</html>

I believe hanging-indented lines containing spaces instead of tabs (\x09) should exclude Makefile altogether.

@joshgoebel joshgoebel added the auto-detect Issue with auto detection of language type label Oct 7, 2019
@joshgoebel
Copy link
Member

I believe hanging-indented lines containing spaces instead of tabs (\x09) should exclude Makefile altogether.

Can you elaborate? Do Makefiles alway force tabs? That's not something that's guaranteed to come thru in code snippets online though... they might have already gone thru a few different translation processes so I don't think we can necessarily expect their to be LITERAL tabs in the content.

@kernc
Copy link
Author

kernc commented Oct 7, 2019

Do Makefiles alway force tabs?

For indentation, yes. Weird errors otherwise. 😅

That's not something that's guaranteed to come thru in code snippets online though... they might have already gone thru a few different translation processes

Right, haven't really thought of that. Then again, most verbatim blocks should preserve all contained characters.


Alternatively, Makefile might be set to lower priority compared to Python (which for above snippet also triggers)?

@joshgoebel
Copy link
Member

Right, haven't really thought of that. Then again, most verbatim blocks should preserve all contained characters.

I'm not worried so much about the browser (though I haven't tested) as actually like how snippets might end up there in the first place, copied and pasted, etc...

Alternatively, Makefile might be set to lower priority compared to Python (which for above snippet also triggers)?

How would we set those priorities? :-) Right now the tuning of the auto-detect stuff (which is the only priority system we have) is all a bit of dark magic. :-)

@kernc
Copy link
Author

kernc commented Oct 9, 2019

Copy-pasting should also preserve characters! 😛

dark magic

Where is that, approximately?

@joshgoebel
Copy link
Member

Where is that, approximately?

Are you taking back the question? LOL.

Copy-pasting should also preserve characters!

You'd think, but I was imagining other ways of getting the content to the page also not just copy/paste... if you want to play around with tabs vs spaces and if that might help out, feel free and report your findings. :-)

@joshgoebel
Copy link
Member

You gave me this idea though: #2174

@joshgoebel
Copy link
Member

Out of curiosity what is the context where you simply couldn't tell it the snippet was python?

@kernc
Copy link
Author

kernc commented Oct 19, 2019

Simple indented markdown code block (pdoc3/pdoc#32). The user was able to workaround with a fenced code block.

Shebangs do give a clue, I was also thinking of that, but now it seems of too little benefit assuming only very few code snippets contain shebangs. 😕

@kernc
Copy link
Author

kernc commented Oct 20, 2019

I just thought it a makefile plugin bug. For text from example:

text = `#!python
import constraintanalysis as ca
import atmospheres as at
import unitconversions as co

designbrief = {'stloadfactor': 2, 'turnalt_m': 3050, 'turnspeed_ktas': 140}

etap = {'turn': 0.85}

designperformance = {'CLmaxclean': 1.45, 'CDminclean': 0.02541,
                     'etaprop': etap}

designdef = {'aspectratio': 10, 'sweep_le_deg': 2,
             'sweep_mt_deg': 0, 'bpr': -1}

TOW_kg = 1500

designatm = at.Atmosphere()
concept = ca.AircraftConcept(designbrief, designdef,
                             designperformance, designatm)

wingloading_pa = 1000

twreq, _, _ = concept.twrequired_trn(wingloading_pa)

turnspeed_mpstas = co.kts2mps(designbrief['turnspeed_ktas'])

pw_trn_wpn = ca.tw2pw(twreq, turnspeed_mpstas, etap['turn'])
pw_trn_hpkg = co.wn2hpkg(pw_trn_wpn)
p_trn_hp = pw_trn_hpkg * TOW_kg

print(p_trn_hp)`

the makefile relevance just seems overzealous:

hljs.listLanguages()
    .map(name => name + '\t\t' + hljs.highlight(name, text, false).r)
    .toString().replace(/,/g, '\n')
apache		0
bash		15
coffeescript		21
cpp		0
cs		17
css		0
diff		0
http		0
ini		0
java		0
javascript		20
json		0
makefile		25
xml		0
markdown		1
nginx		0
objectivec		0
perl		15
php		18
python		21
ruby		14
shell		2
sql		0

@joshgoebel
Copy link
Member

joshgoebel commented Oct 20, 2019

Many of the languages will seem overzealous for given samples... it's very hard to do magic auto-detection of languages and get it right 100% of the time.

To be clear, I do agree the relevance system can be improved, but it's far from a simple think to just sit down and "fix it"... :-) Too many things are equally valued by all languages so as to be almost meaningless (like strings or some types of comment)... they should be scored less.

Some works has happened on this though, such as my recent patch to blacklist certain popular keywords that appear in the english language as well as many programming languages.

It's hard to fix because we don't have great measurement tools to know we're moving the right direction (I'm working on some) and currently ALL the tests have to pass to accept a PR - so that makes it a hard thing to tackle say "one language at a time" like you might tackle other issues.

@joshgoebel
Copy link
Member

If you think you have ideas or answers I"d love to hear them or see sample code.

@joshgoebel
Copy link
Member

@kernc Out of curiosity did you look to see HOW makefile is getting all that relevance?

@kernc
Copy link
Author

kernc commented Oct 23, 2019

I don't have a JS debugging environment set up. But I suppose that's the way to go about it. 😄

@joshgoebel
Copy link
Member

I find a lot of the string and very common things to be poorly tuned. IE, languages might be "equal" in some regards but then string gets 1 relevance and it completely misbalances anything with a lot of strings, etc... I think all that stuff should be tuned way down, but we don't really have a way to tune the system right now exact as a "whole" and that's tough.

joshgoebel added a commit to joshgoebel/highlight.js that referenced this issue Nov 12, 2019
Closes highlightjs#1976.

- Makefile was claiming 2 relevance for every `x =` pattern
- This is a little ridiculous IMHO.
- I'm not sure any other grammars get points for assignment,
  so 2 seems way off to me.
- Cleaning up the grammar and lowering this rule back to
  1 relevance.
@joshgoebel
Copy link
Member

Root cause here is Makefile earned lots of points for simple assignment operations, which is a little silly. PR improves this situation greatly.

joshgoebel added a commit that referenced this issue Nov 13, 2019
* fix(makefile) fix double relevance for assignments

Closes #1976.

- Makefile was claiming 2 relevance for every `x =` pattern
- I'm not sure other grammars get points for assignment,
  so 2 seems off to me.
- Cleaning up the grammar and lowering this rule back to
  1 relevance.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-detect Issue with auto detection of language type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants