
real entropy #4

Closed

oprogramador opened this issue Feb 4, 2020 · 10 comments

Comments

@oprogramador

IMO Shannon entropy isn't a good measurement because a given string repeated 100 times has the same entropy as when it appears only once.
Of course, repeating the same sequence doesn't add much information, but it does add some.

IMO:

  • abcd -> log_2 (4) which gives 2
  • abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcd (abcd repeated 100 times) -> log_2 (4 + log_2 (100)) = 3.41

https://www.shannonentropy.netmark.pl/calculate
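
For illustration, here is a minimal per-character Shannon entropy sketch (plain JavaScript, not the plugin's actual code) showing that the repeated string scores exactly the same:

```js
// Minimal per-character Shannon entropy: H = -sum(p_i * log_2(p_i))
function shannonEntropy(str) {
  const counts = {};
  for (const ch of str) counts[ch] = (counts[ch] || 0) + 1;
  return Object.values(counts).reduce((sum, count) => {
    const p = count / str.length;
    return sum - p * Math.log2(p);
  }, 0);
}

console.log(shannonEntropy('abcd'));             // 2
console.log(shannonEntropy('abcd'.repeat(100))); // 2 as well, despite 100x more data
```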

@oprogramador
Author

oprogramador commented Feb 4, 2020

or another example - according to Shannon, the entropy of a is 0 and the entropy of aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa is 0 as well

@oprogramador
Author

Or:
abcdefabcdef -> 2.58496
abcdefcbafed -> 2.58496

@oprogramador
Author

Or:
01 -> 1
0100001011110101000000010000100100001100110101001100011101110100110101011110110111110001110111110100 -> 1

@oprogramador
Author

01 -> 1
00001 -> 0.72193
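(For reference, these values follow directly from the character frequencies: 01 gives -(1/2) log_2 (1/2) - (1/2) log_2 (1/2) = 1, while 00001 gives -(4/5) log_2 (4/5) - (1/5) log_2 (1/5) ≈ 0.72193.)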

@nickdeis
Owner

Hey @oprogramador,
Thank you for the compelling issue. I'm currently researching this. I have added this plugin to a few of the larger projects I work on. I think the current problem is that the false positives tend to be actual words.
This isn't an issue until you have large inline strings containing things like paragraphs (e.g. auto-generated docs).
I'm currently trying to think of a good solution to this. Let me know what your thoughts are.
I'm going to keep brainstorming. Maybe some NLP?
Cheers,
Nick

@oprogramador
Author

oprogramador commented Feb 27, 2020

@nickdeis

That's my solution: https://github.com/oprogramador/eslint-plugin-no-credentials/blob/master/src/calculateStrongEntropy.js

It multiplies the Shannon entropy plus 1 by the zipped data length minus 20 (because the zipped length is always at least 20).
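
A rough sketch of that approach (the exact code is in the linked file; gzip is assumed here as the compressor, since its output is around 20 bytes even for empty input):

```js
const zlib = require('zlib');

// Same per-character Shannon entropy as above.
function shannonEntropy(str) {
  const counts = {};
  for (const ch of str) counts[ch] = (counts[ch] || 0) + 1;
  return Object.values(counts).reduce((sum, count) => {
    const p = count / str.length;
    return sum - p * Math.log2(p);
  }, 0);
}

// "Strong" entropy: weight the Shannon entropy by the compressed length,
// so repetition still contributes (a little) extra information.
function strongEntropy(str) {
  const zippedLength = zlib.gzipSync(Buffer.from(str)).length;
  return (shannonEntropy(str) + 1) * (zippedLength - 20);
}

console.log(strongEntropy('abcd'));             // small
console.log(strongEntropy('abcd'.repeat(100))); // larger, because the gzipped length grows
```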

@oprogramador
Author

@nickdeis
Owner

nickdeis commented Mar 1, 2020

Super interesting. Wouldn't entropy and compression rates be collinear? I suppose this ends up being a weighted measure of entropy and string length. Was any reference material used to come up with this?

@nickdeis
Owner

Closing as over a year old

@oprogramador
Author

@nickdeis

I invented my own approach in my library to get a relatively good measure of information quantity.
