Idea: Add stemming #1

voz · 2013-01-09T11:03:13Z

Reduce derived words to their stems (stemming) and then match only the stems. It might be more computationally intensive, but the list should become easier to maintain and more bullshit could be discovered.
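
A minimal sketch of the idea, assuming a deliberately crude suffix stripper (the hypothetical `crude_stem`, not Porter or Snowball) and a hypothetical three-word list. The list stores one stem per entry, and every word in the text is stemmed the same way before lookup, so "monetized" and "monetization" both hit the single "monetize" entry.

```python
import re

# Hypothetical, deliberately crude suffix list (longest variants first).
# A real stemmer applies many more rules and conditions than this.
SUFFIXES = ("ization", "ations", "ation", "izing", "ments", "ment",
            "ized", "izes", "ize", "ings", "ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix (keeping >= 3 chars), then a final 'e'."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            break
    # Normalise 'leverage' and 'leveraging' to the same stem 'leverag'.
    if w.endswith("e") and len(w) > 3:
        w = w[:-1]
    return w


# Hypothetical word list: each entry is kept once and matched by its stem.
BULLSHIT_STEMS = {crude_stem(w) for w in ("monetize", "leverage", "empowerment")}


def find_bullshit(text: str) -> list[str]:
    """Return the words in `text` whose stem is on the list."""
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if crude_stem(w) in BULLSHIT_STEMS]


print(find_bullshit("We monetized it by leveraging empowerment and synergies"))
```

Hand-written rules like these over- and under-stem easily ("synergies" becomes "synergi", not "synergy"), which is why a proper stemming library would be preferable if one turns out to be good enough.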

mourner · 2013-01-09T11:08:57Z

Agreed! Some words become bullshit only in combination, but there are others that definitely should be stemmed. Thanks for the idea!

calvinmetcalf · 2013-01-09T15:47:14Z

We could add a point value to words, or just put them in groups with the same bullshit level, and modify the bs value based on proximity to other bullshit words. For example, with a threshold of 1, 'monetize' might have 1.2 and always be bullshit, while 'functionality' has 0.8 and so isn't bullshit on its own; but if it's 3 words away from 'empowerment' (0.8), it becomes bullshit: 0.8 + (0.8/3) = 1.07.
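
A minimal sketch of that scoring rule, assuming "3 words away" means a difference of 3 in word position, that each listed neighbour adds its weight divided by its distance, and a hypothetical cut-off (`MAX_DISTANCE`) beyond which neighbours are ignored. The weights are the illustrative values from the comment, not calibrated ones.

```python
import re

# Illustrative weights from the comment; not calibrated values.
WEIGHTS = {"monetize": 1.2, "functionality": 0.8, "empowerment": 0.8}
THRESHOLD = 1.0
MAX_DISTANCE = 5  # hypothetical: neighbours further away contribute nothing


def score_words(text: str) -> list[tuple[str, float]]:
    """Score each listed word: own weight + sum(neighbour weight / distance)."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
    hits = [(i, t) for i, t in enumerate(tokens) if t in WEIGHTS]
    scores = []
    for i, word in hits:
        score = WEIGHTS[word]
        for j, other in hits:
            distance = abs(i - j)
            if 0 < distance <= MAX_DISTANCE:
                score += WEIGHTS[other] / distance
        scores.append((word, score))
    return scores


def bullshit_words(text: str) -> list[str]:
    """Return the listed words whose proximity-adjusted score reaches THRESHOLD."""
    return [w for w, s in score_words(text) if s >= THRESHOLD]


print(bullshit_words("Great functionality"))                      # 0.8 alone: not flagged
print(bullshit_words("New functionality for total empowerment"))  # 0.8 + 0.8/3 each
```

The bonus is symmetric, so 'empowerment' is pushed over the threshold by 'functionality' just as much as the other way round. Exact-match lookup also means "Monetized" would not hit "monetize" here; combining this with stemming would cover that.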

mourner · 2013-01-09T15:52:21Z

Lol, that's an awesome idea. :) It may be hard to implement, though, and tough to assign/maintain the values.
It should be discussed in a separate issue, I think; it's quite different from the stemming proposal.

voz · 2013-01-09T15:52:50Z

Yes, but the usual trick here is to come up with the right weights. How do we know that 'monetize' should have 1.2 and not 1.875?

calvinmetcalf · 2013-01-09T15:54:05Z

My bad, I was thinking of solutions to the issue of words that aren't bullshit by themselves.

voz · 2013-01-09T15:57:19Z

The idea of weights is a good one; the only catch is that one needs a set of manually classified bullshit texts in order to derive the values. But we can discuss it in another issue, as @mourner mentioned.
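
One way such values could be derived, sketched under assumptions: given hypothetical hand-labelled `(text, is_bullshit)` pairs, a word's weight is the smoothed log-odds of it appearing in a bullshit text versus a non-bullshit one (Bernoulli naive-Bayes style, with add-`alpha` smoothing). This is a swapped-in standard technique, not something proposed in the thread.

```python
import math
import re
from collections import Counter


def learn_weights(labeled: list[tuple[str, bool]], alpha: float = 1.0) -> dict[str, float]:
    """Per-word log-odds of appearing in bullshit vs. non-bullshit texts.

    `labeled` holds (text, is_bullshit) pairs; `alpha` is add-alpha smoothing
    so a word seen in only one class does not produce a division by zero.
    """
    bs_counts, ok_counts = Counter(), Counter()
    n_bs = n_ok = 0
    for text, is_bs in labeled:
        words = set(re.findall(r"[a-z]+", text.lower()))  # count each word once per text
        if is_bs:
            bs_counts.update(words)
            n_bs += 1
        else:
            ok_counts.update(words)
            n_ok += 1
    weights = {}
    for word in bs_counts.keys() | ok_counts.keys():
        p_bs = (bs_counts[word] + alpha) / (n_bs + 2 * alpha)
        p_ok = (ok_counts[word] + alpha) / (n_ok + 2 * alpha)
        weights[word] = math.log(p_bs / p_ok)  # > 0 leans bullshit, < 0 leans not
    return weights


corpus = [  # hypothetical labelled examples
    ("we monetize synergy", True),
    ("we monetize engagement", True),
    ("we fixed a bug", False),
    ("we shipped a bug fix", False),
]
w = learn_weights(corpus)
print(round(w["monetize"], 3), round(w["we"], 3), round(w["bug"], 3))
```

These weights live on a log-odds scale centred on 0, not on the 1.2/0.8 scale from the earlier example, so the threshold would have to be chosen on held-out labelled texts as well.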

calvinmetcalf · 2013-01-09T19:42:27Z

I experimented with some of the available stemming libraries; neither Porter Stemmer nor Snowball.js is really at a level that is usable here.

calvinmetcalf mentioned this issue on Jan 9, 2013 in Weighted bullshit assessment #6 (Open)