
Stroke caps #32

Closed · wants to merge 6 commits into base: master

@chanind
Contributor

chanind commented Feb 18, 2018

This PR is an attempt at addressing #28. This PR adds stroke caps by the following algorithm:

  • Assume all # # L # # strings in the stroke are possible clipping points.
  • Check the angle of the stroke around each clipping point to see if it's flat or if there's a real angle (flat means it's probably an intersection with another stroke rather than a clip).
  • Check the distance of each clipping point to each other stroke to verify that another stroke is clipping it.
  • Calculate a tangent line by looking a few pixels back from each clipping point.
  • Use a cubic bezier curve with d = 30 (somewhat arbitrarily chosen, but it seems to work) along the tangent lines.
  • If there are 2 clipping points next to each other (e.g. in 木), combine them and use d = 1.4 x <dist to middle clipping point> to make sure the bezier curve extends far enough to cover the whole stroke.
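A minimal sketch of two of the steps above (the helper names are hypothetical; the real logic lives in the stroke_caps folder):

```javascript
// Step 1: treat every "L x y" command in the SVG path string as a
// candidate clipping point.
function findBridgeCandidates(pathString) {
  const candidates = [];
  const regex = /L\s*(-?\d+(?:\.\d+)?)[\s,]+(-?\d+(?:\.\d+)?)/g;
  let match;
  while ((match = regex.exec(pathString)) !== null) {
    candidates.push({ x: parseFloat(match[1]), y: parseFloat(match[2]) });
  }
  return candidates;
}

// Last step: build a cubic bezier cap between two clip points. The control
// points sit distance d along the tangent lines (assumed here to be unit
// direction vectors) estimated a few pixels back from each clip point.
function capPath(p1, tangent1, p2, tangent2, d = 30) {
  const c1 = { x: p1.x + tangent1.x * d, y: p1.y + tangent1.y * d };
  const c2 = { x: p2.x + tangent2.x * d, y: p2.y + tangent2.y * d };
  return `C ${c1.x} ${c1.y} ${c2.x} ${c2.y} ${p2.x} ${p2.y}`;
}
```

The emitted `C` command splices directly back into the stroke's path data, replacing the straight clip line with a rounded cap.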

This PR includes the updated graphics.txt and a stroke_caps folder with the code to update the graphics.txt file. You can run the update script with cd stroke_caps; npm install; node updateGraphicsTxt.js -v --input ../graphics.txt. The code is a bit of a mess - I can clean it up some more if needed.

9346 / 9510 characters were modified. Below are some before / after examples:

[six before / after screenshots]

chanind added a commit to chanind/hanzi-writer-data that referenced this pull request Feb 19, 2018

Stroke caps (#1)
* Pulling in stroke-cap data from skishore/makemeahanzi#32
@chanind


Contributor

chanind commented Feb 22, 2018

I updated Hanzi Writer to use this data. You can see all the stroke-capped characters here: https://chanind.github.io/hanzi-writer-data/

@skishore


Owner

skishore commented Apr 13, 2018

This is a really nice approach. Awesome work. Sorry that I haven't taken a look at it until now!

I have a tendency to get overly detailed with code reviews at work which I don't want to bring here, so let me just make a few high-level comments. You tell me how feasible these things are - if they will take too much work, then I am happy to merge these commits without the changes.

  • This code might be more appropriate for the tool branch. That branch has a bunch of logic that's similar to what you have here - for example, there's a getPolygonApproximation method here which does basically the same thing as your getOutlinePoints. If this code ran there, then we could re-run it and display the results when editing character data.
  • I try to avoid using NPM in general here, instead shrink-wrapping and pulling in external dependencies that I really need and avoiding others. If integrated with the tool branch, I think Meteor would take care of Babel, commander would be replaced with console usage, and the SVG utils already exist there.
  • In getBridges, I'm surprised that searching for L # # works - it's pretty counterintuitive! I think the bridges can be equivalently defined as the set of points that appear in multiple strokes and that there might be some edge cases where the line approach would make a mistake. I'm not sure.
  • I'm also unsure of why some of the NaN handling is needed - I guess there are bridges that are parallel so their intersection is empty? Might be good to write an explanation in a comment.
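If I'm reading the NaN question right, here is a small sketch of where the NaN comes from when two bridge lines are parallel (illustrative only, not the PR's actual code): solving for a line intersection divides by a determinant that is zero for parallel directions.

```javascript
// Lines p1 + t*d1 and p2 + s*d2; solve for t by Cramer's rule.
function intersect(p1, d1, p2, d2) {
  const det = d1.x * d2.y - d1.y * d2.x; // zero when directions are parallel
  const t = ((p2.x - p1.x) * d2.y - (p2.y - p1.y) * d2.x) / det;
  return { x: p1.x + t * d1.x, y: p1.y + t * d1.y };
}

// Parallel lines: det === 0, so t is NaN or ±Infinity and the "intersection"
// coordinates come out as NaN / Infinity — hence the special-case handling.
const bad = intersect({ x: 0, y: 0 }, { x: 1, y: 0 },
                      { x: 0, y: 1 }, { x: 1, y: 0 });
```

A comment to this effect next to the NaN checks would make the intent clear.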

Overall, my inclination is, let's merge this now and I can follow up with some of these changes if they are actually needed. What do you think?

@chanind


Contributor

chanind commented Apr 14, 2018

Ah I didn't know about the tools branch! Should I reopen this PR against that branch? Using shrink wrap and removing babel and the other requirements makes sense. Getting rid of the commander stuff should be no problem too.

Ah yeah looking for sets of points in 2 strokes instead of looking for "L" should work too! Whichever is easiest.

Yeah, the NaN stuff is to handle parallel lines. I'll add a comment about that!

If it's easier for you to merge and make these changes yourself go for it!

@skishore


Owner

skishore commented Apr 16, 2018

Let's merge this now. The way you've set it up (in a separate directory so the only dependency is the format of the data) will make refactoring possible at any point. Do you mind rebasing onto master?

@chanind


Contributor

chanind commented Apr 16, 2018

It looks like this is up to date with master, so it should be good to go.

@skishore


Owner

skishore commented Apr 17, 2018

I rebase rather than merging to keep a linear history of commits, so I did that offline. Merged!

About 1% of characters actually change when I re-run the script. Any idea why it would be non-deterministic in that way?

Getting this into the tool branch is well-worth it. There are two huge advantages of doing it that way:

  • Only running this code when a character's data changes will be fast (~1s incremental update)
  • The new graphics.txt values can be used in the SVGs, too.

It will take a little work to integrate this code into the tool branch properly, but here's a very quick-and-dirty first integration: the tool server can simply write a fixed "graphics.txt" temp file for the given character, run the script, and read it back in. I'll get that working over the weekend so we can have really nice SVG outputs, then update the README!

@skishore skishore closed this Apr 17, 2018

@chanind


Contributor

chanind commented Apr 17, 2018

Thanks for the feedback, cleanup, and merging!

> About 1% of characters actually change when I re-run the script. Any idea why it would be non-deterministic in that way?

I suspect it's from the rounding done in fixStrokes here. This might mean that after the script first runs, there are a few bridges that just barely didn't meet the thresholds for correction before but do now. I'll double-check to see if this is what's going on.

Integrating into the tools branch sounds like the right thing to do. Let me know if there's more I can do to help!

@chanind


Contributor

chanind commented Apr 26, 2018

I looked into the non-determinism in more depth. It turns out the cause isn't rounding, but the way the shape of each stroke is estimated using 1000 points along the outline of the path. After the first run of this script, the shapes of the strokes change slightly due to the stroke caps being added, and as a result these estimation points also shift slightly. These points are used to calculate distances and cosine similarity, so those calculations change value slightly. The bridges that get modified are just barely above the cosine similarity threshold on the first run and just barely below it on the second, purely due to the noise of where the estimation points happen to land. It looks like the strokes being modified should in fact have been modified in the first run. An example of one of these strokes is shown below:

[screenshot of an affected stroke]

This could be fixed by using more estimation points (maybe 2000 - 5000 or so?), and by increasing LOWER_COS_SIM_THRESH slightly, maybe to 0.91 or so.
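The threshold sensitivity can be sketched like this (the numbers are illustrative; only the LOWER_COS_SIM_THRESH name comes from the actual script):

```javascript
// Cosine similarity between two 2D direction vectors.
function cosineSimilarity(v1, v2) {
  const dot = v1.x * v2.x + v1.y * v2.y;
  return dot / (Math.hypot(v1.x, v1.y) * Math.hypot(v2.x, v2.y));
}

const LOWER_COS_SIM_THRESH = 0.9;

// Two nearly identical direction estimates, differing only by the kind of
// small shift the resampled outline points introduce between runs:
const run1 = cosineSimilarity({ x: 1, y: 0 }, { x: 1, y: 0.48 }); // ~0.902
const run2 = cosineSimilarity({ x: 1, y: 0 }, { x: 1, y: 0.50 }); // ~0.894
// run1 clears the threshold while run2 does not, so the bridge gets
// corrected on one run and skipped on the next.
```

Denser sampling shrinks the noise, and nudging the threshold up moves borderline bridges firmly onto one side of it.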
