
Stroke caps #32

Closed · wants to merge 6 commits into base: master

@chanind
Contributor

chanind commented Feb 18, 2018

This PR is an attempt at addressing #28. This PR adds stroke caps by the following algorithm:

  • Assume all # # L # # strings in the stroke are possible clipping points.
  • Check the angle of the stroke around each clipping point to see if it's flat or if there's a real angle (flat means it's probably an intersection with another stroke rather than a clip).
  • Check the distance of each clipping point to each other stroke to verify that another stroke is clipping it.
  • Calculate a tangent line by looking a few pixels back from each clipping point.
  • Use a cubic bezier curve with d = 30 (somewhat arbitrarily chosen, but it seems to work) along the tangent lines.
  • If there are 2 clipping points next to each other (e.g. in 木), combine them and use d = 1.4 x <dist to middle clipping point> to make sure the bezier curve extends far enough to cover the whole stroke.
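A minimal sketch of two of the steps above (the helper names are hypothetical; the real logic lives in the stroke_caps folder):

```javascript
// Step 1: treat every "L x y" command in the SVG path string as a
// candidate clipping point.
function findBridgeCandidates(pathString) {
  const candidates = [];
  const regex = /L\s*(-?\d+(?:\.\d+)?)[\s,]+(-?\d+(?:\.\d+)?)/g;
  let match;
  while ((match = regex.exec(pathString)) !== null) {
    candidates.push({ x: parseFloat(match[1]), y: parseFloat(match[2]) });
  }
  return candidates;
}

// Last step: build a cubic bezier cap between two clip points. The control
// points sit distance d along the tangent lines (assumed here to be unit
// direction vectors) estimated a few pixels back from each clip point.
function capPath(p1, tangent1, p2, tangent2, d = 30) {
  const c1 = { x: p1.x + tangent1.x * d, y: p1.y + tangent1.y * d };
  const c2 = { x: p2.x + tangent2.x * d, y: p2.y + tangent2.y * d };
  return `C ${c1.x} ${c1.y} ${c2.x} ${c2.y} ${p2.x} ${p2.y}`;
}
```

The emitted `C` command splices directly back into the stroke's path data, replacing the straight clip line with a rounded cap.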

This PR includes the updated graphics.txt and a stroke_caps folder with the code to update the graphics.txt file. You can run the update script with cd stroke_caps; npm install; node updateGraphicsTxt.js -v --input ../graphics.txt. The code is a bit of a mess - I can clean it up some more if needed.

9346 / 9510 characters were modified. Below are some before / after examples:

[six before / after screenshots]

chanind added a commit to chanind/hanzi-writer-data that referenced this pull request Feb 19, 2018

Stroke caps (#1)
* Pulling in stroke-cap data from skishore/makemeahanzi#32
@chanind


Contributor

chanind commented Feb 22, 2018

I updated Hanzi Writer to use this data. You can see all the stroke-capped characters here: https://chanind.github.io/hanzi-writer-data/

@skishore


Owner

skishore commented Apr 13, 2018

This is a really nice approach. Awesome work. Sorry that I haven't taken a look at it until now!

I have a tendency to get overly detailed with code reviews at work which I don't want to bring here, so let me just make a few high-level comments. You tell me how feasible these things are - if they will take too much work, then I am happy to merge these commits without the changes.

  • This code might be more appropriate for the tool branch. That branch has a bunch of logic that's similar to what you have here - for example, there's a getPolygonApproximation method here which does basically the same thing as your getOutlinePoints. If this code ran there, then we could re-run it and display the results when editing character data.
  • I try to avoid using NPM in general here, instead shrink-wrapping and pulling in external dependencies that I really need and avoiding others. If integrated with the tool branch, I think Meteor would take care of Babel, commander would be replaced with console usage, and the SVG utils already exist there.
  • In getBridges, I'm surprised that searching for L # # works - it's pretty counterintuitive! I think the bridges can be equivalently defined as the set of points that appear in multiple strokes and that there might be some edge cases where the line approach would make a mistake. I'm not sure.
  • I'm also unsure of why some of the NaN handling is needed - I guess there are bridges that are parallel so their intersection is empty? Might be good to write an explanation in a comment.
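If I'm reading the NaN question right, here is a small sketch of where the NaN comes from when two bridge lines are parallel (illustrative only, not the PR's actual code): solving for a line intersection divides by a determinant that is zero for parallel directions.

```javascript
// Lines p1 + t*d1 and p2 + s*d2; solve for t by Cramer's rule.
function intersect(p1, d1, p2, d2) {
  const det = d1.x * d2.y - d1.y * d2.x; // zero when directions are parallel
  const t = ((p2.x - p1.x) * d2.y - (p2.y - p1.y) * d2.x) / det;
  return { x: p1.x + t * d1.x, y: p1.y + t * d1.y };
}

// Parallel lines: det === 0, so t is NaN or ±Infinity and the "intersection"
// coordinates come out as NaN / Infinity — hence the special-case handling.
const bad = intersect({ x: 0, y: 0 }, { x: 1, y: 0 },
                      { x: 0, y: 1 }, { x: 1, y: 0 });
```

A comment to this effect next to the NaN checks would make the intent clear.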

Overall, my inclination is, let's merge this now and I can follow up with some of these changes if they are actually needed. What do you think?

@chanind


Contributor

chanind commented Apr 14, 2018

Ah I didn't know about the tools branch! Should I reopen this PR against that branch? Using shrink wrap and removing babel and the other requirements makes sense. Getting rid of the commander stuff should be no problem too.

Ah yeah looking for sets of points in 2 strokes instead of looking for "L" should work too! Whichever is easiest.

Yeah, the NaN stuff is to handle parallel lines. I'll add a comment about that!

If it's easier for you to merge and make these changes yourself go for it!

@skishore


Owner

skishore commented Apr 16, 2018

Let's merge this now. The way you've set it up (in a separate directory so the only dependency is the format of the data) will make refactoring possible at any point. Do you mind rebasing onto master?

@chanind


Contributor

chanind commented Apr 16, 2018

It looks like this is up to date with master, so it should be good to go.

@skishore


Owner

skishore commented Apr 17, 2018

I rebase rather than merging to keep a linear history of commits, so I did that offline. Merged!

About 1% of characters actually change when I re-run the script. Any idea why it would be non-deterministic in that way?

Getting this into the tool branch is well-worth it. There are two huge advantages of doing it that way:

  • Only running this code when a character's data changes will be fast (~1s incremental update)
  • The new graphics.txt values can be used in the SVGs, too.

It will take a little work to integrate this code into the tool branch properly, but here's a very quick-and-dirty first integration: the tool server can simply write a fixed "graphics.txt" temp file for the given character, run the script, and read it back in. I'll get that working over the weekend so we can have really nice SVG outputs, then update the README!

@skishore skishore closed this Apr 17, 2018

@chanind


Contributor

chanind commented Apr 17, 2018

Thanks for the feedback, cleanup, and merging!

> About 1% of characters actually change when I re-run the script. Any idea why it would be non-deterministic in that way?

I suspect it's from the rounding done in fixStrokes here. This might mean that after the script first runs, there are a few bridges that just barely didn't meet the thresholds for correction before but do now. I'll double-check to see if this is what's going on.

Integrating into the tools branch sounds like the right thing to do. Let me know if there's more I can do to help!

@chanind


Contributor

chanind commented Apr 26, 2018

I looked into the non-determinism in more depth. It turns out the cause isn't rounding, but the way the shape of each stroke is estimated using 1000 points along the outline of the path. After the first run of this script, the shapes of the strokes change slightly due to the stroke caps being added, and as a result these estimation points also shift slightly. These points are used to calculate distances and cosine similarity, so those calculations change value slightly. The bridges that get modified are just barely above the cosine similarity threshold on the first run and just barely below it on the second, purely due to the noise of where the estimation points happen to land. It looks like the strokes being modified should in fact have been modified in the first run. An example of one of these strokes is shown below:

[screenshot of an affected stroke]

This could be fixed by using more estimation points (maybe 2000 - 5000 or so?), and by increasing LOWER_COS_SIM_THRESH slightly, maybe to 0.91 or so.
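The threshold sensitivity can be sketched like this (the numbers are illustrative; only the LOWER_COS_SIM_THRESH name comes from the actual script):

```javascript
// Cosine similarity between two 2D direction vectors.
function cosineSimilarity(v1, v2) {
  const dot = v1.x * v2.x + v1.y * v2.y;
  return dot / (Math.hypot(v1.x, v1.y) * Math.hypot(v2.x, v2.y));
}

const LOWER_COS_SIM_THRESH = 0.9;

// Two nearly identical direction estimates, differing only by the kind of
// small shift the resampled outline points introduce between runs:
const run1 = cosineSimilarity({ x: 1, y: 0 }, { x: 1, y: 0.48 }); // ~0.902
const run2 = cosineSimilarity({ x: 1, y: 0 }, { x: 1, y: 0.50 }); // ~0.894
// run1 clears the threshold while run2 does not, so the bridge gets
// corrected on one run and skipped on the next.
```

Denser sampling shrinks the noise, and nudging the threshold up moves borderline bridges firmly onto one side of it.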
