Implement a hard fork to normalize claim names #159

lbrynaut · 2018-06-12T14:11:18Z

This is a complete rewrite of #102 and replaces it, since it is more complete and handles situations that #102 does not.

This PR contains a lot of changes and I expect a detailed review will take some time to ensure correctness. I also expect the reproducible_build script will need further modification (likely won't work as-is again due to the added ICU dependency).

@kaykurokawa The "claimtriebranching_hardfork_disktest" unit test does not seem to work with this PR. If you have a second to see what's going on there, it would help. The failure came up after things were working and I then rebased to latest, so the test is commented out for now.

EDIT: @kaykurokawa This last comment is no longer true, so can be ignored now.

lbrynaut · 2018-06-26T19:47:42Z

Addresses #65

~~WIP regarding travis/dependency addition.~~

Updated.

BrannonKing · 2018-10-22T14:42:11Z

@bvbfan, excellent analysis. With our work on #44, it appears that we achieved independence from methods taking a name in rpc/claimtrie.cpp. I think that is critical to your suggestion that we don't do string normalization in the CClaimTrie (instead, we do it all in the "cache"). I really like the suggestion! I think that puts a high priority on getting #44 merged and rebasing our normalization on top of that.
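
The layering discussed in this comment can be pictured with a minimal sketch. It is not the PR's code: `BaseTrieSketch`, `NormalizingCacheSketch`, and `normalizeForSketch` are hypothetical names, and ASCII lower-casing stands in for the ICU-backed normalization the PR depends on. The point is only that the base structure stores names verbatim while the cache layer is the single place that normalizes.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Stand-in for the real normalization (an assumption: ASCII lower-casing only).
inline std::string normalizeForSketch(const std::string& name)
{
    std::string out(name);
    for (char& c : out)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return out;
}

// Hypothetical base trie: stores whatever name it is given, no normalization.
class BaseTrieSketch
{
public:
    void insert(const std::string& name, int amount) { claims[name] += amount; }

    int amountFor(const std::string& name) const
    {
        auto it = claims.find(name);
        return it == claims.end() ? 0 : it->second;
    }

private:
    std::map<std::string, int> claims;
};

// Hypothetical cache layer: the only place where names are normalized.
class NormalizingCacheSketch
{
public:
    explicit NormalizingCacheSketch(BaseTrieSketch& baseIn) : base(baseIn) {}

    void insert(const std::string& name, int amount)
    {
        base.insert(normalizeForSketch(name), amount);
    }

    int amountFor(const std::string& name) const
    {
        return base.amountFor(normalizeForSketch(name));
    }

private:
    BaseTrieSketch& base;
};
```

With this split, inserting "Hello" and "HELLO" through the cache lands both under the same key in the base structure, and the base structure itself never needs to know about normalization.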

bvbfan · 2018-10-30T13:25:55Z

src/claimtrie.cpp

    return removeClaim(newName, outPoint, nHeight, throwaway, false);
 }

-bool CClaimTrieCache::spendClaim(const std::string& name, const COutPoint& outPoint, int nHeight, int& nValidAtHeight) const
+bool CClaimTrieCache::spendClaim(std::string& name, const COutPoint& outPoint, int nHeight, int& nValidAtHeight) const


I don't see a reason to make the parameter in/out.

BrannonKing (Oct 30, 2018): The out parameter is used in mining.cpp and main.cpp (to avoid duplicate work on that name). Out parameter usage is not very obvious, though. I'm open to other suggestions.

bvbfan (Oct 30, 2018): Yeah, I see now. The right approach, to me, is to call normalize before spend; we can prevent double work if we have a function like isNormalized that checks before the real normalization. But I don't think a double call will slow things down significantly.
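
A minimal sketch of the alternative suggested in this thread, under stated assumptions: `SpendCacheSketch`, `normalizeSketch`, and `isNormalizedSketch` are hypothetical, and ASCII lower-casing stands in for the real normalization. The spend method keeps a `const` reference parameter; the caller normalizes once and reuses the result, and the callee's cheap check avoids repeating the work.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Placeholder normalization (an assumption: ASCII lower-casing only).
inline std::string normalizeSketch(const std::string& name)
{
    std::string out(name);
    for (char& c : out)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return out;
}

// Hypothetical cheap check in the spirit of the suggested isNormalized.
inline bool isNormalizedSketch(const std::string& name)
{
    for (unsigned char c : name)
        if (std::isupper(c))
            return false;
    return true;
}

// Hypothetical cache whose spend method takes the name by const reference.
struct SpendCacheSketch
{
    std::set<std::string> claims;

    bool spendClaim(const std::string& name)
    {
        // Normalize defensively, but skip the work if the caller already did it.
        const std::string key = isNormalizedSketch(name) ? name : normalizeSketch(name);
        return claims.erase(key) > 0;
    }
};
```

A caller such as the mining or validation code would compute `normalizeSketch(name)` once, pass it to `spendClaim`, and keep using the same normalized string afterwards, so no out parameter is needed.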

bvbfan · 2018-11-05T12:31:17Z

src/claimtrie.cpp

@@ -2819,3 +2859,49 @@ bool CClaimTrieCache::forkForExpirationChange(bool increment) const
    return true;
 }

+bool CClaimTrieCache::shouldNormalize() const {


The function's opening brace goes on a new line; that's the tidy expectation.

BrannonKing (Nov 5, 2018): I haven't run the formatter on this yet. I assume that would pick this up?

bvbfan (Nov 5, 2018): Yes, it will.

bvbfan · 2018-11-05T12:31:37Z

src/claimtrie.cpp

+    return nCurrentHeight > Params().GetConsensus().nNormalizedNameForkHeight;
+}
+
+std::string CClaimTrieCache::normalizeClaimName(const std::string& name, bool force) const {


Opening brace on new line.
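
For readers following the hunk above, the height gate can be sketched in isolation. This is an illustration under assumptions, not the PR's implementation: `ForkCacheSketch` is hypothetical, the fork height is a plain member instead of coming from `Params().GetConsensus()`, ASCII lower-casing stands in for the real normalization, and `force` is assumed to bypass the height check. Function braces open on a new line, as the review asks.

```cpp
#include <cassert>
#include <cctype>
#include <string>

struct ForkCacheSketch
{
    int nCurrentHeight;
    int nNormalizedNameForkHeight; // stand-in for the consensus parameter

    // Normalization applies only strictly after the fork height.
    bool shouldNormalize() const
    {
        return nCurrentHeight > nNormalizedNameForkHeight;
    }

    // Returns the name unchanged before the fork unless force is set (assumed semantics).
    std::string normalizeClaimName(const std::string& name, bool force = false) const
    {
        if (!force && !shouldNormalize())
            return name;
        std::string out(name);
        for (char& c : out)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return out;
    }
};
```

Because the comparison is strict, a cache sitting exactly at the fork height still returns names unchanged; only the next height starts normalizing.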

BrannonKing · 2018-11-08T05:48:55Z

I did finally get a green on the Linux compilation again. Some notes on my issues with the reproducible_build.sh:

- The ICU_PREFIX was being exported to the child makefiles but wasn't always set to a real value when relying on pkg-config, which caused issues. I changed the child makefiles to rely wholly upon ICU_LIBS and ICU_CPPFLAGS, which should always be set.
- I could not get ICU v55 to compile statically, so I upped it to v57. Our current version of Boost (v59) would not compile with newer versions of ICU.
- The script sets "-e" (the errexit option; "-u" only concerns unset variables), which makes it exit whenever any command returns nonzero. It also sets pipefail, which makes a pipeline report the failure of any command in it rather than only the last one. The flag handling in the wait method was always returning 1 on my machine, causing the whole script to exit. I ditched the flag usage there in favor of a sub-terminal.
- The parameters for the boost b2 command weren't getting grouped appropriately, so I put them in quotes. This made it so that I could not make that call from the "background" function. I'm hoping someone else has a solution for that. The background function seems to run it fine; it just doesn't find the ICU deps when called from there. I could not figure that out. I did add a grep check to ensure that the ICU deps are found.
- Because I'm now compiling Boost without the background minute message, TravisCI times out at 10 minutes when compiling Boost afresh. I don't have a solution for this yet. It wants output every 10 minutes. It's ridiculous that the timeout is not configurable. Instead, we push the output to a log file for each dependency build. I'm not sure that's actually a desirable feature. The other side effect of that is how we hide any error reported by our script with a postdump of 200 or 1000 lines of the log file.
- The TravisCI caching keeps the build folder around. Our script was checking to see if the dependency's parent folder existed there. If it did exist, it would not run make on that dependency. Hence, if you fail on a dependency, it won't get rebuilt because of the TravisCI cache. Not only that, but build/boost was always there, even if I deleted the cached file from TravisCI. I had to change the script to always build dependencies, which is fast for all except openssl, which thinks it's cool to rewrite a whole bunch of source code files with every configure call.
- The OSX build now fails with this error: "Too many levels of symbolic links" while compiling openssl a second time. I don't have a solution for that yet.
- Statically linking in ICU requires fPIC. I had to add that to the build of ICU.
- The "clone" flag that didn't do anything is now gone.
- The reproducible_build originally used pkg-config to find ICU, but it would possibly return a system installation. It's now hard-coded to the one pulled by the script.

The advantage of Docker is that you don't have to recompile the dependencies every time. We can stop wasting time on that. I feel strongly that we need to move in that direction.

BrannonKing · 2018-11-22T16:01:16Z

#235 supersedes this PR.

lbrynaut requested a review from kaykurokawa June 12, 2018 14:11

lbry-bot assigned kaykurokawa Jun 12, 2018

lbrynaut force-pushed the normalization-rewrite branch 2 times, most recently from 8a3c999 to 5b6dfe0 Compare June 12, 2018 15:28

lyoshenka mentioned this pull request Jun 13, 2018

Claim name normalization #102

Closed

lbrynaut force-pushed the normalization-rewrite branch 5 times, most recently from 2a7ba31 to 41921dd Compare June 21, 2018 01:19

lbrynaut mentioned this pull request Jun 26, 2018

Case-insensitive claims #65

Closed

lbrynaut force-pushed the normalization-rewrite branch 18 times, most recently from 1f8358c to f32b48d Compare June 28, 2018 00:34

lbryio deleted a comment from bvbfan Oct 22, 2018

BrannonKing mentioned this pull request Oct 22, 2018

Decode protobuf-encoded claim values and return them in JSONRPC responses #206

Closed

BrannonKing force-pushed the normalization-rewrite branch from b88e679 to 24ca2a9 Compare October 29, 2018 23:03

lbryio deleted a comment from bvbfan Oct 29, 2018

bvbfan reviewed Oct 30, 2018

lbrynaut and others added 6 commits November 2, 2018 14:32

- 2d35509: Implement a hard fork to normalize claim names (apply (self) review feedback; clean deps required for boost to rebuild with icu support (for now); normalization bug fixes and improvements; Clang-formatting)
- b281e55: No longer abort on removal from claimtrie in the case of expiration where normalization is enabled and the claim no longer exists (due to normalization related narrowing)
- 0c029cc: removed duplicate trie in RAM, other norm fixes (also includes code to validate incoming utf8)
- e22f5da: moved normalization from claimTrie; all in cache now (also fixed a few post-merge issues)
- a4a498b: added handling for support normalization
- d9de9e4: made expiration at norm time smarter (also rearranged unit test code to avoid some spurious errors)

BrannonKing force-pushed the normalization-rewrite branch from 65867d5 to d9de9e4 Compare November 2, 2018 20:39

BrannonKing requested changes Nov 2, 2018

reproducible_build.sh (outdated, resolved)

bvbfan reviewed Nov 5, 2018

shyba mentioned this pull request Nov 5, 2018

Support unicode URIs lbryio/lbry-sdk#1591

Closed

BrannonKing added 5 commits November 6, 2018 16:53

- b244df6: made ICU compile statically (also fixed early exit in reproducible_build script)
- 2d2ceb8: fixed failure to handle unnormalized items reinserted on rollback
- 733658a: fixing ICU deps for Travis build
- 93187e4: get more info on Travis build failure
- 686253a: Travis experiment 2: ICU fPIC

This was referenced Nov 8, 2018

The main normalization branch that needs review pronto #235

Merged

Add encoding for binary data returned from RPC, make upstream handle it and UTF8 #236

Closed

BrannonKing approved these changes Nov 8, 2018

BrannonKing closed this Nov 22, 2018
