Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace use of AppCache PROJECT section with strongly-typed structures #1415

Merged
merged 13 commits into from Mar 18, 2019

Conversation

Projects
None yet
5 participants
@cyrossignol
Copy link
Member

commented Mar 9, 2019

To ease access to project whitelist data, these changes introduce new data structures in the neural network namespace and a lock-free API for storing and loading that data. The PR removes the PROJECT section from the AppCache and updates any routines which accessed the data in that section to use the new API.

This lays some groundwork for upcoming changes that need thread-safe access to the whitelist data. It also improves the performance and thread-safety of the scraper by avoiding whitelist copies and by removing the need to synchronize access to the AppCache for that information. As a side benefit, the implementation removes some old project URL rules that became obsolete with the new scraper.

This is possibly one in a series of updates that move the project away from the use of the AppCache--a global, generic pool of string data--to more meaningful types with a greater degree of thread-safety and less overhead.

@denravonska

This comment has been minimized.

Copy link

commented on src/neuralnet/neuralnet.cpp in 9d9a976 Mar 8, 2019

You could extract this to an IsContractTypeValid, so this and Delete becomes something like:

if(!IsContractTypeValid(type))
   return false;

whitelist.Add(key, value, timestamp);
return true;

This comment has been minimized.

Copy link
Owner Author

replied Mar 8, 2019

Ah, yeah, good suggestion. They're just basic wrappers for now. I was thinking these functions might handle/forward more than project contracts, though, so it gets into something like:

if (type == "project" || type == "projectmapping") {
    whitelist.Add(...);
} else if (type == "beacon") {
    beacons.Add(...);
} else if ... 

We could give those types a ContractHandler interface of some sort, so AddContract() and DeleteContract() just need to do something like:

IContractHandler handler = GetContractHandler(type);

if (handler /* is null or however you check that in C++ */) {
    return false;
}

handler.Add(...);
@cyrossignol

This comment has been minimized.

Copy link
Owner Author

commented on src/scraper/scraper.cpp in c024aaf Mar 8, 2019

Note: it was impossible for this to fail (return false).

@cyrossignol

This comment has been minimized.

Copy link
Owner Author

commented on src/main.cpp in c024aaf Mar 8, 2019

Conditional contract message handling gives us a way to gradually refactor sections of data out of the AppCache.

@cyrossignol

This comment has been minimized.

Copy link
Owner Author

commented on src/scraper/scraper.cpp in c024aaf Mar 8, 2019

Global Whitelist facade should eliminate a copy from the AppCache each Scraper loop iteration.

@cyrossignol

This comment has been minimized.

Copy link
Owner Author

commented on src/scraper/scraper.cpp in c024aaf Mar 8, 2019

Global Whitelist facade should eliminate several local copy operations for lock avoidance in local routines.

@cyrossignol

This comment has been minimized.

Copy link
Owner Author

commented on src/scraper/scraper.cpp in c024aaf Mar 8, 2019

We found that the Contains() calls in the conditionals were returning false every time, so the scraper wasn't following the old rules. It appears to work anyway:

WCG doesn't fetch the ".xml.gz" file (it works because WCG provides a file with just a ".gz" extension as well now), and Einstein appears to allow http as well as https libcurl is following the redirect (curl_easy_setopt(curl.get(), CURLOPT_FOLLOWLOCATION, 1L); in scraper/http.cpp -- the old NN must not have). The "gorlaeus" check was for Leiden Classical which is no longer whitelisted.

This comment has been minimized.

Copy link
Owner Author

replied Mar 9, 2019

Note: we may want to change the Einstein URL from

http://einstein.phys.uwm.edu/@

...to

https://einsteinathome.org/@

...so the scraper doesn't send two requests each time it needs to contact the project.

@jamescowens jamescowens requested review from jamescowens and denravonska Mar 9, 2019

@jamescowens

This comment has been minimized.

Copy link
Member

commented Mar 9, 2019

I am in favor of the contract handler interface if we are going to generalize this approach for other sections (which I think is wise BTW).

@jamescowens
Copy link
Member

left a comment

cycy has already been testing this and it is working well. I like the idea of using a contract handler interface...

@jamescowens

This comment has been minimized.

Copy link
Member

commented Mar 9, 2019

We will need to get quez to change the URL with an appcache entry for the Einstein project.

@jamescowens jamescowens added this to the Denise milestone Mar 9, 2019

//! The internal implementation uses this class to sort containers of \c Project
//! instances. \c std::sort() cannot sort a collection of \c const objects.
//!
struct MutableProject

This comment has been minimized.

Copy link
@denravonska

denravonska Mar 11, 2019

Member

I don't think we need MutableProject. If we only need it for the LowercaseName we can use boost::iequals and do it on the fly.

This comment has been minimized.

Copy link
@cyrossignol

cyrossignol Mar 11, 2019

Author Member

I'm working on a way to get rid of MutableProject. It mainly exists because I declared the member fields of Project as const, so std::sort() can't operate on the vector because it wants to move the values into memory marked immutable as it rearranges the items. LowercaseName() might speed up the sort comparison because each project name is compared multiple times, but we sort the set so infrequently that it doesn't really matter...would like to get rid of the extra struct.

This comment has been minimized.

Copy link
@jamescowens

jamescowens Mar 11, 2019

Member

This may be an idiotic question but why do the member fields need to be const?

This comment has been minimized.

Copy link
@cyrossignol

cyrossignol Mar 11, 2019

Author Member

They...probably don't. I wanted to guarantee that calling code can't modify the contract data in a Project. The interface of WhitelistSnapshot seems to prevent this from happening. Need to find some time to verify that. Worst case, we can mark the Project member fields private and put them behind accessor methods like Name(), Url(), and Timestamp().

This comment has been minimized.

Copy link
@jamescowens

jamescowens Mar 11, 2019

Member

I would rather do that, and then you can do a sort on the vector.

Show resolved Hide resolved src/neuralnet/project.cpp Outdated
//! \param url Project URL from contract message value.
//! \param ts Contract timestamp.
//!
Project(const std::string name, const std::string url, const int64_t ts);

This comment has been minimized.

Copy link
@denravonska

denravonska Mar 11, 2019

Member

Pass non-POD arguments by reference or some compilers will create copies, see https://godbolt.org/z/ARmoPE. Unless that's what we want here due to thread synchronization.

This comment has been minimized.

Copy link
@cyrossignol

cyrossignol Mar 11, 2019

Author Member

Admittedly, I don't fully understand pass-by-value, pass-by-reference, copy, and move semantics in C++ yet. I'm following the guidelines from here:

  • If the function intends to change the argument as a side effect, take it by non-const reference.
  • If the function doesn't modify its argument and the argument is of primitive type, take it by value.
  • Otherwise take it by const reference, except in the following cases
    • If the function would then need to make a copy of the const reference anyway, take it by value.

Based on this advice, I think I need to pass by value because Project takes ownership of those arguments. We already have to create copies of these values somewhere up the call stack when we extract the strings from the transaction message, so passing by value here enables the compiler to optimize the call by moving the string arguments instead of copying them. If we pass references, the compiler can't use move semantics because the reference indicates that the caller might need to hold onto the value, so it performs another copy.

I see the extra allocation in the assembly from the tool you linked, but I can't seem to construct an example to test the above... the compiler seems to optimize the entire thing out. Sorry for being dense--still trying to learn this stuff. Are the recommendations that I'm following reasonable?

This comment has been minimized.

Copy link
@denravonska

denravonska Mar 12, 2019

Member

If we pass references, the compiler can't use move semantics because the reference indicates that the caller might need to hold onto the value, so it performs another copy.

Bloody hell, you're correct. https://www.codesynthesis.com/~boris/blog/2012/06/19/efficient-argument-passing-cxx11-part1/. You live, you learn. It even looks like a modern compiler (gcc-8) will do the std::move for you while an older one (gcc-6.3) requires some guidance from the developer.

This comment has been minimized.

Copy link
@denravonska

denravonska Mar 12, 2019

Member

btw, are you sure you can pass it as const if its content shall be moved? Looking at godbolt gcc-8 seems to figure it out and moves/optimizes it anyway whereas gcc-6.3 still creates a copy.

This comment has been minimized.

Copy link
@cyrossignol

cyrossignol Mar 12, 2019

Author Member

@denravonska From what I understand, a const parameter doesn't block the move optimization. I spent way too much time trying to find information about this because it's rather confusing, and I couldn't find anything definitive in the case of const. I ended up building a small program to try to figure this out. Here's the way that I see it--I might be entirely wrong, so please let me know if I've made any mistakes or false judgements:

Let's simplify the the signature for the Project constructor so we're not talking about std::string:

Project(const Name project_name);

We now initialize Project by passing it a Name object, which might hold a bunch of data that's expensive to copy. Here's the key point that took me a long time to realize: the compiler needs to allocate project_name before it gives it to the constructor of Project, so the decision to move or copy happens before binding the object to the project_name parameter no matter if it's const or not.

For moving to work, Name needs an implicitly or explicitly defined move constructor that takes an rvalue reference from another instance of the same type:

Name(const NameClass& from); // copy constructor, vs:
Name(NameClass&& from);      // move constructor

As an intermediate step, the compiler creates a Name value using one of these constructors from the passed value before binding it to the function. If a valid move constructor exists, the compiler can choose to use it when the value received by Project() is an rvalue reference, like a return value from a function:

Name GetProjectName() 
{
    Name project_name;
    project_name.value = "einstein@home";

    return project_name; // returns an rvalue reference to project_name
}

Project project(GetProjectName()); 

As shown above, the compiler needs to construct a Name object from the return value of GetProjectName() to pass to the Project constructor. Normally, return-value optimization kicks in, so an extra instantiation isn't needed, but if it doesn't, the next best choice is move construction.

This doesn't work for more common lvalues:

Name project_name = GetProjectName();
Project project(project_name); // copy!

Here, the return value is assigned to a variable, so it becomes an lvalue. If we try to pass that to the Project constructor, the compiler does a normal copy. Traditionally, it seems like C++ gets around this by using references:

Project::Project(const Name& project_name) : m_name(project_name) { ... }

By passing a reference to the constructor, the compiler doesn't add an immediate copy operation. However, because Project needs to take ownership of the value of project_name, we have to copy the value behind the reference into the m_name member field anyway.

It seems that C++11 provides the std::move() utility to work around these cases. The function converts an lvalue variable into an xvalue for functions that take rvalues to enable move construction (value...value...value...whew!).

Name project_name = GetProjectName();
Project project(std::move(project_name)); // project_name resources moved here

Our previous discussion above about Project's internal const fields illustrates how const does block moves by preventing the compiler from moving values into and out of the object's member fields. But for function parameters, the move occurs before the object binds to the const parameter.

The Whitelist data structures don't really benefit from move optimizations because their copy-on-write design requires exactly the opposite functionality, but I want to make sure I understand these concepts for any other code that I write. Similarly, moving doesn't work below--the compiler falls back to a copy of the const variable:

const Name project_name = GetProjectName();
Project project(std::move(project_name)); // copy!

CC: @jamescowens - would like your eyes on this too for fact checking 😅

This comment has been minimized.

Copy link
@cyrossignol

cyrossignol Mar 17, 2019

Author Member

@philipswift No claims of sanity were made here 🙂

This comment has been minimized.

Copy link
@tomasbrod

tomasbrod Mar 18, 2019

Member

I fail to find cases where passing by const value makes sense. By-value will create copy, no matter const, unless the function is inlined. But const pointer or const reference is good and useful.

This comment has been minimized.

Copy link
@tomasbrod

tomasbrod Mar 18, 2019

Member

In your last example, @cyrossignol , std::move turns & into &&. But it does not change const-ness. Since there is no ctor(const T&&), && decays to & and ctor(const T&) is used. Which is copy ctor.

This comment has been minimized.

Copy link
@tomasbrod

tomasbrod Mar 18, 2019

Member

Passing non-const value makes sense for values smaller than pointer and for values that will be moved inside the function. But move operation is not free either. Referencing is cheapest.
Impact of all this is reduced for code that is all in one translation unit. Because then the compiler may use (partial) inlining and do these optimizations for you.

This comment has been minimized.

Copy link
@cyrossignol

cyrossignol Mar 18, 2019

Author Member

Thanks, @tomasbrod -- I appreciate you taking the time to explain...this was one of the more confusing parts about learning C++ for me, and I'm finally starting to understand it.

During my testing, I was thrown-off by what appeared to be the compiler optimizing-away the copy for some (non-trivial) const by-value arguments, but I noticed it doesn't happen every time. Also, it seems as if moving into a const argument works just fine because there's often a compiler-generated move constructor. It's just for moving out of a const value that the compiler should perform a copy, but I think I observed that it doesn't always (maybe when it can avoid it).

I asked @denravonska about it on Slack:

All this makes sense to me so far, and I've settled on the belief that we can't use const if we want moves to work. There seems to be a conflict between const-correctness and move semantics. I'd like to know your opinion: is one more important than the other in the design of good C++ code? It makes sense to declare a parameter const if the function should not modify it, but we could cause an unwanted copy because the compiler really isn't supposed to move out of a const value. On the other hand, is it a bad form to omit const just to enable moves? Do you think there's a best practice?

...and he gave me similar advice: as a rule-of-thumb, non-const by-value, or const by-ref.

So, based on both of your suggestions, I'll write functions with:

  • non-const by-value parameters when the function needs to copy to take ownership anyway
  • const& parameters otherwise
  • mutable references or pointers only where strictly necessary

I've updated the code in this PR based on your recommendations. We can review in the next iteration.

//!
//! \brief Smart pointer around a collection of projects.
//!
typedef std::shared_ptr<std::vector<Project>> ProjectListPtr;

This comment has been minimized.

Copy link
@denravonska

denravonska Mar 11, 2019

Member

It could be a good idea to also typedef std::vector<Project> ProjectList for readability later on.

This comment has been minimized.

Copy link
@cyrossignol

cyrossignol Mar 11, 2019

Author Member

I like that--will add to the next version.

This comment has been minimized.

Copy link
@jamescowens

jamescowens Mar 11, 2019

Member

Yep, similar to the kind of stuff I did in the scraper...

Show resolved Hide resolved src/neuralnet/project.h
const std::string& value,
const int64_t& timestamp)
{
if (type == "project" || type == "projectmapping") {

This comment has been minimized.

Copy link
@tomasbrod

tomasbrod Mar 14, 2019

Member

what is projectmapping?

This comment has been minimized.

Copy link
@cyrossignol

cyrossignol Mar 14, 2019

Author Member

Looks like an old contract type name used before "project"... I didn't see any instances of that one on testnet... need to scan mainnet--if it doesn't exist there either, we can remove it.

This comment has been minimized.

Copy link
@jamescowens

jamescowens Mar 14, 2019

Member

Yep. Carried over from the old code.... probably vestigial.

return BaseUrl() + "stats/";
}

return BaseUrl() + "stats/" + type + ".gz";

This comment has been minimized.

Copy link
@tomasbrod

tomasbrod Mar 14, 2019

Member

This is wrong assumption. Project stats urls are not required to adhere to .../stats/type.gz .

This comment has been minimized.

Copy link
@cyrossignol

cyrossignol Mar 14, 2019

Author Member

Agreed--until we change the format of URLs in the project contracts, I'm just doing it similarly to the way the old code does.

//! data with snapshots until the application mutates the data by adding or
//! removing a project. During mutation, a \c Whitelist object copies its data
//! and stores that behind a new smart pointer as an atomic operation, so any
//! existing \c WhitelistSnapshot instances become obsolete.

This comment has been minimized.

Copy link
@tomasbrod

tomasbrod Mar 14, 2019

Member

What is the use case of mutable whitelist snapshot? Only admin message processing should modify whitelist. And that happens under mutex lock.

This comment has been minimized.

Copy link
@cyrossignol

cyrossignol Mar 14, 2019

Author Member

A snapshot is immutable. The comment about mutation applies only to the step that loads contracts from transaction messages. I'm trying to make it easier and faster to access the data by avoiding the lock on a mutex.

This comment has been minimized.

Copy link
@tomasbrod

tomasbrod Mar 14, 2019

Member

If it was me, I would KISS and just copy the structure to scraper when it needs to run.

This comment has been minimized.

Copy link
@cyrossignol

cyrossignol Mar 14, 2019

Author Member

Thanks @tomasbrod. I'm trying to KISS for the calling code that actually uses the project data. By abstracting the the details behind these classes, consumers don't need to worry about synchronization or efficient memory management. I'm working on beacon stuff that needs project data, and we could present it in other areas or features as well. The project data changes very infrequently, so I couldn't justify copying it every time something uses it 😄

@tomasbrod

This comment has been minimized.

Copy link
Member

commented Mar 14, 2019

lock-free API

shared_ptr contain mutex, therefore it is not lock free

I appreciate this patch. It will be useful to have Project and Whitelist data structures for future changes to reward calculation. Not limited to the NN namespace.

@jamescowens

This comment has been minimized.

Copy link
Member

commented Mar 14, 2019

Are you sure that is true Tomas? From https://en.cppreference.com/w/cpp/memory/shared_ptr

All member functions (including copy constructor and copy assignment) can be called by multiple threads on different instances of shared_ptr without additional synchronization even if these instances are copies and share ownership of the same object. If multiple threads of execution access the same shared_ptr without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur; the shared_ptr overloads of atomic functions can be used to prevent the data race.

If their was an implicit mutex then why would all of those cautions be necessary, and why the overloads of atomic functions?

@cyrossignol

This comment has been minimized.

Copy link
Member Author

commented Mar 14, 2019

@tomasbrod The information that I read while trying to figure out how to do this suggests that the implementation of shared_ptr uses atomics for reference counting when the target platform supports them--otherwise, it falls back to using a mutex. The data wrapped by the pointer requires its own manual synchronization if needed. I'm pretty new to C++, so maybe I misunderstood.

@jamescowens

This comment has been minimized.

Copy link
Member

commented Mar 14, 2019

Anyway I agree with @tomasbrod this a hell of an improvement over the old code and will be a good basis for extension. @cyrossignol let's get this cleaned up so we can move towards a merge.

@cyrossignol

This comment has been minimized.

Copy link
Member Author

commented Mar 17, 2019

I think I've addressed all the reviews. Thanks to @denravonska, we have tests for these changes. I filled in the gaps there as requested.

I'll push the move to a contract handler interface after this. The scope is a little more involved, so it will go into a separate PR.

@jamescowens

This comment has been minimized.

Copy link
Member

commented Mar 17, 2019

@cyrossignol I don't necessarily think the PR merge needs to wait on this, but listdata needs to be fixed up to use the new structure for keytype "project" so that listdata continues to work regardless of whether in the old appcache structure or new. This will have to be extended each time we move a section out of the appcache.

@jamescowens

This comment has been minimized.

Copy link
Member

commented Mar 17, 2019

@cyrossignol, @denravonska and I discussed this and we have agreed that we actually should retire the listdata call as we peel off sections into their own structures. This means we should implement separate calls for each of the former "sections". You had a whitelist test rpc call that you removed that does this perfectly for this first section, project. I would recommend renaming it to getprojects. I think we should stick with the naming convention get where is the old section name.

Please do this and I will merge the PR.

@jamescowens jamescowens merged commit d407125 into gridcoin-community:development Mar 18, 2019

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

denravonska added a commit that referenced this pull request May 10, 2019

4.0.3.0-leisure
Added:
 - Replace NeuralNetwork with portable C++ scraper #1387 (@jamescowens,
   @tomasbrod, @Cycy, @TheCharlatan, @denravonska).
 - Allow compile flags to be used for depends #1423 (@G-UK).
 - Add stake splitting and side staking info to getmininginfo #1424
   (@jamescowens).
 - Add freedesktop.org desktop file and icon set #1438 (@a123b).

Changed:
 - Disable Qt for windows Travis builds #1276 (@TheCharlatan).
 - Replace use of AppCache PROJECT section with strongly-typed structures #1415
   (@cyrossignol).
 - Change dumpwallet to use appropriate data directory #1416 (@jamescowens).
 - Optimize ExtractXML() calls by avoiding unnecessary string copies #1419
   (@cyrossignol).
 - Change signature of IsLockTimeWithinMinutes #1422 (@jamescowens).
 - Restore old poll output for getmininginfo RPC #1437 (@a123b).
 - Prevent segfault when using rpc savescraperfilemanifest #1439 (@jamescowens).
 - Improve miner status messages for ineligible staking balances #1447
   (@cyrossignol).
 - Enhance scraper log archiving #1449 (@jamescowens).

Fixed:
 - Re-enable full GUI 32-bit Windows builds - part of #1387 (@jamescowens).
 - Re-activate Windows Installer #1409 (@TheCharlatan).
 - Fix Depends and Travis build issues for ARM #1417 (@jamescowens).
 - Fix syncupdate icons #1421 (@jamescowens).
 - Fix potential BOINC crash when reading projects #1426 (@cyrossignol).
 - Fix freeze when unlocking wallet #1428 (@denravonska).
 - Fix RPC after high priority alert #1432 (@denravonska).
 - Fix missing poll in GUI when most recent poll expired #1455 (@cyrossignol).

Removed:
 - Remove old, rudimentary side staking implementation #1381 (@denravonska).
 - Remove auto unlock #1402 (@denravonska).
 - Remove superblock forwarding #1430 (@denravonska).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.