New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
elimination of g++ -Weffc++ warnings #764
Conversation
document.h bugfixes - operator++ - document.h - noninitialized members - document_stream load_many trailing whitespace - implementation.h - missing virtual destructor - whitespaces (not me my editor) - parsedjson_iterator.h - operator= - document_stream.h - trailing blank - noninitialized members - document.h - trailing witespace - noninitialized members - operator++ - parsedjson_iterator.h - noninitialized members json_minifier.h - noninitialized members json_scanner.h - noninitialized members - trailing space - json_structural_indexer.h - noninitialized members - trailing space - stage2_build_tape.h - noninitialized members
@ostri Can you have a look at the failing tests? (In CI... see on the PR page on GitHub.) |
I think that there are good things in this PR, but the way you change the constructors is, in my opinion, for the worse. Let us look at two examples. You change this... (modern C++ code from @jkeiser) really_inline parser::parser(size_t max_capacity) noexcept
: _max_capacity{max_capacity}, loaded_bytes(nullptr, &aligned_free_char) {} into this... (C++ 'from the 1990s', Meyer's style)
Then you change this... (modern C++ code from @jkeiser) really_inline document_stream::document_stream(
dom::parser &_parser,
const uint8_t *buf,
size_t len,
size_t batch_size,
error_code _error
) noexcept : parser{_parser}, _buf{buf}, _len{len}, _batch_size(batch_size), error{_error} {
if (!error) { error = json_parse(); }
} into this... (C++ 'from the 1990s', Meyer's style)
These changes you make are semantically identical (+/- details that we can discuss), guaranteed by the C++ standards, to the shorter and cleaner version that @jkeiser wrote. There is absolutely no good reason whatsoever to explicitly call the default constructors on the attributes: this is what the standard does by default. The constructors should only initialize the methods that require an initialization different from the default constructor. Now I code like Meyer's... But I am old. I am old you would not believe. I remember when using C++ (instead of C) was working with an immature, dangerous, controversial programming language. "What is it, Daniel, is Fortran not good enough for you?" So Meyer gave us rules, and we followed them blindly, like faith. And the C++ language was and remained ugly. Until younger folks or smarter folks came along and produced nicer looking and more maintainable code. So will suggest that we leave the constructors alone, the way @jkeiser wrote them. I would have written them the way you do, but I am wrong. John's way is better, nicer. It would be a software engineering mistake to go back to the old style. |
@jkeiser Since this is (mostly) your code, I have asked a review of this PR from you. Please see my note above where I state my belief that your coding style with respect to constructors is better and should not be reverted to the 1990s C++ style that I tend to use. |
Further consideration: we compile with We further run tests with sanitizers. If, somehow, we accessed an initialized value, we would know. These are tools that Meyer did not have in the 1990s. |
So, again, there is a lot of good in this PR, my beef is only with the way you change the constructors... Let me be specific... In include/simdjson/inline/document_stream.h, maybe because a tool told you to do so, you added the following to the constructor...
but look at the class definition... simdjson/include/simdjson/document_stream.h size_t buf_start{0};
size_t next_json{0};
bool load_next_batch{true};
size_t current_buffer_loc{0};
size_t last_json_buffer_loc{0};
size_t n_parsed_docs{0};
size_t n_bytes_parsed{0}; All these variables are defined as part of the class declaration. This wasn't possible when Meyer wrote his books, but it is vastly better and highly recommended. Next, in src/generic/json_structural_indexer.h, you added the following to the constructor...
but look at the class declaration, how the attributes are declared and defined... uint64_t prev_structurals = 0;
uint64_t unescaped_chars_error = 0; |
What about the default constructors you added to the initialization lists? So we don't need to do it and, in fact, it adds noise to do it. I know that this goes against Meyer... but Meyer retired in 2015 and his books were written in the 1990s. |
@ostri So I recommend you revert back the changes you made to the constructors. If you feel that there is a mistake, a fault in there, please be specific. It is not enough that a tool or a static analyzer flagged it... we need to look at it and understand the mistake. |
amalgamation.sh
Outdated
@@ -173,7 +173,7 @@ SINGLEHDR=$SCRIPTPATH/singleheader | |||
echo "Copying files to $SCRIPTPATH/singleheader " | |||
mkdir -p $SINGLEHDR | |||
echo "c++ -O3 -std=c++17 -pthread -o ${CPPBIN} ${DEMOCPP} && ./${CPPBIN} ../jsonexamples/twitter.json ../jsonexamples/amazon_cellphones.ndjson" > $SINGLEHDR/README.md | |||
cp ${AMAL_C} ${AMAL_H} ${DEMOCPP} $SINGLEHDR | |||
cp --remove-destination --preserve=all ${AMAL_C} ${AMAL_H} ${DEMOCPP} $SINGLEHDR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of remove-destination here, for my edification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think that these flags (--remove-destination --preserve=all) are POSIX. They won't work on macOS and other system. I suspect that they are GNU-specific.
This certainly breaks the script on macOS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
problem is/was that
And there were places where members were not initialized and complier complained about that.
Where? Maybe I missed them. For the cases that I looked at, there was initialization.
For example:
/** @Private Structural indices passed from stage 1 to stage 2 */
std::unique_ptr<uint32_t[]> structural_indexes;
If you want to have whole list, try to compile with -Wellc++. ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::unique_ptr<uint32_t[]> structural_indexes;
This is not a valid error. The structural_indexes class instance will get initialized with its default constructor. This is guaranteed by the standard.
I know that -Weffc++ throws a ton of warnings, but most are invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "--remove-destination --preserve=all" is not portable and it seems unrelated to this PR. If you have an issue with the script, please let us discuss it elsewhere.
Problem was, that the files were not overwriten. This is similar to -f, but not equal.
From: John Keiser <notifications@github.com>
Sent: Wednesday, April 22, 2020 4:34 PM
To: simdjson/simdjson <simdjson@noreply.github.com>
Cc: Ostroversnik Matjaz <Matjaz.Ostroversnik@snt.si>; Mention <mention@noreply.github.com>
Subject: Re: [simdjson/simdjson] elimination of g++ -Weffc++ warnings (#764)
@jkeiser commented on this pull request.
________________________________
In amalgamation.sh<https://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fsimdjson%2Fsimdjson%2Fpull%2F764%23discussion_r413036440&data=02%7C01%7Cmatjaz.ostroversnik%40snt.si%7C5c80d2a325da454e5a9108d7e6ca2a4f%7C8c9d3c973fd941c8a2b1646f3942daf1%7C0%7C0%7C637231628274812454&sdata=GzWQGSnhlQkGMlz%2BQM1iH%2FztsqZqCurUMyKldjA1LGg%3D&reserved=0>:
@@ -173,7 +173,7 @@ SINGLEHDR=$SCRIPTPATH/singleheader
echo "Copying files to $SCRIPTPATH/singleheader "
mkdir -p $SINGLEHDR
echo "c++ -O3 -std=c++17 -pthread -o ${CPPBIN} ${DEMOCPP} && ./${CPPBIN} ../jsonexamples/twitter.json ../jsonexamples/amazon_cellphones.ndjson" > $SINGLEHDR/README.md
…-cp ${AMAL_C} ${AMAL_H} ${DEMOCPP} $SINGLEHDR
+cp --remove-destination --preserve=all ${AMAL_C} ${AMAL_H} ${DEMOCPP} $SINGLEHDR
What is the purpose of remove-destination here, for my edification?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<https://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fsimdjson%2Fsimdjson%2Fpull%2F764%23pullrequestreview-398254733&data=02%7C01%7Cmatjaz.ostroversnik%40snt.si%7C5c80d2a325da454e5a9108d7e6ca2a4f%7C8c9d3c973fd941c8a2b1646f3942daf1%7C0%7C0%7C637231628274812454&sdata=zjH6sVa2twZUXeYxBJGlT3ui4PeZBOm26dqhZH6em90%3D&reserved=0>, or unsubscribe<https://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FADDYSKS7CIC4ADKE5NJ5XQTRN354TANCNFSM4MOCIHZQ&data=02%7C01%7Cmatjaz.ostroversnik%40snt.si%7C5c80d2a325da454e5a9108d7e6ca2a4f%7C8c9d3c973fd941c8a2b1646f3942daf1%7C0%7C0%7C637231628274822409&sdata=gIBMGt2K8aO%2BM%2FCgFPPtsStSxDb2hM08OFtBksyo9vI%3D&reserved=0>.
|
Do you mean to say that cp sometimes silently doesn't copy the file? What situations does this happen in? Does it throw an error? I'd rather not add flags unless they're causing problems, but clearly you are running into one :) |
Daniel,
I'd like to apoligise, since I didn't reply to set of your mails, but I was busy fixing my last mistake (I've tested with Weffc++ only, and without Wall and consequentialy there were warnings).
At this point in time I have a version that:
* Has proper operator++
* Has no plain destructors, when virtual metods are present,
* Proper assignement operator when there are pointer members
* Members are properly initialized
* Something strange 😊
/home/ostri/v-json/simdjson/simdjson.cpp: In constructor ‘simdjson::westmere::stage1::json_minifier::json_minifier(uint8_t*)’:
/home/ostri/v-json/simdjson/simdjson.cpp:4763:56: note: synthesized method ‘constexpr simdjson::westmere::stage1::json_scanner::json_scanner()’ first required here
4763 | really_inline json_minifier(uint8_t *_dst) : dst{_dst} {}
* And it compiles with -Wall -Wextra -Weffc++
* Script is fixed
* You are aware of triplicaiton of the code in simdjson.?pp – don't know if it mistake or not.
One thing is not fixed:
/home/ostri/v-json/simdjson/simdjson.cpp:12260:17: warning: ‘simdjson::westmere::stage2::structural_iterator::c’ should be initialized in the member initialization list [-Weffc++]
In file included from /home/ostri/v-json/simdjson/simdjson.cpp:2:
/home/ostri/v-json/simdjson/simdjson.h: In instantiation of ‘T* simdjson::internal::atomic_ptr<T>::operator=(T*) [with T = const simdjson::implementation]’:
/home/ostri/v-json/simdjson/simdjson.cpp:564:82: required from here
/home/ostri/v-json/simdjson/simdjson.h:3844:40: warning: ‘operator=’ should return a reference to ‘*this’ [-Weffc++]
3844 | T* operator=(T *_ptr) { return ptr = _ptr; }
| ^~~~
If you wish I can submit it. If you don't, no problem.
1. …old vs new syntax…
, <member>()
Is my favorite syntax, since it is easier to manage code. If you prefer some other format, that is fine with me.
1. Initialisation of the members
It is good to have some discipline, either in constructor or with members. Problem is only if it is not initialized at all.
And there were places where members were not initialized and complier complained about that. 😉
1. Initialisation with default value.
I completelly agree with you that, compiler initializes by itself and it is not necessary to do it by the programmer.
I put it there only to be sure that the all members are there. It is possible to to remove them now, if you think so.
My motivation for the changes was to make header file warning less, so that I can concentrate on my nistakes and not on the library ones. 😉
Let me conclude, if I were runing the project I would apply that. But since I am not, you will decide by yourself. 😊
Best regards
Matjaž
|
It is likely that if it happens, then an error will get displayed. And there might be good reasons why the files cannot be overwritten. Unfortunately, the script will keep going even after an error. A better solution would be to be make the script robust by trapping errors. |
Where? Maybe I missed them. For the cases that I looked at, there was initialization. |
Are you referring to the code reuse between the three kernels? If so, this is very deliberate, yes. We have three distinct kernels (on x64) and @jkeiser worked hard to maximize code reuse. |
Ah. That's one case where we do not appear to have constructor-time initialization. Thank you for this report. I will fix it in its own PR. |
Regarding constructors, I do prefer having members be initialized at the point of declaration where possible. I'm bringing my prejudices as a language polyglot, replicating the things I've liked the most; I've learned, though, that it's better to work with the idioms used by those who live and breathe the language. In this case, if the contributors here are happy with inline member initializers, I certainly am too :) However, I see value in not warning when the user compiles with -Weffc++. We encourage people to drop .cpp and .h files into their projects, so being warning free for as many flags as we can manage is good (as long as it doesn't impose an insane cost, which this really doesn't seem to). NOTE that being warning free could easily mean disabling these warnings within the simdjson headers, too! If we decide we don't want to follow a particular warning, that's our choice, and we can keep the warnings from showing up in other peoples' code. StackOverflow doesn't show a way to turn off just some parts of -Weffc++ :( |
Pretty sure you are planning to do this anyway, but it'd be really good to know why -Wuninitialized didn't catch this! |
Because the compiler can prove that we always call structurals.advance_char() first before using structurals. Try to avoid or avoid advance_char() and you won't be able to build. So our code is safe. This is a pedantic change, but I am happy to make it. It is prettier anyhow. |
Guys do you have some conferencing system? I can't answer to two streams of mail. |
Give me a minute. I think we can silence these warnings with modern C++. |
@ostri oh, I understand why you're seeing triplification. There is only one copy of the triplicate code, in src/generic/*. But we include them once per implementation (haswell, westmere and fallback), just putting them in different namespaces, because we need the code to recompile with different instruction sets each time, and C++ can't do that unless you duplicate the code. So the source code is not duplicated, but when it reaches simdjson.h it is. I suggest resolving -Weffc++ warnings by compiling against the original source and headers to make this easier on yourself. You can just use -Isimdjson/include -Isimdjson/src and your same code should compile fine--just don't copy in simdjson.h and simdjson.cpp. (@lemire noted this too, this is just a longer explanation :)) |
@ostri the easiest thing to do is check github notifications or go to the comment thread here on github, I couldn't keep up with mail streams either :) |
If you don't mind sticking the comma at the end of the line, let's make it consistent with everything else. I really don't have much of a dog in this hunt, but it's a lot easier to do this than go change all the other multiline lists to use beginning-of-line commas :) |
Actually I did exactly like you suggested.
|
Not quite the same--you might be able to save some time here. If you want it to just directly give you the right file and line #, you can compile against the non-amalgamated headers. For example, from the root of the simdjson repo, this is a super easy way to do it: g++ -Weffc++ -Iinclude -Isrc examples/quickstart/quickstart.cpp src/simdjson.cpp The way you are doing it is accurate and fine, just trying to make your life easier :) |
One thing we're missing if this does make us -Weffc++ clean, is adding -Weffc++ with all the other -Wall -Wextra etc. flags in CMakeLists.txt. That way we don't regress this. |
done. But never the less the script is not cosher:
|
@jkeiser I agree, but I don't know how the semantically equivalent parameter for clang. If I put pedantic, then calculated goto warnings are reported. |
@ostri OK, understood. If there's a way to enable only for g++ it still seems like a good idea. Alternately, we can add code to disable weffc++ warnings in the computed goto locations (there's only a few of them). But if we can't figure it out, this is still good for moving forward! |
Weffc++ helped me a lot in the past. I would be glad if you can put it into the build process, so that it reports if there is something fishy. You are starting now, but when code base grows, you'll need all the help machines can provide you. Sanitizers are great, but not the silver bullet. build failed again, at the same point. I can not repeat it on my machine, nither with g++, nither with clang++. |
Looks like there is a reliable way to check for gcc in cmake: https://stackoverflow.com/questions/10046114/in-cmake-how-can-i-test-if-the-compiler-is-clang |
Great. So you can make basic build with clang and amalgamation in gcc or vice versa. This way more errors can be detected, and the sources are 100% portable too. |
@ostri To fix the CI issues, could you sync your fork? It should be as simple as this... (some steps may be unnecessary)
|
@ostri It should build if you add -Weffc++ even on clang. If it does not, we can figure it out. Please add the warning flag. |
Yes, but that's a distinct issue. We can certainly improve it... but not in this PR. |
So please...
|
¸This one fails.
Please make sure you have the correct access rights |
It compiles, but it actually do nothing. I.e. code with .Weffc++ warnings compile without the warnings. |
Actually this is ok. I do not have rights to write to
my upstream is
|
That is still good enough :) We do CI runs against gcc and clang both, so we'll catch regressions on the gcc runs! |
I put it in. With clang it works (it does nothing). But it fails with g++, because of the last remaining warning of Weffc++.
I tried some tweaks related to this operator, but it fails within the code, where I am not so sure that I understand the semantics. If I commit this to repository I'll break the build. |
What you added is fine. However, what was there before was also cross-platform. Doing...
is entirely standard... In fact, it is even more portable because CXX and CC will be recognized by virtually all build systems (make, cmake, ninja, configure). The CMake documentation is explicit: |
Typically, one's own repository is called But I trust you figured it out. |
@jkeiser I vote to ignore the failing performance test on clang6. I compared the performance with clang9. First using @ostri 's PR:
Next our master with clang9:
Pay attention to the number of instructions per structural element. The running time in cycles and ns varies from run to run, but the number of instructions is exactly the same. Thus I say that there is no measurable performance regression. |
I think @jkeiser has read this PR. I have read it carefully too. I have looked into the possible performance regression and and I have dismissed it. So I will make the executive decision of merging this without waiting further. We can come back later with the flag. |
Matjaž Ostroveršnik: I will add your name to the authors as a subsequent commit. |
I tried export stuff yesterday, but it didn't work until I removed build folder and regenerate it. |
Thanks. If you do so, for such a small change, you'll have a really long list of authors very soon. ;-) Lesson learnt from similar project (xml + database + transactions) engage all analysing tools available (linters, compiler warnings, sanitizers (memory, threads)) early in the project. |
Right. I would not normally update it. Instead, I will create a buildgcc directory, a buildclang directory... and so forth. Disk space is cheap and being able to simultaneously use multiple compilers, multiple configurations... is great. |
Yes, you are a contributor to the project. I just pushed the commit. I think it is an objective fact that you contributed code. |
trailing whitespace
json_minifier.h
json_scanner.h