elimination of g++ -Weffc++ warnings #764

Merged
merged 14 commits into from Apr 23, 2020

ostri
Contributor

ostri commented Apr 22, 2020

  • document.h bugfixes
    • operator++
    • document.h
      • uninitialized members
      • document_stream load_many: trailing whitespace
  • implementation.h
    • missing virtual destructor
    • whitespace (not me, my editor)
  • parsedjson_iterator.h
    • operator=
  • document_stream.h
    • trailing blank
    • uninitialized members
  • document.h
    • trailing whitespace
    • uninitialized members
    • operator++
  • parsedjson_iterator.h
    • uninitialized members
  • json_minifier.h
    • uninitialized members
  • json_scanner.h
    • uninitialized members
    • trailing space
  • json_structural_indexer.h
    • uninitialized members
    • trailing space
  • stage2_build_tape.h
    • uninitialized members

@ostri changed the title from "amalgamation.sh - bug fix" to "elimination of g++ -Weffc++ warnings" on Apr 22, 2020
@lemire
Member

lemire commented Apr 22, 2020

@ostri Can you have a look at the failing tests? (In CI... see on the PR page on GitHub.)

@ostri
Contributor Author

ostri commented Apr 22, 2020 via email

@lemire
Member

lemire commented Apr 22, 2020

I think that there are good things in this PR, but the way you change the constructors is, in my opinion, for the worse.

Let us look at two examples.

You change this... (modern C++ code from @jkeiser)

really_inline parser::parser(size_t max_capacity) noexcept		
   : _max_capacity{max_capacity}, loaded_bytes(nullptr, &aligned_free_char) {}

into this... (C++ 'from the 1990s', Meyers' style)

really_inline parser::parser(size_t max_capacity) noexcept
   : _max_capacity{max_capacity},
     containing_scope(),
     current_string_buf_loc(),
     doc(),
     loaded_bytes(nullptr, &aligned_free_char),
     ret_address(),
     structural_indexes() {}

Then you change this... (modern C++ code from @jkeiser)

really_inline document_stream::document_stream(
  dom::parser &_parser,
  const uint8_t *buf,
  size_t len,
  size_t batch_size,
  error_code _error
) noexcept : parser{_parser}, _buf{buf}, _len{len}, _batch_size(batch_size), error{_error} {
  if (!error) { error = json_parse(); }
}

into this... (C++ 'from the 1990s', Meyers' style)

really_inline document_stream::document_stream(
  dom::parser &_parser,
  const uint8_t *buf,
  size_t len,
  size_t batch_size,
  error_code _error
) noexcept
  : parser{_parser}
  , _buf{buf}
  , _len{len}
  , _batch_size(batch_size)
  , buf_start(0)
  , next_json(0)
  , load_next_batch(true)
  , current_buffer_loc(0)
#ifdef SIMDJSON_THREADS_ENABLED
  , last_json_buffer_loc()
#endif
  , n_parsed_docs()
  , n_bytes_parsed()
  , error(_error)
#ifdef SIMDJSON_THREADS_ENABLED
  , stage1_is_ok_thread(SUCCESS)
  , stage_1_thread()
  , parser_thread()
#endif
{
  if (!error) { error = json_parse(); }
}

These changes you make are semantically identical (+/- details that we can discuss), guaranteed by the C++ standards, to the shorter and cleaner version that @jkeiser wrote.

There is absolutely no good reason whatsoever to explicitly call the default constructors on the attributes: this is what the standard does by default.

The constructors should only initialize the members that require an initialization different from the default constructor.

Now I code like Meyers... But I am old. I am old you would not believe. I remember when using C++ (instead of C) was working with an immature, dangerous, controversial programming language. "What is it, Daniel, is Fortran not good enough for you?"

So Meyers gave us rules, and we followed them blindly, like faith. And the C++ language was and remained ugly.

Until younger folks or smarter folks came along and produced nicer looking and more maintainable code.

So I will suggest that we leave the constructors alone, the way @jkeiser wrote them. I would have written them the way you do, but I am wrong. John's way is better, nicer.

It would be a software engineering mistake to go back to the old style.

@lemire requested a review from jkeiser April 22, 2020 13:00
@lemire
Member

lemire commented Apr 22, 2020

@jkeiser Since this is (mostly) your code, I have asked a review of this PR from you. Please see my note above where I state my belief that your coding style with respect to constructors is better and should not be reverted to the 1990s C++ style that I tend to use.

@lemire
Member

lemire commented Apr 22, 2020

Further consideration: we compile with -Wuninitialized and -Werror (clang, gcc), which means that we get an error whenever we try to use an uninitialized value. So we know that it does not happen in our code.

We further run tests with sanitizers. If, somehow, we accessed an uninitialized value, we would know.

These are tools that Meyers did not have in the 1990s.

@lemire
Member

lemire commented Apr 22, 2020

So, again, there is a lot of good in this PR, my beef is only with the way you change the constructors...

Let me be specific... In include/simdjson/inline/document_stream.h, maybe because a tool told you to do so, you added the following to the constructor...

   buf_start(0)
   , next_json(0)
   , load_next_batch(true)

but look at the class definition... simdjson/include/simdjson/document_stream.h

  size_t buf_start{0};
  size_t next_json{0};
  bool load_next_batch{true};
  size_t current_buffer_loc{0};
  size_t last_json_buffer_loc{0};
  size_t n_parsed_docs{0};
  size_t n_bytes_parsed{0};

All these variables are initialized as part of the class declaration, using default member initializers. This wasn't possible when Meyers wrote his books, but it is vastly better and highly recommended.

Next, in src/generic/json_structural_indexer.h, you added the following to the constructor...

   , prev_structurals{0}
   , unescaped_chars_error{0}

but look at the class declaration, how the attributes are declared and defined...

  uint64_t prev_structurals = 0;
  uint64_t unescaped_chars_error = 0;
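
To make the point about in-class initialization concrete, here is a minimal sketch. The `stream_state` struct is hypothetical: it only borrows a few member names from the snippet above and is not simdjson's actual class. Members with default member initializers keep those values unless a constructor names them in its own initializer list, so repeating them in the constructor changes nothing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a class such as document_stream.
struct stream_state {
  // Default member initializers (C++11): applied by every constructor
  // that does not name the member in its own initializer list.
  std::size_t buf_start{0};
  std::size_t next_json{0};
  bool load_next_batch{true};

  std::size_t batch_size;  // no default, so the constructor must set it

  // Only the member that needs a non-default value is listed.
  explicit stream_state(std::size_t batch) noexcept : batch_size{batch} {}
};

// Returns true if the unlisted members picked up their in-class defaults.
inline bool defaults_are_applied() {
  stream_state s{1000};
  return s.buf_start == 0 && s.next_json == 0 && s.load_next_batch &&
         s.batch_size == 1000;
}
```

Calling `defaults_are_applied()` from any test or `main` should return `true`.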

@lemire
Member

lemire commented Apr 22, 2020

What about the default constructors you added to the initialization lists?
Well, the standard is clear: if class members are not in the initializer list and they have no default member initializer in the class declaration, then they get default-initialized, which means that for class types the default constructor is called.

So we don't need to do it and, in fact, it adds noise to do it.

I know that this goes against Meyers... but Meyers retired in 2015 and his books were written in the 1990s.
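
A minimal sketch of the rule described above, using a hypothetical `parser_like` struct (not simdjson's parser; the member names are for illustration only). Class-type members left out of the initializer list are default-constructed, so a `std::unique_ptr` member starts out null and a `std::vector` starts out empty. The same rule leaves scalar members such as `int` with an indeterminate value, which is the case where an explicit initializer is actually needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical class, for illustration only.
struct parser_like {
  std::size_t max_capacity;
  std::unique_ptr<std::uint32_t[]> structural_indexes;  // class type
  std::vector<std::uint8_t> string_buf;                 // class type

  // Only max_capacity is listed. The other two members are
  // default-initialized, which calls their default constructors.
  explicit parser_like(std::size_t cap) : max_capacity{cap} {}
};

// Returns true if the unlisted members were default-constructed.
inline bool members_are_default_constructed() {
  parser_like p{64};
  return p.structural_indexes == nullptr && p.string_buf.empty() &&
         p.max_capacity == 64;
}
```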

@lemire
Member

lemire commented Apr 22, 2020

@ostri So I recommend you revert the changes you made to the constructors. If you feel that there is a mistake, a fault in there, please be specific. It is not enough that a tool or a static analyzer flagged it... we need to look at it and understand the mistake.

amalgamation.sh Outdated
@@ -173,7 +173,7 @@ SINGLEHDR=$SCRIPTPATH/singleheader
 echo "Copying files to $SCRIPTPATH/singleheader "
 mkdir -p $SINGLEHDR
 echo "c++ -O3 -std=c++17 -pthread -o ${CPPBIN} ${DEMOCPP} && ./${CPPBIN} ../jsonexamples/twitter.json ../jsonexamples/amazon_cellphones.ndjson" > $SINGLEHDR/README.md
-cp ${AMAL_C} ${AMAL_H} ${DEMOCPP} $SINGLEHDR
+cp --remove-destination --preserve=all ${AMAL_C} ${AMAL_H} ${DEMOCPP} $SINGLEHDR
Member

What is the purpose of remove-destination here, for my edification?

Member

I do not think that these flags (--remove-destination --preserve=all) are POSIX. They won't work on macOS and other systems. I suspect that they are GNU-specific.

This certainly breaks the script on macOS.

Contributor Author

problem is/was that

And there were places where members were not initialized and the compiler complained about that.

Where? Maybe I missed them. For the cases that I looked at, there was initialization.

For example:
/** @private Structural indices passed from stage 1 to stage 2 */
std::unique_ptr<uint32_t[]> structural_indexes;

If you want to have the whole list, try to compile with -Weffc++. ;-)

Member

@ostri

std::unique_ptr<uint32_t[]> structural_indexes;

This is not a valid error. The structural_indexes class instance will get initialized with its default constructor. This is guaranteed by the standard.

I know that -Weffc++ throws a ton of warnings, but most are invalid.

Member

I think "--remove-destination --preserve=all" is not portable and it seems unrelated to this PR. If you have an issue with the script, please let us discuss it elsewhere.

@ostri
Contributor Author

ostri commented Apr 22, 2020 via email

@jkeiser
Member

jkeiser commented Apr 22, 2020

Problem was, that the files were not overwritten. This is similar to -f, but not equal.

Do you mean to say that cp sometimes silently doesn't copy the file? What situations does this happen in? Does it throw an error?

I'd rather not add flags unless they're causing problems, but clearly you are running into one :)

@ostri
Contributor Author

ostri commented Apr 22, 2020 via email

@lemire
Member

lemire commented Apr 22, 2020

Problem was, that the files were not overwritten. This is similar to -f, but not equal.

It is likely that if it happens, then an error will get displayed.

And there might be good reasons why the files cannot be overwritten.

Unfortunately, the script will keep going even after an error. A better solution would be to make the script robust by trapping errors.

@lemire
Member

lemire commented Apr 22, 2020

And there were places where members were not initialized and the compiler complained about that.

Where? Maybe I missed them. For the cases that I looked at, there was initialization.

@lemire
Member

lemire commented Apr 22, 2020

You are aware of triplication of the code in simdjson.?pp

Are you referring to the code reuse between the three kernels? If so, this is very deliberate, yes. We have three distinct kernels (on x64) and @jkeiser worked hard to maximize code reuse.

@lemire
Member

lemire commented Apr 22, 2020

‘simdjson::westmere::stage2::structural_iterator::c’ should be initialized in the member initialization list [-Weffc++]

Ah. That's one case where we do not appear to have constructor-time initialization. Thank you for this report. I will fix it in its own PR.

@jkeiser
Member

jkeiser commented Apr 22, 2020

Regarding constructors, I do prefer having members be initialized at the point of declaration where possible. I'm bringing my prejudices as a language polyglot, replicating the things I've liked the most; I've learned, though, that it's better to work with the idioms used by those who live and breathe the language. In this case, if the contributors here are happy with inline member initializers, I certainly am too :)

However, I see value in not warning when the user compiles with -Weffc++. We encourage people to drop .cpp and .h files into their projects, so being warning free for as many flags as we can manage is good (as long as it doesn't impose an insane cost, which this really doesn't seem to). NOTE that being warning free could easily mean disabling these warnings within the simdjson headers, too! If we decide we don't want to follow a particular warning, that's our choice, and we can keep the warnings from showing up in other peoples' code.

StackOverflow doesn't show a way to turn off just some parts of -Weffc++ :(
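
One possible approach to keeping warnings out of other people's builds, sketched here as an assumption rather than as what simdjson does: GCC's diagnostic pragmas can switch off the whole `-Weffc++` group for a region of a header. This silences the group as a whole; it does not pick out individual sub-warnings. The `counter` struct is hypothetical and exists only to trigger a typical `-Weffc++` complaint.

```cpp
#include <cassert>

// GCC and clang both understand `#pragma GCC diagnostic`;
// other compilers skip the pragmas thanks to the guard.
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Weffc++"
#endif

// Inside this region -Weffc++ diagnostics are suppressed, for example
// "should be initialized in the member initialization list".
struct counter {
  int value;
  counter() { value = 0; }  // assigned in the body, not the init list
  int next() { return ++value; }
};

#if defined(__GNUC__)
#pragma GCC diagnostic pop  // restore the user's warning settings
#endif
```

The `push`/`pop` pair matters in a header: without it, the suppression would leak into the code of whoever includes the file.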

@jkeiser
Member

jkeiser commented Apr 22, 2020

Ah. That's one case where we do not appear to have constructor-time initialization. Thank you for this report. I will fix it in its own PR.

Pretty sure you are planning to do this anyway, but it'd be really good to know why -Wuninitialized didn't catch this!

@lemire
Member

lemire commented Apr 22, 2020

Pretty sure you are planning to do this anyway, but it'd be really good to know why -Wuninitialized didn't catch this!

Because the compiler can prove that we always call structurals.advance_char() before using structurals. Try to remove or skip the advance_char() call and you won't be able to build.

So our code is safe. This is a pedantic change, but I am happy to make it. It is prettier anyhow.

@ostri
Contributor Author

ostri commented Apr 22, 2020

Guys do you have some conferencing system? I can't answer to two streams of mail.
I am reachable on matjaz.ostroversnik@snt.si

@lemire
Member

lemire commented Apr 22, 2020

StackOverflow doesn't show a way to turn off just some parts of -Weffc++ :(

Give me a minute. I think we can silence these warnings with modern C++.

@jkeiser
Member

jkeiser commented Apr 22, 2020

@ostri oh, I understand why you're seeing triplification. There is only one copy of the triplicate code, in src/generic/*. But we include them once per implementation (haswell, westmere and fallback), just putting them in different namespaces, because we need the code to recompile with different instruction sets each time, and C++ can't do that unless you duplicate the code.

So the source code is not duplicated, but when it reaches simdjson.h it is.

I suggest resolving -Weffc++ warnings by compiling against the original source and headers to make this easier on yourself. You can just use -Isimdjson/include -Isimdjson/src and your same code should compile fine--just don't copy in simdjson.h and simdjson.cpp.

(@lemire noted this too, this is just a longer explanation :))
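
A single-file sketch of the idea, under the assumption that a macro can stand in for the `#include` of a shared `src/generic/*` file. The macro name, the function, and its body are all hypothetical. The same source text is expanded once per implementation namespace, so each copy is a distinct function that could, in a real build, be compiled with different target options.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a shared generic source file. In a real layout this
// text would live in its own header and be pulled in with #include.
#define GENERIC_COUNT_STRUCTURALS                                 \
  inline std::size_t count_structurals(const char *s) {           \
    std::size_t n = 0;                                            \
    for (; *s != '\0'; ++s) {                                     \
      if (*s == '{' || *s == '}' || *s == '[' || *s == ']' ||     \
          *s == ':' || *s == ',') {                               \
        ++n;                                                      \
      }                                                           \
    }                                                             \
    return n;                                                     \
  }

// One expansion per implementation, each in its own namespace.
namespace haswell  { GENERIC_COUNT_STRUCTURALS }
namespace westmere { GENERIC_COUNT_STRUCTURALS }
namespace fallback { GENERIC_COUNT_STRUCTURALS }
```

There is one copy of the logic to maintain, but three functions in the resulting binary, which is also why a warning in the shared text shows up three times in the amalgamated file.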

@jkeiser
Member

jkeiser commented Apr 22, 2020

@ostri the easiest thing to do is check github notifications or go to the comment thread here on github, I couldn't keep up with mail streams either :)

@jkeiser
Member

jkeiser commented Apr 22, 2020

…old vs new syntax…

If you don't mind sticking the comma at the end of the line, let's make it consistent with everything else. I really don't have much of a dog in this hunt, but it's a lot easier to do this than go change all the other multiline lists to use beginning-of-line commas :)

@ostri
Contributor Author

ostri commented Apr 22, 2020

@ostri oh, I understand why you're seeing triplification. There is only one copy of the triplicate code, in src/generic/*. But we include them once per implementation (haswell, westmere and fallback), just putting them in different namespaces, because we need the code to recompile with different instruction sets each time, and C++ can't do that unless you duplicate the code.

So the source code is not duplicated, but when it reaches simdjson.h it is.

I suggest resolving -Weffc++ warnings by compiling against the original source and headers to make this easier on yourself. You can just use -Isimdjson/include -Isimdjson/src and your same code should compile fine--just don't copy in simdjson.h and simdjson.cpp.

Actually I did exactly as you suggested.

  1. I found the warning in the overall (amalgamated) file.
  2. I took the same string from this file, found the "partial" file and fixed it.
  3. Later I ran the amalgamation and repeated the cycle.

@jkeiser
Member

jkeiser commented Apr 22, 2020

Actually I did exactly as you suggested.

I found the warning in the overall (amalgamated) file.
I took the same string from this file, found the "partial" file and fixed it.
Later I ran the amalgamation and repeated the cycle.

Not quite the same--you might be able to save some time here. If you want it to just directly give you the right file and line #, you can compile against the non-amalgamated headers. For example, from the root of the simdjson repo, this is a super easy way to do it:

g++ -Weffc++ -Iinclude -Isrc examples/quickstart/quickstart.cpp src/simdjson.cpp

The way you are doing it is accurate and fine, just trying to make your life easier :)

@jkeiser
Member

jkeiser commented Apr 22, 2020

One thing we're missing if this does make us -Weffc++ clean, is adding -Weffc++ with all the other -Wall -Wextra etc. flags in CMakeLists.txt. That way we don't regress this.

@ostri
Contributor Author

ostri commented Apr 22, 2020

@ostri Can you undo your commit on the amalgamation script... then I think @jkeiser will green light this for merger and we'll be done.

done.

But nevertheless the script is not kosher:

  1. the compiler is hard-coded, the parameters are hard-coded.
  2. most of the variables are not quoted (e.g. problems with spaces in filenames)
  3. usage of ancient techniques (backticks)
  4. this should go into cmake. There you have all the infrastructure and it should be one of the targets.
  5. My linter reports 32 errors/warnings/notes.

@ostri
Contributor Author

ostri commented Apr 22, 2020

One thing we're missing if this does make us -Weffc++ clean, is adding -Weffc++ with all the other -Wall -Wextra etc. flags in CMakeLists.txt. That way we don't regress this.

@jkeiser I agree, but I don't know the semantically equivalent parameter for clang. If I use -pedantic, then computed goto warnings are reported.

@jkeiser
Member

jkeiser commented Apr 22, 2020

@ostri OK, understood. If there's a way to enable only for g++ it still seems like a good idea. Alternately, we can add code to disable -Weffc++ warnings in the computed goto locations (there's only a few of them). But if we can't figure it out, this is still good for moving forward!

@ostri
Contributor Author

ostri commented Apr 22, 2020

@ostri OK, understood. If there's a way to enable only for g++ it still seems like a good idea. Alternately, we can add code to disable -Weffc++ warnings in the computed goto locations (there's only a few of them). But if we can't figure it out, this is still good for moving forward!

-Weffc++ helped me a lot in the past. I would be glad if you can put it into the build process, so that it reports if there is something fishy. You are starting now, but when the code base grows, you'll need all the help machines can provide you. Sanitizers are great, but not a silver bullet.

The build failed again, at the same point. I cannot reproduce it on my machine, neither with g++ nor with clang++.

@jkeiser
Member

jkeiser commented Apr 22, 2020

Looks like there is a reliable way to check for gcc in cmake: https://stackoverflow.com/questions/10046114/in-cmake-how-can-i-test-if-the-compiler-is-clang

@ostri
Contributor Author

ostri commented Apr 22, 2020

Looks like there is a reliable way to check for gcc in cmake: https://stackoverflow.com/questions/10046114/in-cmake-how-can-i-test-if-the-compiler-is-clang

Great. So you can make the basic build with clang and the amalgamation with gcc, or vice versa. This way more errors can be detected, and the sources are 100% portable too.
Recently I switched from Eclipse to VS Code. Since it is convenient, I regularly switch compilers, especially if I get an error report that I do not understand.

@lemire
Member

lemire commented Apr 23, 2020

@ostri To fix the CI issues, could you sync your fork?

It should be as simple as this... (some steps may be unnecessary)

   git remote add upstream https://github.com/simdjson/simdjson.git
   git fetch upstream
   git checkout master
   git merge upstream/master
   git push origin master

@lemire
Member

lemire commented Apr 23, 2020

@ostri @jkeiser

Guys, guys... clang supports -Weffc++ as well....

@lemire
Member

lemire commented Apr 23, 2020

@ostri It should build if you add -Weffc++ even on clang. If it does not, we can figure it out. Please add the warning flag.

@lemire
Member

lemire commented Apr 23, 2020

But nevertheless the script is not kosher

Yes, but that's a distinct issue. We can certainly improve it... but not in this PR.

@lemire
Member

lemire commented Apr 23, 2020

So please...

  1. Sync with master.

  2. Add -Weffc++ to CMake (it is easy, look at CMakeLists.txt in the root directory, you'll find the spot).

@ostri
Contributor Author

ostri commented Apr 23, 2020

git push origin master

This one fails.

$ git push origin master
fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

@ostri
Contributor Author

ostri commented Apr 23, 2020

@ostri It should build if you add -Weffc++ even on clang. If it does not, we can figure it out. Please add the warning flag.

It compiles, but it actually does nothing. I.e. code with -Weffc++ warnings compiles without the warnings.

@ostri
Contributor Author

ostri commented Apr 23, 2020

git push origin master

This one fails.

$ git push origin master
fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Actually this is ok. I do not have rights to write to

git remote add upstream https://github.com/simdjson/simdjson.git

my upstream is

git@github.com:ostri/simdjson.git

@jkeiser
Member

jkeiser commented Apr 23, 2020

It compiles, but it actually does nothing. I.e. code with -Weffc++ warnings compiles without the warnings.

That is still good enough :) We do CI runs against gcc and clang both, so we'll catch regressions on the gcc runs!

@ostri
Contributor Author

ostri commented Apr 23, 2020

It compiles, but it actually does nothing. I.e. code with -Weffc++ warnings compiles without the warnings.

That is still good enough :) We do CI runs against gcc and clang both, so we'll catch regressions on the gcc runs!

I put it in. With clang it works (it does nothing). But it fails with g++, because of the last remaining -Weffc++ warning.

[ 10%] Built target competition-ujson4c
Creating /home/ostri/warnings/simdjson/build/singleheader/simdjson.cpp...
In file included from /home/ostri/warnings/simdjson/include/simdjson.h:16,
                 from /home/ostri/warnings/simdjson/src/simdjson.cpp:1:
/home/ostri/warnings/simdjson/include/simdjson/implementation.h: In instantiation of ‘T* simdjson::internal::atomic_ptr<T>::operator=(T*) [with T = const simdjson::implementation]’:
/home/ostri/warnings/simdjson/src/implementation.cpp:132:36:   required from here
/home/ostri/warnings/simdjson/include/simdjson/implementation.h:217:40: error: ‘operator=’ should return a reference to ‘*this’ [-Werror=effc++]
  217 |   T* operator=(T *_ptr) { return ptr = _ptr; }

I tried some tweaks related to this operator, but it fails within the code, where I am not so sure that I understand the semantics. If I commit this to repository I'll break the build.
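
For reference, this is the shape that `-Weffc++` asks for, shown on a hypothetical simplified wrapper that is only loosely modelled on the signature in the error message above. It is not simdjson's actual `atomic_ptr`, and whether the change drops in cleanly there depends on how the call sites use the value returned by the assignment.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical simplified wrapper, for illustration only.
template <typename T>
class atomic_ptr_sketch {
public:
  atomic_ptr_sketch(T *p = nullptr) : ptr{p} {}

  // Read access through implicit conversion to the raw pointer.
  operator T *() const { return ptr.load(); }

  // Returns a reference to *this, which is the form -Weffc++ expects.
  // Callers that used the old T* result can still obtain a pointer
  // through the conversion operator above.
  atomic_ptr_sketch &operator=(T *p) {
    ptr.store(p);
    return *this;
  }

private:
  std::atomic<T *> ptr;
};
```

With this sketch, `int *q = (wrapper = &value);` still compiles, because the returned reference converts to `T*`.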

@lemire
Member

lemire commented Apr 23, 2020

Added few lines how to change the compiler. Existing instructions are for mac and not for linux. I believe this a cross platform solution assuming that you have both compilers installed on your testing / development platform.

What you added is fine. However, what was there before was also cross-platform.

Doing...

export CXX=myfavoritecompiler
cmake ..

is entirely standard... In fact, it is even more portable because CXX and CC will be recognized by virtually all build systems (make, cmake, ninja, configure).

The CMake documentation is explicit:

https://cmake.org/cmake/help/v3.10/envvar/CXX.html

@lemire
Member

lemire commented Apr 23, 2020

$ git push origin master
fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Typically, one's own repository is called origin but that's just common use and not a requirement. One can try git remote -v. I think that GitHub has more detailed instructions on how to sync a fork.

But I trust you figured it out.

@lemire
Member

lemire commented Apr 23, 2020

@jkeiser I vote to ignore the failing performance test on clang6.

I compared the performance with clang9.

First using @ostri 's PR:

$ ./benchmark/parse ../jsonexamples/twitter.json
number of iterations 200

../jsonexamples/twitter.json
============================
     9867 blocks -     631515 bytes - 55263 structurals (  8.8 %)
special blocks with: utf8      2284 ( 23.1 %) - escape       598 (  6.1 %) - 0 structurals      1287 ( 13.0 %) - 1+ structurals      8581 ( 87.0 %) - 8+ structurals      3272 ( 33.2 %) - 16+ structurals         0 (  0.0 %)
special block flips: utf8      1104 ( 11.2 %) - escape       642 (  6.5 %) - 0 structurals       940 (  9.5 %) - 1+ structurals       940 (  9.5 %) - 8+ structurals      2593 ( 26.3 %) - 16+ structurals         0 (  0.0 %)

All Stages
|    Speed        :  26.7722 ns per block ( 98.40%) -   0.4183 ns per byte -   4.7806 ns per structural -    2.390 GB/s
|    Cycles       :  98.9018 per block    ( 99.22%) -   1.5454 per byte    -  17.6603 per structural    -    3.694 GHz est. frequency
|    Instructions : 307.1246 per block    (100.00%) -   4.7991 per byte    -  54.8415 per structural    -    3.105 per cycle
|    Misses       :    2829 branch misses ( 95.39%) - 4 cache misses ( 20.71%) - 79039.00 cache references
|- Stage 1
|    Speed        :  11.5506 ns per block ( 42.46%) -   0.1805 ns per byte -   2.0625 ns per structural -    5.541 GB/s
|    Cycles       :  42.6624 per block    ( 42.80%) -   0.6666 per byte    -   7.6180 per structural    -    3.694 GHz est. frequency
|    Instructions : 151.8969 per block    ( 49.46%) -   2.3735 per byte    -  27.1234 per structural    -    3.560 per cycle
|    Misses       :     729 branch misses ( 24.58%) - 0 cache misses (  0.00%) - 27304.00 cache references
|- Stage 2
|    Speed        :  15.1386 ns per block ( 55.64%) -   0.2366 ns per byte -   2.7032 ns per structural -    4.227 GB/s
|    Cycles       :  55.9145 per block    ( 56.10%) -   0.8737 per byte    -   9.9843 per structural    -    3.693 GHz est. frequency
|    Instructions : 155.0630 per block    ( 50.49%) -   2.4230 per byte    -  27.6887 per structural    -    2.773 per cycle
|    Misses       :    2006 branch misses ( 67.64%) - 4 cache misses ( 20.71%) - 51633.00 cache references

3535.3 documents parsed per second

Next our master with clang9:

number of iterations 200

../jsonexamples/twitter.json
============================
     9867 blocks -     631515 bytes - 55263 structurals (  8.8 %)
special blocks with: utf8      2284 ( 23.1 %) - escape       598 (  6.1 %) - 0 structurals      1287 ( 13.0 %) - 1+ structurals      8581 ( 87.0 %) - 8+ structurals      3272 ( 33.2 %) - 16+ structurals         0 (  0.0 %)
special block flips: utf8      1104 ( 11.2 %) - escape       642 (  6.5 %) - 0 structurals       940 (  9.5 %) - 1+ structurals       940 (  9.5 %) - 8+ structurals      2593 ( 26.3 %) - 16+ structurals         0 (  0.0 %)

All Stages
|    Speed        :  26.4689 ns per block ( 98.21%) -   0.4136 ns per byte -   4.7264 ns per structural -    2.418 GB/s
|    Cycles       :  97.8048 per block    ( 99.06%) -   1.5283 per byte    -  17.4645 per structural    -    3.695 GHz est. frequency
|    Instructions : 307.1246 per block    (100.00%) -   4.7991 per byte    -  54.8415 per structural    -    3.140 per cycle
|    Misses       :    2510 branch misses ( 93.79%) - 4 cache misses ( 21.83%) - 84214.00 cache references
|- Stage 1
|    Speed        :  11.5184 ns per block ( 42.74%) -   0.1800 ns per byte -   2.0568 ns per structural -    5.556 GB/s
|    Cycles       :  42.5476 per block    ( 43.09%) -   0.6648 per byte    -   7.5975 per structural    -    3.694 GHz est. frequency
|    Instructions : 151.8969 per block    ( 49.46%) -   2.3735 per byte    -  27.1234 per structural    -    3.570 per cycle
|    Misses       :     709 branch misses ( 26.49%) - 0 cache misses (  0.00%) - 29069.00 cache references
|- Stage 2
|    Speed        :  14.8905 ns per block ( 55.25%) -   0.2327 ns per byte -   2.6589 ns per structural -    4.298 GB/s
|    Cycles       :  55.0174 per block    ( 55.72%) -   0.8597 per byte    -   9.8241 per structural    -    3.695 GHz est. frequency
|    Instructions : 155.0630 per block    ( 50.49%) -   2.4230 per byte    -  27.6887 per structural    -    2.818 per cycle
|    Misses       :    1754 branch misses ( 65.54%) - 2 cache misses ( 10.92%) - 55034.00 cache references

3565.2 documents parsed per second

Pay attention to the number of instructions per structural element. The running time in cycles and ns varies from run to run, but the number of instructions is exactly the same.

Thus I say that there is no measurable performance regression.
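
The comparison can be checked with a few lines of arithmetic. This is only a sketch: the constants are copied from the "All Stages" rows of the two runs above, and the helper name `percent_change` is made up for the example. The instruction counts are identical, while the time per block differs by roughly 1.1 percent, which is the kind of run-to-run variation mentioned above.

```cpp
#include <cassert>
#include <cmath>

// Relative difference of `candidate` with respect to `baseline`, in percent.
inline double percent_change(double baseline, double candidate) {
  return (candidate - baseline) / baseline * 100.0;
}

// Figures copied from the "All Stages" rows above (master is the baseline).
constexpr double kMasterNsPerBlock = 26.4689;
constexpr double kPrNsPerBlock = 26.7722;
constexpr double kMasterInstrPerBlock = 307.1246;
constexpr double kPrInstrPerBlock = 307.1246;
```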

@lemire
Member

lemire commented Apr 23, 2020

I think @jkeiser has read this PR. I have read it carefully too. I have looked into the possible performance regression and I have dismissed it. So I will make the executive decision of merging this without waiting further.

We can come back later with the flag.

@lemire
Member

lemire commented Apr 23, 2020

Matjaž Ostroveršnik: I will add your name to the authors as a subsequent commit.

@ostri
Contributor Author

ostri commented Apr 23, 2020

Added few lines how to change the compiler. Existing instructions are for mac and not for linux. I believe this a cross platform solution assuming that you have both compilers installed on your testing / development platform.

What you added is fine. However, what was there before was also cross-platform.

Doing...

export CXX=myfavoritecompiler
cmake ..

is entirely standard... In fact, it is even more portable because CXX and CC will be recognized by virtually all build systems (make, cmake, ninja, configure).

The CMake documentation is explicit:

https://cmake.org/cmake/help/v3.10/envvar/CXX.html

I tried the export approach yesterday, but it didn't work until I removed the build folder and regenerated it.
(thanks @jkeiser ).
This way it works without build folder recreation. ;-)

@lemire merged commit 87acab0 into simdjson:master on Apr 23, 2020
@ostri
Contributor Author

ostri commented Apr 23, 2020

Matjaž Ostroveršnik: I will add your name to the authors as a subsequent commit.

Thanks. If you do so for such a small change, you'll have a really long list of authors very soon. ;-)
Actually you helped me. Based on your library I made a tool to convert json files with variables (i.e. ${...}) to regular json files.
I need it for my project configuration management. :-)

Lesson learnt from a similar project (xml + database + transactions): engage all the analysis tools available (linters, compiler warnings, sanitizers (memory, threads)) early in the project.
Late introduction can be prohibitively expensive.

@lemire
Member

lemire commented Apr 23, 2020

I tried export stuff yesterday, but it didn't work until I removed build folder and regenerate it.

Right. I would not normally update it. Instead, I will create a buildgcc directory, a buildclang directory... and so forth. Disk space is cheap and being able to simultaneously use multiple compilers, multiple configurations... is great.

@lemire
Member

lemire commented Apr 23, 2020

If you do so, for such a small change

Yes, you are a contributor to the project. I just pushed the commit.

I think it is an objective fact that you contributed code.

3 participants