High dependence on CPU performance #44

Open
troosh opened this issue Jan 15, 2018 · 3 comments

Comments

@troosh

troosh commented Jan 15, 2018

For some tests, the results depend strongly on processor speed. I measured a frequency sweep from 800 MHz to 3600 MHz, set with the cpufrequtils utility, on an i7-4790 with an AMD Radeon R7 250E graphics card:

Attachment: scan_glamark2_fps.res.txt

Image: screenshot_2017-10-19_18-51-04

  • [terrain] and [refract] are the two lowest horizontal lines; they barely depend on processor frequency, so they appear to be limited entirely by the speed of the video card.
  • [buffer-*] cross these two horizontal lines; apparently even the fastest
    x86 processor would not be enough for these tests, however demanding they are on the video card
  • The results of the [ideas] test depend directly on processor speed; even at
    3.6 GHz this test does not appear to be limited by anything else.
    The rest of the tests behave very similarly: beyond a certain CPU frequency,
    further increases have practically no effect on the result:
  • [desktop-1] needs only 1.4 GHz
  • [desktop-2] needs at least 2.8 GHz
  • [effect2d-2] appears to be limited by the video card already at 1 GHz
  • [effect2d-1] shows results twice as high, but needs 2 GHz
  • [bump-1] stops noticing processor speed from 1.2 GHz
  • [build-1] stops noticing processor speed from 1.6 GHz
  • [shadow] stops noticing processor speed from 1.8 GHz
  • [jellyfish] behaves exactly like glmark2 as a whole on average!
  • [shading-*] stop noticing processor speed from 1.8 GHz
  • [pulsar] stops noticing processor speed from 2.8 GHz
    The remaining tests stop noticing processor speed from 2.4 to 2.6 GHz; the
    highest result is [conditionals-3] (it needs a fast processor but puts almost no load on the graphics card).
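The per-test thresholds above can be read off a frequency sweep mechanically. The following is only a sketch of one way to do that, not the method used for the plot: `Sample`, `saturation_mhz`, and the `tolerance` parameter are hypothetical names, the samples are assumed to be sorted by ascending frequency, and C++17 is assumed for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// One measurement: CPU frequency in MHz and the FPS reported at that frequency.
struct Sample {
    double mhz;
    double fps;
};

// Returns the lowest frequency from which every sample stays within
// `tolerance` (a fraction, e.g. 0.05 for 5%) of the FPS measured at the
// highest frequency. Returns std::nullopt when no lower sample is within
// tolerance, i.e. the test is still scaling at the top of the sweep.
// Assumes `samples` is sorted by ascending frequency.
std::optional<double> saturation_mhz(const std::vector<Sample>& samples,
                                     double tolerance)
{
    assert(tolerance >= 0.0 && tolerance < 1.0);
    if (samples.size() < 2)
        return std::nullopt;

    const double plateau = samples.back().fps;
    std::size_t knee = samples.size() - 1;

    // Walk down from the highest frequency while FPS stays near the plateau.
    while (knee > 0 && samples[knee - 1].fps >= (1.0 - tolerance) * plateau)
        --knee;

    if (knee == samples.size() - 1)
        return std::nullopt;  // no plateau found below the top frequency
    return samples[knee].mhz;
}
```

Under these assumptions, a test limited by the video card across the whole sweep returns the lowest frequency measured, and a test that scales with the processor all the way to 3.6 GHz returns no value.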

Clearly this reflects real life, but perhaps some tests deserve optimization. Of course, that could make it harder to compare results with older versions of the benchmark.

@troosh
Author

troosh commented Jan 17, 2018

The performance of the following tests really needs a look:

  1. buffer
 52.18% glmark2 glmark2 [.] SceneBuffer::update() 
 20.24% glmark2 glmark2 [.] Mesh::update_single_array(std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > > const&, unsigned long, unsigned long, unsigned long) 
 13.40% glmark2 libc-2.23.so [.] __memmove_avx_unaligned 
  1.04% glmark2 [kernel.kallsyms] [k] evergreen_irq_set
  2. refract
 6.77% glmark2 libc-2.23.so [.] _int_free 
 6.51% glmark2 libstdc++.so.6.0.20 [.] __dynamic_cast 
 5.98% glmark2 libc-2.23.so [.] malloc 
 4.75% glmark2 libc-2.23.so [.] _int_malloc 
 3.41% glmark2 libstdc++.so.6.0.20 [.] __cxxabiv1::__si_class_type_info::__do_dyncast(long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info::__dyncast_result&) const 
 3.29% glmark2 ld-2.23.so [.] do_lookup_x 
 2.87% glmark2 libstdc++.so.6.0.20 [.] std::locale::locale() 
 2.85% glmark2 libstdc++.so.6.0.20 [.] __cxxabiv1::__vmi_class_type_info::__do_dyncast(long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info::__dyncast_result&) const 
 2.34% glmark2 libstdc++.so.6.0.20 [.] std::locale::~locale() 
 2.31% glmark2 libstdc++.so.6.0.20 [.] std::locale::operator=(std::locale const&) 
 1.80% glmark2 libc-2.23.so [.] __strcmp_sse2_unaligned 
 1.63% glmark2 libstdc++.so.6.0.20 [.] bool std::has_facet<std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&) 
 1.46% glmark2 libc-2.23.so [.] memchr 
 1.32% glmark2 libc-2.23.so [.] __GI_____strtof_l_internal 
 1.27% glmark2 libstdc++.so.6.0.20 [.] bool std::has_facet<std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&) 
 1.21% glmark2 ld-2.23.so [.] strcmp 
 1.16% glmark2 libstdc++.so.6.0.20 [.] std::locale::id::_M_id() const 
 1.16% glmark2 libstdc++.so.6.0.20 [.] std::istreambuf_iterator<char, std::char_traits<char> > std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > >::_M_extract_int<unsigned int>(std::istreambuf_iterator<char, std::char_traits<char> >, std::istreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, std::_Ios_Iostate&, unsigned int&) const 
 1.10% glmark2 libstdc++.so.6.0.20 [.] std::basic_ios<char, std::char_traits<char> >::_M_cache_locale(std::locale const&) 
 1.02% glmark2 libc-2.23.so [.] __memcpy_avx_unaligned 
 0.99% glmark2 glmark2 [.] split_normal(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) 
 0.99% glmark2 glmark2 [.] Mesh::set_attrib(unsigned int, LibMatrix::tvec3<float> const&, std::vector<float, std::allocator<float> >*) 
 0.96% glmark2 libstdc++.so.6.0.20 [.] std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > > const& std::use_facet<std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&) 
 0.96% glmark2 libstdc++.so.6.0.20 [.] std::locale::_S_initialize() 
 0.96% glmark2 glmark2 [.] Util::split(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, Util::SplitMode) 
 0.96% glmark2 libstdc++.so.6.0.20 [.] std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char) 
 0.89% glmark2 libc-2.23.so [.] __memmove_avx_unaligned 
 0.84% glmark2 libstdc++.so.6.0.20 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) 
 0.82% glmark2 libstdc++.so.6.0.20 [.] std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > > const& std::use_facet<std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > > >(std::locale const&) 
 0.82% glmark2 libstdc++.so.6.0.20 [.] std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > >::_M_extract_float(std::istreambuf_iterator<char, std::char_traits<char> >, std::istreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, std::_Ios_Iostate&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) const 
 0.82% glmark2 libstdc++.so.6.0.20 [.] std::ctype<char> const& std::use_facet<std::ctype<char> >(std::locale const&) 
 0.79% glmark2 libstdc++.so.6.0.20 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate(unsigned long, unsigned long, unsigned long) 
 0.76% glmark2 libstdc++.so.6.0.20 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, unsigned long, std::allocator<char> const&) 
 0.73% glmark2 libstdc++.so.6.0.20 [.] std::basic_ios<char, std::char_traits<char> >::init(std::basic_streambuf<char, std::char_traits<char> >*) 
 0.73% glmark2 glmark2 [.] std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_insert_aux(__gnu_cxx::__normal_iterator<std::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) 
 0.68% glmark2 libstdc++.so.6.0.20 [.] std::ios_base::_M_init() 
 0.67% glmark2 libGL.so.1.2.0 [.] driConvertConfigs 

In the other tests, most of the CPU time goes to user-space libraries (Mesa, libpng/libz, libpthread) and the kernel (DRM).
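The refract profile is spread across `std::locale`, `basic_ios::init`, `num_get`, `malloc`, and `_int_free`. That pattern is consistent with text being parsed through freshly constructed stream objects, although the profile alone does not prove it. The sketch below only illustrates that general effect and is not glmark2's code: `parse_floats_per_token` and `parse_floats_strtof` are hypothetical helpers, and the assumption is that tokens have already been split.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Variant A: one std::istringstream per token. Every construction sets up
// stream state and a std::locale, and the extraction goes through num_get.
std::vector<float> parse_floats_per_token(const std::vector<std::string>& tokens)
{
    std::vector<float> out;
    out.reserve(tokens.size());
    for (const std::string& tok : tokens) {
        std::istringstream ss(tok);
        float value = 0.0f;
        ss >> value;  // leaves 0.0f in `value` when the token is not a number
        out.push_back(value);
    }
    return out;
}

// Variant B: std::strtof directly on each token, with no stream or locale
// object constructed per token. It also returns 0.0f when no conversion is
// possible. Note that strtof follows the current C locale.
std::vector<float> parse_floats_strtof(const std::vector<std::string>& tokens)
{
    std::vector<float> out;
    out.reserve(tokens.size());
    for (const std::string& tok : tokens)
        out.push_back(std::strtof(tok.c_str(), nullptr));
    return out;
}
```

Both variants give the same values for plain decimal tokens. Whether variant B would help here depends on where the stream objects are actually created, which would need to be checked against the source.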

@afrantzis
Contributor

Hi! Thank you for this very interesting analysis. After your initial report I started taking a look at the CPU usage (I mostly used valgrind/callgrind), and, like you, noticed that in many cases the primary CPU consumer was not glmark2 itself, but rather some other part of the graphics stack. The CPU usage from libpng/libz shouldn't be a concern, since the textures are decoded at setup time and shouldn't affect benchmarking results. The CPU usage from drivers, and its effect on the benchmarks, is actually something that we want reflected in the benchmark results.

There are still cases like the ones you mention in the second comment where the CPU usage lies predominantly in glmark2 itself. I have started looking into these and I will report progress in this issue.

afrantzis added a commit that referenced this issue Jan 18, 2018
Use std::copy to copy data instead of copying elements manually.
Performance results indicate that this change results in speed-up
of over 2x for this function.

See #44.
@afrantzis
Contributor

I have pushed a performance improvement in 5b0f603 that helps with the CPU usage in the buffer scene. Looking at the refract scene, the majority of the CPU usage is in the scene setup code, so it shouldn't affect the benchmark results (of course, it would be good to improve it anyway).
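The commit message referenced above describes replacing a manual element-by-element copy with `std::copy`. The following is a minimal sketch of that kind of change, not the actual code from 5b0f603: `copy_manual`, `copy_std`, and the flat `float` layout are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Manual version: copies one element per loop iteration.
void copy_manual(const std::vector<float>& src, std::vector<float>& dst,
                 std::size_t offset)
{
    assert(offset + src.size() <= dst.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[offset + i] = src[i];
}

// std::copy version: for trivially copyable element types, standard library
// implementations typically turn this into a single bulk memory move.
void copy_std(const std::vector<float>& src, std::vector<float>& dst,
              std::size_t offset)
{
    assert(offset + src.size() <= dst.size());
    std::copy(src.begin(), src.end(),
              dst.begin() + static_cast<std::ptrdiff_t>(offset));
}
```

Both functions leave `dst` in the same state. Any speed difference depends on the compiler, the optimization level, and the element type, so it is worth measuring rather than assuming.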
