
-- docs/make/reverted array "<<"

lvv committed Dec 23, 2008
1 parent c7e5c14 commit 6010eca8bc1b9342715823588365d14d1c54f5c9
Showing with 84 additions and 36 deletions.
  1. +53 −31 README.txt
  2. +2 −0 array.h
  3. +10 −0 b-cmp.h
  4. +11 −0 b-sum.h
  5. +1 −1 doc/Makefile
  6. +5 −3 include.mk
  7. +2 −1 timer.h
84 README.txt
@@ -1,37 +1,14 @@
== LvvLib - C++ utility library
:gh-ll: http://github.com/lvv/lvvlib/tree/master/
Initially a collection of headers that I used in my projects. 'LVV' are my initials (Leonid V. Volnitsky).
Needs some cleanup, dependency pruning and documentation.
=== http://github.com/lvv/lvvlib/tree/master/array.h[array.h]
Similar to http://www.boost.org/doc/libs/1_37_0/doc/html/array.html[`boost::array`], but faster.
It is an enhanced version of http://www.boost.org/doc/libs/1_37_0/doc/html/array.html[`boost::array`],
which is a plain C array wrapped in a class to make it an STL-compatible
container. If you look at such an array in a debugger, it looks exactly like a C
array (which means you can freely cast to and from a C array). Because it
doesn't have a constructor, it can be initialised like a C array:
@@ -40,16 +17,63 @@ doesn't have constructor, it can be initialised like C arrays:
A second set of curly braces is needed because this is an array inside a class.
There are no mallocs, no extra pointers, no extraneous class members.
GCC 4.4 promoted `boost::array` to `tr1::array`.
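A minimal sketch of the idea, assuming nothing about lvv::array's actual source: `mini_array` below is a hypothetical aggregate with a single C-array member and no constructors, so it accepts the double-brace initialiser and, on common compilers, has exactly the size of the bare C array.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical aggregate wrapper in the spirit of boost::array / lvv::array
// (not the library's code): one public C-array member, no constructors.
template <typename T, std::size_t N>
struct mini_array {
    T elems[N];  // the only data member: no malloc, no extra pointers

    T*       begin()       { return elems; }
    T*       end()         { return elems + N; }
    const T* begin() const { return elems; }
    const T* end()   const { return elems + N; }

    T&       operator[](std::size_t i)       { return elems[i]; }
    const T& operator[](std::size_t i) const { return elems[i]; }

    static constexpr std::size_t size() { return N; }
};

// Outer braces initialise the struct, inner braces the member C array.
inline mini_array<float, 4> make_example() {
    mini_array<float, 4> a = {{1.0f, 2.0f, 3.0f, 4.0f}};
    return a;
}

// Holds on common compilers: no padding and no hidden members.
static_assert(sizeof(mini_array<float, 4>) == sizeof(float[4]),
              "wrapper adds no size overhead");
static_assert(std::is_standard_layout<mini_array<float, 4>>::value,
              "wrapper is standard-layout");
```

Because `begin()` and `end()` return plain pointers, the wrapper already works with range-for loops and STL algorithms.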
.lvv::array has the following added capabilities:
- Vector operations: `A1 += A2; cout << A1;`
- Optimized template specializations for specific combinations of CPU capabilities, array size and type:
* explicit SSE vectorization (GCC is not yet very good at auto-vectorization).
* parallelization with OpenMP
* out-of-order execution optimization
- The index of the first element defaults to 0, but can be any number.
- Index values are range-checked when the `NDEBUG` macro is not defined (e.g. in a typical debug build, which is compiled without `-DNDEBUG`).
- Basic linear algebra functions: `norm2(A)`, `distance_norm2(A1,A2)`, `dot_product(A1,A2)`, etc.
Below are benchmarks of the specialised operations, run on a Core2 Duo at 2.2 GHz with GCC 4.4.
The benchmark sources are in the `b-*.h` files.
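The list above can be illustrated with a hypothetical `based_array<T, N, B>`, a sketch rather than the library's code: `B` is the index of the first element, `operator[]` range-checks through `assert` (compiled out when `NDEBUG` is defined), and `+=` and `dot_product` work element by element. The `ibegin()`/`iend()` names follow the commented-out code in the `array.h` hunk; everything else is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not lvv::array's implementation.
// B is the index of the first element (0 by default).
template <typename T, std::size_t N, long B = 0>
struct based_array {
    T elems[N];

    static constexpr long ibegin() { return B; }                         // first valid index
    static constexpr long iend()   { return B + static_cast<long>(N); }  // one past the last

    T& operator[](long i) {
        assert(i >= ibegin() && i < iend());  // range check, removed when NDEBUG is defined
        return elems[i - B];
    }
    const T& operator[](long i) const {
        assert(i >= ibegin() && i < iend());
        return elems[i - B];
    }

    // Element-wise vector operation: A1 += A2
    based_array& operator+=(const based_array& rhs) {
        for (std::size_t k = 0; k < N; ++k) elems[k] += rhs.elems[k];
        return *this;
    }
};

// Dot product of two arrays with the same type, size and index base
template <typename T, std::size_t N, long B>
T dot_product(const based_array<T, N, B>& a, const based_array<T, N, B>& b) {
    T s = T();
    for (std::size_t k = 0; k < N; ++k) s += a.elems[k] * b.elems[k];
    return s;
}
```

Making `B` a template parameter keeps the offset a compile-time constant, so the subtraction in `operator[]` costs nothing extra at run time.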
.Sum of 100,000,000 floats with values {1, 2, 1, 2, 1, 2, ...}
[cols="^3,^1,16",frame="topbot",options="header"]
|=============================================================================================
| *Ticks per cycle* | *Computed Value* | *Source*
| 1.74 | 1.5e+08 | `float sum = A.sum();`
| 3.14 | 1.5e+08 | `double sum=0 ; for (int i=0 ; i<N; i++) sum += A[i];`
| 3.06 | 3.35544e+07 | `float sum=0 ; for (int i=0 ; i<N; i++) sum += A[i];`
| 3.06 | 3.35544e+07 | `float sum = std::accumulate(A.begin(), A.end(), float());`
|=============================================================================================
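The 3.35544e+07 value in the float rows is a rounding effect, not a bug in the loops. Once a float running sum passes 16,777,216 (2 to the 24th), consecutive floats are 2 apart; at 33,554,432 (2 to the 25th) they are 4 apart, so adding 1 rounds away and adding 2 is an exact tie that rounds back to the even value, and the sum stops growing. The sketch below reproduces both computed values, assuming x86_64 SSE float arithmetic and no `-ffast-math` (which would allow the additions to be reordered).

```cpp
#include <cstddef>

// Sum n values of the pattern {1, 2, 1, 2, ...} one at a time
// with a float accumulator (as in the third and fourth rows).
inline float sum_float(std::size_t n) {
    float sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += (i % 2 == 0) ? 1.0f : 2.0f;
    return sum;
}

// Same loop with a double accumulator (as in the second row).
inline double sum_double(std::size_t n) {
    double sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += (i % 2 == 0) ? 1.0f : 2.0f;
    return sum;
}
```

With n = 100,000,000 the float version stalls at 33,554,432, while the double version reaches the exact 150,000,000.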
.Maximum of 100,000,000 floats
[cols="^1,6",frame="topbot",options="header"]
|=============================================================================================
| *Ticks per cycle* | *Source*
| 1.63 | `float max = A.max();`
| 5.81 | `float max=0; for (size_t i=0; i<N; i++) if (A[i] > max) max = A[i];`
| 1.88 | OpenMP (same source as above; the data race on `max` is left unguarded)
| 5.81 | STL: `float max = *std::max_element (A.begin(), A.end());`
| 1.67 | SSE: `__m128 m = mk_m128(A[0]); for (size_t i=4; i<N; i+=4) { m = _mm_max_ps(m, mk_m128(A[i]) ); } ...`
|==============================================================================================
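The SSE row above elides the reduction (`...`) and uses the author's `mk_m128` helper, whose definition is not shown. A self-contained sketch of the same technique with standard SSE intrinsics (`_mm_loadu_ps`, `_mm_max_ps`, `_mm_storeu_ps`), assuming an x86 target and NaN-free input: keep a lane-wise running maximum of 4 floats, then reduce the 4 lanes and handle leftover elements with scalar code. The name `sse_max` is hypothetical.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86/x86_64 only)

// Maximum of n floats (n >= 1): 4-wide SSE max, then scalar reduction.
// Unaligned loads are used, so the input needs no special alignment.
inline float sse_max(const float* a, std::size_t n) {
    std::size_t i = 1;
    float result = a[0];
    if (n >= 4) {
        __m128 m = _mm_loadu_ps(a);                   // lanes hold a[0..3]
        for (i = 4; i + 4 <= n; i += 4)
            m = _mm_max_ps(m, _mm_loadu_ps(a + i));  // lane-wise max
        float lanes[4];
        _mm_storeu_ps(lanes, m);                      // reduce the 4 lanes
        result = lanes[0];
        for (int k = 1; k < 4; ++k)
            if (lanes[k] > result) result = lanes[k];
    }
    for (; i < n; ++i)                                // leftover tail elements
        if (a[i] > result) result = a[i];
    return result;
}
```

Compared with the scalar loop, each `_mm_max_ps` compares four elements and has no data-dependent branch, which is where the speed-up in the table comes from.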
Acceleration is done through template specialization for specific combinations of type and operation.
So far I have implemented only the combinations needed for my work. Hopefully there will be fewer
blank cells in the table below as I find more time or receive outside contributions.
.Implemented combinations
[cols="1,^1,^1,^1,^1,^1,^1,^1,^1",frame="topbot",options="header"]
|=============================================================================================
| *Type* | *sum* | *max* | *min* | *lower_bound* | *find* | *V1 += V2* | *V1 -= V2* | *...*
| *float* | yes | yes | | | | | | |
| *double* | | | | | | | | |
| *long double* | | | | | | | | |
| *int8_t* | | | | | | | | |
| *int16_t* | | yes | | | | | | |
| *int32_t* | | | | | | | | |
| *int64_t* | | | | | | | | |
| *uint8_t* | | | | | | | | |
| *uint16_t* | | | | | | | | |
| *uint32_t* | | | | | | | | |
| *uint64_t* | | | | | | | | |
|==============================================================================================
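A minimal sketch of dispatching on type through partial template specialization. `sum_op` and `sum` are hypothetical names, and the float specialization simply accumulates in `double`, a stand-in for whatever optimized code lvv::array really uses. Any type without a specialization falls back to the generic loop, which corresponds to an empty cell in the table above.

```cpp
#include <cstddef>

// Generic implementation: used for every type without a specialization.
template <typename T, std::size_t N>
struct sum_op {
    static T apply(const T (&a)[N]) {
        T s = T();
        for (std::size_t i = 0; i < N; ++i) s += a[i];
        return s;
    }
};

// Partial specialization selected only for float
// (stand-in optimization: accumulate in double).
template <std::size_t N>
struct sum_op<float, N> {
    static float apply(const float (&a)[N]) {
        double s = 0;
        for (std::size_t i = 0; i < N; ++i) s += a[i];
        return static_cast<float>(s);
    }
};

// User-facing entry point; the compiler picks the implementation.
template <typename T, std::size_t N>
T sum(const T (&a)[N]) { return sum_op<T, N>::apply(a); }
```

The choice is made at compile time, so the generic and specialized paths have no run-time dispatch cost.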
=== *eq()* - numeric comparison template function (http://github.com/lvv/lvvlib/tree/master/math.h[math.h])
Used for numeric comparison in generic programming. For floating point types
@@ -84,12 +108,10 @@ We assume that if someone compares with `unsigned` then he guarantees that other
=== check.h
Very basic unit testing. I had to write my own because gcc44 cannot
compile BOOST_CHECK. Implemented mostly in macros; the execution log shows
each evaluated expression.
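The usual trick for logging the evaluated expression is the preprocessor's `#` operator, which turns a macro argument into a string literal. A hypothetical `MINI_CHECK` macro, sketched under that assumption and not taken from `check.h`:

```cpp
#include <iostream>

// Number of failed checks so far (hypothetical bookkeeping).
static int check_failures = 0;

// #expr stringizes the argument, so the log shows the source text
// of the expression next to its pass/fail status and location.
#define MINI_CHECK(expr)                                            \
    do {                                                            \
        bool ok_ = static_cast<bool>(expr);                         \
        std::cout << (ok_ ? "  ok   " : "  FAIL ") << #expr         \
                  << "  (" << __FILE__ << ":" << __LINE__ << ")\n"; \
        if (!ok_) ++check_failures;                                 \
    } while (0)
```

The `do { ... } while (0)` wrapper makes the macro behave like a single statement, so it is safe inside an unbraced `if`.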
=== Other
[width="80%",cols="3,3,6",frame="none",options="header"]
|==========================
| Header | Sample Use | Description
2 array.h
@@ -362,9 +362,11 @@ distance_norm2 (const array<T,N,B>& LA, const array<T,N,B>& RA) {
operator<< (ostream& os, array<T,N,B> A) {
//os << format("[%d..%d]=") %A.ibegin() %(A.iend()-1);
/*
if (N > 10) std::cout << endl;
std::cout << "[" << A.ibegin() << ".." << A.iend() << "] ";
if (N > 10) std::cout << endl;
*/
copy (A.begin(), A.end(), ostream_iterator<T>(os, " "));
//for (long i=A.ibegin(); i< A.iend(); i++)
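The `operator<<` in this hunk takes `array<T,N,B> A` by value, which copies every element each time an array is printed. A sketch of the same `ostream_iterator` approach taking a const reference; `small_array` is a hypothetical stand-in for `lvv::array`, and real code would also need const `begin()`/`end()` on the array class.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <ostream>

// Hypothetical fixed-size array used only for this sketch.
template <typename T, std::size_t N>
struct small_array {
    T elems[N];
    const T* begin() const { return elems; }
    const T* end()   const { return elems + N; }
};

// Print elements separated by spaces; the const reference avoids
// copying all N elements on every call.
template <typename T, std::size_t N>
std::ostream& operator<<(std::ostream& os, const small_array<T, N>& a) {
    std::copy(a.begin(), a.end(), std::ostream_iterator<T>(os, " "));
    return os;
}
```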
10 b-cmp.h
@@ -1,4 +1,14 @@
const static unsigned long N = 1000000;
typedef array<TYPE, N> array_t;
//////////// CREATE ARRAY
array_t A;
for (size_t i=0; i<N-1; i+=2) {
A[i] =1;
A[i+1]=2;
}
A[333] = 3; // for max() testing
cout << "*** COMPARE type:" << typeid(TYPE).name() << endl;
11 b-sum.h
@@ -1,4 +1,15 @@
const static unsigned long N = 1000000;
//////////// CREATE ARRAY
array_t A;
for (size_t i=0; i<N-1; i+=2) {
A[i] =1;
A[i+1]=2;
}
A[333] = 3; // for max() testing
///////////////////////////////////////////
cout << "*** SUM type:" << typeid(TYPE).name() << endl;
#ifdef DO_PLAIN
2 doc/Makefile
@@ -1,7 +1,7 @@
#include ../Makefile
WEB_DESTDIR ?= /tmp/html-lvvlib
WEB_DESTDIR ?= /tmp/html
ASCIIDOC ?= asciidoc
show: web_install
8 include.mk
@@ -33,6 +33,8 @@ g++FLAGS := -pipe -Wno-reorder -Wno-sign-compare # -fstrict-aliasing #
g++FLAGS_OPTIMIZE := -O3 -march=native -fwhole-program --combine -fopenmp -fomit-frame-pointer -funsafe-loop-optimizations
# FAST
#g++FLAGS_OPTIMIZE := -O3 -march=native -fwhole-program --combine -fopenmp -fomit-frame-pointer -fargument-noalias-anything -ffast-math -funsafe-loop-optimizations -fassociative-math -fassociative-math -mfpmath=sse,387 -fno-builtin -fargument-noalias-anything -fassociative-math
g++FLAGS_PROFILE := -pg -g -O2 -march=native -fno-omit-frame-pointer -fno-inline-functions -fno-inline-functions-called-once -fno-optimize-sibling-calls -fno-default-inline -fno-inline
# DO NOT USE
#-fargument-noalias-anything (newuoa segfaults at the end)
@@ -41,7 +43,7 @@ g++FLAGS_OPTIMIZE := -O3 -march=native -fwhole-program --combine -fope
# 2try(but deps on libs with exception?): -DBOOST_NO_EXCEPTIONS -fno-exceptions -fno-enforce-eh-specs -freorder-blocks-and-partition
# -ftree-vectorizer-verbose=3 -fdump-tree-vect
#
g++FLAGS_COMMON += -I /usr/local/include -l:/opt/intel/Compiler/11.0/074/lib/intel64/libimf.so -Wstrict-aliasing=2
##################################################################################33
# CHECK+DEBUG
@@ -52,7 +54,7 @@ g++FLAGS_DEBUG := -O0 -p -Wpacked -fsignaling-nans -fdelete-null-pointer-chec
#g++FLAGS_DEBUG += -Wfloat-equal -Weffc++
#g++FLAGS_DEBUG += -fmudflap
iccFLAGS := -vec-report0 -Wformat -openmp-report0 -wd1418 -wd981 -wd424 -wd810 -wd383 -wd82 -wd1572 -wd2259 -wd11001 -wd11005
#-gxx-name=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.2.4/
iccFLAGS_OPTIMIZE := -O3 -ipo -march=core2 -openmp -xT -openmp-lib compat
#iccFLAGS_OPTIMIZE := -ipo -march=core2 -fomit-frame-pointer -parallel
@@ -81,7 +83,7 @@ b-% u-% : MAKEFLAGS += -B
% : %.cc
@tput sgr0; tput setaf 4
@$(CXX) $< -o $(name_prefix)$@ $(CXXFLAGS) $(LDFLAGS)
@tput sgr0
#@make $<
3 timer.h
@@ -43,10 +43,11 @@ uint64_t read_tick() { // tested with with x86_64 only.
"lea %0, %%eax;"
"movl %%edx, 4(%%eax);"
#endif
"cpuid;"
:[now_tick] "=m"(now_tick) // output
:
#if defined(__x86_64)
:"rax", "rbx", "rcx", "rdx" // clobbered registers
#else
:"eax", "ebx", "ecx", "edx" // clobbered registers
#endif
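`timer.h` reads the time-stamp counter with inline assembly and serializes with `cpuid`. A sketch of an alternative using the GCC/Clang intrinsics `__rdtsc` and `_mm_lfence` from `<x86intrin.h>` (x86 only); `read_tsc` and `ticks_per_element` are hypothetical names, and dividing by the element count is an assumption about what the README's "ticks per cycle" columns report.

```cpp
#include <cstddef>
#include <cstdint>
#include <x86intrin.h>  // __rdtsc, _mm_lfence (GCC/Clang, x86 only)

// Hypothetical helper: read the TSC after an lfence, so the read
// does not start before earlier instructions have completed.
inline std::uint64_t read_tsc() {
    _mm_lfence();
    return __rdtsc();
}

// Hypothetical helper: TSC ticks per element for one call of fn
// that processes n elements.
template <typename F>
double ticks_per_element(F fn, std::size_t n) {
    const std::uint64_t t0 = read_tsc();
    fn();
    const std::uint64_t t1 = read_tsc();
    return static_cast<double>(t1 - t0) / static_cast<double>(n);
}
```

On CPUs with an invariant TSC, ticks advance at a fixed rate that can differ from the current core clock, so results are comparable across runs but are not exact core cycles.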
