
update latex docs
in-situ processing
deferred queues,
reduced struct work size
compression with zlib
affected executables
calccrypto committed Apr 3, 2023
1 parent 673b170 commit b998ef5
Showing 6 changed files with 149 additions and 24 deletions.
24 changes: 20 additions & 4 deletions docs/latex/sections/gufi_dir2index.tex
@@ -66,13 +66,29 @@ \subsubsection{\gufidirindex}

\begin{table} [h]
\centering
\begin{tabular}{l|l}
Flag & Functionality \\
\hline
-h & help manual \\
\hline
-H & Show assigned input values \\
\hline
-n \textless num\_threads\textgreater & define number of threads to use \\
\hline
-x & pull xattrs from source file-sys into GUFI \\
\hline
-z \textless max\_level\textgreater & maximum level to go down to \\
\hline
-k \textless filename\textgreater & file containing directory names to skip \\
\hline
-M \textless bytes\textgreater & target memory footprint \\
\hline
-C \textless count\textgreater & Number of subdirectories allowed to be \\
& enqueued for parallel processing. Any \\
& remainders will be processed in-situ \\
\hline
-e & compress work items \\
\hline
\end{tabular}
\caption{\label{fig:Flags_for_dir2index} \gufidirindex Flags and Arguments}
\end{table}
24 changes: 20 additions & 4 deletions docs/latex/sections/gufi_dir2trace.tex
@@ -67,13 +67,29 @@ \subsubsection{\gufidirtrace}

\begin{table} [htb]
\centering
\begin{tabular}{l|l}
Flag & Functionality \\
\hline
-h & help manual \\
\hline
-H & Show assigned input values \\
\hline
-n \textless num\_threads\textgreater & define number of threads to use \\
\hline
-x & pull xattrs from source file-sys into GUFI \\
\hline
-d \textless delim\textgreater & delimiter (one char) [use 'x' for 0x1E] \\
\hline
-k \textless filename\textgreater & file containing directory names to skip \\
\hline
-M \textless bytes\textgreater & target memory footprint \\
\hline
-C \textless count\textgreater & Number of subdirectories allowed to be \\
& enqueued for parallel processing. Any \\
& remainders will be processed in-situ \\
\hline
-e & compress work items \\
\hline
\end{tabular}
\caption{\label{fig:Flags_for_dir2trace} \gufidirtrace Flags and Arguments}
\end{table}
2 changes: 2 additions & 0 deletions docs/latex/sections/gufi_query.tex
@@ -136,6 +136,8 @@ \subsubsection{Flags}
\hline
-k & file containing directory names to skip \\
\hline
-M \textless bytes\textgreater & target memory footprint \\
\hline
\end{tabular*}
\caption{\label{tab:widgets} \gufiquery Flags and Arguments}
\end{table}
23 changes: 23 additions & 0 deletions docs/latex/sections/gufi_trace2index.tex
@@ -71,5 +71,28 @@ \subsubsection{\gufitraceindex}
thread trace files may be concatenated in any order into a smaller
number of larger files for processing.

Extended attributes will be processed if they are found in the traces.
There is no need to tell \gufitraceindex to process them
with \texttt{-x}.

\begin{table} [htb]
\centering
\begin{tabular}{l|l}
Flag & Functionality \\
\hline
-h & help manual \\
\hline
-H & Show assigned input values \\
\hline
-n \textless num\_threads\textgreater & define number of threads to use \\
\hline
-d \textless delim\textgreater & delimiter (one char) [use 'x' for 0x1E] \\
\hline
-M \textless bytes\textgreater & target memory footprint \\
\hline
\end{tabular}
\caption{\label{fig:Flags_for_trace2index} \gufitraceindex Flags and Arguments}
\end{table}

\paragraph{Usage} ~\\
\gufitraceindex \texttt{[flags] trace\_file... index\_root}
76 changes: 63 additions & 13 deletions docs/latex/sections/optimizations.tex
@@ -64,7 +64,7 @@ \subsection{Optimizations}
In order for GUFI to be performant, many optimizations were
implemented.

\subsubsection{Reduced Branching}
In order to reduce the number of failed branch predictions experienced
by GUFI, branching was removed where possible. The main way this was
done was by intentionally skipping \texttt{NULL} pointer checks that
@@ -75,19 +75,25 @@ \subsubsection{Allocations}
allocations. To reduce the number of dynamic allocations, C-strings
are usually declared as fixed-size arrays instead of pointers.

During descent, each work item is initially allocated on the stack since
there is no way to tell beforehand whether or not the path pointed to
by a directory entry is a directory. If the path is a directory to be
enqueued, the work item is copied into dynamically allocated memory
first. If not, the work item allocated on the stack is used.
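
A minimal sketch of this stack-first pattern is shown below. The
\texttt{struct work} fields and the \texttt{enqueue} and
\texttt{process} helpers are illustrative placeholders, not GUFI's
actual identifiers.

\begin{verbatim}
/* Sketch of the stack-first allocation pattern described above.
 * Names and fields are illustrative placeholders, not GUFI's real code. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct work {
    char path[4096];   /* fixed-size array instead of a pointer */
    int  is_dir;
};

static void enqueue(struct work *w) {   /* stand-in for the real queue */
    printf("enqueued directory %s\n", w->path);
    free(w);
}

static void process(struct work *w) {   /* stand-in for in-place handling */
    printf("processed file %s\n", w->path);
}

static void handle_entry(const char *path, int is_dir) {
    struct work stack_work;             /* always start on the stack */
    snprintf(stack_work.path, sizeof(stack_work.path), "%s", path);
    stack_work.is_dir = is_dir;

    if (is_dir) {
        /* only directories are copied into heap memory and enqueued */
        struct work *heap_work = malloc(sizeof(*heap_work));
        if (heap_work) {
            memcpy(heap_work, &stack_work, sizeof(stack_work));
            enqueue(heap_work);
        }
    } else {
        process(&stack_work);           /* no dynamic allocation */
    }
}

int main(void) {
    handle_entry("/tmp/example_dir", 1);
    handle_entry("/tmp/example_file", 0);
    return 0;
}
\end{verbatim}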

Additionally, allocations are not performed by the standard
malloc. Instead, \texttt{jemalloc(3)} is used to override
\texttt{malloc(3)}. See \href{https://jemalloc.net/}{jemalloc's
website} for details.

\subsubsection{Not Calling \lstat During Tree Walk}
\texttt{struct~dirent}s are returned when reading a directory with
\readdir. glibc's implementation of \texttt{struct~dirent} provides
extra fields not required by POSIX.1. GUFI takes advantage of the
nonstandard \texttt{d\_type} field to not call \lstat when determining
whether or not the entry is a directory. This also prevents memory
allocations of work items that end up not being enqueued for
processing.
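
The generic example below (not GUFI's actual traversal code)
illustrates the pattern: \texttt{d\_type} is consulted first and
\lstat is only used as a fallback when the filesystem reports
\texttt{DT\_UNKNOWN}.

\begin{verbatim}
/* Generic example of using the nonstandard d_type field to avoid lstat(2).
 * Not GUFI's traversal code, only an illustration of the pattern. */
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char *argv[]) {
    const char *root = (argc > 1) ? argv[1] : ".";
    DIR *dir = opendir(root);
    if (!dir) {
        perror("opendir");
        return 1;
    }

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        int is_dir;
        if (entry->d_type != DT_UNKNOWN) {
            is_dir = (entry->d_type == DT_DIR);   /* no lstat needed */
        } else {
            /* some filesystems do not fill in d_type; fall back to lstat */
            char path[4096];
            struct stat st;
            snprintf(path, sizeof(path), "%s/%s", root, entry->d_name);
            is_dir = (lstat(path, &st) == 0) && S_ISDIR(st.st_mode);
        }
        printf("%s%s\n", entry->d_name, is_dir ? "/" : "");
    }

    closedir(dir);
    return 0;
}
\end{verbatim}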

\subsubsection{Enqueuing Work Before Processing}
In order to reduce the amount of time the thread pool spends waiting
@@ -113,11 +119,55 @@ \subsubsection{Combining Strings with \memcpy}
One method of combining C-strings is by concatenating them with
\texttt{snprintf(3)} with format strings containing only \texttt{\%s}
format specifiers. Instead of parsing the format string, the
\texttt{SNFORMAT\_S} function was created to do \memcpy s on the
arguments, skipping the work of figuring out whether or not the inputs
are strings and how long they are by searching for \texttt{NULL}
terminators. Instead, lengths are obtained as by-products of previous
string manipulations and reused.
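
As an illustration of the idea only, the sketch below concatenates
components whose lengths the caller already knows, so no format string
is parsed and no terminator has to be searched for. The
\texttt{append} helper is hypothetical and is not GUFI's
\texttt{SNFORMAT\_S}.

\begin{verbatim}
/* Illustrative memcpy-based concatenation with caller-supplied lengths.
 * append() is a hypothetical helper, not GUFI's SNFORMAT_S. */
#include <stdio.h>
#include <string.h>

/* Append len bytes of src at offset in dst (capacity dst_size) and
 * return the new offset.  Every length is supplied by the caller. */
static size_t append(char *dst, size_t dst_size, size_t offset,
                     const char *src, size_t len) {
    if (offset + 1 >= dst_size) {
        return offset;                 /* no room left */
    }
    if (offset + len >= dst_size) {
        len = dst_size - offset - 1;   /* truncate instead of overflowing */
    }
    memcpy(dst + offset, src, len);
    offset += len;
    dst[offset] = '\0';
    return offset;
}

int main(void) {
    const char *parent = "/index/root";
    const char *name   = "subdir";
    const size_t parent_len = strlen(parent);  /* in GUFI such lengths are   */
    const size_t name_len   = strlen(name);    /* by-products of earlier work */

    char path[4096];
    size_t off = 0;
    off = append(path, sizeof(path), off, parent, parent_len);
    off = append(path, sizeof(path), off, "/", 1);
    off = append(path, sizeof(path), off, name, name_len);

    printf("%s (%zu bytes)\n", path, off);
    return 0;
}
\end{verbatim}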

\subsubsection{In-Situ Processing}
Occasionally, GUFI might encounter extremely large directories. This
results in many long-lived dynamic allocations being created during
descent, which can overwhelm memory. Users can set a subdirectory
limit so that if too many subdirectories are encountered within a
single directory, subdirectories past the user-provided count are
processed recursively using a work item allocated on the thread's
stack instead of being dynamically allocated and enqueued for
processing. This reduces memory pressure by limiting the number of
work items that extremely large directories would otherwise spawn. The
subdirectories being processed recursively may themselves enqueue
dynamically allocated subdirectory work or recurse further down with
subdirectory work allocated on the stack.
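
A simplified sketch of this policy is shown below. The names
(\texttt{descend}, \texttt{enqueue\_copy}, \texttt{max\_enqueue}) are
placeholders rather than GUFI's actual descent code: the first
\texttt{max\_enqueue} subdirectories are copied out and enqueued, and
the rest are recursed into using the stack-allocated work item.

\begin{verbatim}
/* Simplified sketch of the in-situ subdirectory limit.  The names used
 * here are placeholders, not GUFI's actual descent code. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

struct work {
    char path[4096];
};

static void enqueue_copy(const struct work *w) {
    /* stand-in for copying the work item to the heap and enqueuing it */
    printf("enqueued:  %s\n", w->path);
}

static void descend(const char *path, size_t max_enqueue) {
    DIR *dir = opendir(path);
    if (!dir) {
        return;
    }

    size_t subdirs = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_type != DT_DIR ||
            !strcmp(entry->d_name, ".") || !strcmp(entry->d_name, "..")) {
            continue;
        }

        struct work w;                      /* stack-allocated work item */
        snprintf(w.path, sizeof(w.path), "%s/%s", path, entry->d_name);

        if (subdirs++ < max_enqueue) {
            enqueue_copy(&w);               /* parallel processing */
        } else {
            printf("in-situ:   %s\n", w.path);
            descend(w.path, max_enqueue);   /* recurse; no heap allocation */
        }
    }

    closedir(dir);
}

int main(int argc, char *argv[]) {
    descend((argc > 1) ? argv[1] : ".", 2 /* -C style limit */);
    return 0;
}
\end{verbatim}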

\subsubsection{Smaller Enqueued Work Items}
The main data structure that is enqueued is \texttt{struct~work}. This
struct was approximately 14KiB in size prior to
\href{https://github.com/mar-file-system/GUFI/commit/2227d00665eb6d507ac2052e80616c077a5da853}{2227d00}. After
moving the parts of this structure that were not necessary for
directory tree traversal to \texttt{struct~entry\_data},
\texttt{struct~work} was reduced to slightly over 8KiB.
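
The general shape of the split is sketched below; the field names are
placeholders, not GUFI's actual struct members.

\begin{verbatim}
/* Illustration of splitting traversal state from per-entry data.
 * Field names are placeholders, not GUFI's actual members. */
#include <stddef.h>
#include <sys/stat.h>

/* only what tree traversal needs travels through the work queue ... */
struct work {
    char   name[4096];      /* path of the directory to process */
    size_t name_len;
    size_t level;           /* depth in the tree */
};

/* ... while per-entry details are filled in once the directory is opened */
struct entry_data {
    struct stat st;         /* lstat(2) results */
    char        xattrs[4096];
    char        linkname[4096];
};
\end{verbatim}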

This was used to fix
\href{https://github.com/mar-file-system/GUFI/issues/121}{Issue 121}.

\subsubsection{Compression with zlib}
When zlib is detected during CMake configuration, \texttt{struct~work}
can be compressed to further reduce the size of each work item that is
sitting in memory waiting to be processed. The compressed buffer,
originally allocated with \texttt{sizeof(struct~work)} bytes, is then
reallocated to the compressed size. The bulk of \texttt{struct~work}
is made up of text strings followed by \texttt{NULL} characters, both
of which are highly compressible, meaning that compressed work items
can be expected to be much smaller than uncompressed work items.

Note that \texttt{struct~work} is its own compressed buffer. Whether
or not the work item is compressed and the compressed length are now
the first two fields of \texttt{struct~work}. When a work item is
compressed, the pointer to it has less space allocated to it than
\texttt{sizeof(struct~work)}.
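
The sketch below shows the general zlib pattern being described:
compress into a bounded buffer, then shrink the allocation to the
compressed size. It is a generic example (built with \texttt{-lz}),
not GUFI's exact in-place scheme.

\begin{verbatim}
/* Generic example of compressing an in-memory work item with zlib and
 * shrinking the allocation.  Build with -lz.  Not GUFI's exact code. */
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

int main(void) {
    /* a stand-in for struct work: mostly text followed by NULL padding */
    char src[8192] = "/some/long/path/that/compresses/well";

    uLongf dst_len = compressBound(sizeof(src));
    Bytef *dst = malloc(dst_len);
    if (!dst) {
        return 1;
    }

    if (compress(dst, &dst_len, (const Bytef *) src, sizeof(src)) != Z_OK) {
        free(dst);
        return 1;
    }

    /* keep only as much memory as the compressed data actually needs */
    Bytef *shrunk = realloc(dst, dst_len);
    if (shrunk) {
        dst = shrunk;
    }

    printf("%zu bytes -> %lu bytes\n", sizeof(src), (unsigned long) dst_len);

    /* decompress when the work item is popped for processing */
    char out[8192];
    uLongf out_len = sizeof(out);
    if (uncompress((Bytef *) out, &out_len, dst, dst_len) == Z_OK) {
        printf("restored: %s\n", out);
    }

    free(dst);
    return 0;
}
\end{verbatim}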

This was used to fix
\href{https://github.com/mar-file-system/GUFI/issues/121}{Issue 121}.

\subsubsection{Database Templates}
Every directory in an index contains at least one database file,
Expand Down
24 changes: 21 additions & 3 deletions docs/latex/sections/qptp.tex
@@ -82,12 +82,27 @@ \subsubsection{Processing Queued Work}
popped off one at a time. All work is processed before the thread
returns to the work queue to find more work.

There exists a second work queue, called the deferred work queue, that
is pushed to if the thread pool is initialized with a non-zero
\texttt{queue\_limit}. Work items are placed in the deferred work
queue when the normal work queue has more than \texttt{queue\_limit}
items enqueued. The deferred work queue is only processed if the
normal work queue is empty when the worker thread goes to look for
more work. The normal work queue may still be pushed to if it was
recently moved to the worker thread for processing. This changes the
order in which work is processed, allowing work to drain and reducing
memory pressure while still continuously processing work whenever any
is available.
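
A toy sketch of this push/claim policy is shown below; the queue type
and helpers are placeholders, not the actual QPTPool internals.

\begin{verbatim}
/* Toy sketch of the two-queue policy.  The queue type and helpers are
 * placeholders, not the actual QPTPool internals. */
#include <stddef.h>
#include <stdio.h>

struct queue { size_t size; };   /* stand-in for a real work queue */

static void push(struct queue *normal, struct queue *deferred,
                 size_t queue_limit) {
    /* overflow past queue_limit goes to the deferred queue */
    if (queue_limit && (normal->size > queue_limit)) {
        deferred->size++;
    } else {
        normal->size++;
    }
}

static struct queue *claim(struct queue *normal, struct queue *deferred) {
    /* the deferred queue is only looked at when the normal queue is empty */
    return normal->size ? normal : deferred;
}

int main(void) {
    struct queue normal = {0}, deferred = {0};
    for (int i = 0; i < 10; i++) {
        push(&normal, &deferred, 3 /* queue_limit */);
    }
    printf("normal=%zu deferred=%zu, claim the %s queue first\n",
           normal.size, deferred.size,
           (claim(&normal, &deferred) == &normal) ? "normal" : "deferred");
    return 0;
}
\end{verbatim}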

\subsubsection{Usage}
\begin{enumerate}
\item Create a thread pool: \\ \texttt{QPTPool\_t *pool =
QPTPool\_init(nthreads, args, next, next\_args, queue\_limit);}

The \texttt{args} argument will be accessible by all threads that
are run.

The \texttt{next} argument is a function that selects the
thread id of the work queue to place the new work item
into. \texttt{next\_args} allows for extra arguments to be passed
into \texttt{next}. If \texttt{next} is \texttt{NULL}, the thread
@@ -96,12 +111,15 @@ \subsubsection{Usage}
instead of at \texttt{QPTPool\_enqueue} in order to not require a
branch to figure out whether or not the provided function pointer is
valid.

\item Add work: \\ \texttt{QPTPool\_enqueue(pool, 0, function,
work);} \\ The function passed into \texttt{QPTPool\_enqueue} must
match the signature found in \texttt{QPTPool.h}. The \texttt{work}
argument will only be accessible to the thread processing this work.

\item Wait for all work to be completed (threads are joined):
\\ \texttt{QPTPool\_wait(pool);} \\ This function exists to allow
for the collection of statistics before the context is destroyed.

\item Destroy the pool context: \\ \texttt{QPTPool\_destroy(pool);}
\end{enumerate}
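
Putting the steps above together, a minimal sketch might look like the
following. The worker callback signature is assumed from the
description above and should be checked against \texttt{QPTPool.h};
error handling is omitted.

\begin{verbatim}
/* Minimal sketch combining the steps above.  The callback signature and
 * return value are assumptions; check QPTPool.h for the real interface. */
#include <stdio.h>
#include <stdlib.h>

#include "QPTPool.h"

struct my_work {
    int id;
};

/* assumed shape: pool context, worker thread id, this item's data, shared args */
static int process(QPTPool_t *pool, const size_t thread_id,
                   void *data, void *args) {
    (void) pool; (void) args;
    struct my_work *w = data;
    printf("thread %zu processed item %d\n", thread_id, w->id);
    free(w);
    return 0;
}

int main(void) {
    const size_t nthreads    = 4;
    const size_t queue_limit = 1024;   /* 0 disables the deferred queue */

    /* next == NULL: use the pool's default work placement */
    QPTPool_t *pool = QPTPool_init(nthreads, NULL, NULL, NULL, queue_limit);

    for (int i = 0; i < 16; i++) {
        struct my_work *w = malloc(sizeof(*w));
        w->id = i;
        QPTPool_enqueue(pool, 0, process, w);   /* push to thread 0's queue */
    }

    QPTPool_wait(pool);      /* threads are joined; statistics still readable */
    QPTPool_destroy(pool);
    return 0;
}
\end{verbatim}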
