@@ -5,7 +5,11 @@ \chapter{Technical background}\label{chap:theory}

What follows is a brief presentation of FPGAs and of how they can be driven from the operating system.



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Operating system}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%User/kernel space. Networking application: need to send the data through the kernel: huge overhead.

@@ -22,7 +26,11 @@ \subsection{Device Driver}\label{sec:theory-driver}
Fortunately, modern monolithic kernels such as Linux since version 2.6 provide preemptive scheduling~\cite{Santhanam2003}: the scheduler interrupts the running task and assigns the processor resources it was using to another one.
Hence, a system with many processes competing for CPU resources would not be stalled, but preemption changes nothing if the only process heavily requesting processor time is the one using the driver.



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{FPGA}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

To drive an FPGA from the OS, the two basically need to share some memory.
That memory can be mapped and accessed directly from user space through \texttt{/dev/mem}, or transfers can go through a direct memory access (DMA) module.
@@ -31,7 +39,26 @@ \section{FPGA}
Once the CPU has written to those descriptors and synchronized them with the DMA, it no longer has to care about them: the DMA is now in charge of sending them to the device, where registers are ready to read the incoming data.
The same goes from the device to the CPU: when the device wants to communicate data to the OS, it writes them to the DMA, which transfers them to the CPU and raises a flag on the way to notify it.




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Cryptography}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Cryptography is the cornerstone of security.
The four main goals are the following, as defined in~\cite{Menezes1996}:
\begin{description}
\item[Confidentiality] keeping information secret from all but those who are authorized to see it.
\item[Integrity] ensuring information has not been altered by unauthorized or unknown means.
\item[Source Authentication] corroborating the source of information.
\item[Non-repudiation] preventing the denial of previous commitments or actions.
\end{description}

To achieve these goals, four cryptographic primitives are needed: symmetric and asymmetric ciphers, message digests and digital signatures.




\subsection{Symmetric cryptography}
Talk about encryption, integrity and authentication.
@@ -40,8 +67,57 @@ \subsection{Symetric cryptography}
\subsubsection{AES}
AES supports many modes of operation: CBC is the most commonly used, while GCM additionally provides authentication.

\subsubsection{SHA}
Family of (unkeyed) hash functions, with several versions in place. SHA-1 is deprecated, SHA-2 is widely used, and SHA-3 is already defined and is beginning to be implemented.
\begin{figure}
\includegraphics[width=\textwidth]{nist-cbc}
\caption{CBC encryption and decryption diagram}{taken from the NIST recommendation~\cite{nist-sp800-38A}.}
\label{fig:cbc-encrypt-decrypt}
\end{figure}

\begin{figure}
\includegraphics[width=\textwidth]{nist-gcm-encrypt}
\caption{GCM encryption diagram}{taken from the NIST specification~\cite{mcgrew2005}. The ciphertext blocks are formed by \textit{xor}-ing the encrypted counter and the plaintext. The tag is generated by a chain of \textit{xor}-ing ciphertext blocks with Galois field multiplied data. Decryption works exactly the same way, except that the plaintext and ciphertext are swapped.}
\label{fig:gcm-encrypt}
\end{figure}








\subsection{Message digest}
%Not the same as digital signature because we use the same key for both MAC values.
A message digest is the fixed-size output of a one-way mathematical function.
Such hash functions are of two types~\cite{infof405}: manipulation detection codes (MDC), which guarantee integrity, and message authentication codes (MAC), which guarantee both integrity and source authentication.


An MDC $h(x)$ can follow an iterative construction, chaining a compression function $f$ over a message $x$ made of $t$ blocks $x_1, \ldots, x_t$:
\[
\begin{dcases}
H_0 = \mbox{initial value}\\
H_i = f(H_{i-1}, x_i), \mbox{with } i \in [1,t]\\
h(x) = H_t\\
\end{dcases}
\]
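
As a small worked instance of this recurrence, and purely for illustration, a message of $t = 3$ blocks unrolls to:
\[
h(x) = H_3 = f(f(f(H_0, x_1), x_2), x_3)
\]
so each chaining value depends on all the blocks processed before it.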

Building on this design and adding a key to the process, RFC 2104~\cite{rfc2104} defines a MAC:
\[
\mathrm{HMAC}(k, x) = h\bigl((k \oplus opad) \mid h((k \oplus ipad) \mid x)\bigr)
\]
with a key $k$ and two padding blocks added for security reasons: an outer pad $opad$ and an inner pad $ipad$; $|$ denotes concatenation.
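
The padding constants can be spelled out as a complement. The following is a summary of our reading of RFC 2104, not a quotation from it: $B$ denotes the block size in bytes of the underlying hash function $h$ ($B = 64$ for MD5 and SHA-1), and $k'$ is a notation introduced here for the key once brought to exactly $B$ bytes, which is what actually gets combined with the pads.
\[
\begin{dcases}
ipad = \mbox{the byte \texttt{0x36} repeated $B$ times}\\
opad = \mbox{the byte \texttt{0x5C} repeated $B$ times}\\
k' = k \mbox{ padded with zero bytes up to $B$ bytes, if $k$ is at most $B$ bytes long}\\
k' = h(k) \mbox{ padded with zero bytes up to $B$ bytes, otherwise}
\end{dcases}
\]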

\noindent There exists a wide variety of MDCs: some are based on block ciphers, such as Miyaguchi-Preneel; some are custom designs, such as MD5, SHA-1 and SHA-2; and some are built on modular arithmetic, such as MASH-1.

In both schemes, data integrity can be guaranteed because flipping a single bit of the message changes the digest with overwhelming probability.
However, only a MAC can ensure source authentication since it is the only one based on a shared secret key.

This raises the question of what to authenticate and when.
\citet{Bellare2000} proved that the most secure solution is to encrypt first, then compute the MAC over the ciphertext.
We will see in section~\ref{sec:theory-network} that while IPSec follows this recommendation, SSL/TLS does not: it first computes the MAC over the plaintext, then encrypts the message.
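
The two compositions can be written side by side. In this sketch the notation is ours, not that of the cited paper: $E_{k_e}$ is a symmetric cipher under an encryption key $k_e$, $\mathrm{MAC}_{k_m}$ is a MAC under an independent key $k_m$, $x$ is the plaintext and $|$ denotes concatenation.
\[
\mbox{encrypt-then-MAC:}\quad c = E_{k_e}(x),\quad \mbox{send } c \mid \mathrm{MAC}_{k_m}(c)
\]
\[
\mbox{MAC-then-encrypt:}\quad t = \mathrm{MAC}_{k_m}(x),\quad \mbox{send } E_{k_e}(x \mid t)
\]
In the first case the receiver can verify the tag and reject a forged message before decrypting anything, whereas in the second it has to decrypt before it can check integrity.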




\subsection{Asymmetric cryptography}
% Diffie-Hellman: show the math about the shared key, maybe from the course INFOH405 or RFC2631, and point which operation will be offloaded.
@@ -69,8 +145,6 @@ \subsubsection{RSA}
\end{description}




\subsubsection{Diffie-Hellman}
Diffie-Hellman is a secret key exchange protocol: two parties compute a shared secret $ZZ$ that can be used as a symmetric key during the following exchanges.
It relies on the same kind of operation as RSA, namely modular exponentiation.
@@ -109,7 +183,20 @@ \subsubsection{Diffie-Hellman}
We will see in chapter~\ref{chap:results} that while a 1024-bit prime is easily manageable by a full software implementation, hardware offloading becomes a necessity for 4096-bit primes.
Moreover, 1024-bit parameter sizes, for both RSA and Diffie-Hellman, have been disallowed by the NIST recommendations since 2013~\cite{nist-sp800-131A}.

\section{Network and VPN implementation}











%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Network and VPN implementation}\label{sec:theory-network}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Present the TCP/IP layering
RFC 1122~\cite{rfc1122} defines the TCP/IP and OSI stacks as shown in table~\ref{tab:tcp-ip-stack}.
@@ -137,15 +224,33 @@ \section{Network and VPN implementation}
% First explain what a VPN is.

There exist several major implementations of VPN: SSL, IPSec, L2TP and PPTP.
The latter was developed by a vendor consortium, was proposed in RFC 2637, and will not be discussed further.
The latter was developed by a vendor consortium led by Microsoft, was proposed in RFC 2637, and will not be discussed further.

\subsection{SSL/TLS}
% Introduce SSL/TLS, talk about the protocol, the key exchange and stuff, but leaver OpenVPN for the 'implementation' chapter.
% Question the security? Apparently, it does mac-then-encrypt, which is insecure regarding certain types of attacks \cite{cryptoeprint:2001:045}.
% Plus, SSLv3 is to be deprecated, according to a queued RFC: http://www.rfc-editor.org/internet-drafts/draft-ietf-tls-sslv3-diediedie-03.txt, and is not supported anymore by many servers since poodle.
Application level security.














\subsection{IPSec}
% IPSec as a protocol, strongswan come in the 'implementation' chapter.
Modification of the IP stack in the kernel space.
IPSec provides network-level security: it examines incoming IP packets, checks whether a security association exists with the destination, and decrypts them on the fly if necessary.

One of its main disadvantages compared to a user-space VPN is the difficulty of traversing NAT.

%TODO Figure IPSec frame structure.

@@ -12,7 +12,7 @@ \subsection{x86 host}
\begin{framed}
\begin{description}
\item[OS] Ubuntu 12.04 LTS, kernel 3.16
\item[CPU] Intel Core-i3 ... (two logical cores out of four)
\item[CPU] Intel Core-i5 (two logical cores out of four)
\item[RAM] 1GB DDR3
\end{description}
\end{framed}
@@ -23,7 +23,7 @@ \subsection{Altera Socrates SoCFPGA}
\begin{description}
\item[OS] Yocto project, kernel 3.14
\item[CPU] Dual core ARM Cortex-A9, 800MHz
\item[RAM] ...GB DDR3
\item[RAM] 1GB DDR3
\item[FPGA] Altera Cyclone V
\end{description}
\end{framed}
@@ -42,7 +42,7 @@ \section{TLS Connections}
Otherwise, all the clients share the same basic configuration file (see listing~\ref{list:openvpn-config-client}), which tells them to renegotiate a new connection every second.
Hence, if a connection could be made with no delay and if process scheduling were ideal, the server would have to handle 600 connections per minute.
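As a rough check of this figure, and assuming that each of the ten parallel clients started by the stress script completes exactly one renegotiation per second:
\[
10 \mbox{ clients} \times 1 \mbox{ connection per client per second} \times 60 \mbox{ seconds} = 600 \mbox{ connections per minute}
\]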

\lstinputlisting[language=bash]{stress-openvpn.sh}
\lstinputlisting[language=bash, label=list:openvpn-client-script, caption={Script starting ten clients in parallel that will stress the server.}]{stress-openvpn.sh}

\section{File transfer}
The transferred file is an uncompressed 128MB block of random data generated using the following command: