# <p style ="padding: 8px; background: linear-gradient(45deg, #000000, #ad5aff); color : #F8F8FF; font-family: Arial, sans-serif; font-size: 100%; text-align: center; border-radius: 20px; margin-top: 15px; box-shadow: 3px 3px 10px rgba(0,0,0,0.1); border: 2px solid #333;"> **Thesis Empirical Results** </p>


<p style = "text-align: justify; font-family: 'Georgia', serif; font-size: 110%; margin: 20px; border: 2px solid #333; padding: 10px; border-radius: 15px;"> 
In this notebook, I describe my empirical findings on the error term in the Polya-Vinogradov inequality. At a high level, I generated data for a sample of primes and then analyzed that data. The notebook presents the major results first and gives a more in-depth description of everything later on.
</p>

<a id='top'></a>

# <p style ="padding: 8px; background: linear-gradient(45deg, #000000, #ad5aff); color : #F8F8FF; font-family: Arial, sans-serif; font-size: 70%; text-align: center; border-radius: 20px; margin-top: 15px; box-shadow: 3px 3px 10px rgba(0,0,0,0.1); border: 2px solid #333;"> **Table of Contents** </p>


<table style="margin-left: auto; margin-right: auto; width: 85%; border-collapse: collapse; font-family: 'Georgia', serif; font-size: 105%; border: 2px solid #333;">
    <tr>
        <td>No</td>
        <td>Contents</td>
        <td>No</td>
        <td>Contents</td>
    </tr>
    <tr>
        <td>1</td>
        <td><a href="#1"><font color="#F8F8FF"> Problem Background </font></a></td>
        <td>2</td>
        <td><a href="#2"><font color="#F8F8FF"> Data Description </font></a></td>
    </tr>
</table>


<a id='1'></a>

# <p style ="padding: 8px; background: linear-gradient(45deg, #000000, #ad5aff); color : #F8F8FF; font-family: Arial, sans-serif; font-size: 70%; text-align: center; border-radius: 20px; margin-top: 15px; box-shadow: 3px 3px 10px rgba(0,0,0,0.1); border: 2px solid #333;"> **Problem Background** </p>

<p style = "text-align: justify; font-family: 'Georgia', serif; font-size: 110%; margin: 20px;"> 

Define $S_p(x) := \sum_{a \leq x} \left( \frac{a}{p} \right)$ to be the character sum modulo a prime $p$, that is, the partial sum of the Legendre symbols $\left( \frac{a}{p} \right)$ over $1 \leq a \leq x$. A famous inequality, known as the *Polya-Vinogradov* inequality, gives the following bound:
$$
|S_p(x)| \leq \sqrt{p} \log p.
$$

One way to prove this involves using *Polya's Fourier Expansion*: 

$$
S_p(x) = \frac{G(p)}{2\pi i} \sum_{1 \leq |n| \leq H} \left( \frac{n}{p} \right) \frac{1 - e\left(-\frac{nx}{p}\right)}{n} + O\left(\frac{p \log p}{H}\right) + O(1),
$$

where $e(t) := e^{2\pi i t}$ and $G(p)$ is the Gauss sum
$$
G(p) = \begin{cases}
       \sqrt{p} & \text{ if } p \equiv 1 \ (\textrm{mod } 4) \\
       i \sqrt{p} & \text{ if } p \equiv -1 \ (\textrm{mod } 4).
\end{cases}
$$
</p>

It turns out that if you set $H = (\log p)^2$, you get the following:
$$
S_p(x) = C\sqrt{p} \cdot \log \log p + O\left(\frac{p}{\log p}\right). 
$$
Notice that if the true error term is smaller than this, then we improve the Polya-Vinogradov inequality! This is the ambitious ultimate goal. As a starting point, I generated numerics to see what this error term actually looks like, which is what this notebook showcases.
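As a quick numerical illustration of the definitions in this section, the sketch below computes $S_p(x)$ for every $0 \leq x < p$ and compares the largest absolute value with $\sqrt{p} \log p$. The helper names `legendre` and `character_sums` and the prime $p = 1009$ are illustrative assumptions, not the code or data behind the results in this notebook.

```python
import math


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)  # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r


def character_sums(p: int) -> list:
    """Return [S_p(0), S_p(1), ..., S_p(p - 1)] as running partial sums."""
    sums = [0]
    for a in range(1, p):
        sums.append(sums[-1] + legendre(a, p))
    return sums


p = 1009  # illustrative prime (assumption), not taken from the data set
S = character_sums(p)
largest = max(abs(s) for s in S)
bound = math.sqrt(p) * math.log(p)
print(f"max |S_p(x)| = {largest}, sqrt(p) * log(p) = {bound:.2f}")
```

Euler's criterion keeps the sketch short; for the larger primes used later, a sieve over quadratic residues would likely be faster.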

<a id='2'></a>

# <p style ="padding: 8px; background: linear-gradient(45deg, #000000, #ad5aff); color : #F8F8FF; font-family: Arial, sans-serif; font-size: 70%; text-align: center; border-radius: 20px; margin-top: 15px; box-shadow: 3px 3px 10px rgba(0,0,0,0.1); border: 2px solid #333;"> **Data Description** </p>

<p style = "text-align: justify; font-family: 'Georgia', serif; font-size: 110%; margin: 20px;"> 

First, a bit of notation. We let $S(x)$ be the character sum of Legendre symbols, and $F(x)$ be the main term of Polya's Fourier expansion. We are interested in the maximum and minimum of $D(x) := S(x) - F(x)$.
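To make $D(x)$ concrete, here is a minimal sketch that evaluates the truncated main term directly from the Fourier expansion and subtracts it from the character sum. The function names, the prime $p = 1009$, and the truncation $H = \lfloor (\log p)^2 \rfloor$ are assumptions made for illustration; the data in this notebook may have been generated with a different choice of $H$.

```python
import cmath
import math


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def gauss_sum(p: int) -> complex:
    """G(p): sqrt(p) if p = 1 (mod 4), i * sqrt(p) if p = 3 (mod 4)."""
    return math.sqrt(p) if p % 4 == 1 else 1j * math.sqrt(p)


def main_term(x: int, p: int, H: int) -> float:
    """F(x): the sum over 1 <= |n| <= H in the Fourier expansion."""
    total = 0j
    for n in range(1, H + 1):
        for m in (n, -n):
            phase = cmath.exp(-2j * math.pi * m * x / p)  # e(-mx/p)
            total += legendre(m, p) * (1 - phase) / m
    # The imaginary part vanishes up to rounding, so keep the real part.
    return (gauss_sum(p) / (2j * math.pi) * total).real


p = 1009                   # illustrative prime (assumption)
H = int(math.log(p) ** 2)  # assumed truncation, H = floor((log p)^2)

S = [0]                    # S[x] = S_p(x)
for a in range(1, p):
    S.append(S[-1] + legendre(a, p))

D = [S[x] - main_term(x, p, H) for x in range(p)]
print(f"max D = {max(D):.3f}, min D = {min(D):.3f}")
```

Pairing $n$ with $-n$ shows why the result is real: the sum collapses to a sine series when $p \equiv 1 \pmod 4$ and to a $1 - \cos$ series when $p \equiv -1 \pmod 4$.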

Specifically, I collected data for around $135$ primes: $2$ of them are around $1{,}000{,}000$, and the rest were sampled uniformly at random from the range $100{,}000$ to $200{,}000$. For each prime $p$, I record the following pieces of info:
- `prime`: `p`
- `pos_error`: `max(D(x))`
- `neg_error`: `min(D(x))`
- `x_pos_error`: `100 * (argmax(D(x)) / prime)`
- `x_neg_error`: `100 * (argmin(D(x)) / prime)`
- `max_error`: `max(pos_error, -neg_error)`
- `max_error_ind`: `1 if max_error == pos_error else 0`
- `x_max_error`: `x_pos_error if max_error_ind == 1 else x_neg_error`
- `mag_diff`: `|pos_error + neg_error|`
- `dist_check`: `|100 - x_pos_error - x_neg_error|`
- `mod_4`: `prime (mod 4)`
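The fields above can be assembled from a list of $D(x)$ values as in the following sketch. The function name `summarize` and the toy input are hypothetical; the sketch assumes `D[x]` holds $D(x)$ for $x = 0, \dots, p - 1$ and follows the field definitions exactly as listed.

```python
def summarize(p: int, D: list) -> dict:
    """Build the per-prime record from D, where D[x] is the value D(x)."""
    arg_pos = max(range(len(D)), key=D.__getitem__)  # argmax of D
    arg_neg = min(range(len(D)), key=D.__getitem__)  # argmin of D
    pos_error, neg_error = D[arg_pos], D[arg_neg]
    x_pos_error = 100 * arg_pos / p  # location as a percentage of p
    x_neg_error = 100 * arg_neg / p
    max_error = max(pos_error, -neg_error)
    max_error_ind = 1 if max_error == pos_error else 0
    return {
        "prime": p,
        "pos_error": pos_error,
        "neg_error": neg_error,
        "x_pos_error": x_pos_error,
        "x_neg_error": x_neg_error,
        "max_error": max_error,
        "max_error_ind": max_error_ind,
        "x_max_error": x_pos_error if max_error_ind == 1 else x_neg_error,
        "mag_diff": abs(pos_error + neg_error),
        "dist_check": abs(100 - x_pos_error - x_neg_error),
        "mod_4": p % 4,
    }


# Toy input (made-up values, not real data) to show the record's shape.
record = summarize(5, [0.0, 2.0, -3.0, 1.0, 0.5])
for field, value in record.items():
    print(f"{field:14s}: {value}")
```

Under these definitions, `mag_diff` measures how far the extremes are from being symmetric in size, and `dist_check` measures how far their locations are from summing to $100\%$ of $p$.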

I also create a "diff plot", which is simply a scatterplot of $D(x)$ against $x$.