# Write roots to files

In this notebook, we create data sets consisting of roots of polynomial equations having restricted 0,1 coefficients.

## Creating dataframe
In this first section we create a dataframe that we want to study.

First we create a list of all possible coefficients $(1,a_{d-1}, \cdots, a_1, 1)$ of polynomials of degree $d$ of the form
    $$P(z)=z^{d}+a_{d-1}z^{d-1}+\cdots + a_1z + 1 $$
where $a_i\in\{0,1\}$.

Here Python is quite handy, as there is an entire module of functions for doing very specific types of iteration (here we need the one called **product**).

In [1]:
from itertools import product
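As a quick illustration of what `product` returns, here is a minimal sketch that lists every pair with entries in $\{0,1\}$. The name `pairs` is introduced only for this example.

```python
from itertools import product

# All length-2 tuples with entries drawn from {0, 1}
pairs = list(product(range(2), repeat=2))
print(pairs)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

With `repeat = degree - 1` the same call enumerates every choice of the middle coefficients $a_{d-1}, \ldots, a_1$.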

To make it easy to build a new list for a different degree (say $d=6$ or $d=9$), we create a reusable function that takes the desired degree as its *degree* argument.

In [2]:
degree = 4 #this is a parameter cell

When the Run Notebook feature injects 'parameters', degree is passed in as a string. We therefore convert it to an integer below.

In [3]:
degree = int(degree)

In [4]:
def make_coeff(degree):
    """Return all coefficient tuples (1, a_{d-1}, ..., a_1, 1) with a_i in {0, 1}."""
    middles = product(range(2), repeat=degree - 1) #every choice of the middle coefficients
    return [(1,) + middle + (1,) for middle in middles] #fix leading and constant coefficients to 1
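A small sanity check of this construction, written as a self-contained sketch: the function is restated here so the block runs on its own, and the degrees 2 and 5 are chosen purely for illustration. A polynomial of degree $d$ has $d-1$ free coefficients, so there should be $2^{d-1}$ tuples.

```python
from itertools import product

def make_coeff(degree):
    # Same construction as above: leading and constant coefficients fixed to 1
    return [(1,) + middle + (1,) for middle in product(range(2), repeat=degree - 1)]

small = make_coeff(2)
print(small)               # [(1, 0, 1), (1, 1, 1)]
print(len(make_coeff(5)))  # 16, i.e. 2**(5 - 1)
```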

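For example, with the default parameter $d=4$ the list contains $2^{3}=8$ tuples of length 5. The output shown below was produced by a run with `degree` set to 8, so each tuple has nine entries.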

In [5]:
coefficients = make_coeff(degree)

In [6]:
coefficients[:5] #inspect first 5

[(1, 0, 0, 0, 0, 0, 0, 0, 1),
 (1, 0, 0, 0, 0, 0, 0, 1, 1),
 (1, 0, 0, 0, 0, 0, 1, 0, 1),
 (1, 0, 0, 0, 0, 0, 1, 1, 1),
 (1, 0, 0, 0, 0, 1, 0, 0, 1)]

We use NumPy to find the roots and pandas to store them.

In [7]:
import numpy as np
import pandas as pd
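To see how `np.roots` reads a coefficient tuple, here is a minimal sketch using $z^2+1$, whose roots are $\pm i$. Coefficients are given from the highest degree down, matching the tuples built above; the name `r` is introduced only for this example.

```python
import numpy as np

# z^2 + 1 has coefficients (1, 0, 1), highest degree first
r = np.roots([1, 0, 1])
print(r)               # the two roots +1j and -1j (order and rounding may vary)
print(r.real, r.imag)  # real and imaginary parts as separate float arrays
```

The `.real` and `.imag` attributes are what get stored in the dataframe columns later on.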

We create a dataframe with one row per root, holding the coefficients, and with empty `real` and `imag` columns to populate. Each root has a real and an imaginary part.

In [8]:
array = degree*make_coeff(degree) #there are d solutions for each choice of coefficients when the degree is d 
df = pd.DataFrame(array, columns = [f'a{i}' for i in range(degree,-1, -1)])
df['real'] = ''
df['imag'] = ''
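Multiplying a list by an integer repeats it, so the frame holds `degree` stacked copies of the coefficient list, and row `j` carries the coefficients of polynomial `j % n`, where `n` is the number of polynomials. A minimal sketch of that layout, assuming an illustrative degree of 2 (the names `coeffs`, `rows` and `n` are introduced only for this example):

```python
from itertools import product

degree = 2  # illustrative value, not the notebook parameter
coeffs = [(1,) + c + (1,) for c in product(range(2), repeat=degree - 1)]
n = len(coeffs)         # 2 polynomials: z^2 + 1 and z^2 + z + 1
rows = degree * coeffs  # list repetition: the n tuples, repeated `degree` times

for j, row in enumerate(rows):
    print(j, row, "polynomial index:", j % n)
```

Each polynomial therefore appears in rows `i`, `i + n`, `i + 2n`, and so on, one row for each of its roots.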

In [9]:
df.head()

   a8  a7  a6  a5  a4  a3  a2  a1  a0 real imag
0   1   0   0   0   0   0   0   0   1
1   1   0   0   0   0   0   0   1   1
2   1   0   0   0   0   0   1   0   1
3   1   0   0   0   0   0   1   1   1
4   1   0   0   0   0   1   0   0   1


We iterate through the list of coefficients, compute the roots of each polynomial, and write them into the dataframe.

In [10]:
n = len(coefficients) #number of polynomials (coefficient tuples)
for i in range(n): #iterate through the coefficient tuples
    rootsi = np.roots(coefficients[i]) #the d roots of the i-th polynomial
    roots_index = 0 #position of the next root of this polynomial to store
    for j in range(i, len(df), n): #rows i, i+n, i+2n, ... hold copies of the i-th polynomial
        rootj = rootsi[roots_index] #select root
        roots_index += 1
        df.at[j,'real'] = rootj.real #replace empty value with real part
        df.at[j,'imag'] = rootj.imag #replace empty value with imaginary part
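One way to gain confidence in the computed roots is to substitute them back into their polynomials with `np.polyval` and check that the residuals are close to zero. The following is a self-contained sketch under stated assumptions: it uses a small illustrative degree of 3, keeps the roots in a complex array rather than the dataframe, and the names `coeffs`, `roots`, `residuals` and the tolerance `1e-9` are choices made for this example only.

```python
import numpy as np
from itertools import product

degree = 3  # small illustrative degree, not the notebook parameter
coeffs = [(1,) + c + (1,) for c in product(range(2), repeat=degree - 1)]

# One row per polynomial, one column per root
roots = np.array([np.roots(c) for c in coeffs])
print(roots.shape)  # (4, 3): 2**(degree-1) polynomials, degree roots each

# Evaluate each polynomial at its own roots; |P(root)| should be near zero
residuals = np.array([np.abs(np.polyval(c, r)) for c, r in zip(coeffs, roots)])
max_residual = residuals.max()
print(max_residual < 1e-9)
```

For larger degrees the residuals grow with floating-point error, so the tolerance would need to be loosened accordingly.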

In [11]:
df[['real','imag']] = df[['real','imag']].astype(float)

In [12]:
df.dtypes

a8        int64
a7        int64
a6        int64
a5        int64
a4        int64
a3        int64
a2        int64
a1        int64
a0        int64
real    float64
imag    float64
dtype: object

## Saving dataframe to S3
In this second section, we save the dataframe to an S3 bucket.

I made a bucket to save to. Let's write a line to check that it exists: if the bucket exists, no output is returned; otherwise an error is returned.

In [13]:
bucket_name = 'restricted-coefficients'

In [14]:
!aws s3api head-bucket --bucket $bucket_name

First we install the AWS SDK for pandas (formerly AWS Data Wrangler), which is distributed as the `awswrangler` package.

In [15]:
!pip install awswrangler

Keyring is skipped due to an exception: 'keyring.backends'
Collecting awswrangler
  Using cached awswrangler-2.19.0-py3-none-any.whl (267 kB)
Collecting progressbar2<5.0.0,>=4.0.0
  Using cached progressbar2-4.2.0-py2.py3-none-any.whl (27 kB)
Collecting requests-aws4auth<2.0.0,>=1.1.1
  Using cached requests_aws4auth-1.2.0-py2.py3-none-any.whl (24 kB)
Collecting pg8000<2.0.0,>=1.20.0
  Using cached pg8000-1.29.4-py3-none-any.whl (51 kB)
Collecting opensearch-py<3,>=1
  Downloading opensearch_py-2.1.0-py2.py3-none-any.whl (220 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 220.7/220.7 kB 2.6 MB/s eta 0:00:00
Collecting pymysql<2.0.0,>=1.0.0
  Using cached PyMySQL-1.0.2-py3-none-any.whl (43 kB)
Collecting backoff<3.0.0,>=1.11.1
  Using cached backoff-2.2.1-py3-none-any.whl (15 kB)
Collecting gremlinpython<4.0.0,>=3.5.2
  Using cached gremlinpython-3.6.1-py2.py3-none-any.whl (73 kB)
Collecting jsonpath-ng<2.0.0,>=1.5.3
  Using cache

Next, we import this package and use it to save the dataframe as a Parquet file.

In [16]:
import awswrangler as wr

In [17]:
wr.s3.to_parquet(df, path=f's3://{bucket_name}/{degree}')

{'paths': ['s3://restricted-coefficients/8'], 'partitions_values': {}}

Uncomment the line below to check that the data can be recovered together with data types.

In [18]:
#df2 = wr.s3.read_parquet(path=f's3://restricted-coefficients/{degree}')
#df2

Saving roots up to degree 16 took about 12 seconds; up to 17, about 27 seconds; up to 18, about 55 seconds; up to 19, about 2 minutes; up to 20, 4 m 7 s; up to 21, 9 m 41 s; up to 22, 20 m 49 s; and up to 23, 52 m 54 s.
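The running time slightly more than doubles with each extra degree, which is consistent with the size of the data: for a single degree $d$ there are $2^{d-1}$ polynomials with $d$ roots each, so the dataframe has $d\cdot 2^{d-1}$ rows. A minimal sketch of that arithmetic (the helper `n_rows` is a name introduced only for this example, and the link to the measured timings is an interpretation, not something measured here):

```python
def n_rows(d):
    # 2**(d-1) polynomials of degree d, each contributing d roots
    return d * 2 ** (d - 1)

for d in range(16, 24):
    growth = n_rows(d) / n_rows(d - 1)  # equals 2d / (d - 1), slightly above 2
    print(d, n_rows(d), round(growth, 2))
```

At degree 16 this gives 524,288 rows, and at degree 23 it gives 96,468,992 rows.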