<h3>Multi-Index DataFrame Creation</h3>
<p>This was a side learning exercise for a larger project. I ended up scrapping the approach because the resulting DataFrame, while neat to look at, was not memory efficient: even with sparse DataFrames it was still larger than desired, and it couldn't be exported with to_parquet. For the project I instead opted for multi-index rows and a different initial dictionary structure for similar data. Still, here I set up a structure for building datasets of function results, grouped by the number of digits each result has and organized by a digit-categorizing algorithm, using multi-index columns for searching.</p>
<p>Some simple sample functions are used here, but I intend to use this technique with functions that create very large integers you wouldn't want to store. Instead of the results, I save arrays of inputs, keyed by the number of digits of the result and by a quantifier that categorizes the large number. That is: take a really large number that was theoretically created by one of these functions, find which function(s) could have created it by searching on its number of digits and the result of digit_quantify(), then recompute from the stored inputs and match. An example of this is at the end.</p>

In [1]:
import math
import pandas as pd

In [2]:
#Core functions
def get_digits(num: int) -> int:
    '''Returns the number of digits of a given integer.'''
    num = abs(num) #Force positive
    if num == 0: #catch the log10(zero) error
        return 0
    return 1 + math.floor(math.log10(num))

def digit_quantify(num: int, byte_amount: int = 4) -> int:
    '''
    Algorithm splits up digits by approx golden ratio and then subtracts
    the larger portion from the smaller and takes the absolute.
    It keeps doing this until the result is below the byte-storage threshold
    (byte_amount converted to uint) and is a way to "quantify" a pattern of 
    digits into a smaller repeatable format for search speed improvements.
    
    Note: When a number ending in zeros is split with divmod() by
    10^split_amount, the trailing zeros get chopped off, so such numbers
    end up with more unique results.
    '''
    #Convert byte_amount into the max uint possible.
    max_number = 2**(8*byte_amount) - 1 
    digits = get_digits(num)
    #Split by Phi and subtract the parts from each other until you get a
    #number below the uint threshold. Higher threshold gives more resolution.
    while max_number < num:
        split_amount = round(digits/1.618034) #Phi/Golden Ratio = 1.61803398875
        split = divmod(num, 10**split_amount)
        num = abs(split[1] - split[0]) #abs() for the case where num has multiple zeros and split[1] is smaller
        digits = get_digits(num)
    return num
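
<p>To make the reduction loop easier to follow, here is a minimal tracing sketch. The name quantify_steps is hypothetical and not part of the notebook; it repeats the same loop as digit_quantify() but records each split. The second call also shows the trailing-zero behaviour mentioned in the docstring note.</p>

```python
import math

def get_digits(num: int) -> int:
    """Digit count, mirroring the helper above (0 is reported as 0 digits)."""
    num = abs(num)
    if num == 0:
        return 0
    return 1 + math.floor(math.log10(num))

def quantify_steps(num: int, byte_amount: int = 4) -> list:
    """Hypothetical tracing variant of digit_quantify(): same loop, but it
    records (high_part, low_part, new_num) for every reduction step."""
    max_number = 2**(8*byte_amount) - 1
    steps = []
    while max_number < num:
        split_amount = round(get_digits(num)/1.618034)
        high, low = divmod(num, 10**split_amount)
        num = abs(low - high)
        steps.append((high, low, num))
    return steps

# 5 digits -> split after round(5/1.618034) = 3 digits from the right
print(quantify_steps(69658, 2))   # [(69, 658, 589)]

# 20**20 ends in 20 zeros, so the first split leaves low_part == 0
for step in quantify_steps(20**20, 2):
    print(step)
```

<p>An input that is already at or below the threshold (65535 for byte_amount=2) never enters the loop, so its quantifier is the number itself.</p>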

<h4>Create initial digit-key dictionary of quantify-key dictionaries which hold arrays of inputs. This will create the base DataFrames:</h4>

In [9]:
dict1 = {}
for x in range(20, 25+1):
    for y in range(20,25+1):
        result = x**y
        num_digits = get_digits(result)

        digit_quantifier = digit_quantify(result, 2)
        #Dynamically create dictionaries/data
        if num_digits in dict1:
            if digit_quantifier in dict1[num_digits]:
                dict1[num_digits][digit_quantifier].append([x, y])
            else:
                dict1[num_digits][digit_quantifier] = [[x,y]]
        else:
            dict1[num_digits] = {digit_quantifier : [[x,y]]}

In [10]:
dict1

{27: {4895: [[20, 20]], 7554: [[21, 20]], 4757: [[22, 20]]},
 28: {7752: [[20, 21]],
  7681: [[21, 21]],
  8251: [[23, 20]],
  4759: [[24, 20]],
  8745: [[25, 20]]},
 29: {5503: [[20, 22]], 7069: [[22, 21]], 5764: [[23, 21]], 836: [[24, 21]]},
 30: {1005: [[20, 23]],
  1: [[21, 22]],
  926: [[22, 22]],
  906: [[23, 22]],
  491: [[25, 21]]},
 32: {3009: [[20, 24]], 7178: [[21, 24]], 3887: [[23, 23]], 3735: [[24, 23]]},
 33: {51: [[20, 25]], 9896: [[22, 24]], 9239: [[23, 24]], 5767: [[25, 23]]},
 31: {5771: [[21, 23]], 512: [[22, 23]], 1578: [[24, 22]], 6509: [[25, 22]]},
 34: {31907: [[21, 25]], 40280: [[22, 25]], 6148: [[24, 24]], 368: [[25, 24]]},
 35: {4607: [[23, 25]], 1237: [[24, 25]], 3923: [[25, 25]]}}
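
<p>The nested if/else above can be folded into one reusable helper. This is only a sketch under my own assumptions: build_digit_dict and its parameters are hypothetical names, and it uses itertools.product plus dict.setdefault instead of hand-written loops. Because product() varies the last input fastest, the order of the input lists can differ from hand-written loops that nest the variables differently.</p>

```python
import math
from itertools import product

def get_digits(num):
    num = abs(num)
    if num == 0:
        return 0
    return 1 + math.floor(math.log10(num))

def digit_quantify(num, byte_amount=4):
    max_number = 2**(8*byte_amount) - 1
    while max_number < num:
        split_amount = round(get_digits(num)/1.618034)
        high, low = divmod(num, 10**split_amount)
        num = abs(low - high)
    return num

def build_digit_dict(func, input_range, n_inputs, byte_amount=2, max_digits=None):
    """Hypothetical helper returning {num_digits: {quantifier: [[inputs], ...]}}."""
    out = {}
    for inputs in product(input_range, repeat=n_inputs):
        result = func(*inputs)
        num_digits = get_digits(result)
        if max_digits is not None and num_digits > max_digits:
            continue  # optional cap on result size, as in the test loops
        quant = digit_quantify(result, byte_amount)
        # setdefault creates the missing levels on first use
        out.setdefault(num_digits, {}).setdefault(quant, []).append(list(inputs))
    return out

dict1 = build_digit_dict(lambda x, y: x**y, range(20, 25+1), 2)
print(dict1[27])

# A three-input function works the same way
small3 = build_digit_dict(lambda x, y, z: x**2 + 2*y + z, range(2, 5), 3, max_digits=10)
print(small3[2][10])
```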

In [4]:
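#Turn the raw dictionary into a DataFrame ready for multi-index concatenation with like frames
#Note: func_name and the output below correspond to a run where result = 5*x + 7*y, not the x**y cell above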
df1 = pd.DataFrame(dict1)
df1.index.name = 'Quant'
func_name = '5x + 7y'
multi_cols = []
for col in df1.columns:
    multi_cols.append((col, func_name))
df1.columns = pd.MultiIndex.from_tuples(multi_cols)

In [5]:
df1

,2,3
,5x + 7y,5x + 7y
Quant,,
31,"[[2, 3]]",
38,"[[2, 4]]",
45,"[[2, 5]]",
52,"[[2, 6]]",
59,"[[2, 7]]",
...,...,...
759,,"[[65, 62]]"
766,,"[[65, 63]]"
773,,"[[65, 64]]"
780,,"[[65, 65]]"
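
<p>As an aside, the tuple-building loop can also be written with pd.MultiIndex.from_product. The sketch below uses a tiny hand-made dictionary for 5x + 7y; the level names 'Digits' and 'Function' are my own additions for illustration and are not used elsewhere in this notebook.</p>

```python
import pandas as pd

# Tiny stand-in for dict1: {num_digits: {quantifier: [[x, y], ...]}}
small = {2: {31: [[2, 3]], 38: [[2, 4]]}, 3: {106: [[10, 8]]}}

df = pd.DataFrame(small)
df.index.name = 'Quant'
# Pair every digit-count column with the single function label
df.columns = pd.MultiIndex.from_product(
    [df.columns, ['5x + 7y']], names=['Digits', 'Function'])

print(df.columns.tolist())

# Select by digit count, then by quantifier
print(df[(2, '5x + 7y')].loc[31])

# Or pull every digit-count column belonging to one function
by_func = df.xs('5x + 7y', axis=1, level='Function')
print(by_func.columns.tolist())
```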


In [6]:
#Copy/paste and slightly modify for a new function (this will turn into a recursive looping function in the big project)
dict2 = {}
for z in range(2, 65+1):
    for x in range(2, 65+1):
        for y in range(2,65+1):
            result = x**2 + 2*y + z
            num_digits = get_digits(result)
            if num_digits > 10: #Cap the results for testing.
                continue
            digit_quantifier = digit_quantify(result, 2)
            #Dynamically create dictionaries/data
            if num_digits in dict2:
                if digit_quantifier in dict2[num_digits]:
                    dict2[num_digits][digit_quantifier].append([x, y, z])
                else:
                    dict2[num_digits][digit_quantifier] = [[x, y, z]]
            else:
                dict2[num_digits] = {digit_quantifier : [[x, y, z]]}

In [7]:
df2 = pd.DataFrame(dict2)
df2.index.name = 'Quant'
func_name = 'x^2 + 2y + z'
multi_cols = []
for col in df2.columns:
    multi_cols.append((col, func_name))
df2.columns = pd.MultiIndex.from_tuples(multi_cols)

In [8]:
df2

,2,3,4
,x^2 + 2y + z,x^2 + 2y + z,x^2 + 2y + z
Quant,,,
10,"[[2, 2, 2]]",,
12,"[[2, 3, 2], [2, 2, 4]]",,
14,"[[2, 4, 2], [2, 3, 4], [2, 2, 6]]",,
16,"[[2, 5, 2], [3, 2, 3], [2, 4, 4], [2, 3, 6], [...",,
18,"[[2, 6, 2], [3, 3, 3], [2, 5, 4], [3, 2, 5], [...",,
...,...,...,...
4416,,,"[[65, 65, 61], [65, 64, 63], [65, 63, 65]]"
4417,,,"[[65, 65, 62], [65, 64, 64]]"
4418,,,"[[65, 65, 63], [65, 64, 65]]"
4419,,,"[[65, 65, 64]]"


In [9]:
#Create a new large dataset by concatenating the two raw frames
full_df = pd.concat([df1,df2], join='outer', axis = 1)

In [10]:
full_df

,2,3,2,3,4
,5x + 7y,5x + 7y,x^2 + 2y + z,x^2 + 2y + z,x^2 + 2y + z
Quant,,,,,
10,,,"[[2, 2, 2]]",,
11,,,"[[2, 2, 3]]",,
12,,,"[[2, 3, 2], [2, 2, 4]]",,
13,,,"[[2, 3, 3], [2, 2, 5]]",,
14,,,"[[2, 4, 2], [2, 3, 4], [2, 2, 6]]",,
...,...,...,...,...,...
4416,,,,,"[[65, 65, 61], [65, 64, 63], [65, 63, 65]]"
4417,,,,,"[[65, 65, 62], [65, 64, 64]]"
4418,,,,,"[[65, 65, 63], [65, 64, 65]]"
4419,,,,,"[[65, 65, 64]]"
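
<p>To see what the outer concat does with the row index and the two column levels, here is a small sketch with hand-made frames. The frame() helper is a hypothetical shortcut for the DataFrame setup cells above. Rows are the union of both Quant indexes, and a quantifier that one function never produced is simply missing (NaN) in that function's columns.</p>

```python
import pandas as pd

def frame(data, func_name):
    """Hypothetical shortcut for the DataFrame + MultiIndex setup cells."""
    df = pd.DataFrame(data)
    df.index.name = 'Quant'
    df.columns = pd.MultiIndex.from_product([df.columns, [func_name]])
    return df

a = frame({2: {31: [[2, 3]]}}, '5x + 7y')
b = frame({2: {10: [[2, 2, 2]], 31: [[3, 10, 2]]},
           3: {106: [[10, 2, 2]]}}, 'x^2 + 2y + z')

# Outer join on the Quant index, then group the columns by digit count
full = pd.concat([a, b], join='outer', axis=1).sort_index(axis=1)
print(full.columns.tolist())

# Both functions can produce a 2-digit result with quantifier 31
print(full[2].loc[31])

# 5x + 7y has no entry for quantifier 10, so that cell is NaN
print(pd.isna(full.loc[10, (2, '5x + 7y')]))
```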


<h4>Now we add a 3rd function/set and combine with full_df to ensure we can keep "appending" this way:</h4>

In [11]:
dict3 = {}
for z in range(2, 65+1):
    for x in range(2, 65+1):
        for y in range(2,65+1):
            result = x**3 + y**2 + z
            num_digits = get_digits(result)
            if num_digits > 10: #Cap the results for testing.
                continue
            digit_quantifier = digit_quantify(result, 2)
            #Dynamically create dictionaries/data
            if num_digits in dict3:
                if digit_quantifier in dict3[num_digits]:
                    dict3[num_digits][digit_quantifier].append([x, y, z])
                else:
                    dict3[num_digits][digit_quantifier] = [[x, y, z]]
            else:
                dict3[num_digits] = {digit_quantifier : [[x, y, z]]}

In [12]:
df3 = pd.DataFrame(dict3)
df3.index.name = 'Quant'
func_name = 'x^3 + y^2 + z'
multi_cols = []
for col in df3.columns:
    multi_cols.append((col, func_name))
df3.columns = pd.MultiIndex.from_tuples(multi_cols)

In [13]:
df3

,2,3,4,5,6
,x^3 + y^2 + z,x^3 + y^2 + z,x^3 + y^2 + z,x^3 + y^2 + z,x^3 + y^2 + z
Quant,,,,,
14,"[[2, 2, 2]]",,,"[[44, 54, 2], [41, 34, 7], [42, 63, 7], [43, 2...","[[61, 55, 3], [55, 60, 28], [62, 41, 29], [61,..."
19,"[[2, 3, 2], [2, 2, 7]]",,,"[[42, 63, 2], [42, 31, 7], [44, 54, 7], [41, 1...","[[60, 63, 34], [62, 41, 34], [61, 55, 36], [55..."
26,"[[2, 4, 2], [2, 3, 9], [2, 2, 14]]",,,"[[43, 60, 2], [42, 3, 3], [46, 28, 4], [41, 13...","[[53, 34, 8], [62, 41, 41], [61, 55, 43]]"
35,"[[2, 5, 2], [3, 2, 4], [2, 4, 11], [2, 3, 18],...",,,"[[44, 44, 2], [42, 4, 5], [40, 45, 6], [40, 55...","[[63, 3, 4], [46, 52, 5], [63, 2, 9], [53, 34,..."
46,"[[2, 6, 2], [3, 4, 3], [3, 3, 10], [2, 5, 13],...",,,"[[41, 10, 2], [45, 3, 3], [42, 44, 6], [42, 5,...","[[49, 49, 8], [63, 4, 8], [63, 3, 15], [46, 52..."
...,...,...,...,...,...
65221,,,,"[[40, 34, 65]]",
65290,,,,"[[40, 35, 65]]",
65361,,,,"[[40, 36, 65]]",
65434,,,,"[[40, 37, 65]]",


<h4>Add to full DataFrame and sort columns:</h4>

In [14]:
full_df = pd.concat([full_df,df3], join='outer', axis = 1)
full_df.sort_index(axis=1, level=[0, 1], ascending=[True, True], inplace=True) #sort by digits, then by function name

In [15]:
full_df

,2,2,2,3,3,3,4,4,5,6
,5x + 7y,x^2 + 2y + z,x^3 + y^2 + z,5x + 7y,x^2 + 2y + z,x^3 + y^2 + z,x^2 + 2y + z,x^3 + y^2 + z,x^3 + y^2 + z,x^3 + y^2 + z
Quant,,,,,,,,,,
0,,,,,,,,,"[[44, 30, 2], [41, 12, 4], [45, 31, 6], [41, 5...","[[62, 41, 15], [61, 55, 17], [55, 60, 42], [53..."
1,,,,,,,,,"[[45, 63, 2], [41, 12, 3], [44, 30, 3], [41, 1...","[[62, 41, 14], [61, 55, 16], [62, 41, 16], [61..."
2,,,,,,,,,"[[41, 12, 2], [45, 63, 3], [44, 30, 4], [45, 3...","[[62, 41, 13], [61, 55, 15], [62, 41, 17], [61..."
3,,,,,,,,,"[[46, 42, 2], [45, 31, 3], [45, 63, 4], [44, 3...","[[62, 41, 12], [61, 55, 14], [62, 41, 18], [61..."
4,,,,,,,,,"[[45, 31, 2], [46, 42, 3], [45, 63, 5], [44, 3...","[[62, 41, 11], [61, 55, 13], [62, 41, 19], [61..."
...,...,...,...,...,...,...,...,...,...,...
65531,,,,,,,,,"[[40, 39, 10]]",
65532,,,,,,,,,"[[40, 39, 11]]",
65533,,,,,,,,,"[[40, 39, 12]]",
65534,,,,,,,,,"[[40, 39, 13]]",


In [16]:
#And now reverse-find what makes the following integer:
number = 69658
digits = get_digits(number)
quantifier = digit_quantify(number, 2)
print(f'{number} has {digits} digits with quantifier {quantifier}.')

69658 has 5 digits with quantifier 589.


In [17]:
look_up = full_df[digits].loc[quantifier]
print(look_up)

x^3 + y^2 + z    [[43, 34, 6], [44, 22, 6], [41, 27, 8], [44, 5...
Name: 589, dtype: object


In [18]:
#Pretend to read the look-up and choose function x^3 + y^2 + z for the result (I cherry-picked)
#Iterate label/value pairs directly instead of indexing the Series by position
for function, candidates in look_up.items():
    for terms in candidates:
        result = terms[0]**3 + terms[1]**2 + terms[2]
        if result == number:
            print(f'Found the following {function} terms to create {number}:\n{terms}')

Found the following x^3 + y^2 + z terms to create 69658:
[41, 27, 8]
Found the following x^3 + y^2 + z terms to create 69658:
[41, 26, 61]
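
<p>The cherry-picking can be avoided by keeping a registry that maps each column label to a callable. The sketch below is one possible way to do it, not what the larger project does: FUNCS and reverse_find are hypothetical names, and the look-up Series is rebuilt by hand from a few of the candidates shown above so that the block runs on its own.</p>

```python
import pandas as pd

# Hypothetical registry: column label -> function that recomputes the result
FUNCS = {
    'x^3 + y^2 + z': lambda x, y, z: x**3 + y**2 + z,
    'x^2 + 2y + z': lambda x, y, z: x**2 + 2*y + z,
}

def reverse_find(look_up, number, funcs=FUNCS):
    """Return (label, terms) for every stored input list that recreates number."""
    matches = []
    # dropna() skips functions with no entry for this digits/quantifier pair
    for label, candidates in look_up.dropna().items():
        func = funcs[label]
        for terms in candidates:
            if func(*terms) == number:
                matches.append((label, terms))
    return matches

# Hand-built stand-in for full_df[digits].loc[quantifier]
look_up = pd.Series({
    'x^3 + y^2 + z': [[43, 34, 6], [44, 22, 6], [41, 27, 8], [41, 26, 61]],
    'x^2 + 2y + z': float('nan'),
}, name=589)

found = reverse_find(look_up, 69658)
print(found)
# [('x^3 + y^2 + z', [41, 27, 8]), ('x^3 + y^2 + z', [41, 26, 61])]
```

<p>All four candidates share 5 digits and quantifier 589, but only two of them recompute to 69658, which is why the final recompute-and-match step is needed.</p>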
