## Recreating the Results in "The Phaistos Disk: A New Way of Viewing the Language Behind the Script" (Davis 2018)

(Note: You will need to [install a font](https://www.evertype.com/emono/evermono.zip) to view the Phaistos Disc glyphs used in this article.)

I use the Linear A corpus at https://lineara.xyz to recreate the results from [Brent Davis' paper](https://sci-hub.st/https://doi.org/10.1111/ojoa.12151) showing a 
statistically significant relationship between the bigrams in the Phaistos Disc and Linear A.

My findings from this exercise are:

* One bigram identified as common to the two scripts is doubtful. '𐘠𐘚' (TI-I) does not actually appear in Linear A. Davis may instead have matched a variant of 𐘚 (I), namely 𐘛 (*28B): there is a single instance of '𐘠𐘛' (TI-*28B) in the Linear A corpus, in ZA6b. It is not clear to me whether it is valid to treat TI-*28B as equivalent to TI-I. If it is not, then the number of matching bigrams between the Phaistos Disc and the Linear A corpus must be revised down to 16. This no longer falls within the region of statistical significance, which Davis identifies as 16.4 or above.

* I get a better p-value than [Davis 2018] for his mapping. 

* I propose an alternative mapping of PD and Linear A symbols that achieves a better proportion of bigrams found in both Linear A and the Phaistos Disc and a substantially lower p-value than [Davis 2018]. This mapping is as follows:
![alt text](HoganMapping.png "Title")
![alt text](Diff.png "Title")



[Davis 2018]:https://sci-hub.st/https://doi.org/10.1111/ojoa.12151


## Recreating the Results of Davis 2018

First we import the Phaistos Disc inscription. We also initialize a list of symbols from the Phaistos Disc and all known symbols from Linear A.

In [3]:
import sys
print(sys.version)

3.8.10 (default, Sep 28 2021, 16:10:42) 
[GCC 9.3.0]


In [4]:
import json
import itertools as it
import pandas as pd
from IPython.display import display
pd.set_option("display.latex.repr", True)

styles = [dict(selector="caption", 
    props=[("text-align", "center"),
    ("font-size", "120%"),
    ("color", 'black')])]

pd_inscription_a = ("𐇑𐇛𐇜𐇐𐇡𐇽|𐇧𐇷𐇛|𐇬𐇼𐇖𐇽|𐇬𐇬𐇱|𐇑𐇛𐇓𐇷𐇰|𐇪𐇼𐇖𐇛|𐇪𐇻𐇗|𐇑𐇛𐇕𐇡|𐇮𐇩𐇲|"
                "𐇑𐇛𐇸𐇢𐇲|𐇐𐇸𐇷𐇖|𐇑𐇛𐇯𐇦𐇵𐇽|𐇶𐇚|𐇑𐇪𐇨𐇙𐇦𐇡|𐇫𐇐𐇽|𐇑𐇛𐇮𐇩𐇽|𐇑𐇛𐇪𐇪𐇲𐇴𐇤|𐇰𐇦|"
                "𐇑𐇛𐇮𐇩𐇽|𐇑𐇪𐇨𐇙𐇦𐇡|𐇫𐇐𐇽|𐇑𐇛𐇮𐇩𐇽|𐇑𐇛𐇪𐇝𐇯𐇡𐇪|𐇕𐇡𐇠𐇢|𐇮𐇩𐇛|𐇑𐇛𐇜𐇐|𐇦𐇢𐇲𐇽|𐇙𐇒𐇵|"
                "𐇑𐇛𐇪𐇪𐇲𐇴𐇤|𐇜𐇐|𐇙𐇒𐇵|")
pd_words_a = pd_inscription_a.split('|')

pd_inscription_b = ("𐇑𐇛𐇥𐇷𐇖|𐇪𐇼𐇖𐇲|𐇑𐇴𐇦𐇔𐇽|𐇥𐇨𐇪|𐇰𐇧𐇣𐇛|𐇟𐇦𐇡𐇺𐇽|𐇜𐇐𐇶𐇰|𐇞𐇖𐇜𐇐𐇡|𐇥𐇴𐇹𐇨|"
                    "𐇖𐇧𐇷𐇲|𐇑𐇩𐇳𐇷|𐇪𐇨𐇵𐇐|𐇬𐇧𐇧𐇣𐇲|𐇟𐇝𐇡|𐇬𐇰𐇐|𐇕𐇲𐇯𐇶𐇰|𐇑𐇘𐇪𐇐|𐇬𐇳"
                    "𐇖𐇗𐇽|𐇬𐇗𐇜|𐇬𐇼𐇖𐇽|𐇥𐇬𐇳𐇖𐇗𐇽|𐇪𐇱𐇦𐇨|𐇖𐇡𐇲|𐇖𐇼𐇖𐇽|𐇖𐇦𐇡𐇧|𐇥𐇬𐇳𐇖𐇗𐇽|𐇘𐇭𐇶𐇡𐇖|"
                    "𐇑𐇕𐇲𐇦𐇖|𐇬𐇱𐇦𐇨|𐇼𐇖𐇽|")
pd_words_b = pd_inscription_b.split('|')

pd_inscription = pd_inscription_a + pd_inscription_b
pd_words = pd_inscription.replace('𐇽','').split('|')

pd_symbols = ["𐇐", "𐇑", "𐇒", "𐇓", "𐇔", "𐇕", "𐇖", "𐇗", "𐇘", "𐇙", "𐇚", 
    "𐇛", "𐇜", "𐇝", "𐇞", "𐇟", "𐇠", "𐇡", "𐇢", "𐇣", "𐇤", "𐇥", "𐇦", "𐇧", "𐇨", "𐇩", "𐇪", "𐇫", "𐇬", "𐇭", "𐇮", "𐇯"
    , "𐇰", "𐇱", "𐇲", "𐇳", "𐇴", "𐇵", "𐇶", "𐇷", "𐇸", "𐇹", "𐇺", "𐇻", "𐇼"]

la_symbols = ["𐄂", "𐘀", "𐘁", "𐘂", "𐘃", "𐘄", "𐘅", "𐘆", "𐘇", "𐘈", "𐘉", "𐘊", "𐘋", "𐘌", "𐘍", "𐘎", 
    "𐘏", "𐘐", "𐘑", "𐘒", "𐘓", "𐘔", "𐘕", "𐘖", "𐘗", "𐘘", "𐘙", "𐘚", "𐘛", "𐘜", "𐘝", "𐘞",
    "𐘟", "𐘠", "𐘡", "𐘢", "𐘣", "𐘤", "𐘥", "𐘦", "𐘧", "𐘨", "𐘩", "𐘪", "𐘫", "𐘬", "𐘭", "𐘮",
    "𐘯", "𐘰", "𐘱", "𐘲", "𐘳", "𐘴", "𐘵", "𐘶", "𐘷", "𐘸", "𐘹", "𐘺", "𐘻", "𐘼", "𐘽", "𐘾",
    "𐘿", "𐙀", "𐙁", "𐙂", "𐙃", "𐙄", "𐙅", "𐙆", "𐙇", "𐙈", "𐙉", "𐙊", "𐙋", "𐙌", "𐙍", 
    "𐙎", "𐙏", "𐙐", "𐙑", "𐙒", "𐙓", "𐙔", "𐙕", "𐙖", "𐙗", "𐙘", "𐙙", "𐙚", "𐙛", "𐙜", "𐙝", 
    "𐙞", "𐙟", "𐙠", "𐙡", "𐙢", "𐙣", "𐙤", "𐙥", "𐙦", "𐙧", "𐙨", "𐙩", "𐙪", "𐙫", "𐙬", "𐙭",
    "𐙮", "𐙯", "𐙰", "𐙱", "𐙲", "𐙳", "𐙴", "𐙵", "𐙶", "𐙷", "𐙸", "𐙹", "𐙺", "𐙻", "𐙼", "𐙽",
    "𐙾", "𐙿", "𐚀", "𐚁", "𐚂", "𐚃", "𐚄", "𐚅", "𐚆", "𐚇", "𐚈", "𐚉", "𐚊", "𐚋", "𐚌", "𐚍",
    "𐚎", "𐚏", "𐚐", "𐚑", "𐚒", "𐚓", "𐚔", "𐚕", "𐚖", "𐚗", "𐚘", "𐚙", "𐚚", "𐚛", "𐚜", 
    "𐚝", "𐚞", "𐚟", "𐚠", "𐚡", "𐚢", "𐚣", "𐚤", "𐚥", "𐚦", "𐚧", "𐚨", "𐚩", "𐚪", "𐚫", "𐚬", 
    "𐚭", "𐚮", "𐚯", "𐚰", "𐚱", "𐚲", "𐚳", "𐚴", "𐚵", "𐚶", "𐚷", "𐚸", "𐚹", "𐚺", "𐚻", "𐚼",
    "𐚽", "𐚾", "𐚿", "𐛀", "𐛁", "𐛂", "𐛃", "𐛄", "𐛅", "𐛆", "𐛇", "𐛈", "𐛉", "𐛊", "𐛋", "𐛌",
    "𐛍", "𐛎", "𐛏", "𐛐", "𐛑", "𐛒", "𐛓", "𐛔", "𐛕", "𐛖", "𐛗", "𐛘", "𐛙", "𐛚", "𐛛", "𐛜",
    "𐛝", "𐛞", "𐛟", "𐛠", "𐛡", "𐛢", "𐛣", "𐛤", "𐛥", "𐛦", "𐛧", "𐛨", "𐛩", "𐛪", "𐛫", 
    "𐛬", "𐛭", "𐛮", "𐛯", "𐛰", "𐛱", "𐛲", "𐛳", "𐛴", "𐛵", "𐛶", "𐛷", "𐛸", "𐛹", "𐛺", "𐛻", 
    "𐛼", "𐛽", "𐛾", "𐛿", "𐜀", "𐜁", "𐜂", "𐜃", "𐜄", "𐜅", "𐜆", "𐜇", "𐜈", "𐜉", "𐜊", "𐜋",
    "𐜌", "𐜍", "𐜎", "𐜏", "𐜐", "𐜑", "𐜒", "𐜓", "𐜔", "𐜕", "𐜖", "𐜗", "𐜘", "𐜙", "𐜚", "𐜛",
    "𐜜", "𐜝", "𐜞", "𐜟", "𐜠", "𐜡", "𐜢", "𐜣", "𐜤", "𐜥", "𐜦", "𐜧", "𐜨", "𐜩", "𐜪", "𐜫",
    "𐜬", "𐜭", "𐜮", "𐜯", "𐜰", "𐜱", "𐜲", "𐜳", "𐜴", "𐜵", "𐜶", "𐝀", "𐝁", "𐝂", "𐝃", 
    "𐝄", "𐝅", "𐝆", "𐝇", "𐝈", "𐝉", "𐝊", "𐝋", "𐝌", "𐝍", "𐝎", "𐝏", "𐝐", "𐝑", "𐝒", "𐝓", 
    "𐝔", "𐝕", "𐝠", "𐝡", "𐝢", "𐝣", "𐝤", "𐝥", "𐝦", "𐝧", "𐝬", "𐝭", "𐝮", "𐝯"]

Next we import all known words from Linear A into a list called `la_words`.

In [5]:
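# Open the corpus with a context manager so the file handle is closed afterwards
with open('../Data/LinearAWords.json', encoding='utf-8') as json_file:
    inscriptions = json.load(json_file)

la_words = []
for inscription in inscriptions:
    for word_tag in inscription["tagsForWords"]:
        # Keep only entries that are tagged as words
        if "word" not in word_tag["tags"]:
            continue
        word = word_tag["word"].replace('\U0001076b', '')
        # Single-sign words cannot contribute a bigram
        if len(word) == 1:
            continue
        la_words.append(word)
# De-duplicate so that each distinct word is counted once
la_words = list(set(la_words))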


Now we can create lists of unique bigrams in Linear A and the Phaistos disc.

In [6]:

def getNgrams(words, n):
    ngrams = []
    for word in words:
        bg = [word[i:i+n] for i in range(0, len(word) - (n-1))]
        ngrams.extend(bg)
    return ngrams

# Report bigram statistics for each corpus
for name, words in [("Linear A", la_words), ("Phaistos Disc", pd_words)]:
    bigrams = getNgrams(words, 2)
    print("\n" + name + ":")
    print("Unique bigrams", len(set(bigrams)),
          "Total bigrams", len(bigrams))
    print("Unique symbols in bigrams",
          len(set(it.chain.from_iterable(bigrams))))

# Keep the full bigram lists for the comparisons below
la_bigrams = getNgrams(la_words, 2)
pd_bigrams = getNgrams(pd_words, 2)




Linear A:
Unique bigrams 1170 Total bigrams 2036
Unique symbols in bigrams 168

Phaistos Disc:
Unique bigrams 115 Total bigrams 180
Unique symbols in bigrams 45
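
To make the windowing in `getNgrams` concrete, here is a minimal sketch on Latin-letter stand-ins. The toy words and the name `get_ngrams` are illustrative assumptions and are not taken from either corpus. Python 3 strings index by code point, so the same slicing works for the glyphs used above.

```python
def get_ngrams(words, n):
    """Collect every run of n consecutive symbols inside each word."""
    ngrams = []
    for word in words:
        ngrams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return ngrams

toy_words = ["abca", "bc", "d"]   # hypothetical stand-in words
toy_bigrams = get_ngrams(toy_words, 2)

print(toy_bigrams)            # ['ab', 'bc', 'ca', 'bc']
print(len(set(toy_bigrams)))  # 3 unique bigrams
print(len("𐘠𐘚"))              # 2: each glyph is a single code point
```

Note that the one-sign word contributes nothing, which is why single-sign Linear A words can safely be dropped before counting.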


With these we now have what we need to rerun Davis' analysis comparing the bigrams that appear in both Linear A and the disc.

Davis gives the homomorphs used for his analysis as follows:
![alt text](14homomorphs.png "Title")

We implement the same here:

In [7]:
# Brent Davis 2018 mapping
pd_la_davis_map = {
"𐇛": "𐘿",  
"𐇬": "𐙁",  
"𐇼": "𐘽", 
"𐇖": "𐘠",  
"𐇱": "𐘢",  
"𐇗": "𐘚",  
"𐇮": "𐙂",  
"𐇲": "𐘃",  
"𐇢": "𐘀",  
"𐇦": "𐘅",  
"𐇨": "𐙅",  
"𐇥": "𐘞",  
"𐇟": "𐘸", 
"𐇳": "𐘝",  
}

df = pd.DataFrame([list(pd_la_davis_map.keys()),
                   list(pd_la_davis_map.values())])
# set_axis returns a new frame; the inplace argument is not available in newer pandas
df = df.set_axis(["Phaistos Disc", "Linear A"], axis='index')
df = df.style.set_caption("Davis 2018 Mapping of Linear A and PD Symbols").set_table_styles(styles)
display(df)

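| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Phaistos Disc | 𐇛 | 𐇬 | 𐇼 | 𐇖 | 𐇱 | 𐇗 | 𐇮 | 𐇲 | 𐇢 | 𐇦 | 𐇨 | 𐇥 | 𐇟 | 𐇳 |
| Linear A | 𐘿 | 𐙁 | 𐘽 | 𐘠 | 𐘢 | 𐘚 | 𐙂 | 𐘃 | 𐘀 | 𐘅 | 𐙅 | 𐘞 | 𐘸 | 𐘝 |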


Now we see if we can get the same number of bigrams consisting of these syllabograms as Davis in the disc:

In [8]:


# Use the provisional PD to LA mapping above to find common bigrams between LA and the Disc
pd_inscription_as_la = list(map(lambda x: pd_la_davis_map[x] if x in pd_la_davis_map else x, pd_inscription))
pd_inscription_as_la_words = ''.join(pd_inscription_as_la).split('|')
pd_la_bigrams = getNgrams(pd_inscription_as_la_words,2)
pd_bigrams_both = set([bg for bg in pd_la_bigrams if all(g in pd_la_davis_map.values() for g in bg)])
#print(str(len(pd_bigrams_both)) + " bigrams", sorted(pd_bigrams_both))

pd_la_davis_map_r = {y:x for x,y in  pd_la_davis_map.items()}
df = pd.DataFrame([pd_bigrams_both,
                  [pd_la_davis_map_r[x[:1]] + pd_la_davis_map_r[x[-1:]] for x in pd_bigrams_both]],
                  columns=[i+1 for i,p in enumerate(pd_bigrams_both)])
df = df.set_axis(['Linear A Bigrams', 'Disc Bigrams'], axis='index')
df.style.set_caption("Linear A and Phaistos Disc Bigrams").set_table_styles(styles)


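| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Linear A Bigrams | 𐘠𐘿 | 𐘿𐙂 | 𐘠𐘅 | 𐘅𐙅 | 𐘞𐙅 | 𐘠𐘚 | 𐘠𐘽 | 𐙁𐘝 | 𐘃𐘅 | 𐘢𐘅 | 𐘞𐙁 | 𐘅𐘠 | 𐘽𐘠 | 𐘅𐘀 | 𐘝𐘠 | 𐘿𐘞 | 𐙁𐘚 | 𐙁𐘢 | 𐙁𐙁 | 𐘸𐘅 | 𐘠𐘃 | 𐙁𐘽 | 𐘀𐘃 |
| Disc Bigrams | 𐇖𐇛 | 𐇛𐇮 | 𐇖𐇦 | 𐇦𐇨 | 𐇥𐇨 | 𐇖𐇗 | 𐇖𐇼 | 𐇬𐇳 | 𐇲𐇦 | 𐇱𐇦 | 𐇥𐇬 | 𐇦𐇖 | 𐇼𐇖 | 𐇦𐇢 | 𐇳𐇖 | 𐇛𐇥 | 𐇬𐇗 | 𐇬𐇱 | 𐇬𐇬 | 𐇟𐇦 | 𐇖𐇲 | 𐇬𐇼 | 𐇢𐇲 |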


This matches the 23 bigrams given in Table 40 by Davis:
    
![alt text](23pairs.png "Title")
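
The transliterate-then-filter step in the cell above can be sketched on toy data as follows. The mapping, the inscription string and all names here are hypothetical stand-ins, chosen only to show why bigrams containing an unmapped sign drop out.

```python
toy_map = {"x": "A", "y": "B"}     # hypothetical disc-to-Linear-A pairs
toy_inscription = "xyz|yx|zz|xx"   # '|' separates words; 'z' is unmapped

# Transliterate mapped signs and leave unmapped signs untouched
as_la = "".join(toy_map.get(sign, sign) for sign in toy_inscription)
words = as_la.split("|")

bigrams = [w[i:i + 2] for w in words for i in range(len(w) - 1)]
mapped = set(toy_map.values())
# Keep only bigrams whose two signs both have a proposed counterpart
fully_mapped = {bg for bg in bigrams if all(s in mapped for s in bg)}

print(sorted(fully_mapped))  # ['AA', 'AB', 'BA']
```

The bigrams `Bz` and `zz` are discarded because `z` has no counterpart, which mirrors how only 23 of the disc's bigrams survive under a 14-sign mapping.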


Now we can count the number of pairs that also occur in the Linear A corpus.

In [9]:

bg_both = sorted([(bg, la_bigrams.count(bg), pd_la_bigrams.count(bg)) 
                  for bg in pd_bigrams_both & set(la_bigrams)])

df = pd.DataFrame([[b for a,b,c in bg_both], [c for a,b,c in bg_both]],
                  columns=[a for a,b,c in bg_both])
df = df.set_axis(['Occurrences in Linear A', 'Occurrences in Disc'], axis='index')
df.style.set_table_styles(styles).set_caption("%i Bigrams that Appear in Both Linear A and Phaistos Disc" % len(bg_both))



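| | 𐘀𐘃 | 𐘅𐘀 | 𐘅𐘠 | 𐘝𐘠 | 𐘞𐙁 | 𐘠𐘃 | 𐘠𐘅 | 𐘠𐘽 | 𐘢𐘅 | 𐘸𐘅 | 𐘽𐘠 | 𐘿𐙂 | 𐙁𐘚 | 𐙁𐘢 | 𐙁𐘽 | 𐙁𐙁 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Occurrences in Linear A | 2 | 1 | 4 | 1 | 3 | 1 | 2 | 2 | 2 | 2 | 1 | 1 | 5 | 1 | 1 | 1 |
| Occurrences in Disc | 2 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 2 | 1 | 6 | 3 | 1 | 2 | 2 | 1 |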


We find only 16 of the 23 Disc bigrams in Linear A, one fewer than Davis found. The output also gives the number of occurrences of each bigram in Linear A and on the Disc. For '𐘸𐘅', for example, we find two occurrences in Linear A and one on the Phaistos Disc, i.e. the tuple ('𐘸𐘅', 2, 1).
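
A minimal sketch of the same tally using `collections.Counter`, which counts each list once instead of calling `list.count` for every bigram. The two toy lists are hypothetical and stand in for `la_bigrams` and the mapped disc bigrams.

```python
from collections import Counter

la_toy = ["AB", "AB", "BA", "CA"]   # hypothetical Linear A bigrams
pd_toy = ["AB", "BA", "BA", "AA"]   # hypothetical disc bigrams, already mapped

la_counts, pd_counts = Counter(la_toy), Counter(pd_toy)
shared = sorted(set(la_counts) & set(pd_counts))
# One (bigram, Linear A count, disc count) tuple per shared bigram
rows = [(bg, la_counts[bg], pd_counts[bg]) for bg in shared]

print(rows)  # [('AB', 2, 1), ('BA', 1, 2)]
```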




## Reviewing the Results

Let's take a look at the bigram we are missing compared to Davis 2018:

In [10]:
bg_pd_only = pd_bigrams_both - set(la_bigrams)
bg_pd_only = sorted([(bg, pd_la_bigrams.count(bg)) 
                  for bg in bg_pd_only])

df = pd.DataFrame([[b for a,b in bg_pd_only]],
                  columns=[a for a,b in bg_pd_only])
df = df.set_axis(["Occurrences"], axis='index')
df.style.set_table_styles(styles).set_caption("Mapped bigrams that don't appear in Linear A")



| | 𐘃𐘅 | 𐘅𐙅 | 𐘞𐙅 | 𐘠𐘚 | 𐘠𐘿 | 𐘿𐘞 | 𐙁𐘝 |
|---|---|---|---|---|---|---|---|
| Occurrences | 1 | 2 | 1 | 3 | 1 | 1 | 3 |


We can compare this with the table from (Davis 2018):

![alt text](17bigrams.png "Title")

The difference is the bigram '𐘠𐘚' (transliterated TI-I), which does not actually appear in Linear A. Where the two syllabograms are adjacent, they are not word-internal: they belong to adjacent words rather than to the same word:

![alt text](PKZa11.png "Title")
![alt text](PYRWc4.png "Title")
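
The point about word-internal adjacency can be illustrated with a small sketch. The two-word line below is hypothetical: `T` ends one word and `I` starts the next, so the pair is adjacent in the running text but never forms a bigram when bigrams are taken word by word, as `getNgrams` does.

```python
def bigrams_within_words(words):
    """Bigrams that lie entirely inside a single word."""
    return [w[i:i + 2] for w in words for i in range(len(w) - 1)]

words = ["XT", "IY"]   # hypothetical line of two words

within = bigrams_within_words(words)
running = bigrams_within_words(["".join(words)])  # word breaks ignored

print("TI" in within)   # False
print("TI" in running)  # True
```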


Davis' probable source for the identification is a variant of 𐘚 (I), namely 𐘛 (*28B). We find a single instance of '𐘠𐘛' (TI-*28B) in the Linear A corpus, in ZA6b:

![alt text](ZA6b.png "Title")

It is not clear to me whether it is valid to treat TI-*28B as equivalent to TI-I. If it is not, then the number of matching bigrams between the Phaistos Disc and the Linear A corpus must be revised down to 16. This no longer falls within the region of statistical significance, which Davis identifies as 16.4 or above.



## Experimenting with Different Mappings
In this section we'll experiment with an expanded set of homomorphic mappings between the syllabograms of Linear A and the Phaistos Disc and see whether it improves or changes the result of 17/23 observed by Davis.


In [11]:

def runExperimentalMapping(exp_map):
    pd_inscription_as_la = list(map(lambda x: exp_map[x]
                                    if x in exp_map else x, pd_inscription))
    pd_inscription_as_la_words = ''.join(pd_inscription_as_la).split('|')
    pd_la_bigrams = getNgrams(pd_inscription_as_la_words,2)

    pd_bigrams_both = set([bg for bg in pd_la_bigrams
                           if all(g in exp_map.values() for g in bg)])
    

    bg_both = sorted([(bg, la_bigrams.count(bg), pd_la_bigrams.count(bg)) 
                      for bg in pd_bigrams_both & set(la_bigrams)])
    return (pd_bigrams_both, bg_both)
 
def displayExperimentalMappingResults(exp_both, exp_bigrams_both, exp_map):
    showDifferencesBetweenMappings(pd_la_davis_map, exp_map)
    
    exp_map_r = {y:x for x,y in exp_map.items()}
    df = pd.DataFrame([exp_bigrams_both,
                      [exp_map_r[x[:1]] + exp_map_r[x[-1:]] for x in exp_bigrams_both]],
                      columns=[i+1 for i,p in enumerate(exp_bigrams_both)])
    
    df = df.set_axis(['Linear A Bigrams', 'Disc Bigrams'], axis='index')
    df = (df.style.set_caption("The %d Hypothetical Phaistos Disc Bigrams Along" 
                               " With Their Hypothetical Linear A Counterparts" % len(exp_bigrams_both))
            .set_table_styles(styles))
    display(df)

    df = pd.DataFrame([[b for a,b,c in exp_both] + [sum([b for a,b,c in exp_both])],
                    [c for a,b,c in exp_both] + [sum([c for a,b,c in exp_both])]],
                    columns=[a for a,b,c in exp_both] + ["Total"])
    df = df.set_axis(['Occurrences in Linear A', 'Occurrences in Disc'], axis='index')
    df = (df.style.set_caption("The %d bigrams that actually appear in Linear A" % len(exp_both))
            .set_table_styles(styles))
    display(df)
    
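    bg_pd_only = exp_bigrams_both - set(la_bigrams)
    # NOTE: pd_la_bigrams is the module-level list built with the Davis mapping
    # in cell 8, so bigrams involving signs that exp_map maps differently from,
    # or in addition to, the Davis mapping are counted as 0 here.
    bg_pd_only = sorted([(bg, pd_la_bigrams.count(bg)) 
                      for bg in bg_pd_only])
    
    df = pd.DataFrame([[b for a,b in bg_pd_only]],
                  columns=[a for a,b in bg_pd_only])
    df = df.set_axis(["Occurrences in the Phaistos Disc"], axis='index')
    # Use the exp_both argument rather than the global bg_both for the caption
    df = (df.style.set_caption("The %d bigrams that don't appear in Linear A"
                               % (len(exp_bigrams_both) - len(exp_both)))
            .set_table_styles(styles))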
    display(df)

def showDifferencesBetweenMappings(map1, map2):
    row_index = set([k for k in map1] + [k for k in map2])
    row_index = [a for a in row_index 
                 if a not in map2 or a not in map1 or
                           map2[a] != map1[a]]

    df = pd.DataFrame([[map1[a] if a in map1 else "None"  
                       for a in row_index],
                       [map2[a] if a in map2 else "None"
                        for a in row_index]]
                      , columns=row_index)
    df = df.set_axis(["Davis", "Ours"], axis='index')
    df = df.style.set_caption("Differences between Davis and our mapping").set_table_styles(styles)
    display(df)
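
The comparison performed by `showDifferencesBetweenMappings` amounts to a dictionary diff. A minimal sketch on hypothetical toy mappings (the function name and data are illustrative, not part of the notebook):

```python
def mapping_differences(map1, map2):
    """Return {sign: (value_in_map1, value_in_map2)} wherever the maps disagree."""
    diffs = {}
    for sign in sorted(set(map1) | set(map2)):
        a, b = map1.get(sign), map2.get(sign)
        if a != b:
            diffs[sign] = (a, b)
    return diffs

old = {"p": "A", "q": "B", "r": "C"}   # hypothetical earlier mapping
new = {"p": "A", "q": "D", "s": "E"}   # hypothetical revised mapping

print(mapping_differences(old, new))
# {'q': ('B', 'D'), 'r': ('C', None), 's': (None, 'E')}
```

A sign missing from one mapping shows up as `None` on that side, which corresponds to the empty cells in the difference tables.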


In the first instance we'll expand our mapping to include some additional glyphs and alter some others. The differences are given in the table below.


In [12]:
pd_la_hogan_map = {
"𐇑": "𐘚",
"𐇛": "𐘾",
"𐇬": "𐙁",
"𐇼": "𐘽",
"𐇖": "𐘠",  
"𐇱": "𐘢",
"𐇮": "𐙂",
"𐇲": "𐘃",
"𐇢": "𐘀",
"𐇦": "𐘅",
"𐇥": "𐘞",
"𐇟": "𐘸",
"𐇳": "𐘝",
"𐇶": "𐘇",
"𐇭": "𐘏", 
"𐇝": "𐘳",
"𐇨": "𐙅",  
"𐇪": "𐙒",
"𐇤": "𐘱",
"𐇫": "𐙆",
"𐇧": "𐘦",  
"𐇚": "𐘭",
}

df = pd.DataFrame([pd_la_hogan_map.keys()
                  , pd_la_hogan_map.values()
                  ])
df = df.set_axis(["Phaistos Disc", "Linear A"], axis='index')
df = df.style.set_caption("Hypothetical Revised Mapping of Linear A and PD Symbols").set_table_styles(styles)
display(df)


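| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Phaistos Disc | 𐇑 | 𐇛 | 𐇬 | 𐇼 | 𐇖 | 𐇱 | 𐇮 | 𐇲 | 𐇢 | 𐇦 | 𐇥 | 𐇟 | 𐇳 | 𐇶 | 𐇭 | 𐇝 | 𐇨 | 𐇪 | 𐇤 | 𐇫 | 𐇧 | 𐇚 |
| Linear A | 𐘚 | 𐘾 | 𐙁 | 𐘽 | 𐘠 | 𐘢 | 𐙂 | 𐘃 | 𐘀 | 𐘅 | 𐘞 | 𐘸 | 𐘝 | 𐘇 | 𐘏 | 𐘳 | 𐙅 | 𐙒 | 𐘱 | 𐙆 | 𐘦 | 𐘭 |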


Let's try this mapping:

In [13]:
pd_bigrams_both, bg_both = runExperimentalMapping(pd_la_hogan_map)
displayExperimentalMappingResults(bg_both, pd_bigrams_both, pd_la_hogan_map)

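| | 𐇑 | 𐇤 | 𐇭 | 𐇫 | 𐇗 | 𐇪 | 𐇶 | 𐇧 | 𐇚 | 𐇛 | 𐇝 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Davis | | | | | 𐘚 | | | | | 𐘿 | |
| Ours | 𐘚 | 𐘱 | 𐘏 | 𐙆 | | 𐙒 | 𐘇 | 𐘦 | 𐘭 | 𐘾 | 𐘳 |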


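| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Linear A Bigrams | 𐘠𐘾 | 𐘦𐘦 | 𐙁𐘦 | 𐘠𐘅 | 𐘅𐙅 | 𐘞𐙅 | 𐘾𐙂 | 𐘠𐘽 | 𐙒𐙒 | 𐘠𐘦 | 𐙁𐘝 | 𐘃𐘅 | 𐙒𐘳 | 𐙒𐘢 | 𐘢𐘅 | 𐙅𐙒 | 𐘞𐙁 | 𐘅𐘠 | 𐘽𐘠 | 𐘾𐙒 | 𐙒𐘃 | 𐘅𐘀 | 𐘝𐘠 | 𐘚𐘾 | 𐙒𐙅 | 𐙁𐘢 | 𐙁𐙁 | 𐘾𐘞 | 𐘸𐘳 | 𐘚𐙒 | 𐘸𐘅 | 𐘏𐘇 | 𐘠𐘃 | 𐙁𐘽 | 𐙒𐘽 | 𐘇𐘭 | 𐘀𐘃 |
| Disc Bigrams | 𐇖𐇛 | 𐇧𐇧 | 𐇬𐇧 | 𐇖𐇦 | 𐇦𐇨 | 𐇥𐇨 | 𐇛𐇮 | 𐇖𐇼 | 𐇪𐇪 | 𐇖𐇧 | 𐇬𐇳 | 𐇲𐇦 | 𐇪𐇝 | 𐇪𐇱 | 𐇱𐇦 | 𐇨𐇪 | 𐇥𐇬 | 𐇦𐇖 | 𐇼𐇖 | 𐇛𐇪 | 𐇪𐇲 | 𐇦𐇢 | 𐇳𐇖 | 𐇑𐇛 | 𐇪𐇨 | 𐇬𐇱 | 𐇬𐇬 | 𐇛𐇥 | 𐇟𐇝 | 𐇑𐇪 | 𐇟𐇦 | 𐇭𐇶 | 𐇖𐇲 | 𐇬𐇼 | 𐇪𐇼 | 𐇶𐇚 | 𐇢𐇲 |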


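| | 𐘀𐘃 | 𐘅𐘀 | 𐘅𐘠 | 𐘇𐘭 | 𐘚𐘾 | 𐘝𐘠 | 𐘞𐙁 | 𐘠𐘃 | 𐘠𐘅 | 𐘠𐘽 | 𐘠𐘾 | 𐘢𐘅 | 𐘸𐘅 | 𐘸𐘳 | 𐘽𐘠 | 𐘾𐘞 | 𐘾𐙂 | 𐙁𐘢 | 𐙁𐘽 | 𐙁𐙁 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Occurrences in Linear A | 2 | 1 | 4 | 5 | 2 | 1 | 3 | 1 | 2 | 2 | 2 | 2 | 2 | 6 | 1 | 2 | 3 | 1 | 1 | 1 | 44 |
| Occurrences in Disc | 2 | 1 | 1 | 1 | 13 | 3 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 6 | 1 | 3 | 2 | 2 | 1 | 46 |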


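| | 𐘃𐘅 | 𐘅𐙅 | 𐘏𐘇 | 𐘚𐙒 | 𐘞𐙅 | 𐘠𐘦 | 𐘦𐘦 | 𐘾𐙒 | 𐙁𐘝 | 𐙁𐘦 | 𐙅𐙒 | 𐙒𐘃 | 𐙒𐘢 | 𐙒𐘳 | 𐙒𐘽 | 𐙒𐙅 | 𐙒𐙒 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Occurrences in the Phaistos Disc | 1 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |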


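We get 37 possible bigrams, of which 20 actually appear in Linear A: a poor result. (The zeros in the last table are an artifact of how the Disc counts are taken: they come from the bigram list built with Davis' mapping, which does not contain the newly mapped signs. Every one of the 37 bigrams occurs on the Disc by construction.) When we inspect the 17 bigrams that don't appear in Linear A, we can see that 5 syllabograms in particular (𐇪, 𐇨, 𐇭, 𐇤 and 𐇧) don't contribute to any matching bigram at all. If we remove these as a bad lot and rerun the analysis, we get a much better result: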

In [14]:
del pd_la_hogan_map["𐇪"]
del pd_la_hogan_map["𐇨"]
del pd_la_hogan_map["𐇭"]
del pd_la_hogan_map["𐇤"]
del pd_la_hogan_map["𐇧"]

df = pd.DataFrame([pd_la_hogan_map.keys()
                  , pd_la_hogan_map.values()
                  ])
df = df.set_axis(["Phaistos Disc", "Linear A"], axis='index')
df = df.style.set_caption("Hypothetical Revised Mapping of Linear A and PD Symbols").set_table_styles(styles)
display(df)



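| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Phaistos Disc | 𐇑 | 𐇛 | 𐇬 | 𐇼 | 𐇖 | 𐇱 | 𐇮 | 𐇲 | 𐇢 | 𐇦 | 𐇥 | 𐇟 | 𐇳 | 𐇶 | 𐇝 | 𐇫 | 𐇚 |
| Linear A | 𐘚 | 𐘾 | 𐙁 | 𐘽 | 𐘠 | 𐘢 | 𐙂 | 𐘃 | 𐘀 | 𐘅 | 𐘞 | 𐘸 | 𐘝 | 𐘇 | 𐘳 | 𐙆 | 𐘭 |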


In [15]:
pd_bigrams_both, bg_both = runExperimentalMapping(pd_la_hogan_map)
displayExperimentalMappingResults(bg_both, pd_bigrams_both, pd_la_hogan_map)

| | 𐇑 | 𐇫 | 𐇗 | 𐇶 | 𐇚 | 𐇛 | 𐇝 | 𐇨 |
|---|---|---|---|---|---|---|---|---|
| Davis | | | 𐘚 | | | 𐘿 | | 𐙅 |
| Ours | 𐘚 | 𐙆 | | 𐘇 | 𐘭 | 𐘾 | 𐘳 | |


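| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Linear A Bigrams | 𐘠𐘾 | 𐘠𐘅 | 𐘾𐙂 | 𐘠𐘽 | 𐙁𐘝 | 𐘃𐘅 | 𐘢𐘅 | 𐘞𐙁 | 𐘅𐘠 | 𐘽𐘠 | 𐘅𐘀 | 𐘝𐘠 | 𐘚𐘾 | 𐙁𐘢 | 𐙁𐙁 | 𐘾𐘞 | 𐘸𐘳 | 𐘸𐘅 | 𐘠𐘃 | 𐙁𐘽 | 𐘇𐘭 | 𐘀𐘃 |
| Disc Bigrams | 𐇖𐇛 | 𐇖𐇦 | 𐇛𐇮 | 𐇖𐇼 | 𐇬𐇳 | 𐇲𐇦 | 𐇱𐇦 | 𐇥𐇬 | 𐇦𐇖 | 𐇼𐇖 | 𐇦𐇢 | 𐇳𐇖 | 𐇑𐇛 | 𐇬𐇱 | 𐇬𐇬 | 𐇛𐇥 | 𐇟𐇝 | 𐇟𐇦 | 𐇖𐇲 | 𐇬𐇼 | 𐇶𐇚 | 𐇢𐇲 |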


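| | 𐘀𐘃 | 𐘅𐘀 | 𐘅𐘠 | 𐘇𐘭 | 𐘚𐘾 | 𐘝𐘠 | 𐘞𐙁 | 𐘠𐘃 | 𐘠𐘅 | 𐘠𐘽 | 𐘠𐘾 | 𐘢𐘅 | 𐘸𐘅 | 𐘸𐘳 | 𐘽𐘠 | 𐘾𐘞 | 𐘾𐙂 | 𐙁𐘢 | 𐙁𐘽 | 𐙁𐙁 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Occurrences in Linear A | 2 | 1 | 4 | 5 | 2 | 1 | 3 | 1 | 2 | 2 | 2 | 2 | 2 | 6 | 1 | 2 | 3 | 1 | 1 | 1 | 44 |
| Occurrences in Disc | 2 | 1 | 1 | 1 | 13 | 3 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 6 | 1 | 3 | 2 | 2 | 1 | 46 |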


| | 𐘃𐘅 | 𐙁𐘝 |
|---|---|---|
| Occurrences in the Phaistos Disc | 1 | 3 |


Now 20 of a possible 22 bigrams are found in Linear A. This suggests our proposed modification to the mapping is better than Davis'.
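
As a quick arithmetic check on the proportions discussed so far, the following sketch prints each score as a percentage. The figures are the ones reported above; the labels are my own. Raw proportions do not account for the different number of mapped signs in each mapping, which is why the permutation test in the next section is still needed.

```python
scores = {
    "Davis 2018, as published": (17, 23),
    "Davis mapping, as recreated here": (16, 23),
    "Revised mapping": (20, 22),
}

for label, (found, possible) in scores.items():
    print(f"{label}: {found}/{possible} = {found / possible:.1%}")
# Davis 2018, as published: 17/23 = 73.9%
# Davis mapping, as recreated here: 16/23 = 69.6%
# Revised mapping: 20/22 = 90.9%
```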

## Calculate the Statistical Significance


### Recreate the Statistical Significance Results

To confirm that our mapping is a valid improvement on Davis 2018, we'll first recreate the statistical significance results from Davis' paper before applying the same method to our mapping. We do this by scoring a million random pairings of Davis' 14 proposed homomorphs and charting the results in the same way as Davis (2018).

In [16]:
import itertools as it
import numpy as np

def getStatSignificance(list1, list2, n_trials=1000000):
    # Tally how many random PD-to-LA pairings produce each score
    buckets = {}
    for _ in range(n_trials):
        # Pair the shuffled PD signs (list2) with the shuffled LA signs (list1)
        n_map = dict(zip(np.random.permutation(list2), np.random.permutation(list1)))
        _, bg_both = runExperimentalMapping(n_map)
        score = len(bg_both)
        buckets[score] = buckets.get(score, 0) + 1
    return buckets



In [17]:
dlist1 = list(pd_la_davis_map.values())
dlist2 = list(pd_la_davis_map.keys())

davis_results = getStatSignificance(dlist1, dlist2)
print(davis_results)

{8: 88127, 14: 70622, 11: 156755, 9: 122839, 12: 142218, 15: 37664, 10: 149954, 7: 53821, 13: 109614, 16: 16220, 17: 5535, 6: 27611, 5: 11942, 4: 4059, 18: 1477, 3: 1028, 19: 263, 21: 3, 2: 182, 1: 30, 20: 35, 0: 1}
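
For readers who want to see the permutation procedure in isolation, here is a self-contained sketch on toy data using only the standard library. The signs, words and function names are hypothetical. It shuffles only one of the two sign lists, which yields the same uniformly random pairings as shuffling both, and it uses a fixed seed so the run is repeatable.

```python
import random
from collections import Counter

def bigrams(words):
    return [w[i:i + 2] for w in words for i in range(len(w) - 1)]

def score(mapping, disc_words, la_bigram_set):
    """Count distinct fully mapped disc bigrams that also occur in the LA set."""
    mapped = set(mapping.values())
    translated = ["".join(mapping.get(s, s) for s in w) for w in disc_words]
    candidates = {bg for bg in bigrams(translated) if all(s in mapped for s in bg)}
    return len(candidates & la_bigram_set)

def permutation_scores(disc_signs, la_signs, disc_words, la_bigram_set, trials, seed=0):
    """Score `trials` random one-to-one pairings of disc signs with LA signs."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        shuffled = la_signs[:]
        rng.shuffle(shuffled)
        pairing = dict(zip(disc_signs, shuffled))
        tally[score(pairing, disc_words, la_bigram_set)] += 1
    return tally

disc_words = ["xyz", "zx", "yy"]       # hypothetical disc words
la_bigram_set = {"AB", "BC", "CA"}     # hypothetical attested LA bigrams
tally = permutation_scores(list("xyz"), list("ABC"), disc_words, la_bigram_set, trials=1000)

print(sorted(tally.items()))
```

The resulting tally plays the same role as the `davis_results` dictionary printed above: each key is a score and each value is the number of random pairings that achieved it.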


In [18]:
from numpy import mean


def printStatSigResult(results, score):
    table = sorted([
                 (k, 
                  '{:,}'.format(v),
                  "{:.4%}".format(v / sum(results.values()))
                 )
                 for k,v in results.items()
             ], key=lambda x: x[0])
    df = pd.DataFrame(table,
                    columns=["Score out of 23", "Permutations with that score", "% of Permutations"])
    df = df.style.hide_index().set_caption("Syllabotactic similarity scores produced by 1,000,000 "
        "different random associations between the 14 PD and LA signs").set_table_styles(styles)
    display(df)

    l = list(it.chain.from_iterable([[k] * v for k,v in results.items()]))
    sd = np.std(l)
    avg = mean(l)
    print("Average Score : " + "{:.4}".format(avg))
    print("Standard Deviation : " + "{:.4}".format(sd))
    print("Average Score + 2 standard deviations: " + "{:.4}".format(avg + (sd*2)))

    p_val = sum([
                 (v / sum(results.values()))
                 for k,v in results.items()
                 if k > (score - 1)
                 ])

    print("P Value " + "{:.4}".format(p_val))

printStatSigResult(davis_results, 17)

| Score out of 23 | Permutations with that score | % of Permutations |
|---|---|---|
| 0 | 1 | 0.0001% |
| 1 | 30 | 0.0030% |
| 2 | 182 | 0.0182% |
| 3 | 1028 | 0.1028% |
| 4 | 4059 | 0.4059% |
| 5 | 11942 | 1.1942% |
| 6 | 27611 | 2.7611% |
| 7 | 53821 | 5.3821% |
| 8 | 88127 | 8.8127% |
| 9 | 122839 | 12.2839% |


Average Score : 10.73
Standard Deviation : 2.478
Average Score + 2 standard deviations: 15.68
P Value 0.007313
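
The summary figures above can be re-derived directly from the score counts printed by cell 17, without expanding them into a million-element list. This is a sketch of the arithmetic only; the helper name `p_value` is my own, and the p-value is taken, as in `printStatSigResult`, to be the share of random pairings scoring at least as high as the observed score.

```python
from math import sqrt

# Score counts copied from the output of cell 17 above
davis_results = {8: 88127, 14: 70622, 11: 156755, 9: 122839, 12: 142218,
                 15: 37664, 10: 149954, 7: 53821, 13: 109614, 16: 16220,
                 17: 5535, 6: 27611, 5: 11942, 4: 4059, 18: 1477, 3: 1028,
                 19: 263, 21: 3, 2: 182, 1: 30, 20: 35, 0: 1}

total = sum(davis_results.values())
mean = sum(k * v for k, v in davis_results.items()) / total
# Population variance, weighted by how often each score occurred
variance = sum(v * (k - mean) ** 2 for k, v in davis_results.items()) / total

def p_value(results, observed):
    """Share of random pairings scoring at least as high as `observed`."""
    return sum(v for k, v in results.items() if k >= observed) / sum(results.values())

print(total)                                # 1000000
print(round(mean, 2))                       # 10.73
print(round(mean + 2 * sqrt(variance), 2))  # mean plus two standard deviations
print(p_value(davis_results, 17))           # 0.007313
```

The same `p_value` helper can be called with 16 to see how the revised count fares against these recorded permutations.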


Comparing this result with Davis' findings, the two are similar; our p-value is in fact slightly better:
![alt text](Chart.png "Title")


Now we apply the same procedure to our own mapping.

In [None]:
hlist1 = list(pd_la_hogan_map.values())
hlist2 = list(pd_la_hogan_map.keys())

hogan_results = getStatSignificance(hlist1, hlist2)


In [None]:
printStatSigResult(hogan_results, 20)

Our p-value is much lower, suggesting that we have a superior mapping to that proposed by Davis (2018).

## Comparing Word-End Syllabograms
Let's compare glyphs that appear at the end of words in Linear A and the Disc.

In [None]:
syllables = {
'𐘀': 'DA', '𐘁': 'RO', '𐘂': 'PA', '𐘃': 'TE', '𐘄': 'TO', '𐘅': 'NA', 
'𐘆': 'DI', '𐘇': 'A', '𐘈': 'SE', '𐘉': 'U', '𐘊': 'PO', '𐘋': 'ME', 
'𐘌': 'QA', '𐘍': 'ZA', '𐘎': 'ZO', '𐘏': 'QI', '𐘕': 'MU', '𐘗': 'NE',
'𐘘': 'RU', '𐘙': 'RE', '𐘚': 'I', '𐘜': 'PU₂', '𐘝': 'NI', '𐘞': 'SA', 
'𐘠': 'TI', '𐘡': 'E', '𐘢': 'PI', '𐘣': 'WI', '𐘤': 'SI', '𐘥': 'KE',
'𐘦': 'DE', '𐘧': 'JE', '𐘩': 'NWA', '𐘫': 'PU', '𐘬': 'DU', '𐘭': 'RI',
'𐘮': 'WA', '𐘯': 'NU', '𐘰': 'PA₂', '𐘱': 'JA', '𐘲': 'SU', '𐘳': 'TA', 
'𐘴': 'RA', '𐘵': 'O', '𐘶': 'JU', '𐘷': 'TA₂', '𐘸': 'KI', '𐘹': 'TU', 
'𐘺': 'KO', '𐘻': 'MI', '𐘼': 'ZE', '𐘽': 'RA₂', '𐘾': 'KA', '𐘿': 'QE', 
'𐙁': 'MA', '𐙂': 'KU', '𐙄': 'AU', '𐙆': 'TWE', '𐙀': 'ZU'
}

vowels = {
'𐘇': 'A',
'𐘡': 'E',
'𐘚': 'I',
'𐘵': 'O',
'𐘉': 'U', 
'𐙄': 'AU',
}


Let's find the most common last syllabograms in Linear A words:

In [None]:
import collections

la_last_letters = { l[-1:]: len([w for w in la_words if w[-1:] == l[-1:]]) 
                   for l in la_words if len(l) > 1 and l[-1:] in syllables}
# Sort highest to top
la_last_letters = sorted(la_last_letters.items(), key=lambda x:x[1], reverse=True)
la_last_letters = collections.OrderedDict(la_last_letters)

r = {key: rank for rank, key in enumerate(sorted(set(la_last_letters.values()), reverse=True), 1)}
la_last_letters_ranked = {k: r[v] for k,v in la_last_letters.items()}

df = pd.DataFrame([[b for a,b in la_last_letters.items()],
                   [b for a,b in la_last_letters_ranked.items()]],
                columns=[a for a,b in la_last_letters.items()])
df = df.set_axis(['Occurrences', 'Ranking'], axis='index')
df = df.style.set_caption("Most Common Word-End Syllabograms in Linear A").set_table_styles(styles)
display(df)




And do the same for the disc:

In [None]:
pd_last_letters = { l[-1:]: len([w for w in pd_words if w[-1:] == l[-1:]]) 
                   for l in pd_words if len(l) > 1}
# Sort highest to top
pd_last_letters = sorted(pd_last_letters.items(), key=lambda x:x[1], reverse=True)
pd_last_letters = collections.OrderedDict(pd_last_letters)

r = {key: rank for rank, key in enumerate(sorted(set(pd_last_letters.values()), reverse=True), 1)}
pd_last_letters_ranked = {k: r[v] for k,v in pd_last_letters.items()}

df = pd.DataFrame([[b for a,b in pd_last_letters.items()],
                   [b for a,b in pd_last_letters_ranked.items()]],
                columns=[a for a,b in pd_last_letters.items()])
df = df.set_axis(['Occurrences', 'Ranking'], axis='index')
df = df.style.set_caption("Most Common Word-End Syllabograms in PD (By Occurrence)").set_table_styles(styles)
display(df)
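
The counting and ranking done in the two cells above can be sketched compactly with `collections.Counter`. The toy words and the function name are hypothetical. Signs with the same count share a rank, matching the dense ranking built from `sorted(set(...))` above.

```python
from collections import Counter

def final_sign_ranking(words):
    """Count word-final signs; tied counts share a rank (1 = most frequent)."""
    counts = Counter(w[-1] for w in words if len(w) > 1)
    distinct = sorted(set(counts.values()), reverse=True)
    rank_of = {count: rank for rank, count in enumerate(distinct, 1)}
    return {sign: (count, rank_of[count]) for sign, count in counts.most_common()}

toy_words = ["ka", "ta", "ti", "ro", "a"]   # hypothetical words; "a" is skipped
print(final_sign_ranking(toy_words))
# {'a': (2, 1), 'i': (1, 2), 'o': (1, 2)}
```

Swapping `w[-1]` for `w[0]` gives the word-initial version of the same ranking.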


In [None]:
pd_la_full_map = {
"𐇑": "𐘚",
"𐇛": "𐘾",
"𐇬": "𐙁",
"𐇼": "𐘽",
"𐇖": "𐘠",  
"𐇱": "𐘢",
"𐇮": "𐙂",
"𐇲": "𐘃",
"𐇢": "𐘀",
"𐇦": "𐘅",
"𐇥": "𐘞",
"𐇟": "𐘸",
"𐇳": "𐘝",
"𐇶": "𐘙",
"𐇭": "𐘏", 
"𐇝": "𐘳",
"𐇨": "𐙅",  
"𐇪": "𐙒",
"𐇤": "𐘱",
"𐇫": "𐙆",
"𐇧": "𐘦",  
}

n_ranking_comp = sorted([
                     (k, 
                      pd_last_letters_ranked[k], 
                      pd_la_full_map[k], 
                      max(1, int((la_last_letters_ranked[pd_la_full_map[k]] 
                                  / len(la_last_letters_ranked)) 
                                 * max([c for b,c in pd_last_letters_ranked.items()])))
                     )
                     for k,v in pd_la_full_map.items() if k in pd_last_letters and v in la_last_letters
                 ], key=lambda x: abs(x[1] - x[3]))

df = pd.DataFrame(n_ranking_comp,
                columns=["PD Glyph", "PD Ranking", "LA Glyph", "LA Ranking"])
df = df.style.hide_index().set_caption("Normalized Ranking").set_table_styles(styles)
display(df)
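
The "LA Ranking" column in the cell above is not the raw Linear A rank: it is rescaled onto the disc's rank range so the two can be compared. The sketch below isolates that arithmetic. The function name and the numbers are hypothetical; in the cell, the scale is `len(la_last_letters_ranked)` and the disc maximum is the largest value in `pd_last_letters_ranked`.

```python
def rescale_rank(la_rank, la_scale, pd_max_rank):
    """Map a Linear A rank onto the disc's rank range, never going below 1."""
    return max(1, int(la_rank / la_scale * pd_max_rank))

# Hypothetical numbers: a scale of 40 on the Linear A side, disc ranks from 1 to 8
print(rescale_rank(1, 40, 8))    # 1 (0.2 truncates to 0, then is clamped to 1)
print(rescale_rank(20, 40, 8))   # 4
print(rescale_rank(40, 40, 8))   # 8
```

Because `int` truncates rather than rounds, rescaled ranks are biased slightly towards the top of the range.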


## Compare Word-Initial Syllabograms

Let's find the most common first syllabograms in Linear A words:

In [None]:
import collections

la_first_letters = { l[:1]: len([w for w in la_words if w[:1] == l[:1]]) 
                   for l in la_words if len(l) > 1 and l[:1] in syllables}
# Sort highest to top
la_first_letters = sorted(la_first_letters.items(), key=lambda x:x[1], reverse=True)
la_first_letters = collections.OrderedDict(la_first_letters)

r = {key: rank for rank, key in enumerate(sorted(set(la_first_letters.values()), reverse=True), 1)}
la_first_letters_ranked = {k: r[v] for k,v in la_first_letters.items()}

df = pd.DataFrame([[b for a,b in la_first_letters.items()],
                   [b for a,b in la_first_letters_ranked.items()]],
                columns=[a for a,b in la_first_letters.items()])
df = df.set_axis(['Occurrences', 'Ranking'], axis='index')
df = df.style.set_caption("Most Common Word-Initial Syllabograms in Linear A (By Occurrence)").set_table_styles(styles)
display(df)


And do the same for the disc:

In [None]:
pd_first_letters = { l[:1]: len([w for w in pd_words if w[:1] == l[:1]]) 
                   for l in pd_words if len(l) > 1}
# Sort highest to top
pd_first_letters = sorted(pd_first_letters.items(), key=lambda x:x[1], reverse=True)
pd_first_letters = collections.OrderedDict(pd_first_letters)

r = {key: rank for rank, key in enumerate(sorted(set(pd_first_letters.values()), reverse=True), 1)}
pd_first_letters_ranked = {k: r[v] for k,v in pd_first_letters.items()}

df = pd.DataFrame([[b for a,b in pd_first_letters.items()],
                   [b for a,b in pd_first_letters_ranked.items()]],
                columns=[a for a,b in pd_first_letters.items()])
df = df.set_axis(['Occurrences', 'Ranking'], axis='index')
df = df.style.set_caption("Most Common Word-Initial Syllabograms in PD (By Occurrence)").set_table_styles(styles)
display(df)


In [None]:
"""
ranking_comp = [(k, pd_first_letters_ranked[k], pd_la_full_map[k], la_first_letters_ranked[pd_la_full_map[k]])
                 for k,v in pd_la_full_map.items() if k in pd_first_letters and v in la_first_letters]

df = pd.DataFrame(ranking_comp,
                columns=["PD Glyph", "PD Ranking", "LA Glyph", "LA Ranking"])
df = df.style.hide_index().set_caption("Raw Ranking").set_table_styles(styles)
display(df)
"""

n_ranking_comp = sorted([
                     (k, 
                      pd_first_letters_ranked[k], 
                      pd_la_full_map[k], 
                      max(1, int((la_first_letters_ranked[pd_la_full_map[k]] 
                                  / len(la_first_letters_ranked)) 
                                 * max([c for b,c in pd_first_letters_ranked.items()])))
                     )
                     for k,v in pd_la_full_map.items() if k in pd_first_letters and v in la_first_letters
                 ], key=lambda x: abs(x[1] - x[3]))

df = pd.DataFrame(n_ranking_comp,
                columns=["PD Glyph", "PD Ranking", "LA Glyph", "LA Ranking"])
df = df.style.hide_index().set_caption("Normalized Ranking").set_table_styles(styles)
display(df)


## Examining Potential Illegal Combinations

In [None]:
syllables = {
'𐘀': 'DA', '𐘁': 'RO', '𐘂': 'PA', '𐘃': 'TE', '𐘄': 'TO', '𐘅': 'NA', 
'𐘆': 'DI', '𐘇': 'A', '𐘈': 'SE', '𐘉': 'U', '𐘊': 'PO', '𐘋': 'ME', 
'𐘌': 'QA', '𐘍': 'ZA', '𐘎': 'ZO', '𐘏': 'QI', '𐘕': 'MU', '𐘗': 'NE',
'𐘘': 'RU', '𐘙': 'RE', '𐘚': 'I', '𐘜': 'PU₂', '𐘝': 'NI', '𐘞': 'SA', 
'𐘠': 'TI', '𐘡': 'E', '𐘢': 'PI', '𐘣': 'WI', '𐘤': 'SI', '𐘥': 'KE',
'𐘦': 'DE', '𐘧': 'JE', '𐘩': 'NWA', '𐘫': 'PU', '𐘬': 'DU', '𐘭': 'RI',
'𐘮': 'WA', '𐘯': 'NU', '𐘰': 'PA₂', '𐘱': 'JA', '𐘲': 'SU', '𐘳': 'TA', 
'𐘴': 'RA', '𐘵': 'O', '𐘶': 'JU', '𐘷': 'TA₂', '𐘸': 'KI', '𐘹': 'TU', 
'𐘺': 'KO', '𐘻': 'MI', '𐘼': 'ZE', '𐘽': 'RA₂', '𐘾': 'KA', '𐘿': 'QE', 
'𐙁': 'MA', '𐙂': 'KU', '𐙄': 'AU', '𐙆': 'TWE', '𐙀': 'ZU'
}

vowels = {
'𐘇': 'A',
'𐘡': 'E',
'𐘚': 'I',
'𐘵': 'O',
'𐘉': 'U', 
'𐙄': 'AU',
}


In [None]:
v_bgs = [(syllables[l[:1]][1:2] if l[:1] in syllables else "?") 
             + (syllables[l[1:2]] if l[1:2] in syllables else "?")
        for l in la_bigrams if l[1:2] in vowels]
legal_vowel_combos_la = sorted(set([(bg, v_bgs.count(bg)) for bg in v_bgs]), key=lambda x:x[1], reverse=True)

df = pd.DataFrame([[b for a,b in legal_vowel_combos_la]],
                columns=[a for a,b in legal_vowel_combos_la])
df = df.set_axis(['Ranking'], axis='index')
df = df.style.set_caption("Adjoining Vowels in Linear A").set_table_styles(styles)
display(df)


In [None]:
v_bgs = [(syllables[l[:1]][1:2] if l[:1] in syllables else "?") 
             + (syllables[l[1:2]] if l[1:2] in syllables else "?")
        for l in pd_bigrams_both if l[1:2] in vowels]

legal_vowel_combos_pd = sorted(set([(bg, v_bgs.count(bg)) for bg in v_bgs]), key=lambda x:x[1], reverse=True)

df = pd.DataFrame([[b for a,b in legal_vowel_combos_pd]],
                columns=[a for a,b in legal_vowel_combos_pd])
df = df.set_axis(['Ranking'], axis='index')
df = df.style.set_caption("Adjoining Vowels in PD").set_table_styles(styles)
display(df)


In [None]:
possible_vowel_combos = [a+b for a,b in list(it.product(vowels.values(),vowels.values()))]
vowel_combos_in_la = [a for a,b in legal_vowel_combos_la]
vowel_combos_not_in_la = [a for a in possible_vowel_combos if a not in vowel_combos_in_la]
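
The last cell enumerates every ordered pair of vowel values and keeps those never observed. A minimal sketch of that step follows, with a hypothetical set of attested sequences standing in for `vowel_combos_in_la`.

```python
import itertools as it

vowel_names = ["A", "E", "I", "O", "U", "AU"]
attested = {"AI", "IA", "UA"}   # hypothetical attested sequences, for illustration only

# Every ordered pair of vowel values, including a vowel followed by itself
possible = [a + b for a, b in it.product(vowel_names, repeat=2)]
unattested = [combo for combo in possible if combo not in attested]

print(len(possible))     # 36
print(len(unattested))   # 33
```

With the real data, `unattested` corresponds to `vowel_combos_not_in_la`, the candidate list of vowel sequences that Linear A appears to avoid.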
