### Comparing semantic similarity between Penner's English translation of Suárez' *De Anima, Disputation 12, Question 2* and Haldane's English translation of Descartes' *Meditationes* with respect to the characterisation of the term *soul*

**1. Comparing sets of sentences**

In [4]:
import pandas as pd
import re
import spacy

In [5]:
# Read the file; the with-block closes it again afterwards
with open('DeAnima_soul.txt', encoding="utf8") as f:
    raw_text_1 = f.read()

from spacy.lang.en import English

# Blank English pipeline with the rule-based sentencizer
# (spaCy v3 API; v2 used nlp.add_pipe(nlp.create_pipe('sentencizer')))
nlp = English()
nlp.add_pipe('sentencizer')
doc_DeAnima_soul = nlp(raw_text_1)
sentences = [sent.text.strip() for sent in doc_DeAnima_soul.sents]

sentences

['And, of course, having supposed that they are really distinguished from\nthe essence [of the soul], it is much more probable that they are also \n[really] distinguished from each other.',
 'Sixth, because our soul is the form of the body.',
 'Therefore, just as other forms\ndepend on the heavens in their operations and follow inﬂuence from it,\nso also our soul and will.',
 'Seventh, will and sensitive appetite are powers rooted in the same\nsoul.',
 'For it is the same soul desiring in either case.',
 'Moreover, the\nsame soul cannot at the same time desire contraries.',
 'All the ancient pagans who asserted that our soul is material\nand mortal especially erred in this question.',
 'Just as they posited two gods, one\nthe principle of goods, the other of bads, so also they posited two souls\nin us, one which necessitates to good, the other to bad.',
 'For these powers are spiritual and of a higher order, for although our soul is the \nform of the body it, nevertheless, is not wholl

In [6]:
sentences = [item.replace('\n', " ") for item in sentences]
print(sentences)

['And, of course, having supposed that they are really distinguished from the essence [of the soul], it is much more probable that they are also  [really] distinguished from each other.', 'Sixth, because our soul is the form of the body.', 'Therefore, just as other forms depend on the heavens in their operations and follow inﬂuence from it, so also our soul and will.', 'Seventh, will and sensitive appetite are powers rooted in the same soul.', 'For it is the same soul desiring in either case.', 'Moreover, the same soul cannot at the same time desire contraries.', 'All the ancient pagans who asserted that our soul is material and mortal especially erred in this question.', 'Just as they posited two gods, one the principle of goods, the other of bads, so also they posited two souls in us, one which necessitates to good, the other to bad.', 'For these powers are spiritual and of a higher order, for although our soul is the  form of the body it, nevertheless, is not wholly immersed in the 
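Replacing `'\n'` with a single space leaves a double space wherever a source line already ended in a space (visible above in `'also  [really]'` and `'is the  form'`). A minimal sketch, using only the standard library, that collapses every run of whitespace into one space instead; the helper name `normalise_whitespace` is hypothetical, and the two sample strings are taken from the output above:

```python
def normalise_whitespace(text: str) -> str:
    """Collapse any run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() without arguments splits on arbitrary whitespace runs and
    # drops leading/trailing whitespace, so re-joining with " " normalises it.
    return " ".join(text.split())


sentences_raw = [
    "it is much more probable that they are also \n[really] distinguished from each other.",
    "Sixth, because our soul is the form of the body.",
]
sentences_clean = [normalise_whitespace(s) for s in sentences_raw]
print(sentences_clean[0])
```

In the notebook, `sentences = [normalise_whitespace(item) for item in sentences]` could then take the place of the `replace('\n', " ")` cell. Note that it also collapses double spaces that were already in the source text (e.g. after colons).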

In [7]:
# Read the file; the with-block closes it again afterwards
with open('Meditationes_soul.txt', encoding="utf8") as f:
    raw_text_2 = f.read()

from spacy.lang.en import English

# Blank English pipeline with the rule-based sentencizer (spaCy v3 API)
nlp = English()
nlp.add_pipe('sentencizer')
doc_Meditationes_soul = nlp(raw_text_2)
sentences = [sent.text.strip() for sent in doc_Meditationes_soul.sents]

sentences

['In addition to this I considered that I was nourished, that I walked, that I\nfelt, and that I thought, and I referred all these actions to\nthe soul:  but I did not stop to consider what the soul was,\nor if I did stop, I imagined that it was something extremely\nrare and subtle like a wind, a flame, or an ether, which was\nspread throughout my grosser parts.',
 'Let us pass to the attributes of soul and see if there is any one which is in me?',
 'I do not now admit anything which is not necessarily true:  to\nspeak accurately I am not more than a thing which thinks, that\nis to say a mind or a soul, or an understanding, or a reason,\nwhich are terms whose significance was formerly unknown to me.',
 'And although possibly (or rather certainly, as I shall say in a\nmoment) I possess a body with which I am very intimately\nconjoined, yet because, on the one side, I have a clear and\ndistinct idea of myself inasmuch as I am only a thinking and\nunextended thing, and as, on the other, I

In [8]:
sentences = [item.replace('\n', " ") for item in sentences]
print(sentences)

['In addition to this I considered that I was nourished, that I walked, that I felt, and that I thought, and I referred all these actions to the soul:  but I did not stop to consider what the soul was, or if I did stop, I imagined that it was something extremely rare and subtle like a wind, a flame, or an ether, which was spread throughout my grosser parts.', 'Let us pass to the attributes of soul and see if there is any one which is in me?', 'I do not now admit anything which is not necessarily true:  to speak accurately I am not more than a thing which thinks, that is to say a mind or a soul, or an understanding, or a reason, which are terms whose significance was formerly unknown to me.', 'And although possibly (or rather certainly, as I shall say in a moment) I possess a body with which I am very intimately conjoined, yet because, on the one side, I have a clear and distinct idea of myself inasmuch as I am only a thinking and unextended thing, and as, on the other, I possess a dist

In [9]:
nlp = spacy.load("en_core_web_md")

In [10]:
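doc_DeAnima_soul = nlp(raw_text_1)
doc_Meditationes_soul = nlp(raw_text_2)

# Doc.similarity: by default the cosine similarity of the two documents'
# vectors, each of which is the average of the document's token vectors
similarity = doc_DeAnima_soul.similarity(doc_Meditationes_soul)
print(similarity)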

0.9781099410397767


**Comments**: as predicted in *ideal_strategy*, comparing the semantic similarity of the entire sets of relevant sentences yields a high degree of similarity, since *all* the sentences deal with the term "soul". This method is therefore inadequate for the present purpose; it was carried out only for educational reasons, i.e. to improve my understanding of the operation.
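With a model such as `en_core_web_md` and no custom similarity hooks, `Doc.similarity` is the cosine similarity of the two `Doc.vector`s, and `Doc.vector` defaults to the average of the token vectors. Averaging over hundreds of tokens, many of them shared function words, pulls long texts towards a common direction, which is one plausible reason (besides the shared topic) for the high score. The sketch below reproduces the computation with made-up three-dimensional "word vectors"; the vectors, the five-word "texts" and the helper names are purely illustrative and are not spaCy's data:

```python
import math


def mean_vector(vectors):
    """Average equal-length vectors component-wise (what Doc.vector does by default)."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors: dim 0 ~ "function words", dim 1 ~ "soul", dim 2 ~ topic
word_vectors = {
    "the": [1.0, 0.0, 0.0],
    "of": [1.0, 0.0, 0.0],
    "soul": [0.0, 1.0, 0.0],
    "form": [0.2, 0.0, 1.0],
    "thinks": [0.2, 0.0, -1.0],
}

text_a = ["the", "soul", "of", "the", "form"]
text_b = ["the", "soul", "of", "the", "thinks"]

# The two distinguishing content words point in nearly opposite directions...
word_sim = cosine(word_vectors["form"], word_vectors["thinks"])
print(round(word_sim, 3))  # -0.923

# ...yet the averaged "document" vectors remain highly similar.
doc_a = mean_vector([word_vectors[w] for w in text_a])
doc_b = mean_vector([word_vectors[w] for w in text_b])
doc_sim = cosine(doc_a, doc_b)
print(round(doc_sim, 3))  # 0.837
```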

Furthermore, I am less interested in the *absolute* degree of semantic similarity (which I regard as rather meaningless) than in the degree of semantic similarity *relative to* another degree of semantic similarity. One can then ask: is the degree of semantic similarity of comparison A (significantly) higher than the degree of semantic similarity of comparison B? This requires a third Doc object, i.e. the writings of Ockham, which, however, will not be added until the methodological problems encountered so far have been solved. It must also be clarified what it means for one degree of similarity to be *significantly* higher than another.
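One possible way to make "significantly higher" operational (offered only as an illustration, not as the method planned here) is to compute both comparisons on the same footing and check how stable their difference is when the sentences of each text are resampled with replacement (a simple bootstrap). The sketch below uses hypothetical sentence vectors; in practice they could come from spaCy's `Span.vector` for each sentence, with the third text (e.g. Ockham) supplying comparison B. Each text is represented by the mean of its sentence vectors, which weights sentences equally and so differs slightly from spaCy's token-level average. The function names, `n_resamples` and `seed` are assumptions, and the resulting share is a rough stability indicator rather than a formal significance test:

```python
import math
import random


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def doc_similarity(sents_x, sents_y):
    """Similarity of two texts, each represented by the mean of its sentence vectors."""
    return cosine(mean_vector(sents_x), mean_vector(sents_y))


def bootstrap_preference(sents_a, sents_b, sents_c, n_resamples=1000, seed=0):
    """Return (sim(A,B) - sim(A,C), share of resamples in which that difference is > 0)."""
    rng = random.Random(seed)
    observed = doc_similarity(sents_a, sents_b) - doc_similarity(sents_a, sents_c)
    wins = 0
    for _ in range(n_resamples):
        # Resample the sentences of each text independently, with replacement.
        a = rng.choices(sents_a, k=len(sents_a))
        b = rng.choices(sents_b, k=len(sents_b))
        c = rng.choices(sents_c, k=len(sents_c))
        if doc_similarity(a, b) - doc_similarity(a, c) > 0:
            wins += 1
    return observed, wins / n_resamples


# Hypothetical sentence vectors for three texts A, B and C
sents_a = [[1.0, 0.2, 0.0], [0.8, 0.4, 0.0], [0.9, 0.1, 0.0]]
sents_b = [[0.9, 0.3, 0.0], [1.0, 0.1, 0.0]]
sents_c = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.8]]

observed, share = bootstrap_preference(sents_a, sents_b, sents_c)
print(f"sim(A,B) - sim(A,C) = {observed:.3f}; positive in {share:.0%} of resamples")
```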

**2. Comparing sets of adjectives and verbs**

In [16]:
# Read the file; the with-block closes it again afterwards
with open('DeAnima_AdjVrb.txt', encoding="utf8") as f:
    raw_text_3 = f.read()

from spacy.lang.en import English

# Blank English pipeline with the rule-based sentencizer (spaCy v3 API)
nlp = English()
nlp.add_pipe('sentencizer')
doc_DeAnima_AdjVrb = nlp(raw_text_3)
sentences = [sent.text.strip() for sent in doc_DeAnima_AdjVrb.sents]

sentences

['distinguished\nprobable\nother\nSixth\nother\nSeventh\nsensitive\nsame\nsame\nsame\nsame\nancient\nmaterial\nother\nother\nbad\nspiritual\nhigher\nrepugnant\nsame\ndifferent\nother\n\nhaving\nsupposed\ndistinguished\ndepend\nfollow\nrooted\ndesiring\ncan\nasserted\nerred\nposited\nposited\nnecessitates\nimmersed\nraised\ndesire\nconstrains']

In [17]:
sentences = [item.replace('\n', ", ") for item in sentences]
print(sentences)

['distinguished, probable, other, Sixth, other, Seventh, sensitive, same, same, same, same, ancient, material, other, other, bad, spiritual, higher, repugnant, same, different, other, , having, supposed, distinguished, depend, follow, rooted, desiring, can, asserted, erred, posited, posited, necessitates, immersed, raised, desire, constrains']
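Because the `*_AdjVrb.txt` files are word lists without sentence punctuation, the sentencizer returns each whole file as a single "sentence", and the blank line that (as the output suggests) separates the adjectives from the verbs becomes the empty item `, , `. A minimal sketch that reads such a list line by line instead, assuming the one-word-per-line layout with a blank line between the two groups seen above; the helper name `parse_word_groups` is hypothetical:

```python
def parse_word_groups(text: str) -> list[list[str]]:
    """Split a one-word-per-line list into groups separated by blank lines."""
    groups = [[]]
    for line in text.splitlines():
        word = line.strip()
        if word:
            groups[-1].append(word)
        elif groups[-1]:
            # A blank line closes the current group; repeated blank lines are ignored.
            groups.append([])
    # Drop a trailing empty group left over after a final blank line.
    return [group for group in groups if group]


sample = "distinguished\nprobable\nother\n\nhaving\nsupposed\n"
adjectives, verbs = parse_word_groups(sample)
print(", ".join(adjectives))
print(", ".join(verbs))
```

Applied to `raw_text_3` or `raw_text_4`, this would yield the adjectives and the verbs as two separate lists without the empty entry.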


In [18]:
# Read the file; the with-block closes it again afterwards
with open('Meditationes_AdjVrb.txt', encoding="utf8") as f:
    raw_text_4 = f.read()

from spacy.lang.en import English

# Blank English pipeline with the rule-based sentencizer (spaCy v3 API)
nlp = English()
nlp.add_pipe('sentencizer')
doc_Meditationes_AdjVrb = nlp(raw_text_4)
sentences = [sent.text.strip() for sent in doc_Meditationes_AdjVrb.sents]

sentences

['rare\nsubtle\ngrosser\ntrue\nmore\nunknown\nclear\ndistinct\ninasmuch\nunextended\nother\ndistinct\ninasmuch\nextended\nunthinking\ncertain\ndistinct\ndifferent\nagreeable\ndisagreeable\ncertain\ninasmuch\ndifferent\nagreeable\ndisagreeable\nother\ndropsical\nextrinsic\ninasmuch\ncomposite\nverbal\nreal\nhurtful\nimaginable\ndivisible\nsufficient\ndifferent\nother\n\nconsidered\nnourished\nwalked\nfelt\nthought\nreferred\nstop\nconsider\nstop\nimagined\nspread\nLet\npass\nsee\nadmit\nspeak\nthinks\nsay\nshall\nsay\npossess\nconjoined\npossess\nsay\ncan\nexist\nformed\nmay\nreceive\nsurround\nspeak\napply\nsay\ncorrupted\ndrink\nparched\nsay\nunited\nwould\nextended\ncan\ndivide\nrecognise\nwould\nteach\nlearned']

In [19]:
sentences = [item.replace('\n', ", ") for item in sentences]
print(sentences)

['rare, subtle, grosser, true, more, unknown, clear, distinct, inasmuch, unextended, other, distinct, inasmuch, extended, unthinking, certain, distinct, different, agreeable, disagreeable, certain, inasmuch, different, agreeable, disagreeable, other, dropsical, extrinsic, inasmuch, composite, verbal, real, hurtful, imaginable, divisible, sufficient, different, other, , considered, nourished, walked, felt, thought, referred, stop, consider, stop, imagined, spread, Let, pass, see, admit, speak, thinks, say, shall, say, possess, conjoined, possess, say, can, exist, formed, may, receive, surround, speak, apply, say, corrupted, drink, parched, say, united, would, extended, can, divide, recognise, would, teach, learned']


In [20]:
nlp = spacy.load("en_core_web_md")

doc_DeAnima_AdjVrb = nlp(raw_text_3)
doc_Meditationes_AdjVrb = nlp(raw_text_4)

# Cosine similarity of the two documents' averaged token vectors
similarity = doc_DeAnima_AdjVrb.similarity(doc_Meditationes_AdjVrb)
print(similarity)

0.9242025284235089


**Comment**: as I have not yet figured out how to extract only those adjectives and verbs that stand in a *direct relation* to the term "soul", the present operation compares the semantic similarity between *all* adjectives and verbs used in the sentences in which a match has been found. The resulting degree of semantic similarity is thus meaningless; the comparison was, again, carried out only for educational reasons.
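One common starting point for the extraction problem is to follow the dependency parse: keep the adjectives and verbs that are either the syntactic head of an occurrence of "soul" or attached directly to it. In spaCy, every `Token` exposes `text`, `pos_`, `head` and `children`, so the selection can be written over a parsed `Doc`. To keep the sketch runnable without a model, it uses a hand-written, simplified parse of the clause "the same soul cannot desire contraries" (shortened from one of the De Anima sentences), encoded as hypothetical `(text, pos, head_index)` tuples; with spaCy, the same fields would come from `token.text`, `token.pos_` and `token.head.i`. The parse and the function name are assumptions:

```python
# Hypothetical simplified parse: (text, coarse POS, index of syntactic head).
# As in spaCy, the root token ("desire") is its own head.
parse = [
    ("the", "DET", 2),
    ("same", "ADJ", 2),
    ("soul", "NOUN", 5),
    ("can", "AUX", 5),
    ("not", "PART", 5),
    ("desire", "VERB", 5),
    ("contraries", "NOUN", 5),
]


def related_adj_verbs(parse, target="soul", keep_pos=("ADJ", "VERB")):
    """Adjectives/verbs that are the head of, or a direct child of, each target token."""
    hits = []
    for i, (text, _pos, head) in enumerate(parse):
        if text.lower() != target:
            continue
        # The target's own syntactic head (skipped if the target is the root).
        if head != i and parse[head][1] in keep_pos:
            hits.append(parse[head][0])
        # Tokens whose head is the target, i.e. its direct children.
        for j, (child_text, child_pos, child_head) in enumerate(parse):
            if j != i and child_head == i and child_pos in keep_pos:
                hits.append(child_text)
    return hits


print(related_adj_verbs(parse))
```

Such a filter only captures direct links. In a phrase like "rooted in the same soul", spaCy's English models typically attach "soul" to the preposition "in", so "rooted" would be missed unless the search also walks through prepositions.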

Furthermore, as already indicated in the *Analysis_Sample* notebooks, a method for extracting only the adjectives and verbs directly connected to the term "soul" would not solve the problem either, since the characterisation of a philosophical term (such as "soul") is far more complex: it goes beyond the simple analysis of those adjectives and verbs.

**Technical question**: in the two preceding operations (semantic similarity between (i) sets of sentences and (ii) sets of adjectives and verbs), am I really comparing the cleaned-up versions (with commas and spaces) or merely the raw versions (with \n)?
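As the code stands, the raw versions are compared: `nlp(raw_text_1)` to `nlp(raw_text_4)` receive the original file contents, while the cleaned-up `sentences` lists are only printed (and `sentences` is overwritten by each subsequent cell). Python strings are immutable, so `item.replace(...)` builds new strings and never changes `raw_text_*`. To compare the cleaned versions, the cleaned list has to be joined back into one string and that string passed to the model. A minimal sketch of the joining step; the helper name `clean_text` and the variable `cleaned_text_1` are hypothetical:

```python
def clean_text(sentences: list[str]) -> str:
    """Join sentences into one string, collapsing every whitespace run to one space."""
    return " ".join(" ".join(s.split()) for s in sentences)


raw_text_1 = "Seventh, will and sensitive appetite are powers rooted in the same\nsoul."
sentences = [raw_text_1]  # stand-in for the sentencizer output
sentences = [item.replace("\n", " ") for item in sentences]

# replace() returned new strings: the original text still contains the newline.
print(repr(raw_text_1))

cleaned_text_1 = clean_text(sentences)
print(repr(cleaned_text_1))

# With the en_core_web_md pipeline loaded as above, one would then compare, e.g.:
# nlp(cleaned_text_1).similarity(nlp(cleaned_text_2))
```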