Our overall goal is to classify texts by the school of philosophy they come from. If we can do this, we should be able to build an app that takes user input and classifies it, giving us insight into which school of thought a new text represents or aligns with.

In [1]:
import pandas as pd
import requests
import numpy as np
from nltk import sent_tokenize
import re
import json
import PyPDF2

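def get_guten(url):
    # retrieve the source text (the 30 s timeout is an arbitrary choice)
    r = requests.get(url, timeout=30)
    # fail loudly on an HTTP error instead of returning the error page as text
    r.raise_for_status()
    r.encoding = 'utf-8'
    return r.text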

def get_text(path, encoding='utf8'):
    # read a local text file; the with-block closes it even if read() fails
    with open(path, 'r', encoding=encoding) as f:
        return f.read()

We chose to work with 8 schools of thought, representing nearly 2000 years of the history of Western philosophy.

- Plato
- Aristotle
- the Rationalists
- the Empiricists
- the German Idealists
- Analytic Philosophy, which can be broken down into:
    - early Analytic
    - contemporary Analytic
- Phenomenology
- Continental Philosophy

The following cell reads our source texts; some are from Project Gutenberg, while others are sourced independently.


In [2]:
# load the texts

## plato
# forward slashes avoid backslash-escape problems and work on every OS
plato_complete = get_text('./phil_txts/plato_complete_works.txt')

## aristotle
aristotle_vol1 = get_text('./phil_txts/aristotle_complete_works_v1.txt')
aristotle_vol2 = get_text('./phil_txts/aristotle_complete_works_v2.txt')

## rationalists
spinoza_ethics = get_guten('http://www.gutenberg.org/cache/epub/3800/pg3800.txt')
spinoza_improve_understanding = get_guten('http://www.gutenberg.org/cache/epub/1016/pg1016.txt')
leibniz_theodicy = get_guten('http://www.gutenberg.org/cache/epub/17147/pg17147.txt')
descartes_discourse_method = get_guten('http://www.gutenberg.org/cache/epub/59/pg59.txt')
descartes_meditations = get_text('./phil_txts/descartes_meditations.txt')

## empiricists
locke_understanding_1 = get_guten('http://www.gutenberg.org/cache/epub/10615/pg10615.txt')
locke_understanding_2 = get_guten('http://www.gutenberg.org/cache/epub/10616/pg10616.txt')
locke_treatise_gov = get_guten('http://www.gutenberg.org/cache/epub/7370/pg7370.txt')
hume_treatise = get_guten('http://www.gutenberg.org/cache/epub/4705/pg4705.txt')
hume_natural_religion = get_guten('http://www.gutenberg.org/cache/epub/4583/pg4583.txt')
berkeley_treatise = get_guten('http://www.gutenberg.org/cache/epub/4723/pg4723.txt')
berkeley_three_dialogues = get_guten('http://www.gutenberg.org/cache/epub/4724/pg4724.txt')

## german idealism
kant_practical_reason = get_text('./phil_txts/kant_critique_practical_reason.txt')
kant_judgement = get_text('./phil_txts/kant_critique_judgement.txt')
kant_pure_reason = get_text('./phil_txts/kant_pure_reason.txt')
fichte_ethics = get_text('./phil_txts/fichte_system_of_ethics.txt')
hegel_logic = get_text('./phil_txts/hegel_science_of_logic.txt')
hegel_phenomenology = get_text('./phil_txts/hegel_phenomenology_of_spirit.txt')
hegel_right = get_text('./phil_txts/hegel_elements_of_right.txt')

## early analytic
russell_problems_of_phil = get_guten('http://www.gutenberg.org/cache/epub/5827/pg5827.txt')
russell_analylsis_of_mind = get_guten('http://www.gutenberg.org/cache/epub/2529/pg2529.txt')
moore_studies = get_guten('http://www.gutenberg.org/files/50141/50141-0.txt')
moore_principia = get_guten('http://www.gutenberg.org/files/53430/53430-0.txt')
wittgenstein_tractatus = get_text('./phil_txts/wittgenstein_tractatus.txt')
wittgenstein_investigations = get_text('./phil_txts/wittgenstien_philosophical_investigations.txt')

## modern analytic
lewis_papers1 = get_text('./phil_txts/lewis_papers_1.txt')
lewis_papers2 = get_text('./phil_txts/lewis_papers_2.txt')
quine_quintessence = get_text('./phil_txts/quine_quintessence.txt')
popper_science = get_text('./phil_txts/popper_logic_of_science.txt')
popper_open_society = get_text('./phil_txts/popper_open_society.txt')
kripke_troubles = get_text('./phil_txts/kripke_philosophical_troubles.txt')
kripke_naming = get_text('./phil_txts/kripke_naming_necessity.txt')

## phenomenology
ponty_perception = get_text('./phil_txts/merleau-ponty_phenomenology_of_perception.txt')
husserl_idea_of = get_text('./phil_txts/husserl_idea_of_phenomenology.txt')
husserl_crisis = get_text('./phil_txts/husserl_crisis_of_euro_sciences.txt')
husserl_cartesian = get_text('./phil_txts/husserl_cartesian_meditations.txt')
heidegger_being_time = get_text('./phil_txts/heidegger_being_and_time.txt')
heidegger_track = get_text('./phil_txts/heidegger_off_the_beaten_track.txt')

## continental
foucault_order = get_text('./phil_txts/foucault_order_of_things.txt')
foucault_madness = get_text('./phil_txts/foucault_history_of_madness.txt')
foucault_clinic = get_text('./phil_txts/foucault_birth_of_clinic.txt')
derrida_writing = get_text('./phil_txts/derrida_writing_difference.txt')
deleuze_oedipus = get_text('./phil_txts/deleuze_guattari_anti-oedipus.txt')
deleuze_diference = get_text('./phil_txts/deleuze_difference_repetition.txt')
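
Since the end goal is classification, each loaded text eventually needs a school label attached to it. The following is only a rough sketch of one way that could look, not the pipeline we actually use: the dictionary layout, the school label strings, and the helper names `split_sentences` and `label_sentences` are all assumptions made for illustration, and the regex splitter is a crude stand-in for `sent_tokenize`. Toy strings are used in place of the real texts so the example runs on its own.

```python
import re

# Toy stand-ins for the loaded texts. In the notebook, the lists would hold
# the variables defined above (e.g. plato_complete, spinoza_ethics, ...).
# The school label strings here are hypothetical.
texts_by_school = {
    'plato': ['A first toy sentence. A second one?'],
    'rationalists': ['Another toy sentence.'],
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text)
    return [p.strip() for p in parts if p.strip()]

def label_sentences(texts_by_school):
    """Return a flat list of (sentence, school) pairs."""
    records = []
    for school, texts in texts_by_school.items():
        for text in texts:
            for sentence in split_sentences(text):
                records.append((sentence, school))
    return records

records = label_sentences(texts_by_school)
for sentence, school in records:
    print(school, '|', sentence)
```

A list of pairs like this can be passed straight to `pd.DataFrame(records, columns=[...])` if a tabular form is more convenient.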



Unfortunately, most of these texts include front and end matter. There is no clear, consistent marker separating it from the main text, so we go through the texts one by one and clip both ends to keep only the actual philosophical discussion. Some texts also include footnotes from translators or editors, but removing those would take more time than we have here. We leave the footnotes in as noise in our data and hope to remove them in a future iteration.

In [14]:
plato_complete = plato_complete.split('find that an enticing')[1][388:].split('Demeter, whose cult at')[0]
aristotle_vol_1 = aristotle_vol1.split('1a20-1b9')[1].split('799a16')[0]
aristotle_vol_2 = aristotle_vol2.split('830a5-830b4')[1].split('1462a5-1462a13')[0]
spinoza_ethics = spinoza_ethics.split('ranslated from the Latin by R.')[1][71:].split('End of the Ethics')[0]
spinoza_improve_understanding = spinoza_improve_understanding.split('Farewell.*')[1][20:].split('End of ')[0]
leibniz_theodicy = leibniz_theodicy.split('appeared in 1710 as the')[1][202:].split('SUMMARY OF THE CON')[0][:-140]
descartes_discourse_method = descartes_discourse_method.split('PREFATORY NOTE')[1][18:].split('End of the Pr')[0]
descartes_meditations = descartes_meditations.split('LETTER')[1][1:].split('EXPLANATORY NOTES')[0][:-8]
locke_understanding_1 = locke_understanding_1.split('2 Dorset Court, 24th of May, 1689')[1][50:].split('End of the Pro')[0][:-30]
locke_understanding_2 = locke_understanding_2.split('1. Man fitted to form articulated Sounds.')[1][4:].split('End of the Pro')[0][:-25]
locke_treatise_gov = locke_treatise_gov.split('now lodged in Christ College, Cambridge.')[1][21:].split('FINIS.')[0]
hume_treatise = hume_treatise.split('ADVERTISEMENT')[1][9:].split('End of Pro')[0][:-14]
hume_natural_religion = hume_natural_religion.split('PAMPHILUS TO HERMIPPUS')[1][6:].split('End of the Pro')[0][:-22]
berkeley_treatise = berkeley_treatise.split('are too apt to condemn an opinion before they rightly')[1][47:].split('End of the Pr')[0][:-22]
berkeley_three_dialogues = berkeley_three_dialogues.split('THE FIRST DIALOGUE')[1][17:].split('End of the Pro')[0][:-22]
kant_practical_reason = kant_practical_reason.split('erner Pluhar an')[1][329:].split('stone of the wi')[0][:-20]
kant_judgement = kant_judgement.split('TO THE FIRST EDITION,* 1790')[1][1:].split('EXPLANATORY NOTES')[0][:-39]
kant_pure_reason = kant_pure_reason.split('Bacon of Verulam')[1][33:].split('(Persius, Satires, iii, 78-9).')[0][:-1]
fichte_ethics = fichte_ethics.split('(“Krause Nachschrift,” 1798/99)')[1][111:].split('Page 345')[0][:-2]
hegel_logic = hegel_logic.split('he complete transformati')[1][249:].split('Hegel’s Logic in its revised and unrevised parts')[0][:-32]
hegel_phenomenology = hegel_phenomenology.split('PREFACE: ON SCIENTIFIC')[1][1:].split('1I Adaptation')[0][:-62]
hegel_right = hegel_right.split('he immediate occasion f')[1][184:].split('I Hegel lectured on the topics in')[0][:-28]
russell_problems_of_phil = russell_problems_of_phil.split('n the following pages')[1].split('BIBLIOGRAPHICAL NOTE')[0]
russell_analylsis_of_mind = russell_analylsis_of_mind.split('H. D. Lewis')[2][21:].split('End of Pro')[0]
moore_studies = moore_studies.split('Aristotelian Society,_ 1919-20.')[1][23:].split('E Wes')[0][:-10]
moore_principia = moore_principia.split('AUCTOR')[1][20:].split('(129-133')[0]
wittgenstein_tractatus = wittgenstein_tractatus.split('TRACTATUS LOGICO-PHILOSOPHICUS')[1][70:].split('I NDEX')[0][:-8]
wittgenstein_investigations = wittgenstein_investigations.split('catty')[1][787:].split("above', 351")[0]
lewis_papers1 = lewis_papers1.split('The fifteen papers')[1][61:].split('Acquai')[0][:-10]
lewis_papers2 = lewis_papers2.split('Part Four Counterfactuals and Time')[1][17:].split('end p.342')[0]
quine_quintessence = quine_quintessence.split('T R UT H B Y C O N V E N T I O N')[1].split('CREDITS')[0][:-7]
popper_science = popper_science.split('F IRST E NGLISH E DITION, 1959')[1][2:].split('This is the end of the text of the original book.')[0]
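
The split-and-slice pattern repeated in the cell above could be wrapped in a small helper. The sketch below is one possible version, with a hypothetical name (`clip_text`) and hypothetical parameter names. It is not a drop-in replacement for every line above: it keeps everything after the first occurrence of the start marker (so it does not cover cases like the `[2]` index used for one of the Russell texts), and it raises an error when a marker is missing, whereas `split(...)[0]` silently returns the whole string.

```python
def clip_text(text, start_marker, end_marker, start_offset=0, end_offset=None):
    """Keep the text between the first start_marker and the next end_marker.

    start_offset drops extra characters right after the start marker.
    end_offset (a negative number, or None for no trimming) drops
    characters just before the end marker.
    """
    _, found, after = text.partition(start_marker)
    if not found:
        raise ValueError(f'start marker not found: {start_marker!r}')
    body, found, _ = after[start_offset:].partition(end_marker)
    if not found:
        raise ValueError(f'end marker not found: {end_marker!r}')
    return body[:end_offset]

# Toy example: the markers and text are made up for illustration.
sample = 'FRONT MATTER\nBEGIN\nThe actual discussion.\nEND\nindex and notes'
clipped = clip_text(sample, 'BEGIN', 'END', start_offset=1, end_offset=-1)
print(repr(clipped))
```

Raising on a missing marker is a deliberate choice here: if a source file changes and a marker disappears, the clipping step fails immediately instead of passing front matter through as data.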


In [320]:
popper_science.split('F IRST E NGLISH E DITION, 1959')[1][2:].split('This is the end of the text of the original book.')[0]

' the defences of obscurantism\nwhich bar the way of scientiﬁc advance. For the worship of this idol\nhampers not only the boldness of our questions, but also the rigour\nand the integrity of our tests. The wrong view of science betrays itself\nin the craving to be right; for it is not his possession of knowledge, of\nirrefutable truth, that makes the man of science, but his persistent and\nrecklessly critical quest for truth.\nHas our attitude, then, to be one of resignation? Have we to say that\nscience can fulﬁl only its biological task; that it can, at best, merely\nprove its mettle in practical applications which may corroborate it? Are\nits intellectual problems insoluble? I do not think so. Science never\npursues the illusory aim of making its answers ﬁnal, or even probable.\nIts advance is, rather, towards an inﬁnite yet attainable aim: that of ever\ndiscovering new, deeper, and more general problems, and of subjecting\nour ever tentative answers to ever renewed and ever more r