Our overall goal is to classify texts by the school of philosophy they come from. If we can do this, we should be able to build an app that takes user input and classifies it, giving us insight into which school of thought a new text represents or aligns with.

In [1]:
import pandas as pd
import requests
import tweepy
import numpy as np
from nltk import sent_tokenize
import re
import json
import PyPDF2

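def get_guten(url):
    # retrieve the source text from Project Gutenberg
    r = requests.get(url)
    # fail loudly on an HTTP error instead of returning an error page as text
    r.raise_for_status()
    r.encoding = 'utf-8'
    return r.text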

def get_text(path, encoding='utf8'):
    # read a local text file; the with-block closes it even if reading fails
    with open(path, 'r', encoding=encoding) as f:
        return f.read()

We chose to work with eight schools of thought, spanning more than 2000 years of the history of Western thought:

- Plato
- Aristotle
- the Rationalists
- the Empiricists
- the German Idealists
- Analytic Philosophy, which can be broken down into:
    - early Analytic
    - contemporary Analytic
- Phenomenology
- Continental Philosophy

The following cell reads our source texts; some are from Project Gutenberg, while others are sourced independently.


In [3]:
# load the texts

## plato
plato_complete = get_text(r'.\phil_txts\plato_complete_works.txt')

## aristotle
aristotle_vol1 = get_text(r'.\phil_txts\aristotle_complete_works_v1.txt')
aristotle_vol2 = get_text(r'.\phil_txts\aristotle_complete_works_v2.txt')

## rationalists
spinoza_ethics = get_guten('http://www.gutenberg.org/cache/epub/3800/pg3800.txt')
spinoza_improve_understanding = get_guten('http://www.gutenberg.org/cache/epub/1016/pg1016.txt')
leibniz_theodicy = get_guten('http://www.gutenberg.org/cache/epub/17147/pg17147.txt')
leibniz_monadology = get_text(r'.\phil_txts\leibniz_monadology.txt')
descartes_discourse_method = get_guten('http://www.gutenberg.org/cache/epub/59/pg59.txt')
descartes_meditations = get_text(r'.\phil_txts\descartes_meditations.txt')

## empiricists
locke_understanding_1 = get_guten('http://www.gutenberg.org/cache/epub/10615/pg10615.txt')
locke_understanding_2 = get_guten('http://www.gutenberg.org/cache/epub/10616/pg10616.txt')
locke_treatise_gov = get_guten('http://www.gutenberg.org/cache/epub/7370/pg7370.txt')
hume_treatise = get_guten('http://www.gutenberg.org/cache/epub/4705/pg4705.txt')
hume_natural_religion = get_guten('http://www.gutenberg.org/cache/epub/4583/pg4583.txt')
hume_princ_morals = get_guten('http://www.gutenberg.org/cache/epub/4320/pg4320.txt')
berkeley_treatise = get_guten('http://www.gutenberg.org/cache/epub/4723/pg4723.txt')
berkeley_three_dialogues = get_guten('http://www.gutenberg.org/cache/epub/4724/pg4724.txt')

## german idealism
kant_practical_reason = get_text(r'.\phil_txts\kant_critique_practical_reason.txt')
kant_judgement = get_text(r'.\phil_txts\kant_critique_judgement.txt')
kant_pure_reason = get_text(r'.\phil_txts\kant_pure_reason.txt')
fichte_ethics = get_text(r'.\phil_txts\fichte_system_of_ethics.txt')
hegel_logic = get_text(r'.\phil_txts\hegel_science_of_logic.txt')
hegel_phenomenology = get_text(r'.\phil_txts\hegel_phenomenology_of_spirit.txt')
hegel_right = get_text(r'.\phil_txts\hegel_elements_of_right.txt')

## early analytic
russell_problems_of_phil = get_guten('http://www.gutenberg.org/cache/epub/5827/pg5827.txt')
russell_analylsis_of_mind = get_guten('http://www.gutenberg.org/cache/epub/2529/pg2529.txt')
moore_studies = get_guten('http://www.gutenberg.org/files/50141/50141-0.txt')
moore_principia = get_guten('http://www.gutenberg.org/files/53430/53430-0.txt')
wittgenstein_tractatus = get_text(r'.\phil_txts\wittgenstein_tractatus.txt')
wittgenstein_investigations = get_text(r'.\phil_txts\wittgenstien_philosophical_investigations.txt')

## modern analytic
lewis_papers1 = get_text(r'.\phil_txts\lewis_papers_1.txt')
lewis_papers2 = get_text(r'.\phil_txts\lewis_papers_2.txt')
quine_quintessence = get_text(r'.\phil_txts\quine_quintessence.txt')
popper_science = get_text(r'.\phil_txts\popper_logic_of_science.txt')
popper_open_society = get_text(r'.\phil_txts\popper_open_society.txt')
kripke_troubles = get_text(r'.\phil_txts\kripke_philosophical_troubles.txt')
kripke_naming = get_text(r'.\phil_txts\kripke_naming_necessity.txt')

## phenomenology
ponty_perception = get_text(r'.\phil_txts\merleau-ponty_phenomenology_of_perception.txt')
husserl_idea_of = get_text(r'.\phil_txts\husserl_idea_of_phenomenology.txt')
husserl_crisis = get_text(r'.\phil_txts\husserl_crisis_of_euro_sciences.txt')
husserl_cartesian = get_text(r'.\phil_txts\husserl_cartesian_meditations.txt')
heidegger_being_time = get_text(r'.\phil_txts\heidegger_being_and_time.txt')
heidegger_track = get_text(r'.\phil_txts\heidegger_off_the_beaten_track.txt')

## continental
foucault_order = get_text(r'.\phil_txts\foucault_order_of_things.txt')
foucault_madness = get_text(r'.\phil_txts\foucault_history_of_madness.txt')
foucault_clinic = get_text(r'.\phil_txts\foucault_birth_of_clinic.txt')
derrida_writing = get_text(r'.\phil_txts\derrida_writing_difference.txt')
deleuze_oedipus = get_text(r'.\phil_txts\deleuze_guattari_anti-oedipus.txt')
deleuze_diference = get_text(r'.\phil_txts\deleuze_difference_repetition.txt')



Unfortunately, most of these texts include front and end matter. There is no clear, consistent way to tell this material apart from the body of a work, so we go through the texts one by one and clip both ends, keeping only the actual philosophical discussion. Some texts also include footnotes from translators or editors, but removing those would take more time than we have here. We leave those footnotes in as a kind of noise in our data and hope to remove them in a future iteration.
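
To make the clipping step concrete, here is a minimal sketch of one way it could be done. The helper `clip_text` and the sample string are hypothetical, not part of the notebook's own code. It assumes that for each text we have picked, by hand, a marker string that immediately precedes the body and another that immediately follows it.

```python
def clip_text(text, start_marker, end_marker):
    """Return the part of `text` between two hand-picked marker strings.

    Hypothetical helper: the start marker itself is dropped, and the end
    marker is the first occurrence found after the start marker.
    """
    start = text.find(start_marker)
    if start == -1:
        raise ValueError(f"start marker not found: {start_marker!r}")
    start += len(start_marker)

    end = text.find(end_marker, start)
    if end == -1:
        raise ValueError(f"end marker not found: {end_marker!r}")

    # keep only the body and trim stray whitespace at both ends
    return text[start:end].strip()


# made-up sample standing in for a real source text
sample = (
    "Title page and translator's preface\n"
    "PART I\n"
    "The body of the work.\n"
    "INDEX\n"
    "apple, 3\n"
)

body = clip_text(sample, "PART I\n", "INDEX")
print(body)
```

Raising an error when a marker is missing is a deliberate choice in this sketch: a silently unclipped text would add a whole book's worth of front matter to the data without any warning.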