# Crawling the Stanford Encyclopedia of Philosophy 

In [107]:
import requests
import os.path
from bs4 import BeautifulSoup

To get all the articles from the SEP, we will first visit the Table of Contents and get all the links.

In [7]:
url = 'https://plato.stanford.edu/contents.html'
html = requests.get(url).content
soup = BeautifulSoup(html, 'lxml')

In [52]:
content_section = soup('div', id='content')[0].find_all('li')
content_section_li = [li.a for li in content_section]

In [64]:
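links = []
for a in content_section_li:
    # Some <li> elements have no <a> child, so li.a is None; skip those.
    if a is None:
        continue
    href = a.get('href')
    # Skip anchors without an href and hrefs that were already collected.
    if href and href not in links:
        links.append(href)
# The hrefs are relative to the site root, so prefix them with it.
links = ['https://plato.stanford.edu/' + href for href in links]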
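Prefixing the site root works here because the hrefs in the table of contents are relative to it. A more general way to resolve hrefs is `urljoin` from the standard library. The following is only a minimal sketch: the helper name `absolute_links`, the constant `BASE_URL`, and the sample hrefs are made up for illustration and are not part of the original notebook.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

# Assumption: hrefs are resolved against the table-of-contents page itself.
BASE_URL = 'https://plato.stanford.edu/contents.html'


def absolute_links(hrefs, base=BASE_URL):
    """Resolve hrefs against base, dropping empty values and duplicates.

    The order of first appearance is preserved.
    """
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # anchor without an href
        url = urljoin(base, href)
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Illustrative input only; real hrefs come from the parsed contents page.
sample = ['entries/abduction/', 'entries/abelard/', None, 'entries/abduction/']
resolved = absolute_links(sample)
```

Using a set for the membership check also avoids rescanning the whole list for every link, which the list-based check does.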

For every link, we will visit the page, extract the title and the article text, and save the text to a file named after the title.
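Saving each article under its title means turning the title into a safe filename. The sketch below isolates that step in a hypothetical helper, `title_to_filename`; the helper name and its `directory` parameter are assumptions for illustration. It applies the same substitutions this notebook uses: lowercase the title, replace spaces with underscores, and replace `/` with `--` so that a slash in a title is not treated as a path separator.

```python
import os.path


def title_to_filename(title, directory='sep_articles'):
    """Map an article title to a .txt path inside directory."""
    # Lowercase, then replace characters that are awkward in filenames.
    name = title.lower().replace(' ', '_').replace('/', '--')
    return os.path.join(directory, name + '.txt')


# Illustrative titles; the second one is made up to show the slash handling.
plain = title_to_filename('Essential vs. Accidental Properties')
slashed = title_to_filename('A/B Example')
```

Keeping the mapping in one function makes it easy to check whether an article has already been saved before writing it again.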

In [114]:
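# Make sure the output directory exists before writing into it.
if not os.path.isdir('sep_articles'):
    os.makedirs('sep_articles')

visited = []
for link in links:
    # Report progress before fetching the next article.
    print('Progress: %s of %s (%2.f%%)' % (len(visited), len(links), 100.0 * len(visited) / len(links)))
    html = requests.get(link).content
    soup = BeautifulSoup(html, 'lxml')
    article = soup('div', id='article-content')[0]
    title = article.h1.getText()
    print('Visited: %s' % title)

    # Build the filename from the title; '/' is replaced so it is not read as a path separator.
    filename = 'sep_articles/' + title.lower().replace(' ', '_').replace('/', '--') + '.txt'
    if not os.path.isfile(filename):
        preamble = article('div', id='preamble')[0].getText()
        content = article('div', id='main-text')[0].getText()
        full_text = preamble + content

        # Open in binary mode because the text is encoded to UTF-8 bytes before writing.
        with open(filename, 'wb') as f:
            f.write(full_text.encode('utf8'))
        print('Saved.')
    else:
        print('Already there. Skipping')
    # Count the article as visited whether it was saved now or earlier.
    visited.append(title)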

Progress: 0 of 1652 ( 0%)
Visited: Abduction
Already there. Skipping
Progress: 1 of 1652 ( 0%)
Visited: Peter Abelard
Already there. Skipping
Progress: 2 of 1652 ( 0%)
Visited: Abhidharma
Already there. Skipping
Progress: 3 of 1652 ( 0%)
Visited: Abilities
Already there. Skipping
Progress: 4 of 1652 ( 0%)
Visited: Abner of Burgos
Already there. Skipping
Progress: 5 of 1652 ( 0%)
Visited: Judah Abrabanel
Already there. Skipping
Progress: 6 of 1652 ( 0%)
Visited: Abstract Objects
Already there. Skipping
Progress: 7 of 1652 ( 0%)
Visited: Essential vs. Accidental Properties
Already there. Skipping
Progress: 8 of 1652 ( 0%)
Visited: Action
Already there. Skipping
Progress: 9 of 1652 ( 0%)
Visited: Shared Agency
Already there. Skipping
Progress: 10 of 1652 ( 0%)
Visited: The Logic of Action
Already there. Skipping
Progress: 11 of 1652 ( 0%)
Visited: Action-based Theories of Perception
Already there. Skipping
Progress: 12 of 1652 ( 0%)
Visited: Action at a Distance in Quantum Mechanics
Alrea

Visited: Arabic and Islamic Psychology and Philosophy of Mind
Saved.
Progress: 106 of 1652 ( 0%)
Visited: Greek Sources in Arabic and Islamic Philosophy
Saved.
Progress: 107 of 1652 ( 0%)
Visited: Influence of Arabic and Islamic Philosophy on Judaic Thought
Saved.
Progress: 108 of 1652 ( 0%)
Visited: Influence of Arabic and Islamic Philosophy on the Latin West
Saved.
Progress: 109 of 1652 ( 0%)
Visited: Ibn Kammuna
Saved.
Progress: 110 of 1652 ( 0%)
Visited: Ikhwân al-Safâ’
Saved.
Progress: 111 of 1652 ( 0%)
Visited: Mysticism in Arabic and Islamic Philosophy
Saved.
Progress: 112 of 1652 ( 0%)
Visited: Arcesilaus
Saved.
Progress: 113 of 1652 ( 0%)
Visited: Philosophy of Architecture
Saved.
Progress: 114 of 1652 ( 0%)
Visited: Archytas
Saved.
Progress: 115 of 1652 ( 0%)
Visited: Hannah Arendt
Saved.
Progress: 116 of 1652 ( 0%)
Visited: Ancient Ethical Theory
Saved.
Progress: 117 of 1652 ( 0%)
Visited: Epistemic Utility Arguments for Probabilism
Saved.
Progress: 118 of 1652 ( 0%)
Visited

Visited: Bounded Rationality
Saved.
Progress: 231 of 1652 ( 0%)
Visited: Robert Boyle
Saved.
Progress: 232 of 1652 ( 0%)
Visited: Francis Herbert Bradley
Saved.
Progress: 233 of 1652 ( 0%)
Visited: Francis Herbert Bradley’s Moral and Political Philosophy
Saved.
Progress: 234 of 1652 ( 0%)
Visited: Bradley’s Regress
Saved.
Progress: 235 of 1652 ( 0%)
Visited: The Definition of Death
Saved.
Progress: 236 of 1652 ( 0%)
Visited: Skepticism and Content Externalism
Saved.
Progress: 237 of 1652 ( 0%)
Visited: Franz Brentano
Saved.
Progress: 238 of 1652 ( 0%)
Visited: Brentano’s Theory of Judgement
Saved.
Progress: 239 of 1652 ( 0%)
Visited: Charlie Dunbar Broad
Saved.
Progress: 240 of 1652 ( 0%)
Visited: Luitzen Egbertus Jan Brouwer
Saved.
Progress: 241 of 1652 ( 0%)
Visited: Giordano Bruno
Saved.
Progress: 242 of 1652 ( 0%)
Visited: Martin Buber
Saved.
Progress: 243 of 1652 ( 0%)
Visited: Buddha
Saved.
Progress: 244 of 1652 ( 0%)
Visited: Chan Buddhism
Saved.
Progress: 245 of 1652 ( 0%)
Visi

Visited: Computation in Physical Systems
Saved.
Progress: 356 of 1652 ( 0%)
Visited: Computational Linguistics
Saved.
Progress: 357 of 1652 ( 0%)
Visited: The Computational Theory of Mind
Saved.
Progress: 358 of 1652 ( 0%)
Visited: Computer and Information Ethics
Saved.
Progress: 359 of 1652 ( 0%)
Visited: The Philosophy of Computer Science
Saved.
Progress: 360 of 1652 ( 0%)
Visited: The Modern History of Computing
Saved.
Progress: 361 of 1652 ( 0%)
Visited: Computing and Moral Responsibility
Saved.
Progress: 362 of 1652 ( 0%)
Visited: Auguste Comte
Saved.
Progress: 363 of 1652 ( 0%)
Visited: Concepts
Saved.
Progress: 364 of 1652 ( 0%)
Visited: Condemnation of 1277
Saved.
Progress: 365 of 1652 ( 0%)
Visited: Étienne Bonnot de Condillac
Saved.
Progress: 366 of 1652 ( 0%)
Visited: Indicative Conditionals
Saved.
Progress: 367 of 1652 ( 0%)
Visited: Counterfactuals
Saved.
Progress: 368 of 1652 ( 0%)
Visited: The Logic of Conditionals
Saved.
Progress: 369 of 1652 ( 0%)
Visited: The History 

Visited: John Dewey
Saved.
Progress: 477 of 1652 ( 0%)
Visited: Dewey’s Moral Philosophy
Saved.
Progress: 478 of 1652 ( 0%)
Visited: Dewey’s Political Philosophy
Saved.
Progress: 479 of 1652 ( 0%)
Visited: Dharmakīrti
Saved.
Progress: 480 of 1652 ( 0%)
Visited: Diagrams
Saved.
Progress: 481 of 1652 ( 0%)
Visited: The Epistemology of Visual Thinking in Mathematics
Saved.
Progress: 482 of 1652 ( 0%)
Visited: Dialectical School
Saved.
Progress: 483 of 1652 ( 0%)
Visited: Hegel’s Dialectics
Saved.
Progress: 484 of 1652 ( 0%)
Visited: Dialetheism
Saved.
Progress: 485 of 1652 ( 0%)
Visited: Denis Diderot
Saved.
Progress: 486 of 1652 ( 0%)
Visited: Dietrich of Freiberg
Saved.
Progress: 487 of 1652 ( 0%)
Visited: The Philosophy of Digital Art
Saved.
Progress: 488 of 1652 ( 0%)
Visited: Wilhelm Dilthey
Saved.
Progress: 489 of 1652 ( 0%)
Visited: Diodorus Cronus
Saved.
Progress: 490 of 1652 ( 0%)
Visited: Pseudo-Dionysius the Areopagite
Saved.
Progress: 491 of 1652 ( 0%)
Visited: The Problem of 

Visited: Social Networking and Ethics
Saved.
Progress: 599 of 1652 ( 0%)
Visited: Thick Ethical Concepts
Saved.
Progress: 600 of 1652 ( 0%)
Visited: Virtue Ethics
Saved.
Progress: 601 of 1652 ( 0%)
Visited: Phenomenological Approaches to Ethics and Information Technology
Saved.
Progress: 602 of 1652 ( 0%)
Visited: Human Enhancement
Saved.
Progress: 603 of 1652 ( 0%)
Visited: Justice, Inequality, and Health
Saved.
Progress: 604 of 1652 ( 0%)
Visited: Justice and Access to Health Care
Saved.
Progress: 605 of 1652 ( 0%)
Visited: Pregnancy, Birth, and Medicine
Saved.
Progress: 606 of 1652 ( 0%)
Visited: Privacy and Medicine
Saved.
Progress: 607 of 1652 ( 0%)
Visited: Public Health Ethics
Saved.
Progress: 608 of 1652 ( 0%)
Visited: The Sale of Human Organs
Saved.
Progress: 609 of 1652 ( 0%)
Visited: Ethics of Stem Cell Research
Saved.
Progress: 610 of 1652 ( 0%)
Visited: Theory and Bioethics
Saved.
Progress: 611 of 1652 ( 0%)
Visited: Eugenics
Saved.
Progress: 612 of 1652 ( 0%)
Visited: Vol

Visited: Games, Full Abstraction and Full Completeness
Saved.
Progress: 719 of 1652 ( 0%)
Visited: Logic and Games
Saved.
Progress: 720 of 1652 ( 0%)
Visited: Logics for Analyzing Games
Saved.
Progress: 721 of 1652 ( 0%)
Visited: Game Theory
Saved.
Progress: 722 of 1652 ( 0%)
Visited: Epistemic Foundations of Game Theory
Saved.
Progress: 723 of 1652 ( 0%)
Visited: Game Theory and Ethics
Saved.
Progress: 724 of 1652 ( 0%)
Visited: Pierre Gassendi
Saved.
Progress: 725 of 1652 ( 0%)
Visited: Gelukpa [dge lugs pa]
Saved.
Progress: 726 of 1652 ( 0%)
Visited: Gene
Saved.
Progress: 727 of 1652 ( 0%)
Visited: Generalized Quantifiers
Saved.
Progress: 728 of 1652 ( 0%)
Visited: Early Philosophical Interpretations of General Relativity
Saved.
Progress: 729 of 1652 ( 0%)
Visited: Generic Generalizations
Saved.
Progress: 730 of 1652 ( 0%)
Visited: Genetic Drift
Saved.
Progress: 731 of 1652 ( 0%)
Visited: Evolutionary Genetics
Saved.
Progress: 732 of 1652 ( 0%)
Visited: The Genotype/Phenotype Distin

Visited: Logic in Classical Indian Philosophy
Saved.
Progress: 845 of 1652 ( 1%)
Visited: Naturalism in Classical Indian Philosophy
Saved.
Progress: 846 of 1652 ( 1%)
Visited: Perceptual Experience and Concepts in Classical Indian Philosophy
Saved.
Progress: 847 of 1652 ( 1%)
Visited: Methodological Individualism
Saved.
Progress: 848 of 1652 ( 1%)
Visited: The Problem of Induction
Saved.
Progress: 849 of 1652 ( 1%)
Visited: Inductive Logic
Saved.
Progress: 850 of 1652 ( 1%)
Visited: Space and Time: Inertial Frames
Saved.
Progress: 851 of 1652 ( 1%)
Visited: Infinite Regress Arguments
Saved.
Progress: 852 of 1652 ( 1%)
Visited: Informal Logic
Saved.
Progress: 853 of 1652 ( 1%)
Visited: Information
Saved.
Progress: 854 of 1652 ( 1%)
Visited: Logic and Information
Saved.
Progress: 855 of 1652 ( 1%)
Visited: Quantum Entanglement and Information
Saved.
Progress: 856 of 1652 ( 1%)
Visited: Semantic Conceptions of Information
Saved.
Progress: 857 of 1652 ( 1%)
Visited: Information Technology 

Visited: Legal Punishment
Saved.
Progress: 963 of 1652 ( 1%)
Visited: Interpretation and Coherence in Legal Reasoning
Saved.
Progress: 964 of 1652 ( 1%)
Visited: Precedent and Analogy in Legal Reasoning
Saved.
Progress: 965 of 1652 ( 1%)
Visited: Legal Rights
Saved.
Progress: 966 of 1652 ( 1%)
Visited: Political Legitimacy
Saved.
Progress: 967 of 1652 ( 1%)
Visited: Antoine Le Grand
Saved.
Progress: 968 of 1652 ( 1%)
Visited: Gottfried Wilhelm Leibniz
Saved.
Progress: 969 of 1652 ( 1%)
Visited: Leibniz’s Ethics
Saved.
Progress: 970 of 1652 ( 1%)
Visited: Leibniz’s Exoteric Philosophy
Saved.
Progress: 971 of 1652 ( 1%)
Visited: Leibniz’s Influence on 19th Century Logic
Saved.
Progress: 972 of 1652 ( 1%)
Visited: Leibniz’s Modal Metaphysics
Saved.
Progress: 973 of 1652 ( 1%)
Visited: Leibniz on Causation
Saved.
Progress: 974 of 1652 ( 1%)
Visited: Leibniz on the Problem of Evil
Saved.
Progress: 975 of 1652 ( 1%)
Visited: Leibniz’s Philosophy of Mind
Saved.
Progress: 976 of 1652 ( 1%)
Vis

Visited: Philosophy of Mathematics
Saved.
Progress: 1088 of 1652 ( 1%)
Visited: Indispensability Arguments in the Philosophy of Mathematics
Saved.
Progress: 1089 of 1652 ( 1%)
Visited: Naturalism in the Philosophy of Mathematics
Saved.
Progress: 1090 of 1652 ( 1%)
Visited: Nominalism in the Philosophy of Mathematics
Saved.
Progress: 1091 of 1652 ( 1%)
Visited: Platonism in the Philosophy of Mathematics
Saved.
Progress: 1092 of 1652 ( 1%)
Visited: Wittgenstein’s Philosophy of Mathematics
Saved.
Progress: 1093 of 1652 ( 1%)
Visited: John M. E. McTaggart
Saved.
Progress: 1094 of 1652 ( 1%)
Visited: George Herbert Mead
Saved.
Progress: 1095 of 1652 ( 1%)
Visited: The Normativity of Meaning and Content
Saved.
Progress: 1096 of 1652 ( 1%)
Visited: Word Meaning
Saved.
Progress: 1097 of 1652 ( 1%)
Visited: Theories of Meaning
Saved.
Progress: 1098 of 1652 ( 1%)
Visited: Meaning Holism
Saved.
Progress: 1099 of 1652 ( 1%)
Visited: Treating Persons as Means
Saved.
Progress: 1100 of 1652 ( 1%)
Vis

Visited: Nicholas of Autrecourt
Saved.
Progress: 1207 of 1652 ( 1%)
Visited: Friedrich Nietzsche
Saved.
Progress: 1208 of 1652 ( 1%)
Visited: Nietzsche’s Life and Works
Saved.
Progress: 1209 of 1652 ( 1%)
Visited: Nietzsche’s Moral and Political Philosophy
Saved.
Progress: 1210 of 1652 ( 1%)
Visited: Nominalism in Metaphysics
Saved.
Progress: 1211 of 1652 ( 1%)
Visited: Nonexistent Objects
Saved.
Progress: 1212 of 1652 ( 1%)
Visited: The Nonidentity Problem
Saved.
Progress: 1213 of 1652 ( 1%)
Visited: Social Norms
Saved.
Progress: 1214 of 1652 ( 1%)
Visited: John Norris
Saved.
Progress: 1215 of 1652 ( 1%)
Visited: Nothingness
Saved.
Progress: 1216 of 1652 ( 1%)
Visited: Georg Friedrich Philipp von Hardenberg [Novalis]
Saved.
Progress: 1217 of 1652 ( 1%)
Visited: Robert Nozick’s Political Philosophy
Saved.
Progress: 1218 of 1652 ( 1%)
Visited: Numenius
Saved.
Progress: 1219 of 1652 ( 1%)
Visited: Michael Oakeshott
Saved.
Progress: 1220 of 1652 ( 1%)
Visited: Object
Saved.
Progress: 1221

Visited: Principle of Sufficient Reason
Saved.
Progress: 1331 of 1652 ( 1%)
Visited: Arthur Prior
Saved.
Progress: 1332 of 1652 ( 1%)
Visited: Prisoner’s Dilemma
Saved.
Progress: 1333 of 1652 ( 1%)
Visited: Privacy
Saved.
Progress: 1334 of 1652 ( 1%)
Visited: Private Language
Saved.
Progress: 1335 of 1652 ( 1%)
Visited: Imprecise Probabilities
Saved.
Progress: 1336 of 1652 ( 1%)
Visited: Probability in Medieval and Renaissance Philosophy
Saved.
Progress: 1337 of 1652 ( 1%)
Visited: Interpretations of Probability
Saved.
Progress: 1338 of 1652 ( 1%)
Visited: Process Philosophy
Saved.
Progress: 1339 of 1652 ( 1%)
Visited: Process Theism
Saved.
Progress: 1340 of 1652 ( 1%)
Visited: Proclus
Saved.
Progress: 1341 of 1652 ( 1%)
Visited: Progress
Saved.
Progress: 1342 of 1652 ( 1%)
Visited: Promises
Saved.
Progress: 1343 of 1652 ( 1%)
Visited: Proof Theory
Saved.
Progress: 1344 of 1652 ( 1%)
Visited: Intellectual Property
Saved.
Progress: 1345 of 1652 ( 1%)
Visited: Prophecy
Saved.
Progress: 1

Visited: Śāntarakṣita
Saved.
Progress: 1453 of 1652 ( 1%)
Visited: Śāntideva
Saved.
Progress: 1454 of 1652 ( 1%)
Visited: Sakya Paṇḍita [sa skya paṇ ḍi ta]
Saved.
Progress: 1455 of 1652 ( 1%)
Visited: Wesley Salmon
Saved.
Progress: 1456 of 1652 ( 1%)
Visited: George Santayana
Saved.
Progress: 1457 of 1652 ( 1%)
Visited: Jean-Paul Sartre
Saved.
Progress: 1458 of 1652 ( 1%)
Visited: Max Scheler
Saved.
Progress: 1459 of 1652 ( 1%)
Visited: Friedrich Wilhelm Joseph von Schelling
Saved.
Progress: 1460 of 1652 ( 1%)
Visited: Schema
Saved.
Progress: 1461 of 1652 ( 1%)
Visited: Friedrich Schiller
Saved.
Progress: 1462 of 1652 ( 1%)
Visited: August Wilhelm von Schlegel
Saved.
Progress: 1463 of 1652 ( 1%)
Visited: Friedrich Schlegel
Saved.
Progress: 1464 of 1652 ( 1%)
Visited: Friedrich Daniel Ernst Schleiermacher
Saved.
Progress: 1465 of 1652 ( 1%)
Visited: Moritz Schlick
Saved.
Progress: 1466 of 1652 ( 1%)
Visited: Carl Schmitt
Saved.
Progress: 1467 of 1652 ( 1%)
Visited: Gershom Scholem
Saved

Visited: Time Machines
Saved.
Progress: 1575 of 1652 ( 1%)
Visited: Time Travel
Saved.
Progress: 1576 of 1652 ( 1%)
Visited: Time Travel and Modern Physics
Saved.
Progress: 1577 of 1652 ( 1%)
Visited: Timon of Phlius
Saved.
Progress: 1578 of 1652 ( 1%)
Visited: Toleration
Saved.
Progress: 1579 of 1652 ( 1%)
Visited: Theories of the Common Law of Torts
Saved.
Progress: 1580 of 1652 ( 1%)
Visited: Torture
Saved.
Progress: 1581 of 1652 ( 1%)
Visited: Touch
Saved.
Progress: 1582 of 1652 ( 1%)
Visited: Transcendental Arguments
Saved.
Progress: 1583 of 1652 ( 1%)
Visited: Transcendentalism
Saved.
Progress: 1584 of 1652 ( 1%)
Visited: Trinity
Saved.
Progress: 1585 of 1652 ( 1%)
Visited: Trust
Saved.
Progress: 1586 of 1652 ( 1%)
Visited: Truth
Saved.
Progress: 1587 of 1652 ( 1%)
Visited: Axiomatic Theories of Truth
Saved.
Progress: 1588 of 1652 ( 1%)
Visited: The Coherence Theory of Truth
Saved.
Progress: 1589 of 1652 ( 1%)
Visited: The Correspondence Theory of Truth
Saved.
Progress: 1590 of 1