In [1]:
import pandas as pd
import requests
import tweepy
import numpy as np
from nltk import sent_tokenize


def get_Guten(url):
    # retrieve the source text
    r = requests.get(url)
    r.encoding = 'utf-8'
    text = r.text
    return text

def clean_text(text):
    # remove line break notation
    text = text.replace("\r\n", ' ')
    # remove abbreviative periods and other unwanted periods and characters
    # this helps reduce character count so we get more tweetable quotes
    text = text.replace('_', '')
    text = text.replace('[', '').replace(']', '')
    text = text.replace('e.g.', 'eg')
    text = text.replace('i.e.', 'ie')
    text = text.replace('etc.', 'etc')
    text = text.replace('&c.', '&c')
    text = text.replace('viz.', 'viz')
    return text

def make_into_quotes(text, source):
    # make a list of quotes and clean them up
    quotes = sent_tokenize(text)
    # remove unnecessary spaces
    quotes = [x.strip() for x in quotes]
    # remove empty quotes
    quotes = list(filter(None, quotes))
    # cut out very short ones as they often have no real meaning
    quotes = [x for x in quotes if len(x) > 15]
    # remove the titles of sections & citation-type stuff
    quotes = [x for x in quotes if not x.isupper()]
    quotes = [x for x in quotes if not x.replace('the', '').replace('of', '').replace('and', '').replace('II', '').istitle()]
    quotes = [x for x in quotes if not set('Werke').issubset(x)]
    # remove oddities
    quotes = [x for x in quotes if x[0].isupper()]
    quotes = [x.replace('.', '') for x in quotes]
    quotes = [x for x in quotes if not x[-1].isupper()]
    # add the source
    quotes = [x+'\n- '+source for x in quotes]
    return quotes

In [66]:
# import different texts, cut out their front and end matter
hop1 = clean_text(get_Guten('http://www.gutenberg.org/files/51635/51635-0.txt'))[10545:-34150]
hop2 = clean_text(get_Guten('http://www.gutenberg.org/files/51636/51636-0.txt'))[4489:-42865]
hop3 = clean_text(get_Guten('http://www.gutenberg.org/files/58169/58169-0.txt'))[10068:-125524]
enc_logic = clean_text(get_Guten('http://www.gutenberg.org/files/55108/55108-0.txt'))[36755:-134712]

In [67]:
# turn these texts into quotes and assemble a list
hop1_quotes = make_into_quotes(hop1, 'HoP 1')
hop2_quotes = make_into_quotes(hop2, 'HoP 2')
hop3_quotes = make_into_quotes(hop3, 'HoP 3')
enc_logic_quotes = make_into_quotes(enc_logic, 'EnL')

master_q = hop1_quotes + hop2_quotes + hop3_quotes + enc_logic_quotes
master_q[0:10], len(master_q)


(['Since the History of Philosophy is to be the subject of these lectures, and to-day I am making my first appearance in this University, I hope you will allow me to say what satisfaction it gives me to take my place once more in an Academy of Learning at this particular time\n- HoP 1',
  'For the period seems to have been arrived at when Philosophy may again hope to receive some attention and love—this almost dead science may again raise its voice, and hope that the world which had become deaf to its teaching, may once more lend it an ear\n- HoP 1',
  'The necessities of the time have accorded to the petty interests of every-day life such overwhelming attention: the deep interests of actuality and the strife respecting these have engrossed all the powers and the forces of the mind—as also the necessary means—to so great an extent, that no place has been left to the higher inward life, the intellectual operations of a purer sort; and the better natures have thus been stunted in their g

In [68]:
# import the existing set of quotes and prepare it for merging
old = pd.read_csv('Original_Quote_sheet.csv')
old = old.drop('Unnamed: 0', axis=1)
old = old.rename(columns={'Select one from each column':'quotes'})
old = old.iloc[3:]
old['quotes'] = old['quotes'].str.capitalize()

In [69]:
# turn the list into a dataframe and weed out untweetabley-long quotes
quote_df = pd.DataFrame(master_q, columns=['quotes'])
quote_df = old.append(quote_df)
quote_df['length'] = quote_df['quotes'].str.len()
quote_tweetable = quote_df.loc[quote_df['length'] <= 240].copy()

quote_tweetable.head(), len(quote_tweetable), len(quote_df)

(                                              quotes  length
 3   Logic did not fare quite so badly as metaphysics    48.0
 4  Even the proofs of the existence of god are ci...   136.0
 5  The fact is that there no longer exists any in...   121.0
 6  [kantian philosophy] was a justification from ...   113.0
 7  There was seen the strange spectacle of a cult...   139.0,
 13210,
 17461)

In [70]:
# export csv for use by tweeter program
quote_tweetable.to_csv('Quote List.csv')

In [62]:
2850 + 40015


42865

In [65]:
hop2[-1000:-1]

'spirit. The revelation of God has not come to it as from an alien source. What we here consider so dry and abstract is concrete. “Such rubbish,” it is said, “as we consider when in our study we see philosophers dispute and argue, and settle things this way and that at will, are verbal abstractions only.” No, no; they are the deeds of the world-spirit, gentlemen, and therefore of fate. The philosophers are in so doing nearer to God than those nurtured upon spiritual crumbs; they read or write the orders as they receive them in the original; they are obliged to continue writing on. Philosophers are the initiated ones—those who have taken part in the advance which has been made into the inmost sanctuary; others have their particular interests—this dominion, these riches, this girl. Hundreds and thousands of years are required by the world-spirit to reach the point which we attain more quickly, because we have the advantage of having objects which are past and of dealing with abstraction.

In [None]:
hop1 = get_and_clean_Guten2('http://www.gutenberg.org/files/51635/51635-0.txt')[10520:-32300]
hop2 = get_and_clean_Guten2('http://www.gutenberg.org/files/51636/51636-0.txt')[4469:-40015]
hop3 = get_and_clean_Guten2('http://www.gutenberg.org/files/58169/58169-0.txt')[10056:-119754]

hop1_quotes = make_into_quotes2(hop1, 'HoP 1')
hop2_quotes = make_into_quotes2(hop2, 'HoP 2')
hop3_quotes = make_into_quotes2(hop3, 'HoP 3')

master_q = hop1_quotes + hop2_quotes + hop3_quotes
master_q[0:10], len(master_q)

In [None]:
hop1_quotes[-30:-1]

In [None]:
hop2_quotes[0:20]

In [None]:
hop3_quotes[0:20]

In [None]:
r = requests.get('http://www.gutenberg.org/files/55334/55334-0.txt')
r.encoding = 'utf-8'
opening = r.text[0:5000]
sent_tokenize(opening)


In [None]:
def get_and_clean_Guten2(url):
    # retrieve the source text
    r = requests.get(url)
    r.encoding = 'utf-8'
    text = r.text
    text = text.replace("\r\n", ' ')
    # remove unicode characters and other unnecessaries
    # text = text.replace('â\x80\x9d', ' ')
    # text = text.replace('â\x80\x94', '-')
    # text = text.replace('â\x80\x9c', '')
    # text = text.replace('â\x80\x99', '')
    # text = text.replace('â\x80\x98', '')
    # text = text.replace('\x86', '')
    # text = text.replace('Å\x93', '')
    # remove abbreviative periods and other unwanted periods and characters
    # text = re.sub(r".(?=[^)(]*\))", "", text)
    text = text.replace('_', '')
    text = text.replace('[', '').replace(']', '')
    text = text.replace('e.g.', 'eg')
    text = text.replace('i.e.', 'ie')
    text = text.replace('etc.', 'etc')
    text = text.replace('&c.', '&c')
    text = text.replace('viz.', 'viz')
    #text = text.replace('(', '').replace(')', '')
    return text

In [63]:
def make_into_quotes2(text, source):
    # make a list of quotes and clean them up
    quotes = sent_tokenize(text)
    # remove unnecessary spaces
    quotes = [x.strip() for x in quotes]
    # remove empty quotes
    quotes = list(filter(None, quotes))
    # cut out very short ones as they often have no real meaning
    quotes = [x for x in quotes if len(x) > 15]
    # remove the titles of sections & citation-type stuff
    quotes = [x for x in quotes if not x.isupper()]
    quotes = [x for x in quotes if not x.replace('the', '').replace('of', '').replace('and', '').replace('II', '').istitle()]
    quotes = [x for x in quotes if not set('Werke').issubset(x)]
    # remove oddities
    quotes = [x for x in quotes if x[0].isupper()]
    quotes = [x.replace('.', '') for x in quotes]
    quotes = [x for x in quotes if not x[-1].isupper()]
    # add the source
    quotes = [x+'\n- '+source for x in quotes]
    return quotes

In [47]:
enc_logic1 = get_and_clean_Guten2('http://www.gutenberg.org/files/55108/55108-0.txt')[36673:-133982]

In [48]:
enl_quotes1 = make_into_quotes2(enc_logic1, 'EnL')

In [49]:
enc_logic = get_and_clean_Guten('http://www.gutenberg.org/files/55108/55108-0.txt')[36673:-133982]
enc_logic = make_into_quotes(enc_logic, 'EnL')

In [51]:
len(enc_logic), len(enl_quotes1)

(4346, 4386)

In [57]:
enc_logic[-20:-1]

['This life which has returned to itself from the bias and finitude of cognition, and which by the activity of the notion has become identical with it, is the Speculative or Absolute Idea\n- EnL',
 'The Idea, as unity of the Subjective and Objective Idea, is the notion of the Idea,--a notion whose object  is the Idea as such, and for which the objective  is Idea,--an Object which embraces all characteristics in its unity\n- EnL',
 'This unity is consequently I the absolute and all truth, the Idea which thinks itself,--and here at least as a thinking or Logical Idea\n- EnL',
 'In cognition we had the idea in a biassed, one-sided shape\n- EnL',
 'The process of cognition has issued in the overthrow of this bias and the restoration of that unity, which as unity, and in its immediacy, is in the first instance the Idea of Life\n- EnL',
 'The defect of life lies in its being only the idea implicit or natural: whereas cognition is in an equally one-sided way the merely conscious idea, or the 

In [62]:
enl_quotes1[-80:-50]

['This is the right attitude of rational cognition\n- EnL',
 'Nullity and transitoriness constitute only the superficial features and not the real essence of the world\n- EnL',
 'That essence is the notion in posse and in esse: and thus the world is itself the idea\n- EnL',
 'All unsatisfied endeavour ceases, when we recognise that the final purpose of the world is accomplished no less than ever accomplishing itself\n- EnL',
 "Generally speaking, this is the man's way of looking; while the young imagine that the world is utterly sunk in wickedness, and that the first thing needful is a thorough transformation\n- EnL",
 'The religious mind, on the contrary, views the world as ruled by Divine Providence, and therefore correspondent with what it ought to be\n- EnL',
 "But this harmony between the 'is' and the 'ought to be' is not torpid and rigidly stationary\n- EnL",
 'Good, the final end of the world, has being, only while it constantly produces itself\n- EnL',
 'And the world of spirit