### Extracting and querying Einstein's quotes (from the Goodreads website)
> source: https://www.goodreads.com/author/quotes/9810.Albert_Einstein

In [1]:
from bs4 import BeautifulSoup as bs
import requests

In [2]:
def extract_quotes(url):
    
    html = requests.get(url).text
    soup = bs(html, "lxml")
    
    # div 'quote' 
        # > div 'quoteText'       # text
        # > div 'quoteFooter'
            # > div 'right'       # likes 

    # find_all is the current name of the older findAll alias
    quotes_div = soup.find_all('div', class_='quote')
    
    quotes = []
    
    for q in quotes_div:

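        # quoteText holds the quote on its first line, then the attribution;
        # keep the first line and collect the rest in `author` (unused)
        quote = q.find('div', class_='quoteText').text.strip()
        txt, *author = quote.split("\n")

        # number of likes: first token of the 'right' div text
        likes = q.find('div', class_='right').text.strip().split()[0]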
        
        
        quotes.append((txt, likes))

    return quotes
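
The two string operations inside the loop can be tried on their own, without any network access. This is a minimal sketch, assuming the stripped `quoteText` has the quote on its first line followed by the attribution, and that the `right` div's text starts with the like count. The helper name `parse_quote` and the raw sample strings are hypothetical; the quote and its count are taken from the table further down.

```python
def parse_quote(quote_text, likes_text):
    """Return (quote, likes) from raw div text. Hypothetical helper."""
    # First line is the quote; the remaining lines (attribution) are ignored.
    txt, *_attribution = quote_text.strip().split("\n")
    # The like count is assumed to be the first whitespace-separated token.
    likes = int(likes_text.strip().split()[0])
    return txt.strip(), likes


# Assumed shape of the raw text, not copied from the live page.
sample_text = (
    "\n  “Logic will get you from A to Z; imagination will get you everywhere.”\n"
    "  ―\n"
    "  Albert Einstein\n"
)
sample_likes = "\n  21621 likes\n"

print(parse_quote(sample_text, sample_likes))
```

Converting the count to `int` here would make the later `astype(int)` step unnecessary, at the cost of failing early if the page layout changes.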
    

In [3]:
%%time

# scrape the quotes from all 33 pages of Einstein's author page
page_quotes = []
base_url = 'https://www.goodreads.com/author/quotes/9810.Albert_Einstein?page='
for i in range(1, 34):
    url = base_url + str(i)
    txt = extract_quotes(url)
    page_quotes.append(txt)

CPU times: user 2.23 s, sys: 69.8 ms, total: 2.3 s
Wall time: 28.3 s
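
Most of the wall time is spent waiting on the 33 sequential requests. The same loop could be wrapped in a function that pauses between pages, as sketched below. The names `scrape_pages`, `delay` and `fake_extract` are hypothetical, and the one-second default is an assumption rather than a documented Goodreads limit. The extractor is passed in as an argument so the sketch runs offline.

```python
import time


def scrape_pages(extract, base_url, first_page, last_page, delay=1.0):
    """Call extract(url) for each page number, pausing between requests.

    `extract` is any callable that maps a URL to a list of quotes,
    for example the extract_quotes function defined earlier.
    """
    pages = []
    for i in range(first_page, last_page + 1):
        pages.append(extract(base_url + str(i)))
        if i < last_page:
            time.sleep(delay)  # wait before requesting the next page
    return pages


# Stand-in extractor so the sketch runs without network access.
def fake_extract(url):
    return [("quote from " + url, "1")]


result = scrape_pages(fake_extract, "https://example.com/quotes?page=", 1, 3, delay=0)
print(len(result))
```

With the real extractor the call would look like `scrape_pages(extract_quotes, base_url, 1, 33)`.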


In [4]:
# concatenate quotes from the different pages into one list
quotes = []
for q in page_quotes:
    quotes.extend(q)
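
An equivalent way to flatten the per-page lists is `itertools.chain.from_iterable` from the standard library. A small sketch with made-up data standing in for `page_quotes`:

```python
from itertools import chain

# Made-up stand-in for page_quotes: one list of (text, likes) tuples per page.
sample_pages = [
    [("quote a", "10"), ("quote b", "7")],
    [("quote c", "3")],
]

# chain.from_iterable walks the inner lists in order and yields their items.
flat = list(chain.from_iterable(sample_pages))
print(flat)
```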

In [5]:
# unpack the (text, likes) pairs into two parallel tuples
txts, likes = zip(*quotes)

---
We are done extracting the quotes from the Goodreads website. Next, let's load them into a pandas DataFrame so we can run some queries:

---

In [6]:
import pandas as pd
pd.set_option('display.max_colwidth', 140) # for better readability 

In [7]:
df = pd.DataFrame(columns=['likes', 'quote'])

In [8]:
df['likes'] = list(likes)
df['likes'] = df['likes'].astype(int)
df['quote'] = list(txts)
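
The DataFrame can also be built in one step from the list of `(text, likes)` tuples, which skips the separate unpacking. A sketch with made-up rows; `sample_quotes` and `sample_df` are illustrative names only:

```python
import pandas as pd

# Made-up stand-in for the scraped list of (text, likes) tuples.
sample_quotes = [("first quote", "120"), ("second quote", "45")]

# Each tuple becomes a row; name the columns in tuple order.
sample_df = pd.DataFrame(sample_quotes, columns=["quote", "likes"])
sample_df["likes"] = sample_df["likes"].astype(int)

# Reorder to match the column order used in this notebook.
sample_df = sample_df[["likes", "quote"]]
print(sample_df)
```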

### Run some queries

In [14]:
df[:10]

Unnamed: 0,likes,quote
0,101792,“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”
1,40326,“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
2,31956,“I am enough of an artist to draw freely upon my imagination. Imagination is more important than knowledge. Knowledge is limited. Imagin...
3,25794,"“If you can't explain it to a six year old, you don't understand it yourself.”"
4,21812,"“If you want your children to be intelligent, read them fairy tales. If you want them to be more intelligent, read them more fairy tales.”"
5,21621,“Logic will get you from A to Z; imagination will get you everywhere.”
6,18708,"“Life is like riding a bicycle. To keep your balance, you must keep moving.”"
7,15392,“Anyone who has never made a mistake has never tried anything new.”
8,11578,"“I speak to everyone in the same way, whether he is the garbage man or the president of the university.”"
9,8997,“When you are courting a nice girl an hour seems like a second. When you sit on a red-hot cinder a second seems like an hour. That's rel...


In [9]:
df.quote[2]

'“I am enough of an artist to draw freely upon my imagination. Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world.”'

In [10]:
# quotes that contain the word 'imagination'
df[df['quote'].str.contains('imagination')]

Unnamed: 0,likes,quote
2,31956,“I am enough of an artist to draw freely upon my imagination. Imagination is more important than knowledge. Knowledge is limited. Imagin...
5,21621,“Logic will get you from A to Z; imagination will get you everywhere.”
86,597,"“Imagination is more important than knowledge. For knowledge is limited to all we now know and understand, while imagination embraces th..."
122,318,“The true sign of intelligence is not knowledge but imagination.”
201,106,"“A society's competitive advantage will come not from how well its schools teach the multiplication and periodic tables, but from how we..."
245,76,"“I believe in intuition and inspiration. Imagination is more important than knowledge. For knowledge is limited, whereas imagination emb..."
267,63,“Your imagination is your preview of life’s coming attractions.”
362,24,“The true sign of intelligence is not knowledge but imagination. I have no special talent. I am only passionately curious.”
373,23,"“To invent something, all you need is imagination and a big pile of junk.”"
470,11,"“Beyond the realms of what we see, into the regions or the unexplored limited only by our imaginations.”"
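
Note that `str.contains` is case-sensitive by default, so a quote containing only the capitalized "Imagination" would be missed by the query above. Passing `case=False` matches both forms. A sketch on made-up rows (the texts are placeholders, not real quotes):

```python
import pandas as pd

# Placeholder rows for illustration only.
sample_df = pd.DataFrame({
    "likes": [30, 20, 10],
    "quote": ["Imagination first", "logic and imagination", "no match here"],
})

# Default: case-sensitive, so only the lowercase occurrence matches.
matches_default = sample_df[sample_df["quote"].str.contains("imagination")]

# case=False: matches "Imagination" and "imagination" alike.
matches_any_case = sample_df[sample_df["quote"].str.contains("imagination", case=False)]

print(len(matches_default), len(matches_any_case))
```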


In [11]:
# quotes with more than 7000 likes
df[df["likes"] > 7000]

Unnamed: 0,likes,quote
0,101792,“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”
1,40326,“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
2,31956,“I am enough of an artist to draw freely upon my imagination. Imagination is more important than knowledge. Knowledge is limited. Imagin...
3,25794,"“If you can't explain it to a six year old, you don't understand it yourself.”"
4,21812,"“If you want your children to be intelligent, read them fairy tales. If you want them to be more intelligent, read them more fairy tales.”"
5,21621,“Logic will get you from A to Z; imagination will get you everywhere.”
6,18708,"“Life is like riding a bicycle. To keep your balance, you must keep moving.”"
7,15392,“Anyone who has never made a mistake has never tried anything new.”
8,11578,"“I speak to everyone in the same way, whether he is the garbage man or the president of the university.”"
9,8997,“When you are courting a nice girl an hour seems like a second. When you sit on a red-hot cinder a second seems like an hour. That's rel...


In [13]:
# save to a tab-separated file (sep='\t'), despite the .csv extension
df.to_csv('einstein-quotes-goodreads.csv', sep='\t', index=False)
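
Because the file is written with tabs as separators, it has to be read back with the same `sep`. The sketch below round-trips a made-up frame through an in-memory buffer instead of a file, so it leaves nothing on disk; `sample_df`, `buffer` and `restored` are illustrative names.

```python
import io

import pandas as pd

# Made-up rows; the comma inside the text is harmless with a tab separator.
sample_df = pd.DataFrame({
    "likes": [120, 45],
    "quote": ["first, with a comma", "second"],
})

# Write tab-separated text to an in-memory buffer, then rewind it.
buffer = io.StringIO()
sample_df.to_csv(buffer, sep="\t", index=False)
buffer.seek(0)

# Reading requires the same separator that was used for writing.
restored = pd.read_csv(buffer, sep="\t")
print(restored)
```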

---

In [17]:
%%bash
whoami
date

Aziz
Thu Feb 25 19:21:50 EST 2016
