# Scraping texts and keywords to be used in the tarot deck

In this notebook, I compile divination texts from several sources to build a corpus for fine-tuning GPT-2. Several pre-processed datasets already exist, so I only need to scrape a few additional sources for variety.

In [180]:
#-----Imports-----#
#-----------------#
# standard DS
import pandas as pd

# Scraping
from bs4 import BeautifulSoup
import requests

# Formatting
import re

# Organizing
import os

## Tarot: Rider-Waite Deck

Kaggle really has it all - [this dataset](https://www.kaggle.com/lsind18/tarot-json) has images of the entire Rider-Waite deck as well as a JSON file with names, descriptions, interpretations, and more.

In [181]:
df = pd.read_json('data_files/tarot_rw.json', orient='records')
df = pd.json_normalize(df['cards'])

In [182]:
df.head()

| | name | number | arcana | suit | img | fortune_telling | keywords | Archetype | Hebrew Alphabet | Numerology | Elemental | Mythical/Spiritual | Questions to Ask | meanings.light | meanings.shadow | Astrology | Affirmation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Fool | 0 | Major Arcana | Trump | m00.jpg | [Watch for new projects and new beginnings, Pr... | [freedom, faith, inexperience, innocence] | The Divine Madman | Aleph/Ox/1 | 0 (off the scale; pure potential) | Air | Adam before the fall. Christ as a wandering ho... | [What would I do if I felt free to take a leap... | [Freeing yourself from limitation, Expressing ... | [Being gullible and naive, Taking unnecessary ... | | |
| 1 | The Magician | 1 | Major Arcana | Trump | m01.jpg | [A powerful man may play a role in your day, Y... | [capability, empowerment, activity] | The Ego/The Self | Beth/House/2 | 1 (origins, unity, seeds) | The Sun/Mercury | Thoth, the Egyptian god of wisdom, known to th... | [What am I empowered to do?, How might my abil... | [Taking appropriate action, Receiving guidance... | [Inflating your own ego, Abusing talents, Mani... | | |
| 2 | The High Priestess | 2 | Major Arcana | Trump | m02.jpg | [A mysterious woman arrives, A sexual secret m... | [intuition, reflection, purity, initiation] | The Virgin/The Maiden | Gimel/Camel/3 | 2 (division, debate, duality) | The Moon | The feminine aspect of divinity, particularity... | [What might a rebel against tradition do?, Wha... | [Listening to your feelings and intuitions, Ex... | [Being aloof, Obsessing on secrets and conspir... | | |
| 3 | The Empress | 3 | Major Arcana | Trump | m03.jpg | [Pregnancy is in the cards, An opportunity to ... | [fertility, productivity, ripeness, nurturing] | The Mother | Daleth/Door/4 | 3 (expression, productivity, output) | Venus | Gaia, Mother Earth, Ishtar, DemeterÑmature, re... | [What would a concerned and capable mother do?... | [Nurturing yourself and others, Bearing fruit,... | [Overindulging, Being greedy, Smothering someo... | | |
| 4 | The Emperor | 4 | Major Arcana | Trump | m04.jpg | [A father figure arrives, A new employer or au... | [authority, regulation, direction, structure] | The Father | He[as]/Window/5, or in some decks, Tzaddi/Fish... | 4 (stability, equality, persistence) | Mars/Aries | Masculine gods, including the Hebrew God, the ... | [How does the issue of control or regulation i... | [Exercising authority, Defining limits, Direct... | [Micromanaging, Crushing the creativity of oth... | | |


In [183]:
tarot_names = df['name'].tolist()
#tarot_names

In [184]:
tarot_keywords = [item for sublist in df.keywords.tolist() for item in sublist]
#tarot_keywords

In [185]:
tarot_fortunes = [item for sublist in df.fortune_telling.tolist() for item in sublist]
#tarot_fortunes

In [186]:
# dropna() removes missing archetypes; an identity check against np.nan is unreliable
tarot_atype = df['Archetype'].dropna().tolist()
#tarot_atype

In [187]:
questions = [item for sublist in df['Questions to Ask'].tolist() for item in sublist]
#questions

In [188]:
light = [item for sublist in df['meanings.light'].tolist() for item in sublist]
#light

In [189]:
dark = [item for sublist in df['meanings.shadow'].tolist() for item in sublist]
# dark
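
The same flatten-a-column pattern repeats in most of the cells above. Here is a minimal sketch of a reusable helper, assuming each column has already been converted with `.tolist()`. The name `flatten` and the sample data are illustrative and not part of the original notebook. Unlike the plain comprehension, it skips entries that are not lists (such as NaN) instead of raising an error.

```python
from itertools import chain


def flatten(nested):
    """Flatten a list of lists into one list, skipping non-list entries (e.g. NaN)."""
    return list(chain.from_iterable(row for row in nested if isinstance(row, list)))


# Illustrative stand-in for df.keywords.tolist()
sample_keywords = [
    ["freedom", "faith"],
    float("nan"),  # a card with no keywords recorded
    ["capability", "empowerment", "activity"],
]
flat_keywords = flatten(sample_keywords)
print(flat_keywords)
```

With a helper like this, each cell would reduce to a single call such as `flatten(df['meanings.light'].tolist())`.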

## Horoscopes

A GitHub user (dsnam, in the `markovscope` repo) had already scraped close to 13,000 horoscopes into a CSV file - perfect!

In [190]:
url = 'https://raw.githubusercontent.com/dsnam/markovscope/master/data/horoscopes.csv'
cols = ['horoscope', 'date', 'sign']

h = pd.read_csv(url, sep='|', names=cols)

In [191]:
horoscopes = h.horoscope.tolist()

In [192]:
# Use a distinct loop variable so the DataFrame `h` is not overwritten
for horoscope in horoscopes[:5]:
    print(horoscope, '\n')

You’re not the sort to play safe and even if you have been a bit more cautious than usual in recent weeks you will more than make up for it over the next few days. Plan your new adventure today and start working on it tomorrow. 

There is no such thing as something for nothing and if you do not quite believe that now you will believe it by the end of the day. If you lose something valuable accept it as the price you must pay to learn this important lesson. 

As the new moon falls in one of the more adventurous areas of your chart you will take the kind of risk you might usually steer clear of. No doubt it will surprise a few people, including yourself, when it pays off handsomely. 

You will hear something amazing today but can you believe it? If it sounds too good to be true then it might be wise to check it out. Commit too soon and you could find you have signed up for something that does you no good. 

A friend or colleague you have not seen for a while will come back into your life

## Fortune Cookies

### From a GitHub txt file

A GitHub user (reggi) published a text file with 250 fortune cookie fortunes.

In [194]:
url = 'https://raw.githubusercontent.com/reggi/fortune-cookie/master/fortune-cookies.txt'

cols = ['fortunes']

fc = pd.read_csv(url, sep='|', names=cols)

In [195]:
fortunes = fc.fortunes.tolist()

In [196]:
len(fortunes)

250

In [197]:
for f in fortunes[:5]:
    print(f,'\n')

With integrity and consistency -- your credits are piling up. 

Reach out your hand today to support others who need you. 

It is not the outside riches bit the inside ones that produce happiness. 

How dark is dark?, How wise is wise? 

We can admire all we see, but we can only pick one. 



### From the web

[This page](https://joshmadison.com/2008/04/20/fortune-cookie-fortunes/) contained several hundred fortune cookie fortunes that were very easily scraped. 

In [198]:
url = 'https://joshmadison.com/2008/04/20/fortune-cookie-fortunes/'

response = requests.get(url)
page = response.text
soup = BeautifulSoup(page, 'html.parser')

In [199]:
def remove_count(text):
    # Strip a trailing count such as " (2)"; any other text is returned unchanged
    return re.sub(r'\s*\(\d+\)$', '', text)

In [200]:
fortune_list = [remove_count(f.text) for f in soup.find('article').find_all('li')]
fortune_list[:5]

['A beautiful, smart, and loving person will be coming into your life.',
 'A dubious friend may be an enemy in camouflage.',
 'A faithful friend is a strong defense.',
 'A feather in the hand is better than a bird in the air.',
 'A fresh start will put you on your way.']

In [201]:
len(fortune_list)

365

### Combining the fortunes

Compiling the fortune cookie fortunes into one list -- making sure to remove any possible duplicates with `set`.

In [202]:
fortunes_all = list(set(fortunes + fortune_list))

In [203]:
len(fortunes_all)

593

In [204]:
fortunes_all[:5]

['Rest has a peaceful effect on your physical and emotional health.',
 'You will find great forces in unexpected places.',
 'You will learn something new every day.',
 'Your mentality is alert, practical, and analytical.',
 'If at first you do not succeed... try something harder.']
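
One caveat with `list(set(...))` is that the resulting order is arbitrary and can change between runs, which matters if anything later relies on list positions. Below is a minimal sketch of an order-preserving alternative. The helper name `dedupe` and the sample strings are illustrative, and stripping whitespace before comparing is an added assumption, not something the original cell does.

```python
def dedupe(items):
    """Remove duplicates while keeping first-seen order (dicts preserve insertion order)."""
    cleaned = (item.strip() for item in items)
    return list(dict.fromkeys(item for item in cleaned if item))


sample_a = [
    "A fresh start will put you on your way.",
    "You will learn something new every day.",
]
sample_b = [
    "You will learn something new every day. ",  # same fortune with a trailing space
    "A faithful friend is a strong defense.",
]
combined = dedupe(sample_a + sample_b)
print(combined)
```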

## Feelings

A GitHub user (lynneyun) published a list of feelings - perfect for adding keywords that could serve as card titles.

In [205]:
url = 'https://raw.githubusercontent.com/lynneyun/Electronic-Rituals/master/Oracle%20Cards/card_descriptions/feelings_list.txt'
cols = ['feels']

feelings = pd.read_csv(url, sep='|', names=cols)

In [206]:
feelings = feelings.feels.tolist()

In [207]:
feelings[:10]

['Achy',
 'Airy',
 'Blocked',
 'Breathless',
 'Bruised',
 'Burning',
 'Buzzy',
 'Clammy',
 'Clenched',
 'Cold']

## Magic: The Gathering

More keywords from a [Magic: The Gathering dataset on Kaggle](https://www.kaggle.com/mylesoneill/magic-the-gathering-cards).

In [208]:
url = 'https://storage.googleapis.com/kagglesdsdata/datasets/196/792844/Keywords.json?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20230411%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20230411T053325Z&X-Goog-Expires=259200&X-Goog-SignedHeaders=host&X-Goog-Signature=9cd23a8463e5a82200725b4ad2f3c93d0ec39be221c243f28e031940c3444a2c056ded0efc26241b341d288b14eabf3a99d0d013cefeb2f052d5e9b8310dec30db2dce1365e2627d47740d53a9df8d4e29f48435180008d866c2236735d4210ef5954a9c523a0ee6d349f9e0347f09496db137ed36e2cf2c6e4c8ac882216b051760f014d62b19b1ca4e0765d2377092547c20761bf50f80af47ae7a462f84e929cdd516763cfc296bb3e9d050285c457bcb9ff9bc25323ac643c63fdc3ec237c486dbc5cf0951aaca5bece8908937b9ae3b688232bc8d0ea8e9cc4a93fcc516a9bf364704dd5509f1e6c9eef4a115cc105fc39de7d1123c8adbbbb8274230c7'
# Note: this is a time-limited signed URL (X-Goog-Date=20230411, X-Goog-Expires=259200 seconds, i.e. 3 days),
# so a fresh link to Keywords.json is needed from the Kaggle dataset page when re-running this cell.

In [209]:
import urllib.request
import json

response = urllib.request.urlopen(url)
encoding = response.info().get_content_charset('utf8')
mtg = json.loads(response.read().decode(encoding))

In [210]:
abilities = mtg['abilityWords']
abilities[:10]

['adamant',
 'addendum',
 'battalion',
 'bloodrush',
 'channel',
 'chroma',
 'cohort',
 'constellation',
 'converge',
 "council's dilemma"]

In [211]:
kw_abilities = mtg['keywordAbilities']
kw_abilities[:10]

['absorb',
 'affinity',
 'afflict',
 'afterlife',
 'aftermath',
 'amplify',
 'annihilator',
 'ascend',
 'assist',
 'aura swap']

In [212]:
kw_actions = mtg['keywordActions']
kw_actions[:10]

['abandon',
 'activate',
 'adapt',
 'amass',
 'assemble',
 'attach',
 'bolster',
 'cast',
 'clash',
 'counter']

## Book of Revelation from the Bible

I was raised in the Bible Belt and spent a lot of time in a Southern Baptist mega-church. The book of Revelation has always fascinated me and feels appropriate to add to my corpus of divination texts. 

Source: http://www.readbibleonline.net/

In [213]:
urls = ['http://www.readbibleonline.net/?page_id=73',
       'http://www.readbibleonline.net/?page_id=272',
       'http://www.readbibleonline.net/?page_id=273',
       'http://www.readbibleonline.net/?page_id=274',
       'http://www.readbibleonline.net/?page_id=275'
       ]

In [214]:
def scrape_revelation(url_list):
    passages = []
    count = 1
    
    print("Starting scrape...\n")
    
    for url in url_list:
        response = requests.get(url)
        web_page = response.text
        soup = BeautifulSoup(web_page, 'html.parser')

        page = [p.text for p in soup.find_all('p')]
        start_idx = [page.index(i) for i in page if i.startswith("Revelation ")][0]
        end_idx = page.index('Top of page.')
        
        passage = page[start_idx+1: end_idx]
        passages.append(passage)
        
        print(f'\tPage {count} scraped!')
        count += 1
    
    print('')    
    print('Scrape complete!')
    
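    # Flatten the per-page paragraph lists and join them into one string
    p_flattened = [item for sublist in passages for item in sublist]
    rev_string = ' '.join(p_flattened)
    rev_string = rev_string.replace("Top of page.", "")
    # Strip newlines, carriage returns, and tabs
    rev_string = re.sub(r'[\n\r\t]', "", rev_string)
    # Split on verse numbers (raw string avoids the invalid-escape warning for \d)
    rev = re.split(r'\d+', rev_string)
    # Drop the text before the first verse number
    revelations = [r.strip() for r in rev][1:]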
    
    return revelations

In [215]:
rev = scrape_revelation(urls)

Starting scrape...

	Page 1 scraped!
	Page 2 scraped!
	Page 3 scraped!
	Page 4 scraped!
	Page 5 scraped!

Scrape complete!


In [216]:
len(rev)

408
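
The verse-splitting step inside `scrape_revelation` can be tried offline. This is a minimal sketch on placeholder text (not the scraped pages). The helper name `split_verses` is illustrative, and it drops empty pieces rather than slicing off the first element as the function above does. Note that any digits inside a verse would also cause a split.

```python
import re


def split_verses(text):
    """Split a passage on runs of digits (the verse numbers) and drop empty pieces."""
    text = re.sub(r"[\n\r\t]", "", text)
    pieces = re.split(r"\d+", text)
    return [piece.strip() for piece in pieces if piece.strip()]


# Placeholder text standing in for a scraped chapter
sample = "1 First verse text.\n2 Second verse text. 3 Third verse text."
verses = split_verses(sample)
print(verses)
```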

## I-Ching

The I-Ching is one of the oldest Chinese divination practices. [This site](http://the-iching.com/) has interpretations and meanings of the 64 hexagrams used in the practice. This source will give me a great set of keywords and 'fortunes'.

In [217]:
def scrape_i_ching(num_hexagrams):
    
    from collections import defaultdict
    
    d = defaultdict(list)
    
    name_regex = re.compile(r"\(.*\)")    
    adv_regex = re.compile(r'[\n\r\t]')   
    
    url_prefix = 'http://the-iching.com/hexagram_'
    
    print("Scraping i-ching...\n")
    
    for i in range(1, num_hexagrams+1):
        
        url = url_prefix + str(i)
        response = requests.get(url)
        web_page = response.text
        soup = BeautifulSoup(web_page, 'html.parser')
        
        name = name_regex.sub("", soup.find('h1', class_='hexagram_name').text)
        name = " ".join([word for word in name.split(" ") if word.isalpha()])   
        
        advice = adv_regex.sub("", soup.find('p', class_='iching_advise_text').text)
        
        text_raw = [p.text for p in soup.find_all('p', class_='iching_page_text')] 
        text = [adv_regex.sub(" ", t).strip() for t in text_raw[:4]]
        
        d[i] += [name, advice, text]
        
        print(f'\tHexagram {i} page scraped!')
        
    print('\nScrape complete!')
    return dict(d)
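
The name-cleaning step in `scrape_i_ching` is easy to check in isolation. This is a minimal sketch, assuming a heading shaped like the hypothetical sample below; the real page markup may differ. The helper name `clean_hexagram_name` is illustrative. Because only purely alphabetic words are kept, hyphenated or numbered words are dropped as well.

```python
import re

NAME_REGEX = re.compile(r"\(.*\)")  # same pattern as in scrape_i_ching


def clean_hexagram_name(raw):
    """Drop parenthesized text, then keep only purely alphabetic words."""
    without_parens = NAME_REGEX.sub("", raw)
    return " ".join(word for word in without_parens.split(" ") if word.isalpha())


# Hypothetical heading text; the actual heading format on the site is an assumption
sample_heading = "1. Force (qian) The Creative"
print(clean_hexagram_name(sample_heading))
```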

In [218]:
iching = scrape_i_ching(64)

Scraping i-ching...

	Hexagram 1 page scraped!
	Hexagram 2 page scraped!
	Hexagram 3 page scraped!
	Hexagram 4 page scraped!
	Hexagram 5 page scraped!
	Hexagram 6 page scraped!
	Hexagram 7 page scraped!
	Hexagram 8 page scraped!
	Hexagram 9 page scraped!
	Hexagram 10 page scraped!
	Hexagram 11 page scraped!
	Hexagram 12 page scraped!
	Hexagram 13 page scraped!
	Hexagram 14 page scraped!
	Hexagram 15 page scraped!
	Hexagram 16 page scraped!
	Hexagram 17 page scraped!
	Hexagram 18 page scraped!
	Hexagram 19 page scraped!
	Hexagram 20 page scraped!
	Hexagram 21 page scraped!
	Hexagram 22 page scraped!
	Hexagram 23 page scraped!
	Hexagram 24 page scraped!
	Hexagram 25 page scraped!
	Hexagram 26 page scraped!
	Hexagram 27 page scraped!
	Hexagram 28 page scraped!
	Hexagram 29 page scraped!
	Hexagram 30 page scraped!
	Hexagram 31 page scraped!
	Hexagram 32 page scraped!
	Hexagram 33 page scraped!
	Hexagram 34 page scraped!
	Hexagram 35 page scraped!
	Hexagram 36 page scraped!
	Hexagram 37 pag

In [219]:
iching[1]

['Force The Creative',
 'Life is endless sequence of changes. Try to evaluate energy, learn to acquire, accumulate and give, lose. Swallow your pride. Do not try to raise higher Heavens as everything will return to the Earth. The great is similar to the small.',
 ['The Creative works sublime success, Furthering through perseverance.',
  'The movement of heaven is full of power. Thus the superior man makes himself strong and untiring.',
  'There appears a flight of dragons without heads. Good fortune.',
  'It is beginning to everything. It is time to act in accordance with Higher Reason. Something started should be finished. Study to manage the creative process, be able to restrain and direct energy consciously. Do not think and reason about benefits. Do not reject joy and grief. Be constant and reserved in speech, careful and consistent in actions. Moving forward on the way to knowledge, improve your life, find new goals. Do not neglect trifles – the great consists of small things. Hav

In [220]:
# The first element of each entry is the hexagram name
iching_kw = [entry[0] for entry in iching.values()]

In [221]:
iching_kw

['Force The Creative',
 'Field The Receptive',
 'Sprouting Difficulty at the Beginning',
 'Enveloping Youthful Folly',
 'Attending Waiting',
 'Arguing Conflict',
 'Leading The Army',
 'Grouping Holding Together',
 'Small Accumulating Small Taming',
 'Treading',
 'Pervading Peace',
 'Obstruction Standstill',
 'Concording People Fellowship',
 'Great Possessing Great Possession',
 'Humbling Modesty',
 'Enthusiasm',
 'Following',
 'Corrupting Work on the Decayed',
 'Nearing Approach',
 'Viewing Contemplation',
 'Gnawing Bite Biting Through',
 'Adorning Grace',
 'Stripping Splitting Apart',
 'Returning Return',
 'Without Embroiling Innocence',
 'Great Accumulating Great Taming',
 'Swallowing Mouth Corners',
 'Great Exceeding Great Preponderance',
 'Gorge The Abysmal Water',
 'Radiance The Clinging',
 'Conjoining Influence',
 'Persevering Duration',
 'Retiring Retreat',
 'Great Invigorating Great Power',
 'Prospering Progress',
 'Brightness Hiding Darkening of the Light',
 'Dwelling People T

In [222]:
# The second element of each entry is the advice text
iching_fortunes = [entry[1] for entry in iching.values()]

In [223]:
iching_fortunes[:5]

['Life is endless sequence of changes. Try to evaluate energy, learn to acquire, accumulate and give, lose. Swallow your pride. Do not try to raise higher Heavens as everything will return to the Earth. The great is similar to the small.',
 'Benefit is in expecting changes. Only having realized necessity and inevitability of cataclysms during transition from one state to another, it is possible to man and overcome difficulties – stop dawdling and spinning the wheels.',
 'Benefit is in expecting changes. Only having realized necessity and inevitability of cataclysms during transition from one state to another, it is possible to man and overcome difficulties – stop dawdling and spinning the wheels.',
 'Ignorance is won by wisdom. Emptiness should be filled in. Nature stands no emptiness.',
 'Keep calm being in involuntary failure. Try to see no inauspicious where there is no it.']

In [224]:
iching_meanings_list = []

for f in iching.values():
    iching_meanings_list.append(f[2])

In [225]:
iching_meanings_list = [item for sublist in iching_meanings_list for item in sublist]

In [226]:
len(iching_meanings_list)

256

In [227]:
iching_combined = list(set(iching_fortunes + iching_meanings_list))

In [228]:
len(iching_combined)

317

## Putting it all together

Combining the texts above into two files - one for keywords and one for fortunes:

Keywords
- feelings 
- tarot keywords
- tarot names
- tarot archetypes
- Magic: The Gathering keywords
- i-ching keywords

Fortunes
- horoscopes
- fortune cookies
- revelations
- i-ching fortunes & meanings
- tarot fortunes
- tarot questions
- tarot light
- tarot dark

In [229]:
keywords = list(set(
    tarot_names + tarot_keywords + tarot_atype + feelings
    + abilities + kw_abilities + kw_actions + iching_kw
))

len(keywords)

922

In [230]:
fortunes = list(set(
    tarot_fortunes + questions + light + dark
    + horoscopes + fortunes_all + rev + iching_combined
))
len(fortunes)

15478

In [231]:
base = os.getcwd()

In [232]:
new_folder = os.path.join(base, 'data_files', 'texts')

In [233]:
# exist_ok=True avoids a FileExistsError when the folder is already there
os.makedirs(new_folder, exist_ok=True)

In [235]:
# with open("./data_files/texts/keywords.txt", mode='wt', encoding='utf-8') as myfile:
#     myfile.write('\n'.join(keywords))

In [236]:
# with open("./data_files/texts/fortunes.txt", mode='wt', encoding='utf-8') as myfile:
#     myfile.write('\n'.join(fortunes[1:]))
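
Since the files are written once and appended to later, it helps if every write ends with a newline; otherwise the first appended item lands on the same line as the last existing one. Below is a minimal sketch of one way to handle that. The helper name `write_lines` and the temporary paths are illustrative and not part of the original notebook.

```python
from pathlib import Path
import tempfile


def write_lines(path, lines, append=False):
    """Write one item per line, always ending with a newline so later appends start cleanly."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    mode = "a" if append else "w"
    with path.open(mode, encoding="utf-8") as handle:
        handle.writelines(line + "\n" for line in lines)


# Demonstration in a temporary directory rather than the project folder
demo_dir = Path(tempfile.mkdtemp())
demo_file = demo_dir / "texts" / "fortunes.txt"
write_lines(demo_file, ["first fortune", "second fortune"])
write_lines(demo_file, ["late addition"], append=True)
print(demo_file.read_text(encoding="utf-8").splitlines())
```

Items that themselves contain newlines would still span several lines in the output file, so they may need cleaning first.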

## Late Additions

After scraping the above texts, I found more tarot content as well as a site with a ton of Alan Watts quotes. I definitely want as much text as possible, and I think I'll appreciate the added flavor Alan Watts could add to the mix.

### More tarot interpretations

A GitHub user (sheoak) published a great dataset of tarot card meanings and interpretations (note: column names are in French).

In [252]:
url = 'https://raw.githubusercontent.com/sheoak/tarot-deck/master/export.csv'

In [253]:
df = pd.read_csv(url)

In [254]:
df.head()

| | id | label | legende | description | description_endroit | description_envers | est_arcane | ordre |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | The Magician | | A youthful figure in the robe of a magician, h... | Skill, diplomacy, address, subtlety; sickness,... | Physician, Magus, mental disease, disgrace, di... | 1 | 1 |
| 1 | 2 | The High Priestess | | She has the lunar crescent at her feet, a horn... | Secrets, mystery, the future as yet unrevealed... | Passion, moral or physical ardour, conceit, su... | 1 | 2 |
| 2 | 3 | The Empress | | A stately figure, seated, having rich vestment... | Fruitfulness, action, initiative, length of da... | Light, truth, the unravelling of involved matt... | 1 | 3 |
| 3 | 4 | The Emperor | | He has a form of the Crux ansata for his scept... | Stability, power, protection, realization; a g... | Benevolence, compassion, credit; also confusio... | 1 | 4 |
| 4 | 5 | The Hierophant | | He wears the triple crown and is seated betwee... | Marriage, alliance, captivity, servitude; by a... | Society, good understanding, concord, overkind... | 1 | 5 |


In [255]:
for d in df.description[:5]:
    print(d, '\n')

A youthful figure in the robe of a magician, having the countenance of divine Apollo, with smile of confidence and shining eyes. Above his head is the mysterious sign of the Holy Spirit, the sign of life, like an endless cord, forming the figure 8 in a horizontal position . About his waist is a serpent-cincture, the serpent appearing to devour its own tail. This is familiar to most as a conventional symbol of eternity, but here it indicates more especially the eternity of attainment in the spirit. In the Magician's right hand is a wand raised towards heaven, while the left hand is pointing to the earth. This dual sign is known in very high grades of the Instituted Mysteries; it shews the descent of grace, virtue and light, drawn from things above and derived to things below. The suggestion throughout is therefore the possession and communication of the Powers and Gifts of the Spirit. On the table in front of the Magician are the symbols of the four Tarot suits, signifying the elements 

In [257]:
kw_light = df.description_endroit.tolist()

In [258]:
kw_light[:5]

['Skill, diplomacy, address, subtlety; sickness, pain, loss, disaster, snares of enemies; self-confidence, will; the Querent, if male.',
 'Secrets, mystery, the future as yet unrevealed; the woman who interests the Querent, if male; the Querent herself, if female; silence, tenacity; mystery, wisdom, science.',
 'Fruitfulness, action, initiative, length of days; the unknown, clandestine; also difficulty, doubt, ignorance.',
 'Stability, power, protection, realization; a great person; aid, reason, conviction; also authority and will.',
 'Marriage, alliance, captivity, servitude; by another account, mercy and goodness; inspiration; the man to whom the Querent has recourse.']

In [259]:
len(kw_light)

78

In [260]:
kw_dark = df.description_envers.tolist()

In [261]:
kw_dark[:5]

['Physician, Magus, mental disease, disgrace, disquiet.',
 'Passion, moral or physical ardour, conceit, surface knowledge.',
 'Light, truth, the unravelling of involved matters, public rejoicings; according to another reading, vacillation.',
 'Benevolence, compassion, credit; also confusion to enemies, obstruction, immaturity.',
 'Society, good understanding, concord, overkindness, weakness.']

In [262]:
more_keywords = list(set(kw_light + kw_dark))

In [263]:
len(more_keywords)

156

In [264]:
# Note: set ordering is arbitrary, so index 1 only points at this entry for this particular run
more_keywords.pop(1)

'The card foretells material trouble above all, whether in the form illustrated--that is, destitution--or otherwise. For some cartomancists, it is a card of love and lovers-wife, husband, friend, mistress; also concordance, affinities. These alternatives cannot be harmonized.'

In [265]:
more_keywords

['Evil, suspicion, suspense, fear, mistrust.',
 'Destiny, fortune, success, elevation, luck, felicity.',
 'Love, passion, friendship, affinity, union, concord, sympathy, the interrelation of the sexes, and--as a suggestion apart from all offices of divination--that desire which is not in Nature, but by which Nature is sanctified.',
 'Economy, moderation, frugality, management, accommodation.',
 'Litigation, disputes, trickery, contradiction.',
 'Repose of the false heart, indignation, violence.',
 'House of the true heart, joy, content, abode, nourishment, abundance, fertility; Holy Table, felicity hereof.',
 'Degradation, destruction, revocation, infamy, dishonour, loss, with the variants and analogues of these.',
 'Mental alienation, error, loss, distraction, disorder, confusion.',
 "The card has been so designed that it can cover several significations; on the surface, it is a victor triumphing, but it is also great news, such as might be carried in state by the King's courier; it i

Both these keyword strings and the card descriptions will be added to the fortunes text.

In [266]:
tarot_additions = list(set(more_keywords + df.description.tolist()))
len(tarot_additions)

233

In [268]:
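# Relative path, matching the other cells; the leading '\n' keeps the first
# appended item from running into the last line already in the file
#with open("./data_files/texts/fortunes.txt", mode='a', encoding='utf-8') as myfile:
#    myfile.write('\n' + '\n'.join(tarot_additions))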

### Alan Watts Quotes

In [269]:
url = 'https://www.goodreads.com/author/quotes/1501668.Alan_W_Watts?page=1'

response = requests.get(url)
web_page = response.text
soup = BeautifulSoup(web_page, 'html.parser')

# Drop newlines, keep the text before the attribution bar (―),
# then strip the surrounding curly quotes
page = [d.text.replace('\n', '').split('―')[0].strip()[1:-1] 
        for d in soup.find_all('div', class_='quoteText')]

In [270]:
page[:5]

['Trying to define yourself is like trying to bite your own teeth.',
 'Man suffers only because he takes seriously what the gods made for fun.',
 'We seldom realize, for example that our most private thoughts and emotions are not actually our own. For we think in terms of languages and images which we did not invent, but which were given to us by our society.',
 'The meaning of life is just to be alive. It is so plain and so obvious and so simple. And yet, everybody rushes around in a great panic as if it were necessary to achieve something beyond themselves.',
 'This is the real secret of life -- to be completely engaged with what you are doing in the here and now. And instead of calling it work, realize it is play.']
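
The quote-cleaning expression does three things: removes newlines, keeps the text before the attribution bar, and strips the outer curly quotes. This is a minimal offline sketch of those steps. The helper name `clean_quote` and the sample string are illustrative; the sample only assumes what a scraped `quoteText` block looks like, and the bar is assumed to be the horizontal bar character U+2015.

```python
def clean_quote(raw):
    """Return the quote body: drop newlines, the attribution, and the outer quotes."""
    flattened = raw.replace("\n", "")
    body = flattened.split("\u2015")[0].strip()  # text before the attribution bar
    return body[1:-1]  # strip the opening and closing curly quotes


# Hypothetical raw text shaped like a scraped quote block
sample_raw = (
    "\n      \u201cMuddy water is best cleared by leaving it alone.\u201d\n"
    "  \u2015\n  Alan Watts\n"
)
print(clean_quote(sample_raw))
```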

In [271]:
def scrape_alan_watts(num_pages=38):
    
    quotes = []
    
    url_prefix = 'https://www.goodreads.com/author/quotes/1501668.Alan_W_Watts?page='
    
    print("Starting scrape...\n")
    
    for i in range(1, num_pages + 1):
        url = url_prefix + str(i)
        
        response = requests.get(url)
        web_page = response.text
        soup = BeautifulSoup(web_page, 'html.parser')
        
        # get page of quotes: drop newlines, keep the text before the
        # attribution bar, then strip the surrounding curly quotes
        page = [d.text.replace('\n', '').split('―')[0].strip()[1:-1] 
                for d in soup.find_all('div', class_='quoteText')]
        
        # extend keeps one flat list, so no flattening is needed afterwards
        quotes.extend(page)
        
        print(f'\tPage {i} scraped!')
        
    print('\nScrape complete!')
    
    return quotes

In [272]:
watts = scrape_alan_watts()

Starting scrape...

	Page 1 scraped!
	Page 2 scraped!
	Page 3 scraped!
	Page 4 scraped!
	Page 5 scraped!
	Page 6 scraped!
	Page 7 scraped!
	Page 8 scraped!
	Page 9 scraped!
	Page 10 scraped!
	Page 11 scraped!
	Page 12 scraped!
	Page 13 scraped!
	Page 14 scraped!
	Page 15 scraped!
	Page 16 scraped!
	Page 17 scraped!
	Page 18 scraped!
	Page 19 scraped!
	Page 20 scraped!
	Page 21 scraped!
	Page 22 scraped!
	Page 23 scraped!
	Page 24 scraped!
	Page 25 scraped!
	Page 26 scraped!
	Page 27 scraped!
	Page 28 scraped!
	Page 29 scraped!
	Page 30 scraped!
	Page 31 scraped!
	Page 32 scraped!
	Page 33 scraped!
	Page 34 scraped!
	Page 35 scraped!
	Page 36 scraped!
	Page 37 scraped!
	Page 38 scraped!

Scrape complete!


In [273]:
len(watts)

1140

In [274]:
watts[:10]

['Trying to define yourself is like trying to bite your own teeth.',
 'Man suffers only because he takes seriously what the gods made for fun.',
 'We seldom realize, for example that our most private thoughts and emotions are not actually our own. For we think in terms of languages and images which we did not invent, but which were given to us by our society.',
 'The meaning of life is just to be alive. It is so plain and so obvious and so simple. And yet, everybody rushes around in a great panic as if it were necessary to achieve something beyond themselves.',
 'This is the real secret of life -- to be completely engaged with what you are doing in the here and now. And instead of calling it work, realize it is play.',
 'Muddy water is best cleared by leaving it alone.',
 'Advice? I don’t have advice. Stop aspiring and start writing. If you’re writing, you’re a writer. Write like you’re a goddamn death row inmate and the governor is out of the country and there’s no chance for a pardon

In [277]:
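# The leading '\n' keeps the first quote from running into the last line already in the file
with open("./data_files/texts/fortunes.txt", mode='a', encoding='utf-8') as myfile:
    myfile.write('\n' + '\n'.join(watts))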