# <font color='violet'> Scrape Erowid for Narrative Experience Reports
    
Here, I'll build a dataframe from Erowid's "experience vault," a collection of thousands of narrative reports about psychoactive drugs. Using the model I created from more formal psychiatric studies, these narratives could be compared with ratings and reviews of prescription psych meds. 

In [19]:
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm
import pandas as pd

<font color='violet'> Explore the basic structure of the front page of the experience vault.
    
Get a list of drug names, then use that list to extract links associated with those names. The links can then be followed to find the narratives connected to each drug. 

In [2]:
front_url = 'https://erowid.org/experiences/exp_list.shtml'
front_page = requests.get(front_url)
front_soup = BeautifulSoup(front_page.content, "html.parser")
front_pretty = front_soup.prettify().splitlines()
front_pretty[:500]

['<html>',
 ' <head>',
 '  <title>',
 '   Complete Substance and Category List : Erowid Experience Vaults',
 '  </title>',
 '  <meta content="The full list of substances and categories covered by Erowid\'s collection of first-hand experience reports with psychoactive plants and drugs." name="description"/>',
 '  <meta content="Experience Report Vaults, trip reports, stories, descriptions" name="keywords"/>',
 '  <link href="/includes/general_default.css" rel="stylesheet" type="text/css"/>',
 '  <link href="includes/exp.css" rel="stylesheet" type="text/css"/>',
 '  <script src="/includes/javascript/jquery-3.2.1.min.js" type="text/javascript">',
 '  </script>',
 '  <script language="javascript" src="/includes/javascript/erowid_combobox_lib.js" type="text/javascript">',
 '  </script>',
 '  <script language="javascript" src="/includes/javascript/external/mobile-detect.min.js" type="text/javascript">',
 '  </script>',
 ' </head>',
 ' <body alink="#008080" bgcolor="#000000" link="#7777AA" te

It appears that, on a elements, the name attribute holds either a single letter of the alphabet or a drug name. I can get a list of all drugs by pulling the values of the name attributes into a list and then deleting the single letters "A," "B," "C," etc.

In [3]:
a_element_name = front_soup.find_all('a', attrs={"name": True})
drug_names = [result.attrs['name'] for result in a_element_name]
drug_names

['A',
 'AB-001',
 'AB-CHMINACA',
 'AB-FUBINACA',
 'Absinthe',
 'Acacia',
 'Acacia confusa',
 'Acacia maidenii',
 'Acacia phlebophylla',
 'Acepromazine',
 'Acetaminophen',
 'Acetildenafil',
 'Acetylfentanyl',
 'Aconitum napellus',
 'ADB-FUBINACA',
 'Adrafinil',
 'Adrenochrome',
 'AET',
 'AH-7921',
 'AL-LAD',
 'Albizia julibrissin',
 'Alcohol',
 'Alcohol - Beer/Wine',
 'Alcohol - Hard',
 'ALD-52',
 'ALEPH',
 'Aleph-4',
 'Allylescaline',
 'Aloes',
 'alpha-GPC',
 'alpha-PCYP',
 'alpha-PHiP',
 'alpha-PHP',
 'alpha-PVP',
 'AM-2201',
 'AM-DIPT',
 'Amanitas',
 'Amanitas - A. muscaria',
 'Amanitas - A. pantherina',
 'Amphetamines',
 'Amphetamines - Substituted',
 'AMT',
 'Anabolic Steroids',
 'Anadenanthera colubrina',
 'Anadenanthera peregrina',
 'Anadenanthera spp.',
 'Animals',
 'Animals - Black Widow Spider',
 'Animals - Fire Ants',
 'Animals - Frogs',
 'Aniracetam',
 'AP-238',
 'Argemone spp. ',
 'Armodafinil',
 'Arundo donax',
 'Arylcyclohexylamines',
 'Ashwagandha',
 'Aspirin',
 'Atropin
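
The cell above collects every name attribute but does not yet drop the index letters. A minimal sketch of that filtering step, assuming the index anchors are exactly the uppercase letters A to Z; the helper name drop_letter_anchors is hypothetical, and the sample list stands in for the scraped drug_names:

```python
import string

def drop_letter_anchors(names):
    """Remove single-letter index anchors ('A', 'B', ...) and trim stray whitespace."""
    letters = set(string.ascii_uppercase)
    return [name.strip() for name in names if name.strip() not in letters]

# Small sample in the same shape as the anchor names scraped above.
sample_names = ['A', 'AB-001', 'Absinthe', 'Argemone spp. ', 'B', 'Banisteriopsis caapi']
print(drop_letter_anchors(sample_names))
```

The strip() call also tidies entries such as 'Argemone spp. ', which carries a trailing space in the scraped list.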

In [4]:
# Get a list of just psychedelic drugs of interest to me. 
psychedelic_drugs = ['AET', 'AL-LAD', 'ALD-52', 'ALEPH', 'Aleph-4', 'Allylescaline',
                     'AMT', 'Arylcyclohexylamines', 'Ayahuasca', 'Banisteriopsis caapi', 
                     'BOD', 'BOH-2C-B', 'Bufotenin', 'Cacti - Mescaline-containing', 'DALT', 
                     'Deschloroketamine', 'DET', 'DiPT', 'DMT', 'DMT-Containing', 'DMXE', 
                     'DOB', 'DOC', 'DOET', 'DOF', 'DOI', 'DOIP', 'DOM', 'DON', 'DOPR', 'DPT', 
                     'EIPLA', 'EPT', 'Escaline', 'ETH-LAD', 'Fluorexetamine', 'H.B. Woodrose',
                     'Harmaline', 'Harmine', 'Herbal Ecstasy', 'HOT-17', 'HOT-2', 'HOT-7',
                     'Huasca Brew', 'Huasca Brew Group', 'Huasca Combo', 'Huasca Group', 'HXE',
                     'Iboga Alkaloid Group', 'Ibogaine', 'Isoproscaline', 'Ketamine', 'LSA',
                     'LSD', 'LSM-775', 'LSZ', 'MALT', 'MDA', 'MDAI', 'MDE', 'MDMA', 'MEM',
                     'Mescaline', 'MET', 'Methallylescaline', 'Methoxetamine', 
                     'Methoxpropamine', 'Mimosa ophthalmocentra', 'Mimosa spp.',
                     'Mimosa tenuiflora', 'MIPLA', 'MIPT', 'MMDA', 'MMDA-3a', 'MPT',
                     'Mushrooms', 'Mushrooms - G. spectabilis', 'Mushrooms - P. atlantis',
                     'Mushrooms - P. azurescens', 'Mushrooms - P. cubensis', 
                     'Mushrooms - P. cyanescens', 'Mushrooms - P. mexicana',
                     'Mushrooms - P. semilanceata', 'Mushrooms - P. subaeruginosa',
                     'Mushrooms - P. tampanensis', 'Mushrooms - P. weilii',
                     'Mushrooms - Panaeolus cyanescens', 'MXiPr', 'PCE', 'PCP', 'Peyote',
                     'Phenethylamine', 'Phenethylamines', 'Phenethylamines - Other',
                     'PIPT', 'Proscaline', 'Psilocin', 'Psilocybin', 'S-Ketamine',
                     'Tabernanthe iboga', 'TCB-2', 'Tetrahydroharmine', 'TMA', 'TMA-2', 
                     'TMA-6', 'Tryptamines - Substituted', '1B-LSD', '1cP-AL-LAD', '1cP-LSD',
                     '1F-LSD', '1P-ETH-LAD', '1P-LSD', '1V-LSD', "2'-Oxo-PCE",
                     '2-Fluorodeschloroketamine', '2-Me-DMT', '2C-B', '2C-B-Fly', '2C-C',
                     '2C-CN', '2C-D', '2C-E', '2C-EF', '2C-G-N', '2C-H', '2C-I', '2C-IP',
                     '2C-N', '2C-P', '2C-T', '2C-T-13', '2C-T-2', '2C-T-21', '2C-T-4', 
                     '2C-T-7', '2C-TFM', '3,4-MD-PCP', '3-Cl-PCP', '3-HO-PCE', '3-HO-PCP',
                     '3-Me-PCE', '3-Me-PCPy', '3-MEO-PCE', '3-MeO-PCMo', '3-MeO-PCP',
                     '3-Methyl-PCP', '3C-E', '3C-P', '3F-PCP', '4-AcO-DALT', '4-AcO-DET',
                     '4-AcO-DiPT', '4-AcO-DMT', '4-AcO-DPT', '4-AcO-EIPT', '4-AcO-EPT',
                     '4-AcO-MALT', '4-AcO-MET', '4-AcO-MiPT', '4-AcO-MPT', '4-HO-DET',
                     '4-HO-DiPT', '4-HO-DPT', '4-HO-EPT', '4-HO-MALT', '4-HO-MCPT', '4-HO-MET',
                     '4-HO-MiPT', '4-HO-MPT', '4-HO-PIPT', '4-MeO-DMT', '4-MeO-MiPT',
                     '4-MeO-PCP', '4-MTA', '4-PrO-DMT', '4C-D', '5-Chloro-AMT', '5-MeO-AET',
                     '5-MeO-AMT', '5-MeO-DALT', '5-MeO-DET', '5-MeO-DiPT', '5-MeO-DMT', 
                     '5-MeO-DPT', '5-MeO-EIPT', '5-MeO-MALT', '5-MeO-MET', '5-MeO-MIPT',
                     '5-MeO-PIPT', '5-MeO-TMT', '5-Methoxy-Tryptamine']
len(psychedelic_drugs)

191

Create a list of links associated with these drugs. The format is: 
https://erowid.org/experiences/subs/exp_<DRUG>.shtml

In [5]:
# Dashes and periods need to be deleted and spaces need to be replaced with underscores.
# A deleted dash between two spaces leaves a double space, so collapse those first.
psychedelic_drugs = [
    drug.replace('-', '').replace('.', '').replace('  ', '_').replace(' ', '_')
    for drug in psychedelic_drugs
]
psychedelic_drugs

['AET',
 'ALLAD',
 'ALD52',
 'ALEPH',
 'Aleph4',
 'Allylescaline',
 'AMT',
 'Arylcyclohexylamines',
 'Ayahuasca',
 'Banisteriopsis_caapi',
 'BOD',
 'BOH2CB',
 'Bufotenin',
 'Cacti_Mescalinecontaining',
 'DALT',
 'Deschloroketamine',
 'DET',
 'DiPT',
 'DMT',
 'DMTContaining',
 'DMXE',
 'DOB',
 'DOC',
 'DOET',
 'DOF',
 'DOI',
 'DOIP',
 'DOM',
 'DON',
 'DOPR',
 'DPT',
 'EIPLA',
 'EPT',
 'Escaline',
 'ETHLAD',
 'Fluorexetamine',
 'HB_Woodrose',
 'Harmaline',
 'Harmine',
 'Herbal_Ecstasy',
 'HOT17',
 'HOT2',
 'HOT7',
 'Huasca_Brew',
 'Huasca_Brew_Group',
 'Huasca_Combo',
 'Huasca_Group',
 'HXE',
 'Iboga_Alkaloid_Group',
 'Ibogaine',
 'Isoproscaline',
 'Ketamine',
 'LSA',
 'LSD',
 'LSM775',
 'LSZ',
 'MALT',
 'MDA',
 'MDAI',
 'MDE',
 'MDMA',
 'MEM',
 'Mescaline',
 'MET',
 'Methallylescaline',
 'Methoxetamine',
 'Methoxpropamine',
 'Mimosa_ophthalmocentra',
 'Mimosa_spp',
 'Mimosa_tenuiflora',
 'MIPLA',
 'MIPT',
 'MMDA',
 'MMDA3a',
 'MPT',
 'Mushrooms',
 'Mushrooms_G_spectabilis',
 'Mushrooms_

In [6]:
# Create strings for urls
drug_urls = []
for drug in psychedelic_drugs:
    drug_urls.append('https://erowid.org/experiences/subs/exp_' + drug + '.shtml')
drug_urls[:5]

['https://erowid.org/experiences/subs/exp_AET.shtml',
 'https://erowid.org/experiences/subs/exp_ALLAD.shtml',
 'https://erowid.org/experiences/subs/exp_ALD52.shtml',
 'https://erowid.org/experiences/subs/exp_ALEPH.shtml',
 'https://erowid.org/experiences/subs/exp_Aleph4.shtml']
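
The renaming and URL-building steps above can be sketched as one helper. This is only an illustration of the pattern inferred in this notebook (drop dashes and periods, turn spaces into underscores); the function name drug_url is hypothetical:

```python
def drug_url(name):
    """Build an experience-vault URL from a display name, using the pattern inferred above."""
    slug = name.replace('-', '').replace('.', '')
    # A removed dash between two spaces leaves a double space; collapse it first.
    slug = slug.replace('  ', '_').replace(' ', '_')
    return 'https://erowid.org/experiences/subs/exp_' + slug + '.shtml'

for name in ['AL-LAD', 'H.B. Woodrose', 'Cacti - Mescaline-containing']:
    print(drug_url(name))
```

Keeping the original display name and deriving the URL from it, rather than overwriting the list, leaves the readable names available for later use as labels.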

These work correctly. Navigate to each page and gather the link to "Show All" experience reports. 

<font color='violet'> Explore the structure of a drug's page.

In [7]:
aet_url = 'https://erowid.org/experiences/subs/exp_AET.shtml'
aet_page = requests.get(aet_url)
aet_soup = BeautifulSoup(aet_page.content, "html.parser")
aet_pretty = aet_soup.prettify().splitlines()
aet_pretty[:100]

['<html>',
 ' <head>',
 '  <title>',
 '   AET (also Alpha-ethyltryptamine; Monase) : Erowid Exp: Main Index',
 '  </title>',
 '  <meta content="A categorized index of first-person experiences with AET" name="description"/>',
 '  <meta content="Experience Report Vaults, trip reports, stories, descriptions" name="keywords"/>',
 '  <link href="/includes/general_default.css" rel="stylesheet" type="text/css"/>',
 '  <link href="includes/exp.css" rel="stylesheet" type="text/css"/>',
 '  <script src="/includes/javascript/jquery-3.2.1.min.js" type="text/javascript">',
 '  </script>',
 '  <script language="javascript" src="/includes/javascript/erowid_combobox_lib.js" type="text/javascript">',
 '  </script>',
 '  <script language="javascript" src="/includes/javascript/external/mobile-detect.min.js" type="text/javascript">',
 '  </script>',
 ' </head>',
 ' <body alink="#008080" bgcolor="#000000" link="#7777AA" text="#999977" vlink="#999999">',
 '  <table align="CENTER" border="0" cellpadding="0" 

The href for the link to all of a drug's reports will be inside the a element where the img alt text = "Show All Reports."

In [8]:
# Find the correct href on one of the pages
aet_soup.find('img', attrs={'alt':'Show New Reports'}).parent['href']

'/experiences/exp.cgi?New&S1=299'
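
The lookup above chains find, parent, and an href index, so it raises if any piece is missing. A minimal sketch of the same lookup that returns None instead; the helper name reports_href and the sample markup are hypothetical stand-ins for the real page:

```python
from bs4 import BeautifulSoup

def reports_href(html, alt_text='Show New Reports'):
    """Return the href of the link wrapping the image with the given alt text, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    img = soup.find('img', attrs={'alt': alt_text})
    if img is None:
        return None
    link = img.find_parent('a')
    return link.get('href') if link is not None else None

# Hypothetical markup that mimics the structure described above.
sample_html = ('<a href="/experiences/exp.cgi?New&amp;S1=299">'
               '<img alt="Show New Reports" src="x.gif"/></a>')
print(reports_href(sample_html))
print(reports_href('<p>no image here</p>'))
```

Returning None makes it easy to record which pages lacked the link without wrapping the whole lookup in a try block.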

In [9]:
# Collect all hrefs
vault_hrefs = []
bad_urls = []

for url in tqdm(drug_urls):
    # Some urls could be wrong if I didn't change the drug names properly. 
    try:
        drug_page = requests.get(url)
        drug_soup = BeautifulSoup(drug_page.content, "html.parser")
        href = drug_soup.find('img', attrs={'alt':'Show New Reports'}).parent['href']
        vault_hrefs.append(href)
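    # A failed request, a missing image (find returns None), or a missing href lands here.
    except (requests.RequestException, AttributeError, TypeError, KeyError):
        bad_urls.append(url)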

vault_hrefs[:5]

100%|██████████| 191/191 [07:43<00:00,  2.43s/it]


['/experiences/exp.cgi?New&S1=299',
 '/experiences/exp.cgi?New&S1=603',
 '/experiences/exp.cgi?New&S1=748',
 '/experiences/exp.cgi?New&S1=807',
 '/experiences/exp.cgi?New&S1=557']

In [10]:
len(bad_urls)

0
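
requests.get does not raise on an HTTP error status by default, so a missing page would only surface later as a failed find. Below is a minimal sketch of a fetch helper that checks the status, sets a timeout, and pauses between calls. The name fetch_html and the delay and timeout defaults are my own assumptions, not values taken from Erowid or from this notebook:

```python
import time

def fetch_html(session, url, delay=1.0, timeout=30):
    """Fetch one page through a requests-style session, pausing after each call."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()   # turn HTTP errors (404, 500, ...) into exceptions
    time.sleep(delay)             # be gentle with the server
    return response.content

# Typical use (assumes the requests package is installed):
#   import requests
#   with requests.Session() as session:
#       html = fetch_html(session, 'https://erowid.org/experiences/exp_list.shtml')
```

A shared Session can also reuse the underlying connection across the 191 requests, which may shorten the run.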

In [11]:
# Turn hrefs into proper urls
vault_urls = []
for href in vault_hrefs:
    vault_urls.append('https://erowid.org' + href)
vault_urls[:5]

['https://erowid.org/experiences/exp.cgi?New&S1=299',
 'https://erowid.org/experiences/exp.cgi?New&S1=603',
 'https://erowid.org/experiences/exp.cgi?New&S1=748',
 'https://erowid.org/experiences/exp.cgi?New&S1=807',
 'https://erowid.org/experiences/exp.cgi?New&S1=557']
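
String concatenation works here because every href starts with a slash. As a sketch of an alternative, urljoin from the standard library resolves both root-relative and page-relative hrefs against the page they came from; the helper name absolute_urls is hypothetical:

```python
from urllib.parse import urljoin

def absolute_urls(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]

# A root-relative href, as found on a drug's index page.
print(absolute_urls('https://erowid.org/experiences/subs/exp_AET.shtml',
                    ['/experiences/exp.cgi?New&S1=299']))
# A page-relative href, as found on a report listing page.
print(absolute_urls('https://erowid.org/experiences/exp.cgi?S1=321',
                    ['exp.php?ID=105518']))
```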

The pages in the vault_urls list are now just full of direct links to each experience report. Gather the links to all the reports. 

Check out how to do this using just one of the pages, for the drug 5-MeO-DALT.

In [12]:
dalt_vault_url = 'https://erowid.org/experiences/exp.cgi?S1=321'
dalt_page = requests.get(dalt_vault_url)
dalt_soup = BeautifulSoup(dalt_page.content, "html.parser")
dalt_pretty = dalt_soup.prettify().splitlines()
dalt_pretty[:200]

['<html>',
 ' <head>',
 '  <title>',
 '   Search Results : Erowid Experience Vaults',
 '  </title>',
 '  <meta content="Erowid Experience Vaults: An Experience" name="description"/>',
 '  <meta content="Experience Report Vaults, trip reports, stories, descriptions" name="keywords"/>',
 '  <link href="/includes/general_default.css" rel="stylesheet" type="text/css"/>',
 '  <link href="includes/exp.css" rel="stylesheet" type="text/css"/>',
 '  <!-- Sperowider <noindex/> -->',
 '  <script src="/includes/javascript/jquery-3.2.1.min.js" type="text/javascript">',
 '  </script>',
 '  <script language="javascript" src="/includes/javascript/erowid_combobox_lib.js" type="text/javascript">',
 '  </script>',
 '  <script language="javascript" src="/includes/javascript/external/mobile-detect.min.js" type="text/javascript">',
 '  </script>',
 ' </head>',
 ' <body alink="#008080" bgcolor="#000000" link="#7777AA" text="#999977" vlink="#999999">',
 '  <table align="CENTER" border="0" cellpadding="0" cell

There's only one mention of colspan=3; it's an attribute of a td tag for a table, and every single link inside the table (hrefs located inside a tags) is one that I want to collect. 

In [13]:
# Try with one page first
dalt_reports = dalt_soup.find('td', attrs={'colspan':3}).find_all('a')
dalt_hrefs = [a['href'] for a in dalt_reports]
dalt_hrefs[:5]

['exp.php?ID=105518',
 'exp.php?ID=86869',
 'exp.php?ID=86866',
 'exp.php?ID=37775',
 'exp.php?ID=35721']

In [14]:
# That worked. How many reports were linked on this one page?
len(dalt_hrefs)

71

In [16]:
# Work through all links in vault_urls to get all hrefs
report_hrefs = []
for url in tqdm(vault_urls):
    this_page = requests.get(url)
    this_soup = BeautifulSoup(this_page.content, "html.parser")
    this_reports = this_soup.find('td', attrs={'colspan':3}).find_all('a')
    report_hrefs.extend(a['href'] for a in this_reports)
report_hrefs[:5]

100%|██████████| 191/191 [09:11<00:00,  2.89s/it]


['exp.php?ID=116975',
 'exp.php?ID=116935',
 'exp.php?ID=116610',
 'exp.php?ID=35291',
 'exp.php?ID=87694']

In [17]:
# How many total reports are there?
# Count the list; len(href) would only measure the last href string (17 characters).
len(report_hrefs)

In [18]:
# Turn these hrefs into proper urls
report_urls = []
for href in report_hrefs:
    report_urls.append('https://erowid.org/experiences/' + href)
report_urls[:5]

['https://erowid.org/experiences/exp.php?ID=116975',
 'https://erowid.org/experiences/exp.php?ID=116935',
 'https://erowid.org/experiences/exp.php?ID=116610',
 'https://erowid.org/experiences/exp.php?ID=35291',
 'https://erowid.org/experiences/exp.php?ID=87694']
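
A report that involves more than one target drug may be linked from more than one listing page, so the same URL could appear several times. A minimal sketch of removing repeats while keeping first-seen order; unique_in_order is a hypothetical helper and the sample list is illustrative:

```python
def unique_in_order(items):
    """Drop repeated items while keeping the order in which they first appear."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))

sample_urls = [
    'https://erowid.org/experiences/exp.php?ID=116975',
    'https://erowid.org/experiences/exp.php?ID=116935',
    'https://erowid.org/experiences/exp.php?ID=116975',
]
print(unique_in_order(sample_urls))
```

Deduplicating before scraping avoids fetching the same report page twice.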

I now have a url for each experience report. 

Each experience report page could have information about drugs the person was on, the dose they took, their body weight, the year of their experience, their gender, age at time of experience, a title, an alias for the author, and the narrative itself.

I won't need all of this information to meet my primary objective of assigning a rating based on the narrative content, but it would be interesting to explore some of the other details as well, so I'll pull everything into a dataframe. 

<font color='violet'> Figure out how to turn each report page's contents into a row of a dataframe, with page elements as columns.
    
There are tables on these pages, but most of the information is not in a table. There may be more efficient ways, but I'll start out by just pulling various elements separately and then joining. 

In [37]:
# Try with just the first report url
trial_df = pd.read_html('https://erowid.org/experiences/exp.php?ID=116975')
trial_df

[                                                   0
 0  #message { background: #afbFf0; width: 610px; ...,
     0   1
 0 NaN NaN,
        0         1       2         3
 0  DOSE:  repeated  smoked       DMT
 1    NaN  repeated  smoked  Cannabis,
               0       1
 0  BODY WEIGHT:  102 kg,
                                                    0  \
 0                                     Exp Year: 2022   
 1                                       Gender: Male   
 2                      Age at time of experience: 22   
 3                            Published: Jan 28, 2023   
 4  [ View as PDF (for printing) ] [ View as LaTeX...   
 5  DMT (18), Cannabis (1) : First Times (2), Comb...   
 
                                                    1  
 0                                      ExpID: 116975  
 1                                                NaN  
 2                                                NaN  
 3                                          Views: 67  
 4  [ View as PDF (fo

In [38]:
# I want information out of dfs 2, 3, 4 
trial_df[2]

Unnamed: 0,0,1,2,3
0,DOSE:,repeated,smoked,DMT
1,,repeated,smoked,Cannabis


In [46]:
drugs_reviewed = trial_df[2][3].to_list()
drugs_reviewed

['DMT', 'Cannabis']

In [47]:
trial_df[3]

Unnamed: 0,0,1
0,BODY WEIGHT:,102 kg


In [63]:
weight = trial_df[3][1].to_list()
weight

['102 kg']
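
The positions 2 and 3 fit this report, but a page with a different set of tables could shift them. As a hedged sketch, the tables can instead be picked by the label in their top-left cell. The helper table_starting_with is hypothetical, and the frames below are hand-built copies of the output shown above, not a fresh scrape:

```python
import pandas as pd

def table_starting_with(tables, label):
    """Return the first DataFrame whose top-left cell equals label, else None."""
    for table in tables:
        if not table.empty and table.iloc[0, 0] == label:
            return table
    return None

# Hand-built stand-ins for the list that pd.read_html returned above.
tables = [
    pd.DataFrame([[float('nan'), float('nan')]]),
    pd.DataFrame([['DOSE:', 'repeated', 'smoked', 'DMT'],
                  [None, 'repeated', 'smoked', 'Cannabis']]),
    pd.DataFrame([['BODY WEIGHT:', '102 kg']]),
]
print(table_starting_with(tables, 'DOSE:')[3].to_list())
print(table_starting_with(tables, 'BODY WEIGHT:')[1].to_list())
```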

In [50]:
trial_df[4]

Unnamed: 0,0,1
0,Exp Year: 2022,ExpID: 116975
1,Gender: Male,
2,Age at time of experience: 22,
3,"Published: Jan 28, 2023",Views: 67
4,[ View as PDF (for printing) ] [ View as LaTeX...,[ View as PDF (for printing) ] [ View as LaTeX...
5,"DMT (18), Cannabis (1) : First Times (2), Comb...","DMT (18), Cannabis (1) : First Times (2), Comb..."


In [53]:
remaining_relevant_info = trial_df[4][0][0:3]
remaining_relevant_info

0                   Exp Year: 2022
1                     Gender: Male
2    Age at time of experience: 22
Name: 0, dtype: object

In [57]:
year = remaining_relevant_info[0]
year

'Exp Year: 2022'

In [61]:
gender = remaining_relevant_info[1]
gender

'Gender: Male'

In [62]:
age = remaining_relevant_info[2]
age

'Age at time of experience: 22'

I can clean these up while adding them to a dataframe, e.g. turn 'Exp Year: 2022' into just '2022'. Each report will get one row per drug it mentions, which duplicates the report the same way the reviews in the study came to me, but I can deal with that later. Here, for example, the Cannabis row will be deleted because Cannabis is not in my list of target drugs. 
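
A minimal sketch of that cleanup that splits on the first colon instead of naming each label. The helper names parse_labeled and parse_weight_kg are hypothetical, and treating any unit other than kg as unparsed is my assumption, since only a kg value appears in this notebook:

```python
def parse_labeled(text):
    """Split a 'Label: value' string into (label, value); value is '' when there is no colon."""
    label, _, value = text.partition(':')
    return label.strip(), value.strip()

def parse_weight_kg(text):
    """Convert a string such as '102 kg' to a float; return None for any other format."""
    parts = text.split()
    if len(parts) == 2 and parts[1].lower() == 'kg':
        try:
            return float(parts[0])
        except ValueError:
            return None
    return None

for raw in ['Exp Year: 2022', 'Gender: Male', 'Age at time of experience: 22']:
    print(parse_labeled(raw))
print(parse_weight_kg('102 kg'))
```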

In [67]:
trial_df = pd.DataFrame({'drug':drugs_reviewed, 'weight':weight[0], 
                         'year':year.replace('Exp Year: ', ''), 
                         'gender':gender.replace('Gender: ', ''), 
                         'age':age.replace('Age at time of experience: ', '')})
trial_df

Unnamed: 0,drug,weight,year,gender,age
0,DMT,102 kg,2022,Male,22
1,Cannabis,102 kg,2022,Male,22


In [77]:
# Pull the text of the report into the dataframe; it's inside a div.
trial_url = 'https://erowid.org/experiences/exp.php?ID=116975'
trial_page = requests.get(trial_url)
trial_soup = BeautifulSoup(trial_page.content, "html.parser")
trial_text = trial_soup.find('div', attrs={'class':'report-text-surround'}).get_text()
trial_text

"\n\n\xa0\n\n\n\n\nDOSE:\n\xa0 repeated\nsmoked\nDMT\n\n\n\xa0\n\xa0 repeated\nsmoked\nCannabis\n\n\n\n\n\nBODY WEIGHT:\n102 kg\n\n\n\n\nSeeing my Buddha-Nature on DMT \r\n\nI am a 22 year old male around 102kg. What I am about to tell you is my experience of using DMT for the first time. I took around 100-150mg of DMT about a month ago. I have no clue as to what the exact dosage is I have no clue as to what the exact dosage is, because I eventually started eyeballing it trying to take bigger dosages in my attempt to “break through”, which I believe was unsuccessful. My only other psychedelic experience is LSD which I tripped heavily on around a year ago, but I stopped completely a couple months before this experience. I am writing this from memory so all of the details may not be 100 percent accurate. \r\n\nThis trip happened about a month ago. I decided to try DMT for the first time because the guy I usually see had it and it tested clean. I also live in student accommodation and eve

Some information from the tables and a bunch of formatting characters remain. The narrative is not inside its own element, apart from the div that also contains all this other material, and I don't know of a better way to isolate it with Beautiful Soup right now, so I'll clean the strings later. 
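
One possible approach, sketched under the assumption that the dose and body-weight blocks are table elements nested inside the div: remove those tables with decompose() before calling get_text(), then collapse the whitespace. The helper name narrative_text and the sample markup are hypothetical; the real page may be structured differently.

```python
import re
from bs4 import BeautifulSoup

def narrative_text(html):
    """Return the report text with nested tables removed and whitespace collapsed."""
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', attrs={'class': 'report-text-surround'})
    if div is None:
        return None
    # Assumption: the dose and body-weight blocks are <table> elements inside the div.
    for table in div.find_all('table'):
        table.decompose()
    text = div.get_text(separator=' ')
    # Collapse newlines, carriage returns, and non-breaking spaces into single spaces.
    return re.sub(r'\s+', ' ', text).strip()

# Hypothetical markup that mimics the shape of the output above.
sample_html = (
    '<div class="report-text-surround">'
    '<table><tr><td>DOSE:</td><td>smoked</td><td>DMT</td></tr></table>'
    '<table><tr><td>BODY WEIGHT:</td><td>102 kg</td></tr></table>'
    'Report title\r\n\nFirst paragraph of the narrative.&nbsp;'
    '</div>'
)
print(narrative_text(sample_html))
```

Collapsing whitespace also removes paragraph breaks, so this fits best when the text is headed for a bag-of-words style model rather than for display.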

Add this review to the dataframe I started. 

In [78]:
trial_df['report'] = trial_text
trial_df

Unnamed: 0,drug,weight,year,gender,age,report
0,DMT,102 kg,2022,Male,22,\n\n \n\n\n\n\nDOSE:\n repeated\nsmoked\nDMT\...
1,Cannabis,102 kg,2022,Male,22,\n\n \n\n\n\n\nDOSE:\n repeated\nsmoked\nDMT\...


I can build from this if I streamline the process I used to create it. I can create other mini-dfs and concatenate them onto this one. 

<font color='violet'> Create a function for building a df from a drug report page.

In [81]:
# Put together all the steps I just took
def page_to_df(url):
    # Tables 2, 3, and 4 hold the dose, body weight, and year/gender/age details
    tables = pd.read_html(url)
    drugs_reviewed = tables[2][3].to_list()
    weight = tables[3][1].to_list()
    remaining_relevant_info = tables[4][0][0:3]
    year = remaining_relevant_info[0]
    gender = remaining_relevant_info[1]
    age = remaining_relevant_info[2]
    mini_df = pd.DataFrame({'drug':drugs_reviewed, 'weight':weight[0], 
                            'year':year.replace('Exp Year: ', ''), 
                            'gender':gender.replace('Gender: ', ''), 
                            'age':age.replace('Age at time of experience: ', '')})
    # Fetch the page passed in as url (not the trial page) to get the report text
    this_page = requests.get(url)
    this_soup = BeautifulSoup(this_page.content, "html.parser")
    this_text = this_soup.find('div', attrs={'class':'report-text-surround'}).get_text()
    mini_df['report'] = this_text
    return mini_df

# See if I get the same result as trial_df above
starter_df = page_to_df('https://erowid.org/experiences/exp.php?ID=116975')

That worked. 

<font color='violet'> Build a full dataframe

Start with what I already created, build from there.   

In [None]:
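# starter_df already holds the first report, so skip that url to avoid duplicating it
frames = [starter_df]

for url in tqdm(report_urls[1:]):
    frames.append(page_to_df(url))

# Concatenate once at the end rather than on every iteration
df = pd.concat(frames, ignore_index=True)

df.info()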

  0%|          | 0/36 [00:00<?, ?it/s]
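
Report pages may not all share the table layout of the first one, in which case a single bad page would stop the whole loop. A minimal sketch of a loop that records failures and keeps going; build_dataframe is a hypothetical helper that takes the page function as an argument, and the broad except is a deliberate choice for a first pass:

```python
import pandas as pd

def build_dataframe(urls, page_to_df):
    """Apply page_to_df to each URL; return the combined frame and the URLs that failed."""
    frames, failed = [], []
    for url in urls:
        try:
            frames.append(page_to_df(url))
        except Exception as error:   # broad on purpose: page layouts vary
            failed.append((url, repr(error)))
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, failed

# Usage with the names from this notebook:
#   df, failed = build_dataframe(report_urls, page_to_df)
```

The failed list pairs each URL with its error message, which makes it easier to see which layouts the page function still needs to handle.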