We will need the `requests` and `beautifulsoup4` packages:
```
$ pip install requests
$ pip install beautifulsoup4
```

In [1]:
import requests
from bs4 import BeautifulSoup

## Download the HTML

Requests lets us send HTTP requests. We pass a URL string to `requests.get()` and get back a response object, which holds the server's (the website's) response to the HTTP request.

In [2]:
url = 'http://www.fieldexperiments.com/papers/'
page = requests.get(url)

In [3]:
page

<Response [200]>

200 is the OK status response code, meaning success!
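
The status can also be checked in code rather than by eye. The sketch below is a minimal example, assuming a hypothetical helper named `fetch_html` and an arbitrary 10-second timeout (neither appears in the cells above). It relies on `raise_for_status()`, which raises `requests.HTTPError` for 4xx and 5xx responses.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its body as bytes.

    Hypothetical helper; the 10-second timeout is an arbitrary choice.
    """
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError if the status code is 4xx or 5xx.
    response.raise_for_status()
    return response.content
```

Calling `fetch_html(url)` would then play the role of `page.content`, with failed downloads surfacing as exceptions instead of being parsed as if they were the real page.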

In [None]:
page.content

## Parse the response content

In [None]:
soup = BeautifulSoup(page.content, "html.parser")
print(soup.prettify())

### Navigating the HTML

In [6]:
soup.title  

<title>Field Experiments</title>

In [7]:
soup.title.string

'Field Experiments'

We can build a list of all tags of a given type, e.g. 'p', 'a', or 'div', using `find_all`.

In [None]:
#soup.find_all('div')
soup.find_all('a')
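
`find_all` returns a list-like `ResultSet` of `Tag` objects, so the usual next step is to loop over it and pull out text or attributes. The sketch below runs on a small made-up HTML string (not the real page) and collects the `href` of each link. The name `demo_soup` is used so the notebook's own `soup` is left alone.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; not taken from the real page.
html = """
<html><body>
  <p>Intro</p>
  <a href="/papers/1">First paper</a>
  <a href="/papers/2">Second paper</a>
  <a>No link target</a>
</body></html>
"""

demo_soup = BeautifulSoup(html, "html.parser")

links = demo_soup.find_all("a")
# Tag.get returns None when the attribute is missing, so filter those out.
hrefs = [a.get("href") for a in links if a.get("href") is not None]
texts = [a.text for a in links]

print(hrefs)
print(texts)
```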

### Find element by 'id'

In [9]:
container = soup.find(id='accordion')

Let's get a list of all the elements with CSS class name 'panel'.

In [10]:
paperList = container.find_all('div', class_='panel')

In [None]:
paperList
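
One thing to watch for: `find` returns `None` when nothing matches, so a later `.find_all` call on the result would fail with an `AttributeError`. The sketch below shows a simple guard on made-up HTML that mimics the assumed layout (a container with id `accordion` holding `div` elements of class `panel`). The real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the assumed structure of the page.
html = """
<div id="accordion">
  <div class="panel"><a href="#">Paper A</a></div>
  <div class="panel"><a href="#">Paper B</a></div>
  <div class="other">Not a paper</div>
</div>
"""

demo_soup = BeautifulSoup(html, "html.parser")

demo_container = demo_soup.find(id="accordion")
if demo_container is None:
    # find() returns None when no element matches.
    raise ValueError("no element with id='accordion' found")

# class_ has a trailing underscore because class is a Python keyword.
panels = demo_container.find_all("div", class_="panel")
print(len(panels))

# Looking up an id that is not in the document gives None.
missing = demo_soup.find(id="does-not-exist")
print(missing)
```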

In [12]:
first = paperList[0]

First we get the title using the "a" tag.

In [13]:
title = first.find('a').text
title

"2020: A Summary Of Artefactual Field Experiments On Fieldexperiments.Com: The Who's, What's, Where's, And When's"

Next we can use `find_all` to get a list of authors.

In [None]:
first

In [15]:
authorsList = first.find_all(attrs={'name': 'citation_author'})
authorsList[0].attrs

{'name': 'citation_author', 'content': 'List John A'}

In [16]:
first_author = authorsList[0]['content']
first_author

'List John A'
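
The cells above keep only the first author. If every author is wanted, the same `find_all` result can be turned into a list of names. This is a sketch on a made-up panel. The `<meta name="citation_author" content="...">` layout is inferred from the `attrs` output shown above, and the author names are invented.

```python
from bs4 import BeautifulSoup

# Made-up panel; the meta tag layout is inferred from the attrs shown above.
html = """
<div class="panel">
  <a href="#">Some Paper Title</a>
  <meta name="citation_author" content="Doe Jane ">
  <meta name="citation_author" content="Roe Richard">
  <meta name="citation_publication_date" content="2020">
</div>
"""

panel = BeautifulSoup(html, "html.parser").find("div", class_="panel")

author_tags = panel.find_all(attrs={"name": "citation_author"})
# strip() removes stray whitespace such as a trailing space in the content.
all_authors = [tag["content"].strip() for tag in author_tags]

print(all_authors)
```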

Since we know each paper has only one publication date, we can use `find`.

In [17]:
yearMeta = first.find(attrs={'name': 'citation_publication_date'})
yearMeta.attrs

{'name': 'citation_publication_date', 'content': '2020'}

In [18]:
year = yearMeta['content']

In [19]:
d = []
for paper in paperList:
    # The first link in each panel holds the paper title.
    title = paper.find('a').text

    # Keep only the first listed author.
    authorsList = paper.find_all(attrs={'name': 'citation_author'})
    first_author = authorsList[0]['content']

    # Each paper has a single publication-date meta tag.
    yearMeta = paper.find(attrs={'name': 'citation_publication_date'})
    year = yearMeta['content']

    d.append(dict(title=title, first_author=first_author, year=year))

d

[{'title': "2020: A Summary Of Artefactual Field Experiments On Fieldexperiments.Com: The Who's, What's, Where's, And When's",
  'first_author': 'List John A',
  'year': '2020'},
 {'title': "2020: A Summary Of Framed Field Experiments On Fieldexperiments.Com: The Who's, What's Where's, And When's",
  'first_author': 'List John A',
  'year': '2020'},
 {'title': '2020 Summary Data Of Natural Field Experiments Published On Fieldexperiments.Com',
  'first_author': 'List John A',
  'year': '2020'},
 {'title': '2021 Summary Data Of Artefactual Field Experiments Published On Fieldexperiments.Com',
  'first_author': 'List John A',
  'year': '2022'},
 {'title': '2021 Summary Data Of Natural Field Experiments Published On Fieldexperiments.Com',
  'first_author': 'List John A',
  'year': ''},
 {'title': 'Academic Economists Behaving Badly? A Survey On Three Areas Of Unethical Behavior',
  'first_author': 'Bailey Charles ',
  'year': '2001'},
 {'title': 'Achievement Awards For High School Matricul
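
The loop above assumes every panel has at least one author tag and a date tag. If a panel lacked them, indexing an empty author list would raise `IndexError` and subscripting a `None` date tag would raise `TypeError`. A more defensive variant might look like the sketch below. The helpers `_content` and `parse_paper` are hypothetical names, and the panels are made up.

```python
from bs4 import BeautifulSoup


def _content(tag):
    """Return the stripped content attribute of a meta tag, or '' if absent."""
    if tag is None:
        return ""
    return tag.get("content", "").strip()


def parse_paper(panel):
    """Extract title, first author and year from one panel (hypothetical helper).

    Missing pieces become empty strings instead of raising an exception.
    """
    link = panel.find("a")
    return {
        "title": link.text.strip() if link is not None else "",
        "first_author": _content(panel.find(attrs={"name": "citation_author"})),
        "year": _content(panel.find(attrs={"name": "citation_publication_date"})),
    }


# Made-up panels for illustration; the second one has no metadata at all.
html = """
<div class="panel">
  <a href="#">Complete Paper</a>
  <meta name="citation_author" content="Doe Jane ">
  <meta name="citation_publication_date" content="2020">
</div>
<div class="panel"><a href="#">Paper Without Metadata</a></div>
"""

demo_panels = BeautifulSoup(html, "html.parser").find_all("div", class_="panel")
records = [parse_paper(p) for p in demo_panels]
print(records)
```

Stripping the values also removes stray whitespace, such as the trailing space visible in one of the author names in the output above.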

## Export to csv

Because the data is stored as a list of dictionaries, we can build a pandas DataFrame from it directly and then write it to a CSV file.


In [20]:
import pandas as pd

df = pd.DataFrame(d)
df

| | title | first_author | year |
|---|---|---|---|
| 0 | 2020: A Summary Of Artefactual Field Experimen... | List John A | 2020.0 |
| 1 | 2020: A Summary Of Framed Field Experiments On... | List John A | 2020.0 |
| 2 | 2020 Summary Data Of Natural Field Experiments... | List John A | 2020.0 |
| 3 | 2021 Summary Data Of Artefactual Field Experim... | List John A | 2022.0 |
| 4 | 2021 Summary Data Of Natural Field Experiments... | List John A | |
| 5 | Academic Economists Behaving Badly? A Survey O... | Bailey Charles | 2001.0 |
| 6 | Achievement Awards For High School Matriculati... | Angrist Joshua D | 2003.0 |
| 7 | Actions And Beliefs: Estimating Distribution-B... | Bellemare Charles | 2005.0 |
| 8 | Active Decisions And Pro-Social Behavior: A Fi... | Goette Lorenz | 2007.0 |
| 9 | A Dollar For Your Thoughts: Feedback-Condition... | Cabral Luis | 2015.0 |
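
The scraped `year` values are strings, and at least one of them is empty. If whole-number years are wanted, one option is sketched below on a small made-up frame (`demo_df` is not part of the notebook). `pd.to_numeric` converts the strings, and the nullable `Int64` dtype keeps missing years missing instead of forcing the column to floats.

```python
import pandas as pd

# Small made-up frame with the same columns as df above.
demo_df = pd.DataFrame(
    [
        {"title": "Paper A", "first_author": "Doe Jane", "year": "2020"},
        {"title": "Paper B", "first_author": "Roe Richard", "year": ""},
    ]
)

# With errors="coerce", unparseable or empty entries end up as NaN.
year_numeric = pd.to_numeric(demo_df["year"], errors="coerce")
# Int64 (capital I) is pandas' nullable integer dtype.
demo_df["year"] = year_numeric.astype("Int64")

print(demo_df.dtypes)
print(demo_df)
```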


In [22]:
import os

csvFilePath = os.path.join(os.getcwd(), 'fe_scrape.csv')
df.to_csv(csvFilePath, index=False)
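
For a small scrape, the standard library can write the same list of dictionaries without pandas. The sketch below uses `csv.DictWriter` on made-up records and writes to an in-memory buffer rather than `fe_scrape.csv`, purely so that it runs without touching the disk.

```python
import csv
import io

# Made-up records with the same keys as the dictionaries in d.
demo_records = [
    {"title": "Paper A, With Comma", "first_author": "Doe Jane", "year": "2020"},
    {"title": "Paper B", "first_author": "Roe Richard", "year": ""},
]

# An in-memory buffer stands in for a real file; with a real file, open it
# with newline="" as the csv module documentation recommends.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["title", "first_author", "year"])
writer.writeheader()
writer.writerows(demo_records)

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original records.
round_trip = list(csv.DictReader(io.StringIO(csv_text, newline="")))
```

Titles that contain commas are quoted automatically by the writer, which matters here because several paper titles do.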