Scraping Wikipedia for the S&P 500

The Wikipedia page for the S&P 500 contains a list of all tickers currently in the index. After a look at the underlying HTML, I came up with the following Python script:

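import bs4 as bs
import requests

# Download the page and parse it with the lxml parser
resp = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = bs.BeautifulSoup(resp.text, 'lxml')
# Take the first table marked as a sortable wikitable
table = soup.find('table', {'class': 'wikitable sortable'})
# Skip the header row, then print the first cell of every remaining row
for row in table.find_all('tr')[1:]:
    ticker = row.find_all('td')[0].text.strip()  # strip the trailing newline
    print(ticker)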

This retrieves the first table and prints the text of the first cell of every row except the first one, which contains only the table header. To work with the tickers in Python, append them to a list instead of printing them and continue the analysis from there.
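Collecting the tickers into a list could look roughly like the sketch below. The helper name `extract_column` and the sample HTML are made up for illustration and are not real Wikipedia markup. The sketch uses the built-in 'html.parser' so it runs without lxml, though 'lxml' should work the same way here.

```python
import bs4 as bs


def extract_column(html, column=0):
    """Return the stripped text of one column from the first sortable wikitable."""
    soup = bs.BeautifulSoup(html, 'html.parser')
    table = soup.find('table', {'class': 'wikitable sortable'})
    if table is None:
        return []
    values = []
    for row in table.find_all('tr'):
        cells = row.find_all('td')
        # Header rows use <th> cells, so they yield no <td> and are skipped here.
        if len(cells) > column:
            values.append(cells[column].get_text(strip=True))
    return values


# Tiny hand-written stand-in for the downloaded page (hypothetical data).
sample_html = """
<table class="wikitable sortable">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>AAA</td><td>Example One</td></tr>
  <tr><td>BBB</td><td>Example Two</td></tr>
</table>
"""
tickers = extract_column(sample_html, column=0)
print(tickers)
```

With the real page you would pass `resp.text` from the script above instead of `sample_html`.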

Additional notes: The DAX 30

In case you have trouble adapting this script to other pages, here is one that scrapes the DAX 30 index.

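import bs4 as bs
import requests

# Download the DAX page and parse it with the lxml parser
resp = requests.get('https://en.wikipedia.org/wiki/DAX')
soup = bs.BeautifulSoup(resp.text, 'lxml')
# Take the first table marked as a sortable wikitable
table = soup.find('table', {'class': 'wikitable sortable'})
# Skip the header row, then print the fourth cell (index 3) of every remaining row
for row in table.find_all('tr')[1:]:
    ticker = row.find_all('td')[3].text.strip()  # strip the trailing newline
    print(ticker)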

On this page the tickers are in the fourth column, hence the index 3.
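Because the column position differs from page to page, one alternative is to find the column by its header text rather than by a fixed index. The sketch below is only an illustration: the function name `extract_by_header`, the header labels, and the sample rows are assumptions, so check the actual header text on the page you are scraping. It also uses a CSS selector, which still matches if the table carries additional classes.

```python
import bs4 as bs


def extract_by_header(html, header):
    """Return the cells under the column whose header text equals `header`."""
    soup = bs.BeautifulSoup(html, 'html.parser')
    # The CSS selector also matches tables that carry extra classes.
    table = soup.select_one('table.wikitable.sortable')
    if table is None:
        return []
    rows = table.find_all('tr')
    if not rows:
        return []
    # Read the column names from the <th> cells of the first row.
    headers = [th.get_text(strip=True) for th in rows[0].find_all('th')]
    if header not in headers:
        return []
    index = headers.index(header)
    values = []
    for row in rows[1:]:
        cells = row.find_all(['td', 'th'])
        if len(cells) > index:
            values.append(cells[index].get_text(strip=True))
    return values


# Hand-written stand-in for a page with the ticker in the fourth column
# (hypothetical headers and data).
sample_html = """
<table class="wikitable sortable extra-class">
  <tr><th>Logo</th><th>Company</th><th>Sector</th><th>Ticker</th></tr>
  <tr><td></td><td>Example AG</td><td>Industry</td><td>XXX</td></tr>
  <tr><td></td><td>Sample SE</td><td>Finance</td><td>YYY</td></tr>
</table>
"""
dax_tickers = extract_by_header(sample_html, 'Ticker')
print(dax_tickers)
```

The function returns an empty list when the table or the header is missing, so a renamed column shows up as an empty result rather than as an IndexError.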
