# Obtaining constituent lists of the S&P indices using Wikipedia

In this notebook, I will show how scraping data from Wikipedia can be used to obtain (historical) constituent lists of the S&P indices.

In [1]:
# First import the necessary libraries
import pandas as pd
import numpy as np

pd.set_option('display.max_rows', 150)

## Why and how to use Wikipedia to obtain (historical) constituent lists of the S&P indices?

Both researchers and practitioners (e.g., analysts) regularly use current or historical constituent lists of the S&P stock market indices, of which the S&P 500 index is the most important. According to Wikipedia, this index, maintained by S&P Dow Jones Indices, comprises (at the time of writing this notebook) 505 common stocks issued by 500 large-cap companies traded on American stock exchanges (including the 30 companies that make up the Dow Jones Industrial Average), and covers about 80 percent of the American equity market by capitalization. Until recently, the (Capital IQ) Compustat databases provided detailed information on individual companies as they moved in and out of the constituent lists over the years. Unfortunately, this is no longer possible, since the company behind the S&P indices has withdrawn its constituent data from these databases.

Wikipedia does, however, provide (some of) this information. The page https://en.wikipedia.org/wiki/List_of_S%26P_500_companies contains two tables about the constituents of the S&P 500 index: (a) "S&P 500 component stocks" and (b) "Selected changes to the list of S&P 500 components". The pages https://en.wikipedia.org/wiki/List_of_S%26P_400_companies and https://en.wikipedia.org/wiki/List_of_S%26P_600_companies contain similar tables for the S&P MidCap 400 and the S&P SmallCap 600 index, respectively, although the changes to the list of S&P 600 components are only recorded for a rather limited period.

As shown below, the contents of the tables on these pages can be scraped relatively easily with pandas' `read_html()` function, which returns a list of DataFrames, one per HTML table found on the page. On the S&P 500 page, the first table lists the current constituents, and the second table can be used to adapt this list if you want the constituents at another (historical) point in time. The S&P 400 and S&P 600 pages contain similar tables, but be aware that the column names and/or the order of the tables differ slightly.

Below, I first provide code that obtains the list of tickers of the constituents of the S&P 500 index and writes that list to a plain-text file (using pandas' CSV writer, but with a .txt extension). The code asks the user to input the date (format: YYYY-MM-DD) for which he/she would like to obtain the list, and automatically corrects for errors that *may* occur when firms have moved in *and* out of the list since the requested date. Next, I provide code to do the same for the S&P 400 and S&P 600 indices. Finally, I provide code that combines the three lists into the constituent list of the S&P 1500 index.
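The reasoning behind this adjustment can be sketched in plain Python: start from the current constituents and undo every change made after the requested date, working from the most recent change backwards. An addition is undone by dropping the ticker, a removal by restoring it. Undoing the changes newest-first also handles firms that moved out of the index and back in. This is a minimal sketch under assumptions: the helper name `constituents_on`, the `(date, added, removed)` tuple layout and the toy tickers are hypothetical and are not taken from the Wikipedia tables.

```python
from datetime import date

def constituents_on(current, changes, target):
    """Hypothetical helper: reconstruct the constituent set on `target`.

    current: iterable of tickers in the index today
    changes: iterable of (date, added_ticker, removed_ticker) tuples,
             where either ticker may be None
    target:  date for which the constituent set is requested
    """
    members = set(current)
    # Undo the changes made after `target`, newest first
    later = sorted((c for c in changes if c[0] > target),
                   key=lambda c: c[0], reverse=True)
    for _, added, removed in later:
        if added is not None:
            members.discard(added)   # not yet in the index on `target`
        if removed is not None:
            members.add(removed)     # still in the index on `target`
    return members

# Toy history (hypothetical tickers): XYZ left in March 2021 and rejoined in June 2021
current = {"AAA", "XYZ", "NEW"}
changes = [
    (date(2021, 3, 1), "NEW", "XYZ"),
    (date(2021, 6, 1), "XYZ", "OLD"),
]
print(sorted(constituents_on(current, changes, date(2020, 12, 31))))
# ['AAA', 'OLD', 'XYZ']
```

Because XYZ's first change after the requested date is a removal, it ends up in the reconstructed list, which is exactly the case the correction step is meant to catch.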

One common use of such lists is to obtain data on the companies in the list from the WRDS (Wharton Research Data Services) databases, such as the (Capital IQ) Compustat databases. When gathering data from these databases, you can upload the list and collect the requested data for all companies it contains. For this purpose, however, you need a plain text (.txt) file with one code (in this case, a ticker) per line. You can create such a file with pandas' `to_csv()` method.
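As a minimal sketch of such a file, the snippet below writes a small Series of tickers without a header and without the index, so that each line holds exactly one ticker. The tickers are placeholders, and the in-memory `io.StringIO` buffer (instead of a file path such as `'data/tickers.txt'`) is only used to keep the example self-contained.

```python
import io

import pandas as pd

tickers = pd.Series(["MMM", "ABT", "ABBV"], name="Symbol")

# header=False and index=False leave only the tickers, one per line
buffer = io.StringIO()
tickers.to_csv(buffer, header=False, index=False)
print(buffer.getvalue())
```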

## How to obtain (historical) constituent lists of the S&P 500 index using Wikipedia?

In [2]:
# Scrape the tables from the website, and print the first table
table_sp500 = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
table_sp500[0]

| | Symbol | Security | SEC filings | GICS Sector | GICS Sub-Industry | Headquarters Location | Date first added | CIK | Founded |
|---|---|---|---|---|---|---|---|---|---|
| 0 | MMM | 3M | reports | Industrials | Industrial Conglomerates | Saint Paul, Minnesota | 1976-08-09 | 66740 | 1902 |
| 1 | ABT | Abbott Laboratories | reports | Health Care | Health Care Equipment | North Chicago, Illinois | 1964-03-31 | 1800 | 1888 |
| 2 | ABBV | AbbVie | reports | Health Care | Pharmaceuticals | North Chicago, Illinois | 2012-12-31 | 1551152 | 2013 (1888) |
| 3 | ABMD | Abiomed | reports | Health Care | Health Care Equipment | Danvers, Massachusetts | 2018-05-31 | 815094 | 1981 |
| 4 | ACN | Accenture | reports | Information Technology | IT Consulting & Other Services | Dublin, Ireland | 2011-07-06 | 1467373 | 1989 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 500 | YUM | Yum! Brands | reports | Consumer Discretionary | Restaurants | Louisville, Kentucky | 1997-10-06 | 1041061 | 1997 |
| 501 | ZBRA | Zebra Technologies | reports | Information Technology | Electronic Equipment & Instruments | Lincolnshire, Illinois | 2019-12-23 | 877212 | 1969 |
| 502 | ZBH | Zimmer Biomet | reports | Health Care | Health Care Equipment | Warsaw, Indiana | 2001-08-07 | 1136869 | 1927 |
| 503 | ZION | Zions Bancorp | reports | Financials | Regional Banks | Salt Lake City, Utah | 2001-06-22 | 109380 | 1873 |


In [3]:
# Print the column names of the first table
table_sp500[0].columns

Index(['Symbol', 'Security', 'SEC filings', 'GICS Sector', 'GICS Sub-Industry',
       'Headquarters Location', 'Date first added', 'CIK', 'Founded'],
      dtype='object')

In [4]:
# Print (part of) the second table (to identify the rows that need to be adapted)
table_sp500[1].head(25)

| | Date | Added: Ticker | Added: Security | Removed: Ticker | Removed: Security | Reason |
|---|---|---|---|---|---|---|
| 0 | June 4, 2021 | | | HFC | HollyFrontier | Market capitalization change.[6] |
| 1 | June 3, 2021 | OGN | Organon & Co. | | | S&P 500/100 constituent Merck & Co. spun off O... |
| 2 | May 14, 2021 | CRL | Charles River Laboratories | FLIR | FLIR Systems | S&P 500 constituent Teledyne Technologies acqu... |
| 3 | April 20, 2021 | PTC | PTC | VAR | Varian Medical Systems | Siemens Healthineers acquired Varian Medical S... |
| 4 | March 22, 2021 | NXPI | NXP | FLS | Flowserve | Market capitalization change.[9] |
| 5 | March 22, 2021 | PENN | Penn National Gaming | SLG | SL Green Realty | Market capitalization change.[9] |
| 6 | March 22, 2021 | GNRC | Generac Holdings | XRX | Xerox | Market capitalization change.[9] |
| 7 | March 22, 2021 | CZR | Caesars Entertainment | VNT | Vontier | Market capitalization change.[9] |
| 8 | February 12, 2021 | MPWR | Monolithic Power Systems | FTI | TechnipFMC | TechnipFMC was removed from the S&P 500 in ant... |
| 9 | January 21, 2021 | TRMB | Trimble | CXO | Concho Resources | S&P 500/100 constituent ConocoPhillips acquire... |


In [5]:
# Print the column names of the second table
table_sp500[1].columns

MultiIndex([(   'Date',     'Date'),
            (  'Added',   'Ticker'),
            (  'Added', 'Security'),
            ('Removed',   'Ticker'),
            ('Removed', 'Security'),
            ( 'Reason',   'Reason')],
           )
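The two-level column labels such as `('Added', 'Ticker')` are cumbersome to type. One option, sketched below on a hypothetical one-row frame shaped like the scraped changes table, is to join the two levels into single strings and collapse identical levels such as `('Date', 'Date')` into `'Date'`. The joined names (e.g. `'Added Ticker'`) are my own choice, not Wikipedia's.

```python
import pandas as pd

# Toy frame with the same two-level column labels as the scraped changes table
columns = pd.MultiIndex.from_tuples([
    ("Date", "Date"), ("Added", "Ticker"), ("Added", "Security"),
    ("Removed", "Ticker"), ("Removed", "Security"), ("Reason", "Reason"),
])
changes = pd.DataFrame(
    [["June 4, 2021", None, None, "HFC", "HollyFrontier", "Market capitalization change."]],
    columns=columns,
)

# Join the two levels; keep a single name when both levels are identical
changes.columns = [
    top if top == bottom else f"{top} {bottom}" for top, bottom in changes.columns
]
print(list(changes.columns))
# ['Date', 'Added Ticker', 'Added Security', 'Removed Ticker', 'Removed Security', 'Reason']
```

After flattening, `changes['Added Ticker']` would replace `changes[('Added', 'Ticker')]`; the rest of this notebook keeps the tuple form.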

In [6]:
# Obtain the list of tickers of the constituents of the S&P 500 index on a particular date
from datetime import datetime

inp = input("Provide the date (format: YYYY-MM-DD) for which you would like to obtain the list: ")
cutoff = datetime.strptime(inp, '%Y-%m-%d')

# Parse the dates in the second table, and keep only the changes made after the inputted date
changes = table_sp500[1]
changes[('Date', 'Date')] = pd.to_datetime(changes[('Date', 'Date')])
later_changes = changes[changes[('Date', 'Date')] > cutoff]

# Tickers added/removed after the inputted date, in chronological order (oldest first)
added = later_changes[('Added', 'Ticker')].dropna().tolist()[::-1]
removed = later_changes[('Removed', 'Ticker')].dropna().tolist()[::-1]

# Adapt the list of tickers: drop tickers added after the date, restore tickers removed after it
current_list = list(table_sp500[0].Symbol)
remove = list(added)
print("Removed tickers (as these were added after the inputted date): ", remove)
add = list(removed)
print("Added tickers (as these were removed after the inputted date): ", add)
current_list.extend(add)
historical_list = [i for i in current_list if i not in remove]

# Check whether there is overlap between the 'remove' and 'add' lists
overlap = list(set(add).intersection(set(remove)))
print("Overlap: ", overlap)

# Correction for tickers that were in the index on the inputted date, but were then removed
# and added again: their first change after that date is a removal. Compare the dates of the
# changes, not positions in the two separately filtered lists, which need not line up.
correction = []
for item in overlap:
    first_added = later_changes.loc[later_changes[('Added', 'Ticker')] == item, ('Date', 'Date')].min()
    first_removed = later_changes.loc[later_changes[('Removed', 'Ticker')] == item, ('Date', 'Date')].min()
    if first_removed < first_added:
        print(item, "was removed on", first_removed.date(), "and added again on", first_added.date())
        correction.append(item)
historical_list.extend(correction)

# Convert the historical list into a Series, and print this Series (in alphabetical order)
lst_sp500 = pd.Series(historical_list, name='Symbol').sort_values()
print(lst_sp500)

Provide the date (format: YYYY-MM-DD) for which you would like to obtain the list: 2020-12-31
Removed tickers (as these were added after the inputted date):  ['ENPH', 'TRMB', 'MPWR', 'CZR', 'GNRC', 'PENN', 'NXPI', 'PTC', 'CRL', 'OGN']
Added tickers (as these were removed after the inputted date):  ['TIF', 'CXO', 'FTI', 'VNT', 'XRX', 'SLG', 'FLS', 'VAR', 'FLIR', 'HFC']
Overlap:  []
11        A
28      AAL
8       AAP
45     AAPL
2      ABBV
       ... 
490     YUM
492     ZBH
491    ZBRA
493    ZION
494     ZTS
Name: Symbol, Length: 505, dtype: object
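The cells for the three indices differ only in which table holds the current tickers and which holds the changes. A reusable helper could therefore take both as arguments. The sketch below is one possible design, not the method used in this notebook: the name `constituents_on_date` is hypothetical, it replays the changes newest-first instead of using the overlap correction, and it returns a de-duplicated, alphabetically sorted Series. It assumes the changes table has the two-level columns printed above and dates that `pd.to_datetime` can parse (footnote markers removed).

```python
import pandas as pd

def constituents_on_date(current_tickers, changes, date):
    """Hypothetical helper: reconstruct the constituents on `date`.

    current_tickers: tickers of today's constituents
    changes: changes table with the two-level columns ('Date', 'Date'),
             ('Added', 'Ticker') and ('Removed', 'Ticker')
    date: requested date, e.g. '2020-12-31'
    """
    cutoff = pd.Timestamp(date)
    # Raises on unparsable dates (e.g. leftover footnote markers)
    dates = pd.to_datetime(changes[("Date", "Date")])
    added_col = changes[("Added", "Ticker")]
    removed_col = changes[("Removed", "Ticker")]

    members = set(current_tickers)
    # Undo every change made after the cutoff, newest first
    for idx in dates[dates > cutoff].sort_values(ascending=False).index:
        if pd.notna(added_col.loc[idx]):
            members.discard(added_col.loc[idx])  # not yet in the index on `date`
        if pd.notna(removed_col.loc[idx]):
            members.add(removed_col.loc[idx])    # still in the index on `date`
    return pd.Series(sorted(members), name="Symbol")

# Toy changes table (hypothetical tickers): XYZ left in March and rejoined in June
columns = pd.MultiIndex.from_tuples(
    [("Date", "Date"), ("Added", "Ticker"), ("Removed", "Ticker")]
)
changes = pd.DataFrame(
    [
        ["June 1, 2021", "XYZ", "OLD"],
        ["March 1, 2021", "NEW", "XYZ"],
        ["February 1, 2021", "ONLY", None],
    ],
    columns=columns,
)
print(constituents_on_date(["AAA", "XYZ", "NEW", "ONLY"], changes, "2020-12-31").tolist())
# ['AAA', 'OLD', 'XYZ']
```

With the scraped tables, the calls would be `constituents_on_date(table_sp500[0]['Symbol'], table_sp500[1], inp)`, `constituents_on_date(table_sp400[0]['Ticker symbol'], table_sp400[1], inp)` and `constituents_on_date(table_sp600[1]['Ticker symbol'], table_sp600[2], inp)`.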


In [7]:
# Write the list of tickers to a plain-text file with one ticker per line (creating the 'data' folder if needed)
import os
os.makedirs('data', exist_ok=True)
lst_sp500.to_csv('data/lst_sp500.txt', header=False, index=False, sep=' ')

## How to obtain (historical) constituent lists of the S&P 400 index using Wikipedia?

In [8]:
# Scrape the tables from the website, and print the first table
table_sp400 = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_400_companies')
table_sp400[0]

| | Security | Ticker symbol | GICS Sector | GICS Sub-Industry | SEC filings |
|---|---|---|---|---|---|
| 0 | Acadia Healthcare | ACHC | Health Care | Health Care Facilities | reports |
| 1 | ACI Worldwide | ACIW | Information Technology | Application Software | reports |
| 2 | Adient plc | ADNT | Consumer Discretionary | Auto Parts & Equipment | reports |
| 3 | Adtalem Global Education | ATGE | Consumer Discretionary | Education Services | reports |
| 4 | AECOM | ACM | Industrials | Construction & Engineering | reports |
| ... | ... | ... | ... | ... | ... |
| 395 | Xerox | XRX | Information Technology | Technology Hardware, Storage & Peripherals | reports |
| 396 | Alleghany Corporation | Y | Financials | Reinsurance | reports |
| 397 | XPO Logistics | XPO | Industrials | Air Freight & Logistics | reports |
| 398 | Yelp | YELP | Communication Services | Interactive Media & Services | reports |


In [9]:
# Print the column names of the first table
table_sp400[0].columns

Index(['Security', 'Ticker symbol', 'GICS Sector', 'GICS Sub-Industry',
       'SEC filings'],
      dtype='object')

In [10]:
# Print (part of) the second table (to identify the rows that need to be adapted)
table_sp400[1].head(25)

| | Date | Added: Ticker | Added: Security | Removed: Ticker | Removed: Security | Reason |
|---|---|---|---|---|---|---|
| 0 | June 15, 2021 | ELY | Callaway Golf | GRUB | Grubhub | Just Eat Takeaway.com NV acquired GrubHub.[2] |
| 1 | June 9, 2021 | TRGP | Targa Resources | CLGX | CoreLogic | Stone Point Capital and Insight Partners acqui... |
| 2 | June 9, 2021 | ENV | Envestnet | TCF | TCF Financial Corporation | S&P 500 constituent Huntington Bancshares acqu... |
| 3 | June 4, 2021 | HFC | HollyFrontier | SVC | Service Properties Trust | Market capitalization change.[4] |
| 4 | June 2, 2021 | CROX | Crocs | CMD | Cantel Medical Corporation | S&P 500 constituent STERIS plc acquired Cantel... |
| 5 | May 17, 2021 | AZPN | Aspen Technology | AVNS | Avanos Medical | Market capitalization change.[6] |
| 6 | May 14, 2021 | NSA | National Storage Affiliates Trust | CRL | Charles River Laboratories | Charles River Laboratories replaced FLIR Syste... |
| 7 | May 7, 2021 | RCM | R1 RCM | PRSP | Perspecta Inc. | Veritas Capital acquired Perspecta.[7] |
| 8 | May 3, 2021 | G | Genpact | GNW | Genworth Financial | Market capitalization change.[8] |
| 9 | April 20, 2021 | LSCC | Lattice Semiconductor | PTC | PTC Inc. | PTC replaced Varian Medical Systems in the S&P... |


In [11]:
# Print the column names of the second table
table_sp400[1].columns

MultiIndex([(   'Date',     'Date'),
            (  'Added',   'Ticker'),
            (  'Added', 'Security'),
            ('Removed',   'Ticker'),
            ('Removed', 'Security'),
            ( 'Reason',   'Reason')],
           )

In [12]:
# Remove the footnote markers (e.g. '[5]') from some of the items in the ('Date', 'Date') column
variable_split = table_sp400[1][('Date', 'Date')].str.split('[')
table_sp400[1][('Date', 'Date')] = variable_split.str.get(0)
table_sp400[1].head(25)

| | Date | Added: Ticker | Added: Security | Removed: Ticker | Removed: Security | Reason |
|---|---|---|---|---|---|---|
| 0 | June 15, 2021 | ELY | Callaway Golf | GRUB | Grubhub | Just Eat Takeaway.com NV acquired GrubHub.[2] |
| 1 | June 9, 2021 | TRGP | Targa Resources | CLGX | CoreLogic | Stone Point Capital and Insight Partners acqui... |
| 2 | June 9, 2021 | ENV | Envestnet | TCF | TCF Financial Corporation | S&P 500 constituent Huntington Bancshares acqu... |
| 3 | June 4, 2021 | HFC | HollyFrontier | SVC | Service Properties Trust | Market capitalization change.[4] |
| 4 | June 2, 2021 | CROX | Crocs | CMD | Cantel Medical Corporation | S&P 500 constituent STERIS plc acquired Cantel... |
| 5 | May 17, 2021 | AZPN | Aspen Technology | AVNS | Avanos Medical | Market capitalization change.[6] |
| 6 | May 14, 2021 | NSA | National Storage Affiliates Trust | CRL | Charles River Laboratories | Charles River Laboratories replaced FLIR Syste... |
| 7 | May 7, 2021 | RCM | R1 RCM | PRSP | Perspecta Inc. | Veritas Capital acquired Perspecta.[7] |
| 8 | May 3, 2021 | G | Genpact | GNW | Genworth Financial | Market capitalization change.[8] |
| 9 | April 20, 2021 | LSCC | Lattice Semiconductor | PTC | PTC Inc. | PTC replaced Varian Medical Systems in the S&P... |
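Splitting on `'['` works because the footnote markers follow the date. A slightly more defensive variant, sketched below on hypothetical sample strings, removes any bracketed marker with a regular expression, strips stray whitespace, and uses `pd.to_datetime(..., errors='coerce')` so that entries that still cannot be parsed become `NaT` and can be inspected by hand instead of stopping the conversion. The explicit format `'%B %d, %Y'` is an assumption based on the dates shown above.

```python
import pandas as pd

raw = pd.Series(["June 15, 2021", "June 9, 2021[3]", "May 3, 2021 [8][9]", "n/a"])

# Drop bracketed footnote markers such as '[3]', then surrounding whitespace
cleaned = raw.str.replace(r"\[[^\]]*\]", "", regex=True).str.strip()

# Unparsable entries become NaT instead of raising an error
dates = pd.to_datetime(cleaned, format="%B %d, %Y", errors="coerce")
print(dates[dates.isna()].index.tolist())  # rows to inspect by hand
# [3]
```

Rows that turn into `NaT` are silently excluded by a `> cutoff` comparison, so they should be checked before the list is reconstructed.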


In [13]:
# Obtain the list of tickers of the constituents of the S&P 400 index on a particular date
from datetime import datetime

inp = input("Provide the date (format: YYYY-MM-DD) for which you would like to obtain the list: ")
cutoff = datetime.strptime(inp, '%Y-%m-%d')

# Parse the dates in the second table (footnote markers were removed in the previous cell),
# and keep only the changes made after the inputted date
changes = table_sp400[1]
changes[('Date', 'Date')] = pd.to_datetime(changes[('Date', 'Date')])
later_changes = changes[changes[('Date', 'Date')] > cutoff]

# Tickers added/removed after the inputted date, in chronological order (oldest first)
added = later_changes[('Added', 'Ticker')].dropna().tolist()[::-1]
removed = later_changes[('Removed', 'Ticker')].dropna().tolist()[::-1]

# Adapt the list of tickers: drop tickers added after the date, restore tickers removed after it
current_list = list(table_sp400[0]['Ticker symbol'])
remove = list(added)
print("Removed tickers (as these were added after the inputted date): ", remove)
add = list(removed)
print("Added tickers (as these were removed after the inputted date): ", add)
current_list.extend(add)
historical_list = [i for i in current_list if i not in remove]

# Check whether there is overlap between the 'remove' and 'add' lists
overlap = list(set(add).intersection(set(remove)))
print("Overlap: ", overlap)

# Correction for tickers that were in the index on the inputted date, but were then removed
# and added again: their first change after that date is a removal. Compare the dates of the
# changes, not positions in the two separately filtered lists, which need not line up.
correction = []
for item in overlap:
    first_added = later_changes.loc[later_changes[('Added', 'Ticker')] == item, ('Date', 'Date')].min()
    first_removed = later_changes.loc[later_changes[('Removed', 'Ticker')] == item, ('Date', 'Date')].min()
    if first_removed < first_added:
        print(item, "was removed on", first_removed.date(), "and added again on", first_added.date())
        correction.append(item)
historical_list.extend(correction)

# Convert the historical list into a Series, and print this Series (in alphabetical order)
lst_sp400 = pd.Series(historical_list, name='Symbol').sort_values()
print(lst_sp400)

Provide the date (format: YYYY-MM-DD) for which you would like to obtain the list: 2020-12-31
Removed tickers (as these were added after the inputted date):  ['BRKS', 'CPRI', 'YETI', 'STAA', 'IRDM', 'AMKR', 'CLF', 'VNT', 'XRX', 'SLG', 'FLS', 'NBIX', 'NVST', 'PGNY', 'LSCC', 'G', 'RCM', 'NSA', 'AZPN', 'CROX', 'HFC', 'ENV', 'TRGP', 'ELY']
Added tickers (as these were removed after the inputted date):  ['WPX', 'ENPH', 'TRMB', 'PBH', 'MPWR', 'HNI', 'EV', 'CZR', 'GNRC', 'PENN', 'EPC', 'OI', 'IDCC', 'UFS', 'PTC', 'GNW', 'PRSP', 'CRL', 'AVNS', 'CMD', 'SVC', 'TCF', 'CLGX', 'GRUB']
Overlap:  []
7       ACC
0      ACHC
1      ACIW
4       ACM
2      ADNT
       ... 
372       X
45      XEC
374     XPO
373       Y
375    YELP
Name: Symbol, Length: 400, dtype: object


In [14]:
# Write the list of tickers to a plain-text file with one ticker per line (creating the 'data' folder if needed)
import os
os.makedirs('data', exist_ok=True)
lst_sp400.to_csv('data/lst_sp400.txt', header=False, index=False, sep=' ')

## How to obtain (historical) constituent lists of the S&P 600 index using Wikipedia?

In [15]:
# Scrape the tables from the website, and print the table with the current constituents (on this page, the second table, i.e. index 1)
table_sp600 = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_600_companies')
table_sp600[1]

| | Company | Ticker symbol | GICS Sector | GICS Sub-Industry | SEC filings | CIK |
|---|---|---|---|---|---|---|
| 0 | The Aaron's Company | AAN | Consumer Discretionary | Homefurnishing Retail | view | 1821393 |
| 1 | Applied Optoelectronics | AAOI | Information Technology | Communications Equipment | view | 1158114 |
| 2 | AAON, Inc. | AAON | Industrials | Building Products | view | 824142 |
| 3 | American Assets Trust | AAT | Real Estate | Diversified REITs | view | 1500217 |
| 4 | Atlas Air Worldwide Holdings | AAWW | Industrials | Air Freight & Logistics | view | 1135185 |
| ... | ... | ... | ... | ... | ... | ... |
| 596 | Xencor Inc | XNCR | Health Care | Biotechnology | view | 1326732 |
| 597 | Xperi Holding Corp | XPER | Information Technology | Semiconductor Equipment | view | 1803696 |
| 598 | Olympic Steel, Inc. | ZEUS | Materials | Steel | view | 917470 |
| 599 | Zumiez, Inc. | ZUMZ | Consumer Discretionary | Apparel Retail | view | 1318008 |


In [16]:
# Print the column names of the table with the current constituents (index 1)
table_sp600[1].columns

Index(['Company', 'Ticker symbol', 'GICS Sector', 'GICS Sub-Industry',
       'SEC filings', 'CIK'],
      dtype='object')

In [17]:
# Print (part of) the table with the changes (on this page, the third table, i.e. index 2) to identify the rows that need to be adapted
table_sp600[2].head(50)

| | Date | Added: Ticker | Added: Security | Removed: Ticker | Removed: Security | Reason |
|---|---|---|---|---|---|---|
| 0 | July 15, 2021 | MSEX | Middlesex Water Company | LMNX | Luminex Corporation | MSEX replaced LMNX after they were acquired by... |
| 1 | July 7, 2021 | ATGE | Adtalem Global Education | BPFH | Boston Private Financial Holdings | ATGE moved down from the S&P 400 and replaced ... |
| 2 | June 22, 2021 | TWO | Two Harbors Investment Corp. | CATM | Cardtronics | TWO replaced CATM after they were acquired by ... |
| 3 | June 15, 2021 | AMEH | Apollo Medical Holdings, Inc. | ELY | Callaway Golf Company | AMEH replaced ELY who replaced Grubhub in the ... |
| 4 | June 10, 2021 | SLQT | SelectQuote, Inc. | CTB | Cooper Tire & Rubber Company | SLQT replaced CTB after they were acquired by ... |
| 5 | June 4, 2021 | SVC | Service Properties Trust | LCI | Lannett Company, Inc. | SVC moved down from the S&P 400 and replaced L... |
| 6 | June 2, 2021 | ORGO | Organogenesis Holdings Inc. | CROX | Crocs, Inc. | ORGO replaced CROX who replaced Cantel Medical... |
| 7 | May 27, 2021 | JYNT | The Joint Corp. | CUB | Cubic Corporation | JYNT replaced CUB after they were acquired by ... |
| 8 | May 17, 2021 | AVNS | Avanos Medical | AEGN | Aegion Corp. | AEGN was acquired by New Mountain Capital. AVN... |
| 9 | May 14, 2021 | EFC | Ellington Financial | NSA | National Storage Affiliates Trust | EFC replaced NSA who replaced Charles River La... |


In [18]:
# Print the column names of the second table
table_sp600[2].columns

MultiIndex([(   'Date',     'Date'),
            (  'Added',   'Ticker'),
            (  'Added', 'Security'),
            ('Removed',   'Ticker'),
            ('Removed', 'Security'),
            ( 'Reason',   'Reason')],
           )

In [19]:
# Obtain the list of tickers of the constituents of the S&P 600 index on a particular date
from datetime import datetime

inp = input("Provide the date (format: YYYY-MM-DD) for which you would like to obtain the list: ")
cutoff = datetime.strptime(inp, '%Y-%m-%d')

# Parse the dates in the table with the changes (index 2), and keep only the changes made after the inputted date
changes = table_sp600[2]
changes[('Date', 'Date')] = pd.to_datetime(changes[('Date', 'Date')])
later_changes = changes[changes[('Date', 'Date')] > cutoff]

# Tickers added/removed after the inputted date, in chronological order (oldest first)
added = later_changes[('Added', 'Ticker')].dropna().tolist()[::-1]
removed = later_changes[('Removed', 'Ticker')].dropna().tolist()[::-1]

# Adapt the list of tickers: drop tickers added after the date, restore tickers removed after it
current_list = list(table_sp600[1]['Ticker symbol'])
remove = list(added)
print("Removed tickers (as these were added after the inputted date): ", remove)
add = list(removed)
print("Added tickers (as these were removed after the inputted date): ", add)
current_list.extend(add)
historical_list = [i for i in current_list if i not in remove]

# Check whether there is overlap between the 'remove' and 'add' lists
overlap = list(set(add).intersection(set(remove)))
print("Overlap: ", overlap)

# Correction for tickers that were in the index on the inputted date, but were then removed
# and added again: their first change after that date is a removal. Compare the dates of the
# changes, not positions in the two separately filtered lists, which need not line up.
correction = []
for item in overlap:
    first_added = later_changes.loc[later_changes[('Added', 'Ticker')] == item, ('Date', 'Date')].min()
    first_removed = later_changes.loc[later_changes[('Removed', 'Ticker')] == item, ('Date', 'Date')].min()
    if first_removed < first_added:
        print(item, "was removed on", first_removed.date(), "and added again on", first_added.date())
        correction.append(item)
historical_list.extend(correction)

# Convert the historical list into a Series, and print this Series (in alphabetical order)
lst_sp600 = pd.Series(historical_list, name='Symbol').sort_values()
print(lst_sp600)

Provide the date (format: YYYY-MM-DD) for which you would like to obtain the list: 2020-12-31
Removed tickers (as these were added after the inputted date):  ['ELF', 'CELH', 'HTH', 'PBH', 'ISBC', 'COLL', 'HNI', 'WSFS', 'EPC', 'VCEL', 'OI', 'CARA', 'IDCC', 'RILY', 'UFS', 'UTL', 'GNW', 'TBBK', 'EFC', 'AVNS', 'JYNT', 'ORGO', 'SVC', 'SLQT', 'AMEH', 'TWO', 'ATGE', 'MSEX']
Added tickers (as these were removed after the inputted date):  ['BRKS', 'CPRI', 'YETI', 'FBM', 'BEAT', 'IRDM', 'VRTU', 'CLF', 'EXTN', 'QEP', 'HMSY', 'MTSC', 'CKH', 'MIK', 'EGOV', 'WDR', 'GLUU', 'RCM', 'NSA', 'AEGN', 'CUB', 'CROX', 'LCI', 'CTB', 'ELY', 'CATM', 'BPFH', 'LMNX']
Overlap:  []
0       AAN
1      AAOI
2      AAON
3       AAT
4      AAWW
       ... 
569    XPER
575    YETI
570    ZEUS
571    ZUMZ
572    ZYXI
Name: Symbol, Length: 601, dtype: object


In [20]:
# Write the list of tickers to a plain-text file with one ticker per line (creating the 'data' folder if needed)
import os
os.makedirs('data', exist_ok=True)
lst_sp600.to_csv('data/lst_sp600.txt', header=False, index=False, sep=' ')

## How to combine the (historical) constituent lists of the three S&P indices?

__NOTE:__ This is only possible for the period for which the Wikipedia page on the S&P 600 index provides the necessary data on changes to the list. Always check this first before using the code below!
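One way to perform this check, sketched below, is to compare the requested date with the oldest date in the changes table: if the oldest recorded change is more recent than the requested date, earlier changes may be missing and the reconstructed list should not be trusted. The function name `covers_date` and the toy dates are hypothetical. Passing the check is necessary but not sufficient, since a table of "selected" changes may still omit some.

```python
import pandas as pd

def covers_date(changes, date):
    """Hypothetical check: does the changes table reach back to `date`?"""
    dates = pd.to_datetime(changes[("Date", "Date")])
    return bool(dates.min() <= pd.Timestamp(date))

# Toy changes table with two recorded changes (hypothetical tickers)
columns = pd.MultiIndex.from_tuples(
    [("Date", "Date"), ("Added", "Ticker"), ("Removed", "Ticker")]
)
changes = pd.DataFrame(
    [["July 15, 2021", "AAA", "BBB"], ["March 1, 2019", "CCC", "DDD"]],
    columns=columns,
)
print(covers_date(changes, "2020-12-31"))  # True
print(covers_date(changes, "2018-12-31"))  # False
```

In this notebook, the corresponding call would be `covers_date(table_sp600[2], inp)`.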

In [21]:
# Check whether there is overlap between the S&P 500 and S&P 400 lists
set(lst_sp500).intersection(set(lst_sp400))

set()

In [22]:
# Check whether there is overlap between the S&P 500 and S&P 600 lists
set(lst_sp500).intersection(set(lst_sp600))

set()

In [23]:
# Check whether there is overlap between the S&P 400 and S&P 600 lists
set(lst_sp400).intersection(set(lst_sp600))

set()
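The three pairwise checks above can also be written as a single loop over all pairs of lists, which stays short if more indices are added. A sketch with placeholder tickers (in this notebook, the values would be `lst_sp500`, `lst_sp400` and `lst_sp600`):

```python
from itertools import combinations

lists = {
    "S&P 500": ["AAA", "BBB"],
    "S&P 400": ["CCC", "DDD"],
    "S&P 600": ["DDD", "EEE"],
}

# Keep only the pairs of indices that share at least one ticker
overlaps = {
    (a, b): set(lists[a]) & set(lists[b])
    for a, b in combinations(lists, 2)
    if set(lists[a]) & set(lists[b])
}
print(overlaps)  # {('S&P 400', 'S&P 600'): {'DDD'}}
```

An empty dictionary corresponds to the three `set()` results above.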

In [24]:
# Combine the S&P 500, S&P 400 and S&P 600 lists, and remove any duplicate tickers
# (pd.concat replaces Series.append, which was removed in pandas 2.0)
lst_comb = pd.concat([lst_sp500, lst_sp400, lst_sp600], ignore_index=True)
lst_comb_no_dupl = list(set(lst_comb))

# Convert the combined list into a Series, and print this Series (in alphabetical order)
lst_sp1500 = pd.Series(lst_comb_no_dupl, name='Symbol').sort_values()
print(lst_sp1500)

1000       A
1355     AAL
738      AAN
541     AAOI
1496    AAON
        ... 
1400    ZEUS
919     ZION
1159     ZTS
194     ZUMZ
247     ZYXI
Name: Symbol, Length: 1506, dtype: object


In [25]:
# Write the list of tickers to a plain-text file with one ticker per line (creating the 'data' folder if needed)
import os
os.makedirs('data', exist_ok=True)
lst_sp1500.to_csv('data/lst_sp1500.txt', header=False, index=False, sep=' ')