## Data Scraping

Data scraping, or web scraping, is the automated collection of information from websites and other online sources, for purposes such as research and competitor analysis. A scraper requests a page, extracts the specific data it needs, and stores the result for later analysis. Scrape ethically and within legal boundaries: check the website's terms of service and respect any anti-scraping measures it has in place. Programming libraries and tools are available that make the work easier.
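As a minimal sketch of the "check before you scrape" advice, the standard-library `urllib.robotparser` module can evaluate robots.txt rules. The rules, the user-agent name `example-bot`, and the URLs below are invented for illustration. A real script would load the site's own file, and a robots.txt check does not replace reading the terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: one outside and one inside the disallowed path.
allowed_public = is_allowed(
    EXAMPLE_ROBOTS_TXT, "example-bot", "https://example.com/wiki/page"
)
allowed_private = is_allowed(
    EXAMPLE_ROBOTS_TXT, "example-bot", "https://example.com/private/data"
)
print(allowed_public, allowed_private)
```

Against a live site, `RobotFileParser.set_url(...)` followed by `read()` would fetch the real file instead of the inlined string.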

In [1]:
import pandas as pd
import numpy as np
import yfinance as yf

The next cell reads the list of S&P 500 company symbols from a Wikipedia page, downloads daily price history for those symbols from Yahoo Finance for the 365 × 8 days (roughly eight years) before '2023-9-27', and stacks the result into a single DataFrame indexed by date and ticker. Calling `describe()` on that DataFrame then gives summary statistics for its numeric columns, a quick first look at the historical performance of these S&P 500 companies.
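Two details of this pipeline are easy to get wrong, so here is a small offline sketch of them. The sample symbols are illustrative and are not read from the Wikipedia page. The first part shows why the dot in a symbol should be replaced as a literal character; the second shows that 365 × 8 days is two days short of eight calendar years, which is why the output begins on 2015-09-29.

```python
import re

import pandas as pd

# Hypothetical sample of symbols as they might appear in the source table.
symbols = pd.Series(["BRK.B", "BF.B", "AAPL"])

# regex=False treats '.' as a literal dot, so only the dot is replaced.
normalized = symbols.str.replace(".", "-", regex=False).tolist()

# As a regular expression, '.' matches any character, so everything is replaced.
regex_result = re.sub(".", "-", "BRK.B")

end = pd.Timestamp("2023-09-27")
# A fixed count of days ignores the leap days in 2016 and 2020.
start_fixed_days = end - pd.DateOffset(days=365 * 8)
# A calendar offset lands on the same day of the month eight years earlier.
start_calendar = end - pd.DateOffset(years=8)

print(normalized, regex_result)
print(start_fixed_days.date(), start_calendar.date())
```

Whether the two-day difference matters depends on the analysis; `pd.DateOffset(years=8)` is the option to reach for when an exact calendar window is wanted.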

In [3]:
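# Read the first table on the Wikipedia page: the S&P 500 constituents.
sp500 = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]

# Yahoo Finance writes share classes with a dash (BRK-B), not a dot (BRK.B).
# regex=False makes '.' a literal dot rather than the regex "any character".
sp500['Symbol'] = sp500['Symbol'].str.replace('.', '-', regex=False)

symbol_list = sp500['Symbol'].unique().tolist()

end_date = '2023-9-27'

# 365 * 8 days before end_date, slightly short of eight calendar years.
start_date = pd.to_datetime(end_date) - pd.DateOffset(days=365 * 8)

# Download price history, then move the ticker column level into the row index.
dataframe = yf.download(tickers=symbol_list,
                        start=start_date,
                        end=end_date).stack()

# Name the two index levels (the attribute is index.names, not index_names).
dataframe.index.names = ['date', 'tickers']

dataframe.columns = dataframe.columns.str.lower()

dataframe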



[*********************100%%**********************]  503 of 503 completed


1 Failed download:
['VLTO']: Exception("%ticker%: Data doesn't exist for startDate = 1443499200, endDate = 1695787200")







| Date | Ticker | adj close | close | high | low | open | volume |
|---|---|---|---|---|---|---|---|
| 2015-09-29 | A | 31.588043 | 33.740002 | 34.060001 | 33.240002 | 33.360001 | 2252400.0 |
| 2015-09-29 | AAL | 37.361622 | 39.180000 | 39.770000 | 38.790001 | 39.049999 | 7478800.0 |
| 2015-09-29 | AAPL | 24.748632 | 27.264999 | 28.377501 | 26.965000 | 28.207500 | 293461600.0 |
| 2015-09-29 | ABBV | 37.024639 | 52.790001 | 54.189999 | 51.880001 | 53.099998 | 12842800.0 |
| 2015-09-29 | ABT | 33.807274 | 39.500000 | 40.150002 | 39.029999 | 39.259998 | 12287500.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-09-26 | YUM | 124.010002 | 124.010002 | 124.739998 | 123.449997 | 124.239998 | 1500600.0 |
| 2023-09-26 | ZBH | 112.216316 | 112.459999 | 117.110001 | 112.419998 | 116.769997 | 3610500.0 |
| 2023-09-26 | ZBRA | 223.960007 | 223.960007 | 226.649994 | 222.580002 | 225.970001 | 355400.0 |
| 2023-09-26 | ZION | 33.990002 | 33.990002 | 34.700001 | 33.840000 | 33.840000 | 1586100.0 |
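The one-row-per-date-and-ticker layout of this output comes from `stack()`, and `describe()` then summarises each numeric column. The sketch below reproduces both steps on a tiny synthetic frame. The tickers `AAA` and `BBB` and all values are invented, and it assumes the downloaded frame has two column levels (price field, then ticker), which may differ between yfinance versions.

```python
import numpy as np
import pandas as pd

# Synthetic wide frame: one row per date, one column per (field, ticker) pair.
dates = pd.date_range("2023-09-25", periods=3, freq="D", name="Date")
columns = pd.MultiIndex.from_product([["close", "volume"], ["AAA", "BBB"]])
wide = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns
)

# stack() moves the innermost column level (the ticker) into the row index.
long_df = wide.stack()
long_df.index.names = ["date", "ticker"]

# Summary statistics (count, mean, std, min, quartiles, max) per column.
summary = long_df.describe()

print(long_df)
print(summary)
```

On the real data, `describe()` pools all tickers together; grouping first, for example `dataframe.groupby(level=1).describe()`, would give per-ticker statistics instead.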


In [4]:
dataframe.to_csv('sp500.csv')
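Writing a frame with a two-level index to CSV flattens the index into ordinary columns, so reading it back needs a hint about which columns form the index. The following is a sketch of that round trip, under the assumption that the index levels are named `date` and `ticker`. It uses an in-memory buffer in place of `sp500.csv`, with two rows copied from the output above.

```python
import io

import pandas as pd

# Two rows taken from the displayed output, indexed by (date, ticker).
index = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2023-09-26"), "YUM"), (pd.Timestamp("2023-09-26"), "ZBH")],
    names=["date", "ticker"],
)
frame = pd.DataFrame(
    {"close": [124.010002, 112.459999], "volume": [1500600.0, 3610500.0]},
    index=index,
)

# Write to an in-memory buffer that stands in for the CSV file on disk.
buffer = io.StringIO()
frame.to_csv(buffer)
buffer.seek(0)

# Rebuild the two-level index and parse the date level back into timestamps.
restored = pd.read_csv(buffer, index_col=["date", "ticker"], parse_dates=["date"])
print(restored)
```

Without `index_col`, the dates and tickers would come back as plain columns, and without `parse_dates` the dates would be strings.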