<h1>Scraping fidelity.com</h1>
In this notebook, I write a function that gets the latest sector performance data for the US markets, along with the total market capitalization of each sector.

The end result is a function, get_us_sector_performance(), that returns a list of tuples. Each tuple corresponds to one sector and contains the following data:

the sector name
the amount the sector has moved
the market capitalization of the sector
the market weight of the sector
a link to the fidelity page for that sector

The list is sorted in decreasing order of market weight, i.e., the sector with the highest weight is in the first tuple.
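
As a minimal sketch of the tuple layout and ordering described above, the snippet below sorts a few hand-written rows by their fourth field. The sector names, figures, and example.com links are placeholders made up for illustration, not scraped data.

```python
# Placeholder rows only: (name, change, market cap, market weight, link)
sample_rows = [
    ("Sector B", "-0.50%", "$2.00T", 10.0, "https://example.com/b"),
    ("Sector A", "+1.00%", "$5.00T", 30.0, "https://example.com/a"),
    ("Sector C", "+0.10%", "$1.00T", 5.0, "https://example.com/c"),
]

# Market weight is the fourth field (index 3); reverse=True puts the largest first
sorted_rows = sorted(sample_rows, key=lambda row: row[3], reverse=True)

for row in sorted_rows:
    print(row[0], row[3])
```

Storing the weight as a float rather than a string such as '30.0%' is what makes this numeric sort work; string comparison would order '9.27%' above '27.08%'.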

In [1]:
import requests
from bs4 import BeautifulSoup

def get_us_sector_performance():
    """Return one tuple per sector, sorted by decreasing market weight."""
    output_list = list()
    url = "https://eresearch.fidelity.com/eresearch/goto/markets_sectors/landing.jhtml"

    response = requests.get(url)

    # Return an empty list if the landing page could not be fetched
    if response.status_code != 200:
        return output_list

    # Name the parser explicitly so BeautifulSoup does not have to guess one
    results_page = BeautifulSoup(response.content, 'html.parser')
    sectors = results_page.find_all('a', {'class': 'heading1'})

    for sector in sectors:
        # The href is site-relative, so prepend the site root
        sector_page_link = "https://eresearch.fidelity.com" + sector.get('href')
        sector_name = sector.find('strong').get_text()

        sector_change, sector_market_cap, sector_market_weight = get_sector_change_and_market_cap(sector_page_link)
        output_list.append((sector_name, sector_change, sector_market_cap, sector_market_weight, sector_page_link))

    # Sort once, after every sector has been collected, by market weight (index 3)
    output_list.sort(key=lambda k: k[3], reverse=True)

    return output_list
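
The function above builds each sector link by concatenating the site root with the href. A possible alternative, assuming an href could be either site-relative or already absolute, is urljoin from the standard library. The helper name build_sector_link and the constant BASE_URL are hypothetical and not part of the original notebook.

```python
from urllib.parse import urljoin

# Site root used as the base for relative hrefs
BASE_URL = "https://eresearch.fidelity.com"

def build_sector_link(href):
    """Resolve an href against the site root (hypothetical helper)."""
    # urljoin keeps absolute URLs unchanged and resolves relative ones against BASE_URL
    return urljoin(BASE_URL, href)

print(build_sector_link("/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=25"))
```

Plain concatenation is fine as long as every href starts with a slash; urljoin only matters if the page ever mixes link styles.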
    

In [3]:
import requests
from bs4 import BeautifulSoup

def get_sector_change_and_market_cap(sector_page_link):
    """Return (change, market cap, market weight) scraped from a sector page."""
    response = requests.get(sector_page_link)
    soup = BeautifulSoup(response.text, 'html.parser')
    content = soup.find_all('tbody')

    # The second <tbody> holds the summary figures; look up its spans once
    spans = content[1].find_all('span')

    sector_change = spans[0].get_text()
    sector_market_cap = spans[2].get_text()
    # Drop the '%' sign so the weight can be sorted numerically
    sector_market_weight = float(spans[4].get_text().strip('%'))

    return sector_change, sector_market_cap, sector_market_weight

In [5]:
#Test get_sector_change_and_market_cap()
link = "https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=25"
get_sector_change_and_market_cap(link)

('-1.31%', '$8.12T', 11.75)
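
The change and the market capitalization come back as display strings ('-1.31%', '$8.12T'), while only the weight is converted to a float. If numeric values are needed for the other two fields as well, a rough sketch could look like the following. The helper names and the suffix table are assumptions; only the 'T' suffix appears in the output shown here.

```python
# Assumed suffix multipliers; only 'T' is confirmed by the scraped output
SUFFIX_MULTIPLIERS = {"M": 1e6, "B": 1e9, "T": 1e12}

def parse_market_cap(text):
    """Convert a string such as '$8.12T' to a dollar amount (hypothetical helper)."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    suffix = cleaned[-1].upper()
    if suffix in SUFFIX_MULTIPLIERS:
        return float(cleaned[:-1]) * SUFFIX_MULTIPLIERS[suffix]
    # No recognized suffix: treat the remainder as a plain number
    return float(cleaned)

def parse_percent(text):
    """Convert a string such as '-1.31%' to a float (hypothetical helper)."""
    return float(text.strip().rstrip("%"))

print(parse_market_cap("$8.12T"), parse_percent("-1.31%"))
```

Neither helper guards against empty or malformed strings, so a real scraper would likely wrap the calls in error handling.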

In [6]:
#Test get_us_sector_performance()
get_us_sector_performance()


[('Information Technology',
  '-1.93%',
  '$14.58T',
  27.08,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=45'),
 ('Health Care',
  '-0.04%',
  '$8.25T',
  14.33,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=35'),
 ('Consumer Discretionary',
  '-1.31%',
  '$8.12T',
  11.75,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=25'),
 ('Financials',
  '-0.77%',
  '$8.39T',
  11.08,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=40'),
 ('Communication Services',
  '-1.46%',
  '$5.50T',
  9.27,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=50'),
 ('Industrials',
  '+0.20%',
  '$5.50T',
  7.67,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?ta