<h1>Scraping morningstar.com</h1>
In this assignment, you will scrape data from morningstar.com. The goal of the exercise is to get the latest sector performance data from the US markets. 

The end result is a function, <i>get_sector_performance()</i>, that returns a list of tuples. Each tuple should correspond to a sector and should contain the following data:
<li>the sector name
<li>the amount the sector has moved
<li>a link to the morningstar detail page for that sector

<p>
The list should be sorted in decreasing order of change, i.e., the sector with the largest change should be in the first tuple, and so on.

<h2>Sample output (as of 8/25/2023)</h2>
<pre>
[('Consumer Cyclical',
  0.99,
  'https://morningstar.com//indexes/ixus/mccs/quote'),
 ('Energy', 0.99, 'https://morningstar.com//indexes/ixus/mes/quote'),
 ('Technology', 0.78, 'https://morningstar.com//indexes/ixus/mts/quote'),
 ('Industrials', 0.76, 'https://morningstar.com//indexes/ixus/mis/quote'),
 ('Utilities', 0.75, 'https://morningstar.com//indexes/ixus/mus/quote'),
 ('Sensitive', 0.73, 'https://morningstar.com//indexes/ixus/mssemsss/quote'),
 ('Defensive', 0.66, 'https://morningstar.com//indexes/ixus/mssemdss/quote'),
 ('Consumer Defensive',
  0.66,
  'https://morningstar.com//indexes/ixus/mcds/quote'),
 ('Healthcare', 0.66, 'https://morningstar.com//indexes/ixus/mhs/quote'),
 ('Cyclical', 0.55, 'https://morningstar.com//indexes/ixus/mssemcss/quote'),
 ('Financial Services',
  0.29,
  'https://morningstar.com//indexes/ixus/mfss/quote'),
 ('Basic Materials', 0.26, 'https://morningstar.com//indexes/ixus/mbms/quote'),
 ('Real Estate', 0.17, 'https://morningstar.com//indexes/ixus/mrets/quote'),
 ('Communication Services',
  0.17,
  'https://morningstar.com//indexes/ixus/mcss/quote')]
</pre>

<b>Notes:</b>
<li>The sector change is scraped as a string that ends with a % sign. Remove the % and convert the string to a float before sorting. The sector names and the changes also contain whitespace characters (\n or \t), which you need to strip out (see below).</li>
<li>To sort a list of tuples by an arbitrary element, see the sorting examples at the bottom of this notebook.

<h3>IMPORTANT: Dealing with negative change values</h3>
Morningstar uses a special character, the Unicode minus sign (U+2212), for the sign of negative numbers. Because this is not the ordinary hyphen-minus character (U+002D) that Python recognizes as a negative sign, float cannot convert such a string directly.

In [None]:
#The function "ord" returns the Unicode code point of a character
#The following code shows that the two signs have different code points
morningstar_negative = "−23.25"
morningstar_neg_sign = morningstar_negative[0]
print("unicode for morningstar",ord(morningstar_neg_sign))
normal_negative = "-23.25"
normal_negative_sign = normal_negative[0]
print("unicode for normal",ord(normal_negative_sign))

In [None]:
morningstar_negative = "−23.25"
float(morningstar_negative) #Throws an exception

In [None]:
#Use the following code to fix this 
#Though the two negative signs look the same, they are different
proper_negative = morningstar_negative.replace("−","-")
float(proper_negative) #This works
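
To see why two signs that look alike are treated differently, the standard-library unicodedata module can report each character's official name. This is a small sketch that is not part of the original exercise; it assumes the Morningstar character is the Unicode minus sign (U+2212), written here as an escape sequence, and the variable names are illustrative.

```python
import unicodedata

morningstar_sign = "\u2212"  # assumed: the Unicode minus sign used on the page
ordinary_sign = "-"          # the ASCII hyphen-minus typed on a keyboard

for sign in (morningstar_sign, ordinary_sign):
    # ord gives the code point; unicodedata.name gives the official name
    print(hex(ord(sign)), unicodedata.name(sign))

# After the replacement, float can parse the string
scraped = morningstar_sign + "23.25"
converted = float(scraped.replace(morningstar_sign, ordinary_sign))
print(converted)
```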

<h2>Extracting the url</h2>


In [None]:
from bs4 import BeautifulSoup
page_data = '<html><body><a href="https://www.columbia.edu">Columbia University</a></body></html>'
soup = BeautifulSoup(page_data, "html.parser") #Name the parser explicitly to avoid a warning
url = soup.find('a').get('href')
print(url)
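
The href attribute of a link is often a relative path rather than a full URL. The sketch below assumes a scraped href that starts with a slash (the value is illustrative, modeled on the sample output) and compares plain string concatenation with urljoin from the standard library, which resolves the path against a base URL.

```python
from urllib.parse import urljoin

base = "https://morningstar.com/"
relative_href = "/indexes/ixus/mes/quote"  # assumed shape of a scraped href

# Plain concatenation keeps both slashes
concatenated = base + relative_href
print(concatenated)

# urljoin resolves the relative path against the base URL
joined = urljoin(base, relative_href)
print(joined)
```

Either form is acceptable for this assignment; urljoin simply avoids the doubled slash.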

In [None]:
#Function scaffold

def get_sector_performance():
    import requests
    from bs4 import BeautifulSoup
    sector_performance_list = list()
    url = "https://www.morningstar.com/markets"

    
    
    
    return sector_performance_list

In [None]:
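def get_sector_performance():
    import requests
    from bs4 import BeautifulSoup
    sector_performance_list = list()
    url = "https://www.morningstar.com/markets"
    response = requests.get(url)
    sectorpage = BeautifulSoup(response.text, "html.parser")
    tables = sectorpage.find_all("table")
    #The code assumes the sector data sits in tables 9, 10 and 11 of the page;
    #these indices must be updated if the page layout changes
    for j in [9,10,11]:
        all_cells = tables[j].find_all('td')
        i = 0
        #Cells come in pairs: one with the sector link, the next with the change
        while i < len(all_cells):
            try:
                anchor = all_cells[i].find('a')
                link = "https://morningstar.com/" + anchor.get('href')
                sector = anchor.get_text().strip()
                data = all_cells[i+1].find('span').get_text()
                data = data.replace("−","-") #Replace the Unicode minus sign
                data = float(data.strip()[:-1]) #Drop the trailing %
                sector_performance_list.append((sector,data,link))
                i += 2
            except (AttributeError, IndexError, TypeError, ValueError):
                #Not a sector/change pair, so move on to the next cell
                i += 1
    return sector_performance_list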


In [None]:
sorted(get_sector_performance(),key=lambda k: k[1],reverse=True)
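
The cleaning and sorting steps can be tried without a network connection. The sketch below assumes the scraped cells have already been reduced to raw (name, change, href) strings. The helper name build_sector_list and the sample rows are hypothetical, and the values are made up to mimic the whitespace, the % sign and the Unicode minus sign.

```python
def build_sector_list(raw_rows):
    """Clean raw (name, change, href) strings and sort by change, largest first."""
    cleaned = []
    for raw_name, raw_change, href in raw_rows:
        name = raw_name.strip()
        change = raw_change.strip().replace("\u2212", "-")  # Unicode minus -> ASCII
        value = float(change[:-1])  # drop the trailing %
        cleaned.append((name, value, "https://morningstar.com" + href))
    return sorted(cleaned, key=lambda row: row[1], reverse=True)

# Made-up rows for illustration only
sample_rows = [
    ("\n\tReal Estate\n", "\n0.17%\t", "/indexes/ixus/mrets/quote"),
    ("\n\tEnergy\n", "\n0.99%\t", "/indexes/ixus/mes/quote"),
    ("\n\tExample Sector\n", "\n\u22120.30%\t", "/example"),
]
result = build_sector_list(sample_rows)
for row in result:
    print(row)
```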

<h3>Sorting</h3>
<li>the list method <span style="color:red">sort</span> sorts a list "in-place", i.e., the list itself is reordered and the method returns None</li>
<li>the built-in function <span style="color:red">sorted</span> returns a new sorted list and leaves the original unchanged</li>
<li>both take arguments that determine the key (<span style="color:red">key=</span>) and the order. Ascending is the default order; to flip it use <span style="color:red">reverse=True</span></li>
<li>sort and sorted only work if the elements can be compared with each other. For example, sorted([1,9,2,8,11,'a']) raises a TypeError because an integer and a string cannot be ordered. Python orders elements using the < (less than) operator</li>

In [None]:
x = [1,9,2,8,11]
x.sort()
print(x) #x is now a sorted list
x.sort(reverse=True)
print(x) #x is now sorted in descending order

y = [1,9,2,8,11]
sorted(y)
print(y) #y is unchanged because sorted(y) returned a new list

z = sorted(y)
print(z) #z contains the sorted contents of y
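
The note about unsortable data can be checked directly. The following sketch catches the error raised for a mixed list and then shows one possible workaround, sorting by the string form of each element. The workaround is only an illustration, and it orders the numbers as text rather than by value.

```python
mixed = [1, 9, 2, 8, 11, 'a']
try:
    sorted(mixed)
except TypeError as error:
    # Python 3 cannot order an int and a str
    message = str(error)
    print("Cannot sort:", message)

# A key that maps every element to the same type makes the list sortable
as_text = sorted(mixed, key=str)
print(as_text)
```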

<h3>comparing tuples</h3>

In [None]:
(1,2,3,5) < (1,2,4,1) 
#Python does an element wise comparison, somewhat like when strings are compared
#When sorting tuples (or lists), that element wise comparison is used to determine ordering

In [None]:

x = [('c',17.4,'f'),('e',1.74,'bb'),('d',29.2,'z'),('a',23.2,'b'),('d',29.2,'a')]
x.sort() #Sorts using tuple comparison, going left to right. Note the order of the 'd' tuples
x

<li>When a key is specified, sort (and sorted) compare only the key values
<li>In the example below, which sorts on the element at index 1, note that the two 'd' tuples are not re-ordered (because 'z' and 'a' are not compared)
<li>sort and sorted are stable. If two elements are "equal" (per the sorting rule), they are returned in the same order as they were in the original list

<h1>lambda functions</h1>
<li>lambda functions are anonymous functions, created on the fly, and typically meant to be used once</li>
<li>since they are unnamed, they are usually not called by name but passed directly where a function is expected (for example, as the key argument of sort)</li>
<li>but, since Python functions are first-class objects, you can assign a lambda to a name and call it like any other function</li>

<li>lambda functions can have only one expression and they return whatever the expression returns</li>
<li>the if .. else .. structure in a lambda function takes the form of a conditional expression ("expression if"): value_if_true if condition else value_if_false</li>
<li>multiple arguments are separated by a comma</li>




In [None]:
#Example
#Three arguments, a,b, c separated by commas
#The function returns a if (condition 1) else it returns b if (condition 2); else it returns c
#Note the expression if structure
#No return statement. Something is always returned!

func = lambda a,b,c: a if a>b and a>c else b if b>a and b>c else c
func(10,3,8)
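
As a further illustration of the points above, this sketch calls a lambda without naming it, assigns one to a name, and writes the same logic with def for comparison. The names largest and largest_def are illustrative.

```python
# A lambda can be called immediately, without ever being named
print((lambda a, b: a + b)(2, 3))

# Assigning it to a name makes it callable like a regular function
largest = lambda a, b, c: a if a > b and a > c else b if b > a and b > c else c
print(largest(10, 3, 8))

# The same logic written with def
def largest_def(a, b, c):
    if a > b and a > c:
        return a
    if b > a and b > c:
        return b
    return c

print(largest_def(10, 3, 8))
```

Because the comparisons are strict, a tie between a and b falls through to c, so this expression is not a general replacement for max.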

In [None]:
x = [('c',17.4,'f'),('e',1.74,'bb'),('d',29.2,'z'),('a',23.2,'b'),('d',29.2,'a')]
x.sort(key=lambda k: k[1]) 
x

In [None]:
#Alternatively, for the sort (and max, min) function
#itemgetter returns the item at the specified index in a collection
#the key= below tells sort to use the element at location 1 when comparing elements in x

from operator import itemgetter
x = [('c',17.4,'f'),('e',1.74,'bb'),('d',29.2,'z'),('a',23.2,'b'),('d',29.2,'a')]
x.sort(key=itemgetter(1)) 
x

<h1>Cleaning a string</h1>
<li>The string method <span style="color:blue">strip</span> cleans a string by removing all leading and trailing whitespace (spaces, tabs, newlines)</li>
<li>In the example below, since % is not a whitespace character, it is not removed from the string

In [None]:
sample_string = " \n\n\t42%"
print(sample_string) #The string will be indented and will contain blank lines
cleaned_string = sample_string.strip()
print(cleaned_string)

<h1>string to float</h1>
<li>The <span style="color:blue">float</span> function converts a string to a float</li>
<li>If the string does not represent a number, float raises a ValueError</li>
<li>For example, the string "42" is convertible but "42%" is not, since % is not part of a valid number</li> 
<li>Get rid of the % by slicing off the last character of the string</li>

In [None]:
sample_string = "42%"
sample_float = float(sample_string) #This will throw an exception

In [None]:
sample_string = "42%"
sample_float = float(sample_string[:-1]) #This will not throw an exception since we're dropping the last character
print(sample_float)
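
Slicing with [:-1] removes the last character whatever it is. As an alternative sketch, rstrip("%") removes a trailing % only when one is present. The helper name percent_to_float is hypothetical.

```python
def percent_to_float(text):
    """Convert a percent string to a float (hypothetical helper)."""
    cleaned = text.strip()         # remove surrounding whitespace
    cleaned = cleaned.rstrip("%")  # remove a trailing % only if one is present
    return float(cleaned)

with_percent = percent_to_float(" \n\n\t42%")
without_percent = percent_to_float("42")
sliced = float("42"[:-1])  # slicing drops the 2 when there is no % sign

print(with_percent, without_percent, sliced)
```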

In [5]:
data = "cyclical \n -0.3"
splitted = data.split("\n")
sector = splitted[0].strip()
change = splitted[1].strip()
print(sector)
print(change)

cyclical
-0.3
