In [334]:
from bs4 import BeautifulSoup
import requests
import re ##For Regex
import pandas as pd

 ## load https://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=1501390

#### Question 1 - use your browser's development tools to find a unique way to access its list price and its current price. What do you choose? 

In [335]:
url = "https://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=1501390" ##product page URL
headers = {'User-Agent': 'Mozilla/5.0'} ##send a browser-like User-Agent with the request
page = requests.get(url, headers = headers) ##fetch the page with a GET request
soup = BeautifulSoup(page.text, 'html.parser') ##parse the returned HTML
##CSS path to the list price, which the page renders as struck-through text inside a 'del' tag
list_price_content = soup.select("div.pdp-price > p.list-price > span > del")
list_price = [] ##collect the stripped text of every match
for i in list_price_content:
    list_price.append(i.text.strip())
print("List Price:",list_price[0]) ##the first match is the list price

List Price: $1,399.99


In [336]:
##Same approach for the sale price; the sr-only span returns it as text ("... and 99 cents") rather than in the "$1,399.99" format of the list price
sale_price_content = soup.select("div.pdp-price > p.final-price > span.sale-price > span.sr-only")
sale_price = []
for i in sale_price_content:
    sale_price.append(i.text.strip())
print("Sale Price:",sale_price[0])

Sale Price: $1,029
          and 99 cents
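Both lookups above index `[0]` directly, which raises `IndexError` if a selector stops matching. A minimal sketch of a guarded lookup follows. The helper `first_text` and the inline `SAMPLE_HTML` fragment are assumptions made for illustration: the fragment only imitates the structure the selectors target and is not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking the structure the selectors above target.
SAMPLE_HTML = """
<div class="pdp-price">
  <p class="list-price"><span><del>$1,399.99</del></span></p>
  <p class="final-price"><span class="sale-price">
    <span class="sr-only">$1,029 and 99 cents</span></span></p>
</div>
"""

def first_text(soup, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else None

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
list_price_text = first_text(sample_soup, "div.pdp-price > p.list-price > span > del")
sale_price_text = first_text(
    sample_soup, "div.pdp-price > p.final-price > span.sale-price > span.sr-only"
)
missing_text = first_text(sample_soup, "div.pdp-price > p.no-such-class")
print(list_price_text, "|", sale_price_text, "|", missing_text)
```

Returning `None` instead of raising lets the caller decide whether a missing price is an error or simply means the page layout changed.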


#### Question 2 - store the prices to strings

In [337]:
##Combine the two lists of list price and sale price
prices = list_price + sale_price
prices

['$1,399.99', '$1,029\r\n          and 99 cents']

In [338]:
##Join each list into a single string (the elements are already strings)
string = ' '.join(prices)
list_string = ' '.join(list_price)
sale_string = ' '.join(sale_price)

#### Question 3 - use Python's (or Java's) regex (!!) functionality to convert the prices to "1234.56" (no dollar sign, comma, just a "." separator for cents)

In [339]:
##Regex to extract only prices from list price string with no dollar sign, comma, just a "." separator for cents
##First replace ',' and '$' with '' in list price
pattern = '[$,]'
#empty string to replace it with
replace = ''
new_string = re.sub(pattern, replace, list_string) 
print(new_string)

1399.99


In [341]:
sale_string

'$1,029\r\n          and 99 cents'

In [342]:
##Regex to reduce the sale price string to digits with no dollar sign or comma, just a "." separator for cents
##First remove '$', ',', lowercase letters ('and', 'cents') and the carriage return from the sale price
pattern2 = r'[$,a-z\r]' ##raw string; a '+' inside a character class would only match a literal plus sign, so it is dropped
replace = ''
new_string2 = re.sub(pattern2, replace, sale_string) 
##Whitespace still separates 1029 and 99. Replace that run with a single dot '.'
pattern3 = r'\s+(?=[0-9])' ##whitespace followed by a digit (positive lookahead), so the trailing space is left untouched
replace2 = '.'
new_string3 = re.sub(pattern3, replace2, new_string2) 
print(new_string3)

1029.99 
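The two prices needed different substitution chains because they arrive in different formats. As an alternative sketch (not the method used above), a single pattern can capture the dollars and either style of cents. The function name `normalize_price` is hypothetical, and the pattern assumes only the two formats seen in this notebook.

```python
import re

def normalize_price(raw):
    """Convert '$1,399.99' or '$1,029 ... and 99 cents' to '1399.99' / '1029.99'."""
    # Dollars: digits and commas after '$'.
    # Cents (optional): either '.dd' or 'and dd cents' after any whitespace.
    match = re.search(
        r'\$\s*([\d,]+)(?:\.(\d{2})|\s+and\s+(\d{1,2})\s+cents)?', raw
    )
    if match is None:
        raise ValueError(f"no price found in {raw!r}")
    dollars = re.sub(r',', '', match.group(1))
    cents = match.group(2) or match.group(3) or '00'
    return f"{dollars}.{int(cents):02d}"

list_result = normalize_price('$1,399.99')
sale_result = normalize_price('$1,029\r\n          and 99 cents')
print(list_result, sale_result)
```

Because the cents are captured rather than left over after deletions, the result carries no trailing whitespace.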


#### Question 4 - print both, the list price and the current price to screen

In [343]:
print("List Price:",new_string,"\nSale Price:",new_string3)

List Price: 1399.99 
Sale Price: 1029.99 


#### Question 5 - Write code that loads "https://www.usnews.com/"

In [344]:
url= "https://www.usnews.com/"
headers = {'User-Agent': 'Mozilla/5.0'}
page2 = requests.get(url, headers = headers)
soup2 = BeautifulSoup(page2.text, 'html.parser')
##Same method as in question 1

#### Question 6 - "finds" its current "Top Stories" (do not hard-code its URL!)

In [345]:
t = soup2.find('div', class_ = "Box-w0dun1-0 ArmRestTopStories__Part-s0vo7p-1 erkdnc biVKSR")
for i in t:
    print(i.text.strip())

##The top stories sit in the 'div' tag with the class above, found by inspecting the page
##find() returns only the first element with this class, which holds just the top stories; select() would return every matching element and therefore many more stories

McCarthy, Biden to Talk Amid Debt ThreatA precarious partisan battle is threatening the first-ever default on the national debt.Kaia Hubbard
Existing Homes Fall 1.5% in DecemberThe performance was better than expected and prices inched up a little from a year ago.Tim Smart
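The full class string used above contains suffixes that look auto-generated and may change when the site is rebuilt. A hedged sketch of a looser match is shown below on an invented fragment: it assumes the `ArmRestTopStories` prefix and the `story-headline` class stay stable, which the real page does not guarantee. All URLs and hash suffixes in `SAMPLE_HTML` are made up.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: same component prefix, invented hash suffixes and URLs.
SAMPLE_HTML = """
<div class="Box-aaa ArmRestTopStories__Part-bbb ccc ddd">
  <h3 class="Heading-eee story-headline"><a href="https://example.com/first">First story</a></h3>
  <h3 class="Heading-eee story-headline"><a href="https://example.com/second">Second story</a></h3>
</div>
<div class="Box-aaa OtherSection-fff">
  <h3 class="Heading-eee story-headline"><a href="https://example.com/other">Other story</a></h3>
</div>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# [class*="..."] matches when the class attribute contains the substring.
top_box = sample_soup.select_one('div[class*="ArmRestTopStories"]')
# class_="story-headline" matches any h3 whose class list includes that one class.
headlines = top_box.find_all("h3", class_="story-headline")
top_story_urls = [h.a["href"] for h in headlines]
print(top_story_urls)
```

Searching for headlines inside the matched box, rather than across the whole page, keeps unrelated headlines out of the result.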


#### Question 7 - read + print the URL of the _second_ current top story to the screen

In [346]:
t2 = soup2.find_all('h3', class_ = "Heading-sc-1w5xk2o-0 ContentBox__StoryHeading-sc-1egb8dt-3 MRvpF fqJuKa story-headline")
t2
##Four h3 elements match, and the required second top story is the third occurrence. Each element also contains an 'a' tag where the URL of the story is stored
##t2 is a bs4.element.ResultSet, i.e. the result set of matching h3 elements, and we are interested in the 3rd element, that is t2[2]
t2[2] ##This is the required element

<h3 class="Heading-sc-1w5xk2o-0 ContentBox__StoryHeading-sc-1egb8dt-3 MRvpF fqJuKa story-headline"><a href="https://www.usnews.com/news/economy/articles/2023-01-20/existing-homes-fall-1-5-in-december-marking-11th-month-of-declines">Existing Homes Fall 1.5% in December</a></h3>

In [347]:
##URL of the required second news is a part of this element, we save this element in a variable
a = t2[2].find('a',href=True)
##We can extract the required URL by using 'href' as below

In [348]:
print("URL of second top story is:\n", a['href'])

URL of second top story is:
 https://www.usnews.com/news/economy/articles/2023-01-20/existing-homes-fall-1-5-in-december-marking-11th-month-of-declines
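The `href` here happens to be an absolute URL. If a page used relative links instead, the value would need to be resolved against the site's base URL before it could be requested. A small sketch using the standard library, where `absolute_url` is a hypothetical helper and the example paths are invented:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.usnews.com/"

def absolute_url(href, base=BASE_URL):
    """Resolve href against base; absolute URLs pass through unchanged."""
    return urljoin(base, href)

relative_result = absolute_url("/news/economy/articles/example-story")
absolute_result = absolute_url("https://www.usnews.com/news/economy/articles/example-story")
print(relative_result)
print(absolute_result)
```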


#### Question 8 - load that page

In [349]:
url_top2story = a['href']
type(url_top2story)

str

In [350]:
##Requesting the page through get request passing URL and User Agent Mozilla as headers
page3 = requests.get(url_top2story, headers = headers)
##Creating the beautiful soup object to parse and load the page
soup3 = BeautifulSoup(page3.text, 'html.parser')

#### Question 9 - read + print the header as well as the first 3 sentences of the main body to the screen

In [351]:
header_top2 = soup3.find_all('h1', class_ = "Heading-sc-1w5xk2o-0 iQhOvV") ##only one h1 header is stored under this class
header_top2[0].text ##This is the header of the second story
content = soup3.find_all('div', class_ = "Raw-slyvem-0 bCYKCn") ##body text of the article (excluding images, ads and other text) is stored under this class
content[0]
content[1] ##The 2nd element is an empty paragraph with no story text, so content[3] is needed as well
content[2]
content[3]

<div class="Raw-slyvem-0 bCYKCn"><p>“December was another difficult month for buyers, who continue to face limited inventory and high mortgage rates,” said NAR Chief Economist Lawrence Yun. “However, expect sales to pick up again soon since mortgage rates have markedly declined after peaking late last year.”</p></div>

In [352]:
content[1]

<div class="Raw-slyvem-0 bCYKCn"><p></p></div>

In [353]:
print(header_top2[0].text,"\n",content[0].text,"\n",content[1].text,content[2].text,"\n",content[3].text) 

Existing Homes Fall 1.5% in December, Marking 11th Month of Declines 
 Sales of existing homes slid 1.5% in December, somewhat better than expected but the 11th straight month of decline, the National Association of Realtors said on Friday. 
  The number was better than estimates of a 3.4% drop and brings the annual rate of home sales just a hair above 4 million. Sales are now down 34% from year-ago levels. 
 “December was another difficult month for buyers, who continue to face limited inventory and high mortgage rates,” said NAR Chief Economist Lawrence Yun. “However, expect sales to pick up again soon since mortgage rates have markedly declined after peaking late last year.”
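The cell above prints the first non-empty body elements, which are paragraphs and may hold more than one sentence each. To limit the output to exactly three sentences, one rough option is a regex split, sketched below. The helper `first_sentences` is hypothetical, and the splitting rule is deliberately naive: it breaks after `.`, `!` or `?` followed by whitespace, so abbreviations such as "U.S." would be split incorrectly.

```python
import re

def first_sentences(paragraphs, n=3):
    """Join non-empty paragraphs, split naively into sentences, keep the first n."""
    text = ' '.join(p.strip() for p in paragraphs if p.strip())
    # Split after ., ! or ? when whitespace follows; "1.5%" stays intact.
    sentences = re.split(r'(?<=[.!?])\s+', text)
    return sentences[:n]

# Paragraph texts copied from the output above, including the empty one.
body = [
    "Sales of existing homes slid 1.5% in December, somewhat better than expected "
    "but the 11th straight month of decline, the National Association of Realtors "
    "said on Friday.",
    "",
    "The number was better than estimates of a 3.4% drop and brings the annual rate "
    "of home sales just a hair above 4 million. Sales are now down 34% from "
    "year-ago levels.",
]
three = first_sentences(body)
for sentence in three:
    print(sentence)
```

Filtering out empty paragraphs before joining also removes the need to skip `content[1]` by hand.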
