### Steps
- Import the get() function from the requests module, BeautifulSoup from bs4, and pandas.
- Assign the address of the web page to a variable named url.
- Request the content of the web page from the server by using get(), and store the server’s response in a variable named response.
- Print the response text to confirm that you received an HTML page.
- Take a look at the actual web page contents and inspect the source to understand the structure a bit.
- Use BeautifulSoup to parse the HTML into a variable ('soup').
- Identify the key tags you need to extract the data you are looking for.
- Create a dataframe from the data you extracted.
- Run some summary stats and inspect the data to ensure you have what you wanted.
- Edit the data structure as needed, especially so that one column has all the text you want included in this analysis.
- Create a corpus of the column with the text you want to analyze.
- Store that corpus for use in a future notebook.
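
The parsing, dataframe, and corpus steps above can be sketched offline as follows. This is a minimal sketch, not the notebook's actual code: the `sample_html` string, the `article`/`h2`/`p` tag layout, the `title` and `text` column names, and the `corpus.txt` file name are all assumptions made for illustration, and the real page will need different selectors.

```python
import tempfile
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text; the real page uses different markup.
sample_html = """
<html><body>
  <article><h2>First post</h2><p>Math helps.</p><p>So does practice.</p></article>
  <article><h2>Second post</h2><p>Statistics matter.</p></article>
</body></html>
"""

# Parse the HTML into a soup object.
soup = BeautifulSoup(sample_html, "html.parser")

# Pull out the tags of interest, one row per article.
rows = []
for article in soup.select("article"):
    title = article.find("h2").get_text(strip=True)
    paragraphs = [p.get_text(strip=True) for p in article.find_all("p")]
    rows.append({"title": title, "text": " ".join(paragraphs)})

# Build the dataframe and run a quick sanity check on it.
df = pd.DataFrame(rows)
print(df.shape)
print(df["text"].str.len().describe())

# Build the corpus from the single text column, one document per line.
corpus = "\n".join(df["text"])

# Store the corpus for later use (a temporary directory is used here
# only so the sketch does not write into the working directory).
corpus_path = Path(tempfile.mkdtemp()) / "corpus.txt"
corpus_path.write_text(corpus, encoding="utf-8")
```

Joining all paragraphs of an article into one `text` value is what puts everything to be analyzed into a single column before the corpus is built.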

In [5]:
from requests import get
from bs4 import BeautifulSoup
import os

In [6]:
url = 'https://codeup.com/data-science/math-in-data-science/'
headers = {'User-Agent': 'Codeup Data Science'} # Some websites don't accept the python-requests default user-agent
response = get(url, headers=headers)

In [7]:
print(response.text[:400])

<!DOCTYPE html>
<html lang="en-US">
<head>
	<meta charset="UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge">
	<link rel="pingback" href="https://codeup.com/xmlrpc.php" />

	<script type="text/javascript">
		document.documentElement.className = 'js';
	</script>
	
	<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin /><script id="diviarea-loader">window.DiviPopupData=wi


In [8]:
# Make a soup variable holding the response content
soup = BeautifulSoup(response.content, 'html.parser')
soup

<!DOCTYPE html>

<html lang="en-US">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<link href="https://codeup.com/xmlrpc.php" rel="pingback"/>
<script type="text/javascript">
		document.documentElement.className = 'js';
	</script>
<link crossorigin="" href="https://fonts.gstatic.com" rel="preconnect"/><script id="diviarea-loader">window.DiviPopupData=window.DiviAreaConfig={"zIndex":1000000,"animateSpeed":400,"triggerClassPrefix":"show-popup-","idAttrib":"data-popup","modalIndicatorClass":"is-modal","blockingIndicatorClass":"is-blocking","defaultShowCloseButton":true,"withCloseClass":"with-close","noCloseClass":"no-close","triggerCloseClass":"close","singletonClass":"single","darkModeClass":"dark","noShadowClass":"no-shadow","altCloseClass":"close-alt","popupSelector":".et_pb_section.popup","initializeOnEvent":"et_pb_after_init_modules","popupWrapperClass":"area-outer-wrap","fullHeightClass":"full-height","openPopupClass":"da-overlay-visible","ove

### Beautiful Soup Methods and Properties

- soup.title.string gets the page's title (the same text shown in the browser tab for a page; this is the \<title\> element).
- soup.prettify() returns an indented version of the HTML, which is useful to print when you want to read the markup.
- soup.find_all("a") finds all the anchor tags, or all elements matching whatever argument is specified.
- soup.find("h1") finds the first matching element.
- soup.get_text() returns the text inside the soup, or inside any matched element, with the tags stripped out.
- soup.select() takes a CSS selector as a string and returns a list of all matching elements, which makes it the most flexible way to target specific parts of the page.
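
The methods listed above can be tried on a small inline document without making a request. This is only an illustrative sketch: the `html` string and its `post` class are made up for the example and do not come from the Codeup page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only to exercise the methods.
html = """
<html><head><title>Demo page</title></head>
<body>
  <h1>Main heading</h1>
  <div class="post"><a href="/one">One</a></div>
  <div class="post"><a href="/two">Two</a></div>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# Text of the <title> element.
page_title = soup.title.string

# First matching element, then its text.
first_h1 = soup.find("h1").get_text()

# Every anchor tag; attributes are read like dictionary keys.
links = [a["href"] for a in soup.find_all("a")]

# CSS selector: anchors nested inside elements with class "post".
post_links = [a.get_text() for a in soup.select("div.post a")]

print(page_title, first_h1, links, post_links)
```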

In [9]:
soup.select('h2')

[<h2>What are the main math principles you need to know to get into Codeup’s Data Science program?</h2>,
 <h2>Latest Blog Articles</h2>,
 <h2 class="et_pb_module_header">Get Program Details &amp; Pricing</h2>]

In [10]:
soup.select('h2')[2].text


'Get Program Details & Pricing'
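
Indexing with `[2]` depends on the order of headings on the page. As a possible alternative, the class shown in the output above can be targeted directly with a CSS selector. The sketch below reuses the three `h2` elements from that output as an inline string, so it runs without a request; `select_one()` returns the first match rather than a list.

```python
from bs4 import BeautifulSoup

# The three h2 elements copied from the output above.
html = """
<h2>What are the main math principles you need to know to get into Codeup’s Data Science program?</h2>
<h2>Latest Blog Articles</h2>
<h2 class="et_pb_module_header">Get Program Details &amp; Pricing</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# All h2 headings as plain text.
headings = [h2.get_text() for h2 in soup.select("h2")]

# Only the h2 carrying the et_pb_module_header class.
module_header = soup.select_one("h2.et_pb_module_header").get_text()

print(len(headings))
print(module_header)
```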

In [45]:
soup.select('header')

[<header id="dm-header">
 <div class="dm-header-cont">
 <div class="dm-branding">
 <a href="https://codeup.com/">
 <img alt="Codeup" class="main-logo normal-logo" id="dm-logo" src="https://199lj33nqk3p88xz03dvn481-wpengine.netdna-ssl.com/wp-content/uploads/2021/08/CodeupFullColorLogo.png" title="">
 </img></a>
 </div>
 <div class="dm-search">
 <div id="et_top_search_mob">
 <span id="et_search_icon"></span>
 <div class="dm-search-box" style="opacity: 0;">
 <form action="https://codeup.com/" class="et-search-form" method="get" role="search">
 <input class="et-search-field" name="s" placeholder="Search …" title="Search for:" type="search" value=""/> </form>
 <span class="close"></span>
 </div>
 </div>
 </div>
 </div>
 </header>,
 <header class="et-l et-l--header">
 <div class="et_builder_inner_content et_pb_gutters3">
 <div class="et_pb_section et_pb_section_0_tb_header et_pb_with_background et_section_regular">
 <div class="et_pb_row et_pb_row_0_tb_header">
 <div class="et_pb_column et_p