## Web Scraping Exercise

The purpose of this exercise is to use *BeautifulSoup* to extract information from this article: https://www.usatoday.com/story/money/business/2018/09/13/mcdonalds-states-most-stores/37748287/.

Unfortunately, the information we want is not stored in a table and is not formatted in a way that makes it easy to extract, so it will take some work before you can do any analysis.

Your objective is to create a pandas DataFrame containing all 50 states and the four metrics from the article: number of McDonald's per 100,000 residents, adult obesity rate, percentage consuming vegetables less than daily, and median household income.

In [9]:
import requests
from bs4 import BeautifulSoup as bs
import re
import pandas as pd

### Step 1: Use _requests_ to Fetch the Article and Convert It to Soup with _BeautifulSoup_

In [10]:
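# Your code here
url = 'https://www.usatoday.com/story/money/business/2018/09/13/mcdonalds-states-most-stores/37748287/'
# The timeout stops the request from hanging; raise_for_status() raises on HTTP errors such as 403 or 404.
response = requests.get(url, timeout=30)
response.raise_for_status()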

### Step 2: Extract the State Names and Metrics

**A.** Using whatever method you like, extract the states as a list named `states`, in the same order in which they appear in the article.

In [12]:
# Your code here
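# Name the parser explicitly; otherwise BeautifulSoup picks one itself and emits a warning.
soup = bs(response.text, 'html.parser')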

**B.** Now, extract the other four variables as lists named `McD`, `obesity`, `veggies`, and `income`. Make sure they are in the same order as `states`.
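The metrics will come out of the article's paragraphs as text, so they need converting to numbers before any analysis. The helpers below are a minimal sketch, assuming each value appears as text such as `$61,043` or `26.6%` after a label ending in a colon. The function names and the label wording in the demo are hypothetical, not taken from the article.

```python
import re

# Hypothetical pattern for a scraped number: optional sign, digits with optional
# thousands separators, optional decimal part. "$" and "%" are simply skipped.
NUMBER = r"-?\d[\d,]*(?:\.\d+)?"

def parse_number(text: str) -> float:
    """Return the first number in text, ignoring $, %, and thousands separators."""
    match = re.search(NUMBER, text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group().replace(",", ""))

def extract_metric(text: str, label: str) -> float:
    """Return the number that follows `label` in text (label wording is an assumption)."""
    # re.escape keeps characters such as "." or "'" in the label literal.
    pattern = re.escape(label) + r"\s*\$?(" + NUMBER + r")"
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"label {label!r} not found in {text!r}")
    return float(match.group(1).replace(",", ""))

print(parse_number("$61,043"))  # 61043.0
print(extract_metric("Adult obesity rate: 35.7%", "Adult obesity rate:"))  # 35.7
```

Raising `ValueError` instead of returning `None` makes a missing label fail loudly, which matters here because a silently skipped state would shift every later value out of alignment with `states`.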

In [13]:
# Your code here
print(soup.prettify())


In [19]:
# Each state heading is an <h3 class="gnt_ar_b_h2"> tag; keep the first 50.
soup.find_all('h3', attrs={'class' : "gnt_ar_b_h2"})[0:50]


[<h3 class="gnt_ar_b_h2">50. Rhode Island</h3>,
 <h3 class="gnt_ar_b_h2">49. New Jersey</h3>,
 <h3 class="gnt_ar_b_h2">48. New York</h3>,
 <h3 class="gnt_ar_b_h2">47. California</h3>,
 <h3 class="gnt_ar_b_h2">46. North Dakota</h3>,
 <h3 class="gnt_ar_b_h2">45. South Dakota</h3>,
 <h3 class="gnt_ar_b_h2">44. Massachusetts</h3>,
 <h3 class="gnt_ar_b_h2">43. Washington</h3>,
 <h3 class="gnt_ar_b_h2">42. Idaho</h3>,
 <h3 class="gnt_ar_b_h2">41. Utah</h3>,
 <h3 class="gnt_ar_b_h2">40. Colorado</h3>,
 <h3 class="gnt_ar_b_h2">39. Pennsylvania</h3>,
 <h3 class="gnt_ar_b_h2">38. Delaware</h3>,
 <h3 class="gnt_ar_b_h2">37. Connecticut</h3>,
 <h3 class="gnt_ar_b_h2">36. Oregon</h3>,
 <h3 class="gnt_ar_b_h2">35. New Hampshire</h3>,
 <h3 class="gnt_ar_b_h2">34. Minnesota</h3>,
 <h3 class="gnt_ar_b_h2">33. Nebraska</h3>,
 <h3 class="gnt_ar_b_h2">32. Arizona</h3>,
 <h3 class="gnt_ar_b_h2">31. Vermont</h3>,
 <h3 class="gnt_ar_b_h2">30. Alaska</h3>,
 <h3 class="gnt_ar_b_h2">29. Texas</h3>,
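 <h3 class="gnt_ar_b_h2">28. Florida</h3>,
 <h3 class="gnt_ar_b_h2">27. Georgia</h3>,
 <h3 class="gnt_ar_b_h2">26. Montana</h3>,
 <h3 class="gnt_ar_b_h2">25. South Carolina</h3>,
 <h3 class="gnt_ar_b_h2">24. Maine</h3>,
 <h3 class="gnt_ar_b_h2">23. Virginia</h3>,
 <h3 class="gnt_ar_b_h2">22. Iowa</h3>,
 <h3 class="gnt_ar_b_h2">21. North Carolina</h3>,
 <h3 class="gnt_ar_b_h2">20. Maryland</h3>,
 <h3 class="gnt_ar_b_h2">19. Nevada</h3>,
 <h3 class="gnt_ar_b_h2">18. Mississippi</h3>,
 <h3 class="gnt_ar_b_h2">17. Tennessee</h3>,
 <h3 class="gnt_ar_b_h2">16. New Mexico</h3>,
 <h3 class="gnt_ar_b_h2">15. Wyoming</h3>,
 <h3 class="gnt_ar_b_h2">14. Alabama</h3>,
 <h3 class="gnt_ar_b_h2">13. Kansas</h3>,
 <h3 class="gnt_ar_b_h2">12. Louisiana</h3>,
 <h3 class="gnt_ar_b_h2">11. Missouri</h3>,
 <h3 class="gnt_ar_b_h2">10. Wisconsin</h3>,
 <h3 class="gnt_ar_b_h2">9. Hawaii</h3>,
 <h3 class="gnt_ar_b_h2">8. Illinois</h3>,
 <h3 class="gnt_ar_b_h2">7. Oklahoma</h3>,
 <h3 class="gnt_ar_b_h2">6. Indiana</h3>,
 <h3 class="gnt_ar_b_h2">5. Ohio</h3>,
 <h3 class="gnt_ar_b_h2">4. Michigan</h3>,
 <h3 class="gnt_ar_b_h2">3. Kentucky</h3>,
 <h3 class="gnt_ar_b_h2">2. West Virginia</h3>,
 <h3 class="gnt_ar_b_h2">1. Arkansas</h3>]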

In [24]:
for a in soup.find_all('h3', attrs={'class': "gnt_ar_b_h2"})[0:50]:
    print(a.string)

50. Rhode Island
49. New Jersey
48. New York
47. California
46. North Dakota
45. South Dakota
44. Massachusetts
43. Washington
42. Idaho
41. Utah
40. Colorado
39. Pennsylvania
38. Delaware
37. Connecticut
36. Oregon
35. New Hampshire
34. Minnesota
33. Nebraska
32. Arizona
31. Vermont
30. Alaska
29. Texas
28. Florida
27. Georgia
26. Montana
25. South Carolina
24. Maine
23. Virginia
22. Iowa
21. North Carolina
20. Maryland
19. Nevada
18. Mississippi
17. Tennessee
16. New Mexico
15. Wyoming
14. Alabama
13. Kansas
12. Louisiana
11. Missouri
10. Wisconsin
9. Hawaii
8. Illinois
7. Oklahoma
6. Indiana
5. Ohio
4. Michigan
3. Kentucky
2. West Virginia
1. Arkansas
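The loop above relies on `.string`, which works here because each heading holds a single piece of text. The sketch below runs on a made-up snippet (not the real page) to show the CSS-selector equivalent of the `find_all` call and why `get_text()` is the safer choice if a heading ever contains nested tags. The nested `<em>` is purely hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the article markup, modeled on the
# <h3 class="gnt_ar_b_h2"> headings shown above.
html = """
<h3 class="gnt_ar_b_h2">50. Rhode Island</h3>
<h3 class="gnt_ar_b_h2">49. New <em>Jersey</em></h3>
<h3 class="other">Not a state</h3>
"""
soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector; "h3.gnt_ar_b_h2" picks the same elements, in
# document order, as find_all("h3", attrs={"class": "gnt_ar_b_h2"}).
headings = soup.select("h3.gnt_ar_b_h2")

# .string is None when a tag has more than one child (the second heading),
# whereas get_text(" ", strip=True) joins all nested text with single spaces.
print([h.string for h in headings])
print([h.get_text(" ", strip=True) for h in headings])
```

Note that `get_text(strip=True)` without a separator would glue the pieces together as `49. NewJersey`, which is why the example passes `" "` explicitly.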


In [22]:
type(a)

bs4.element.Tag

In [25]:
states = []
for a in soup.find_all('h3', attrs={'class' : "gnt_ar_b_h2"})[0:50]:
    states.append(a.string)
print(states)

['50. Rhode Island', '49. New Jersey', '48. New York', '47. California', '46. North Dakota', '45. South Dakota', '44. Massachusetts', '43. Washington', '42. Idaho', '41. Utah', '40. Colorado', '39. Pennsylvania', '38. Delaware', '37. Connecticut', '36. Oregon', '35. New Hampshire', '34. Minnesota', '33. Nebraska', '32. Arizona', '31. Vermont', '30. Alaska', '29. Texas', '28. Florida', '27. Georgia', '26. Montana', '25. South Carolina', '24. Maine', '23. Virginia', '22. Iowa', '21. North Carolina', '20. Maryland', '19. Nevada', '18. Mississippi', '17. Tennessee', '16. New Mexico', '15. Wyoming', '14. Alabama', '13. Kansas', '12. Louisiana', '11. Missouri', '10. Wisconsin', '9. Hawaii', '8. Illinois', '7. Oklahoma', '6. Indiana', '5. Ohio', '4. Michigan', '3. Kentucky', '2. West Virginia', '1. Arkansas']
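The entries still carry the rank prefix from the headings. A minimal sketch for separating the two, assuming every heading follows the `<rank>. <state>` pattern seen in the output above. `split_rank` is a hypothetical helper name.

```python
import re

# Assumed heading format: "<rank>. <state>", e.g. "50. Rhode Island".
RANKED = re.compile(r"^\s*(\d+)\.\s+(.+?)\s*$")

def split_rank(heading: str) -> tuple:
    """Split "50. Rhode Island" into (50, "Rhode Island")."""
    match = RANKED.match(heading)
    if match is None:
        raise ValueError(f"unexpected heading format: {heading!r}")
    return int(match.group(1)), match.group(2)

# First, second, and last entries from the list above.
headings = ["50. Rhode Island", "49. New Jersey", "1. Arkansas"]
ranks, names = zip(*(split_rank(h) for h in headings))
print(list(ranks))  # [50, 49, 1]
print(list(names))  # ['Rhode Island', 'New Jersey', 'Arkansas']
```

Applied to the notebook's list, `states = [split_rank(s)[1] for s in states]` keeps only the names, and the ranks remain available if you want them as an extra column.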


In [27]:
type(states)

list

In [28]:
soup.find_all('p', attrs={'class' : "gnt_ar_b_p"})

[<p class="gnt_ar_b_p"><asset-img uniqueid="247WallSt.com-247WS-491026-a0dcc61c"></asset-img></p>,
 <p class="gnt_ar_b_p">McDonald’s (NYSE: MCD) isn’t America’s largest fast food chain – there are only about 14,000 stores around the country, as opposed to more than 25,000 Subway outlets. It is, however, the largest hamburger chain by far – its nearest competitor, Burger King, has only about 7,500 stores – and the most iconic.</p>,
 <p class="gnt_ar_b_p">Using data collected from the McDonald’s website, 24/7 Wall Street has identified the number of McDonald’s locations in every state. The tally ranges from 25 in North Dakota to 1,295 in California. The concentration of McDonald's restaurants also varies considerably.</p>,
 <p class="gnt_ar_b_p">Five of the 10 states with the highest concentrations of McDonald’s per 100,000 residents – West Virginia, Arkansas, Kentucky, Oklahoma, and Michigan – are also among the 10 states with the highest rates of obesity. However, there is virtually no

In [48]:
# A compiled regex is matched with search(), so it finds the label even when the
# text node holds more than the label itself (a plain string must equal the whole
# node). The period is escaped so it matches literally.
mcd = soup.find(string=re.compile(r"No\. of McDonald's:"))
if mcd is None:
    print("Label not found; check the exact wording in the gnt_ar_b_p paragraphs.")
else:
    # .next is an attribute (the next parsed element), not a method; print the
    # parent tag's text to see the label together with its value.
    print(mcd.parent.get_text())

In [33]:
# mcd = soup.find(text=re.compile(r"No. of McDonald's:"))
# print(mcd.nextSibling)


None
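An exact-match lookup such as `soup.find(text="No. of McDonald's:")` returns `None` when the label shares its text node with other text, because a plain string must equal the whole node; `string=` is the newer name for the same `text=` argument. A compiled regex is applied with `search()` and matches inside longer text, and `find_all()` returns every match in document order. The markup below is hypothetical: putting the label and its value in one text node is an assumption about the article, so check the real paragraphs before relying on the pattern.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; the real article's structure may differ.
html = """
<p class="gnt_ar_b_p">No. of McDonald's: 25</p>
<p class="gnt_ar_b_p">No. of McDonald's: 1,295</p>
"""
soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the entire text node, so this finds nothing.
print(soup.find(string="No. of McDonald's:"))  # None

# A regex is tested with search(), so it matches within each longer node.
label = re.compile(r"No\. of McDonald's:\s*([\d,]+)")
values = [int(label.search(node).group(1).replace(",", ""))
          for node in soup.find_all(string=label)]
print(values)  # [25, 1295]
```

Because `find_all()` preserves document order, the resulting list lines up with `states` as long as every state's paragraph contains the label exactly once.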


### Step 3: Convert the Result to a pandas DataFrame

Once you have created a DataFrame, take a look at the results and see if there are any significant correlations between the variables.

In [None]:
# Your code here
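A minimal sketch of Step 3, assuming the four metric lists from Step 2B are already numeric and aligned with `states`. `build_frame` and the column names are hypothetical choices, not part of the assignment.

```python
import pandas as pd

def build_frame(states, McD, obesity, veggies, income):
    """Combine the five parallel lists from Step 2 into one DataFrame, one row per state."""
    lengths = {len(states), len(McD), len(obesity), len(veggies), len(income)}
    if len(lengths) != 1:
        # Fail early with the offending lengths if a list is missing states.
        raise ValueError(f"lists have different lengths: {sorted(lengths)}")
    return pd.DataFrame({
        "state": states,
        "mcd_per_100k": McD,
        "obesity_pct": obesity,
        "veggies_less_than_daily_pct": veggies,
        "median_income": income,
    })

# Once the real lists are filled in:
# df = build_frame(states, McD, obesity, veggies, income)
# print(df.describe())
# Pairwise correlations between the four numeric columns:
# print(df.drop(columns="state").corr())
```

`DataFrame.corr()` uses Pearson correlation by default; passing `method="spearman"` gives a rank-based alternative that is less sensitive to outliers such as unusually high incomes.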