# Level 2 - Beautiful Soup

---

# The Mission

Your company `SpiderLegion` has just signed a contract with an analytics company called `DashItUp`.
`DashItUp` is well known for its dashboards, which specialize in monitoring website metrics such as views, content shares, new users, and database errors!

While dashboards are nice, `DashItUp` now wants to spend some time on a new `summarize` feature.
`DashItUp` wants to run web crawlers against their dashboards to fetch the `key metrics` and print them off as a single report.

## Key Metrics

* User Count
* Any _system errors_, how recent?
    * System errors can be one of the following: `Database error`, `CPU overload`, or `Out of memory`
* Bounce Rate
* Top and bottom countries by utility
* Most recent user names with links to their profiles
* Name of the user that owns the dashboard

`DashItUp` has _many_ websites that use the same template (they all look the same). 
They believe that if you can write a web crawler for one, they should be able to apply the same code to the other dashboards they own to get similar results.

---

## Fetch The Website Contents

`DashItUp` was kind enough to give us a website to test against.
The website content is in a file called `website.html` in the `assets` folder.
We already have some code that is responsible for opening that file, reading it, and saving the contents to a variable called `website_contents`.

(The source HTML comes from the Analytics template at https://www.w3schools.com/w3css/w3css_templates.asp)

In [1]:
with open("../assets/website.html") as website_file:
    website_contents = website_file.read()
    
website_contents

'<!DOCTYPE html>\n<html>\n<title>DashItUp, A Dashboard</title>\n<meta charset="UTF-8">\n<meta name="viewport" content="width=device-width, initial-scale=1">\n<link rel="stylesheet" href="https://www.w3schools.com/w3css/4/w3.css">\n<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Raleway">\n<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">\n<style>\nhtml,body,h1,h2,h3,h4,h5 {font-family: "Raleway", sans-serif}\n</style>\n<body class="w3-light-grey">\n\n<!-- Top container -->\n<div class="w3-bar w3-top w3-black w3-large" style="z-index:4">\n  <button class="w3-bar-item w3-button w3-hide-large w3-hover-none w3-hover-text-light-grey" onclick="w3_open();"><i class="fa fa-bars"></i> \xa0Menu</button>\n  <span class="w3-bar-item w3-right">Logo</span>\n</div>\n\n<!-- Sidebar/menu -->\n<nav class="w3-sidebar w3-collapse w3-white w3-animate-left" style="z-index:3;width:300px;" id="mySidebar"><br>\n  <div class="w

What a jumbled mess!
It is nearly impossible to understand what is going on here without some hardcore `HTML` understanding...
Unless we visualize it!

In [2]:
# In jupyter, you can visualize raw HTML using these two functions!
# It is essentially "embedding" the website content within the notebook
from IPython.display import display, HTML

display(HTML(website_contents))

0,1,2
,"New record, over 90 views.",10 mins
,Database error.,15 mins
,"New record, over 40 users.",17 mins
,New comments.,25 mins
,Check transactions.,28 mins
,CPU overload.,35 mins
,New shares.,39 mins

Country,Utility
France,1.5%
UK,15.7%
United States,65%
Russia,5.6%
Spain,2.1%
India,1.9%


---

## Get to Work!

### Import the tools needed

In [3]:
from bs4 import BeautifulSoup
import pandas as pd

---

## Create the Soup!

In [4]:
# code here
soup = BeautifulSoup(website_contents, 'html.parser')
soup.title

<title>DashItUp, A Dashboard</title>

---

## User Count

In [6]:
# code here
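# The full class attribute is "w3-row-padding w3-margin-bottom da-dashboardCards";
# matching on the single class "da-dashboardCards" is enough.
# attrs takes a DICTIONARY (curly brackets) mapping attribute names to values.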
dashboard_cards_soup = soup.find('div', attrs={'class': 'da-dashboardCards'})
dashboard_cards_soup

<div class="w3-row-padding w3-margin-bottom da-dashboardCards">
<div class="w3-quarter">
<div class="w3-container w3-red w3-padding-16">
<div class="w3-left"><i class="fa fa-comment w3-xxxlarge"></i></div>
<div class="w3-right">
<h3 class="da-dashboardCardMetric">52</h3>
</div>
<div class="w3-clear"></div>
<h4 class="da-dashboardCardLabel">Messages</h4>
</div>
</div>
<div class="w3-quarter">
<div class="w3-container w3-blue w3-padding-16">
<div class="w3-left"><i class="fa fa-eye w3-xxxlarge"></i></div>
<div class="w3-right">
<h3 class="da-dashboardCardMetric">99</h3>
</div>
<div class="w3-clear"></div>
<h4 class="da-dashboardCardLabel">Views</h4>
</div>
</div>
<div class="w3-quarter">
<div class="w3-container w3-teal w3-padding-16">
<div class="w3-left"><i class="fa fa-share-alt w3-xxxlarge"></i></div>
<div class="w3-right">
<h3 class="da-dashboardCardMetric">23</h3>
</div>
<div class="w3-clear"></div>
<h4 class="da-dashboardCardLabel">Shares</h4>
</div>
</div>
<div class="w3-quarter"

In [8]:
# recursive=False returns only the direct children (one element per card)
dashboard_cards = dashboard_cards_soup.find_all(recursive=False)
dashboard_cards

[<div class="w3-quarter">
 <div class="w3-container w3-red w3-padding-16">
 <div class="w3-left"><i class="fa fa-comment w3-xxxlarge"></i></div>
 <div class="w3-right">
 <h3 class="da-dashboardCardMetric">52</h3>
 </div>
 <div class="w3-clear"></div>
 <h4 class="da-dashboardCardLabel">Messages</h4>
 </div>
 </div>,
 <div class="w3-quarter">
 <div class="w3-container w3-blue w3-padding-16">
 <div class="w3-left"><i class="fa fa-eye w3-xxxlarge"></i></div>
 <div class="w3-right">
 <h3 class="da-dashboardCardMetric">99</h3>
 </div>
 <div class="w3-clear"></div>
 <h4 class="da-dashboardCardLabel">Views</h4>
 </div>
 </div>,
 <div class="w3-quarter">
 <div class="w3-container w3-teal w3-padding-16">
 <div class="w3-left"><i class="fa fa-share-alt w3-xxxlarge"></i></div>
 <div class="w3-right">
 <h3 class="da-dashboardCardMetric">23</h3>
 </div>
 <div class="w3-clear"></div>
 <h4 class="da-dashboardCardLabel">Shares</h4>
 </div>
 </div>,
 <div class="w3-quarter">
 <div class="w3-container w3

In [9]:
dashboard_cards[-1]

<div class="w3-quarter">
<div class="w3-container w3-orange w3-text-white w3-padding-16">
<div class="w3-left"><i class="fa fa-users w3-xxxlarge"></i></div>
<div class="w3-right">
<h3 class="da-dashboardCardMetric">50</h3>
</div>
<div class="w3-clear"></div>
<h4 class="da-dashboardCardLabel">Users</h4>
</div>
</div>

In [11]:
dashboard_cards[-1].find('h3', attrs={'class': 'da-dashboardCardMetric'})

<h3 class="da-dashboardCardMetric">50</h3>

In [12]:
dashboard_cards[-1].find('h3', attrs={'class': 'da-dashboardCardMetric'}).text

'50'
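Indexing with `dashboard_cards[-1]` relies on the `Users` card always being last. Since the same code is meant to run against other dashboards, a lookup by card label may be more robust. The following is a minimal sketch under that assumption; the helper name `get_card_metric` and the trimmed `sample_html` are hypothetical, and only the class names come from the HTML shown above.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the real dashboard HTML (hypothetical sample data)
sample_html = """
<div class="w3-row-padding w3-margin-bottom da-dashboardCards">
  <div class="w3-quarter">
    <div class="w3-container">
      <div class="w3-right"><h3 class="da-dashboardCardMetric">99</h3></div>
      <h4 class="da-dashboardCardLabel">Views</h4>
    </div>
  </div>
  <div class="w3-quarter">
    <div class="w3-container">
      <div class="w3-right"><h3 class="da-dashboardCardMetric">50</h3></div>
      <h4 class="da-dashboardCardLabel">Users</h4>
    </div>
  </div>
</div>
"""


def get_card_metric(soup, label):
    """Return the metric of the card whose label matches, or None if absent."""
    cards = soup.find('div', attrs={'class': 'da-dashboardCards'})
    for card in cards.find_all('div', recursive=False):
        card_label = card.find('h4', attrs={'class': 'da-dashboardCardLabel'})
        if card_label is not None and card_label.text.strip() == label:
            metric = card.find('h3', attrs={'class': 'da-dashboardCardMetric'})
            return int(metric.text.strip())
    return None


sample_soup = BeautifulSoup(sample_html, 'html.parser')
user_count = get_card_metric(sample_soup, 'Users')
print(user_count)
```

Converting the text to `int` assumes the metric is always a plain whole number; a dashboard that shows values like `1.2k` would need extra parsing.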

---

## Any _system errors_, how recent?
System errors can be one of the following: 

* `Database error`
* `CPU overload`
* `Out of memory`

In [13]:
# code here
feeds = soup.find('div', attrs={'class': 'da-feeds'})
feeds

<div class="w3-twothird da-feeds">
<h5>Feeds</h5>
<table class="w3-table w3-striped w3-white">
<tr>
<td><i class="fa fa-user w3-text-blue w3-large"></i></td>
<td>New record, over 90 views.</td>
<td><i>10 mins</i></td>
</tr>
<tr>
<td><i class="fa fa-bell w3-text-red w3-large"></i></td>
<td>Database error.</td>
<td><i>15 mins</i></td>
</tr>
<tr>
<td><i class="fa fa-users w3-text-yellow w3-large"></i></td>
<td>New record, over 40 users.</td>
<td><i>17 mins</i></td>
</tr>
<tr>
<td><i class="fa fa-comment w3-text-red w3-large"></i></td>
<td>New comments.</td>
<td><i>25 mins</i></td>
</tr>
<tr>
<td><i class="fa fa-bookmark w3-text-blue w3-large"></i></td>
<td>Check transactions.</td>
<td><i>28 mins</i></td>
</tr>
<tr>
<td><i class="fa fa-laptop w3-text-red w3-large"></i></td>
<td>CPU overload.</td>
<td><i>35 mins</i></td>
</tr>
<tr>
<td><i class="fa fa-share-alt w3-text-green w3-large"></i></td>
<td>New shares.</td>
<td><i>39 mins</i></td>
</tr>
</table>
</div>

In [14]:
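from io import StringIO

# Newer pandas versions deprecate passing a literal HTML string, so wrap it in StringIO
feeds_dataframes = pd.read_html(StringIO(str(feeds)))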
feeds_dataframes

[    0                           1        2
 0 NaN  New record, over 90 views.  10 mins
 1 NaN             Database error.  15 mins
 2 NaN  New record, over 40 users.  17 mins
 3 NaN               New comments.  25 mins
 4 NaN         Check transactions.  28 mins
 5 NaN               CPU overload.  35 mins
 6 NaN                 New shares.  39 mins]

In [22]:
feeds_dataframe = feeds_dataframes[0]
feeds_dataframe

Unnamed: 0,icon,message,minutes
0,,"New record, over 90 views.",10 mins
1,,Database error.,15 mins
2,,"New record, over 40 users.",17 mins
3,,New comments.,25 mins
4,,Check transactions.,28 mins
5,,CPU overload.,35 mins
6,,New shares.,39 mins


In [23]:
feeds_dataframe.columns = ['icon', 'message', 'minutes']

In [24]:
# The icon column is always empty (NaN), so drop it
feeds_dataframe = feeds_dataframe.drop(columns=['icon'])

In [25]:

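# isin() is case-sensitive, so these must match the feed text exactly (including the trailing period)
error_messages = ['Database error.', 'CPU overload.', 'Out of memory.']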

feeds_dataframe[
    feeds_dataframe['message'].isin(error_messages)
]



Unnamed: 0,message,minutes
1,Database error.,15 mins
5,CPU overload.,35 mins
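The filter above answers "any system errors?", but not yet "how recent?". One possible way to answer the second part is to turn the `minutes` text into a number and sort on it. This is a sketch only: `feeds_sample` is a hand-built stand-in for the real `feeds_dataframe`, and the `minutes_ago` column name is an assumption made for illustration.

```python
import pandas as pd

# Hand-built stand-in for feeds_dataframe, using values from the feed shown above
feeds_sample = pd.DataFrame({
    'message': ['New record, over 90 views.', 'Database error.',
                'CPU overload.', 'New shares.'],
    'minutes': ['10 mins', '15 mins', '35 mins', '39 mins'],
})

error_messages = ['Database error.', 'CPU overload.', 'Out of memory.']

# Keep only the system errors; copy so adding a column does not touch the original
errors = feeds_sample[feeds_sample['message'].isin(error_messages)].copy()

# Pull the leading number out of strings like "15 mins"
errors['minutes_ago'] = (
    errors['minutes'].str.extract(r'(\d+)', expand=False).astype(int)
)

# The smallest number of minutes is the most recent error
most_recent_error = errors.sort_values('minutes_ago').iloc[0]
print(most_recent_error['message'], most_recent_error['minutes_ago'])
```

This assumes every entry is expressed in minutes; a feed that mixes units such as hours or days would need a conversion step first.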


---

## Bounce Rate

In [None]:
# code here

---

## Top and bottom countries by utility

In [None]:
# code here
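As a starting point, the ranking step can be sketched on its own. The markup of the country table is not shown in this notebook, so the example below skips the scraping step and builds a DataFrame by hand from the Country/Utility values in the rendered dashboard above. The `utility_pct` column name is an assumption made for illustration.

```python
import pandas as pd

# Values copied from the rendered Country/Utility table; in the real solution
# these rows would come from the dashboard HTML instead
countries = pd.DataFrame({
    'Country': ['France', 'UK', 'United States', 'Russia', 'Spain', 'India'],
    'Utility': ['1.5%', '15.7%', '65%', '5.6%', '2.1%', '1.9%'],
})

# Strip the percent sign so the values can be compared numerically
countries['utility_pct'] = countries['Utility'].str.rstrip('%').astype(float)

# Rank from highest to lowest utility
ranked = countries.sort_values('utility_pct', ascending=False)

top_country = countries.loc[countries['utility_pct'].idxmax(), 'Country']
bottom_country = countries.loc[countries['utility_pct'].idxmin(), 'Country']
print(top_country, bottom_country)
```

Comparing the raw strings would sort `'65%'` and `'5.6%'` alphabetically rather than numerically, which is why the conversion to `float` comes first.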

---

## Most recent user names with links to their profiles

In [27]:
# code here
recent_users = soup.find('div', attrs={'class':'da-recentUsers'})
print(recent_users.prettify())

<div class="w3-container da-recentUsers">
 <h5>
  Recent Users
 </h5>
 <ul class="w3-ul w3-card-4 w3-white">
  <li class="w3-padding-16">
   <a href="#/profile/mike">
    <img class="w3-left w3-circle w3-margin-right" src="../assets/mike.png" style="width:35px"/>
    <span class="w3-xlarge">
     Mike
    </span>
    <br/>
   </a>
  </li>
  <li class="w3-padding-16">
   <a href="#/profile/jill">
    <img class="w3-left w3-circle w3-margin-right" src="../assets/jill.png" style="width:35px"/>
    <span class="w3-xlarge">
     Jill
    </span>
    <br/>
   </a>
  </li>
  <li class="w3-padding-16">
   <a href="#/profile/jane">
    <img class="w3-left w3-circle w3-margin-right" src="../assets/jane.png" style="width:35px"/>
    <span class="w3-xlarge">
     Jane
    </span>
    <br/>
   </a>
  </li>
 </ul>
</div>



In [29]:
recent_users.find_all('li')

[<li class="w3-padding-16">
 <a href="#/profile/mike">
 <img class="w3-left w3-circle w3-margin-right" src="../assets/mike.png" style="width:35px"/>
 <span class="w3-xlarge">Mike</span><br/>
 </a>
 </li>,
 <li class="w3-padding-16">
 <a href="#/profile/jill">
 <img class="w3-left w3-circle w3-margin-right" src="../assets/jill.png" style="width:35px"/>
 <span class="w3-xlarge">Jill</span><br/>
 </a>
 </li>,
 <li class="w3-padding-16">
 <a href="#/profile/jane">
 <img class="w3-left w3-circle w3-margin-right" src="../assets/jane.png" style="width:35px"/>
 <span class="w3-xlarge">Jane</span><br/>
 </a>
 </li>]

In [30]:
for user in recent_users.find_all('li'):
    print(user)
    print()

<li class="w3-padding-16">
<a href="#/profile/mike">
<img class="w3-left w3-circle w3-margin-right" src="../assets/mike.png" style="width:35px"/>
<span class="w3-xlarge">Mike</span><br/>
</a>
</li>

<li class="w3-padding-16">
<a href="#/profile/jill">
<img class="w3-left w3-circle w3-margin-right" src="../assets/jill.png" style="width:35px"/>
<span class="w3-xlarge">Jill</span><br/>
</a>
</li>

<li class="w3-padding-16">
<a href="#/profile/jane">
<img class="w3-left w3-circle w3-margin-right" src="../assets/jane.png" style="width:35px"/>
<span class="w3-xlarge">Jane</span><br/>
</a>
</li>



In [32]:
for user in recent_users.find_all('li'):
    a_tag = user.find('a')
    href = a_tag['href']
    print(href)
    print()

#/profile/mike

#/profile/jill

#/profile/jane



In [34]:
for user in recent_users.find_all('li'):
    span = user.find('span')
    username = span.text
    print(username)
    print()

Mike

Jill

Jane



In [35]:
for user in recent_users.find_all('li'):
    a_tag = user.find('a')
    href = a_tag['href']
    
    span = user.find('span')
    username = span.text
    
    print(username, href)
    print()

Mike #/profile/mike

Jill #/profile/jill

Jane #/profile/jane
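Printing works for exploring, but the final report will likely need the values as data. The loop above could be wrapped in a function that returns a list of dictionaries, roughly as sketched here. The function name `extract_recent_users`, the dictionary keys, and the trimmed `sample_html` are hypothetical; the `da-recentUsers` class comes from the HTML shown above.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the recent-users block (hypothetical sample data)
sample_html = """
<div class="w3-container da-recentUsers">
  <ul class="w3-ul w3-card-4 w3-white">
    <li><a href="#/profile/mike"><span class="w3-xlarge">Mike</span><br></a></li>
    <li><a href="#/profile/jill"><span class="w3-xlarge">Jill</span><br></a></li>
  </ul>
</div>
"""


def extract_recent_users(soup):
    """Return one dict per recent user with the user name and profile link."""
    container = soup.find('div', attrs={'class': 'da-recentUsers'})
    users = []
    for item in container.find_all('li'):
        a_tag = item.find('a')
        span = item.find('span')
        # Skip list items that do not follow the expected layout
        if a_tag is None or span is None:
            continue
        users.append({
            'username': span.text.strip(),
            'profile_link': a_tag.get('href'),
        })
    return users


sample_soup = BeautifulSoup(sample_html, 'html.parser')
recent = extract_recent_users(sample_soup)
print(recent)
```

A list of dictionaries can be passed straight to `pd.DataFrame(recent)` if a tabular view is preferred for the report.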



---

## Name of the user that owns the dashboard

In [None]:
# code here