# Spotlight - Beautiful Soup
Harish Varma Siravuri<br>
Z1795233
## Contents
* [Introduction](#Introduction)
* [Installation](#Installation)
* [Overview](#Overview)
* [Web Scraping with Beautiful Soup](#scraping)

## Introduction<a class="anchor" id="Introduction"></a>
Beautiful Soup is a Python library for extracting data from HTML and XML files. The name is a reference to the Mock Turtle's song in the tenth chapter of *Alice's Adventures in Wonderland*. The library is particularly useful for pulling data out of loosely structured web pages: it helps users parse a page, locate specific elements, clean up the HTML, and save the relevant data.

The Beautiful Soup documentation can be found at https://www.crummy.com/software/BeautifulSoup/bs4/doc/

## Installation<a class="anchor" id="Installation"></a>
Beautiful Soup can be installed by entering the following in the terminal:
```
pip install beautifulsoup4
```
If you are using Anaconda, enter the following in the Anaconda Prompt instead:
```
conda install -c anaconda beautifulsoup4
```

## Overview<a class="anchor" id="Overview"></a>
### Related Information
#### Components of a Web Page
Whenever a web browser visits a website, it makes a `GET` request to the server hosting the website. In response, it receives a set of files, most commonly HTML, CSS, JavaScript, and any media files that are displayed. For web scraping we focus primarily on the HTML component.

#### HTML
HTML is a markup language that tells a web browser how to render text and media resulting in the display of what users typically see when they visit a website. It consists of *tags* that specify various things including which group certain content belongs to and how it should be rendered.
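
To make the idea of tags concrete, here is a minimal sketch that parses a small hand-written HTML fragment and inspects one tag's name, attributes and text. The fragment, its `class` and `id` values, and the variable names are invented for illustration.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document used only for illustration
sample_html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p class="intro" id="first">Hello, <b>world</b>!</p>
  </body>
</html>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
paragraph = sample_soup.p

print(paragraph.name)        # the tag's name
print(paragraph.attrs)       # its attributes as a dictionary
print(paragraph.get_text())  # the text inside it, with nested tags removed
```

Note that `class` comes back as a list, because an HTML element can carry several classes at once.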

#### requests Library
Since the aim of web scraping is to automate the extraction of data from web pages, it helps immensely to make `GET` requests programmatically instead of using a web browser to fetch each page. The *requests* library in Python allows us to do just that.
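
As a rough sketch of how such a request might be wrapped, the helper below adds a timeout and raises on HTTP error codes. The function name `fetch_html` and the 10-second default are assumptions, not part of the original notebook. The second half builds a request without sending it, only to show the URL a `GET` with query parameters would hit.

```python
import requests


def fetch_html(url, timeout=10):
    """Fetch a page and return its body as text; raise on HTTP error codes."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.text


# Build a GET request without sending it, to see the URL that would be requested
prepared = requests.Request(
    "GET",
    "https://forecast.weather.gov/MapClick.php",
    params={"lat": "41.9314", "lon": "-88.7651"},
).prepare()
print(prepared.method, prepared.url)
```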

### BeautifulSoup object
`BeautifulSoup` is the object we create to hold a parsed web page. For example:

In [1]:
from bs4 import BeautifulSoup
import requests

html_page = requests.get('https://www.gutenberg.org/')
soup = BeautifulSoup(html_page.text, 'html.parser')
print(soup)

<!DOCTYPE html>

<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>Free eBooks | Project Gutenberg</title>
<link href="/gutenberg/style.css?v=1.1" rel="stylesheet"/>
<link href="/gutenberg/collapsible.css?1.1" rel="stylesheet"/>
<link href="/gutenberg/new_nav.css?v=1.321231" rel="stylesheet"/>
<link href="/gutenberg/pg-desktop-one.css" rel="stylesheet"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<meta content="books, ebooks, free, kindle, android, iphone, ipad" name="keywords">
<meta content="wucOEvSnj5kP3Ts_36OfP64laakK-1mVTg-ptrGC9io" name="google-site-verification"/>
<meta content="4WNaCljsE-A82vP_ih2H_UqXZvM" name="alexaVerifyID"/>
<link href="https://www.gnu.org/copyleft/fdl.html" rel="copyright">
<link href="/gutenberg/favicon.ico?v=1.1" rel="shortcut icon">
<meta content="Project Gutenberg" property="og:title"/>
<meta content="website" property="og:type"/>
<meta content="https://www.gutenberg.org/" property="og:url"/
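
Accented characters and curly quotes can come out garbled when a response is decoded with the wrong character set before parsing. The sketch below reproduces the effect on a made-up UTF-8 page and shows one possible remedy: handing Beautiful Soup the raw bytes so it can use the charset declared in the markup.

```python
from bs4 import BeautifulSoup

# Made-up page: UTF-8 bytes containing a curly apostrophe (\u2019) and an e-acute (\u00e9)
raw_bytes = (
    '<html><head><meta charset="utf-8"/></head>'
    "<body><p>the world\u2019s great literature: Pell\u00e9astres</p></body></html>"
).encode("utf-8")

# Decoding the bytes with the wrong charset produces garbled characters
garbled = BeautifulSoup(raw_bytes.decode("latin-1"), "html.parser")
print(garbled.p.get_text())

# Passing the raw bytes lets Beautiful Soup work out the encoding itself
clean = BeautifulSoup(raw_bytes, "html.parser")
print(clean.original_encoding)
print(clean.p.get_text())
```

With requests, the equivalent would be to pass `html_page.content` (bytes) instead of `html_page.text` (already decoded) to the `BeautifulSoup` constructor.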

### Accessing Tags
#### By Tag Names

In [3]:
print(soup.title)
print(soup.p)

<title>Free eBooks | Project Gutenberg</title>
<p><a href="/donate/">Donation</a></p>


#### By searching for occurrences of specific Tags

In [4]:
# Finding the first occurrence of a tag

print(soup.find('title'))
print(soup.find('p'))

<title>Free eBooks | Project Gutenberg</title>
<p><a href="/donate/">Donation</a></p>


In [5]:
# Finding all occurrences of a tag

print(soup.find_all('title'))
print(soup.find_all('p'))

[<title>Free eBooks | Project Gutenberg</title>]
[<p><a href="/donate/">Donation</a></p>, <p><strong>This is the new Project Gutenberg site</strong>  See the <a href="/help/new_website">new website</a> page for information about currently known issues, and how to report problems or suggest changes.</p>, <p>There is a *current issue* with the footer bar obscuring search results. We are aware of the issue, and are working to resolve it.</p>, <p>Choose among free epub and Kindle eBooks, download them or read them online. You will find the world’s great literature here, with focus on older works for which U.S. copyright has expired. Thousands of volunteers digitized and diligently proofread the eBooks, for you to enjoy.</p>, <p><i>Some of our latest eBooks</i> <a href="https://dev.gutenberg.org/browse/recent/last1">Click Here for more latest books!</a></p>, <p><strong>No fee or registration!</strong> Everything from Project Gutenberg is gratis, libre, and completely without cost to reade
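
The difference between `find` and `find_all` matters most when a tag is missing. The following sketch, on a made-up fragment, shows what each returns in that case and how `limit` caps the number of matches.

```python
from bs4 import BeautifulSoup

# Made-up fragment for illustration
doc = BeautifulSoup(
    "<div><p>one</p><p>two</p><p>three</p></div>", "html.parser"
)

first_p = doc.find("p")                 # first match, or None if there is none
all_p = doc.find_all("p")               # every match, as a list-like ResultSet
first_two = doc.find_all("p", limit=2)  # stop after two matches
missing = doc.find("table")             # no <table> here, so this is None
no_tables = doc.find_all("table")       # empty result, not an error

print(first_p, len(all_p), len(first_two), missing, len(no_tables))
```

Because `find` can return `None`, it is worth checking the result before calling methods such as `get_text()` on it.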

### Accessing Tag Content
#### Fetching all text without tags

In [6]:
print(soup.get_text())





Free eBooks | Project Gutenberg



























Menu▾



About
          ▾

▾


About Project Gutenberg
Collection Development
Contact Us
History & Philosophy
Permissions & License
Privacy Policy
Terms of Use



Search and Browse
      	  ▾

▾


Book Search
Bookshelves
Frequently Downloaded
Offline Catalogs



Help
          ▾

▾


All help topics →
Copyright Procedures
Errata, Fixes and Bug Reports
File Formats
Frequently Asked Questions
Policies →
Public Domain eBook Submission
Submitting Your Own Work
Tablets, Phones and eReaders
The Attic →


Donate










Donation







Welcome to Project Gutenberg
Project Gutenberg is a library of over 60,000 free eBooks

This is the new Project Gutenberg site  See the new website page for information about currently known issues, and how to report problems or suggest changes.
There is a *current issue* with the footer bar obscuring search results. We are aware of the issue, and are working to resolve it.

Choose among free epu

#### Fetching all text from a set of tags

In [10]:
for p_text in soup.find_all('p'):
    print(p_text.string)

Donation
None
There is a *current issue* with the footer bar obscuring search results. We are aware of the issue, and are working to resolve it.
Choose among free epub and Kindle eBooks, download them or read them online. You will find the world’s great literature here, with focus on older works for which U.S. copyright has expired. Thousands of volunteers digitized and diligently proofread the eBooks, for you to enjoy.
None
None
None
Project Gutenberg eBooks may be freely used in the United States because most are not protected by U.S. copyright law. They may not be free of copyright in other countries. Readers outside of the United States must check the copyright terms of their countries before accessing, downloading or redistributing eBooks. We also have a number of copyrighted titles, for which the copyright holder has given permission for unlimited non-commercial worldwide use.
None


In [11]:
for p_text in soup.find_all('p'):
    print(p_text.text)

Donation
This is the new Project Gutenberg site  See the new website page for information about currently known issues, and how to report problems or suggest changes.
There is a *current issue* with the footer bar obscuring search results. We are aware of the issue, and are working to resolve it.
Choose among free epub and Kindle eBooks, download them or read them online. You will find the world’s great literature here, with focus on older works for which U.S. copyright has expired. Thousands of volunteers digitized and diligently proofread the eBooks, for you to enjoy.
Some of our latest eBooks Click Here for more latest books!
No fee or registration! Everything from Project Gutenberg is gratis, libre, and completely without cost to readers. If you find Project Gutenberg useful, please consider a small donation, to help Project Gutenberg digitize more books, maintain its online presence, and improve Project Gutenberg programs and offerings. Other ways to help include digitizing, pro
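
The `None` values printed by `.string` are not errors: `.string` only returns a value when a tag has a single child to draw the text from, whereas `.text` (shorthand for `get_text()`) concatenates every nested string. A small sketch with two paragraphs modelled on the output above:

```python
from bs4 import BeautifulSoup

doc = BeautifulSoup(
    '<p><a href="/donate/">Donation</a></p>'
    "<p><strong>No fee!</strong> Everything is free.</p>",
    "html.parser",
)
simple, mixed = doc.find_all("p")

print(simple.string)     # one child (<a>) holding one string, so .string finds it
print(mixed.string)      # two children (<strong> and text), so .string is None
print(mixed.get_text())  # get_text() joins all nested strings
```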

#### Fetching all text from a set of tags of a certain class

In [12]:
for div_text in soup.find_all('div', class_='cover_title'):
    print(div_text.text)


Books and Printing; a Treasury for Typophiles


In the Garden of Delight


De spoorzoeker


Pelléastres


The Trinity Archive, Vol. I, No. 4, February 1888


Death disarmed of its sting


Sämtliche Werke 7 8: Der Jüngling


Flyvefisken »Prometheus«


Mirage For Planet X


A Manual of Photographic Chemistry: Including the Practice of the Collodion
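
The blank lines in the output come from whitespace inside each `div`. As a sketch of one way to tidy this, `get_text(strip=True)` trims each piece of text. The markup below is made up (loosely modelled on the cover titles), and the extra `featured` class is hypothetical, included to show that `class_` matches any element whose class list contains the given name.

```python
from bs4 import BeautifulSoup

# Made-up markup loosely modelled on the cover titles above
doc = BeautifulSoup(
    """
    <div class="cover_title">
      In the Garden of Delight
    </div>
    <div class="cover_title featured">
      De spoorzoeker
    </div>
    <div class="other">Not a title</div>
    """,
    "html.parser",
)

# class_ matches any tag whose class list contains the given name;
# strip=True trims the surrounding whitespace from each piece of text
titles = [
    div.get_text(strip=True)
    for div in doc.find_all("div", class_="cover_title")
]
print(titles)
```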



#### Fetching all hyperlinks from a webpage

In [13]:
# Finding all links in the page
for link in soup.find_all('a'):
    print(link)

<a class="no-hover" href="/" id="main_logo">
<img alt="Project Gutenberg" draggable="false" src="/gutenberg/pg-logo-129x80.png"/>
</a>
<a href="/about/">About
          <span class="drop-icon">▾</span>
</a>
<a href="/about/">About Project Gutenberg</a>
<a href="/policy/collection_development.html">Collection Development</a>
<a href="/about/contact_information.html">Contact Us</a>
<a href="/about/background/">History &amp; Philosophy</a>
<a href="/policy/permission.html">Permissions &amp; License</a>
<a href="/policy/privacy_policy.html">Privacy Policy</a>
<a href="/policy/terms_of_use.html">Terms of Use</a>
<a href="/ebooks/">Search and Browse
      	  <span class="drop-icon">▾</span>
</a>
<a href="/ebooks/">Book Search</a>
<a href="/ebooks/bookshelf/">Bookshelves</a>
<a href="/browse/scores/top">Frequently Downloaded</a>
<a href="/ebooks/offline_catalogs.html">Offline Catalogs</a>
<a href="/help/">Help
          <span class="drop-icon">▾</span>
</a>
<a href="/help/">All help topics →<

In [14]:
# Finding text from all links in the page
for link in soup.find_all('a'):
    print(link.text)




About
          ▾

About Project Gutenberg
Collection Development
Contact Us
History & Philosophy
Permissions & License
Privacy Policy
Terms of Use
Search and Browse
      	  ▾

Book Search
Bookshelves
Frequently Downloaded
Offline Catalogs
Help
          ▾

All help topics →
Copyright Procedures
Errata, Fixes and Bug Reports
File Formats
Frequently Asked Questions
Policies →
Public Domain eBook Submission
Submitting Your Own Work
Tablets, Phones and eReaders
The Attic →
Donate
Donation
new website






Books and Printing; a Treasury for Typophiles









In the Garden of Delight









De spoorzoeker









Pelléastres









The Trinity Archive, Vol. I, No. 4, February 1888









Death disarmed of its sting









Sämtliche Werke 7 8: Der Jüngling









Flyvefisken »Prometheus«









Mirage For Planet X









A Manual of Photographic Chemistry: Including the Practice of the Collodion



Click Here for more latest books!
Search and browse
Bookshelves
Fre

In [15]:
# Fetching the actual URLs from the anchor tags
for link in soup.find_all('a'):
    print(link.get('href'))

/
/about/
/about/
/policy/collection_development.html
/about/contact_information.html
/about/background/
/policy/permission.html
/policy/privacy_policy.html
/policy/terms_of_use.html
/ebooks/
/ebooks/
/ebooks/bookshelf/
/browse/scores/top
/ebooks/offline_catalogs.html
/help/
/help/
/help/copyright.html
/help/errata.html
/help/file_formats.html
/help/faq.html
/policy/
/help/public_domain_ebook_submission.html
/help/submitting_your_own_work.html
/help/mobile.html
/attic/
/donate/
/donate/
/help/new_website
/ebooks/63730
/ebooks/63729
/ebooks/63728
/ebooks/63726
/ebooks/63725
/ebooks/63724
/ebooks/63723
/ebooks/63722
/ebooks/63721
/ebooks/63710
https://dev.gutenberg.org/browse/recent/last1
/ebooks/
/ebooks/bookshelf/
/browse/scores/top
/ebooks/search/?sort_order=downloads
/ebooks/offline_catalogs.html
/ebooks/search/?query=&submit_search=Search&sort_order=release_date
http://self.gutenberg.org
/help/faq.html
/help/
/help/mobile.html
https://www.pgdp.net
/help/errata.html
https://librivox.
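
Most of the `href` values above are relative paths, so they need to be combined with the site's address before they can be requested. A minimal sketch using `urljoin` from the standard library, assuming `https://www.gutenberg.org/` as the base URL:

```python
from urllib.parse import urljoin

base_url = "https://www.gutenberg.org/"

# A few href values taken from the output above
hrefs = ["/about/", "/ebooks/63730", "https://www.pgdp.net", "/help/faq.html"]

# urljoin leaves absolute URLs alone and resolves relative ones against the base
absolute_urls = [urljoin(base_url, href) for href in hrefs]
for absolute_url in absolute_urls:
    print(absolute_url)
```

Anchors without an `href` attribute make `link.get('href')` return `None`; searching with `soup.find_all('a', href=True)` is one way to skip them.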

### Traversing BeautifulSoup objects
#### .contents

In [20]:
print(soup.head.contents)

['\n', <meta charset="utf-8"/>, '\n', <title>Free eBooks | Project Gutenberg</title>, '\n', <link href="/gutenberg/style.css?v=1.1" rel="stylesheet"/>, '\n', <link href="/gutenberg/collapsible.css?1.1" rel="stylesheet"/>, '\n', <link href="/gutenberg/new_nav.css?v=1.321231" rel="stylesheet"/>, '\n', <link href="/gutenberg/pg-desktop-one.css" rel="stylesheet"/>, '\n', <meta content="width=device-width, initial-scale=1" name="viewport"/>, '\n', <meta content="books, ebooks, free, kindle, android, iphone, ipad" name="keywords">
<meta content="wucOEvSnj5kP3Ts_36OfP64laakK-1mVTg-ptrGC9io" name="google-site-verification"/>
<meta content="4WNaCljsE-A82vP_ih2H_UqXZvM" name="alexaVerifyID"/>
<link href="https://www.gnu.org/copyleft/fdl.html" rel="copyright">
<link href="/gutenberg/favicon.ico?v=1.1" rel="shortcut icon">
<meta content="Project Gutenberg" property="og:title"/>
<meta content="website" property="og:type"/>
<meta content="https://www.gutenberg.org/" property="og:url"/>
<meta content

#### .children

In [21]:
for child in soup.head.children:
    print(child)



<meta charset="utf-8"/>


<title>Free eBooks | Project Gutenberg</title>


<link href="/gutenberg/style.css?v=1.1" rel="stylesheet"/>


<link href="/gutenberg/collapsible.css?1.1" rel="stylesheet"/>


<link href="/gutenberg/new_nav.css?v=1.321231" rel="stylesheet"/>


<link href="/gutenberg/pg-desktop-one.css" rel="stylesheet"/>


<meta content="width=device-width, initial-scale=1" name="viewport"/>


<meta content="books, ebooks, free, kindle, android, iphone, ipad" name="keywords">
<meta content="wucOEvSnj5kP3Ts_36OfP64laakK-1mVTg-ptrGC9io" name="google-site-verification"/>
<meta content="4WNaCljsE-A82vP_ih2H_UqXZvM" name="alexaVerifyID"/>
<link href="https://www.gnu.org/copyleft/fdl.html" rel="copyright">
<link href="/gutenberg/favicon.ico?v=1.1" rel="shortcut icon">
<meta content="Project Gutenberg" property="og:title"/>
<meta content="website" property="og:type"/>
<meta content="https://www.gutenberg.org/" property="og:url"/>
<meta content="Project Gutenberg is a library of free
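
Both `.contents` and `.children` include the newline strings that sit between tags, which is why the output is padded with blank lines. The sketch below, on a made-up `<head>`, keeps only the real tags by checking each child's type.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

doc = BeautifulSoup(
    '<head>\n<meta charset="utf-8"/>\n<title>Free eBooks</title>\n</head>',
    "html.parser",
)

# .children yields both Tag objects and the newline strings between them
all_children = list(doc.head.children)

# Keep only real tags, dropping the whitespace-only text nodes
tag_children = [child for child in doc.head.children if isinstance(child, Tag)]

print(len(all_children), [tag.name for tag in tag_children])
```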

#### .parent

In [18]:
print(soup.title.parent)

<head>
<meta charset="utf-8"/>
<title>Free eBooks | Project Gutenberg</title>
<link href="/gutenberg/style.css?v=1.1" rel="stylesheet"/>
<link href="/gutenberg/collapsible.css?1.1" rel="stylesheet"/>
<link href="/gutenberg/new_nav.css?v=1.321231" rel="stylesheet"/>
<link href="/gutenberg/pg-desktop-one.css" rel="stylesheet"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<meta content="books, ebooks, free, kindle, android, iphone, ipad" name="keywords">
<meta content="wucOEvSnj5kP3Ts_36OfP64laakK-1mVTg-ptrGC9io" name="google-site-verification"/>
<meta content="4WNaCljsE-A82vP_ih2H_UqXZvM" name="alexaVerifyID"/>
<link href="https://www.gnu.org/copyleft/fdl.html" rel="copyright">
<link href="/gutenberg/favicon.ico?v=1.1" rel="shortcut icon">
<meta content="Project Gutenberg" property="og:title"/>
<meta content="website" property="og:type"/>
<meta content="https://www.gutenberg.org/" property="og:url"/>
<meta content="Project Gutenberg is a library of free eBooks."

#### .parents

In [22]:
for parent in soup.title.parents:
    print(parent)

<head>
<meta charset="utf-8"/>
<title>Free eBooks | Project Gutenberg</title>
<link href="/gutenberg/style.css?v=1.1" rel="stylesheet"/>
<link href="/gutenberg/collapsible.css?1.1" rel="stylesheet"/>
<link href="/gutenberg/new_nav.css?v=1.321231" rel="stylesheet"/>
<link href="/gutenberg/pg-desktop-one.css" rel="stylesheet"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<meta content="books, ebooks, free, kindle, android, iphone, ipad" name="keywords">
<meta content="wucOEvSnj5kP3Ts_36OfP64laakK-1mVTg-ptrGC9io" name="google-site-verification"/>
<meta content="4WNaCljsE-A82vP_ih2H_UqXZvM" name="alexaVerifyID"/>
<link href="https://www.gnu.org/copyleft/fdl.html" rel="copyright">
<link href="/gutenberg/favicon.ico?v=1.1" rel="shortcut icon">
<meta content="Project Gutenberg" property="og:title"/>
<meta content="website" property="og:type"/>
<meta content="https://www.gutenberg.org/" property="og:url"/>
<meta content="Project Gutenberg is a library of free eBooks."
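
Printing every parent in full repeats most of the document. A lighter way to see the chain of ancestors is to print only their names, as in this sketch on a made-up page:

```python
from bs4 import BeautifulSoup

doc = BeautifulSoup(
    "<html><head><title>Free eBooks</title></head><body></body></html>",
    "html.parser",
)

# Walk upward from <title>; the last ancestor is the BeautifulSoup object itself
ancestor_names = [parent.name for parent in doc.title.parents]
print(ancestor_names)
```

The final entry, `[document]`, is the name Beautiful Soup gives to the top-level `BeautifulSoup` object.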

## Web Scraping with Beautiful Soup<a class="anchor" id="scraping"></a>
We will now scrape the 7-day weather forecast for DeKalb from the National Weather Service.

In [34]:
# Coordinates of the NIU CS building
url = 'https://forecast.weather.gov/MapClick.php?lat=41.9314&lon=-88.7651#.X636C2hKiUm'
req = requests.get(url)
soup = BeautifulSoup(req.text, 'html.parser')
print(soup)

<!DOCTYPE html>

<html class="no-js">
<head>
<!-- Meta -->
<meta content="width=device-width" name="viewport"/>
<link href="http://purl.org/dc/elements/1.1/" rel="schema.DC"/><title>National Weather Service</title><meta content="National Weather Service" name="DC.title"><meta content="NOAA National Weather Service National Weather Service" name="DC.description"/><meta content="US Department of Commerce, NOAA, National Weather Service" name="DC.creator"/><meta content="" name="DC.date.created" scheme="ISO8601"/><meta content="EN-US" name="DC.language" scheme="DCTERMS.RFC1766"/><meta content="weather, National Weather Service" name="DC.keywords"/><meta content="NOAA's National Weather Service" name="DC.publisher"/><meta content="National Weather Service" name="DC.contributor"/><meta content="http://www.weather.gov/disclaimer.php" name="DC.rights"/><meta content="General" name="rating"/><meta content="index,follow" name="robots"/>
<!-- Icons -->
<link href="./images/favicon.ico" rel="shor

In [57]:
seven_day_forecast_panel = soup.find(id="seven-day-forecast")
print(seven_day_forecast_panel)

<div class="panel panel-default" id="seven-day-forecast">
<div class="panel-heading">
<b>Extended Forecast for</b>
<h2 class="panel-title">
	    	    De Kalb IL	</h2>
</div>
<div class="panel-body" id="seven-day-forecast-body">
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Showers likely, mainly before 1am.  Cloudy early, then gradual clearing, with a low around 27. Blustery, with a south southwest wind 10 to 20 mph becoming northwest after midnight. Winds could gust as high as 30 mph.  Chance of precipitation is 60%. New precipitation amounts of less than a tenth of an inch possible. " class="forecast-icon" src="newimages/medium/nshra60.png" title="Tonight: Showers likely, mainly before 1am.  Cloudy early, then gradual clearing, with a low around 27. Blustery, with a south southwest wind 10 to 20 mph becomi

In [60]:
seven_day_forecast_items = seven_day_forecast_panel.find_all(class_="tombstone-container")
print(seven_day_forecast_items)

[<div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Showers likely, mainly before 1am.  Cloudy early, then gradual clearing, with a low around 27. Blustery, with a south southwest wind 10 to 20 mph becoming northwest after midnight. Winds could gust as high as 30 mph.  Chance of precipitation is 60%. New precipitation amounts of less than a tenth of an inch possible. " class="forecast-icon" src="newimages/medium/nshra60.png" title="Tonight: Showers likely, mainly before 1am.  Cloudy early, then gradual clearing, with a low around 27. Blustery, with a south southwest wind 10 to 20 mph becoming northwest after midnight. Winds could gust as high as 30 mph.  Chance of precipitation is 60%. New precipitation amounts of less than a tenth of an inch possible. "/></p><p class="short-desc">Showers<br/>Likely and<br/>Blustery</p><p class="temp temp-low">Low: 27 °F</p></div>, <div class="tombstone-container">
<p class="period-name">Friday<br/><br/

In [61]:
forecast_today = seven_day_forecast_items[0]
print(forecast_today.prettify())

<div class="tombstone-container">
 <p class="period-name">
  Tonight
  <br/>
  <br/>
 </p>
 <p>
  <img alt="Tonight: Showers likely, mainly before 1am.  Cloudy early, then gradual clearing, with a low around 27. Blustery, with a south southwest wind 10 to 20 mph becoming northwest after midnight. Winds could gust as high as 30 mph.  Chance of precipitation is 60%. New precipitation amounts of less than a tenth of an inch possible. " class="forecast-icon" src="newimages/medium/nshra60.png" title="Tonight: Showers likely, mainly before 1am.  Cloudy early, then gradual clearing, with a low around 27. Blustery, with a south southwest wind 10 to 20 mph becoming northwest after midnight. Winds could gust as high as 30 mph.  Chance of precipitation is 60%. New precipitation amounts of less than a tenth of an inch possible. "/>
 </p>
 <p class="short-desc">
  Showers
  <br/>
  Likely and
  <br/>
  Blustery
 </p>
 <p class="temp temp-low">
  Low: 27 °F
 </p>
</div>


In [62]:
period_name_today = forecast_today.find(class_="period-name").get_text()
short_desc_today = forecast_today.find(class_="short-desc").get_text()
temp_today = forecast_today.find(class_="temp").get_text()
print(period_name_today)
print(short_desc_today)
print(temp_today)

Tonight
ShowersLikely andBlustery
Low: 27 °F
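
The short description runs its words together because the `<br/>` tags that separated them contribute no text. Passing a separator to `get_text()` is one way to keep the pieces apart, sketched here on the markup shown earlier:

```python
from bs4 import BeautifulSoup

# Markup copied from the short description shown earlier
snippet = '<p class="short-desc">Showers<br/>Likely and<br/>Blustery</p>'
short_desc = BeautifulSoup(snippet, "html.parser").p

joined = short_desc.get_text()     # words on either side of <br/> run together
spaced = short_desc.get_text(" ")  # a separator keeps them apart

print(joined)
print(spaced)
```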


In [63]:
img = forecast_today.find("img")
long_desc_today = img['title']
print(long_desc_today)

Tonight: Showers likely, mainly before 1am.  Cloudy early, then gradual clearing, with a low around 27. Blustery, with a south southwest wind 10 to 20 mph becoming northwest after midnight. Winds could gust as high as 30 mph.  Chance of precipitation is 60%. New precipitation amounts of less than a tenth of an inch possible. 


In [66]:
period_tags = seven_day_forecast_panel.select(".tombstone-container .period-name")
periods = [pt.get_text() for pt in period_tags]
periods

['Tonight',
 'Friday',
 'FridayNight',
 'Saturday',
 'SaturdayNight',
 'Sunday',
 'SundayNight',
 'Monday',
 'MondayNight']
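
`select` takes a CSS selector, and a space between two selectors means "the second nested anywhere inside the first". The sketch below uses a trimmed-down, partly made-up copy of the forecast markup, with variable names chosen so they do not clash with the notebook's. It also passes a separator to `get_text()` so that names split by `<br/>` keep their space.

```python
from bs4 import BeautifulSoup

# Trimmed-down, partly made-up version of the forecast markup
sample_panel = BeautifulSoup(
    """
    <div id="seven-day-forecast">
      <div class="tombstone-container">
        <p class="period-name">Friday<br/>Night</p>
        <p class="temp temp-low">Low: 24 °F</p>
      </div>
      <div class="tombstone-container">
        <p class="period-name">Saturday<br/><br/></p>
        <p class="temp temp-high">High: 49 °F</p>
      </div>
    </div>
    """,
    "html.parser",
)

# "A B" selects every B nested anywhere inside an A
sample_tags = sample_panel.select(".tombstone-container .period-name")
sample_periods = [tag.get_text(" ", strip=True) for tag in sample_tags]

# select_one returns only the first match (or None if nothing matches)
first_temp = sample_panel.select_one(".tombstone-container .temp").get_text()

print(sample_periods, first_temp)
```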

In [67]:
short_descs = [sd.get_text() for sd in seven_day_forecast_panel.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day_forecast_panel.select(".tombstone-container .temp")]
descs = [d["title"] for d in seven_day_forecast_panel.select(".tombstone-container img")]
print(short_descs)
print(temps)
print(descs)

['ShowersLikely andBlustery', 'Sunny', 'Mostly Clear', 'Breezy.Slight ChanceShowers thenChanceShowers', 'Breezy.Showers thenChanceShowers', 'Partly Sunnyand Breezy', 'Partly Cloudyand Breezythen PartlyCloudy', 'Sunny', 'Partly Cloudy']
['Low: 27 °F', 'High: 38 °F', 'Low: 24 °F', 'High: 49 °F', 'Low: 41 °F', 'High: 45 °F', 'Low: 28 °F', 'High: 44 °F', 'Low: 27 °F']
['Tonight: Showers likely, mainly before 1am.  Cloudy early, then gradual clearing, with a low around 27. Blustery, with a south southwest wind 10 to 20 mph becoming northwest after midnight. Winds could gust as high as 30 mph.  Chance of precipitation is 60%. New precipitation amounts of less than a tenth of an inch possible. ', 'Friday: Sunny, with a high near 38. Northwest wind 5 to 15 mph, with gusts as high as 25 mph. ', 'Friday Night: Mostly clear, with a low around 24. West southwest wind 5 to 10 mph becoming south southeast after midnight. ', 'Saturday: A 50 percent chance of showers, mainly after noon.  Increasing cl

In [46]:
import pandas as pd
weather = pd.DataFrame({
    "period": periods,
    "short_desc": short_descs,
    "temp": temps,
    "desc":descs
})
weather

| | period | short_desc | temp | desc |
|---|---|---|---|---|
| 0 | Tonight | ShowersLikely andBlustery | Low: 27 °F | Tonight: Showers likely, mainly before 1am. C... |
| 1 | Friday | Sunny | High: 38 °F | Friday: Sunny, with a high near 38. Northwest ... |
| 2 | FridayNight | Mostly Clear | Low: 24 °F | Friday Night: Mostly clear, with a low around ... |
| 3 | Saturday | Breezy.Slight ChanceShowers thenChanceShowers | High: 49 °F | Saturday: A 50 percent chance of showers, main... |
| 4 | SaturdayNight | Breezy.Showers thenChanceShowers | Low: 41 °F | Saturday Night: Showers and possibly a thunder... |
| 5 | Sunday | Partly Sunnyand Breezy | High: 45 °F | Sunday: Partly sunny, with a high near 45. Bre... |
| 6 | SundayNight | Partly Cloudyand Breezythen PartlyCloudy | Low: 28 °F | Sunday Night: Partly cloudy, with a low around... |
| 7 | Monday | Sunny | High: 44 °F | Monday: Sunny, with a high near 44. |
| 8 | MondayNight | Partly Cloudy | Low: 27 °F | Monday Night: Partly cloudy, with a low around... |


In [55]:
temp_nums = weather["temp"].str.split(' ').str[1]
weather["temp_num"] = temp_nums.astype('int')
temp_nums

0    27
1    38
2    24
3    49
4    41
5    45
6    28
7    44
8    27
Name: temp, dtype: object

In [69]:
weather['temp_num'].mean()

35.888888888888886
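
The overall mean mixes daytime highs and overnight lows. As a possible refinement, the sketch below uses a regular expression to pull out both the label and the number so the two groups can be averaged separately. It works on the temperature strings copied from the output above and uses only the standard library. The pattern is an assumption about the `High: 38 °F` format and is not taken from the original notebook.

```python
import re
from statistics import mean

# Temperature strings copied from the output above
temp_strings = [
    "Low: 27 °F", "High: 38 °F", "Low: 24 °F", "High: 49 °F", "Low: 41 °F",
    "High: 45 °F", "Low: 28 °F", "High: 44 °F", "Low: 27 °F",
]

# Capture the label and a possibly negative whole-number temperature
pattern = re.compile(r"(High|Low):\s*(-?\d+)")

highs, lows = [], []
for text in temp_strings:
    match = pattern.search(text)
    if match is None:
        continue  # skip anything that does not look like "High: 38 °F"
    label, value = match.group(1), int(match.group(2))
    (highs if label == "High" else lows).append(value)

print(mean(highs), mean(lows), mean(highs + lows))
```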