# Downloading weather data

In [2]:
import requests
from bs4 import BeautifulSoup
page = requests.get("http://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168")
soup = BeautifulSoup(page.content, 'html.parser')
seven_day = soup.find(id="seven-day-forecast")
forecast_items = seven_day.find_all(class_="tombstone-container")
tonight = forecast_items[0]
print(tonight.prettify())

<div class="tombstone-container">
 <p class="period-name">
  Today
  <br/>
  <br/>
 </p>
 <p>
  <img alt="Today: Patchy fog before 9am.  Otherwise, mostly cloudy, then gradually becoming sunny, with a high near 59. Light west northwest wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="DualImage.php?i=fg&amp;j=few" title="Today: Patchy fog before 9am.  Otherwise, mostly cloudy, then gradually becoming sunny, with a high near 59. Light west northwest wind becoming west 5 to 10 mph in the afternoon. "/>
 </p>
 <p class="short-desc">
  Patchy Fog
  <br/>
  then Sunny
 </p>
 <p class="temp temp-high">
  High: 59 °F
 </p>
</div>
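The live forecast page changes over time and needs network access, so it can help to test the parsing steps against a saved snippet. The sketch below assumes the markup printed above is representative. `SAMPLE_HTML` is a hypothetical, hand-trimmed copy of that structure, not the full page.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-trimmed copy of the forecast markup printed above.
SAMPLE_HTML = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Today<br/><br/></p>
    <p><img class="forecast-icon" alt="Today: Sunny." title="Today: Sunny." src="a.png"/></p>
    <p class="short-desc">Patchy Fog<br/>then Sunny</p>
    <p class="temp temp-high">High: 59 °F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Tonight<br/><br/></p>
    <p><img class="forecast-icon" alt="Tonight: Clear." title="Tonight: Clear." src="b.png"/></p>
    <p class="short-desc">Mostly Clear</p>
    <p class="temp temp-low">Low: 46 °F</p>
  </div>
</div>
"""

# Same calls as the notebook, but parsing a local string instead of page.content.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
seven_day = soup.find(id="seven-day-forecast")
forecast_items = seven_day.find_all(class_="tombstone-container")

first_period = forecast_items[0].find(class_="period-name").get_text()
print(len(forecast_items))  # 2
print(first_period)         # Today
```

Once the parsing logic works on the snippet, the same code can be pointed back at `page.content`.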


# Extracting information from the page

In [3]:
period = tonight.find(class_="period-name").get_text()
short_desc = tonight.find(class_="short-desc").get_text()
temp = tonight.find(class_="temp").get_text()
print(period)
print(short_desc)
print(temp)

Today
Patchy Fogthen Sunny
High: 59 °F
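The `Patchy Fogthen Sunny` output above appears because `get_text()` joins the text on either side of the `<br/>` tag with an empty string. BeautifulSoup's `get_text` accepts `separator` and `strip` arguments, so one way to keep the words apart is sketched below, using the short-desc markup printed earlier:

```python
from bs4 import BeautifulSoup

# Markup of the "short-desc" paragraph printed earlier.
snippet = '<p class="short-desc">Patchy Fog<br/>then Sunny</p>'
tag = BeautifulSoup(snippet, "html.parser").p

# Default: the text fragments are concatenated with no separator.
plain = tag.get_text()
# separator=" " inserts a space between fragments; strip=True trims each one.
spaced = tag.get_text(separator=" ", strip=True)

print(repr(plain))   # 'Patchy Fogthen Sunny'
print(repr(spaced))  # 'Patchy Fog then Sunny'
```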


In [4]:
img = tonight.find("img")
desc = img['title']
print(desc)

Today: Patchy fog before 9am.  Otherwise, mostly cloudy, then gradually becoming sunny, with a high near 59. Light west northwest wind becoming west 5 to 10 mph in the afternoon. 
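`img['title']` raises a `KeyError` if the attribute is missing, which can happen if the page markup changes. A slightly more defensive sketch uses the tag's `get` method, which returns a default instead. Falling back to `alt` is an assumption based on the markup above, where `alt` and `title` carry the same text; the snippet is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical <img> that has an alt attribute but no title.
snippet = '<p><img class="forecast-icon" src="a.png" alt="Tonight: Clear."/></p>'
img = BeautifulSoup(snippet, "html.parser").find("img")

# Dictionary-style access fails when the attribute is absent...
try:
    desc = img["title"]
except KeyError:
    desc = None

# ...whereas .get() takes a default, here the alt text.
fallback = img.get("title", img.get("alt", ""))
print(desc, fallback)  # None Tonight: Clear.
```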


# Extracting all the information from the page
Now that we know how to extract each individual piece of information, we can combine CSS selectors with list comprehensions to extract everything at once.

In the code below, we:

- Select all elements with the class `period-name` inside an element with the class `tombstone-container` in `seven_day`.
- Use a list comprehension to call the `get_text` method on each BeautifulSoup object.

In [7]:
period_tags = seven_day.select(".tombstone-container .period-name")
periods = [pt.get_text() for pt in period_tags]
periods

['Today',
 'Tonight',
 'Saturday',
 'SaturdayNight',
 'Sunday',
 'SundayNight',
 'Monday',
 'MondayNight',
 'Tuesday']
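Names like `SaturdayNight` appear because the page puts a `<br/>` between the two words, and `get_text()` joins the pieces with no separator. The sketch below combines the same descendant selector with `get_text(separator=" ", strip=True)`. The markup is a hypothetical trimmed sample; the stray `period-name` paragraph outside any container is there only to show that the selector skips it.

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed markup; the last <p> is not inside a tombstone-container.
html = """
<div id="seven-day-forecast">
  <div class="tombstone-container"><p class="period-name">Saturday<br/><br/></p></div>
  <div class="tombstone-container"><p class="period-name">Saturday<br/>Night</p></div>
  <p class="period-name">Not a forecast period</p>
</div>
"""
seven_day = BeautifulSoup(html, "html.parser").find(id="seven-day-forecast")

# Descendant selector: only .period-name elements nested inside .tombstone-container.
period_tags = seven_day.select(".tombstone-container .period-name")
periods = [pt.get_text(separator=" ", strip=True) for pt in period_tags]
print(periods)  # ['Saturday', 'Saturday Night']
```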

As you can see above, this technique returns every period name, in order. We can apply the same technique to get the other three fields:

In [12]:
short_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [d["title"] for d in seven_day.select(".tombstone-container img")]
print(short_descs, temps, descs)


['Patchy Fogthen Sunny', 'Mostly Clear', 'Sunny', 'Clear', 'Sunny', 'Mostly Clear', 'Mostly Sunny', 'Mostly Clear', 'Sunny'] ['High: 59 °F', 'Low: 46 °F', 'High: 59 °F', 'Low: 45 °F', 'High: 61 °F', 'Low: 46 °F', 'High: 63 °F', 'Low: 48 °F', 'High: 67 °F'] ['Today: Patchy fog before 9am.  Otherwise, mostly cloudy, then gradually becoming sunny, with a high near 59. Light west northwest wind becoming west 5 to 10 mph in the afternoon. ', 'Tonight: Mostly clear, with a low around 46. West wind 6 to 11 mph. ', 'Saturday: Sunny, with a high near 59. West northwest wind 7 to 15 mph, with gusts as high as 20 mph. ', 'Saturday Night: Clear, with a low around 45. West northwest wind 5 to 15 mph, with gusts as high as 18 mph. ', 'Sunday: Sunny, with a high near 61. Light northwest wind becoming west 8 to 13 mph in the afternoon. ', 'Sunday Night: Mostly clear, with a low around 46.', 'Monday: Mostly sunny, with a high near 63.', 'Monday Night: Mostly clear, with a low around 48.', 'Tuesday: Sun
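Printing the four lists back to back is hard to read, and the notebook truncates the output. Because the lists are built in page order, `zip` can pair the i-th element of each list into one record per forecast period. The sketch below uses only the first two values of each list from the output above, with the descriptions shortened to their first sentence:

```python
# First two entries of each list from the output above (descriptions shortened).
periods = ["Today", "Tonight"]
short_descs = ["Patchy Fogthen Sunny", "Mostly Clear"]
temps = ["High: 59 °F", "Low: 46 °F"]
descs = ["Today: Patchy fog before 9am.", "Tonight: Mostly clear, with a low around 46."]

# zip pairs the i-th element of every list into one tuple per period.
rows = list(zip(periods, short_descs, temps, descs))
for period, short_desc, temp, desc in rows:
    print(f"{period:<8} {temp:<12} {short_desc}")
```

Note that `zip` stops silently at the shortest list, so it will not flag a selector that matched fewer elements than the others.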

# Combining our data into a pandas DataFrame

In [16]:
import pandas as pd
weather = pd.DataFrame({
    "period": periods,
    "short_desc": short_descs,
    "temp": temps,
    "desc": descs
})
weather

|   | period | short_desc | temp | desc |
|---|---|---|---|---|
| 0 | Today | Patchy Fogthen Sunny | High: 59 °F | Today: Patchy fog before 9am. Otherwise, most... |
| 1 | Tonight | Mostly Clear | Low: 46 °F | Tonight: Mostly clear, with a low around 46. W... |
| 2 | Saturday | Sunny | High: 59 °F | Saturday: Sunny, with a high near 59. West nor... |
| 3 | SaturdayNight | Clear | Low: 45 °F | Saturday Night: Clear, with a low around 45. W... |
| 4 | Sunday | Sunny | High: 61 °F | Sunday: Sunny, with a high near 61. Light nort... |
| 5 | SundayNight | Mostly Clear | Low: 46 °F | Sunday Night: Mostly clear, with a low around 46. |
| 6 | Monday | Mostly Sunny | High: 63 °F | Monday: Mostly sunny, with a high near 63. |
| 7 | MondayNight | Mostly Clear | Low: 48 °F | Monday Night: Mostly clear, with a low around 48. |
| 8 | Tuesday | Sunny | High: 67 °F | Tuesday: Sunny, with a high near 67. |
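Building a DataFrame from a dictionary of lists only works when every list has the same length. If one selector matches fewer elements than the others, pandas raises a `ValueError` rather than guessing how to align the rows. A small sketch with two hypothetical columns:

```python
import pandas as pd

# Two columns of equal length build a 2-row frame.
weather = pd.DataFrame({
    "period": ["Today", "Tonight"],
    "temp": ["High: 59 °F", "Low: 46 °F"],
})
print(weather.shape)  # (2, 2)

# A list that is one element short makes the constructor fail.
mismatch_raised = False
try:
    pd.DataFrame({"period": ["Today", "Tonight"], "temp": ["High: 59 °F"]})
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```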


We can now do some analysis on the data. For example, we can use a regular expression and the Series.str.extract method to pull out the numeric temperature values:

In [17]:
# \d matches a digit; the raw string keeps the backslash intact
temp_nums = weather["temp"].str.extract(r"(?P<temp_num>\d+)", expand=False)
weather["temp_num"] = temp_nums.astype('int')
temp_nums

0    59
1    46
2    59
3    45
4    61
5    46
6    63
7    48
8    67
Name: temp_num, dtype: object
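The extraction pattern depends on the `\d` digit class. Written without the backslash, `d+` matches one or more literal letter `d` characters. None of the temperature strings contain a `d`, so every row comes back as `NaN`, and the following `astype('int')` then fails. A quick check with Python's standard `re` module:

```python
import re

# Raw string so that \d reaches the regex engine as "any digit".
pattern = re.compile(r"(?P<temp_num>\d+)")
match = pattern.search("High: 59 °F")
print(match.group("temp_num"))  # 59

# Without the backslash, "d+" looks for the letter d, which "High: 59 °F" lacks.
bad = re.search(r"(?P<temp_num>d+)", "High: 59 °F")
print(bad)  # None
```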

We could then find the mean of all the high and low temperatures:

In [None]:
weather["temp_num"].mean()
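As a self-contained sketch of this step, the snippet below rebuilds the `temp` column from the values shown in the table above and computes the mean. Splitting highs from lows by the `High`/`Low` prefix is an optional extra, assuming every temperature string starts with one of those two words as it does here.

```python
import pandas as pd

# Temperature strings copied from the table above.
temps = ["High: 59 °F", "Low: 46 °F", "High: 59 °F", "Low: 45 °F", "High: 61 °F",
         "Low: 46 °F", "High: 63 °F", "Low: 48 °F", "High: 67 °F"]
weather = pd.DataFrame({"temp": temps})

# Pull out the digits, then convert the extracted strings to integers.
weather["temp_num"] = (
    weather["temp"].str.extract(r"(?P<temp_num>\d+)", expand=False).astype(int)
)

overall_mean = weather["temp_num"].mean()
print(round(overall_mean, 2))  # 54.89

# Optional: separate daytime highs from overnight lows by their prefix.
is_high = weather["temp"].str.startswith("High")
high_mean = weather.loc[is_high, "temp_num"].mean()
low_mean = weather.loc[~is_high, "temp_num"].mean()
print(high_mean, low_mean)  # 61.8 46.25
```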