# Downloading weather data


In this section, we’ll extract information about the local weather from the National Weather Service website. The first step is to find the page we want to scrape. We’ll extract weather information about downtown San Francisco from https://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168. The page has the extended forecast for the next week, including the time of day, temperature, and a brief description of the conditions.

Let’s download the page and start parsing it. In the code below, we:

1. Download the web page containing the forecast.
2. Create a `BeautifulSoup` object to parse the page.
3. Find the `div` with id `seven-day-forecast` and assign it to `seven_day`.
4. Inside `seven_day`, find each individual forecast item.
5. Extract and print the first forecast item.


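```python
import requests
from bs4 import BeautifulSoup

# Download the forecast page and stop early if the request failed
page = requests.get("https://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168", timeout=10)
page.raise_for_status()

# Parse the HTML and locate the seven-day forecast container
soup = BeautifulSoup(page.content, 'html.parser')
seven_day = soup.find(id="seven-day-forecast")

# Each forecast period lives in its own tombstone-container element
forecast_items = seven_day.find_all(class_="tombstone-container")
tonight = forecast_items[0]  # first period ("Today" when this output was captured)
print(tonight.prettify())
```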

```html
<div class="tombstone-container">
 <p class="period-name">
  Today
  <br/>
  <br/>
 </p>
 <p>
  <img alt="Today: Showers and thunderstorms before 1pm, then showers likely and possibly a thunderstorm between 1pm and 2pm, then a chance of showers after 2pm.  High near 56. West wind around 17 mph, with gusts as high as 23 mph.  Chance of precipitation is 80%. New rainfall amounts between a tenth and quarter of an inch, except higher amounts possible in thunderstorms. " class="forecast-icon" src="newimages/medium/shra80.png" title="Today: Showers and thunderstorms before 1pm, then showers likely and possibly a thunderstorm between 1pm and 2pm, then a chance of showers after 2pm.  High near 56. West wind around 17 mph, with gusts as high as 23 mph.  Chance of precipitation is 80%. New rainfall amounts between a tenth and quarter of an inch, except higher amounts possible in thunderstorms. "/>
 </p>
 <p class="short-desc">
  Showers
 </p>
 <p class="temp temp-high">
  High: 56 °F
 </p>
</div>
```
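Because the live page changes with every forecast, it can help to test the parsing steps against a fixed piece of HTML. The sketch below assumes a hand-written, trimmed-down `SAMPLE_HTML` string (hypothetical, not copied from the site) that mimics the structure shown above. The helper name `find_forecast_items` is also hypothetical; it guards against `find` returning `None` when the `seven-day-forecast` div is missing, for example after a site redesign.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written HTML that mimics the structure of the forecast page
SAMPLE_HTML = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Today<br/><br/></p>
    <p><img class="forecast-icon" title="Today: Showers. High near 56." alt=""/></p>
    <p class="short-desc">Showers</p>
    <p class="temp temp-high">High: 56 °F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Tonight<br/><br/></p>
    <p><img class="forecast-icon" title="Tonight: Cloudy. Low around 52." alt=""/></p>
    <p class="short-desc">Cloudy</p>
    <p class="temp temp-low">Low: 52 °F</p>
  </div>
</div>
"""


def find_forecast_items(html):
    """Return the tombstone-container tags, or raise if the expected layout is missing."""
    soup = BeautifulSoup(html, "html.parser")
    seven_day = soup.find(id="seven-day-forecast")
    if seven_day is None:
        # find() returns None when nothing matches, so fail with a clear message
        raise ValueError("no element with id 'seven-day-forecast' found")
    return seven_day.find_all(class_="tombstone-container")


items = find_forecast_items(SAMPLE_HTML)
print(len(items))
print(items[0].find(class_="period-name").get_text())
```

Note that `class_="temp"` still matches `class="temp temp-low"`, because BeautifulSoup compares against each class in a multi-valued `class` attribute.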

# Extracting information from the page

Inside the forecast item `tonight` is all the information we want. There are four pieces of information we can extract:

- The name of the forecast item: in this case, Today.
- The description of the conditions: this is stored in the `title` property of `img`.
- A short description of the conditions: in this case, Showers.
- The temperature: in this case, a high of 56 degrees.

We’ll extract the name of the forecast item, the short description, and the temperature first, since they’re all similar:

```python
period = tonight.find(class_="period-name").get_text()
short_desc = tonight.find(class_="short-desc").get_text()
temp = tonight.find(class_="temp").get_text()
print(period)
print(short_desc)
print(temp)
```

```
Today
Showers
High: 56 °F
```
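By default, `get_text()` concatenates every piece of text inside a tag with no separator and keeps any surrounding whitespace. A small sketch, using a hypothetical tag modeled on a multi-word period name split by a `<br/>`, shows how the `separator` and `strip` arguments change the result:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a period name split across a <br/> tag, with stray whitespace
tag = BeautifulSoup('<p class="period-name">\n  Friday<br/>Night\n</p>', "html.parser").p

# Default: text pieces joined with "" and whitespace left untouched
default_text = tag.get_text()

# Each piece stripped of whitespace, then joined with a single space
joined_text = tag.get_text(" ", strip=True)

print(repr(default_text))
print(repr(joined_text))
```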


Next, we’ll extract the `title` attribute from the `img` tag. To do this, we treat the `Tag` object like a dictionary and pass in the attribute we want as a key:

```python
img = tonight.find("img")
desc = img['title']
print(desc)
```

```
Today: Showers and thunderstorms before 1pm, then showers likely and possibly a thunderstorm between 1pm and 2pm, then a chance of showers after 2pm.  High near 56. West wind around 17 mph, with gusts as high as 23 mph.  Chance of precipitation is 80%. New rainfall amounts between a tenth and quarter of an inch, except higher amounts possible in thunderstorms. 
```
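Dictionary-style access raises a `KeyError` when the attribute is absent. A minimal sketch on two hypothetical icon tags, one with a `title` and one without, compares that with the `.get()` method, which returns `None` (or a default you supply) instead:

```python
from bs4 import BeautifulSoup

# Hypothetical icon tags: one with a title attribute, one without
soup = BeautifulSoup(
    '<img class="forecast-icon" title="Monday: Sunny, with a high near 61."/>'
    '<img class="forecast-icon"/>',
    "html.parser",
)
with_title, without_title = soup.find_all("img")

print(with_title["title"])             # dictionary-style access works when the attribute exists
print(without_title.get("title"))      # .get() returns None instead of raising
print(without_title.get("title", ""))  # ...or a default of your choosing

try:
    without_title["title"]
except KeyError:
    print("KeyError: this img has no title attribute")
```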


# Extracting all the information from the page

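Now that we can pull each field out of a single forecast item, we can extract everything at once.

In the code below, we:

1. Use the `select` method with the CSS selector `.tombstone-container .period-name` to find every element with the class `period-name` inside an element with the class `tombstone-container` in `seven_day`.
2. Use a list comprehension to call the `get_text` method on each resulting `Tag` object.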


```python
period_tags = seven_day.select(".tombstone-container .period-name")
periods = [pt.get_text() for pt in period_tags]
periods
```

```
['Today',
 'Tonight',
 'Friday',
 'FridayNight',
 'Saturday',
 'SaturdayNight',
 'Sunday',
 'SundayNight',
 'Monday']
```
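In a CSS selector, a space means "descendant of", so `.tombstone-container .period-name` matches only `period-name` elements nested somewhere inside a `tombstone-container`. As a sketch on hypothetical markup, the same result can be obtained with nested `find_all`/`find` calls, and `select_one()` returns just the first match:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two forecast items plus a period-name outside any container
html = """
<div id="seven-day-forecast">
  <div class="tombstone-container"><p class="period-name">Today</p></div>
  <div class="tombstone-container"><p class="period-name">Tonight</p></div>
  <p class="period-name">Not in a container</p>
</div>
"""
seven_day = BeautifulSoup(html, "html.parser").find(id="seven-day-forecast")

# CSS descendant selector: only period-names inside a tombstone-container
via_select = [pt.get_text() for pt in seven_day.select(".tombstone-container .period-name")]

# Nested search; equivalent here because each container holds exactly one period-name
via_find = [
    container.find(class_="period-name").get_text()
    for container in seven_day.find_all(class_="tombstone-container")
]

# select_one() returns the first match (or None if there is none)
first = seven_day.select_one(".tombstone-container .period-name").get_text()
print(via_select, via_find, first)
```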

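The `periods` list now holds each of the period names, in order. Names such as `FridayNight` run together because `get_text()` joins the text on either side of a `<br/>` tag with no separator. We can apply the same technique to get the other three fields: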

```python
short_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [d["title"] for d in seven_day.select(".tombstone-container img")]
print(short_descs)
print(temps)
print(descs)
```

```
['Showers', 'ChanceShowers thenSlight ChanceShowers', 'GradualClearing', 'IncreasingClouds', 'Rain', 'Showers', 'ChanceShowers thenSlight ChanceT-storms', 'Slight ChanceT-storms', 'Sunny']
['High: 56 °F', 'Low: 52 °F', 'High: 60 °F', 'Low: 52 °F', 'High: 58 °F', 'Low: 53 °F', 'High: 59 °F', 'Low: 51 °F', 'High: 61 °F']
['Today: Showers and thunderstorms before 1pm, then showers likely and possibly a thunderstorm between 1pm and 2pm, then a chance of showers after 2pm.  High near 56. West wind around 17 mph, with gusts as high as 23 mph.  Chance of precipitation is 80%. New rainfall amounts between a tenth and quarter of an inch, except higher amounts possible in thunderstorms. ', 'Tonight: A 50 percent chance of showers, mainly before 11pm.  Cloudy, with a low around 52. West wind 13 to 17 mph, with gusts as high as 23 mph.  New precipitation amounts of less than a tenth of an inch possible. ', 'Friday: Cloudy through mid morning, then gradual clearing, with a high near 60. West wind 1
```
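Building four separate lists assumes every forecast item contains every field; if one item lacked, say, a temperature, the lists would end up with different lengths and the values would no longer line up. A more defensive sketch extracts each item into its own dictionary and fills missing fields with `None`. The helper names `text_or_none` and `parse_item` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second item has no temperature element
html = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Today</p>
    <p><img title="Today: Showers."/></p>
    <p class="short-desc">Showers</p>
    <p class="temp temp-high">High: 56 °F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <p><img title="Tonight: Cloudy."/></p>
    <p class="short-desc">Cloudy</p>
  </div>
</div>
"""


def text_or_none(tag):
    """Return the tag's text with pieces stripped and space-separated, or None if missing."""
    return tag.get_text(" ", strip=True) if tag is not None else None


def parse_item(item):
    """Extract all four fields from one tombstone-container into a dict."""
    img = item.find("img")
    return {
        "period": text_or_none(item.find(class_="period-name")),
        "short_desc": text_or_none(item.find(class_="short-desc")),
        "temp": text_or_none(item.find(class_="temp")),
        "desc": img.get("title") if img is not None else None,
    }


seven_day = BeautifulSoup(html, "html.parser").find(id="seven-day-forecast")
records = [parse_item(item) for item in seven_day.select(".tombstone-container")]
for record in records:
    print(record)
```

Since each record keeps its own fields together, a missing value stays attached to the right period instead of shifting every later value by one position.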

Now we can combine the data into a pandas DataFrame and analyze it. A DataFrame is an object that can store tabular data, making data analysis easy.

To do this, we call the `DataFrame` class and pass in each list of items that we have, wrapped in a dictionary. Each dictionary key becomes a column in the DataFrame, and each list becomes the values in that column:

```python
import pandas as pd

weather = pd.DataFrame({
    "period": periods,
    "short_desc": short_descs,
    "temp": temps,
    "desc": descs,
})
weather
```

| | period | short_desc | temp | desc |
|---|---|---|---|---|
| 0 | Today | Showers | High: 56 °F | Today: Showers and thunderstorms before 1pm, t... |
| 1 | Tonight | ChanceShowers thenSlight ChanceShowers | Low: 52 °F | Tonight: A 50 percent chance of showers, mainl... |
| 2 | Friday | GradualClearing | High: 60 °F | Friday: Cloudy through mid morning, then gradu... |
| 3 | FridayNight | IncreasingClouds | Low: 52 °F | Friday Night: Increasing clouds, with a low ar... |
| 4 | Saturday | Rain | High: 58 °F | Saturday: Rain, mainly after 11am. High near ... |
| 5 | SaturdayNight | Showers | Low: 53 °F | Saturday Night: Rain, mainly before 2am, then ... |
| 6 | Sunday | ChanceShowers thenSlight ChanceT-storms | High: 59 °F | Sunday: A chance of rain before 11am, then a s... |
| 7 | SundayNight | Slight ChanceT-storms | Low: 51 °F | Sunday Night: A slight chance of showers and t... |
| 8 | Monday | Sunny | High: 61 °F | Monday: Sunny, with a high near 61. |
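Passing a dictionary of lists only works when every list has the same length; otherwise pandas raises a `ValueError`. The sketch below shows that check using a few values taken from the table above, alongside the alternative of passing a list of per-row dictionaries, where a key missing from a row becomes `NaN`:

```python
import pandas as pd

# Dictionary of equal-length lists: each key becomes a column
ok = pd.DataFrame({
    "period": ["Today", "Tonight"],
    "temp": ["High: 56 °F", "Low: 52 °F"],
})

# Lists of different lengths cannot be lined up row by row, so pandas raises ValueError
try:
    pd.DataFrame({"period": ["Today", "Tonight"], "temp": ["High: 56 °F"]})
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)

# Alternative: a list of row dictionaries; a key missing from a row becomes NaN
from_rows = pd.DataFrame([
    {"period": "Today", "temp": "High: 56 °F"},
    {"period": "Tonight"},
])
print(from_rows)
```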


We can now do some analysis on the data. For example, we can use a regular expression and the Series.str.extract method to pull out the numeric temperature values:

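```python
# Pull the first run of digits out of each temp string, e.g. "High: 56 °F" -> "56"
temp_nums = weather["temp"].str.extract(r"(?P<temp_num>\d+)", expand=False)
# Store the numbers as integers in a new column
weather["temp_num"] = temp_nums.astype('int')
temp_nums
```

```
0    56
1    52
2    60
3    52
4    58
5    53
6    59
7    51
8    61
Name: temp_num, dtype: object
```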
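The pattern `\d+` matches digits only, so a sub-zero reading such as a hypothetical `Low: -5 °F` would come back as `5`. A sketch that allows an optional minus sign and converts the result with `pd.to_numeric`; it also shows that `str.extract` with `expand=False` and a single named group returns a Series named after the group:

```python
import pandas as pd

# The first two values come from the table above; the third is hypothetical
temps = pd.Series(["High: 56 °F", "Low: 52 °F", "Low: -5 °F"])

# Digits only: the minus sign is silently dropped
digits_only = temps.str.extract(r"(?P<temp_num>\d+)", expand=False)

# Optional leading minus sign, then one or more digits
with_sign = temps.str.extract(r"(?P<temp_num>-?\d+)", expand=False)

# Convert the extracted strings to numbers
temp_num = pd.to_numeric(with_sign)

print(digits_only.tolist())
print(temp_num.tolist())
print(with_sign.name)
```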

Next, we can find the mean of all the high and low temperatures:

```python
weather["temp_num"].mean()
```

```
55.77777777777778
```
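Averaging highs and lows together mixes two different quantities: here the nine values sum to 502, and 502 / 9 gives the 55.78 shown above. One way to look at them separately is to group the numeric temperatures by whether the `temp` string starts with `High`. This sketch rebuilds only the two columns it needs from the values in the table above:

```python
import pandas as pd

# temp strings copied from the forecast table above
temps = ["High: 56 °F", "Low: 52 °F", "High: 60 °F", "Low: 52 °F", "High: 58 °F",
         "Low: 53 °F", "High: 59 °F", "Low: 51 °F", "High: 61 °F"]
weather = pd.DataFrame({"temp": temps})
weather["temp_num"] = weather["temp"].str.extract(r"(\d+)", expand=False).astype(int)

# Mean of all nine values, highs and lows together
overall = weather["temp_num"].mean()

# Group by a boolean key: True for daytime highs, False for overnight lows
by_kind = weather.groupby(weather["temp"].str.startswith("High"))["temp_num"].mean()

print(overall)
print(by_kind)
```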

We can also select only the rows that happen at night. Night periods are the ones whose `temp` string contains `Low`:

```python
is_night = weather["temp"].str.contains("Low")
weather["is_night"] = is_night
is_night
```

```
0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
8    False
Name: temp, dtype: bool
```

```python
weather[is_night]
```

| | period | short_desc | temp | desc | temp_num | is_night |
|---|---|---|---|---|---|---|
| 1 | Tonight | ChanceShowers thenSlight ChanceShowers | Low: 52 °F | Tonight: A 50 percent chance of showers, mainl... | 52 | True |
| 3 | FridayNight | IncreasingClouds | Low: 52 °F | Friday Night: Increasing clouds, with a low ar... | 52 | True |
| 5 | SaturdayNight | Showers | Low: 53 °F | Saturday Night: Rain, mainly before 2am, then ... | 53 | True |
| 7 | SundayNight | Slight ChanceT-storms | Low: 51 °F | Sunday Night: A slight chance of showers and t... | 51 | True |
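Indexing a DataFrame with a boolean Series keeps only the rows where the Series is `True`. As a sketch on a few rows copied from the table above, `~` inverts the mask to get the daytime rows, and `.loc` can filter rows and pick a column in one step:

```python
import pandas as pd

# A few rows copied from the forecast table above
weather = pd.DataFrame({
    "period": ["Today", "Tonight", "Friday", "FridayNight"],
    "temp": ["High: 56 °F", "Low: 52 °F", "High: 60 °F", "Low: 52 °F"],
})
is_night = weather["temp"].str.contains("Low")

night_rows = weather[is_night]                   # rows where the mask is True
day_rows = weather[~is_night]                    # ~ flips True and False
night_periods = weather.loc[is_night, "period"]  # filter rows and select one column

print(day_rows)
print(night_periods.tolist())
```

Note that `str.contains` is case-sensitive by default, so the mask relies on the page consistently writing `Low` with a capital L.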
