### In this project we scrape the National Weather Service’s website using BeautifulSoup. The website contains up-to-date weather forecasts for every location in the USA.
#### The following workflow will be used:
##### Request the content (source code) of a specific URL from the server
##### Download the content that is returned
##### Identify the elements of the page that are part of the table we want
##### Extract and (if necessary) reformat those elements into a dataset we can analyze or use in whatever way we require.
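
The last two steps can be sketched without touching the network. This is a minimal illustration, not the notebook's own code: `SAMPLE_HTML` is a hand-written stand-in for the downloaded page, and `parse_forecast` is a hypothetical helper name. The element id and class names mirror the ones used in the cells below.

```python
from bs4 import BeautifulSoup

# Stand-in for page.content; a real run would obtain it with requests.get(url).
SAMPLE_HTML = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Today</p>
    <p class="short-desc">Rain</p>
    <p class="temp temp-high">High: 57 °F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <p class="short-desc">Rain</p>
    <p class="temp temp-low">Low: 54 °F</p>
  </div>
</div>
"""

def parse_forecast(html):
    """Identify the forecast items and reformat them into a list of dicts."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id="seven-day-forecast")
    rows = []
    for item in container.find_all(class_="tombstone-container"):
        rows.append({
            "period": item.find(class_="period-name").get_text(strip=True),
            "short_desc": item.find(class_="short-desc").get_text(strip=True),
            "temp": item.find(class_="temp").get_text(strip=True),
        })
    return rows

rows = parse_forecast(SAMPLE_HTML)
print(rows)
```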

In [7]:
#First, install the BeautifulSoup library
!pip install beautifulsoup4



In [10]:
#Download the web page containing the forecast.
#Create a BeautifulSoup object to parse the page.
#Find the div with id seven-day-forecast and assign it to seven_day.
#Inside seven_day, find each individual forecast item.
#Extract and print the first forecast item.

from bs4 import BeautifulSoup
import requests
page = requests.get("https://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168")
soup = BeautifulSoup(page.content, 'html.parser')
seven_day = soup.find(id="seven-day-forecast")
forecast_items = seven_day.find_all(class_="tombstone-container")
tonight = forecast_items[0]
print(tonight.prettify())

<div class="tombstone-container">
 <p class="period-name">
  Today
  <br/>
  <br/>
 </p>
 <p>
  <img alt="Today: Rain.  High near 57. Southeast wind 9 to 14 mph becoming light and variable  in the afternoon.  Chance of precipitation is 80%. New precipitation amounts between a quarter and half of an inch possible. " class="forecast-icon" src="newimages/medium/ra80.png" title="Today: Rain.  High near 57. Southeast wind 9 to 14 mph becoming light and variable  in the afternoon.  Chance of precipitation is 80%. New precipitation amounts between a quarter and half of an inch possible. "/>
 </p>
 <p class="short-desc">
  Rain
 </p>
 <p class="temp temp-high">
  High: 57 °F
 </p>
</div>


In [11]:
page

<Response [200]>

In [None]:
#A status code of 200 means that the page was downloaded successfully.
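
Rather than inspecting the response by eye, the status can be checked in code. The sketch below builds `requests.Response` objects by hand so that it runs offline; `describe_status` is a hypothetical helper, and in a real run the object would come from `requests.get(...)`.

```python
import requests

def describe_status(response):
    """Return a short note saying whether the request succeeded."""
    # Response.ok is True for status codes below 400.
    if response.ok:
        return f"{response.status_code}: success"
    return f"{response.status_code}: request failed"

# Hand-built responses, used here only to avoid a network call.
good = requests.Response()
good.status_code = 200
bad = requests.Response()
bad.status_code = 404

print(describe_status(good))
print(describe_status(bad))

# With a live request, one option is to fail fast instead:
#   page = requests.get(url, timeout=10)
#   page.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
```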

In [13]:
#Extracting information from the page

period = tonight.find(class_="period-name").get_text()  #name of the forecast period, e.g. "Today"
short_desc = tonight.find(class_="short-desc").get_text()
temp = tonight.find(class_="temp").get_text()
print(period)
print(short_desc)
print(temp)

Today
Rain
High: 57 °F


In [14]:
#Now, we can extract the title attribute from the img tag

img = tonight.find("img")
desc = img['title']
print(desc)

Today: Rain.  High near 57. Southeast wind 9 to 14 mph becoming light and variable  in the afternoon.  Chance of precipitation is 80%. New precipitation amounts between a quarter and half of an inch possible. 
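
Indexing a tag with `img['title']` raises a `KeyError` when the attribute is absent. If that is a concern, `Tag.get` returns `None` instead. A small sketch on made-up markup (the second `img` deliberately has no `title`):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first image carries a title attribute.
html = (
    '<p><img class="forecast-icon" src="a.png" title="Today: Rain."/>'
    '<img class="forecast-icon" src="b.png"/></p>'
)
soup = BeautifulSoup(html, "html.parser")

# .get() behaves like dict.get(): a missing attribute yields None.
titles = [img.get("title") for img in soup.find_all("img")]
print(titles)
```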


In [15]:
#Now we will extract each individual piece of information

period_tags = seven_day.select(".tombstone-container .period-name") #Select all items with the class period-name inside an item with the class tombstone-container in seven_day.
periods = [pt.get_text() for pt in period_tags] #Use a list comprehension to call the get_text method on each BeautifulSoup object.
periods

['Today',
 'Tonight',
 'Friday',
 'FridayNight',
 'Saturday',
 'SaturdayNight',
 "NewYear'sDay",
 'SundayNight',
 'Monday']
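
Names such as 'FridayNight' appear run together because `get_text()` concatenates the text on either side of a `<br/>` with nothing in between. Passing a separator avoids that. The markup below is an assumed simplification of what the page contains:

```python
from bs4 import BeautifulSoup

# Assumed shape of a two-line period name on the page.
html = '<p class="period-name">Friday<br/>Night</p>'
tag = BeautifulSoup(html, "html.parser").find(class_="period-name")

joined = tag.get_text()                 # text nodes are concatenated directly
spaced = tag.get_text(" ", strip=True)  # text nodes are stripped, then joined with a space
print(repr(joined), repr(spaced))
```

Using `pt.get_text(" ", strip=True)` in the list comprehension above would apply the same fix to every period name.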

In [16]:
#We can apply the same technique to get the other three fields:

short_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [d["title"] for d in seven_day.select(".tombstone-container img")]
print(short_descs)
print(temps)
print(descs)

['Rain', 'Rain', 'Rain', 'Rain', 'Rain andBreezy', 'Chance Rainthen MostlyClear', 'Sunny', 'Partly Cloudythen RainLikely', 'Rain']
['High: 57 °F', 'Low: 54 °F', 'High: 60 °F', 'Low: 55 °F', 'High: 60 °F', 'Low: 47 °F', 'High: 58 °F', 'Low: 46 °F', 'High: 55 °F']
['Today: Rain.  High near 57. Southeast wind 9 to 14 mph becoming light and variable  in the afternoon.  Chance of precipitation is 80%. New precipitation amounts between a quarter and half of an inch possible. ', 'Tonight: Rain.  Steady temperature around 54. South southeast wind 10 to 14 mph becoming west southwest in the evening.  Chance of precipitation is 100%. New precipitation amounts between a quarter and half of an inch possible. ', 'Friday: Rain.  High near 60. South southwest wind 5 to 14 mph, with gusts as high as 21 mph.  Chance of precipitation is 90%. New precipitation amounts between a quarter and half of an inch possible. ', 'Friday Night: Rain.  Low around 55. South southwest wind 15 to 17 mph, with gusts as h
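
The four lists are built independently, so they line up only if every forecast item contains every field. One alternative, sketched here on hand-written HTML, is to walk the items one at a time and record `None` for anything missing. `text_or_none` is a hypothetical helper, and the second item deliberately lacks a temperature.

```python
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Today</p>
    <p class="short-desc">Rain</p>
    <p class="temp temp-high">High: 57 °F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <p class="short-desc">Rain</p>
  </div>
</div>
"""

def text_or_none(parent, css_class):
    """Return the text of the first matching child, or None if there is none."""
    tag = parent.find(class_=css_class)
    return tag.get_text(" ", strip=True) if tag is not None else None

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [
    {
        "period": text_or_none(item, "period-name"),
        "short_desc": text_or_none(item, "short-desc"),
        "temp": text_or_none(item, "temp"),
    }
    for item in soup.find_all(class_="tombstone-container")
]
print(rows)
```

A list of dicts like `rows` can be passed straight to `pd.DataFrame(rows)`.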

In [17]:
#We can now combine the data into a Pandas DataFrame and analyze it. 

import pandas as pd
weather = pd.DataFrame({
    "period": periods,
    "short_desc": short_descs,
    "temp": temps,
    "desc":descs
})
weather

,period,short_desc,temp,desc
0,Today,Rain,High: 57 °F,Today: Rain. High near 57. Southeast wind 9 t...
1,Tonight,Rain,Low: 54 °F,Tonight: Rain. Steady temperature around 54. ...
2,Friday,Rain,High: 60 °F,Friday: Rain. High near 60. South southwest w...
3,FridayNight,Rain,Low: 55 °F,Friday Night: Rain. Low around 55. South sout...
4,Saturday,Rain andBreezy,High: 60 °F,"Saturday: Rain, mainly before 5pm. High near ..."
5,SaturdayNight,Chance Rainthen MostlyClear,Low: 47 °F,Saturday Night: A 30 percent chance of rain be...
6,NewYear'sDay,Sunny,High: 58 °F,"New Year's Day: Sunny, with a high near 58."
7,SundayNight,Partly Cloudythen RainLikely,Low: 46 °F,Sunday Night: Rain likely after 5am. Mostly c...
8,Monday,Rain,High: 55 °F,"Monday: Rain. Cloudy, with a high near 55."


In [18]:
#We can now do some analysis on the data.
#For example, we can use a regular expression and the Series.str.extract method to pull out the numeric temperature values:

temp_nums = weather["temp"].str.extract(r'(\d+)', expand=True)  #raw string, so \d is not treated as a string escape
weather["temp_num"] = temp_nums.astype('int')
temp_nums

,0
0,57
1,54
2,60
3,55
4,60
5,47
6,58
7,46
8,55
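
The `expand` argument controls the return type of `Series.str.extract`: with one capture group, `expand=True` gives a one-column DataFrame (as above) and `expand=False` gives a Series. A small self-contained comparison, using a few of the temperature strings from the output above:

```python
import pandas as pd

temps = pd.Series(["High: 57 °F", "Low: 54 °F", "High: 60 °F"])

as_frame = temps.str.extract(r"(\d+)", expand=True)    # DataFrame with a single column 0
as_series = temps.str.extract(r"(\d+)", expand=False)  # Series of matched strings

# The matches are still strings, so convert before doing arithmetic.
nums = as_series.astype(int)
print(type(as_frame).__name__, type(as_series).__name__)
print(nums.tolist())
```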


In [19]:
#We could then find the mean of all the high and low temperatures:

weather["temp_num"].mean()

54.666666666666664
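
A single mean mixes daytime highs with overnight lows. As a possible refinement, the two can be averaged separately with `groupby`. The sketch below rebuilds the `temp` column from the values printed earlier; the `kind` column is an added, illustrative name.

```python
import pandas as pd

temps = [
    "High: 57 °F", "Low: 54 °F", "High: 60 °F", "Low: 55 °F", "High: 60 °F",
    "Low: 47 °F", "High: 58 °F", "Low: 46 °F", "High: 55 °F",
]
df = pd.DataFrame({"temp": temps})

# Numeric value and High/Low label, both pulled out with regular expressions.
df["temp_num"] = df["temp"].str.extract(r"(\d+)", expand=False).astype(int)
df["kind"] = df["temp"].str.extract(r"^(High|Low)", expand=False)

means = df.groupby("kind")["temp_num"].mean()
print(means)
```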

In [20]:
#We could also select only the rows that occur at night:

is_night = weather["temp"].str.contains("Low")
weather["is_night"] = is_night
is_night

0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
8    False
Name: temp, dtype: bool

In [21]:
weather[is_night]

,period,short_desc,temp,desc,temp_num,is_night
1,Tonight,Rain,Low: 54 °F,Tonight: Rain. Steady temperature around 54. ...,54,True
3,FridayNight,Rain,Low: 55 °F,Friday Night: Rain. Low around 55. South sout...,55,True
5,SaturdayNight,Chance Rainthen MostlyClear,Low: 47 °F,Saturday Night: A 30 percent chance of rain be...,47,True
7,SundayNight,Partly Cloudythen RainLikely,Low: 46 °F,Sunday Night: Rain likely after 5am. Mostly c...,46,True
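
The same boolean mask can be inverted with `~` to select the daytime rows instead. A minimal sketch on a three-row stand-in for `weather` (`weather_demo` is an illustrative name):

```python
import pandas as pd

weather_demo = pd.DataFrame({
    "period": ["Today", "Tonight", "Friday"],
    "temp": ["High: 57 °F", "Low: 54 °F", "High: 60 °F"],
})

is_night = weather_demo["temp"].str.contains("Low")
night_rows = weather_demo[is_night]   # rows where the mask is True
day_rows = weather_demo[~is_night]    # ~ flips the mask, keeping the other rows

print(night_rows["period"].tolist())
print(day_rows["period"].tolist())
```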
