# Scraping the 10-Day Forecast from Weather.com

This notebook scrapes the New York City 10-day forecast page (https://weather.com/weather/tenday/l/USNY0996:1:US) and returns a pandas DataFrame containing the high and low temperature, chance of precipitation, and wind direction and speed for each day. The same approach should work for another region's 10-day forecast, provided that page uses the same markup; note that the container id used below is hard-coded and may differ from page to page.

(The 10-day forecast page actually lists more than 10 periods: 15 in the output below.)

In [3]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

In [4]:
page = requests.get("https://weather.com/weather/tenday/l/USNY0996:1:US")
page  # a status code starting with 2 (e.g. 200) means the request succeeded

<Response [200]>

In [5]:
soup = BeautifulSoup(page.content, 'html.parser')
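
Displaying the response works interactively, but a script would carry on parsing even after a failed request. As a minimal sketch, `raise_for_status()` from `requests` turns 4xx and 5xx responses into exceptions. The helper name `require_success` and the hand-built responses are illustrative assumptions, not part of the original notebook.

```python
import requests


def require_success(response: requests.Response) -> requests.Response:
    """Return the response unchanged, or raise requests.HTTPError on a 4xx/5xx status."""
    response.raise_for_status()
    return response


# Offline demonstration with hand-built responses (no network access needed).
ok = requests.Response()
ok.status_code = 200

missing = requests.Response()
missing.status_code = 404

require_success(ok)  # passes silently
try:
    require_success(missing)
except requests.HTTPError as err:
    print("request failed:", err)
```

In the notebook this would amount to calling `require_success(page)` before building the soup.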

### Looking at the Source Code
By inspecting the web page, I found the key HTML classes and ids that identify the different parts of the forecast (e.g. "temp", "wind", "precip").

In [12]:
daily_forecast = soup.find(id="main-DailyForecast-1bbda948-59cc-4040-9a36-d9c1ed37a806")
time_period = daily_forecast.select(".date-time")
periods = [day.get_text() for day in time_period]

periods

['Tonight',
 'Fri',
 'Sat',
 'Sun',
 'Mon',
 'Tue',
 'Wed',
 'Thu',
 'Fri',
 'Sat',
 'Sun',
 'Mon',
 'Tue',
 'Wed',
 'Thu']
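
The container id above ends in what looks like a generated identifier, so an exact match may stop working on another region's page or after a site update. One possible workaround, sketched here on a made-up HTML snippet, is to match only the id prefix with a regular expression. The sample markup, the assumption that the prefix `main-DailyForecast-` is stable, and the helper name `find_forecast_container` are all hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup standing in for the real page.
SAMPLE_HTML = """
<div id="main-DailyForecast-0000-example">
  <span class="date-time">Tonight</span>
  <span class="date-time">Fri</span>
</div>
"""


def find_forecast_container(soup):
    """Find the first tag whose id starts with 'main-DailyForecast-' (assumed prefix)."""
    return soup.find(id=re.compile(r"^main-DailyForecast-"))


sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = find_forecast_container(sample_soup)
sample_periods = [tag.get_text(strip=True) for tag in container.select(".date-time")]
print(sample_periods)
```

If no tag matches, `find` returns `None`, so real code would want to check for that before calling `select`.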

In [13]:
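temps = soup.find_all(class_="temp")
# Fixed-width slices: these assume two-digit temperatures with one separator character
# between the high and the low. [1:] drops the first match, which is not a forecast row.
highs = [t.get_text()[0:2] for t in temps][1:]
lows = [t.get_text()[3:-1] for t in temps][1:]

# If the first period is 'Tonight', no high temperature is listed ('--'),
# so the low temperature starts one character earlier.
if periods[0] == 'Tonight':
    lows[0] = temps[1].get_text()[2:-1]
highs, lows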

(['--',
  '84',
  '81',
  '85',
  '91',
  '94',
  '93',
  '89',
  '79',
  '80',
  '82',
  '83',
  '83',
  '81',
  '81'],
 ['66',
  '67',
  '68',
  '73',
  '76',
  '78',
  '76',
  '72',
  '69',
  '70',
  '70',
  '70',
  '69',
  '68',
  '68'])
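
The fixed-width slices above rely on every temperature being exactly two digits, which would break for values such as 101 or 5. A regular expression is one way to avoid that; the sketch below assumes each cell's text is a high followed by a low, each followed by a degree sign, with `--` standing in for a missing value (the format implied by the slices). The function name `split_temps` is illustrative.

```python
import re
from typing import Optional, Tuple

# A token is either the placeholder '--' or an optionally negative integer.
_TEMP_TOKEN = re.compile(r"--|-?\d+")


def split_temps(text: str) -> Tuple[Optional[int], Optional[int]]:
    """Split text such as '84°67°' into (high, low); '--' becomes None."""
    tokens = _TEMP_TOKEN.findall(text)
    if len(tokens) != 2:
        raise ValueError(f"expected two temperature values in {text!r}")
    high, low = (None if tok == "--" else int(tok) for tok in tokens)
    return high, low


print(split_temps("84°67°"))
print(split_temps("--66°"))
```

Because missing values come back as `None` rather than the string `'--'`, the special case for 'Tonight' is no longer needed.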

In [14]:
precip = soup.find_all(class_="precip")
p_chance = [p.get_text() for p in precip][1:]  # [1:] drops the first match, which is not a forecast row
p_chance

['0%',
 '0%',
 '0%',
 '0%',
 '10%',
 '10%',
 '10%',
 '20%',
 '50%',
 '40%',
 '80%',
 '50%',
 '30%',
 '20%',
 '30%']
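
The precipitation chances are still strings, so they cannot be compared or averaged directly. A small sketch of converting them to integers, using the values from the output above; the helper name `percent_to_int` is an illustrative assumption.

```python
def percent_to_int(text: str) -> int:
    """Convert a string such as '80%' to the integer 80."""
    return int(text.strip().rstrip("%"))


# Values copied from the output above.
sample_chances = ['0%', '0%', '0%', '0%', '10%', '10%', '10%', '20%',
                  '50%', '40%', '80%', '50%', '30%', '20%', '30%']

chances = [percent_to_int(p) for p in sample_chances]

# Position of the period with the highest chance of precipitation.
wettest_index = max(range(len(chances)), key=chances.__getitem__)
print(wettest_index, chances[wettest_index])
```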

In [15]:
wind = soup.find_all(class_="wind")
w_velocity = [w.get_text() for w in wind][1:]  # [1:] drops the first match, which is not a forecast row
w_velocity

['NW 4 mph ',
 'SW 7 mph ',
 'SSW 8 mph ',
 'SSW 10 mph ',
 'WSW 7 mph ',
 'WSW 9 mph ',
 'WSW 10 mph ',
 'WNW 7 mph ',
 'ENE 8 mph ',
 'E 9 mph ',
 'SE 8 mph ',
 'WSW 8 mph ',
 'NW 6 mph ',
 'NNE 6 mph ',
 'E 6 mph ']
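
Each wind entry combines a compass direction and a speed, and carries a trailing space. For analysis it may be more convenient to split the two; the sketch below assumes every entry has the form `<direction> <speed> mph`, as in the output above. The names `Wind` and `parse_wind` are illustrative.

```python
import re
from typing import NamedTuple


class Wind(NamedTuple):
    direction: str
    speed_mph: int


# One to three capital letters, whitespace, an integer, then 'mph'.
_WIND_PATTERN = re.compile(r"^\s*([A-Z]{1,3})\s+(\d+)\s*mph\s*$")


def parse_wind(text: str) -> Wind:
    """Parse a string such as 'SSW 10 mph ' into Wind('SSW', 10)."""
    match = _WIND_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized wind string: {text!r}")
    return Wind(match.group(1), int(match.group(2)))


print(parse_wind('NW 4 mph '))
print(parse_wind('SSW 10 mph '))
```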

In [16]:
weather = pd.DataFrame({
    "Day": periods,
    "High_Temp": highs,
    "Low_Temp": lows,
    "Precipitation": p_chance,
    "Wind": w_velocity,
})
weather

        Day  High_Temp  Low_Temp  Precipitation        Wind
0   Tonight         --        66             0%    NW 4 mph
1       Fri         84        67             0%    SW 7 mph
2       Sat         81        68             0%   SSW 8 mph
3       Sun         85        73             0%  SSW 10 mph
4       Mon         91        76            10%   WSW 7 mph
5       Tue         94        78            10%   WSW 9 mph
6       Wed         93        76            10%  WSW 10 mph
7       Thu         89        72            20%   WNW 7 mph
8       Fri         79        69            50%   ENE 8 mph
9       Sat         80        70            40%     E 9 mph
10      Sun         82        70            80%    SE 8 mph
11      Mon         83        70            50%   WSW 8 mph
12      Tue         83        69            30%    NW 6 mph
13      Wed         81        68            20%   NNE 6 mph
14      Thu         81        68            30%     E 6 mph


### Using the data

Now that the data is in a pandas DataFrame, we can analyze it, for example by computing summary statistics for the low temperatures.

In [17]:
weather['Low_Temp'] = weather['Low_Temp'].astype(int)  # convert the string values to integers
print(weather['Low_Temp'].describe())

count    15.000000
mean     70.666667
std       3.598942
min      66.000000
25%      68.000000
50%      70.000000
75%      72.500000
max      78.000000
Name: Low_Temp, dtype: float64
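
The same integer conversion would fail for `High_Temp`, because the 'Tonight' row holds `'--'` instead of a number. One option, sketched below on the first three rows of the table, is `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into `NaN` so that summary statistics skip them. The small `sample` frame is only for illustration.

```python
import pandas as pd

# First three rows of the forecast table, with High_Temp still stored as strings.
sample = pd.DataFrame({
    "Day": ["Tonight", "Fri", "Sat"],
    "High_Temp": ["--", "84", "81"],
})

# '--' cannot be parsed as a number, so it is coerced to NaN.
sample["High_Temp"] = pd.to_numeric(sample["High_Temp"], errors="coerce")

# mean() ignores NaN by default, so only 84 and 81 are averaged.
mean_high = sample["High_Temp"].mean()
print(mean_high)
```

The column becomes floating point, since integer columns cannot hold `NaN` in pandas' default dtypes.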
