# SOLUTIONS Breakout: Web scraping & web crawling

**Author List**: Alexander Fred Ojala

**Original Sources**: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ & https://www.dataquest.io/blog/web-scraping-tutorial-python/

**License**: Feel free to do whatever you want with this code

**Compatibility:** Python 2.x and 3.x

---
<a id='sec4'></a>
# Breakout problem


In this week's breakout you should extract live weather data for Berkeley from:

[http://forecast.weather.gov/MapClick.php?lat=37.87158815800046&lon=-122.27274583799971](http://forecast.weather.gov/MapClick.php?lat=37.87158815800046&lon=-122.27274583799971)

* Task: scrape
    * the period / day (e.g. Tonight, Friday, FridayNight, etc.)
    * the temperature for the period (e.g. Low, High)
    * the long weather description (e.g. Partly cloudy, with a low around 49...)

Store the scraped data strings in a pandas DataFrame.



**Hint:** The weather information is found in a div tag with `id='seven-day-forecast'`
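A minimal sketch of the hint, run against a small hand-written snippet (assumed markup, not the live page): `find(id=...)` and the CSS selector `select_one('#...')` locate the same element, and both return `None` when the id is absent, so it is worth checking the result before calling methods on it.

```python
import bs4

# Hand-written stand-in for the forecast page (assumed markup, not fetched live)
html = """
<html><body>
  <div id="seven-day-forecast" class="panel panel-default">
    <p class="period-name">Tonight</p>
  </div>
</body></html>
"""

soup = bs4.BeautifulSoup(html, features='html.parser')

# Two equivalent ways to locate the container by its id
by_find = soup.find(id='seven-day-forecast')
by_css = soup.select_one('#seven-day-forecast')
print(by_find is by_css)  # True: both return the same Tag object from the tree

# A missing id yields None rather than raising, so guard before using the result
missing = soup.find(id='no-such-id')
if missing is None:
    print('forecast container not found; the page layout may have changed')
```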



# Breakout solution

In [14]:
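from __future__ import print_function  # makes print() behave the same on Python 2 and 3

import requests
import bs4 as bs
import pandas as pd

url = 'http://forecast.weather.gov/MapClick.php?lat=37.87158815800046&lon=-122.27274583799971'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
source = response.content

# html.parser ships with Python; use features='lxml' instead if lxml is installed
soup = bs.BeautifulSoup(source, features='html.parser')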
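The choice of parser matters: `lxml` is a faster third-party package that may not be installed, while `html.parser` ships with Python. A minimal sketch that prefers lxml and falls back to the built-in parser (the helper name `make_soup` is hypothetical) relies on BeautifulSoup raising `bs4.FeatureNotFound` when a requested parser is unavailable:

```python
import bs4

def make_soup(markup):
    """Parse markup with lxml when it is installed, else with the built-in parser."""
    try:
        # lxml is a third-party package and may be missing
        return bs4.BeautifulSoup(markup, features='lxml')
    except bs4.FeatureNotFound:
        # html.parser is part of the standard library, so this always works
        return bs4.BeautifulSoup(markup, features='html.parser')

soup = make_soup('<div id="seven-day-forecast"><p class="short-desc">Sunny</p></div>')
print(soup.find(class_='short-desc').text)  # Sunny
```

Either parser builds a tree that `find` can search, so the rest of the notebook works the same way; the two parsers can differ on badly malformed HTML.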

In [15]:
forecast = soup.find(id='seven-day-forecast')

In [3]:
print(forecast.prettify())

<div class="panel panel-default" id="seven-day-forecast">
 <div class="panel-heading">
  <b>
   Extended Forecast for
  </b>
  <h2 class="panel-title">
   Berkeley CA
  </h2>
 </div>
 <div class="panel-body" id="seven-day-forecast-body">
  <div id="seven-day-forecast-container">
   <ul class="list-unstyled" id="seven-day-forecast-list">
    <li class="forecast-tombstone">
     <div class="tombstone-container">
      <p class="period-name">
       Today
       <br/>
       <br/>
      </p>
      <p>
       <img alt="Today: Sunny, with a high near 76. North wind around 14 mph, with gusts as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Today: Sunny, with a high near 76. North wind around 14 mph, with gusts as high as 18 mph. "/>
      </p>
      <p class="short-desc">
       Sunny
      </p>
      <p class="temp temp-high">
       High: 76 °F
      </p>
     </div>
    </li>
    <li class="forecast-tombstone">
     <div class="tombstone-container">
      ...
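The printed tree shows that each period sits in its own `li.forecast-tombstone`. One hedged alternative to building three separate lists is to walk the tombstones one at a time, so a period that lacks a temperature yields `None` in its own row instead of shifting the remaining columns. The snippet below is hand-written markup modeled on the output above; the second tombstone deliberately omits its temperature to show the effect.

```python
import bs4

# Hand-written markup modeled on the prettify() output above (not fetched live)
html = """
<div id="seven-day-forecast">
  <ul id="seven-day-forecast-list">
    <li class="forecast-tombstone"><div class="tombstone-container">
      <p class="period-name">Today<br/><br/></p>
      <p><img class="forecast-icon" alt="Today: Sunny, with a high near 76. "/></p>
      <p class="short-desc">Sunny</p>
      <p class="temp temp-high">High: 76 °F</p>
    </div></li>
    <li class="forecast-tombstone"><div class="tombstone-container">
      <p class="period-name">Saturday<br/>Night<br/><br/></p>
      <p><img class="forecast-icon" alt="Saturday Night: Mostly cloudy, with a low around 43."/></p>
      <p class="short-desc">Mostly Cloudy</p>
    </div></li>
  </ul>
</div>
"""

forecast = bs4.BeautifulSoup(html, features='html.parser').find(id='seven-day-forecast')

rows = []
for tomb in forecast.find_all('li', class_='forecast-tombstone'):
    name = tomb.find(class_='period-name')
    temp = tomb.find(class_='temp')  # matches class="temp temp-high" and "temp temp-low"
    img = tomb.find('img')
    # All three fields come from the same tombstone, so a missing one stays in its own row
    rows.append({
        'day': name.get_text(' ', strip=True) if name is not None else None,
        'temp': temp.get_text(strip=True) if temp is not None else None,
        'desc': img.get('alt', '').strip() if img is not None else None,
    })

print(rows)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)`.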

In [16]:
day = [d.text for d in forecast.find_all(class_='period-name')]  # period names
temp = [t.text for t in forecast.find_all(class_='temp')]        # e.g. 'High: 52 °F'
desc = forecast.find_all('img')  # the long description is stored in each image's alt text

In [17]:
print(day)
print()
print(temp)

['ThisAfternoon', 'Tonight', 'Friday', 'FridayNight', 'Saturday', 'SaturdayNight', 'Sunday', 'SundayNight', 'M.L.KingDay']

['High: 52 °F', 'Low: 41 °F', 'High: 54 °F', 'Low: 43 °F', 'High: 55 °F', 'Low: 43 °F', 'High: 59 °F', 'Low: 46 °F', 'High: 58 °F']
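The period names run together ('ThisAfternoon', 'FridayNight') because `.text` concatenates the text on either side of a `<br/>` tag with nothing in between. Assuming the live page splits the names with `<br/>` (as the tombstone markup printed earlier suggests), `get_text()` with a separator and `strip=True` restores the space. A sketch on a hand-written snippet:

```python
import bs4

# Hand-written snippet mimicking a period-name paragraph split by <br/> tags
p = bs4.BeautifulSoup('<p class="period-name">This<br/>Afternoon<br/><br/></p>',
                      features='html.parser').p

joined = p.text                                 # text pieces concatenated directly
spaced = p.get_text(separator=' ', strip=True)  # pieces stripped, then joined by a space
print(repr(joined), repr(spaced))               # 'ThisAfternoon' 'This Afternoon'
```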


In [18]:
# extract the long weather description from each image's alt text
desc_list = []
for img in desc:
    print(img.get('alt'))
    desc_list.append(img.get('alt'))

This Afternoon: Showers, mainly before 5pm.  High near 52. Southwest wind around 11 mph.  Chance of precipitation is 100%. New precipitation amounts between a tenth and quarter of an inch possible. 
Tonight: A chance of showers before 8pm, then a chance of rain between 8pm and 11pm.  Partly cloudy, with a low around 41. Southwest wind 5 to 8 mph.  Chance of precipitation is 30%. New precipitation amounts of less than a tenth of an inch possible. 
Friday: Increasing clouds, with a high near 54. Southeast wind 3 to 5 mph. 
Friday Night: Cloudy, with a low around 43. Light east wind. 
Saturday: Mostly cloudy, with a high near 55. East wind around 6 mph. 
Saturday Night: Mostly cloudy, with a low around 43.
Sunday: Partly sunny, with a high near 59.
Sunday Night: Mostly cloudy, with a low around 46.
M.L.King Day: A slight chance of rain after 11am.  Mostly cloudy, with a high near 58.
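`find_all('img')` collects every image inside the container. A slightly stricter sketch restricts the search to the `forecast-icon` class seen in the tombstone markup and strips the trailing space that the `alt` strings carry. `img.get('alt', '')` returns an empty string instead of `None` if an image has no `alt` attribute. The second image and its `logo.png` source are hypothetical, added only to show that it is skipped.

```python
import bs4

# Hand-written snippet: one forecast icon plus an unrelated, hypothetical image
html = """
<div id="seven-day-forecast">
  <img class="forecast-icon" alt="Friday: Increasing clouds, with a high near 54. Southeast wind 3 to 5 mph. "/>
  <img src="logo.png"/>
</div>
"""
forecast = bs4.BeautifulSoup(html, features='html.parser').find(id='seven-day-forecast')

# Only images with class="forecast-icon" are kept; surrounding whitespace is removed
desc_list = [img.get('alt', '').strip()
             for img in forecast.find_all('img', class_='forecast-icon')]
print(desc_list)
```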


In [19]:
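pd.set_option('display.max_colwidth', None)  # print full cell contents (pandas >= 1.0; older versions used -1)
df = pd.DataFrame({'day': day, 'temp': temp, 'desc': desc_list},
                  columns=['day', 'temp', 'desc'])  # keep this column order on every Python/pandas version
print('Berkeley 7 day weather forecast')
df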

Berkeley 7 day weather forecast


| | day | temp | desc |
|---|---|---|---|
| 0 | ThisAfternoon | High: 52 °F | This Afternoon: Showers, mainly before 5pm. High near 52. Southwest wind around 11 mph. Chance of precipitation is 100%. New precipitation amounts between a tenth and quarter of an inch possible. |
| 1 | Tonight | Low: 41 °F | Tonight: A chance of showers before 8pm, then a chance of rain between 8pm and 11pm. Partly cloudy, with a low around 41. Southwest wind 5 to 8 mph. Chance of precipitation is 30%. New precipitation amounts of less than a tenth of an inch possible. |
| 2 | Friday | High: 54 °F | Friday: Increasing clouds, with a high near 54. Southeast wind 3 to 5 mph. |
| 3 | FridayNight | Low: 43 °F | Friday Night: Cloudy, with a low around 43. Light east wind. |
| 4 | Saturday | High: 55 °F | Saturday: Mostly cloudy, with a high near 55. East wind around 6 mph. |
| 5 | SaturdayNight | Low: 43 °F | Saturday Night: Mostly cloudy, with a low around 43. |
| 6 | Sunday | High: 59 °F | Sunday: Partly sunny, with a high near 59. |
| 7 | SundayNight | Low: 46 °F | Sunday Night: Mostly cloudy, with a low around 46. |
| 8 | M.L.KingDay | High: 58 °F | M.L.King Day: A slight chance of rain after 11am. Mostly cloudy, with a high near 58. |
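The `temp` column holds strings such as 'High: 52 °F'. If numeric temperatures are needed, one possible approach is `Series.str.extract` with named regex groups, which returns a DataFrame whose columns are named after the groups. The sketch below uses a few rows copied from the table above; the column names `kind` and `deg_f` are illustrative choices.

```python
import pandas as pd

# A few rows copied from the table above
df = pd.DataFrame({'day': ['ThisAfternoon', 'Tonight', 'Friday'],
                   'temp': ['High: 52 °F', 'Low: 41 °F', 'High: 54 °F']},
                  columns=['day', 'temp'])

# Named groups become column names: the High/Low label and the number itself
parts = df['temp'].str.extract(r'(?P<kind>High|Low):\s*(?P<deg_f>-?\d+)')
df['kind'] = parts['kind']
df['deg_f'] = parts['deg_f'].astype(int)  # extracted values are strings until converted
print(df)
```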


In [8]:
pd.options.display.max_colwidth = 50  # change back to the default max column width
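Instead of setting a display option and resetting it by hand, `pd.option_context` scopes the change to a `with` block and restores the previous value when the block exits. Passing `None` for unlimited column width assumes pandas 1.0 or later.

```python
import pandas as pd

df = pd.DataFrame({'desc': ['Saturday Night: Mostly cloudy, with a low around 43.']})

before = pd.get_option('display.max_colwidth')
with pd.option_context('display.max_colwidth', None):
    # Inside the block, long strings are printed in full
    inside = pd.get_option('display.max_colwidth')
    print(df)
after = pd.get_option('display.max_colwidth')  # restored automatically on exit
print(before, inside, after)
```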