# Breakout problem


In this breakout, scrape the live weather forecast for Berkeley from:

[Weather Data Berkeley](http://forecast.weather.gov/MapClick.php?lat=37.87158815800046&lon=-122.27274583799971)

* Task: scrape the following for each forecast period
    * period / day (e.g. Tonight, Friday, FridayNight)
    * the temperature for the period (Low or High)
    * the short description (e.g. Mostly Clear, Sunny)
    * the long weather description (e.g. Partly cloudy, with a low around 49..)

Store the scraped strings in a pandas DataFrame.

Hint: The weather information is found in a div tag with id='seven-day-forecast'.

The first row of your DataFrame should look like the screenshot below (with the same columns):

![image](/Users/sarthak/Downloads/titanic/breakout_example.png)


# Solution

In [1]:
# Loading Libraries

import requests
import bs4 as bs
import numpy as np
import pandas as pd

In [2]:
source = requests.get("http://forecast.weather.gov/MapClick.php?lat=37.87158815800046&lon=-122.27274583799971") 
source

<Response [200]>

In [3]:
soup = bs.BeautifulSoup(source.content, features='html.parser') 
# print(soup)

In [4]:
print(soup.find('title').text) 

National Weather Service


In [5]:
div_bar = soup.find('div', {"id": "seven-day-forecast"})
# print(div_bar)

# Each forecast period sits in an element with class "tombstone-container";
# see https://www.dataquest.io/blog/web-scraping-python-using-beautiful-soup/
forecast_items = div_bar.find_all(class_="tombstone-container")
# print(forecast_items)
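
To see what these two lookups return without calling the live site, the sketch below runs them on a small hand-written HTML fragment. The fragment (`SAMPLE_HTML`) and the `sample_*` names are assumptions made for illustration: the markup is modeled on the class names used in this notebook and is not a copy of the real page.

```python
from bs4 import BeautifulSoup

# Hand-written fragment (an assumption, not the live page's markup).
SAMPLE_HTML = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Friday<br>Night</p>
    <p><img src="x.png" alt="Friday Night: Mostly cloudy, with a low around 45."></p>
    <p class="short-desc">Mostly Cloudy</p>
    <p class="temp temp-low">Low: 45 °F</p>
  </div>
</div>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Step 1: narrow the search to the forecast div by its id.
sample_div = sample_soup.find("div", id="seven-day-forecast")

# Step 2: collect one container per forecast period.
sample_items = sample_div.find_all(class_="tombstone-container")

# Pull the four strings out of the first container.
first = sample_items[0]
record = {
    "day": first.find("p", class_="period-name").get_text(),
    "temp": first.find("p", class_="temp").get_text(),
    "short-desc": first.find("p", class_="short-desc").get_text(),
    "desc": first.find("img", alt=True)["alt"],
}
print(record)
```

Note that `get_text()` joins the text on either side of the `<br>` with no separator, which is why period names such as FridayNight appear without a space.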

In [6]:
# Checking values coming through BeautifulSoup

for p in forecast_items:
    print(p.find('p', class_='period-name').get_text())
    print(p.find('p', class_='temp').get_text())
    print(p.find('p', class_='short-desc').get_text())
    print(p.find('img', alt=True)['alt'])
    print()


Overnight
Low: 49 °F
Partly Cloudy
Overnight: Partly cloudy, with a steady temperature around 49. East northeast wind 5 to 8 mph. 

Friday
High: 62 °F
Partly Sunny
Friday: Partly sunny, with a high near 62. East northeast wind 7 to 10 mph. 

FridayNight
Low: 45 °F
Mostly Cloudy
Friday Night: Mostly cloudy, with a low around 45. East northeast wind 3 to 5 mph. 

Saturday
High: 63 °F
DecreasingClouds
Saturday: Mostly cloudy, then gradually becoming sunny, with a high near 63. East northeast wind 3 to 6 mph. 

SaturdayNight
Low: 44 °F
Mostly Clear
Saturday Night: Mostly clear, with a low around 44. Light and variable wind. 

Sunday
High: 61 °F
Sunny
Sunday: Sunny, with a high near 61.

SundayNight
Low: 47 °F
Partly Cloudy
Sunday Night: Partly cloudy, with a low around 47.

Monday
High: 60 °F
Partly Sunny
Monday: Partly sunny, with a high near 60.

MondayNight
Low: 44 °F
Partly Cloudy
Monday Night: Partly cloudy, with a low around 44.



# Creating DataFrame

In [7]:
# Collect one row of four strings per forecast period
rows = []

for p in forecast_items:
    rows.append([
        p.find('p', class_='period-name').get_text(),
        p.find('p', class_='temp').get_text(),
        p.find('p', class_='short-desc').get_text(),
        p.find('img', alt=True)['alt'],
    ])

# Shape follows the number of periods found, rather than a hard-coded (9, 4)
arr = np.array(rows)

In [8]:
df_weather = pd.DataFrame(data = arr, columns = ['day','temp','short-desc','desc'])
df_weather.head(10)


|   | day | temp | short-desc | desc |
|---|-----|------|------------|------|
| 0 | Overnight | Low: 49 °F | Partly Cloudy | Overnight: Partly cloudy, with a steady temper... |
| 1 | Friday | High: 62 °F | Partly Sunny | Friday: Partly sunny, with a high near 62. Eas... |
| 2 | FridayNight | Low: 45 °F | Mostly Cloudy | Friday Night: Mostly cloudy, with a low around... |
| 3 | Saturday | High: 63 °F | DecreasingClouds | Saturday: Mostly cloudy, then gradually becomi... |
| 4 | SaturdayNight | Low: 44 °F | Mostly Clear | Saturday Night: Mostly clear, with a low aroun... |
| 5 | Sunday | High: 61 °F | Sunny | Sunday: Sunny, with a high near 61. |
| 6 | SundayNight | Low: 47 °F | Partly Cloudy | Sunday Night: Partly cloudy, with a low around... |
| 7 | Monday | High: 60 °F | Partly Sunny | Monday: Partly sunny, with a high near 60. |
| 8 | MondayNight | Low: 44 °F | Partly Cloudy | Monday Night: Partly cloudy, with a low around... |
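
As an alternative to going through a NumPy array, the same kind of table can be built directly from a list of dictionaries. The sketch below is one possible approach, not the method used in this notebook; `sample_rows` and `df_sample` are illustrative names, and the two rows are copied from the output printed earlier.

```python
import pandas as pd

# Two rows copied from the printed forecast output above.
sample_rows = [
    {
        "day": "Sunday",
        "temp": "High: 61 °F",
        "short-desc": "Sunny",
        "desc": "Sunday: Sunny, with a high near 61.",
    },
    {
        "day": "Monday",
        "temp": "High: 60 °F",
        "short-desc": "Partly Sunny",
        "desc": "Monday: Partly sunny, with a high near 60.",
    },
]

# Passing columns= fixes the column order explicitly.
df_sample = pd.DataFrame(
    sample_rows, columns=["day", "temp", "short-desc", "desc"]
)
print(df_sample)
```

With this form the number of rows follows the input list, so no reshape step is needed.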


In [9]:
# print([img['alt'] for img in div_bar.find_all('img', alt=True)][0])

# print([img.get_text() for img in div_bar.find_all('p', class_='period-name')][0])
# print([img.get_text() for img in div_bar.find_all('p', class_='temp')][0])
# print([img.get_text() for img in div_bar.find_all('p', class_='short-desc')][0])

In [10]:
# day - Tonight (class: period-name)
# temp - Low: 52 °F (class: temp)
# short-desc - Rain (class: short-desc)
# desc - Tonight: Rain.  Low around 52. South wind 10 to 15 mph, with gusts as high as 33 mph.  Chance of precipitation is 90%. New precipitation amounts... (img alt)
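
The task only asks for strings, but if numeric temperatures are wanted later, the `temp` format noted above ("Low: 52 °F") can be split with a regular expression. This is a minimal sketch that assumes every value follows the "Low: N °F" or "High: N °F" pattern seen in this notebook's output; `TEMP_PATTERN` and `parse_temp` are hypothetical names introduced here.

```python
import re

# Assumes the "Low: N °F" / "High: N °F" format seen in the output above.
TEMP_PATTERN = re.compile(r"^(Low|High):\s*(-?\d+)\s*°F$")


def parse_temp(text):
    """Return (kind, degrees) for a temp string, or None if it does not match."""
    match = TEMP_PATTERN.match(text.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(parse_temp("Low: 49 °F"))   # ('Low', 49)
print(parse_temp("High: 62 °F"))  # ('High', 62)
```

Returning `None` for unmatched strings keeps unexpected formats visible instead of raising inside a loop.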