<a href="https://colab.research.google.com/github/sandheepgopinath/Python/blob/master/webscraper_berkley_weather.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://raw.githubusercontent.com/afo/data-x-plaksha/master/imgsource/dx_logo.png" align="left"></img><br><br><br><br><br><br><br><br>


## SOLUTIONS Breakout: Web scraping & web crawling

**Author List**: Alexander Fred Ojala

**Original Sources**: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ & https://www.dataquest.io/blog/web-scraping-tutorial-python/

**License**: Feel free to do whatever you want to with this code

---
<a id='sec4'></a>
# Breakout problem


In this breakout you should extract the live weather forecast for Berkeley from:

[http://forecast.weather.gov/MapClick.php?lat=37.87158815800046&lon=-122.27274583799971](http://forecast.weather.gov/MapClick.php?lat=37.87158815800046&lon=-122.27274583799971)

* Task: scrape the following fields
    * period / day (e.g. Tonight, Friday, FridayNight)
    * the temperature for the period (as Low, High)
    * the short description (e.g. Mostly Clear, Sunny)
    * the long weather description (e.g. Partly cloudy, with a low around 49..)

Store the scraped data strings in a pandas DataFrame.



**Hint:** The weather information is found in a div tag with `id='seven-day-forecast'`

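Because the live page changes between runs, it can help to try selectors on a small static snippet first. The sketch below is only an illustration: `SAMPLE_HTML` is hypothetical, simplified markup that imitates the `id` from the hint and the class names used in the solution, and `parse_forecast` is a name made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (an assumption, not copied from the live page).
SAMPLE_HTML = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <p><img src="night.png" alt="Tonight: Mostly clear, with a low around 49."></p>
    <p class="short-desc">Mostly Clear</p>
    <p class="temp temp-low">Low: 49 °F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Friday</p>
    <p><img src="day.png" alt="Friday: Sunny, with a high near 63."></p>
    <p class="short-desc">Sunny</p>
    <p class="temp temp-high">High: 63 °F</p>
  </div>
</div>
"""


def parse_forecast(html):
    """Return one dict per forecast period found inside the container div."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id="seven-day-forecast")
    rows = []
    for item in container.find_all(class_="tombstone-container"):
        rows.append({
            "day": item.find(class_="period-name").get_text(),
            # class_="temp" matches both "temp temp-low" and "temp temp-high"
            "temp": item.find(class_="temp").get_text(),
            "short_desc": item.find(class_="short-desc").get_text(),
            "desc": item.find("img")["alt"],
        })
    return rows


rows = parse_forecast(SAMPLE_HTML)
print(rows[0])
```

Searching for the single class `temp` covers both the low and the high case, so no branch is needed in this sketch. Whether that holds on the live page is an assumption to verify there.
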
The first row of your DataFrame should be similar to the below screenshot (with the same columns):

![](data/breakout_example.png)

# Your solution

## Creating a class for scraping the weather website

In [360]:
class weather_scrape:

  def __init__(self,url):
    import requests                                                             # Importing requests library
    self.source=requests.get(url)
    if self.source.status_code==200:                                            # Checks if the URL read was successful 
      print('Webpage read successfull')
    else:
      print('Webpage read failed')


  def beautify(self,features):
    ''' Function to convert the HTML response into a Beautiful Soup object.
    If the conversion succeeds, it prints a success message.
    Otherwise it prints an error message. '''
    import bs4 as bs                                                            # Importing Beautiful Soup
    try:
      self.souped=bs.BeautifulSoup(self.source.content,features=features)       # Convert the HTML into a Beautiful Soup object
      print('Beautification Successful')
    except Exception:                                                           # Avoid a bare except, which would also swallow KeyboardInterrupt
      print('Beautification failed')


  def pretty_print(self,list_to_modify):
    """ Function to add the missing spaces between fused words.
    It inserts a space wherever an uppercase letter directly follows a
    lowercase letter (e.g. 'FridayNight' becomes 'Friday Night')"""
    import re                                                                   # Importing the regular expression library
    temp=[]
    for text in list_to_modify:
      temp.append(re.sub(r'(?<=[a-z])(?=[A-Z])',' ',text))                      # Zero-width match between a lowercase and an uppercase letter
    self.day_details=temp                                                       # Assign once, after the loop has finished


  def find_to_text(self,element_id):
    ''' Function to scrape the data from the webpage and to store the data in lists.
    This data is later used for creating the dataframe'''

    self.search_list=self.souped.find(id=element_id)                            # Running a find to get the required part of the HTML

    self.day_details=[]                                                         # Initializing the lists for storing dataframe info
    self.temperature=[]
    self.short_description=[]
    self.description=[]

    for tsc in self.search_list.find_all(class_='tombstone-container'):         # Iterating over the tombstone containers to get data for each period

      self.day_details.append(tsc.find(class_='period-name').text)              # Read the day details

      if tsc.find(class_='temp temp-low') is not None:                          # Read the temperature details. A period has either a low or a high value, so check for the low first
        self.temperature.append(tsc.find(class_='temp temp-low').text)
      else:
        self.temperature.append(tsc.find(class_='temp temp-high').text)


      self.short_description.append(tsc.find(class_='short-desc').text)         # Reading the short description

    for tsc in self.search_list.find_all('img'):                                # Reading the long description from the alt text of each image
      self.description.append(tsc['alt'])

    self.pretty_print(self.day_details)                                         # Adding spaces to make the day_details look better

    print('Data conversion succesful')

  def to_df(self):
    import pandas as pd
    print('Creating a dataframe')
    df=pd.DataFrame()
    df['day']=self.day_details
    df['temp']=self.temperature
    df['short_desc']=self.short_description
    df['desc']=self.description

    self.dataframe=df
    print('Dataframe created succesfully. Read from classname.dataframe')

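Values such as `IncreasingClouds` in the output suggest that some words on the page are separated only by markup. A possible alternative to repairing the strings afterwards is to ask Beautiful Soup to insert a separator between text nodes. The sketch below assumes the words are split by a `<br>` tag, which the notebook does not state; the HTML fragment is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the two words are separated only by a <br> tag (an assumption).
fragment = BeautifulSoup('<p class="period-name">Tuesday<br>Night</p>', "html.parser")
period = fragment.find(class_="period-name")

fused = period.text                        # text nodes joined with nothing in between
spaced = period.get_text(separator=" ")    # text nodes joined with a single space

print(repr(fused))
print(repr(spaced))
```

If the assumption holds, the same call would work for the short description as well, without any uppercase-based heuristic.
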

## Calling the functions


In [361]:
url='https://forecast.weather.gov/MapClick.php?lat=37.87158815800046&lon=-122.27274583799971'
scrape=weather_scrape(url)
scrape.beautify('html.parser')
scrape.find_to_text('seven-day-forecast')
scrape.to_df()

Webpage read successfull
Beautification Successful
Data conversion succesful
Creating a dataframe
Dataframe created succesfully. Read from classname.dataframe


In [362]:
scrape.dataframe

| | day | temp | short_desc | desc |
|---|---|---|---|---|
| 0 | Overnight | Low: 47 °F | Rain | Overnight: Rain, mainly before 5am. Steady te... |
| 1 | Tuesday | High: 52 °F | Chance Rain | Tuesday: A 40 percent chance of rain, mainly b... |
| 2 | Tuesday Night | Low: 46 °F | IncreasingClouds | Tuesday Night: Increasing clouds, with a stead... |
| 3 | Wednesday | High: 54 °F | Chance Rainthen Rain | Wednesday: Rain, mainly after 5pm. High near ... |
| 4 | Wednesday Night | Low: 50 °F | Rain | Wednesday Night: Rain, mainly before 5am. Low... |
| 5 | Thursday | High: 57 °F | Chance Rainthen MostlySunny | Thursday: A 50 percent chance of rain before 1... |
| 6 | Thursday Night | Low: 42 °F | Partly Cloudy | Thursday Night: Partly cloudy, with a low arou... |
| 7 | Friday | High: 54 °F | Sunny | Friday: Sunny, with a high near 54. |
| 8 | Friday Night | Low: 41 °F | Partly Cloudy | Friday Night: Partly cloudy, with a low around... |

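The four lists are collected in two separate loops (one over the tombstone containers, one over the images), so their lengths are not guaranteed to match. A minimal sketch of a DataFrame builder that checks this first is shown below; `build_forecast_frame` is a hypothetical helper, not part of the class above, and the sample strings are made up for illustration.

```python
import pandas as pd


def build_forecast_frame(days, temps, short_descs, descs):
    """Build the forecast DataFrame after checking that all lists have equal length."""
    columns = {
        "day": days,
        "temp": temps,
        "short_desc": short_descs,
        "desc": descs,
    }
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # Report which column is off instead of relying on the pandas error
        raise ValueError(f"Column lengths differ: {lengths}")
    return pd.DataFrame(columns)


# Illustrative values only, not scraped data
frame = build_forecast_frame(
    ["Tonight", "Friday"],
    ["Low: 49 °F", "High: 63 °F"],
    ["Mostly Clear", "Sunny"],
    ["Tonight: Mostly clear, with a low around 49.", "Friday: Sunny, with a high near 63."],
)
print(frame)
```

Passing a dictionary keeps the column order explicit and creates the frame in one step, which is equivalent here to assigning the columns one by one.
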