# Lab 05: Web Scraping
Assignment goal: use web scraping with [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) to get data from the [National Weather Service](https://www.weather.gov).

In [145]:
import pandas as pd
from pymongo import MongoClient
import json

### Getting Longitude and Latitude From User Zipcode
Uses the [pgeocode](https://pypi.org/project/pgeocode/) library to look up coordinates for US zip codes only.

**The `pgeocode` library must be installed for this code to run correctly.**

Returns a two-element list containing, in order:
1. Latitude
2. Longitude



In [50]:
import pgeocode
def get_lat_long(postal_code):
    nomi = pgeocode.Nominatim('US')
    lat_long = nomi.query_postal_code(postal_code)
    list_lat_long = [lat_long.latitude, lat_long.longitude]
    return list_lat_long

[38.0339, -78.4924]
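
pgeocode is understood to return NaN coordinates for postal codes it cannot find, so a small check before building a URL can catch bad input early. The sketch below assumes that behavior; `coords_are_valid` is a hypothetical helper, not part of the lab code or of pgeocode.

```python
import math

def coords_are_valid(lat_long):
    """Return True if both entries are real numbers inside valid lat/lon ranges."""
    lat, lon = lat_long
    # NaN (the assumed result for an unknown zip code) fails this check
    if any(math.isnan(float(v)) for v in (lat, lon)):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180

print(coords_are_valid([38.0339, -78.4924]))           # True
print(coords_are_valid([float("nan"), float("nan")]))  # False
```

A check like this could run on the result of `get_lat_long` before any request is sent.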

### Webscraping to get data from national weather service
Instructions for use:
- run the cell and input your zip code

How the code works:
- converts the zip code into latitude and longitude
- uses the coordinates to build a URL for local weather data from the National Weather Service (exits if the request fails, for example for an invalid location)
- scrapes data from the forecast page at that URL
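
The URL-building step can be sketched in isolation using only the standard library. `build_forecast_url` and `BASE_URL` are hypothetical names introduced for illustration; the endpoint and the `lat`/`lon` parameters are the ones the notebook uses.

```python
from urllib.parse import urlencode

BASE_URL = "https://forecast.weather.gov/MapClick.php"

def build_forecast_url(lat, lon):
    """Build the forecast page URL for a latitude/longitude pair."""
    # urlencode handles escaping, so the query string stays well-formed
    return f"{BASE_URL}?{urlencode({'lat': lat, 'lon': lon})}"

print(build_forecast_url(38.0339, -78.4924))
# https://forecast.weather.gov/MapClick.php?lat=38.0339&lon=-78.4924
```

Keeping this step in its own function makes it easy to check the URL without sending a request.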

In [200]:
from bs4 import BeautifulSoup
import requests
import sys
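def __get_requests__(zipcode):
    # Convert the zip code to [latitude, longitude] and build the forecast URL
    lat_long = get_lat_long(zipcode)
    URL = f"https://forecast.weather.gov/MapClick.php?lat={lat_long[0]}&lon={lat_long[1]}"
    try:
        r = requests.get(URL, timeout=30)
        # Treat HTTP error responses (4xx/5xx) as failures too
        r.raise_for_status()
    except requests.exceptions.RequestException as e:
        print(e)
        sys.exit(1)
    return r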
def get_seven_day_data(zipcode):
    r = __get_requests__(zipcode)
    soup = BeautifulSoup(r.content, 'html5lib')
    sevenday = soup.find('div', attrs = {'id':'seven-day-forecast-body'})
    period_names = [pt.get_text() for pt in sevenday.select(".tombstone-container .period-name")]
    short_desc = [sd.get_text() for sd in sevenday.select(".tombstone-container .short-desc")]
    temperatures = [t.get_text() for t in sevenday.select(".tombstone-container .temp")]
    desc = [d["title"] for d in sevenday.select(".tombstone-container img")]
    df = pd.DataFrame(zip(period_names, short_desc, temperatures, desc), columns=['period', 'short_desc', 'temp', 'desc'])
    return df
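
The selectors used in `get_seven_day_data` can be tried offline on a small HTML string. The markup below is a hypothetical stand-in for the forecast page, not a copy of it, and the sketch uses the built-in `html.parser` instead of `html5lib` so that it has no extra dependency. It also shows one possible reason why scraped labels can come out as `ThursdayNight`: `get_text()` drops `<br>` tags without adding a space, while passing `separator=" "` keeps the words apart.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure the selectors expect
html = """
<div id="seven-day-forecast-body">
  <div class="tombstone-container">
    <p class="period-name">Thursday<br>Night</p>
    <p class="short-desc">Cloudy then<br>Showers</p>
    <p class="temp">Low: 56 °F</p>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
body = soup.find("div", attrs={"id": "seven-day-forecast-body"})
names = body.select(".tombstone-container .period-name")

joined = [p.get_text() for p in names]
spaced = [p.get_text(separator=" ", strip=True) for p in names]

print(joined)  # ['ThursdayNight']
print(spaced)  # ['Thursday Night']
```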

In [201]:
def get_tonight_data(zipcode):
    r = __get_requests__(zipcode)
    soup = BeautifulSoup(r.content, 'html5lib')
    # The table cells appear to alternate label, value, so the values are read from the odd indexes
    data = [cell.get_text() for cell in soup.select("td")]
    data_dict = {
        'Humidity': data[1],
        'Wind': data[3],
        'Barometer': data[5],
        'Dewpoint': data[7],
        'Visibility': data[9],
        'Last Update': data[11],
    }
    tonight_df = pd.DataFrame(data_dict, index=[0])
    return tonight_df


     Period Short Description Temperature Description
0  Humidity        Wind Speed   Barometer    Dewpoint
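
Reading values by fixed index breaks if the page adds or reorders a row. As a sketch, assuming the cells really do alternate label and value, the list of cell texts can be paired into a dictionary instead. `pair_cells` is a hypothetical helper and the sample values are made up for illustration.

```python
def pair_cells(cells):
    """Pair alternating label/value cells into a dict, stripping whitespace."""
    cleaned = [c.strip() for c in cells]
    # Even positions hold labels, odd positions hold the matching values
    return dict(zip(cleaned[0::2], cleaned[1::2]))

# Made-up sample values in the assumed label, value order
sample = ["Humidity", "60%", "Wind Speed", "N 5 mph", "Barometer", "30.10 in"]
print(pair_cells(sample))
# {'Humidity': '60%', 'Wind Speed': 'N 5 mph', 'Barometer': '30.10 in'}
```

The resulting dictionary can be looked up by label, so a missing row shows up as a missing key instead of a shifted value.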


### Run the Cells Below for Weather Information
- when prompted, enter a zip code (5 digits only), then run the next two cells to get dataframe outputs from the National Weather Service
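
The 5-digit requirement can be checked before any lookup is attempted. This is a minimal sketch; `is_valid_zip` is a hypothetical helper that only checks the format, not whether the zip code actually exists.

```python
import re

def is_valid_zip(text):
    """Return True if text is exactly five ASCII digits (surrounding spaces ignored)."""
    return re.fullmatch(r"[0-9]{5}", text.strip()) is not None

print(is_valid_zip("22903"))       # True
print(is_valid_zip("2290"))        # False
print(is_valid_zip("22903-1234"))  # False
```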

In [205]:
zipcode = input()
df = get_seven_day_data(zipcode)
df

| | period | short_desc | temp | desc |
|---|---|---|---|---|
| 0 | Tonight | Partly Cloudy | Low: 45 °F | Tonight: Partly cloudy, with a low around 45. ... |
| 1 | Thursday | Mostly Cloudy | High: 65 °F | Thursday: Mostly cloudy, with a high near 65. ... |
| 2 | ThursdayNight | Cloudy thenShowers | Low: 56 °F | Thursday Night: Showers after 1am. Low around... |
| 3 | VeteransDay | Showers | High: 70 °F | Veterans Day: Showers and possibly a thunderst... |
| 4 | FridayNight | ShowersLikely | Low: 56 °F | Friday Night: Showers likely, mainly before 1a... |
| 5 | Saturday | Sunny | High: 66 °F | Saturday: Sunny, with a high near 66. |
| 6 | SaturdayNight | Mostly Cloudy | Low: 39 °F | Saturday Night: Mostly cloudy, with a low arou... |
| 7 | Sunday | Sunny | High: 49 °F | Sunday: Sunny, with a high near 49. |
| 8 | SundayNight | Mostly Clear | Low: 27 °F | Sunday Night: Mostly clear, with a low around 27. |


In [203]:
tonight_df = get_tonight_data(zipcode)
tonight_df

| | Period | Short Description | Temperature | Description |
|---|---|---|---|---|
| 0 | Humidity | Wind Speed | Barometer | Dewpoint |


### Loading Data in MongoDB
How to run the code:
- just run the cells (a MongoDB server must already be running on localhost:27017)

How the code works:
- connects to a local MongoDB instance
- creates a database called 'weather_data' on that client connection
- inserts JSON versions of the dataframes above into the local MongoDB instance
- the last cell shows one of the inserted documents
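
With its default settings, `DataFrame.to_json()` writes a column-oriented structure, `{column: {row_label: value}}`, so each dataframe is stored as a single document. If one document per forecast period is preferred, that shape can be converted to per-row records in plain Python. This is a sketch under that assumption; `columns_to_records` is a hypothetical helper, and it expects the row labels to be integer strings such as `'0'`, `'1'`.

```python
def columns_to_records(doc):
    """Turn {column: {row_label: value}} into a list of per-row dicts."""
    # Skip MongoDB's _id field if the document was read back from a collection
    columns = {k: v for k, v in doc.items() if k != "_id"}
    row_labels = sorted({label for col in columns.values() for label in col}, key=int)
    return [{name: col.get(label) for name, col in columns.items()} for label in row_labels]

stored = {
    "period": {"0": "Tonight", "1": "Thursday"},
    "temp": {"0": "Low: 45 °F", "1": "High: 65 °F"},
}
print(columns_to_records(stored))
# [{'period': 'Tonight', 'temp': 'Low: 45 °F'}, {'period': 'Thursday', 'temp': 'High: 65 °F'}]
```

A list like this could be passed to `insert_many` instead of storing the whole table with `insert_one`.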

In [186]:
host_name = "localhost"
port = "27017"

atlas_cluster_name = "sandbox"
atlas_default_dbname = "local"
conn_str = {
    "local" : f"mongodb://{host_name}:{port}/"
    }

# Connect using the connection string defined above instead of repeating the host and port
client = MongoClient(conn_str["local"])

print(f"Local Connection String: {conn_str['local']}")
print(client.list_database_names())

Local Connection String: mongodb://localhost:27017/
['admin', 'config', 'local']


In [194]:
db_name = "weather_data"
db = client[db_name]
# to_json() returns a JSON string; json.loads turns it into a dict that insert_one can store
db['weather_data'].insert_one(json.loads(get_seven_day_data(zipcode).to_json()))
db['weather_data'].insert_one(json.loads(get_tonight_data(zipcode).to_json()))


['posts']


<pymongo.results.InsertOneResult at 0x12e1eec80>

In [196]:
db['weather_data'].find_one()

{'_id': ObjectId('636c79bccb9063f29f34e4c8'),
 'period': {'0': 'Tonight',
  '1': 'Thursday',
  '2': 'ThursdayNight',
  '3': 'VeteransDay',
  '4': 'FridayNight',
  '5': 'Saturday',
  '6': 'SaturdayNight',
  '7': 'Sunday',
  '8': 'SundayNight'},
 'short_desc': {'0': 'Partly Cloudy',
  '1': 'Mostly Cloudy',
  '2': 'Cloudy thenShowers',
  '3': 'Showers',
  '4': 'ShowersLikely',
  '5': 'Sunny',
  '6': 'Mostly Cloudy',
  '7': 'Sunny',
  '8': 'Mostly Clear'},
 'temp': {'0': 'Low: 45 °F',
  '1': 'High: 65 °F',
  '2': 'Low: 56 °F',
  '3': 'High: 70 °F',
  '4': 'Low: 56 °F',
  '5': 'High: 66 °F',
  '6': 'Low: 39 °F',
  '7': 'High: 49 °F',
  '8': 'Low: 27 °F'},
 'desc': {'0': 'Tonight: Partly cloudy, with a low around 45. Light north wind. ',
  '1': 'Thursday: Mostly cloudy, with a high near 65. Light and variable wind becoming southeast 5 to 7 mph in the afternoon. ',
  '2': 'Thursday Night: Showers after 1am.  Low around 56. Light northeast wind.  Chance of precipitation is 80%. New precipitati