# Web Scraping Homework - Mission to Mars

We will build a web application that scrapes various websites for data related to the Mission to Mars and displays the information in a single HTML page. Complete the initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.

### Import Dependencies

In [1]:
import os
from bs4 import BeautifulSoup
import requests
from splinter import Browser
import pandas as pd

### Setup Splinter

In [2]:
# Locate the chromedriver executable (IPython shell magic; only works inside Jupyter/IPython)
driverPath = !which chromedriver

# Point Splinter at chromedriver and launch a headless Chrome browser
executable_path = {'executable_path': driverPath[0]}
browser = Browser('chrome', **executable_path, headless=True)
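`!which chromedriver` is IPython shell syntax, so the cell above only runs inside Jupyter. Here is a minimal plain-Python sketch of the same lookup using the standard library's `shutil.which`. The helper name `find_chromedriver` is hypothetical and is not part of Splinter.

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the path of the chromedriver executable found on PATH.

    Plain-Python stand-in for the IPython `!which chromedriver` magic,
    so the lookup also works outside Jupyter.
    """
    path = shutil.which(name)
    if path is None:
        # shutil.which() returns None when nothing executable matches
        raise FileNotFoundError(f"{name} not found on PATH")
    return path


# Usage, mirroring the setup cell:
# executable_path = {'executable_path': find_chromedriver()}
# browser = Browser('chrome', **executable_path, headless=True)
```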

## Step 1 - Scraping

**Hint**: Use Splinter to navigate the sites when needed and BeautifulSoup to help find and parse out the necessary data.

### NASA Mars News

* Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

In [3]:
# URL of page to be scraped
# url_nasa = "https://mars.nasa.gov/news/"
url_nasa = "https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest"
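The long URL is easier to read and edit when you build it from a dictionary. Going by the parameter names (an assumption, not documented NASA behavior), it requests page 0 with 40 items per page, newest first, restricted to four category IDs. This sketch uses `urllib.parse.urlencode` to rebuild the same string:

```python
from urllib.parse import urlencode

# Query parameters copied from url_nasa; their meanings are inferred
# from the names only.
params = {
    "page": 0,
    "per_page": 40,
    "order": "publish_date desc,created_at desc",
    "search": "",
    "category": "19,165,184,204",
    "blank_scope": "Latest",
}

# urlencode() percent-encodes commas and turns spaces into '+'
url_nasa = "https://mars.nasa.gov/news/?" + urlencode(params)
print(url_nasa)
```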

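Apparently the latest news items are not available when the page is scraped the traditional way (fetching the HTML with Requests), presumably because the list is loaded by JavaScript, so we render the page with Splinter instead.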
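One way to check this claim is to count the news-title divs in both versions of the page. This sketch reuses the `div.content_title` selector from this notebook. `count_news_titles` is a hypothetical helper, and the commented comparison needs network access plus the `browser` object from the setup cell.

```python
from bs4 import BeautifulSoup


def count_news_titles(html):
    """Count the div.content_title elements in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("div", class_="content_title"))


# Hypothetical comparison of static vs. browser-rendered HTML:
# import requests
# static_html = requests.get(url_nasa).text
# browser.visit(url_nasa)
# print(count_news_titles(static_html), count_news_titles(browser.html))
```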

#### Use splinter to inform the browser to visit the page

In [4]:
# Use the browser to visit the url
browser.visit(url_nasa)

In [5]:
# Get the HTML of the page as rendered by the browser
html_nasa = browser.html

In [6]:
# Use BeautifulSoup to parse the page rendered by the browser
soup = BeautifulSoup(html_nasa, 'html.parser')

In [7]:
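# Find every news-title div; index 1 is used because the first match is apparently not an article title
results = soup.find_all('div', class_="content_title")
news_title = results[1].text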
print(f"Title: {news_title}\n")

Title: NASA's Perseverance Rover Is Midway to Mars 



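The scraped title keeps a trailing space, which also ends up in the stored MongoDB document. Calling `.strip()` on the extracted text removes it. The HTML below is a hypothetical stand-in for the rendered page, not its real markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the live page's structure may differ.
sample_html = """
<div class="content_title"><a href="#">NASA's Perseverance Rover Is Midway to Mars </a></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
title_div = soup.find("div", class_="content_title")

raw_title = title_div.text                  # keeps the trailing space
clean_title = title_div.get_text().strip()  # trims leading/trailing whitespace

print(repr(raw_title))
print(repr(clean_title))
```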
In [8]:
# Teaser paragraph of the latest article (first match)
results = soup.find_all('div', class_="article_teaser_body")
new_p = results[0].text
print(f"Paragraph: {new_p}\n")

Paragraph: Sometimes half measures can be a good thing – especially on a journey this long. The agency's latest rover only has about 146 million miles left to reach its destination.



In [9]:
Nasa_News = {"Title":news_title, "Paragraph": new_p}
Nasa_News

{'Title': "NASA's Perseverance Rover Is Midway to Mars ",
 'Paragraph': "Sometimes half measures can be a good thing – especially on a journey this long. The agency's latest rover only has about 146 million miles left to reach its destination."}
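Taking titles from index 1 and paragraphs from index 0 only lines up if exactly one extra `content_title` div precedes the first article. This sketch avoids hard-coded indices: it starts from the first teaser and walks back to the nearest preceding title with BeautifulSoup's `find_previous`. It assumes each title appears before its teaser in the page, and `latest_news` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def latest_news(html):
    """Return the first teaser paragraph and the title that precedes it.

    Assumes each article's title div comes before its teaser div in
    document order, so find_previous() pairs them without list indices.
    """
    soup = BeautifulSoup(html, "html.parser")
    teaser = soup.find("div", class_="article_teaser_body")
    if teaser is None:
        return None  # page not rendered yet, or the markup changed
    title = teaser.find_previous("div", class_="content_title")
    return {
        "Title": title.get_text().strip() if title else None,
        "Paragraph": teaser.get_text().strip(),
    }


# Usage with the rendered page:
# Nasa_News = latest_news(browser.html)
```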

In [10]:
browser.quit()
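If a cell fails between `browser.visit()` and `browser.quit()`, the headless Chrome process keeps running. The hypothetical helper below wraps the visit in `try`/`finally` so `quit()` always runs. It only relies on the `visit()`, `html`, and `quit()` members already used in this notebook.

```python
def fetch_rendered_html(browser, url):
    """Visit url with a Splinter-style browser and return the rendered HTML.

    quit() runs in the finally block, so the browser is closed even if
    visit() or reading .html raises an exception.
    """
    try:
        browser.visit(url)
        return browser.html
    finally:
        browser.quit()


# Usage (the browser cannot be reused after this call):
# html_nasa = fetch_rendered_html(browser, url_nasa)
```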

In [11]:
import pymongo

# Use pymongo to set up the Mongo connection
conn = "mongodb://localhost:27017/mars_mission_scraping"
client = pymongo.MongoClient(conn)

# Select the database; uncomment the next line to drop existing data in the collection first
db = client.mars_mission_scraping
# db.mars_data.drop()

db.mars_data.insert_many([Nasa_News])
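Because the `drop()` call is commented out, each full run of the notebook adds another copy of the news document. Rerunning only this cell fails instead, because `insert_many` has already added an `_id` key to `Nasa_News`. One alternative, assumed here rather than taken from the assignment, is pymongo's `replace_one` with `upsert=True`. `save_news` is a hypothetical helper.

```python
def save_news(collection, news):
    """Store `news` as the collection's single news document.

    `collection` is a pymongo Collection such as db.mars_data. The filter
    matches whichever document currently has a Title field, so reruns
    overwrite it instead of adding duplicates; upsert=True inserts it
    the first time.
    """
    # insert_many() adds an _id key to the dicts it inserts; leave it out
    # so the replacement never tries to change the stored document's _id.
    doc = {k: v for k, v in news.items() if k != "_id"}
    return collection.replace_one({"Title": {"$exists": True}}, doc, upsert=True)


# Usage:
# save_news(db.mars_data, Nasa_News)
```

Documents without a `Title` field, such as the `ImageURL` and `TableHTML` entries in the query result, are not matched by this filter and stay untouched.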

<pymongo.results.InsertManyResult at 0x123927370>

In [12]:
query_result = list(db.mars_data.find())

In [13]:
query_result

[{'_id': ObjectId('5f9a5bdf9f998d47db13ef3f'),
  'Title': "NASA's Perseverance Rover Is Midway to Mars ",
  'Paragraph': "Sometimes half measures can be a good thing – especially on a journey this long. The agency's latest rover only has about 146 million miles left to reach its destination."},
 {'_id': ObjectId('5f9a5bdf9f998d47db13ef40'),
  'ImageURL': 'https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA09113-1920x1200.jpg'},
 {'_id': ObjectId('5f9a5bdf9f998d47db13ef41'),
  'TableHTML': '<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>0</th>\n      <th>1</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>Equatorial Diameter:</td>\n      <td>6,792 km</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>Polar Diameter:</td>\n      <td>6,752 km</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>Mass:</td>\n      <td>6.39 × 10^23 kg (0.11 Earths)</td>\n    </tr>\n    <tr>\n      <th>3<