# Mission To Mars

https://gt.bootcampcontent.com/GT-Coding-Boot-Camp/GTATL201805DATA3-Class-Repository-DATA/tree/master/13-Web-Scraping-and-Document-Databases/Instructions

* This notebook builds a web application that scrapes various websites for data related to the Mission to Mars and displays the information in a single HTML page.

* Uses Splinter to navigate the sites when needed and BeautifulSoup to help find and parse out the necessary data.

* Uses Pymongo for CRUD applications for your database. It simply overwrites the existing document each time the /scrape url is visited and new data is obtained.

* Uses Bootstrap to structure your HTML template.

## PT 1 - Scraping

In [2]:
# Dependencies
from bs4 import BeautifulSoup
import requests
import pandas as pd
from splinter import Browser
from splinter.exceptions import ElementDoesNotExist
import time
import os

### Scrape 1 - Headlines

* Initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.

* NASA Mars News: Scrape the NASA Mars News Site and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.
* NOte: vars: <strong> title, headline </strong>

In [4]:
# Note: if this errors the first time, simply run it again. 
# This is due to a timing issue brought on by visiting, loading and scraping simultaneously.
# https://stackoverflow.com/questions/41706274/beautifulsoup-returns-incomplete-html
# requests.get() is not returning full HTML in soup due to load times. 
# Therefore, I'm using chromedriver for all scrapes
# I attempt to avoid it by using time.sleep to allow the site to load

!which chromedriver
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)
url = 'https://mars.nasa.gov/news/'
browser.visit(url)
time.sleep(3)
soup = BeautifulSoup(browser.html, 'lxml')

title = soup.find('li', {'class':'slide'}).find('div', {'class':'content_title'}).text
headline = soup.find('li', {'class':'slide'}).find('div', {'class':'article_teaser_body'}).text
print(title)
print(headline)

/usr/local/bin/chromedriver
Curiosity Surveys a Mystery Under Dusty Skies
NASA's Curiosity rover surveyed its surroundings on Mars, producing a 360-degree panorama of its current location on Vera Rubin Ridge.


### Scrape 2 - JPL Mars Space Images - Featured Image

* Visit the url for JPL Featured Space Image here.
* Use splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called <strong>featured_image_url</strong>.
* Make sure to find the image url to the full size .jpg image.
* Make sure to save a complete url string for this image.

In [5]:
# https://splinter.readthedocs.io/en/latest/drivers/chrome.html
!which chromedriver
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)

url = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
browser.visit(url)

# Soup
html = browser.html
soup = BeautifulSoup(html, 'lxml')
browser.click_link_by_partial_text('FULL IMAGE')
# Re-soup required the code to sleep for the HTML to populate
time.sleep(3)
html = browser.html
soup = BeautifulSoup(html, 'lxml')
featured_image = soup.find('div', {'class':'fancybox-inner'}).find('img',{'class':'fancybox-image'})
partial_url = (featured_image['src'])
featured_image_url = 'https://www.jpl.nasa.gov/spaceimages'+partial_url
print(featured_image_url)

/usr/local/bin/chromedriver
https://www.jpl.nasa.gov/spaceimages/spaceimages/images/mediumsize/PIA17932_ip.jpg


### Scrape 3 - HeadlinesMars Weather

* Visit the Mars Weather twitter account here and scrape the latest Mars weather tweet from the page. 
* Save the tweet text for the weather report as a variable called mars_weather.
* var: <strong>mars_weather</strong>

In [6]:
url = 'https://twitter.com/marswxreport?lang=en'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
mars_weather = soup.find('p',class_='TweetTextSize--normal').text
print(mars_weather)

Congrats to NASA/JPL for the Emmy Award for Outstanding Original Interactive Program for its coverage of the Cassini mission's Grand Finale at Saturn. https://www.jpl.nasa.gov/news/news.php?feature=7232 …https://twitter.com/veronicamcg/status/1039221529005813762 …


### Scrape 4 - HeadlinesMars Facts

* Visit the Mars Facts webpage here and use Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.
* Use Pandas to convert the data to a HTML table string.
* var: <strong>mars_facts</strong>

In [10]:
url = 'http://space-facts.com/mars/'
tables = pd.read_html(url)
df = tables[0]
df = df.rename(columns={0:'Profile',1:'Value'})
mars_df = df.set_index('Profile')
# convert into a dictionary for the master dictionary used in the Flask app
mars_facts = df.to_dict('records')
# show df
mars_df

Unnamed: 0_level_0,Value
Profile,Unnamed: 1_level_1
Equatorial Diameter:,"6,792 km"
Polar Diameter:,"6,752 km"
Mass:,6.42 x 10^23 kg (10.7% Earth)
Moons:,2 (Phobos & Deimos)
Orbit Distance:,"227,943,824 km (1.52 AU)"
Orbit Period:,687 days (1.9 years)
Surface Temperature:,-153 to 20 °C
First Record:,2nd millennium BC
Recorded By:,Egyptian astronomers


### Scrape 5 - HeadlinesMars Hemispheres

* Visit the USGS Astrogeology site here to obtain high resolution images for each of Mar's hemispheres.
* You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.
* Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys img_url and title.
* Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

In [11]:
# Set up the driver
!which chromedriver
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)
url = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'
browser.visit(url)

# Launch the driver
html = browser.html
soup = BeautifulSoup(html, 'lxml')
items = soup.find_all()
spheres = ['Cerberus Hemisphere Enhanced','Schiaparelli Hemisphere Enhanced','Syrtis Major Hemisphere Enhanced','Valles Marineris Hemisphere Enhanced']
hemisphere_list = []

# Looping through the hemispheres
for sphere in spheres:
    html = browser.html
    soup = BeautifulSoup(html, 'lxml')
    browser.click_link_by_partial_text(sphere)
    time.sleep(1)
    browser.click_link_by_partial_text('Open')
    time.sleep(1)
    html = browser.html
    soup = BeautifulSoup(html, 'lxml')
    partial_url = soup.find('img', {'class':'wide-image'})['src']
    content = soup.find('div',{'class':'content'}).find('h2',{'class':'title'})
    title = content.text
    image_url = 'https://astrogeology.usgs.gov/'+partial_url

    entry = {
        'img_url': image_url,
        'title': title 
    }
    
    hemisphere_list.append(entry)
    time.sleep(1)
    browser.click_link_by_partial_text('Close')
    time.sleep(1)
    browser.click_link_by_partial_text('Back')

print(hemisphere_list)

/usr/local/bin/chromedriver
[{'img_url': 'https://astrogeology.usgs.gov//cache/images/cfa62af2557222a02478f1fcd781d445_cerberus_enhanced.tif_full.jpg', 'title': 'Cerberus Hemisphere Enhanced'}, {'img_url': 'https://astrogeology.usgs.gov//cache/images/3cdd1cbf5e0813bba925c9030d13b62e_schiaparelli_enhanced.tif_full.jpg', 'title': 'Schiaparelli Hemisphere Enhanced'}, {'img_url': 'https://astrogeology.usgs.gov//cache/images/ae209b4e408bb6c3e67b6af38168cf28_syrtis_major_enhanced.tif_full.jpg', 'title': 'Syrtis Major Hemisphere Enhanced'}, {'img_url': 'https://astrogeology.usgs.gov//cache/images/7cf2da4bf549ed01c17f206327be4db7_valles_marineris_enhanced.tif_full.jpg', 'title': 'Valles Marineris Hemisphere Enhanced'}]


## Master Dictionary

In [12]:
mars_dict = {
    'news_title': title,
    'news_headline': headline,
    'featured_image': featured_image_url,
    'weather': mars_weather,
    'stats': mars_facts,
    'hemispheres': hemisphere_list
}
print(mars_dict)

{'news_title': 'Valles Marineris Hemisphere Enhanced', 'news_headline': "NASA's Curiosity rover surveyed its surroundings on Mars, producing a 360-degree panorama of its current location on Vera Rubin Ridge.", 'featured_image': 'https://www.jpl.nasa.gov/spaceimages/spaceimages/images/mediumsize/PIA17932_ip.jpg', 'weather': "Congrats to NASA/JPL for the Emmy Award for Outstanding Original Interactive Program for its coverage of the Cassini mission's Grand Finale at Saturn. https://www.jpl.nasa.gov/news/news.php?feature=7232\xa0…https://twitter.com/veronicamcg/status/1039221529005813762\xa0…", 'stats': [{'Profile': 'Equatorial Diameter:', 'Value': '6,792 km'}, {'Profile': 'Polar Diameter:', 'Value': '6,752 km'}, {'Profile': 'Mass:', 'Value': '6.42 x 10^23 kg (10.7% Earth)'}, {'Profile': 'Moons:', 'Value': '2 (Phobos & Deimos)'}, {'Profile': 'Orbit Distance:', 'Value': '227,943,824 km (1.52 AU)'}, {'Profile': 'Orbit Period:', 'Value': '687 days (1.9 years)'}, {'Profile': 'Surface Temperat

## PT 2 - MongoDB and Flask Application

* Use MongoDB with Flask templating to create a new HTML page that displays all of the information that was scraped from the URLs above.

* Start by converting your Jupyter notebook into a Python script called scrape_mars.py with a function called scrape that will execute all of your scraping code from above and return one Python dictionary containing all of the scraped data.

* Next, create a route called /scrape that will import your scrape_mars.py script and call your scrape function.

* Store the return value in Mongo as a Python dictionary.

* Create a root route / that will query your Mongo database and pass the mars data into an HTML template to display the data.

* Create a template HTML file called index.html that will take the mars data dictionary and display all of the data in the appropriate HTML elements. Use the following as a guide for what the final product should look like, but feel free to create your own design.

## Flask App
Make sure to run mongod server before running the Flask app below

In [4]:
# Is there a way to start mongod in Jupyter Notebook and/or Python?
# os.system("mongod")

In [5]:
# This block calls the Flask app inline, so you don't have to leave Jupyter for Mars -_-
# This will not run if an instance of mongod isn't already running in the background
# !python app.py 