# Mission to Mars
This notebook contains code to build a web application that scrapes various websites for data related to the Mission to Mars and displays the information in a single HTML page. 

### Step 1 - Scraping

Complete scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.

In [1]:
# Dependencies
import json
import time

import pandas as pd  # used to scrape the facts table with read_html
import requests
import tweepy  # Twitter API client for the Mars weather tweet
from bs4 import BeautifulSoup
from splinter import Browser

#### NASA Mars News

Scrape the NASA Mars News site and collect the latest news title and paragraph text.

In [2]:
# URL of page to be scraped
url = 'https://mars.nasa.gov/news/'

In [3]:
# Retrieve page with the requests module
response = requests.get(url)

In [4]:
# Create BeautifulSoup object; parse with 'html.parser'
soup = BeautifulSoup(response.text, 'html.parser')
# Examine the results, then determine element that contains sought info
# print(soup.prettify())

In [5]:
# soup.find returns the first match, which is the latest news title; target the smallest repeatable tag in the HTML
news_title = soup.find('div', class_="content_title").text.strip()
news_title 

'Witness First Mars Launch from West Coast'

In [6]:
# Get the latest news paragraph
news_p = soup.find('div', class_="rollover_description_inner").text.strip()
news_p 

"NASA invites digital creators to apply for social media credentials to cover the launch of the InSight mission to Mars, May 3-5, at California's Vandenberg Air Force Base."

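The two lookups above assume that both elements exist. `soup.find` returns `None` when nothing matches, so calling `.text` on the result would raise an `AttributeError` if the page layout changed. Below is a minimal sketch of a more defensive version, run against an inline HTML snippet so it needs no network access. The helper name `extract_latest_news` and the sample markup are assumptions for illustration; only the two class names come from the cells above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the class names match the cells above.
SAMPLE_HTML = """
<div class="content_title"> Witness First Mars Launch from West Coast </div>
<div class="rollover_description_inner"> NASA invites digital creators to apply. </div>
"""


def extract_latest_news(html):
    """Return (title, paragraph) for the first news item; a missing part is None."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("div", class_="content_title")
    para_tag = soup.find("div", class_="rollover_description_inner")
    title = title_tag.get_text(strip=True) if title_tag else None
    paragraph = para_tag.get_text(strip=True) if para_tag else None
    return title, paragraph


sample_title, sample_p = extract_latest_news(SAMPLE_HTML)
print(sample_title)
print(sample_p)
```

Returning `None` for a missing element lets the caller decide whether to retry, fall back to a default, or fail.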
#### JPL Mars Space Images - Featured Image

Use Splinter to navigate the site and find the image URL for the current featured Mars image.

In [7]:
#Use splinter to visit the url for scraping
browser = Browser('chrome', headless=False)
url = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
browser.visit(url)

In [8]:
# HTML object
html = browser.html
# Parse HTML with Beautiful Soup
soup = BeautifulSoup(html, 'html.parser')
# Click the 'FULL IMAGE' button on the featured image on the home page
browser.click_link_by_partial_text('FULL IMAGE')

In [9]:
# Click the 'more info' button on the second page to open the large image
# browser.html returns the HTML of the page currently loaded, so re-read it after each navigation
html = browser.html
soup = BeautifulSoup(html, 'html.parser')
browser.click_link_by_partial_text('more info')

In [10]:
#Get the image url from the third page
html = browser.html
soup = BeautifulSoup(html, 'html.parser')
imagesrc = soup.find('figure', class_='lede').find('img')['src']

In [11]:
#Get the Complete url for the featured image
featured_image_url = "https://www.jpl.nasa.gov" + imagesrc
featured_image_url

'https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA16192_hires.jpg'
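String concatenation works here because the scraped `src` is a site-relative path. As a hedged alternative, the standard library's `urljoin` handles both relative and already-absolute values. The `imagesrc` value below is inferred from the output above, not taken from a fresh scrape, and `example.com` is a placeholder.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.jpl.nasa.gov"

# Relative path inferred from the featured_image_url printed above
imagesrc = "/spaceimages/images/largesize/PIA16192_hires.jpg"

featured_image_url = urljoin(BASE_URL, imagesrc)
print(featured_image_url)

# An src that is already absolute is returned unchanged (placeholder URL)
absolute_src = urljoin(BASE_URL, "https://example.com/image.jpg")
print(absolute_src)
```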

#### Mars Weather

Visit the Mars Weather Twitter account and retrieve the latest Mars weather tweet.
Save the tweet text for the weather report as a variable called mars_weather.

In [12]:
# Twitter API Keys
consumer_key = "wNC3lUy34Zm8YYOkzhgRWXo0A"
consumer_secret = "Tgs4zKsSW05KmVhuZdvtXy3V7KXbGZdJJsG6BeWv3BwOl1CjGL"
access_token = "943239523047235584-OhbdaN1nJYFHzrpVBbFzOMTxbQaRDaj"
access_token_secret = "Rinpous00RQFCnvVMNWKxOgQZE5H0wxJ2GtjWSHGgq0uW"

# Setup Tweepy API Authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, parser=tweepy.parsers.JSONParser())
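
API credentials are usually kept out of a notebook and read from the environment instead. The sketch below shows one way to do that with the standard library only. The environment variable names and the helper `load_twitter_credentials` are hypothetical, and the demonstration uses placeholder values rather than real keys.

```python
import os

# Hypothetical variable names; pick whatever fits your environment.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Demonstration with placeholder values instead of real credentials
demo_env = {name: "placeholder" for name in REQUIRED_VARS}
creds = load_twitter_credentials(demo_env)
print(sorted(creds))
```

The returned values could then be passed to `tweepy.OAuthHandler` and `set_access_token` in place of the literals.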

In [13]:
# Get the latest tweet from the account's timeline
target_user = "@MarsWxReport"
public_tweets = api.user_timeline(target_user, count=1)
mars_weather = public_tweets[0]["text"]
mars_weather 

'Sol 1985 (March 07, 2018), Sunny, high -6C/21F, low -77C/-106F, pressure at 7.23 hPa, daylight 05:35-17:24'
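
If the individual readings are needed later, the tweet text can be split into fields. This is an optional sketch that assumes every weather tweet follows the format of the one shown above; the pattern and the helper `parse_weather` are illustrative, and tweets in a different format simply yield `None`.

```python
import re

# Tweet text copied from the output above
sample_weather = (
    "Sol 1985 (March 07, 2018), Sunny, high -6C/21F, low -77C/-106F, "
    "pressure at 7.23 hPa, daylight 05:35-17:24"
)

# Assumed format: sol number, date, conditions, high and low in C/F, pressure in hPa
WEATHER_PATTERN = re.compile(
    r"Sol (?P<sol>\d+) \((?P<date>[^)]+)\), (?P<conditions>[^,]+), "
    r"high (?P<high_c>-?\d+)C/(?P<high_f>-?\d+)F, "
    r"low (?P<low_c>-?\d+)C/(?P<low_f>-?\d+)F, "
    r"pressure at (?P<pressure_hpa>[\d.]+) hPa"
)


def parse_weather(text):
    """Return a dict of string fields, or None if the text does not match."""
    match = WEATHER_PATTERN.search(text)
    return match.groupdict() if match else None


report = parse_weather(sample_weather)
print(report)
```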

#### Mars Facts

Visit the Mars Facts webpage to scrape the table containing facts about the planet, including diameter, mass, etc. Use Pandas to convert the data to an HTML table string.

In [14]:
marsurl = "https://space-facts.com/mars/"

In [15]:
tables = pd.read_html(marsurl)
tables

[                      0                              1
 0  Equatorial Diameter:                       6,792 km
 1       Polar Diameter:                       6,752 km
 2                 Mass:  6.42 x 10^23 kg (10.7% Earth)
 3                Moons:            2 (Phobos & Deimos)
 4       Orbit Distance:       227,943,824 km (1.52 AU)
 5         Orbit Period:           687 days (1.9 years)
 6  Surface Temperature:                  -153 to 20 °C
 7         First Record:              2nd millennium BC
 8          Recorded By:           Egyptian astronomers]

In [16]:
#Get the first element from the list
df = tables[0]
#Name the columns and check the dataframe
df.columns = ['Parameter', 'Value']
df.set_index('Parameter', inplace=True)
df.head()

| Parameter | Value |
|---|---|
| Equatorial Diameter: | 6,792 km |
| Polar Diameter: | 6,752 km |
| Mass: | 6.42 x 10^23 kg (10.7% Earth) |
| Moons: | 2 (Phobos & Deimos) |
| Orbit Distance: | 227,943,824 km (1.52 AU) |


In [17]:
# Convert to an HTML table
html_table = df.to_html()
html_table

'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>Value</th>\n    </tr>\n    <tr>\n      <th>Parameter</th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>Equatorial Diameter:</th>\n      <td>6,792 km</td>\n    </tr>\n    <tr>\n      <th>Polar Diameter:</th>\n      <td>6,752 km</td>\n    </tr>\n    <tr>\n      <th>Mass:</th>\n      <td>6.42 x 10^23 kg (10.7% Earth)</td>\n    </tr>\n    <tr>\n      <th>Moons:</th>\n      <td>2 (Phobos &amp; Deimos)</td>\n    </tr>\n    <tr>\n      <th>Orbit Distance:</th>\n      <td>227,943,824 km (1.52 AU)</td>\n    </tr>\n    <tr>\n      <th>Orbit Period:</th>\n      <td>687 days (1.9 years)</td>\n    </tr>\n    <tr>\n      <th>Surface Temperature:</th>\n      <td>-153 to 20 °C</td>\n    </tr>\n    <tr>\n      <th>First Record:</th>\n      <td>2nd millennium BC</td>\n    </tr>\n    <tr>\n      <th>Recorded By:</th>\n      <td>Egyptian astronomers</td>\n    </tr>\n
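
The string returned by `to_html` contains literal newlines, which are usually removed before the table is embedded in a page. The sketch below rebuilds a small DataFrame by hand from a few rows of the output above (so it needs no network access) and shows the `classes` argument of `to_html`. The class names `table table-striped` are an assumption for a Bootstrap-styled page and are not part of the original notebook.

```python
import pandas as pd

# A few rows copied from the table output above
facts = [
    ("Equatorial Diameter:", "6,792 km"),
    ("Polar Diameter:", "6,752 km"),
    ("Moons:", "2 (Phobos & Deimos)"),
]
sample_df = pd.DataFrame(facts, columns=["Parameter", "Value"]).set_index("Parameter")

# classes adds CSS class names to the <table> tag (assumed Bootstrap classes);
# the newlines are stripped so the string can be dropped into a template.
sample_html = sample_df.to_html(classes="table table-striped").replace("\n", "")
print(sample_html)
```

By default `to_html` escapes special characters, which is why `&` appears as `&amp;` in the output.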

#### Mars Hemispheres

Visit the USGS Astrogeology site to obtain high-resolution images for each of Mars's hemispheres.

Click each of the links to the hemispheres in order to find the image url to the full resolution image.

Save both the image URL string for the full-resolution hemisphere image and the hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys img_url and title.

Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

In [17]:
urlmh = "https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars"
browser = Browser('chrome', headless=False)
browser.visit(urlmh)

In [18]:
# HTML object
html = browser.html
# Parse HTML with Beautiful Soup
soup = BeautifulSoup(html, 'html.parser')
# Get the links to all the four hemispheres from the home page
hemlinks = soup.find_all('div', class_='description')
hemlinks

[<div class="description"><a class="itemLink product-item" href="/search/map/Mars/Viking/cerberus_enhanced"><h3>Cerberus Hemisphere Enhanced</h3></a><span class="subtitle" style="float:left">image/tiff 21 MB</span><span class="pubDate" style="float:right"></span><br/><p>Mosaic of the Cerberus hemisphere of Mars projected into point perspective, a view similar to that which one would see from a spacecraft. This mosaic is composed of 104 Viking Orbiter images acquired…</p></div>,
 <div class="description"><a class="itemLink product-item" href="/search/map/Mars/Viking/schiaparelli_enhanced"><h3>Schiaparelli Hemisphere Enhanced</h3></a><span class="subtitle" style="float:left">image/tiff 35 MB</span><span class="pubDate" style="float:right"></span><br/><p>Mosaic of the Schiaparelli hemisphere of Mars projected into point perspective, a view similar to that which one would see from a spacecraft. The images were acquired in 1980 during early northern…</p></div>,
 <div class="description"><a 

In [19]:
hemisphere_image_urls = []
for link in hemlinks:
    # Remove the trailing ' Enhanced'; str.strip(' Enhanced') strips a character set
    # and would also eat the final 'e' of 'Hemisphere'
    title = link.find('h3').text.replace(' Enhanced', '').strip()
    print(title)
    urlh = "https://astrogeology.usgs.gov" + link.find('a')['href']
    print(urlh)
    browser.visit(urlh)
    # Wait 5 seconds for the page to load
    time.sleep(5)
    html = browser.html
    soup = BeautifulSoup(html, 'html.parser')
    # Take the first link in the downloads list
    imageurl = soup.find('div', class_='downloads').find('li').find('a')['href']
    print(imageurl)
    hemisphere_image_urls.append({"title": title, "img_url": imageurl})

Cerberus Hemisphere
https://astrogeology.usgs.gov/search/map/Mars/Viking/cerberus_enhanced
http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg
Schiaparelli Hemisphere
https://astrogeology.usgs.gov/search/map/Mars/Viking/schiaparelli_enhanced
http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg
Syrtis Major Hemisphere
https://astrogeology.usgs.gov/search/map/Mars/Viking/syrtis_major_enhanced
http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif/full.jpg
Valles Marineris Hemisphere
https://astrogeology.usgs.gov/search/map/Mars/Viking/valles_marineris_enhanced
http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg


In [20]:
hemisphere_image_urls

[{'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg',
  'title': 'Cerberus Hemisphere'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg',
  'title': 'Schiaparelli Hemisphere'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif/full.jpg',
  'title': 'Syrtis Major Hemisphere'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg',
  'title': 'Valles Marineris Hemisphere'}]
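
A note on cleaning the titles: `str.strip` takes a set of characters, not a suffix, so it keeps removing characters from both ends until it meets one outside the set. The short sketch below contrasts that behavior with an explicit suffix check. The helper name `clean_title` is hypothetical, and the sample title comes from the page listing above.

```python
SUFFIX = " Enhanced"


def clean_title(raw_title):
    """Drop a trailing ' Enhanced' from a hemisphere title, if present."""
    raw_title = raw_title.strip()
    if raw_title.endswith(SUFFIX):
        return raw_title[: -len(SUFFIX)]
    return raw_title


raw = "Cerberus Hemisphere Enhanced"

# Character-set stripping: 'e' is in the set, so the final 'e' of 'Hemisphere' is lost
stripped = raw.strip(SUFFIX)
print(stripped)

# Suffix removal keeps the word intact
cleaned = clean_title(raw)
print(cleaned)
```

On Python 3.9 or later, `str.removesuffix` does the same job as the helper in a single call.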