# Mission to Mars
## Section I:  Data Collection Design
Use BeautifulSoup, Requests, Pandas, Splinter, and Selenium to collect Mars data from the data sources listed below.

#### Researcher(s):  Kirpatrick Dorsey

#### Date:  March 7, 2020

#### Data Source(s):
- [News - NASA's Mars Exploration Program](https://mars.nasa.gov/news/)
- [Space Images/Mars - NASA Jet Propulsion Laboratory (JPL)](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars)
- [Twitter Latest Mars Weather Tweet](https://twitter.com/marswxreport?lang=en)
- [Space-Facts.com/Mars](https://space-facts.com/mars/)
- [US Geological Survey (USGS)](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars)

#### Summary
This notebook is <b>One of Four</b> components used to build a web application that scrapes various websites for data related to the Mission to Mars and displays the information in a single HTML page.

* <b>Section I:</b>  Data Collection Design (Jupyter Notebook)
* <b>Section II:</b>  [Data Collection Script (Python Script)](./scrape_mars.py)
* <b>Section III:</b>  [Data Collection Application (Flask)](./app.py)
* <b>Section IV:</b>  [Web Page Creation (HTML)](../index.html)
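
Section II turns these cells into a script whose results feed the Flask app. One way to stop a single failing site from breaking the whole page is to run each scraper independently and record any failure. The sketch below assumes each scraper is a zero-argument function. `scrape_all` and the keys in the usage line are hypothetical names, not names taken from `scrape_mars.py`.

```python
from typing import Callable, Dict, Optional


def scrape_all(scrapers: Dict[str, Callable[[], object]]) -> Dict[str, Optional[object]]:
    """Run each scraper and collect its result under its key.

    A scraper that raises is recorded as None, so one broken site does not
    stop the rest of the page from rendering.
    """
    results: Dict[str, Optional[object]] = {}
    for key, scrape in scrapers.items():
        try:
            results[key] = scrape()
        except Exception as exc:  # broad on purpose: any single-site failure is non-fatal
            print(f"{key}: scrape failed ({exc!r})")
            results[key] = None
    return results
```

Usage, with stand-in scrapers: `scrape_all({"news_title": lambda: "Title", "mars_weather": lambda: 1 / 0})` returns `{"news_title": "Title", "mars_weather": None}` and prints the failure.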

In [1]:
# Import data collection dependencies
from bs4 import BeautifulSoup as bs
import requests

### Collect the latest information about NASA's Mars exploration
Source:  [News - NASA's Mars Exploration Program](https://mars.nasa.gov/news/)

In [2]:
# Create variables to hold news information
news_title = ""
news_text = ""

# URL of page to be scraped
url = 'https://mars.nasa.gov/news/'

# Retrieve page with the requests module
response = requests.get(url)

# Create BeautifulSoup object; parse with 'html.parser'
soup = bs(response.text, 'html.parser')

# Examine the results, then determine the element that contains the sought info
#print(soup.prettify())

# Save news teaser text of first article
news_text = soup.find('div', class_="rollover_description_inner").text.strip()

# Save news title of first article
news_title = soup.find('div', class_="content_title").text.strip()

print(f"News Title:  {news_title}\nNews Text:  {news_text}")

News Title:  Virginia Middle School Student Earns Honor of Naming NASA's Next Mars Rover
News Text:  NASA chose a seventh-grader from Virginia as winner of the agency's "Name the Rover" essay contest. Alexander Mather's entry for "Perseverance" was voted tops among 28,000 entries.
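
`soup.find(...)` returns the first matching element. If the page layout changes and nothing matches, it returns `None`, and `.text` then raises `AttributeError`. The sketch below performs the same first-match lookup by class using only the standard library, and it returns an empty string instead of raising. `FirstDivText`, `first_div_text`, and the sample markup in the test are hypothetical.

```python
from html.parser import HTMLParser


class FirstDivText(HTMLParser):
    """Collect the text of the first <div> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # <div> nesting depth inside the match (0 = outside)
        self.done = False   # True once the first match has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "div":
            return
        if self.depth:
            self.depth += 1  # a <div> nested inside the match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def first_div_text(html, target_class):
    """Like soup.find('div', class_=...).text.strip(), but returns "" when nothing matches."""
    parser = FirstDivText(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

In the cell above this would be called as `first_div_text(response.text, "content_title")`.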


### Find the image URL for the current Featured Mars Image
Source:  [Space Images/Mars - NASA Jet Propulsion Laboratory (JPL)](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars)
- Use Splinter to navigate the site, find the image URL for the current Featured Mars Image, and assign the URL string to a variable called featured_image_url.
- Make sure to find the image URL for the full-size .jpg image.
- Make sure to save a complete URL string for this image.

In [3]:
# Import Splinter and set the ChromeDriver path
from splinter import Browser

# WINDOWS - Keep the next line active (chromedriver.exe in the working directory)
executable_path = {'executable_path': 'chromedriver.exe'}

# MAC - Comment out the Windows line above, then uncomment the next two lines
#!which chromedriver
#executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
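
Switching platforms by commenting and uncommenting lines is easy to get wrong. The sketch below builds the same `executable_path` dictionary automatically. It first looks for `chromedriver` on `PATH`, which is the equivalent of `!which chromedriver`. If nothing is found, it falls back to the two paths from the cell above. `chromedriver_kwargs` is a hypothetical helper, and it assumes any non-Windows system uses the macOS path.

```python
import platform
import shutil


def chromedriver_kwargs(windows_default="chromedriver.exe",
                        mac_default="/usr/local/bin/chromedriver"):
    """Return {'executable_path': ...} for splinter's Browser without hand-editing comments."""
    found = shutil.which("chromedriver")  # search PATH first (also finds .exe on Windows)
    if found:
        return {"executable_path": found}
    if platform.system() == "Windows":
        return {"executable_path": windows_default}
    return {"executable_path": mac_default}
```

Usage: `executable_path = chromedriver_kwargs()`, then `Browser('chrome', **executable_path, headless=False)` as before.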

In [4]:
# Start automated browsing session
browser = Browser('chrome', **executable_path, headless=False)

# Create a variable to hold complete url string for the 'full size' .jpg image
# https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA13664_hires.jpg
featured_image_url = ""

# URL of page to be scraped
url = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
browser.visit(url)

# Click 'FULL IMAGE' link on landing page
browser.click_link_by_partial_text('FULL IMAGE')

# Click 'more info' link on Featured Image page
browser.click_link_by_partial_text('more info')

# Capture page as html
html = browser.html

# Store html as soup object
soup = bs(html, 'html.parser')

# Capture image download links
image_download_links = soup.find_all('div', class_='download_tiff')

# Store the second download link on the page; Full-Res JPG
featured_image_url = image_download_links[1].a['href']

browser.quit()

featured_image_url



'//photojournal.jpl.nasa.gov/jpeg/PIA11777.jpg'
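
The value above starts with `//`. It is a protocol-relative (network-path) reference, so it still lacks the scheme that a complete URL string needs. The standard library's `urllib.parse.urljoin` resolves such an href against the page it came from. The sketch below reuses the page URL and href shown above. The root-relative href in the second example is hypothetical, built from the example URL in the cell's comment.

```python
from urllib.parse import urljoin

# Page the link was scraped from, and the href captured above
page_url = "https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars"
href = "//photojournal.jpl.nasa.gov/jpeg/PIA11777.jpg"

# A "//host/path" href inherits only the scheme (https) from page_url
featured_image_url = urljoin(page_url, href)
print(featured_image_url)  # https://photojournal.jpl.nasa.gov/jpeg/PIA11777.jpg

# A root-relative href (hypothetical here) inherits both scheme and host
print(urljoin(page_url, "/spaceimages/images/largesize/PIA13664_hires.jpg"))
# https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA13664_hires.jpg
```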

### Scrape the latest Mars weather tweet from Twitter
Source:  [Twitter Latest Mars Weather Tweet](https://twitter.com/marswxreport?lang=en)

In [5]:
# Create a variable to save the latest Mars weather report tweet from Twitter
mars_weather = ""

# URL of page to be scraped
url = 'https://twitter.com/marswxreport?lang=en'

# Retrieve page with the requests module
response = requests.get(url)

# Create BeautifulSoup object; parse with 'html.parser'
soup = bs(response.text, 'html.parser')

# Examine the results, then determine the element that contains the sought info
#print(soup.prettify())

# Capture weather tweets
weather_tweets = soup.find_all('div', class_="js-tweet-text-container")

# Isolate a-tag text for removal from the weather tweet
a_text = weather_tweets[0].a.text.strip()
#a_text

# Store most recent Mars weather tweet; replace() removes the link text as a whole substring
mars_weather = weather_tweets[0].p.text.replace(a_text, "").strip()
mars_weather

'InSight sol 453 (2020-03-05) low -95.1ºC (-139.1ºF) high -10.8ºC (12.6ºF)\nwinds from the SSW at 6.0 m/s (13.3 mph) gusting to 21.4 m/s (47.9 mph)\npressure at 6.30 hPa'
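
`str.strip(chars)` treats its argument as a set of characters. It removes any run of those characters from both ends, not the substring itself. The demonstration below uses made-up tweet and link text. It also includes `remove_trailing_link`, a hypothetical helper that removes the link only when it is an exact suffix.

```python
def remove_trailing_link(text, link_text):
    """Drop link_text from the end of text, if present, and trim whitespace."""
    text = text.strip()
    if link_text and text.endswith(link_text):
        text = text[: -len(link_text)]
    return text.strip()


tweet = "pressure at 6.30 hPa pic.twitter.com/aP9"  # hypothetical tweet text
link = "pic.twitter.com/aP9"                         # hypothetical link text

# strip() also eats the leading "pre" of "pressure": p, r, and e all occur in the link text
print(repr(tweet.strip(link)))                  # 'ssure at 6.30 hPa '
print(repr(remove_trailing_link(tweet, link)))  # 'pressure at 6.30 hPa'
```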

### Scrape the Mars Facts webpage for facts about the planet, including Diameter, Mass, etc.
Source:  [Space-Facts.com/Mars](https://space-facts.com/mars/)

In [6]:
# Create a variable to hold the HTML for the table of facts about the planet, including Diameter, Mass, etc.
mars_facts_html = ""

# URL of page to be scraped
url = 'https://space-facts.com/mars/'

# Retrieve page with the requests module
response = requests.get(url)

# Create BeautifulSoup object; parse with 'html.parser'
soup = bs(response.text, 'html.parser')

# Examine the results, then determine the element that contains the sought info
#print(soup.prettify())

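# Capture Mars facts HTML tables
mars_facts_tables = soup.find_all('table', class_="tablepress tablepress-id-p-mars")
#mars_facts_tables

# Keep the second matching table; this is a BeautifulSoup Tag, so call str() on it when an HTML string is needed
mars_facts_html = mars_facts_tables[1]
mars_facts_html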

<table class="tablepress tablepress-id-p-mars" id="tablepress-p-mars-no-2"><tbody><tr class="row-1 odd"><td class="column-1"><strong>Equatorial Diameter:</strong></td><td class="column-2">6,792 km<br/></td></tr><tr class="row-2 even"><td class="column-1"><strong>Polar Diameter:</strong></td><td class="column-2">6,752 km<br/></td></tr><tr class="row-3 odd"><td class="column-1"><strong>Mass:</strong></td><td class="column-2">6.39 × 10^23 kg<br> (0.11 Earths)</br></td></tr><tr class="row-4 even"><td class="column-1"><strong>Moons:</strong></td><td class="column-2">2 (<a href="https://space-facts.com/moons/phobos/">Phobos</a> &amp; <a href="https://space-facts.com/moons/deimos/">Deimos</a>)</td></tr><tr class="row-5 odd"><td class="column-1"><strong>Orbit Distance:</strong></td><td class="column-2">227,943,824 km<br> (1.38 AU)</br></td></tr><tr class="row-6 even"><td class="column-1"><strong>Orbit Period:</strong></td><td class="column-2">687 days (1.9 years)<br/></td></tr><tr class="row-7
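
The summary lists Pandas. With Pandas, the usual way to turn a scraped table into an HTML table string is `pandas.read_html` followed by `DataFrame.to_html`. The sketch below does the same job with only the standard library, so it runs without pandas or lxml. It reads the first two cells of each row and emits a fresh `<table>`. `TableRows`, `facts_to_html`, and the `Description`/`Value` headers are hypothetical choices.

```python
from html import escape
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace, e.g. "kg" + " (0.11 Earths)" split by a <br>
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def facts_to_html(table_html):
    """Rebuild a two-column facts table as a plain HTML table string."""
    parser = TableRows()
    parser.feed(table_html)
    parser.close()
    lines = ["<table>", "<tr><th>Description</th><th>Value</th></tr>"]
    for row in parser.rows:
        if len(row) >= 2:
            label, value = row[0].rstrip(":"), row[1]
            lines.append(f"<tr><td>{escape(label)}</td><td>{escape(value)}</td></tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

Usage: `facts_to_html(str(mars_facts_html))` returns a string that a template can insert directly.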

### Visit the USGS Astrogeology site to obtain high-resolution images for each of Mars's hemispheres.
Source:  [US Geological Survey (USGS)](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars)

In [7]:
# Create a list of Python dictionaries to store the data using the keys 'img_url' and 'title'
hemisphere_image_urls = [
    {"title": "Valles Marineris Hemisphere", "img_url": "..."},
    {"title": "Cerberus Hemisphere", "img_url": "..."},
    {"title": "Schiaparelli Hemisphere", "img_url": "..."},
    {"title": "Syrtis Major Hemisphere", "img_url": "..."},
]

# URL of page to be scraped
url = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'

# Start one automated browsing session and reuse it for every hemisphere
browser = Browser('chrome', **executable_path, headless=False)

# Find the image URL for each hemisphere in the list
for hemisphere in hemisphere_image_urls:

    # Return to the search results page
    browser.visit(url)

    current_image = hemisphere["title"]

    # Click the search-result link whose text matches the hemisphere title
    browser.click_link_by_partial_text(current_image)

    # Capture page as html and store it as a soup object
    soup = bs(browser.html, 'html.parser')

    # Capture image download links
    image_downloads = soup.find_all('div', class_='downloads')

    # Store the first download link in the container
    hemisphere["img_url"] = image_downloads[0].a['href']

    # Print image capture confirmation
    print(f"Captured image URL for {current_image}")
    print('-----------')

# Close the browser
browser.quit()


Captured image URL for Valles Marineris Hemisphere
-----------
Captured image URL for Cerberus Hemisphere
-----------
Captured image URL for Schiaparelli Hemisphere
-----------
Captured image URL for Syrtis Major Hemisphere
-----------
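
The work on each hemisphere page comes down to one lookup: the first `<a>` inside the first `div.downloads`. The sketch below performs that lookup with the standard library. When the container is missing, it returns `None` instead of raising. (Indexing `[0]` into an empty `find_all` result raises `IndexError`.) `FirstDownloadLink`, `first_download_href`, and the sample markup in the test are hypothetical.

```python
from html.parser import HTMLParser


class FirstDownloadLink(HTMLParser):
    """Record the href of the first <a> inside the first <div class="downloads">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # > 0 while inside the downloads container
        self.href = None

    def handle_starttag(self, tag, attrs):
        if self.href is not None:
            return  # already found; ignore the rest of the page
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1
            elif "downloads" in (attrs.get("class") or "").split():
                self.div_depth = 1
        elif tag == "a" and self.div_depth and attrs.get("href"):
            self.href = attrs["href"]

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1


def first_download_href(page_html):
    """Return the first download href on the page, or None if there is none."""
    parser = FirstDownloadLink()
    parser.feed(page_html)
    parser.close()
    return parser.href
```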


In [8]:
# Confirm image capture
for hemisphere in hemisphere_image_urls:
    print(hemisphere["title"])
    print(hemisphere["img_url"])
    print('-----------')

Valles Marineris Hemisphere
http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg
-----------
Cerberus Hemisphere
http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg
-----------
Schiaparelli Hemisphere
http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg
-----------
Syrtis Major Hemisphere
http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif/full.jpg
-----------
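
The printed list is checked by eye. The same check can also run as an assertion at the end of the notebook or script. The sketch below assumes every entry should end up with a full-size `.jpg` link, as in the output above. `missing_image_urls` is a hypothetical helper.

```python
PLACEHOLDER = "..."  # the initial img_url value in hemisphere_image_urls


def missing_image_urls(hemispheres):
    """Return titles whose img_url is absent, still the placeholder, or not a .jpg link."""
    missing = []
    for hemisphere in hemispheres:
        url = hemisphere.get("img_url") or PLACEHOLDER
        if url == PLACEHOLDER or not url.lower().endswith(".jpg"):
            missing.append(hemisphere["title"])
    return missing
```

Usage: `assert not missing_image_urls(hemisphere_image_urls)` fails loudly if any hemisphere was skipped.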
