# Mission to Mars

#### Unit 10 Web Scraping Assignment
Author: Jose Tomines <br>
Date: 2019-03-05


In this assignment, you will build a web application that scrapes various websites for data related to the Mission to Mars and displays the information in a single HTML page. The following outlines what you need to do.

## Step 1 - Scraping

Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.

Create a Jupyter Notebook file called mission_to_mars.ipynb and use this to complete all of your scraping and analysis tasks. The following outlines what you need to scrape.


### NASA Mars News
Scrape the NASA Mars News Site and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

#### Example:
```python
news_title = "NASA's Next Mars Mission to Investigate Interior of Red Planet"

news_p = "Preparation of NASA's next spacecraft to Mars, InSight, has ramped up this summer, on course for launch next May from Vandenberg Air Force Base in central California -- the first interplanetary launch in history from America's West Coast."
```
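A minimal offline sketch of this step, assuming each story on the news page is marked up with the `content_title` and `article_teaser_body` classes that the notebook below relies on. The HTML snippet is a hypothetical stand-in for the live page, and the built-in `"html.parser"` is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the live NASA Mars News HTML (the real markup may differ).
sample_html = """
<ul class="item_list">
  <li class="slide">
    <div class="content_title"><a href="/news/1">NASA's Next Mars Mission to Investigate Interior of Red Planet</a></div>
    <div class="article_teaser_body">Preparation of NASA's next spacecraft to Mars, InSight, has ramped up this summer.</div>
  </li>
  <li class="slide">
    <div class="content_title"><a href="/news/0">An older story</a></div>
    <div class="article_teaser_body">Older teaser text.</div>
  </li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first matching tag, i.e. the story at the top of the list.
news_title = soup.find("div", class_="content_title").get_text(strip=True)
news_p = soup.find("div", class_="article_teaser_body").get_text(strip=True)

print(news_title)
print(news_p)
```

Because `find()` returns only the first match, this picks the story listed at the top of the page, which is assumed to be the latest one.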

### JPL Mars Space Images - Featured Image

- Visit the JPL Featured Space Image page [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).
- Use Splinter to navigate the site and find the image URL for the current Featured Mars Image, and assign the URL string to a variable called featured_image_url.
- Make sure the URL points to the full-size .jpg image.
- Save the complete URL string for this image (including the domain), not just a relative path.

#### Example:
```python
featured_image_url = 'https://www.jpl.nasa.gov/spaceimages/images/mediumsize/PIA17357_ip.jpg'
```
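One way to make sure the saved string is a complete URL is to resolve the image's relative `src` against the site root with the standard library's `urllib.parse.urljoin`. This is an alternative to the f-string concatenation used in the notebook below; the relative path is the one that appears in the notebook's output.

```python
from urllib.parse import urljoin

# Site root plus a relative path as it might appear in an <img src="..."> attribute.
base_url = "https://www.jpl.nasa.gov"
relative_src = "/spaceimages/images/largesize/PIA16092_hires.jpg"

featured_image_url = urljoin(base_url, relative_src)
print(featured_image_url)

# urljoin leaves an already-absolute URL unchanged, so it is safe either way.
already_absolute = "https://www.jpl.nasa.gov/spaceimages/images/mediumsize/PIA17357_ip.jpg"
print(urljoin(base_url, already_absolute) == already_absolute)
```

Unlike plain string concatenation, `urljoin` does not produce a doubled domain if the site ever starts serving absolute `src` values.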


### Mars Weather
- Visit the Mars Weather Twitter account [here](https://twitter.com/marswxreport?lang=en) and scrape the latest Mars weather tweet from the page. Save the tweet text for the weather report as a variable called mars_weather.


#### Example:
```python
mars_weather = 'Sol 1801 (Aug 30, 2017), Sunny, high -21C/-5F, low -80C/-112F, pressure at 8.82 hPa, daylight 06:09-17:55'
```


### Mars Facts
- Visit the Mars Facts webpage [here](https://space-facts.com/mars/) and use Pandas to scrape the table containing facts about the planet, including Diameter, Mass, etc.
- Use Pandas to convert the data to an HTML table string.
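A short sketch of the HTML conversion, using a few rows copied from the scraped facts table shown later in the notebook. `DataFrame.to_html` returns the table as a string; the Bootstrap class names passed to `classes` are an assumption about how the final page is styled and can be dropped.

```python
import pandas as pd

# A few rows copied from the scraped Mars facts table.
mars_facts = pd.DataFrame(
    [
        ["Equatorial Diameter:", "6,792 km"],
        ["Polar Diameter:", "6,752 km"],
        ["Mass:", "6.42 x 10^23 kg (10.7% Earth)"],
    ],
    columns=["Mars Planet Profile", "Fact Value"],
)

# to_html() returns a string; index=False drops the 0, 1, 2 row labels.
# The CSS classes are an assumed Bootstrap styling hook, not a requirement.
facts_html = mars_facts.to_html(index=False, classes="table table-striped")
print(facts_html)
```

The resulting string can be dropped straight into the single HTML page the assignment asks for.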


### Mars Hemispheres
- Visit the USGS Astrogeology site [here](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high-resolution images for each of Mars's hemispheres.
- You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.
- Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys img_url and title.
- Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.


#### Example:
```python
hemisphere_image_urls = [
    {"title": "Valles Marineris Hemisphere", "img_url": "..."},
    {"title": "Cerberus Hemisphere", "img_url": "..."},
    {"title": "Schiaparelli Hemisphere", "img_url": "..."},
    {"title": "Syrtis Major Hemisphere", "img_url": "..."},
]
```
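The steps above amount to building one dictionary per hemisphere and appending it to a list. A minimal sketch, using two of the title/URL pairs from the notebook's output as sample input (the click-through scraping itself is left out):

```python
# (title, full-resolution image URL) pairs, copied from the notebook output.
scraped_pairs = [
    ("Cerberus Hemisphere Enhanced",
     "http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg"),
    ("Schiaparelli Hemisphere Enhanced",
     "http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg"),
]

hemisphere_image_urls = []
for title, img_url in scraped_pairs:
    # One dictionary per hemisphere, using the two required keys.
    hemisphere = {"title": title, "img_url": img_url}
    hemisphere_image_urls.append(hemisphere)

print(hemisphere_image_urls)
```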

```python
# dependencies
from splinter import Browser
from bs4 import BeautifulSoup as bs
import pandas as pd
import requests
from pprint import pprint
```

## NASA Mars News

```python
# path to chromedriver (chromedriver.exe is the Windows executable; adjust for your system)
executable_path = {'executable_path': 'chromedriver.exe'}
```

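```python
# define function to remove an ending substring (suffix) from a string
def remove_substring(string, substring):
    # endswith() checks that the substring is actually at the end; a plain
    # `in` test would also trim strings that merely contain it elsewhere,
    # and an empty substring would make string[:-0] return ''
    if substring and string.endswith(substring):
        return string[:-len(substring)]
    return string
```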
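A quick check of the suffix-trimming idea, written as a standalone function so it can run on its own. The name `remove_suffix` is hypothetical; the titles come from the hemisphere output at the end of the notebook, which end in "Enhanced" while the assignment's example titles do not.

```python
def remove_suffix(text: str, suffix: str) -> str:
    """Return text without a trailing suffix; leave it untouched otherwise."""
    # endswith() checks the position, not just membership, and the empty-suffix
    # guard avoids text[:-0], which would return an empty string.
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text


print(remove_suffix("Cerberus Hemisphere Enhanced", " Enhanced"))

# A membership test alone would chop the end of this string even though
# it does not end with the suffix; endswith() leaves it unchanged.
print(remove_suffix("Enhanced Cerberus Hemisphere", "Enhanced"))
print(remove_suffix("Syrtis Major Hemisphere", ""))
```

On Python 3.9 and later, the built-in `str.removesuffix` does the same job, e.g. `'Cerberus Hemisphere Enhanced'.removesuffix(' Enhanced')`.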

```python
# url with latest NASA stories
browser = Browser('chrome', **executable_path, headless=False)
url1 = 'https://mars.nasa.gov/news/'
browser.visit(url1)

# the article list may load after the initial page, so wait briefly for it
browser.is_element_present_by_css('div.list_date', wait_time=5)

# set up parser
html = browser.html
soup = bs(html, 'lxml')

# get the latest news article (first match on the page), trimming stray whitespace
dateLatest = soup.find('div', class_='list_date').get_text(strip=True)
titleLatest = soup.find('div', class_='content_title').get_text(strip=True)
summaryLatest = soup.find('div', class_='article_teaser_body').get_text(strip=True)

# close browser
browser.quit()

# create news dictionary
newsDict = {"Date": dateLatest,
            "Title": titleLatest,
            "Summary": summaryLatest}

# view news information
pprint(newsDict)
```

```
{'Date': 'April  5, 2019',
 'Summary': 'Nominees include four JPL projects: the solar system and climate '
            'websites, InSight social media, and a 360-degree Earth video. '
            'Public voting closes April 18, 2019.',
 'Title': 'NASA Garners 7 Webby Award Nominations'}
```
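The assignment asks for the headline and teaser in variables named news_title and news_p, while the notebook keeps them in newsDict. A minimal sketch of that mapping, which also collapses repeated whitespace such as the double space in 'April  5, 2019'; the `clean_text` helper is hypothetical, and the values are copied from the output above.

```python
def clean_text(text: str) -> str:
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return " ".join(text.split())


# Values copied from the newsDict output above.
newsDict = {
    "Date": "April  5, 2019",
    "Title": "NASA Garners 7 Webby Award Nominations",
    "Summary": ("Nominees include four JPL projects: the solar system and climate "
                "websites, InSight social media, and a 360-degree Earth video. "
                "Public voting closes April 18, 2019."),
}

news_title = clean_text(newsDict["Title"])
news_p = clean_text(newsDict["Summary"])
news_date = clean_text(newsDict["Date"])

print(news_date)
print(news_title)
print(news_p)
```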


## JPL Mars Space Images

```python
# visit 2nd URL
browser = Browser('chrome', **executable_path, headless=False)
url2 = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
browser.visit(url2)

# click the FULL IMAGE button
fullImageButton = browser.find_by_id('full_image')
fullImageButton.click()

# wait for the "more info" link to appear, then click it
browser.is_element_present_by_text('more info', wait_time=1)
moreInfoButton = browser.find_link_by_partial_text('more info')
moreInfoButton.click()

# set up parser
html = browser.html
soup = bs(html, 'lxml')

# find featured image title; this div may be missing on the "more info" page
# (the output below shows None), so guard before reading its text
imgTitleTag = soup.find('div', class_='fancybox-title')
imgTitle = imgTitleTag.get_text(strip=True) if imgTitleTag else None

# find featured image url (a relative path) and make it absolute
imgRelativeUrl = soup.select_one('figure.lede a img').get("src")
imgUrl = f'https://www.jpl.nasa.gov{imgRelativeUrl}'

# close browser
browser.quit()

# create image dictionary
imgDict = {"Title": imgTitle,
           "Source": imgUrl}

# view image information
pprint(imgDict)
```

```
{'Source': 'https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA16092_hires.jpg',
 'Title': None}
```
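The `'Title': None` above is what `soup.find` returns when no element matches; it is probably because the fancybox-title div belongs to the image lightbox rather than the "more info" page (an inference from the output, not verified against the site). The sketch below runs the same lookups on a hypothetical detail-page fragment and guards the title lookup so a missing tag yields None instead of an AttributeError when its text is read.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment of a JPL image detail page; the real markup may differ.
detail_html = """
<figure class="lede">
  <a href="/spaceimages/images/largesize/PIA16092_hires.jpg">
    <img src="/spaceimages/images/largesize/PIA16092_hires.jpg" alt="Featured image">
  </a>
</figure>
"""

soup = BeautifulSoup(detail_html, "html.parser")

# select_one() accepts the same CSS selector the notebook uses; it returns None on no match.
img_tag = soup.select_one("figure.lede a img")
img_src = img_tag["src"] if img_tag is not None else None

# The lightbox title div is absent from this fragment, so guard before get_text().
title_tag = soup.find("div", class_="fancybox-title")
img_title = title_tag.get_text(strip=True) if title_tag is not None else None

print(img_src)
print(img_title)
```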


## Mars Weather

```python
# visiting Mars Weather Twitter page
browser = Browser('chrome', **executable_path, headless=False)
url3 = 'https://twitter.com/marswxreport?lang=en'
browser.visit(url3)

# set up parser
html = browser.html
soup = bs(html, 'lxml')

# find the first tweet whose data-name is 'Mars Weather'
marsWeatherTweet = soup.find('div', attrs={"class": "tweet", "data-name": "Mars Weather"})

# get the text of the p tag with class tweet-text inside that tweet
marsWeather = marsWeatherTweet.find('p', class_='tweet-text').get_text()

# create Mars weather dictionary
marsWeatherDict = {"Mars Weather": marsWeather}

# close browser
browser.quit()

# view Mars weather information
pprint(marsWeatherDict)
```

```
{'Mars Weather': 'InSight sol 130 (2019-04-08) low -98.0ºC (-144.4ºF) high '
                 '-15.5ºC (4.1ºF)\n'
                 'winds from the SW at 4.1 m/s (9.3 mph) gusting to 11.7 m/s '
                 '(26.2 mph)\n'
                 'pressure at 7.30 hPapic.twitter.com/awJfx8w2YE'}
```
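The scraped tweet text ends with an attached-image link (hPapic.twitter.com/...) and contains line breaks. A minimal cleanup sketch with the standard `re` module, using the text from the output above; the pattern assumes the link is always at the very end of the tweet.

```python
import re

# Tweet text copied from the output above.
raw_tweet = ("InSight sol 130 (2019-04-08) low -98.0ºC (-144.4ºF) high -15.5ºC (4.1ºF)\n"
             "winds from the SW at 4.1 m/s (9.3 mph) gusting to 11.7 m/s (26.2 mph)\n"
             "pressure at 7.30 hPapic.twitter.com/awJfx8w2YE")

# Drop a trailing pic.twitter.com link (assumed to be the last token).
mars_weather = re.sub(r"\s*pic\.twitter\.com/\S+$", "", raw_tweet)

# Collapse the newlines into single spaces for display on one line.
mars_weather = " ".join(mars_weather.split())

print(mars_weather)
```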


## Mars Facts

```python
# read Mars facts from url; read_html returns a list of DataFrames, one per <table>
url4 = 'https://space-facts.com/mars/'
facts = pd.read_html(url4)
marsFacts = facts[0]
marsFacts.columns = ['Mars Planet Profile', 'Fact Value']

# create Mars facts dictionary
marsFactsDict = marsFacts.set_index('Mars Planet Profile').to_dict()['Fact Value']

# view Mars facts
pprint(marsFactsDict)
```

```
{'Equatorial Diameter:': '6,792 km',
 'First Record:': '2nd millennium BC',
 'Mass:': '6.42 x 10^23 kg (10.7% Earth)',
 'Moons:': '2 (Phobos & Deimos)',
 'Orbit Distance:': '227,943,824 km (1.52 AU)',
 'Orbit Period:': '687 days (1.9 years)',
 'Polar Diameter:': '6,752 km',
 'Recorded By:': 'Egyptian astronomers',
 'Surface Temperature:': '-153 to 20 °C'}
```


## Mars Hemispheres

```python
# visit USGS Astrogeology Science Center
browser = Browser('chrome', **executable_path, headless=False)
url5 = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'
browser.visit(url5)

# create hemisphere list
hemisphereUrls = []

# get a list of all hemispheres
links = browser.find_by_css("a.product-item h3")

# loop through the links: click each one, find the Sample anchor, and save its href
for i in range(len(links)):
    hemisphere = {}

    # re-find the links on every pass; references taken before browser.back() may go stale
    browser.find_by_css("a.product-item h3")[i].click()

    # find the Sample image anchor tag to get the href
    sampleImgATag = browser.find_link_by_text('Sample').first
    hemisphere['img_url'] = sampleImgATag['href']

    # get hemisphere title
    hemisphere['title'] = browser.find_by_css("h2.title").text

    # append hemisphere to list
    hemisphereUrls.append(hemisphere)

    # navigate back to the results page
    browser.back()

# close browser
browser.quit()

hemisphereUrls
```

```
[{'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg',
  'title': 'Cerberus Hemisphere Enhanced'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg',
  'title': 'Schiaparelli Hemisphere Enhanced'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif/full.jpg',
  'title': 'Syrtis Major Hemisphere Enhanced'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg',
  'title': 'Valles Marineris Hemisphere Enhanced'}]
```