# Step 1 - Scraping

## NASA Mars News

- Scrape the NASA Mars News Site and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

In [1]:
# Dependencies
from bs4 import BeautifulSoup
from splinter import Browser

In [2]:
# URL of page to be scraped
url = 'https://mars.nasa.gov/news/'
executable_path = {"executable_path": "./chromedriver"}
browser = Browser("chrome", **executable_path, headless=False)

In [3]:
# Visit the NASA news URL
browser.visit(url)

In [4]:
# Scrape page using beautiful soup
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

In [5]:
# See details in html 
#print (soup.prettify())

In [6]:
# Find the latest news, including title, paragraph and date
latest_news = soup.find("div", class_="list_text")
#print (latest_news)
latest_news_date = latest_news.find("div", class_="list_date").text
latest_news_title = latest_news.find("div", class_="content_title").text
latest_news_paragraph = latest_news.find("div", class_="article_teaser_body").text

print ("=======================")
print ("[The latest news]")
print (f"Title: {latest_news_title}")
print (f"Date: {latest_news_date}")
print (f"Article: {latest_news_paragraph}")
print ("=======================")

[The latest news]
Title: MarCO Makes Space for Small Explorers
Date: September 13, 2018
Article: A pair of NASA CubeSats flying to Mars are opening a new frontier for small spacecraft.
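The lookups above raise `AttributeError` if any `find` call returns `None`, which can happen when the page has not finished rendering or the markup changes. Below is a minimal sketch of a more defensive version. `SAMPLE_HTML`, `text_or_none`, and `parse_latest_news` are hypothetical; the sample markup only mirrors the class names used in the cell above and is not a copy of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mirrors the class names used in the cell above.
SAMPLE_HTML = """
<div class="list_text">
  <div class="list_date">September 13, 2018</div>
  <div class="content_title">MarCO Makes Space for Small Explorers</div>
  <div class="article_teaser_body">A pair of NASA CubeSats flying to Mars are opening a new frontier for small spacecraft.</div>
</div>
"""


def text_or_none(parent, class_name):
    """Return the stripped text of the first matching div, or None if absent."""
    if parent is None:
        return None
    node = parent.find("div", class_=class_name)
    return node.get_text(strip=True) if node is not None else None


def parse_latest_news(html):
    """Extract date, title, and teaser of the first news item from an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    item = soup.find("div", class_="list_text")
    return {
        "date": text_or_none(item, "list_date"),
        "title": text_or_none(item, "content_title"),
        "paragraph": text_or_none(item, "article_teaser_body"),
    }


news = parse_latest_news(SAMPLE_HTML)
print(news["title"])
```

With a live browser session, `parse_latest_news(browser.html)` would take the place of the sample string. A value of `None` then signals that the expected element was not found.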


## JPL Mars Space Images - Featured Image

- Visit the url for JPL Featured Space Image
- Use splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called `featured_image_url`
- Make sure to find the image url to the full size `.jpg` image
- Make sure to save a complete url string for this image

In [7]:
# New URL
# executable_path = {"executable_path": "./chromedriver"}
# browser = Browser("chrome", **executable_path, headless=False)
url2 = "https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars"
browser.visit(url2)
html = browser.html
soup = BeautifulSoup(html, 'html.parser')
#print (soup.prettify())

In [8]:
featured_image = soup.find("ul", class_="articles")
featured_image_hires = featured_image.find("a", class_="fancybox")["data-fancybox-href"]
featured_image_url = "https://www.jpl.nasa.gov"+featured_image_hires
print (featured_image_url)

https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA22710_hires.jpg
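Concatenating the host and the relative path works here, but it would produce a malformed URL if the attribute ever held an absolute URL. One alternative, sketched below, is `urljoin` from the standard library. `BASE_URL` and `to_absolute` are hypothetical names, and the relative path is the one implied by the output above.

```python
from urllib.parse import urljoin

# Site root used to resolve relative links (same host as in the cell above).
BASE_URL = "https://www.jpl.nasa.gov"


def to_absolute(href, base=BASE_URL):
    """Resolve href against base; absolute URLs are returned unchanged."""
    return urljoin(base, href)


relative_href = "/spaceimages/images/largesize/PIA22710_hires.jpg"
featured_image_url = to_absolute(relative_href)
print(featured_image_url)
```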


## Mars Weather

- Visit the Mars Weather Twitter account (https://twitter.com/marswxreport?lang=en) and scrape the latest Mars weather tweet from the page.
- Save the tweet text for the weather report as a variable called `mars_weather`

In [9]:
# New URL
# executable_path = {"executable_path": "./chromedriver"}
# browser = Browser("chrome", **executable_path, headless=False)
url3 = "https://twitter.com/marswxreport?lang=en"
browser.visit(url3)
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

In [10]:
latest_tweet = soup.find("div", class_="js-tweet-text-container")
mars_weather = latest_tweet.find("p", class_="TweetTextSize TweetTextSize--normal js-tweet-text tweet-text").text
print (mars_weather)

Sol 2169 (2018-09-12), high -10C/14F, low -70C/-93F, pressure at 8.82 hPa, daylight 05:41-17:58
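If the individual readings are needed later, the tweet text can be split into fields. The sketch below is optional and rests on an assumption: the pattern is inferred from the single sample tweet above, so other tweets may not match it. `WEATHER_PATTERN` and `parse_weather` are hypothetical names.

```python
import re

# Assumed report layout, inferred from the one sample tweet shown above.
WEATHER_PATTERN = re.compile(
    r"Sol (?P<sol>\d+) \((?P<date>\d{4}-\d{2}-\d{2})\), "
    r"high (?P<high_c>-?\d+)C/(?P<high_f>-?\d+)F, "
    r"low (?P<low_c>-?\d+)C/(?P<low_f>-?\d+)F, "
    r"pressure at (?P<pressure_hpa>\d+(?:\.\d+)?) hPa"
)


def parse_weather(text):
    """Return the readings as a dict, or None if the text does not match."""
    match = WEATHER_PATTERN.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "sol": int(fields["sol"]),
        "date": fields["date"],
        "high_c": int(fields["high_c"]),
        "low_c": int(fields["low_c"]),
        "pressure_hpa": float(fields["pressure_hpa"]),
    }


sample = ("Sol 2169 (2018-09-12), high -10C/14F, low -70C/-93F, "
          "pressure at 8.82 hPa, daylight 05:41-17:58")
report = parse_weather(sample)
print(report)
```

Returning `None` on a mismatch lets the caller fall back to storing the raw `mars_weather` string.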


## Mars Facts

- Visit the Mars Facts webpage (http://space-facts.com/mars/) and use Pandas to scrape the table containing facts about the planet, including diameter and mass.
- Use Pandas to convert the data to an HTML table string.

In [11]:
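import pandas as pd
url4 = "https://space-facts.com/mars"
# read_html returns a list of DataFrames; the facts table is the first one.
# Note: despite its name, table_html is a DataFrame, not an HTML string.
table_html = pd.read_html(url4)[0]
table_html.columns = ['Object','Mars']
table_html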

|   | Object | Mars |
|---|---|---|
| 0 | Equatorial Diameter: | 6,792 km |
| 1 | Polar Diameter: | 6,752 km |
| 2 | Mass: | 6.42 x 10^23 kg (10.7% Earth) |
| 3 | Moons: | 2 (Phobos & Deimos) |
| 4 | Orbit Distance: | 227,943,824 km (1.52 AU) |
| 5 | Orbit Period: | 687 days (1.9 years) |
| 6 | Surface Temperature: | -153 to 20 °C |
| 7 | First Record: | 2nd millennium BC |
| 8 | Recorded By: | Egyptian astronomers |
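The cell above stops at a DataFrame, while the task also asks for an HTML table string. A minimal sketch of that last step uses `DataFrame.to_html`. To stay runnable offline, it builds a two-row stand-in DataFrame by hand instead of calling `read_html`. The names `facts` and `mars_facts_html` are hypothetical.

```python
import pandas as pd

# Hand-built stand-in for the scraped table (first two rows only).
facts = pd.DataFrame(
    {
        "Object": ["Equatorial Diameter:", "Polar Diameter:"],
        "Mars": ["6,792 km", "6,752 km"],
    }
)

# index=False drops the 0, 1, ... row labels from the rendered table.
mars_facts_html = facts.to_html(index=False)
print(mars_facts_html)
```

Applied to the scraped data, the equivalent call would be `table_html.to_html(index=False)`.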


## Mars Hemispheres

- Visit the USGS Astrogeology site (https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high-resolution images for each of Mars's hemispheres.

- You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.

- Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.

- Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

In [12]:
# executable_path = {"executable_path": "./chromedriver"}
# browser = Browser("chrome", **executable_path, headless=False)
url5 = "https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars"
browser.visit(url5)
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

In [13]:
MarsHemis = []
# Each search result is a div.item that links to a hemisphere detail page
Hemis = soup.find_all("div", class_="item")
for hemi in Hemis:
    # Follow the relative link to the detail page
    downloadpage = hemi.find("a")["href"]
    browser.visit("https://astrogeology.usgs.gov" + downloadpage)
    html = browser.html
    soup2 = BeautifulSoup(html, 'html.parser')
    # The first link in the downloads block is the full-resolution image
    image = soup2.find("div", class_="downloads")
    image_full = image.find("a")["href"]
    MarsHemis.append({'title': hemi.find("h3").text,
                      'img_url': image_full})
MarsHemis

[{'title': 'Cerberus Hemisphere Enhanced',
  'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg'},
 {'title': 'Schiaparelli Hemisphere Enhanced',
  'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg'},
 {'title': 'Syrtis Major Hemisphere Enhanced',
  'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif/full.jpg'},
 {'title': 'Valles Marineris Hemisphere Enhanced',
  'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg'}]