# Mission to Mars

Using BeautifulSoup, Pandas, and Requests/Splinter, this notebook scrapes the following sources:
- [NASA Mars News Sites](https://mars.nasa.gov/news/) to collect the latest News Title and Paragraph Text and store for later use;
- [JPL Mars Space Images](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars) to collect the full size .jpg of the featured image;
- [Mars Weather Twitter](https://twitter.com/marswxreport?lang=en) to scrape the latest Mars weather tweets from the page;
- [Mars Facts page](https://space-facts.com/mars/), using Pandas to scrape the table containing facts about the planet such as Diameter, Mass, etc.; and
- [USGS Astrogeology Site](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to collect high-resolution images for each of Mars's hemispheres.

### Import dependencies

In [None]:
from splinter import Browser
from bs4 import BeautifulSoup
import requests
import os
import pandas as pd

### Setup Splinter configuration variables

In [None]:
# use if os join doesn't work: '../resources/chromedriver.exe'
executable_path = {'executable_path': os.path.join("..","Resources","chromedriver.exe")}
browser = Browser('chrome', **executable_path, headless=False)
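
The hard-coded `chromedriver.exe` file name only matches Windows. A minimal sketch of choosing the driver file name per platform is shown here; the helper name `chromedriver_path` is hypothetical, and it assumes the driver lives in `../Resources` on every machine.

```python
import os
import sys


def chromedriver_path(base_dir=os.path.join("..", "Resources"), platform=None):
    """Return the expected chromedriver location (hypothetical helper)."""
    platform = sys.platform if platform is None else platform
    # Windows binaries carry an .exe suffix; macOS and Linux builds do not
    name = "chromedriver.exe" if platform.startswith("win") else "chromedriver"
    return os.path.join(base_dir, name)


# Same dictionary shape as the cell above, ready to unpack into Browser(...)
executable_path = {"executable_path": chromedriver_path()}
```

The resulting dictionary can be passed to `Browser('chrome', **executable_path, headless=False)` exactly as in the cell above.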

### Define variables for each URL to scrape

In [None]:
url_nasa = 'https://mars.nasa.gov/news/'
url_jpl = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
url_weather = 'https://twitter.com/marswxreport?lang=en'
url_facts = 'https://space-facts.com/mars/'
url_USGS = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'

### NASA Mars News

##### Pass the NASA URL variable to the browser to visit the site, then capture the underlying HTML and parse it with BeautifulSoup

In [None]:
browser.visit(url_nasa)

In [None]:
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

##### Establish variables to capture first news title and associated p tags for latest news info

In [None]:
div = soup.find('div', attrs={'class': 'content_title'})
news_title = div.find('a').text

In [None]:
# .text extracts the teaser string; soup.find alone returns the whole Tag
news_p = soup.find('div', attrs={'class': 'article_teaser_body'}).text
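
`soup.find` returns `None` when no element matches, so chaining `.text` or `.find` onto it raises an `AttributeError` if the page layout changes. The sketch below shows one way to guard against that; the helper `first_text` and the sample markup are hypothetical and only imitate the two class names used above.

```python
from bs4 import BeautifulSoup


def first_text(soup, tag, css_class):
    """Return the stripped text of the first matching element, or None."""
    element = soup.find(tag, attrs={"class": css_class})
    # soup.find returns None when nothing matches, so guard before reading text
    return element.get_text(strip=True) if element is not None else None


# Hypothetical markup that imitates the two class names used above
sample_html = """
<div class="content_title"><a href="/news/1">Sample headline</a></div>
<div class="article_teaser_body"> Sample teaser text. </div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_title = first_text(sample_soup, "div", "content_title")
sample_teaser = first_text(sample_soup, "div", "article_teaser_body")
sample_missing = first_text(sample_soup, "div", "no_such_class")
```

With the real page, the same helper could be called on `soup` in place of `sample_soup`.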

### JPL Mars Featured Image

##### Pass the JPL URL variable to the browser to visit the site, then capture the underlying HTML and parse it with BeautifulSoup

In [None]:
browser.visit(url_jpl)

In [None]:
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

##### Click the full-size image button, then establish a variable to capture the URL of the full-size featured image

In [None]:
# urljoin builds URLs correctly; os.path.join is meant for file system paths
from urllib.parse import urljoin

# Click the full-size image button through Splinter (no separate Selenium driver is defined)
browser.find_by_id('full_image').click()
# Re-parse the page, because the soup created above predates the click
soup = BeautifulSoup(browser.html, 'html.parser')
# First pass: takes the first <img> on the page; a narrower selector is likely needed
feature_image = soup.find('img')
featured_image_url = urljoin(url_jpl, feature_image['src'])
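
Joining the page URL to an image `src` is a URL operation rather than a file path operation: `os.path.join` would use backslashes on Windows and cannot handle root-relative paths or query strings. A small sketch with the standard library's `urljoin` follows; the `sample_src` value is hypothetical and stands in for whatever the `<img>` tag actually contains.

```python
from urllib.parse import urljoin

page_url = "https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars"
# Hypothetical src value, standing in for whatever the <img> tag actually holds
sample_src = "/spaceimages/images/example.jpg"

# A root-relative src replaces the path and query of the page URL
full_url = urljoin(page_url, sample_src)

# An src that is already absolute is returned unchanged
absolute_url = urljoin(page_url, "https://example.com/pic.jpg")
```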

### Mars Weather

##### Pass the weather URL variable to the browser to visit the site, then capture the underlying HTML and parse it with BeautifulSoup

In [None]:
browser.visit(url_weather)

In [None]:
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

##### Establish variables to capture the text of the latest Mars weather tweet

In [None]:
div = soup.find('div', attrs={'class': 'js-tweet-text-container'})
mars_weather = div.find('p').text

### Mars Facts

##### Use the Pandas read_html function with the facts URL variable to capture the tables on the page

In [None]:
tables = pd.read_html(url_facts)
len(tables)

In [None]:
df = tables[0]
df.columns = ['Metric', 'Value']
df.head()

##### Use the Pandas to_html function to convert the DataFrame to an HTML table string

In [None]:
df.to_html()
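
`to_html` is a method, so it must be called to produce the string. A minimal sketch on a stand-in frame is shown here; the rows in `sample_df` are hypothetical placeholders, not the scraped facts, and dropping the index with `index=False` is an optional choice rather than something the notebook requires.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the scraped facts table
sample_df = pd.DataFrame(
    {"Metric": ["Example metric A", "Example metric B"], "Value": ["1", "2"]}
)

# Calling the method returns the table markup as a str;
# index=False leaves the numeric row index out of the output
facts_html = sample_df.to_html(index=False)
```

Storing the result in a variable such as `facts_html` keeps the string available for later use.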

### Mars Hemispheres using USGS Astrogeology Site

##### Pass the USGS URL variable to the browser to visit the site, then capture the underlying HTML and parse it with BeautifulSoup

In [None]:
browser.visit(url_USGS)

In [None]:
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

##### Using a for loop, identify the div class containing the hemisphere data and the link to each image download, and store each result as a dictionary

In [None]:
from urllib.parse import urljoin

hemisphere_image_urls = []
# Each 'description' div holds one hemisphere's title and the link to its detail page
items = soup.find_all('div', attrs={'class': 'description'})

for item in items:
    title = item.find('h3').text
    # The href may be relative, so resolve it against the search page URL
    href = item.find('a', attrs={'class': 'itemLink product-item'}).get('href')
    link = urljoin(url_USGS, href)
    browser.visit(link)
    # Re-parse after the visit; the soup above still holds the search results page
    detail_soup = BeautifulSoup(browser.html, 'html.parser')
    download = detail_soup.find('div', attrs={'class': 'downloads'})
    img_url = download.find_all('a')[1].get('href')
    hemisphere_image_urls.append({'title': title, 'img_url': img_url})
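
The title and link extraction in the loop above can be checked without a browser by separating the parsing from the navigation. The sketch below does that against hypothetical markup; `parse_hemisphere_links` is an illustrative helper, and the sample HTML only imitates the structure the loop assumes (an `h3` and a link inside each `description` div).

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def parse_hemisphere_links(html, base_url):
    """Return (title, absolute_link) pairs from a results page (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.find_all("div", attrs={"class": "description"}):
        heading = item.find("h3")
        anchor = item.find("a", href=True)
        if heading is None or anchor is None:
            continue  # skip blocks that do not match the assumed layout
        results.append((heading.get_text(strip=True), urljoin(base_url, anchor["href"])))
    return results


# Hypothetical markup imitating the structure the loop above relies on
sample_html = """
<div class="description"><a class="itemLink product-item" href="/search/map/a"><h3>Hemisphere A</h3></a></div>
<div class="description"><a class="itemLink product-item" href="/search/map/b"><h3>Hemisphere B</h3></a></div>
<div class="description"><p>no link here</p></div>
"""
links = parse_hemisphere_links(sample_html, "https://astrogeology.usgs.gov/search/results")
```

Each returned link could then be passed to `browser.visit` to collect the image URL from the detail page.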