# Mission to Mars

Using BeautifulSoup, Pandas, and Requests/Splinter, this notebook scrapes the following sources:
- [NASA Mars News Site](https://mars.nasa.gov/news/) to collect the latest News Title and Paragraph Text and store them for later use;
- [JPL Mars Space Images](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars) to collect the full size .jpg of the featured image;
- [Mars Weather Twitter](https://twitter.com/marswxreport?lang=en) to scrape the latest Mars weather tweet from the page;
- [Mars Facts page](https://space-facts.com/mars/), using Pandas to scrape the table containing facts about the planet such as Diameter, Mass, etc.; and
- [USGS Astrogeology Site](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to collect high resolution images for each of Mars's hemispheres.

### Import dependencies

In [30]:
from splinter import Browser
from bs4 import BeautifulSoup
import requests
import os
from urllib.parse import urljoin
import pandas as pd

### Set up Splinter configuration variables

In [2]:
# use if os join doesn't work: '../resources/chromedriver.exe'
executable_path = {'executable_path': os.path.join("..","Resources","chromedriver.exe")}
browser = Browser('chrome', **executable_path, headless=False)

### Define variables for each URL to scrape

In [3]:
url_nasa = 'https://mars.nasa.gov/news/'
url_jpl = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
url_weather = 'https://twitter.com/marswxreport?lang=en'
url_facts = 'https://space-facts.com/mars/'
url_USGS = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'

### NASA Mars News

##### Pass the NASA URL variable to the browser to visit the site, then capture the underlying HTML and parse it with BeautifulSoup

In [4]:
browser.visit(url_nasa)

In [5]:
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

##### Capture the first news title and its teaser paragraph text for the latest news item

In [6]:
div = soup.find('div', attrs={'class': 'content_title'})
news_title = div.find('a').text

In [7]:
news_p = soup.find('div', attrs={'class': 'article_teaser_body'}).text

In [8]:
print(news_title)
print(news_p)

Media Get a Close-Up of NASA's Mars 2020 Rover
The clean room at NASA's Jet Propulsion Laboratory was open to the media to see NASA's next Mars explorer before it leaves for Florida in preparation for a summertime launch.


### JPL Mars Featured Image

##### Pass the JPL URL variable to the browser to visit the site, then capture the underlying HTML and parse it with BeautifulSoup

In [60]:
browser.visit(url_jpl)

In [61]:
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

##### Click the full size image button for the featured image, expand the image to reach its img src, capture the partial path, and use urljoin to turn the partial path into an absolute URL

In [63]:
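from selenium.common.exceptions import ElementNotVisibleException

browser.click_link_by_id('full_image')
img_partialpath = ''  # fallback so urljoin below still works if the expand step fails
try:
    expand = browser.find_by_css('a.fancybox-expand')
    expand.click()
    image_html = browser.html
    image_soup = BeautifulSoup(image_html, 'html.parser')
    img_partialpath = image_soup.find('img', class_='fancybox-image')['src']
except ElementNotVisibleException as e:
    # The expand link was found but could not be clicked
    print(e)

featured_image_url = urljoin(url_jpl, img_partialpath)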

In [64]:
print(featured_image_url)

https://www.jpl.nasa.gov/spaceimages/images/mediumsize/PIA07137_ip.jpg
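
The urljoin step can be checked in isolation. The sketch below assumes the scraped src is a root-relative path of the form shown in the comment (the notebook does not print the raw value, so that form is an assumption); a root-relative path replaces both the path and the query string of the base URL.

```python
from urllib.parse import urljoin

base = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'

# Assumed form of the scraped src attribute (root-relative path)
partial = '/spaceimages/images/mediumsize/PIA07137_ip.jpg'

# The scheme and host come from the base; path and query come from the partial
full = urljoin(base, partial)
print(full)

# An already absolute URL is returned unchanged
absolute = urljoin(base, 'https://example.com/image.jpg')
print(absolute)
```

Because urljoin passes absolute URLs through untouched, the same call should be safe whether the page serves relative or absolute image paths.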


### Mars Weather

##### Pass the weather URL variable to the browser to visit the site, then capture the underlying HTML and parse it with BeautifulSoup

In [34]:
browser.visit(url_weather)

In [35]:
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

##### Capture the text of the first tweet, which holds the latest Mars weather report

In [36]:
div = soup.find('div', attrs={'class': 'js-tweet-text-container'})
mars_weather = div.find('p').text

In [37]:
print(mars_weather)

InSight sol 400 (2020-01-11) low -99.1ºC (-146.5ºF) high -15.7ºC (3.8ºF)
winds from the SSE at 5.5 m/s (12.3 mph) gusting to 22.3 m/s (49.9 mph)
pressure at 6.40 hPapic.twitter.com/xYQHT9cdn5
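
The tweet text above ends with an image link fused onto the last word and contains line breaks. If a single clean line is wanted, one possible cleanup is sketched below; `clean_tweet` is a hypothetical helper, not part of the notebook, and the pattern assumes image links always take the form pic.twitter.com/ followed by non-space characters.

```python
import re

def clean_tweet(text):
    """Remove pic.twitter.com links and collapse all whitespace to single spaces."""
    without_link = re.sub(r'pic\.twitter\.com/\S+', '', text)
    return ' '.join(without_link.split())

# Sample text copied from the output above
raw = ("InSight sol 400 (2020-01-11) low -99.1ºC (-146.5ºF) high -15.7ºC (3.8ºF)\n"
       "winds from the SSE at 5.5 m/s (12.3 mph) gusting to 22.3 m/s (49.9 mph)\n"
       "pressure at 6.40 hPapic.twitter.com/xYQHT9cdn5")

cleaned = clean_tweet(raw)
print(cleaned)
```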


### Mars Facts

##### Use the Pandas read_html function with the facts URL variable to fetch the page and capture its tables

In [38]:
tables = pd.read_html(url_facts)
len(tables)

3

In [39]:
df = tables[0]
df.columns = ['Metric', 'Value']
df.head()

| | Metric | Value |
|---|---|---|
| 0 | Equatorial Diameter: | 6,792 km |
| 1 | Polar Diameter: | 6,752 km |
| 2 | Mass: | 6.39 × 10^23 kg (0.11 Earths) |
| 3 | Moons: | 2 (Phobos & Deimos) |
| 4 | Orbit Distance: | 227,943,824 km (1.38 AU) |


##### Use the Pandas to_html function to convert the DataFrame to an HTML table string

In [43]:
html_table = df.to_html()
html_table

'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>Metric</th>\n      <th>Value</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>Equatorial Diameter:</td>\n      <td>6,792 km</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>Polar Diameter:</td>\n      <td>6,752 km</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>Mass:</td>\n      <td>6.39 × 10^23 kg (0.11 Earths)</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>Moons:</td>\n      <td>2 (Phobos &amp; Deimos)</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>Orbit Distance:</td>\n      <td>227,943,824 km (1.38 AU)</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>Orbit Period:</td>\n      <td>687 days (1.9 years)</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>Surface Temperature:</td>\n      <td>-87 to -5 °C</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>First Record:</td>\n      <td>2nd millennium BC</t

In [44]:
html_table.replace('\n', '')

'<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th></th>      <th>Metric</th>      <th>Value</th>    </tr>  </thead>  <tbody>    <tr>      <th>0</th>      <td>Equatorial Diameter:</td>      <td>6,792 km</td>    </tr>    <tr>      <th>1</th>      <td>Polar Diameter:</td>      <td>6,752 km</td>    </tr>    <tr>      <th>2</th>      <td>Mass:</td>      <td>6.39 × 10^23 kg (0.11 Earths)</td>    </tr>    <tr>      <th>3</th>      <td>Moons:</td>      <td>2 (Phobos &amp; Deimos)</td>    </tr>    <tr>      <th>4</th>      <td>Orbit Distance:</td>      <td>227,943,824 km (1.38 AU)</td>    </tr>    <tr>      <th>5</th>      <td>Orbit Period:</td>      <td>687 days (1.9 years)</td>    </tr>    <tr>      <th>6</th>      <td>Surface Temperature:</td>      <td>-87 to -5 °C</td>    </tr>    <tr>      <th>7</th>      <td>First Record:</td>      <td>2nd millennium BC</td>    </tr>    <tr>      <th>8</th>      <td>Recorded By:</td>      <td>Egyptian astronomers</
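
Note that Python strings are immutable, so the replace call above only displays a new string and leaves html_table itself unchanged. A minimal sketch of keeping the result, using a short stand-in string rather than the real table (the name `html_table_clean` is an assumption for illustration):

```python
# Short stand-in for the string returned by df.to_html()
html_table = '<table>\n  <tr>\n    <td>Mass:</td>\n  </tr>\n</table>'

# replace() returns a new string; the original still contains newlines
html_table.replace('\n', '')
still_has_newlines = '\n' in html_table

# Assign the result to keep the newline-free version
html_table_clean = html_table.replace('\n', '')
print(html_table_clean)
```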

In [45]:
df.to_html('table.html')

### Mars Hemispheres using USGS Astrogeology Site

##### Pass the USGS URL variable to the browser to visit the site, then capture the underlying HTML and parse it with BeautifulSoup

In [149]:
browser.visit(url_USGS)

In [150]:
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

##### Using a for loop, read each hemisphere title from the h3 headings, follow the link to the full size image, and store the title and image URL as a dictionary

In [167]:
hemisphere_image_urls = []

# Read the heading text up front; the first four h3 tags are taken to be the hemisphere results
headings = [h3.text for h3 in soup.find_all('h3')[:4]]

for heading in headings:
    hemisphere_image = {}
    # Remove the literal words; str.strip() would treat its argument as a set of characters
    title = heading.replace('Enhanced', '').strip()
    hemisphere_image["title"] = title

    link_name = title.replace('Hemisphere', '').strip()

    try:
        browser.click_link_by_partial_text(link_name)
        browser.click_link_by_partial_text('Open')
        hemi_html = browser.html
        hemi_soup = BeautifulSoup(hemi_html, 'html.parser')
        hemisphere_img = hemi_soup.body.find('img', class_='wide-image')
        hemi_partialpath = hemisphere_img['src']
    except Exception as e:
        # Fall back to a placeholder path if the click-through or image lookup fails
        print(f"Could not retrieve image for {title}: {e}")
        hemi_partialpath = "/#"

    hemisphere_image["img_url"] = urljoin(url_USGS, hemi_partialpath)
    browser.visit(url_USGS)

    hemisphere_image_urls.append(hemisphere_image)

In [168]:
print(hemisphere_image_urls)

[{'title': 'Cerberus Hemisphere', 'img_url': 'https://astrogeology.usgs.gov/cache/images/cfa62af2557222a02478f1fcd781d445_cerberus_enhanced.tif_full.jpg'}, {'title': 'Schiaparelli Hemisphere', 'img_url': 'https://astrogeology.usgs.gov/cache/images/3cdd1cbf5e0813bba925c9030d13b62e_schiaparelli_enhanced.tif_full.jpg'}, {'title': 'Syrtis Major Hemisphere', 'img_url': 'https://astrogeology.usgs.gov/cache/images/ae209b4e408bb6c3e67b6af38168cf28_syrtis_major_enhanced.tif_full.jpg'}, {'title': 'Valles Marineris Hemisphere', 'img_url': 'https://astrogeology.usgs.gov/cache/images/7cf2da4bf549ed01c17f206327be4db7_valles_marineris_enhanced.tif_full.jpg'}]
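
One detail worth knowing when trimming the hemisphere headings: str.strip treats its argument as a set of characters to remove from both ends, not as a literal suffix, so it can cut into the name itself. The sketch below assumes a heading of the form "Valles Marineris Hemisphere Enhanced", which is consistent with the titles printed above, and compares strip with removing the literal words.

```python
heading = 'Valles Marineris Hemisphere Enhanced'

# strip() removes any of the listed characters from both ends,
# so it keeps going into "Marineris" until it reaches the capital M
stripped = heading.strip('Hemisphere Enhanced')
print(repr(stripped))

# Removing the literal words leaves the name intact
title = heading.replace('Enhanced', '').strip()
link_name = title.replace('Hemisphere', '').strip()
print(repr(title))
print(repr(link_name))
```

A truncated string such as the one strip produces can still work for a partial-text link match, but replace yields a value that is also safe to display.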
