# Mission To Mars
A demonstration of how BeautifulSoup, Splinter, and Pandas can be used to scrape information and images from live websites.
<hr>
<b>Submitted by:</b> &nbsp;&nbsp; Ricardo G. Mora, Jr.  12/16/2021

In [1]:
# Import Splinter, BeautifulSoup, Pandas, and the ChromeDriver manager
from splinter import Browser
from bs4 import BeautifulSoup as soup
import pandas as pd
from webdriver_manager.chrome import ChromeDriverManager

In [2]:
# Set up Splinter
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)



Current google-chrome version is 96.0.4664
Get LATEST chromedriver version for 96.0.4664 google-chrome
Trying to download new driver from https://chromedriver.storage.googleapis.com/96.0.4664.45/chromedriver_win32.zip
Driver has been saved in cache [C:\Users\ricar\.wdm\drivers\chromedriver\win32\96.0.4664.45]


## Section 1: 
### Scrape the most recent headline article from the NASA Mars news site:

In [3]:
# Visit the Mars news site
url = 'https://redplanetscience.com/'
browser.visit(url)

# Optional delay: wait up to 1 second for the article list to load
browser.is_element_present_by_css('div.list_text', wait_time=1)

True
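
The `True` above is the return value of `is_element_present_by_css`, which reports whether the element showed up within the wait window. As a rough illustration of that kind of bounded wait (not Splinter's actual implementation), the sketch below polls a condition until it holds or a timeout expires. The helper `wait_until` and the stand-in condition `element_ready` are hypothetical names introduced only for this example.

```python
import time


def wait_until(condition, timeout=1.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper for illustration; it only mimics the idea of a bounded wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for "is the element on the page yet?": becomes true on the third check.
checks = {"count": 0}


def element_ready():
    checks["count"] += 1
    return checks["count"] >= 3


found = wait_until(element_ready, timeout=1.0, interval=0.01)
never_found = wait_until(lambda: False, timeout=0.05, interval=0.01)
print(found, never_found)
```

In the notebook, the same idea could be applied by checking the returned boolean before reading `browser.html`, so that a slow page load fails loudly instead of producing an empty result.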

In [4]:
# Convert the browser html to a soup object
html = browser.html
news_soup = soup(html, 'html.parser')

slide_elem = news_soup.select_one('div.list_text')
print(slide_elem)

<div class="list_text">
<div class="list_date">December 16, 2021</div>
<div class="content_title">The MarCO Mission Comes to an End</div>
<div class="article_teaser_body">The pair of briefcase-sized satellites made history when they sailed past Mars in 2019.</div>
</div>


In [5]:
# Display the title element of the most recent article
print(slide_elem.find("div", class_="content_title"))

<div class="content_title">The MarCO Mission Comes to an End</div>


In [6]:
# Use the parent element to find the title div and save its text as `news_title`
news_title = slide_elem.find("div", class_="content_title").text.strip()
print(news_title)

The MarCO Mission Comes to an End


In [7]:
# Use the parent element to find the article teaser text and save it as `news_p`
news_p = slide_elem.find("div", class_="article_teaser_body").text.strip()
print(news_p)

The pair of briefcase-sized satellites made history when they sailed past Mars in 2019.
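
Both `select_one` and `find` return `None` when nothing matches, so the chained `.text` calls raise `AttributeError` if the page layout changes or the page has not finished loading. The sketch below shows one possible way to guard against that. It assumes Beautiful Soup is installed, runs against markup copied from the cell output in this section rather than the live site, and uses a hypothetical helper name, `extract_news`.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the printed `slide_elem` output in this section.
SAMPLE_HTML = """
<div class="list_text">
<div class="list_date">December 16, 2021</div>
<div class="content_title">The MarCO Mission Comes to an End</div>
<div class="article_teaser_body">The pair of briefcase-sized satellites made history when they sailed past Mars in 2019.</div>
</div>
"""


def extract_news(html):
    """Return (title, teaser), or (None, None) if the expected elements are missing.

    Hypothetical helper; the selectors mirror the ones used in this section.
    """
    page = BeautifulSoup(html, "html.parser")
    slide = page.select_one("div.list_text")
    if slide is None:
        return None, None
    title = slide.find("div", class_="content_title")
    teaser = slide.find("div", class_="article_teaser_body")
    if title is None or teaser is None:
        return None, None
    return title.get_text(strip=True), teaser.get_text(strip=True)


title, teaser = extract_news(SAMPLE_HTML)
missing = extract_news("<html><body><p>No articles here</p></body></html>")
print(title)
print(missing)
```

Returning a pair of `None` values is only one design choice; raising a descriptive exception would work equally well when a missing headline should stop the run.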


## Section 2:
### Scrape the featured image from the JPL space images site:

In [8]:
# Visit URL
url = 'https://spaceimages-mars.com'
browser.visit(url)

In [9]:
# Find and click the full image button
browser.links.find_by_partial_text("FULL IMAGE").click()

In [10]:
# Parse the resulting html with soup
html2 = browser.html
image_soup = soup(html2, 'html.parser')
image_elem = image_soup.find("img", class_="headerimage")
print(image_elem)

<img class="headerimage fade-in" src="image/featured/mars3.jpg"/>


In [11]:
# Find the relative image url
img_url_rel = image_elem["src"]
print(img_url_rel)

image/featured/mars3.jpg


In [12]:
# Use the base url to create an absolute url
img_url = url + "/" + img_url_rel
print(img_url)

https://spaceimages-mars.com/image/featured/mars3.jpg
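
Joining the base and relative URLs by string concatenation works here because the base has no trailing slash. As an alternative sketch (not what the notebook does), the standard library's `urllib.parse.urljoin` produces the same absolute URL whether or not the base ends in a slash. The variable names below are illustrative.

```python
from urllib.parse import urljoin

# Values taken from the cells in this section.
base_url = "https://spaceimages-mars.com"
relative_url = "image/featured/mars3.jpg"

# urljoin resolves the relative path against the base and handles the separator.
img_url = urljoin(base_url, relative_url)
img_url_trailing_slash = urljoin(base_url + "/", relative_url)
print(img_url)
print(img_url == img_url_trailing_slash)
```

This matters mostly when the same code is reused across sites, since the base URLs in this notebook are written both with and without a trailing slash.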


## Section 3:
### Scrape the Mars-Earth comparison table from the Mars Facts site:

In [13]:
# Visit URL
url = 'https://galaxyfacts-mars.com'
browser.visit(url)

In [14]:
# Use Pandas to grab the first table
df = pd.read_html(url, header=0)[0]
df

,Mars - Earth Comparison,Mars,Earth
0,Diameter:,"6,779 km","12,742 km"
1,Mass:,6.39 × 10^23 kg,5.97 × 10^24 kg
2,Moons:,2,1
3,Distance from Sun:,"227,943,824 km","149,598,262 km"
4,Length of Year:,687 Earth days,365.24 days
5,Temperature:,-87 to -5 °C,-88 to 58°C


In [15]:
# Use Pandas to set the first column as the index column
df = df.rename(columns={"Mars - Earth Comparison": "Attributes"})
df = df.set_index("Attributes")
df

Attributes,Mars,Earth
Diameter:,"6,779 km","12,742 km"
Mass:,6.39 × 10^23 kg,5.97 × 10^24 kg
Moons:,2,1
Distance from Sun:,"227,943,824 km","149,598,262 km"
Length of Year:,687 Earth days,365.24 days
Temperature:,-87 to -5 °C,-88 to 58°C


In [16]:
# Use Pandas to create an html string from the dataframe
html_string = df.to_html()
print(html_string)

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Mars</th>
      <th>Earth</th>
    </tr>
    <tr>
      <th>Attributes</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Diameter:</th>
      <td>6,779 km</td>
      <td>12,742 km</td>
    </tr>
    <tr>
      <th>Mass:</th>
      <td>6.39 × 10^23 kg</td>
      <td>5.97 × 10^24 kg</td>
    </tr>
    <tr>
      <th>Moons:</th>
      <td>2</td>
      <td>1</td>
    </tr>
    <tr>
      <th>Distance from Sun:</th>
      <td>227,943,824 km</td>
      <td>149,598,262 km</td>
    </tr>
    <tr>
      <th>Length of Year:</th>
      <td>687 Earth days</td>
      <td>365.24 days</td>
    </tr>
    <tr>
      <th>Temperature:</th>
      <td>-87 to -5 °C</td>
      <td>-88 to 58°C</td>
    </tr>
  </tbody>
</table>
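
The generated markup has two header rows because a named index gets a row of its own. If a single header row is preferred, `to_html` accepts `index_names=False`, and its `classes` argument adds CSS class names to the `<table>` tag. The sketch below rebuilds part of the table from the values shown in this section instead of scraping it, and the Bootstrap-style class names are an assumption about the page that would display the table.

```python
import pandas as pd

# Rebuild part of the comparison table from the values displayed in this section.
df = pd.DataFrame(
    {
        "Attributes": ["Diameter:", "Mass:", "Moons:"],
        "Mars": ["6,779 km", "6.39 × 10^23 kg", "2"],
        "Earth": ["12,742 km", "5.97 × 10^24 kg", "1"],
    }
).set_index("Attributes")

# Default output: the index name "Attributes" gets its own header row.
default_html = df.to_html()

# index_names=False drops that extra row; classes appends CSS class names
# (the class names used here are assumed, not taken from the notebook).
compact_html = df.to_html(index_names=False, classes="table table-striped")
print(compact_html)
```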


## Section 4:
### Scrape the four hemisphere images from the Mars Hemispheres site:

In [17]:
# Visit URL
url = 'https://marshemispheres.com/'
browser.visit(url)

In [18]:
# Create a list to hold the images and titles.
hemisphere_image_urls = []

# Get a list of all of the hemispheres
links = browser.find_by_css('a.product-item img')

# Next, loop through those links, click the link, find the sample anchor, return the href
for i in range(len(links)):
    
    
    # Re-find the elements on each iteration: navigating away invalidates the earlier references (stale element exception)
    links = browser.find_by_css('a.product-item img')
    links[i].click()
    temp_soup = soup(browser.html, 'html.parser')
    
    # Next, we find the Sample image anchor tag and extract the href
    image_url = url + temp_soup.find("a", string="Sample")["href"]
    
    # Get Hemisphere title
    hemisphere_title = temp_soup.find("h2", class_="title").text.strip()
    
    # Append hemisphere object to list
    image_url_dict = {
        "title": hemisphere_title,
        "img_url": image_url
    }
    hemisphere_image_urls.append(image_url_dict)
    
    # Finally, we navigate backwards
    browser.back()

In [19]:
# Display the completed list
hemisphere_image_urls

[{'title': 'Cerberus Hemisphere Enhanced',
  'img_url': 'https://marshemispheres.com/images/full.jpg'},
 {'title': 'Schiaparelli Hemisphere Enhanced',
  'img_url': 'https://marshemispheres.com/images/schiaparelli_enhanced-full.jpg'},
 {'title': 'Syrtis Major Hemisphere Enhanced',
  'img_url': 'https://marshemispheres.com/images/syrtis_major_enhanced-full.jpg'},
 {'title': 'Valles Marineris Hemisphere Enhanced',
  'img_url': 'https://marshemispheres.com/images/valles_marineris_enhanced-full.jpg'}]
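
The parsing step inside the loop can be tested without a browser by moving it into a function that takes HTML text. The sketch below is one way to do that. The markup is a simplified, hypothetical stand-in for a hemisphere detail page (only the `h2.title` element and the "Sample" link mirror what the loop relies on), and `parse_hemisphere` is a hypothetical helper name.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for one hemisphere detail page.
DETAIL_HTML = """
<div class="cover">
  <h2 class="title">Cerberus Hemisphere Enhanced</h2>
  <ul>
    <li><a href="images/full.jpg" target="_blank">Sample</a></li>
    <li><a href="images/example-original.tif" target="_blank">Original</a></li>
  </ul>
</div>
"""


def parse_hemisphere(html, base_url):
    """Return {"title", "img_url"} for one detail page, or None if parsing fails."""
    page = BeautifulSoup(html, "html.parser")
    # Match the anchor by its visible text, then resolve its href against the base.
    sample_link = page.find("a", string="Sample")
    title = page.find("h2", class_="title")
    if sample_link is None or title is None:
        return None
    return {
        "title": title.get_text(strip=True),
        "img_url": urljoin(base_url, sample_link["href"]),
    }


record = parse_hemisphere(DETAIL_HTML, "https://marshemispheres.com/")
print(record)
```

With a helper like this, the loop body would reduce to clicking the link, passing `browser.html` to the function, appending the result, and navigating back.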

In [20]:
# Close the browser
browser.quit()
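
If any earlier cell raises an exception, the final `browser.quit()` never runs and the Chrome window stays open. One possible safeguard, sketched below, is a small context manager that always calls `quit()`. The names `managed_browser` and `FakeBrowser` are hypothetical; the fake class stands in for a Splinter browser so the example runs without Chrome.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(make_browser):
    """Create a browser with `make_browser` and guarantee quit() is called."""
    browser = make_browser()
    try:
        yield browser
    finally:
        browser.quit()


class FakeBrowser:
    """Stand-in for a real browser object; visit() always fails."""

    def __init__(self):
        self.closed = False

    def visit(self, url):
        raise RuntimeError(f"could not load {url}")

    def quit(self):
        self.closed = True


fake = FakeBrowser()
try:
    with managed_browser(lambda: fake) as browser:
        browser.visit("https://marshemispheres.com/")
except RuntimeError as error:
    print("scrape failed:", error)

# quit() ran even though visit() raised.
print(fake.closed)
```

With Splinter, the factory passed to `managed_browser` could be a lambda that wraps the same `Browser(...)` call used in the setup cell.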