## Mission to Mars

![mission_to_mars](Images/mission_to_mars.png)

In this assignment, you will build a web application that scrapes various websites for data related to the Mission to Mars and displays the information in a single HTML page. The following outlines what you need to do.

## Step 1 - Scraping

Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.

* Create a Jupyter Notebook file called `mission_to_mars.ipynb` and use this to complete all of your scraping and analysis tasks. The following outlines what you need to scrape.

In [1]:
# Dependencies
import pandas as pd
import re
import requests
import pymongo
from splinter import Browser
from bs4 import BeautifulSoup


### NASA Mars News

* Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.




In [3]:
# Obtain html of Mars website
mars_news_url = 'https://mars.nasa.gov/news/'
mars_news_html = requests.get(mars_news_url)

In [4]:
# Parse html file with BeautifulSoup
mars_soup = BeautifulSoup(mars_news_html.text, 'html.parser')

In [5]:
# Print body of html
print(mars_soup.body.prettify())

<body id="news">
 <svg display="none" height="0" width="0">
  <symbol height="30" id="circle_plus" viewbox="0 0 30 30" width="30">
   <g fill-rule="evenodd" transform="translate(1 1)">
    <circle cx="14" cy="14" fill="#fff" fill-opacity=".1" fill-rule="nonzero" r="14" stroke="inherit" stroke-width="1">
    </circle>
    <path class="the_plus" d="m18.856 12.96v1.738h-4.004v3.938h-1.848v-3.938h-4.004v-1.738h4.004v-3.96h1.848v3.96z" fill="inherit" stroke-width="0">
    </path>
   </g>
  </symbol>
  <symbol height="30" id="circle_arrow" viewbox="0 0 30 30" width="30" xmlns="http://www.w3.org/2000/svg">
   <g transform="translate(1 1)">
    <circle cx="14" cy="14" fill="#fff" fill-opacity=".1" r="14" stroke="inherit" stroke-width="1">
    </circle>
    <path class="the_arrow" d="m8.5 15.00025h7.984l-2.342 2.42c-.189.197-.189.518 0 .715l.684.717c.188.197.494.197.684 0l4.35-4.506c.188-.199.188-.52 0-.717l-4.322-4.48c-.189-.199-.496-.199-.684 0l-.684.716c-.189.197-.189.519 0 .716l2.341 2.419h

In [6]:
# Find article titles
article_titles = mars_soup.find_all('div', class_='content_title')
article_titles

[<div class="content_title">
 <a href="/news/8442/nasas-curiosity-mars-rover-finds-a-clay-cache/">
 NASA's Curiosity Mars Rover Finds a Clay Cache
 </a>
 </div>, <div class="content_title">
 <a href="/news/8436/why-this-martian-full-moon-looks-like-candy/">
 Why This Martian Full Moon Looks Like Candy
 </a>
 </div>, <div class="content_title">
 <a href="/news/8426/nasa-garners-7-webby-award-nominations/">
 NASA Garners 7 Webby Award Nominations
 </a>
 </div>, <div class="content_title">
 <a href="/news/8413/nasas-opportunity-rover-mission-on-mars-comes-to-end/">
 NASA's Opportunity Rover Mission on Mars Comes to End
 </a>
 </div>, <div class="content_title">
 <a href="/news/8402/nasas-insight-places-first-instrument-on-mars/">
 NASA's InSight Places First Instrument on Mars
 </a>
 </div>, <div class="content_title">
 <a href="/news/8387/nasa-announces-landing-site-for-mars-2020-rover/">
 NASA Announces Landing Site for Mars 2020 Rover
 </a>
 </div>]

In [7]:
# Loop to get article titles
for article in article_titles:
    title = article.find('a')
    title_text = title.text
    print(title_text)


NASA's Curiosity Mars Rover Finds a Clay Cache


Why This Martian Full Moon Looks Like Candy


NASA Garners 7 Webby Award Nominations


NASA's Opportunity Rover Mission on Mars Comes to End


NASA's InSight Places First Instrument on Mars


NASA Announces Landing Site for Mars 2020 Rover



In [8]:
# Find paragraph text
paragraphs = mars_soup.find_all('div', class_='rollover_description')
paragraphs

[<div class="rollover_description">
 <div class="rollover_description_inner">
 The rover recently drilled two samples, and both showed the highest levels of clay ever found during the mission.
 </div>
 <div class="overlay_arrow">
 <img alt="More" src="/assets/overlay-arrow.png"/>
 </div>
 </div>, <div class="rollover_description">
 <div class="rollover_description_inner">
 For the first time, NASA's Mars Odyssey orbiter has caught the Martian moon Phobos during a full moon phase. Each color in this new image represents a temperature range detected by Odyssey's infrared camera.
 </div>
 <div class="overlay_arrow">
 <img alt="More" src="/assets/overlay-arrow.png"/>
 </div>
 </div>, <div class="rollover_description">
 <div class="rollover_description_inner">
 Nominees include four JPL projects: the solar system and climate websites, InSight social media, and a 360-degree Earth video. Public voting closes April 18, 2019.
 </div>
 <div class="overlay_arrow">
 <img alt="More" src="/assets/ov

In [9]:
# Loop through paragraph texts
for paragraph in paragraphs:
    p_text = paragraph.find('div')
    news_p = p_text.text
    print(news_p)


The rover recently drilled two samples, and both showed the highest levels of clay ever found during the mission.


For the first time, NASA's Mars Odyssey orbiter has caught the Martian moon Phobos during a full moon phase. Each color in this new image represents a temperature range detected by Odyssey's infrared camera.


Nominees include four JPL projects: the solar system and climate websites, InSight social media, and a 360-degree Earth video. Public voting closes April 18, 2019.


NASA's Opportunity Mars rover mission is complete after 15 years on Mars. Opportunity's record-breaking exploration laid the groundwork for future missions to the Red Planet.


In deploying its first instrument onto the surface of Mars, the lander completes a major mission milestone.


After a five-year search, NASA has chosen Jezero Crater as the landing site for its upcoming Mars 2020 rover mission.
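
The loops above print every headline and teaser, and the blank lines in the output come from leading and trailing newlines in the scraped text. Since the task asks only for the latest article, one option is to keep the first entry and strip it. This is a minimal sketch, assuming the page lists the newest article first (as the output above suggests). The helper `first_clean_text` and the variable `news_title` are illustrative names, not part of the notebook.

```python
def first_clean_text(texts):
    """Return the first non-empty string with surrounding whitespace removed."""
    for text in texts:
        cleaned = text.strip()
        if cleaned:
            return cleaned
    return None  # nothing usable was scraped


# Sample values copied from the output above, with the stray newlines kept
raw_titles = [
    "\nNASA's Curiosity Mars Rover Finds a Clay Cache\n",
    "\nWhy This Martian Full Moon Looks Like Candy\n",
]
raw_paragraphs = [
    "\nThe rover recently drilled two samples, and both showed the highest "
    "levels of clay ever found during the mission.\n",
]

# Assumption: the first entry is the most recent article
news_title = first_clean_text(raw_titles)
news_p = first_clean_text(raw_paragraphs)
print(news_title)
print(news_p)
```

In the notebook, the same helper could be fed the text of each element, for example `[div.find('a').text for div in article_titles]`.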



### JPL Mars Space Images - Featured Image

* Visit the url for JPL Featured Space Image [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).

* Use splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called `featured_image_url`.

* Make sure to find the image url to the full size `.jpg` image.

* Make sure to save a complete url string for this image.

In [10]:
# Open browser of Mars space images
mars_images_browser = Browser('chrome', headless=False)
nasa_url = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
mars_images_browser.visit(nasa_url)

In [11]:
# Parse html file with BeautifulSoup
mars_images_html = mars_images_browser.html
nasa_soup = BeautifulSoup(mars_images_html, 'html.parser')

In [12]:
# Print body of html
print(nasa_soup.body.prettify())

<body class="dark_background logged_out mobile_menu" id="images" style="">
 <!--[if lt IE 9]>
      <div class='browsehappy' style='font-size: 30px; color: white; position:absolute; top: 0; margin: 0; height: 3000px; width: 100%; background: #000; z-index: 10000; padding: 5%;'>
        You are using an
        <strong>outdated</strong>
        browser. Please
        <a href='http://browsehappy.com/'>click here</a>
        to upgrade or change your browser.
      </div>
    <![endif]-->
 <div id="main_container">
  <div id="site_body">
   <div class="site_header_area">
    <header class="site_header">
     <div class="brand_area">
      <div class="brand1">
       <a class="nasa_logo" href="http://www.nasa.gov" title="NASA">
        NASA
       </a>
      </div>
      <div class="brand2">
       <div class="jpl_logo">
        <a href="//www.jpl.nasa.gov/" id="jpl_logo" title="Jet Propulsion Laboratory">
         Jet Propulsion Laboratory
        </a>
       </div>
       <div class="ca

In [13]:
# Find image link with BeautifulSoup
images = nasa_soup.find_all('div', class_='carousel_items')
images

[<div class="carousel_items">
 <article alt="The Case of the Warped Galactic Ring" class="carousel_item" style="background-image: url('/spaceimages/images/wallpaper/PIA14400-1920x1200.jpg');">
 <div class="default floating_text_area ms-layer">
 <h2 class="category_title">
 </h2>
 <h2 class="brand_title">
 				  FEATURED IMAGE
 				</h2>
 <h1 class="media_feature_title">
 				  The Case of the Warped Galactic Ring				</h1>
 <div class="description">
 </div>
 <footer>
 <a class="button fancybox" data-description="This image from ESA's Herschel Space Observatory reveals a suspected ring at the center of our galaxy is warped for reasons scientists cannot explain. The ring is twisted so that part of it rises above and below the plane of our Milky Way galaxy." data-fancybox-group="images" data-fancybox-href="/spaceimages/images/mediumsize/PIA14400_ip.jpg" data-link="/spaceimages/details.php?id=PIA14400" data-title="The Case of the Warped Galactic Ring" id="full_image">
 					FULL IMAGE
 				

In [14]:
# Loop through images
for nasa_image in images:
    image = nasa_image.find('article')
    background_image = image.get('style')
    # print(background_image)
    
    # Use a regular expression to capture the text between the single quotes
    # in the style attribute, e.g. url('/spaceimages/...jpg')
    re_background_image = re.search("'(.+?)'", background_image)
    # print(re_background_image)
    
    # Pull the relative url out of the match object
    # group(0) is the whole match, including the quotes
    # group(1) is only the captured path between the quotes
    search_background_image = re_background_image.group(1)
    # print(search_background_image)
    
    featured_image_url = f'https://www.jpl.nasa.gov{search_background_image}'
    print(featured_image_url)

https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA14400-1920x1200.jpg
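
The extraction step above can be isolated into a small function that also copes with a style attribute that has no `url(...)` in it. This is a sketch using only the standard library. The function name `extract_background_url` and the slightly stricter pattern are assumptions for illustration, not the notebook's own code.

```python
import re
from urllib.parse import urljoin

# Base address taken from the f-string used in the notebook
JPL_BASE_URL = "https://www.jpl.nasa.gov"

# Capture whatever sits inside url(...), with or without quotes
URL_PATTERN = re.compile(r"""url\(['"]?(.+?)['"]?\)""")


def extract_background_url(style, base_url=JPL_BASE_URL):
    """Return the absolute image url found in a CSS style string, or None."""
    match = URL_PATTERN.search(style)
    if match is None:
        return None  # no background image in this style attribute
    # urljoin resolves the relative path against the site root
    return urljoin(base_url, match.group(1))


style = "background-image: url('/spaceimages/images/wallpaper/PIA14400-1920x1200.jpg');"
featured_image_url = extract_background_url(style)
print(featured_image_url)
```

Returning `None` instead of raising lets the caller decide what to do when the page layout changes and the pattern no longer matches.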


### Mars Weather

* Visit the Mars Weather twitter account [here](https://twitter.com/marswxreport?lang=en) and scrape the latest Mars weather tweet from the page. Save the tweet text for the weather report as a variable called `mars_weather`.



In [16]:
# Get weather tweets with splinter
twitter_browser = Browser('chrome', headless=False)
twitter_url = 'https://twitter.com/marswxreport?lang=en'
twitter_browser.visit(twitter_url)

In [17]:
# Parse html file with BeautifulSoup
twitter_html = twitter_browser.html
twitter_soup = BeautifulSoup(twitter_html, 'html.parser')

In [18]:
# Print body of html
print(twitter_soup.body.prettify())

 <div class="visuallyhidden" id="kb-shortcuts-msg">
  <h2>
   Keyboard Shortcuts
  </h2>
  <p>
   Keyboard shortcuts are available for common actions and site navigation.
   <button id="show-shortcuts-btn" tabindex="-1" type="button">
    View Keyboard Shortcuts
   </button>
   <button id="dismiss-shortcuts-btn" tabindex="-1" type="button">
    Dismiss this message
   </button>
  </p>
 </div>
 <script id="swift_loading_indicator" nonce="">
  document.body.className=document.body.className+" "+document.body.getAttribute("data-fouc-class-names");
 </script>
 <noscript>
  <form action="https://mobile.twitter.com/i/nojs_router?path=%2Fmarswxreport&amp;lang=en" class="NoScriptForm" method="POST">
   <input name="authenticity_token" type="hidden" value="1599c99fb3fdf6c8e7330582029a48e32b265171"/>
   <div class="NoScriptForm-content">
    <span class="NoScriptForm-logo Icon Icon--logo Icon--extraLarge">
    </span>
    <p>
     We've detected that JavaScript is disabled in your browser. Would

In [19]:
# Find weather tweets with BeautifulSoup
mars_weather_tweets = twitter_soup.find_all('p', class_='TweetTextSize')
mars_weather_tweets

[<p class="TweetTextSize TweetTextSize--normal js-tweet-text tweet-text" data-aria-label-part="0" lang="en">InSight sol 222 (2019-07-12) low -99.7ºC (-147.5ºF) high -24.8ºC (-12.6ºF)
 winds from the SSE at 4.2 m/s (9.4 mph) gusting to 15.6 m/s (34.8 mph)
 pressure at 7.60 hPa<a class="twitter-timeline-link u-hidden" data-pre-embedded="true" dir="ltr" href="https://t.co/8Q8lyB6SjM">pic.twitter.com/8Q8lyB6SjM</a></p>,
 <p class="TweetTextSize TweetTextSize--normal js-tweet-text tweet-text" data-aria-label-part="0" lang="en">InSight sol 221 (2019-07-11) low -99.4ºC (-147.0ºF) high -23.8ºC (-10.9ºF)
 winds from the SSE at 4.1 m/s (9.1 mph) gusting to 14.1 m/s (31.6 mph)
 pressure at 7.60 hPa</p>,
 <p class="TweetTextSize TweetTextSize--normal js-tweet-text tweet-text" data-aria-label-part="0" lang="en">InSight sol 220 (2019-07-10) low -101.2ºC (-150.2ºF) high -25.9ºC (-14.7ºF)
 winds from the SSE at 4.1 m/s (9.3 mph) gusting to 16.0 m/s (35.8 mph)
 pressure at 7.60 hPa</p>,
 <p class="Twee

In [20]:
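# Get tweets that contain 'sol', which indicates a weather report
weather_text = 'sol'
mars_weather = None

for tweet in mars_weather_tweets:
    if weather_text in tweet.text:
        # Tweets are listed newest first, so keep only the first match
        # as the latest report instead of overwriting it on every pass
        if mars_weather is None:
            mars_weather = tweet.text
        print(tweet.text)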

InSight sol 222 (2019-07-12) low -99.7ºC (-147.5ºF) high -24.8ºC (-12.6ºF)
winds from the SSE at 4.2 m/s (9.4 mph) gusting to 15.6 m/s (34.8 mph)
pressure at 7.60 hPapic.twitter.com/8Q8lyB6SjM
InSight sol 221 (2019-07-11) low -99.4ºC (-147.0ºF) high -23.8ºC (-10.9ºF)
winds from the SSE at 4.1 m/s (9.1 mph) gusting to 14.1 m/s (31.6 mph)
pressure at 7.60 hPa
InSight sol 220 (2019-07-10) low -101.2ºC (-150.2ºF) high -25.9ºC (-14.7ºF)
winds from the SSE at 4.1 m/s (9.3 mph) gusting to 16.0 m/s (35.8 mph)
pressure at 7.60 hPa
InSight sol 219 (2019-07-09) low -100.3ºC (-148.5ºF) high -24.9ºC (-12.8ºF)
winds from the SE at 4.5 m/s (10.0 mph) gusting to 16.1 m/s (36.1 mph)
pressure at 7.60 hPapic.twitter.com/zJkUifSPvt
InSight sol 218 (2019-07-08) low -100.2ºC (-148.4ºF) high -26.1ºC (-15.1ºF)
winds from the SE at 4.6 m/s (10.2 mph) gusting to 16.0 m/s (35.8 mph)
pressure at 7.60 hPa
InSight sol 216 (2019-07-06) low -102.5ºC (-152.5ºF) high -24.9ºC (-12.8ºF)
winds from the SSE at 4.6 m/s (10.
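
Some of the tweets printed above end with a `pic.twitter.com/...` media link fused onto the weather text. Below is a minimal sketch of picking the latest report and trimming that link, using only the standard library. The function name, both patterns, and the first sample string are assumptions for illustration. The other two samples are copied from the output above.

```python
import re

# Weather reports in the output above contain "sol <number>"
SOL_PATTERN = re.compile(r"\bsol\s+\d+", re.IGNORECASE)
# Media link that appears fused onto the end of some tweets
MEDIA_LINK = re.compile(r"pic\.twitter\.com/\S+\s*$")


def latest_weather_report(tweet_texts):
    """Return the first tweet that looks like a weather report, cleaned up."""
    for text in tweet_texts:
        if SOL_PATTERN.search(text):
            return MEDIA_LINK.sub("", text).strip()
    return None  # no weather tweet found


sample_tweets = [
    "Mission update with no weather data",  # made-up non-weather tweet
    "InSight sol 222 (2019-07-12) low -99.7ºC (-147.5ºF) high -24.8ºC (-12.6ºF)\n"
    "winds from the SSE at 4.2 m/s (9.4 mph) gusting to 15.6 m/s (34.8 mph)\n"
    "pressure at 7.60 hPapic.twitter.com/8Q8lyB6SjM",
    "InSight sol 221 (2019-07-11) low -99.4ºC (-147.0ºF) high -23.8ºC (-10.9ºF)\n"
    "winds from the SSE at 4.1 m/s (9.1 mph) gusting to 14.1 m/s (31.6 mph)\n"
    "pressure at 7.60 hPa",
]

# Assumption: the list is ordered newest first, as on the scraped page
mars_weather = latest_weather_report(sample_tweets)
print(mars_weather)
```

Matching on "sol" followed by a number is a little stricter than a plain substring test, which would also match words such as "solar".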


### Mars Facts

* Visit the Mars Facts webpage [here](https://space-facts.com/mars/).

* Use Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.

* Use Pandas to convert the data to an HTML table string.





In [21]:
# Url to Mars facts website
mars_facts_url = 'https://space-facts.com/mars/'

In [22]:
# Read every table on the page; pd.read_html returns a list of DataFrames
mars_facts_table = pd.read_html(mars_facts_url)
mars_facts_table[0]

| | Mars - Earth Comparison | Mars | Earth |
|---|---|---|---|
| 0 | Diameter: | 6,779 km | 12,742 km |
| 1 | Mass: | 6.39 × 10^23 kg | 5.97 × 10^24 kg |
| 2 | Moons: | 2 | 1 |
| 3 | Distance from Sun: | 227,943,824 km | 149,598,262 km |
| 4 | Length of Year: | 687 Earth days | 365.24 days |
| 5 | Temperature: | -153 to 20 °C | -88 to 58°C |


In [23]:
# Select table
mars_facts_df = mars_facts_table[1]
mars_facts_df

| | 0 | 1 |
|---|---|---|
| 0 | Equatorial Diameter: | 6,792 km |
| 1 | Polar Diameter: | 6,752 km |
| 2 | Mass: | 6.39 × 10^23 kg (0.11 Earths) |
| 3 | Moons: | 2 (Phobos & Deimos) |
| 4 | Orbit Distance: | 227,943,824 km (1.38 AU) |
| 5 | Orbit Period: | 687 days (1.9 years) |
| 6 | Surface Temperature: | -87 to -5 °C |
| 7 | First Record: | 2nd millennium BC |
| 8 | Recorded By: | Egyptian astronomers |


In [24]:

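# Rename the columns
mars_facts_df.columns = ["Facts", "Value"]
# Note: set_index returns a new DataFrame; the result is not assigned here,
# so mars_facts_df keeps its default integer index (see the output below)
mars_facts_df.set_index(["Facts"])
mars_facts_df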

| | Facts | Value |
|---|---|---|
| 0 | Equatorial Diameter: | 6,792 km |
| 1 | Polar Diameter: | 6,752 km |
| 2 | Mass: | 6.39 × 10^23 kg (0.11 Earths) |
| 3 | Moons: | 2 (Phobos & Deimos) |
| 4 | Orbit Distance: | 227,943,824 km (1.38 AU) |
| 5 | Orbit Period: | 687 days (1.9 years) |
| 6 | Surface Temperature: | -87 to -5 °C |
| 7 | First Record: | 2nd millennium BC |
| 8 | Recorded By: | Egyptian astronomers |


In [25]:
# Print dataframe in html format

print(mars_facts_df.to_html())

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Facts</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Equatorial Diameter:</td>
      <td>6,792 km</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Polar Diameter:</td>
      <td>6,752 km</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Mass:</td>
      <td>6.39 × 10^23 kg (0.11 Earths)</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Moons:</td>
      <td>2 (Phobos &amp; Deimos)</td>
    </tr>
    <tr>
      <th>4</th>
      <td>Orbit Distance:</td>
      <td>227,943,824 km (1.38 AU)</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Orbit Period:</td>
      <td>687 days (1.9 years)</td>
    </tr>
    <tr>
      <th>6</th>
      <td>Surface Temperature:</td>
      <td>-87 to -5 °C</td>
    </tr>
    <tr>
      <th>7</th>
      <td>First Record:</td>
      <td>2nd millennium BC</td>
    </tr>
    <tr>
      <th>8</th>
      <td>
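
`DataFrame.set_index` returns a new DataFrame rather than modifying the original, which is why the HTML above still carries the 0 to 8 index column. Below is a minimal sketch of two ways to deal with that, using a two-row stand-in for the scraped table. The names `facts`, `indexed`, and `html_table` are illustrative only.

```python
import pandas as pd

# Two rows copied from the table above as a stand-in for the scraped data
facts = pd.DataFrame(
    [["Equatorial Diameter:", "6,792 km"], ["Polar Diameter:", "6,752 km"]],
    columns=["Facts", "Value"],
)

# Option 1: assign the result of set_index to keep the new index
indexed = facts.set_index("Facts")

# Option 2: leave the frame alone and omit the index when rendering
html_table = facts.to_html(index=False)
print(html_table)
```

Which option fits better depends on how the table should look in the final page; the second keeps "Facts" as an ordinary column with its own header cell.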

### Mars Hemispheres

* Visit the USGS Astrogeology site [here](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high resolution images for each of Mars's hemispheres.

* You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.

* Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.

* Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

In [26]:
# Use splinter to get image and title links of each hemisphere
usgs_browser = Browser('chrome', headless=False)
usgs_url = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'
usgs_browser.visit(usgs_url)

In [27]:
# Parse html file with BeautifulSoup
mars_hemispheres_html = usgs_browser.html
mars_hemispheres_soup = BeautifulSoup(mars_hemispheres_html, 'html.parser')

In [28]:
# Print body of html
print(mars_hemispheres_soup.body.prettify())

<body id="results">
 <header>
  <!--
			<h1>Astrogeology Science Center</h1>
-->
  <a href="https://www.usgs.gov/centers/astrogeo-sc" style="float:right;margin-top:10px;">
   <img alt="USGS: Science for a Changing World" class="logo" height="60" src="/images/usgs_logo_main_2x.png"/>
  </a>
  <a href="https://nasa.gov" style="float:right;margin-top:5px;margin-right:20px;">
   <img alt="NASA" class="logo" height="65" src="/images/logos/nasa-logo-web-med.png"/>
  </a>
  <a href="https://pds-imaging.jpl.nasa.gov/" style="float:right;margin-top:5px;margin-right: 10px;">
   <img alt="PDS Cartography and Imaging Science Node" class="logo" height="65" src="/images/pds_logo-invisible-web.png"/>
  </a>
 </header>
 <div class="wrapper">
  <!--
			<nav>
				<a id="nav-toggle" href="#" title="Navigation Menu">Menu</a>
<ul class="dropdown dropdown-horizontal" id="yw0">
<li><a href="/">Home</a></li>
<li><a href="/about">About</a>
<ul>
<li><a href="/about/careers">Careers</a></li>
<li><a href="/contac

In [29]:
# Find hemisphere image link and title
mars_hemispheres = mars_hemispheres_soup.find_all('div', class_='description')
mars_hemispheres

[<div class="description"><a class="itemLink product-item" href="/search/map/Mars/Viking/cerberus_enhanced"><h3>Cerberus Hemisphere Enhanced</h3></a><span class="subtitle" style="float:left">image/tiff 21 MB</span><span class="pubDate" style="float:right"></span><br/><p>Mosaic of the Cerberus hemisphere of Mars projected into point perspective, a view similar to that which one would see from a spacecraft. This mosaic is composed of 104 Viking Orbiter images acquired…</p></div>,
 <div class="description"><a class="itemLink product-item" href="/search/map/Mars/Viking/schiaparelli_enhanced"><h3>Schiaparelli Hemisphere Enhanced</h3></a><span class="subtitle" style="float:left">image/tiff 35 MB</span><span class="pubDate" style="float:right"></span><br/><p>Mosaic of the Schiaparelli hemisphere of Mars projected into point perspective, a view similar to that which one would see from a spacecraft. The images were acquired in 1980 during early northern…</p></div>,
 <div class="description"><a 

In [30]:
# Create list of dictionaries to hold all hemisphere titles and image urls
hemisphere_image_urls = []

# Loop through each link of hemispheres on page
for image in mars_hemispheres:
    hemisphere_url = image.find('a', class_='itemLink')
    hemisphere = hemisphere_url.get('href')
    hemisphere_link = 'https://astrogeology.usgs.gov' + hemisphere
    print(hemisphere_link)

    # Visit each link that you just found (hemisphere_link)
    usgs_browser.visit(hemisphere_link)
    
    # Create dictionary to hold title and image url
    hemisphere_image_dict = {}
    
    # Need to parse html again
    mars_hemispheres_html = usgs_browser.html
    mars_hemispheres_soup = BeautifulSoup(mars_hemispheres_html, 'html.parser')
    
    # Get the full-resolution image link (the 'Original' download anchor)
    # A separate name avoids overwriting the detail-page link built above
    hemisphere_img_url = mars_hemispheres_soup.find('a', string='Original').get('href')
    
    # Get title text, dropping the trailing ' Enhanced'
    hemisphere_title = mars_hemispheres_soup.find('h2', class_='title').text.replace(' Enhanced', '')
    
    # Store the title and image url in the dictionary
    hemisphere_image_dict['title'] = hemisphere_title
    hemisphere_image_dict['img_url'] = hemisphere_img_url
    
    # Append dictionaries to list
    hemisphere_image_urls.append(hemisphere_image_dict)

print(hemisphere_image_urls)

https://astrogeology.usgs.gov/search/map/Mars/Viking/cerberus_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/schiaparelli_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/syrtis_major_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/valles_marineris_enhanced
[{'title': 'Cerberus Hemisphere', 'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif'}, {'title': 'Schiaparelli Hemisphere', 'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif'}, {'title': 'Syrtis Major Hemisphere', 'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif'}, {'title': 'Valles Marineris Hemisphere', 'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif'}]
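
The shape of each entry in the list above can be shown without a browser. This sketch builds one dictionary per hemisphere from a raw title and an image link. The helper name `build_hemisphere_entry` is hypothetical; the sample values are copied from the output above.

```python
from urllib.parse import urljoin

# Site root used in the notebook when building the detail-page links
USGS_BASE_URL = "https://astrogeology.usgs.gov"


def build_hemisphere_entry(raw_title, img_href, base_url=USGS_BASE_URL):
    """Return a dict with the keys `title` and `img_url` for one hemisphere."""
    title = raw_title.replace(" Enhanced", "").strip()
    # urljoin leaves absolute links untouched and resolves relative ones
    return {"title": title, "img_url": urljoin(base_url, img_href)}


scraped = [
    ("Cerberus Hemisphere Enhanced",
     "http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif"),
    ("Schiaparelli Hemisphere Enhanced",
     "http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif"),
]

hemisphere_image_urls = [build_hemisphere_entry(t, href) for t, href in scraped]
print(hemisphere_image_urls)
```

Creating a fresh dictionary for every hemisphere matters: reusing one dictionary across iterations would leave the list holding several references to the same, last-written entry.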


In [60]:
# Convert this jupyter notebook file to a python script called 'scrape_mars.py'
! jupyter nbconvert --to script --template basic mission_to_mars.ipynb --output scrape_mars



[NbConvertApp] Converting notebook mission_to_mars.ipynb to script
[NbConvertApp] Writing 8046 bytes to scrape_mars.py
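
Once converted, the script still runs top to bottom like the notebook. If the web application needs the results as one object, the scraped values could be bundled into a single dictionary. This is a rough sketch only: `build_mars_data`, the key names, and the placeholder values are hypothetical and not taken from the notebook.

```python
def build_mars_data(news_title, news_p, featured_image_url,
                    mars_weather, mars_facts_html, hemisphere_image_urls):
    """Bundle the scraped values into one dictionary (hypothetical layout)."""
    return {
        "news_title": news_title,
        "news_p": news_p,
        "featured_image_url": featured_image_url,
        "mars_weather": mars_weather,
        "mars_facts_html": mars_facts_html,
        "hemisphere_image_urls": hemisphere_image_urls,
    }


# Placeholder values standing in for the variables produced by the scraping steps
mars_data = build_mars_data(
    news_title="NASA's Curiosity Mars Rover Finds a Clay Cache",
    news_p="The rover recently drilled two samples.",
    featured_image_url="https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA14400-1920x1200.jpg",
    mars_weather="InSight sol 222 (2019-07-12)",
    mars_facts_html="<table></table>",
    hemisphere_image_urls=[{"title": "Cerberus Hemisphere", "img_url": "cerberus.tif"}],
)
print(sorted(mars_data))
```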
