# Step 1 - Scraping

Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.


Create a Jupyter Notebook file called mission_to_mars.ipynb and use this to complete all of your scraping and analysis tasks. The following outlines what you need to scrape.



## NASA Mars News


Scrape the NASA Mars News Site and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.
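As a rough illustration of collecting only the latest item, the sketch below parses a small hand-written HTML string with BeautifulSoup and keeps the first article. The class names list_text and article_teaser_body match the ones used in the notebook cells further down; the sample markup, the helper name latest_news, and the assumption that the page lists the newest article first are illustrative only.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the news page; the real markup may differ
SAMPLE_HTML = """
<ul>
  <li><div class="list_text">
    <div class="content_title"><a href="/news/2/newest/">Newest headline</a></div>
    <div class="article_teaser_body">Newest teaser.</div>
  </div></li>
  <li><div class="list_text">
    <div class="content_title"><a href="/news/1/older/">Older headline</a></div>
    <div class="article_teaser_body">Older teaser.</div>
  </div></li>
</ul>
"""


def latest_news(html):
    """Return (title, paragraph) for the first article, or (None, None)."""
    soup = BeautifulSoup(html, "html.parser")
    first = soup.find("div", class_="list_text")
    if first is None:
        return None, None
    link = first.find("a")
    teaser = first.find("div", class_="article_teaser_body")
    title = link.get_text(strip=True) if link else None
    paragraph = teaser.get_text(strip=True) if teaser else None
    return title, paragraph


news_title, news_p = latest_news(SAMPLE_HTML)
print(news_title)
print(news_p)
```

Returning None values instead of raising keeps a later scrape function from crashing if the page layout changes.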

## JPL Mars Space Images - Featured Image


Visit the URL for JPL's Featured Space Image (https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).
Use Splinter to navigate the site, find the image URL for the current Featured Mars Image, and assign the URL string to a variable called featured_image_url.
Make sure to find the image URL for the full-size .jpg image.
Make sure to save a complete URL string for this image.
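One way to end up with a complete URL string is to join whatever href the page provides with the site's base URL. The sketch below uses urljoin from the Python standard library; the helper name complete_url is hypothetical, and the relative path is only an assumed example of what a scraped href might look like.

```python
from urllib.parse import urljoin

JPL_BASE_URL = "https://www.jpl.nasa.gov"


def complete_url(href, base=JPL_BASE_URL):
    """Return an absolute URL whether href is relative or already absolute."""
    return urljoin(base, href)


# Assumed example of a relative href scraped from the page
featured_image_url = complete_url("/spaceimages/images/largesize/PIA18904_hires.jpg")
print(featured_image_url)
```

Because urljoin leaves absolute URLs untouched, the same helper can be applied no matter which form the page returns.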

## Mars Weather


Visit the Mars Weather Twitter account (https://twitter.com/marswxreport) and scrape the latest Mars weather tweet from the page. Save the tweet text for the weather report as a variable called mars_weather.
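A timeline can contain tweets that are not weather reports, so it may help to pick the first tweet that looks like one. The sketch below assumes the tweet texts have already been scraped into a list ordered newest first, and that weather reports begin with "Sol" followed by a number, as in the sample output later in this document; the helper name latest_weather and the non-weather tweet are hypothetical.

```python
import re

# Weather reports in the sample output start with "Sol <number>"
SOL_PATTERN = re.compile(r"^Sol \d+")


def latest_weather(tweet_texts):
    """Return the first tweet that looks like a weather report, or None."""
    for text in tweet_texts:
        cleaned = text.strip()
        if SOL_PATTERN.match(cleaned):
            return cleaned
    return None


tweets = [
    "Hypothetical announcement tweet that is not a weather report",
    "Sol 2037 (April 30, 2018), Sunny, high -2C/28F, low -75C/-103F, "
    "pressure at 7.25 hPa, daylight 05:24-17:20",
    "Sol 2036 (April 29, 2018), Sunny, high -5C/23F, low -72C/-97F, "
    "pressure at 7.28 hPa, daylight 05:24-17:20",
]

mars_weather = latest_weather(tweets)
print(mars_weather)
```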


## Mars Facts


Visit the Mars Facts webpage (https://space-facts.com/mars/) and use Pandas to scrape the table containing facts about the planet, including Diameter, Mass, etc.
Use Pandas to convert the data to an HTML table string.
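The conversion step can be tried without touching the network. The sketch below builds a small DataFrame from three rows copied from the scraped output later in this document, names the columns, and converts the frame to an HTML string; the column names Description and Value are assumptions chosen for readability.

```python
import pandas as pd

# A few rows copied from the scraped Mars facts table
rows = [
    ("Equatorial Diameter:", "6,792 km"),
    ("Polar Diameter:", "6,752 km"),
    ("Moons:", "2 (Phobos & Deimos)"),
]

# Column names are illustrative; the scraped table has none
df = pd.DataFrame(rows, columns=["Description", "Value"])

# index=False drops the row numbers; removing newlines gives a one-line string
fact_html = df.to_html(index=False).replace("\n", "")
print(fact_html)
```

Naming the columns replaces the bare 0 and 1 headers that otherwise appear in the generated table.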



## Mars Hemispheres


Visit the USGS Astrogeology site (https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high-resolution images for each of Mars's hemispheres.
You will need to click each of the hemisphere links in order to find the image URL for the full-resolution image.
Save both the image URL string for the full-resolution hemisphere image and the hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys img_url and title.
Append the dictionary with the image URL string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.
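The target data structure described above can be sketched independently of the scraping itself. The example below pairs titles with image URLs and appends one dictionary per hemisphere; the helper name build_hemisphere_list is hypothetical, and the two sample entries are copied from the scraped output later in this document.

```python
def build_hemisphere_list(titles, img_urls):
    """Pair each title with its image URL using the keys title and img_url."""
    hemisphere_image_urls = []
    for title, img_url in zip(titles, img_urls):
        hemisphere_image_urls.append({"title": title, "img_url": img_url})
    return hemisphere_image_urls


titles = ["Cerberus Hemisphere", "Schiaparelli Hemisphere"]
img_urls = [
    "http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg",
    "http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg",
]

hemispheres = build_hemisphere_list(titles, img_urls)
for hemisphere in hemispheres:
    print(hemisphere["title"], "->", hemisphere["img_url"])
```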

In [1]:
# https://splinter.readthedocs.io/en/latest/drivers/chrome.html
from splinter import Browser
from bs4 import BeautifulSoup
import pandas as pd
import time

In [2]:
# NASA Mars News
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)
url = 'https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest'
browser.visit(url)

In [3]:
# HTML object
html = browser.html
# Parse HTML with Beautiful Soup
soup = BeautifulSoup(html, 'html.parser')
# Retrieve all elements that contain news article information
articles = soup.find_all('div', class_='list_text')

# Iterate through each news article and print its title, teaser, and link
for article in articles:
    # Use Beautiful Soup's find() method to navigate and retrieve attributes
    teaser = article.find('div', class_='article_teaser_body').text
    link = article.find('a')
    href = link['href']
    print('-----------')
    print(link.text)
    print('...........')
    print(teaser)
    # href already starts with '/', so the base URL must not end with one
    print('https://mars.nasa.gov' + href)

# The URL requests publish_date descending, so the first article is the latest.
# Assign its title and paragraph text to variables that can be referenced later.
news_title = articles[0].find('a').text
news_p = articles[0].find('div', class_='article_teaser_body').text

-----------
NASA’s First Mission to Study the Interior of Mars Awaits May 5 Launch
...........
All systems are go for NASA’s next launch to the Red Planet. 
https://mars.nasa.gov/news/8332/nasas-first-mission-to-study-the-interior-of-mars-awaits-may-5-launch/
-----------
Vice President Pence Visits JPL, Previews NASA’s Next Mars Mission Launch
...........
A week before NASA's next Mars launch, Vice President Mike Pence toured the birthplace of the InSight Mars Lander and numerous other past, present and future space missions.
https://mars.nasa.gov/news/8331/vice-president-pence-visits-jpl-previews-nasas-next-mars-mission-launch/
-----------
NASA Sets Sights on May 5 Launch of InSight to Mars
...........
NASA’s next mission to Mars, InSight, is scheduled to launch Saturday, May 5, on a first-ever mission to study the heart of the Red Planet.
https://mars.nasa.gov/news/8330/nasa-sets-sights-on-may-5-launch-of-insight-to-mars/
-----------
Results of Heat Shield Testing
...........
A po

In [4]:
# Use splinter to navigate JPL and find the image url for the current Featured Mars Image and assign the url string to a variable called featured_image_url 
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)

In [5]:
url = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
browser.visit(url)

In [6]:
# browser.click_link_by_id('full image')
browser.click_link_by_partial_text('FULL IMAGE')
time.sleep(5)

In [7]:
browser.click_link_by_partial_text('more info')

In [8]:
link = browser.find_link_by_partial_href("/spaceimages/images/largesize/")

In [9]:
featured_image_url = link["href"]
featured_image_url

'https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA18904_hires.jpg'

In [10]:
# Mars Weather Twitter account
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)
url = 'https://twitter.com/marswxreport'
browser.visit(url)

In [11]:
# HTML object
html = browser.html
# Parse HTML with Beautiful Soup
soup = BeautifulSoup(html, 'html.parser')

In [12]:
# Scrape the Mars weather tweets from the page and print each one.
tweets = soup.find_all('div', class_="js-tweet-text-container")
weather_reports = []
for tweet in tweets:
    text = tweet.find('p', class_="TweetTextSize TweetTextSize--normal js-tweet-text tweet-text").text
    weather_reports.append(text)
    print('-----------')
    print(text)

# The page lists tweets newest first, so the first entry is the latest report.
# Save the tweet text for the weather report as a variable called mars_weather.
mars_weather = weather_reports[0]

-----------
Sol 2037 (April 30, 2018), Sunny, high -2C/28F, low -75C/-103F, pressure at 7.25 hPa, daylight 05:24-17:20
-----------
Sol 2036 (April 29, 2018), Sunny, high -5C/23F, low -72C/-97F, pressure at 7.28 hPa, daylight 05:24-17:20
-----------
Sol 2033 (April 25, 2018), Sunny, high -10C/14F, low -71C/-95F, pressure at 7.23 hPa, daylight 05:24-17:20
-----------
Sol 2030 (April 22, 2018), Sunny, high -4C/24F, low -73C/-99F, pressure at 7.21 hPa, daylight 05:25-17:21
-----------
Sol 2029 (April 21, 2018), Sunny, high -11C/12F, low -72C/-97F, pressure at 7.22 hPa, daylight 05:25-17:21
-----------
Sol 2026 (April 18, 2018), Sunny, high -6C/21F, low -73C/-99F, pressure at 7.19 hPa, daylight 05:26-17:21
-----------
Sol 2024 (April 16, 2018), Sunny, high -7C/19F, low -76C/-104F, pressure at 7.20 hPa, daylight 05:26-17:21
-----------
Sol 2022 (April 14, 2018), Sunny, high -4C/24F, low -73C/-99F, pressure at 7.19 hPa, daylight 05:27-17:21
-----------
Sol 2019 (April 11, 2018), Sunny, high -

In [13]:
# Visit the Mars Facts webpage and use Pandas to scrape the table containing facts
url = 'https://space-facts.com/mars/'
# pd.read_html fetches the page itself and returns a list of DataFrames, one per table, so no browser is needed
tables = pd.read_html(url)
tables

[                      0                              1
 0  Equatorial Diameter:                       6,792 km
 1       Polar Diameter:                       6,752 km
 2                 Mass:  6.42 x 10^23 kg (10.7% Earth)
 3                Moons:            2 (Phobos & Deimos)
 4       Orbit Distance:       227,943,824 km (1.52 AU)
 5         Orbit Period:           687 days (1.9 years)
 6  Surface Temperature:                  -153 to 20 °C
 7         First Record:              2nd millennium BC
 8          Recorded By:           Egyptian astronomers]

In [14]:
df = tables[0]
df

                      0                              1
0  Equatorial Diameter:                       6,792 km
1       Polar Diameter:                       6,752 km
2                 Mass:  6.42 x 10^23 kg (10.7% Earth)
3                Moons:            2 (Phobos & Deimos)
4       Orbit Distance:       227,943,824 km (1.52 AU)
5         Orbit Period:           687 days (1.9 years)
6  Surface Temperature:                  -153 to 20 °C
7         First Record:              2nd millennium BC
8          Recorded By:           Egyptian astronomers


In [15]:
html = df.to_html(index=False)
html

'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th>0</th>\n      <th>1</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <td>Equatorial Diameter:</td>\n      <td>6,792 km</td>\n    </tr>\n    <tr>\n      <td>Polar Diameter:</td>\n      <td>6,752 km</td>\n    </tr>\n    <tr>\n      <td>Mass:</td>\n      <td>6.42 x 10^23 kg (10.7% Earth)</td>\n    </tr>\n    <tr>\n      <td>Moons:</td>\n      <td>2 (Phobos &amp; Deimos)</td>\n    </tr>\n    <tr>\n      <td>Orbit Distance:</td>\n      <td>227,943,824 km (1.52 AU)</td>\n    </tr>\n    <tr>\n      <td>Orbit Period:</td>\n      <td>687 days (1.9 years)</td>\n    </tr>\n    <tr>\n      <td>Surface Temperature:</td>\n      <td>-153 to 20 °C</td>\n    </tr>\n    <tr>\n      <td>First Record:</td>\n      <td>2nd millennium BC</td>\n    </tr>\n    <tr>\n      <td>Recorded By:</td>\n      <td>Egyptian astronomers</td>\n    </tr>\n  </tbody>\n</table>'

In [16]:
fact_html = html.replace("\n", "")
fact_html

'<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>0</th>      <th>1</th>    </tr>  </thead>  <tbody>    <tr>      <td>Equatorial Diameter:</td>      <td>6,792 km</td>    </tr>    <tr>      <td>Polar Diameter:</td>      <td>6,752 km</td>    </tr>    <tr>      <td>Mass:</td>      <td>6.42 x 10^23 kg (10.7% Earth)</td>    </tr>    <tr>      <td>Moons:</td>      <td>2 (Phobos &amp; Deimos)</td>    </tr>    <tr>      <td>Orbit Distance:</td>      <td>227,943,824 km (1.52 AU)</td>    </tr>    <tr>      <td>Orbit Period:</td>      <td>687 days (1.9 years)</td>    </tr>    <tr>      <td>Surface Temperature:</td>      <td>-153 to 20 °C</td>    </tr>    <tr>      <td>First Record:</td>      <td>2nd millennium BC</td>    </tr>    <tr>      <td>Recorded By:</td>      <td>Egyptian astronomers</td>    </tr>  </tbody></table>'

In [19]:
# Visit the USGS Astrogeology site to obtain high-resolution images for each of Mars's hemispheres.
# Define a function that returns a list of dictionaries with the keys img_url and title
def hemisphere_info():
    executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
    browser = Browser('chrome', **executable_path, headless=False)
    url = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'
    browser.visit(url)

    # Parse the search results page with Beautiful Soup
    soup = BeautifulSoup(browser.html, 'html.parser')
    individual_links = soup.find_all('div', class_="description")

    title_list = []
    link_list = []

    # Collect each hemisphere's title and the link to its detail page
    for individual_link in individual_links:
        link = individual_link.a["href"]
        link_list.append("https://astrogeology.usgs.gov/" + link)

        # Drop the word "Enhanced" and the whitespace it leaves behind
        title = individual_link.h3.text
        title_list.append(title.replace("Enhanced", "").strip())

    hemisphere_img_urls = []

    # Visit each detail page in the same browser and read the full-resolution image URL
    for title, link in zip(title_list, link_list):
        browser.visit(link)
        image_link = browser.find_link_by_text("Sample")
        # Store the data using the keys required by the assignment
        hemisphere_img_urls.append({"title": title, "img_url": image_link["href"]})

    # Close the browser once all pages have been visited
    browser.quit()
    return hemisphere_img_urls

In [20]:
# Call hemisphere_info and store the result in hemisphere_img_urls
hemisphere_img_urls = hemisphere_info()
hemisphere_img_urls

[{'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg',
  'title': 'Cerberus Hemisphere'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg',
  'title': 'Schiaparelli Hemisphere'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif/full.jpg',
  'title': 'Syrtis Major Hemisphere'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg',
  'title': 'Valles Marineris Hemisphere'}]

# Step 2 - MongoDB and Flask Application

Use MongoDB with Flask templating to create a new HTML page that displays all of the information that was scraped from the URLs above.


Start by converting your Jupyter notebook into a Python script called scrape_mars.py with a function called scrape that will execute all of your scraping code from above and return one Python dictionary containing all of the scraped data.
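The overall shape of such a scrape function might look like the sketch below. Every helper here is a hypothetical placeholder that returns fixed sample values instead of visiting a site; in scrape_mars.py each one would wrap the corresponding notebook cells. The dictionary keys reuse the variable names from the notebook, which is an assumption rather than a requirement.

```python
def scrape_news():
    """Placeholder: would return the latest news title and paragraph."""
    return "Sample news title", "Sample news paragraph."


def scrape_featured_image():
    """Placeholder: would return the full-size featured image URL."""
    return "https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA18904_hires.jpg"


def scrape_weather():
    """Placeholder: would return the latest weather tweet text."""
    return "Sol 2037 (April 30, 2018), Sunny, high -2C/28F, low -75C/-103F"


def scrape_facts():
    """Placeholder: would return the facts table as an HTML string."""
    return "<table></table>"


def scrape_hemispheres():
    """Placeholder: would return one dictionary per hemisphere."""
    return [{"title": "Cerberus Hemisphere", "img_url": "http://example.com/full.jpg"}]


def scrape():
    """Run every scraping step and return all results in one dictionary."""
    news_title, news_p = scrape_news()
    return {
        "news_title": news_title,
        "news_p": news_p,
        "featured_image_url": scrape_featured_image(),
        "mars_weather": scrape_weather(),
        "fact_html": scrape_facts(),
        "hemisphere_img_urls": scrape_hemispheres(),
    }


mars_data = scrape()
print(sorted(mars_data))
```

Keeping each site in its own helper makes it easier to rerun or debug one step when a page layout changes.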

Next, create a route called /scrape that will import your scrape_mars.py script and call your scrape function.


Store the return value in Mongo as a Python dictionary.
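One possible way to store the dictionary is a single upserted document, so that each new scrape overwrites the previous one. The sketch below calls update_one with upsert=True, which is how a PyMongo collection performs an insert-or-update; to keep the example runnable without a database, it passes in FakeCollection, a hypothetical in-memory stand-in that mimics only that one call.

```python
def save_mars_data(collection, mars_data):
    """Insert the Mars document, or overwrite its fields if it already exists."""
    collection.update_one({}, {"$set": mars_data}, upsert=True)


class FakeCollection:
    """Hypothetical in-memory stand-in mimicking only update_one with $set."""

    def __init__(self):
        self.document = None

    def update_one(self, query, update, upsert=False):
        # Only the empty-query, single-document case is modeled here
        if self.document is None:
            if not upsert:
                return
            self.document = {}
        self.document.update(update["$set"])


collection = FakeCollection()
save_mars_data(collection, {"news_title": "Sample title"})
save_mars_data(collection, {"news_title": "Newer title"})
print(collection.document)
```

With a real database, collection would instead be a PyMongo collection object, for example one obtained from a MongoClient connection; the database and collection names are left to the application.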


Create a root route / that will query your Mongo database and pass the Mars data into an HTML template to display the data.
Create a template HTML file called index.html that will take the Mars data dictionary and display all of the data in the appropriate HTML elements. Use the following as a guide for what the final product should look like, but feel free to create your own design.
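The two routes described above could be wired together roughly as follows. This is a minimal sketch under several assumptions: a module-level dictionary named STORE stands in for the Mongo collection, the scrape function is a placeholder returning fixed sample data rather than the one in scrape_mars.py, and an inline template string is used in place of index.html so the example is self-contained.

```python
from flask import Flask, redirect, render_template_string

app = Flask(__name__)

# Hypothetical in-memory stand-in for the Mongo collection
STORE = {}

# Inline stand-in for templates/index.html
INDEX_TEMPLATE = """
<h1>Mission to Mars</h1>
<h2>{{ mars.news_title }}</h2>
<p>{{ mars.news_p }}</p>
<p>{{ mars.mars_weather }}</p>
"""


def scrape():
    """Placeholder for scrape_mars.scrape(); returns fixed sample data."""
    return {
        "news_title": "Sample news title",
        "news_p": "Sample news paragraph.",
        "mars_weather": "Sol 2037 (April 30, 2018), Sunny",
    }


@app.route("/")
def index():
    # Pass the stored Mars data into the template
    return render_template_string(INDEX_TEMPLATE, mars=STORE)


@app.route("/scrape")
def run_scrape():
    # Run the scraper, store the result, then return to the home page
    STORE.update(scrape())
    return redirect("/")


if __name__ == "__main__":
    app.run(debug=True)
```

Redirecting from /scrape back to / means the user sees the refreshed page immediately after a scrape completes.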