![Mission to Mars](https://www.intrepidmuseum.org/The-Intrepid-Experience/Past-Exhibitions/Mission-to-Mars-(Mars-Rover-exhibit)/images/mars_banner.aspx?width=676&height=268)

In [1]:
# Import the necessary libraries
# Beautiful Soup is a Python library for pulling data out of HTML files
from bs4 import BeautifulSoup 
# Requests library is the de facto standard for making HTTP requests in Python
import requests
# Splinter is an open source tool for testing web applications using Python
from splinter import Browser
from splinter.exceptions import ElementDoesNotExist
import time

## NASA Mars News

In [2]:
# Instantiate browser
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False) # otherwise defaults to FireFox

In [3]:
# URL of page to be scraped
url_to_scrape = "https://mars.nasa.gov/news/"
# Visit the url using browser.visit method
browser.visit(url_to_scrape)
# Time delay of 2 sec to make sure the browser loads
time.sleep(2)
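
A fixed two-second pause can be too short on a slow connection and wasteful on a fast one. As a rough sketch of an alternative, the helper below polls a condition until it holds or a timeout expires. The name `wait_until` and its parameters are hypothetical and not part of Splinter or this notebook.

```python
import time

def wait_until(condition, timeout=10.0, interval=0.25):
    """Call condition() repeatedly until it returns a truthy value.

    Returns True as soon as the condition holds, or False once
    `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Stand-in condition for illustration: becomes true on the third check
checks = []

def page_ready():
    checks.append(1)
    return len(checks) >= 3

print(wait_until(page_ready, timeout=2, interval=0.01))
```

In the notebook, the condition could be something like `lambda: 'content_title' in browser.html`. Splinter's own `is_element_present_by_css(..., wait_time=...)` may already cover this case, so treat the helper as an illustration of the idea rather than a required addition.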

In [4]:
html = browser.html
# Create BeautifulSoup object; parse with 'html.parser'
soup = BeautifulSoup(html, 'html.parser')

In [5]:
# Inspecting the title of the web page
print('\033[1m'+"Webpage Title: \n {}".format(soup.title.string) + '\033[0m')

Webpage Title: 
 News  – NASA’s Mars Exploration Program


### Scraping the first five news titles

In [6]:
# Inspecting the website, we see that news titles are wrapped in <div class="content_title">
results = soup.find_all('div', class_="content_title", limit = 5)

In [7]:
# Let's define a list to hold the titles
titles = []
# Loop through returned results
for result in results:
    # Error handling
    try:
        # Identify and return title of listing
        title = result.find('a').text
        # Print results only if title is available
        if (title):
            print("--------------------------------------------------------")
            print(title)
            titles.append(title) # Append the titles list            
    except AttributeError as e:
        print(e)

'NoneType' object has no attribute 'text'
--------------------------------------------------------
NASA's MAVEN Observes Martian Night Sky Pulsing in Ultraviolet Light
--------------------------------------------------------
8 Martian Postcards to Celebrate Curiosity's Landing Anniversary
--------------------------------------------------------
NASA, ULA Launch Mars 2020 Perseverance Rover Mission to Red Planet
--------------------------------------------------------
NASA's Perseverance Rover Will Carry First Spacesuit Materials to Mars


In [8]:
print('\033[1m'+"The first title: \n {}".format(titles[0]) + '\033[0m') 

The first title: 
 NASA's MAVEN Observes Martian Night Sky Pulsing in Ultraviolet Light


### Scraping the first five paragraph texts

In [9]:
# Inspecting the website, we see that paragraph texts are wrapped in <div class="article_teaser_body">
results = soup.find_all('div', class_="article_teaser_body", limit = 5)

In [10]:
# Let's define a list to hold the paragraph texts
texts = []
# Loop through returned results
for result in results:
    # Error handling
    try:
        # Identify and return the paragraph
        news_p = result.text
        # Print results only if paragraph is available
        if news_p:
            print("--------------------------------------------------------")
            print(news_p)
            texts.append(news_p) # Append to the paragraph list
    except AttributeError as e:
        print(e)

--------------------------------------------------------
Vast areas of the Martian night sky pulse in ultraviolet light, according to images from NASA’s MAVEN spacecraft. The results are being used to illuminate complex circulation patterns in the Martian atmosphere.
--------------------------------------------------------
The NASA rover touched down eight years ago, on Aug. 5, 2012, and will soon be joined by a second rover, Perseverance.
--------------------------------------------------------
The agency's Mars 2020 mission is on its way. It will land at Jezero Crater in about seven months, on Feb. 18, 2021. 
--------------------------------------------------------
In a Q&A, spacesuit designer Amy Ross explains how five samples, including a piece of helmet visor, will be tested aboard the rover, which is targeting a July 30 launch. 
--------------------------------------------------------
With a targeted launch date of July 30, the next robotic scientist NASA is sending to the to the

In [11]:
print('\033[1m'+"The first paragraph: \n {}".format(texts[0]) + '\033[0m') 

The first paragraph: 
 Vast areas of the Martian night sky pulse in ultraviolet light, according to images from NASA’s MAVEN spacecraft. The results are being used to illuminate complex circulation patterns in the Martian atmosphere.


In [12]:
# Closing browser using browser.quit:
browser.quit()

## JPL Mars Space Images - Featured Image

In [13]:
# Instantiate browser
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False) 

In [14]:
# URL of page to be scraped
url_to_scrape = "https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars"
# Visit the url using browser.visit method
browser.visit(url_to_scrape)
# Time delay of 2 sec to make sure the browser loads
time.sleep(2)
# Find the element with id "full_image" and click it
button = browser.find_by_id("full_image")
button.click()

In [15]:
html = browser.html
# Create BeautifulSoup object; parse with 'html.parser'
soup = BeautifulSoup(html, 'html.parser')

In [16]:
# Inspecting the title of the web page
print('\033[1m'+"Webpage Title: \n {}".format(soup.title.string) + '\033[0m')

Webpage Title: 
 Space Images


### Featured image url

In [17]:
# Uncomment the following line if cssutils is not installed
#!pip install cssutils
import cssutils
# Inspecting the website we see that background image is wrapped as <article class="carousel_item">
article_style = soup.find('article')['style']
url = cssutils.parseStyle(article_style)['background-image']
print(url)

url(/spaceimages/images/wallpaper/PIA14106-1920x1200.jpg)


In [18]:
# Strip the CSS url( ... ) wrapper so that only the image path remains
url = url.replace('url(','').replace(')','')
# Combine with base_url to create image url
base_url = 'https://www.jpl.nasa.gov'
# Create the url for the background image
image_url = base_url + url
print('\033[1m'+"Featured image url: \n {}".format(image_url) + '\033[0m') # Display the url

Featured image url: 
 https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA14106-1920x1200.jpg
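
The two steps above (pull the path out of the `background-image` declaration, then prefix the site root) can also be sketched with the standard library alone. This is not the notebook's method, which relies on `cssutils` and string replacement. It swaps in a regular expression and `urllib.parse.urljoin`, and the function name `background_image_url` is made up for illustration.

```python
import re
from urllib.parse import urljoin

# Matches background-image: url(...) with optional quotes around the path
_BG_IMAGE = re.compile(r"background-image\s*:\s*url\(\s*['\"]?(.*?)['\"]?\s*\)")

def background_image_url(style, base_url):
    """Return the absolute url of the background image in an inline style.

    Returns None when the style has no background-image declaration.
    """
    match = _BG_IMAGE.search(style)
    if match is None:
        return None
    # urljoin resolves the (possibly relative) path against the site root
    return urljoin(base_url, match.group(1))

style = "background-image: url('/spaceimages/images/wallpaper/PIA14106-1920x1200.jpg');"
print(background_image_url(style, "https://www.jpl.nasa.gov"))
```

The pattern assumes the path itself contains no closing parenthesis. A full CSS parser such as `cssutils` is the safer choice for arbitrary stylesheets.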


In [19]:
import requests
import shutil
from IPython.display import Image
print(image_url)
# Using the requests library to download and save the image from the featured image url
response = requests.get(image_url, stream=True)
with open('Screenshots/Featured_image.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
# Display the image with IPython.display
Image(url='Screenshots/Featured_image.png')

https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA14106-1920x1200.jpg
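
The featured image url ends in `.jpg`, while the cell above saves the bytes under a `.png` name. One way to keep the saved name in line with the source is to take the extension from the url itself. The sketch below shows that idea only. `extension_from_url` and `local_image_path` are hypothetical helpers, and the `.jpg` fallback is an assumption.

```python
import posixpath
from urllib.parse import urlparse

def extension_from_url(url, default=".jpg"):
    """Return the file extension of the url's path, e.g. '.jpg'."""
    # urlparse drops any query string before the extension is read
    ext = posixpath.splitext(urlparse(url).path)[1]
    return ext.lower() or default

def local_image_path(url, stem, folder="Screenshots"):
    """Build a local path whose extension matches the remote file."""
    return f"{folder}/{stem}{extension_from_url(url)}"

featured = "https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA14106-1920x1200.jpg"
print(local_image_path(featured, "Featured_image"))
```

The resulting path could then be passed to `open(..., 'wb')` in place of the hard-coded file name.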


In [20]:
# Closing browser using browser.quit:
browser.quit()

## Mars Weather

In [21]:
# Instantiate browser
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False) 

In [22]:
# URL of page to be scraped
url_to_scrape = "https://twitter.com/marswxreport?lang=en"
# Visit the url using browser.visit method
browser.visit(url_to_scrape)
# Time delay of 2 sec to make sure the browser loads
time.sleep(2)

In [23]:
html = browser.html
# Create BeautifulSoup object; parse with 'html.parser'
soup = BeautifulSoup(html, 'html.parser')

In [24]:
# Inspecting the title of the web page
print('\033[1m'+"Webpage Title: \n {}".format(soup.title.string) + '\033[0m')

Webpage Title: 
 Mars Weather (@MarsWxReport) / Twitter


### Scraping the first five tweets

In [25]:
results = soup.find_all('div',attrs={"data-testid":"tweet"}, limit = 5)

In [26]:
# Let's define a list to hold the tweets
tweets = []
# Loop through returned results
for result in results:
    # Error handling
    try:
        # Identify and return the tweet
        tweet = result.text
        tweets.append(tweet) # Append to the list
        # Print results only if tweet is available
        if tweet :
            print("--------------------------------------------------------")
            print(tweet)
    except AttributeError as e:
        print(e)

--------------------------------------------------------
Mars Weather@MarsWxReport·12hInSight sol 602 (2020-08-05) low -91.5ºC (-132.6ºF) high -9.5ºC (14.9ºF)
winds from the W at 5.4 m/s (12.2 mph) gusting to 17.0 m/s (38.1 mph)
pressure at 7.90 hPa221
--------------------------------------------------------
Mars Weather@MarsWxReport·Aug 5InSight sol 601 (2020-08-05) low -91.6ºC (-132.9ºF) high -10.6ºC (12.9ºF)
winds from the W at 6.0 m/s (13.4 mph) gusting to 16.0 m/s (35.7 mph)
pressure at 7.80 hPa1416
--------------------------------------------------------
Mars Weather@MarsWxReport·Aug 5InSight sol 600 (2020-08-03) low -107.6ºC (-161.7ºF) high -5.7ºC (21.7ºF)
winds from the W at 5.6 m/s (12.5 mph) gusting to 15.2 m/s (34.0 mph)
pressure at 7.90 hPa618
--------------------------------------------------------
Mars Weather@MarsWxReport·Aug 4InSight sol 599 (2020-08-02) low -91.8ºC (-133.2ºF) high -42.6ºC (-44.8ºF)
winds from the WNW at 5.1 m/s (11.5 mph) gusting to 15.6 m/s (34.8 mph)

In [27]:
# Let's make sure that the first tweet contains weather information
for tweet in tweets:
    # Each keyword needs its own membership test
    if 'sol' in tweet and 'low' in tweet:
        first_tweet = tweet
        break
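
When a string is checked for several keywords, each keyword needs its own `in` test: Python reads `'sol' and 'low' in text` as `'sol' and ('low' in text)`, so only the last keyword is actually checked. The sketch below contrasts the two forms on made-up sample strings. `mentions_all` is a hypothetical helper name.

```python
def mentions_all(text, keywords):
    """Return True only if every keyword occurs in text."""
    return all(word in text for word in keywords)

# Made-up sample strings, for illustration only
samples = [
    "Dust storm update: visibility is low",
    "InSight sol 602 low -91.5 high -9.5",
]

# The chained form only tests 'low', so both samples pass
chained = [text for text in samples if 'sol' and 'low' in text]
# The explicit form requires both keywords, so only the second passes
explicit = [text for text in samples if mentions_all(text, ("sol", "low"))]

print(len(chained), len(explicit))
```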

In [28]:
# Remove the header (account name, handle, timestamp) that precedes the weather report
first_tweet = first_tweet[first_tweet.index('InSight'):]

In [29]:
print('\033[1m'+"The first tweet: \n {}".format(first_tweet) + '\033[0m') 

The first tweet: 
 InSight sol 602 (2020-08-05) low -91.5ºC (-132.6ºF) high -9.5ºC (14.9ºF)
winds from the W at 5.4 m/s (12.2 mph) gusting to 17.0 m/s (38.1 mph)
pressure at 7.90 hPa221
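
The tweet text still carries trailing digits (apparently reply and like counts fused onto the pressure value). If individual readings are needed later, a regular expression can pick them out. The following is a minimal sketch that assumes every report follows the layout of the tweet above. `parse_weather` and the field names are made up for illustration.

```python
import re

# Assumed layout: "sol N (date) low X?C (...) high Y?C ... pressure at P hPa"
WEATHER_PATTERN = re.compile(
    r"sol (?P<sol>\d+) \((?P<date>\d{4}-\d{2}-\d{2})\) "
    r"low (?P<low_c>-?\d+(?:\.\d+)?)\SC .*?"
    r"high (?P<high_c>-?\d+(?:\.\d+)?)\SC.*?"
    r"pressure at (?P<pressure_hpa>\d+\.\d+) hPa",
    re.DOTALL,  # the report spans several lines
)

def parse_weather(tweet):
    """Return the readings of a weather tweet as a dict, or None if no match."""
    match = WEATHER_PATTERN.search(tweet)
    if match is None:
        return None
    return {
        "sol": int(match["sol"]),
        "date": match["date"],
        "low_c": float(match["low_c"]),
        "high_c": float(match["high_c"]),
        "pressure_hpa": float(match["pressure_hpa"]),
    }

sample_tweet = (
    "InSight sol 602 (2020-08-05) low -91.5ºC (-132.6ºF) high -9.5ºC (14.9ºF)\n"
    "winds from the W at 5.4 m/s (12.2 mph) gusting to 17.0 m/s (38.1 mph)\n"
    "pressure at 7.90 hPa221"
)
print(parse_weather(sample_tweet))
```

A tweet that lacks any of the fields, such as one cut off before the pressure line, yields `None` rather than a partial result.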


In [30]:
# Closing browser using browser.quit:
browser.quit()

## Mars Facts

In [31]:
# Instantiate browser
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False) 

In [32]:
# URL of page to be scraped
url_to_scrape = "https://space-facts.com/mars/"
# Visit the url using browser.visit method
browser.visit(url_to_scrape)
# Time delay of 2 sec to make sure the browser loads
time.sleep(2)

In [33]:
html = browser.html
# Create BeautifulSoup object; parse with 'html.parser'
soup = BeautifulSoup(html, 'html.parser')

In [34]:
# Inspecting the title of the web page
print('\033[1m'+"Webpage Title: \n {}".format(soup.title.string) + '\033[0m')

Webpage Title: 
 Mars Facts - Interesting Facts about Planet Mars


In [35]:
# Inspecting the website, we see that the facts table is <table id="tablepress-p-mars">
mars_table = soup.find('table', attrs={"id": "tablepress-p-mars"})

In [36]:
# Let's check the rows in mars_table
table_rows = mars_table.find_all('tr')
print('\033[1m'+"Total number of rows in the table: {}".format(len(table_rows))+'\033[0m')

Total number of rows in the table: 9


In [37]:
import pandas as pd
# Collect the non-empty cell texts of each table row
row_values = []
for row in table_rows:
    cells = row.find_all('td') # finding the elements in each row
    values = [cell.text.strip() for cell in cells if cell.text.strip()]
    if values:
        row_values.append(values) # Adding elements
# Initiate column names for the dataframe
column_names = ["Mars Facts", "Value"]
# Create the initial pandas dataframe
mars_df = pd.DataFrame(row_values, columns=column_names)
mars_df

| | Mars Facts | Value |
|---|---|---|
| 0 | Equatorial Diameter: | 6,792 km |
| 1 | Polar Diameter: | 6,752 km |
| 2 | Mass: | 6.39 × 10^23 kg (0.11 Earths) |
| 3 | Moons: | 2 (Phobos & Deimos) |
| 4 | Orbit Distance: | 227,943,824 km (1.38 AU) |
| 5 | Orbit Period: | 687 days (1.9 years) |
| 6 | Surface Temperature: | -87 to -5 °C |
| 7 | First Record: | 2nd millennium BC |
| 8 | Recorded By: | Egyptian astronomers |


In [38]:
# Create html table
html_table = mars_df.to_html(classes='table table-striped',index=False)
# Uncomment the following line to see the html_table
#print(html_table)

In [39]:
# Closing browser using browser.quit:
browser.quit()

## Mars Hemispheres

In [40]:
# Instantiate browser
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False) 

In [41]:
# Base url for the webpage
base_url = "https://astrogeology.usgs.gov"
# URL of page to be scraped
url_to_scrape = "https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars"
# Visit the url using browser.visit method
browser.visit(url_to_scrape)
# Time delay of 2 sec to make sure the browser loads
time.sleep(2)

In [42]:
html = browser.html
# Create BeautifulSoup object; parse with 'html.parser'
soup = BeautifulSoup(html, 'html.parser')

In [43]:
# Inspecting the title of the web page
print('\033[1m'+"Webpage Title: \n {}".format(soup.title.string) + '\033[0m')

Webpage Title: 
 Astropedia Search Results | USGS Astrogeology Science Center


In [44]:
# Let's check the number of links in the webpage
links = soup.find_all('a')
print('\033[1m'+"Total number of links in the webpage: {}".format(len(links)) + '\033[0m')

Total number of links in the webpage: 24


In [45]:
# Inspecting the website we see that images are wrapped in <div class="item">
results = soup.find_all('div', class_="item")

### Title and links to the Mars Hemisphere images

In [46]:
# Define lists to store the title and urls
titles = []
link_urls = []
# Loop through returned results
for result in results:
    # Error handling
    try:
        # Identify and return text of image headline
        title = result.find('h3').text
        # Identify and return link of image
        link = result.find('a', class_='itemLink product-item')['href']
        # Create image url by appending to the base url
        image_url = base_url + link
        # Print results only if title and link are available
        if (title and link):
            print("--------------------------------------------------------")
            print(title)
            titles.append(title)
            print("--------------------------------------------------------")
            print(image_url)
            link_urls.append(image_url)
            
    except AttributeError as e:
        print(e)

--------------------------------------------------------
Cerberus Hemisphere Enhanced
--------------------------------------------------------
https://astrogeology.usgs.gov/search/map/Mars/Viking/cerberus_enhanced
--------------------------------------------------------
Schiaparelli Hemisphere Enhanced
--------------------------------------------------------
https://astrogeology.usgs.gov/search/map/Mars/Viking/schiaparelli_enhanced
--------------------------------------------------------
Syrtis Major Hemisphere Enhanced
--------------------------------------------------------
https://astrogeology.usgs.gov/search/map/Mars/Viking/syrtis_major_enhanced
--------------------------------------------------------
Valles Marineris Hemisphere Enhanced
--------------------------------------------------------
https://astrogeology.usgs.gov/search/map/Mars/Viking/valles_marineris_enhanced


### Mars Hemisphere Images

In [47]:
import time
# Create a list to hold the image urls
title_urls = []
for url in link_urls:
    # Visit the url using browser.visit method
    browser.visit(url)
    # Instantiate button click
    button = browser.find_by_id("wide-image-toggle")
    button.click()
    # Set delay for 1s to make sure the webpage loads correctly
    time.sleep(1)
    # Grab the html of the page now that the wide image is shown
    html = browser.html
    # Create BeautifulSoup object; parse with 'html.parser'
    soup = BeautifulSoup(html, 'html.parser')
    # Find the url for the wide-image
    img_url = soup.find('img',class_="wide-image")['src']
    # Combine with the base_url to create the correct url
    image_url = base_url + img_url
    print(image_url)
    # Append the list
    title_urls.append(image_url)

https://astrogeology.usgs.gov/cache/images/f5e372a36edfa389625da6d0cc25d905_cerberus_enhanced.tif_full.jpg
https://astrogeology.usgs.gov/cache/images/3778f7b43bbbc89d6e3cfabb3613ba93_schiaparelli_enhanced.tif_full.jpg
https://astrogeology.usgs.gov/cache/images/555e6403a6ddd7ba16ddb0e471cadcf7_syrtis_major_enhanced.tif_full.jpg
https://astrogeology.usgs.gov/cache/images/b3c7c6c9138f57b4756be9b9c43e3a48_valles_marineris_enhanced.tif_full.jpg


In [48]:
# Let's create a list of dictionaries, pairing each title with its image url
hemisphere_urls = [
    {"title": title, "img_url": img_url}
    for title, img_url in zip(titles, title_urls)
]
# Display the list of dictionaries
print(hemisphere_urls)

[{'title': 'Cerberus Hemisphere Enhanced', 'img_url': 'https://astrogeology.usgs.gov/cache/images/f5e372a36edfa389625da6d0cc25d905_cerberus_enhanced.tif_full.jpg'}, {'title': 'Schiaparelli Hemisphere Enhanced', 'img_url': 'https://astrogeology.usgs.gov/cache/images/3778f7b43bbbc89d6e3cfabb3613ba93_schiaparelli_enhanced.tif_full.jpg'}, {'title': 'Syrtis Major Hemisphere Enhanced', 'img_url': 'https://astrogeology.usgs.gov/cache/images/555e6403a6ddd7ba16ddb0e471cadcf7_syrtis_major_enhanced.tif_full.jpg'}, {'title': 'Valles Marineris Hemisphere Enhanced', 'img_url': 'https://astrogeology.usgs.gov/cache/images/b3c7c6c9138f57b4756be9b9c43e3a48_valles_marineris_enhanced.tif_full.jpg'}]


In [49]:
import requests
import shutil
from IPython.display import Image
# Use the requests library to download and save the image from the first url
response = requests.get(title_urls[0], stream=True)
with open('Screenshots/Cerberus.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
# Display the image with IPython.display
Image(url='Screenshots/Cerberus.png')

In [50]:
# Use the requests library to download and save the image from the second url
response = requests.get(title_urls[1], stream=True)
with open('Screenshots/Schiaparelli.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
# Display the image with IPython.display
Image(url='Screenshots/Schiaparelli.png')

In [51]:
# Use the requests library to download and save the image from the third url
response = requests.get(title_urls[2], stream=True)
with open('Screenshots/Syrtis_major.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
# Display the image with IPython.display
Image(url='Screenshots/Syrtis_major.png')

In [52]:
# Use the requests library to download and save the image from the fourth url
response = requests.get(title_urls[3], stream=True)
with open('Screenshots/Valles_marineris.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
# Display the image with IPython.display
Image(url='Screenshots/Valles_marineris.png')
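
The four download cells above differ only in the list index and the file name, so they could be driven by one loop. As a sketch of the bookkeeping part, the code below derives each file name from the hemisphere title and pairs it with the image url. `title_to_filename` and `plan_downloads` are hypothetical helpers. The `.png` default simply mirrors the names used above, and the urls in the sample data are shortened placeholders.

```python
def title_to_filename(title, suffix=" Hemisphere Enhanced", ext=".png"):
    """Turn 'Syrtis Major Hemisphere Enhanced' into 'Syrtis_major.png'."""
    name = title[:-len(suffix)] if title.endswith(suffix) else title
    # capitalize() upper-cases the first letter and lower-cases the rest
    return name.strip().replace(" ", "_").capitalize() + ext

def plan_downloads(items, folder="Screenshots"):
    """Pair each image url with the local path it should be saved to."""
    return [
        (item["img_url"], f"{folder}/{title_to_filename(item['title'])}")
        for item in items
    ]

# Placeholder data in the same shape as hemisphere_urls
sample_items = [
    {"title": "Cerberus Hemisphere Enhanced", "img_url": "https://example.com/cerberus.jpg"},
    {"title": "Syrtis Major Hemisphere Enhanced", "img_url": "https://example.com/syrtis.jpg"},
]
for url, path in plan_downloads(sample_items):
    print(url, "->", path)
```

In the notebook, the loop body would hold the same `requests.get(url, stream=True)` and `shutil.copyfileobj` calls as the cells above, with `path` in place of the hard-coded file name.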

In [53]:
# Closing browser using browser.quit:
browser.quit()

## End of notebook