# Web Scraping - Mission to Mars

## NASA Mars News

In [40]:
from bs4 import BeautifulSoup as bs
import requests
import pymongo

In [11]:
conn = "mongodb://localhost:27017"
client = pymongo.MongoClient(conn)

In [12]:
# Define the database and collection
db = client.mars_db
collection = db.articles

In [13]:
# URL of page to be scraped
news_url = "https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest"


In [29]:
response = requests.get(news_url)
soup = bs(response.text, "html.parser")


In [28]:
results = soup.find_all("div", class_ = "slide")
results

[<div class="slide">
 <div class="image_and_description_container">
 <a href="/news/8695/the-launch-is-approaching-for-nasas-next-mars-rover-perseverance/">
 <div class="rollover_description">
 <div class="rollover_description_inner">
 The Red Planet's surface has been visited by eight NASA spacecraft. The ninth will be the first that includes a roundtrip ticket in its flight plan. 
 </div>
 <div class="overlay_arrow">
 <img alt="More" src="/assets/overlay-arrow.png"/>
 </div>
 </div>
 <img alt="The Launch Is Approaching for NASA's Next Mars Rover, Perseverance" class="img-lazy" data-lazy="/system/news_items/list_view_images/8695_24732_PIA23499-226.jpg" src="/assets/loading_320x240.png"/>
 </a>
 </div>
 <div class="content_title">
 <a href="/news/8695/the-launch-is-approaching-for-nasas-next-mars-rover-perseverance/">
 The Launch Is Approaching for NASA's Next Mars Rover, Perseverance
 </a>
 </div>
 </div>, <div class="slide">
 <div class="image_and_description_container">
 <a href="/n

In [31]:
# loop over results to get news data
for result in results:
    # scrape the article title and the paragraph
    news_title = result.find('div', class_='content_title').text    
    news_p = result.find('div', class_='rollover_description_inner').text

    # print article data
    print('-----------------')
    print(news_title)
    print(news_p)

    # Dictionary to be inserted into MongoDB
    post = {
        "title": news_title,
        "paragraph": news_p
    }
    
    # Inserting dictionary into MongoDB as a document
    collection.insert_one(post)

-----------------


The Launch Is Approaching for NASA's Next Mars Rover, Perseverance



The Red Planet's surface has been visited by eight NASA spacecraft. The ninth will be the first that includes a roundtrip ticket in its flight plan. 

-----------------


NASA to Hold Mars 2020 Perseverance Rover Launch Briefing



Learn more about the agency's next Red Planet mission during a live event on June 17.

-----------------


Alabama High School Student Names NASA's Mars Helicopter



Vaneeza Rupani's essay was chosen as the name for the small spacecraft, which will mark NASA's first attempt at powered flight on another planet.

-----------------


Mars Helicopter Attached to NASA's Perseverance Rover



The team also fueled the rover's sky crane to get ready for this summer's history-making launch.

-----------------


NASA's Perseverance Mars Rover Gets Its Wheels and Air Brakes



After the rover was shipped from JPL to Kennedy Space Center, the team is getting closer to finalizing t

In [32]:
# Displaying articles
articles = db.articles.find()
for article in articles:
    print(article)

{'_id': ObjectId('5ef27fed2696e10300f23811'), 'title': "\n\nThe Launch Is Approaching for NASA's Next Mars Rover, Perseverance\n\n", 'paragraph': "\nThe Red Planet's surface has been visited by eight NASA spacecraft. The ninth will be the first that includes a roundtrip ticket in its flight plan. \n"}
{'_id': ObjectId('5ef27fed2696e10300f23812'), 'title': '\n\nNASA to Hold Mars 2020 Perseverance Rover Launch Briefing\n\n', 'paragraph': "\nLearn more about the agency's next Red Planet mission during a live event on June 17.\n"}
{'_id': ObjectId('5ef27fed2696e10300f23813'), 'title': "\n\nAlabama High School Student Names NASA's Mars Helicopter\n\n", 'paragraph': "\nVaneeza Rupani's essay was chosen as the name for the small spacecraft, which will mark NASA's first attempt at powered flight on another planet.\n"}
{'_id': ObjectId('5ef27fed2696e10300f23814'), 'title': "\n\nMars Helicopter Attached to NASA's Perseverance Rover\n\n", 'paragraph': "\nThe team also fueled the rover's sky crane
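
The stored documents above keep the leading and trailing newlines that `.text` returns, and rerunning the insert cell would add the same articles again. The following is a minimal sketch of one way to handle both, assuming the article title is a reasonable unique key. `clean_text` and `upsert_args` are hypothetical helpers, not part of the original notebook.

```python
def clean_text(raw):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(raw.split())


def upsert_args(title, paragraph):
    """Build the (filter, update) pair for an upsert keyed on the cleaned title."""
    post = {"title": clean_text(title), "paragraph": clean_text(paragraph)}
    return {"title": post["title"]}, {"$set": post}


# Values copied from the stored documents shown above
raw_title = "\n\nNASA to Hold Mars 2020 Perseverance Rover Launch Briefing\n\n"
raw_p = "\nLearn more about the agency's next Red Planet mission during a live event on June 17.\n"

query, update = upsert_args(raw_title, raw_p)
print(query)
print(update)

# With a live connection, the pair could be sent as:
# collection.update_one(query, update, upsert=True)
```

With `upsert=True`, pymongo updates the matching document if one exists and inserts it otherwise, so repeated runs should not pile up duplicates.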

## JPL Mars Space Images - Featured image

In [10]:
from splinter import Browser

In [11]:
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)

In [12]:
browser.visit("https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars")

In [13]:
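# browser.links.find_by_partial_text replaces the deprecated click_link_by_partial_text
browser.links.find_by_partial_text("FULL IMAGE").click()
browser.links.find_by_partial_text("more info").click()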



In [14]:
html_image = browser.html
soup_image = bs(html_image, "lxml")
image = soup_image.find_all("img", class_ = "main_image")
src = image[0]["src"]

In [15]:
base_url = "https://www.jpl.nasa.gov"
featured_image_url = base_url + src

print("The URL of the largest size featured image is:")
print(featured_image_url)

The URL of the largest size featured image is:
https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA17832_hires.jpg
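
Plain string concatenation works here because `src` is a root-relative path. As a hedged alternative, `urllib.parse.urljoin` from the standard library gives the same result and also copes with a `src` that is already a full URL. The second value below is a hypothetical case, not something observed on the page.

```python
from urllib.parse import urljoin

base_url = "https://www.jpl.nasa.gov"

# Root-relative path, as in the notebook's output above
relative_src = "/spaceimages/images/largesize/PIA17832_hires.jpg"

# Hypothetical case: the src attribute already holds a full URL
absolute_src = "https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA17832_hires.jpg"

from_relative = urljoin(base_url, relative_src)
from_absolute = urljoin(base_url, absolute_src)

print(from_relative)
print(from_absolute)
```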


## Mars Weather

In [6]:
from bs4 import BeautifulSoup as bs
import requests

In [7]:
# URL of page to be scraped
weather_url = "https://twitter.com/marswxreport?lang=en"

In [8]:
response_w = requests.get(weather_url)
soup_w = bs(response_w.text, "html.parser")

In [9]:
results_w = soup_w.find_all("span")
results_w

[<span class="css-901oao css-16my406" dir="auto">Something went wrong, but don’t fret — let’s give it another shot.</span>]

In [5]:
soup_w

<!DOCTYPE html>
<html dir="ltr" lang="en">
<head><meta charset="utf-8"/>
<meta content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0,viewport-fit=cover" name="viewport"/>
<link href="//abs.twimg.com" rel="preconnect"/>
<link href="//api.twitter.com" rel="preconnect"/>
<link href="//pbs.twimg.com" rel="preconnect"/>
<link href="//t.co" rel="preconnect"/>
<link href="//video.twimg.com" rel="preconnect"/>
<link href="//abs.twimg.com" rel="dns-prefetch"/>
<link href="//api.twitter.com" rel="dns-prefetch"/>
<link href="//pbs.twimg.com" rel="dns-prefetch"/>
<link href="//t.co" rel="dns-prefetch"/>
<link href="//video.twimg.com" rel="dns-prefetch"/>
<link as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/web/polyfills.675e3184.js" nonce="MzM1ZGVjM2EtNDI5MC00MWU1LTljMWEtOWY3ZGJiNjM1ZmU3" rel="preload"/>
<link as="script" crossorigin="anonymous" href="https://abs.twimg.com/responsive-web/web/vendors~main.805584a4.js" nonce="MzM1ZGVjM2EtNDI5MC00

In [None]:
mars_weather = ""
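
The `requests` response above contains a single span holding an error message, and the page head preloads script bundles, which suggests the tweets are rendered in the browser by JavaScript and never appear in the static HTML. The sketch below is a small guard that makes this failure explicit instead of leaving `mars_weather` silently empty. `only_placeholder` is a hypothetical helper, and the sample list is a shortened stand-in for the span text shown above.

```python
PLACEHOLDER = "Something went wrong"  # opening words of the only span returned above


def only_placeholder(span_texts):
    """Return True when the spans hold nothing besides the error placeholder."""
    meaningful = [t.strip() for t in span_texts if t.strip()]
    # An empty list also counts as "no usable content"
    return all(PLACEHOLDER in t for t in meaningful)


# In the notebook this list would come from: [s.text for s in results_w]
span_texts = [PLACEHOLDER]

if only_placeholder(span_texts):
    mars_weather = None  # mark the value as missing rather than ""
    print("No tweet text found; the page probably needs a JavaScript-capable browser.")
else:
    mars_weather = span_texts[0]
```

A browser-driven fetch, such as the Splinter session used for the featured image, may return the rendered page, although that is not verified here.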

## Mars Facts

In [26]:
import pandas as pd

In [27]:
url = "https://space-facts.com/mars/"

In [28]:
tables = pd.read_html(url)
data_table = tables[0]

In [29]:
data_table

|   | 0 | 1 |
|---|---|---|
| 0 | Equatorial Diameter: | 6,792 km |
| 1 | Polar Diameter: | 6,752 km |
| 2 | Mass: | 6.39 × 10^23 kg (0.11 Earths) |
| 3 | Moons: | 2 (Phobos & Deimos) |
| 4 | Orbit Distance: | 227,943,824 km (1.38 AU) |
| 5 | Orbit Period: | 687 days (1.9 years) |
| 6 | Surface Temperature: | -87 to -5 °C |
| 7 | First Record: | 2nd millennium BC |
| 8 | Recorded By: | Egyptian astronomers |


In [32]:
data_table.to_html("fact_table.html", index = False)
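
`pd.read_html` leaves the columns labeled `0` and `1`, so the exported HTML gets numeric headers. The sketch below names the columns before exporting, using two rows copied from the table above. The labels "Description" and "Value" are assumptions chosen for illustration, not taken from the source page.

```python
import pandas as pd

# Two rows copied from the facts table; default column labels are 0 and 1
data_table = pd.DataFrame([
    ["Equatorial Diameter:", "6,792 km"],
    ["Polar Diameter:", "6,752 km"],
])

# Assumed, human-readable labels for the two columns
data_table.columns = ["Description", "Value"]

# Without a file path, to_html returns the markup as a string
html = data_table.to_html(index=False)
print(html)
```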



## Mars Hemispheres

In [17]:
from bs4 import BeautifulSoup as bs
import requests

In [18]:
# URLs
base_url = "https://astrogeology.usgs.gov"
hemis_url = "https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars"

In [19]:
response = requests.get(hemis_url)
soup = bs(response.text, "html.parser")

In [20]:
# Getting the links of each image
image_url = []
images = soup.find_all("a", class_ = "itemLink product-item")

for link in images:
    image_url.append(link.get('href'))

In [21]:
# List to store the name and URL of each hemisphere
hemisphere_image_urls = []

# Get the name and full-size image URL of each hemisphere
for href in image_url:
    page_2 = base_url + href
    
    # Request the hemisphere's detail page
    response_2 = requests.get(page_2)
    soup_2 = bs(response_2.text, "html.parser")
    
    # Get the hemisphere name, dropping the last word of the heading
    name = soup_2.find("h2").text
    name = name.rsplit(' ', 1)[0]
    
    # Get the image URL
    image_url_2 = soup_2.find("img", class_ = "wide-image")
    full_image_url = base_url + image_url_2["src"]
    
    # Store the name and link as a dictionary in the list
    temp_dict = {"name": name, "link": full_image_url}
    hemisphere_image_urls.append(temp_dict)


In [25]:
print("The list of names and URLs of each Martian hemisphere is")
hemisphere_image_urls

The list of names and URLs of each Martian hemisphere is


[{'name': 'Cerberus Hemisphere',
  'link': 'https://astrogeology.usgs.gov/cache/images/f5e372a36edfa389625da6d0cc25d905_cerberus_enhanced.tif_full.jpg'},
 {'name': 'Schiaparelli Hemisphere',
  'link': 'https://astrogeology.usgs.gov/cache/images/3778f7b43bbbc89d6e3cfabb3613ba93_schiaparelli_enhanced.tif_full.jpg'},
 {'name': 'Syrtis Major Hemisphere',
  'link': 'https://astrogeology.usgs.gov/cache/images/555e6403a6ddd7ba16ddb0e471cadcf7_syrtis_major_enhanced.tif_full.jpg'},
 {'name': 'Valles Marineris Hemisphere',
  'link': 'https://astrogeology.usgs.gov/cache/images/b3c7c6c9138f57b4756be9b9c43e3a48_valles_marineris_enhanced.tif_full.jpg'}]
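
The names above come from `rsplit(' ', 1)[0]`, which drops the last word of the heading whatever that word is. The sketch below removes the last word only when it matches an expected suffix. The suffix "Enhanced" and the sample headings are assumptions inferred from the search query and image file names, since the notebook does not show the raw `<h2>` text. `hemisphere_name` is a hypothetical helper.

```python
def hemisphere_name(heading, suffix="Enhanced"):
    """Strip a trailing suffix word from the heading, leaving other headings intact."""
    words = heading.split()
    if words and words[-1] == suffix:
        words = words[:-1]
    return " ".join(words)


# Assumed heading text with the trailing word present
with_suffix = hemisphere_name("Cerberus Hemisphere Enhanced")

# A heading without the suffix is returned unchanged,
# whereas rsplit(' ', 1)[0] would shorten it to "Syrtis Major"
without_suffix = hemisphere_name("Syrtis Major Hemisphere")

print(with_suffix)
print(without_suffix)
```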

In [33]:
# Combined results: six news articles, the featured image URL, and the hemisphere list
a = [{'title': "\n\nThe Launch Is Approaching for NASA's Next Mars Rover, Perseverance\n\n", 'paragraph': "\nThe Red Planet's surface has been visited by eight NASA spacecraft. The ninth will be the first that includes a roundtrip ticket in its flight plan. \n"},
     {'title': '\n\nNASA to Hold Mars 2020 Perseverance Rover Launch Briefing\n\n', 'paragraph': "\nLearn more about the agency's next Red Planet mission during a live event on June 17.\n"},
     {'title': "\n\nAlabama High School Student Names NASA's Mars Helicopter\n\n", 'paragraph': "\nVaneeza Rupani's essay was chosen as the name for the small spacecraft, which will mark NASA's first attempt at powered flight on another planet.\n"},
     {'title': "\n\nMars Helicopter Attached to NASA's Perseverance Rover\n\n", 'paragraph': "\nThe team also fueled the rover's sky crane to get ready for this summer's history-making launch.\n"},
     {'title': "\n\nNASA's Perseverance Mars Rover Gets Its Wheels and Air Brakes\n\n", 'paragraph': '\nAfter the rover was shipped from JPL to Kennedy Space Center, the team is getting closer to finalizing the spacecraft for launch later this summer.\n'},
     {'title': "\n\n10.9 Million Names Now Aboard NASA's Perseverance Mars Rover\n\n", 'paragraph': "\nAs part of NASA's 'Send Your Name to Mars' campaign, they've been stenciled onto three microchips along with essays from NASA's 'Name the Rover' contest. Next stop: Mars.\n"},
     'https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA16239_hires.jpg',
     [{'name': 'Cerberus Hemisphere', 'link': 'https://astrogeology.usgs.gov/cache/images/f5e372a36edfa389625da6d0cc25d905_cerberus_enhanced.tif_full.jpg'}, {'name': 'Schiaparelli Hemisphere', 'link': 'https://astrogeology.usgs.gov/cache/images/3778f7b43bbbc89d6e3cfabb3613ba93_schiaparelli_enhanced.tif_full.jpg'}, {'name': 'Syrtis Major Hemisphere', 'link': 'https://astrogeology.usgs.gov/cache/images/555e6403a6ddd7ba16ddb0e471cadcf7_syrtis_major_enhanced.tif_full.jpg'}, {'name': 'Valles Marineris Hemisphere', 'link': 'https://astrogeology.usgs.gov/cache/images/b3c7c6c9138f57b4756be9b9c43e3a48_valles_marineris_enhanced.tif_full.jpg'}]]

In [49]:
a[5]

{'title': "\n\n10.9 Million Names Now Aboard NASA's Perseverance Mars Rover\n\n",
 'paragraph': "\nAs part of NASA's 'Send Your Name to Mars' campaign, they've been stenciled onto three microchips along with essays from NASA's 'Name the Rover' contest. Next stop: Mars.\n"}
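
Looking up `a[5]` works only if the reader knows how many news articles precede the image URL in the list. A possible alternative, sketched below, groups the results in a dictionary so each part is reached by name. The keys `news` and `hemispheres` are assumed names, and the values are a shortened, whitespace-trimmed subset of the data shown above.

```python
# One article, the image URL, and one hemisphere, copied from the results above
news = [
    {
        "title": "10.9 Million Names Now Aboard NASA's Perseverance Mars Rover",
        "paragraph": "As part of NASA's 'Send Your Name to Mars' campaign, they've been "
                     "stenciled onto three microchips along with essays from NASA's "
                     "'Name the Rover' contest. Next stop: Mars.",
    },
]
featured_image_url = "https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA16239_hires.jpg"
hemispheres = [
    {
        "name": "Cerberus Hemisphere",
        "link": "https://astrogeology.usgs.gov/cache/images/"
                "f5e372a36edfa389625da6d0cc25d905_cerberus_enhanced.tif_full.jpg",
    },
]

# Named keys replace positional indexes such as a[5] or a[6]
mars_data = {
    "news": news,
    "featured_image_url": featured_image_url,
    "hemispheres": hemispheres,
}

print(mars_data["news"][0]["title"])
print(mars_data["featured_image_url"])
print(mars_data["hemispheres"][0]["name"])
```

A dictionary of this shape could also be stored as a single MongoDB document, since it contains only strings, lists, and nested dictionaries.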