# Web Scraping Homework - Mission to Mars

We will build a web application that scrapes various websites for data related to the Mission to Mars and displays the information in a single HTML page. Complete the initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.

### Import Dependencies

In [1]:
import os
from bs4 import BeautifulSoup
import requests
from splinter import Browser

### Setup Splinter

In [2]:
# identify location of chromedriver and store it as a variable
driverPath = !which chromedriver

# Setup configuration variables to enable Splinter to interact with browser
executable_path = {'executable_path': driverPath[0]}
browser = Browser('chrome', **executable_path, headless=False)
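
The `!which chromedriver` line uses IPython shell syntax, so it only works inside a notebook. A minimal sketch of a plain-Python alternative follows, assuming chromedriver is installed somewhere on the `PATH`. The helper name `find_chromedriver` is hypothetical and not part of the original notebook.

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the full path to the driver binary, or None if it is not on PATH."""
    return shutil.which(name)


driver_path = find_chromedriver()
if driver_path is None:
    print("chromedriver was not found on PATH; install it or pass its path explicitly")
else:
    # Same dictionary shape that the cell above passes to Browser(...)
    executable_path = {"executable_path": driver_path}
    print(executable_path)
```

Checking for `None` before building `executable_path` avoids the less obvious failure that an empty `which` result would cause when indexing `driverPath[0]`.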

## Step 1 - Scraping

**Hint**: Use Splinter to navigate the sites when needed and BeautifulSoup to help find and parse out the necessary data.

### NASA Mars News

* Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

In [3]:
# URL of page to be scraped
# url_nasa = "https://mars.nasa.gov/news/"
url_nasa = "https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest"

# Retrieve page with the requests module
response = requests.get(url_nasa)

# Create BeautifulSoup object
soup = BeautifulSoup(response.text, 'html.parser')

# Print the parsed page to check whether the headlines are present
print(soup)

<!DOCTYPE html>

<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<!-- Always force latest IE rendering engine or request Chrome Frame -->
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
<!-- Responsiveness -->
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<!-- Favicon -->
<link href="/apple-touch-icon.png" rel="apple-touch-icon" sizes="180x180"/>
<link href="/favicon-32x32.png" rel="icon" sizes="32x32" type="image/png"/>
<link href="/favicon-16x16.png" rel="icon" sizes="16x16" type="image/png"/>
<link href="/manifest.json" rel="manifest"/>
<link color="#e48b55" href="/safari-pinned-tab.svg" rel="mask-icon"/>
<meta content="#000000" name="theme-color"/>
<meta content="authenticity_token" name="csrf-param">
<meta content="mb6EzCiAjS86aKkwQqAISR83fw89ms+ccbGmILQWDLrPrNsZzsr+8q7/6h+a6zln1M4KQiUUpePCU6whVmfMjQ==" name="csrf-token">
<title>News  – NASA’s M

Apparently the latest news items are not available when the page is scraped with a plain `requests` call, presumably because the page loads them dynamically after the initial HTML arrives.

#### Use Splinter to have the browser visit the page

In [4]:
# Use the browser to visit the url
browser.visit(url_nasa)
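
Because the headlines seem to appear only after the browser has rendered the page, it can help to wait for them before reading `browser.html`. The sketch below assumes the headlines live in `li.slide` elements (the selector used later in this notebook) and picks an arbitrary 5 second timeout. `wait_for_headlines` and `FakeBrowser` are hypothetical names introduced for illustration; only `is_element_present_by_css` is a Splinter method.

```python
def wait_for_headlines(browser, css_selector="li.slide", wait_time=5):
    """Return True once an element matching css_selector is present.

    Relies on Splinter's is_element_present_by_css, which polls the page
    for up to wait_time seconds before giving up.
    """
    return browser.is_element_present_by_css(css_selector, wait_time=wait_time)


class FakeBrowser:
    """Stand-in for a Splinter browser so the sketch runs without Chrome."""

    def __init__(self, present_selectors):
        self.present_selectors = set(present_selectors)

    def is_element_present_by_css(self, css_selector, wait_time=None):
        return css_selector in self.present_selectors


rendered = FakeBrowser({"li.slide"})
print(wait_for_headlines(rendered))            # True
print(wait_for_headlines(FakeBrowser(set())))  # False
```

In the notebook, the real `browser` object would be passed in place of `FakeBrowser`, right after `browser.visit(url_nasa)`.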

In [5]:
# Use BeautifulSoup to parse the page rendered by the browser
html_nasa = browser.html
soup = BeautifulSoup(html_nasa, 'html.parser')

In [6]:
# Print the body to search for the headlines
# print(soup.body)

In [7]:
# Show the li that contains the first headline
results = soup.find('li', class_="slide")
results
# print(results.prettify())

<li class="slide"><div class="image_and_description_container"><a href="/news/8782/sensors-on-mars-2020-spacecraft-answer-long-distance-call-from-earth/" target="_self"><div class="rollover_description"><div class="rollover_description_inner">Instruments tailored to collect data during the descent of NASA's next rover through the Red Planet's atmosphere have been checked in flight.</div><div class="overlay_arrow"><img alt="More" src="/assets/overlay-arrow.png"/></div></div><div class="list_image"><img alt="Mars 2020 heat shield and back shell prior to launch" src="/system/news_items/list_view_images/8782_PIA-23989-320.jpg"/></div><div class="bottom_gradient"><div><h3>Sensors on Mars 2020 Spacecraft Answer Long-Distance Call From Earth</h3></div></div></a><div class="list_text"><div class="list_date">October 22, 2020</div><div class="content_title"><a href="/news/8782/sensors-on-mars-2020-spacecraft-answer-long-distance-call-from-earth/" target="_self">Sensors on Mars 2020 Spacecraft An

In [8]:
# Assign the text to variables that you can reference later
news_title = results.find('h3').text
print(f"Title: {news_title}\n")

news_p = results.find('div', class_='article_teaser_body').text
print(f"{news_p}")

Title: Sensors on Mars 2020 Spacecraft Answer Long-Distance Call From Earth

Instruments tailored to collect data during the descent of NASA's next rover through the Red Planet's atmosphere have been checked in flight.


### JPL Mars Space Images - Featured Image

* Visit the url for JPL Featured Space Image [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars). 

* Use Splinter to navigate the site, find the image URL for the current Featured Mars Image, and assign the URL string to a variable called `featured_image_url`.

* Find the image url to the full size .jpg image. Make sure to save a complete url string for this image.

In [9]:
# Base URL of the JPL NASA website
url_jpl = "https://www.jpl.nasa.gov"

# The url for JPL Featured Space Image
space_images = "/spaceimages/?search=&category=Mars"

# Full URL
url_jpl_space_images = f"{url_jpl}{space_images}"
print(url_jpl_space_images)

# Use the browser to visit the url
browser.visit(url_jpl_space_images)

https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars


In [10]:
# Use BeautifulSoup to parse the page rendered by the browser
html_jpl = browser.html
soup = BeautifulSoup(html_jpl, 'html.parser')

In [11]:
# Print the body to search for the featured image
# print(soup.body)

In [12]:
# Featured image is in the div class="carousel_container"
results = soup.find('div', class_ = "carousel_container")
print(results)

<div class="carousel_container">
<div class="carousel_items">
<article alt="Installing Hubble's New Camera" class="carousel_item" style="background-image: url('/spaceimages/images/wallpaper/PIA22911-1920x1200.jpg');">
<div class="default floating_text_area ms-layer">
<h2 class="category_title">
</h2>
<h2 class="brand_title">
				  FEATURED IMAGE
				</h2>
<h1 class="media_feature_title">
				  Installing Hubble's New Camera				</h1>
<div class="description">
</div>
<footer>
<a class="button fancybox" data-description="This image of NASAs Hubble Space Telescope shows Astronaut Jeffrey Hoffman and Story Musgrave installing the Wide Field and Planetary Camera 2 (WFPC2) on the Hubble Space Telescope, during SM1 in December, 1993." data-fancybox-group="images" data-fancybox-href="/spaceimages/images/mediumsize/PIA22911_ip.jpg" data-link="/spaceimages/details.php?id=PIA22911" data-title="Installing Hubble's New Camera" id="full_image">
					FULL IMAGE
				  </a>
</footer>
</div>
<div class="

In [13]:
# Find the article
article = results.find('article')

# Grab the style string and split
style = article['style'].split("(")

# Retrieve the URL string, dropping the surrounding quotes
image_location = style[1].split(")")[0][1:-1]

# Compose the full url of the image
featured_image_url = f"{url_jpl}{image_location}"
print(featured_image_url)

https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA22911-1920x1200.jpg
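
The split-and-slice approach above depends on the exact quoting inside the `style` attribute. As a hedged alternative, the sketch below pulls the `url(...)` target out with a regular expression and joins it to the base URL with `urljoin`. The helper name `background_image_url` is hypothetical, and the sample `style` string is copied from the output shown earlier.

```python
import re
from urllib.parse import urljoin


def background_image_url(style, base_url):
    """Extract the url(...) target from an inline CSS style and make it absolute."""
    match = re.search(r"url\(\s*['\"]?(.*?)['\"]?\s*\)", style)
    if match is None:
        return None
    return urljoin(base_url, match.group(1))


style = "background-image: url('/spaceimages/images/wallpaper/PIA22911-1920x1200.jpg');"
print(background_image_url(style, "https://www.jpl.nasa.gov"))
```

The function returns `None` when the style has no `url(...)` entry, so the caller can detect a layout change instead of hitting an `IndexError`.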


### Mars Facts

### Mars Hemispheres