# Scraping and Analysis Tasks

* Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/)
* Collect the latest News Title and Paragraph Text
* Assign the text to variables that you can reference later.

### Data Sources

#### All the data were scraped from the following websites:

* [NASA Mars News Site](https://mars.nasa.gov/news/) - Scraped the latest News Title and Paragraph Text
* [JPL Featured Space Image](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars) - Scraped the image URL for the current Featured Mars Image
* [Mars Weather Twitter account](https://twitter.com/marswxreport?lang=en) - Scraped the latest Mars weather tweet
* [USGS Astrogeology site](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) - Scraped high-resolution images for each of Mars's hemispheres
* [Mars Facts webpage](https://space-facts.com/mars/) - Scraped the table containing facts about the planet, including Diameter, Mass, etc.

In [184]:
# Import Dependencies & Setup
from webdriver_manager.chrome import ChromeDriverManager
from flask import Flask, render_template, redirect
from bs4 import BeautifulSoup
from splinter import Browser
from pprint import pprint
from selenium import webdriver
import pandas as pd
import requests
import time
import pymongo

#### Windows users (older, manual way)

In [185]:
# Set Executable Path & Initialize Chrome Browser

# executable_path = {'executable_path': 'chromedriver.exe'}
# browser = Browser('chrome', **executable_path, headless=False)

#### Mac users (older, manual way)

In [186]:
# Set Executable Path & Initialize Chrome Browser
# executable_path = {"executable_path": "/usr/local/bin/chromedriver"}
# browser = Browser("chrome", **executable_path, headless=False)

### For Chrome

In [187]:
# Set up Splinter, which drives a real browser so that several pages can be visited and scraped
# Selenium needs a browser driver to automate Chrome or Firefox; ChromeDriverManager installs one
executable_path = {'executable_path': ChromeDriverManager().install()} 
browser = Browser('chrome', **executable_path, headless=False)

[WDM] - Current google-chrome version is 87.0.4280
[WDM] - Get LATEST driver version for 87.0.4280
[WDM] - Driver [C:\Users\cache\.wdm\drivers\chromedriver\win32\87.0.4280.88\chromedriver.exe] found in cache


 


### For Firefox

In [188]:
# from webdriver_manager.firefox import GeckoDriverManager
# driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())

# NASA Mars News

* Scrape the NASA Mars News Site and collect the latest:
    * News Title
    * Paragraph Text 
* Assign the text to variables that you can reference later.

In [189]:
# Set up the URL and open the page in the browser
nasa_url = 'https://mars.nasa.gov/news/'
browser.visit(nasa_url)

In [192]:
# Parse Results HTML with BeautifulSoup
nasa_html = browser.html
nasa_soup = BeautifulSoup(nasa_html, 'lxml')

# print(nasa_soup.prettify())

In [193]:
# Find the <ul> element that holds the list of news items
nasa_results = nasa_soup.find('ul', class_='item_list')

#nasa_results

### Scrape and collect the latest News "TITLE"

In [110]:
# Getting the Title
news_title = nasa_results.find("div", class_ = "content_title").text


# Printing Title
print("\n------------------------------------------------------------------------------------------------\n")
print(f"Title:\n\n{news_title}")
print("\n-------------------------------------------------------------------------------------------------\n")


------------------------------------------------------------------------------------------------

Title:

MOXIE Could Help Future Rockets Launch Off Mars

-------------------------------------------------------------------------------------------------



### Scrape and collect the "Paragraph Text"

In [113]:
news_parag = nasa_results.find("div", class_ = "article_teaser_body").text

# Printing paragraph text
print("\n-------------------------------------------------------------------------------------------------\n")
print(f"Paragraph:\n\n{news_parag}")
print("\n-------------------------------------------------------------------------------------------------\n")


-------------------------------------------------------------------------------------------------

Paragraph:

NASA's Perseverance rover carries a device to convert Martian air into oxygen that, if produced on a larger scale, could be used not just for breathing, but also for fuel.

-------------------------------------------------------------------------------------------------
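
The two lookups above call `.text` directly on the result of `find`, which raises an `AttributeError` if the element is missing (for example, when the page has not finished rendering). A minimal sketch of a more defensive version follows. The helper name `extract_latest_news` and the sample HTML are hypothetical; only the class names come from the cells above, and `html.parser` is used here so the sketch does not depend on `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for browser.html; only the class names match the notebook
SAMPLE_HTML = """
<ul class="item_list">
  <li class="slide">
    <div class="content_title"><a href="#">Example headline</a></div>
    <div class="article_teaser_body">Example teaser paragraph.</div>
  </li>
</ul>
"""


def extract_latest_news(html):
    """Return (title, paragraph) for the first news item, or None for missing parts."""
    soup = BeautifulSoup(html, "html.parser")
    item_list = soup.find("ul", class_="item_list")
    if item_list is None:
        # The list itself was not found, so nothing can be extracted
        return None, None
    title_tag = item_list.find("div", class_="content_title")
    teaser_tag = item_list.find("div", class_="article_teaser_body")
    title = title_tag.get_text(strip=True) if title_tag is not None else None
    paragraph = teaser_tag.get_text(strip=True) if teaser_tag is not None else None
    return title, paragraph


sample_title, sample_parag = extract_latest_news(SAMPLE_HTML)
print(sample_title)
print(sample_parag)
```

In the notebook, the same helper could presumably be called with `browser.html` instead of `SAMPLE_HTML`.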



# JPL Mars Space Images - Featured Image

* Visit the URL for the JPL Featured Space Image.
* Use Splinter to navigate the site and find the image URL for the current Featured Mars Image.
* Assign the URL string to a variable called `featured_image_url`.
* Make sure to find the image URL of the full-size `.jpg` image.
* Make sure to save a complete URL string for this image.


In [213]:
# Set up Splinter, which drives a real browser so that several pages can be visited and scraped
# Selenium needs a browser driver to automate Chrome or Firefox; ChromeDriverManager installs one
executable_path = {'executable_path': ChromeDriverManager().install()} 
browser = Browser('chrome', **executable_path, headless=False)

[WDM] - Current google-chrome version is 87.0.4280
[WDM] - Get LATEST driver version for 87.0.4280
[WDM] - Driver [C:\Users\cache\.wdm\drivers\chromedriver\win32\87.0.4280.88\chromedriver.exe] found in cache


 


In [214]:
# Set up the URL: store the JPL page URL in "featured_image_url" and open it in the browser

featured_image_url = "https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars"
browser.visit(featured_image_url)

In [215]:
# Confirm that the page is reachable (a 200 status means success)
response = requests.get(featured_image_url)
response

<Response [200]>

In [216]:
# Parse Results image HTML with BeautifulSoup
img_html = browser.html
img_soup = BeautifulSoup(img_html, 'lxml')

In [217]:
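# Base URL used to turn relative image paths into complete URLs
jpl_url = 'https://www.jpl.nasa.gov'

# Select every <img> tag in the parsed page; select() takes a CSS selector,
# so a class filter would be written as 'img.fancybox-image'
images = img_soup.select('img')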
images


[<img alt="" class="print_only print_logo" src="/assets/logo_mars_trio_black@2x.png"/>,
 <img alt="expand arrow" class="arrow_expand" src="/assets/arrow_down.png"/>,
 <img alt="More" src="/assets/overlay-arrow.png"/>,
 <img alt="Mars 2020 Perseverance Rover" class="mission_image" src="/system/missions/list_view_images/23_PIA23764-RoverNamePlateonMars-320x240.jpg"/>,
 <img alt="More" src="/assets/overlay-arrow.png"/>,
 <img alt="Curiosity Rover" class="mission_image" src="/system/missions/list_view_images/2_PIA14175-thmfeat.jpg"/>,
 <img alt="More" src="/assets/overlay-arrow.png"/>,
 <img alt="InSight Lander" class="mission_image" src="/system/missions/list_view_images/21_PIA22743-320x240.jpg"/>,
 <img alt="More" src="/assets/overlay-arrow.png"/>,
 <img alt="MAVEN" class="mission_image" src="/system/missions/list_view_images/6_maven_320x240.jpg"/>,
 <img alt="More" src="/assets/overlay-arrow.png"/>,
 <img alt="Mars Reconnaissance Orbiter" class="mission_image" src="/system/missions/list
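
In Beautiful Soup, `select` takes a CSS selector string, so filtering by class is written into the selector itself, while the `class_` keyword belongs to `find` and `find_all`. The sketch below contrasts the two on a small hypothetical HTML fragment (not the real page markup); the name `demo_soup` is also an assumption made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment used only to illustrate the two filtering styles
demo_html = """
<img class="mission_image" src="/a.jpg"/>
<img class="fancybox-image" src="/b.jpg"/>
<img src="/c.png"/>
"""
demo_soup = BeautifulSoup(demo_html, "html.parser")

# CSS selector: tag name and class combined in one string
by_selector = demo_soup.select("img.fancybox-image")

# Keyword filter: class_ is an argument of find/find_all
by_keyword = demo_soup.find_all("img", class_="fancybox-image")

# A bare tag selector returns every <img>, whatever its class
all_images = demo_soup.select("img")

print([img["src"] for img in by_selector])
print([img["src"] for img in by_keyword])
print(len(all_images))
```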

In [218]:
# Collect the src attribute of every image found
list_of_images = [img['src'] for img in images]
print(list_of_images)

['/assets/logo_mars_trio_black@2x.png', '/assets/arrow_down.png', '/assets/overlay-arrow.png', '/system/missions/list_view_images/23_PIA23764-RoverNamePlateonMars-320x240.jpg', '/assets/overlay-arrow.png', '/system/missions/list_view_images/2_PIA14175-thmfeat.jpg', '/assets/overlay-arrow.png', '/system/missions/list_view_images/21_PIA22743-320x240.jpg', '/assets/overlay-arrow.png', '/system/missions/list_view_images/6_maven_320x240.jpg', '/assets/overlay-arrow.png', '/system/missions/list_view_images/8_MRO_320x240.jpg', '/assets/overlay-arrow.png', '/system/missions/list_view_images/5_mars_odyssey320x240.jpg', '/assets/overlay-arrow.png', '/system/news_items/list_view_images/8805_1-MOXIE-PIA24176-320.gif', '/assets/overlay-arrow.png', '/system/news_items/list_view_images/8801_20201118_mars2020-320x240.jpg', '/assets/overlay-arrow.png', '/system/news_items/list_view_images/8798_PIA22109-320.jpg', '/assets/overlay-arrow.png', '/system/news_items/list_view_images/8797_maven_illo_v7-320.jp

In [219]:
# Take the fourth entry (index 3) of the list
mount = list_of_images[3]
mount

'/system/missions/list_view_images/23_PIA23764-RoverNamePlateonMars-320x240.jpg'
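
Selecting an image by position depends on the order of elements in the page, which may change between visits. As an alternative sketch, the list can be filtered by path prefix and file extension. The sample paths are copied from the output above; the helper name `first_matching_image` and the choice of `/system/missions/` as the prefix are assumptions made for illustration.

```python
# Sample paths copied from the list_of_images output above
sample_paths = [
    '/assets/logo_mars_trio_black@2x.png',
    '/assets/arrow_down.png',
    '/assets/overlay-arrow.png',
    '/system/missions/list_view_images/23_PIA23764-RoverNamePlateonMars-320x240.jpg',
    '/assets/overlay-arrow.png',
    '/system/missions/list_view_images/2_PIA14175-thmfeat.jpg',
]


def first_matching_image(paths, prefix, extension='.jpg'):
    """Return the first path that starts with prefix and ends with extension, else None."""
    for path in paths:
        if path.startswith(prefix) and path.lower().endswith(extension):
            return path
    return None


selected_path = first_matching_image(sample_paths, '/system/missions/')
print(selected_path)
```

Returning `None` when nothing matches lets the caller decide how to handle a page without a suitable image, instead of failing with an `IndexError`.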

In [225]:
# Request the image using the complete URL (base URL + relative path)
image_link = requests.get(jpl_url + mount)

<img src = 'https://www.jpl.nasa.gov/system/missions/list_view_images/23_PIA23764-RoverNamePlateonMars-320x240.jpg'>
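
The task asks for a complete URL string saved in `featured_image_url`. One way to build it from a base URL and a relative `src` path is `urllib.parse.urljoin` from the standard library, sketched below. The values are the ones shown in the cells above; the variable name `relative_path` is an assumption, and whether this path is really the current featured image is not verified here.

```python
from urllib.parse import urljoin

jpl_url = 'https://www.jpl.nasa.gov'
relative_path = '/system/missions/list_view_images/23_PIA23764-RoverNamePlateonMars-320x240.jpg'

# urljoin resolves the relative path against the base URL
featured_image_url = urljoin(jpl_url, relative_path)
print(featured_image_url)

# A src that is already absolute is returned unchanged
already_absolute = urljoin(jpl_url, 'https://example.com/image.jpg')
print(already_absolute)
```

Compared with plain string concatenation, `urljoin` also copes with `src` values that are already complete URLs.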