# Mars News Mongo Web Scraper

* In this assignment, you will build a web application that scrapes various websites for data related to the Mission to Mars and displays the information in a single HTML page. The following outlines what you need to do.

### Step 1 - Scraping

    1) NASA Mars News
    2) JPL Mars Space Images - Featured Image
    3) Mars Weather
    4) Mars Facts
    5) Mars Hemispheres

### Step 2 - MongoDB and Flask Application

* Convert your Jupyter notebook into a Python script called `scrape_mars.py` with `scrape`
* Create a route called `/scrape` that will import your `scrape_mars.py`
* Create a root route `/` that will query your Mongo database and pass the mars data into an HTML template to display the data.
* Create a template HTML file called `index.html` that will take the mars data dictionary and display all of the data in the appropriate HTML elements.
    * [final_app_part1.png](Images/final_app_part1.png) (Images/final_app_part1.png)
    * [final_app_part2.png](Images/final_app_part2.png) (Images/final_app_part2.png)

 # Dependencies
from bs4 import BeautifulSoup
from splinter import Browser
import requests
import pymongo

In [1]:
# Dependencies
from bs4 import BeautifulSoup
from splinter import Browser
import requests
import pandas as pd
import numpy as np

# NASA Mars News (1 of 5)

### Connect the Mongo to Mars
* Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) (https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

In [2]:
# URL of page to be scraped
NASA_mars_news_url = 'https://mars.nasa.gov/news/'

# Retrieve page with the requests module
mars_response = requests.get(NASA_mars_news_url)

# mars_response ## Preview

In [3]:
# Create BeautifulSoup object; parse with 'html.parser'
martian_soup = BeautifulSoup(mars_response.text, 'html.parser')

# martian_soup ## Preview

### are you sure you did that right? 

In [4]:
# print(martian_soup.prettify()) ## Preview

In [5]:
news_title = martian_soup.find('div', 'a', class_="content_title").text
news_p = martian_soup.find('div', class_="rollover_description_inner").text

print(news_title)
print(news_p)



Why This Martian Full Moon Looks Like Candy



For the first time, NASA's Mars Odyssey orbiter has caught the Martian moon Phobos during a full moon phase. Each color in this new image represents a temperature range detected by Odyssey's infrared camera.



#### ========================== END NASA Mars News (1 of 5) ==========================

 # JPL Mars Space Images - Featured Image (2 of 5)
 
* Visit the url for JPL Featured Space Image [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars) (https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).

* Use splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called `featured_image_url`.

* Make sure to find the image url to the full size `.jpg` image.

* Make sure to save a complete url string for this image.

### I'm on a Mac

    [sudo] pip install selenium

    from splinter import Browser
    executable_path = {'executable_path':'</path/to/chrome>'}

    browser = Browser('chrome', **executable_path)

    brew cask install chromedriver

In [6]:
# https://splinter.readthedocs.io/en/latest/drivers/chrome.html # Thanks Fernanda!!!
!which chromedriver

/usr/local/bin/chromedriver


In [7]:
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
mars_img_browser = Browser('chrome', **executable_path, headless=False)

In [8]:
# URL of page to be scraped

mars_image_url = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'

# I think we have to visit to make this work? 

mars_img_browser.visit(mars_image_url)

In [9]:
# Retrieve page with the requests module
mars_html = mars_img_browser.html
mars_img_soup = BeautifulSoup(mars_html, 'html.parser')

# print(img_soup.prettify())  #  PREVIEW

In [10]:
base_img_url = "https://www.jpl.nasa.gov"
img_url = mars_img_soup.find("a", class_="button fancybox")["data-fancybox-href"]
featured_image_url = (base_img_url + img_url)
print (featured_image_url)

https://www.jpl.nasa.gov/spaceimages/images/mediumsize/PIA15883_ip.jpg


#### =========== END JPL Mars Space Images - Featured Image (2 of 5) ===========

# Mars Weather (3 of 5)

* Visit the Mars Weather twitter account [here](https://twitter.com/marswxreport?lang=en) 
* Scrape the latest Mars weather tweet from the page. 
* Save the tweet text for the weather report as a variable called `mars_weather`.


In [11]:
# URL of page to be scraped
mars_weather_url = "https://twitter.com/marswxreport?lang=en)"

# Retrieve page with the requests module
mars_weather_request = requests.get(mars_weather_url)

# mars_weather_request  ## Preview

In [12]:
# Create BeautifulSoup object; parse with 'html.parser'
mars_weather_soup = BeautifulSoup(mars_weather_request.text, 'html.parser')

# print(mars_weather_soup.prettify())  #  PREVIEW

In [13]:
mars_weather_readings = mars_weather_soup.find('p', class_="TweetTextSize TweetTextSize--normal js-tweet-text tweet-text").text

# print(mars_weather_readings)
mwr = mars_weather_readings.split("pic.")
mwr[0]


'InSight sol 167 (2019-05-17) low -100.5ºC (-148.9ºF) high -20.4ºC (-4.6ºF)\nwinds from the SW at 4.7 m/s (10.6 mph) gusting to 13.5 m/s (30.3 mph)\npressure at 7.50 hPa'

#### ====================== END Mars Weather (3 of 5) ======================

# Mars Facts (4 of 5)

* Visit the Mars Facts webpage [here] (https://space-facts.com/mars/)
* use Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.
* Use Pandas to convert the data to a HTML table string.

In [14]:
# URL of page to be scraped
mars_facts_url = "https://space-facts.com/mars/"

In [15]:
# Create a beautiful Pandas df with Mars Fax
mars_facts_table = pd.read_html(mars_facts_url)
mars_facts_df = mars_facts_table[0]
mars_facts_df.columns = ["Query", "Fax"]
mars_facts_df

Unnamed: 0,Query,Fax
0,Equatorial Diameter:,"6,792 km"
1,Polar Diameter:,"6,752 km"
2,Mass:,6.42 x 10^23 kg (10.7% Earth)
3,Moons:,2 (Phobos & Deimos)
4,Orbit Distance:,"227,943,824 km (1.52 AU)"
5,Orbit Period:,687 days (1.9 years)
6,Surface Temperature:,-153 to 20 °C
7,First Record:,2nd millennium BC
8,Recorded By:,Egyptian astronomers


In [16]:
eq_diam = mars_facts_df["Query"][0]
pl_diam = mars_facts_df["Query"][1]
mass = mars_facts_df["Query"][2]
moons = mars_facts_df["Query"][3]
o_dist = mars_facts_df["Query"][4]
o_period = mars_facts_df["Query"][5]
temp = mars_facts_df["Query"][6]
first_rec = mars_facts_df["Query"][7]
rec_by = mars_facts_df["Query"][8]

eq_diam_faq = mars_facts_df["Fax"][0]
pl_diam_faq = mars_facts_df["Fax"][1]
mass_faq = mars_facts_df["Fax"][2]
moons_faq = mars_facts_df["Fax"][3]
o_dist_faq = mars_facts_df["Fax"][4]
o_period_faq = mars_facts_df["Fax"][5]
temp_faq = mars_facts_df["Fax"][6]
first_rec_faq = mars_facts_df["Fax"][7]
rec_by_faq = mars_facts_df["Fax"][8]


In [17]:
mars_facts_html_table = mars_facts_df.to_html() # 'Table/mars_html_table.html' -- to create previewable html
mars_facts_html_table = mars_facts_html_table.replace("\n", "")
print(mars_facts_html_table)
# !open "Table/mars_html_table.html"

<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th></th>      <th>Query</th>      <th>Fax</th>    </tr>  </thead>  <tbody>    <tr>      <th>0</th>      <td>Equatorial Diameter:</td>      <td>6,792 km</td>    </tr>    <tr>      <th>1</th>      <td>Polar Diameter:</td>      <td>6,752 km</td>    </tr>    <tr>      <th>2</th>      <td>Mass:</td>      <td>6.42 x 10^23 kg (10.7% Earth)</td>    </tr>    <tr>      <th>3</th>      <td>Moons:</td>      <td>2 (Phobos &amp; Deimos)</td>    </tr>    <tr>      <th>4</th>      <td>Orbit Distance:</td>      <td>227,943,824 km (1.52 AU)</td>    </tr>    <tr>      <th>5</th>      <td>Orbit Period:</td>      <td>687 days (1.9 years)</td>    </tr>    <tr>      <th>6</th>      <td>Surface Temperature:</td>      <td>-153 to 20 °C</td>    </tr>    <tr>      <th>7</th>      <td>First Record:</td>      <td>2nd millennium BC</td>    </tr>    <tr>      <th>8</th>      <td>Recorded By:</td>      <td>Egyptian astronomers</td>

#### ========================= END Mars Facts (4 of 5) =========================

# Mars Hemispheres (5 of 5)

* Visit the USGS Astrogeology site [here](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high resolution images for each of Mar's hemispheres. (https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars)

* You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.

* Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.

* Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

In [18]:
# Do I have to add this multiple times? 

!which chromedriver

/usr/local/bin/chromedriver


In [19]:
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
mars_hemispheres_base_img_browser = Browser('chrome', **executable_path, headless=False)

In [20]:
# URL of page to be scraped 
mars_hemisphere_url = "https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars"

# Make sure you visit! 
mars_hemispheres_base_img_browser.visit(mars_hemisphere_url)

In [21]:
# Retrieve page with the requests module

mars_hemisphere_html = mars_hemispheres_base_img_browser.html
mars_hemisphere_html_soup = BeautifulSoup(mars_hemisphere_html, 'html.parser')

# print(mars_hemisphere_html_soup.prettify()) ## PREVIEW

In [22]:
# Gather each of the links to the hemispheres in order to find the image url to the full resolution image.
hemispheres_imgs = mars_hemisphere_html_soup.find_all("div", class_="item")

# hemispheres_imgs ## PREVIEW

In [141]:
##### A SPECIAL THANK YOU TO FERNANDA! #### 

hemispheres_base_img_url = "https://astrogeology.usgs.gov"

hemispheres_imgs_urls = []
hiu = []

for img in hemispheres_imgs:
    img_title = img.find("h3").text
    img_url = img.find("a")["href"]
    full_img_link = hemispheres_base_img_url + img_url 
    
    mars_hemispheres_base_img_browser.visit(full_img_link)
    hemispheres_base_img_html = mars_hemispheres_base_img_browser.html
    hemispheres_full_img_soup = BeautifulSoup(hemispheres_base_img_html, "html.parser")
    hemispheres_downloads = hemispheres_full_img_soup.find("div", class_="downloads")
    hemispheres_image_download_urls = hemispheres_downloads.find("a")["href"]
    hemispheres_imgs_urls.append({"title": img_title, "img_url": img_url})
    hiu.append({"title": img_title, "img_url": hemispheres_image_download_url})
    print(hiu[0]['img_url'])
    print(hiu[1]['img_url'])
    print(hiu[2]['img_url'])
    print(hiu[3]['img_url'])


['http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg']
['http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg']
['http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg']
['http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg']


#### ========================= END Mars Hemispheres (5 of 5) =========================

####  ===================================  Fin  ====================================