In [1]:
# # Mission to Mars

# ![mission_to_mars](Images/mission_to_mars.jpg)

# In this assignment, you will build a web application that scrapes various websites for data related to the Mission to Mars and displays the information in a single HTML page. The following outlines what you need to do.

# ## Step 1 - Scraping

# Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.

# * Create a Jupyter Notebook file called `mission_to_mars.ipynb` and use this to complete all of your scraping and analysis tasks. The following outlines what you need to scrape.

# ### NASA Mars News

# * Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

# ```python
# # Example:
# news_title = "NASA's Next Mars Mission to Investigate Interior of Red Planet"

# news_p = "Preparation of NASA's next spacecraft to Mars, InSight, has ramped up this summer, on course for launch next May from Vandenberg Air Force Base in central California -- the first interplanetary launch in history from America's West Coast."
# ```

# ### JPL Mars Space Images - Featured Image

# * Visit the url for JPL Featured Space Image [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).

# * Use Splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called `featured_image_url`.

# * Make sure to find the image url to the full size `.jpg` image.

# * Make sure to save a complete url string for this image.

# ```python
# # Example:
# featured_image_url = 'https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA16225_hires.jpg'
# ```

# ### Mars Weather

# * Visit the Mars Weather Twitter account [here](https://twitter.com/marswxreport?lang=en) and scrape the latest Mars weather tweet from the page. Save the tweet text for the weather report as a variable called `mars_weather`.

# ```python
# # Example:
# mars_weather = 'Sol 1801 (Aug 30, 2017), Sunny, high -21C/-5F, low -80C/-112F, pressure at 8.82 hPa, daylight 06:09-17:55'
# ```

# ### Mars Facts

# * Visit the Mars Facts webpage [here](http://space-facts.com/mars/) and use Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.

# * Use Pandas to convert the data to an HTML table string.

# ### Mars Hemispheres

# * Visit the USGS Astrogeology site [here](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high resolution images for each of Mars's hemispheres.

# * You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.

# * Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.

# * Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

# ```python
# # Example:
# hemisphere_image_urls = [
#     {"title": "Valles Marineris Hemisphere", "img_url": "..."},
#     {"title": "Cerberus Hemisphere", "img_url": "..."},
#     {"title": "Schiaparelli Hemisphere", "img_url": "..."},
#     {"title": "Syrtis Major Hemisphere", "img_url": "..."},
# ]
# ```

# - - -

# ## Step 2 - MongoDB and Flask Application

# Use MongoDB with Flask templating to create a new HTML page that displays all of the information that was scraped from the URLs above.

# * Start by converting your Jupyter notebook into a Python script called `scrape_mars.py` with a function called `scrape` that will execute all of your scraping code from above and return one Python dictionary containing all of the scraped data.

# * Next, create a route called `/scrape` that will import your `scrape_mars.py` script and call your `scrape` function.

#   * Store the return value in Mongo as a Python dictionary.

# * Create a root route `/` that will query your Mongo database and pass the mars data into an HTML template to display the data.

# * Create a template HTML file called `index.html` that will take the mars data dictionary and display all of the data in the appropriate HTML elements. Use the following as a guide for what the final product should look like, but feel free to create your own design.

# ![final_app_part1.png](Images/final_app_part1.png)
# ![final_app_part2.png](Images/final_app_part2.png)

# - - -

# ## Hints

# * Use Splinter to navigate the sites when needed and BeautifulSoup to help find and parse out the necessary data.

# * Use PyMongo for CRUD operations on your database. For this homework, you can simply overwrite the existing document each time the `/scrape` url is visited and new data is obtained.

# * Use Bootstrap to structure your HTML template.



In [2]:
# Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.

from time import sleep
import requests
import pandas as pd
from bs4 import BeautifulSoup
from pprint import pprint
from splinter import Browser
import pymongo
import selenium

executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)



In [3]:
# * Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

nasa_web = "https://mars.nasa.gov/news/"
browser.visit(nasa_web)
html = browser.html
soup = BeautifulSoup(html,'html.parser')




In [4]:
# print(soup.body.prettify())
html_body = soup.body

# Collect every news item on the page
slides = soup.find_all('li', class_="slide")

for slide in slides:
    date = slide.find('div', class_='list_date').text
    title = slide.find('div', class_='content_title').text
    paragraph = slide.find('div', class_='article_teaser_body').text

    print(date)
    print(title)
    print(paragraph)
    print('----'*4)

# The page lists the newest article first, so the first slide holds the latest news
news_title = slides[0].find('div', class_='content_title').text
news_p = slides[0].find('div', class_='article_teaser_body').text


November 27, 2018
NASA Hears MarCO CubeSats Loud and Clear from Mars 
A pair of tiny, experimental spacecraft fulfilled their mission yesterday, relaying back near-real-time data during InSight's landing.
----------------
November 26, 2018
InSight Is Catching Rays on Mars
The lander has sent data indicating its solar panels are open and receiving sunlight to power its surface operations.
----------------
November 26, 2018
NASA InSight Lander Arrives on Martian Surface 
The touchdown marks the eighth time NASA has successfully landed a spacecraft on Mars.
----------------
November 25, 2018
Landing Day for InSight
NASA's InSight spacecraft is on target for Mars landing at around noon PST today.
----------------
November 21, 2018
NASA InSight Landing on Mars: Milestones
On Nov. 26, NASA's InSight spacecraft will blaze through the Martian atmosphere and set a lander gently on the surface in less time than it takes to cook a hard-boiled egg.
----------------
November 21, 2018
NASA InSight T
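
The selection of the latest item can be checked without opening a browser. The following is a minimal sketch, assuming markup shaped like the classes used above; `sample_html` and `latest_news` are hypothetical names introduced only for illustration, and the live page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup that reuses the class names scraped above.
sample_html = """
<ul>
  <li class="slide">
    <div class="list_date">November 27, 2018</div>
    <div class="content_title">NASA Hears MarCO CubeSats Loud and Clear from Mars</div>
    <div class="article_teaser_body">A pair of tiny, experimental spacecraft fulfilled their mission yesterday, relaying back near-real-time data during InSight's landing.</div>
  </li>
  <li class="slide">
    <div class="list_date">November 26, 2018</div>
    <div class="content_title">InSight Is Catching Rays on Mars</div>
    <div class="article_teaser_body">The lander has sent data indicating its solar panels are open and receiving sunlight to power its surface operations.</div>
  </li>
</ul>
"""


def latest_news(html):
    """Return (title, teaser) of the first slide, or (None, None) if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    first = soup.find("li", class_="slide")  # find() returns only the first match
    if first is None:
        return None, None
    title = first.find("div", class_="content_title").get_text(strip=True)
    teaser = first.find("div", class_="article_teaser_body").get_text(strip=True)
    return title, teaser


news_title, news_p = latest_news(sample_html)
print(news_title)
print(news_p)
```

Using `find` rather than `find_all` keeps only the first slide, which matches the newest-first ordering visible in the output above.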

In [5]:
# ### JPL Mars Space Images - Featured Image

# * Visit the url for JPL Featured Space Image [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).
# 
mars_images = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
browser.visit(mars_images)

In [6]:
# * Use Splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called `featured_image_url`.
# HTML code for the button:
#   <a class='button' href style='display: inline-block;'> More</a>

# * Make sure to find the image url to the full size `.jpg` image.
# h3 = "class=release_date"
# date = image.find('time')['datetime']

# * Make sure to save a complete url string for this image.
html_3 = browser.html
soup_3 = BeautifulSoup(html_3,'html.parser')

In [7]:
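images = soup_3.find_all('li', class_='slide')

# Print the release date and full-size image url for the first 21 slides
for x, image in enumerate(images[:21], start=1):
    dates = image.find('h3', class_='release_date').text
    link = image.a['data-fancybox-href']

    print(x)
    print('--'*8)
    print(dates)
    print('image url:', 'https://www.jpl.nasa.gov' + link)

#     browser.click_link_by_partial_text('MORE')

# Assumption: the first slide is the most recent image; save its complete url
featured_image_url = 'https://www.jpl.nasa.gov' + images[0].a['data-fancybox-href']

#     post = {
#         'date': dates,
#         'link': link,
#     }

#     collection.insert_one(post)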

1
----------------
November 27, 2018
image url: https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA22870_hires.jpg
2
----------------
November 27, 2018
image url: https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA22869_hires.jpg
3
----------------
November 27, 2018
image url: https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA22868_hires.jpg
4
----------------
November 27, 2018
image url: https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA22867_hires.jpg
5
----------------
November 27, 2018
image url: https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA22857_hires.jpg
6
----------------
November 26, 2018
image url: https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA22575_hires.jpg
7
----------------
November 26, 2018
image url: https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA22833_hires.jpg
8
----------------
November 26, 2018
image url: https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA22829_hires.jpg
9
----------------
November 26, 
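
The links scraped above are site-relative, so each one has to be joined to the site root to form a complete url. As a hedged alternative to string concatenation, the standard library's `urljoin` handles both relative and already-absolute links; `JPL_BASE` and `full_image_url` are hypothetical names used only in this sketch.

```python
from urllib.parse import urljoin

# Site root taken from the urls printed above.
JPL_BASE = "https://www.jpl.nasa.gov"


def full_image_url(link, base=JPL_BASE):
    """Join a scraped link to the site root; absolute links pass through unchanged."""
    return urljoin(base, link)


featured_image_url = full_image_url("/spaceimages/images/largesize/PIA22870_hires.jpg")
print(featured_image_url)
```

Unlike `'https://www.jpl.nasa.gov' + link`, this does not produce a doubled host if the page ever starts emitting absolute urls.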

In [8]:
# ### Mars Weather

# * Visit the Mars Weather Twitter account [here](https://twitter.com/marswxreport?lang=en) and scrape the latest Mars weather tweet from the page. Save the tweet text for the weather report as a variable called `mars_weather`.

mars_twitter = "https://twitter.com/marswxreport?lang=en"

browser.visit(mars_twitter)
html_2 = browser.html
soup_2 = BeautifulSoup(html_2,'html.parser')

In [9]:
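tweets = soup_2.find_all('p', class_="TweetTextSize TweetTextSize--normal js-tweet-text tweet-text")

mars_weather = None

for tweet in tweets:
    tweet_post = tweet.text

    # Weather reports start with the sol number
    if tweet_post[:3] == "Sol":
        # Tweets are listed newest first, so the first match is the latest report
        if mars_weather is None:
            mars_weather = tweet_post
        print('---------')
        print(tweet_post.replace(', ','\n'))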

---------
Sol 2240 (2018-11-24)
high 0C/32F
low -70C/-93F
pressure at 8.49 hPa
daylight 06:28-18:44
---------
Sol 2239 (2018-11-23)
high -2C/28F
low -70C/-93F
pressure at 8.52 hPa
daylight 06:28-18:44
---------
Sol 2238 (2018-11-22)
high -2C/28F
low -69C/-92F
pressure at 8.53 hPa
daylight 06:27-18:43
---------
Sol 2237 (2018-11-21)
high -3C/26F
low -70C/-93F
pressure at 8.54 hPa
daylight 06:27-18:43
---------
Sol 2236 (2018-11-20)
high -3C/26F
low -71C/-95F
pressure at 8.57 hPa
daylight 06:26-18:42
---------
Sol 2235 (2018-11-19)
high 2C/35F
low -70C/-93F
pressure at 8.53 hPa
daylight 06:25-18:42
---------
Sol 2234 (2018-11-18)
high 2C/35F
low -70C/-93F
pressure at 8.57 hPa
daylight 06:25-18:41
---------
Sol 2233 (2018-11-17)
high -4C/24F
low -72C/-97F
pressure at 8.61 hPa
daylight 06:24-18:41
---------
Sol 2232 (2018-11-16)
high -3C/26F
low -73C/-99F
pressure at 8.58 hPa
daylight 06:24-18:40
---------
Sol 2231 (2018-11-15)
high -10C/14F
low -73C/-99F
pressure at 8.60 hPa
daylight 06:2
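
Picking the latest weather report is a small filtering step that can be tested on plain strings. This is a minimal sketch, assuming tweets arrive newest first as in the output above; `first_weather_report` and the non-weather sample tweet are hypothetical.

```python
def first_weather_report(tweet_texts):
    """Return the first tweet that starts with 'Sol', or None if there is none."""
    for text in tweet_texts:
        cleaned = text.strip()
        if cleaned.startswith("Sol"):
            return cleaned
    return None


sample_tweets = [
    "A hypothetical tweet that is not a weather report",
    "Sol 2240 (2018-11-24), high 0C/32F, low -70C/-93F, pressure at 8.49 hPa, daylight 06:28-18:44",
    "Sol 2239 (2018-11-23), high -2C/28F, low -70C/-93F, pressure at 8.52 hPa, daylight 06:28-18:44",
]

mars_weather = first_weather_report(sample_tweets)
print(mars_weather)
```

Returning `None` when nothing matches lets the caller decide how to handle a page with no weather tweets instead of failing with an index error.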

In [None]:
# ### Mars Facts

# * Visit the Mars Facts webpage [here](http://space-facts.com/mars/) and use Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.

# * Use Pandas to convert the data to an HTML table string.

In [10]:
mars_facts_url = "http://space-facts.com/mars/"

In [11]:
tables = pd.read_html(mars_facts_url)

In [12]:
df = tables[0]
df.columns = ['Description', 'Value']
df

            Description                          Value
0  Equatorial Diameter:                       6,792 km
1       Polar Diameter:                       6,752 km
2                 Mass:  6.42 x 10^23 kg (10.7% Earth)
3                Moons:            2 (Phobos & Deimos)
4       Orbit Distance:       227,943,824 km (1.52 AU)
5         Orbit Period:           687 days (1.9 years)
6  Surface Temperature:                  -153 to 20 °C
7         First Record:              2nd millennium BC
8          Recorded By:           Egyptian astronomers


In [None]:
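# Calling to_html without a file path returns the table as a string
html_table = df.to_html(index=False)

# Strip the newlines so the string can be dropped straight into a template
html_table = html_table.replace('\n', '')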
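
The conversion step can be tried offline on a few of the rows shown above. This is a small sketch, assuming pandas is installed; `facts` is a hypothetical stand-in for the scraped `df`.

```python
import pandas as pd

# A few rows copied from the scraped table above.
facts = pd.DataFrame(
    [
        ["Equatorial Diameter:", "6,792 km"],
        ["Polar Diameter:", "6,752 km"],
        ["Orbit Period:", "687 days (1.9 years)"],
    ],
    columns=["Description", "Value"],
)

# With no buffer or path argument, to_html returns the markup as a string.
html_table = facts.to_html(index=False).replace("\n", "")
print(html_table)
```

Passing a file name to `to_html` writes the file and returns `None`, so the string form is the one to keep when the table has to be stored with the rest of the scraped data.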

In [None]:
# ### Mars Hemispheres

# * Visit the USGS Astrogeology site [here](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high resolution images for each of Mars's hemispheres.

# * You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.

# * Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.

# * Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

# ```python
# # Example:
# hemisphere_image_urls = [
#     {"title": "Valles Marineris Hemisphere", "img_url": "..."},
#     {"title": "Cerberus Hemisphere", "img_url": "..."},
#     {"title": "Schiaparelli Hemisphere", "img_url": "..."},
#     {"title": "Syrtis Major Hemisphere", "img_url": "..."},
# ]
# ```

In [13]:
astrogeology_url = "https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars"
browser.visit(astrogeology_url)

astro_html = browser.html
astro = BeautifulSoup(astro_html,'html.parser')

In [14]:
browser.find_by_css('.thumb')[0].click()


In [15]:
browser.click_link_by_text('Sample')

In [16]:
browser.url

'https://astrogeology.usgs.gov/search/map/Mars/Viking/cerberus_enhanced'

In [17]:
browser.visit(astrogeology_url)
sleep(2)
browser.find_by_css('.thumb')[1].click()
browser.click_link_by_text("Sample")
browser.url


'https://astrogeology.usgs.gov/search/map/Mars/Viking/schiaparelli_enhanced'

In [18]:
browser.visit(astrogeology_url)
browser.find_by_css('.thumb')[2].click()
browser.click_link_by_text("Sample")
browser.url

'https://astrogeology.usgs.gov/search/map/Mars/Viking/syrtis_major_enhanced'

In [19]:
browser.visit(astrogeology_url)
browser.find_by_css('.thumb')[3].click()
browser.click_link_by_text("Sample")
browser.url

'https://astrogeology.usgs.gov/search/map/Mars/Viking/valles_marineris_enhanced'

In [20]:
browser.html

'<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" lang="en"><head>\n\t\t<link rel="stylesheet" type="text/css" href="//ajax.googleapis.com/ajax/libs/jqueryui/1.11.4/themes/smoothness/jquery-ui.css" />\n<title>Valles Marineris Hemisphere Enhanced | USGS Astrogeology Science Center</title>\n\t\t<meta name="description" content="Mosaic of the Valles Marineris hemisphere of Mars projected into point perspective, a view similar to that which one would…" />\n\t\t<meta name="keywords" content="USGS,Astrogeology Science Center,Cartography,Geology,Space,Geological Survey,Mapping" />\n\t\t<meta http-equiv="X-UA-Compatible" content="IE=edge" />\n\t\t<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n\t\t<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1" />\n\t\t<meta name="google-site-verification" content="x61hXXVj7wtfBSNOPnTftajMsZ5yB2W-qRoyr7GtOKM" />\n\t\t<!--<link rel="stylesheet" href="http://fonts.googleapis.com/css?family=Open

In [21]:
from urllib.parse import urljoin

hemispheres = astro.find('div', class_="collapsible results")
hem = hemispheres.find_all('div', class_='item')

# Visit each hemisphere page directly from the link parsed off the results page
for h in hem:
    hem_ref = h.a['href']
    browser.visit(urljoin(astrogeology_url, hem_ref))
    sleep(2)

In [22]:
    
# Reuse the browser opened earlier and start from the search results page
browser.visit(astrogeology_url)

hemisphere_image_urls = []

# The results page lists four hemispheres, each with an <h3> title and a thumbnail
for i in range(4):
    title = browser.find_by_tag('h3')[i].text

    # Open the hemisphere page and read the href of its 'Sample' image link
    browser.find_by_css('.thumb')[i].click()
    img_url = browser.find_by_text('Sample')['href']
    hemisphere_image_urls.append({'title': title, 'img_url': img_url})

    browser.back()

print(hemisphere_image_urls)
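
Once the titles and image urls have been collected, assembling the list of dictionaries is a plain pairing step. The sketch below assumes the two sequences are in the same order; `build_hemisphere_list` is a hypothetical helper, and the `"..."` urls are the same placeholders the assignment text uses, not real values.

```python
def build_hemisphere_list(titles, img_urls):
    """Pair each title with its image url using the keys the assignment asks for."""
    if len(titles) != len(img_urls):
        raise ValueError("each title needs exactly one image url")
    return [{"title": t, "img_url": u} for t, u in zip(titles, img_urls)]


titles = [
    "Cerberus Hemisphere",
    "Schiaparelli Hemisphere",
    "Syrtis Major Hemisphere",
    "Valles Marineris Hemisphere",
]
img_urls = ["...", "...", "...", "..."]  # placeholders, as in the assignment example

hemisphere_image_urls = build_hemisphere_list(titles, img_urls)
print(hemisphere_image_urls)
```

The length check guards against a scrape that found a title but missed its image link, which `zip` alone would silently truncate.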

In [None]:
# ## Step 2 - MongoDB and Flask Application

# Use MongoDB with Flask templating to create a new HTML page that displays all of the information that was scraped from the URLs above.
conn = 'mongodb://localhost:27017'
client = pymongo.MongoClient(conn)

# * Start by converting your Jupyter notebook into a Python script called `scrape_mars.py` with a function called `scrape` that will execute all of your scraping code from above and return one Python dictionary containing all of the scraped data.
db = client.mars_db
collection = db.items

# * Next, create a route called `/scrape` that will import your `scrape_mars.py` script and call your `scrape` function.

#   * Store the return value in Mongo as a Python dictionary.

# * Create a root route `/` that will query your Mongo database and pass the mars data into an HTML template to display the data.

# * Create a template HTML file called `index.html` that will take the mars data dictionary and display all of the data in the appropriate HTML elements. Use the following as a guide for what the final product should look like, but feel free to create your own design.

# ![final_app_part1.png](Images/final_app_part1.png)
# ![final_app_part2.png](Images/final_app_part2.png)
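
The assignment's hint about overwriting the existing document on every scrape can be sketched as a single upsert. PyMongo's `Collection.replace_one(filter, replacement, upsert=True)` is the real call; `save_mars_data` and the in-memory `FakeCollection` are hypothetical and exist only so the example runs without a MongoDB server.

```python
def save_mars_data(collection, mars_data):
    """Overwrite the single stored document with the latest scrape."""
    # An empty filter matches any existing document; upsert=True inserts
    # one when the collection is still empty.
    collection.replace_one({}, mars_data, upsert=True)


class FakeCollection:
    """Hypothetical in-memory stand-in for a PyMongo collection."""

    def __init__(self):
        self.docs = []

    def replace_one(self, filter, replacement, upsert=False):
        if self.docs:
            self.docs[0] = dict(replacement)
        elif upsert:
            self.docs.append(dict(replacement))

    def find_one(self):
        return self.docs[0] if self.docs else None


fake = FakeCollection()
save_mars_data(fake, {"news_title": "first scrape"})
save_mars_data(fake, {"news_title": "second scrape"})
print(fake.find_one())
```

In the real application the `/scrape` route would pass `db.items` (the `collection` defined above) and the dictionary returned by `scrape()`, while the `/` route would read it back with `find_one()`.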