# MARS CHALLENGE Splinter + BeautifulSoup

Splinter is a Python module that lets us automate standard browser interactions. It is an abstraction layer that lets us drive a browser automation tool called Selenium from Python code.

In this exercise, we use Splinter to scrape content from a page, then navigate to the next page and continue scraping.

- [Splinter Docs](https://splinter.readthedocs.io/en/latest/index.html)
- [Info about the ChromeDriver](https://splinter.readthedocs.io/en/latest/drivers/chrome.html)

### Import dependencies

In [1]:
from splinter import Browser
from bs4 import BeautifulSoup as bsoup

# from bs4 import BeautifulSoup
import requests
import pymongo
import datetime as dt
import time
import pandas as pd

### Mac Users

In [None]:
# # identify location of chromedriver and store it as a variable
# driverPath = !which chromedriver

# # Setup configuration variables to enable Splinter to interact with browser
# executable_path = {'executable_path': driverPath[0]}
# browser = Browser('chrome', **executable_path, headless=False)

### Windows Users

Windows users will need to store the `chromedriver.exe` file in the same location as this _.ipynb_ file, or they will need to provide the path to the executable.

In [None]:
# executable_path = {'executable_path': 'chromedriver.exe'}
# browser = Browser('chrome', **executable_path, headless=False)
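Both setups above come down to finding the `chromedriver` executable before creating the browser. The following is a minimal sketch of one way to do that lookup in plain Python. It assumes a hypothetical helper named `find_chromedriver` (not part of Splinter) that first checks the `PATH` and then falls back to a copy stored next to the notebook.

```python
import shutil
from pathlib import Path


def find_chromedriver(search_dir="."):
    """Return a path to chromedriver as a string, or None if it is not found.

    Hypothetical helper for illustration; it is not part of Splinter.
    """
    # 1. Look on the PATH (roughly what `which chromedriver` does on a Mac)
    on_path = shutil.which("chromedriver")
    if on_path:
        return on_path

    # 2. Fall back to a copy stored next to the notebook
    for name in ("chromedriver.exe", "chromedriver"):
        candidate = Path(search_dir) / name
        if candidate.is_file():
            return str(candidate.resolve())

    return None
```

The result can then be passed along in the same way as the cells above, for example `executable_path = {'executable_path': find_chromedriver()}`, after checking that it is not `None`.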

### Visit the NASA Mars News page and scrape the latest headline

In [2]:
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)

In [3]:
url = 'https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest'
browser.visit(url)
html = browser.html
soup = bsoup(html,'html.parser')

In [4]:
# Grab the first news item on the page (the URL sorts by publish date, newest first)
results = soup.select_one("ul.item_list li.slide")
news_title = results.find("div", class_="content_title").get_text()
news_para = results.find("div", class_="article_teaser_body").get_text()

# Collect the headline and teaser text in a single dictionary
NASA_News_dict = {"news_title": news_title, "news_paragraph": news_para}
NASA_News_dict

{'news_title': "NASA's Perseverance Rover Is Midway to Mars ",
 'news_paragraph': "Sometimes half measures can be a good thing – especially on a journey this long. The agency's latest rover only has about 146 million miles left to reach its destination."}
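The cell above assumes the news list has already rendered. If it has not, `select_one` returns `None` and the next line fails with an `AttributeError`. The sketch below shows one defensive way to do the same extraction. The function name `extract_news` is hypothetical, and it only assumes that `item` behaves like a BeautifulSoup tag (it offers `find(name, class_=...)` and the found nodes offer `get_text()`).

```python
def extract_news(item):
    """Return the title/teaser dict for one news item, or None if data is missing.

    `item` is expected to behave like a BeautifulSoup Tag; None is allowed.
    """
    if item is None:
        return None

    title = item.find("div", class_="content_title")
    teaser = item.find("div", class_="article_teaser_body")
    if title is None or teaser is None:
        return None

    # strip() removes stray whitespace such as the trailing space in the title
    return {
        "news_title": title.get_text().strip(),
        "news_paragraph": teaser.get_text().strip(),
    }
```

In the notebook this could be called as `extract_news(soup.select_one("ul.item_list li.slide"))`. A `None` result suggests the page had not finished loading when `browser.html` was read.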

### Visit the JPL space images page and scrape the featured image

In [2]:
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)

In [3]:
url = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
browser.visit(url)
browser.click_link_by_partial_text('FULL IMAGE')

# Update the CURRENT html code in the browser
html = browser.html
img_soup = bsoup(html, 'html.parser')



In [14]:
# time.sleep(3)
results2 = browser.find_by_css('img[class="fancybox-image"]').first
NASA_image_dict = {}
NASA_image_dict["featured_image"] = (results2['src'])
NASA_image_dict

{'featured_image': 'https://www.jpl.nasa.gov/spaceimages/images/mediumsize/PIA19964_ip.jpg'}

### Scrape the Mars facts table with pandas

In [16]:
# read_html returns a list of DataFrames, one per <table> found on the page
df = pd.read_html("https://space-facts.com/mars/")
Mars_facts_df = df[0]
# to_html is a method: call it to get the HTML string rather than the bound method
Mars_facts_html = Mars_facts_df.to_html()
Mars_facts_df

                      0                              1
0  Equatorial Diameter:                       6,792 km
1       Polar Diameter:                       6,752 km
2                 Mass:  6.39 × 10^23 kg (0.11 Earths)
3                Moons:            2 (Phobos & Deimos)
4       Orbit Distance:       227,943,824 km (1.38 AU)
5         Orbit Period:           687 days (1.9 years)
6  Surface Temperature:                   -87 to -5 °C
7         First Record:              2nd millennium BC
8          Recorded By:           Egyptian astronomers

### Visit the USGS Astrogeology site and scrape the hemisphere photographs

In [17]:
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)

In [19]:
from urllib.parse import urljoin

Hemisphere_List = ["Cerberus Hemisphere Enhanced", "Schiaparelli Hemisphere Enhanced", "Syrtis Major Hemisphere Enhanced", "Valles Marineris Hemisphere Enhanced"]
NASA_Mars_Hemisphere_Dict = {}
base_url = "https://astrogeology.usgs.gov/"
# The search-results URL is the same for every hemisphere, so define it once
url = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'
for x in Hemisphere_List:
    browser.visit(url)
    time.sleep(2)
    browser.click_link_by_partial_text(x)
    time.sleep(2)
    browser.click_link_by_partial_text("Open")
    time.sleep(2)

    # Update the CURRENT html code in the browser
    html = browser.html
    img_soup = bsoup(html, 'html.parser')
    results3 = img_soup.find('img', class_='wide-image')
    # src is root-relative ("/cache/images/..."), so join it with base_url
    # instead of concatenating, which would produce a double slash
    final_url = urljoin(base_url, results3['src'])
    NASA_Mars_Hemisphere_Dict[x] = final_url

NASA_Mars_Hemisphere_Dict


{'Cerberus Hemisphere Enhanced': 'https://astrogeology.usgs.gov/cache/images/f5e372a36edfa389625da6d0cc25d905_cerberus_enhanced.tif_full.jpg',
 'Schiaparelli Hemisphere Enhanced': 'https://astrogeology.usgs.gov/cache/images/3778f7b43bbbc89d6e3cfabb3613ba93_schiaparelli_enhanced.tif_full.jpg',
 'Syrtis Major Hemisphere Enhanced': 'https://astrogeology.usgs.gov/cache/images/555e6403a6ddd7ba16ddb0e471cadcf7_syrtis_major_enhanced.tif_full.jpg',
 'Valles Marineris Hemisphere Enhanced': 'https://astrogeology.usgs.gov/cache/images/b3c7c6c9138f57b4756be9b9c43e3a48_valles_marineris_enhanced.tif_full.jpg'}
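The hemisphere loop relies on fixed `time.sleep(2)` pauses, which are either too long when the page loads quickly or too short when it loads slowly. One alternative is to poll until the element you need appears. The helper below is a minimal, standard-library-only sketch of that idea. The name `wait_until` and its parameters are assumptions made for illustration and are not part of Splinter or BeautifulSoup.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call `predicate` repeatedly until it returns a truthy value.

    Returns that value, or None if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Inside the loop, a call such as `wait_until(lambda: bsoup(browser.html, 'html.parser').find('img', class_='wide-image'))` could stand in for the sleep before parsing. It returns the `img` tag as soon as it is present, or `None` if it never shows up.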

In [None]:
time.sleep(3)
# results3 = img_soup.find("img", {"class_": "wide-image"})
results3 = img_soup.find('img', class_='wide-image')
print(results3['src'])


In [None]:
# test = results3.find_all('src')
# print(test)

In [None]:
# THIS WORKS: single-hemisphere version of the loop above
Hemisphere_List = ["Cerberus Hemisphere Enhanced", "Schiaparelli Hemisphere Enhanced", "Syrtis Major Hemisphere Enhanced", "Valles Marineris Hemisphere Enhanced"]
url = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'
browser.visit(url) 
browser.click_link_by_partial_text("Cerberus Hemisphere Enhanced")
time.sleep(3)
browser.click_link_by_partial_text("Open")

# Update the CURRENT html code in the browser
html = browser.html
img_soup = bsoup(html, 'html.parser')
time.sleep(3)
# results3 = img_soup.find("img", {"class_": "wide-image"})
results3 = img_soup.find('img', class_='wide-image')
print(results3['src'])

### Iterate over multiple pages and scrape content from each

In [None]:
# for i in range(100):
#     html = browser.html
#     soup = bsoup(html, 'html.parser')

#     quotes = soup.find_all('span', class_='text')
    
#     print(f'PAGE: {i+1}')
#     for q in quotes:
#         print('----------')
#         print(q.text)
#     try:
#         browser.click_link_by_partial_text("Next")
#     except Exception:
#         break
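The commented-out cell above follows a common pattern: parse the current page, click "Next", and stop when there is no such link. The sketch below writes that pattern as a function. It assumes a Splinter version that exposes `browser.links.find_by_partial_text` (the newer counterpart of `click_link_by_partial_text`), and the names `scrape_all_pages` and `parse_page` are hypothetical.

```python
def scrape_all_pages(browser, parse_page, next_text="Next", max_pages=100):
    """Collect parse_page(html) results from each page until no next link is left.

    `browser` is assumed to behave like a Splinter browser;
    `parse_page` takes an HTML string and returns a list of items.
    """
    results = []
    for _ in range(max_pages):
        # Parse whatever page the browser is currently showing
        results.extend(parse_page(browser.html))

        # Look for a link whose text contains `next_text`
        next_links = browser.links.find_by_partial_text(next_text)
        if next_links.is_empty():
            break  # last page reached
        next_links.first.click()
    return results
```

For the quotes example in the commented cell, `parse_page` could be `lambda html: [q.text for q in bsoup(html, 'html.parser').find_all('span', class_='text')]`. Checking for the link explicitly avoids using an exception to end the loop.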

### Close the browser window
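In a notebook, calling `browser.quit()` in a final cell closes the window and stops the driver process. For script-style code, the sketch below shows one way to make sure that call always runs, even when a scraping step raises an error. `open_browser` is a hypothetical helper, and `factory` stands for any zero-argument callable that returns a browser object with a `quit()` method.

```python
from contextlib import contextmanager


@contextmanager
def open_browser(factory):
    """Yield a browser created by `factory` and always quit it afterwards."""
    browser = factory()
    try:
        yield browser
    finally:
        # Runs on normal exit and on errors, so no stray browser windows remain
        browser.quit()
```

With Splinter this could be used as `with open_browser(lambda: Browser('chrome', **executable_path, headless=False)) as browser:` followed by the scraping steps from the cells above.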