# Mission to Mars

### Part 1: Scraping information from the following sites

* Mars News site: https://redplanetscience.com/
* Mars Images site: https://spaceimages-mars.com/
* Mars Facts page: https://galaxyfacts-mars.com/
* Mars Hemispheres images: https://marshemispheres.com/

In [1]:
# Dependencies
import pandas as pd
from splinter import Browser
from bs4 import BeautifulSoup
from webdriver_manager.chrome import ChromeDriverManager
import requests
import os

#### 1a: Mars News site: https://redplanetscience.com/
* Get the latest article title
* Get the latest article description paragraph

In [2]:
# Open the browser window - set up splinter
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)

In [3]:
# set URL to scrape
url = 'https://redplanetscience.com/'

# open the page in the browser
browser.visit(url)

In [4]:
# Create BeautifulSoup object; parse with 'html.parser'
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

In [5]:
#Locate the section containing the news articles
container = soup.find('section', class_='image_and_description_container')

#Locate the title and descriptive paragraph for the first article and save to variables
news = container.find('div', class_='content_title').text
news_p = container.find('div', class_='article_teaser_body').text
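
The lookups above raise an `AttributeError` if the page has not finished rendering and `find` returns `None`. A minimal sketch of a guarded version follows, run against a made-up HTML snippet that only mimics the class names used above. The helper name `first_article` and the sample markup are assumptions for illustration, not part of the original notebook.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the class names used in the cell above.
SAMPLE_HTML = """
<section class="image_and_description_container">
  <div class="content_title"> Sample headline </div>
  <div class="article_teaser_body"> Sample teaser paragraph. </div>
</section>
"""


def first_article(html):
    """Return (title, teaser) for the first article, or None if either is missing."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("div", class_="content_title")
    teaser_tag = soup.find("div", class_="article_teaser_body")
    if title_tag is None or teaser_tag is None:
        return None
    # get_text(strip=True) also trims the surrounding whitespace
    return title_tag.get_text(strip=True), teaser_tag.get_text(strip=True)


sample_result = first_article(SAMPLE_HTML)
missing_result = first_article("<html><body></body></html>")
```

Returning `None` instead of raising lets the caller decide whether to retry the page load or skip the article.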


In [6]:
browser.quit()

#### 1b: Mars Images site: https://spaceimages-mars.com/
* Get the URL for the featured Mars image (full sized)

In [7]:
# Open the browser window - set up splinter
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)

In [8]:
# set URL to scrape
url = 'https://spaceimages-mars.com/'

# open the page in the browser
browser.visit(url)

In [9]:
# Create BeautifulSoup object; parse with 'html.parser'
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

In [10]:
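# Find the relative link to the full-sized featured image
image = soup.find('div', class_="floating_text_area").a['href']
# Build the absolute URL (the base URL already ends with '/')
featured_image_url = f'{url}{image}'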
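
Plain string concatenation works here only because the base URL ends with `/` and the scraped path does not start with one. As a sketch of a more forgiving alternative, the standard library's `urljoin` handles both cases. The relative paths below are hypothetical placeholders standing in for the scraped `href`.

```python
from urllib.parse import urljoin

base_url = "https://spaceimages-mars.com/"

# Hypothetical relative paths standing in for the scraped href
relative_path = "image/featured/example.jpg"
rooted_path = "/image/featured/example.jpg"

# urljoin resolves the path against the base, with or without a leading slash
joined_relative = urljoin(base_url, relative_path)
joined_rooted = urljoin(base_url, rooted_path)

# Plain concatenation produces a double slash for the rooted path
concatenated_rooted = base_url + rooted_path
```

Both `joined_relative` and `joined_rooted` resolve to the same address, while `concatenated_rooted` contains `//image`.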

In [11]:
browser.quit()

#### 1c: Mars Facts page: https://galaxyfacts-mars.com/
* Get the table containing Mars facts (diameter, mass, etc.)*
* Use Pandas to convert it to an HTML table string

###### *Note: I selected the table with just Mars facts, not the one comparing Earth and Mars, since this project is about Mars.

In [14]:
# set URL to scrape
url = 'https://galaxyfacts-mars.com/'
# use pandas to find all tables on the page
tables = pd.read_html(url)
# select the second table (the Mars-only facts in the sidebar);
# the first table is the Mars/Earth comparison on the main page
df = tables[1]

In [15]:
# Turn the DataFrame into an HTML table string (Bootstrap's class is 'table-striped')
html_table = df.to_html(classes='table table-striped', header=False, index=False)
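
To see what `to_html` produces with these arguments without hitting the website, here is a small sketch on a stand-in DataFrame. The row values are placeholders, not the real Mars facts, and the two-column layout is an assumption about what `pd.read_html` returns for the sidebar table.

```python
import pandas as pd

# Stand-in for the scraped table: label/value pairs with placeholder values
sample_df = pd.DataFrame(
    [["Diameter:", "placeholder value"], ["Mass:", "placeholder value"]]
)

# Same arguments as the cell above: Bootstrap classes, no header row, no index column
sample_html = sample_df.to_html(
    classes="table table-striped", header=False, index=False
)
```

The resulting string starts with a `<table>` tag whose `class` attribute includes the Bootstrap classes, so it can be dropped straight into a template.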

####  1d: Mars Hemispheres images: https://marshemispheres.com/
* Get the urls for each of the hemispheres (full resolution image)
* store the hemisphere title and URL string to a list with one dictionary for each hemisphere

In [20]:
# Mars Hemispheres URL to scrape
url = 'https://marshemispheres.com/'
# Retrieve page with the requests module
response = requests.get(url)
# Create BeautifulSoup object; parse with 'html.parser'
soup = BeautifulSoup(response.text, 'html.parser')
#find section with the Hemisphere pages
results = soup.find_all('div', class_='item')
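
The `div.item` blocks found above each carry a link to a hemisphere page and an `h3` title. The sketch below shows one way to pull those pairs out defensively, using made-up index-page HTML. The helper name `hemisphere_links`, the `page_url` key, and the sample markup are assumptions for illustration only.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://marshemispheres.com/"

# Made-up index-page HTML; the second item deliberately has no link
SAMPLE_INDEX = """
<div class="item">
  <a href="first.html"><img src="thumb1.png"></a>
  <div class="description"><a href="first.html"><h3>First Hemisphere</h3></a></div>
</div>
<div class="item">
  <div class="description"><h3>No Link Here</h3></div>
</div>
"""


def hemisphere_links(html, base_url):
    """Return a list of {'title', 'page_url'} dicts, skipping incomplete items."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for item in soup.find_all("div", class_="item"):
        anchor = item.find("a", href=True)
        heading = item.find("h3")
        if anchor is None or heading is None:
            continue  # skip items that lack a link or a title
        links.append(
            {
                "title": heading.get_text(strip=True),
                "page_url": urljoin(base_url, anchor["href"]),
            }
        )
    return links


sample_links = hemisphere_links(SAMPLE_INDEX, BASE_URL)
```

Skipping incomplete items keeps one malformed block from aborting the whole scrape.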

In [21]:
hemisphere_image_urls = []

# Open the browser window once and reuse it for every hemisphere page
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=True)

# For each hemisphere
for result in results:
    # Find the address of the hemisphere page and get the hemisphere name
    webadd = result.a['href']
    hemi_name = result.find('h3').text

    # Build the hemisphere page URL and open it in the browser
    hemi_url = url + webadd
    browser.visit(hemi_url)

    # Create BeautifulSoup object; parse with 'html.parser'
    hemi_html = browser.html
    hemi_soup = BeautifulSoup(hemi_html, 'html.parser')

    # Find the URL of the full-resolution image
    hemi_image = hemi_soup.find('img', class_='wide-image')['src']
    hemi_image_url = url + hemi_image

    # Dictionary to be inserted into the list
    hemi_dict = {
        'title': hemi_name,
        'img_url': hemi_image_url
    }

    # Append dictionary to list
    hemisphere_image_urls.append(hemi_dict)

# Close the browser
browser.quit()

In [None]:
# Create an empty dict for listings that we can save to Mongo
# Populate the dictionary with key-value pairs for Mars info and images

MarsInfo = {}
MarsInfo["NewsArtTitle"] = news
MarsInfo["NewsArtDesc"] = news_p 
MarsInfo["FeatImage"] = featured_image_url 
MarsInfo["FactTable"] = html_table
MarsInfo["HemiImages"] = hemisphere_image_urls 
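
Since the dictionary is meant to be saved to Mongo, it can help to check that it holds only plain strings, lists, and dicts before inserting it. The sketch below builds the same keys from placeholder values and uses a JSON round trip as a rough proxy for that check. The helper name `build_mars_info` and all sample values are hypothetical, and a JSON round trip is not a full guarantee of BSON compatibility.

```python
import json


def build_mars_info(news_title, news_teaser, featured_image_url,
                    fact_table_html, hemisphere_images):
    """Assemble the scraped pieces into one dictionary (same keys as above)."""
    return {
        "NewsArtTitle": news_title,
        "NewsArtDesc": news_teaser,
        "FeatImage": featured_image_url,
        "FactTable": fact_table_html,
        "HemiImages": hemisphere_images,
    }


# Placeholder values standing in for the scraped results
sample_info = build_mars_info(
    "Sample title",
    "Sample teaser",
    "https://example.com/featured.jpg",
    "<table></table>",
    [{"title": "Sample Hemisphere", "img_url": "https://example.com/hemi.jpg"}],
)

# Rough check that the document contains only plain, serializable types
round_tripped = json.loads(json.dumps(sample_info))
```

If a BeautifulSoup `Tag` or other non-serializable object slipped into the dictionary, `json.dumps` would raise a `TypeError` here rather than at insert time.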