# Module 12 Challenge
## Deliverable 1: Scrape Titles and Preview Text from Mars News

In [60]:
# Import Splinter, BeautifulSoup, and supporting libraries
from splinter import Browser
from bs4 import BeautifulSoup as soup
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
import json 

In [2]:
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)

### Step 1: Visit the Website

1. Use automated browsing to visit the [Mars NASA news site](https://redplanetscience.com). Inspect the page to identify which elements to scrape.

      > **Hint:** To identify which elements to scrape, inspect the page with Chrome DevTools.

In [3]:
# Visit the Mars NASA news site: https://redplanetscience.com
url = 'https://redplanetscience.com'
browser.visit(url)

# Optional delay: wait up to 1 second for the article list to load
browser.is_element_present_by_css('div.list_text', wait_time=1)

True

### Step 2: Scrape the Website

Create a Beautiful Soup object and use it to extract text elements from the website.

In [6]:
# 1. Create a Beautiful Soup object
html = browser.html
news_soup = soup(html, 'html.parser')
slide_elem = news_soup.select_one('div.list_text')

In [11]:
# 2. Extract the text of the first article block (date, title, and preview)
first_article_text = news_soup.find('div', class_='col-md-8').text
print(first_article_text)



January 5, 2023
NASA's New Mars Rover Will Use X-Rays to Hunt Fossils
PIXL, an instrument on the end of the Perseverance rover's arm, will search for chemical fingerprints left by ancient microbes.




In [8]:
# Assign the title element to a variable that we can reference later
title_elem = slide_elem.find('div', class_='content_title')
print(title_elem)

<div class="content_title">NASA's New Mars Rover Will Use X-Rays to Hunt Fossils</div>


In [9]:
# Extracting the title text
title = title_elem.get_text()
print(title)

NASA's New Mars Rover Will Use X-Rays to Hunt Fossils


In [10]:
# Use the parent element to find the preview paragraph and extract its text
preview = slide_elem.find('div', class_='article_teaser_body').text
preview

"PIXL, an instrument on the end of the Perseverance rover's arm, will search for chemical fingerprints left by ancient microbes."

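The same lookups can be tried offline, without launching a browser. The following is a minimal sketch under stated assumptions: `sample_html` is a hypothetical stand-in for `browser.html` that reuses the class names seen above, and the `None` check is an added safeguard because `select_one` returns `None` when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for browser.html; only the class names come from the page above.
sample_html = """
<div class="col-md-8">
  <div class="list_text">
    <div class="content_title">NASA's New Mars Rover Will Use X-Rays to Hunt Fossils</div>
    <div class="article_teaser_body">PIXL, an instrument on the end of the Perseverance rover's arm, will search for chemical fingerprints left by ancient microbes.</div>
  </div>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# select_one returns the first match, or None if the selector matches nothing.
first_article = sample_soup.select_one("div.list_text")
if first_article is None:
    raise ValueError("No 'div.list_text' element found; the page may not have loaded.")

# get_text(strip=True) returns the text with surrounding whitespace removed.
sample_title = first_article.find("div", class_="content_title").get_text(strip=True)
sample_preview = first_article.find("div", class_="article_teaser_body").get_text(strip=True)

print(sample_title)
print(sample_preview)
```

Checking for `None` before calling `find` turns a confusing `AttributeError` into a clear message when the page has not finished loading.
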
### Step 3: Store the Results

Extract the titles and preview text of the news articles that you scraped. Store the scraping results in Python data structures as follows:

* Store each title-and-preview pair in a Python dictionary, and give each dictionary two keys: `title` and `preview`. For example:

  ```python
  {'title': "Mars Rover Begins Mission!",
   'preview': "NASA's Mars Rover begins a multiyear mission to collect data about the little-explored planet."}
  ```

* Store all the dictionaries in a Python list.

* Print the list in your notebook.
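
One way to build such a list is to walk each article container and read the title and preview from inside it, so a pair cannot drift out of alignment if one element is missing. This is a sketch, not the required solution: the helper name `extract_articles` and the `sample_html` snippet are hypothetical, and it assumes each article sits in its own `div.list_text` container.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the class names match those used in this notebook.
sample_html = """
<div class="list_text">
  <div class="content_title">NASA's New Mars Rover Will Use X-Rays to Hunt Fossils</div>
  <div class="article_teaser_body">PIXL, an instrument on the end of the Perseverance rover's arm, will search for chemical fingerprints left by ancient microbes.</div>
</div>
<div class="list_text">
  <div class="content_title">3 Things We've Learned From NASA's Mars InSight </div>
  <div class="article_teaser_body">Scientists are finding new mysteries since the geophysics mission landed two years ago.</div>
</div>
"""


def extract_articles(html):
    """Return a list of {'title', 'preview'} dictionaries, one per article container."""
    page = BeautifulSoup(html, "html.parser")
    articles = []
    for container in page.find_all("div", class_="list_text"):
        title_div = container.find("div", class_="content_title")
        preview_div = container.find("div", class_="article_teaser_body")
        # Skip containers that lack either element instead of raising an error.
        if title_div is None or preview_div is None:
            continue
        articles.append({
            "title": title_div.get_text(strip=True),
            "preview": preview_div.get_text(strip=True),
        })
    return articles


sample_articles = extract_articles(sample_html)
print(sample_articles)
```

With the live page, the same helper could be called as `extract_articles(browser.html)`.
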

In [54]:
# Find all divs that contain a news title
news_divs_title = news_soup.find_all('div', class_="content_title")
print(news_divs_title)

[<div class="content_title">NASA's New Mars Rover Will Use X-Rays to Hunt Fossils</div>, <div class="content_title">NASA Readies Perseverance Mars Rover's Earthly Twin </div>, <div class="content_title">Global Storms on Mars Launch Dust Towers Into the Sky</div>, <div class="content_title">3 Things We've Learned From NASA's Mars InSight </div>, <div class="content_title">Sensors on Mars 2020 Spacecraft Answer Long-Distance Call From Earth</div>, <div class="content_title">NASA's Perseverance Rover Is Midway to Mars </div>, <div class="content_title">NASA Adds Return Sample Scientists to Mars 2020 Leadership Team</div>, <div class="content_title">NASA's MAVEN Maps Winds in the Martian Upper Atmosphere that Mirror the Terrain Below and Gives Clues to Martian Climate</div>, <div class="content_title">Robotic Toolkit Added to NASA's Mars 2020 Rover</div>, <div class="content_title">Mars Scientists Investigate Ancient Life in Australia</div>, <div class="content_title">NASA's Push to Save t

In [55]:
# Find all divs that contain a news preview (teaser body)
news_divs_body = news_soup.find_all('div', class_="article_teaser_body")
print(news_divs_body)

[<div class="article_teaser_body">PIXL, an instrument on the end of the Perseverance rover's arm, will search for chemical fingerprints left by ancient microbes.</div>, <div class="article_teaser_body">Did you know NASA's next Mars rover has a nearly identical sibling on Earth for testing? Even better, it's about to roll for the first time through a replica Martian landscape.</div>, <div class="article_teaser_body">A Mars Dust Tower Stands Out Dust storms are common on Mars. But every decade or so, something unpredictable happens: a series of runaway storms break out, covering the entire planet in a dusty haze.</div>, <div class="article_teaser_body">Scientists are finding new mysteries since the geophysics mission landed two years ago.</div>, <div class="article_teaser_body">Instruments tailored to collect data during the descent of NASA's next rover through the Red Planet's atmosphere have been checked in flight.</div>, <div class="article_teaser_body">Sometimes half measures can be 

In [69]:
# Create an empty list to store the dictionaries
news_list = []

# Pair each title with its preview and store the plain text of both
for title_div, body_div in zip(news_divs_title, news_divs_body):
    news_dict = {
        "title": title_div.get_text(strip=True),
        "preview": body_div.get_text(strip=True),
    }
    news_list.append(news_dict)

news_list

[{'title': "NASA's New Mars Rover Will Use X-Rays to Hunt Fossils",
  'preview': "PIXL, an instrument on the end of the Perseverance rover's arm, will search for chemical fingerprints left by ancient microbes."},
 {'title': "NASA Readies Perseverance Mars Rover's Earthly Twin",
  'preview': "Did you know NASA's next Mars rover has a nearly identical sibling on Earth for testing? Even better, it's about to roll for the first time through a replica Martian landscape."},
 {'title': 'Global Storms on Mars Launch Dust Towers Into the Sky',
  'preview': 'A Mars Dust Tower Stands Out Dust storms are common on Mars. But every decade or so, something unpredictable happens: a series of runaway storms break out, covering the entire planet in a dusty haze.'},
 {'title': "3 Things We've Learned From NASA's Mars InSight",
  'preview': 'Scientists are finding new mysteries since the geophysics mission landed two years ago.'},
 {'title': 'Sensors on Mars 2020 Spacecraft Answer Long-Distance Call From Earth',
  'preview': "Instruments tailored to collect data during the descent of NASA's next rover through the Red Planet's atmosphere have been checked in flight."},
 {'title': "NASA's Perseverance Rover Is Midway to Mars",

### (Optional) Step 4: Export the Data

Optionally, store the scraped data in a file or database (to ease sharing the data with others). To do so, export the scraped data to either a JSON file or a MongoDB database.

In [73]:
# Build a DataFrame from the list of dictionaries (one row per article)
mars_news = pd.DataFrame(news_list)
mars_news

| | title | preview |
|---|---|---|
| 0 | NASA's New Mars Rover Will Use X-Rays to Hunt... | PIXL, an instrument on the end of the Perseve... |
| 1 | NASA Readies Perseverance Mars Rover's Earthl... | Did you know NASA's next Mars rover has a nea... |
| 2 | Global Storms on Mars Launch Dust Towers Into... | A Mars Dust Tower Stands Out Dust storms are ... |
| 3 | 3 Things We've Learned From NASA's Mars InSig... | Scientists are finding new mysteries since th... |
| 4 | Sensors on Mars 2020 Spacecraft Answer Long-D... | Instruments tailored to collect data during t... |
| 5 | NASA's Perseverance Rover Is Midway to Mars | Sometimes half measures can be a good thing –... |
| 6 | NASA Adds Return Sample Scientists to Mars 20... | The leadership council for Mars 2020 science ... |
| 7 | NASA's MAVEN Maps Winds in the Martian Upper ... | Researchers have created the first map of win... |
| 8 | Robotic Toolkit Added to NASA's Mars 2020 Rover | The bit carousel, which lies at the heart of ... |
| 9 | Mars Scientists Investigate Ancient Life in A... | Teams with NASA's Mars 2020 and ESA's ExoMars... |
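
For the JSON option mentioned in this step, the standard-library `json` module is enough. The sketch below rests on a few assumptions: `sample_news` is a two-item stand-in for the scraped list, the file name `mars_news.json` is hypothetical, and the file is written to the system temporary directory so the example leaves the working folder untouched.

```python
import json
import os
import tempfile

# Two-item stand-in for the scraped list of title/preview dictionaries.
sample_news = [
    {
        "title": "NASA's New Mars Rover Will Use X-Rays to Hunt Fossils",
        "preview": "PIXL, an instrument on the end of the Perseverance rover's arm, "
                   "will search for chemical fingerprints left by ancient microbes.",
    },
    {
        "title": "3 Things We've Learned From NASA's Mars InSight",
        "preview": "Scientists are finding new mysteries since the geophysics "
                   "mission landed two years ago.",
    },
]

# Hypothetical output location; change the path to keep the file with the notebook.
output_path = os.path.join(tempfile.gettempdir(), "mars_news.json")

# ensure_ascii=False keeps non-ASCII characters readable in the file.
with open(output_path, "w", encoding="utf-8") as json_file:
    json.dump(sample_news, json_file, indent=2, ensure_ascii=False)

# Read the file back to confirm the data survived the round trip.
with open(output_path, encoding="utf-8") as json_file:
    reloaded = json.load(json_file)

print(reloaded == sample_news)
```

A list of plain-string dictionaries serializes directly; a list that still holds BeautifulSoup `Tag` objects would raise a `TypeError`, which is one more reason to extract the text first.
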


In [59]:
browser.quit()