# Module 12 Challenge
## Deliverable 1: Scrape Titles and Preview Text from Mars News

In [33]:
# Import Splinter and BeautifulSoup
from splinter import Browser
from bs4 import BeautifulSoup as soup
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
import json
from google.colab import files

In [18]:
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)

[WDM] - Downloading: 100%|██████████| 8.72M/8.72M [00:00<00:00, 15.7MB/s]


### Step 1: Visit the Website

1. Use automated browsing to visit the [Mars NASA news site](https://redplanetscience.com). Inspect the page to identify which elements to scrape.

      > **Hint** To identify which elements to scrape, you might want to inspect the page by using Chrome DevTools.

In [19]:
# Visit the Mars NASA news site: https://redplanetscience.com
url = 'https://redplanetscience.com'
browser.visit(url)
browser.is_element_present_by_css('div.list_text', wait_time=1)

True

### Step 2: Scrape the Website

Create a Beautiful Soup object and use it to extract text elements from the website.

In [20]:
# Create a Beautiful Soup object
html = browser.html

In [21]:
mars_soup = soup(html, "html.parser")

In [22]:
# Extract all the text elements
title = mars_soup.find_all('div', class_='content_title')
preview = mars_soup.find_all('div', class_='article_teaser_body')

### The approach above did not work very well, so try another: select each article's container first.

### Extract all text elements.

In [23]:
news_data = mars_soup.find_all('div', class_='col-md-8')
# Confirm that the articles were retrieved
print('Article Count: ', len(news_data))
for article in news_data:
    print(article)

Article Count:  15
<div class="col-md-8">
<div class="list_text">
<div class="list_date">January 23, 2023</div>
<div class="content_title">NASA's Perseverance Rover 100 Days Out</div>
<div class="article_teaser_body">Mark your calendars: The agency's latest rover has only about 8,640,000 seconds to go before it touches down on the Red Planet, becoming history's next Mars car.</div>
</div>
</div>
<div class="col-md-8">
<div class="list_text">
<div class="list_date">January 22, 2023</div>
<div class="content_title">MAVEN Maps Electric Currents around Mars that are Fundamental to Atmospheric Loss</div>
<div class="article_teaser_body">Five years after NASA’s MAVEN spacecraft entered into orbit around Mars, data from the mission has led to the creation of a map of electric current systems in the Martian atmosphere.</div>
</div>
</div>
<div class="col-md-8">
<div class="list_text">
<div class="list_date">January 20, 2023</div>
<div class="content_title">NASA Moves Forward With Campaign to R

In [24]:
for article in news_data:
    # get article title
    title = article.find('div', class_='content_title')
    # get article preview
    preview = article.find('div', class_='article_teaser_body')
    # confirm results
    print(title.text, preview.text)

NASA's Perseverance Rover 100 Days Out Mark your calendars: The agency's latest rover has only about 8,640,000 seconds to go before it touches down on the Red Planet, becoming history's next Mars car.
MAVEN Maps Electric Currents around Mars that are Fundamental to Atmospheric Loss Five years after NASA’s MAVEN spacecraft entered into orbit around Mars, data from the mission has led to the creation of a map of electric current systems in the Martian atmosphere.
NASA Moves Forward With Campaign to Return Mars Samples to Earth During this next phase, the program will mature critical technologies and make critical design decisions as well as assess industry partnerships.
NASA's Mars Helicopter Attached to Mars 2020 Rover  The helicopter will be first aircraft to perform flight tests on another planet.
A New Video Captures the Science of NASA's Perseverance Mars Rover With a targeted launch date of July 30, the next robotic scientist NASA is sending to the to the Red Planet has big ambitio

### Step 3: Store the Results

Extract the titles and preview text of the news articles that you scraped. Store the scraping results in Python data structures as follows:

* Store each title-and-preview pair in a Python dictionary, and give each dictionary two keys: `title` and `preview`. For example:

  ```python
  {'title': "Mars Rover Begins Mission!",
   'preview': "NASA's Mars Rover begins a multiyear mission to collect data about the little-explored planet."}
  ```

* Store all the dictionaries in a Python list.

* Print the list in your notebook.
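The storage steps above can be sketched without a browser session. This is a minimal sketch that assumes the titles and previews are already available as plain strings. `to_records` and `scraped_pairs` are hypothetical names, and stripping surrounding whitespace is an added choice, not something the assignment requires.

```python
# Stand-in data copied from the scraped output. In the notebook these
# strings would come from `title.text` and `preview.text`.
scraped_pairs = [
    (
        "NASA's Perseverance Rover 100 Days Out",
        "Mark your calendars: The agency's latest rover has only about "
        "8,640,000 seconds to go before it touches down on the Red Planet, "
        "becoming history's next Mars car.",
    ),
    (
        # Note the trailing space, as it appears in the scraped title.
        "NASA's Mars Helicopter Attached to Mars 2020 Rover ",
        "The helicopter will be first aircraft to perform flight tests "
        "on another planet.",
    ),
]


def to_records(pairs):
    """Turn (title, preview) string pairs into a list of dictionaries."""
    return [
        {'title': title.strip(), 'preview': preview.strip()}
        for title, preview in pairs
    ]


news_records = to_records(scraped_pairs)
print(news_records)
```

Stripping the strings removes stray whitespace such as the trailing space in the second title, which would otherwise be carried into any exported file.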

In [25]:
# Create an empty list to store the dictionaries
news = []

In [26]:
# Loop through the text elements
# Extract the title and preview text from the elements
# Store each title and preview pair in a dictionary
# Add the dictionary to the list
for article in news_data:
    # Get the article title
    title = article.find('div', class_='content_title')
    # Get the article preview
    preview = article.find('div', class_='article_teaser_body')
    # Store the title and preview text in a dictionary
    article_data = {'title': title.text, 'preview': preview.text}
    # Add the dictionary to the list
    news.append(article_data)

# Optional: view the results as a DataFrame. This was added while debugging
# an error and kept because the table is easy to read.
df = pd.DataFrame(news)
df

|   | title | preview |
|---|-------|---------|
| 0 | NASA's Perseverance Rover 100 Days Out | Mark your calendars: The agency's latest rover... |
| 1 | MAVEN Maps Electric Currents around Mars that ... | Five years after NASA’s MAVEN spacecraft enter... |
| 2 | NASA Moves Forward With Campaign to Return Mar... | During this next phase, the program will matur... |
| 3 | NASA's Mars Helicopter Attached to Mars 2020 R... | The helicopter will be first aircraft to perfo... |
| 4 | A New Video Captures the Science of NASA's Per... | With a targeted launch date of July 30, the ne... |
| 5 | NASA's Curiosity Mars Rover Takes a New Selfie... | Along with capturing an image before its steep... |
| 6 | NASA's Push to Save the Mars InSight Lander's ... | The scoop on the end of the spacecraft's robot... |
| 7 | 5 Hidden Gems Are Riding Aboard NASA's Perseve... | The symbols, mottos, and small objects added t... |
| 8 | NASA Prepares for Moon and Mars With New Addit... | Robotic spacecraft will be able to communicate... |
| 9 | Robotic Toolkit Added to NASA's Mars 2020 Rover | The bit carousel, which lies at the heart of t... |


In [32]:
news

[{'title': "NASA's Perseverance Rover 100 Days Out",
  'preview': "Mark your calendars: The agency's latest rover has only about 8,640,000 seconds to go before it touches down on the Red Planet, becoming history's next Mars car."},
 {'title': 'MAVEN Maps Electric Currents around Mars that are Fundamental to Atmospheric Loss',
  'preview': 'Five years after NASA’s MAVEN spacecraft entered into orbit around Mars, data from the mission has led to the creation of a map of electric current systems in the Martian atmosphere.'},
 {'title': 'NASA Moves Forward With Campaign to Return Mars Samples to Earth',
  'preview': 'During this next phase, the program will mature critical technologies and make critical design decisions as well as assess industry partnerships.'},
 {'title': "NASA's Mars Helicopter Attached to Mars 2020 Rover ",
  'preview': 'The helicopter will be first aircraft to perform flight tests on another planet.'},
 {'title': "A New Video Captures the Science of NASA's Perseveranc

In [28]:
browser.quit()

### (Optional) Step 4: Export the Data

Optionally, store the scraped data in a file or database (to ease sharing the data with others). To do so, export the scraped data to either a JSON file or a MongoDB database.
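The JSON-file option could look like the sketch below. It assumes a hypothetical output filename, `mars_news.json`, and uses two records from the scraped output as a stand-in for the full `news` list. `export_news` and `load_news` are illustrative helper names, not part of the original notebook. Note that `json.dump` writes to an open file, whereas `json.dumps` only returns a string.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the scraped list; in the notebook this would be `news`.
sample_news = [
    {
        'title': "NASA's Perseverance Rover 100 Days Out",
        'preview': "Mark your calendars: The agency's latest rover has only "
                   "about 8,640,000 seconds to go before it touches down on "
                   "the Red Planet, becoming history's next Mars car.",
    },
    {
        'title': 'MAVEN Maps Electric Currents around Mars that are '
                 'Fundamental to Atmospheric Loss',
        'preview': 'Five years after NASA’s MAVEN spacecraft entered into '
                   'orbit around Mars, data from the mission has led to the '
                   'creation of a map of electric current systems in the '
                   'Martian atmosphere.',
    },
]


def export_news(records, path):
    """Write a list of title/preview dictionaries to a JSON file."""
    path = Path(path)
    with path.open('w', encoding='utf-8') as f:
        # ensure_ascii=False keeps characters such as ’ readable in the file
        json.dump(records, f, ensure_ascii=False, indent=2)
    return path


def load_news(path):
    """Read the exported records back from a JSON file."""
    with Path(path).open('r', encoding='utf-8') as f:
        return json.load(f)


# Round trip in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_file = export_news(sample_news, Path(tmp_dir) / 'mars_news.json')
    round_trip = load_news(out_file)

print(round_trip == sample_news)
```

In the notebook, calling `export_news(news, 'mars_news.json')` would write the scraped list to a file in the working directory.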

In [41]:
# Serialize the data to a JSON string (json.dumps returns the string; it does not write a file)
json.dumps(news)

'[{"title": "NASA\'s Perseverance Rover 100 Days Out", "preview": "Mark your calendars: The agency\'s latest rover has only about 8,640,000 seconds to go before it touches down on the Red Planet, becoming history\'s next Mars car."}, {"title": "MAVEN Maps Electric Currents around Mars that are Fundamental to Atmospheric Loss", "preview": "Five years after NASA\\u2019s MAVEN spacecraft entered into orbit around Mars, data from the mission has led to the creation of a map of electric current systems in the Martian atmosphere."}, {"title": "NASA Moves Forward With Campaign to Return Mars Samples to Earth", "preview": "During this next phase, the program will mature critical technologies and make critical design decisions as well as assess industry partnerships."}, {"title": "NASA\'s Mars Helicopter Attached to Mars 2020 Rover ", "preview": "The helicopter will be first aircraft to perform flight tests on another planet."}, {"title": "A New Video Captures the Science of NASA\'s Perseveranc

In [30]:
# Export data to MongoDB
