# Module 12 Challenge
## Deliverable 1: Scrape Titles and Preview Text from Mars News

In [43]:
# Import Splinter and BeautifulSoup
from splinter import Browser
from bs4 import BeautifulSoup as bs
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
import pymongo
import requests
import json
import pprint

In [37]:
# Connect to the local MongoDB server
conn = "mongodb://localhost:27017"
client = pymongo.MongoClient(conn)

In [38]:
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)



### Step 1: Visit the Website

1. Use automated browsing to visit the [Mars NASA news site](https://redplanetscience.com). Inspect the page to identify which elements to scrape.

      > **Hint** To identify which elements to scrape, you might want to inspect the page by using Chrome DevTools.

In [39]:
# Visit the Mars NASA news site: https://redplanetscience.com
url = "https://redplanetscience.com"
browser.visit(url)


### Step 2: Scrape the Website

Create a Beautiful Soup object and use it to extract text elements from the website.
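As a minimal sketch of this step, the snippet below parses a small, made-up HTML fragment rather than the live page. The fragment and the `sample_` names are assumptions for illustration only; the class names mirror the ones used in the cells that follow.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment imitating the structure of the news page
sample_html = """
<div class="list_text">
  <div class="content_title">Example Title One</div>
  <div class="article_teaser_body">Example preview one.</div>
</div>
<div class="list_text">
  <div class="content_title">Example Title Two</div>
  <div class="article_teaser_body">Example preview two.</div>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns every matching tag; class_ filters on the CSS class
sample_titles = [
    tag.get_text(strip=True)
    for tag in sample_soup.find_all("div", class_="content_title")
]
sample_previews = [
    tag.get_text(strip=True)
    for tag in sample_soup.find_all("div", class_="article_teaser_body")
]

print(sample_titles)
print(sample_previews)
```

Calling `get_text(strip=True)` returns only the text inside each tag, which is why the lists contain plain strings rather than `<div>` elements.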

In [40]:
# Create a Beautiful Soup object
mars_soup = bs(browser.html, 'html.parser')

In [41]:
# Scrape the URL for 'title' elements
title_news = mars_soup.find_all("div", class_='content_title')
print(title_news)



[<div class="content_title">NASA's Mars 2020 Comes Full Circle</div>, <div class="content_title">Screening Soon: 'The Pathfinders' Trains Lens on Mars</div>, <div class="content_title">My Culture, My Voice</div>, <div class="content_title">NASA's Mars Perseverance Rover Passes Flight Readiness Review</div>, <div class="content_title">NASA InSight's 'Mole' Is Out of Sight</div>, <div class="content_title">InSight's 'Mole' Team Peers into the Pit</div>, <div class="content_title">NASA's Mars 2020 Rover Closer to Getting Its Name</div>, <div class="content_title">AI Is Helping Scientists Discover Fresh Craters on Mars</div>, <div class="content_title">What's Mars Solar Conjunction, and Why Does It Matter?</div>, <div class="content_title">Heat and Dust Help Launch Martian Water Into Space, Scientists Find</div>, <div class="content_title">Nine Finalists Chosen in NASA's Mars 2020 Rover Naming Contest</div>, <div class="content_title">NASA's Mars 2020 Rover Completes Its First Drive</div>,

In [55]:
# Scrape the URL for 'paragraph' elements
news_para = mars_soup.find_all("div", class_="article_teaser_body")
print(news_para)

[<div class="article_teaser_body">Aiming to pinpoint the Martian vehicle's center of gravity, engineers took NASA's 2,300-pound Mars 2020 rover for a spin in the clean room at JPL. </div>, <div class="article_teaser_body">With the Mars 2020 mission ramping up, the documentary — the first of four about past JPL missions to the Red Planet to be shown at Caltech — tells a gripping backstory.</div>, <div class="article_teaser_body">In honor of Hispanic Heritage Month, Christina Hernandez, an instrument engineer on the Mars 2020 mission, talks about her childhood and journey to NASA.</div>, <div class="article_teaser_body">​The agency's Mars 2020 mission has one more big prelaunch review – the Launch Readiness Review, on July 27.</div>, <div class="article_teaser_body">Now that the heat probe is just below the Martian surface, InSight's arm will scoop some additional soil on top to help it keep digging so it can take Mars' temperature.</div>, <div class="article_teaser_body">Efforts to save

### Step 3: Store the Results

Extract the titles and preview text of the news articles that you scraped. Store the scraping results in Python data structures as follows:

* Store each title-and-preview pair in a Python dictionary, and give each dictionary two keys: `title` and `preview`. For example:

  ```python
  {'title': "Mars Rover Begins Mission!",
   'preview': "NASA's Mars Rover begins a multiyear mission to collect data about the little-explored planet."}
  ```

* Store all the dictionaries in a Python list.

* Print the list in your notebook.
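The target structure described above can be sketched with placeholder data. The `demo_` names and the sample strings are hypothetical and stand in for the scraped text.

```python
# Hypothetical sample data standing in for scraped titles and previews
demo_titles = ["Example Title One", "Example Title Two"]
demo_previews = ["Example preview one.", "Example preview two."]

# Build one dictionary per title-and-preview pair and collect them in a list
demo_listings = [
    {"title": title, "preview": preview}
    for title, preview in zip(demo_titles, demo_previews)
]

print(demo_listings)
```

Pairing two separate lists with `zip` assumes that both lists are in the same order and have the same length. Looping over each article's container element and reading the title and preview from inside it avoids depending on that assumption.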

In [8]:
# Create an empty list to store the dictionaries
listings = []
results = mars_soup.find_all('div', class_='list_text')


In [10]:
# Loop through the text elements
for result in results:
    # Find the title and preview text from the elements
    # Extract the title of the article
    title = result.find('div', class_='content_title')
    # Extract the article preview
    preview = result.find('div', class_='article_teaser_body')

    # Store each title and preview pair in a dictionary
    post = {
        'title': title.text,
        'preview': preview.text
    }

    # Add the dictionary to the list
    listings.append(post)

pprint.pprint(listings)

[{'title': 'NASA Moves Forward With Campaign to Return Mars Samples to Earth', 'preview': 'During this next phase, the program will mature critical technologies and make critical design decisions as well as assess industry partnerships.'}, {'title': 'Global Storms on Mars Launch Dust Towers Into the Sky', 'preview': 'A Mars Dust Tower Stands Out Dust storms are common on Mars. But every decade or so, something unpredictable happens: a series of runaway storms break out, covering the entire planet in a dusty haze.'}, {'title': "NASA's Mars Perseverance Rover Passes Flight Readiness Review", 'preview': "\u200bThe agency's Mars 2020 mission has one more big prelaunch review – the Launch Readiness Review, on July 27."}, {'title': "NASA's Mars Reconnaissance Orbiter Undergoes Memory Update", 'preview': 'Other orbiters will continue relaying data from Mars surface missions for a two-week period.'}, {'title': "Nine Finalists Chosen in NASA's Mars 2020 Rover Naming Contest", 'preview': "Nine f

In [30]:
browser.quit()

### (Optional) Step 4: Export the Data

Optionally, store the scraped data in a file or database (to ease sharing the data with others). To do so, export the scraped data to either a JSON file or a MongoDB database.

In [39]:
# Write the list to a JSON file (the Resources directory must already exist)
with open('Resources/listings.json', 'w', encoding='utf-8') as mars_file:
    json.dump(listings, mars_file, ensure_ascii=False, indent=4)
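One way to check an export like this is to read the file back and compare it with the original list. The sketch below does that with placeholder data in a temporary directory, so it does not touch `Resources/listings.json`. The `demo_` names and the sample record are assumptions for illustration.

```python
import json
import os
import tempfile

# Hypothetical record standing in for the scraped listings
demo_records = [
    {"title": "Example Title", "preview": "Example preview: 20 °C at noon."}
]

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "listings.json")

    # ensure_ascii=False writes non-ASCII characters as-is instead of \u escapes
    with open(demo_path, "w", encoding="utf-8") as out_file:
        json.dump(demo_records, out_file, ensure_ascii=False, indent=4)

    # Read the file back with the same encoding
    with open(demo_path, encoding="utf-8") as in_file:
        reloaded = json.load(in_file)

print(reloaded == demo_records)
```

Opening the file with `encoding="utf-8"` for both writing and reading keeps characters such as `°` intact regardless of the platform's default encoding.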
