<h1>Mission to Mars</h1>

<h4>Scraping several websites to collect information about Mars.</h4>

In [1]:
import lxml
import pandas as pd
import pymongo 
import requests 
from bs4 import BeautifulSoup as bs
from selenium import webdriver

<h3>NASA Mars News</h3>

Requests alone did not return the news content for this page, so I load it in a browser with Selenium and parse the rendered source with BeautifulSoup's lxml parser.

In [2]:
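# scrape Mars News
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

nasa_url = 'https://mars.nasa.gov/news/'

driver = webdriver.Firefox()
driver.get(nasa_url)
# implicitly_wait() only affects element lookups, so wait explicitly for the teaser
WebDriverWait(driver, 5).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'article_teaser_body')))

nasa_soup = bs(driver.page_source, 'lxml')
driver.quit()  # quit() ends the browser session, not just the current window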

In [3]:
# find latest news title and paragraph text
news_title = nasa_soup.body.find('div', class_='content_title').text
news_paragraph = nasa_soup.body.find('div', class_='article_teaser_body').text

print(news_title)
print(news_paragraph)

Mars Now
After a months-long contest among students to name NASA's newest Mars rover, the agency will reveal the winning name — and the winning student — this Thursday. 
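The title printed above, `Mars Now`, does not read like the headline for the teaser that follows it, so the first `content_title` div on the page appears to belong to a different element. A minimal sketch of scoping the lookup to a single news item is shown here. It assumes each article sits in an `li` element with class `slide`; that selector and the sample HTML are assumptions for illustration, not taken from the scraped page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: another block reuses the headline class
# before the first real news item appears.
sample_html = """
<div class="content_title">Mars Now</div>
<ul class="item_list">
  <li class="slide">
    <div class="content_title"><a href="/news/1/">Example headline</a></div>
    <div class="article_teaser_body">Example teaser text.</div>
  </li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Restrict the search to the first news item so title and teaser belong together
first_item = soup.select_one("li.slide")
scoped_title = first_item.find("div", class_="content_title").get_text(strip=True)
scoped_teaser = first_item.find("div", class_="article_teaser_body").get_text(strip=True)

print(scoped_title)
print(scoped_teaser)
```

Searching inside one item also guarantees that the title and the teaser come from the same article.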


<h3>JPL Mars Space Images - Featured Image</h3>

In [4]:
# scrape JPL
jpl_base = 'https://www.jpl.nasa.gov'
jpl_url = jpl_base+'/spaceimages/?search=&category=Mars'
response = requests.get(jpl_url)

jpl_soup = bs(response.text, 'html.parser')

In [5]:
# find featured image 
jpl_soup.find('article')['style']

"background-image: url('/spaceimages/images/wallpaper/PIA22574-1920x1200.jpg');"

In [6]:
# strip off everything that isn't the image path
image_str = jpl_soup.find('article')['style'].split(' ')[1].strip("url").strip("('');")
image_str

'/spaceimages/images/wallpaper/PIA22574-1920x1200.jpg'
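Chained `strip` calls remove sets of characters rather than fixed prefixes, so the cell above depends on the exact spacing and quoting of the `style` attribute. As a hedged alternative, a regular expression can capture whatever sits inside `url(...)`. The helper name `extract_css_url` is hypothetical.

```python
import re

def extract_css_url(style):
    """Return the path inside url(...) of a CSS declaration, or None if absent."""
    match = re.search(r"url\(\s*['\"]?(.*?)['\"]?\s*\)", style)
    return match.group(1) if match else None

# Style string copied from the cell output above
style = "background-image: url('/spaceimages/images/wallpaper/PIA22574-1920x1200.jpg');"
print(extract_css_url(style))
```

Returning `None` when no `url(...)` is present makes a layout change on the page easier to detect than an `IndexError` from `split`.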

In [7]:
# concatenate base url with image path 
featured_image_url = jpl_base + image_str
featured_image_url

'https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA22574-1920x1200.jpg'
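Plain concatenation works here because the image path starts with `/` and the base has no trailing slash. `urllib.parse.urljoin` from the standard library also copes with a trailing slash on the base; a short sketch using the values from the cells above:

```python
from urllib.parse import urljoin

jpl_base = 'https://www.jpl.nasa.gov'
image_str = '/spaceimages/images/wallpaper/PIA22574-1920x1200.jpg'

# urljoin resolves the root-relative path against the base URL
featured_image_url = urljoin(jpl_base, image_str)
print(featured_image_url)

# A trailing slash on the base does not produce a double slash
same_url = urljoin(jpl_base + '/', image_str)
```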

<h3>Mars Weather</h3>

In [8]:
# scrape Twitter
weather_url = 'https://twitter.com/marswxreport?lang=en'
response = requests.get(weather_url)

weather_soup = bs(response.text, 'html.parser')

In [9]:
# find most recent weather report
tweets = weather_soup.find_all('div', class_='js-tweet-text-container')

# the first container on the page holds the latest tweet
mars_weather = tweets[0].find('p').text

print(mars_weather)

InSight sol 450 (2020-03-02) low -93.5ºC (-136.4ºF) high -10.4ºC (13.3ºF)
winds from the SSW at 5.5 m/s (12.4 mph) gusting to 20.6 m/s (46.1 mph)
pressure at 6.30 hPapic.twitter.com/82lzRqibcC
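The scraped text keeps the tweet's line breaks and ends with the attached media link (`pic.twitter.com/...`) fused to the last word. A minimal sketch for tidying the string before display follows, assuming the media link always starts with `pic.twitter.com/`. `clean_tweet` is a hypothetical helper.

```python
import re

def clean_tweet(text):
    """Drop any pic.twitter.com link and collapse runs of whitespace."""
    without_link = re.sub(r"pic\.twitter\.com/\S+", "", text)
    return " ".join(without_link.split())

# Tweet text copied from the cell output above
raw = ("InSight sol 450 (2020-03-02) low -93.5ºC (-136.4ºF) high -10.4ºC (13.3ºF)\n"
       "winds from the SSW at 5.5 m/s (12.4 mph) gusting to 20.6 m/s (46.1 mph)\n"
       "pressure at 6.30 hPapic.twitter.com/82lzRqibcC")

cleaned = clean_tweet(raw)
print(cleaned)
```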


<h3>Mars Facts</h3>

In [18]:
# scrape Mars Facts 
facts_url = 'http://space-facts.com/mars/'
response = requests.get(facts_url)

facts_soup = bs(response.text, 'html.parser')

In [27]:
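# read_html fetches the page itself and returns one DataFrame per <table> element
facts_table = pd.read_html(facts_url)
mars_table = facts_table[0]

# label the two columns and index the table by description
mars_table.columns = ["Description", "Value"]
mars_table.set_index("Description", inplace=True)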

mars_table

Description,Value
Equatorial Diameter:,"6,792 km"
Polar Diameter:,"6,752 km"
Mass:,6.39 × 10^23 kg (0.11 Earths)
Moons:,2 (Phobos & Deimos)
Orbit Distance:,"227,943,824 km (1.38 AU)"
Orbit Period:,687 days (1.9 years)
Surface Temperature:,-87 to -5 °C
First Record:,2nd millennium BC
Recorded By:,Egyptian astronomers


In [28]:
# convert data to html table string 
mars_table_html = mars_table.to_html(header=False, index=False, justify="left")
mars_table_html

'<table border="1" class="dataframe">\n  <tbody>\n    <tr>\n      <td>6,792 km</td>\n    </tr>\n    <tr>\n      <td>6,752 km</td>\n    </tr>\n    <tr>\n      <td>6.39 × 10^23 kg (0.11 Earths)</td>\n    </tr>\n    <tr>\n      <td>2 (Phobos &amp; Deimos)</td>\n    </tr>\n    <tr>\n      <td>227,943,824 km (1.38 AU)</td>\n    </tr>\n    <tr>\n      <td>687 days (1.9 years)</td>\n    </tr>\n    <tr>\n      <td>-87 to -5 °C</td>\n    </tr>\n    <tr>\n      <td>2nd millennium BC</td>\n    </tr>\n    <tr>\n      <td>Egyptian astronomers</td>\n    </tr>\n  </tbody>\n</table>'
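Because `Description` was set as the index, `index=False` leaves the row labels out of the HTML, as the string above shows: only the values remain. A sketch with a small stand-in table (two rows copied from the table above) compares that call with one that keeps the index:

```python
import pandas as pd

# Stand-in for mars_table: two rows from the scraped table
sample = pd.DataFrame(
    {"Description": ["Mass:", "Moons:"],
     "Value": ["6.39 × 10^23 kg (0.11 Earths)", "2 (Phobos & Deimos)"]}
).set_index("Description")

labels_dropped = sample.to_html(header=False, index=False)
labels_kept = sample.to_html(header=False)  # index defaults to True

print("Mass:" in labels_dropped)
print("Mass:" in labels_kept)
```

If the labels should appear on the rendered page, leaving `index` at its default is the smaller change. Calling `reset_index()` before `to_html(index=False)` is another option.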

<h3>Mars Hemispheres</h3>

In [74]:
# scrape the USGS Astrogeology search results with Requests
hemi_url = "https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars"
response = requests.get(hemi_url)

hemi_soup = bs(response.text, 'html.parser')

In [76]:
# hemisphere urls dictionary 
hemi_base = 'https://astrogeology.usgs.gov'

hemi_imgs = [
    {
        # remove the word 'Enhanced'; str.strip('Enhanced') would strip a character set instead
        'Title': hemi.text.replace('Enhanced', '').strip(),
        'URL': hemi_base + hemi['href'],
    }
    for hemi in hemi_soup.find_all('a', class_='itemLink product-item')
]
hemi_imgs

[{'Title': 'Cerberus Hemisphere',
  'URL': 'https://astrogeology.usgs.gov/search/map/Mars/Viking/cerberus_enhanced'},
 {'Title': 'Schiaparelli Hemisphere',
  'URL': 'https://astrogeology.usgs.gov/search/map/Mars/Viking/schiaparelli_enhanced'},
 {'Title': 'Syrtis Major Hemisphere',
  'URL': 'https://astrogeology.usgs.gov/search/map/Mars/Viking/syrtis_major_enhanced'},
 {'Title': 'Valles Marineris Hemisphere',
  'URL': 'https://astrogeology.usgs.gov/search/map/Mars/Viking/valles_marineris_enhanced'}]

In [89]:
# less optimized way: collect links and titles in separate passes
images = hemi_soup.find_all(class_='itemLink product-item')

# find the image titles and iterate through them
h3_tags = hemi_soup.find_all('h3')
titles = [t.text for t in h3_tags]

for title in titles:
    print(title)
    
for image in images:
    image_url = hemi_base + image['href']
    print(image_url)      

Cerberus Hemisphere Enhanced
Schiaparelli Hemisphere Enhanced
Syrtis Major Hemisphere Enhanced
Valles Marineris Hemisphere Enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/cerberus_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/schiaparelli_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/syrtis_major_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/valles_marineris_enhanced
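The two loops above print titles and links in separate passes. Assuming both lists come back in the same page order, `zip` pairs them into one record per hemisphere. The lists below are copied from the output above so the sketch runs without a network request.

```python
titles = [
    'Cerberus Hemisphere Enhanced',
    'Schiaparelli Hemisphere Enhanced',
    'Syrtis Major Hemisphere Enhanced',
    'Valles Marineris Hemisphere Enhanced',
]
image_urls = [
    'https://astrogeology.usgs.gov/search/map/Mars/Viking/cerberus_enhanced',
    'https://astrogeology.usgs.gov/search/map/Mars/Viking/schiaparelli_enhanced',
    'https://astrogeology.usgs.gov/search/map/Mars/Viking/syrtis_major_enhanced',
    'https://astrogeology.usgs.gov/search/map/Mars/Viking/valles_marineris_enhanced',
]

# Pair each title with the URL at the same position
hemispheres = [
    {'Title': title, 'URL': url}
    for title, url in zip(titles, image_urls)
]

for hemisphere in hemispheres:
    print(hemisphere['Title'], '->', hemisphere['URL'])
```

`zip` stops at the shorter list, so a mismatch in lengths silently drops entries. Comparing `len(titles)` with `len(image_urls)` first would catch that.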
