# Scrape the data from Mars Facts

A `<div>` tag with a class attribute of `sidebar` contains the side panel, but the data we want sits in one of its child elements: a `<table>` tag that carries both the `table` and `table-striped` classes.

The HTML code for the table looks like this:
<table class="table table-striped"> <!--main container-->
   <tbody> <!--contains all the rows in the Mars Planet Profile section-->
      <tr> <!--each tr tag refers to a table row-->
         <th scope="row"> Equatorial Diameter:</th> <!--table header-->
         <td>6,792 km</td> <!--table data-->
      </tr>
      <tr>...</tr>
      <tr>...</tr>
      <tr>...</tr>
      <tr>...</tr>
      <tr>...</tr>
      <tr>...</tr>
      <tr>...</tr>
      <tr>...</tr>
   </tbody>
</table>

In [1]:
from splinter import Browser
from bs4 import BeautifulSoup as soup
from webdriver_manager.chrome import ChromeDriverManager

# Set up Splinter
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False)

# Visit the Mars Facts site
url = 'https://galaxyfacts-mars.com/'
browser.visit(url)

html = browser.html
html_soup = soup(html, 'html.parser')



In [2]:
# Use Beautiful Soup to find the table using the class attribute 'table-striped' to identify the table element
table = html_soup.find('table', class_='table-striped')
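The lookup above works even though the table's class attribute holds two classes, because Beautiful Soup's `class_` filter matches any single class in a multi-valued attribute. The following is a minimal sketch of that behavior, assuming a small hand-written HTML string (`sample_html` is a hypothetical stand-in for the real page source, and the `sample_*` and `by_*` names are illustrative only).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page source; only the class attribute matters here.
sample_html = """
<div class="sidebar">
  <table class="table table-striped">
    <tbody>
      <tr><th scope="row">Equatorial Diameter:</th><td>6,792 km</td></tr>
    </tbody>
  </table>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# class_ matches when any one of the element's classes equals the given string.
by_keyword = sample_soup.find("table", class_="table-striped")

# The same lookup written as a CSS selector.
by_selector = sample_soup.select_one("table.table-striped")

# Beautiful Soup exposes the class attribute as a list of class names.
print(by_keyword["class"])
print(by_keyword == by_selector)
```

Either form should locate the same element; the CSS selector version can be easier to extend if the table later needs to be scoped to a parent such as `div.sidebar`.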

In [4]:
# Store the table in a Python dictionary

mars_facts = {} # maps each row heading to its value
rows = table.find_all('tr') # find every table row (tr) and save the list to the rows variable
for row in rows:
    row_heading = row.find('th').text # the th cell holds the row heading
    row_data = row.find('td').text.strip() # the td cell holds the value; strip() removes surrounding whitespace
    mars_facts[row_heading] = row_data # the heading is the key, the cell data is the value

print(mars_facts)

{'Equatorial Diameter:': '6,792 km', 'Polar Diameter:': '6,752 km', 'Mass:': '6.39 × 10^23 kg (0.11 Earths)', 'Moons:': '2 ( Phobos  &  Deimos )', 'Orbit Distance:': '227,943,824 km (1.38 AU)', 'Orbit Period:': '687 days (1.9 years)', 'Surface Temperature:': '-87 to -5 °C', 'First Record:': '2nd millennium BC', 'Recorded By:': 'Egyptian astronomers'}
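The loop above assumes every row has both a `th` and a `td` cell; if a row lacks one, `row.find(...)` returns `None` and `.text` raises an `AttributeError`. Below is a hedged sketch of a more defensive variant, run against a hand-written HTML string so that it needs no browser. The helper name `table_to_dict` and the sample markup are assumptions for illustration, not part of the original notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, including one row that has no header cell.
sample_html = """
<table class="table table-striped">
  <tbody>
    <tr><th scope="row"> Equatorial Diameter:</th><td>6,792 km</td></tr>
    <tr><th scope="row">Polar Diameter:</th><td> 6,752 km </td></tr>
    <tr><td>row without a header cell</td></tr>
  </tbody>
</table>
"""


def table_to_dict(table):
    """Map each row's th text to its td text, skipping incomplete rows."""
    facts = {}
    for row in table.find_all("tr"):
        heading = row.find("th")
        data = row.find("td")
        if heading is None or data is None:
            continue  # skip rows that lack either cell instead of raising
        facts[heading.text.strip()] = data.text.strip()
    return facts


sample_table = BeautifulSoup(sample_html, "html.parser").find(
    "table", class_="table-striped"
)
sample_facts = table_to_dict(sample_table)
print(sample_facts)
```

Unlike the original loop, this sketch also strips whitespace from the heading, which keeps dictionary keys free of stray leading or trailing spaces.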


In [5]:
browser.quit()

In [6]:
# Scrape table using pandas

import pandas as pd

In [7]:
# Create a list of DataFrames from the HTML tables on the page

df = pd.read_html('https://galaxyfacts-mars.com') # read_html finds every <table> in the page's HTML and returns them as a list of DataFrames
df

[                         0                1                2
 0  Mars - Earth Comparison             Mars            Earth
 1                Diameter:         6,779 km        12,742 km
 2                    Mass:  6.39 × 10^23 kg  5.97 × 10^24 kg
 3                   Moons:                2                1
 4       Distance from Sun:   227,943,824 km   149,598,262 km
 5          Length of Year:   687 Earth days      365.24 days
 6             Temperature:     -87 to -5 °C      -88 to 58°C,
                       0                              1
 0  Equatorial Diameter:                       6,792 km
 1       Polar Diameter:                       6,752 km
 2                 Mass:  6.39 × 10^23 kg (0.11 Earths)
 3                Moons:          2 ( Phobos & Deimos )
 4       Orbit Distance:       227,943,824 km (1.38 AU)
 5         Orbit Period:           687 days (1.9 years)
 6  Surface Temperature:                   -87 to -5 °C
 7         First Record:              2nd millennium BC
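Because `read_html` returns a list, picking a table by position depends on the page layout staying the same. As a hedged alternative, the `match` parameter keeps only tables whose text matches a string or regular expression. The sketch below assumes an inline HTML string (`sample_html`, a hypothetical reduction of the page) and, like `read_html` itself, needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical, shortened version of the two tables on the page.
sample_html = """
<table>
  <tr><td>Mars - Earth Comparison</td><td>Mars</td><td>Earth</td></tr>
  <tr><td>Diameter:</td><td>6,779 km</td><td>12,742 km</td></tr>
</table>
<table class="table table-striped">
  <tr><th>Equatorial Diameter:</th><td>6,792 km</td></tr>
  <tr><th>Polar Diameter:</th><td>6,752 km</td></tr>
</table>
"""

# Without a filter, every table comes back as its own DataFrame.
all_tables = pd.read_html(StringIO(sample_html))
print(len(all_tables))

# match keeps only the tables that contain the given text.
profile_tables = pd.read_html(StringIO(sample_html), match="Equatorial Diameter")
print(len(profile_tables))
print(profile_tables[0])
```

Wrapping the markup in `StringIO` avoids passing a literal HTML string, which recent pandas versions deprecate.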

In [8]:
# Select the first table only

mars_df = df[0]
mars_df

                         0                1                2
0  Mars - Earth Comparison             Mars            Earth
1                Diameter:         6,779 km        12,742 km
2                    Mass:  6.39 × 10^23 kg  5.97 × 10^24 kg
3                   Moons:                2                1
4       Distance from Sun:   227,943,824 km   149,598,262 km
5          Length of Year:   687 Earth days      365.24 days
6             Temperature:     -87 to -5 °C      -88 to 58°C


In [10]:
# Rename the columns in the table 

mars_df.columns = ['description', 'Mars', 'Earth']
mars_df

               description             Mars            Earth
0  Mars - Earth Comparison             Mars            Earth
1                Diameter:         6,779 km        12,742 km
2                    Mass:  6.39 × 10^23 kg  5.97 × 10^24 kg
3                   Moons:                2                1
4       Distance from Sun:   227,943,824 km   149,598,262 km
5          Length of Year:   687 Earth days      365.24 days
6             Temperature:     -87 to -5 °C      -88 to 58°C


In [11]:
# Drop the first row, which only repeats the column labels
mars_df = mars_df.iloc[1:]
mars_df

          description             Mars            Earth
1           Diameter:         6,779 km        12,742 km
2               Mass:  6.39 × 10^23 kg  5.97 × 10^24 kg
3              Moons:                2                1
4  Distance from Sun:   227,943,824 km   149,598,262 km
5     Length of Year:   687 Earth days      365.24 days
6        Temperature:     -87 to -5 °C      -88 to 58°C
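After `iloc[1:]`, the remaining rows keep their original labels, so the index starts at 1 rather than 0. If a zero-based index is preferred, the cleanup could be sketched as follows. `raw_df` is a hypothetical, shortened stand-in for `df[0]`, built by hand so the example runs without network access, and `clean_df` is an illustrative name.

```python
import pandas as pd

# Hypothetical stand-in for df[0]: the first row repeats the column labels.
raw_df = pd.DataFrame(
    [
        ["Mars - Earth Comparison", "Mars", "Earth"],
        ["Diameter:", "6,779 km", "12,742 km"],
        ["Mass:", "6.39 × 10^23 kg", "5.97 × 10^24 kg"],
    ]
)

# Drop the label row; copy() gives an independent DataFrame rather than a slice.
clean_df = raw_df.iloc[1:].copy()

# Rename the columns, then renumber the rows from 0.
clean_df.columns = ["description", "Mars", "Earth"]
clean_df = clean_df.reset_index(drop=True)

print(clean_df)
```

Whether to reset the index is a design choice: keeping the original labels preserves each row's position in the source table, while resetting gives a cleaner frame for later display or export.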
