Scraping Mars news articles and Mars weather data with Beautiful Soup in Jupyter Notebook
The Jupyter Notebook part_1_mars_news.ipynb visits a static Mars news site and uses Beautiful Soup to scrape the title and preview text of each article. The resulting list of titles and previews is printed in the notebook.
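
The scraping step in part_1_mars_news.ipynb could look roughly like the sketch below. It parses an inline HTML sample rather than the live site, and the class names list_text, content_title, and article_teaser_body, along with the helper extract_articles, are assumptions made for illustration. The real page's markup should be checked in the browser's developer tools.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; the class names are assumptions, not taken from the site.
SAMPLE_HTML = """
<div class="list_text">
  <div class="content_title">Sample title A</div>
  <div class="article_teaser_body">Sample preview A</div>
</div>
<div class="list_text">
  <div class="content_title">Sample title B</div>
  <div class="article_teaser_body">Sample preview B</div>
</div>
"""


def extract_articles(html):
    """Return a list of {'title', 'preview'} dicts, one per article block."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for item in soup.find_all("div", class_="list_text"):
        title = item.find("div", class_="content_title")
        preview = item.find("div", class_="article_teaser_body")
        # Skip blocks that are missing either piece of text.
        if title is None or preview is None:
            continue
        articles.append({
            "title": title.get_text(strip=True),
            "preview": preview.get_text(strip=True),
        })
    return articles


articles = extract_articles(SAMPLE_HTML)
for article in articles:
    print(article)
```

In the notebook, the html argument would be the page source fetched from the news site instead of SAMPLE_HTML.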
The Jupyter Notebook part_2_mars_weather.ipynb visits the Mars Temperature Data Site (a static website) and scrapes the tables there with Beautiful Soup. This data is assembled into a Pandas DataFrame. The column headings are:
- id: the identification number of a single transmission from the Curiosity rover
- terrestrial_date: the date on Earth
- sol: the number of elapsed sols (Martian days) since Curiosity landed on Mars
- ls: the solar longitude
- month: the Martian month
- min_temp: the minimum temperature, in Celsius, of a single Martian day (sol)
- pressure: the atmospheric pressure at Curiosity's location
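
As a minimal sketch of how a scraped table can become a DataFrame with the columns listed above, the example below parses an inline HTML table. The two data rows are made-up values, and the helper name table_to_dataframe is hypothetical; the notebook itself may structure this step differently.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up sample table using the column headings listed above.
SAMPLE_TABLE = """
<table>
  <tr><th>id</th><th>terrestrial_date</th><th>sol</th><th>ls</th>
      <th>month</th><th>min_temp</th><th>pressure</th></tr>
  <tr><td>1</td><td>2012-08-16</td><td>10</td><td>155</td>
      <td>6</td><td>-75.0</td><td>739.0</td></tr>
  <tr><td>2</td><td>2012-08-17</td><td>11</td><td>156</td>
      <td>6</td><td>-76.0</td><td>741.0</td></tr>
</table>
"""


def table_to_dataframe(html):
    """Build a DataFrame from the first <table> found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has no <td> cells, so it is skipped here
            rows.append(cells)
    return pd.DataFrame(rows, columns=headers)


raw_df = table_to_dataframe(SAMPLE_TABLE)
print(raw_df)
```

Every value in raw_df is still a string at this point, which is why a type conversion step is needed before any analysis.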
Each column is converted to the appropriate datetime, int, or float data type. The data is then analyzed to answer the following questions:
- How many months exist on Mars?
- How many Martian days (sols, not Earth days) of data does the scraped dataset contain?
- What are the coldest and the warmest months on Mars (at the location of Curiosity)? The results are plotted as a bar chart.
- Which months have the lowest and the highest atmospheric pressure on Mars? The results are plotted as a bar chart.
- About how many terrestrial (Earth) days are there in a Martian year? The answer is estimated from a line chart.
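
The type conversion and the first few questions can be sketched with pandas as follows. The rows are made-up values stored as strings, the way scraped cells arrive, and averaging per Martian month is an assumption about how the coldest, warmest, and pressure extremes are determined, not a confirmed detail of the notebook.

```python
import pandas as pd

# Made-up rows, all strings, standing in for the scraped table.
raw_df = pd.DataFrame({
    "id": ["1", "2", "3", "4"],
    "terrestrial_date": ["2012-08-16", "2012-08-17", "2013-01-05", "2013-01-06"],
    "sol": ["10", "11", "149", "150"],
    "ls": ["155", "156", "240", "241"],
    "month": ["6", "6", "9", "9"],
    "min_temp": ["-75.0", "-76.0", "-65.0", "-67.0"],
    "pressure": ["739.0", "741.0", "900.0", "904.0"],
})

# Convert each column to datetime, int, or float.
df = raw_df.astype({
    "id": int, "sol": int, "ls": int, "month": int,
    "min_temp": float, "pressure": float,
})
df["terrestrial_date"] = pd.to_datetime(df["terrestrial_date"])

# Number of distinct Martian months and sols present in the data.
month_count = df["month"].nunique()
sol_count = df["sol"].nunique()

# Assumed approach: average each measurement per Martian month.
avg_min_temp = df.groupby("month")["min_temp"].mean()
avg_pressure = df.groupby("month")["pressure"].mean()

coldest_month = avg_min_temp.idxmin()
warmest_month = avg_min_temp.idxmax()
lowest_pressure_month = avg_pressure.idxmin()
highest_pressure_month = avg_pressure.idxmax()

print(month_count, sol_count, coldest_month, warmest_month)
```

Calling avg_min_temp.plot.bar() or avg_pressure.plot.bar() in the notebook would draw the corresponding bar charts, provided matplotlib is installed.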
The data in the DataFrame is exported to a CSV file in the output folder.
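
A minimal sketch of the export step is shown below. The file name mars_weather.csv and the helper export_to_csv are hypothetical, the rows are made up, and the file is written under a temporary directory so that running the sketch leaves nothing behind.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_to_csv(df, folder, filename="mars_weather.csv"):
    """Write df to folder/filename, creating the folder if it is missing."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    csv_path = folder / filename
    # index=False keeps the DataFrame's row index out of the file.
    df.to_csv(csv_path, index=False)
    return csv_path


# Made-up rows for illustration.
df = pd.DataFrame({
    "sol": [10, 11],
    "min_temp": [-75.0, -76.0],
    "pressure": [739.0, 741.0],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = export_to_csv(df, Path(tmp) / "output")
    # Read the file back to confirm that the export round-trips.
    round_trip = pd.read_csv(csv_path)

print(round_trip)
```

In the notebook, passing "output" as the folder would write the file next to the notebooks.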