# Homework 2: Exploring Solar System Bodies (Pandas introduction)
Welcome to Homework 2!

In this assignment, we will analyze data about celestial bodies in the solar system using Python, NumPy, and Pandas. The goals of this assignment are to:

 - Open a simple dataset formatted as JSON using pandas.
 - Apply simple statistical analysis to real-world data.
 - Refine Python programming skills through hands-on practice.
 - Ensure you can run Python and Python notebook environments (e.g., Jupyter Notebook, JupyterLab, Colab, VSCode) and troubleshoot any setup issues.

A key part of this homework is verifying that you can successfully run Python notebooks. If you encounter any difficulties, seek help from the instructor or AIs. You can also use Slack to ask questions or share insights. If you see a classmate struggling, helping them out supports a collaborative learning environment (and may earn extra engagement points 😀).
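
A quick way to confirm that the environment works is to run a cell like the one below. This is only a sketch of a sanity check, not part of the graded questions. It prints the Python, NumPy, and Pandas versions, which are also useful to include when asking for help. The `versions` dictionary is a name chosen here for illustration.

```python
import sys

import numpy as np
import pandas as pd

# Collect the versions of the interpreter and the two required libraries
versions = {
    "python": sys.version.split()[0],
    "numpy": np.__version__,
    "pandas": pd.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If either import fails, the libraries are not installed in the environment the notebook is using.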

In [None]:
# If you are running this notebook on your local machine,
# make sure all the dependencies are installed.
# Uncomment the line below to install them.
# This may also be needed if you are running this notebook in
# online environments such as Google Colab.
#
# !pip install numpy pandas
#
# Also copy the data file to a location the notebook can reach
# and update the paths accordingly.

### Instructions

1. Follow the instructions in the class notes to set up your Python and Jupyter (or VSCode) environment and to clone or download our repository.
2. Ensure that you have Python, Jupyter Notebook, and the necessary libraries installed (`NumPy` and `Pandas`).
3. Load the dataset `Datasets/sol_data.json` into a Pandas DataFrame.
4. Answer the questions below by writing Python code.
5. No plots or visualizations are required—your insights should come from code-based analysis and outputs.

### Dataset Overview
The dataset contains information about celestial objects, including:
- **isPlanet**: Indicates whether the object is a planet (`True` or `False`).
- **isDwarfPlanet**: Indicates whether the object is a dwarf planet (`True` or `False`).
- **orbit_type**: Classifies the object as "Primary" (planets) or "Secondary" (moons).
- Physical and orbital properties, such as **mass**, **density**, **meanRadius**, **gravity**, **sideralOrbit**, and more.


### Submission Guidelines

- Submit your completed notebook as an HTML export or a PDF file.

To export to HTML, if you are on Jupyter, select `File` > `Export Notebook As` > `HTML`.

If you are on VSCode, you can use the `Jupyter: Export to HTML` command:
 - Open the command palette (Ctrl+Shift+P or Cmd+Shift+P on Mac).
 - Search for `Jupyter: Export to HTML`.
 - Save the HTML file to your computer and submit it via Canvas.

---

> **Hint:** If you are learning pandas, check out our tutorials or the official documentation:
> - [Pandas Getting started](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html)
> - [Pandas DataFrame API Documentation](https://pandas.pydata.org/docs/reference/frame.html)
> - [Our lecture on Pandas](https://filipinascimento.github.io/usable_ai/panda_basics)
> 
> 
> **Using Generative AI Responsibly**
>
> You're welcome to use Generative AI to assist your learning, but focus on understanding the concepts rather than just solving the assignment. For example:
>
> - Instead of asking: `What's the code to count moons orbiting each planet?`
> - Try asking: `How can I use Pandas to group and count values? Can you provide examples? Can you explain the steps?`
>
> This way, you will learn how the solution works while building your skills. Remember to give context to the generative AI, so it can better assist you. Talk to the instructor and AIs if you have any questions or need insights.
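
As an illustration of the kind of general example the second prompt might lead to, here is a minimal sketch of grouping and counting in Pandas. The `books` DataFrame and its columns are hypothetical toy data and have nothing to do with the assignment dataset.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the assignment dataset
books = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "genre": ["fantasy", "sci-fi", "fantasy", "history", "fantasy"],
})

# Option 1: group rows by a column and count the rows in each group
counts_by_group = books.groupby("genre").size()

# Option 2: value_counts() gives the same counts, most frequent first
counts_sorted = books["genre"].value_counts()

print(counts_by_group)
print(counts_sorted)
```

Both options return a Series indexed by the group label, so a single count can be read with, for example, `counts_by_group["fantasy"]`.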

In [None]:
import pandas as pd
import numpy as np

# Load the dataset
data = pd.read_json('../../Datasets/sol_data.json')
# The ../../ prefix goes up two levels in the directory structure.
# Note that the path is relative to the location of the notebook file.
# Double-check that the path is correct for your system.
data.head()
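
If the loading cell fails with a file-not-found error, it can help to check where the relative path actually points. The sketch below assumes the same relative path used in the loading cell. The names `data_path` and `file_found` are chosen here for illustration.

```python
from pathlib import Path

# Same relative path as in the loading cell; adjust it to your folder layout
data_path = Path("../../Datasets/sol_data.json")

# Relative paths are resolved against the current working directory,
# which for a notebook is usually the folder containing the notebook file
print("Working directory:", Path.cwd())
print("Resolved data path:", data_path.resolve())

# True only if the file exists at the resolved location
file_found = data_path.exists()
print("File found:", file_found)
```

If `file_found` is `False`, compare the resolved path with the actual location of `sol_data.json` and add or remove `../` segments accordingly.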

### 1. General Information

- How many objects are in the dataset?
- How many are planets? How many are moons?


In [None]:
# Total number of objects
# Fill in code to calculate total number of objects

# Number of planets
# Fill in code to calculate number of planets

# Number of moons
# Fill in code to calculate number of moons

> **Hint**: By moon we mean a natural satellite of a planet or another object in the solar system. Take a look at the columns and see if you can identify the criteria for classifying an object as a moon. Ask the instructor or AIs for help if needed. 
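
One general way to work out a classification criterion is to inspect the columns and their values, then combine conditions into a boolean mask. The sketch below uses a hypothetical `pets` table with made-up columns, so it shows the technique without answering the question.

```python
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration
pets = pd.DataFrame({
    "name": ["Rex", "Tom", "Nemo", "Lassie"],
    "isDog": [True, False, False, True],
    "habitat": ["land", "land", "water", "land"],
})

print(pets.columns.tolist())           # which columns exist
print(pets["habitat"].value_counts())  # which values a column takes

# A boolean mask is True for the rows that meet a criterion;
# & combines conditions and ~ negates a boolean column
is_land_non_dog = (pets["habitat"] == "land") & (~pets["isDog"])

# Summing a boolean mask counts the True entries
n_matches = int(is_land_non_dog.sum())
matching_rows = pets[is_land_non_dog]
print(n_matches)
print(matching_rows)
```

Each condition needs its own parentheses, because `&` binds more tightly than `==` in Python.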

### 2. Planets

- What is the mean density of all planets?
- Which planet has the highest surface gravity, and what is its gravity value?
- List all planets in descending order of their mass.


In [None]:

# Mean density of all planets
# Fill in code

# Planet with the highest surface gravity
# Fill in code

# Planets by descending mass
# Fill in code


### 3. Moons (Satellites)
- How many moons orbit each planet? Present this as a table or dictionary.
- What is the average radius (meanRadius) of all moons?
- Compare the average surface gravity of moons to that of planets.


In [None]:
# Number of moons orbiting each planet
# Fill in code

# Average radius of all moons
# Fill in code

# Compare average surface gravity of moons vs. planets
# Fill in code


### 4. Orbital Properties

- Which object has the highest orbital eccentricity, and what is its value?
- Calculate the average semi-major axis (semimajorAxis) for planets and compare it to that of moons.
- Identify the moon with the shortest orbital period (sideralOrbit) and the planet it orbits.


In [None]:
# Highest orbital eccentricity
# Fill in code

# Average semi-major axis of planets vs. moons
# Fill in code

# Moon with the shortest orbital period
# Fill in code

### 5. Discovery Dates

- How many objects have recorded discovery dates?
- Among the moons with a recorded discovery date (excluding Earth's Moon), which one was discovered earliest, and when?

> Look at the format of dates in the dataset. Objects without a recorded discovery date have NA values. Some dates are just a year, formatted as `YYYY` (e.g., `1997`), while complete dates are formatted as `DD/MM/YYYY` (e.g., `12/04/1997`). Finally, some dates have `??` in place of the day or month, which should be cleaned up, for instance by converting `??/??/1997` to `01/01/1997` or `??/04/1997` to `01/04/1997`.

> **Hint**: With its default nanosecond resolution, `pd.to_datetime()` cannot represent dates earlier than `pd.Timestamp.min` (late 1677). I recommend writing a function to clean the dates and running it with `.apply()`. For example, first skip NA values, then convert the complete dates, padding year-only entries to a full date (such as January 1st). Alternatively, you can use `pd.Period`.
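
To make the `pd.Period` alternative concrete, here is a minimal sketch. The two dates are arbitrary values chosen for illustration and are not taken from the dataset. The variable names are likewise hypothetical.

```python
import pandas as pd

# Limits of pandas Timestamps under the default nanosecond resolution
print(pd.Timestamp.min)
print(pd.Timestamp.max)

# pd.Period with daily frequency can represent earlier dates;
# these two dates are arbitrary examples, not values from the dataset
early = pd.Period(year=1500, month=1, day=1, freq="D")
later = pd.Period(year=1650, month=6, day=15, freq="D")

# Periods can be compared and stored in a Series like other values
periods = pd.Series([later, early])
earliest = periods.min()

print(earliest, earliest.year)
```

Because Periods compare chronologically, the usual `min()` and `sort_values()` operations work on them.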

In [None]:
# Example of how to parse and clean the strings for the assignment
def preprocess_dates(date_string):
    # convert DD/MM/YYYY (or YYYY) strings to YYYY-MM-DD
    if pd.isna(date_string):
        return pd.NA
    
    # replace ?? by 01
    date_string = date_string.replace('??', '01')

    # add 01/01 if only year is provided
    if len(date_string) == 4:
        date_string = '01/01/' + date_string
    
    # transform to YYYY-MM-DD
    date_splitted = date_string.split('/')

    # but only if the string has 3 parts (day, month, year)
    if len(date_splitted) == 3:
        day = date_splitted[0]
        month = date_splitted[1]
        year = date_splitted[2]
        return f"{year}-{month}-{day}"
        # or using pandas Period (pd.Period)
        # return pd.Period(year=int(year), month=int(month), day=int(day), freq="D")
    else:
        return pd.NA

data['parsedDiscoveryDate'] = data['discoveryDate'].apply(preprocess_dates)
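
One reason to produce `YYYY-MM-DD` strings is that, as long as year, month, and day are zero-padded, alphabetical order matches chronological order, so the strings can be sorted directly. The sketch below uses made-up date strings and labels, and assumes the day and month are already two digits.

```python
import pandas as pd

# Made-up ISO-style date strings with one missing value
iso_dates = pd.Series(
    ["1997-04-12", pd.NA, "1655-01-01", "1997-01-01"],
    index=["a", "b", "c", "d"],
)

# Drop missing entries, then sort: for zero-padded YYYY-MM-DD strings,
# alphabetical order is the same as chronological order
sorted_dates = iso_dates.dropna().sort_values()

# The first entry after sorting is the earliest date
earliest_label = sorted_dates.index[0]
earliest_value = sorted_dates.iloc[0]
print(earliest_label, earliest_value)
```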

In [None]:
# Objects with discovery dates
# Fill in code

# Oldest discovered moon
# Fill in code

### 6. Advanced Analysis

- Calculate the average density of moons that orbit planets with a mass greater than Earth's mass (`5.97e24 kg`).
- Group all objects by their `orbit_type` and compute the average orbital eccentricity for each group.
- Identify the top 3 moons with the highest escape velocity (escape).


In [None]:
# Average density of moons orbiting planets with mass > Earth
# Fill in code

# Average orbital eccentricity by orbit_type
# Fill in code

# Top 3 moons with highest escape velocity
# Fill in code

### 7. Extra questions

1. How many moons have a mass less than 10% of the mass of Earth's Moon? What percentage of all moons does this represent?
2. Calculate the ratio of moons to planets in the dataset. Which planet has the highest number of moons relative to its mass?
3. Group moons by their host planet and calculate the average density for each group. Which planet hosts moons with the highest average density?

In [None]:
# Moons with a mass less than 10% of the Moon's mass, and their percentage of all moons
# Fill in code

# Ratio of moons to planets and planet with highest moon to mass ratio
# Fill in code

# Average density of moons per planet
# Fill in code
