# Introduction to pandas
Today, we'll dive into an important part of scientific computing with Python: the pandas library.

pandas (derived from "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals) is a powerful, open-source data manipulation library for Python. It provides the data structures and functions needed to work with structured data, and is an essential tool in the data scientist's toolbox.

## Why use pandas?

1. **Handling data**: pandas provides robust tools for reading and writing data in a multitude of formats including CSV, Excel, SQL databases, and more.

2. **Data manipulation**: pandas makes it easy to filter, select, and transform data. It also allows for merging and reshaping datasets in a straightforward manner.

3. **Data analysis**: pandas supports descriptive statistics, aggregation, handling of missing data, and more, all on the same data structures you use for manipulation.

4. **Compatibility**: pandas is built on top of NumPy and integrates well with other libraries in the scientific Python ecosystem, such as Matplotlib and scikit-learn.
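
As a rough illustration of the first two points, the sketch below reads a small CSV from an in-memory string, filters it, and writes the result back out. The column names `engine` and `thrust_kn` are made up for this example; the two thrust values come from the rocket engine data used in this lesson.

```python
import io

import pandas

# Hypothetical CSV text; in practice this would usually be a file on disk
csv_text = "engine,thrust_kn\nF-1,6770\nJ-2,1033\n"

# read_csv accepts a path or any file-like object
engines = pandas.read_csv(io.StringIO(csv_text))

# Keep only the rows with more than 2000 kN of thrust
powerful = engines[engines["thrust_kn"] > 2000]

# With no path given, to_csv returns the CSV as a string
csv_out = powerful.to_csv(index=False)
print(csv_out)
```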

In this lesson, we'll explore the power of pandas using a dataset of rocket engines. We'll be using a `DataFrame` object in pandas, which is a two-dimensional tabular, column-oriented data structure with both row and column labels. If you've ever worked with data in Excel or SQL, you can think of a `DataFrame` as being similar to a spreadsheet or a SQL table.
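
To make "row and column labels" concrete, here is a minimal sketch of a two-row `DataFrame`. The name `small_df` and the choice of engine names as row labels are assumptions made for illustration; the values are taken from the rocket engine data in this lesson.

```python
import pandas

# Hypothetical two-row table; the engine names serve as row labels
small_df = pandas.DataFrame(
    {"Thrust (kN)": [6770, 1033], "Fuel": ["RP-1/LOX", "LH2/LOX"]},
    index=["F-1", "J-2"],
)

print(small_df.columns.tolist())  # column labels
print(small_df.index.tolist())  # row labels
print(small_df.shape)  # (number of rows, number of columns)
```

When no `index` is passed, pandas labels the rows with integers starting at 0, which is what the rest of this lesson relies on.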

Our goal is to get you comfortable with using pandas to load, manipulate, and analyze data. By the end of this lesson, you'll have the foundational knowledge you need to start using pandas in your own scientific computing projects.

In [None]:
import pandas

In [None]:
# Create a dictionary representing the data
data = {
    "Rocket Engine": [
        "F-1",
        "J-2",
        "SSME (RS-25)",
        "Merlin 1D",
        "Raptor",
        "RD-180",
        "RD-191",
        "YF-100",
        "LE-7A",
        "Vikas",
        "Vulcain 2",
    ],
    "Used in Mission": [
        "Saturn V",
        "Saturn V",
        "Space Shuttle",
        "Falcon 9",
        "Starship",
        "Atlas V",
        "Angara",
        "Long March 5, 6, 7",
        "H-II",
        "PSLV, GSLV",
        "Ariane 5",
    ],
    "Country": [
        "USA",
        "USA",
        "USA",
        "USA",
        "USA",
        "Russia",
        "Russia",
        "China",
        "Japan",
        "India",
        "Europe",
    ],
    "Manufacturer": [
        "Rocketdyne",
        "Rocketdyne",
        "Rocketdyne",
        "SpaceX",
        "SpaceX",
        "NPO Energomash",
        "NPO Energomash",
        "Academy of Aerospace Liquid Propulsion Technology",
        "Mitsubishi Heavy Industries",
        "ISRO",
        "ArianeGroup",
    ],
    "Type": [
        "Liquid",
        "Liquid",
        "Liquid",
        "Liquid",
        "Liquid",
        "Liquid",
        "Liquid",
        "Liquid",
        "Liquid",
        "Liquid",
        "Liquid",
    ],
    "Thrust (kN)": [6770, 1033, 1860, 914, 2200, 4152, 1923, 1200, 1098, 799.43, 1350],
    "Specific Impulse (s)": [
        "263 (Sea Level), 304 (Vacuum)",
        "421 (Vacuum)",
        "366 (Sea Level), 452 (Vacuum)",
        "311 (Sea Level), 348 (Vacuum)",
        "330 (Sea Level), 380 (Vacuum)",
        "311 (Sea Level), 338 (Vacuum)",
        "311 (Sea Level), 337.5 (Vacuum)",
        "300 (Sea Level), 335 (Vacuum)",
        "440 (Vacuum)",
        "293 (Sea Level)",
        "431 (Vacuum)",
    ],
    "Fuel": [
        "RP-1/LOX",
        "LH2/LOX",
        "LH2/LOX",
        "RP-1/LOX",
        "CH4/LOX",
        "RP-1/LOX",
        "RP-1/LOX",
        "RP-1/LOX",
        "LH2/LOX",
        "UH25/N2O4",
        "LH2/LOX",
    ],
}

# Create a DataFrame
rocket_df = pandas.DataFrame(data)

# Display the DataFrame (display is available by default in Jupyter/IPython)
display(rocket_df)

In [None]:
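# Summarize the DataFrame with .info and .describe
# .info() prints its summary directly and returns None, so call it on its own
rocket_df.info()
# .describe() returns a DataFrame of summary statistics for the numeric columns
display(rocket_df.describe())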

In [None]:
# Select the 'Rocket Engine' column
rocket_df["Rocket Engine"]

In [None]:
# Select the row with index 3
rocket_df.loc[3]
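
Note that `.loc` selects by index *label*, while `.iloc` selects by integer *position*. With the default integer index the two coincide, so the difference is easy to miss. The sketch below uses a hypothetical three-row table with non-default labels (10, 20, 30) to show the distinction.

```python
import pandas

# Hypothetical table whose row labels are not 0, 1, 2
engines = pandas.DataFrame(
    {"Rocket Engine": ["F-1", "J-2", "Merlin 1D"], "Thrust (kN)": [6770, 1033, 914]},
    index=[10, 20, 30],
)

by_label = engines.loc[20]  # the row whose index label is 20
by_position = engines.iloc[1]  # the second row, counting from 0

print(by_label["Rocket Engine"], by_position["Rocket Engine"])
```

Both lookups return the J-2 row here, but `engines.loc[1]` would raise a `KeyError` because no row carries the label 1.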

In [None]:
# Convert 'Thrust (kN)' to 'Thrust (N)' by multiplying by 1000
# Arithmetic on a column is vectorized, so no apply/lambda is needed
rocket_df["Thrust (N)"] = rocket_df["Thrust (kN)"] * 1000

In [None]:
# Split 'Specific Impulse (s)' into two separate columns for 'Specific Impulse Sea Level (s)' and 'Specific Impulse Vacuum (s)'
# Match each value by its descriptor rather than by its position, so engines
# that list only one value (e.g. "421 (Vacuum)") land in the correct column.
# The pattern allows a decimal part so that "337.5" is not truncated to 337.
rocket_df["Specific Impulse Sea Level (s)"] = (
    rocket_df["Specific Impulse (s)"]
    .str.extract(r"(\d+(?:\.\d+)?) \(Sea Level\)", expand=False)
    .astype("float")
)
rocket_df["Specific Impulse Vacuum (s)"] = (
    rocket_df["Specific Impulse (s)"]
    .str.extract(r"(\d+(?:\.\d+)?) \(Vacuum\)", expand=False)
    .astype("float")
)
# Drop the original 'Specific Impulse (s)' column
rocket_df = rocket_df.drop(columns=["Specific Impulse (s)"])

## Applying Functions to DataFrames with Right Triangle Examples

Now, we are going to see how to apply a function to a DataFrame. Applying functions can be extremely useful for carrying out complex calculations on datasets. We are going to demonstrate this using the example of a right triangle.

As we know, in a right triangle, the square of the length of the hypotenuse (side `c`) is equal to the sum of the squares of the other two sides (sides `a` and `b`). This is known as the Pythagorean theorem and can be written as c² = a² + b².

In this exercise, we have a dataset of right triangles with the lengths of sides `a` and `b`. We will create a function to calculate the length of side `c` (the hypotenuse) and then use the `apply` function in pandas to apply this calculation to our DataFrame.

This will give us a practical example of how you can use pandas to easily apply complex mathematical formulas to large datasets.
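
Before the exercise, here is a minimal sketch of the same pattern on a different problem, so it does not give the answer away. The `rectangles` data and the `rectangle_area` function are hypothetical. Passing `axis=1` tells `apply` to call the function once per row, with the row passed in as a `Series` indexed by column name.

```python
import pandas

# Hypothetical rectangle data, unrelated to the triangle exercise
rectangles = pandas.DataFrame({"width": [2, 3, 10], "height": [5, 4, 1]})


def rectangle_area(row):
    """Return the area of one rectangle, given a row with 'width' and 'height'."""
    return row["width"] * row["height"]


# axis=1 applies the function to each row; axis=0 would apply it to each column
rectangles["area"] = rectangles.apply(rectangle_area, axis=1)
print(rectangles)
```

For simple arithmetic like this, the vectorized form `rectangles["width"] * rectangles["height"]` is usually faster; `apply` is most useful when the per-row logic is harder to express with column operations.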

In [None]:
# Prepare data for right triangles
triangle_data = {
    "Triangle": ["Triangle 1", "Triangle 2", "Triangle 3", "Triangle 4", "Triangle 5"],
    "a": [3, 5, 9, 12, 7],
    "b": [4, 12, 40, 5, 24],
}

# Convert the dictionary into a DataFrame
df_triangles = pandas.DataFrame(triangle_data)

## Exercise
In the cell below, fill in the blanks to define a function that calculates the length of side `c` (the hypotenuse), then use the `apply` method in pandas to apply this calculation to each row of our DataFrame.

```python
def ____(____):
    return ____

df_triangles['c'] = df_triangles._____(____, axis=__)
```

In [None]:
# Define a function to calculate the length of side c (the hypotenuse)
def ____(____):
    return ____


df_triangles["c"] = df_triangles._____(____, axis=__)