# Planning a Moon Mission by using Python Pandas

Moon rocks are a huge part of the scientific discovery and understanding of our universe and planet. They can tell us about how planets and moons were formed and guide us as we prepare for further space exploration.

Though NASA's Astromaterials Acquisition and Curation Office  has a list of all extraterrestrial samples from the Moon and elsewhere, this module focuses on those collected from the Moon. Across the six Apollo missions that landed humans on the Moon, and the three lunar missions that landed robotic probes on the Moon, we've successfully returned 383 kilograms (over 800 pounds) of rocks, soil, core samples. All of these samples have been categorized and photographed and are available to view on the NASA Lunar Sample and Photo Catalog .

The catalog is not only of interest to the casual space enthusiast, but it is a critical resource for active researchers and educational institutions that might inspire the next generation of space rock scientists. You can actually request samples for research , education , or public displays . The challenge for the curation office then is to make sure that there are enough samples of different kinds of lunar matter to continue the collective understanding of our universe and planet.

This challenge is especially difficult for two reasons:

We can't simply send an astronaut up to the Moon to collect a bit more of a particular type of rock.
When astronauts do land on the Moon again, it might be difficult for them to identify specifically what type of sample they are collecting.
Moon rock samples undergo a thorough analysis and cleansing by experts here on Earth, and it's the curators who develop an understanding of what samples they have, what requests for samples they get, and what is the most challenging part of ensuring that research can continue.

This module begins to explore how you can use data and a bit of Python coding to come up with an understanding of what we have, and make recommendations for what the next pair of astronauts should look for when they land on the Moon as part of the Artemis program in 2024.

Before you continue, be sure to spend time going through the Lunar Sample and Photo Catalog . Although this module focuses on the weights, types, and number of samples for each Apollo mission, there are a lot of other details that you could use to expand on the analysis and to help you develop even better recommendations for the next team of astronauts.

# Import Python Libraries

In [2]:
import pandas as pd

# Import Dataset in Jupyter Notebook

In [3]:
rock_samples = pd.read_csv('rocksamples.csv')

In [4]:
rock_samples.head()

Unnamed: 0,ID,Mission,Type,Subtype,Weight (g),Pristine (%)
0,10001,Apollo11,Soil,Unsieved,125.8,88.36
1,10002,Apollo11,Soil,Unsieved,5629.0,93.73
2,10003,Apollo11,Basalt,Ilmenite,213.0,65.56
3,10004,Apollo11,Core,Unsieved,44.8,71.76
4,10005,Apollo11,Core,Unsieved,53.4,40.31


In [7]:
rock_samples.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2229 entries, 0 to 2228
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ID            2229 non-null   int64  
 1   Mission       2229 non-null   object 
 2   Type          2229 non-null   object 
 3   Subtype       2226 non-null   object 
 4   Weight (g)    2229 non-null   float64
 5   Pristine (%)  2229 non-null   float64
dtypes: float64(2), int64(1), object(3)
memory usage: 104.6+ KB


From this output, we can see that 2,229 samples were collected from the Apollo missions. Looking at a sample of the data, we can see that each row contains:

- ID - The unique ID used to keep track of the sample at NASA.
- Mission - The mission responsible for retrieving the sample.
- Type - The type of sample (type of rock or other classification).
- Subtype - A more specific type classification.
- Weight (g) - The original weight of the sample, in grams.
- Pristine (%) - The percentage of the sample that remains (some sample is used up during resear

## Clean Data

### Convert the sample weight
While details of rocket design are proprietary, some information is publicly available, such as the weight of the modules (parts of the rocket) that will carry the samples back to Earth, and the total amount of weight that the rocket can lift above the atmosphere.

We will get into the specifics of that in a later unit, but the critical part for the purposes of the samples is understanding that rocket weight is often measured in kilograms, not grams. We should then manipulate the original data by converting the sample weights into kilograms for easier data analysis later.

In [10]:
rock_samples['Weight (g)'] = rock_samples['Weight (g)'].apply(lambda x: x * 0.001)
rock_samples.rename(columns={'Weight (g)': 'Weight (kg)'}, inplace=True)
rock_samples.head()

Unnamed: 0,ID,Mission,Type,Subtype,Weight (kg),Pristine (%)
0,10001,Apollo11,Soil,Unsieved,0.1258,88.36
1,10002,Apollo11,Soil,Unsieved,5.629,93.73
2,10003,Apollo11,Basalt,Ilmenite,0.213,65.56
3,10004,Apollo11,Core,Unsieved,0.0448,71.76
4,10005,Apollo11,Core,Unsieved,0.0534,40.31


Here we first modified the values in the Weight (g) column to be the same value multiplied by 0.001. Then we modified the name of the column to be more accurate by changing it to Weight (kg).

### Create a new DataFrame

In [13]:
missions = pd.DataFrame()
missions['Mission'] = rock_samples['Mission'].unique()
missions.head()

Unnamed: 0,Mission
0,Apollo11
1,Apollo12
2,Apollo14
3,Apollo15
4,Apollo16


In [14]:
mission.info()

<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Empty DataFrame

### Sum total sample weight by mission

In [16]:
sample_total_weight = rock_samples.groupby('Mission')['Weight (kg)'].sum()
missions = pd.merge(missions, sample_total_weight, on='Mission')
mission.rename(columns={'Weight (kg)': 'Sample Weight (kg)'}, inplace=True)
missions

Unnamed: 0,Mission,Weight (kg)
0,Apollo11,21.55424
1,Apollo12,34.34238
2,Apollo14,41.83363
3,Apollo15,75.3991
4,Apollo16,92.46262
5,Apollo17,109.44402


Let's break out this code a bit. The first line was <code>sample_total_weight = rock_samples.groupby('Mission')['Weight (kg)'].sum()</code>, which can be broken out as follows:

- <code>rock_samples.groupby('Mission')</code> - This groups all the rows by the values in the Mission column.
- <code>rock_samples.groupby('Mission')['Weight (kg)']</code>- This grabs all the values in the Weight (kg) column, but groups by unique values in the Mission column.
- <code>rock_samples.groupby('Mission')['Weight (kg)'].sum()</code> - This sums all the values in the Weight (kg) column for each unique value in the Mission column.

If you were to print out that one line, you would get a pandas series, which is basically a 1D data type, or a list. The list would have the index be the unique value from the Mission column, instead of a number:

In [18]:
sample_total_weight = rock_samples.groupby('Mission')['Weight (kg)'].sum()
sample_total_weight

Mission
Apollo11     21.55424
Apollo12     34.34238
Apollo14     41.83363
Apollo15     75.39910
Apollo16     92.46262
Apollo17    109.44402
Name: Weight (kg), dtype: float64

The next line, <code>pd.merge(missions, sample_total_weight, on='Mission')</code>, can be described as:

Merge the <<code>missions</code> DataFrame with the <code>sample_total_weight</code> series by using the **Mission** column as the index to merge on. What the computer will do is basically this: for each value in the **Missions** column in the missions DataFrame, find that same value in the <code>sample_total_weight</code> series, and add the value from the series into the row as a new column in the DataFrame.

### Get the difference in weights across missions

In [24]:
missions['Weight diff'] = missions['Weight (kg)'].diff()
missions

Unnamed: 0,Mission,Weight (kg),Weight diff
0,Apollo11,21.55424,
1,Apollo12,34.34238,12.78814
2,Apollo14,41.83363,7.49125
3,Apollo15,75.3991,33.56547
4,Apollo16,92.46262,17.06352
5,Apollo17,109.44402,16.9814


In [25]:
missions['Weight diff'] = missions['Weight diff'].fillna(value=0)
missions

Unnamed: 0,Mission,Weight (kg),Weight diff
0,Apollo11,21.55424,0.0
1,Apollo12,34.34238,12.78814
2,Apollo14,41.83363,7.49125
3,Apollo15,75.3991,33.56547
4,Apollo16,92.46262,17.06352
5,Apollo17,109.44402,16.9814


This Python code did the following:

- Looked only at the Weight diff column in the missions DataFrame
- Filled all "na" (or null) values with a certain value
- The value to fill in the na values is 0
- Saved the modified list of values for that column back into the column

This last step is important. Pandas is a library that is designed to let us explore data, which means that some of the functions will provide insight into the data, but not directly modify it. 

### Add rocket weight data to the mission analysis

#### Add in command and lunar module data
By using the NASA Space Science Data Coordinated Archive , we gathered information about each module used in each mission. As you did when you created the samples tables, create six new columns, three for the lunar modules and three for the command modules:

Module name
Module mass
Module mass diff
Fill in any NaN values with 0:

In [32]:
missions['Lunar module (LM)'] = {'Eagle (LM-5)', 'Intrepid (LM-6)', 'Antares (LM-8)', 'Falcon (LM-10)', 'Orion (LM-11)', 'Challenger (LM-12)'}
missions['LM mass (kg)'] = {15103, 15235, 15264, 16430, 16445, 16456}
missions['LM mass diff'] = missions['LM mass (kg)'].diff()
missions['LM mass diff'] = missions['LM mass diff'].fillna(value=0)

missions['Command module (CM)'] = {'Columbia (CSM-107)', 'Yankee Clipper (CM-108)', 'Kitty Hawk (CM-110)', 'Endeavor (CM-112)', 'Casper (CM-113)', 'America (CM-114)'}
missions['CM mass (kg)'] = {5560, 5609, 5758, 5875, 5840, 5960}
missions['CM mass diff'] = missions['CM mass (kg)'].diff()
missions['CM mass diff'] = missions['CM mass diff'].fillna(value=0)

missions

Unnamed: 0,Mission,Weight (kg),Weight diff,Lunar module(LM),LM mass (kg),LM mass diff,LM Mass diff,Command module (CM),Lunar module (LM),CM mass (kg),CM mass diff
0,Apollo11,21.55424,0.0,Eagle (LM-5),15264,0.0,0.0,Columbia (CSM-107),Eagle (LM-5),5960,0.0
1,Apollo12,34.34238,12.78814,Challenger (LM-12),15235,-29.0,-29.0,Yankee Clipper (CM-108),Challenger (LM-12),5609,-351.0
2,Apollo14,41.83363,7.49125,Orion (LM-11),16456,1221.0,1221.0,Endeavor (CM-112),Orion (LM-11),5840,231.0
3,Apollo15,75.3991,33.56547,Intrepid (LM-6),16430,-26.0,-26.0,Casper (CM-113),Intrepid (LM-6),5875,35.0
4,Apollo16,92.46262,17.06352,Antares (LM-8),16445,15.0,15.0,America (CM-114),Antares (LM-8),5560,-315.0
5,Apollo17,109.44402,16.9814,Falcon (LM-10),15103,-1342.0,-1342.0,Kitty Hawk (CM-110),Falcon (LM-10),5758,198.0


In [34]:
missions['Total weight (kg)'] = missions['LM mass (kg)'] + missions['CM mass (kg)']
missions['Total weight diff'] = missions['LM mass diff'] + missions['CM mass diff']
missions

Unnamed: 0,Mission,Weight (kg),Weight diff,Lunar module(LM),LM mass (kg),LM mass diff,LM Mass diff,Command module (CM),Lunar module (LM),CM mass (kg),CM mass diff,Total weight (kg),Total weight diff
0,Apollo11,21.55424,0.0,Eagle (LM-5),15264,0.0,0.0,Columbia (CSM-107),Eagle (LM-5),5960,0.0,21224,0.0
1,Apollo12,34.34238,12.78814,Challenger (LM-12),15235,-29.0,-29.0,Yankee Clipper (CM-108),Challenger (LM-12),5609,-351.0,20844,-380.0
2,Apollo14,41.83363,7.49125,Orion (LM-11),16456,1221.0,1221.0,Endeavor (CM-112),Orion (LM-11),5840,231.0,22296,1452.0
3,Apollo15,75.3991,33.56547,Intrepid (LM-6),16430,-26.0,-26.0,Casper (CM-113),Intrepid (LM-6),5875,35.0,22305,9.0
4,Apollo16,92.46262,17.06352,Antares (LM-8),16445,15.0,15.0,America (CM-114),Antares (LM-8),5560,-315.0,22005,-300.0
5,Apollo17,109.44402,16.9814,Falcon (LM-10),15103,-1342.0,-1342.0,Kitty Hawk (CM-110),Falcon (LM-10),5758,198.0,20861,-1144.0


## Understand the data in the missions DataFrame

### Compare the Data

We know that the Saturn V payload was 43,500 kg, and the weights of the modules varied from mission to mission. So, to determine the ratios that will allow us to make predictions about the Artemis missions, we can use:

- Saturn V payload
- Mission sample weight
- Mission module weight

In [41]:
# Sample-to-weight ratio
saturnVPayload = 43500
missions['Crewed area : Payload'] = missions['Total weight (kg)'] / saturnVPayload
missions['Sample : Crewed area'] = missions['Weight (kg)'] / missions['Total weight (kg)']
missions['Sample : Payload'] = missions['Weight (kg)'] / saturnVPayload
missions

Unnamed: 0,Mission,Weight (kg),Weight diff,Lunar module(LM),LM mass (kg),LM mass diff,LM Mass diff,Command module (CM),Lunar module (LM),CM mass (kg),CM mass diff,Total weight (kg),Total weight diff,Crewed area : Payload,Sample : Crewed area,Sample : Payload
0,Apollo11,21.55424,0.0,Eagle (LM-5),15264,0.0,0.0,Columbia (CSM-107),Eagle (LM-5),5960,0.0,21224,0.0,0.487908,0.001016,0.000495
1,Apollo12,34.34238,12.78814,Challenger (LM-12),15235,-29.0,-29.0,Yankee Clipper (CM-108),Challenger (LM-12),5609,-351.0,20844,-380.0,0.479172,0.001648,0.000789
2,Apollo14,41.83363,7.49125,Orion (LM-11),16456,1221.0,1221.0,Endeavor (CM-112),Orion (LM-11),5840,231.0,22296,1452.0,0.512552,0.001876,0.000962
3,Apollo15,75.3991,33.56547,Intrepid (LM-6),16430,-26.0,-26.0,Casper (CM-113),Intrepid (LM-6),5875,35.0,22305,9.0,0.512759,0.00338,0.001733
4,Apollo16,92.46262,17.06352,Antares (LM-8),16445,15.0,15.0,America (CM-114),Antares (LM-8),5560,-315.0,22005,-300.0,0.505862,0.004202,0.002126
5,Apollo17,109.44402,16.9814,Falcon (LM-10),15103,-1342.0,-1342.0,Kitty Hawk (CM-110),Falcon (LM-10),5758,198.0,20861,-1144.0,0.479563,0.005246,0.002516


### Save the ratios

In [44]:
crewedArea_payload_ratio = missions['Crewed area : Payload'].mean()
sample_crewedArea_ratio = missions['Sample : Crewed area'].mean()
sample_payload_ratio = missions['Sample : Payload'].mean()

print("We can then use these ratios to predict the Artemis capacity for samples.")
print(crewedArea_payload_ratio)
print(sample_crewedArea_ratio)
print(sample_payload_ratio)

We can then use these ratios to predict the Artemis capacity for samples.
0.49630268199233724
0.0028946732226251396
0.0014369195019157093


## Predict Aretemis Sample Capacity

### Create an Artemis mission DataFrame
We don't have all the details about the Artemis mission, but we do know currently that three iterations of the rocket will be cycled through for each mission. Each rocket will have one version meant to sustain a crew and one meant only for cargo. For the purposes of this module, we will focus only on the three rockets meant to house crew, to be more aligned with the Apollo missions. We also know that the expected payload of the Space Launch System (SLS) is expected to grow with each iteration, but that the current weight of Orion (the command and lunar modules combined) has one estimated weight today.

Again, we will call the command and lunar modules the crewed area, and we can create a DataFrame with the information we have about the three crewed missions:

In [47]:
artemis_crewedArea = 26520
artemis_mission = pd.DataFrame({'Mission':['artemis1','artemis1b','artemis2'],
                                 'Total weight (kg)':[artemis_crewedArea,artemis_crewedArea,artemis_crewedArea],
                                 'Payload (kg)':[26988, 37965, 42955]})
artemis_mission

Unnamed: 0,Mission,Total weight (kg),Payload (kg)
0,artemis1,26520,26988
1,artemis1b,26520,37965
2,artemis2,26520,42955


And we can estimate the weight of samples based on the ratios we determined from the Artemis missions:

In [50]:
artemis_mission['Sample weight from total (kg)'] = artemis_mission['Total weight (kg)'] * sample_crewedArea_ratio
artemis_mission['Sample weight from payload (kg)'] = artemis_mission['Payload (kg)'] * sample_payload_ratio
artemis_mission

Unnamed: 0,Mission,Total weight (kg),Payload (kg),Sample weight from total (kg),Sample weight from payload (kg)
0,artemis1,26520,26988,76.766734,38.779584
1,artemis1b,26520,37965,76.766734,54.552649
2,artemis2,26520,42955,76.766734,61.722877


In [51]:
artemis_mission['Estimated sample weight (kg)'] = (artemis_mission['Sample weight from payload (kg)'] + artemis_mission['Sample weight from total (kg)'])/2
artemis_mission

Unnamed: 0,Mission,Total weight (kg),Payload (kg),Sample weight from total (kg),Sample weight from payload (kg),Estimated sample weight (kg)
0,artemis1,26520,26988,76.766734,38.779584,57.773159
1,artemis1b,26520,37965,76.766734,54.552649,65.659691
2,artemis2,26520,42955,76.766734,61.722877,69.244806


### Prioritize Moon rock sample gathering based on Data

Determining which types of samples to collect from the Moon requires expertise, but we can start to make some assumptions to learn how to clean and manipulate data.

First, we can determine how much remains of each sample that was returned from the Apollo missions, given the amount that was originally collected and the percentage of remaining pristine sample.

In [54]:
rock_samples['Remaining (kg)'] = rock_samples['Weight (kg)'] * (rock_samples["Pristine (%)"] * .01)
rock_samples.head()

Unnamed: 0,ID,Mission,Type,Subtype,Weight (kg),Pristine (%),Remaining (kg)
0,10001,Apollo11,Soil,Unsieved,0.1258,88.36,0.111157
1,10002,Apollo11,Soil,Unsieved,5.629,93.73,5.276062
2,10003,Apollo11,Basalt,Ilmenite,0.213,65.56,0.139643
3,10004,Apollo11,Core,Unsieved,0.0448,71.76,0.032148
4,10005,Apollo11,Core,Unsieved,0.0534,40.31,0.021526


Looking at the <code>head()</code> or <code>info()</code> of the rock_samples DataFrame isn't useful at this point. With over 2,000 samples, it's difficult to get an understanding of what the values are. For that, you can use the <code>describe()</code> function:

In [56]:
rock_samples.describe()

Unnamed: 0,ID,Weight (kg),Pristine (%),Remaining (kg)
count,2229.0,2229.0,2229.0,2229.0
mean,52058.432032,0.168253,84.512764,0.138103
std,26207.651471,0.637286,22.057299,0.525954
min,10001.0,0.0,0.0,0.0
25%,15437.0,0.003,80.01,0.002432
50%,65527.0,0.0102,92.3,0.00853
75%,72142.0,0.09349,98.14,0.07824
max,79537.0,11.729,180.0,11.169527


In [58]:
low_samples = rock_samples.loc[(rock_samples['Weight (kg)'] >= .16) & (rock_samples['Pristine (%)'] <= 50)]
low_samples.head()

Unnamed: 0,ID,Mission,Type,Subtype,Weight (kg),Pristine (%),Remaining (kg)
11,10017,Apollo11,Basalt,Ilmenite,0.973,43.71,0.425298
14,10020,Apollo11,Basalt,Ilmenite,0.425,27.88,0.11849
15,10021,Apollo11,Breccia,Regolith,0.25,30.21,0.075525
29,10045,Apollo11,Basalt,Olivine,0.185,12.13,0.022441
37,10057,Apollo11,Basalt,Ilmenite,0.919,35.15,0.323028


In [59]:
low_samples.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 27 entries, 11 to 2183
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   ID              27 non-null     int64  
 1   Mission         27 non-null     object 
 2   Type            27 non-null     object 
 3   Subtype         27 non-null     object 
 4   Weight (kg)     27 non-null     float64
 5   Pristine (%)    27 non-null     float64
 6   Remaining (kg)  27 non-null     float64
dtypes: float64(3), int64(1), object(3)
memory usage: 1.7+ KB


Twenty-seven samples seem like a small amount to base a recommendation on. We can probably find some other samples that are needed for more research here on Earth. To discover them, we can use the unique() function to see how many unique types we have across the low_samples and rock_samples DataFrames.

In [61]:
low_samples.Type.unique()

array(['Basalt', 'Breccia', 'Soil', 'Core'], dtype=object)

In [62]:
rock_samples.Type.unique()

array(['Soil', 'Basalt', 'Core', 'Breccia', 'Special', 'Crustal'],
      dtype=object)

In [63]:
low_samples.groupby('Type')['Weight (kg)'].count()

Type
Basalt     14
Breccia     8
Core        1
Soil        4
Name: Weight (kg), dtype: int64

In [65]:
needed_samples = low_samples[low_samples['Type'].isin(['Basalt', 'Breccia'])]
needed_samples.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 22 entries, 11 to 2183
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   ID              22 non-null     int64  
 1   Mission         22 non-null     object 
 2   Type            22 non-null     object 
 3   Subtype         22 non-null     object 
 4   Weight (kg)     22 non-null     float64
 5   Pristine (%)    22 non-null     float64
 6   Remaining (kg)  22 non-null     float64
dtypes: float64(3), int64(1), object(3)
memory usage: 1.4+ KB


## Develop a recommendation of Moon rock samples to be collected

Let's take a step back and see how the number of samples compares to the amount of sample. We can compare the total weight from the needed_samples DataFrame to the rock_samples DataFrame. That is, we'll compare the samples we've identified as running low to all the samples collected on Apollo missions.

In [67]:
needed_samples.groupby('Type')['Weight (kg)'].sum()

Type
Basalt     17.4234
Breccia    10.1185
Name: Weight (kg), dtype: float64

In [68]:
rock_samples.groupby('Type')['Weight (kg)'].sum()

Type
Basalt      93.14077
Breccia    168.88075
Core        19.93587
Crustal      4.74469
Soil        87.58981
Special      0.74410
Name: Weight (kg), dtype: float64

In [70]:
needed_samples = needed_samples.append(rock_samples.loc[rock_samples['Type'] == 'Crustal'])
needed_samples.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 114 entries, 11 to 2189
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   ID              114 non-null    int64  
 1   Mission         114 non-null    object 
 2   Type            114 non-null    object 
 3   Subtype         114 non-null    object 
 4   Weight (kg)     114 non-null    float64
 5   Pristine (%)    114 non-null    float64
 6   Remaining (kg)  114 non-null    float64
dtypes: float64(3), int64(1), object(3)
memory usage: 7.1+ KB


## Summary of needed samples

The final step is to consolidate everything we know into one table that can be shared with the astronauts. First, we need a column for each type of rock that we have already identified as rocks we want more samples of:

In [74]:
needed_samples_overview = pd.DataFrame()
needed_samples_overview['Type'] = needed_samples.Type.unique()
needed_samples_overview

Unnamed: 0,Type
0,Basalt
1,Breccia
2,Crustal


In [75]:
needed_sample_weights = needed_samples.groupby('Type')['Weight (kg)'].sum().reset_index()
needed_samples_overview = pd.merge(needed_samples_overview, needed_sample_weights, on='Type')
needed_samples_overview.rename(columns={'Weight (kg)':'Total weight (kg)'}, inplace=True)
needed_samples_overview

Unnamed: 0,Type,Total weight (kg)
0,Basalt,17.4234
1,Breccia,10.1185
2,Crustal,9.48938


In [76]:
needed_sample_ave_weights = needed_samples.groupby('Type')['Weight (kg)'].mean().reset_index()
needed_samples_overview = pd.merge(needed_samples_overview, needed_sample_ave_weights, on='Type')
needed_samples_overview.rename(columns={'Weight (kg)':'Average weight (kg)'}, inplace=True)
needed_samples_overview

Unnamed: 0,Type,Total weight (kg),Average weight (kg)
0,Basalt,17.4234,1.244529
1,Breccia,10.1185,1.264812
2,Crustal,9.48938,0.103145


**Crustals are small!** They're probably a lot harder to spot, so no wonder we don't have a lot of them.

We probably want to give the astronauts some indication of how many of each type we want them to collect. So, for the three types we're looking for, we should grab the total number we have of each type and get the remaining percentage of each type of rock.

In [77]:
total_rock_count = rock_samples.groupby('Type')['ID'].count().reset_index()
needed_samples_overview = pd.merge(needed_samples_overview, total_rock_count, on='Type')
needed_samples_overview.rename(columns={'ID':'Number of samples'}, inplace=True)
total_rocks = needed_samples_overview['Number of samples'].sum()
needed_samples_overview['Percentage of rocks'] = needed_samples_overview['Number of samples'] / total_rocks
needed_samples_overview

Unnamed: 0,Type,Total weight (kg),Average weight (kg),Number of samples,Percentage of rocks
0,Basalt,17.4234,1.244529,351,0.25885
1,Breccia,10.1185,1.264812,959,0.707227
2,Crustal,9.48938,0.103145,46,0.033923


In [79]:
artemis_ave_weight = artemis_mission['Estimated sample weight (kg)'].mean()
artemis_ave_weight

64.22588520079607

In [80]:
needed_samples_overview['Weight to collect'] = needed_samples_overview['Percentage of rocks'] * artemis_ave_weight
needed_samples_overview['Rocks to collect'] = needed_samples_overview['Weight to collect'] / needed_samples_overview['Average weight (kg)']
needed_samples_overview

Unnamed: 0,Type,Total weight (kg),Average weight (kg),Number of samples,Percentage of rocks,Weight to collect,Rocks to collect
0,Basalt,17.4234,1.244529,351,0.25885,16.624842,13.358345
1,Breccia,10.1185,1.264812,959,0.707227,45.422289,35.912271
2,Crustal,9.48938,0.103145,46,0.033923,2.178754,21.123128


So, we might tell the Artemis astronauts to please try to collect 

- 13 Basalt rocks 
- 35 Breccia rocks, and 
- 21 Crustal rocks. 

Whew!