## Read the dataset

In [30]:
import pandas as pd
rock_samples = pd.read_csv('data/rocksamples.csv')

# head() shows the first five rows of the DataFrame
rock_samples.head()

Unnamed: 0,ID,Mission,Type,Subtype,Weight(g),Pristine(%)
0,10001,Apollo11,Soil,Unsieved,125.8,88.36
1,10002,Apollo11,Soil,Unsieved,5629.0,93.73
2,10003,Apollo11,Basalt,Ilmenite,213.0,65.56
3,10004,Apollo11,Core,Unsieved,44.8,71.76
4,10005,Apollo11,Core,Unsieved,53.4,40.31


In [31]:

# info() summarizes the DataFrame: column names, non-null counts, and dtypes
rock_samples.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2229 entries, 0 to 2228
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   ID           2229 non-null   int64  
 1   Mission      2229 non-null   object 
 2   Type         2229 non-null   object 
 3   Subtype      2226 non-null   object 
 4   Weight(g)    2229 non-null   float64
 5   Pristine(%)  2229 non-null   float64
dtypes: float64(2), int64(1), object(3)
memory usage: 104.6+ KB
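
The info() output above lists 2226 non-null values for Subtype out of 2229 rows, so three entries in that column are missing. As a minimal sketch of how such gaps can be counted per column, the example below uses a small made-up frame (the name `demo` and its values are illustrative assumptions, not the real file):

```python
import pandas as pd

# Hypothetical stand-in for rock_samples; the values are made up
demo = pd.DataFrame({
    'Mission': ['Apollo11', 'Apollo11', 'Apollo12'],
    'Subtype': ['Unsieved', None, 'Ilmenite'],
})

# isna() flags missing cells; sum() counts the flags in each column
missing_per_column = demo.isna().sum()
print(missing_per_column)
```

Running the same two calls on rock_samples should report the missing Subtype entries directly instead of leaving them to be inferred from info().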


## Convert Weight from grams to kilograms

In [32]:
rock_samples['Weight(g)'] = rock_samples['Weight(g)'].apply(lambda x: x * 0.001)
rock_samples.rename(columns={'Weight(g)': 'Weight(Kg)'}, inplace=True)
rock_samples.head()

Unnamed: 0,ID,Mission,Type,Subtype,Weight(Kg),Pristine(%)
0,10001,Apollo11,Soil,Unsieved,0.1258,88.36
1,10002,Apollo11,Soil,Unsieved,5.629,93.73
2,10003,Apollo11,Basalt,Ilmenite,0.213,65.56
3,10004,Apollo11,Core,Unsieved,0.0448,71.76
4,10005,Apollo11,Core,Unsieved,0.0534,40.31
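
The cell above converts each value through apply(). The same conversion can also be written as plain Series arithmetic, which pandas applies element-wise. A minimal sketch on made-up weights (the `demo` frame is an illustrative assumption):

```python
import pandas as pd

# Hypothetical weights in grams, for illustration only
demo = pd.DataFrame({'Weight(g)': [125.8, 5629.0, 213.0]})

# Arithmetic on a Series is applied to every element, so no lambda is needed
demo['Weight(g)'] = demo['Weight(g)'] * 0.001

# Without inplace=True, rename() returns a new DataFrame
demo = demo.rename(columns={'Weight(g)': 'Weight(Kg)'})
print(demo)
```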


The rock_samples DataFrame has one row for every sample that was collected, but we want to understand the samples in total, as they relate to the specific missions that brought them back.

In [33]:
missions = pd.DataFrame()
missions['Mission'] = rock_samples['Mission'].unique()
missions.head()


Unnamed: 0,Mission
0,Apollo11
1,Apollo12
2,Apollo14
3,Apollo15
4,Apollo16


In [34]:
missions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 1 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Mission  6 non-null      object
dtypes: object(1)
memory usage: 176.0+ bytes
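
Going from one row per sample to one row per mission relies on unique(), which keeps each distinct value once, in order of first appearance. A minimal sketch on made-up rows (the `demo` and `missions_demo` names are illustrative assumptions):

```python
import pandas as pd

# Hypothetical sample rows: three from one mission, one from another
demo = pd.DataFrame({'Mission': ['Apollo11', 'Apollo11', 'Apollo12', 'Apollo11']})

# unique() collapses repeated mission names into a single entry each
missions_demo = pd.DataFrame({'Mission': demo['Mission'].unique()})
print(len(demo), len(missions_demo))
```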


## Sum total weight by Mission
Add a new column to the missions DataFrame that holds the total weight of all samples collected on each mission.

In [35]:
sample_total_weight = rock_samples.groupby('Mission')['Weight(Kg)'].sum()
missions = pd.merge(missions, sample_total_weight, on='Mission')
missions.rename(columns={'Weight(Kg)' : 'Sample Weight(Kg)'}, inplace=True)
missions

Unnamed: 0,Mission,Sample Weight(Kg)
0,Apollo11,21.55424
1,Apollo12,34.34238
2,Apollo14,41.83363
3,Apollo15,75.3991
4,Apollo16,92.46262
5,Apollo17,109.44402
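
The groupby step produces a Series of totals indexed by Mission, and the merge matches it to the missions rows by that name. As a minimal sketch of the aggregation on its own, with made-up weights, the example below also shows reset_index() as one alternative way to get the totals back into a regular column (the `demo`, `totals`, and `totals_frame` names are illustrative assumptions):

```python
import pandas as pd

# Hypothetical sample weights in kilograms
demo = pd.DataFrame({
    'Mission': ['Apollo11', 'Apollo11', 'Apollo12'],
    'Weight(Kg)': [0.5, 1.5, 3.0],
})

# groupby(...).sum() yields one total per mission, indexed by Mission
totals = demo.groupby('Mission')['Weight(Kg)'].sum()

# reset_index() turns the Mission index back into an ordinary column
totals_frame = totals.reset_index().rename(
    columns={'Weight(Kg)': 'Sample Weight(Kg)'}
)
print(totals_frame)
```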


## Get the difference in weights across missions
Many different cross sections of this data are available to explore. The total weight of the samples increased with each mission, but it's hard to see at a glance by how much.

Add one more column to the missions DataFrame that holds the difference between each row and the row preceding it. The first row has no predecessor, so its difference is missing (NaN) until it is filled in.

In [36]:
missions['Weight Diff'] = missions['Sample Weight(Kg)'].diff()
missions

Unnamed: 0,Mission,Sample Weight(Kg),Weight Diff
0,Apollo11,21.55424,
1,Apollo12,34.34238,12.78814
2,Apollo14,41.83363,7.49125
3,Apollo15,75.3991,33.56547
4,Apollo16,92.46262,17.06352
5,Apollo17,109.44402,16.9814
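
To make the behavior of diff() concrete: it subtracts the previous row from each row, which is the same as subtracting a copy of the Series shifted down by one. A minimal sketch, using the first three totals from the table above rounded to two decimals (the rounding and the variable names are assumptions made for illustration):

```python
import pandas as pd

# First three mission totals from above, rounded to two decimals
weights = pd.Series([21.55, 34.34, 41.83])

# diff() computes each row minus the row before it
by_diff = weights.diff()

# The same result written out by hand with shift()
by_shift = weights - weights.shift(1)

print(by_diff)
```

The first entry of both results is NaN, because there is no earlier row to subtract.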


In [38]:
missions['Weight Diff'] = missions['Weight Diff'].fillna(value=0)
missions

Unnamed: 0,Mission,Sample Weight(Kg),Weight Diff
0,Apollo11,21.55424,0.0
1,Apollo12,34.34238,12.78814
2,Apollo14,41.83363,7.49125
3,Apollo15,75.3991,33.56547
4,Apollo16,92.46262,17.06352
5,Apollo17,109.44402,16.9814
