Spaceflight data.
Some questions to ask (<em>Emphasised</em> text indicates as yet unanswered questions): 
<ul>
    <li>Who's been blasted into space the most times?</li>
    <li>Who's been to the most different orbits?</li>
    <li>Which spacecraft has been blasted into space the most times? And how many have been used more than once?</li>
    <li>Who's spent the most time in space?</li>
    <li>Which re-usable spacecraft has been used the most frequently?</li>
    <li>What is the largest number of people to have been on the same spacecraft/mission at the same time? <em>(and when was this?)</em></li>
    <li><em>What is the largest number of free-flying missions to be in orbit at the same time? (and when was this?)</em></li>
</ul>

In [37]:
import numpy as np
import pandas as pd

Read in data file

In [2]:
el = pd.read_csv("data/sftl-full.txt", sep="\t", header=0, parse_dates=[0], dtype={'eventType': 'category'})
el.head()

Unnamed: 0,date,subject,eventType,object
0,1960-07-29,Mercury_No.4,SUPPORTS,Mercury-Atlas_1
1,1960-07-29,Mercury-Atlas_1,DEPARTS,Earth
2,1960-07-29,Mercury-Atlas_1,ARRIVES,Sub_Orbital
3,1960-07-29,Mercury-Atlas_1,ENDS,
4,1960-08-19,Vostok_1K_KS2,SUPPORTS,Korabl-Sputnik_2


Some explanation of the data.<br>
The events describe changes to the state of three (and a bit) different types of object. The three main types are Mission, Orbit and Component. A mission is a slightly abstract concept that groups together the components for a particular purpose. All the components of a mission are physically connected and share a single orbital ephemeris. An orbit is a rough grouping of different orbits, e.g. LEO for all low Earth orbits. Components are further sub-typed into travellers and spacecraft. Travellers are people and animals that have been sent into space. Spacecraft are the ships that carried them. The eventType indicates the types of object involved as follows:
<ul>
    <li>ARRIVES, DEPARTS and ENDS: The subject of the event is a mission, the object is an orbit.</li>
    <li>JOINS: The subject is a traveller and the object is a mission.</li>
    <li>SUPPORTS: The subject is a spacecraft and the object is a mission.</li>
</ul>  
Note that missions can exist without any craft assigned to them. This is typical for missions to space stations, where the spacecraft becomes part of the space station mission but its original mission continues until the spacecraft undocks and returns to Earth.<br>
Timestamps are only accurate to the nearest day, but the order in which the events occur is chronologically correct (sort of) and therefore shouldn't be ignored.<br>
I can't remember what BREAK events are for. I think they might be there to separate groups of events that happen on the same day.
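
As a quick reference, the subject/object roles listed above can be written down as a lookup table. This is only a sketch: `EVENT_ROLES` and `describe` are names made up for illustration and are not used anywhere else in the notebook. BREAK is left out because its meaning is uncertain.

```python
# Hypothetical lookup table: event type -> (kind of subject, kind of object).
EVENT_ROLES = {
    "ARRIVES": ("mission", "orbit"),
    "DEPARTS": ("mission", "orbit"),
    "ENDS": ("mission", "orbit"),  # the sample ENDS row shown earlier has a blank object
    "JOINS": ("traveller", "mission"),
    "SUPPORTS": ("spacecraft", "mission"),
}

def describe(event_type, subject, obj):
    """Return a readable description of one event row."""
    subject_kind, object_kind = EVENT_ROLES[event_type]
    return f"{subject_kind} {subject} {event_type} {object_kind} {obj}"

print(describe("SUPPORTS", "Mercury_No.4", "Mercury-Atlas_1"))
```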

Check the number of different types of events

In [3]:
el.eventType.unique()

[SUPPORTS, DEPARTS, ARRIVES, ENDS, JOINS, BREAK]
Categories (6, object): [SUPPORTS, DEPARTS, ARRIVES, ENDS, JOINS, BREAK]

Pull out all of the unique components and set up a table to track their state.

In [4]:
componentevents = el[(el['eventType']=="SUPPORTS")|(el['eventType']=="JOINS")]
components = componentevents.subject.unique()
c = pd.DataFrame(index=components)
c["mission"] = None
c["currentstate"] = None
c

Unnamed: 0,mission,currentstate
Mercury_No.4,,
Vostok_1K_KS2,,
Belka_and_Strelka,,
Vostok_1K_KS3,,
Pchyolka_and_Mushka,,
...,...,...
David_Saint-Jacques,,
Anne_McClain,,
Crew Dragon 2 01,,
Soyuz_MS-12_Spacecraft,,


Do the same for missions.

In [5]:
missionevents = el[(el['eventType']=="DEPARTS")|(el['eventType']=="ARRIVES")]
m = pd.DataFrame(index = missionevents.subject.unique())
m["orbit"]=None
m["currentstate"]=None
m["currentcraft"]=0
m["maxcraft"]=0
m["currenttravellers"]=0
m["maxtravellers"]=0
m

Unnamed: 0,orbit,currentstate,currentcraft,maxcraft,currenttravellers,maxtravellers
Mercury-Atlas_1,,,0,0,0,0
Korabl-Sputnik_2,,,0,0,0,0
Korabl-Sputnik_3,,,0,0,0,0
Mercury-Redstone_1A,,,0,0,0,0
Unamed_Vostok_Flight1,,,0,0,0,0
...,...,...,...,...,...,...
Soyuz_MS-11,,,0,0,0,0
New_Shepard#10,,,0,0,0,0
Crew_Dragon_Demo-1,,,0,0,0,0
Soyuz_MS-12,,,0,0,0,0


Create initial states for all missions. This is because most missions start out by having components assigned to them. They don't usually get their own events till later.

In [6]:
ms = []
for index, mission in m.iterrows():
    msr = {"mission": index, "orbit": None, "startdate": pd.Timestamp.min, "enddate": pd.Timestamp.max}
    m.at[index, "currentstate"] = len(ms)
    ms.append(msr)
m

Unnamed: 0,orbit,currentstate,currentcraft,maxcraft,currenttravellers,maxtravellers
Mercury-Atlas_1,,0,0,0,0,0
Korabl-Sputnik_2,,1,0,0,0,0
Korabl-Sputnik_3,,2,0,0,0,0
Mercury-Redstone_1A,,3,0,0,0,0
Unamed_Vostok_Flight1,,4,0,0,0,0
...,...,...,...,...,...,...
Soyuz_MS-11,,419,0,0,0,0
New_Shepard#10,,420,0,0,0,0
Crew_Dragon_Demo-1,,421,0,0,0,0
Soyuz_MS-12,,422,0,0,0,0


Track the history of missions and components. This needs to be done at the same time to keep track of relationships between the two. Essentially it's converting a list of events into a list (two lists) of states.

In [7]:
cs = []
for index, row in el.iterrows():
    csr = {}
    msr = {}
    laststateindex = None
    if row["eventType"] == "SUPPORTS" or row["eventType"] == "JOINS":
        mission = row["object"]
        # Increment new mission craft / traveller count
        if row["eventType"] == "SUPPORTS":
            m.at[mission, "currentcraft"] += 1
            if m.at[mission, "currentcraft"] > m.at[mission, "maxcraft"]:
                m.at[mission, "maxcraft"] = m.at[mission, "currentcraft"]
        else:
            m.at[mission, "currenttravellers"] += 1
            if m.at[mission, "currenttravellers"] > m.at[mission, "maxtravellers"]:
                m.at[mission, "maxtravellers"] = m.at[mission, "currenttravellers"]
        component = row["subject"]
        # Figure out the date of the last event that changed the state of this component
        laststateindex = c.at[component, "currentstate"]
        csr["pred"] = laststateindex
        # Update the end date of the previous state
        if not(laststateindex is None):
            try:
                cs[laststateindex]["enddate"] = row["date"]
                cs[laststateindex]["succ"] = len(cs)
                # Log the state of the mission we are leaving
                # Look up the mission recorded in the previous state record for this component
                prevmission = cs[laststateindex]["mission"]
                # Look up the current state of this mission
                prevmissionstate = m.at[prevmission, "currentstate"]
                # Update the previous component state record to document the state of the mission when this component left it.
                cs[laststateindex]["endmissionstate"] = prevmissionstate
                # Decrement previous mission craft / traveller count
                if row["eventType"] == "SUPPORTS":
                    m.at[prevmission, "currentcraft"] -= 1
                else:
                    m.at[prevmission, "currenttravellers"] -= 1
            except IndexError:
                print("Index error on component state array access.")
                print("Event: " + str(row))
                print("Length of component state array: " + str(len(cs)))
                print("laststateindex: " + str(laststateindex))
                print("Component record: " + str(c.loc[component]))
        # Create a record for the new state of this component
        csr["startdate"] = row["date"]
        csr["enddate"] = pd.Timestamp.max
        csr["component"] = component
        csr["mission"] = mission
        csr["startmissionstate"] = m.at[mission,"currentstate"]
        cs.append(csr)
        # cs is never empty here (we have just appended), so point the component at its new state
        c.at[component, "currentstate"] = len(cs) - 1
    elif row["eventType"] == "ARRIVES" or row["eventType"] == "DEPARTS" or row["eventType"] == "ENDS":
        mission = row["subject"]
        if row["eventType"] == "ARRIVES":
            orbit = row["object"]
        else:
            orbit = str(row["object"]) + "-Tr-"
        # Figure out the date of the last event that changed the state of this mission
        laststateindex = m.at[mission, "currentstate"]
        msr["pred"] = laststateindex
        # Update the previous state
        if not(laststateindex is None):
            try:
                ms[laststateindex]["enddate"] = row["date"]
                if row["eventType"] == "ARRIVES" or row["eventType"] == "DEPARTS":
                    ms[laststateindex]["succ"] = len(ms)
                else:
                    ms[laststateindex]["succ"] = None
                # For departure events, check that the previous state's orbit matches that described in the event
                if row["eventType"] == "DEPARTS":
                    if ms[laststateindex]["orbit"] != row["object"]:
                        # If not, see if it's set at all.
                        if ms[laststateindex]["orbit"] is None:
                            # If not set, use the departure event to update it. Otherwise report a warning.
                            ms[laststateindex]["orbit"] = row["object"]
                        else:
                            print("WARNING: Mission "
                                  + mission
                                  + " departing "
                                  + row["object"]
                                  + " but previously assigned "
                                  + ms[laststateindex]["orbit"])
                # For arrival events, update the "orbit" field of the previous state to capture the destination
                elif row["eventType"] == "ARRIVES":
                    ms[laststateindex]["orbit"] = str(ms[laststateindex]["orbit"]) + str(row["object"])
            except IndexError:
                print("Index error on mission state array access.")
                print("Event: " + str(row))
                print("Length of mission state array: " + str(len(ms)))
                print("laststateindex: " + str(laststateindex))
                print("Mission record: " + str(m.loc[mission]))
        if row["eventType"] == "ARRIVES" or row["eventType"] == "DEPARTS": 
            # Create a record for the new state of this mission
            msr["startdate"] = row["date"]
            msr["enddate"] = pd.Timestamp.max
            msr["mission"] = mission
            msr["orbit"] = orbit
            ms.append(msr)
            # ms is never empty here (we have just appended), so point the mission at its new state
            m.at[mission, "currentstate"] = len(ms) - 1

In [8]:
msdf = pd.DataFrame(ms)
msdf

Unnamed: 0,mission,orbit,startdate,enddate,succ,pred
0,Mercury-Atlas_1,Earth,1677-09-21 00:12:43.145225,1960-07-29 00:00:00.000000000,424.0,
1,Korabl-Sputnik_2,Earth,1677-09-21 00:12:43.145225,1960-08-19 00:00:00.000000000,426.0,
2,Korabl-Sputnik_3,Earth,1677-09-21 00:12:43.145225,1960-12-01 00:00:00.000000000,430.0,
3,Mercury-Redstone_1A,Earth,1677-09-21 00:12:43.145225,1960-12-19 00:00:00.000000000,434.0,
4,Unamed_Vostok_Flight1,Earth,1677-09-21 00:12:43.145225,1960-12-22 00:00:00.000000000,438.0,
...,...,...,...,...,...,...
2117,Soyuz_MS-12,LEO,2019-03-14 00:00:00.000000,2262-04-11 23:47:16.854775807,,2116.0
2118,New_Shepard#11,Earth-Tr-Sub_Orbital,2019-05-02 00:00:00.000000,2019-05-02 00:00:00.000000000,2119.0,423.0
2119,New_Shepard#11,Sub_Orbital,2019-05-02 00:00:00.000000,2019-05-02 00:00:00.000000000,2120.0,2118.0
2120,New_Shepard#11,Sub_Orbital-Tr-Earth,2019-05-02 00:00:00.000000,2019-05-02 00:00:00.000000000,2121.0,2119.0


In [9]:
csdf = pd.DataFrame(cs)
csdf

Unnamed: 0,pred,startdate,enddate,component,mission,startmissionstate,succ,endmissionstate
0,,1960-07-29,2262-04-11 23:47:16.854775807,Mercury_No.4,Mercury-Atlas_1,0,,
1,,1960-08-19,2262-04-11 23:47:16.854775807,Vostok_1K_KS2,Korabl-Sputnik_2,1,,
2,,1960-08-19,2262-04-11 23:47:16.854775807,Belka_and_Strelka,Korabl-Sputnik_2,1,,
3,,1960-12-01,2262-04-11 23:47:16.854775807,Vostok_1K_KS3,Korabl-Sputnik_3,2,,
4,,1960-12-01,2262-04-11 23:47:16.854775807,Pchyolka_and_Mushka,Korabl-Sputnik_3,2,,
...,...,...,...,...,...,...,...,...
3499,3495.0,2019-03-14,2262-04-11 23:47:16.854775807,Soyuz_MS-12_Spacecraft,International_Space_Station,1589,,
3500,3496.0,2019-03-14,2262-04-11 23:47:16.854775807,Nick_Hague,International_Space_Station,1589,,
3501,3497.0,2019-03-14,2262-04-11 23:47:16.854775807,Christina_Koch,International_Space_Station,1589,,
3502,3498.0,2019-03-14,2262-04-11 23:47:16.854775807,Aleksey_Ovchinin,International_Space_Station,1589,,
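
The event-to-state conversion behind these tables is easier to see on a toy example. The sketch below is a stripped-down version for components only: it ignores missions' own states and the craft/traveller counters, and `events_to_states` is a made-up helper, not the loop used above. The two events are taken from the Soyuz T-15 spacecraft's history.

```python
import pandas as pd

def events_to_states(events):
    """Turn (date, component, mission) events into linked state records."""
    states = []
    current = {}  # component -> index of its currently open state
    for date, component, mission in events:
        prev = current.get(component)
        if prev is not None:
            # Close the previous state and link it forwards.
            states[prev]["enddate"] = date
            states[prev]["succ"] = len(states)
        states.append({
            "component": component,
            "mission": mission,
            "startdate": date,
            "enddate": pd.Timestamp.max,  # open-ended until a later event closes it
            "pred": prev,
            "succ": None,
        })
        current[component] = len(states) - 1
    return states

toy_events = [
    (pd.Timestamp("1986-03-13"), "Soyuz_T-15_Spacecraft", "Soyuz_T-15"),
    (pd.Timestamp("1986-03-15"), "Soyuz_T-15_Spacecraft", "Mir"),
]
toy_states = events_to_states(toy_events)
print(pd.DataFrame(toy_states))
```

Each new event closes the component's previous state (sets its end date and `succ`) and opens a new one whose `pred` points back at it.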


Next, loop through the component state records and create a link table to capture all of the mission states that each component state passes through. Note that some components have no final mission state; this is because they remain part of the mission until its end. E.g. I guess in theory CSM Endeavour is still part of Apollo 15 to this day.

In [10]:
import numpy as np
csms = []
for csindex, csr in csdf.iterrows():
    # Find all mission states associated with this component state.
    FirstMissionState = csr["startmissionstate"]
    LastMissionState = csr["endmissionstate"]
    mission = csr["mission"]
    msSubSet = msdf[(msdf.mission == mission) & (msdf.index >= FirstMissionState)]
    if not(np.isnan(LastMissionState)):
        msSubSet = msSubSet[(msSubSet.index <= LastMissionState)]
    for msindex, msr in msSubSet.iterrows():
        csms.append((msindex, csindex))
            
csmsindex = pd.MultiIndex.from_tuples(csms, names=['MissionState', 'ComponentState'])
csmsdf = pd.DataFrame(index=csmsindex)
csmsdf

MissionState,ComponentState
0,0
424,0
425,0
1,1
426,1
...,...
423,3503
2118,3503
2119,3503
2120,3503


In [11]:
csmsdf.loc[1,1]

Series([], Name: (1, 1), dtype: float64)

Join the component and mission states together into a combined set. If a component state is part of a mission that goes through various state changes the component state is repeated for each mission state. E.g. the complete Apollo 15 mission(s).

In [50]:
x = csmsdf.join(msdf[["orbit", "mission", "startdate", "enddate"]], on="MissionState")
x = x.join(csdf[["component", "startdate", "enddate"]], on="ComponentState", lsuffix="_mission", rsuffix="_component")
x[(x['mission']=="Apollo_15")|(x['mission']=="Apollo_15#Lunar_surface")]

Unnamed: 0_level_0,Unnamed: 1_level_0,orbit,mission,startdate_mission,enddate_mission,component,startdate_component,enddate_component
MissionState,ComponentState,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
89,231,Earth,Apollo_15,1677-09-21 00:12:43.145225,1971-07-26,CSM_Endeavour,1971-07-26,2262-04-11 23:47:16.854775807
798,231,Earth-Tr-LEO,Apollo_15,1971-07-26 00:00:00.000000,1971-07-26,CSM_Endeavour,1971-07-26,2262-04-11 23:47:16.854775807
799,231,LEO,Apollo_15,1971-07-26 00:00:00.000000,1971-07-26,CSM_Endeavour,1971-07-26,2262-04-11 23:47:16.854775807
800,231,LEO-Tr-Lunar_Orbit,Apollo_15,1971-07-26 00:00:00.000000,1971-07-29,CSM_Endeavour,1971-07-26,2262-04-11 23:47:16.854775807
801,231,Lunar_Orbit,Apollo_15,1971-07-29 00:00:00.000000,1971-08-04,CSM_Endeavour,1971-07-26,2262-04-11 23:47:16.854775807
808,231,Lunar_Orbit-Tr-Earth,Apollo_15,1971-08-04 00:00:00.000000,1971-08-07,CSM_Endeavour,1971-07-26,2262-04-11 23:47:16.854775807
809,231,Earth,Apollo_15,1971-08-07 00:00:00.000000,1971-08-07,CSM_Endeavour,1971-07-26,2262-04-11 23:47:16.854775807
89,232,Earth,Apollo_15,1677-09-21 00:12:43.145225,1971-07-26,LM_Falcon,1971-07-26,1971-07-30 00:00:00.000000000
798,232,Earth-Tr-LEO,Apollo_15,1971-07-26 00:00:00.000000,1971-07-26,LM_Falcon,1971-07-26,1971-07-30 00:00:00.000000000
799,232,LEO,Apollo_15,1971-07-26 00:00:00.000000,1971-07-26,LM_Falcon,1971-07-26,1971-07-30 00:00:00.000000000
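
To show what the two joins are doing, here is a cut-down sketch with invented data. The frame names (`links`, `mission_states`, `component_states`) and their contents are made up; only the pattern (joining on an index level name, then using suffixes to separate the clashing date columns) mirrors the cell above.

```python
import pandas as pd

# Link table with no columns, just (MissionState, ComponentState) pairs.
links = pd.DataFrame(index=pd.MultiIndex.from_tuples(
    [(0, 0), (1, 0), (1, 1)], names=["MissionState", "ComponentState"]))

mission_states = pd.DataFrame({
    "orbit": ["Earth", "LEO"],
    "startdate": pd.to_datetime(["1971-07-26", "1971-07-27"]),
})
component_states = pd.DataFrame({
    "component": ["toy_craft", "toy_traveller"],
    "startdate": pd.to_datetime(["1971-07-26", "1971-07-27"]),
})

# `on` may name an index level of the calling frame.
combined = links.join(mission_states, on="MissionState")
# Both sides now have a "startdate" column, so suffixes keep them apart.
combined = combined.join(component_states, on="ComponentState",
                         lsuffix="_mission", rsuffix="_component")
print(combined)
```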


Another example: all states for Soyuz_T-15's spacecraft:

In [13]:
x[(x.component == "Soyuz_T-15_Spacecraft")]

Unnamed: 0_level_0,Unnamed: 1_level_0,orbit,mission,startdate_mission,enddate_mission,component,startdate_component,enddate_component
MissionState,ComponentState,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
187,837,Earth,Soyuz_T-15,1677-09-21 00:12:43.145225,1986-03-13 00:00:00.000000000,Soyuz_T-15_Spacecraft,1986-03-13,1986-03-15 00:00:00.000000000
1194,837,Earth-Tr-LEO,Soyuz_T-15,1986-03-13 00:00:00.000000,1986-03-13 00:00:00.000000000,Soyuz_T-15_Spacecraft,1986-03-13,1986-03-15 00:00:00.000000000
1195,837,LEO,Soyuz_T-15,1986-03-13 00:00:00.000000,1986-07-16 00:00:00.000000000,Soyuz_T-15_Spacecraft,1986-03-13,1986-03-15 00:00:00.000000000
1193,840,LEO,Mir,1986-02-20 00:00:00.000000,2001-03-23 00:00:00.000000000,Soyuz_T-15_Spacecraft,1986-03-15,1986-05-05 00:00:00.000000000
1195,843,LEO,Soyuz_T-15,1986-03-13 00:00:00.000000,1986-07-16 00:00:00.000000000,Soyuz_T-15_Spacecraft,1986-05-05,1986-05-06 00:00:00.000000000
1054,846,LEO,Salyut_7,1982-04-19 00:00:00.000000,1991-02-07 00:00:00.000000000,Soyuz_T-15_Spacecraft,1986-05-06,1986-06-25 00:00:00.000000000
1195,849,LEO,Soyuz_T-15,1986-03-13 00:00:00.000000,1986-07-16 00:00:00.000000000,Soyuz_T-15_Spacecraft,1986-06-25,1986-06-25 00:00:00.000000000
1193,852,LEO,Mir,1986-02-20 00:00:00.000000,2001-03-23 00:00:00.000000000,Soyuz_T-15_Spacecraft,1986-06-25,1986-07-16 00:00:00.000000000
1195,855,LEO,Soyuz_T-15,1986-03-13 00:00:00.000000,1986-07-16 00:00:00.000000000,Soyuz_T-15_Spacecraft,1986-07-16,2262-04-11 23:47:16.854775807
1196,855,LEO-Tr-Earth,Soyuz_T-15,1986-07-16 00:00:00.000000,1986-07-16 00:00:00.000000000,Soyuz_T-15_Spacecraft,1986-07-16,2262-04-11 23:47:16.854775807


Next we need to work out the date range for each of the mission+component states. Each has one start date drawn from the mission state and another from the component state (i.e. when the component became part of the mission), and likewise two end dates. We need to pick the latest start and the earliest finish.<br>
Also, we can use the resulting start and end dates to calculate a duration.

In [14]:
def duration(row):
    # Duration is meaningless if the state isn't bounded.
    if row.enddate == pd.Timestamp.max or row.startdate == pd.Timestamp.min:
        return pd.NaT
    else:
        return row.enddate - row.startdate
    
x["startdate"] = x.apply(lambda row: max(row.startdate_mission, row.startdate_component), axis=1)
x["enddate"] = x.apply(lambda row: min(row.enddate_mission, row.enddate_component), axis=1)
x["duration"] = x.apply(lambda row: duration(row), axis=1)
x[(x.component == "LM_Falcon")]

Unnamed: 0_level_0,Unnamed: 1_level_0,orbit,mission,startdate_mission,enddate_mission,component,startdate_component,enddate_component,startdate,enddate,duration
MissionState,ComponentState,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
89,232,Earth,Apollo_15,1677-09-21 00:12:43.145225,1971-07-26,LM_Falcon,1971-07-26,1971-07-30 00:00:00.000000000,1971-07-26,1971-07-26,0 days
798,232,Earth-Tr-LEO,Apollo_15,1971-07-26 00:00:00.000000,1971-07-26,LM_Falcon,1971-07-26,1971-07-30 00:00:00.000000000,1971-07-26,1971-07-26,0 days
799,232,LEO,Apollo_15,1971-07-26 00:00:00.000000,1971-07-26,LM_Falcon,1971-07-26,1971-07-30 00:00:00.000000000,1971-07-26,1971-07-26,0 days
800,232,LEO-Tr-Lunar_Orbit,Apollo_15,1971-07-26 00:00:00.000000,1971-07-29,LM_Falcon,1971-07-26,1971-07-30 00:00:00.000000000,1971-07-26,1971-07-29,3 days
801,232,Lunar_Orbit,Apollo_15,1971-07-29 00:00:00.000000,1971-08-04,LM_Falcon,1971-07-26,1971-07-30 00:00:00.000000000,1971-07-29,1971-07-30,1 days
797,236,Lunar_Orbit,Apollo_15#Lunar_surface,1971-07-26 00:00:00.000000,1971-07-30,LM_Falcon,1971-07-30,1971-08-02 00:00:00.000000000,1971-07-30,1971-07-30,0 days
802,236,Lunar_Orbit-Tr-Lunar_Surface,Apollo_15#Lunar_surface,1971-07-30 00:00:00.000000,1971-07-30,LM_Falcon,1971-07-30,1971-08-02 00:00:00.000000000,1971-07-30,1971-07-30,0 days
803,236,Lunar_Surface,Apollo_15#Lunar_surface,1971-07-30 00:00:00.000000,1971-08-02,LM_Falcon,1971-07-30,1971-08-02 00:00:00.000000000,1971-07-30,1971-08-02,3 days
804,236,Lunar_Surface-Tr-Lunar_Orbit,Apollo_15#Lunar_surface,1971-08-02 00:00:00.000000,1971-08-02,LM_Falcon,1971-07-30,1971-08-02 00:00:00.000000000,1971-08-02,1971-08-02,0 days
805,236,Lunar_Orbit,Apollo_15#Lunar_surface,1971-08-02 00:00:00.000000,1971-08-04,LM_Falcon,1971-07-30,1971-08-02 00:00:00.000000000,1971-08-02,1971-08-02,0 days
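
The "latest start, earliest finish" rule is just the intersection of two date ranges. A minimal standalone sketch, using a made-up `overlap` helper and the dates from the LM_Falcon / Lunar_Orbit row above:

```python
import pandas as pd

def overlap(mission_start, mission_end, component_start, component_end):
    """Intersect two date ranges; the duration is NaT if either bound is open."""
    start = max(mission_start, component_start)
    end = min(mission_end, component_end)
    if start == pd.Timestamp.min or end == pd.Timestamp.max:
        return start, end, pd.NaT
    return start, end, end - start

start, end, length = overlap(
    pd.Timestamp("1971-07-29"), pd.Timestamp("1971-08-04"),  # mission state
    pd.Timestamp("1971-07-26"), pd.Timestamp("1971-07-30"),  # component state
)
print(start.date(), end.date(), length)
```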


Let's work with state durations from here on, so we can drop the date columns.

In [15]:
y = x[["orbit","mission", "component", "duration"]]
y

Unnamed: 0_level_0,Unnamed: 1_level_0,orbit,mission,component,duration
MissionState,ComponentState,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0,0,Earth,Mercury-Atlas_1,Mercury_No.4,0 days
424,0,Earth-Tr-Sub_Orbital,Mercury-Atlas_1,Mercury_No.4,0 days
425,0,Sub_Orbital,Mercury-Atlas_1,Mercury_No.4,0 days
1,1,Earth,Korabl-Sputnik_2,Vostok_1K_KS2,0 days
426,1,Earth-Tr-LEO,Korabl-Sputnik_2,Vostok_1K_KS2,0 days
...,...,...,...,...,...
423,3503,Earth,New_Shepard#11,New_Shepard_3,0 days
2118,3503,Earth-Tr-Sub_Orbital,New_Shepard#11,New_Shepard_3,0 days
2119,3503,Sub_Orbital,New_Shepard#11,New_Shepard_3,0 days
2120,3503,Sub_Orbital-Tr-Earth,New_Shepard#11,New_Shepard_3,0 days


In [16]:
y.groupby(["orbit","component"]).sum()

Unnamed: 0_level_0,Unnamed: 1_level_0,mission,duration
orbit,component,Unnamed: 2_level_1,Unnamed: 3_level_1
Earth,7K-L1 K146,Kosmos_146,0 days
Earth,7K-L1 K154,Kosmos_154Kosmos_154,0 days
Earth,7K-L1 Z4,Zond_4Zond_4,0 days
Earth,Abdul_Ahad_Mohmand,Soyuz_TM-6Soyuz_TM-5,0 days
Earth,Aidyn_Aimbetov,Soyuz_TMA-18MSoyuz_TMA-16M,0 days
...,...,...,...
Sub_Orbital-Tr-Earth,SpaceShipOne_X0,SpaceShipOne_flight_15PSpaceShipOne_flight_16P...,0 days
Sub_Orbital-Tr-Earth,Vasily_Lazarev,Soyuz_18a,0 days
Sub_Orbital-Tr-Earth,Virgil_Grissom,Mercury-Redstone_4,0 days
Sub_Orbital-Tr-Earth,Vostok_1K_UN,Unamed_Vostok_Flight1,0 days


Next work out the type of each component (craft or traveller).

In [17]:
componentTypes = el[(el['eventType']=="SUPPORTS")|(el['eventType']=="JOINS")][["subject","eventType"]].groupby(["subject","eventType"], observed=True).count()
componentTypes

subject,eventType
7K-L1 K146,SUPPORTS
7K-L1 K154,SUPPORTS
7K-L1 Z4,SUPPORTS
Abdul_Ahad_Mohmand,JOINS
Aidyn_Aimbetov,JOINS
...,...
Zarya,SUPPORTS
Zhai_Zhigang,JOINS
Zhang_Xiaoguang,JOINS
Zvezda,SUPPORTS


In [18]:
mapDict = {"SUPPORTS":"craft","JOINS":"traveller"}
componentTypes = pd.DataFrame(componentTypes.reset_index(level=["eventType"])["eventType"].map(mapDict))
componentTypes


Unnamed: 0_level_0,eventType
subject,Unnamed: 1_level_1
7K-L1 K146,craft
7K-L1 K154,craft
7K-L1 Z4,craft
Abdul_Ahad_Mohmand,traveller
Aidyn_Aimbetov,traveller
...,...
Zarya,craft
Zhai_Zhigang,traveller
Zhang_Xiaoguang,traveller
Zvezda,craft


In [19]:
z = y.join(componentTypes, on="component")

Now finally do something interesting. Find out the total number of days each traveller has spent away from Earth.

In [20]:
TravellerTotals = z[(z["eventType"]=="traveller")&(z["orbit"]!="Earth")].groupby(["component", "eventType"]).sum()
TravellerTotals[["mission","duration"]].sort_values(by="duration",ascending=False).head(20)

Unnamed: 0_level_0,Unnamed: 1_level_0,mission,duration
component,eventType,Unnamed: 2_level_1,Unnamed: 3_level_1
Gennady_Padalka,traveller,Soyuz_TM-28Soyuz_TM-28MirSoyuz_TM-28Soyuz_TM-2...,878 days
Yuri_Malenchenko,traveller,Soyuz_TM-19Soyuz_TM-19MirSoyuz_TM-19Soyuz_TM-1...,825 days
Sergei_Krikalev,traveller,Soyuz_TM-7Soyuz_TM-7MirSoyuz_TM-7Soyuz_TM-7Soy...,803 days
Aleksandr_Kaleri,traveller,Soyuz_TM-14Soyuz_TM-14MirSoyuz_TM-14Soyuz_TM-1...,769 days
Sergei_Avdeyev,traveller,Soyuz_TM-15Soyuz_TM-15MirSoyuz_TM-15Soyuz_TM-1...,744 days
Fyodor_Yurchikhin,traveller,STS-112STS-112International_Space_StationSTS-1...,675 days
Anatoly_Solovyev,traveller,Soyuz_TM-5Soyuz_TM-5MirSoyuz_TM-4Soyuz_TM-4Soy...,628 days
Pavel_Vinogradov,traveller,Soyuz_TM-26Soyuz_TM-26MirSoyuz_TM-26Soyuz_TM-2...,547 days
Viktor_Afanasyev_(cosmonaut),traveller,Soyuz_TM-11Soyuz_TM-11MirSoyuz_TM-11Soyuz_TM-1...,546 days
Musa_Manarov,traveller,Soyuz_TM-4Soyuz_TM-4MirSoyuz_TM-6Soyuz_TM-6Soy...,541 days
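
The `mission` column in the output above is a side effect of summing every column: strings get concatenated. A possible tidier variant, sketched here on invented data, is to select the `duration` column before summing. The names `toy`, `away` and `totals` are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    "component": ["traveller_a", "traveller_a", "traveller_b"],
    "orbit": ["LEO", "Earth", "LEO"],
    "duration": pd.to_timedelta([3, 100, 5], unit="D"),
})

# Drop time spent on Earth, then total only the duration column.
away = toy[toy["orbit"] != "Earth"]
totals = away.groupby("component")["duration"].sum().sort_values(ascending=False)
print(totals)
```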


Time spent either orbiting or on the surface of the Moon.

In [21]:
TravellerTotalsByOrbit = z[(z["eventType"]=="traveller")&((z["orbit"]=="Lunar_Orbit")|(z["orbit"]=="Lunar_Surface"))].groupby(["component", "eventType"]).sum()
TravellerTotalsByOrbit.sort_values(by="duration",ascending=False)

Unnamed: 0_level_0,Unnamed: 1_level_0,orbit,mission,duration
component,eventType,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Eugene_Cernan,traveller,Lunar_OrbitLunar_OrbitLunar_OrbitLunar_OrbitLu...,Apollo_10Apollo_10#LM–CSM_dockingApollo_10Apol...,8 days
John_Young_(astronaut),traveller,Lunar_OrbitLunar_OrbitLunar_OrbitLunar_Surface...,Apollo_10Apollo_16Apollo_16#Lunar_surfaceApoll...,8 days
Alfred_M._Worden,traveller,Lunar_Orbit,Apollo_15,6 days
Ronald_Evans_(astronaut),traveller,Lunar_Orbit,Apollo_17,6 days
David_R._Scott,traveller,Lunar_OrbitLunar_OrbitLunar_SurfaceLunar_Orbit...,Apollo_15Apollo_15#Lunar_surfaceApollo_15#Luna...,6 days
Thomas_K._Mattingly,traveller,Lunar_Orbit,Apollo_16,6 days
James_B._Irwin,traveller,Lunar_OrbitLunar_OrbitLunar_SurfaceLunar_Orbit...,Apollo_15Apollo_15#Lunar_surfaceApollo_15#Luna...,6 days
Harrison_H._Schmitt,traveller,Lunar_OrbitLunar_OrbitLunar_SurfaceLunar_Orbit...,Apollo_17Apollo_17#Moon_landingApollo_17#Moon_...,5 days
Charles_M._Duke,traveller,Lunar_OrbitLunar_OrbitLunar_SurfaceLunar_Orbit...,Apollo_16Apollo_16#Lunar_surfaceApollo_16#Luna...,5 days
Michael_Collins_(astronaut),traveller,Lunar_Orbit,Apollo_11,3 days


Let's try counting traveller launches. Assume that each occurrence of a transfer orbit starting with "Earth" is a launch.

In [22]:
TravellerLaunchCounts = z[(z["eventType"]=="traveller")&((z["orbit"]=="Earth-Tr-LEO")|(z["orbit"]=="Earth-Tr-Sub_Orbital"))].groupby(["component", "eventType"]).count()
TravellerLaunchCounts.sort_values(by="duration",ascending=False).head(20)

Unnamed: 0_level_0,Unnamed: 1_level_0,orbit,mission,duration
component,eventType,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Franklin_R._Chang-Diaz,traveller,7,7,7
Jerry_L._Ross,traveller,7,7,7
Curtis_L._Brown,traveller,6,6,6
Sergei_Krikalev,traveller,6,6,6
F._Story_Musgrave,traveller,6,6,6
Yuri_Malenchenko,traveller,6,6,6
C._Michael_Foale,traveller,6,6,6
Steven_A._Hawley,traveller,5,5,5
James_D._Halsell,traveller,5,5,5
Kenneth_D._Bowersox,traveller,5,5,5
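
The cell above tests for two specific transfer orbits. A slightly more literal reading of "starts with Earth" is a prefix test, sketched below on invented data. Note this is an assumption rather than an equivalent: it would also count any other `Earth-Tr-...` orbit, if the data contains one.

```python
import pandas as pd

toy = pd.DataFrame({
    "component": ["a", "a", "a", "b", "b"],
    "orbit": ["Earth-Tr-LEO", "LEO", "Earth-Tr-Sub_Orbital",
              "Earth-Tr-LEO", "LEO-Tr-Earth"],
})

# A launch is any state whose orbit label begins with "Earth-Tr-".
is_launch = toy["orbit"].str.startswith("Earth-Tr-")
launch_counts = toy[is_launch].groupby("component").size()
print(launch_counts)
```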


Let's try the same for craft.

In [23]:
CraftLaunchCounts = z[(z["eventType"]=="craft")&((z["orbit"]=="Earth-Tr-LEO")|(z["orbit"]=="Earth-Tr-Sub_Orbital"))].groupby(["component", "eventType"]).count()
CraftLaunchCounts.sort_values(by="duration",ascending=False).head(20)

Unnamed: 0_level_0,Unnamed: 1_level_0,orbit,mission,duration
component,eventType,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Space_Shuttle_Discovery,craft,39,39,39
Space_Shuttle_Atlantis,craft,33,33,33
Space_Shuttle_Columbia,craft,28,28,28
Space_Shuttle_Endeavour,craft,25,25,25
Space_Shuttle_Challenger,craft,10,10,10
SPACEHAB SM,craft,7,7,7
SPACEHAB LDM,craft,7,7,7
New_Shepard_2,craft,5,5,5
New_Shepard_3,craft,5,5,5
SpaceShipOne_X0,craft,3,3,3


Find all the re-usable spacecraft and break out the orbital and sub-orbital launches. Again, assume a "launch" is a transfer from Earth to either LEO or Sub_Orbital.

In [24]:
craftLauchesByOrbit = z.loc[
    (z["eventType"]=="craft")&((z["orbit"]=="Earth-Tr-LEO")|(z["orbit"]=="Earth-Tr-Sub_Orbital"))
    ,["component", "orbit", "mission"]]\
    .groupby(["component", "orbit"])\
    .count()\
    .unstack()
totalLaunches = z.loc[
    (z["eventType"]=="craft")&((z["orbit"]=="Earth-Tr-LEO")|(z["orbit"]=="Earth-Tr-Sub_Orbital"))
    ,["component", "orbit", "mission"]]\
    .groupby(["component"])\
    .count()["orbit"]
craftLauchesByOrbit["totLaunch"] = totalLaunches
craftLauchesByOrbit.sort_values(by="totLaunch",ascending=False)\
    .head(20)

Unnamed: 0_level_0,mission,mission,totLaunch
orbit,Earth-Tr-LEO,Earth-Tr-Sub_Orbital,Unnamed: 3_level_1
component,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
Space_Shuttle_Discovery,39.0,,39
Space_Shuttle_Atlantis,33.0,,33
Space_Shuttle_Columbia,28.0,,28
Space_Shuttle_Endeavour,25.0,,25
Space_Shuttle_Challenger,9.0,1.0,10
SPACEHAB SM,7.0,,7
SPACEHAB LDM,7.0,,7
New_Shepard_2,,5.0,5
New_Shepard_3,,5.0,5
SpaceShipOne_X0,,3.0,3
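
The same kind of breakdown can be built in one pass with `size()` and `unstack(fill_value=0)`, which gives zeros instead of NaN and lets the total be a simple row sum. This is a sketch on invented data; `launches` and `by_orbit` are made-up names.

```python
import pandas as pd

launches = pd.DataFrame({
    "component": ["craft_x", "craft_x", "craft_x", "craft_y"],
    "orbit": ["Earth-Tr-LEO", "Earth-Tr-LEO",
              "Earth-Tr-Sub_Orbital", "Earth-Tr-Sub_Orbital"],
})

# Count rows per (component, orbit), then pivot orbits into columns.
by_orbit = launches.groupby(["component", "orbit"]).size().unstack(fill_value=0)
by_orbit["totLaunch"] = by_orbit.sum(axis=1)
print(by_orbit.sort_values(by="totLaunch", ascending=False))
```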


Let's use the above approach to get more detail on space traveller total duration.

In [25]:
import datetime
travellerDurationByOrbit = z.loc[
    (z["eventType"]=="traveller")
    &(z["duration"] > datetime.timedelta(0))
    ,["component", "orbit", "duration"]]\
    .groupby(["component", "orbit"])\
    .sum()\
    .unstack()
totalTravellerDuration = z.loc[
    (z["eventType"]=="traveller")&(z["orbit"]!="Earth")
    ,["component", "orbit", "duration"]]\
    .groupby(["component"])\
    .sum()["duration"]
travellerDurationByOrbit["totDuration"] = totalTravellerDuration
travellerDurationByOrbit.sort_values(by="totDuration",ascending=False).head(30)


Unnamed: 0_level_0,duration,duration,duration,duration,duration,duration,duration,duration,duration,duration,totDuration
orbit,Earth,LEO,LEO-Tr-Lunar_Flyby,LEO-Tr-Lunar_Orbit,Lunar_Flyby-Tr-Earth,Lunar_Orbit,Lunar_Orbit-Tr-Earth,Lunar_Orbit-Tr-Lunar_Surface,Lunar_Surface,Lunar_Surface-Tr-Lunar_Orbit,Unnamed: 11_level_1
component,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2
Gennady_Padalka,5359 days,878 days,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,878 days
Yuri_Malenchenko,7196 days,825 days,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,825 days
Sergei_Krikalev,5360 days,803 days,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,803 days
Aleksandr_Kaleri,6169 days,769 days,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,769 days
Sergei_Avdeyev,1844 days,744 days,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,744 days
Fyodor_Yurchikhin,4770 days,675 days,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,675 days
Anatoly_Solovyev,2916 days,628 days,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,628 days
Pavel_Vinogradov,5333 days,547 days,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,547 days
Viktor_Afanasyev_(cosmonaut),2645 days,546 days,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,546 days
Musa_Manarov,711 days,541 days,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,541 days


Bit boring as LEO durations totally eclipse all other orbits.

I'm interested in which traveller has been to the most different orbits (I have a feeling it's Jim Lovell)...

In [26]:
travellerOrbits = pd.DataFrame(z[(z["eventType"]=="traveller")].groupby("component").orbit.unique())
travellerOrbits["uniqueOrbitCount"] = travellerOrbits.apply(lambda row: len(row.orbit), axis=1)
travellerOrbits.sort_values(by="uniqueOrbitCount", ascending=False).head(35)

Unnamed: 0_level_0,orbit,uniqueOrbitCount
component,Unnamed: 1_level_1,Unnamed: 2_level_1
Buzz_Aldrin,"[Earth, Earth-Tr-LEO, LEO, LEO-Tr-Earth, LEO-T...",10
Neil_Armstrong,"[Earth, Earth-Tr-LEO, LEO, LEO-Tr-Earth, LEO-T...",10
Pete_Conrad,"[Earth, Earth-Tr-LEO, LEO, LEO-Tr-Earth, LEO-T...",10
Jim_Lovell,"[Earth, Earth-Tr-LEO, LEO, LEO-Tr-Earth, LEO-T...",10
John_Young_(astronaut),"[Earth, Earth-Tr-LEO, LEO, LEO-Tr-Earth, LEO-T...",10
Eugene_Cernan,"[Earth, Earth-Tr-LEO, LEO, LEO-Tr-Earth, LEO-T...",10
Edgar_D._Mitchell,"[Earth, Earth-Tr-LEO, LEO, LEO-Tr-Lunar_Orbit,...",9
Alan_B._Shepard,"[Earth, Earth-Tr-LEO, LEO, LEO-Tr-Lunar_Orbit,...",9
Alan_Bean,"[Earth, Earth-Tr-LEO, LEO, LEO-Tr-Lunar_Orbit,...",9
Harrison_H._Schmitt,"[Earth, Earth-Tr-LEO, LEO, LEO-Tr-Lunar_Orbit,...",9


Hmm, this is mostly just highlighting issues with the data. I wonder if I should ignore transfer "orbits"...
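
One possible way to ignore the transfer "orbits" is to drop every label containing "-Tr-" before counting. This is a sketch on invented data, assuming that "-Tr-" only ever appears in transfer labels; `toy`, `destinations` and `unique_orbit_counts` are made-up names.

```python
import pandas as pd

toy = pd.DataFrame({
    "component": ["a"] * 5 + ["b"] * 3,
    "orbit": ["Earth", "Earth-Tr-LEO", "LEO", "LEO-Tr-Lunar_Orbit", "Lunar_Orbit",
              "Earth", "Earth-Tr-Sub_Orbital", "Sub_Orbital"],
})

# Keep only real destinations, i.e. labels without the transfer marker.
destinations = toy[~toy["orbit"].str.contains("-Tr-", regex=False)]
unique_orbit_counts = destinations.groupby("component")["orbit"].nunique()
print(unique_orbit_counts.sort_values(ascending=False))
```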

Checking how many people have actually been to the moon...

In [27]:
moonMen = z[(z["eventType"]=="traveller")&((z["orbit"]=="Lunar_Orbit")|(z["orbit"]=="Lunar_Flyby"))].groupby("component").orbit.unique()
moonMen

component
Alan_B._Shepard                             [Lunar_Orbit]
Alan_Bean                                   [Lunar_Orbit]
Alfred_M._Worden                            [Lunar_Orbit]
Buzz_Aldrin                                 [Lunar_Orbit]
Charles_M._Duke                             [Lunar_Orbit]
David_R._Scott                              [Lunar_Orbit]
Edgar_D._Mitchell                           [Lunar_Orbit]
Eugene_Cernan                               [Lunar_Orbit]
Frank_Borman                                [Lunar_Orbit]
Fred_Haise                                  [Lunar_Flyby]
Harrison_H._Schmitt                         [Lunar_Orbit]
Jack_Swigert                                [Lunar_Flyby]
James_B._Irwin                              [Lunar_Orbit]
Jim_Lovell                     [Lunar_Orbit, Lunar_Flyby]
John_Young_(astronaut)                      [Lunar_Orbit]
Michael_Collins_(astronaut)                 [Lunar_Orbit]
Neil_Armstrong                              [Lunar_Orbit]
Pete

In [28]:
len(moonMen)

24

There should be 24. And technically they were all on a flyby before their orbit insertion burns...
Ah ha, I have Eugene_A._Cernan AND Eugene_Cernan. Let's find out which is the most common.

In [29]:
z[(z["component"]=="Eugene_A._Cernan")|(z["component"]=="Eugene_Cernan")].groupby(["component","mission"]).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,orbit,duration,eventType
component,mission,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Eugene_Cernan,Apollo_10,8,8,8
Eugene_Cernan,Apollo_10#LM–CSM_docking,1,1,1
Eugene_Cernan,Apollo_17,8,8,8
Eugene_Cernan,Apollo_17#Moon_landing,5,5,5
Eugene_Cernan,Gemini_9A,5,5,5


I've done a search and replace on the original data, so the above comments no longer match the output...
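
For the record, the same fix could be applied in pandas rather than by editing the file. This is a sketch only, on a made-up two-row frame; the `ALIASES` mapping is hypothetical and holds just the one duplicate found above.

```python
import pandas as pd

# Hypothetical alias table: variant spelling -> preferred spelling.
ALIASES = {"Eugene_A._Cernan": "Eugene_Cernan"}

toy_events = pd.DataFrame({
    "subject": ["Eugene_A._Cernan", "Eugene_Cernan"],
    "eventType": ["JOINS", "JOINS"],
    "object": ["Gemini_9A", "Apollo_10"],
})

# Replace whole values only; other names are left untouched.
toy_events["subject"] = toy_events["subject"].replace(ALIASES)
print(toy_events["subject"].unique())
```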

Let's have a go at launch frequency. Or maybe launch interval? (Should that be from landing to launch? Or launch to launch?)

In [30]:
ComponentLaunches = x.loc[((x["orbit"]=="Earth-Tr-LEO")|(x["orbit"]=="Earth-Tr-Sub_Orbital")),["mission","component","startdate"]]
ComponentLaunchIntervals = ComponentLaunches.groupby("component").agg({'mission':'count','startdate': lambda dates: dates.max() - dates.min()})
meanLaunchIntervals = ComponentLaunchIntervals[(ComponentLaunchIntervals["mission"]>1)].apply(lambda row: row["startdate"] / (row["mission"]-1), axis=1)
ComponentLaunchIntervals["meanLaunchIntervals"] = meanLaunchIntervals
ComponentLaunchIntervals[(ComponentLaunchIntervals["mission"]>1)].sort_values(by="meanLaunchIntervals", ascending=True).head(30)

Unnamed: 0_level_0,mission,startdate,meanLaunchIntervals
component,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
X-15 No.3,2,34 days,34 days 00:00:00
Joseph_A._Walker,2,34 days,34 days 00:00:00
SpaceShipOne_X0,3,105 days,52 days 12:00:00
New_Shepard_2,5,317 days,79 days 06:00:00
Susan_L._Still,2,88 days,88 days 00:00:00
Gregory_T._Linteris,2,88 days,88 days 00:00:00
Roger_K._Crouch,2,88 days,88 days 00:00:00
Mike_Melvill,2,100 days,100 days 00:00:00
Space_Shuttle_Challenger,10,1061 days,117 days 21:20:00
New_Shepard_3,5,506 days,126 days 12:00:00


So the above is based on launch to launch interval. Quite interesting I guess. I think landing to launch would be much harder.
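
The mean interval used above is (last launch - first launch) / (number of launches - 1), which is the same thing as the average gap between consecutive launches. A quick check on three invented dates:

```python
import pandas as pd

# Made-up launch dates for a single component.
launch_dates = pd.Series(pd.to_datetime(["2000-01-01", "2000-01-11", "2000-01-31"]))

# Span divided by the number of gaps...
span_based = (launch_dates.max() - launch_dates.min()) / (len(launch_dates) - 1)
# ...equals the mean of the individual gaps (10 days and 20 days here).
gap_based = launch_dates.sort_values().diff().mean()

print(span_based, gap_based)
```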

Analysing max traveller and craft stats that I added in earlier.

In [36]:
m.sort_values(by="maxcraft", ascending=False).head(15)

Unnamed: 0,orbit,currentstate,currentcraft,maxcraft,currenttravellers,maxtravellers
International_Space_Station,,1589,18,18,6,13
Mir,,1657,6,10,0,13
Salyut_7,,1299,2,4,0,6
Salyut_6,,1066,2,3,0,4
STS-130,,1879,0,3,5,6
STS-88,,1593,0,2,1,6
STS-123,,1819,0,2,4,7
Skylab,,997,1,2,0,3
STS-77,,1507,0,2,2,6
Apollo_17,,839,1,2,3,3


Which is great, but I'd really like to understand the circumstances of these maximums.
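
One way to get at the circumstances would be to record the date whenever a running count sets a new maximum. The sketch below shows the idea on invented data; `track_max` is a made-up helper, and wiring it into the event loop is left undone.

```python
import pandas as pd

def track_max(changes):
    """Given (date, delta) pairs in event order, return (max count, date it was first reached)."""
    current = 0
    best = 0
    best_date = None
    for date, delta in changes:
        current += delta
        if current > best:
            best = current
            best_date = date
    return best, best_date

# Made-up arrivals (+1) and departures (-1) for a single mission.
toy_changes = [
    (pd.Timestamp("2000-01-01"), +1),
    (pd.Timestamp("2000-01-02"), +1),
    (pd.Timestamp("2000-01-03"), -1),
    (pd.Timestamp("2000-01-04"), +1),
]
print(track_max(toy_changes))
```

Because the comparison is strict, a later tie (as on the last toy date) does not overwrite the date of the first occurrence.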