<a href="https://colab.research.google.com/github/minihic/cubesatlab-assignment/blob/main/assignment1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CubeSatLab/Design I
Assignment 1

## Analyzing CubeSat Missions
Objective: To analyze and visualize CubeSat mission data from the provided Wikipedia source.


## Instructions

### Step 1: Data Gathering
- Visit the Wikipedia page on CubeSats: [List of CubeSats](https://en.wikipedia.org/wiki/List_of_CubeSats)
- Collect relevant data from the CubeSat missions, such as launch year, size, mission status (success or failure), mission category, and deployment type (ISS or direct launch).

### Step 2: Data Analysis
Organize the data and calculate the required statistics for the following aspects:
- Distribution (percentage) of CubeSat sizes.
- Number of CubeSat launches per year.
- Number of mission failures and successes over the years (distinguish between launch failure and mission failure).
- Distribution (percentage) of missions based on the mission category.
- Distribution (percentage) of deployment types (launch from ISS or direct launch).

### Step 3: Data Visualization

Using a tool like Microsoft Excel, Google Sheets, or any other data visualization software, create appropriate graphs for each of the analyzed aspects.
For each analysis, include clear and labeled graphs such as pie charts, bar graphs, and line charts to represent the data effectively.
Include a title, labels, and legends to make the graphs informative.

### Step 4: Interpretation and Conclusion

Write a brief interpretation of the results for each analysis.
Discuss any trends, patterns, or insights you've gained from the data.
Conclude the assignment by summarizing the most important findings.

### Grading Rubric

- Data Gathering: 10 points
- Data Analysis: 20 points
- Data Visualization: 20 points
- Interpretation and Conclusion: 20 points
- Presentation and Clarity: 10 points
- Adherence to Instructions: 10 points
- Overall Quality: 10 points

# Data Gathering & Cleaning

In [1]:
import pandas as pd
import numpy as np
import html5lib
import re

In [2]:
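from io import StringIO

# Load a saved copy of the Wikipedia "List of CubeSats" page (must be in the working directory, e.g. uploaded to Colab)
with open("Listofcubesats.html", "r", encoding="utf-8") as f:
    html_text = f.read()

# read_html returns one DataFrame per <table>; StringIO avoids the pandas >= 2.1 literal-HTML deprecation warning
tables = pd.read_html(StringIO(html_text))

# assumes the first table on the page is the CubeSat list
table = tables[0]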


In [None]:
# Clean up data: strip Wikipedia citation markers such as "[12]"
def remove_numeric_brackets(val):
    if pd.isna(val):
        return val
    if not isinstance(val, str):
        return val
    cleaned = re.sub(r'\s*\[\d+\]', '', val)
    return cleaned.strip()

for col in table.columns:
    if isinstance(table[col].dtype, pd.CategoricalDtype):
        # clean as plain values and rebuild the categories: renaming in place fails
        # if two categories become identical after cleaning (e.g. "3U[1]" and "3U")
        table[col] = table[col].astype(object).apply(remove_numeric_brackets).astype('category')
    elif table[col].dtype == object:
        table[col] = table[col].apply(remove_numeric_brackets)
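The pattern in `remove_numeric_brackets` only removes numeric markers such as `[12]`. Wikipedia also uses letter footnotes like `[a]` and inline tags like `[citation needed]`, which survive this cleaning. The sketch below compares the two; the broader pattern `ANY_REF` and the helper `strip_refs` are hypothetical additions, not part of the notebook.

```python
import re

# Pattern used in the cell above: numeric citation markers such as "[12]"
NUMERIC_REF = re.compile(r'\s*\[\d+\]')
# Hypothetical broader pattern: also "[a]"-style footnotes, "[note 3]" and "[citation needed]"
ANY_REF = re.compile(r'\s*\[(?:\d+|[a-z]|note \d+|citation needed)\]', flags=re.IGNORECASE)


def strip_refs(text: str, pattern: re.Pattern) -> str:
    """Remove every bracketed marker matched by `pattern` and trim surrounding whitespace."""
    return pattern.sub('', text).strip()


sample = "Operational[12][a] [citation needed]"
print(strip_refs(sample, NUMERIC_REF))  # Operational[a] [citation needed]
print(strip_refs(sample, ANY_REF))      # Operational
```

Because the single-letter branch would also remove a genuine one-letter bracket, it is worth spot-checking a few cleaned cells before adopting the broader pattern.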


In [None]:
# Normalize values in the 'Type' column and convert to categorical dtype
def normalize_type(t):
    if pd.isna(t):
        return np.nan
    s = str(t).strip()
    s_lower = s.lower()

    # canonicalize common CubeSat size variants
    if re.search(r'\b2u\b', s_lower):
        return '2U'
    if re.search(r'\b3u\b', s_lower):
        return '3U'
    if re.search(r'\b6u\b', s_lower):
        return '6U'
    if re.search(r'\b1u\b', s_lower):
        return '1U'
    # femto-satellites are not a standard CubeSat size; group them as Other
    if 'femto' in s_lower:
        return 'Other'
    # map non-standard launcher text to Other
    if 'in-orbit cubesat' in s_lower or ('launcher' in s_lower and 'cubesat' in s_lower):
        return 'Other'
    # otherwise return the stripped original string
    return s

table['Type'] = table['Type'].apply(normalize_type).astype('category')

# verify conversion and inspect categories
print("n_categories:", table['Type'].nunique(dropna=True))
print(table['Type'].value_counts(dropna=False))
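The if-chain in `normalize_type` only canonicalizes 1U, 2U, 3U and 6U; any other size (for example a "12U CubeSat" entry, if the table contains one) is returned with its extra text attached. A hypothetical alternative is a single regular expression that captures any `<number>U` size, including fractional sizes; `SIZE_RE` and `extract_unit_size` are illustrative names, not the notebook's rule.

```python
import re
from typing import Optional

# One pattern for any "<number>U" size, including fractional ones such as 0.5U or 1.5U;
# the optional space also accepts "3 U".
SIZE_RE = re.compile(r'\b(\d+(?:\.\d+)?)\s?u\b', flags=re.IGNORECASE)


def extract_unit_size(text: object) -> Optional[str]:
    """Return the first CubeSat unit size in `text` (e.g. '3U'), or None if there is none."""
    if not isinstance(text, str):  # NaN and other non-strings carry no size
        return None
    match = SIZE_RE.search(text)
    return f"{match.group(1)}U" if match else None


for raw in ["3U CubeSat", "0.5U", "12U CubeSat", "PocketQube"]:
    print(raw, "->", extract_unit_size(raw))
```

Applied as `table['Type'].apply(extract_unit_size)`, rows without a size (femto or launcher entries) come back as `None`, which could then be mapped to `'Other'` as the cell above does.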

In [None]:
# Converts the 'Mission status' column to categorical dtype
table['Mission status'] = table['Mission status'].astype('category')

# verify conversion and inspect categories
print("n_categories:", table['Mission status'].nunique(dropna=True))
print(table['Mission status'].value_counts(dropna=False))

In [None]:
# Classify each CubeSat as Success / Mission Failure / Launch Failure / Unknown using the Mission status, Mission, and Remarks columns
def classify_final_outcome(status, mission, remarks):
    # combine available text fields into a single searchable string
    parts = []
    for v in (status, mission, remarks):
        if pd.isna(v):
            continue
        parts.append(str(v).lower())
    s = " ".join(parts)

    # check for explicit launch-failure indicators first
    launch_kw = [
        'launch failure', 'failed to reach orbit', 'failed to achieve orbit', 'failed to orbit',
        'failed at launch', 'failed during launch', 'launcher failure', 'rocket failure',
        'stage failure', 'explosion', 'destroyed during launch', 'lost during launch',
        'launch aborted', 'launch abort', 'did not reach orbit', 'failure to reach orbit',
        'failed to attain orbit', 'fairing'
    ]
    if any(kw in s for kw in launch_kw):
        return 'Launch Failure'

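    # then check for general mission-failure indicators; end-of-life terms such as
    # 'decayed' or 'deactivated' also land here, even for missions that had completed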
    mission_kw = [
        'fail', 'failure', 'failed', 'lost contact', 'no signal', 'anomaly', 'burnup',
        'canceled', 'cancelled', 'deactivated', 'decommissioned', 'malfunction', 'terminated',
        'decayed', 'reentered', 'reentry'
    ]
    if any(kw in s for kw in mission_kw):
        return 'Mission Failure'

    # then check for success indicators
    success_kw = [
        'active', 'complete', 'completed', 'success', 'succeeded', 'operational',
        'deployed', 'in orbit', 'on-orbit', 'successful', 'commissioned', 'returned'
    ]
    if any(kw in s for kw in success_kw):
        return 'Success'

    return 'Unknown'

# apply and create categorical column
table['final_outcome'] = table.apply(
    lambda row: classify_final_outcome(row.get('Mission status'), row.get('Mission'), row.get('Remarks')),
    axis=1
).astype('category')

print(table['final_outcome'].value_counts(dropna=False))
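The keyword checks in `classify_final_outcome` are plain substring tests, so `'active'` also matches "inactive" and `'complete'` matches "incomplete"; a status reading only "inactive" falls through to the success check and is labelled Success. A minimal sketch of a boundary-aware test follows; `contains_keyword` is a hypothetical helper, and swapping it in for the `any(kw in s ...)` checks would change the class counts, so they would need re-checking.

```python
import re
from typing import Iterable


def contains_keyword(text: str, keywords: Iterable[str]) -> bool:
    """True if any keyword occurs in `text` starting at a word boundary.

    Only the start is anchored, so 'fail' still matches 'failed' and 'failure',
    while 'active' no longer matches inside 'inactive'.
    """
    return any(re.search(r'\b' + re.escape(kw), text) for kw in keywords)


success_kw = ['active', 'complete', 'operational']
status = "inactive since 2015"

print(any(kw in status for kw in success_kw))          # True: plain substring test, a false positive
print(contains_keyword(status, success_kw))             # False
print(contains_keyword("failed to deploy", ['fail']))   # True
```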

In [None]:
# Classifies detailed Mission text to a small set of high-level categories
def map_mission_to_category(m):
    if pd.isna(m):
        return 'Unknown'
    s = str(m).lower()
    # keyword-driven coarse categories (order matters: the first match wins, so more specific lists come first)
    mapping = [
        ('Space Weather', ['space weather', 'space-weather', 'spaceweather']),
        ('Earth Observation', ['earth', 'mapping', 'imaging', 'remote sensing', 'imagery', 'land', 'ocean', 'terrain', 'observation', 'camera']),
        ('Communication', ['communic', 'comms', 'amateur', 'ham', 'relay', 'telemetry']),
        ('Education', ['student', 'university', 'educat', 'student-built', 'education', 'outreach']),
        ('Science', ['science', 'scientif', 'research', 'experiment', 'measure']),
        ('Technology', ['technology', 'tech', 'demonstrat', 'prototype', 'test', 'technology demonstration', 'tech demo']),
    ]
    for cat, keywords in mapping:
        if any(kw in s for kw in keywords):
            return cat
    # fallback for short/ambiguous descriptions
    return 'Other'

# Apply mapping and convert to categorical dtype
table['mission_category'] = table['Mission'].apply(map_mission_to_category).astype('category')

# Merge very small categories (fewer than 5 rows) into 'Other' to further reduce the category count
counts = table['mission_category'].value_counts()
small_cats = counts[counts < 5].index.difference(['Unknown', 'Other'])
if len(small_cats):
    # work on plain strings: assigning 'Other' into a categorical column raises
    # a TypeError if 'Other' is not already one of its categories
    as_str = table['mission_category'].astype(str)
    table['mission_category'] = as_str.where(~as_str.isin(small_cats), 'Other').astype('category')

print(table['mission_category'].value_counts(dropna=False))
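`map_mission_to_category` returns the first matching category, so a description that hits several keyword lists is silently assigned to whichever list comes first. One way to audit this is to list every matching category and review multi-match rows by hand. In this sketch, `MAPPING` is a trimmed, hypothetical subset of the keyword lists above and `matching_categories` is an illustrative helper.

```python
from typing import Dict, List

# Trimmed, hypothetical subset of the keyword mapping used above
MAPPING: Dict[str, List[str]] = {
    'Space Weather': ['space weather'],
    'Earth Observation': ['earth', 'imaging', 'observation'],
    'Education': ['student', 'university'],
    'Technology': ['technology', 'demonstrat'],
}


def matching_categories(text: str, mapping: Dict[str, List[str]]) -> List[str]:
    """Return every category whose keywords occur in `text`, in mapping order."""
    s = text.lower()
    return [cat for cat, kws in mapping.items() if any(kw in s for kw in kws)]


desc = "Student-built space weather observation and technology demonstration"
hits = matching_categories(desc, MAPPING)
print(hits)     # ['Space Weather', 'Earth Observation', 'Education', 'Technology']
print(hits[0])  # the label a first-match-wins rule assigns under this ordering
```

Counting rows where `len(matching_categories(...)) > 1` gives a quick measure of how much the category distribution depends on list order.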

In [None]:
# Create a new column indicating whether the satellite was deployed from the ISS based on the Remarks column
def detect_iss_deployment(remark):
    if pd.isna(remark):
        return 'Unknown'
    s = str(remark).lower()
    # common explicit patterns
    if re.search(r'deployed\s+from\s+(the\s+)?iss', s) or 'international space station' in s:
        return 'Yes'
    # some remarks may say "deployed" with date but not mention ISS; treat those as No
    return 'No'

table['deployed_from_ISS'] = table['Remarks'].apply(detect_iss_deployment).astype('category')

print(table['deployed_from_ISS'].value_counts(dropna=False))
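`detect_iss_deployment` only recognises "deployed from (the) ISS" and "International Space Station", so a remark that names the deployer or says "released from ISS" is counted as No, and therefore as a direct launch. A sketch with a longer, hypothetical pattern list follows. J-SSOD (the JEM Small Satellite Orbital Deployer on the ISS Kibo module) is real, but whether the Remarks column uses any of these phrases is an assumption to check against the data.

```python
import re

# Hypothetical pattern list; the first two are the rules already used in detect_iss_deployment
ISS_PATTERNS = [
    r'deployed\s+from\s+(the\s+)?iss',
    r'international space station',
    r'released\s+from\s+(the\s+)?iss',   # assumed alternative wording
    r'\bj-ssod\b',                        # JEM Small Satellite Orbital Deployer
    r'\bkibo\b',                          # Japanese ISS module hosting J-SSOD
]


def mentions_iss_deployment(remark: object) -> str:
    """Return 'Yes', 'No' or 'Unknown', with the same labels as detect_iss_deployment."""
    if not isinstance(remark, str):  # NaN / missing remark
        return 'Unknown'
    s = remark.lower()
    return 'Yes' if any(re.search(p, s) for p in ISS_PATTERNS) else 'No'


print(mentions_iss_deployment("Released from ISS via J-SSOD"))  # Yes
print(mentions_iss_deployment("Direct insertion"))              # No
```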

In [None]:
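# Convert launch dates to datetime64 (time of day dropped); unparseable entries become NaT
table['Launch date (UTC)'] = pd.to_datetime(table['Launch date (UTC)'], errors='coerce').dt.normalize()

print(table['Launch date (UTC)'].dtype)
print("unparsed launch dates:", table['Launch date (UTC)'].isna().sum())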

In [None]:
# Creates a compact summary dataframe with the requested fields
cols = [
    'Name',
    'COSPAR ID (NORAD ID)',
    'Type',
    'Launch date (UTC)',
    'final_outcome',
    'mission_category',
    'deployed_from_ISS'
]

available = [c for c in cols if c in table.columns]
summary_df = table[available].copy()

summary_df = summary_df.rename(columns={
    'COSPAR ID (NORAD ID)': 'COSPAR ID',
    'Launch date (UTC)': 'Launch date',
    'final_outcome': 'Mission Outcome',
    'mission_category': 'Mission Category',
    'deployed_from_ISS': 'Deployed from ISS'
})

summary_df.info()

## Processed Table (first 25 rows)

In [None]:
summary_df.head(25)

# Data Analysis

In [None]:
# Distribution (percentage) of CubeSat sizes (Type) using the existing summary_df
counts = summary_df['Type'].value_counts(dropna=False)
percent = (counts / counts.sum()) * 100

dist_df = pd.DataFrame({
    'count': counts,
    'percent': percent.round(2)
}).sort_values('count', ascending=False)

dist_df
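Step 2 also asks for the same percentage breakdown for mission categories and deployment types. A small helper wrapping the computation above could be reused for both, e.g. `percent_distribution(summary_df['Mission Category'])` and `percent_distribution(summary_df['Deployed from ISS'])`. The name `percent_distribution` is hypothetical, and the example values are made up.

```python
import pandas as pd


def percent_distribution(series: pd.Series) -> pd.DataFrame:
    """Count and percentage share of each value; missing values are kept as their own row."""
    counts = series.value_counts(dropna=False)
    percent = counts / counts.sum() * 100
    return pd.DataFrame({'count': counts, 'percent': percent.round(2)})


# Made-up values standing in for a column such as summary_df['Deployed from ISS']
example = pd.Series(['Yes', 'No', 'No', 'No', 'Unknown'])
print(percent_distribution(example))
```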

In [None]:
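# Count CubeSats launched per year (each row is one CubeSat, so this counts satellites, not rocket launches);
# rows whose launch date could not be parsed are dropped so the years stay integers
launches_per_year = summary_df['Launch date'].dt.year.dropna().astype(int).value_counts().sort_index()
print("CubeSat launches per year (count):")
print(launches_per_year)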
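The third Step 2 item, successes versus launch and mission failures over the years, is not computed above. A minimal sketch with `pd.crosstab`, assuming the `Launch date` and `Mission Outcome` columns of `summary_df`; `outcomes_per_year` is a hypothetical helper and the example rows are made up.

```python
import pandas as pd


def outcomes_per_year(df: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate launch year against outcome class (cell = number of CubeSats)."""
    dated = df.dropna(subset=['Launch date'])       # undated rows cannot be placed in a year
    years = dated['Launch date'].dt.year.astype(int).rename('Year')
    outcomes = dated['Mission Outcome'].astype(str)  # plain strings: no empty columns for unused categories
    return pd.crosstab(years, outcomes)


# Made-up rows purely to exercise the function; not real mission data
example = pd.DataFrame({
    'Launch date': pd.to_datetime(['2013-01-15', '2013-06-01', '2014-03-10', '2014-09-30', None]),
    'Mission Outcome': ['Success', 'Mission Failure', 'Success', 'Launch Failure', 'Unknown'],
})
by_year = outcomes_per_year(example)
print(by_year)
```

On the real data, `outcomes_per_year(summary_df).plot(kind='bar', stacked=True)` would draw the result as a stacked bar chart with launch and mission failures kept apart.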

# Data Visualization
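Step 3 allows any visualization tool; since the notebook is already in Python, here is a minimal matplotlib sketch of a chart with the required title, axis labels and legend. Using matplotlib is an assumption, and `plot_distribution`, the output file name and the example counts are hypothetical.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no display needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_distribution(counts: pd.Series, title: str, category_label: str, path: str) -> Path:
    """Save a bar chart (counts) and a pie chart (percentages) of `counts` side by side."""
    labels = [str(x) for x in counts.index]
    fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(11, 4))

    # Bar chart of absolute counts, with title, axis labels and legend
    ax_bar.bar(labels, counts.values, color="tab:blue", label="Number of CubeSats")
    ax_bar.set_title(f"{title} (count)")
    ax_bar.set_xlabel(category_label)
    ax_bar.set_ylabel("Number of CubeSats")
    ax_bar.legend()

    # Pie chart of percentages; the legend keeps small slices readable
    wedges, _, _ = ax_pie.pie(counts.values, autopct="%1.1f%%", startangle=90)
    ax_pie.set_title(f"{title} (%)")
    ax_pie.legend(wedges, labels, title=category_label,
                  loc="center left", bbox_to_anchor=(1.0, 0.5))
    ax_pie.axis("equal")  # draw the pie as a circle

    fig.tight_layout()
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return Path(path)


# Made-up counts used only to exercise the function; not real CubeSat data
example_counts = pd.Series({"3U": 6, "1U": 3, "6U": 1})
saved = plot_distribution(example_counts, "CubeSat sizes", "Type", "size_distribution.png")
print(saved)
```

In the notebook, passing `summary_df['Type'].value_counts()` would chart the size distribution; launches per year are better shown as a line chart drawn with `ax.plot`.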



# Interpretation and Conclusion