<a href="https://colab.research.google.com/github/sinahuss/solar-flare-prediction/blob/main/notebooks/solar_flare_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# C964 Capstone: Solar Flare Prediction and Analysis

### 1. Import Libraries

Import the libraries used for data handling (pandas, NumPy) and plotting (Matplotlib, seaborn).

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### 2. Load Dataset

Load the dataset from a public GitHub repository into a pandas DataFrame, then display the first few rows to verify that it loaded correctly.

In [None]:
url = 'https://raw.githubusercontent.com/sinahuss/solar-flare-prediction/refs/heads/main/data/data.csv'
df = pd.read_csv(url)

df.head()
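
Beyond `df.head()`, a few quick checks can help confirm that the load worked as expected. The sketch below is illustrative only: `summarize_frame` is a hypothetical helper, and the small `demo` frame is invented stand-in data so that the block runs without network access.

```python
import pandas as pd

def summarize_frame(frame: pd.DataFrame) -> dict:
    """Return basic load diagnostics: shape, missing cells, duplicate rows."""
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        "missing_cells": int(frame.isna().sum().sum()),
        "duplicate_rows": int(frame.duplicated().sum()),
    }

# Stand-in data (made up for illustration, not taken from the real dataset)
demo = pd.DataFrame({
    "largest spot size": ["S", "A", "S", None],
    "common flares": [0, 1, 0, 2],
})

summary = summarize_frame(demo)
print(summary)
```

In the notebook itself, the same helper could be called as `summarize_frame(df)` right after loading.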

### 3. Categorize and Count Flare Classes

Define a function that assigns each row its most severe flare class (X, then M, then C, or 'None' when no flares were recorded), apply it to create a new `flare_class` column, and then display the distribution of the resulting classes.

In [None]:
# Determine the highest flare class for each row
def get_flare_class(row):
    if row['severe flares'] > 0:
        return 'X'
    elif row['moderate flares'] > 0:
        return 'M'
    elif row['common flares'] > 0:
        return 'C'
    else:
        return 'None'

# Create a new target column
df['flare_class'] = df.apply(get_flare_class, axis=1)

print(df['flare_class'].value_counts())
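
Row-wise `apply` is easy to read, but it calls a Python function once per row. As an alternative sketch (not the method used in this notebook), the same priority rule can be written with `numpy.select`, which returns the label of the first condition that matches. The `demo` frame is invented for illustration; the column names follow the cell above.

```python
import numpy as np
import pandas as pd

# Stand-in data (made up for illustration)
demo = pd.DataFrame({
    "common flares":   [0, 2, 1, 0],
    "moderate flares": [0, 0, 1, 0],
    "severe flares":   [0, 0, 0, 1],
})

# Conditions are checked in order, so the most severe class wins
conditions = [
    demo["severe flares"] > 0,
    demo["moderate flares"] > 0,
    demo["common flares"] > 0,
]
labels = ["X", "M", "C"]

# Rows matching no condition get the string 'None'
demo["flare_class"] = np.select(conditions, labels, default="None")
print(demo["flare_class"].tolist())
```

For a dataset of this size the speed difference is unlikely to matter; the vectorized form mainly makes the priority order explicit in one place.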

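### 4. One-Hot Encode Categorical Features

Convert the three categorical columns into indicator columns, one per category, so that they can be used in numeric computations such as correlation.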

In [None]:
# Identify the categorical columns to be encoded
categorical_cols = ['modified Zurich class', 'largest spot size', 'spot distribution']

# Apply one-hot encoding using pandas get_dummies
df_encoded = pd.get_dummies(df, columns=categorical_cols)

# Display the first few rows to see the new columns
df_encoded.head()
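
To make the effect of one-hot encoding concrete, here is a minimal sketch on an invented two-column frame. The category values and the `demo` and `encoded` names are assumptions for illustration. Passing `dtype=int` is optional; recent pandas versions produce boolean indicator columns by default.

```python
import pandas as pd

# Stand-in data (made up for illustration)
demo = pd.DataFrame({
    "spot distribution": ["X", "O", "X"],
    "area": [1, 2, 1],
})

# Each category becomes its own 0/1 column named '<column>_<category>'
encoded = pd.get_dummies(demo, columns=["spot distribution"], dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

The original `spot distribution` column is replaced by one indicator column per category, while `area` is left unchanged.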

### 5. Correlation Matrix Heatmap

Plot the pairwise correlations between the encoded columns as a heatmap.

In [None]:
# Increase the figure size for better readability
plt.figure(figsize=(12, 10))

# Create the heatmap
# Drop the non-numeric 'flare_class' column before calculating the correlation matrix
sns.heatmap(df_encoded.drop('flare_class', axis=1).corr(), annot=False, cmap='viridis')

plt.title('Feature Correlation Matrix')
plt.show()
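
Because the heatmap is drawn without annotations, exact values are hard to read off. One possible complement, sketched here on invented data, is to rank columns by the absolute value of their correlation with a single column of interest. The `demo` frame and the choice of `common flares` as the reference column are assumptions for illustration.

```python
import pandas as pd

# Stand-in data (made up for illustration)
demo = pd.DataFrame({
    "area": [1, 2, 3, 4, 5],
    "activity": [1, 1, 2, 1, 2],
    "common flares": [0, 1, 2, 3, 4],
})

corr = demo.corr()

# Rank the remaining columns by absolute correlation with the reference column
ranked = (
    corr["common flares"]
    .drop("common flares")
    .abs()
    .sort_values(ascending=False)
)
print(ranked)
```

In the notebook, the same pattern could be applied to the matrix computed from `df_encoded.drop('flare_class', axis=1)`.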

### 6. Class Imbalance Visualization

Plot the number of rows in each flare class to show how unevenly the classes are distributed.

In [None]:
plt.figure(figsize=(10, 6))

# Create a count plot for the new 'flare_class' target variable
sns.countplot(data=df_encoded, order=['None', 'C', 'M', 'X'], x='flare_class', hue='flare_class', palette='magma', legend=False)

plt.title('Distribution of Solar Flare Classes')
plt.ylabel('Number of Events')
plt.xlabel('Flare Class')
plt.show()
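
The count plot shows absolute numbers; class shares can make the imbalance easier to quantify. The following is a minimal sketch using an invented label series rather than the real data. The `flare_class` and `shares` names are assumptions for illustration.

```python
import pandas as pd

# Stand-in labels (made up for illustration, not the real distribution)
flare_class = pd.Series(
    ["None", "None", "None", "C", "C", "M", "None", "X"],
    name="flare_class",
)

# Fraction of rows per class, in the same order as the plot
order = ["None", "C", "M", "X"]
shares = flare_class.value_counts(normalize=True).reindex(order, fill_value=0.0)
print(shares)
```

In the notebook, `df_encoded['flare_class']` would take the place of the stand-in series.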