# Storm-time Superposed Analysis of Solar, Solar Wind and Atmospheric Variables

Superposed Epoch Analysis for the satellite drag variables during storms.

Initial data can be read in from the 1-hour OMNI data. The 1-hour cadence is probably sufficient, as we are mostly interested in how the solar wind variables behave through storms compared with the atmospheric density.

Notes:
- Solar wind variables are daily variables, so if the storm phases are short we might have to use a typical superposed epoch analysis rather than a time-normalized analysis.
- Solar Variables used in models
    - F10.7, S10, M10, Y10, F30, F81
- Geomagnetic Variables used in models
    - Kp, ap, Dst, n_sw, v_sw, IMF
- DTM2020 has two models, an operational model and a research model
    - The operational model uses lower cadence F10.7 and Kp indices as inputs
    - The research model uses higher cadence F30 and Hpo indices as inputs
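
The difference between the two approaches in the first note can be sketched on synthetic hourly data. This is only an illustration, not the `sean` routine from `sea_norm` that this notebook imports. The function names `sea_fixed` and `sea_normalized`, the window lengths and the number of bins are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def sea_fixed(series, epochs, hours_before=12, hours_after=24):
    """Typical superposed epoch analysis: fixed hourly lags around each epoch."""
    lags = np.arange(-hours_before, hours_after + 1)
    offsets = pd.to_timedelta(lags, unit="h")
    # One row per event, one column per lag; missing times become NaN
    rows = [series.reindex(t0 + offsets).to_numpy() for t0 in epochs]
    return pd.Series(np.nanmedian(np.vstack(rows), axis=0), index=lags)


def sea_normalized(series, starts, ends, n_bins=20):
    """Time-normalized version: every event is rescaled to n_bins samples."""
    grid = np.linspace(0.0, 1.0, n_bins)
    rows = []
    for t0, t1 in zip(starts, ends):
        seg = series.loc[t0:t1].dropna()
        # Fractional position of each sample in the event (0 = start, 1 = end)
        frac = (seg.index - t0) / (t1 - t0)
        rows.append(np.interp(grid, np.asarray(frac, dtype=float), seg.to_numpy()))
    return pd.Series(np.median(np.vstack(rows), axis=0), index=grid)


# Synthetic hourly series: a linear ramp, value = hours since the start
idx = pd.date_range("2020-01-01", periods=240, freq=pd.Timedelta(hours=1))
series = pd.Series(np.arange(240, dtype=float), index=idx)

# Two made-up events of different length (12 h and 24 h)
starts = [idx[48], idx[120]]
ends = [idx[60], idx[144]]

fixed = sea_fixed(series, starts)
normalized = sea_normalized(series, starts, ends)
```

With the fixed window, events of different length are compared at the same lag in hours. With the normalized version they are compared at the same fraction of the event, which needs enough samples per phase to interpolate and is the concern raised for daily variables and short phases.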

Data that needs to be retrieved:
- S10
- M10
- Y10
- F30
- F81
- Hpo 

In [8]:
#plot matplotlib figures in the notebook
%matplotlib inline

#print all output in a cell 
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import pandas as pd
import matplotlib.pyplot as plt
from sea_norm import sean

In [2]:
# import Vivian's read_omni function

import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from SatDrag_Py.read_omni_function import read_omni

In [3]:
# get storm times
# read in storm start and end times
storm_txt = 'C:\\Users\\murph\\GitHub\\SatDrag\\data\\storms_drag_epochs_no_overlap.txt'
# columns are whitespace separated; sep=r'\s+' replaces the deprecated delim_whitespace option
storm_time = pd.read_csv(storm_txt, header=None, skiprows=1,
                     sep=r'\s+', names = ['t_st','t_dst','t_en'], parse_dates=[0, 1, 2])
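
Whether the storm phases are long enough for a time-normalized analysis can be checked from the three epoch columns. The sketch below uses a made-up two-row table in place of the storm file, and assumes that `t_st`, `t_dst` and `t_en` mark the storm start, the Dst minimum and the storm end. That reading of the column names is an assumption, as are the `main_h` and `recovery_h` names.

```python
import pandas as pd

# Made-up epochs standing in for storms_drag_epochs_no_overlap.txt
storms = pd.DataFrame({
    "t_st": pd.to_datetime(["2000-01-01 00:00", "2000-02-01 00:00"]),
    "t_dst": pd.to_datetime(["2000-01-01 12:00", "2000-02-02 00:00"]),
    "t_en": pd.to_datetime(["2000-01-03 00:00", "2000-02-05 00:00"]),
})

# Phase durations in hours (assumed: start -> Dst minimum -> end)
storms["main_h"] = (storms["t_dst"] - storms["t_st"]).dt.total_seconds() / 3600
storms["recovery_h"] = (storms["t_en"] - storms["t_dst"]).dt.total_seconds() / 3600

# A phase shorter than 24 h contains at most one sample of a daily variable
short_main = storms["main_h"] < 24
```

Events flagged in `short_main` are the ones for which a daily variable cannot resolve the main phase.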

In [53]:
import glob

local_dir = 'D:\\data\\OMNI'

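def load_omni(res: str = '1h',
         sdate='2010-01-01',
         nd: int = 1,
         edate=None,
         gz=True,
         dl=True,
         drop_flag=True,
         force=False):
    """Build the list of local OMNI file names covering a date range.

    Work in progress: the file names are only printed, not read, and
    gz, dl, drop_flag and force are not used yet.
    """

    # Map the resolution to a data directory and a file cadence
    # (yearly files for 1h and 5m data, monthly files for 1m data).
    # The 5m and 1m directories are not set yet, and any other
    # resolution falls back to the 1 hour settings.
    if res == '1h':
        d_dir = 'YEARLY_1HOUR'
        freq = 'Y'
    elif res == '5m':
        d_dir = ''
        freq = 'Y'
    elif res =='1m':
        d_dir = ''
        freq = 'M'
    else:
        d_dir = 'YEARLY_1HOUR'
        freq = 'Y'

    # One date per file: either from sdate to edate, or nd periods from sdate.
    # Note: pandas >= 2.2 prefers the aliases 'YE' and 'ME' over 'Y' and 'M'.
    if edate is not None:
        d_ser = pd.date_range(start=sdate, end=edate, freq=freq)
    else:
        d_ser = pd.date_range(
            start=sdate, periods=nd, freq=freq)

    #fn = glob.glob(os.path.join(local_dir,d_dir,'[2014-2016]'))

    # Only the yearly omni2_<year>.dat naming is handled so far
    fn = [os.path.join(local_dir,d_dir,f'omni2_{x.year}.dat') for x in d_ser]

    print(fn)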

In [54]:
load_omni(res='1h',
         sdate='2010-01-01', nd=2)

['D:\\data\\OMNI\\YEARLY_1HOUR\\omni2_2010.dat', 'D:\\data\\OMNI\\YEARLY_1HOUR\\omni2_2011.dat']
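
`load_omni` maps a date range onto one file per calendar year. With year-end frequencies in `pd.date_range`, a range that ends part way through a year does not produce a date for that final year. One alternative is to enumerate the calendar years directly. This is a sketch only: `omni_yearly_files` is a hypothetical helper, the `omni_data` directory is a placeholder, and the `YEARLY_1HOUR` directory and `omni2_<year>.dat` pattern are copied from the cell above.

```python
import os

import pandas as pd


def omni_yearly_files(sdate, edate, local_dir, d_dir="YEARLY_1HOUR"):
    """Return one yearly OMNI file name per calendar year touched by the range."""
    first_year = pd.Timestamp(sdate).year
    last_year = pd.Timestamp(edate).year
    return [os.path.join(local_dir, d_dir, f"omni2_{year}.dat")
            for year in range(first_year, last_year + 1)]


# A range that starts and ends mid-year still covers both 2010 and 2011
files = omni_yearly_files("2010-03-15", "2011-06-01", local_dir="omni_data")
```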


In [None]:
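# Local copy of read_omni for testing; it shadows the version imported from SatDrag_Py above
import numpy as np

def read_omni(input_path, output_filepath, remove_bad = True, percentage = 0.25):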
    print("Start OMNI")
    files = os.path.join(input_path, "OMNI*.csv")
    #Combine OMNI files
    files = glob.glob(files)

    #Create dataframe with all OMNI files
    df_sw = pd.concat(map(pd.read_csv, files), ignore_index = True)

    #Set time column to datetime (assign by column name so the dtype is updated)
    time_col = df_sw.columns[0]
    df_sw[time_col] = pd.to_datetime(df_sw[time_col], format='%Y-%m-%dT%H:%M:%S.%fZ')

    #Rename column to "Datetime" for easier access
    df_sw.rename(columns = {'EPOCH_TIME_yyyy-mm-ddThh:mm:ss.sssZ':"Datetime"}, inplace = True)


    #Replace bad (fill) values with NaN
    df_sw = df_sw.replace([99.99, 999.99, 9999.99, 99999.9, 1.00000e+07], np.nan)
    
    #Remove columns where more than `percentage` of the values are bad
    if remove_bad:
        df_sw = df_sw.dropna(thresh=(1-percentage)*len(df_sw), axis=1)

    #Create a csv file with cleaned data
    df_sw.to_csv(output_filepath, index = False)
    
    print("OMNI finished")
    
    return output_filepath
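
The two cleaning steps in `read_omni` (fill values to NaN, then dropping sparsely populated columns) can be checked on a small table. This is a sketch with made-up numbers: the column names `B` and `V` are hypothetical, and the fill values are the ones listed in the function above.

```python
import numpy as np
import pandas as pd

FILL_VALUES = [99.99, 999.99, 9999.99, 99999.9, 1.0e7]

df = pd.DataFrame({
    "B": [5.1, 999.99, 4.8, 6.0],              # one fill value in four rows
    "V": [9999.99, 9999.99, 410.0, 9999.99],   # three fill values in four rows
})

# Step 1: fill values become NaN
clean = df.replace(FILL_VALUES, np.nan)

# Step 2: keep only columns with at least (1 - percentage) valid rows
percentage = 0.25
min_valid = int(np.ceil((1 - percentage) * len(clean)))
kept = clean.dropna(axis=1, thresh=min_valid)
```

Here `B` keeps three of four values and survives, while `V` keeps only one and is dropped. Because the replacement is by value across every column, a genuine measurement equal to one of the fill values (for example 99.99) would also be set to NaN.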