# The thinking behind this code
Here, I'm just looking to combine the JEDI catalog with the CDAW CME catalog. I really only care about the CME speed and mass in CDAW, so I strip my dataframe down to just that. 
By "combine" I really mean that I'd like to see which EVE emission lines (or line combos) are most predictive of CME speed and mass. I'd like to use some machine learning techniques to accomplish this. 
Combining the two catalogs also means matching the CMEs in CDAW to the rows I have in JEDI (or putting in null values where CDAW doesn't have a corresponding CME). 

In [1]:
# Standard modules
import numpy as np
import pandas as pd
from astropy.time import Time
import matplotlib.pyplot as plt
from matplotlib import dates
import seaborn as sns

# Custom modules
from jpm_time_conversions import *
from jpm_logger import JpmLogger
%matplotlib inline
sns.set()
plt.style.use('jpm-dark')

## First things first: I've got to read in the catalogs and do a bit of cleaning
and then take a look at the resultant dataframes

In [2]:
# Read in the JEDI and CDAW catalogs
jedi = pd.read_csv('/Users/jmason86/Dropbox/Research/Postdoc_NASA/Analysis/Coronal Dimming Analysis/JEDI Catalog/jedi_v1.csv', low_memory=False)
cdaw = pd.read_csv('/Users/jmason86/Dropbox/Research/Data/CDAW/Historical CME Data.csv', parse_dates=[['Date', 'Time']])

In [3]:
# Clean the CDAW catalog and strip out the columns I don't care about
cdaw.index = pd.DatetimeIndex(cdaw['Date_Time'])
cdaw.index.rename('Datetime', inplace=True)
cdaw.drop(['Date_Time', 'PA', 'Width', 'KE [erg]'], inplace=True, axis=1)

In [4]:
# More cleaning: restricting the time range of CDAW to that of JEDI
cdaw = cdaw[jedi['GOES Flare Start Time'][0]: jedi['GOES Flare Start Time'][len(jedi) - 1]]
cdaw.head()

Unnamed: 0_level_0,Linear Speed [km/s],Mass [g]
Datetime,Unnamed: 1_level_1,Unnamed: 2_level_1
2010-05-04 18:30:05,425.0,52000000000000.0
2010-05-05 00:30:05,259.0,
2010-05-05 13:31:45,519.0,
2010-05-05 17:06:05,462.0,120000000000000.0
2010-05-05 17:54:05,231.0,70000000000000.0


In [5]:
jedi.head()

Unnamed: 0,Event #,GOES Flare Start Time,GOES Flare Peak Time,GOES Flare Class,Pre-Flare Start Time,Pre-Flare End Time,Flare Interrupt,9.4 Pre-Flare Irradiance [W/m2],13.1 Pre-Flare Irradiance [W/m2],13.3 Pre-Flare Irradiance [W/m2],...,103.2 by 63.0 Fitting Score,103.2 by 71.9 Fitting Score,103.2 by 72.2 Fitting Score,103.2 by 77.0 Fitting Score,103.2 by 79.0 Fitting Score,103.2 by 83.6 Fitting Score,103.2 by 95.0 Fitting Score,103.2 by 97.3 Fitting Score,103.2 by 97.7 Fitting Score,103.2 by 102.6 Fitting Score
0,1.0,2010-05-04 16:15:00.000,2010-05-04 16:29:00.000,C3.6,2010-05-04 08:29:00.000,2010-05-04 16:29:00.000,True,,,,...,,,,,,,,,,
1,2.0,2010-05-05 07:09:00.000,2010-05-05 07:16:00.000,C2.3,2010-05-04 23:16:00.000,2010-05-05 07:16:00.000,True,4e-06,2e-06,,...,,,,,,,,,,
2,3.0,2010-05-05 11:37:00.000,2010-05-05 11:52:00.000,C8.8,2010-05-04 23:16:00.000,2010-05-05 07:16:00.000,True,4e-06,2e-06,,...,,,,,,,,,,
3,4.0,2010-05-05 17:13:00.000,2010-05-05 17:19:00.000,M1.2,2010-05-04 23:16:00.000,2010-05-05 07:16:00.000,False,4e-06,2e-06,,...,,,,,,,,,,
4,5.0,2010-05-07 07:29:00.000,2010-05-07 07:42:00.000,C2.0,2010-05-06 23:42:00.000,2010-05-07 07:42:00.000,True,,,,...,,,,,,,,,,


## Make a merged catalog (DataFrame)
I am using JEDI as the baseline and will fill in what I can from CDAW. This merged set will of course contain columns in addition to what's in JEDI.

In [64]:
jedicdaw = jedi.copy()
jedicdaw['Has CME'] = False
jedicdaw['Matching CME time to time of'] = np.nan
jedicdaw['CME Time'] = np.nan
jedicdaw['CME Speed [km/s]'] = np.nan
jedicdaw['CME Mass [g]'] = np.nan

## Matching up rows in JEDI and CDAW
To match up the rows in the two catalogs, I am using the standard that Alysha Reinard is, but slightly modified for my case: an event is correlated if the CME CDAW start time is between 2 hours before and 4 hours after the dimming max depth time (mean across emission lines) and within 45º of the flare location (converted to position angle). 

First I'll just define a function to convert flare position in lat/lon to position angle so it can be directly compared with the CME CDAW position

In [7]:
def coord2pa(ew_coord, ns_coord):
    """Function to translate ew/ns coordinates into position angle
    Written by Alysha Reinard. 
    
    Inputs:
        ew_coord [float]: The east/west coordinate
        ns_coord [float]: The north/south coordinate
        
    Optional Inputs:
       None

    Outputs:
        pa [float]: The converted position angle
                                                 
    Optional Outputs:
        None

    Example:
        pa = coord2pa(35, -40)    
    """
    x = ew_coord * 1.0
    y = ns_coord * 1.0
    if y != 0:
        pa = np.arctan(-x / y)
    else:
        pa = 3.1415926 / 2.  # limit of arctan(infinity)

    pa = pa * 180.0 / 3.1415926

    if y < 0:
        pa = pa + 180    
    if pa < 0:
        pa = pa + 360
        
    if x == 0 and y == 0:
        pa =- 1

    return pa

Just to make sure the function isn't buggy, I'll enter some random values

In [8]:
coord2pa(35, -40)

221.1859258682658

I need to match up times: which CMEs occur reasonably close in time to each dimming/flare? First I need to figure out what that dimming/flare time should be. 

In [9]:
dimming_times = jedi.filter(regex='Depth Time')
mean_times = []
for i in range(len(dimming_times)):
    tmp = pd.DatetimeIndex(dimming_times.iloc[i])
    tmp = np.nanmean(pd.DatetimeIndex.to_julian_date(tmp[tmp.notnull()]))
    if not np.isnan(tmp):
        mean_times.append(Time(tmp, format='jd').iso)
        jedicdaw['Matching CME time to time of'].iloc[i] = 'Dimming'
    else:
        mean_times.append(jedi['GOES Flare Peak Time'].iloc[i])
        jedicdaw['Matching CME time to time of'].iloc[i] = 'Flare'

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)


In [36]:
jedi_time = Time(mean_times)
cdaw_time = Time(cdaw.index.values.astype(str))

In [None]:
for jedi_row_index in range(len(jedi)):
    ind = np.where((cdaw_time.jd <= (4./24. + jedi_time[jedi_row_index].jd)) & (cdaw_time.jd >= (jedi_time[jedi_row_index].jd - 2./24.)))
    if ind[0].size == 1:
        jedicdaw['Has CME'].iloc[jedi_row_index] = True
        jedicdaw['CME Time'].iloc[jedi_row_index] = cdaw_time[ind[0]].iso[0]
        jedicdaw['CME Speed [km/s]'].iloc[jedi_row_index] = cdaw['Linear Speed [km/s]'].iloc[ind[0]].values[0]
        jedicdaw['CME Mass [g]'].iloc[jedi_row_index] = cdaw['Mass [g]'].iloc[ind[0]].values[0]
    elif ind[0].size > 1:
        # TODO: Figure out how to decide what to do with multiple matching CMEs -- for now just grabbing the first one
        jedicdaw['Has CME'].iloc[jedi_row_index] = ind[0].size
        jedicdaw['CME Time'].iloc[jedi_row_index] = cdaw_time[ind[0]].iso[0]
        jedicdaw['CME Speed [km/s]'].iloc[jedi_row_index] = cdaw['Linear Speed [km/s]'].iloc[ind[0]].values[0]
        jedicdaw['CME Mass [g]'].iloc[jedi_row_index] = cdaw['Mass [g]'].iloc[ind[0]].values[0]

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)


In [102]:
jedicdaw.head()

Unnamed: 0,Event #,GOES Flare Start Time,GOES Flare Peak Time,GOES Flare Class,Pre-Flare Start Time,Pre-Flare End Time,Flare Interrupt,9.4 Pre-Flare Irradiance [W/m2],13.1 Pre-Flare Irradiance [W/m2],13.3 Pre-Flare Irradiance [W/m2],...,103.2 by 83.6 Fitting Score,103.2 by 95.0 Fitting Score,103.2 by 97.3 Fitting Score,103.2 by 97.7 Fitting Score,103.2 by 102.6 Fitting Score,Has CME,Matching CME time to time of,CME Time,CME Speed [km/s],CME Mass [g]
0,1.0,2010-05-04 16:15:00.000,2010-05-04 16:29:00.000,C3.6,2010-05-04 08:29:00.000,2010-05-04 16:29:00.000,True,,,,...,,,,,,False,,,,
1,2.0,2010-05-05 07:09:00.000,2010-05-05 07:16:00.000,C2.3,2010-05-04 23:16:00.000,2010-05-05 07:16:00.000,True,4e-06,2e-06,,...,,,,,,False,,,,
2,3.0,2010-05-05 11:37:00.000,2010-05-05 11:52:00.000,C8.8,2010-05-04 23:16:00.000,2010-05-05 07:16:00.000,True,4e-06,2e-06,,...,,,,,,True,,2010-05-05 13:31:45.000,519.0,
3,4.0,2010-05-05 17:13:00.000,2010-05-05 17:19:00.000,M1.2,2010-05-04 23:16:00.000,2010-05-05 07:16:00.000,False,4e-06,2e-06,,...,,,,,,False,,,,
4,5.0,2010-05-07 07:29:00.000,2010-05-07 07:42:00.000,C2.0,2010-05-06 23:42:00.000,2010-05-07 07:42:00.000,True,,,,...,,,,,,2,,2010-05-07 08:06:05.000,543.0,500000000000000.0
