# MMS Event Explorer

This notebook demonstrates the workflow/application that will be created. Its purpose is to read a SITL report by web scraping and then identify the corresponding MMS observations that pinpoint the bursty bulk flow (BBF) and dipolarization front (DF) events in the report.

In [1]:
# std lib imports
import datetime as dt
# 3rd party imports
import requests
import numpy as np
import pandas as pd
# local pkg imports
import pyspedas # as spds
import pytplot

from bokeh.plotting import figure, output_notebook, show
output_notebook()

## Ground-Truth Data

We know from [this article](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019JA026872) that BBF/DF events were identified during the August 8-9, 2016 time period. This is also the event identified in the HackWeek pitch.

According to the project scientists, Scientist-in-the-Loop (SITL) reports are hand-typed ASCII/text logs of events occurring in the data. The scientists who write the SITL reports use what is referred to as "quick look" data. A sample "quick look" plot of data for the article follows.

![quick look stackplot](https://www.ssl.berkeley.edu/~moka/eva/img/2016/2016-08-08_170014_mms1.png)

The code below hard-codes the URL of the SITL report for August 8-9, 2016, reads the ASCII file, splits each line into the data columns listed in `keys` (much like a CSV file), and loads the result into a pandas DataFrame. This simple snippet should eventually be replaced with a web scraping routine that identifies all available reports and processes each one into a pandas DataFrame. Those DataFrames will then be reduced to JSON objects/dictionaries that identify BBF/DF events and their properties (event duration, magnitude, speed, magnetic field properties, etc.).

In [2]:
rows = []

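# the https URL did not work when this was written, so plain http is used
report = 'http://www.ssl.berkeley.edu/~moka/eva/list/2016/2016-08-08_170014.txt'

response = requests.get(report)
response.raise_for_status()  # fail early if the report could not be downloaded
data = response.text.splitlines()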

keys = ['datetime', 'FOM', 'ID', 'Discussion']

# skip first 15 lines and remove last line which is empty
for line in data[15:-1]:
    rows.append(dict(zip(keys, line.split(',', 3))))

df = pd.DataFrame(rows)

Now we can print out what is contained in this SITL report:

In [3]:
df

| | datetime | FOM | ID | Discussion |
|---|---|---|---|---|
| 0 | 2016-08-09/03:43:34 - 2016-08-09/03:46:24 | 20.0 | jstawarz(EVA) | Density enhancment with +/-100 km/s flow sig... |
| 1 | 2016-08-09/04:33:04 - 2016-08-09/04:34:24 | 70.0 | jstawarz(EVA) | Dipolarization, Flows +/-200 km/s, E and B f... |
| 2 | 2016-08-09/04:34:24 - 2016-08-09/04:35:44 | 70.0 | jstawarz(EVA) | Dipolarization, Flows +/-200 km/s, E and B f... |
| 3 | 2016-08-09/04:35:44 - 2016-08-09/04:37:24 | 70.0 | jstawarz(EVA) | Dipolarization, Flows +/-200 km/s, E and B f... |
| 4 | 2016-08-09/04:37:24 - 2016-08-09/04:39:04 | 70.0 | jstawarz(EVA) | Dipolarization, Flows +/-200 km/s, E and B f... |
| ... | ... | ... | ... | ... |
| 60 | 2016-08-09/10:07:34 - 2016-08-09/10:09:54 | 50.0 | jstawarz(EVA) | Fill. FPI data available. |
| 61 | 2016-08-09/10:09:54 - 2016-08-09/10:12:14 | 50.0 | jstawarz(EVA) | Fill. FPI data available. |
| 62 | 2016-08-09/10:12:14 - 2016-08-09/10:14:44 | 50.0 | jstawarz(EVA) | Fill. FPI data available. |
| 63 | 2016-08-09/10:14:44 - 2016-08-09/10:17:04 | 50.0 | jstawarz(EVA) | Fill. FPI data available. |


Let's keep only the rows whose discussion contains "Dipolarization" or "BBF". This simplified filter should eventually be replaced with ML-based language processing that can account for typographical errors and shorthand forms of DF events.
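As a lightweight step short of full ML, fuzzy token matching can already catch simple misspellings. The sketch below uses `difflib` from the standard library and is not a language model. The keyword list, the `mentions_event` helper, and the `cutoff` value are all assumptions chosen for illustration and would need tuning against real reports.

```python
import difflib
import re

# Hypothetical keyword list; extend it with whatever shorthand the reports use.
KEYWORDS = ("dipolarization", "bbf", "bbfs", "df", "dfs")

def mentions_event(text, keywords=KEYWORDS, cutoff=0.85):
    """Return True if any word in `text` closely matches one of `keywords`."""
    words = re.findall(r"[a-z]+", text.lower())
    for word in words:
        # Short tokens such as "df" or "bbf" must match exactly;
        # fuzzy matching on them would produce too many false positives.
        if len(word) <= 4:
            if word in keywords:
                return True
            continue
        # Longer tokens may differ slightly (typos, alternative spellings).
        if difflib.get_close_matches(word, keywords, n=1, cutoff=cutoff):
            return True
    return False

print(mentions_event("Dipolarisation, Flows +/-200 km/s"))
print(mentions_event("Fill. FPI data available."))
```

Applied to the report, this could look like `df[df.Discussion.apply(mentions_event)]`, assuming `df` is the DataFrame built earlier in the notebook.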

In [4]:
# slice the pandas dataframe per event type/substring
# case insensitive substring supplied and not a regex query
df_DF = df[df.Discussion.str.contains('dipolarization', regex=False, case=False)]
df_BBF = df[df.Discussion.str.contains('bbf', regex=False, case=False)]

In [5]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
df = pd.concat([df_DF, df_BBF]) # there could be duplicate rows here, we just want to retain all entries

In [6]:
# print out only events from SITL reports (column formatting will also be needed; i.e., datetime period)
pd.set_option('display.max_colwidth', None)
df = df.sort_values(by=['datetime'], ascending=[True])
df

| | datetime | FOM | ID | Discussion |
|---|---|---|---|---|
| 1 | 2016-08-09/04:33:04 - 2016-08-09/04:34:24 | 70.0 | jstawarz(EVA) | Dipolarization, Flows +/-200 km/s, E and B field activity, No FPI data. |
| 2 | 2016-08-09/04:34:24 - 2016-08-09/04:35:44 | 70.0 | jstawarz(EVA) | Dipolarization, Flows +/-200 km/s, E and B field activity, No FPI data. |
| 3 | 2016-08-09/04:35:44 - 2016-08-09/04:37:24 | 70.0 | jstawarz(EVA) | Dipolarization, Flows +/-200 km/s, E and B field activity, No FPI data. |
| 4 | 2016-08-09/04:37:24 - 2016-08-09/04:39:04 | 70.0 | jstawarz(EVA) | Dipolarization, Flows +/-200 km/s, E and B field activity, No FPI data. |
| 5 | 2016-08-09/07:49:24 - 2016-08-09/07:51:54 | 49.0 | jstawarz(EVA) | Some flow perturbations and beginning of dipolarization. Less field activity than following section. |
| 6 | 2016-08-09/07:51:54 - 2016-08-09/07:54:24 | 49.0 | jstawarz(EVA) | Some flow perturbations and beginning of dipolarization. Less field activity than following section. |
| 7 | 2016-08-09/07:54:24 - 2016-08-09/07:56:24 | 75.0 | jstawarz(EVA) | Dipolarization, E and B field fluctuations, +400 km/s Vx, No FPI data |
| 8 | 2016-08-09/07:56:24 - 2016-08-09/07:58:34 | 75.0 | jstawarz(EVA) | Dipolarization, E and B field fluctuations, +400 km/s Vx, No FPI data |
| 9 | 2016-08-09/07:58:34 - 2016-08-09/08:00:34 | 75.0 | jstawarz(EVA) | Dipolarization, E and B field fluctuations, +400 km/s Vx, No FPI data |
| 10 | 2016-08-09/08:00:34 - 2016-08-09/08:02:44 | 75.0 | jstawarz(EVA) | Dipolarization, E and B field fluctuations, +400 km/s Vx, No FPI data |
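
The `datetime` column still holds each period as a single string. A minimal sketch of the column formatting and the JSON reduction mentioned earlier might look like the following. The `add_period_columns` helper and the `start`, `end`, and `duration_s` column names are hypothetical, and the `" - "` separator and timestamp format are assumed from the rows shown above.

```python
import json
import pandas as pd

# Two rows copied from the table above; in the notebook, `df` would be used instead.
sample = pd.DataFrame({
    "datetime": ["2016-08-09/04:33:04 - 2016-08-09/04:34:24",
                 "2016-08-09/07:54:24 - 2016-08-09/07:56:24"],
    "FOM": [70.0, 75.0],
    "ID": ["jstawarz(EVA)", "jstawarz(EVA)"],
    "Discussion": [
        "Dipolarization, Flows +/-200 km/s, E and B field activity, No FPI data.",
        "Dipolarization, E and B field fluctuations, +400 km/s Vx, No FPI data",
    ],
})

def add_period_columns(frame, fmt="%Y-%m-%d/%H:%M:%S"):
    """Split the 'start - end' strings into start, end, and duration_s columns."""
    out = frame.copy()
    parts = out["datetime"].str.split(" - ", expand=True)
    out["start"] = pd.to_datetime(parts[0].str.strip(), format=fmt)
    out["end"] = pd.to_datetime(parts[1].str.strip(), format=fmt)
    out["duration_s"] = (out["end"] - out["start"]).dt.total_seconds()
    return out.drop(columns="datetime")

events = add_period_columns(sample)

# date_format="iso" writes timestamps as ISO 8601 strings rather than epoch values
records = json.loads(events.to_json(orient="records", date_format="iso"))
print(json.dumps(records[0], indent=2))
```

Keeping `start` and `end` as real timestamps makes it straightforward to select the matching time window in the observational data later on.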


---

# Observational Data Hunt

Once we have identified a set of events, we can turn to the actual observational data to obtain more physical information about them (such as duration, location, magnitude, and average values during the event). That information can serve as a training set for finding more events of the same type at other observation dates and times.

We are currently looking for the following events from the SITL report (and the [corresponding paper](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019JA026872) referenced above and in the pitch):
- BBF between 9:19 - 9:26 UTC
- DFs between 9:19 - 9:26; 9:33 - 9:38 UTC
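
Once a time series is loaded, the physical information for one of these windows (duration, average, extremes) could be gathered along the following lines. This is only a sketch: `summarize_window` is a hypothetical helper, and the synthetic one-sample-per-minute series stands in for real MMS data.

```python
import datetime as dt
import numpy as np

def summarize_window(times, values, start, end):
    """Summarize `values` for samples with start <= time <= end.

    times  : sequence of naive datetime objects (UTC)
    values : 1-D sequence of the same length
    """
    times = np.asarray(times, dtype="datetime64[us]")
    values = np.asarray(values, dtype=float)
    mask = (times >= np.datetime64(start)) & (times <= np.datetime64(end))
    if not mask.any():
        return None  # no samples fall inside the window
    window = values[mask]
    return {
        "duration_s": (end - start).total_seconds(),
        "mean": float(np.nanmean(window)),
        "max": float(np.nanmax(window)),
        "min": float(np.nanmin(window)),
        "n_samples": int(mask.sum()),
    }

# Synthetic stand-in data: one sample per minute from 09:15 to 09:34 UTC.
times = [dt.datetime(2016, 8, 9, 9, 15) + dt.timedelta(minutes=i) for i in range(20)]
values = np.arange(20.0)

stats = summarize_window(times, values,
                         dt.datetime(2016, 8, 9, 9, 19),
                         dt.datetime(2016, 8, 9, 9, 26))
print(stats)
```

The NaN-aware reductions are used because fill values in real data are often converted to NaN on load.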

To study these events, we need to know which types of observations to use and which instruments they come from. The scientists advised us to use the L2 B-field components and bulk ion moments (speed and density), with the "survey" data rate for the B-field and the "fast" data rate for the moments. Let's grab that data using pyspedas. (Note: We will use probe 1 for this example since it corresponds to the MMS spacecraft used in the article.)

__Coordinate Systems:__

There are 3 coordinate systems that are used here:
- __GSM: Geocentric Solar Magnetospheric__ _(X lies along the Earth-Sun line. The dipole axis is within the XZ-plane. Z is directed roughly towards magnetic north. Y completes the right-handed system (and is perpendicular to the dipole axis).)_
- __GSE: Geocentric Solar Ecliptic__ _(X lies along the Earth-Sun line. Z is perpendicular to the ecliptic plane towards north. Y completes the right-handed system.)_
- __SM: Solar Magnetic__ _(Z lies along the dipole axis. Y is perpendicular to the Earth-Sun line, as in GSM. X completes the right-handed system and does not generally point at the Sun.)_

The article uses SM. However, the data retrieved comes in GSE and GSM, and we do not currently have a coordinate transformation routine available to convert from GSE/GSM to SM. For this project, we will start with GSE only and eventually perform an SM transform (_task out of scope_).
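
For reference, the GSM-to-SM step on its own is a single rotation about the shared Y axis by the dipole tilt angle. The sketch below assumes that tilt angle is already known for each sample. Computing it from a geomagnetic dipole model and the observation time is the harder part and is not attempted here. `gsm_to_sm` and `tilt_rad` are hypothetical names, and the GSE-to-GSM step is not covered.

```python
import numpy as np

def gsm_to_sm(vectors_gsm, tilt_rad):
    """Rotate GSM vectors into SM given the dipole tilt angle.

    vectors_gsm : array of shape (N, 3)
    tilt_rad    : dipole tilt in radians (scalar or shape (N,)),
                  positive when the north magnetic pole leans sunward
    """
    v = np.atleast_2d(np.asarray(vectors_gsm, dtype=float))
    mu = np.broadcast_to(np.asarray(tilt_rad, dtype=float), (v.shape[0],))
    cos_mu, sin_mu = np.cos(mu), np.sin(mu)
    out = np.empty_like(v)
    out[:, 0] = v[:, 0] * cos_mu - v[:, 2] * sin_mu  # X_SM
    out[:, 1] = v[:, 1]                              # Y is shared by GSM and SM
    out[:, 2] = v[:, 0] * sin_mu + v[:, 2] * cos_mu  # Z_SM
    return out

# Sanity check: the dipole axis, which lies in the GSM XZ-plane,
# should map onto the SM Z axis.
mu = 0.3
dipole_gsm = np.array([[np.sin(mu), 0.0, np.cos(mu)]])
print(gsm_to_sm(dipole_gsm, mu))
```

The check at the end is a quick way to confirm the sign convention of the tilt angle before applying the rotation to real field data.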

In [7]:
trange = ['2016-08-09', '2016-08-10']

# update local data dir if using no_update flag
pyspedas.mms.mms_config.CONFIG['local_data_dir'] = '../pydata'

# load B-field data from FGM (Fluxgate Magnetometer)
# use no_update=True to load local data
fgm_vars = pyspedas.mms.fgm(
    trange       =  trange,
    time_clip    =   False, # could truncate residuals at trange ends
    data_rate    =  'srvy', # survey frequency
    level        =    'l2', # ensure L2 products
    probe        =     '1', # could be ['1','2','3','4'] for all spacecraft
    no_update    =    True, # load local data
)

25-Aug-20 11:01:54: Searching for local files...


Loading: ../pydata/mms1/fgm/srvy/l2/2016/08/mms1_fgm_srvy_l2_20160809_v4.47.0.cdf
The lengths of x and y do not match!
mms1_fgm_rdeltahalf_srvy_l2 is currently not in pytplot.
Loaded variables:
mms1_fgm_b_gse_srvy_l2
mms1_fgm_b_gsm_srvy_l2
mms1_fgm_b_dmpa_srvy_l2
mms1_fgm_b_bcs_srvy_l2
mms1_fgm_flag_srvy_l2
mms1_fgm_r_gse_srvy_l2
mms1_fgm_r_gsm_srvy_l2
mms1_fgm_hirange_srvy_l2
mms1_fgm_bdeltahalf_srvy_l2
mms1_fgm_stemp_srvy_l2
mms1_fgm_etemp_srvy_l2
mms1_fgm_mode_srvy_l2
mms1_fgm_rdeltahalf_srvy_l2


In [None]:
# If you'd like to visualize this data, look at this variable:
#    mms1 - MMS probe 1
#    fgm  - FGM instrument
#    b    - magnetic field
#    gse  - GSE coords
#    srvy - Survey data rate
#    l2   - Level 2 data

#pytplot.tplot('mms1_fgm_b_gse_srvy_l2')

At this point, we can visualize inline some of this data. Let's go ahead and do that.

In [None]:
# use the GSE variable so the plot matches the coordinate system discussed in the text
alldata = pytplot.get_data('mms1_fgm_b_gse_srvy_l2')

# load data into variables for each magnetic field component and the x-axis (t)
t = alldata[0]
B_x = alldata[1][:,0]
B_y = alldata[1][:,1]
B_z = alldata[1][:,2]
Bt = alldata[1][:,3]

# convert to datetime axis
t_utc = np.array([dt.datetime.utcfromtimestamp(i) for i in t])

#----------------------------------------------------------------------
# plotting below
#----------------------------------------------------------------------

# create a new plot
p = figure(
    x_axis_type="datetime",
    x_range=(dt.datetime(2016, 8, 9, 9),
             dt.datetime(2016, 8, 9, 10)),
    y_range=(0,45),
    width=900
)

# add some renderers
p.line(t_utc, B_y, line_color="gold", line_width=3, legend_label="B_y")
p.line(t_utc, B_z, line_color="green", line_width=3, legend_label="B_z")
p.legend.location = 'top_left'
show(p)

The plot was generated with the code above, but to limit the notebook's storage size and resource use, a screenshot is included here instead of the live Bokeh output.

![plot one](plot_fgm.png)

If you compare this with the article, you will see that the behavior of the magnetic field agrees, but the magnitude (initially at 9:00 UTC) does not. This is due to a difference in coordinate systems: the plot above is in GSE, whereas the article uses SM (we verified this). Next, let's do the same (obtain the data, read it in, and visualize it) for the ion moments.

In [2]:
trange = ['2016-08-09', '2016-08-10']

# update local data dir if using no_update flag
pyspedas.mms.mms_config.CONFIG['local_data_dir'] = '../pydata'

# Speed & Density data (GSE and GSM coords) for FPI instrument
# FPI: Fast Plasma Investigation
fpi_vars = pyspedas.mms.fpi(
    trange       =     trange,
    time_clip    =      False, # could truncate residuals at trange ends
    data_rate    =     'fast', # fast frequency
    level        =       'l2', # ensure L2 products
    probe        =        '1', # MMS1 only; could be ['1','2','3','4'] for all spacecraft
    no_update    =       True, # load local data
    datatype     = 'dis-moms', # CDF filename conventions (dis  -> Dual Ion Spectrometer)
                               #                          (moms -> moments)
)

25-Aug-20 11:04:58: Searching for local files...


Loading: ../pydata/mms1/fpi/fast/l2/dis-moms/2016/08/mms1_fpi_fast_l2_dis-moms_20160809100000_v3.3.0.cdf
Loading: ../pydata/mms1/fpi/fast/l2/dis-moms/2016/08/mms1_fpi_fast_l2_dis-moms_20160809080000_v3.3.0.cdf
Cannot find x axis.
No attribute named DEPEND_TIME or DEPEND_0 in                           variable mms1_dis_compressionloss_fast
Cannot find x axis.
No attribute named DEPEND_TIME or DEPEND_0 in                           variable mms1_dis_compressionloss_fast
Loaded variables:
mms1_dis_errorflags_fast
mms1_dis_startdelphi_count_fast
mms1_dis_startdelphi_angle_fast
mms1_dis_energyspectr_px_fast
mms1_dis_energyspectr_mx_fast
mms1_dis_energyspectr_py_fast
mms1_dis_energyspectr_my_fast
mms1_dis_energyspectr_pz_fast
mms1_dis_energyspectr_mz_fast
mms1_dis_energyspectr_omni_fast
mms1_dis_spectr_bg_fast
mms1_dis_numberdensity_bg_fast
mms1_dis_numberdensity_fast
mms1_dis_densityextrapolation_low_fast
mms1_dis_densityextrapolation_high_fast
mms1_dis_bulkv_dbcs_fast
mms1_dis_bulkv_spinton

In [None]:
pytplot.tplot_names()

In [None]:
speed = pytplot.get_data('mms1_dis_bulkv_gse_fast')
#density = pytplot.get_data('mms1_dis_numberdensity_fast')

In [None]:
speed

In [None]:
# load the ion bulk velocity components and the x-axis (t) from `speed`
t = speed[0]
V_x = speed[1][:,0]
V_y = speed[1][:,1]
V_z = speed[1][:,2]

# convert to datetime axis
t_utc = np.array([dt.datetime.utcfromtimestamp(i) for i in t])

#----------------------------------------------------------------------
# plotting below
#----------------------------------------------------------------------

# create a new plot (bokeh was already imported and configured in the first cell)
p = figure(
    x_axis_type="datetime",
    x_range=(dt.datetime(2016, 8, 9, 9),
             dt.datetime(2016, 8, 9, 10)),
    width=900
)

# add some renderers
p.line(t_utc, V_x, line_color="red", line_width=3, legend_label="V_x")
p.line(t_utc, V_y, line_color="gold", line_width=3, legend_label="V_y")
p.line(t_utc, V_z, line_color="green", line_width=3, legend_label="V_z")
p.legend.location = 'top_left'
show(p)
pytplot.tplot(['mms1_dis_bulkv_gse_fast', 'mms1_dis_numberdensity_fast'])