# **Near Real-time Plasma Flow Analysis using DSCOVR Data**

DSCOVR continues to operate beyond its expected lifespan and occasionally experiences hardware faults, some of which may result from space weather events. In this notebook, we analyze raw data from NASA's DSCOVR mission to forecast geomagnetic storms while accounting for these faults.

For this project, we draw upon prior research, specifically:

- *i.* [Prominence of the training data preparation in geomagnetic storm prediction using deep neural networks](https://www.nature.com/articles/s41598-022-11721-8)
- *ii.* [Prediction of Geomagnetic Storm Using Neural Networks: Comparison of the Efficiency of the Satellite and Ground-Based Input Parameters](https://iopscience.iop.org/article/10.1088/1742-6596/134/1/012041/pdf)

Our objective is to run a prediction model in a real-time or near-real-time environment, so that incoming DSCOVR data can be processed and turned into forecasts as it arrives.

## **Understanding Solar Wind**

Solar wind frequently interacts with Earth's magnetosphere, leading to geomagnetic storms that can disrupt various technologies, including satellites and electrical power grids. Consequently, the National Oceanic and Atmospheric Administration (NOAA) operates a space weather station known as the Deep Space Climate Observatory (DSCOVR). DSCOVR employs various sensors to facilitate the prediction of these storms by gathering data on the speed, temperature, and density of incoming solar plasma.

DSCOVR orbits the first Sun-Earth Lagrange point (L1), about 1.5 million kilometers from Earth on the line between the Earth and the Sun. From this position it records incoming solar plasma before the plasma reaches Earth's vicinity. NOAA uses this data to simulate the state of Earth's magnetic field and atmosphere, potentially providing early warnings of geomagnetic storms.
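To give a feel for how much warning the L1 position buys, the sketch below divides the 1.5 million km distance quoted above by a solar wind speed. It assumes the plasma travels in a straight line at constant speed, which is a simplification; the function name and the 400 and 800 km/s example speeds are illustrative choices, not values taken from the DSCOVR data.

```python
L1_DISTANCE_KM = 1.5e6  # approximate Earth-L1 distance quoted in the text


def lead_time_minutes(speed_km_s, distance_km=L1_DISTANCE_KM):
    """Minutes for plasma moving at a constant speed to cover the L1-Earth distance.

    Assumption: straight-line travel at constant speed (a rough estimate only).
    """
    if speed_km_s <= 0:
        raise ValueError("speed_km_s must be positive")
    return distance_km / speed_km_s / 60.0


# Illustrative speeds only (hypothetical slow and fast solar wind values)
for speed in (400.0, 800.0):
    print(f"{speed:.0f} km/s -> about {lead_time_minutes(speed):.0f} minutes of warning")
```

Under these assumptions the warning time is roughly an hour for slow wind and about half an hour for fast wind, which is why a near-real-time pipeline matters.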

Geomagnetic activity is commonly measured with the $D_{st}$ index, which we may be able to predict from solar wind parameters and magnetic field data. Previous studies [[1]](https://www.nature.com/articles/s41598-022-11721-8) suggest using interplanetary parameters as inputs, such as the interplanetary magnetic field ($IMF$) and solar wind ($SW$) measurements. Some studies specifically use the IMF $B_z$ component together with the $SW$ electric field, temperature, speed, and density.
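Of the inputs listed above, the $SW$ electric field is usually derived rather than measured directly. A common approximation, assuming a purely radial flow so that $E_y \approx -V B_z$, is sketched below. The function name and sample values are hypothetical, and the notebook does not state which coordinate system or formula the cited studies use, so treat this as one plausible way to build the feature.

```python
import numpy as np


def sw_electric_field_mv_m(speed_km_s, bz_nt):
    """Approximate dawn-dusk solar wind electric field E_y = -V * B_z in mV/m.

    Assumptions: purely radial flow, speed given in km/s, B_z given in nT.
    """
    speed = np.asarray(speed_km_s, dtype=float)
    bz = np.asarray(bz_nt, dtype=float)
    # Unit conversion: km/s * nT = 1e3 m/s * 1e-9 T = 1e-6 V/m = 1e-3 mV/m
    return -speed * bz * 1e-3


# Hypothetical sample values: northward, southward, and strongly southward B_z
speed = [400.0, 400.0, 600.0]
bz = [5.0, -5.0, -10.0]
print(sw_electric_field_mv_m(speed, bz))
```

With this sign convention a southward field (negative $B_z$) yields a positive $E_y$, the configuration generally associated with stronger geomagnetic activity.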

Whichever inputs are chosen, preparing the training and validation data is the most crucial step, as it plays a significant role in ML model performance.

### **References**
- [1] [Prominence of the training data preparation in geomagnetic storm prediction using deep neural networks](https://www.nature.com/articles/s41598-022-11721-8)
- [DSCOVR: Deep Space Climate Observatory](https://www.nesdis.noaa.gov/current-satellite-missions/currently-flying/dscovr-deep-space-climate-observatory)
- [DSCOVR (Deep Space Climate Observatory) - eoPortal](https://www.eoportal.org/satellite-missions/dscovr)
- [Deep Space Climate Observatory (DSCOVR)](https://www.nist.gov/measuring-cosmos/deep-space-climate-observatory-dscovr)

## **Data Resources**

For this project, we use raw DSCOVR data directly as input. Because the goal is a near-real-time system, we keep the data current with a function that downloads the datasets from the [Experimental Data Repository](https://www.spaceappschallenge.org/develop-the-oracle-of-dscovr-experimental-data-repository/).

The datasets start in 2016 and continue to be updated to the present day. They are stored in the `dataset` folder. Each `.csv` file contains 53 columns. Column 0 holds the time in UTC in the format `YYYY-MM-DD hh:mm:ss`. Columns 1-3 hold the magnetic field components, in nanoteslas (nT), measured at the time given in column 0. The remaining columns contain dimensional measurements from the Faraday cup plasma detector.
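The column layout described above can be turned into a small loader. The sketch below assumes the files have no header row and that empty fields mark missing readings; the column names (`time`, `bx`, `by`, `bz`, `fc_0` ... `fc_48`) are labels invented here for illustration, and the two sample rows are synthetic.

```python
import io

import pandas as pd

N_COLUMNS = 53
# Hypothetical labels: column 0 is time, 1-3 the magnetic field, the rest Faraday cup data
COLUMN_NAMES = ["time", "bx", "by", "bz"] + [f"fc_{i}" for i in range(N_COLUMNS - 4)]


def load_dscovr_csv(source):
    """Read one DSCOVR .csv file (path or file-like object) into a time-indexed DataFrame.

    Assumption: the file has no header row.
    """
    return pd.read_csv(
        source,
        header=None,
        names=COLUMN_NAMES,
        parse_dates=["time"],
        index_col="time",
    )


# Synthetic two-row sample; the second row has empty Faraday cup fields
rows = [
    "2016-01-01 00:00:00,1.0,-2.0,3.0," + ",".join(["0.5"] * 49),
    "2016-01-01 00:01:00,1.1,-2.1,3.1," + ",".join([""] * 49),
]
df = load_dscovr_csv(io.StringIO("\n".join(rows)))
print(df[["bx", "by", "bz"]])
print("Missing Faraday cup values:", int(df["fc_0"].isna().sum()))
```

Indexing by time keeps later resampling and windowing steps straightforward, and empty fields surface as `NaN` so that gaps caused by hardware faults stay visible rather than being silently dropped.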

## **Data Retrieval and Preprocessing**

For this prototype, we use the experimental data provided during the competition. In a real-time production environment, the download function below could be extended to acquire data continuously and store it in a MongoDB database for efficient retrieval and processing.

This project was developed in a virtual environment with the required libraries installed. To reproduce the setup, navigate to the project folder and create the virtual environment:

```
python -m venv ./venv
```

Then activate the virtual environment and install the required libraries:

```
source ./venv/bin/activate

pip install -r requirements.txt
```

In [7]:
# Import the libraries required for data retrieval and preprocessing

import numpy as np
import pandas as pd
import matplotlib as mtp

In [9]:
# Check library versions

print(f'Pandas: {pd.__version__}')
print(f'Numpy: {np.__version__}')
print(f'matplotlib: {mtp.__version__}')

Pandas: 2.1.1
Numpy: 1.26.0
matplotlib: 3.8.0


In [19]:
import datetime as dt
import os
import requests
from tqdm import tqdm

root_url = 'https://opensource.gsfc.nasa.gov/spaceappschallenge/'
start_year = 2016  # The experimental dataset starts in 2016
today = dt.date.today()
current_year = today.year

dataset_folder = '../dataset'
os.makedirs(dataset_folder, exist_ok=True)

def fetch_experimental_dscovr_data():
    """Download the yearly experimental DSCOVR archives (.zip) into the dataset folder."""
    for year in range(start_year, current_year + 1):
        url = root_url + "dsc_fc_summed_spectra_{}_v01.zip".format(year)
        filename = os.path.join(dataset_folder, "dscovr_data_{}.zip".format(year))

        # Check if the file already exists, if not, download it
        if not os.path.exists(filename):
            print("Downloading data for year {}...".format(year))
            response = requests.get(url, stream=True, timeout=60)
            if response.status_code != 200:
                # Do not save an HTTP error page as if it were a .zip archive
                print("No data available for year {} (HTTP {}).".format(year, response.status_code))
                continue
            total_size = int(response.headers.get('content-length', 0))
            partial = filename + '.part'
            # Display a progress bar when downloading experimental DSCOVR dataset
            with open(partial, 'wb') as file, tqdm(
                    desc=filename,
                    total=total_size,
                    unit='B',
                    unit_scale=True,
                    unit_divisor=1024,
            ) as bar:
                for data in response.iter_content(chunk_size=1024):
                    file.write(data)
                    bar.update(len(data))
            # Rename only after a complete download, so an interrupted run is retried
            os.replace(partial, filename)
            print("Download complete for year {}.".format(year))
        else:
            print("Data for year {} already exists.".format(year))

fetch_experimental_dscovr_data()

    

Downloading data for year 2016...
../dataset/dscovr_data_2016.zip: 100%|██████████| 20.5M/20.5M [00:08<00:00, 2.47MB/s]
Download complete for year 2016.
Downloading data for year 2017...
../dataset/dscovr_data_2017.zip: 100%|██████████| 38.4M/38.4M [00:12<00:00, 3.31MB/s]
Download complete for year 2017.
Downloading data for year 2018...
../dataset/dscovr_data_2018.zip: 100%|██████████| 38.3M/38.3M [00:12<00:00, 3.20MB/s]
Download complete for year 2018.
Downloading data for year 2019...
../dataset/dscovr_data_2019.zip: 100%|██████████| 18.2M/18.2M [00:05<00:00, 3.66MB/s]
Download complete for year 2019.
Downloading data for year 2020...
../dataset/dscovr_data_2020.zip: 100%|██████████| 35.2M/35.2M [01:12<00:00, 506kB/s]
Download complete for year 2020.
Downloading data for year 2021...
../dataset/dscovr_data_2021.zip: 100%|██████████| 49.8M/49.8M [00:19<00:00, 2.61MB/s]
Download complete for year 2021.
Downloading data for year 2022...
../dataset/dscovr_data_2022.zip: 100%|██████████| 55.2M/55.2M [00:18<00:00, 3.17MB/s]
Download complete for year 2022.
Downloading data for year 2023...
../dataset/dscovr_data_2023.zip: 100%|██████████| 17.2M/17.2M [00:04<00:00, 4.34MB/s]
Download complete for year 2023.
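The download step saves yearly `.zip` archives, while the data description refers to `.csv` files, so an extraction step is needed in between. The sketch below is one possible way to do it with the standard library. It assumes each archive contains one or more `.csv` members, which the notebook does not confirm, and the helper name `extract_csv_files` is hypothetical.

```python
import shutil
import zipfile
from pathlib import Path


def extract_csv_files(dataset_folder):
    """Extract every .csv member of the downloaded archives into dataset_folder.

    Assumption: archives are named dscovr_data_<year>.zip and contain .csv files.
    Returns the list of extracted (or already present) .csv paths.
    """
    folder = Path(dataset_folder)
    extracted = []
    for archive in sorted(folder.glob("dscovr_data_*.zip")):
        # A truncated or failed download is not a valid archive; skip it
        if not zipfile.is_zipfile(archive):
            print(f"Skipping {archive.name}: not a valid zip archive")
            continue
        with zipfile.ZipFile(archive) as zf:
            for member in zf.namelist():
                if not member.lower().endswith(".csv"):
                    continue
                # Flatten any directory structure inside the archive
                target = folder / Path(member).name
                if not target.exists():
                    with zf.open(member) as src, open(target, "wb") as dst:
                        shutil.copyfileobj(src, dst)
                extracted.append(target)
    return extracted
```

Calling `extract_csv_files('../dataset')` after the download cell would leave the `.csv` files next to the archives. Files that already exist are left untouched, so the step can be rerun safely whenever new data is fetched.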



