### Data extraction
This notebook gives an overview of the data extraction task. The Python scripts that obtain the data are organized into classes and their methods, and each class manages data from one dedicated instrument. Extracted data are saved into destination folders in **data_processed**.

The scripts use two essential Python libraries: **HAPI** and **Hvpy**.

#### HAPI
The Heliophysics Data Application Programmer’s Interface (HAPI) is a standard interface for serving space weather time series data. HAPI provides easy access to data from various sources through an application programming interface (API). Besides time series data, HAPI also provides links to images served by the Helioviewer API, together with their timestamps. The links point to grey-scale images, which we do not use; we use only the timestamps.

#### Hvpy
The Hvpy Python package provides a high-level interface to the Helioviewer API. It is very useful because it offers an easy way to apply a color layer on top of the desired image; by default, the Helioviewer API returns a grey-scale image. The color layer represents the color associated with the source instrument. Hvpy also lets you set the scale of the desired image in arcseconds per pixel.

In [1]:
import os

In [2]:
from extractDataSrc.Mdi import Mdi
from extractDataSrc.Lasco import Lasco
from extractDataSrc.InSitu import InSitu
from extractDataSrc.Eit195 import Eit195



In [3]:
# check if destination folders for images are present
destination_folders = ["data_processed/eit", "data_processed/mdi/mag", "data_processed/mdi/con", "data_processed/lasco/c2", "data_processed/lasco/c3", "data_processed/in_situ"]

for folder in destination_folders:
    os.makedirs(folder, exist_ok=True)

In [4]:
# create helper folders for the intermediate csv and fits files
helper_folders = ["data_csv", "data_fits/eit"]

for folder in helper_folders:
    os.makedirs(folder, exist_ok=True)

In [3]:
start_datetime = "2001-03-21 00:00:00"
stop_datetime = "2001-04-10 23:59:59"

### Extracting SOHO/MDI magnetograms and continuum
SOHO/MDI images are obtained with the same approach as the SOHO/LASCO images described in this notebook. The user chooses a product, either a magnetogram or a continuum intensity image, and provides a time range. The script uses HAPI to download the timestamps of all images available at Helioviewer for that time range. Each timestamp is then passed to hvpy’s createScreenshot function to obtain the image.

We observed that magnetogram images may be corrupted: occasionally the Sun’s disk is not fully displayed in the image. Using the moments function from the cv2 Python package, we check each image for symmetry. Asymmetrical images are removed.
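
The idea behind a moment-based symmetry check can be sketched as follows. This is not the notebook's implementation: it computes the raw image moments in plain Python instead of calling cv2, and the helper names, the `tolerance` value, and the synthetic test disks are assumptions made only for illustration. The sketch flags an image as asymmetric when the intensity centroid lies too far from the image centre.

```python
def centroid_offset(image):
    """Distance between intensity centroid and image centre, as a fraction of image size.

    `image` is a 2D list of pixel intensities. Returns None for an all-zero image.
    """
    height = len(image)
    width = len(image[0])
    m00 = m10 = m01 = 0.0  # raw moments: total intensity, x-weighted, y-weighted
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            m00 += value
            m10 += x * value
            m01 += y * value
    if m00 == 0:
        return None
    cx, cy = m10 / m00, m01 / m00
    dx = (cx - (width - 1) / 2) / width
    dy = (cy - (height - 1) / 2) / height
    return (dx * dx + dy * dy) ** 0.5


def is_symmetric(image, tolerance=0.02):
    """Hypothetical criterion: the centroid must lie within `tolerance` of the centre."""
    offset = centroid_offset(image)
    return offset is not None and offset <= tolerance


def make_disk(size, radius, cut_from_row=None):
    """Synthetic bright disk; rows from `cut_from_row` onward are blanked out."""
    centre = (size - 1) / 2
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = (x - centre) ** 2 + (y - centre) ** 2 <= radius ** 2
            if cut_from_row is not None and y >= cut_from_row:
                inside = False
            row.append(255 if inside else 0)
        image.append(row)
    return image


full_disk = make_disk(64, 24)                       # complete solar disk
clipped_disk = make_disk(64, 24, cut_from_row=40)   # lower part of the disk missing

print(is_symmetric(full_disk), is_symmetric(clipped_disk))
```

With cv2, the same three raw moments are available as the `m00`, `m10` and `m01` entries of the dictionary returned by `cv2.moments`.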

In [None]:
mdi = Mdi(start_datetime, stop_datetime)
mdi.extract_data("mag")

In [None]:
mdi.extract_data("con")

### Extracting SOHO/LASCO C2 and C3 images

For images captured by the SOHO/LASCO instrument, the user selects a specific coronagraph, either C2 or C3, and provides a time range. The script uses HAPI to download the timestamps of all images available at Helioviewer for that time range. hvpy’s createScreenshot function is then used to obtain a color image for every timestamp provided by HAPI. We observed that hvpy occasionally throws an exception caused by an HTTP 504 server error. In this case, the script waits 60 s before attempting to continue. If the wait does not help, the script repeats this behavior up to four times. This implementation seems to be sufficient, as we have not seen any further complications.
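
The wait-and-retry behavior described above can be sketched with a small helper. This is an illustrative sketch, not the code of the `Lasco` class: the function name `call_with_retries`, the choice to catch any exception, and the injectable `sleep` argument are assumptions. The stand-in `flaky_download` function replaces the real download call so that the example runs without network access.

```python
import time


def call_with_retries(action, retries=4, wait_seconds=60, sleep=time.sleep):
    """Call action(); on failure wait and try again, up to `retries` extra times."""
    for attempt in range(retries + 1):
        try:
            return action()
        except Exception:
            # Assumption: any exception is treated as a transient server error.
            if attempt == retries:
                raise  # out of retries, surface the last error
            sleep(wait_seconds)


# Stand-in for the download call: fails twice with a 504-like error, then succeeds.
calls = {"count": 0}
waits = []


def flaky_download():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RuntimeError("HTTP 504")
    return "image.png"


# Record the requested waits instead of actually sleeping for 60 s.
result = call_with_retries(flaky_download, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as an argument keeps the helper testable; in real use the default `time.sleep` applies.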

In [8]:
lasco = Lasco(start_datetime, stop_datetime)

In [9]:
lasco.extract_data("c2")

1311 / 1311
success


In [10]:
lasco.extract_data("c3")

859 / 859
success


### Extracting SOHO/V_p, N_p and WIND/B_z time-series data

Time series data of in-situ measurements are collected via HAPI. The user can select which time series to collect: V_p, N_p or B_z. Data are collected from HAPI’s CDAWeb server. For proton velocity and proton number density, we have chosen the dataset “SOHO_CELIAS-PM_30S” with the parameters “V_p” and “N_p”. These in-situ measurements are provided at 30 s intervals. To obtain magnetic field time series data, the dataset “WI_H0_MFI@0” is selected with the parameter “BGSM”. These measurements are provided at 60 s intervals. The user can also choose to create a plot in png format for each in-situ measurement.
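
To make the request concrete, the sketch below builds a HAPI data URL for the proton velocity dataset using only the standard library. It is not taken from the `InSitu` class. The server URL and the helper names are assumptions, and the query parameter names (`id`, `time.min`, `time.max`) follow the HAPI 2.x convention; HAPI 3 servers name them `dataset`, `start` and `stop`.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed address of the CDAWeb HAPI server.
CDAWEB_HAPI = "https://cdaweb.gsfc.nasa.gov/hapi"


def to_hapi_time(text):
    """Convert 'YYYY-MM-DD HH:MM:SS' (as used in this notebook) to ISO 8601 with a Z suffix."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_data_url(server, dataset, parameter, start, stop):
    """Build the URL of a HAPI 2.x 'data' request for one parameter of one dataset."""
    query = urlencode({
        "id": dataset,
        "parameters": parameter,
        "time.min": to_hapi_time(start),
        "time.max": to_hapi_time(stop),
    })
    return f"{server}/data?{query}"


url = build_data_url(
    CDAWEB_HAPI,
    "SOHO_CELIAS-PM_30S",
    "V_p",
    "2001-03-21 00:00:00",
    "2001-04-10 23:59:59",
)
print(url)
```

In practice a HAPI client library handles this request and parses the response; the sketch only shows which pieces of information the request carries.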

In [8]:
insitu = InSitu(start_datetime, stop_datetime)
insitu.extract_data("V_p", make_png=True)
insitu.extract_data("N_p", make_png=True)
insitu.extract_data("B_z", make_png=True)

### Extracting SOHO/EIT 195 images via VSO links to images

We found that some SOHO/EIT 195 images provided by Helioviewer do not have the desired resolution. We therefore used the Virtual Solar Observatory (VSO) to obtain these images. Links to the EIT 195 Å fits files for a specific time range are collected through the [Virtual Solar Observatory](https://sdac.virtualsolar.org/cgi/search) website (note: a more detailed tutorial for this step is in [this pdf](vso_links_manual.pdf)). The resulting csv file with links should be placed in the “data_csv” folder. The script goes through this csv file, extracts the links to fits files, and downloads the files into the “data_fits” folder. It then converts the fits files into png images and saves them into the target subfolder in “data_processed”.

EIT images may be corrupted. Occasionally a segment of pixels is missing from the image, possibly because of an error in processing data from SOHO. In such images the color intensity is much higher than in uncorrupted images, an effect that arises during the data acquisition stage. The brightness of every saved png image is therefore checked, and corrupted images are excluded. After this quality check, all fits files are removed.
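
One way such a brightness check could work is sketched below. The source does not state how brightness is measured or which threshold is used, so the mean-intensity measure, the comparison against the batch median, the `factor` value, and the tiny sample images are all assumptions made for illustration.

```python
from statistics import median


def mean_brightness(image):
    """Average pixel intensity of a 2D list of grey-scale values."""
    pixels = [value for row in image for value in row]
    return sum(pixels) / len(pixels)


def find_overbright(images, factor=1.5):
    """Return names of images whose mean brightness exceeds `factor` times the batch median.

    `images` maps a file name to a 2D list of pixel intensities.
    """
    brightness = {name: mean_brightness(img) for name, img in images.items()}
    typical = median(brightness.values())
    return sorted(name for name, value in brightness.items() if value > factor * typical)


# Tiny synthetic batch: two ordinary images and one that is much brighter.
sample_images = {
    "eit_a.png": [[40, 40], [40, 40]],
    "eit_b.png": [[42, 42], [42, 42]],
    "eit_c.png": [[200, 200], [200, 200]],
}

flagged = find_overbright(sample_images)
print(flagged)
```

Comparing against the batch median rather than a fixed value keeps the check independent of the overall exposure level, at the cost of needing several images per batch.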

In [7]:
eit195 = Eit195(start_datetime, stop_datetime)
eit195.extract_data("data_csv/eit_event1.csv", quality_check=True)

transforming and asving image... 2298
