<div>
<img src="./images/sunpy_logo.png" width="500" align="left"/>
</div>

# 1. Searching and downloading data with sunpy



This notebook introduces how you can search for and download data with sunpy. We will begin with an introduction to `astropy.units` (which is used throughout the sunpy ecosystem), and then look at how to use `Fido` and build queries for data. In particular, this notebook covers the following:

1. Introduction to `astropy.units`
2. Overview of `Fido` 
3. Constructing a data search query and inspecting it
4. More complex queries and the HEK
5. Extending Fido - the SOAR archive

In [None]:
import astropy.units as u

from sunpy.net import Fido, attrs as a
from sunpy.time import parse_time

# sunpy_soar is an affiliated package of SunPy 
# and registers the SOAR to be searched by Fido
import sunpy_soar

import numpy as np

## 1.1 Astropy Units - a quick overview
[`astropy.units`](https://docs.astropy.org/en/stable/units/) provides a way to handle numbers and arrays that have an associated physical unit (e.g. km, seconds, Kelvin). Throughout sunpy, any physical input or output is an [`astropy.units.Quantity`](https://docs.astropy.org/en/stable/units/quantity.html#quantity). Let's look at how we can create quantities and convert between units. Above we have imported `astropy.units` as `u`.

In [None]:
distance_in_km = 10*u.km

In [None]:
distance_in_km

In [None]:
distance_in_km.unit

In [None]:
distance_in_km.value

We can convert between equivalent units:

In [None]:
distance_in_km.cgs

In [None]:
distance_in_km.to(u.parsec)

In [None]:
distance_in_km.to(u.Mm)

However, you can only convert between units that are physically compatible. For example, converting a distance to a time fails:

In [None]:
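# Uncommenting the next line raises astropy.units.UnitConversionError,
# because a length cannot be converted to a time.
# distance_in_km.to(u.second)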

In [None]:
time_in_sec = 60*u.s

In [None]:
(distance_in_km/time_in_sec).unit

In [None]:
(10*u.Angstrom).to(u.nm)

## 1.2 Overview of sunpy's Fido Unified Downloader
Fido is sunpy's interface for searching and downloading solar physics data.
It offers a unified interface for searching and fetching data, irrespective of the underlying client or web service from which the data is obtained.
You can also search and access multiple instruments and all available data providers in a single query.
It supplies a single, easy, consistent and extensible way to get most forms of solar physics data the community needs.

For more information about Fido and how to use it check out the documentation on our website: https://docs.sunpy.org/en/stable/tutorial/acquiring_data/index.html

Fido offers access to data available through:

* VSO (Virtual Solar Observatory)
* JSOC (through drms)
* Individual data providers from web accessible sources (http, ftp, etc)
* CDAWeb
* HEK
* HELIO
  
As listed above, Fido provides access to many sources of data through different clients. These clients can be defined inside sunpy or in other packages (e.g. DKIST data can be accessed using Fido through [DKIST User Tools](https://docs.dkist.nso.edu/projects/python-tools/en/latest/tutorial/2_search_and_asdf_download.html)).


#### Importantly, Solar Orbiter data can be accessed through the client defined in the `sunpy_soar` affiliated package.
The SOAR client is registered when `sunpy_soar` is imported, as we did above. Without importing it, the client won't be registered within Fido.

Let's first inspect the clients that are available through Fido:

In [None]:
Fido

### Using attributes to search for data with Fido

sunpy uses **attributes** to specify searches with Fido. The available attributes are located in the `attrs` submodule. These attributes can be combined to construct search queries, such as searching over a certain time period for data from a certain instrument at a certain wavelength.

Different clients and providers have client-specific attributes, but the core attributes are:

* `a.Time`
* `a.Instrument`
* `a.Wavelength`


Let's look at how these attributes work in more detail.

First we can look at `a.Time`, which is used to specify the timerange of a query.

In [None]:
a.Time("2022-04-02 12:00", "2022-04-02 15:00")

We can inspect the instrument attribute to see which instruments are currently supported through sunpy. Here we can see the instrument name (i.e. the name to be passed to the `a.Instrument` attribute), the client from which the data is available, and the full name of the instrument.

In [None]:
a.Instrument

sunpy also provides tab completion to auto-fill the attribute name:

In [None]:
a.Instrument.eit

To search for certain wavelengths, we need to specify the input as an astropy `Quantity`, which is the combination of a value and an associated unit. This is universal in the sunpy stack: every physical input/output is a `Quantity`.

In [None]:
a.Wavelength(17.1*u.angstrom)

## 1.3 Constructing a search query
### A simple query

Let's create a simple query to search for data from AIA over a particular time period:

In [None]:
result = Fido.search(a.Time("2022-04-02 12:00", "2022-04-02 15:00"), 
                     a.Instrument("AIA"))

In [None]:
result

Now let's make our query a bit more specific. Say we only want one wavelength band from AIA. This can be achieved by specifying the `Wavelength` attribute within the search. The `a.Wavelength` attribute takes an astropy `Quantity`:

In [None]:
result = Fido.search(a.Time("2022-04-02 12:00", "2022-04-02 15:00"), 
                     a.Instrument("AIA"), 
                     a.Wavelength(304*u.angstrom))

In [None]:
result

We can further refine this query by choosing the cadence (time sampling) of the data we want to search for and download, using the `a.Sample` attribute. Like `a.Wavelength`, it takes an astropy `Quantity`. Let's search for data with a cadence of 10 minutes, this time in the 171 angstrom channel.

In [None]:
result = Fido.search(a.Time("2022-04-02 12:00", "2022-04-02 15:00"), 
                     a.Instrument("AIA"), 
                     a.Wavelength(171*u.angstrom),
                     a.Sample(10*u.min))

In [None]:
len(result[0])

### Downloading the data

Now we can show how the data found above can be downloaded. Once you have built and filtered a search with `Fido.search`, you can download the results using `Fido.fetch`.

The data is downloaded via asynchronous, parallel download streams (via `parfive`). Failed downloads are also recorded, so that those files can be re-requested.

Let's now look at how a `UnifiedResponse` from `Fido.search` can be passed to `Fido.fetch` to download the data:

In [None]:
files = Fido.fetch(result)

These files are downloaded to the location set in the sunpy configuration file, which by default is `~/sunpy/data/`. `Fido.fetch` returns a `parfive.Results` object, which lists the paths of the downloaded files:

In [None]:
print(files[0])
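Failed downloads are collected in the `errors` attribute of the returned results object, and passing that object back to `Fido.fetch` re-requests the files that failed. A minimal sketch of a retry loop built on this behaviour follows; the helper name `fetch_with_retry` and its `max_retries` parameter are hypothetical and not part of sunpy.

```python
def fetch_with_retry(fido, query, max_retries=3):
    """Fetch `query` and re-request failed downloads up to `max_retries` times.

    `fido` is expected to behave like `sunpy.net.Fido`: `fetch` returns an
    object with an `errors` attribute, and accepts that object again to retry.
    """
    files = fido.fetch(query)
    retries = 0
    while files.errors and retries < max_retries:
        # Passing the previous results back in re-requests only the failures.
        files = fido.fetch(files)
        retries += 1
    return files


# Intended usage in this notebook (requires network access):
# files = fetch_with_retry(Fido, result)
```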

You can also choose the directory the files are saved to by passing a path to the `path` keyword of `Fido.fetch`. For example, to download these files to a local directory `./AIA/<name_of_file>`:

In [None]:
Fido.fetch(result, path="./{instrument}/{file}")
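The `{instrument}` and `{file}` placeholders in the path are filled in for each downloaded file. The plain-Python sketch below mimics that substitution with `str.format` to show the resulting layout; the file name is a made-up placeholder, and the way sunpy performs the substitution internally is an assumption here.

```python
# Hypothetical values standing in for one row of a search result.
path_template = "./{instrument}/{file}"
example_path = path_template.format(instrument="AIA", file="example_file.fits")
print(example_path)
```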

## 1.4 More complex queries

In addition to making a query to one client for one instrument, `Fido` allows the flexibility to search for data from multiple instruments, wavelengths, times etc, even when the data is being obtained through different clients.

Such a query is constructed with the pipe `|` operator, which joins queries together like a logical OR. The ampersand `&` operator combines attributes like a logical AND.

Let's now make a query that searches for both GOES/XRS and AIA data over a particular time period:

In [None]:
result = Fido.search(a.Time("2022-04-02 12:00", "2022-04-02 15:00"),
                     a.Instrument.xrs | (a.Instrument.aia
                                         & a.Wavelength(304*u.angstrom)
                                         & a.Sample(10*u.min)))

In [None]:
len(result)

In [None]:
result

In [None]:
result[0]

In [None]:
result.all_colnames

Let's download the first GOES XRS file (`result[0, 0]` selects the first row of the first results table):

In [None]:
Fido.fetch(result[0, 0], path="./{instrument}/{file}")

Now let's say we only want to download one AIA file at a particular time. We can also search the table for certain conditions, for example to find the file closest to 2022-04-02 13:00.

In [None]:
(np.abs(result[1]["Start Time"] - parse_time("2022-04-02 13:00"))).argmin()

In [None]:
Fido.fetch(result[1, 6], path="./")


## 1.5 Using External Fido Clients

Within `sunpy` core, we support a number of clients for common data providers. However, the `Fido` search interface is extensible, so external packages can write their own clients that extend `Fido` in order to add additional data sources. One such example is the `sunpy_soar` package, which adds a client for the Solar Orbiter Archive (SOAR).


## SOAR archive searching using sunpy!

In [None]:
import sunpy_soar

Note that after importing `sunpy_soar`, the SOAR is now listed as a client that `Fido` will search.

In [None]:
Fido

In [None]:
eui_query = Fido.search(a.Time("2022-04-02 12:00", "2022-04-02 15:00"), 
                        a.soar.Product("EUI-FSI174-IMAGE"), 
                        a.Level(2))

In [None]:
eui_query

In [None]:
Fido.fetch(eui_query, path="./{instrument}/{file}")

We can also search for other data products, for example data from the Solar Orbiter magnetometer (MAG):

In [None]:
mag_query = Fido.search(a.Time("2022-04-02", "2022-04-05"), 
                        a.soar.Product("MAG-RTN-NORMAL-1-MINUTE"), 
                        a.Level(2))

In [None]:
mag_query

In [None]:
mag_files = Fido.fetch(mag_query, path="./{instrument}/{file}")

In [None]:
mag_files

## Accessing data from CDAWeb with sunpy, which is very helpful for in-situ data

There is also a CDAWeb client within sunpy. CDAWeb data can be accessed when the `cdaweb.Dataset` attribute is provided to the search.

The data available from the SOAR is also available from CDAWeb. You may be used to working with CDAWeb (especially if you mainly work with in-situ observations), so let's go through how the data can also be accessed this way. This is handy, as you can access many other in-situ measurements through it too.

In [None]:
res_cdaw = Fido.search(a.Time("2022-04-02", "2022-04-05"), 
                       a.cdaweb.Dataset('SOLO_L2_MAG-RTN-NORMAL-1-MINUTE'))

In [None]:
res_cdaw

In [None]:
mag_cdaw_files = Fido.fetch(res_cdaw)

## Metadata queries, e.g. information from the HEK

As well as Fido providing an interface to search for data files that can be downloaded, Fido also allows you to query metadata. Currently Fido supports metadata searching from the HEK, HELIO and JSOC.

Similar to what we have seen so far, the search results of these clients are a `UnifiedResponse` object, which can be indexed to give a `QueryResponse` table that behaves like an astropy table. Let's look at an example of how we can use Fido to query the HEK.

Let's query for the flares reported by the "SSW Latest Events" feature recognition method on 2022-04-02. This can be done using the HEK client-specific attributes under `a.hek`:

In [None]:
from sunpy.net import Fido, attrs as a

In [None]:
result_hek = Fido.search(a.Time("2022-04-02", "2022-04-03"), 
                         a.hek.FL, a.hek.FRM.Name=='SSW Latest Events')

In [None]:
result_hek["hek"]["event_starttime", "event_peaktime",
                  "event_endtime", "fl_goescls", "ar_noaanum", "frm_name"]

In [None]:
# a.hek.CE selects coronal mass ejection (CME) events
result_hek = Fido.search(a.Time("2022-04-02", "2022-04-03"),
                         a.hek.CE)

In [None]:
result_hek[0][0]

In [None]:
result_hek[0][0]["hpc_bbox"]