# How to Open/Read a CDF File

This document covers the basics of opening and reading data saved in the Common Data Format (CDF).
It assumes you have the following packages installed in your Python environment:
- cdflib
- spacepy
- matplotlib
- numpy
- wget


## What is a CDF file?

The Common Data Format (CDF) is a format championed by NASA. It is very common for recent data from in situ measurements, as well as for reprocessed data from older missions. More information on CDF and its tools can be found here:
https://cdf.gsfc.nasa.gov/

CDF allows more flexible storage of data variables that may or may not have different lengths. For example, if you had temperature measurements at several points in space recorded at certain times, a standard ASCII table would store this information like so:

<img src="basicDataTable.jpg">

As you can see, this layout relies on repeated values so that each unique combination occupies its own record number. A CDF file instead uses pointers within the file that connect higher-dimensional arrays (zVariables) to each other and to each record number:

<img src="cdfDataTable.jpg">

This makes concatenating data much more flexible and cuts down on storage space. It is also necessary for storing higher-order arrays coherently.
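To make the difference between the two layouts concrete, here is a small sketch with hypothetical temperatures at three times and four positions. It is a simplified analogy in plain Python, not CDF's actual on-disk format: the flat table needs one row per (time, position) pair and repeats every time and position value, while the CDF-style layout stores each axis once plus a 2-D data array indexed by record.

```python
from itertools import product

# Hypothetical example data: 3 record times and 4 measurement positions.
times = ["00:00", "00:01", "00:02"]
positions = [0.0, 1.0, 2.0, 3.0]
# One row per record (time), one column per position.
temperature = [
    [20.1, 20.3, 20.2, 20.0],
    [20.4, 20.6, 20.5, 20.2],
    [20.8, 21.0, 20.9, 20.6],
]

# Flat ASCII-table layout: every (time, position) pair gets its own row,
# so each time and position value is repeated.
flat_rows = [
    (t, x, temperature[i][j])
    for (i, t), (j, x) in product(enumerate(times), enumerate(positions))
]

# CDF-style layout (simplified): each axis is stored once, and the data
# variable is 2-D with the record (time) number as its first dimension.
cdf_style = {
    "epoch": times,
    "position": positions,
    "temperature": temperature,
}


def count_values(obj):
    """Count the scalar values stored in nested lists, tuples and dicts."""
    if isinstance(obj, dict):
        return sum(count_values(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(count_values(v) for v in obj)
    return 1


print(len(flat_rows), "rows,", count_values(flat_rows), "stored values (flat)")
print(count_values(cdf_style), "stored values (CDF-style)")
```

For these toy arrays the flat table holds 12 rows and 36 values, while the CDF-style layout holds 19, and the gap grows with every extra dimension.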

Extremely detailed instructions on CDF can be found in the CDF User's Guide: https://spdf.gsfc.nasa.gov/pub/software/cdf/doc/cdf380/cdf380ug.pdf

## Install CDF library

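Reading CDF files with spacepy's pycdf requires the NASA CDF C library (cdflib is a pure-Python implementation and does not need it). CDF library version 3.8 can be downloaded here: https://spdf.gsfc.nasa.gov/pub/software/cdf/dist/cdf38_0/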

On Windows, we suggest the InstallMate installer, which sets paths automatically. On macOS, the universal installer is probably easiest.
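Since pycdf loads the CDF C library when it is imported, one quick way to confirm the installation is to try that import. The sketch below is a minimal check under that assumption; if the library is installed but not found, spacepy's documentation describes setting the `CDF_LIB` environment variable to the directory containing it.

```python
def cdf_c_library_version():
    """Return the CDF library version seen by spacepy.pycdf, or None.

    None means spacepy is not installed or could not load the CDF C library.
    """
    try:
        from spacepy import pycdf  # loading pycdf also loads the CDF C library
    except Exception as err:  # ImportError, or spacepy's error for a missing library
        print(f"pycdf unavailable: {err}")
        return None
    # Tuple of (version, release, increment, subincrement)
    return pycdf.lib.version


if __name__ == "__main__":
    print("CDF library version:", cdf_c_library_version())
```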


## Install python packages to read CDF

The two Python packages we will use to read CDF files are pycdf, part of the spacepy toolkit, and cdflib. Both can be installed with pip, and possibly with conda (depending on your version of Python). Depending on how your environment is set up, it should be as simple as:
```
pip install cdflib
pip install spacepy
```
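After installing, a short check can confirm that Python finds every package listed at the top of this document. This sketch uses the standard library's `importlib.util.find_spec`, which locates a module without importing it; the `REQUIRED` list simply mirrors that package list.

```python
import importlib.util

# Packages this tutorial assumes are installed (from the list at the top).
REQUIRED = ["cdflib", "spacepy", "matplotlib", "numpy", "wget"]


def missing_packages(names):
    """Return the names that Python cannot locate as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```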

## Download a cdf

For the purposes of this demonstration, we will start with a CDF file from the Parker Solar Probe (PSP) FIELDS team. Release notes and information about the data can be found here:
https://fields.ssl.berkeley.edu/data/
We will first look at the 1-minute downsampled DC magnetic field measurements in RTN (radial-tangential-normal) coordinates. This data product is useful for analyzing survey-level observations of large-scale structures and polarity reversals. The RTN coordinate system helps remove anomalous signals due to spacecraft maneuvers or motion, and it is described relative to other popular coordinate systems here:
http://www.srl.caltech.edu/ACE/ASC/coordinate_systems.html
For a more in-depth explanation of how to transform between coordinate systems (particularly geocentric ones), see this article by Hapgood:
https://www.sciencedirect.com/science/article/pii/003206339290012D
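In RTN, R points from the Sun to the spacecraft, T is the Sun's spin axis crossed with R (normalized), and N = R × T completes the right-handed set. The sketch below builds that basis from a Sun-to-spacecraft position vector and a spin-axis direction that you supply in the same inertial frame, then projects a vector onto it. It is illustrative only: the Level-2 file used here is already in RTN, and a real transformation needs accurate ephemeris and spin-axis data.

```python
import math


def _cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _unit(v):
    """Normalize a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def rtn_basis(sc_position, sun_spin_axis):
    """Return the (R, T, N) unit vectors for a spacecraft position.

    sc_position: Sun-to-spacecraft vector in some inertial frame.
    sun_spin_axis: direction of the Sun's rotation axis in the same frame.
    """
    r_hat = _unit(sc_position)                   # R: radially away from the Sun
    t_hat = _unit(_cross(sun_spin_axis, r_hat))  # T: (spin axis x R), normalized
    n_hat = _cross(r_hat, t_hat)                 # N: R x T, already unit length
    return r_hat, t_hat, n_hat


def to_rtn(vector, basis):
    """Project a vector (same frame as the basis) onto R, T and N."""
    return tuple(sum(v * e for v, e in zip(vector, axis)) for axis in basis)
```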

The FIELDS Level-2 data archive can be found here:
https://research.ssl.berkeley.edu/data/psp/data/sci/fields/l2/

For this demonstration, we will choose a data file from Encounter 2, on 2019-04-01. The most basic way to get the file we are interested in is to navigate to the relevant folder in a web browser and manually download it to your local directory. The highest version available is located at this address:
https://research.ssl.berkeley.edu/data/psp/data/sci/fields/l2/mag_RTN_1min/2019/04/psp_fld_l2_mag_RTN_1min_20190401_v02.cdf

Optionally, we can use wget to download the file. This is particularly helpful when we want to pull multiple files at a time, or to switch between time ranges of interest without having to change our code in multiple places.

Below, we outline a script that downloads the relevant file with wget after checking to see if the file is already present in the local folder.

```python
import os

import wget

# Date of the file we want
year  = '2019'
month = '04'
day   = '01'

# By using "f-strings" we can easily substitute variables into the URL and filename
remoteDirectory = f'https://research.ssl.berkeley.edu/data/psp/data/sci/fields/l2/mag_RTN_1min/{year}/{month}/'
remoteFilename = f'psp_fld_l2_mag_RTN_1min_{year}{month}{day}_v02.cdf'

# Only download if the file is not already in the working directory
if os.path.isfile(remoteFilename):
    print("File exists already")
    magRTN_file = remoteFilename
else:
    print("File doesn't exist, downloading")
    # wget.download saves into the current directory and returns the local filename
    magRTN_file = wget.download(remoteDirectory + remoteFilename)
```

```
File doesn't exist, downloading
100% [..............................................................................] 94347 / 94347
```
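Because the filename is built from the date, the same pattern extends to a range of days. The sketch below only builds URLs and local filenames and does not download anything. The function name is hypothetical, and it assumes every day uses the same `v02` version string, which may not hold, since the highest available version can differ from file to file.

```python
import datetime as dt

BASE_URL = "https://research.ssl.berkeley.edu/data/psp/data/sci/fields/l2/mag_RTN_1min"


def mag_rtn_1min_files(start, stop, version="v02"):
    """Yield (url, filename) pairs for each day from start to stop, inclusive.

    Assumes the same version string for every day; check the archive.
    """
    day = start
    while day <= stop:
        filename = f"psp_fld_l2_mag_RTN_1min_{day:%Y%m%d}_{version}.cdf"
        url = f"{BASE_URL}/{day:%Y}/{day:%m}/{filename}"
        yield url, filename
        day += dt.timedelta(days=1)


for url, filename in mag_rtn_1min_files(dt.date(2019, 4, 1), dt.date(2019, 4, 3)):
    print(filename)
```

Each pair can then go through the same `os.path.isfile` check and `wget.download` call used for the single file.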

## Open a cdf file
A CDF file can be opened with either spacepy's pycdf or cdflib. Below, we open the file with each package, list the variables it contains, and look at its global metadata.

```python
from spacepy import pycdf

# Open the file downloaded above
cdf_magRTN_pycdf = pycdf.CDF(magRTN_file)

# Printing the CDF object lists each variable with its type and shape
print('These are the variables within this file:\n')
print(cdf_magRTN_pycdf)

print('\n\n-------------------------------------\n\n')
# .attrs holds the global (file-level) attributes
print('This is the global metadata record of the file:\n')
print(cdf_magRTN_pycdf.attrs)
```

```
These are the variables within this file:

component_index_RTN: CDF_INT4 [3] NRV
epoch_mag_RTN_1min: CDF_TIME_TT2000 [1440]
epoch_quality_flags: CDF_TIME_TT2000 [1440]
label_RTN: CDF_CHAR*3 [3] NRV
psp_fld_l2_mag_RTN_1min: CDF_REAL4 [1440, 3]
psp_fld_l2_quality_flags: CDF_UINT4 [1440]


-------------------------------------


This is the global metadata record of the file:

Acknowledgement: 
Data_type: L2>Level 2 Data [CDF_CHAR]
Data_version: 02 [CDF_CHAR]
Dependencies: None [CDF_CHAR]
Descriptor: MAG_RTN_1min>Fluxgate Magnetometer data in RTN coordinates [CDF_CHAR]
Discipline: Solar Physics>Heliospheric Physics [CDF_CHAR]
            Space Physics>Interplanetary Studies [CDF_CHAR]
File_naming_convention: source_datatype_descriptor_yyyyMMdd [CDF_CHAR]
Generated_by: PSP FIELDS SOC [CDF_CHAR]
Generation_date: Wed Jun 23 23:31:53 2021 [CDF_CHAR]
HTTP_LINK: http://fields.ssl.berkeley.edu/data/ [CDF_CHAR]
Instrument_type: Magnetic Fields (space) [CDF_CHAR]
LINK_TEXT: PSP/FIELDS SOC [CDF_CH
```
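The listing above only describes the variables. To pull values into memory with pycdf, index a variable with `[...]`. A minimal sketch using the variable names printed above (the helper name `load_mag_rtn` is hypothetical); the `with` block closes the file automatically, and because `[...]` copies the data, the arrays stay usable afterwards.

```python
import os


def load_mag_rtn(path):
    """Read times and field vectors from a PSP FIELDS mag_RTN_1min file.

    Returns (times, bvec): pycdf converts CDF_TIME_TT2000 values to
    datetime objects, and bvec is a numpy array of shape (records, 3).
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    from spacepy import pycdf  # imported here so the check above works without spacepy

    with pycdf.CDF(path) as cdf:
        times = cdf["epoch_mag_RTN_1min"][...]
        bvec = cdf["psp_fld_l2_mag_RTN_1min"][...]
    return times, bvec


if __name__ == "__main__":
    filename = "psp_fld_l2_mag_RTN_1min_20190401_v02.cdf"
    if os.path.isfile(filename):
        times, bvec = load_mag_rtn(filename)
        print(times[0], bvec.shape)
```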

```python
import cdflib

cdf_magRTN_cdflib = cdflib.CDF(magRTN_file)

# Unlike pycdf, printing a cdflib.CDF object only shows its memory address
print(cdf_magRTN_cdflib)
```

```
<cdflib.cdfread.CDF object at 0x000001F8654E8A30>
```
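To see what the file contains with cdflib, `cdf_info()` lists the variables and `globalattsget()` returns the global attributes. The return type of `cdf_info()` changed between cdflib versions (a dictionary in older releases, an object with attributes in 1.x), so this sketch's hypothetical `zvariable_names` helper accepts either.

```python
import os


def zvariable_names(cdf):
    """Return the zVariable names of an open cdflib.CDF object.

    Older cdflib returns a dict from cdf_info(); cdflib 1.x returns an
    object with a zVariables attribute. Both are handled.
    """
    info = cdf.cdf_info()
    if isinstance(info, dict):
        return list(info["zVariables"])
    return list(info.zVariables)


if __name__ == "__main__":
    filename = "psp_fld_l2_mag_RTN_1min_20190401_v02.cdf"
    if os.path.isfile(filename):
        import cdflib

        cdf = cdflib.CDF(filename)
        print(zvariable_names(cdf))
        print(cdf.globalattsget())  # global attribute name -> value(s)
        bvec = cdf.varget("psp_fld_l2_mag_RTN_1min")  # numpy array
        print(bvec.shape)
```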


## Examine attributes

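Each variable carries its own attributes. Next, we print the attributes of a particular variable, then store the ones we need (such as units) in Python variables so we can use them in printed statements and plots.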
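Variable attributes behave like dictionaries in both packages (`cdf[name].attrs` in pycdf, `cdf.varattsget(name)` in cdflib). The sketch below builds an axis label from such a mapping. The attribute names `FIELDNAM` and `UNITS` follow the ISTP metadata guidelines used by many space physics CDFs; they are an assumption here, so the hypothetical `axis_label` helper falls back to the variable name and drops the units if they are missing.

```python
import os


def axis_label(attrs, default_name):
    """Build a 'name [units]' label from a variable's attribute mapping.

    FIELDNAM and UNITS are ISTP-style attribute names (an assumption);
    missing entries fall back to default_name and no units.
    """
    name = attrs["FIELDNAM"] if "FIELDNAM" in attrs else default_name
    units = attrs["UNITS"] if "UNITS" in attrs else ""
    return f"{name} [{units}]" if units else str(name)


if __name__ == "__main__":
    filename = "psp_fld_l2_mag_RTN_1min_20190401_v02.cdf"
    var = "psp_fld_l2_mag_RTN_1min"
    if os.path.isfile(filename):
        from spacepy import pycdf

        with pycdf.CDF(filename) as cdf:
            print(cdf[var].attrs)  # every attribute of this variable
            print(axis_label(cdf[var].attrs, var))
```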


## Plot data

Next, we make a basic plot of the whole data file, using the variables pulled in above to add titles and units.
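A minimal plotting sketch: the hypothetical `plot_mag_rtn` function draws each RTN component against time on one set of axes, and takes the legend labels, units, and title as arguments so they can come from the file (for example the `label_RTN` variable and the field variable's units attribute). The default labels and the "nT" units are placeholders, not values read from the file. Real files may also contain fill values (see the variable's `FILLVAL` attribute, if present) that should be masked before plotting.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_mag_rtn(times, bvec, labels=("R", "T", "N"), units="nT", title=""):
    """Plot the three columns of a (records, 3) field array against time.

    labels, units and title are placeholders; pass values taken from the file.
    """
    bvec = np.asarray(bvec)
    fig, ax = plt.subplots(figsize=(10, 4))
    for i, label in enumerate(labels):
        ax.plot(times, bvec[:, i], label=str(label))
    ax.set_xlabel("Time")
    ax.set_ylabel(f"B [{units}]")
    ax.set_title(title)
    ax.legend(loc="upper right")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return fig, ax
```

With `times` and `bvec` holding the epoch and field arrays read from the file, calling `plot_mag_rtn(times, bvec, title=...)` and then `plt.show()` displays the figure.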

## Subset of data

To plot only a small portion of the data, we can use the `bisect` function (or choose indices manually) to find the indices of the time range we want, then use the same plotting tools as above.
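Because epoch variables are stored in time order, Python's standard `bisect` module can locate the edges of a time window without scanning the whole array. A sketch, assuming `times` is a sorted sequence of `datetime` objects (as pycdf returns for CDF_TIME_TT2000 variables); the helper name `time_window` and the demo times are hypothetical.

```python
import bisect
import datetime as dt


def time_window(times, start, stop):
    """Return a slice selecting samples with start <= t < stop.

    Assumes `times` is sorted in ascending order.
    """
    i0 = bisect.bisect_left(times, start)
    i1 = bisect.bisect_left(times, stop)
    return slice(i0, i1)


if __name__ == "__main__":
    # Hypothetical 1-minute cadence for one day, like epoch_mag_RTN_1min
    day = dt.datetime(2019, 4, 1)
    times = [day + dt.timedelta(minutes=i) for i in range(1440)]
    sel = time_window(times, day + dt.timedelta(hours=6), day + dt.timedelta(hours=7))
    print(sel, len(times[sel]))
```

The returned slice indexes the time and field arrays alike (`times[sel]`, `bvec[sel]`), so the subset can go straight into the same plotting code used for the full day.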