# 5. Download data <a name="download"></a>

The `alminer.download_data` function downloads data from the ALMA archive directly to a location on the local disk.

<h3>General notes about the download function:</h3>

 * The default download location is the 'data' subdirectory of the current working directory. To change it, set the *location* parameter to the desired path.
 * To check how much disk space is needed, set the *dryrun* parameter to *True*. This only stages the data and reports the number of files and the required disk space in the terminal, without downloading anything.
 * By default, the tar files (containing both raw data and FITS data products) associated with the uids in the provided DataFrame are downloaded.
 * To download only the FITS data products, set the *fitsonly* parameter to *True*.
 * To restrict the download further, provide a list of strings to the *filename_must_include* parameter; only files whose names contain these strings are downloaded. For example, this can select data that have been primary beam corrected ('.pbcor') or that contain the science target ('_sci' or the ALMA target name). Which strings work best depends largely on the cycle, the type of reduction that was performed, and the resulting data products in the archive.
 * To print the list of URLs (files) to be downloaded from the archive to the terminal, set *print_urls=True*.
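
The selection controlled by *fitsonly* and *filename_must_include* can be pictured with the minimal sketch below. It is not alminer's code: `select_files` is a hypothetical helper, the example URLs are made up, and we assume that *fitsonly* keeps files ending in '.fits' and that a file is kept only if its name contains every string in *filename_must_include*.

```python
import posixpath
from urllib.parse import urlparse


def select_files(urls, fitsonly=False, filename_must_include=""):
    """Hypothetical sketch of the file selection described above (not alminer's code).

    Assumptions: fitsonly keeps only names ending in '.fits', and a file is kept
    only if its name contains *every* string in filename_must_include.
    """
    # Accept the '' default or a single string as well as a list of strings
    if isinstance(filename_must_include, str):
        filename_must_include = [filename_must_include] if filename_must_include else []
    selected = []
    for url in urls:
        # Match against the file name only, not the directory part of the URL
        filename = posixpath.basename(urlparse(url).path)
        if fitsonly and not filename.endswith(".fits"):
            continue
        if all(s in filename for s in filename_must_include):
            selected.append(url)
    return selected


# Made-up example URLs, for illustration only
base = "https://archive.example/data/"
urls = [base + name for name in [
    "G31.41_sci.spw25.cont.I.pbcor.fits",  # continuum, science target, pbcor
    "G31.41_sci.spw25.cube.I.pbcor.fits",  # spectral cube
    "G31.41_sci.spw25.cont.I.pb.fits",     # primary beam, not pbcor
    "J1851_ph.spw25.cont.I.pbcor.fits",    # calibrator
    "uid___A001_X0_X0.asdm.sdm.tar",       # raw data tarball
]]

print(select_files(urls, fitsonly=True,
                   filename_must_include=['_sci', '.pbcor', 'cont', 'G31.41']))
# → ['https://archive.example/data/G31.41_sci.spw25.cont.I.pbcor.fits']
```

Under these assumptions, the four strings used in Example 5.2 keep only the continuum, primary-beam-corrected image of the science target.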

<style>
summary > * {
  display: inline;
}
</style>

<br>
<details style="background-color:#f8f8f8;">
<summary style="display:list-item;background-color:#f8f8f8;">Details of <code>download_data</code> function</summary>

* <u>Description</u>: Download ALMA data from the archive to a location on the local machine. <br>
<br>
* <u>Command</u>:<br>
    * alminer.**download_data**(_observations, fitsonly=False, dryrun=False, print_urls=False, filename_must_include='', location='./data'_)<br>
<br>
* <u>Parameters</u>:<br>
    * **observations (pandas.DataFrame)** : Typically the output of, e.g., the `conesearch`, `target`, `catalog`, or `keysearch` functions. <br>
    * **fitsonly (bool, optional, default: False)** : Download individual FITS files only (*fitsonly=True*). This option will not download the raw data (e.g. 'asdm' files), weblogs, or README files. <br>
    * **dryrun (bool, optional, default: False)** : Allow the user to do a test run that checks the size and number of files to download without actually downloading the data (*dryrun=True*). To download the data, set *dryrun=False*. <br>
    * **print_urls (bool, optional, default: False)** : Write the list of URLs to be downloaded from the archive to the terminal. <br>
    * **filename_must_include (list of str, optional, default: '')** : A list of strings that must be contained in the filenames of the URLs to be downloaded. This is useful to restrict the download further, for example, to data that have been primary beam corrected ('.pbcor') or that contain the science target or calibrators (by including their names). The choice depends largely on the cycle, the type of reduction that was performed, and the resulting data products in the archive. In most recent cycles, the science target can be selected with the flag '_sci' or its ALMA target name. <br>
    * **location (str, optional, default: './data')** : Directory where the downloaded data should be placed. <br>
</details>

<h3> Load libraries & create a query </h3>

To explore these options, we will first query the archive using one of the methods presented in the previous section and use the results in the remainder of this tutorial.

```python
import alminer

observations = alminer.keysearch({'target_name': ['G31.41']})
```

```
alminer.keysearch results
--------------------------------
Number of projects = 11
Number of observations = 26
Number of unique subbands = 138
Total number of subbands = 164
2 target(s) with ALMA data = ['G31.41+0.31', 'G31.41+0.3']
--------------------------------
```


```python
# This step is not necessary, but pyvo.dal.query currently triggers
# an Astropy deprecation warning that we prefer to ignore.
# Note: this silences all warnings for the rest of the session.
import warnings
warnings.filterwarnings("ignore")
```
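
The `filterwarnings("ignore")` call silences every warning for the remainder of the session. If you prefer to hide warnings only while a particular call runs, a scoped filter built on `warnings.catch_warnings` is one option. `ignore_warnings` below is a hypothetical helper, and the commented usage assumes the warning in question is `astropy.utils.exceptions.AstropyDeprecationWarning`; we have not confirmed that this is the class pyvo raises.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def ignore_warnings(category=Warning):
    """Hypothetical helper: ignore warnings of `category` inside a `with` block only."""
    # catch_warnings() saves the current filters and restores them on exit,
    # so the 'ignore' rule does not leak into the rest of the notebook
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category)
        yield


# Example usage, assuming pyvo raises AstropyDeprecationWarning:
# from astropy.utils.exceptions import AstropyDeprecationWarning
# with ignore_warnings(AstropyDeprecationWarning):
#     observations = alminer.keysearch({'target_name': ['G31.41']})
```

Warnings of other categories, and warnings raised after the `with` block, are still shown.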

### Example 5.1: download all data (raw data + data products)

```python
alminer.download_data(observations, fitsonly=False, dryrun=True,
                      location='./data', print_urls=False)
```

```
This is a dryrun. To begin download, set dryrun=False.
Download location = ./data
Total number of Member OUSs to download = 26
Selected Member OUSs: ['uid://A001/X12a/X209', 'uid://A002/Xa5ac37/X2e', 'uid://A001/X2fb/X852', 'uid://A001/X2fb/X850', 'uid://A001/X2fb/X84c', 'uid://A001/X2fb/Xcb5', 'uid://A001/X87a/X69c', 'uid://A001/X87a/X696', 'uid://A001/X87a/X69a', 'uid://A001/X87a/X740', 'uid://A001/X87a/X744', 'uid://A001/X1234/X200', 'uid://A001/X129e/X58f', 'uid://A001/X1284/X6d9', 'uid://A001/X1284/X6cd', 'uid://A001/X1284/X6c9', 'uid://A001/X1284/X6c1', 'uid://A001/X1284/X6d5', 'uid://A001/X1284/X6c5', 'uid://A001/X1284/X6dd', 'uid://A001/X1284/X6d1', 'uid://A001/X1284/X209d', 'uid://A001/X1284/X20a1', 'uid://A001/X133d/X327', 'uid://A001/X133d/X325', 'uid://A001/X133d/X21b4']
Number of files to download = 125
Needed disk space = 3.4 TB
--------------------------------
```
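
The dry run reports the needed disk space as a human-readable string. Before setting *dryrun=False*, you might compare it with the free space at the download location. The sketch below uses Python's `shutil.disk_usage`; `parse_size` and `has_enough_space` are hypothetical helpers, and we assume the reported units are decimal (1 TB = 10^12 bytes), which we have not verified against the alminer source.

```python
import re
import shutil
from decimal import Decimal

# Assumption: decimal (SI) units; the alminer report might use binary units instead
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}


def parse_size(text):
    """Convert a size string such as '3.4 TB' to a number of bytes (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGTP]?B)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size string: {text!r}")
    value, unit = match.groups()
    # Decimal avoids float rounding, so '3.4 TB' maps to exactly 3.4e12 bytes
    return int(Decimal(value) * _UNITS[unit.upper()])


def has_enough_space(path, needed_bytes):
    """Return True if the file system holding `path` has at least `needed_bytes` free."""
    # shutil.disk_usage needs an existing path, so pass a directory that exists
    return shutil.disk_usage(path).free >= needed_bytes


needed = parse_size("3.4 TB")  # value reported by the dry run in Example 5.1
print(needed, has_enough_space(".", needed))
```

Because './data' may not exist before the first download, checking '.' inspects the same file system as the default location.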


### Example 5.2: download only continuum FITS images for the science target

```python
alminer.download_data(observations, fitsonly=True, dryrun=True, location='./data',
                      filename_must_include=['_sci', '.pbcor', 'cont', 'G31.41'],
                      print_urls=True)
```

