<div>
<img src="images/Logo_Gaia_may_23_2022.png" width="300" align="right"/>    
</div>


# Cone search > Bulk download 

<br />
<br />
<br />
<br />
<br />


**Release number:** 
v1.1 (2022-08-06)


**Applicable Gaia Data Releases:**
Gaia EDR3, Gaia DR3

**Author:**
Héctor Cánovas Cabrera; hector.canovas@esa.int

**Summary:** 

This code computes the list of Gaia (E)DR3 files associated with a user-defined circular region on the sky. The granularity of this region is set by the selected [HEALPix](https://healpix.sourceforge.io) level.

Input parameters:
* target catalogue (e.g., gaia_source, auxiliary/agn_cross_id, or auxiliary/frame_rotator_source),
* the cone-search parameters (centre and radius), and
* the desired HEALPix level.

Once the variables above are set, the notebook creates a reference file that contains the min/max [HEALPix](https://healpix.sourceforge.io) index (levels 6, 7, 8, and 9) encompassed by each gaia_source file available in the (E)DR3 [bulk download directory](http://cdn.gea.esac.esa.int/Gaia/). The conversion between the different [HEALPix](https://healpix.sourceforge.io) levels is done by means of bit-shifting operations.
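
The bit-shifting conversion works because the nested HEALPix scheme splits every pixel at level k into four children at level k+1, numbered 4p to 4p+3. The following is a minimal sketch in plain Python; the helper names `to_coarser_level` and `to_finer_level_range` are illustrative and do not belong to any library:

```python
def to_coarser_level(index, levels=1):
    """Index of the ancestor pixel `levels` levels up (nested scheme)."""
    return index >> (2 * levels)


def to_finer_level_range(index, levels=1):
    """(min, max) indices of the descendant pixels `levels` levels down."""
    first = index << (2 * levels)
    last = first + (1 << (2 * levels)) - 1
    return first, last


print(to_coarser_level(3112))      # level 8 -> level 7
print(to_coarser_level(3112, 2))   # level 8 -> level 6
print(to_finer_level_range(3112))  # level 8 -> level 9
```

Going to a coarser level discards two bits per level, while going to a finer level yields a range of indices rather than a single value, which is why the min and max bounds are treated differently.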


**Useful URLs:**

* [Questions or suggestions](https://www.cosmos.esa.int/web/gaia/questions)
* [Tutorials, documentation, and more](https://www.cosmos.esa.int/web/gaia-users/archive)
* [Known issues in the Gaia data](https://www.cosmos.esa.int/web/gaia-users/known-issues)
* [Gaia data credits and acknowledgements](https://www.cosmos.esa.int/web/gaia-users/credits)

In [1]:
import os
from datetime import datetime
import numpy as np
import pandas as pd
from astropy import units as u
from astropy_healpix import HEALPix

## Set input variables

Default input parameters:
* DR3 = True ; Default value. Set it to False to retrieve EDR3 files
* target_table = 'gaia_source' ; Alternative values: 'Astrophysical_parameters/astrophysical_parameters', 'Variability/vari_cepheid', etc. See the full content at http://cdn.gea.esac.esa.int/Gaia/gdr3/ and http://cdn.gea.esac.esa.int/Gaia/gedr3/
* Cone-search parameters: radius = 0.5 degrees, centred on the Large Magellanic Cloud (in ICRS coordinates).
* HEALPix level = 6 (choose a larger one to increase granularity, and vice versa).
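
To get a feel for what the HEALPix level means in terms of granularity: at level k the sphere is divided into 12 x 4^k equal-area pixels. The sketch below uses only the standard library and prints the pixel count and approximate pixel size for the levels handled in this notebook. The function name `healpix_pixel_stats` is illustrative, and the "across" value is simply the square root of the pixel area:

```python
import math

# Area of the full sky in square degrees (about 41253)
FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2


def healpix_pixel_stats(level):
    """Return (number of pixels, pixel area in deg^2, sqrt(area) in deg)."""
    nside = 2 ** level
    npix = 12 * nside ** 2
    area_deg2 = FULL_SKY_DEG2 / npix
    return npix, area_deg2, math.sqrt(area_deg2)


for level in (6, 7, 8, 9):
    npix, area, side = healpix_pixel_stats(level)
    print(f'level {level}: {npix} pixels, ~{area:.4f} deg^2 each (~{side:.3f} deg across)')
```

With the default radius of 0.5 degrees, a level-6 pixel is comparable in size to the cone itself, so the selection will typically cover a noticeably larger area than the requested circle.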

In [2]:
# Set input parameters below ===========
DR3            = True            # Set it to False to select EDR3
target_table   = 'gaia_source'   # Alternative values: 'Astrophysical_parameters/astrophysical_parameters', etc
hpx_level      = 6
lon            =  80.894 * u.deg # Right Ascension (ICRS)
lat            = -69.756 * u.deg # Declination (ICRS)
radius         = 0.5     * u.deg

print(f'Input Variables: ')
print(f'* HEALPix level                      = {hpx_level} ')
print(f'* ICRS longitude (~ Right Ascension) = {lon} ')
print(f'* ICRS latitude  (~ Declination)     = {lat} ')
print(f'* Radius                             = {radius} ')
print()

Input Variables: 
* HEALPix level                      = 6 
* ICRS longitude (~ Right Ascension) = 80.894 deg 
* ICRS latitude  (~ Declination)     = -69.756 deg 
* Radius                             = 0.5 deg 



In [5]:
# Download basic parameters ============
output_file    = 'bulk_download_files.txt'
output_dir     = 'downloads'

if os.path.isdir(output_dir):
    now          = datetime.now()
    output_dir_2 = output_dir + now.strftime("_%Y-%m-%d")
    print(f'>> Warning: {output_dir} directory exists. Creating alternative directory: {output_dir_2}')
    print()
    output_dir   = output_dir_2

# Create the selected directory (portable replacement for the shell 'mkdir' call)
os.makedirs(output_dir, exist_ok=True)

print(f'Files will be downloaded to: {output_dir}')    

Files will be downloaded to: downloads


## Create reference file

In [9]:
if DR3: 
    gaia_dr_flag = 'DR3'
else:
    gaia_dr_flag = 'EDR3'


print('='*120)
print(f'Preparing selection of Gaia {gaia_dr_flag}: "{target_table}" files')
print('='*120)


url_prefix      = f'http://cdn.gea.esac.esa.int/Gaia/g{gaia_dr_flag.lower()}/{target_table}/'
md5sum_file_url = url_prefix + '_MD5SUM.txt'
# sep=r'\s+' splits on whitespace (delim_whitespace is deprecated in recent pandas versions)
md5sum_file     = pd.read_csv(md5sum_file_url, header=None, sep=r'\s+', names=['md5Sum', 'file'])

if DR3:
    # The last row of "_MD5SUM.txt" in the DR3 directories holds the md5Sum of the _MD5SUM.txt file itself
    md5sum_file.drop(md5sum_file.tail(1).index, inplace=True)
md5sum_file

Preparing selection of Gaia DR3: "gaia_source" files


Unnamed: 0,md5Sum,file
0,52d800747f32a025bdcf1594bbbad29b,GaiaSource_000000-003111.csv.gz
1,aa91dfad8cc5122f109aacb462fb8304,GaiaSource_003112-005263.csv.gz
2,0ee8a887c3db8cb5110354c42289b0a2,GaiaSource_005264-006601.csv.gz
3,11ced35b2cead1950331222f6c5690f0,GaiaSource_006602-007952.csv.gz
4,b178d83fbe020b1f131c0aedcdc0cd29,GaiaSource_007953-010234.csv.gz
...,...,...
3381,20c69195b88742d586b0227d1831893a,GaiaSource_783518-784479.csv.gz
3382,bf8bf6562d676df85787a23e62a05b79,GaiaSource_784480-784992.csv.gz
3383,158b295db422e9d024b8f6c8429aaa9d,GaiaSource_784993-785417.csv.gz
3384,57c5737e81e2548ff0f1d57d7b667096,GaiaSource_785418-786096.csv.gz


In [10]:
# Extract HEALPix level-8 from file name ======================================
healpix_8_min  = [int(file[file.find('_')+1:file.rfind('-')])     for file in md5sum_file['file']]
healpix_8_max  = [int(file[file.rfind('-')+1:file.rfind('.csv')]) for file in md5sum_file['file']]
reference_file = pd.DataFrame({'file':md5sum_file['file'], 'healpix8_min':healpix_8_min, 'healpix8_max':healpix_8_max}).reset_index(drop=True)

# Compute HEALPix levels 6,7, and 9 ===========================================
reference_file['healpix7_min'] = [inp >> 2 for inp in reference_file['healpix8_min']]
reference_file['healpix7_max'] = [inp >> 2 for inp in reference_file['healpix8_max']]

reference_file['healpix6_min'] = [inp >> 2 for inp in reference_file['healpix7_min']]
reference_file['healpix6_max'] = [inp >> 2 for inp in reference_file['healpix7_max']]

reference_file['healpix9_min'] = [inp << 2       for inp in reference_file['healpix8_min']]
reference_file['healpix9_max'] = [(inp << 2) + 3 for inp in reference_file['healpix8_max']]

# Generate reference file =====================================================
ncols          = ['file', 'healpix6_min', 'healpix6_max', 'healpix7_min', 'healpix7_max', 'healpix8_min', 'healpix8_max', 'healpix9_min', 'healpix9_max']
reference_file = reference_file[ncols]
reference_file

Unnamed: 0,file,healpix6_min,healpix6_max,healpix7_min,healpix7_max,healpix8_min,healpix8_max,healpix9_min,healpix9_max
0,GaiaSource_000000-003111.csv.gz,0,194,0,777,0,3111,0,12447
1,GaiaSource_003112-005263.csv.gz,194,328,778,1315,3112,5263,12448,21055
2,GaiaSource_005264-006601.csv.gz,329,412,1316,1650,5264,6601,21056,26407
3,GaiaSource_006602-007952.csv.gz,412,497,1650,1988,6602,7952,26408,31811
4,GaiaSource_007953-010234.csv.gz,497,639,1988,2558,7953,10234,31812,40939
...,...,...,...,...,...,...,...,...,...
3381,GaiaSource_783518-784479.csv.gz,48969,49029,195879,196119,783518,784479,3134072,3137919
3382,GaiaSource_784480-784992.csv.gz,49030,49062,196120,196248,784480,784992,3137920,3139971
3383,GaiaSource_784993-785417.csv.gz,49062,49088,196248,196354,784993,785417,3139972,3141671
3384,GaiaSource_785418-786096.csv.gz,49088,49131,196354,196524,785418,786096,3141672,3144387
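
The min/max extraction above relies on the positions of the `_`, `-`, and `.csv` substrings in each file name. As an alternative, here is a minimal sketch based on a regular expression. It assumes file names of the form `<Prefix>_<min>-<max>.<extension>`, as seen in the listing above; the helper name `healpix8_range` is illustrative:

```python
import re

# Two digit groups separated by '-', preceded by '_' and followed by '.'
RANGE_PATTERN = re.compile(r'_(\d+)-(\d+)\.')


def healpix8_range(file_name):
    """Return the (min, max) HEALPix level-8 indices encoded in a file name."""
    match = RANGE_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f'No HEALPix range found in: {file_name}')
    return int(match.group(1)), int(match.group(2))


print(healpix8_range('GaiaSource_003112-005263.csv.gz'))  # (3112, 5263)
```

Raising an error for unexpected names makes it easier to spot entries that do not follow the pattern, such as a checksum line for the `_MD5SUM.txt` file itself.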


## Compute HEALPix indices associated with the selected circular region

In [11]:
print('='*120)
print(f'Computing HEALPix Level {hpx_level} encompassing a Cone Search (Radius, longitude, latitude): {radius.value} {radius.unit},  {lon.value} {lon.unit}, {lat.value} {lat.unit}')
print('='*120)

hp             = HEALPix(nside=2**hpx_level, order='nested')
hp_cone_search = hp.cone_search_lonlat(lon, lat, radius = radius)

Computing HEALPix Level 6 encompassing a Cone Search (Radius, longitude, latitude): 0.5 deg,  80.894 deg, -69.756 deg


## Download files

A .txt file listing the files to be downloaded is created first. This file is then read and the listed files are downloaded sequentially. A progress message is shown in the terminal from which this notebook was launched.

In [12]:
subset = []
# The context manager closes the output file even if an error occurs
with open(output_file, "w") as f:
    for _, row in reference_file.iterrows():
        hp_min, hp_max = row[f'healpix{hpx_level}_min'], row[f'healpix{hpx_level}_max']
        # Keep the file if at least one cone-search pixel falls within its [min, max] range
        if np.any(np.logical_and(hp_min <= hp_cone_search, hp_cone_search <= hp_max)):
            bulk_file = url_prefix + row['file'] + '\n'
            f.write(bulk_file)
            subset.append(bulk_file)

print('='*120)
print(f'A total of {len(subset)} files for download were written in {output_file}')
print('='*120)

A total of 8 files for download were written in bulk_download_files.txt
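
The selection above keeps a file whenever at least one cone-search pixel falls inside the [min, max] index range of that file. The same logic can be sketched in plain Python. The level-6 ranges below are copied from the first rows of the reference table, whereas the pixel indices and the function name `files_overlapping` are hypothetical and chosen only for illustration:

```python
def files_overlapping(pixels, file_ranges):
    """Return the names of the files whose [min, max] range contains any pixel."""
    selected = []
    for name, (hp_min, hp_max) in file_ranges.items():
        if any(hp_min <= pixel <= hp_max for pixel in pixels):
            selected.append(name)
    return selected


# Level-6 ranges taken from the first rows of the reference table
file_ranges = {
    'GaiaSource_000000-003111.csv.gz': (0, 194),
    'GaiaSource_003112-005263.csv.gz': (194, 328),
    'GaiaSource_005264-006601.csv.gz': (329, 412),
}
pixels = [194, 195]  # Hypothetical pixel indices, for illustration only

print(files_overlapping(pixels, file_ranges))
```

Pixel 194 belongs to the range of both the first and the second file, so both are selected. At coarse levels neighbouring files can share a boundary pixel, which is one reason why the download may include more files than strictly needed.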


In [None]:
print('='*120)
print(f'Bulk download files are stored in directory: {output_dir}')
print('='*120)

os.system(f'wget -i {output_file} -P {output_dir}/ -q  --show-progress --progress=bar:force 2>&1')
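
Once the download finishes, the checksums listed in `_MD5SUM.txt` (loaded into `md5sum_file` above) could be used to verify the files. This step is not part of the original notebook. The following is a minimal sketch that uses only the standard library; the helper name `md5_of_file` is hypothetical:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        # Read fixed-size chunks so that large files are never fully loaded in memory
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Each digest can then be compared with the `md5Sum` value of the corresponding row in `md5sum_file`; a mismatch would suggest an incomplete or corrupted download.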