# Data retrieval example

To run lephare, we must first download some input data.

This short notebook shows a simple example that uses pooch to check whether the required files have already been downloaded, and to download them if not.
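
To make the "check first, download only if needed" idea concrete, here is a minimal standard-library sketch of that kind of check. It is not lephare's or pooch's actual code: the helper names `sha256_of` and `needs_download` are hypothetical, and the sketch assumes that each file's expected SHA-256 hash is known in advance.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_hash):
    """Return True if the file is missing or its hash differs from the expected one."""
    path = Path(path)
    return not path.is_file() or sha256_of(path) != expected_hash


# Demonstrate with a throwaway file rather than real lephare data
with tempfile.TemporaryDirectory() as tmp:
    local_file = Path(tmp) / "example.dat"  # hypothetical file name
    expected = hashlib.sha256(b"hello").hexdigest()

    missing_before = needs_download(local_file, expected)  # file is absent
    local_file.write_bytes(b"hello")
    ok_after = needs_download(local_file, expected)  # hash matches
    local_file.write_bytes(b"changed")
    stale = needs_download(local_file, expected)  # hash no longer matches

print(missing_before, ok_after, stale)
```

Only the first and last cases would trigger a download; a file that is present with a matching hash is left alone.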

In [None]:
import os
import lephare

In [None]:
# Helper function for use in this notebook


def partial_print(print_list, number_lines):
    print(f"{len(print_list)} lines in list:\n")
    if len(print_list) < 2 * number_lines:
        for line in print_list:
            print(line)
    else:
        for line in print_list[:number_lines]:
            print(line)
        print("...")
        for line in print_list[-number_lines:]:
            print(line)

## Getting a list of file names to download

There are a couple of ways to do this:
1. Using a list file, such as `QSO_MOD.list`
2. Using a list of target subdirectories, such as `["sed/GAL/", "filt/lsst/"]`

_(If you don't know which subset of the data you need, the last section of this notebook describes ways to download all of the data at once.)_
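
The second option amounts to keeping only the file names that start with one of the target subdirectories. The sketch below illustrates that idea in plain Python. It is not the implementation of `lephare.data_retrieval.filter_files_by_prefix`; the function `filter_by_prefix` and the file names are hypothetical and used only for illustration.

```python
def filter_by_prefix(file_names, target_dirs):
    """Keep the file names that start with any of the target prefixes."""
    prefixes = tuple(target_dirs)  # str.startswith accepts a tuple of prefixes
    return [name for name in file_names if name.startswith(prefixes)]


# Hypothetical file names, for illustration only
example_names = [
    "sed/GAL/example_a.sed",
    "sed/QSO/example_b.sed",
    "filt/lsst/example_band.pb",
    "filt/other/example_band.pb",
]

selected = filter_by_prefix(example_names, ["sed/GAL/", "filt/lsst/"])
print(selected)
```

Keeping the trailing slash in each prefix avoids accidentally matching sibling directories whose names merely begin with the same characters.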

In [None]:
# Getting a list of file names from a list file
# The list file can be a url or a path to a local list file

list_file = "https://raw.githubusercontent.com/OliviaLynn/LEPHARE-data/91006fcdf6a4b36932f1b5938e8d2084aca4a2e0/sed/QSO/QSO_MOD.list"
file_names = lephare.data_retrieval.read_list_file(list_file, prefix="")

partial_print(file_names, 3)

In [None]:
file_names[0]

In [None]:
# Or, alternatively, you can download files by subdirectory.
# Here, we specify our desired subdirectories and get a list of the files they contain.
# (This needs the registry file, which is downloaded in the next section.)

# registry_file = lephare.data_retrieval.DEFAULT_REGISTRY_FILE
# target_dirs = ["sed/GAL/", "filt/lsst/"]
# file_names = lephare.data_retrieval.filter_files_by_prefix(registry_file, target_dirs)
# partial_print(file_names, 4)

## Download the registry file
By default, this downloads the registry from the default location under the
default base URL and saves it under the default registry file name. Both can
be overridden with the `url` and `outfile` keywords.
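
For orientation, a pooch-style registry is a plain-text file that lists one file name and its hash per line. The sketch below parses such text into a dictionary. It assumes the lephare registry follows that layout, which this notebook does not confirm; the function `parse_registry` and the entries are hypothetical, and file names containing spaces are not handled.

```python
def parse_registry(text):
    """Parse 'file_name hash' lines into a {file_name: hash} dictionary."""
    registry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Ignore anything after the first two fields (e.g. an optional URL)
        name, file_hash = line.split()[:2]
        registry[name] = file_hash
    return registry


# Hypothetical registry contents, for illustration only
example_text = """
# file name, then hash
sed/QSO/example_a.sed 0123abcd
filt/lsst/example_band.pb 4567ef01
"""

registry = parse_registry(example_text)
print(sorted(registry))
```

A mapping like this is what allows a downloader to tell whether a local copy of a file is still current.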

In [None]:
lephare.data_retrieval.download_registry_from_github()

# Or specify:
# lephare.data_retrieval.download_registry_from_github(url="my_url", outfile="my_file")

## Download the data files

In [None]:
# The parameters here are already the function's default values,
# but we explicitly define them for the sake of the example:
base_url = lephare.data_retrieval.DEFAULT_BASE_DATA_URL
registry_file = lephare.data_retrieval.DEFAULT_REGISTRY_FILE
data_path = lephare.LEPHAREDIR

retriever = lephare.data_retrieval.make_retriever(
    base_url=base_url, registry_file=registry_file, data_path=data_path
)

lephare.data_retrieval.download_all_files(retriever, file_names)

In [None]:
# If you run into problems with the registry, you can disable the requirement
# by setting ignore_registry=True:

# lephare.data_retrieval.download_all_files(retriever, file_names, ignore_registry=True)

# (Note that this is not recommended, as pooch will be unable to verify whether
# or not your local files are up-to-date, and each will be re-downloaded.)

## If you don't know what subset you need and want to get all the data at once

In [None]:
# Run get_auxiliary_data with no keymap set to clone the entire data repository
# to the default local data directory:

# lephare.data_retrieval.get_auxiliary_data()

In [None]:
# Or, grab the zip file from OSF: https://osf.io/mvpks/files/osfstorage

In [None]:
# Or, clone the data repo: https://github.com/lephare-photoz/lephare-data