# lksearch Configuration and Working with Cloud Data
lksearch has a `config` class and a configuration file that can be used to set the default behaviour of the package, including how lksearch treats cloud data and where (or if) local files are cached.

## lksearch File Download and Cache
The `lksearch` file cache is the directory that files are downloaded to. It also serves as a cache: if a file with the same name as the one being downloaded already exists there, it is treated as a cached copy and, by default, the file on disk is not overwritten.

The default file download and cache directory is located at:
`~/.lksearch/cache`

This can be verified using the `get_cache_dir` convenience function in the `config` sub-module, e.g.:

In [1]:
from lksearch import config as lkconfig

lkconfig.get_cache_dir()

'/Users/tapritc2/.lksearch/cache'
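
Since the cache is an ordinary directory, it can be inspected with standard tools. The following is a minimal sketch, not part of `lksearch`: the helper name `summarize_cache` is hypothetical, and it only assumes that the path returned by `get_cache_dir()` is a normal directory on disk.

```python
from pathlib import Path


def summarize_cache(cache_dir):
    """Return (file_count, total_bytes) for all files under cache_dir."""
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        # Nothing has been downloaded yet, so treat the cache as empty.
        return 0, 0
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# Usage (assumes lksearch is installed):
# from lksearch import config as lkconfig
# n_files, n_bytes = summarize_cache(lkconfig.get_cache_dir())
# print(f"{n_files} files, {n_bytes / 1e6:.1f} MB")
```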

### Clearing the Cache & Corrupted Files
If you wish to delete an individual file that you downloaded (for example, if you are concerned that a previously downloaded file is corrupted), the easiest way to do that is using the `Local Path` information in the manifest returned by the `.download()` function.

In [2]:
import os
from lksearch import K2Search

# First, let's download a few files
K218 = K2Search("K2-18").HLSPs.timeseries
manifest = K218.download()
manifest

| | Local Path | Status | Message | URL |
|---|---|---|---|---|
| 0 | /Users/tapritc2/.lksearch/cache/mastDownload/H... | COMPLETE | | |
| 1 | /Users/tapritc2/.lksearch/cache/mastDownload/H... | COMPLETE | | |
| 2 | /Users/tapritc2/.lksearch/cache/mastDownload/H... | COMPLETE | | |


In [3]:
# The manifest returned by download() is a pandas DataFrame,
# so the first local path can be accessed with iloc
os.remove(manifest.iloc[0]["Local Path"])
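
To remove several downloaded files at once, the same idea can be applied to every entry in the `Local Path` column. This is a minimal sketch using only the standard library. The helper name `remove_cached_files` is hypothetical and not part of `lksearch`. It skips entries that are not existing local files, such as remote URIs.

```python
import os


def remove_cached_files(paths):
    """Delete each existing local file in paths and return the paths removed."""
    removed = []
    for path in paths:
        path = str(path)
        # Skip remote URIs (e.g. s3://...) and files that are already gone.
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed


# Usage with the manifest returned by download():
# remove_cached_files(manifest["Local Path"])
```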

If you want to clear *everything* from your cache, you can use the `config.clearcache()` function to completely empty your cache of downloaded files. By default this runs in "test" mode and prints what would be deleted. To confirm deletion, pass the optional parameter `test=False`.

In [4]:
lkconfig.clearcache()

Running in test mode, rerun with test=False to clear cache
removing /Users/tapritc2/.lksearch/cache/mastDownload/TESS
removing /Users/tapritc2/.lksearch/cache/mastDownload/K2
removing /Users/tapritc2/.lksearch/cache/mastDownload/Kepler
removing /Users/tapritc2/.lksearch/cache/mastDownload/TESSCut
removing /Users/tapritc2/.lksearch/cache/mastDownload/HLSP


**Passing `test=False` will then fully delete the above directories** 

e.g. `lkconfig.clearcache(test=False)`
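
The same "report first, delete on request" pattern can be useful for directories that `lksearch` does not manage. The sketch below only illustrates that pattern and is not the implementation of `clearcache()`. The function name `clear_directory` is hypothetical.

```python
import shutil
from pathlib import Path


def clear_directory(directory, test=True):
    """Print, and optionally delete, the immediate contents of directory."""
    root = Path(directory)
    if not root.is_dir():
        return []
    targets = sorted(root.iterdir())
    if test:
        print("Running in test mode, rerun with test=False to delete")
    for target in targets:
        print(f"removing {target}")
        if not test:
            if target.is_dir():
                shutil.rmtree(target)
            else:
                target.unlink()
    return targets
```

Calling `clear_directory(path)` only prints the entries that would be removed. Calling `clear_directory(path, test=False)` deletes them.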

### Caching and File Downloads
By default, lksearch uses the built-in caching functionality of `~astroquery.mast.Observations`, which includes a file-size check to verify that a download is complete and that the file has not changed. This means that if a file is truncated mid-download, or is updated on the MAST archive (rare but not unheard of), the mismatch is detected and the file is re-downloaded. This does, however, add a small overhead to file downloads and can add additional remote calls.

If this behaviour is undesired, the `CHECK_CACHED_FILE_SIZES` configuration parameter can be set to `False`. The download function will then only check whether a local file with the expected name and location exists and, if it does, return the local file location with a modified `Status` tag.

This is less robust, but slightly faster and potentially involves fewer remote calls.

For example, with file size checking:

In [5]:
from lksearch import conf

conf.reload()
conf.CHECK_CACHED_FILE_SIZES = True
K218[0].download()

| | Local Path | Status | Message | URL |
|---|---|---|---|---|
| 0 | /Users/tapritc2/.lksearch/cache/mastDownload/H... | COMPLETE | | |


And without file size checking:

In [6]:
conf.CHECK_CACHED_FILE_SIZES = False
K218[0].download()

| | Local Path | Status | Message | URL |
|---|---|---|---|---|
| 0 | /Users/tapritc2/.lksearch/cache/mastDownload/H... | UNKNOWN | File exists in local cache, has not been valid... | |
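
The difference between the two modes can be illustrated with a small standalone function. This is a conceptual sketch only, not the code used by `astroquery` or `lksearch`. The function name `cached_file_status` and the labels `MISSING` and `MISMATCH` are hypothetical. Only `COMPLETE` and `UNKNOWN` appear in the manifests above.

```python
import os


def cached_file_status(path, expected_size=None):
    """Classify a local file with or without a file-size check."""
    if not os.path.isfile(path):
        # Hypothetical label: there is no cached copy at all.
        return "MISSING"
    if expected_size is None:
        # No size check: the file exists but has not been validated.
        return "UNKNOWN"
    if os.path.getsize(path) == expected_size:
        return "COMPLETE"
    # Hypothetical label: truncated download or changed archive copy.
    return "MISMATCH"
```

Skipping the check saves the remote call needed to learn the expected size, at the cost of being unable to detect a truncated or outdated file.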


### lksearch Configuration and Configuration File
lksearch has a number of configuration parameters, which are contained in the `~lksearch.Conf` [class](https://lightkurve.github.io/lksearch/apidoc.html#lksearch.Conf). These parameters can be modified for a given Python session by updating the values in the Conf class. To change the parameters' default values, lksearch also supports an optional configuration file that is built on top of `~astropy.config` using `~astropy.config.ConfigNamespace`. This file does not exist by default, but a default version can be created using the `config.create_config_file` helper function. Modifications to the values in this file will then update the default `~lksearch.Conf` values.

In [7]:
lkconfig.create_config_file(overwrite=True)

The file can be found in the location shown below. For details on editing it, please see the `astropy.config` documentation.

In [8]:
lkconfig.get_config_dir()

'/Users/tapritc2/.lksearch/config'

In [9]:
lkconfig.get_config_file()

'/Users/tapritc2/.lksearch/config/lksearch.cfg'
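
Session-level changes stay in effect until they are changed back, which is easy to forget in a long notebook. One way to limit a change to a block of code is a small context manager, sketched below. The helper `temporary_setting` is hypothetical and works on any object with plain attributes. Because `astropy.config.ConfigNamespace` provides a `set_temp` context manager for the same purpose, `conf.set_temp(...)` is presumably also available on the lksearch configuration object, but that is an assumption and not confirmed here.

```python
from contextlib import contextmanager


@contextmanager
def temporary_setting(config, name, value):
    """Set config.<name> to value inside a with-block, then restore it."""
    previous = getattr(config, name)
    setattr(config, name, value)
    try:
        yield config
    finally:
        # Restore the earlier value even if the block raised an error.
        setattr(config, name, previous)


# Usage (assumes lksearch is installed):
# from lksearch import conf
# with temporary_setting(conf, "CHECK_CACHED_FILE_SIZES", False):
#     K218[0].download()
```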

## lksearch Cloud Configuration
`lksearch` has three configuration parameters that are particularly relevant to cloud-based science platforms. These are:

- `CLOUD_ONLY`: Only download cloud-based data. If `False`, will download all data. If `True`, will only download data located in a cloud (Amazon S3) bucket
- `PREFER_CLOUD`: Prefer cloud-based data product retrieval where available
- `DOWNLOAD_CLOUD`: Download cloud-based data. If `False`, download() will return a pointer to the cloud-based data instead of downloading it. This is intended for use on cloud-based science platforms (e.g. TiKE)

`CLOUD_ONLY` governs whether data that is not hosted on the cloud can be downloaded. Many science files have both a cloud-based location (typically on Amazon S3) and a MAST archive location. By default this is `False`, and all products will be downloaded regardless of whether the file is available via cloud hosting or MAST archive hosting. If `CLOUD_ONLY` is `True`, only files available for download on a cloud-based platform will be retrieved. This configuration parameter is passed through to the `~astroquery.mast` parameter of the same name.

`PREFER_CLOUD` governs the default download behaviour in the event that a data product is available from both a cloud-based location and a MAST-hosted archive location.  If `True` (default), then `lksearch` will preferentially download files from the cloud-host rather than the MAST-hosted Archive. This configuration parameter is passed through to the `~astroquery.mast` parameter of the same name.  

`DOWNLOAD_CLOUD` governs whether files that are hosted on the cloud are downloaded locally. If this value is `True` (default), cloud-hosted files are downloaded normally.  If `False`, then files hosted on a cloud based platform are not downloaded, and a URI containing the path to the desired file on the cloud-host is returned instead of the local path to the file.  This path can then be used to read the file remotely (see `~astropy.io.fits` [working with remote and cloud hosted files](https://docs.astropy.org/en/stable/io/fits/#working-with-remote-and-cloud-hosted-files:~:text=with%20large%20files-,Working%20with%20remote%20and%20cloud%2Dhosted%20files,-Unsigned%20integers) for more information). This ability may be most relevant when using `lksearch` on a cloud-based science platform where the remote read is very rapid and short-term local storage comparatively expensive.  

Using this `DOWNLOAD_CLOUD` functionality, we can find a cloud-hosted file and read it directly into memory like so:

In [10]:
# First, let's update our configuration to not download a cloud-hosted file
from lksearch import Conf, TESSSearch

Conf.DOWNLOAD_CLOUD = False

# Now, let's find some data. We used this target earlier in the tutorial.
toi = TESSSearch("TOI 1161")

# What happens when we try to download it in our updated configuration?
cloud_result = toi.timeseries.mission_products[0].download()
cloud_result

| | Local Path | Status | Message | URL |
|---|---|---|---|---|
| 0 | /Users/tapritc2/.lksearch/cache/mastDownload/T... | UNKNOWN | File exists in local cache, has not been valid... | |
| 0 | 0 s3://stpubdata/tess/public/tid/s0014/0000... | COMPLETE | Link to S3 bucket for remote read | |


As we can see above, instead of downloading the file, `download()` has returned an Amazon S3 URI for its cloud-hosted location. If we want to access the file, we can do so using the remote-read capabilities of `~astropy.io.fits`.

(Note: to do this you will need to install `fsspec` and `s3fs`.)

In [11]:
import astropy.io.fits as fits

with fits.open(
    cloud_result["Local Path"].values[0], use_fsspec=True, fsspec_kwargs={"anon": True}
) as hdu:
    for item in hdu:
        print(item.fileinfo())

{'file': <astropy.io.fits.file._File <fsspec.implementations.local.LocalFileOpener object at 0x150b62740>>, 'filemode': 'readonly', 'hdrLoc': 0, 'datLoc': 5760, 'datSpan': 0}
{'file': <astropy.io.fits.file._File <fsspec.implementations.local.LocalFileOpener object at 0x150b62740>>, 'filemode': 'readonly', 'hdrLoc': 5760, 'datLoc': 20160, 'datSpan': 1935360}
{'file': <astropy.io.fits.file._File <fsspec.implementations.local.LocalFileOpener object at 0x150b62740>>, 'filemode': 'readonly', 'hdrLoc': 1955520, 'datLoc': 1961280, 'datSpan': 2880}
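
A manifest can contain both local paths and S3 URIs, so it can help to choose the `fits.open` arguments per entry. The following is a minimal sketch with hypothetical helper names (`is_s3_uri`, `fits_open_kwargs`). It assumes the bucket is public, which is why anonymous access (`anon=True`) is requested, as in the example above.

```python
from urllib.parse import urlparse


def is_s3_uri(path):
    """Return True if path is an s3:// URI rather than a local file path."""
    return urlparse(str(path)).scheme == "s3"


def fits_open_kwargs(path):
    """Build keyword arguments for astropy.io.fits.open for a manifest path."""
    if is_s3_uri(path):
        # Public bucket: read remotely through fsspec/s3fs without credentials.
        return {"use_fsspec": True, "fsspec_kwargs": {"anon": True}}
    # Local file: the default fits.open behaviour is sufficient.
    return {}


# Usage (assumes astropy, fsspec and s3fs are installed):
# import astropy.io.fits as fits
# path = cloud_result["Local Path"].values[0]
# with fits.open(path, **fits_open_kwargs(path)) as hdul:
#     hdul.info()
```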
