<a href="https://colab.research.google.com/github/OllyK/Cata2Data/blob/colab/Copy_of_Create_LoTTS_Dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Create a LoTSS Dataset Using Cata2Data

To start, clone the Cata2Data repository and install it into your local environment. We recommend using a virtual environment (venv) on your local machine. The cell below also copies the example `data.py` file into the working directory.



In [None]:
!git clone https://github.com/mb010/Cata2Data.git && pip install ./Cata2Data && cp Cata2Data/examples/lotssdr2/data.py .

## Download the data

Use the `data_scrapper.py` script to download the image files. The full set of 841 pointings is 434 GB; to download a single pointing instead, pass the `--test` flag:

In [None]:
!python Cata2Data/examples/lotssdr2/data_scrapper.py --dir downloaded_data/ --test

The script should now have downloaded a `.fits` image file. List the pointing's folder to check:

In [None]:
!ls downloaded_data/public/DR2/mosaics/P000+23/
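
If you would rather check the download from Python than with `ls`, a minimal sketch along the following lines lists every FITS file under a directory, grouped by its parent folder. The helper name `find_fits_files` is hypothetical and not part of Cata2Data; the only assumption is that image files end in `.fits`.

```python
from pathlib import Path


def find_fits_files(root):
    """Map each folder under `root` to the sorted names of the .fits files in it.

    Hypothetical helper for illustration only; not part of Cata2Data.
    """
    found = {}
    # rglob searches `root` recursively; sorting keeps the output stable.
    for path in sorted(Path(root).rglob("*.fits")):
        found.setdefault(str(path.parent), []).append(path.name)
    return found


if __name__ == "__main__":
    # Prints one line per folder that contains at least one .fits file.
    for folder, names in find_fits_files("downloaded_data").items():
        print(folder, names)
```

An empty result means no `.fits` files were found under `downloaded_data`, which usually points to a failed or misdirected download.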

Next, download the catalog (3.9 GB) directly from the LOFAR surveys website using the cell below. This dataloader is currently built to work with the [Radio-optical cross match](https://lofar-surveys.org/dr2_release.html#:~:text=Radio%2Doptical%20crossmatch%20catalogue) catalog described in [Hardcastle et al. 2023](https://arxiv.org/abs/2309.00102).

In [None]:
!wget -P downloaded_data/ https://lofar-surveys.org/public/DR2/catalogues/combined-release-v1.1-LM_opt_mass.fits

## Split the catalog

This step splits the full catalog into one catalog per image and saves each one into the folder that holds the corresponding image. Cata2Data currently expects this layout: a list of images and a list of catalogs of equal length, from which it constructs a dataloader.

In [None]:
!python Cata2Data/examples/lotssdr2/catalog_splitter.py --catalog_path downloaded_data/combined-release-v1.1-LM_opt_mass.fits --image_paths downloaded_data/public/DR2/mosaics/P000+23/
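
To make the idea of splitting concrete, the sketch below groups catalog rows by the pointing they belong to and then builds two equal-length lists of images and catalogs. It is an illustration only and not the logic of `catalog_splitter.py`: the `Mosaic_ID` key, the file path, the second pointing name, and both helper names are assumptions made for the example, and rows are plain dictionaries rather than a FITS table.

```python
from collections import defaultdict


def split_by_pointing(rows, key="Mosaic_ID"):
    """Group catalog rows (dicts) by the pointing named in `key`.

    `key` is an assumed column name, used here for illustration.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def pair_images_and_catalogs(image_paths, catalogs_by_pointing):
    """Return two equal-length lists: one catalog (list of rows) per image.

    `image_paths` maps a pointing name to its image path. Pointings with
    no catalog rows are skipped so that the two lists stay aligned.
    """
    images, catalogs = [], []
    for pointing, image_path in sorted(image_paths.items()):
        if pointing in catalogs_by_pointing:
            images.append(image_path)
            catalogs.append(catalogs_by_pointing[pointing])
    return images, catalogs


# Toy rows standing in for the full catalog (all values are made up).
rows = [
    {"Source_Name": "A", "Mosaic_ID": "P000+23"},
    {"Source_Name": "B", "Mosaic_ID": "P000+23"},
    {"Source_Name": "C", "Mosaic_ID": "P001+26"},
]
# Only one pointing has been downloaded in this toy setup (hypothetical path).
image_paths = {"P000+23": "mosaics/P000+23/image.fits"}

images, catalogs = pair_images_and_catalogs(image_paths, split_by_pointing(rows))
print(len(images), len(catalogs))
```

Rows for pointings without a downloaded image are dropped at the pairing step, which mirrors the `--test` workflow here, where only a single pointing is present on disk.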

## Construct the dataset

Running the cell below constructs a dataset from the downloaded data. The `LoTTSDataset` class is imported from the [data.py file](https://github.com/mb010/Cata2Data/blob/main/examples/lotssdr2/data.py) and populated with data from the `downloaded_data` directory. We then plot images for the first ten members of the dataset and display the first ten rows of the corresponding dataframe.

In [None]:
from data import LoTTSDataset
from torchvision.transforms import v2
import torch

transforms = v2.Compose(
    [
        v2.ToImage(),
        v2.ToDtype(torch.float32),
        v2.Resize(size=(64, 64)),
    ]
)

data = LoTTSDataset(
    data_folder="downloaded_data",  # Change this to where you saved your data
    cutout_scaling=1.5,
    transform=transforms,
)

# Plot at most the first ten sources (fewer if the dataset is smaller).
for i in range(min(10, len(data))):
    data.plot(
        i,
        contours=True,
        sigma_name="Isl_rms",
        min_sigma=2,
        title=data.df.iloc[i]["Source_Name"] + data.df.iloc[i]["S_Code"],
    )

data.df.head(10)