# **AstrID:**  *data gathering*

In [None]:
# Import the custom dataGathering helpers that build, load, and visualize our FITS dataset
from dataGathering import extractImageArray, extractPixelMaskArray, extract_star_catalog
from dataGathering import getStarData, getImagePlot, getPixelMaskPlot
from dataGathering import displayRawImage, displayRawPixelMask, displayImagePlot, displayPixelMaskPlot, displayPixelMaskOverlayPlot
from dataGathering import importDataset

## Getting Data for Training the Model

The `getStarData` function drives our data preparation pipeline. It generates and saves FITS files that contain both image data and star catalog data, and these files are then used to train our model. Below, we explain what `getStarData` does and how it is used to prepare the training and validation datasets.

### Functionality of `getStarData`



The `getStarData` function performs the following steps:

1. **Directory Creation**:
   - Creates a new directory named `data` if it does not already exist. This directory will store the generated FITS files.

2. **Coordinate Generation**:
   - Generates random coordinates (RA and Dec) while avoiding the galactic plane to ensure a diverse set of sky regions.

3. **Image Data Fetching**:
   - Uses the `SkyView` service to fetch image data from the DSS survey for the generated coordinates. The image data is saved as a FITS file.

4. **Star Catalog Fetching**:
   - Uses the `Vizier` service to fetch star catalog data for the same coordinates. The star catalog data is appended to the FITS file as a binary table HDU.

5. **Pixel Mask Creation**:
   - Creates a pixel mask indicating the positions of stars in the image. The pixel mask is saved as an additional HDU in the FITS file.

6. **Star Overlay Plot**:
   - Generates a plot of the image with star positions overlaid. This plot is saved as an image file, converted to FITS format, and appended to the original FITS file.
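
The coordinate generation in step 2 can be sketched as follows. This is a minimal illustration, not the actual `getStarData` implementation: the function names and the 10-degree latitude cut-off are assumptions chosen for the example, and only the standard library is used. It draws positions uniformly over the sky and rejects any that lie too close to the galactic plane.

```python
import math
import random

# J2000 position of the north galactic pole, in degrees.
NGP_RA_DEG = 192.85948
NGP_DEC_DEG = 27.12825


def galactic_latitude(ra_deg, dec_deg):
    """Return the galactic latitude b (degrees) of an equatorial J2000 position."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ngp_ra, ngp_dec = math.radians(NGP_RA_DEG), math.radians(NGP_DEC_DEG)
    sin_b = (math.sin(dec) * math.sin(ngp_dec)
             + math.cos(dec) * math.cos(ngp_dec) * math.cos(ra - ngp_ra))
    # Clamp against floating-point overshoot before taking the arcsine.
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_b))))


def random_coordinates(min_abs_latitude=10.0, rng=random):
    """Draw (ra, dec) in degrees, rejecting points near the galactic plane.

    min_abs_latitude is a hypothetical cut-off chosen for illustration.
    """
    while True:
        ra = rng.uniform(0.0, 360.0)
        # Sampling sin(dec) uniformly gives equal probability per unit sky area.
        dec = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        if abs(galactic_latitude(ra, dec)) >= min_abs_latitude:
            return ra, dec
```

Calling `random_coordinates()` once per iteration yields one sky position per FITS file. Passing a seeded `random.Random` instance as `rng` makes the sequence reproducible.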



#### Using `getStarData` to Prepare the Dataset

To prepare the dataset for training the model, we call `getStarData` with the `filename` parameter set to `'data'`. This generates a set of FITS files containing image data, star catalog data, and pixel masks. These files are stored in the [fits](data/fits/) directory.

```python
# Generate training data
getStarData(catalog_type='II/246', iterations=250, filename='data')
```


In [None]:
getStarData('II/246', 250, 'data')


For validation, we call `getStarData` with the `filename` parameter set to `'validate'`. This generates a separate set of FITS files whose names start with `validate` (for example, `validate0.fits`), which is the naming that the [validateModel.ipynb](validateModel.ipynb) notebook expects.

```python
# Generate validation data
getStarData(catalog_type='II/246', iterations=50, filename='validate')
```


In [None]:
getStarData('II/246', 50, 'validate')
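
After both runs, it can be useful to confirm that every expected file was written. The helper below is a hypothetical sketch, not part of `dataGathering`. It assumes the files follow the `<filename><index>.fits` pattern suggested by `validate0.fits`, with indices starting at 0, and that the caller knows which directory they were saved to.

```python
from pathlib import Path


def missing_fits_files(directory, prefix, count):
    """Return the expected FITS paths that are not present on disk.

    Assumes files are named '<prefix><index>.fits' with indices 0..count-1.
    """
    directory = Path(directory)
    expected = [directory / f"{prefix}{i}.fits" for i in range(count)]
    return [path for path in expected if not path.is_file()]
```

For example, `missing_fits_files('data/fits', 'data', 250)` returns an empty list when all 250 training files are in place. A non-empty result lists the iterations that would need to be regenerated.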


#### Summary

The `getStarData` function generates the FITS files that form the basis of our training and validation datasets. It fetches image data and star catalog data and creates pixel masks, giving the model the data it needs for training and validation. The generated FITS files are then used in the [trainModel.ipynb](trainModel.ipynb) and [validateModel.ipynb](validateModel.ipynb) notebooks to train and validate the model, respectively.
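
To make the pixel mask idea concrete, here is a minimal sketch of how star positions could be turned into a binary mask with NumPy. The function name, the `(x, y)` ordering of the star positions, and the `radius` parameter are assumptions made for illustration. The mask produced by `getStarData` may be built differently.

```python
import numpy as np


def make_pixel_mask(shape, star_pixels, radius=0):
    """Build a binary mask with 1 at (or within `radius` pixels of) each star.

    shape       -- (height, width) of the image
    star_pixels -- iterable of (x, y) pixel positions; x is the column, y the row
    radius      -- hypothetical parameter: 0 marks a single pixel per star
    """
    height, width = shape
    mask = np.zeros((height, width), dtype=np.uint8)
    rows, cols = np.ogrid[:height, :width]
    for x, y in star_pixels:
        col, row = int(round(x)), int(round(y))
        # Stars whose centre falls outside the image are skipped.
        if not (0 <= row < height and 0 <= col < width):
            continue
        # Mark every pixel within `radius` of the star centre.
        mask[(rows - row) ** 2 + (cols - col) ** 2 <= radius ** 2] = 1
    return mask
```

With `radius=0` each star marks exactly one pixel. A larger radius marks a small disk around each star.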

### Importing Images and Star Data from the Dataset

In this section, we will import the images, masks, and star data from our prepared dataset using the `importDataset` function. This function reads the FITS files from the specified directory and extracts the necessary data for training our model. We will also display the key functions from the [`dataGathering`](dataGathering.py) module that are used in this process.

### Major Functionality of the `dataGathering` Module

The [`dataGathering`](dataGathering.py) module contains several important functions that facilitate the preparation and visualization of our dataset. Below, we provide an overview of the major functionalities:

1. **Data Extraction Functions**:
   - `extractImageArray`: Extracts image data from FITS files.
   - `extractPixelMaskArray`: Extracts pixel mask data from FITS files.
   - `extract_star_catalog`: Extracts star catalog data from FITS files.

2. **Data Import Function**:
   - `importDataset`: Imports the dataset by reading FITS files from a specified directory and extracting images, masks, and star data.

3. **Visualization Functions**:
   - `getImagePlot`: Generates a plot of the image data.
   - `getPixelMaskPlot`: Generates a plot of the pixel mask data.
   - `displayRawImage`: Displays the raw image data.
   - `displayRawPixelMask`: Displays the raw pixel mask data.
   - `displayImagePlot`: Displays the image plot.
   - `displayPixelMaskPlot`: Displays the pixel mask plot.
   - `displayPixelMaskOverlayPlot`: Displays an overlay plot of the image and pixel mask.

4. **Data Generation Function**:
   - `getStarData`: Generates and saves FITS files containing image data and star catalog data.

These functions work together to streamline the process of preparing and visualizing our dataset, ensuring that we have high-quality data for training and validating our model.

---
### The data can now be retrieved with the `importDataset` function:

In [None]:
# Lists for the image arrays and pixel mask arrays
images = []
masks = []

# List for the star data stored inside each FITS file
star_data = []

# List of all the FITS files in the dataset
fits_files = []

images, masks, star_data, fits_files = importDataset(dataset_path='data/fits/', dataset_name='data')
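
Before training, a quick consistency check on the returned values can catch a partially written dataset. The helper below is a hypothetical sketch. It assumes that `importDataset` returns four parallel lists with one entry per FITS file, and that images and masks expose a NumPy-style `.shape`. Neither assumption is confirmed by the module itself.

```python
def check_dataset(images, masks, star_data, fits_files):
    """Return the sample count, or raise ValueError if the lists disagree."""
    lengths = {
        "images": len(images),
        "masks": len(masks),
        "star_data": len(star_data),
        "fits_files": len(fits_files),
    }
    if len(set(lengths.values())) != 1:
        raise ValueError(f"List lengths differ: {lengths}")

    for name, image, mask in zip(fits_files, images, masks):
        image_shape = getattr(image, "shape", None)
        mask_shape = getattr(mask, "shape", None)
        # Entries without a .shape attribute are not checked.
        if image_shape is None or mask_shape is None:
            continue
        # Compare height and width only, so a trailing channel axis is ignored.
        if tuple(image_shape[:2]) != tuple(mask_shape[:2]):
            raise ValueError(
                f"{name}: image shape {image_shape} does not match mask shape {mask_shape}"
            )
    return lengths["images"]
```

Calling `check_dataset(images, masks, star_data, fits_files)` right after the import returns the number of samples, or raises a `ValueError` that names the first file whose image and mask sizes disagree.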