# Data Preparation

Photon and spacecraft data are all that a user needs for the analysis. How you prepare these data depends on the type of analysis you wish to perform (e.g. point source, extended source, GRB spectral analysis, timing analysis, etc.). The different cuts to the data are described in detail in the [Cicerone](https://fermi.gsfc.nasa.gov/ssc/data/analysis/documentation/Cicerone/Cicerone_Data/LAT_DP.html).

Data preparation consists of two steps:
* Event selection with [gtselect](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtselect.txt): makes cuts based on columns in the event data file such as time, energy, position, zenith angle, instrument coordinates, event class, and event type (new in Pass 8).
* Time selection with [gtmktime](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtmktime.txt): in addition to cutting the selected events, gtmktime makes cuts based on the spacecraft file and updates the Good Time Interval (GTI) extension.


Here we give an example of how to prepare the data for the analysis of a point source. For your own source, follow similar steps, but use the cuts that the Cicerone suggests for your type of analysis.

## 1. Event Selection with gtselect

In this section, we look at making basic data cuts using [gtselect](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtselect.txt). By default, gtselect prompts for cuts on:
* Time
* Energy
* Position (RA,Dec,radius)
* Maximum Zenith Angle

However, by using the following hidden parameters (or using the '_Show Advanced Parameters_' check box in GUI mode), you can also make cuts on:

* Minimum Event class ID (``evclsmin``)
* Maximum Event class ID (``evclsmax``)
* Event conversion type ID (``convtype``)
* Minimum pulse phase (``phasemin``)
* Maximum pulse phase (``phasemax``)

For this example, we use data that was extracted from the [LAT Data Server](https://fermi.gsfc.nasa.gov/cgi-bin/ssc/LAT/LATDataQuery.cgi). The original selection used the following information:

* Search Center (RA,DEC) = (193.98,-5.82)
* Radius = 20 degrees
* Start Time (MET) = 239557417 seconds (2008-08-04 T15:43:37)
* Stop Time (MET) = 255398400 seconds (2009-02-04 T00:00:00)
* Minimum Energy = 100 MeV
* Maximum Energy = 500000 MeV
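
The start and stop times above are given in Mission Elapsed Time (MET), which counts seconds from 2001-01-01 00:00:00 UTC. As a rough cross-check of the dates in parentheses, the conversion can be sketched in Python. The helper name `met_to_utc` is hypothetical, and the sketch ignores leap seconds, so it can differ from a rigorous conversion by a few seconds.

```python
from datetime import datetime, timedelta

# Fermi MET reference epoch: 2001-01-01 00:00:00 UTC.
MET_EPOCH = datetime(2001, 1, 1)


def met_to_utc(met_seconds):
    """Convert MET seconds to an approximate UTC datetime (leap seconds ignored)."""
    return MET_EPOCH + timedelta(seconds=met_seconds)


# The two MET values used in the data server query above.
for label, met in [("start", 239557417), ("stop", 255398400)]:
    print(label, met, met_to_utc(met).isoformat(sep=" "))
```

With the two values from the query, this gives 2008-08-04 15:43:37 and 2009-02-04 00:00:00, matching the dates listed above.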



In [1]:
!wget https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/data/dataPreparation/L1506091032539665347F73_PH00.fits
!wget https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/data/dataPreparation/L1506091032539665347F73_PH01.fits
!wget https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/data/dataPreparation/L1506091032539665347F73_SC00.fits

--2025-09-04 09:37:11--  https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/data/dataPreparation/L1506091032539665347F73_PH00.fits
Resolving fermi.gsfc.nasa.gov (fermi.gsfc.nasa.gov)... 129.164.179.26
Connecting to fermi.gsfc.nasa.gov (fermi.gsfc.nasa.gov)|129.164.179.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9336960 (8.9M) [application/fits]
Saving to: ‘L1506091032539665347F73_PH00.fits’


2025-09-04 09:37:16 (2.02 MB/s) - ‘L1506091032539665347F73_PH00.fits’ saved [9336960/9336960]

--2025-09-04 09:37:16--  https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/data/dataPreparation/L1506091032539665347F73_PH01.fits
Resolving fermi.gsfc.nasa.gov (fermi.gsfc.nasa.gov)... 129.164.179.26
Connecting to fermi.gsfc.nasa.gov (fermi.gsfc.nasa.gov)|129.164.179.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16439040 (16M) [application/fits]
Saving to: ‘L1506091032539665347F73_PH01.fits’


2025-09-04 09:37:25 (1.91 MB/s) - ‘L150

In [2]:
!mkdir data
!mv *.fits ./data

If more than one photon file was returned by the [LAT Data Server](https://fermi.gsfc.nasa.gov/cgi-bin/ssc/LAT/LATDataQuery.cgi), we will need to provide an input file list in order to use all the event data files in the same analysis. This text file can be generated by typing:

In [3]:
!ls ./data/*_PH* > ./data/events.txt

In [4]:
!cat ./data/events.txt

./data/L1506091032539665347F73_PH00.fits
./data/L1506091032539665347F73_PH01.fits


This input file list can be used in place of a single input events (or FT1) file by placing an `@` symbol before the text filename. The output from [gtselect](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtselect.txt) will be a single file containing all events from the combined file list that satisfy the other specified cuts.
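
If you prefer to build the file list from Python rather than with `ls`, a minimal sketch could look like the following. The function name `write_event_list` and its defaults are illustrative assumptions, not part of the Fermitools.

```python
import glob
import os


def write_event_list(data_dir, list_name="events.txt", pattern="*_PH*"):
    """Write the photon files found in data_dir to a text file, one path per line."""
    event_files = sorted(glob.glob(os.path.join(data_dir, pattern)))
    if not event_files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {data_dir!r}")
    list_path = os.path.join(data_dir, list_name)
    with open(list_path, "w") as handle:
        handle.write("\n".join(event_files) + "\n")
    return list_path, event_files
```

Calling `write_event_list("./data")` returns the path of the list file; prefix it with `@` when passing it to gtselect as the input file.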

For a simple point source analysis, it is recommended that you only include events with a high probability of being photons. This cut is performed by selecting "source" class events with the [gtselect](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtselect.txt) tool, by including the hidden parameter ``evclass`` on the command line. For LAT Pass 8 data, `source` events are specified as event class 128 (the default value).

Additionally, in Pass 8, you can supply the hidden parameter `evtype` (event type) which is a sub-selection on `evclass`. For a simple analysis, we wish to include all front+back converting events within all PSF and Energy subclasses. This is specified as `evtype` 3 (the default value).

The recommended values for both `evclass` and `evtype` may change as LAT data processing develops.
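
The `evtype` value is a bit mask, which is why "front+back" is written as 3 (1 + 2). The sketch below shows how such a value can be composed. The bit assignments are an assumption based on the Pass 8 event types listed in the Cicerone, so check them against the current documentation before relying on them; the helper `evtype_value` is hypothetical.

```python
# Assumed Pass 8 event-type bits (verify against the Cicerone before use).
EVTYPE_BITS = {
    "FRONT": 1, "BACK": 2,
    "PSF0": 4, "PSF1": 8, "PSF2": 16, "PSF3": 32,
    "EDISP0": 64, "EDISP1": 128, "EDISP2": 256, "EDISP3": 512,
}


def evtype_value(*names):
    """Combine event-type names into a single evtype bit mask."""
    value = 0
    for name in names:
        value |= EVTYPE_BITS[name]
    return value


# Front- plus back-converting events, as used in this example.
print(evtype_value("FRONT", "BACK"))  # 3
```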

Now run [gtselect](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtselect.txt) to select the data you wish to analyze. For this example, we consider the "source class" photons within a 20 degree acceptance cone of the blazar 3C 279. We apply the **gtselect** tool to the data file as follows:

In [5]:
%%bash
gtselect evclass=128 evtype=3
    @./data/events.txt
    ./data/3C279_region_filtered.fits
    193.98
    -5.82
    20
    INDEF
    INDEF
    100
    500000
    90

#### Parameters:
# Input file or files (if multiple files are in a .txt file,
#        don't forget the @ symbol)
# Output file
# RA for new search center
# Dec for new search center
# Radius of the new search region
# Start time (MET in s)
# End time (MET in s)
# Lower energy limit (MeV)
# Upper energy limit (MeV)
# Maximum zenith angle value (degrees)

Input FT1 file[@./data/binned_events.txt]     @./data/events.txt
Output FT1 file[./data/3C279_back_filtered.fits]     ./data/3C279_region_filtered.fits
RA for new search center (degrees) (0:360) [193.98]     193.98
Dec for new search center (degrees) (-90:90) [-5.82]     -5.82
radius of new search region (degrees) (0:180) [15]     20
start time (MET in s) (0:) [239557417]     INDEF
end time (MET in s) (0:) [302572802]     INDEF
lower energy limit (MeV) (0:) [100]     100
upper energy limit (MeV) (0:) [500000]     500000
maximum zenith angle value (degrees) (0:180) [90]     90
Done.


The filtered data will be found in the file `./data/3C279_region_filtered.fits`.
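
The same selection can also be scripted by passing every parameter as `name=value` on the command line, which should avoid the interactive prompts. The sketch below only assembles the argument list; the wrapper functions are hypothetical, and the parameter names (`infile`, `outfile`, `ra`, `dec`, `rad`, `tmin`, `tmax`, `emin`, `emax`, `zmax`) are taken from the gtselect help file, so confirm them against your installed version.

```python
import subprocess


def gtselect_args(infile, outfile, ra, dec, rad, tmin="INDEF", tmax="INDEF",
                  emin=100, emax=500000, zmax=90, evclass=128, evtype=3):
    """Build a gtselect argument list equivalent to the prompted run above."""
    params = {
        "infile": infile, "outfile": outfile,
        "ra": ra, "dec": dec, "rad": rad,
        "tmin": tmin, "tmax": tmax,
        "emin": emin, "emax": emax, "zmax": zmax,
        "evclass": evclass, "evtype": evtype,
    }
    return ["gtselect"] + [f"{name}={value}" for name, value in params.items()]


def run_tool(args):
    """Run the command; only works where the Fermitools are installed."""
    return subprocess.run(args, check=True)


args = gtselect_args("@./data/events.txt", "./data/3C279_region_filtered.fits",
                     ra=193.98, dec=-5.82, rad=20)
print(" ".join(args))
```

Keeping the argument construction separate from `run_tool` makes it easy to inspect or log the exact command before executing it.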

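**Note**: If you don't want to make a selection on a given parameter, just enter a zero (0) as the value. For the start and end times, the example above enters `INDEF` instead, which leaves the time range of the input files unchanged.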

In this step we also selected the maximum zenith angle value as suggested in the [Cicerone](https://fermi.gsfc.nasa.gov/ssc/data/analysis/documentation/Cicerone/Cicerone_Data_Exploration/Data_preparation.html). Gamma-ray photons coming from the Earth limb ("albedo gammas") are a strong source of background. You can minimize this effect with a zenith angle cut. The value of `zmax` = 90 degrees is suggested for reconstructing events above 100 MeV and provides a sufficient buffer between your region of interest (ROI) and the Earth's limb.

In the next step, [gtmktime](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtmktime.txt) will remove any time period for which our ROI overlaps this buffer region. While increasing the buffer (reducing `zmax`) may decrease the background rate from albedo gammas, it will also reduce the amount of time your ROI is completely free of the buffer zone and thus reduce the livetime on the source of interest.

**Notes**:

* The RA and Dec of the search center must exactly match that used in the dataserver selection. If they are not the same, multiple copies of the source position will appear in your prepared data file which will cause later stages of analysis to fail. See "DSS Keywords" below.


* The radius of the search region selected here must lie entirely within the region defined in the dataserver selection. They can be the same values, with no negative effects.


* The time span selected here must lie within the time span defined in the dataserver selection. They can be the same values with no negative effects.


* The energy range selected here must lie within the energy range defined in the dataserver selection. They can be the same values with no negative effects.
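
The four notes above amount to a simple containment check between the new selection and the original data server query. A minimal sketch of that check follows; the `Selection` class and `check_within` function are hypothetical helpers, not part of the Fermitools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    ra: float      # degrees
    dec: float     # degrees
    radius: float  # degrees
    tmin: float    # MET seconds
    tmax: float    # MET seconds
    emin: float    # MeV
    emax: float    # MeV


def check_within(new, original):
    """Return a list of problems; an empty list means the new cuts are consistent."""
    problems = []
    if (new.ra, new.dec) != (original.ra, original.dec):
        problems.append("search center differs from the data server selection")
    if new.radius > original.radius:
        problems.append("radius extends beyond the data server region")
    if new.tmin < original.tmin or new.tmax > original.tmax:
        problems.append("time span extends beyond the data server time span")
    if new.emin < original.emin or new.emax > original.emax:
        problems.append("energy range extends beyond the data server energy range")
    return problems


server = Selection(193.98, -5.82, 20, 239557417, 255398400, 100, 500000)
too_wide = Selection(193.98, -5.82, 25, 239557417, 255398400, 50, 500000)
print(check_within(server, server))    # identical cuts are allowed
print(check_within(too_wide, server))  # radius and energy problems
```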

**BE AWARE**: [gtselect](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtselect.txt) writes descriptions of the data selections to a series of _Data Sub-Space_ (DSS) keywords in the `EVENTS` extension header.

These keywords are used by the exposure-related tools and by [gtlike](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtlike.txt) for calculating various quantities, such as the predicted number of detected events given by the source model. These keywords MUST be the same for all of the filtered event files considered in a given analysis.

[gtlike](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtlike.txt) will check to ensure that all of the DSS keywords are the same in all of the event data files. For a discussion of the DSS keywords see the [Data Sub-Space Keywords page](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/dss_keywords.html).

There are multiple ways to view information about your data file. For example:
* You may obtain the value of start and end time of your file by using the fkeypar tool. This tool is part of the [FTOOLS](http://heasarc.nasa.gov/lheasoft/ftools/ftools_menu.html) software package and is used to read the value of a FITS header keyword and write it to an output parameter file. For more information on  `fkeypar`, type: 

`fhelp fkeypar`

* The [gtvcut](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtvcut.txt) tool can be used to view the DSS keywords in a given extension, where the EVENTS extension is assumed by default. This is an excellent way to find out what selections have already been made on your data file (by either the dataserver or previous runs of gtselect).

    * NOTE: If you wish to view the (very long) list of good time intervals (GTIs), you can use the hidden parameter `suppress_gtis=no` on the command line. The full list of GTIs is suppressed by default.
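
If FTOOLS is not at hand, a header keyword can also be read with a few lines of plain Python, because a FITS header is a sequence of 80-character ASCII cards stored in 2880-byte blocks. The sketch below is a simplified reader for the primary header only; the function name is hypothetical, it assumes the keyword you want is present in the primary header, and it does not handle escaped quotes or continued strings. A full FITS library is the better choice for anything beyond a quick look.

```python
def read_primary_header(path):
    """Return {keyword: value string} for the primary header of a FITS file."""
    cards = {}
    with open(path, "rb") as handle:
        while True:
            block = handle.read(2880)  # headers are stored in 2880-byte blocks
            if len(block) < 2880:
                raise ValueError("END card not found")
            for start in range(0, 2880, 80):  # each card is 80 characters long
                card = block[start:start + 80].decode("ascii", errors="replace")
                keyword = card[:8].strip()
                if keyword == "END":
                    return cards
                if card[8:10] != "= ":
                    continue  # COMMENT, HISTORY, or blank card
                text = card[10:].strip()
                if text.startswith("'"):
                    # Quoted string: keep the text between the quotes.
                    text = text[1:text.index("'", 1)].rstrip()
                else:
                    # Number or logical: drop the trailing "/ comment".
                    text = text.split("/", 1)[0].strip()
                cards[keyword] = text
```

For example, `read_primary_header("./data/3C279_region_filtered.fits").get("TSTART")` would return the start time as a string if that keyword is stored in the primary header.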

## 2. Time Selection with gtmktime

Good Time Intervals (GTIs):

* A GTI is a time range when the data can be considered valid. The GTI extension contains a list of these GTIs for the file. Thus the sum of the entries in the GTI extension of a file corresponds to the time when the data in the file is "good."

* The initial list of GTIs is the set of times that the LAT was collecting data over the time range you selected. The LAT does not collect data while the observatory is transiting the South Atlantic Anomaly (SAA), or during rare events such as software updates or spacecraft maneuvers.

**Notes**:
* Your object will most likely not be in the field of view during the entire time that the LAT was taking data.

* Additional data cuts made with gtmktime will update the GTIs based on the cuts specified in both gtmktime and gtselect.

* The Fermitools use the GTIs when calculating exposure. If the GTIs have not been properly updated, the exposure correction made during science analysis may be incorrect.

[gtmktime](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtmktime.txt) is used to update the GTI extension and make cuts based on spacecraft parameters contained in the spacecraft (pointing and livetime history) file. It reads the spacecraft file and, based on the filter expression and specified cuts, creates a set of GTIs. These are then combined (logical and) with the existing GTIs in the Event data file, and all events outside this new set of GTIs are removed from the file. New GTIs are then written to the GTI extension of the new file.
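
Conceptually, the "logical and" of two GTI lists is an interval intersection, after which only events inside the surviving intervals are kept. The following sketch illustrates the idea with made-up numbers; it is not gtmktime's actual implementation, and both function names are hypothetical.

```python
def intersect_gtis(gtis_a, gtis_b):
    """Intersect two sorted lists of non-overlapping (start, stop) intervals."""
    result = []
    i = j = 0
    while i < len(gtis_a) and j < len(gtis_b):
        start = max(gtis_a[i][0], gtis_b[j][0])
        stop = min(gtis_a[i][1], gtis_b[j][1])
        if start < stop:
            result.append((start, stop))
        # Advance whichever interval ends first.
        if gtis_a[i][1] < gtis_b[j][1]:
            i += 1
        else:
            j += 1
    return result


def filter_events(event_times, gtis):
    """Keep only event times that fall inside one of the GTIs."""
    return [t for t in event_times if any(a <= t <= b for a, b in gtis)]


existing = [(0, 100), (200, 300)]  # GTIs already in the event file (made up)
from_filter = [(50, 250)]          # GTIs derived from the spacecraft file (made up)
combined = intersect_gtis(existing, from_filter)
print(combined)                                         # [(50, 100), (200, 250)]
print(filter_events([10, 75, 150, 225, 275], combined))  # [75, 225]
print(sum(stop - start for start, stop in combined))    # total good time: 100
```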

Cuts can be made on any field in the spacecraft file by adding terms to the filter expression using C-style relational syntax:

    ! -> not, && -> and, || -> or, ==, !=, >, <, >=, <=

    ABS(), COS(), SIN(), etc., also work

>**NOTE**: Every time you specify an additional cut on time, ROI, zenith angle, event class, or event type using [gtselect](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtselect.txt), you must run [gtmktime](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtmktime.txt) to reevaluate the GTI selection.

Several of the cuts made above with **gtselect** will directly affect the exposure. Running [gtmktime](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtmktime.txt) will select the correct GTIs to handle these cuts.

It is also especially important to apply a zenith cut for small ROIs (< 20 degrees), as this brings your source of interest close to the Earth's limb. There are two different methods for handling the complex cut on zenith angle:

* One method is to exclude time intervals where the buffer zone defined by the zenith cut intersects the ROI from the list of GTIs. In order to do that, run [gtmktime](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtmktime.txt) and answer "yes" at the prompt:

```
> gtmktime
...
> Apply ROI-based zenith angle cut [] yes
```

>**NOTE**: If you are studying a very broad region (or the whole sky) you would lose most (all) of your data when you implement the ROI-based zenith angle cut in [gtmktime](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtmktime.txt).
>
>In this case you can allow all time intervals where the cut intersects the ROI, but the intersection lies outside the FOV. To do this, run _gtmktime_ specifying a filter expression defining your analysis region, and answer "no" to the question regarding the ROI-based zenith angle cut:
>
>`> Apply ROI-based zenith angle cut [] no`
>
>In such a filter expression, the placeholders RA_of_center_ROI, DEC_of_center_ROI and radius_ROI correspond to the ROI selection made with gtselect, zenith_cut is 90 degrees (as above), and limb_angle_minus_FOV is the zenith angle of the horizon (113 degrees) minus the FOV radius.

* Alternatively, you can apply the zenith cut to the livetime calculation while running [gtltcube](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtltcube.txt). This is the method that is currently recommended by the LAT team (see the [Livetimes and Exposure](https://fermi.gsfc.nasa.gov/ssc/data/analysis/documentation/Cicerone/Cicerone_Likelihood/Exposure.html) section of the [Cicerone](https://fermi.gsfc.nasa.gov/ssc/data/analysis/documentation/Cicerone/)), and is the method we will use most commonly in these analysis threads. To do this, answer "no" at the [gtmktime](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtmktime.txt) prompt:

`> Apply ROI-based zenith angle cut [] no`

You'll then need to specify a value for gtltcube's `zmax` parameter when calculating the livetime cube:

`> gtltcube zmax=90`
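
To make the geometry of the ROI-based cut in the first method concrete: a time interval is kept only when the whole ROI lies at zenith angles below `zmax`, that is, when the angular separation between the zenith direction and the ROI center plus the ROI radius does not exceed `zmax`. The sketch below expresses that condition in Python. The exact inequality used inside gtmktime is an assumption here, the function names are hypothetical, and the zenith positions in the example are made up.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_sep = (math.sin(dec1) * math.sin(dec2)
               + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))


def roi_clear_of_zenith_cut(ra_zenith, dec_zenith, ra_roi, dec_roi, roi_radius, zmax):
    """True when every point of the ROI has a zenith angle of at most zmax."""
    separation = angular_separation(ra_zenith, dec_zenith, ra_roi, dec_roi)
    return separation + roi_radius <= zmax


# ROI from this example: center (193.98, -5.82), radius 20 degrees, zmax 90 degrees.
print(roi_clear_of_zenith_cut(193.98, 44.18, 193.98, -5.82, 20, 90))  # zenith 50 deg away
print(roi_clear_of_zenith_cut(193.98, 74.18, 193.98, -5.82, 20, 90))  # zenith 80 deg away
```

In the first case the far edge of the ROI sits at a zenith angle of about 70 degrees and the interval would be kept; in the second it reaches about 100 degrees and the interval would be dropped.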

[gtmktime](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtmktime.txt) also provides the ability to exclude periods when some event has negatively affected the quality of the LAT data. To do this, we select good time intervals (GTIs) by using a logical filter for any of the [quantities in the spacecraft file](https://fermi.gsfc.nasa.gov/ssc/data/analysis/documentation/Cicerone/Cicerone_Data/LAT_Data_Columns.html#SpacecraftFile). Some possible quantities for filtering data are:

* `DATA_QUAL` - quality flag set by the LAT instrument team (1 = ok, 2 = waiting review, 3 = good with bad parts, 0 = bad)

* `LAT_CONFIG` - instrument configuration (0 = not recommended for analysis, 1 = science configuration)

* `ROCK_ANGLE` - can be used to eliminate pointed observations from the dataset.

>**NOTE**: A history of the rocking profiles that have been used by the LAT can be found in the [SSC's LAT observations page.](https://fermi.gsfc.nasa.gov/ssc/observations/types/allsky/)

The current [gtmktime](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtmktime.txt) filter expression recommended by the LAT team is:

`(DATA_QUAL>0)&&(LAT_CONFIG==1)`

>**NOTE**: The `DATA_QUAL` parameter can be set to different values, depending on the type of object and analysis the user is interested in (see the Cicerone for the most up-to-date description of the parameter's values). Typically, setting the parameter to 1 is the best option. For GRB analysis, by contrast, the parameter should be set to ">0".
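
To illustrate what such a filter does, the sketch below applies the recommended expression to a handful of made-up spacecraft-file rows and merges the contiguous rows that pass into time intervals. The column names `START`, `STOP`, `DATA_QUAL` and `LAT_CONFIG` are those of the spacecraft file; the row values, the function names, and the merging logic are illustrative assumptions rather than gtmktime's real implementation.

```python
def gtis_from_rows(rows, keep):
    """Merge contiguous rows that pass the keep() test into (start, stop) intervals."""
    gtis = []
    for row in sorted(rows, key=lambda r: r["START"]):
        if not keep(row):
            continue
        if gtis and gtis[-1][1] == row["START"]:
            # Row starts where the previous interval ended: extend it.
            gtis[-1] = (gtis[-1][0], row["STOP"])
        else:
            gtis.append((row["START"], row["STOP"]))
    return gtis


def recommended(row):
    """Python spelling of (DATA_QUAL>0)&&(LAT_CONFIG==1)."""
    return row["DATA_QUAL"] > 0 and row["LAT_CONFIG"] == 1


# Made-up 30-second rows for illustration only.
rows = [
    {"START": 0, "STOP": 30, "DATA_QUAL": 1, "LAT_CONFIG": 1},
    {"START": 30, "STOP": 60, "DATA_QUAL": 1, "LAT_CONFIG": 1},
    {"START": 60, "STOP": 90, "DATA_QUAL": 0, "LAT_CONFIG": 1},
    {"START": 90, "STOP": 120, "DATA_QUAL": 1, "LAT_CONFIG": 0},
    {"START": 120, "STOP": 150, "DATA_QUAL": 2, "LAT_CONFIG": 1},
]
print(gtis_from_rows(rows, recommended))  # [(0, 60), (120, 150)]
```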

Here is an example of running [gtmktime](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtmktime.txt) on the 3C 279 filtered events file. For convenience, we rename the spacecraft file to  `spacecraft.fits`.

In [6]:
!mv ./data/L1506091032539665347F73_SC00.fits ./data/spacecraft.fits

Now, we run **gtmktime**:

In [7]:
%%bash
gtmktime
    ./data/spacecraft.fits
    (DATA_QUAL>0)&&(LAT_CONFIG==1)
    no
    ./data/3C279_region_filtered.fits
    ./data/3C279_region_filtered_gti.fits
    
#### Parameters specified above are:
# Spacecraft file
# Filter expression
# Apply ROI-based zenith angle cut
# Event data file
# Output event file name

Spacecraft data file[./data/L181126210218F4F0ED2738_SC00.fits]     ./data/spacecraft.fits
Filter expression[(DATA_QUAL>0)&&(LAT_CONFIG==1)]     (DATA_QUAL>0)&&(LAT_CONFIG==1)
Apply ROI-based zenith angle cut[no]     no
Event data file[./data/3C279_back_filtered.fits]     ./data/3C279_region_filtered.fits
Output event file name[./data/3C279_back_filtered_gti.fits]     ./data/3C279_region_filtered_gti.fits
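
When gtmktime is scripted with `name=value` arguments instead of prompts, the filter expression needs quoting, because `&&`, `>` and parentheses are special characters in the shell. A minimal sketch using Python's `shlex.quote` is shown below. The helper function is hypothetical, and the parameter names (`scfile`, `filter`, `roicut`, `evfile`, `outfile`) are taken from the gtmktime help file, so confirm them against your installed version.

```python
import shlex


def gtmktime_command(scfile, evfile, outfile,
                     filter_expr="(DATA_QUAL>0)&&(LAT_CONFIG==1)", roicut="no"):
    """Build a shell-safe gtmktime command line equivalent to the prompted run above."""
    params = [
        ("scfile", scfile),
        ("filter", filter_expr),
        ("roicut", roicut),
        ("evfile", evfile),
        ("outfile", outfile),
    ]
    # shlex.quote adds quotes only where the shell would otherwise misread the text.
    return "gtmktime " + " ".join(shlex.quote(f"{name}={value}") for name, value in params)


command = gtmktime_command("./data/spacecraft.fits",
                           "./data/3C279_region_filtered.fits",
                           "./data/3C279_region_filtered_gti.fits")
print(command)
```

The printed string can be pasted into a terminal or a `%%bash` cell on a system where the Fermitools are installed.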


In [8]:
!ls ./data/

3C279_region_filtered_gti.fits    L1506091032539665347F73_PH00.fits
3C279_region_filtered.fits        L1506091032539665347F73_PH01.fits
events.txt                        spacecraft.fits


The filtered event file, [3C279_region_filtered_gti.fits](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/data/dataPreparation/3C279_region_filtered_gti.fits), output from [gtmktime](https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/help/gtmktime.txt), can be downloaded from the Fermi SSC site.

After the data preparation, it is advisable to examine your data before beginning detailed analysis. The [Explore LAT data](https://github.com/fermi-lat/AnalysisThreads/blob/master/DataSelection/3.ExploreLATData/explore_latdata.ipynb) tutorial has suggestions on methods of getting a quick preview of your data.