<H1>Downloading Data from ExoFOP</H1>

This notebook is a guide to downloading tables of data from ExoFOP in Python using the `pandas` package. The same approach can feed data into projects built to use TESS data; here we use it as a simple way to visualize the TOI (TESS Object of Interest) data.

We begin by importing the `pandas` package, as well as `matplotlib.pyplot` for visualizations.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

We start by defining the URL that we would like to download from. The URL listed here is for the list of all TOIs and provides columns that are delimited by a pipe, '|'. Because some text fields routinely contain commas, the pipe-delimited format is more robust to read and parse in Python. The URL for a pipe-delimited version of a table on ExoFOP can be found under 'download table', in an option similar to the TOI table's "All rows (pipe)" (red box).

<IMG SRC=notebook_images/download_link.png>

We use the `read_csv` function in pandas to read from this URL and store the result as a pandas dataframe, a <A HREF="https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#dataframe">2-D data structure with labeled columns</A>. As additional parameters, we specify the delimiter and set `index_col=1`, which selects the second column as the index (pandas counts columns from zero). In this table that column is the TOI number, so we can use the TOI number to look up a row of data directly rather than trying to match column values.

By printing out the length of the dataframe, we can check how many TOIs are on ExoFOP. We can also look at all the data for a single TOI by using the TOI as an index, in this case TOI 664.01. This will show all columns that were gathered from the TOI table, and the values in those columns for TOI 664.01.

In [None]:
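# Pipe-delimited download link for the full TOI table
url = "https://exofop.ipac.caltech.edu/tess/download_toi.php?sort=toi&output=pipe"
# index_col=1 uses the second column (the TOI number) as the row index
TOI_df = pd.read_csv(url, delimiter='|', index_col=1)
print("Number of TOIs:", len(TOI_df))
# Look up every column for a single TOI by its index label
print(TOI_df.loc[664.01])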
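
To see what `delimiter='|'` and `index_col=1` do without downloading anything, the sketch below reads a tiny pipe-delimited table from an in-memory string. The rows and values are hypothetical and exist only for illustration; the real TOI table has many more columns, and its exact column order is assumed here rather than checked.

```python
import io
import pandas as pd

# Hypothetical three-row sample mimicking the pipe-delimited layout.
# These values are made up and are not real ExoFOP data.
sample_text = (
    "TIC ID|TOI|TFOPWG Disposition|Period (days)\n"
    "111|9001.01|KP|1.5\n"
    "222|9002.01|FP|4.25\n"
    "333|9003.01||3.75\n"
)

# index_col=1 selects the second column (counting from zero), here 'TOI'.
sample_df = pd.read_csv(io.StringIO(sample_text), delimiter="|", index_col=1)

print(sample_df.index.name)      # name of the column used as the index
print(sample_df.loc[9002.01])    # all columns for one hypothetical TOI
```

The empty disposition in the last row is read as a missing value, which is how blank fields in the downloaded table appear as well.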

Pandas dataframes can also be sliced by column to look at a given property across all TOIs. First, we print out the 'TFOPWG Disposition' column, which lists the index (the TOI number) and the 'TFOPWG Disposition' for each TOI in the table. We can also select a particular TOI by using the TOI number to specify a single item. TOIs that have a blank TFOPWG Disposition show up here as 'NaN'; these generally represent TOIs that are still active candidates and have not yet been identified as either a false positive or a confirmed planet.

We can also use some additional functions of pandas dataframes to explore the 'TFOPWG Disposition' column. The value_counts() function provides a count of how many occurrences of each disposition are in the TOI list. By default, value_counts() ignores any 'NaN' values, but we can include them by setting dropna=False. Generally, TOIs with a 'NaN' disposition can be thought of as having a 'PC' (planet candidate) disposition.

Pandas also has a plot function that can be used in conjunction with the value_counts table to plot the counts as a bar chart.

In this case, we see that most TOIs are active candidates with no disposition set yet, and the majority of TOIs that do have a TFOPWG Disposition are 'FP's, or false positives.

In [None]:
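# Print the disposition column for every TOI (index is the TOI number)
print("All TOI dispositions:", TOI_df['TFOPWG Disposition'])
# Select one value by row label and column name in a single .loc call
print("\nDisposition for TOI 664.01", TOI_df.loc[664.01, 'TFOPWG Disposition'])
# Count each disposition, keeping blank (NaN) entries in the tally
disposition_counts = TOI_df['TFOPWG Disposition'].value_counts(dropna=False)
print(disposition_counts)
disposition_counts.plot(kind='bar')
plt.xlabel('Disposition')
plt.ylabel('Count')
plt.show()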
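
The effect of `dropna=False` is easiest to see on a small example. The sketch below uses a hypothetical series of dispositions (not real ExoFOP values) and also shows one possible way to fold blank entries into 'PC' with `fillna`, under the assumption stated above that blank dispositions can be treated as planet candidates.

```python
import pandas as pd

# Hypothetical dispositions for illustration only; None is a missing value.
dispositions = pd.Series(
    ["FP", "KP", None, "FP", None, None], name="TFOPWG Disposition"
)

# Default behaviour: missing values are left out of the tally.
counts_default = dispositions.value_counts()

# dropna=False: missing values get their own row in the tally.
counts_with_nan = dispositions.value_counts(dropna=False)

# Assumption: treat blank dispositions as planet candidates ('PC').
counts_as_pc = dispositions.fillna("PC").value_counts()

print(counts_default.sum(), counts_with_nan.sum())
print(counts_as_pc)
```

Whether relabeling blanks as 'PC' is appropriate depends on the analysis; keeping them as 'NaN' preserves the distinction between an explicit 'PC' and an unset disposition.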

We can also create 2D plots from the columns of a pandas dataframe using matplotlib.pyplot (which we have imported as plt).

In this case, we make a scatterplot of planet radius against orbital period. Both axes are set to a log scale, and the x-axis is limited to periods between 0.2 and 100 days.

In [None]:
plt.scatter(TOI_df['Period (days)'], TOI_df['Planet Radius (R_Earth)'], alpha=0.5)
plt.xlabel('Period (days)')
plt.ylabel('Planet Radius (R_Earth)')
plt.xscale('log')
plt.yscale('log')
plt.xlim(left=.2, right=100)
plt.show()
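
Because the x-axis limits crop the plot, and rows with a missing period or radius cannot be drawn, it can be useful to check how many rows actually land in view. The sketch below does this on a small hypothetical dataframe that reuses the same column names; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration only, not real ExoFOP data.
sample = pd.DataFrame(
    {
        "Period (days)": [0.1, 1.5, 12.0, 250.0, np.nan],
        "Planet Radius (R_Earth)": [1.2, 2.5, np.nan, 11.0, 4.0],
    }
)

# Keep only rows where both plotted quantities are present.
complete = sample.dropna(subset=["Period (days)", "Planet Radius (R_Earth)"])

# Keep rows whose period lies inside the x-axis limits used in the plot.
in_view = complete[complete["Period (days)"].between(0.2, 100)]

print(len(sample), len(complete), len(in_view))
```

The same two steps could be applied to `TOI_df` before plotting if a count of the displayed points is needed.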