# Automated Exoplanet Survey Pipeline

This notebook contains a complete, automated pipeline to search for exoplanet candidates in data from NASA's Kepler Space Telescope. The process involves:
1.  Defining a list of target stars.
2.  Looping through each star to download, clean, and prepare its light curve data.
3.  Running a Box Least Squares (BLS) periodogram to search for periodic, box-shaped transit signals.
4.  Identifying signals that cross a statistical detection threshold.
5.  Saving the results (period, signal strength) for promising candidates to a CSV file.
6.  Automatically generating and saving discovery plots (periodogram and folded transit) for each candidate.
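
To make the transit-search step concrete, here is a simplified brute-force box search in NumPy. It is a sketch of the idea only, not the routine the pipeline runs (lightkurve's BLS periodogram relies on astropy's `BoxLeastSquares`). The function name `box_search`, the single fixed duration, the scoring formula, and all synthetic values are illustrative assumptions.

```python
import numpy as np

def box_search(time, flux, periods, duration, n_phase=50):
    """For each trial period, return the best box-shaped dip significance.

    The light curve is folded at the trial period and a box of width
    `duration` is slid across phase. The score is the mean in-box flux
    deficit divided by its standard error.
    """
    residual = flux - np.mean(flux)
    sigma = np.std(residual)
    scores = np.zeros(len(periods))
    for i, period in enumerate(periods):
        phase = (time % period) / period      # phase in [0, 1)
        width = duration / period             # box width in phase units
        for start in np.linspace(0.0, 1.0, n_phase, endpoint=False):
            in_box = ((phase - start) % 1.0) < width
            n_in = in_box.sum()
            if n_in == 0:
                continue
            depth = -residual[in_box].mean()  # positive when the box is a dip
            scores[i] = max(scores[i], depth * np.sqrt(n_in) / sigma)
    return scores

# Synthetic light curve: 30 days, 1% deep transits every 2.5 days (made-up values)
rng = np.random.default_rng(42)
time = np.arange(0.0, 30.0, 0.02)
flux = 1.0 + rng.normal(0.0, 0.001, time.size)
flux[(time % 2.5) < 0.1] -= 0.01

periods = np.linspace(1.0, 5.0, 401)
scores = box_search(time, flux, periods, duration=0.1)
best_period = periods[np.argmax(scores)]
print(f"Best trial period: {best_period:.2f} days")
```

The score peaks where the injected transits line up in phase. Harmonics of the true period also produce peaks, but weaker ones, which is why periodogram plots are worth inspecting by eye.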

## Step 1: Setup - Importing Libraries

First, we import all the necessary tools for our survey.

In [5]:
import lightkurve as lk
import numpy as np
import matplotlib.pyplot as plt
import csv
import os

print("Libraries imported successfully.")

Libraries imported successfully.


## Step 2: Survey Configuration

Here, we define the parameters for our survey. We'll specify our list of target stars, the names for our output files, and the statistical threshold for what we consider a significant detection.

In [6]:
# Let's pick a few known Kepler planet hosts to test our pipeline
target_stars = ['Kepler-10', 'Kepler-8', 'Kepler-4', 'Kepler-7']

output_csv_file = 'survey_candidates.csv'
output_plot_dir = 'candidate_plots'

# Normalized power is the BLS power divided by the standard deviation of the
# power spectrum (computed in Step 4). Only peaks above 15 are recorded as candidates.
DETECTION_THRESHOLD = 15

print("Survey parameters configured.")
print(f"Target stars: {target_stars}")
print(f"Detection threshold (Normalized Power): > {DETECTION_THRESHOLD}")

Survey parameters configured.
Target stars: ['Kepler-10', 'Kepler-8', 'Kepler-4', 'Kepler-7']
Detection threshold (Normalized Power): > 15
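
The normalized power behind this threshold is the BLS power divided by the standard deviation of the whole power spectrum, as computed in Step 4. The sketch below applies that normalization to a synthetic power array using only NumPy. The helper `peak_normalized_power` and the synthetic numbers are illustrative assumptions, not part of the pipeline.

```python
import numpy as np

DETECTION_THRESHOLD = 15  # same value as the survey configuration

def peak_normalized_power(power):
    """Return (index, value) of the highest peak after dividing by the spectrum's standard deviation."""
    power = np.asarray(power, dtype=float)
    normalized = power / np.std(power)
    peak_index = int(np.argmax(normalized))
    return peak_index, float(normalized[peak_index])

# Synthetic periodogram: low-level noise plus one strong peak (made-up numbers)
rng = np.random.default_rng(0)
synthetic_power = rng.uniform(0.0, 1.0, size=1000)
synthetic_power[400] = 20.0

peak_index, peak_value = peak_normalized_power(synthetic_power)
print(f"Peak at index {peak_index}, normalized power {peak_value:.1f}")
print("Above threshold:", peak_value > DETECTION_THRESHOLD)
```

Because a strong peak inflates the standard deviation it is divided by, this ratio is best read as a relative ranking of signals rather than a calibrated false-alarm probability.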


## Step 3: Initializing Output Files

Before we start the survey, we'll create the directory where our plots will be saved and initialize the CSV file with a header row.

In [7]:
if not os.path.exists(output_plot_dir):
    os.makedirs(output_plot_dir)
    print(f"Created directory: '{output_plot_dir}'")
else:
    print(f"Directory '{output_plot_dir}' already exists.")

with open(output_csv_file, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Star_Name', 'Planet_Period_days', 'Signal_Power_Normalized'])

print(f"Results will be saved to '{output_csv_file}'")

Directory 'candidate_plots' already exists.
Results will be saved to 'survey_candidates.csv'


## Step 4: The Main Survey Loop

This is the core of the pipeline. The cell below loops over each star in `target_stars`, downloads up to the first five Kepler light curve files, stitches and detrends them, and searches trial periods from 0.5 to 20 days. Expect it to take several minutes, since it downloads and processes a significant amount of data.
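
The detrending step in the cell below converts a window of about one day into a number of cadences for `flatten`. The sketch shows that conversion on a synthetic time array. `window_length_from_cadence` is a hypothetical helper, and bumping an even result up to the next odd number is an assumption based on `flatten` documenting `window_length` as a positive odd integer.

```python
import numpy as np

def window_length_from_cadence(time_days, window_days=1.0):
    """Convert a window in days to an odd number of cadences."""
    cadence = np.median(np.diff(time_days))  # typical spacing between samples, in days
    n_cadences = int(window_days / cadence)
    if n_cadences % 2 == 0:
        n_cadences += 1                      # move up to the next odd integer
    return n_cadences

# Kepler long-cadence sampling is roughly 29.4 minutes (synthetic, gap-free time stamps)
time_days = np.arange(0.0, 30.0, 29.4 / (60 * 24))
window_length = window_length_from_cadence(time_days)
print(f"Window length: {window_length} cadences")
```

A one-day window is short compared with most stellar variability but only a few times longer than the longest trial transit duration, so it may partly suppress long transits. Widening `window_days` is one way to trade detrending strength against that risk.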

In [8]:
print(f"Starting the survey. Processing {len(target_stars)} stars...")

for star_name in target_stars:
    print(f"\n------------------\nAnalyzing: {star_name}")
    try:
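        # Search for Kepler light curves and download up to the first five files
        search_result = lk.search_lightcurve(star_name, author='Kepler')
        lc_collection = search_result[0:5].download_all()
        
        # Stitch the files into one light curve and clip outliers
        lc_raw = lc_collection.stitch().remove_outliers()
        # Detrend with a ~1-day window; flatten() expects an odd number of cadences
        cadence = np.median(np.diff(lc_raw.time.value))
        window_length = int(1 / cadence)
        if window_length % 2 == 0:
            window_length += 1
        lc = lc_raw.flatten(window_length=window_length)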
        print(f"Data ready for {star_name}.")

        # Trial periods from 0.5 to 20 days; trial transit durations of 0.05 and 0.3 days
        periods = np.linspace(0.5, 20, 10000)
        bls = lc.to_periodogram(method="bls", period=periods, duration=[0.05, 0.3])
        
        # Scale the power by its standard deviation so one threshold applies to every star
        power_normalized = bls.power / np.std(bls.power)
        max_power = np.max(power_normalized)
        best_period = bls.period_at_max_power
        
        print(f"Analysis complete for {star_name}. Max signal power: {max_power:.2f}")

        if max_power > DETECTION_THRESHOLD:
            print(f"STRONG CANDIDATE FOUND for {star_name}!")
            # best_period is a Quantity; format its bare value so the unit is not printed twice
            print(f"Period: {best_period.value:.4f} days, Power: {max_power:.2f}")

            with open(output_csv_file, 'a', newline='') as f:
                writer = csv.writer(f)
                writer.writerow([star_name, f"{best_period.value:.4f}", f"{max_power:.2f}"])
            
            # Save the periodogram plot
            plt.figure()
            plt.plot(bls.period, power_normalized)
            plt.title(f"Periodogram for {star_name}")
            plt.xlabel("Period (days)")
            plt.ylabel("Normalized BLS Power")
            plt.savefig(os.path.join(output_plot_dir, f'{star_name}_periodogram.png'))
            plt.close() # Close the plot to save memory

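            # Save the folded light curve plot, centered on the best-fit transit time.
            # Plotting onto an explicit axis avoids leaving an empty figure open.
            fig, ax = plt.subplots()
            folded_lc = lc.fold(period=best_period, epoch_time=bls.transit_time_at_max_power)
            folded_lc.plot(ax=ax)
            ax.set_title(f"Folded Transit for {star_name} (P={best_period.value:.2f}d)")
            fig.savefig(os.path.join(output_plot_dir, f'{star_name}_folded.png'))
            plt.close(fig)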
            
            print(f"Saved results and plots for {star_name}.")
        else:
            print(f"Signal for {star_name} is below threshold. Moving on.")

    except Exception as e:
        print(f"Could not process {star_name}. Reason: {e}")

print("\n------------------\nSurvey complete!")

Starting the survey. Processing 4 stars...

------------------
Analyzing: Kepler-10
Data ready for Kepler-10.


`period` contains 204459 points.Periodogram is likely to be large, and slow to evaluate. Consider setting `frequency_factor` to a higher value.


Analysis complete for Kepler-10. Max signal power: 60.82
STRONG CANDIDATE FOUND for Kepler-10!
Period: 0.8374 d days, Power: 60.82
Saved results and plots for Kepler-10.

------------------
Analyzing: Kepler-8
Data ready for Kepler-8.
Analysis complete for Kepler-8. Max signal power: 52.69
STRONG CANDIDATE FOUND for Kepler-8!
Period: 3.5228 d days, Power: 52.69
Saved results and plots for Kepler-8.

------------------
Analyzing: Kepler-4
Data ready for Kepler-4.
Analysis complete for Kepler-4. Max signal power: 28.48
STRONG CANDIDATE FOUND for Kepler-4!
Period: 3.2127 d days, Power: 28.48
Saved results and plots for Kepler-4.

------------------
Analyzing: Kepler-7


`period` contains 204458 points.Periodogram is likely to be large, and slow to evaluate. Consider setting `frequency_factor` to a higher value.


Data ready for Kepler-7.
Analysis complete for Kepler-7. Max signal power: 43.39
STRONG CANDIDATE FOUND for Kepler-7!
Period: 4.8860 d days, Power: 43.39
Saved results and plots for Kepler-7.

------------------
Survey complete!
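
The folded plots saved by the loop rely on phase folding: every time stamp is mapped to its offset from the nearest transit center, so that all transits stack on top of each other. The sketch below shows that mapping in plain NumPy. The helper `fold_phase` and the example epoch are illustrative assumptions. The period is the Kepler-10 value reported above.

```python
import numpy as np

def fold_phase(time, period, epoch):
    """Map times to phase in days, in [-period/2, period/2), with `epoch` at phase zero."""
    return (time - epoch + 0.5 * period) % period - 0.5 * period

period = 0.8374   # days, the Kepler-10 period found by the survey
epoch = 10.0      # hypothetical mid-transit time, for illustration only
time = np.array([10.0, 10.0 + period, 11.0, 10.0 - 0.1])

phase = fold_phase(time, period, epoch)
print(np.round(phase, 4))
```

Times that differ by a whole number of periods land on the same phase, which is why a real periodic dip sharpens after folding while uncorrelated noise averages down.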



## Step 5: Review the Results

The survey is finished! You can now check the output files:

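1.  **`survey_candidates.csv`**: Open this file (with Excel, Google Sheets, or a text editor) to see a table of the candidate signals the pipeline flagged, with each star's best period and normalized signal power.
2.  **`candidate_plots/` directory**: Look inside this folder to see the periodogram and folded transit plots for each flagged star. Use them for visual vetting: a genuine transit should show a clear dip in the folded light curve, and a strong periodogram peak alone is not proof of a planet.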
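
As an alternative to opening the CSV by hand, the results can be read back in Python. This is a minimal sketch that assumes the column names written in Step 3. The helper `load_candidates` is hypothetical and sorts the rows by signal strength.

```python
import csv
import os

def load_candidates(csv_path):
    """Read the survey CSV and return (star, period_days, power) tuples, strongest first."""
    with open(csv_path, newline='') as f:
        rows = [
            (row['Star_Name'],
             float(row['Planet_Period_days']),
             float(row['Signal_Power_Normalized']))
            for row in csv.DictReader(f)
        ]
    return sorted(rows, key=lambda r: r[2], reverse=True)

# Only print a summary if the survey has already produced its output file
if os.path.exists('survey_candidates.csv'):
    for star, period, power in load_candidates('survey_candidates.csv'):
        print(f"{star}: P = {period:.4f} d, normalized power = {power:.2f}")
```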