# Solar Eclipses

What is the average duration of total darkness during a total solar eclipse? When did the longest solar eclipse occur?

The file `solar-eclipses.csv` lists the solar eclipses from 1901 to 2100. Future eclipses are included because scientists can accurately predict their timing and location.

A solar eclipse happens when the Moon moves between the Earth and the Sun, blocking the Sun's light either partially or completely. There are three primary types of solar eclipses:

1. Total Solar Eclipse: The Moon fully covers the Sun, casting a shadow on Earth and revealing the Sun's corona.

2. Partial Solar Eclipse: The Moon obscures only part of the Sun, creating a crescent-shaped appearance.

3. Annular Solar Eclipse: The Moon covers the center of the Sun, leaving a ring-like appearance, known as a "ring of fire," around the edges. This occurs when the Moon is too far from Earth to completely cover the Sun.

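The `duration` variable gives the length of the total or annular phase, that is, how long the Moon covers the entire Sun (total) or its center (annular). It is empty (`NaN`) for partial eclipses.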


In [1]:
# FOR GOOGLE COLAB ONLY.
# Uncomment and run the code below. A dialog will appear to upload files.
# Upload 'solar-eclipses.csv'.

# from google.colab import files
# uploaded = files.upload()

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', 1000)
df = pd.read_csv('solar-eclipses.csv')
df

Unnamed: 0,date,eclipse_type,magnitude,duration,region
0,05-18-1901,Total,1.068,06m29s,"s Asia, Australia, e Africa [Total: Indonesia, N Guinea, Madagascar]"
1,11-11-1901,Annular,0.922,11m01s,"ne Africa, Asia, w Europe [Annular: ne Africa, India, Sri Lanka, se Asia]"
2,04-08-1902,Partial,0.064,,northern Canada
3,05-07-1902,Partial,0.859,,"New Zealand, South Pacific"
4,10-31-1902,Partial,0.696,,"c Asia, e Europe"
...,...,...,...,...,...
439,10-24-2098,Partial,0.006,,Southern Ocean (near Antarctica)
440,03-21-2099,Annular,0.932,07m32s,"Australia, N.Z., Antarctica, N. America [Annular: Central Pacific]"
441,09-14-2099,Total,1.068,05m18s,"Americas, Africa [Total: Canada, U.S., Central Atlantic]"
442,03-10-2100,Annular,0.934,07m29s,"Australia, N. America [Annular: Central Pacific, U.S.]"


### Project Ideas:

- When did the longest solar eclipse occur? The longest total eclipse?
	- Hint: convert duration to seconds. You can use `str.replace('s', '')` to remove the 's' from the duration values.

- What is the average duration of total solar eclipses? 

- What are the next 10 solar eclipses?
	- Hint: convert date to datetime.
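
The duration hint above can also be applied to the whole column at once instead of row by row. The sketch below is one possible approach, not the notebook's own solution: it uses a small hypothetical sample `Series` (not the real file) and assumes every non-missing value has the `MMmSSs` shape seen in the table.

```python
import pandas as pd

# Hypothetical sample mirroring the `duration` column (missing for partial eclipses).
sample = pd.Series(["06m29s", "11m01s", None, "02m18s"])

# Capture minutes and seconds in two named groups; rows that do not match become missing.
parts = sample.str.extract(r"^(?P<minutes>\d+)m(?P<seconds>\d+)s$")

# Convert the captured text to numbers and combine into total seconds.
duration_seconds = pd.to_numeric(parts["minutes"]) * 60 + pd.to_numeric(parts["seconds"])
print(duration_seconds)
```

Missing durations stay missing in the result, so `idxmax()` and `mean()` skip them by default.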


In [8]:
# YOUR CODE HERE (add additional cells as needed)

# When did the longest solar eclipse occur? The longest total eclipse?
# Hint: convert duration to seconds. You can use `str.replace('s', '')` to remove the 's' from the duration values.

import pandas as pd
import numpy as np

# Load data
df = pd.read_csv("solar-eclipses.csv")

# Function to convert eclipse duration to seconds
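def parse_duration(duration):
    """Convert a string such as '06m29s' to seconds; return NaN if missing or malformed."""
    if pd.isnull(duration):
        return np.nan
    try:
        minutes, seconds = duration.rstrip('s').split('m')
        return int(minutes) * 60 + int(seconds)
    except (AttributeError, ValueError):
        # Non-string input or an unexpected format
        return np.nan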

# Apply the conversion
df['duration_seconds'] = df['duration'].apply(parse_duration)


In [9]:
# Longest eclipse overall

longest = df.loc[df['duration_seconds'].idxmax()]
print("Longest eclipse overall:\n", longest[['date', 'eclipse_type', 'duration']])

Longest eclipse overall:
 date            12-14-1955
eclipse_type       Annular
duration            12m09s
Name: 126, dtype: object


In [10]:
# Longest total eclipse

total_df = df[df['eclipse_type'] == 'Total']
longest_total = total_df.loc[total_df['duration_seconds'].idxmax()]
print("Longest total eclipse:\n", longest_total[['date', 'duration']])

Longest total eclipse:
 date        06-20-1955
duration        07m08s
Name: 125, dtype: object


In [11]:
# Average duration of total eclipses

avg_total_duration = total_df['duration_seconds'].mean()
print("Average total eclipse duration (seconds):", avg_total_duration)

Average total eclipse duration (seconds): 211.96296296296296
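
The average is easier to read in the same minutes-and-seconds form as the `duration` column. A minimal sketch of that conversion, using the value printed above (the names `avg_seconds` and `formatted` are illustrative, not from the notebook):

```python
avg_seconds = 211.96296296296296  # value printed by the cell above

# Round to the nearest whole second, then split into minutes and seconds.
minutes, seconds = divmod(round(avg_seconds), 60)
formatted = f"{minutes}m{seconds:02d}s"
print(formatted)  # 3m32s
```

So the average total solar eclipse in this dataset lasts about 3 minutes 32 seconds.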


In [12]:
# Next 10 eclipses

df['date'] = pd.to_datetime(df['date'], format='%m-%d-%Y')
df_sorted = df.sort_values('date')

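# Normalize to midnight so an eclipse occurring today is not excluded by the time of day
today = pd.Timestamp.today().normalize()
next_10 = df_sorted[df_sorted['date'] >= today].head(10)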
print(next_10[['date', 'eclipse_type', 'duration']])

          date eclipse_type duration
275 2025-09-21      Partial      NaN
276 2026-02-17      Annular   02m20s
277 2026-08-12        Total   02m18s
278 2027-02-06      Annular   07m51s
279 2027-08-02        Total   06m23s
280 2028-01-26      Annular   10m27s
281 2028-07-22        Total   05m10s
282 2029-01-14      Partial      NaN
283 2029-06-12      Partial      NaN
284 2029-07-11      Partial      NaN
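
Because the filter compares against today's date, the table above changes depending on when the cell is run. One way to make the result reproducible is to filter against a fixed reference date instead. The sketch below uses a few dates taken from the tables above and a hypothetical `reference` date chosen purely for illustration.

```python
import pandas as pd

# A few dates from the dataset, in the file's MM-DD-YYYY format.
dates = pd.to_datetime(
    pd.Series(["05-18-1901", "09-21-2025", "02-17-2026", "08-12-2026"]),
    format="%m-%d-%Y",
)

# Hypothetical fixed reference date, used instead of pd.Timestamp.today().
reference = pd.Timestamp("2025-09-01")

# Keep eclipses on or after the reference date, earliest first.
upcoming = dates[dates >= reference].sort_values().head(2)
print(upcoming)
```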
