![alt text](../img/explore-space-using-python.svg "Over the moon")
# Predict meteor showers by using Python and Visual Studio Code

In this module, you will learn:

* The basics of meteor showers: what they are and why we see them
* How to choose and collect appropriate data
* Strategies to cleanse and manipulate your data

It's important to identify the kind of data that we want to find. We know a few things:

* Meteor showers are typically caused by meteoroids that are shed by comets.
* Comets have an orbit around the sun that's observable and predictable.
* A bright Moon makes a meteor shower harder to see.
* The orbit and spin of Earth affect where a meteor shower can be seen from Earth.

Although meteoroids can come from comets, asteroids, moons, and planets, this module focuses on meteoroids that come from popular comets.

## Load Libraries


In [46]:
import numpy as np
import pandas as pd

## Import Data

In [47]:
meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases    = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities         = pd.read_csv('data/cities.csv')

## Appending to dataframes

We know that Fei Fei travels to Lunaria after the Moon Festival. Though we don't know exactly how long it takes her to prototype, test, and build a rocket to the Moon, we can make a guess.

The 2020 Chinese Moon Festival was on October 1. Because the rest of the dates that we use in this module are from 2020, let's use that date.

We need data for each of the DataFrames that we reference. So let's start with the meteor shower in the film. For Chang'e's meteor shower, let's choose the Draco constellation because it's where the Draconids meteor shower is likely to radiate from in early October. We'll use that meteor shower as inspiration for our fictional one:

In [48]:
change_meteor_shower = {
    'name'               : 'Chang\'e',
    'radiant'            : 'Draco',
    'bestmonth'          : 'october',
    'startmonth'         : 'october',
    'startday'           : 1,
    'endmonth'           : 'october',
    'endday'             : 31,
    'hemisphere'         : 'northern',
    'preferredhemisphere':'northern'
}

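# DataFrame.append was removed in pandas 2.0, so wrap the row in a DataFrame and concatenate
meteor_showers = pd.concat([meteor_showers, pd.DataFrame([change_meteor_shower])], ignore_index=True)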

In [49]:
draco_constellation = {
    'constellation' : 'Draco',
    'bestmonth'     : 'july',
    'latitudestart' : 90,
    'latitudeend'   : -15,
    'besttime'      : 2100,
    'hemisphere'    : 'northern'
}

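# DataFrame.append was removed in pandas 2.0, so wrap the row in a DataFrame and concatenate
constellations = pd.concat([constellations, pd.DataFrame([draco_constellation])], ignore_index=True)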

## Convert to Numbers
Some data makes sense as strings, like city names or meteor shower names. But other data makes more sense as numbers, like months or Moon phases.

In [50]:
months = {
    'january'   : 1, 
    'february'  : 2, 
    'march'     : 3, 
    'april'     : 4, 
    'may'       : 5,
    'june'      : 6, 
    'july'      : 7, 
    'august'    : 8, 
    'september' : 9, 
    'october'   : 10, 
    'november'  : 11, 
    'december'  : 12
}

meteor_showers.bestmonth  = meteor_showers.bestmonth.map(months)
meteor_showers.startmonth = meteor_showers.startmonth.map(months)
meteor_showers.endmonth   = meteor_showers.endmonth.map(months)
moon_phases.month         = moon_phases.month.map(months)
constellations.bestmonth  = constellations.bestmonth.map(months)

meteor_showers.head()

Unnamed: 0,name,radiant,bestmonth,startmonth,startday,endmonth,endday,hemisphere,preferredhemisphere
0,Lyrids,Lyra,4,4,21,4,22,northern,northern
1,Eta Aquarids,Aquarius,5,4,19,5,28,"northern, southern",southern
2,Orionids,Orion,10,10,2,11,7,"northern, southern","northern, southern"
3,Perseids,Perseus,8,7,14,8,24,northern,northern
4,Leonids,Leo,11,11,6,11,30,"northern, southern","northern, southern"


## Converting Days to Datetime

In [51]:
meteor_showers['startdate'] = pd.to_datetime(2020 * 10000 + meteor_showers.startmonth *100 + meteor_showers.startday, format='%Y%m%d')
meteor_showers['enddate']   = pd.to_datetime(2020 * 10000 + meteor_showers.endmonth   *100 + meteor_showers.endday,   format='%Y%m%d')

moon_phases['date']         = pd.to_datetime(2020 * 10000 + moon_phases.month * 100 + moon_phases.day, format='%Y%m%d')
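
The arithmetic in the cell above packs the year, month, and day into a single eight-digit number that pandas can then parse as a date. The sketch below shows the idea on a small made-up table; the `toy` DataFrame and its values are illustrative assumptions, and the numbers are converted to strings first only to make the parsing step explicit.

```python
import pandas as pd

# Toy stand-in for the month/day columns (illustrative values, not the real data)
toy = pd.DataFrame({'month': [1, 10, 12], 'day': [5, 1, 31]})

# 2020 * 10000 gives 20200000; month * 100 moves the month into the middle
# two digits; the day fills the last two digits (for example, 20201001)
encoded = 2020 * 10000 + toy['month'] * 100 + toy['day']

# '%Y%m%d' tells pandas to read the digits as year, month, day
toy['date'] = pd.to_datetime(encoded.astype(str), format='%Y%m%d')

print(encoded.tolist())
print(toy)
```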

## Mapping Hemispheres

In [52]:
hemispheres = {
    'northern' : 0, 
    'southern' : 1, 
    'northern, southern' : 3
}
meteor_showers.hemisphere = meteor_showers.hemisphere.map(hemispheres)
constellations.hemisphere = constellations.hemisphere.map(hemispheres)

## Convert Moon Phases to Number
Represent each Moon phase as the fraction of the Moon that's visible, from 0 for a new moon to 1.0 for a full moon.

In [53]:
phases = {
    'new moon'      : 0,
    'third quarter' : 0.5, 
    'first quarter' : 0.5,
    'full moon'     : 1.0
}
moon_phases['percentage'] = moon_phases.moonphase.map(phases)
moon_phases.head()

Unnamed: 0,month,day,moonphase,specialevent,date,percentage
0,1,1,,,2020-01-01,
1,1,2,first quarter,,2020-01-02,0.5
2,1,3,,,2020-01-03,
3,1,4,,,2020-01-04,
4,1,5,,,2020-01-05,
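
The empty `percentage` cells in the output above come from how `Series.map` treats values that have no key in the dictionary: they become `NaN`. Here is a minimal sketch of that behavior on a made-up series (the `toy_phases` and `lookup` names are assumptions for illustration only).

```python
import pandas as pd

# Toy phase column: most days have no named phase recorded
toy_phases = pd.Series(['new moon', None, 'first quarter', None, 'full moon'])

lookup = {'new moon': 0, 'first quarter': 0.5, 'full moon': 1.0}

# Values without a matching key (here, the missing entries) map to NaN
mapped = toy_phases.map(lookup)

print(mapped)
print("missing values:", mapped.isna().sum())
```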


## Remove Unnecessary Columns

In [54]:
meteor_showers = meteor_showers.drop([
    'startmonth', 
    'startday', 
    'endmonth', 
    'endday', 
    'hemisphere'
], axis=1)

moon_phases = moon_phases.drop([
    'month',
    'day',
    'moonphase',
    'specialevent'
], axis=1)

constellations = constellations.drop(['besttime'], axis=1)

## Missing Data
The moon_phases DataFrame has many missing percentages.

You see that the cycle of the Moon phases goes from 0 to 0.5 to 1 to 0.5 and then back to 0. So, you could conceivably make every value between 0 and 0.5 be 0.25. And you could make every value between 0.5 and 1 be 0.75.
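
One way to sketch that midpoint idea, assuming a toy series rather than the real data, is to average the previous known value and the next known value for every gap. This is only an illustration of the 0.25 and 0.75 approach described above, not the method used in the rest of this module.

```python
import numpy as np
import pandas as pd

# Toy cycle: known phases with gaps between them (illustrative values)
toy = pd.Series([0.0, np.nan, np.nan, 0.5, np.nan, 1.0, np.nan, 0.5, np.nan, 0.0])

previous_known = toy.ffill()   # last known phase before each gap
next_known = toy.bfill()       # next known phase after each gap

# The midpoint of the surrounding phases: 0.25 between 0 and 0.5,
# and 0.75 between 0.5 and 1
midpoint = (previous_known + next_known) / 2

# Only the missing entries are replaced; known phases are left untouched.
# A gap at the very start or end has no neighbor on one side and stays NaN.
filled = toy.fillna(midpoint)

print(filled.tolist())
```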

You could get more detailed by figuring out a more accurate percentage on your own. For now, fill each missing value with the last phase that you saw:

1. Create a variable to save the last phase that you saw.
1. Loop through each row in the moon_phases DataFrame.
1. If the value in the percentage column of a row is nan (null), then replace it with the last phase that you saw.
1. If the value isn't nan, then save the value as the last phase that you saw.
1. Show the info for the moon_phases DataFrame:

In [55]:
moon_phases.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 366 entries, 0 to 365
Data columns (total 2 columns):
date          366 non-null datetime64[ns]
percentage    50 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 5.8 KB


In [56]:
lastPhase = 0

for index, row in moon_phases.iterrows():
    if pd.isnull(row['percentage']):
        moon_phases.at[index,'percentage'] = lastPhase
    else:
        lastPhase = row['percentage']

moon_phases.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 366 entries, 0 to 365
Data columns (total 2 columns):
date          366 non-null datetime64[ns]
percentage    366 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 5.8 KB
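
The loop above carries the last seen phase forward, starting from 0. As a sketch, the same result can be had without an explicit loop by using pandas' `ffill` followed by `fillna(0)`. The `toy` DataFrame below is a made-up stand-in for moon_phases.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the percentage column (illustrative values)
toy = pd.DataFrame({'percentage': [np.nan, 0.5, np.nan, np.nan, 1.0, np.nan]})

# ffill copies the last known value forward into each gap;
# fillna(0) covers rows before the first known phase, like lastPhase = 0
toy['percentage'] = toy['percentage'].ffill().fillna(0)

print(toy['percentage'].tolist())
```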


# Write a predictor function
Now that you've cleaned up your datasets, you can begin to create a function that you'll use to make your prediction.

But first, make sure you know exactly what you want to predict: In a given city, which meteor showers can you see, and on what date is each one most likely to be visible?

This module introduces a simplified way to examine data. Because it makes few real predictions, our method works much like a complex lookup table. You can later expand on the model with data like weather to make it more like a classical machine learning model.

The function that we write needs to:

1. Determine the latitude of a city.
1. Use that latitude to figure out which constellations are visible to that city.
1. Use the constellations to determine which meteor showers are visible to that city.
1. Use the meteor showers to determine the dates that they're visible.
1. Use the dates to find the optimal date that has the least amount of light from the Moon.

In [29]:
def predict_best_meteor_shower_viewing(city):
    # Get the latitude of the city from the cities DataFrame
    latitude = cities.loc[cities['city'] == city, 'latitude'].iloc[0]
    return latitude

In [30]:
print(predict_best_meteor_shower_viewing('Abu Dhabi'))

24.47


## Use Latitude to Determine Constellations
Now that we have a city latitude, the next step is to use the latitude to determine which constellations are viewable in the city.

```
constellation_list = constellations.loc[
        (constellations['latitudestart'] >= latitude) & 
        (constellations['latitudeend']   <= latitude), 
        'constellation'].tolist()
```
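
To see how the two comparisons combine, here is a small self-contained sketch. The Draco row uses the latitude range from earlier in this module; the second row and the `visible_from` helper are hypothetical and exist only to show a constellation being filtered out.

```python
import pandas as pd

toy_constellations = pd.DataFrame({
    'constellation': ['Draco', 'Hypothetical South'],  # second row is made up
    'latitudestart': [90, 0],     # northernmost latitude where it's visible
    'latitudeend':   [-15, -90],  # southernmost latitude where it's visible
})

def visible_from(latitude):
    """Return the constellations whose latitude range contains `latitude`."""
    # Each comparison yields a boolean Series; & keeps rows where both are True
    in_range = (
        (toy_constellations['latitudestart'] >= latitude) &
        (toy_constellations['latitudeend'] <= latitude)
    )
    return toy_constellations.loc[in_range, 'constellation'].tolist()

print(visible_from(24.47))   # Abu Dhabi's latitude from the earlier output
print(visible_from(-40))
```

The parentheses around each comparison matter, because `&` binds more tightly than `>=` and `<=` in Python.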


In [31]:
def predict_best_meteor_shower_viewing(city):
    # Get the latitude of the city from the cities DataFrame
    latitude = cities.loc[cities['city'] == city, 'latitude'].iloc[0]

    # Get the list of constellations that are viewable from that latitude
    constellation_list = constellations.loc[
        (constellations['latitudestart'] >= latitude) & 
        (constellations['latitudeend']   <= latitude), 
        'constellation'].tolist()

    return constellation_list

In [32]:
print(predict_best_meteor_shower_viewing('Abu Dhabi'))

['Lyra', 'Aquarius', 'Orion', 'Perseus']


## Create Output String

In [57]:
def predict_best_meteor_shower_viewing(city):
    # Create an empty string to return the message back to the user
    meteor_shower_string = ""

    # Check the city column only, so values in other columns can't match by accident
    if city not in cities['city'].values:
        meteor_shower_string = f"Unfortunately, {city} isn't available for a prediction at this time."
        return meteor_shower_string

    # Get the latitude of the city from the cities DataFrame
    latitude = cities.loc[cities['city'] == city, 'latitude'].iloc[0]

    # Get the list of constellations that are viewable from that latitude
    constellation_list = constellations.loc[
        (constellations['latitudestart'] >= latitude) & 
        (constellations['latitudeend']   <= latitude), 
        'constellation'].tolist()

    # If no constellations are viewable, let the user know
    if not constellation_list:
        meteor_shower_string = f"Unfortunately, there are no meteor showers viewable from {city}."
        return meteor_shower_string

    return constellation_list

In [58]:
print(predict_best_meteor_shower_viewing('Oxnard'))

Unfortunately, Oxnard isn't available for a prediction at this time.


## Determine which meteor showers are visible

Meteor showers are often associated with a constellation that's used to indicate where in the sky you should look for the meteor shower. So we can use these constellations to determine which meteor showers are visible.

In any given city, you're likely to see multiple constellations. So for this next part, loop through each of the constellations that were found in the previous step.

In [59]:
def predict_best_meteor_shower_viewing(city):
    # Create an empty string to return the message back to the user
    meteor_shower_string = ""

    # Check the city column only, so values in other columns can't match by accident
    if city not in cities['city'].values:
        meteor_shower_string = f"Unfortunately, {city} isn't available for a prediction at this time."
        return meteor_shower_string

    # Get the latitude of the city from the cities DataFrame
    latitude = cities.loc[cities['city'] == city, 'latitude'].iloc[0]

    # Get the list of constellations that are viewable from that latitude
    constellation_list = constellations.loc[
        (constellations['latitudestart'] >= latitude) & 
        (constellations['latitudeend']   <= latitude), 
        'constellation'].tolist()

    # If no constellations are viewable, let the user know
    if not constellation_list:
        meteor_shower_string = f"Unfortunately, there are no meteor showers viewable from {city}."
        return meteor_shower_string

    meteor_shower_string = f"In {city} you can see the following meteor showers:\n"

    for constellation in constellation_list:
        # Find the meteor shower whose radiant is that constellation
        meteor_shower = meteor_showers.loc[meteor_showers['radiant'] == constellation, 'name'].iloc[0]

        # Find the start and end dates for that meteor shower
        meteor_shower_startdate = meteor_showers.loc[meteor_showers['radiant'] == constellation, 'startdate'].iloc[0]
        meteor_shower_enddate   = meteor_showers.loc[meteor_showers['radiant'] == constellation, 'enddate'].iloc[0]

        # Find the Moon phases for each date within the viewable time frame of that meteor shower
        moon_phases_list = moon_phases.loc[(moon_phases['date'] >= meteor_shower_startdate) & (moon_phases['date'] <= meteor_shower_enddate)]

        # Find the first date where the Moon is the least visible
        best_moon_date = moon_phases_list.loc[moon_phases_list['percentage'].idxmin()]['date']

        # Add that date to the string to report back to the user
        meteor_shower_string += meteor_shower + " is best seen if you look towards the " + constellation + " constellation on " +  best_moon_date.to_pydatetime().strftime("%B %d, %Y") + ".\n"

    return meteor_shower_string

In [60]:
print(predict_best_meteor_shower_viewing('Abu Dhabi'))

In Abu Dhabi you can see the following meteor showers:
Lyrids is best seen if you look towards the Lyra constellation on April 22, 2020.
Eta Aquarids is best seen if you look towards the Aquarius constellation on April 22, 2020.
Orionids is best seen if you look towards the Orion constellation on October 16, 2020.
Perseids is best seen if you look towards the Perseus constellation on July 20, 2020.
Chang'e is best seen if you look towards the Draco constellation on October 16, 2020.



In [61]:
print(predict_best_meteor_shower_viewing('Beijing'))

In Beijing you can see the following meteor showers:
Lyrids is best seen if you look towards the Lyra constellation on April 22, 2020.
Eta Aquarids is best seen if you look towards the Aquarius constellation on April 22, 2020.
Orionids is best seen if you look towards the Orion constellation on October 16, 2020.
Perseids is best seen if you look towards the Perseus constellation on July 20, 2020.
Chang'e is best seen if you look towards the Draco constellation on October 16, 2020.
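
The date selection in the function relies on `idxmin`, which returns the index label of the first minimum value, so ties go to the earliest row. The sketch below shows this on a made-up four-day window; the dates, percentages, and index labels are illustrative assumptions.

```python
import pandas as pd

# Toy slice of moon phases; the index labels mimic rows from a larger table
toy = pd.DataFrame(
    {
        'date': pd.to_datetime(['2020-10-01', '2020-10-02', '2020-10-03', '2020-10-04']),
        'percentage': [1.0, 0.5, 0.0, 0.0],
    },
    index=[274, 275, 276, 277],
)

# idxmin returns an index label, not a position; with a tie it picks the first
darkest_label = toy['percentage'].idxmin()
darkest_date = toy.loc[darkest_label, 'date']

# idxmax is the mirror image: the label of the first maximum
brightest_label = toy['percentage'].idxmax()

print(darkest_label, darkest_date.date())
print(brightest_label)
```

Because the result is a label, it has to be used with `.loc` rather than `.iloc`, which is what the function does.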



## Complying with the Movie
Let's make one last change to our predictive algorithm to align with the film. Fei Fei travels to the Moon when it's big and bright, so for Chang'e's meteor shower we should pick the date when the Moon's percentage is closest to 1. Change the predictive function after you get moon_phases_list and before the return statement:

In [62]:
def predict_best_meteor_shower_viewing(city):
    # Create an empty string to return the message back to the user
    meteor_shower_string = ""

    # Check the city column only, so values in other columns can't match by accident
    if city not in cities['city'].values:
        meteor_shower_string = f"Unfortunately, {city} isn't available for a prediction at this time."
        return meteor_shower_string

    # Get the latitude of the city from the cities DataFrame
    latitude = cities.loc[cities['city'] == city, 'latitude'].iloc[0]

    # Get the list of constellations that are viewable from that latitude
    constellation_list = constellations.loc[
        (constellations['latitudestart'] >= latitude) & 
        (constellations['latitudeend']   <= latitude), 
        'constellation'].tolist()

    # If no constellations are viewable, let the user know
    if not constellation_list:
        meteor_shower_string = f"Unfortunately, there are no meteor showers viewable from {city}."
        return meteor_shower_string

    meteor_shower_string = f"In {city} you can see the following meteor showers:\n"

    for constellation in constellation_list:
        # Find the meteor shower whose radiant is that constellation
        meteor_shower = meteor_showers.loc[meteor_showers['radiant'] == constellation, 'name'].iloc[0]

        # Find the start and end dates for that meteor shower
        meteor_shower_startdate = meteor_showers.loc[meteor_showers['radiant'] == constellation, 'startdate'].iloc[0]
        meteor_shower_enddate   = meteor_showers.loc[meteor_showers['radiant'] == constellation, 'enddate'].iloc[0]

        # Find the Moon phases for each date within the viewable time frame of that meteor shower
        moon_phases_list = moon_phases.loc[(moon_phases['date'] >= meteor_shower_startdate) & (moon_phases['date'] <= meteor_shower_enddate)]

        if meteor_shower == 'Chang\'e':
            # For the film meteor shower, find the date where the Moon is the most visible
            best_moon_date = moon_phases_list.loc[moon_phases_list['percentage'].idxmax()]['date']

            # Add that date to the string to report back to the user
            meteor_shower_string += "Though the Moon will be bright, " + meteor_shower + "'s meteor shower is best seen if you look towards the " + constellation + " constellation on " +  best_moon_date.to_pydatetime().strftime("%B %d, %Y") + ".\n"
        else:
            # Find the first date where the Moon is the least visible
            best_moon_date = moon_phases_list.loc[moon_phases_list['percentage'].idxmin()]['date']

            # Add that date to the string to report back to the user
            meteor_shower_string += meteor_shower + " is best seen if you look towards the " + constellation + " constellation on " +  best_moon_date.to_pydatetime().strftime("%B %d, %Y") + ".\n"

    return meteor_shower_string

In [63]:
print(predict_best_meteor_shower_viewing('Beijing'))

In Beijing you can see the following meteor showers:
Lyrids is best seen if you look towards the Lyra constellation on April 22, 2020.
Eta Aquarids is best seen if you look towards the Aquarius constellation on April 22, 2020.
Orionids is best seen if you look towards the Orion constellation on October 16, 2020.
Perseids is best seen if you look towards the Perseus constellation on July 20, 2020.
Though the Moon will be bright, Chang'e's meteor shower is best seen if you look towards the Draco constellation on October 01, 2020.
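
The "October 01" in the last line comes from the `%d` format code, which always pads the day to two digits. If you would rather see "October 1", one portable option is to build the string from the timestamp's own attributes. This is a small sketch that assumes an English locale for the month name.

```python
import pandas as pd

stamp = pd.Timestamp('2020-10-01')

# %d zero-pads the day of the month
padded = stamp.strftime("%B %d, %Y")

# Using the integer day attribute avoids the padding on every platform
unpadded = f"{stamp.strftime('%B')} {stamp.day}, {stamp.year}"

print(padded)
print(unpadded)
```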

