### Exercise 1: Collect data related to meteor showers

<font size="3"> **Unit 4 of 10** </font>

<font size="2">The step-by-step process for **Exercise 1** can be found here: https://learn.microsoft.com/en-us/training/modules/predict-meteor-showers-using-python/4-collect-data</font>

-----

<font size="3"> **References/Vocabs:** </font>
<font size="2"><p>
Now it's time to get data ready to create your prediction model. Remember, ideally a meteor expert would guide this step. But even without an expert, we can make a best guess at what data would help us identify the best date to see a meteor shower.

Before we gather data, it's important to identify the kind of data that we want to find. We know a few things:

- Meteoroids that melt off of comets typically cause meteor showers.
- Comets have an orbit around the sun that's observable and predictable.
- A bright Moon makes a meteor shower harder to see.
- The orbit and spin of Earth affect where a meteor shower can be seen from Earth.

<font size="3"> **> Select comets to focus on** </font>
<font size="2"><p>
Although meteoroids can come from comets, asteroids, moons, and planets, this module focuses on meteoroids that come from popular comets. We often use four comets *__(Comet Halley, Comet Swift-Tuttle, Comet Thatcher, Comet Tempel-Tuttle)__* to predict when and where meteor showers will be visible.  </p></font>

<font size="3"> **> Data files** </font>
<font size="2"><p>
We've started to gather some data for the example in this module. On your own, try to find other data that you can use to explore the predictions of meteor showers. For example, you can create new data files containing data for the current year or future years.

Here's the data we've already gathered:

- **moonphases.csv** - This file contains the Moon phases for every day of 2020. The missing data will be added in the next unit. (Data acquired from timeanddate.com) <p>
- **meteorshowers.csv** - This file contains data for each of the five meteor showers that we described earlier. Data includes their preferred viewing month, the months when they're visible, and the preferred hemisphere for viewing. (Data acquired from NASA) <p>
- **constellations.csv** - This file contains data for the five constellations that are radiants for the five meteor showers. Data includes the latitudes for which they're visible and the month for the best viewing. (Data acquired from Wikipedia.)<p>
- **cities.csv** - This file contains a list of country/regional capitals/major cities and their associated latitudes. (Data acquired from Wikipedia)  </p></font>

<font size="3"> **> Other data to consider** </font>
<font size="2"><p>
This module focuses on the four data files. But you **_can also gather other types of data that might affect the likelihood of viewing a meteor shower:_**

- Weather
- Other comets or known meteors
- City light pollution
 </p></font>

### Exercise 2: Cleanse meteor data

<font size="3"> **Unit 5 of 10** </font>

<font size="2">The step-by-step process for **Exercise 2** can be found here: https://learn.microsoft.com/en-us/training/modules/predict-meteor-showers-using-python/5-prep-data</font>

-----

#### Exercise 2.1

In [63]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

#### Exercise 2.2

In [64]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

meteor_showers.info()
meteor_showers.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Name                  5 non-null      object
 1   Radiant               5 non-null      object
 2   Best_Month            5 non-null      object
 3   Start_Month           5 non-null      object
 4   Start_Day             5 non-null      int64 
 5   End_Month             5 non-null      object
 6   End_Day               5 non-null      int64 
 7   Hemisphere            5 non-null      object
 8   Preferred_Hemisphere  5 non-null      object
dtypes: int64(2), object(7)
memory usage: 492.0+ bytes


Unnamed: 0,Name,Radiant,Best_Month,Start_Month,Start_Day,End_Month,End_Day,Hemisphere,Preferred_Hemisphere
0,Lyrids,Lyra,april,april,21,april,22,Northern,Northern
1,Eta Aquarids,Aquarius,may,april,19,may,28,"Northern, Southern",Southern
2,Orionids,Orion,october,october,2,november,7,"Northern, Southern","Northern, Southern"
3,Perseids,Perseus,august,july,14,august,24,Northern,Northern
4,Leonids,Leo,november,november,6,november,30,"Northern, Southern","Northern, Southern"


#### Exercise 2.3

In [65]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

moon_phases.info()
moon_phases.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 366 entries, 0 to 365
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Month          366 non-null    object
 1   Day            366 non-null    int64 
 2   Moon_Phase     50 non-null     object
 3   Special_Event  10 non-null     object
dtypes: int64(1), object(3)
memory usage: 11.6+ KB


Unnamed: 0,Month,Day,Moon_Phase,Special_Event
0,january,1,,
1,january,2,first quarter,
2,january,3,,
3,january,4,,
4,january,5,,


#### Exercise 2.4

In [66]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

constellations.info()
constellations.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Constellation   5 non-null      object
 1   Best_Month      5 non-null      object
 2   Latitude_Start  5 non-null      int64 
 3   Latitude_End    5 non-null      int64 
 4   Best_Time       5 non-null      object
 5   Hemisphere      5 non-null      object
dtypes: int64(2), object(4)
memory usage: 372.0+ bytes


Unnamed: 0,Constellation,Best_Month,Latitude_Start,Latitude_End,Best_Time,Hemisphere
0,Lyra,august,90,-40,21:00,Northern
1,Aquarius,october,65,-90,21:00,Southern
2,Orion,january,85,-75,21:00,Northern
3,Perseus,december,90,-35,21:00,Northern
4,Leo,april,90,65,21:00,Northern


#### Exercise 2.5

In [67]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

cities.info()
cities.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 256 entries, 0 to 255
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   City      256 non-null    object 
 1   Latitude  256 non-null    float64
 2   Country   256 non-null    object 
dtypes: float64(1), object(2)
memory usage: 6.1+ KB


Unnamed: 0,City,Latitude,Country
0,Abu Dhabi,24.47,United Arab Emirates
1,Abuja,9.07,Nigeria
2,Accra,5.55,Ghana
3,Adamstown,-25.07,Pitcairn Islands
4,Addis Ababa,9.02,Ethiopia


#### Exercise 2.6

<font size="3"> **Convert to numbers** </font>
<font size="2"><p>
We can see from the calls to **_head()_** *that a lot of information is written in words (strings) instead of numbers (integers).* Some data makes sense as strings, like city names or meteor shower names. But other data makes more sense as integers, like months or Moon phases.

You can quickly convert the month columns to numbers:

1. Create a map of months to numbers. We can see from the output of head() that the months are all lowercase.
2. Apply the map of months to the columns that contain months.
3. Save the result to the DataFrame.

</p></font>

In [68]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

months = {'january':1, 'february':2, 'march':3, 'april':4, 'may':5, 'june':6, 'july':7, 'august':8, 'september':9, 'october':10, 'november':11, 'december':12}
meteor_showers.Best_Month = meteor_showers.Best_Month.map(months)
meteor_showers.Start_Month = meteor_showers.Start_Month.map(months)
meteor_showers.End_Month = meteor_showers.End_Month.map(months)
moon_phases.Month = moon_phases.Month.map(months)
constellations.Best_Month = constellations.Best_Month.map(months)

meteor_showers.info()
meteor_showers.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Name                  5 non-null      object
 1   Radiant               5 non-null      object
 2   Best_Month            5 non-null      int64 
 3   Start_Month           5 non-null      int64 
 4   Start_Day             5 non-null      int64 
 5   End_Month             5 non-null      int64 
 6   End_Day               5 non-null      int64 
 7   Hemisphere            5 non-null      object
 8   Preferred_Hemisphere  5 non-null      object
dtypes: int64(5), object(4)
memory usage: 492.0+ bytes


Unnamed: 0,Name,Radiant,Best_Month,Start_Month,Start_Day,End_Month,End_Day,Hemisphere,Preferred_Hemisphere
0,Lyrids,Lyra,4,4,21,4,22,Northern,Northern
1,Eta Aquarids,Aquarius,5,4,19,5,28,"Northern, Southern",Southern
2,Orionids,Orion,10,10,2,11,7,"Northern, Southern","Northern, Southern"
3,Perseids,Perseus,8,7,14,8,24,Northern,Northern
4,Leonids,Leo,11,11,6,11,30,"Northern, Southern","Northern, Southern"
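
<font size="2">As a minimal sketch of the same mapping step on a toy Series, the month dictionary can also be generated instead of typed by hand. The sample values and the use of the standard-library **calendar** module are illustrative assumptions and are not part of the module; the sketch also assumes the default English locale.</font>

```python
import calendar

import pandas as pd

# Build {'january': 1, ..., 'december': 12} from the standard library.
# calendar.month_name[0] is an empty string, so it is skipped.
# Assumption: the default (English) locale is in effect.
months = {name.lower(): number
          for number, name in enumerate(calendar.month_name)
          if name}

# Hypothetical sample column, lowercase like the month columns in the CSV files.
sample = pd.Series(['april', 'may', 'october'])

# Series.map looks up each value in the dictionary.
month_numbers = sample.map(months)

print(month_numbers.tolist())
```

<font size="2">Typing the dictionary by hand, as in the cell above, gives the same result and avoids any dependence on locale.</font>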


#### Exercise 2.7

<font size="2"> Before you continue, *__convert months and days in the meteor_showers DataFrame to a type called datetime__*, which tracks dates.

These columns will contain a month and day in 2020: </font>

In [69]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

months = {'january':1, 'february':2, 'march':3, 'april':4, 'may':5, 'june':6, 'july':7, 'august':8, 'september':9, 'october':10, 'november':11, 'december':12}
meteor_showers.Best_Month = meteor_showers.Best_Month.map(months)
meteor_showers.Start_Month = meteor_showers.Start_Month.map(months)
meteor_showers.End_Month = meteor_showers.End_Month.map(months)
moon_phases.Month = moon_phases.Month.map(months)
constellations.Best_Month = constellations.Best_Month.map(months)

meteor_showers['Start Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.Start_Month * 100 + meteor_showers.Start_Day, format = '%Y%m%d')
meteor_showers['End Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.End_Month * 100 + meteor_showers.End_Day, format = '%Y%m%d')

meteor_showers.head()

Unnamed: 0,Name,Radiant,Best_Month,Start_Month,Start_Day,End_Month,End_Day,Hemisphere,Preferred_Hemisphere,Start Date,End Date
0,Lyrids,Lyra,4,4,21,4,22,Northern,Northern,2020-04-21,2020-04-22
1,Eta Aquarids,Aquarius,5,4,19,5,28,"Northern, Southern",Southern,2020-04-19,2020-05-28
2,Orionids,Orion,10,10,2,11,7,"Northern, Southern","Northern, Southern",2020-10-02,2020-11-07
3,Perseids,Perseus,8,7,14,8,24,Northern,Northern,2020-07-14,2020-08-24
4,Leonids,Leo,11,11,6,11,30,"Northern, Southern","Northern, Southern",2020-11-06,2020-11-30
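
<font size="2">The integer arithmetic inside **pd.to_datetime** can be checked on a single row. The sketch below uses a hypothetical one-row DataFrame and also shows an alternative, assembling the date from year, month, and day columns, which is an assumption about how one might write it and not the module's approach.</font>

```python
import pandas as pd

# Hypothetical single-row frame mirroring the Start_Month / Start_Day columns.
df = pd.DataFrame({'Start_Month': [4], 'Start_Day': [21]})

# 2020 * 10000 + 4 * 100 + 21 == 20200421, which '%Y%m%d' reads as 2020-04-21.
as_int = 2020 * 10000 + df.Start_Month * 100 + df.Start_Day
as_date = pd.to_datetime(as_int, format='%Y%m%d')

# Alternative: let pandas assemble the date from named year/month/day columns.
parts = pd.DataFrame({'year': 2020, 'month': df.Start_Month, 'day': df.Start_Day})
alt_date = pd.to_datetime(parts)

print(as_int.iloc[0])
print(as_date.iloc[0].date())
print(alt_date.iloc[0].date())
```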


#### Exercise 2.8

In [70]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
cities = pd.read_csv('data/cities.csv')
constellations = pd.read_csv('data/constellations.csv')

months = {'january':1, 'february':2, 'march':3, 'april':4, 'may':5, 'june':6, 'july':7, 'august':8, 'september':9, 'october':10, 'november':11, 'december':12}
meteor_showers.Best_Month = meteor_showers.Best_Month.map(months)
meteor_showers.Start_Month = meteor_showers.Start_Month.map(months)
meteor_showers.End_Month = meteor_showers.End_Month.map(months)
moon_phases.Month = moon_phases.Month.map(months)
constellations.Best_Month = constellations.Best_Month.map(months)

meteor_showers['Start Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.Start_Month * 100 + meteor_showers.Start_Day, format = '%Y%m%d')
meteor_showers['End Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.End_Month * 100 + meteor_showers.End_Day, format = '%Y%m%d')
moon_phases['Date'] = pd.to_datetime(2020 * 10000 + moon_phases.Month * 100 + moon_phases.Day, format = '%Y%m%d')

meteor_showers.head()
moon_phases.head()


Unnamed: 0,Month,Day,Moon_Phase,Special_Event,Date
0,1,1,,,2020-01-01
1,1,2,first quarter,,2020-01-02
2,1,3,,,2020-01-03
3,1,4,,,2020-01-04
4,1,5,,,2020-01-05


#### Exercise 2.9

In [71]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
cities = pd.read_csv('data/cities.csv')
constellations = pd.read_csv('data/constellations.csv')

months = {'january':1, 'february':2, 'march':3, 'april':4, 'may':5, 'june':6, 'july':7, 'august':8, 'september':9, 'october':10, 'november':11, 'december':12}
meteor_showers.Best_Month = meteor_showers.Best_Month.map(months)
meteor_showers.Start_Month = meteor_showers.Start_Month.map(months)
meteor_showers.End_Month = meteor_showers.End_Month.map(months)
moon_phases.Month = moon_phases.Month.map(months)
constellations.Best_Month = constellations.Best_Month.map(months)

meteor_showers['Start Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.Start_Month * 100 + meteor_showers.Start_Day, format = '%Y%m%d')
meteor_showers['End Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.End_Month * 100 + meteor_showers.End_Day, format = '%Y%m%d')

moon_phases['Date'] = pd.to_datetime(2020 * 10000 + moon_phases.Month * 100 + moon_phases.Day, format = '%Y%m%d')

hemispheres = {'northern':0, 'southern':1, 'northern, southern':3}
meteor_showers.Hemisphere = meteor_showers.Hemisphere.map(hemispheres)
constellations.Hemisphere = constellations.Hemisphere.map(hemispheres)

meteor_showers.head()
constellations.head()

Unnamed: 0,Constellation,Best_Month,Latitude_Start,Latitude_End,Best_Time,Hemisphere
0,Lyra,8,90,-40,21:00,
1,Aquarius,10,65,-90,21:00,
2,Orion,1,85,-75,21:00,
3,Perseus,12,90,-35,21:00,
4,Leo,4,90,65,21:00,
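
<font size="2">The Hemisphere column in the output above is empty. That is consistent with how **Series.map** behaves: a value with no exactly matching dictionary key becomes NaN, and the keys here are lowercase while the CSV values shown earlier are capitalized. The sketch below reproduces this on a hypothetical sample and shows one possible remedy, lowercasing the values before mapping; treat it as an illustration, not as the module's code.</font>

```python
import pandas as pd

hemispheres = {'northern': 0, 'southern': 1, 'northern, southern': 3}

# Hypothetical sample using the capitalization seen in the CSV output.
sample = pd.Series(['Northern', 'Southern', 'Northern, Southern'])

# No key matches exactly (the lookup is case-sensitive), so every result is NaN.
unmatched = sample.map(hemispheres)

# Lowercasing the values first lets the same dictionary match.
matched = sample.str.lower().map(hemispheres)

print(unmatched.isna().all())
print(matched.tolist())
```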


#### Exercise 2.10

In [72]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
cities = pd.read_csv('data/cities.csv')
constellations = pd.read_csv('data/constellations.csv')

months = {'january':1, 'february':2, 'march':3, 'april':4, 'may':5, 'june':6, 'july':7, 'august':8, 'september':9, 'october':10, 'november':11, 'december':12}
meteor_showers.Best_Month = meteor_showers.Best_Month.map(months)
meteor_showers.Start_Month = meteor_showers.Start_Month.map(months)
meteor_showers.End_Month = meteor_showers.End_Month.map(months)
moon_phases.Month = moon_phases.Month.map(months)
constellations.Best_Month = constellations.Best_Month.map(months)

meteor_showers['Start Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.Start_Month * 100 + meteor_showers.Start_Day, format = '%Y%m%d')
meteor_showers['End Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.End_Month * 100 + meteor_showers.End_Day, format = '%Y%m%d')

moon_phases['Date'] = pd.to_datetime(2020 * 10000 + moon_phases.Month * 100 + moon_phases.Day, format = '%Y%m%d')

hemispheres = {'northern':0, 'southern':1, 'northern, southern':3}
meteor_showers.Hemisphere = meteor_showers.Hemisphere.map(hemispheres)
constellations.Hemisphere = constellations.Hemisphere.map(hemispheres)

phases = {'new moon':0,'third quarter':0.5, 'first quarter':0.5,'full moon':1.0}
moon_phases['Percentage'] = moon_phases.Moon_Phase.map(phases)

moon_phases.head()

Unnamed: 0,Month,Day,Moon_Phase,Special_Event,Date,Percentage
0,1,1,,,2020-01-01,
1,1,2,first quarter,,2020-01-02,0.5
2,1,3,,,2020-01-03,
3,1,4,,,2020-01-04,
4,1,5,,,2020-01-05,


#### Exercise 2.11
<font size="2">**Remove unnecessary data.** Some of the data from these .csv files isn't useful.</font>

In [73]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
cities = pd.read_csv('data/cities.csv')
constellations = pd.read_csv('data/constellations.csv')

months = {'january':1, 'february':2, 'march':3, 'april':4, 'may':5, 'june':6, 'july':7, 'august':8, 'september':9, 'october':10, 'november':11, 'december':12}
meteor_showers.Best_Month = meteor_showers.Best_Month.map(months)
meteor_showers.Start_Month = meteor_showers.Start_Month.map(months)
meteor_showers.End_Month = meteor_showers.End_Month.map(months)
moon_phases.Month = moon_phases.Month.map(months)
constellations.Best_Month = constellations.Best_Month.map(months)

meteor_showers['Start Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.Start_Month * 100 + meteor_showers.Start_Day, format = '%Y%m%d')
meteor_showers['End Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.End_Month * 100 + meteor_showers.End_Day, format = '%Y%m%d')

moon_phases['Date'] = pd.to_datetime(2020 * 10000 + moon_phases.Month * 100 + moon_phases.Day, format = '%Y%m%d')

hemispheres = {'northern':0, 'southern':1, 'northern, southern':3}
meteor_showers.Hemisphere = meteor_showers.Hemisphere.map(hemispheres)
constellations.Hemisphere = constellations.Hemisphere.map(hemispheres)

phases = {'new moon':0,'third quarter':0.5, 'first quarter':0.5,'full moon':1.0}
moon_phases['Percentage'] = moon_phases.Moon_Phase.map(phases)

meteor_showers = meteor_showers.drop(['Start_Month','Start_Day','End_Month', 'End_Day', 'Hemisphere'], axis = 1)
moon_phases = moon_phases.drop(['Month','Day','Moon_Phase','Special_Event'], axis = 1)
constellations = constellations.drop(['Best_Time'], axis = 1)

moon_phases.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 366 entries, 0 to 365
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Date        366 non-null    datetime64[ns]
 1   Percentage  50 non-null     float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 5.8 KB


#### Exercise 2.12
<font size="2">You see that the cycle of the Moon phases goes from 0 to 0.5 to 1 to 0.5 and then back to 0. So, you could conceivably make every value between 0 and 0.5 be 0.25. And you could make every value between 0.5 and 1 be 0.75.

You could get more detailed by figuring out a more accurate percentage on your own:

1. Create a variable to save the last phase that you saw.

2. Loop through each row in the **_moon_phases_** DataFrame.

3. If the value in the percentage column of a row is NaN (null), then replace it with the last phase that you saw.

4. If the value isn't NaN, then save the value as the last phase that you saw.

5. Show the info for the **_moon_phases_** DataFrame: </font>

In [74]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
cities = pd.read_csv('data/cities.csv')
constellations = pd.read_csv('data/constellations.csv')

months = {'january':1, 'february':2, 'march':3, 'april':4, 'may':5, 'june':6, 'july':7, 'august':8, 'september':9, 'october':10, 'november':11, 'december':12}
meteor_showers.Best_Month = meteor_showers.Best_Month.map(months)
meteor_showers.Start_Month = meteor_showers.Start_Month.map(months)
meteor_showers.End_Month = meteor_showers.End_Month.map(months)
moon_phases.Month = moon_phases.Month.map(months)
constellations.Best_Month = constellations.Best_Month.map(months)

meteor_showers['Start Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.Start_Month * 100 + meteor_showers.Start_Day, format = '%Y%m%d')
meteor_showers['End Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.End_Month * 100 + meteor_showers.End_Day, format = '%Y%m%d')

moon_phases['Date'] = pd.to_datetime(2020 * 10000 + moon_phases.Month * 100 + moon_phases.Day, format = '%Y%m%d')

hemispheres = {'northern':0, 'southern':1, 'northern, southern':3}
meteor_showers.Hemisphere = meteor_showers.Hemisphere.map(hemispheres)
constellations.Hemisphere = constellations.Hemisphere.map(hemispheres)

phases = {'new moon':0,'third quarter':0.5, 'first quarter':0.5,'full moon':1.0}
moon_phases['Percentage'] = moon_phases.Moon_Phase.map(phases)

meteor_showers = meteor_showers.drop(['Start_Month', 'Start_Day', 'End_Month', 'End_Day', 'Hemisphere'], axis = 1)
moon_phases = moon_phases.drop(['Month', 'Day', 'Moon_Phase', 'Special_Event'], axis = 1)
constellations = constellations.drop(['Best_Time'], axis = 1)

lastPhase = 0

for index, row in moon_phases.iterrows():
    if pd.isnull(row['Percentage']):
        moon_phases.at[index, 'Percentage'] = lastPhase
    else:
        lastPhase = row['Percentage']

moon_phases.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 366 entries, 0 to 365
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Date        366 non-null    datetime64[ns]
 1   Percentage  366 non-null    float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 5.8 KB
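
<font size="2">The loop above is a forward fill: each missing value takes the last known phase, and rows before the first known phase take the starting value of 0. As a sketch on a hypothetical six-value Series, pandas can express the same idea with **ffill()** followed by **fillna(0)**; this is an alternative offered for comparison, not the module's method.</font>

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Percentage column: mostly NaN, a few known phases.
percentage = pd.Series([np.nan, 0.5, np.nan, np.nan, 1.0, np.nan])

# Loop version, following the same steps as the cell above.
looped = percentage.copy()
last_phase = 0
for index, value in percentage.items():
    if pd.isnull(value):
        looped.at[index] = last_phase
    else:
        last_phase = value

# Vectorized version: carry the last known value forward, then use 0 for
# any leading rows that come before the first known phase.
filled = percentage.ffill().fillna(0)

print(looped.tolist())
print(filled.tolist())
```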


<font size="2">Now your data is cleansed and ready to be analyzed!</font>

### Exercise 3:  Write a predictor function - Part 1

<font size="3"> **Unit 6 of 10** </font>

<font size="2">The step-by-step process for **Exercise 3** can be found here: https://learn.microsoft.com/en-us/training/modules/predict-meteor-showers-using-python/6-start-search</font>

-----
<font size="2">
Now that you've cleaned up your datasets, you can begin to create a function that you'll use to make your prediction.<p>

But first, make sure you know exactly what you want to predict: In a given city, on what date would you most likely see which meteor showers?<p>

This module introduces a simplified way to examine data. Because it makes few actual predictions, our method works much like a complex lookup table. You can later expand on the model with data like weather to make it more like a classical machine learning model.</font>

#### Exercise 3.1

<font size = "3">**Write the prediction function**</font><p>
<font size = "2">_Let's review our four datasets:_</font>

In [75]:
moon_phases.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 366 entries, 0 to 365
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Date        366 non-null    datetime64[ns]
 1   Percentage  366 non-null    float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 5.8 KB


In [76]:
meteor_showers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   Name                  5 non-null      object        
 1   Radiant               5 non-null      object        
 2   Best_Month            5 non-null      int64         
 3   Preferred_Hemisphere  5 non-null      object        
 4   Start Date            5 non-null      datetime64[ns]
 5   End Date              5 non-null      datetime64[ns]
dtypes: datetime64[ns](2), int64(1), object(3)
memory usage: 372.0+ bytes


In [77]:
constellations.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Constellation   5 non-null      object 
 1   Best_Month      5 non-null      int64  
 2   Latitude_Start  5 non-null      int64  
 3   Latitude_End    5 non-null      int64  
 4   Hemisphere      0 non-null      float64
dtypes: float64(1), int64(3), object(1)
memory usage: 332.0+ bytes


In [78]:
cities.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 256 entries, 0 to 255
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   City      256 non-null    object 
 1   Latitude  256 non-null    float64
 2   Country   256 non-null    object 
dtypes: float64(1), object(2)
memory usage: 6.1+ KB


<font size = "2">The function that we write needs to:

1. Determine the latitude of a city.
2. Use that latitude to figure out which constellations are visible to that city.
3. Use the constellations to determine which meteor showers are visible to that city.
4. Use the meteor showers to determine the dates that they're visible.
5. Use the dates to find the optimal date that has the least amount of light from the Moon.

**_Use these steps to build your function._**</font>

#### Exercise 3.2

In [79]:
def predict_best_meteor_shower_viewing(city):
    # Get the latitude of the city from the cities DataFrame
    latitude = cities.loc[cities['City'] == city,'Latitude'].iloc[0]

#### Exercise 3.3

<font size = "4">**cities['City'] == '[name of city]'**</font>

<font size = "2">The **cities['City'] == '[name of city]'** line of code creates a Series of True and False values, one for each row. **True** *__will be on the row where the city is equal to the city that's passed in as a parameter.__*</font>

In [80]:
def predict_best_meteor_shower_viewing(city):
    # Get the latitude of the city from the cities DataFrame
    latitude = cities.loc[cities['City'] == city, 'Latitude'].iloc[0]

print(cities['City'] == 'Abuja')

0      False
1       True
2      False
3      False
4      False
       ...  
251    False
252    False
253    False
254    False
255    False
Name: City, Length: 256, dtype: bool


#### Exercise 3.4

<font size = "4">**cities.loc[cities['City'] == '[name of city]']**</font>

<font size = "2">The **cities.loc[cities['City'] == '[name of city]']** line of code returns the rows where the preceding true or false value is **True**. **_In this case, only one row is returned because our cities DataFrame has one row for each city._**</font>

In [81]:
def predict_best_meteor_shower_viewing(city):
    # Get the latitude of the city from the cities DataFrame
    latitude = cities.loc[cities['City'] == city, 'Latitude'].iloc[0]

print(cities.loc[cities['City'] == 'Manila'])

       City  Latitude      Country
126  Manila     14.58  Philippines


#### Exercise 3.5

<font size = "4">**cities.loc[cities['City'] == '[name of city]', 'Latitude']**</font>

<font size = "2">The **cities.loc[cities['City'] == '[name of city]', 'Latitude']** line of code **_returns only the Latitude column for the matching row._** It doesn't return the entire row.</font>

In [82]:
def predict_best_meteor_shower_viewing(city):
    # Get the latitude of the city from cities DataFrame
    latitude = cities.loc[cities['City'] == city, 'Latitude'].iloc[0]

print(cities.loc[cities['City'] == 'Cairo', 'Latitude'])

44    30.05
Name: Latitude, dtype: float64


#### Exercise 3.6

<font size = "4">**latitude = cities.loc[cities['City'] == '[name of city]', 'Latitude'].iloc[0]**</font>

<font size = "2">Finally, *__the entire line of code uses .iloc[0] to return the first value of that column by position__*, which gives a single number instead of a Series:</font>

In [83]:
def predict_best_meteor_shower_viewing(city):
    # Get the latitude of the city from cities DataFrame
    latitude = cities.loc[cities['City'] == city, 'Latitude'].iloc[0]

print(cities.loc[cities['City'] == 'Manila', 'Latitude'].iloc[0])

14.58
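
<font size="2">The four stages walked through above can be seen side by side on a small table. This is a sketch on a hypothetical three-row stand-in for the cities DataFrame (the name **toy_cities** is an assumption; the latitudes are copied from the outputs above).</font>

```python
import pandas as pd

# Hypothetical three-row stand-in for the cities DataFrame.
toy_cities = pd.DataFrame({
    'City': ['Abuja', 'Manila', 'Cairo'],
    'Latitude': [9.07, 14.58, 30.05],
})

mask = toy_cities['City'] == 'Manila'        # Boolean Series, one value per row
rows = toy_cities.loc[mask]                  # DataFrame with the matching rows
column = toy_cities.loc[mask, 'Latitude']    # Series holding only the latitude
value = column.iloc[0]                       # first element by position, a scalar

print(mask.tolist())
print(len(rows))
print(value)
```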


#### Exercise 3.7

<font size = "3">**Call the function**</font>

<font size = "2">Now that you have a value, test your function to make sure it's working as you expect it to. Return the current value, and then call the function:</font>

In [84]:
def predict_best_meteor_shower_viewing(city):
    # Get the latitude of the city from cities DataFrame
    latitude = cities.loc[cities['City'] == city, 'Latitude'].iloc[0]

    return latitude

In [85]:
def predict_best_meteor_shower_viewing(city):
    # Get the latitude of the city from cities DataFrame
    latitude = cities.loc[cities['City'] == city, 'Latitude'].iloc[0]

    return latitude

print(predict_best_meteor_shower_viewing('London'))

51.5
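
<font size="2">One case worth testing is a city that isn't in the table: the filter then returns an empty Series, and **.iloc[0]** raises an IndexError. The sketch below shows this on a hypothetical one-row table, along with a hypothetical helper, **lookup_latitude**, that returns None instead. It is one possible way to guard the lookup, not the module's own solution.</font>

```python
import pandas as pd

# Hypothetical one-row stand-in for the cities DataFrame.
toy_cities = pd.DataFrame({'City': ['London'], 'Latitude': [51.5]})

# Without a guard, an unknown city leaves nothing for .iloc[0] to return.
try:
    toy_cities.loc[toy_cities['City'] == 'Atlantis', 'Latitude'].iloc[0]
    raised = False
except IndexError:
    raised = True


def lookup_latitude(city):
    # Hypothetical helper: return None when the city is absent.
    matches = toy_cities.loc[toy_cities['City'] == city, 'Latitude']
    if matches.empty:
        return None
    return matches.iloc[0]


print(raised)
print(lookup_latitude('London'))
print(lookup_latitude('Atlantis'))
```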


### Exercise 4:  Write a predictor function - Part 2

<font size="3"> **Unit 7 of 10** </font>

<font size="2">The step-by-step process for **Exercise 4** can be found here: https://learn.microsoft.com/en-us/training/modules/predict-meteor-showers-using-python/7-continue-search</font>

-----
<font size="2"> As a reminder, we're following these steps to *__find the optimal date to view meteor showers in a particular capital/major city:__*

1. Determine the latitude of the city.
2. Use that latitude to figure out which constellations are visible to that city.
3. Use the constellations to determine which meteor showers are visible to that city.
4. Use the meteor showers to determine the dates that they're visible.
5. Use the dates to find the optimal date that has the least amount of light from the Moon.

</font>

#### Exercise 4.1

<font size = "3">**Use latitude to determine constellation**</font>

<font size = "2">Now that we have a city latitude, the next step is to *__use the latitude to determine which constellations are viewable in the city.__*</font>

In [87]:
def predict_best_meteor_shower_viewing(city):
    # Get the latitude of the city from the cities DataFrame
    latitude = cities.loc[cities['City'] == city, 'Latitude'].iloc[0]

    # Get the list of constellations that are viewable from that latitude
    constellation_list = constellations.loc[(constellations['Latitude_Start'] >= latitude) & (constellations['Latitude_End'] <= latitude), 'Constellation'].tolist()

In [86]:
def predict_best_meteor_shower_viewing(city):
    # Get the latitude of the city from the cities DataFrame
    latitude = cities.loc[cities['City'] == city, 'Latitude'].iloc[0]

    # Get the list of constellations that are viewable from that latitude
    constellation_list = constellations.loc[(constellations['Latitude_Start'] >= latitude) & (constellations['Latitude_End'] <= latitude), 'Constellation'].tolist()
    
    return constellation_list

print(predict_best_meteor_shower_viewing('Manila'))

['Lyra', 'Aquarius', 'Orion', 'Perseus']
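
<font size="2">The filter keeps a constellation when the city's latitude lies between Latitude_End and Latitude_Start. Each comparison is wrapped in parentheses because **&** binds more tightly than **>=** and **<=** in Python. A sketch on two rows copied from the constellations output earlier (the name **toy** is an assumption) shows why Leo is missing from the list for Manila:</font>

```python
import pandas as pd

# Two rows copied from the constellations output earlier in the notebook.
toy = pd.DataFrame({
    'Constellation': ['Lyra', 'Leo'],
    'Latitude_Start': [90, 90],
    'Latitude_End': [-40, 65],
})

latitude = 14.58  # Manila

# Both conditions must hold: Latitude_End <= latitude <= Latitude_Start.
in_range = (toy['Latitude_Start'] >= latitude) & (toy['Latitude_End'] <= latitude)
visible = toy.loc[in_range, 'Constellation'].tolist()

print(in_range.tolist())
print(visible)
```

<font size="2">Leo's Latitude_End of 65 is greater than 14.58, so the second condition is False and the row is dropped.</font>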


#### Exercise 4.2

<font size = "3">**Create an output string**</font>

<font size = "2">Before continuing through the data dive, **_create a string that will contain all of the meteor showers viewable from that city. Include the best dates to view the meteor showers._**<p>

At this point, we can also account for the fact that we aren't representing all cities or all constellations. So some user inputs could result in errors. To the top of your function, add the following conditional statement:</font>