### Exercise 1: Collect data related to meteor showers

<font size="3"> **Unit 4 of 10** </font>

<font size="2">The step-by-step process for **Exercise 1** can be found here: https://learn.microsoft.com/en-us/training/modules/predict-meteor-showers-using-python/4-collect-data</font>

-----

<font size="3"> **References/Vocabs:** </font>
<font size="2"><p>
Now it's time to get data ready to create your prediction model. Remember, ideally a meteor expert would guide this step. But even without an expert, we can make a best guess at what data would help us identify the best date to see a meteor shower.

Before we gather data, it's important to identify the kind of data that we want to find. We know a few things:

- Meteoroids that melt off of comets typically cause meteor showers.
- Comets have an orbit around the sun that's observable and predictable.
- A bright Moon makes a meteor shower harder to see.
- The orbit and spin of Earth affect where a meteor shower can be seen from Earth.
</p></font>

<font size="3"> **> Select comets to focus on** </font>
<font size="2"><p>
Although meteoroids can come from comets, asteroids, moons, and planets, this module focuses on meteoroids that come from popular comets. We often use four comets *__(Comet Halley, Comet Swift-Tuttle, Comet Thatcher, Comet Tempel-Tuttle)__* to predict when and where meteor showers will be visible.  </p></font>

<font size="3"> **> Data files** </font>
<font size="2"><p>
We've started to gather some data for the example in this module. On your own, try to find other data that you can use to explore the predictions of meteor showers. For example, you can create new data files containing data for the current year or future years.

Here's the data we've already gathered:

- **moonphases.csv** - This file contains the Moon phases for every day of 2020. The missing data will be added in the next unit. (Data acquired from timeanddate.com) <p>
- **meteorshowers.csv** - This file contains data for each of the five meteor showers that we described earlier. Data includes their preferred viewing month, the months when they're visible, and the preferred hemisphere for viewing. (Data acquired from NASA) <p>
- **constellations.csv** - This file contains data for the five constellations that are radiants for the five meteor showers. Data includes the latitudes for which they're visible and the month for the best viewing. (Data acquired from Wikipedia) <p>
- **cities.csv** - This file contains a list of country/regional capitals/major cities and their associated latitudes. (Data acquired from Wikipedia)  </p></font>
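
One way to start a data file for another year, as suggested above, is to generate a blank calendar with the same columns as moonphases.csv and then fill in the phases from a source such as timeanddate.com. The sketch below is only an illustration: the helper name `moon_phase_skeleton` and the output path in the final comment are hypothetical, and the column names are assumed to match the ones reported by `moon_phases.info()` in Exercise 2.3.

```python
import pandas as pd


def moon_phase_skeleton(year: int) -> pd.DataFrame:
    """Return one row per day of `year` with empty Moon_Phase and Special_Event columns."""
    days = pd.date_range(start=f"{year}-01-01", end=f"{year}-12-31", freq="D")
    return pd.DataFrame({
        # Lowercase English month names, matching the style of the 2020 file
        "Month": days.month_name().str.lower(),
        "Day": days.day,
        # Left empty on purpose: the phases still have to be looked up
        "Moon_Phase": None,
        "Special_Event": None,
    })


skeleton = moon_phase_skeleton(2021)
print(len(skeleton))
print(skeleton.head(3))

# Hypothetical output path; uncomment to write the file.
# skeleton.to_csv("data/moonphases_2021.csv", index=False)
```

The skeleton only provides the calendar. The Moon phase values themselves are not generated here and must come from a real data source.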

<font size="3"> **> Other data to consider** </font>
<font size="2"><p>
This module focuses on the four data files. But you **_can also gather other types of data that might affect the likelihood of viewing a meteor shower:_**

- Weather
- Other comets or known meteors
- City light pollution
 </p></font>

### Exercise 2: Cleanse meteor data

<font size="3"> **Unit 5 of 10** </font>

<font size="2">The step-by-step process for **Exercise 2** can be found here: https://learn.microsoft.com/en-us/training/modules/predict-meteor-showers-using-python/5-prep-data</font>

-----

#### Exercise 2.1

In [7]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

#### Exercise 2.2

In [8]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

meteor_showers.info()
meteor_showers.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Name                  5 non-null      object
 1   Radiant               5 non-null      object
 2   Best_Month            5 non-null      object
 3   Start_Month           5 non-null      object
 4   Start_Day             5 non-null      int64 
 5   End_Month             5 non-null      object
 6   End_Day               5 non-null      int64 
 7   Hemisphere            5 non-null      object
 8   Preferred_Hemisphere  5 non-null      object
dtypes: int64(2), object(7)
memory usage: 492.0+ bytes


Unnamed: 0,Name,Radiant,Best_Month,Start_Month,Start_Day,End_Month,End_Day,Hemisphere,Preferred_Hemisphere
0,Lyrids,Lyra,april,april,21,april,22,Northern,Northern
1,Eta Aquarids,Aquarius,may,april,19,may,28,"Northern, Southern",Southern
2,Orionids,Orion,october,october,2,november,7,"Northern, Southern","Northern, Southern"
3,Perseids,Perseus,august,july,14,august,24,Northern,Northern
4,Leonids,Leo,november,november,6,november,30,"Northern, Southern","Northern, Southern"


#### Exercise 2.3

In [9]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

moon_phases.info()
moon_phases.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 366 entries, 0 to 365
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Month          366 non-null    object
 1   Day            366 non-null    int64 
 2   Moon_Phase     50 non-null     object
 3   Special_Event  10 non-null     object
dtypes: int64(1), object(3)
memory usage: 11.6+ KB


Unnamed: 0,Month,Day,Moon_Phase,Special_Event
0,january,1,,
1,january,2,first quarter,
2,january,3,,
3,january,4,,
4,january,5,,
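
The `info()` output above reports only 50 non-null `Moon_Phase` values out of 366 rows, so most days have no phase recorded yet. A quick way to quantify that is to count missing values per column. The sketch below uses a small stand-in DataFrame (named `sample` here, an assumption for illustration) that mirrors the five rows shown by `head()`, so it runs without the CSV file.

```python
import pandas as pd

# Stand-in for the first five rows of moon_phases shown above
sample = pd.DataFrame({
    "Month": ["january"] * 5,
    "Day": [1, 2, 3, 4, 5],
    "Moon_Phase": [None, "first quarter", None, None, None],
    "Special_Event": [None] * 5,
})

# isna() marks missing cells as True; summing counts them per column
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

Running `moon_phases.isna().sum()` on the full DataFrame would give the same kind of summary for all 366 days.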


#### Exercise 2.4

In [10]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

constellations.info()
constellations.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Constellation   5 non-null      object
 1   Best_Month      5 non-null      object
 2   Latitude_Start  5 non-null      int64 
 3   Latitude_End    5 non-null      int64 
 4   Best_Time       5 non-null      object
 5   Hemisphere      5 non-null      object
dtypes: int64(2), object(4)
memory usage: 372.0+ bytes


Unnamed: 0,Constellation,Best_Month,Latitude_Start,Latitude_End,Best_Time,Hemisphere
0,Lyra,august,90,-40,21:00,Northern
1,Aquarius,october,65,-90,21:00,Southern
2,Orion,january,85,-75,21:00,Northern
3,Perseus,december,90,-35,21:00,Northern
4,Leo,april,90,65,21:00,Northern


#### Exercise 2.5

In [2]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

cities.info()
cities.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 256 entries, 0 to 255
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   City      256 non-null    object 
 1   Latitude  256 non-null    float64
 2   Country   256 non-null    object 
dtypes: float64(1), object(2)
memory usage: 6.1+ KB


Unnamed: 0,City,Latitude,Country
0,Abu Dhabi,24.47,United Arab Emirates
1,Abuja,9.07,Nigeria
2,Accra,5.55,Ghana
3,Adamstown,-25.07,Pitcairn Islands
4,Addis Ababa,9.02,Ethiopia


#### Exercise 2.6

<font size="3"> **Convert to numbers** </font>
<font size="2"><p>
We can see from the calls to **_head()_** *that a lot of information is written in words (strings) instead of numbers (integers).* Some data makes sense as strings, like city names or meteor shower names. But other data makes more sense as integers, like months or Moon phases.

You can quickly convert the month columns to numbers:

1. Create a map of months to numbers. We can see from the output of head() that the months are all lowercase.
2. Apply the month map to each column that contains months.
3. Save the result to the DataFrame.

</p></font>

In [13]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

months = {'january':1, 'february':2, 'march':3, 'april':4, 'may':5, 'june':6, 'july':7, 'august':8, 'september':9, 'october':10, 'november':11, 'december':12}
meteor_showers.Best_Month = meteor_showers.Best_Month.map(months)
meteor_showers.Start_Month = meteor_showers.Start_Month.map(months)
meteor_showers.End_Month = meteor_showers.End_Month.map(months)
moon_phases.Month = moon_phases.Month.map(months)
constellations.Best_Month = constellations.Best_Month.map(months)

meteor_showers.info()
meteor_showers.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Name                  5 non-null      object
 1   Radiant               5 non-null      object
 2   Best_Month            5 non-null      int64 
 3   Start_Month           5 non-null      int64 
 4   Start_Day             5 non-null      int64 
 5   End_Month             5 non-null      int64 
 6   End_Day               5 non-null      int64 
 7   Hemisphere            5 non-null      object
 8   Preferred_Hemisphere  5 non-null      object
dtypes: int64(5), object(4)
memory usage: 492.0+ bytes


Unnamed: 0,Name,Radiant,Best_Month,Start_Month,Start_Day,End_Month,End_Day,Hemisphere,Preferred_Hemisphere
0,Lyrids,Lyra,4,4,21,4,22,Northern,Northern
1,Eta Aquarids,Aquarius,5,4,19,5,28,"Northern, Southern",Southern
2,Orionids,Orion,10,10,2,11,7,"Northern, Southern","Northern, Southern"
3,Perseids,Perseus,8,7,14,8,24,Northern,Northern
4,Leonids,Leo,11,11,6,11,30,"Northern, Southern","Northern, Southern"
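
One thing to keep in mind with `Series.map` and a dictionary: any value that is not a key in the dictionary becomes `NaN` without raising an error. The sketch below shows one possible way to spot such values and to normalize text before mapping. The `sample` DataFrame and its deliberately capitalized `"May"` entry are made up for illustration and are not part of the module's data.

```python
import pandas as pd

months = {'january': 1, 'february': 2, 'march': 3, 'april': 4, 'may': 5, 'june': 6,
          'july': 7, 'august': 8, 'september': 9, 'october': 10, 'november': 11, 'december': 12}

# Illustrative data: "May" is capitalized, so it does not match the lowercase key
sample = pd.DataFrame({"Best_Month": ["april", "May", "october"]})

# Values without a matching key silently become NaN
mapped = sample["Best_Month"].map(months)
unmapped = sample.loc[mapped.isna(), "Best_Month"].unique()
print(list(unmapped))

# Stripping whitespace and lowercasing first makes the lookup more forgiving
normalized = sample["Best_Month"].str.strip().str.lower().map(months)
print(normalized.tolist())
```

Checking for `NaN` right after a mapping step helps catch mismatches in spelling or capitalization before they propagate into later cells.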


#### Exercise 2.7

<font size="2"> Before you continue, *__convert months and days in the meteor_showers DataFrame to a type called datetime__*, which tracks dates.

These columns will contain a month and day in 2020: </font>

In [16]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
constellations = pd.read_csv('data/constellations.csv')
cities = pd.read_csv('data/cities.csv')

months = {'january':1, 'february':2, 'march':3, 'april':4, 'may':5, 'june':6, 'july':7, 'august':8, 'september':9, 'october':10, 'november':11, 'december':12}
meteor_showers.Best_Month = meteor_showers.Best_Month.map(months)
meteor_showers.Start_Month = meteor_showers.Start_Month.map(months)
meteor_showers.End_Month = meteor_showers.End_Month.map(months)
moon_phases.Month = moon_phases.Month.map(months)
constellations.Best_Month = constellations.Best_Month.map(months)

meteor_showers['Start Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.Start_Month * 100 + meteor_showers.Start_Day, format = '%Y%m%d')
meteor_showers['End Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.End_Month * 100 + meteor_showers.End_Day, format = '%Y%m%d')

meteor_showers.head()

Unnamed: 0,Name,Radiant,Best_Month,Start_Month,Start_Day,End_Month,End_Day,Hemisphere,Preferred_Hemisphere,Start Date,End Date
0,Lyrids,Lyra,4,4,21,4,22,Northern,Northern,2020-04-21,2020-04-22
1,Eta Aquarids,Aquarius,5,4,19,5,28,"Northern, Southern",Southern,2020-04-19,2020-05-28
2,Orionids,Orion,10,10,2,11,7,"Northern, Southern","Northern, Southern",2020-10-02,2020-11-07
3,Perseids,Perseus,8,7,14,8,24,Northern,Northern,2020-07-14,2020-08-24
4,Leonids,Leo,11,11,6,11,30,"Northern, Southern","Northern, Southern",2020-11-06,2020-11-30
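
The date arithmetic in the cell above works by packing year, month, and day into a single integer of the form YYYYMMDD, which `pd.to_datetime` then parses with `format='%Y%m%d'`. As a rough illustration, the sketch below shows the intermediate integers and compares the result with an alternative that passes `year`, `month`, and `day` columns to `pd.to_datetime`. The `sample` and `parts` DataFrames are stand-ins invented for this example.

```python
import pandas as pd

# Stand-in for two rows of meteor_showers (Lyrids and Orionids start dates)
sample = pd.DataFrame({"Start_Month": [4, 10], "Start_Day": [21, 2]})

# Pack the components into one integer: 2020 * 10000 + month * 100 + day
as_int = 2020 * 10000 + sample.Start_Month * 100 + sample.Start_Day
print(as_int.tolist())  # [20200421, 20201002]

from_int = pd.to_datetime(as_int, format='%Y%m%d')

# Alternative: to_datetime also accepts a DataFrame with year/month/day columns
parts = pd.DataFrame({"year": 2020, "month": sample.Start_Month, "day": sample.Start_Day})
from_parts = pd.to_datetime(parts)

print(from_int.tolist() == from_parts.tolist())
```

Both approaches should yield the same dates. The integer form is compact, while the column form may be easier to read when the year also varies by row.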


#### Exercise 2.8

In [19]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
cities = pd.read_csv('data/cities.csv')
constellations = pd.read_csv('data/constellations.csv')

months = {'january':1, 'february':2, 'march':3, 'april':4, 'may':5, 'june':6, 'july':7, 'august':8, 'september':9, 'october':10, 'november':11, 'december':12}
meteor_showers.Best_Month = meteor_showers.Best_Month.map(months)
meteor_showers.Start_Month = meteor_showers.Start_Month.map(months)
meteor_showers.End_Month = meteor_showers.End_Month.map(months)
moon_phases.Month = moon_phases.Month.map(months)
constellations.Best_Month = constellations.Best_Month.map(months)

meteor_showers['Start Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.Start_Month * 100 + meteor_showers.Start_Day, format = '%Y%m%d')
meteor_showers['End Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.End_Month * 100 + meteor_showers.End_Day, format = '%Y%m%d')
moon_phases['Date'] = pd.to_datetime(2020 * 10000 + moon_phases.Month * 100 + moon_phases.Day, format = '%Y%m%d')

meteor_showers.head()
moon_phases.head()


Unnamed: 0,Month,Day,Moon_Phase,Special_Event,Date
0,1,1,,,2020-01-01
1,1,2,first quarter,,2020-01-02
2,1,3,,,2020-01-03
3,1,4,,,2020-01-04
4,1,5,,,2020-01-05


#### Exercise 2.9

In [22]:
import numpy as np
import pandas as pd

meteor_showers = pd.read_csv('data/meteorshowers.csv')
moon_phases = pd.read_csv('data/moonphases.csv')
cities = pd.read_csv('data/cities.csv')
constellations = pd.read_csv('data/constellations.csv')

months = {'january':1, 'february':2, 'march':3, 'april':4, 'may':5, 'june':6, 'july':7, 'august':8, 'september':9, 'october':10, 'november':11, 'december':12}
meteor_showers.Best_Month = meteor_showers.Best_Month.map(months)
meteor_showers.Start_Month = meteor_showers.Start_Month.map(months)
meteor_showers.End_Month = meteor_showers.End_Month.map(months)
moon_phases.Month = moon_phases.Month.map(months)
constellations.Best_Month = constellations.Best_Month.map(months)

meteor_showers['Start Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.Start_Month * 100 + meteor_showers.Start_Day, format = '%Y%m%d')
meteor_showers['End Date'] = pd.to_datetime(2020 * 10000 + meteor_showers.End_Month * 100 + meteor_showers.End_Day, format = '%Y%m%d')

moon_phases['Date'] = pd.to_datetime(2020 * 10000 + moon_phases.Month * 100 + moon_phases.Day, format = '%Y%m%d')

# The Hemisphere values in the CSV files are capitalized ('Northern', 'Southern'),
# so lowercase them before mapping; otherwise map() finds no key and returns NaN.
hemispheres = {'northern':0, 'southern':1, 'northern, southern':3}
meteor_showers.Hemisphere = meteor_showers.Hemisphere.str.lower().map(hemispheres)
constellations.Hemisphere = constellations.Hemisphere.str.lower().map(hemispheres)

meteor_showers.head()
constellations.head()

Unnamed: 0,Constellation,Best_Month,Latitude_Start,Latitude_End,Best_Time,Hemisphere
0,Lyra,8,90,-40,21:00,0
1,Aquarius,10,65,-90,21:00,1
2,Orion,1,85,-75,21:00,0
3,Perseus,12,90,-35,21:00,0
4,Leo,4,90,65,21:00,0


#### Exercise 2.10

In [None]:
import numpy as np
import pandas as pd

mon