# Worksheet 4: I Just Need Some Space
In this worksheet, you will practice how to access data using functions. You will also experiment with plotting and text processing. Also, *space!*

## 0. Install Astroquery

We'll be using a code library called Astroquery. The Astroquery library is used to retrieve information from astronomical archives. It is likely that the library is not currently on your machine. Let's fix that. Open up a terminal, and run `pip install astroquery`.

You should see a bunch of lines print out very quickly. Just look at the last few lines - does it say something like "error" or "failure"? If yes, ask an instructor for help. If no, you are set.

Each time you start or restart your notebook, you should run the following cell to import all of the necessary libraries and set the relevant configurations.

In [None]:
# Reload code from other files as they are updated.
%load_ext autoreload
%autoreload 2

# Show plots directly in this notebook.
%matplotlib inline

# Import the libraries we use in this worksheet.
from astroquery.jplhorizons import Horizons
import matplotlib.pyplot as plt
import pandas as pd

## 1a. Practice using functions
Let's practice using this library. Because this is a worksheet, we will walk you through the steps. If you have questions, please take a look at [the documentation](https://astroquery.readthedocs.io/en/latest/jplhorizons/jplhorizons.html). The specific Astroquery tool we will be using is called JPL Horizons.

This accesses a service called Horizons, run out of NASA's Jet Propulsion Laboratory. This is an incredible free service that provides (among other things) the positions of every known thing in our solar system. (After class, if you would like, you may optionally listen to Carrie Nugent [talking to the guy](https://www.listentospacepod.com/episodes/2016/6/12/episode-49-every-known-object-in-the-solar-system-with-jon-giorgini?rq=giorgini) who made Horizons). Horizons has been used to solve problems like "Can I figure out when this painting was painted by the position of the stars?" 

Today, we will ask Horizons, via the Astroquery library, to give us the positions of things. Let's first ask it for the x, y, and z positions of Earth over a specific time range. To do this, run the following cell:

In [None]:
earth = Horizons(id="399", location="@0", id_type="id",
                 epochs={"start": "2020-07-31", "stop": "2021-02-18",
                         "step": "1d"})

# Extract just the vectors from the information retrieved 
earth_xyz = earth.vectors()

What did this code do? It asked for the location of Earth (Horizons calls Earth "399") relative to the Solar System Barycenter (aka `location="@0"`). That is the center of mass of the Solar System, which is almost, but not quite, the location of the Sun. You asked for its location between July 31st, 2020 and February 18th, 2021, in intervals of 1 day. Then you saved the resulting table of vectors as `earth_xyz`. The position values are in Astronomical Units (AU); one AU is approximately the average distance from the Earth to the Sun.
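To get a feel for what an x, y, z position in Astronomical Units means, here is a minimal sketch that turns one position into a distance from the barycenter and converts it to kilometers. The position values and the helper name `distance_from_origin` are made up for illustration; they are not real Horizons output.

```python
import math

# One astronomical unit in kilometers (IAU definition).
AU_IN_KM = 149_597_870.7


def distance_from_origin(x, y, z):
    """Return the straight-line distance of the point (x, y, z) from the origin."""
    return math.sqrt(x**2 + y**2 + z**2)


# Hypothetical position in AU, chosen for easy arithmetic (not a real query result).
example_x, example_y, example_z = 0.6, -0.8, 0.0

distance_au = distance_from_origin(example_x, example_y, example_z)
distance_km = distance_au * AU_IN_KM
print(f"{distance_au:.3f} AU is about {distance_km:.3e} km")
```

You could try the same arithmetic on a single row of `earth_xyz` once your query has run.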

As a note, this code was not intended to be intuitive - you probably could have figured some of it out, but you mostly would have had to read the documentation. The above explanation is intended to save you some time reading documentation so you can practice other skills with your time.

Now, let's take a look at one of these columns by running the cell below:

In [None]:
earth_xyz['x']

## 1b. Practice Plotting

Now, let's try to plot Earth's path around the Sun for the time range we queried. Run the following cell:

In [None]:
# Define our plot, which consists of a figure and axis.
fig = plt.figure()
ax = fig.add_subplot(111)

# Run this cell without uncommenting any of the below commands.
# Then uncomment the below one by one, re-running this cell each time.
# See what happens!

#plt.axis("square")
#plt.xlim(-1.2, 1.2)
#plt.ylim(-1.2, 1.2)
#plt.xlabel("Distance (AU)")
#plt.ylabel("Distance (AU)")

#plot and show
plt.scatter(earth_xyz["x"], earth_xyz["y"], marker=".")
plt.show()

### Direction
Let's make one more change to the graph. In the cell below, please again plot Earth's path around the Sun. But this time, use the code:

```python
color_time = range(len(earth_xyz["x"]))
```

This defines a sequence of integers (0, 1, 2, and so on) that is as long as `earth_xyz["x"]`, so each point gets a number that increases with time.
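If you are curious what `range(len(...))` produces, here is a small sketch. The list `example_x` is a made-up stand-in for `earth_xyz["x"]`, not real data.

```python
# Hypothetical stand-in for earth_xyz["x"]; the values are made up.
example_x = [0.61, 0.62, 0.64, 0.65]

# One integer per data point, counting up from zero.
color_time = range(len(example_x))
print(list(color_time))  # [0, 1, 2, 3]
```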

Then, replace your plot command with

```python
plt.scatter(earth_xyz["x"], earth_xyz["y"], c=color_time, marker=".",
            cmap="winter")
```

This uses the Winter colormap to show directionality. Earlier points will be blue, later points will be green. You can see and choose [more colormaps here](https://matplotlib.org/stable/tutorials/colors/colormaps.html). Feel free to experiment!

*Solution*

## 1c. Add Mars!

Great! Now you've plotted Earth's path around the Sun. In the code box below, repeat the process for Mars (`id="499"`). Use the same date range when querying Horizons. Plot the positions of Mars in the same plot as Earth. You'll want to include two lines of code something like this:

```python
plt.scatter(earth_xyz["x"], earth_xyz["y"], c=color_time, marker=".",
            cmap="winter")
plt.scatter(mars_xyz["x"], mars_xyz["y"], c=color_time, marker=".",
            cmap="spring")
```

*Hint*: Mars has a bigger orbit than Earth, so you will need to adjust your graph accordingly.

*Solution*

## 1d. Mystery object
Great! Now add one more mystery thing. Query Horizons for something with `id="-168"` over the same date range. Plot it next to Mars and Earth. Pick a new colormap.

*Solution*

## 1e. Follow up questions
- Given the object's path, and the date range it traveled, can you figure out what it is?
- Try querying Horizons for the mystery object, but on 2020-07-30. What happens? Why?

Write your answers below.

*Solution*


## 2. Reading In Data

It is useful to be able to read in and manipulate data. There are many ways to do this in Python, and all have their advantages. In this example, we've chosen to use [Pandas](https://www.youtube.com/watch?v=4NJlUribp3c). Please see the [course website](https://softdes.olin.edu/docs/readings/web-data/#pandas-data-in-tables) for more information on Pandas.

We'll show you how to read in and manipulate data in a file, and then you'll give it a try using another file.

We will start with a record of fireballs (asteroids that burned up in the atmosphere) that released more than 3 kilotons of energy. 3 kilotons of energy is a lot, so this gives us just the largest fireballs on record. The data is from the [Center for Near Earth Object Studies](https://cneos.jpl.nasa.gov/fireballs/). We've provided this data for you in `cneos_fireball_data.csv`. 

Please open up this file in a text editor. The filetype `.csv` stands for "comma-separated values", so you'll see data separated by commas (`,`). When reading in text, it's helpful to take a peek first using a text editor, as long as the file is reasonably sized (less than 100 MB or so).
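To make "comma-separated" concrete, here is a minimal sketch using Python's built-in `csv` module on a made-up two-row table. The column names and values are hypothetical and are not taken from `cneos_fireball_data.csv`; the worksheet itself uses Pandas for the real file.

```python
import csv
import io

# Made-up CSV text: one header line, then one line per row.
example_text = "name,energy_kt\nalpha,3.5\nbeta,12.0\n"

# Each line is one row, and the commas split that row into fields.
rows = list(csv.reader(io.StringIO(example_text)))
header, data_rows = rows[0], rows[1:]

print(header)     # ['name', 'energy_kt']
print(data_rows)  # [['alpha', '3.5'], ['beta', '12.0']]
```

Note that every field comes back as text here. Pandas goes a step further and works out which columns hold numbers.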

Run the following cell to read in the data:

In [None]:
fireball_dataframe = pd.read_csv("cneos_fireball_data.csv")

Now run the following cell to peek at the data.

In [None]:
print(fireball_dataframe.head(5))

You'll notice that units are included in the column names. J is joules, deg. is degrees, and UT is [Universal Time](https://en.wikipedia.org/wiki/Universal_Time).

Run the following cell to get one column of data, and try changing the column name to get different columns.

In [None]:
column_name = "Altitude_(km)"
print(fireball_dataframe[column_name])

Now, we can use Pandas to start answering questions. For example, when was the largest impact? As a first step, run the following cell to find the largest impact energy in the data.

In [None]:
max(fireball_dataframe["Calculated_Total_Impact_Energy_(kt)"])

Great! Now let's ask Pandas which row of column `Calculated_Total_Impact_Energy_(kt)` has the greatest value. We will use `idxmax`, which is a function specific to Pandas that gives the index (row label) of the highest value in a column, rather than the value itself. If we wanted the row of the minimum, we'd use a function called `idxmin` instead.
 
The syntax here might look a little different than other ways of running functions that you've seen in this course. Specifically, it follows the format `dataframe["column_name"].function()`. We will get to this later in the course, but for now, you can think of this as running `idxmax()` using information in `dataframe["column_name"]` as an argument.
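The difference between the largest value and the row that holds it can be easier to see on a tiny table. Here is a minimal sketch, assuming a made-up dataframe called `example` with hypothetical column names and values:

```python
import pandas as pd

# Hypothetical table; the names and values are made up for illustration.
example = pd.DataFrame({
    "name": ["a", "b", "c"],
    "energy": [3.0, 12.5, 7.1],
})

largest_value = example["energy"].max()   # the value itself
largest_row = example["energy"].idxmax()  # the row label holding that value

print(largest_value)                 # 12.5
print(largest_row)                   # 1
print(example["name"][largest_row])  # b
```

Once you know the row label, you can use it to look up that row's entry in any other column, which is what the last line does.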

Now run the following cell to find the row with the highest calculated total impact energy.

In [None]:
fireball_dataframe['Calculated_Total_Impact_Energy_(kt)'].idxmax()

Finally, we'll ask Pandas what date corresponds with row 10.

In [None]:
fireball_dataframe['Peak_Brightness_Date/Time_(UT)'][10]

Neat! That day was the [Chelyabinsk Meteor impact](https://en.wikipedia.org/wiki/Chelyabinsk_meteor). Carrie Nugent has a piece of it that she'll show you if you ask!

## 3. Your Turn!

In the code cell that follows, write code to read in the text file `cneos_closeapproach_data.csv` using Pandas. This data is also from the [CNEOS website](https://cneos.jpl.nasa.gov/ca/). It lists all the close approaches of asteroids in the next 30 days.

*Solution*

In [None]:
# Solution code here

Which object comes the closest to Earth? For this, you want Close Approach (CA) Distance in LD (Lunar Distances, the distance between the Earth and the Moon). Write your answer in the Markdown cell below.

*Solution*

When will the object above come the closest to Earth? Write your answer in the Markdown cell below.

*Solution*

What object is the hardest to see? This is the object with the highest value of [H Magnitude](https://cneos.jpl.nasa.gov/glossary/h.html). Write your answer in the Markdown cell below.

*Solution*

## 4. (Optional) Challenge questions

1. The Voyager 1 and 2 spacecraft are ID -31 and -32, respectively. Earth is 399, Mars is 499, Jupiter is 599, etc. Both spacecraft were launched in 1977, and are now outside our solar system, traveling through interstellar space. In the cell below, write code to plot the trajectories of Voyager 1 and 2 and, optionally, the planets they flew by.

2. Try to remake these plots in three dimensions using matplotlib. You can get the z coordinates of Earth, for example, by using `earth_xyz["z"]`. You may find [this tutorial](https://towardsdatascience.com/an-easy-introduction-to-3d-plotting-with-matplotlib-801561999725) useful. Do this in the code cell below.

3. Make a new plot of the same data that looks prettier by removing the tick marks and making the background black. You could also consider changing the colors, shapes, and sizes of the points. Do this in the code cell below.

4. Speaking of plots, plot a histogram of fireball impact energies. Do this in the code cell below.