# Describe The Stars

For this exercise, our population will be the [240 stars selected by NASA](https://www.kaggle.com/datasets/brsdincer/star-type-classification) for their examples of star classification. The aim is to use scatter diagrams to look for anything 'interesting' in the data.

The first thing to do is to look at the data. Run this code block to see a summary of it.

In [None]:
import pandas as pd

stars = pd.read_csv("../../data/smaller-datasets/Stars.csv")
stars.info()  # info() prints its summary itself and returns None, so no print() is needed
print(stars.head())


We can see that there are 7 columns:

```
Temperature, Luminosity, Radius, Abs_magnitude, Color, Spectral_Class, Type
```
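
To check which of these columns are numerical, and so suitable for a scatter diagram, one option is `select_dtypes`. The sketch below uses a tiny made-up table with the same column names (the values are illustrative only, not taken from the dataset, and `Type` is assumed to be stored as an integer code). With the real data you would call it on `stars` instead of `sample`.

```python
import pandas as pd

# Hypothetical three-row stand-in for the stars table (values are made up).
sample = pd.DataFrame({
    "Temperature": [3000, 9000, 25000],
    "Luminosity": [0.002, 50.0, 200000.0],
    "Radius": [0.2, 2.0, 10.0],
    "Abs_magnitude": [16.0, 1.5, -6.0],
    "Color": ["Red", "White", "Blue"],
    "Spectral_Class": ["M", "A", "B"],
    "Type": [0, 3, 4],  # assumed to be an integer code for the star type
})

# Split the column names by whether pandas treats them as numbers.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("numerical:", numeric_cols)
print("text:", text_cols)
```

Note that a column stored as a number is not necessarily a measurement: under the assumption above, `Type` shows up as numerical even though it is really a label.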

You can google some of these terms if you want to learn more about them. Or you can continue with the next part of the investigation.

We are going to pick two of the numerical columns and plot a scatter diagram. (There is a reason why the x-axis is reversed. It has to do with a famous diagram I hope to get to later in the course.)

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from ipywidgets import interact, Dropdown, SelectionSlider

@interact(
    # Dropdown has no continuous_update option (that belongs to sliders), so it is omitted.
    x=Dropdown(
        options=["Temperature", "Luminosity", "Radius", "Abs_magnitude"],
        value="Temperature",
        description="x-axis",
    ),
    y=Dropdown(
        options=["Temperature", "Luminosity", "Radius", "Abs_magnitude"],
        value="Abs_magnitude",
        description="y-axis",
    ),
    use_log=SelectionSlider(
        options=["given values", "logarithmic"],
        value="given values",
        description="mode",
    ),
)
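def pick_xy(x, y, use_log):
    x_vals = stars[x]
    y_vals = stars[y]
    x_label, y_label = x, y

    fig, ax = plt.subplots()
    if use_log == "logarithmic":
        # Natural log; zero or negative values become -inf/nan and are not drawn.
        x_vals = np.log(x_vals)
        x_label = "log of " + x
        # "Color" holds text, so it cannot be logged; plot it as-is and keep its label.
        if y != "Color":
            y_vals = np.log(y_vals)
            y_label = "log of " + y
    ax.scatter(x_vals, y_vals)
    ax.invert_xaxis()  # reversed x-axis, as mentioned above
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.set_title("Scatter graph of " + x_label + " vs " + y_label)
    plt.show()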


A few combinations reveal a pattern in the data. Bear in mind that we aren't even using all the data yet: we have ignored three of the columns (`Color`, `Spectral_Class` and `Type`).

Combinations I found interesting:
- `Abs_magnitude` vs `Temperature`
- `Temperature` vs `Luminosity` (especially in logarithmic mode)
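
One caveat with logarithmic mode: the logarithm is only defined for positive numbers, and absolute magnitudes can be zero or negative (brighter stars have lower magnitudes). NumPy returns `nan` for negative inputs, and matplotlib leaves such points off the plot without complaint. A minimal sketch, using made-up magnitudes rather than values from the dataset:

```python
import numpy as np

# Hypothetical absolute magnitudes (not taken from the dataset).
magnitudes = np.array([16.0, 1.5, -6.0])

# Silence the "invalid value" warning NumPy raises for the negative entry.
with np.errstate(invalid="ignore"):
    logged = np.log(magnitudes)

print(logged)                  # the last entry is nan
print(np.isnan(logged).sum())  # number of points that would be missing from the plot
```

So if a logarithmic plot involving `Abs_magnitude` looks sparser than expected, missing points of this kind are one likely reason.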

## Exercise

Can you add `Color` to the y-axis so we can see if there is a relationship between colour and temperature?

<details>
<summary>Reveal suggested answer</summary>

Add the string `"Color"` to the list of options for the variable `y`:

```python
options=["Temperature", "Luminosity", "Radius", "Abs_magnitude", "Color"],
```
</details>

The blue stars seem to have a wide range of temperatures. It's hard to see how many reds we have, because they are bunched together. Try turning on logarithmic mode. It stretches out the reds and squashes the blues. 
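
Because overlapping points hide how many stars share a colour, a table of counts can complement the scatter diagram. The sketch below groups a small made-up table by colour (the values are illustrative only); with the real data, `stars` would take the place of `sample`.

```python
import pandas as pd

# Hypothetical stand-in for two columns of the stars table (values are made up).
sample = pd.DataFrame({
    "Color": ["Red", "Red", "Red", "Blue", "Blue"],
    "Temperature": [2800, 3100, 3300, 12000, 33000],
})

# For each colour: how many stars there are, and their temperature range.
summary = sample.groupby("Color")["Temperature"].agg(["count", "min", "max"])
print(summary)
```

The `count` column answers the "how many reds" question directly, while `min` and `max` put numbers on how wide each colour's temperature range is.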