# Stars

The file **stars.csv** contains information about 240 stars. Each row has the following columns:

- **Temperature (K)** — the temperature in Kelvin;
- **Luminosity (L/Lo)** — the luminosity of the star relative to the solar luminosity L = 3.828 * 10^26 (W);
- **Radius (R/Ro)** — the radius of the star relative to the radius of the sun R = 6.9551 * 10^8 (m);
- **Absolute magnitude (Mv)** — the absolute magnitude of the star;
- **Star color** — the color of the star;
- **Star type** — the type of star, represented by a number from 0 to 5, where:
	- 0 — Red Dwarf,
	- 1 — Brown Dwarf,
	- 2 — White Dwarf,
	- 3 — Main Sequence,
	- 4 — Super Giants,
	- 5 — Hyper Giants;
- **Spectral Class** — the spectral class of the star (one of O, B, A, F, G, K, or M).

The task is taken from [fadeevlecturer.github.io](https://fadeevlecturer.github.io/python_lectures/docs/index.html).

**`Task:`**

1. **Clean the color column:** standardize the values in this column so that variations like 'Blue white', 'Blue White', and 'Blue-white' are treated as the same;
2. **Star type names:** create a new column where the star type is represented as a full string instead of a number;
3. **Convert spectral class to numbers:** add a new column where the spectral class is represented by numbers, using the following mapping:
	- O → 0,
	- B → 1,
	- A → 2,
	- F → 3,
	- G → 4,
	- K → 5,
	- M → 6;
4. **Count the number of stars:** for each star color, star type, and spectral class, calculate the number of stars;
5. **Star type analysis:** for each star type, find the minimum, average, and maximum values of the absolute magnitude;
6. **Spectral class analysis:** for each spectral class, find the minimum, average, and maximum values of the temperature;
7. **Correlation analysis:** compute pairwise correlations between all numerical columns.
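
Tasks 4 to 7 are aggregation steps, and the sketch below shows one way they could be written with pandas. The rows in `toy` are made up for illustration and only the column names follow **stars.csv**. It also assumes that task 4 asks for a separate count per column rather than a count per combination of the three columns.

```python
import pandas as pd

# Made-up rows for illustration only; the column names follow stars.csv.
toy = pd.DataFrame({
    "Temperature (K)": [3000, 3200, 9000, 25000],
    "Absolute magnitude(Mv)": [16.0, 12.0, 2.0, -6.0],
    "Star type": [0, 0, 3, 4],
    "Star color": ["Red", "Red", "White", "Blue"],
    "Spectral Class": ["M", "M", "A", "O"],
})

# Task 4: number of stars per value of each categorical column.
color_counts = toy["Star color"].value_counts()
type_counts = toy["Star type"].value_counts()
class_counts = toy["Spectral Class"].value_counts()

# Task 5: min / mean / max of the absolute magnitude per star type.
magnitude_stats = (
    toy.groupby("Star type")["Absolute magnitude(Mv)"].agg(["min", "mean", "max"])
)

# Task 6: min / mean / max of the temperature per spectral class.
temperature_stats = (
    toy.groupby("Spectral Class")["Temperature (K)"].agg(["min", "mean", "max"])
)

# Task 7: pairwise Pearson correlations between the numeric columns.
correlations = toy.select_dtypes(include="number").corr()
```

If a count per combination is wanted instead, `toy.groupby(["Star color", "Star type", "Spectral Class"]).size()` would be the corresponding call.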


**`Cleaning the color column:`**

In [39]:
import pandas as pd

stars = pd.read_csv('stars.csv')
stars.info()  # column names, dtypes and non-null counts

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 7 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Temperature (K)         240 non-null    int64  
 1   Luminosity(L/Lo)        240 non-null    float64
 2   Radius(R/Ro)            240 non-null    float64
 3   Absolute magnitude(Mv)  240 non-null    float64
 4   Star type               240 non-null    int64  
 5   Star color              240 non-null    object 
 6   Spectral Class          240 non-null    object 
dtypes: float64(3), int64(2), object(2)
memory usage: 13.3+ KB


In [40]:
stars['Star color'].value_counts()

Star color
Red                   112
Blue                   55
Blue-white             26
Blue White             10
yellow-white            8
White                   7
Blue white              3
Yellowish White         3
white                   3
Whitish                 2
Orange                  2
yellowish               2
Pale yellow orange      1
White-Yellow            1
Blue                    1
Yellowish               1
Orange-Red              1
Blue white              1
Blue-White              1
Name: count, dtype: int64
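
`Blue` and `Blue white` each appear twice in the counts above, which suggests that some entries differ only by characters that are invisible in this listing, such as trailing spaces. A minimal sketch of how such values could be exposed, using a made-up sample rather than the real column:

```python
import pandas as pd

# Made-up sample; the second and fourth values end with a space.
colors = pd.Series(["Blue", "Blue ", "Blue white", "Blue white ", "Red"])

# repr() wraps each string in quotes, so stray whitespace becomes visible.
visible = colors.map(repr).value_counts()

# Values that change when stripped are the ones carrying extra whitespace.
padded = colors[colors != colors.str.strip()].tolist()

print(visible)
print(padded)
```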

In [41]:
# Strip surrounding whitespace first: some raw values ('Blue ', 'Blue white ')
# differ from their clean counterparts only by a trailing space.
stars['Star color'] = stars['Star color'].str.strip()

replace_color_dict = {
  'Blue-white': 'Blue white',
  'Blue White': 'Blue white',
  'yellow-white': 'Yellowish white',
  'Yellowish White': 'Yellowish white',
  'white': 'White',
  'yellowish': 'Yellowish',
  'White-Yellow': 'White yellow',
  'Orange-Red': 'Orange red',
  'Blue-White': 'Blue white'
}
stars['Star color'] = stars['Star color'].replace(replace_color_dict)
stars['Star color'].unique()

array(['Red', 'Blue white', 'White', 'Yellowish white',
       'Pale yellow orange', 'Blue', 'Whitish', 'Orange', 'White yellow',
       'Yellowish', 'Orange red'], dtype=object)
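
Instead of listing every variant by hand, purely typographic differences could also be normalized programmatically. The sketch below assumes that case, hyphens and surrounding whitespace are the only differences to remove; the helper name `normalize_color` is hypothetical and the sample values are a subset of the raw spellings shown earlier.

```python
import pandas as pd

def normalize_color(colors: pd.Series) -> pd.Series:
    """Hypothetical helper: unify case, hyphens and whitespace in color names."""
    return (
        colors.str.strip()                     # drop leading/trailing spaces
        .str.replace("-", " ", regex=False)    # 'Blue-white' -> 'Blue white'
        .str.replace(r"\s+", " ", regex=True)  # collapse repeated spaces
        .str.capitalize()                      # 'Blue White' -> 'Blue white'
    )

sample = pd.Series(
    ["Blue-white", "Blue White", "Blue white ", "yellow-white", "Orange-Red"]
)
cleaned = normalize_color(sample)
print(cleaned.tolist())
```

This only removes typographic variation. Merging synonyms, for example treating `Yellow white` as `Yellowish white`, still needs an explicit dictionary like the one above.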

**`Star type names:`**

In [42]:
stars['Star type'].unique()

array([0, 1, 2, 3, 4, 5])

In [43]:
replace_type_dict = {
  0: 'Red Dwarf',
  1: 'Brown Dwarf',
  2: 'White Dwarf',
  3: 'Main Sequence',
  4: 'Super Giants',
  5: 'Hyper Giants'
}
stars['Star type names'] = stars['Star type'].replace(replace_type_dict)
stars['Star type names'].unique()

array(['Red Dwarf', 'Brown Dwarf', 'White Dwarf', 'Main Sequence',
       'Super Giants', 'Hyper Giants'], dtype=object)
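
`Series.replace` leaves any value that is missing from the dictionary unchanged, so an unexpected code would pass through silently. `Series.map` turns such values into `NaN` instead, which makes them easy to detect. A small sketch with made-up codes (the name `type_names` is chosen for this example only):

```python
import pandas as pd

type_names = {
    0: "Red Dwarf", 1: "Brown Dwarf", 2: "White Dwarf",
    3: "Main Sequence", 4: "Super Giants", 5: "Hyper Giants",
}

# Made-up codes; 7 is deliberately absent from the dictionary.
codes = pd.Series([0, 3, 5, 7])

named = codes.map(type_names)            # unmapped codes become NaN
unmapped = codes[named.isna()].tolist()  # codes that have no name
print(unmapped)
```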

In [44]:
stars.sample(n=10)

| | Temperature (K) | Luminosity(L/Lo) | Radius(R/Ro) | Absolute magnitude(Mv) | Star type | Star color | Spectral Class | Star type names |
|---|---|---|---|---|---|---|---|---|
| 73 | 3150 | 0.0088 | 0.35 | 11.94 | 1 | Red | M | Brown Dwarf |
| 131 | 3607 | 0.00023 | 0.38 | 10.34 | 1 | Red | M | Brown Dwarf |
| 54 | 3650 | 310000.0 | 1324.0 | -7.79 | 5 | Red | M | Hyper Giants |
| 186 | 2968 | 0.000461 | 0.119 | 17.45 | 0 | Red | M | Red Dwarf |
| 92 | 4077 | 0.085 | 0.795 | 6.228 | 3 | Yellowish | K | Main Sequence |
| 17 | 3692 | 0.00367 | 0.47 | 10.8 | 1 | Red | M | Brown Dwarf |
| 195 | 3598 | 0.0027 | 0.67 | 13.667 | 1 | Red | M | Brown Dwarf |
| 183 | 3218 | 0.000452 | 0.0987 | 17.34 | 0 | Red | M | Red Dwarf |
| 16 | 2799 | 0.0018 | 0.16 | 14.79 | 1 | Red | M | Brown Dwarf |
| 158 | 13023 | 998.0 | 6.21 | -1.38 | 3 | Blue white | A | Main Sequence |


**`Converting spectral class to numbers:`**

In [45]:
stars['Spectral Class'].unique()

array(['M', 'B', 'A', 'F', 'O', 'K', 'G'], dtype=object)

In [46]:
replace_class_dict = {
  'M': 6,
  'B': 1,
  'A': 2,
  'F': 3,
  'O': 0,
  'K': 5,
  'G': 4
}
stars['Spectral Class numbers'] = stars['Spectral Class'].map(replace_class_dict)
stars['Spectral Class numbers'].unique()

array([6, 1, 2, 3, 0, 5, 4])
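
The same numbers could also be derived from an ordered categorical, which keeps the O to M ordering in a single list instead of a hand-written dictionary. This is an alternative sketch rather than the approach used in the cell above; `class_order` and the sample values are introduced here for illustration.

```python
import pandas as pd

# Order matching the O -> 0 ... M -> 6 mapping from the task.
class_order = ["O", "B", "A", "F", "G", "K", "M"]

classes = pd.Series(["M", "B", "A", "F", "O", "K", "G"])
as_category = pd.Categorical(classes, categories=class_order, ordered=True)

# .codes gives the position of each value in class_order
# (-1 for values that are not listed there).
class_numbers = pd.Series(as_category.codes, index=classes.index)
print(class_numbers.tolist())
```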

In [47]:
stars.sample(n=10)

| | Temperature (K) | Luminosity(L/Lo) | Radius(R/Ro) | Absolute magnitude(Mv) | Star type | Star color | Spectral Class | Star type names | Spectral Class numbers |
|---|---|---|---|---|---|---|---|---|---|
| 67 | 2817 | 0.00098 | 0.0911 | 16.45 | 0 | Red | M | Red Dwarf | 6 |
| 147 | 14732 | 0.00011 | 0.00892 | 12.89 | 2 | White | F | White Dwarf | 3 |
| 108 | 24345 | 142000.0 | 57.0 | -6.24 | 4 | Blue | O | Super Giants | 0 |
| 26 | 8570 | 0.00081 | 0.0097 | 14.2 | 2 | Blue white | A | White Dwarf | 2 |
| 194 | 3523 | 0.0054 | 0.319 | 12.43 | 1 | Red | M | Brown Dwarf | 6 |
| 62 | 2983 | 0.00024 | 0.094 | 16.09 | 0 | Red | M | Red Dwarf | 6 |
| 106 | 24630 | 363000.0 | 63.0 | -5.83 | 4 | Blue | O | Super Giants | 0 |
| 82 | 8930 | 0.00056 | 0.0095 | 13.78 | 2 | White | A | White Dwarf | 2 |
| 122 | 3218 | 0.00054 | 0.11 | 20.02 | 0 | Red | M | Red Dwarf | 6 |
| 81 | 10574 | 0.00014 | 0.0092 | 12.02 | 2 | White | F | White Dwarf | 3 |


In [48]:
stars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Temperature (K)         240 non-null    int64  
 1   Luminosity(L/Lo)        240 non-null    float64
 2   Radius(R/Ro)            240 non-null    float64
 3   Absolute magnitude(Mv)  240 non-null    float64
 4   Star type               240 non-null    int64  
 5   Star color              240 non-null    object 
 6   Spectral Class          240 non-null    object 
 7   Star type names         240 non-null    object 
 8   Spectral Class numbers  240 non-null    int64  
dtypes: float64(3), int64(3), object(3)
memory usage: 17.0+ KB
