# Stars

The file **stars.csv** contains information about 240 stars, described by the following columns:

- **Temperature (K)** — the temperature in Kelvin;
- **Luminosity (L/Lo)** — the luminosity of the star relative to the solar luminosity L = 3.828 * 10^26 (W);
- **Radius (R/Ro)** — the radius of the star relative to the radius of the sun R = 6.9551 * 10^8 (m);
- **Absolute magnitude (Mv)** — the absolute magnitude of the star;
- **Star color** — the color of the star;
- **Star type** — the type of star, represented by a number from 0 to 5, where:
	- 0 — Red Dwarf,
	- 1 — Brown Dwarf,
	- 2 — White Dwarf,
	- 3 — Main Sequence,
	- 4 — Super Giants,
	- 5 — Hyper Giants;
- **Spectral Class** — the spectral class of the star (one of O, B, A, F, G, K, or M).

The task comes from [fadeevlecturer.github.io](https://fadeevlecturer.github.io/python_lectures/docs/index.html).

**`Task:`**

1. **Clean the color column:** standardize the values in this column so that variations like 'Blue white', 'Blue White', and 'Blue-white' are treated as the same;
2. **Star type names:** create a new column where the star type is represented as a full string instead of a number;
3. **Convert spectral class to numbers:** add a new column where the spectral class is represented by numbers, using the following mapping:
	- O → 0,
	- B → 1,
	- A → 2,
	- F → 3,
	- G → 4,
	- K → 5,
	- M → 6;
4. **Count the number of stars:** for each star color, star type, and spectral class, calculate the number of stars;
5. **Star type analysis:** for each star type, find the minimum, average, and maximum values of the absolute magnitude;
6. **Spectral class analysis:** for each spectral class, find the minimum, average, and maximum values of the temperature;
7. **Correlation analysis:** compute pairwise correlations between all numerical columns.
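
Steps 5 to 7 all come down to a grouped aggregation or a correlation matrix. The snippet below is a minimal sketch of that pattern on a hypothetical four-row table (`toy` and its values are invented for illustration and are not taken from stars.csv); only the column names match the dataset.

```python
import pandas as pd

# Hypothetical miniature table; the numbers are illustrative only.
toy = pd.DataFrame({
    'Star type': [0, 0, 5, 5],
    'Absolute magnitude(Mv)': [16.0, 18.0, -10.0, -8.0],
    'Temperature (K)': [3000, 3200, 3500, 30000],
})

# Step 5 pattern: min, mean and max of one column within each group.
magnitude_stats = (
    toy.groupby('Star type')['Absolute magnitude(Mv)']
       .agg(['min', 'mean', 'max'])
)

# Step 7 pattern: pairwise Pearson correlations between numeric columns.
correlations = toy.corr(numeric_only=True)

print(magnitude_stats)
print(correlations)
```

Step 6 would follow the same shape, grouping by `'Spectral Class'` and aggregating `'Temperature (K)'` instead.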


**`Cleaning the color column:`**

In [1]:
import pandas as pd
# Load the dataset, then print column dtypes and non-null counts.
stars = pd.read_csv('stars.csv')
stars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 7 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Temperature (K)         240 non-null    int64  
 1   Luminosity(L/Lo)        240 non-null    float64
 2   Radius(R/Ro)            240 non-null    float64
 3   Absolute magnitude(Mv)  240 non-null    float64
 4   Star type               240 non-null    int64  
 5   Star color              240 non-null    object 
 6   Spectral Class          240 non-null    object 
dtypes: float64(3), int64(2), object(2)
memory usage: 13.3+ KB


In [2]:
# List every distinct raw color label to spot spelling variants.
for value in sorted(stars['Star color'].unique()):
    print(value)

Blue
Blue 
Blue White
Blue white
Blue white 
Blue-White
Blue-white
Orange
Orange-Red
Pale yellow orange
Red
White
White-Yellow
Whitish
Yellowish
Yellowish White
white
yellow-white
yellowish


In [3]:
# Remove leading and trailing whitespace, so that 'Blue ' becomes 'Blue'.
stars['Star color'] = stars['Star color'].str.strip()
for value in sorted(stars['Star color'].unique()):
    print(value)

Blue
Blue White
Blue white
Blue-White
Blue-white
Orange
Orange-Red
Pale yellow orange
Red
White
White-Yellow
Whitish
Yellowish
Yellowish White
white
yellow-white
yellowish


In [4]:
# Map the remaining spelling variants onto one canonical label per color.
replace_color_dict = {
    'Blue-white': 'Blue white',
    'Blue White': 'Blue white',
    'Blue-White': 'Blue white',
    'yellow-white': 'Yellowish white',
    'Yellowish White': 'Yellowish white',
    'white': 'White',
    'yellowish': 'Yellowish',
    'White-Yellow': 'White yellow',
    'Orange-Red': 'Orange red'
}
stars['Star color'] = stars['Star color'].replace(replace_color_dict)
for value in sorted(stars['Star color'].unique()):
    print(value)

Blue
Blue white
Orange
Orange red
Pale yellow orange
Red
White
White yellow
Whitish
Yellowish
Yellowish white
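
As an alternative to listing every variant by hand, the purely typographic differences (whitespace, hyphens, letter case) could also be removed with chained string methods. The sketch below assumes a small hypothetical series `raw` holding a few of the labels listed earlier; it is not the approach used in this notebook.

```python
import pandas as pd

# Hypothetical sample of raw labels, copied from the listing above.
raw = pd.Series(['Blue ', 'Blue White', 'Blue white ', 'Blue-White',
                 'Blue-white', 'white', 'Orange-Red', 'yellowish'])

# Strip outer whitespace, turn hyphens into spaces, then capitalize
# (first letter upper case, all remaining letters lower case).
normalized = (
    raw.str.strip()
       .str.replace('-', ' ', regex=False)
       .str.capitalize()
)

print(sorted(normalized.unique()))
```

This only handles spelling variants. Labels that differ in wording, such as 'yellow-white' and 'Yellowish White', would still come out as two different values and need an explicit mapping like the dictionary above.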


**`Star type names:`**

In [5]:
stars['Star type'].unique()

array([0, 1, 2, 3, 4, 5])

In [6]:
# Numeric star type code -> full name.
replace_type_dict = {
    0: 'Red Dwarf',
    1: 'Brown Dwarf',
    2: 'White Dwarf',
    3: 'Main Sequence',
    4: 'Super Giants',
    5: 'Hyper Giants'
}
# Every code 0-5 has an entry, so map() fills the new column completely.
stars['Star type names'] = stars['Star type'].map(replace_type_dict)
stars['Star type names'].unique()

array(['Red Dwarf', 'Brown Dwarf', 'White Dwarf', 'Main Sequence',
       'Super Giants', 'Hyper Giants'], dtype=object)

In [7]:
stars.sample(n=10)

| | Temperature (K) | Luminosity(L/Lo) | Radius(R/Ro) | Absolute magnitude(Mv) | Star type | Star color | Spectral Class | Star type names |
|---|---|---|---|---|---|---|---|---|
| 223 | 23440 | 537430.0 | 81.0 | -5.975 | 4 | Blue | O | Super Giants |
| 32 | 15276 | 1136.0 | 7.2 | -1.97 | 3 | Blue white | B | Main Sequence |
| 239 | 37882 | 294903.0 | 1783.0 | -7.8 | 5 | Blue | O | Hyper Giants |
| 85 | 9675 | 0.00045 | 0.0109 | 13.98 | 2 | Blue white | A | White Dwarf |
| 236 | 30839 | 834042.0 | 1194.0 | -10.63 | 5 | Blue | O | Hyper Giants |
| 37 | 6380 | 1.35 | 0.98 | 2.93 | 3 | Yellowish white | F | Main Sequence |
| 237 | 8829 | 537493.0 | 1423.0 | -10.73 | 5 | White | A | Hyper Giants |
| 49 | 33750 | 220000.0 | 26.0 | -6.1 | 4 | Blue | B | Super Giants |
| 221 | 12749 | 332520.0 | 76.0 | -7.02 | 4 | Blue | O | Super Giants |
| 88 | 13720 | 0.00018 | 0.00892 | 12.97 | 2 | White | F | White Dwarf |


**`Converting spectral class to numbers:`**

In [8]:
stars['Spectral Class'].unique()

array(['M', 'B', 'A', 'F', 'O', 'K', 'G'], dtype=object)

In [9]:
replace_class_dict = {
  'M': 6,
  'B': 1,
  'A': 2,
  'F': 3,
  'O': 0,
  'K': 5,
  'G': 4
}
stars['Spectral Class numbers'] = stars['Spectral Class'].map(replace_class_dict)
stars['Spectral Class numbers'].unique()

array([6, 1, 2, 3, 0, 5, 4])
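
`Series.map` returns NaN for any value that has no key in the dictionary, so it can be worth checking for unmapped labels after a conversion like this one. The sketch below uses a hypothetical series `classes` containing a deliberately unknown label `'X'` and a shortened, hypothetical `mapping`; neither comes from stars.csv.

```python
import pandas as pd

# Hypothetical input with one label ('X') that is absent from the mapping.
classes = pd.Series(['O', 'M', 'X'])
mapping = {'O': 0, 'M': 6}

# Labels without a dictionary entry become NaN after map().
numbers = classes.map(mapping)

# Collect the original labels that were left unmapped.
unmapped = classes[numbers.isna()].unique()

print(numbers.isna().sum())
print(list(unmapped))
```

An empty `unmapped` result means the dictionary covers every label in the column.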

In [10]:
stars.sample(n=10)

| | Temperature (K) | Luminosity(L/Lo) | Radius(R/Ro) | Absolute magnitude(Mv) | Star type | Star color | Spectral Class | Star type names | Spectral Class numbers |
|---|---|---|---|---|---|---|---|---|---|
| 125 | 3225 | 0.00076 | 0.121 | 19.63 | 0 | Red | M | Red Dwarf | 6 |
| 142 | 18290 | 0.0013 | 0.00934 | 12.78 | 2 | Blue | B | White Dwarf | 1 |
| 196 | 3142 | 0.00132 | 0.258 | 14.12 | 1 | Red | M | Brown Dwarf | 6 |
| 110 | 3459 | 100000.0 | 1289.0 | -10.7 | 5 | Red | M | Hyper Giants | 6 |
| 85 | 9675 | 0.00045 | 0.0109 | 13.98 | 2 | Blue white | A | White Dwarf | 2 |
| 89 | 19860 | 0.0011 | 0.0131 | 11.34 | 2 | Blue | B | White Dwarf | 1 |
| 1 | 3042 | 0.0005 | 0.1542 | 16.6 | 0 | Red | M | Red Dwarf | 6 |
| 73 | 3150 | 0.0088 | 0.35 | 11.94 | 1 | Red | M | Brown Dwarf | 6 |
| 215 | 32460 | 173800.0 | 6.237 | -4.36 | 3 | Blue | O | Main Sequence | 0 |
| 66 | 2945 | 0.00032 | 0.093 | 18.34 | 0 | Red | M | Red Dwarf | 6 |


In [11]:
stars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Temperature (K)         240 non-null    int64  
 1   Luminosity(L/Lo)        240 non-null    float64
 2   Radius(R/Ro)            240 non-null    float64
 3   Absolute magnitude(Mv)  240 non-null    float64
 4   Star type               240 non-null    int64  
 5   Star color              240 non-null    object 
 6   Spectral Class          240 non-null    object 
 7   Star type names         240 non-null    object 
 8   Spectral Class numbers  240 non-null    int64  
dtypes: float64(3), int64(3), object(3)
memory usage: 17.0+ KB


**`Counting the number of stars:`**

In [12]:
print(stars['Star type'].unique())
print(stars['Star color'].unique())
print(stars['Spectral Class'].unique())

[0 1 2 3 4 5]
['Red' 'Blue white' 'White' 'Yellowish white' 'Pale yellow orange' 'Blue'
 'Whitish' 'Orange' 'White yellow' 'Yellowish' 'Orange red']
['M' 'B' 'A' 'F' 'O' 'K' 'G']


In [13]:
print(f"{stars['Star type'].value_counts()}\n")
print(f"{stars['Star color'].value_counts().sort_index()}\n")
print(stars['Spectral Class'].value_counts().sort_index())

Star type
0    40
1    40
2    40
3    40
4    40
5    40
Name: count, dtype: int64

Star color
Blue                   56
Blue white             41
Orange                  2
Orange red              1
Pale yellow orange      1
Red                   112
White                  10
White yellow            1
Whitish                 2
Yellowish               3
Yellowish white        11
Name: count, dtype: int64

Spectral Class
A     19
B     46
F     17
G      1
K      6
M    111
O     40
Name: count, dtype: int64
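
The counts above treat each column separately. If the task is instead read as asking for counts per combination of type, color and class, a `groupby` over all three columns followed by `size()` is one way to get them. The sketch below runs on a hypothetical five-row table `toy` with invented values, not on stars.csv.

```python
import pandas as pd

# Hypothetical miniature of the cleaned table; values are illustrative only.
toy = pd.DataFrame({
    'Star type': [0, 0, 2, 2, 5],
    'Star color': ['Red', 'Red', 'Blue', 'White', 'Blue'],
    'Spectral Class': ['M', 'M', 'B', 'A', 'O'],
})

# Number of rows for each observed (type, color, class) combination.
combo_counts = (
    toy.groupby(['Star type', 'Star color', 'Spectral Class'])
       .size()
       .reset_index(name='count')
)

print(combo_counts)
```

Only combinations that actually occur in the data appear in the result, so the `count` column always sums to the number of rows.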
