**Source - Asteroid Light Curve Database 
https://sbn.psi.edu/pds/resource/lc.html**

**The above database consists of multiple files. The following observations are based on the summary file (LC Summary file) from the database.**

In [None]:
import pandas as pd
import numpy as np

In [None]:
pd.set_option('display.max_columns', 1000)
pd.set_option('display.width', 2000)

In [None]:
# Line 16 of the file is the header; skip the preamble before it (lines 0-15) and the five lines after it (17-21)
LCSummary = pd.read_csv('lc_summary.csv', skiprows=list(range(16)) + list(range(17, 22)))
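The `skiprows` argument above keeps line 16 of the file as the header and skips lines 0 to 15 and 17 to 21. The sketch below reproduces that on a small in-memory CSV. The preamble and separator lines are made-up stand-ins, because the real layout of `lc_summary.csv` is assumed here. It also shows that passing a callable to `skiprows` gives the same result.

```python
import io

import pandas as pd

# Made-up stand-in for lc_summary.csv (assumed layout): 16 preamble lines,
# one header line (line 16), 5 non-data lines (17-21), then the data rows.
preamble = [f"preamble line {i}" for i in range(16)]
header = ["Number,Name,Diam"]
extra = ["----,----,----"] * 5
data = ["1,Alpha,100.0", "2,Beta,50.0"]
text = "\n".join(preamble + header + extra + data) + "\n"

# Skip every line before the header and the five lines after it.
rows_to_skip = list(range(16)) + list(range(17, 22))
df_list = pd.read_csv(io.StringIO(text), skiprows=rows_to_skip)

# Equivalent callable form: skip line i if it comes before line 22 and is not the header.
df_callable = pd.read_csv(io.StringIO(text), skiprows=lambda i: i < 22 and i != 16)

print(df_list)
```

The callable form avoids writing out long index lists and is easier to adjust if the preamble length changes.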


In [None]:
#Listing selected columns
print(LCSummary[['Number','Name','Family','Diam','Class']])

In [None]:
LCSummary.isnull().any()

All fields except 'Notes' have non-null values.
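`isnull().any()` only reports whether a column has any gaps. To see how many values are missing in each column, sum the boolean mask instead. The small frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for LCSummary; only 'Notes' has gaps here.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Diam": [10.0, 2.5, 0.3],
    "Notes": [np.nan, "binary", np.nan],
})

has_nulls = df.isnull().any()   # True/False per column
null_counts = df.isnull().sum()  # number of missing values per column
print(null_counts[null_counts > 0])
```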

In [None]:
# How many rows are in the dataset?
len(LCSummary)


In [None]:
# Look up the details for the asteroid 'Ceres'
LCSummary.loc[LCSummary['Name']=='Ceres']

In [None]:
#List all the collisional family types.
LCSummary.Family.unique()

**Trans-Neptunian objects (TNOs) are not considered asteroids (per Wikipedia), so we can eliminate the 'TNO' family.**

In [None]:
# Drop the records that belong to the 'TNO' family
tnofam= LCSummary[LCSummary['Family']=='TNO']
LCSummary = LCSummary.drop(tnofam.index, axis=0)
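Dropping by `tnofam.index` removes rows by label, so it relies on the index labels being unique. That holds for the default index that `read_csv` creates. A boolean mask does the same filtering without depending on the index. This is a sketch on made-up rows, and the family codes other than 'TNO' are hypothetical. The `.copy()` keeps later column assignments, such as the class clean-up, from raising `SettingWithCopyWarning`.

```python
import pandas as pd

# Made-up rows; 'FAM1' and 'FAM2' are hypothetical family codes.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Family": ["FAM1", "TNO", "FAM2", "TNO"],
})

# Keep rows whose family is not 'TNO'; .copy() returns an independent frame.
no_tno = df[df["Family"] != "TNO"].copy()
no_tno["Flag"] = True  # assigning to the copy leaves df untouched
print(no_tno)
```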

In [None]:
# Look for rows with a zero or negative diameter
print(LCSummary[LCSummary['Diam']<=0][['Name','Albedo','Diam','H']])

In [None]:
# Rows with a non-positive diameter have no valid Albedo or H either, so their diameter cannot be calculated.
# Drop rows with an invalid diameter or albedo.
invdiam = LCSummary[(LCSummary['Diam'] <= 0) | (LCSummary['Albedo'] < 0)]
LCSummary = LCSummary.drop(invdiam.index, axis=0)
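The comment above notes that without a valid albedo and H the diameter cannot be calculated. The relation commonly used for that is D = 1329 km / sqrt(p_V) * 10^(-H/5), where p_V is the geometric albedo and H the absolute magnitude. This is a sketch that assumes the `Albedo` and `H` columns hold those quantities and that negative values are placeholders for unknown data. `diameter_from_h_albedo` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def diameter_from_h_albedo(h, albedo):
    """Diameter in km from absolute magnitude H and geometric albedo p_V.

    Uses D = 1329 / sqrt(p_V) * 10**(-H / 5). Non-positive albedos are
    treated as invalid placeholders (an assumption) and give NaN.
    """
    h = np.asarray(h, dtype=float)
    albedo = np.asarray(albedo, dtype=float)
    valid = albedo > 0
    safe_albedo = np.where(valid, albedo, 1.0)  # avoid sqrt of negative values
    d = 1329.0 / np.sqrt(safe_albedo) * 10 ** (-h / 5.0)
    return np.where(valid, d, np.nan)


# Made-up rows; the second has a placeholder albedo, so no diameter can be derived.
df = pd.DataFrame({"H": [15.0, 12.0], "Albedo": [0.25, -1.0]})
df["Diam_est"] = diameter_from_h_albedo(df["H"], df["Albedo"])
print(df)
```

For H = 15 and p_V = 0.25 this gives 1329 / 0.5 * 10^-3, about 2.66 km.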

In [None]:
# Sorting the rows based on descending values of diameter.
Sort_size = LCSummary.sort_values(by=['Diam'], ascending=False)

In [None]:
# Look at the 10 biggest asteroids
print(Sort_size.head(10)[['Number','Name','Family','Diam']])

Per the output above, the largest asteroid is Ceres, followed by Pallas.

In [None]:
# Look at the 10 smallest asteroids
print(Sort_size.tail(10)[['Number','Name','Family','Diam']])


Per the output above, the smallest asteroid is 2006 RH120.
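`nlargest` and `nsmallest` pick the top or bottom n rows by a column directly. Note that `tail(10)` of a descending sort lists the smallest asteroids from largest to smallest, whereas `nsmallest` returns them smallest first. A sketch on made-up diameters:

```python
import pandas as pd

# Made-up names and diameters (km), for illustration only.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Diam": [900.0, 0.005, 500.0, 12.0, 0.5],
})

biggest = df.nlargest(3, "Diam")    # largest first
smallest = df.nsmallest(3, "Diam")  # smallest first
print(biggest)
print(smallest)
```

On the real data this would be `LCSummary.nlargest(10, 'Diam')` and `LCSummary.nsmallest(10, 'Diam')`.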

In [None]:
# Now group the asteroids by the class they belong to.
# First, find the unique classes
Aclass = LCSummary.Class.unique()
np.sort(Aclass)

In [None]:
# Some classes appear in more than one form (e.g. 'C:' and 'C') or differ only in letter case ('Sq' becomes 'SQ'), so merge the variants
class_fixes = {'C:':'C','CB:':'CB','CBU:':'CBU','CP:':'CP','CX:':'CX','DCX:':'DCX','DTU:':'DTU','DU:':'DU','DX:':'DX','F:':'F','FC:':'FC','FCX:':'FCX','G:':'G','MU:':'MU','P:':'P','Sq':'SQ','XD:':'XD','CFU:':'CFU','Xk':'XK','DSU:':'DSU','Cgh':'CGH','FX:':'FX','GS:':'GS'}
LCSummary['Class'] = LCSummary['Class'].replace(to_replace=class_fixes, regex=True)
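With `regex=True`, each key in the replacement dict is a pattern that can match inside a longer class string. For example, 'C:' also matches the end of 'FC:'. An alternative sketch strips any trailing ':' and then applies the three case fixes as exact, whole-value replacements. This assumes a trailing colon is the only variant besides the case differences. It would also normalize colon-suffixed classes missing from the dict above, so its output can differ from the cell above.

```python
import pandas as pd

# Made-up class labels mixing the variants handled in the cell above.
classes = pd.Series(["C:", "C", "CX:", "Sq", "Xk", "Cgh", "S"])

# Case fixes taken from the replacement dict above.
case_fixes = {"Sq": "SQ", "Xk": "XK", "Cgh": "CGH"}

# Drop trailing colons, then apply the case fixes as exact whole-value matches.
cleaned = classes.str.rstrip(":").replace(case_fixes)
print(sorted(cleaned.unique()))
```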

In [None]:
# Check the cleaned-up list of classes
LCSummary.Class.unique()

In [None]:
# Number of asteroids in each class
countbyclass = LCSummary.groupby('Class')['Name'].count()
print(countbyclass)
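`value_counts` gives the same per-class tally in one call. The two agree here because `Name` has no missing values: `groupby(...).count()` skips nulls in the counted column, whereas `value_counts` counts rows. Made-up data:

```python
import pandas as pd

# Made-up names and classes, for illustration only.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Class": ["C", "S", "C", "X"],
})

by_groupby = df.groupby("Class")["Name"].count()
by_value_counts = df["Class"].value_counts().sort_index()
print(by_value_counts)
```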

In [None]:
# Mean asteroid diameter (km) for each class
sizebyclass = LCSummary.groupby('Class')['Diam'].mean().reset_index(name='Mean')
sizebyclass_a = sizebyclass.sort_values(by='Mean', ascending=True)
print(sizebyclass_a)
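A class mean backed by one or two asteroids is far less informative than one backed by hundreds. Since a few very large bodies can also pull a mean up, it can help to report the count and the median alongside the mean. A sketch with made-up values:

```python
import pandas as pd

# Made-up classes and diameters (km), for illustration only.
df = pd.DataFrame({
    "Class": ["C", "S", "C", "X"],
    "Diam": [100.0, 20.0, 50.0, 5.0],
})

# Count, mean and median diameter per class, sorted by the mean.
stats = (
    df.groupby("Class")["Diam"]
    .agg(["count", "mean", "median"])
    .sort_values("mean")
)
print(stats)
```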

In [None]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline


Relationship between rotation period (hours) and asteroid diameter (km)

In [None]:
_ = LCSummary.plot.scatter(x='Diam', y='Period')

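As the scatter plot shows, the biggest asteroids have comparatively short rotation periods, while the longest periods belong to smaller asteroids. The relation is not a simple inverse one, however: small asteroids span the widest range of periods, from the fastest rotators to the slowest.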
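To put a number on the trend in the scatter plot, a rank correlation suits data that spans several orders of magnitude. Spearman's rho is the Pearson correlation of the ranks, which is what the sketch computes. The values are made up, so the result says nothing about the real data.

```python
import pandas as pd

# Made-up diameters (km) and rotation periods (h), for illustration only.
df = pd.DataFrame({
    "Diam": [0.1, 1.0, 10.0, 100.0, 500.0],
    "Period": [30.0, 8.0, 6.0, 9.0, 7.5],
})

# Spearman's rho: Pearson correlation of the (average) ranks of each column.
rho = df["Diam"].rank().corr(df["Period"].rank())
print(rho)
```

On the real data this would be `LCSummary['Diam'].rank().corr(LCSummary['Period'].rank())`. For the plot itself, passing `loglog=True` to `LCSummary.plot.scatter` should spread out the points crowded near the origin.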