# Contrails Detection Analysis

This notebook explores a synthetic dataset of randomly generated values that simulate atmospheric conditions affecting contrail formation and visibility. It uses descriptive statistics to summarize altitude, temperature, relative humidity, contrail persistence, and sky coverage.

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Seed for reproducibility
np.random.seed(42)

# Generate sample data
data = {
    "ObservationID": range(1, 101),
    "Altitude": np.random.randint(30000, 40000, 100),  # Contrails typically form at high altitudes
    "Temperature": np.random.randint(-50, -20, 100),  # Very cold temperatures at high altitudes
    "RelativeHumidity": np.random.randint(60, 100, 100),  # Higher humidity needed for contrail formation
    "ContrailPersistence": np.random.randint(1, 60, 100),  # Persistence can vary widely
    "SkyCoverage": np.random.randint(0, 100, 100)  # Percentage of sky covered by contrails
}

df = pd.DataFrame(data)

df.shape

(100, 6)
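
One detail of the generation step is worth checking: `np.random.randint(low, high, size)` excludes `high`, so `SkyCoverage` can never reach 100 and `RelativeHumidity` tops out at 99. The sketch below assumes inclusive bounds are wanted and uses NumPy's newer `Generator.integers` with `endpoint=True`. The variable names are illustrative, and the values it draws will differ from those in this notebook.

```python
import numpy as np

# Legacy API, as used above: the upper bound is exclusive.
np.random.seed(42)
legacy = np.random.randint(0, 100, 10_000)

# Generator API with endpoint=True: the upper bound is inclusive.
rng = np.random.default_rng(42)
inclusive = rng.integers(0, 100, size=10_000, endpoint=True)

print(legacy.max())     # never exceeds 99
print(inclusive.max())  # may reach 100
```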

In [10]:
df.head()

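| | ObservationID | Altitude | Temperature | RelativeHumidity | ContrailPersistence | SkyCoverage |
|---|---|---|---|---|---|---|
| 0 | 1 | 37270 | -23 | 83 | 4 | 49 |
| 1 | 2 | 30860 | -44 | 74 | 33 | 24 |
| 2 | 3 | 35390 | -42 | 91 | 14 | 23 |
| 3 | 4 | 35191 | -43 | 91 | 21 | 12 |
| 4 | 5 | 35734 | -39 | 83 | 48 | 59 |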


In [11]:
df.describe()

| | ObservationID | Altitude | Temperature | RelativeHumidity | ContrailPersistence | SkyCoverage |
|---|---|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 50.5 | 35177.22 | -35.45 | 80.13 | 31.97 | 49.5 |
| std | 29.011492 | 2869.564753 | 9.712405 | 12.24304 | 15.414446 | 28.733361 |
| min | 1.0 | 30064.0 | -50.0 | 60.0 | 2.0 | 0.0 |
| 25% | 25.75 | 32674.25 | -44.0 | 70.0 | 22.0 | 21.75 |
| 50% | 50.5 | 35428.0 | -37.0 | 81.5 | 33.0 | 54.0 |
| 75% | 75.25 | 37655.25 | -26.0 | 91.0 | 46.0 | 68.0 |
| max | 100.0 | 39998.0 | -21.0 | 99.0 | 59.0 | 98.0 |


### **Mean**

The mean provides an average value for our data, offering insights into the central tendency of each variable.


In [12]:
print("Mean values:")
df.mean()

Mean values:


ObservationID             50.50
Altitude               35177.22
Temperature              -35.45
RelativeHumidity          80.13
ContrailPersistence       31.97
SkyCoverage               49.50
dtype: float64

### **Median**

The median gives us the middle value when our data is ordered, which can be more robust to outliers than the mean.


In [13]:
print("Median values:")
df.median()

Median values:


ObservationID             50.5
Altitude               35428.0
Temperature              -37.0
RelativeHumidity          81.5
ContrailPersistence       33.0
SkyCoverage               54.0
dtype: float64
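
To illustrate the robustness claim, here is a minimal sketch on a hypothetical handful of persistence readings (not taken from the dataset above). Adding one extreme value moves the mean a long way while the median barely shifts.

```python
import pandas as pd

# Hypothetical persistence readings; 600 stands in for a data-entry outlier.
persistence = pd.Series([28, 30, 33, 35, 36])
with_outlier = pd.concat([persistence, pd.Series([600])], ignore_index=True)

print(persistence.mean(), persistence.median())    # 32.4 33.0
print(with_outlier.mean(), with_outlier.median())  # 127.0 34.0
```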

### **Mode**

The mode is the most frequently occurring value in each column. It is most informative for categorical or heavily repeated data; in a column where every value is unique, such as `ObservationID`, every value ties as a mode.


In [14]:
# Calculate the mode of each column.
# df.mode() returns one row per tied mode, sorted ascending; .loc[0] keeps
# only the smallest mode in each column and discards any ties.
mode_values = df.mode().loc[0]
print("Mode values:")
mode_values


Mode values:


ObservationID              1.0
Altitude               30064.0
Temperature              -21.0
RelativeHumidity          91.0
ContrailPersistence       33.0
SkyCoverage               57.0
Name: 0, dtype: float64
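
Because `.loc[0]` keeps only the first of any tied modes, it can help to look at the ties directly. The sketch below uses a small hypothetical column (not the dataset above) to show how `Series.mode()` and `Series.value_counts()` expose them.

```python
import pandas as pd

# Hypothetical column with a tie: 91 and 83 both appear twice.
humidity = pd.Series([91, 83, 91, 83, 74], name="RelativeHumidity")

modes = humidity.mode()           # all tied modes, sorted ascending
counts = humidity.value_counts()  # how often each value occurs

print(modes.tolist())  # [83, 91]
print(counts.max())    # 2
```

Checking `counts.max()` first is a quick way to see whether a mode is meaningful at all: a maximum count of 1 means every value is unique.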

### **Variance**

Variance measures spread as the average squared deviation from the mean, so it is expressed in squared units of the original variable. By default, pandas computes the sample variance, dividing by n - 1.


In [15]:
print("Variance values:")
df.var()

Variance values:


ObservationID          8.416667e+02
Altitude               8.234402e+06
Temperature            9.433081e+01
RelativeHumidity       1.498920e+02
ContrailPersistence    2.376052e+02
SkyCoverage            8.256061e+02
dtype: float64
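
As a sketch of what `df.var()` computes, the block below rebuilds the `Altitude` column with the same seed and works out the variance by hand. The names `sample_var` and `population_var` are illustrative; the difference between them is the divisor, which pandas controls through the `ddof` argument.

```python
import numpy as np
import pandas as pd

# Same seed and same first random call as the notebook's Altitude column.
np.random.seed(42)
altitude = pd.Series(np.random.randint(30000, 40000, 100))

n = len(altitude)
squared_dev = (altitude - altitude.mean()) ** 2

sample_var = squared_dev.sum() / (n - 1)  # what Series.var() returns (ddof=1)
population_var = squared_dev.sum() / n    # same as Series.var(ddof=0)

print(sample_var, altitude.var())
print(population_var, altitude.var(ddof=0))
```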

### **Standard Deviation**

Standard deviation is the square root of the variance. Because it is expressed in the same units as the original variable, it is easier to interpret as a typical distance from the mean.

In [16]:
print("\nStandard Deviation values:")
df.std()


Standard Deviation values:


ObservationID            29.011492
Altitude               2869.564753
Temperature               9.712405
RelativeHumidity         12.243040
ContrailPersistence      15.414446
SkyCoverage              28.733361
dtype: float64
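
Since every column was drawn with `np.random.randint`, the sample standard deviations can be sanity-checked against the theoretical value for a discrete uniform distribution over n integers, which is sqrt((n² - 1) / 12). The helper `discrete_uniform_std` below is a hypothetical name introduced for this sketch.

```python
import math

def discrete_uniform_std(low: int, high: int) -> float:
    """Standard deviation of integers drawn uniformly from [low, high)."""
    n = high - low  # number of distinct values
    return math.sqrt((n * n - 1) / 12)

# Altitude was drawn from [30000, 40000).
print(round(discrete_uniform_std(30000, 40000), 2))  # 2886.75
```

The theoretical 2886.75 is close to the sample value of 2869.56 reported for `Altitude`. With only 100 observations, some gap between the two is expected.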

### **Skewness and Kurtosis**

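Skewness and kurtosis describe the shape of a distribution: skewness measures asymmetry around the mean, and kurtosis measures how heavy the tails are relative to a normal distribution. `scipy.stats.kurtosis` returns excess kurtosis by default (a normal distribution scores 0), so the values near -1.2 below are what uniformly drawn data should produce.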


In [19]:
from scipy.stats import skew, kurtosis

print("Skewness values:")
print(df.apply(skew))

print("\nKurtosis values:")
df.apply(kurtosis)


Skewness values:
ObservationID          0.000000
Altitude              -0.152978
Temperature            0.077802
RelativeHumidity      -0.180743
ContrailPersistence   -0.262596
SkyCoverage           -0.050774
dtype: float64

Kurtosis values:


ObservationID         -1.200240
Altitude              -1.168779
Temperature           -1.427832
RelativeHumidity      -1.253947
ContrailPersistence   -0.797769
SkyCoverage           -1.123127
dtype: float64
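
SciPy's defaults differ from pandas' built-in `skew()` and `kurt()` methods: SciPy reports the biased estimators unless `bias=False` is passed, while pandas applies the small-sample correction. The sketch below rebuilds the `Altitude` column with the same seed to compare the two; the variable names are illustrative.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis

# Same seed and same first random call as the notebook's Altitude column.
np.random.seed(42)
altitude = pd.Series(np.random.randint(30000, 40000, 100))

biased_skew = skew(altitude)                # scipy default: bias=True
unbiased_skew = skew(altitude, bias=False)  # matches Series.skew()

excess_kurt = kurtosis(altitude)                 # Fisher definition: normal scores 0
pearson_kurt = kurtosis(altitude, fisher=False)  # Pearson definition: normal scores 3
unbiased_kurt = kurtosis(altitude, bias=False)   # matches Series.kurt()

print(biased_skew, unbiased_skew, altitude.skew())
print(excess_kurt, unbiased_kurt, altitude.kurt())
```

With 100 observations the biased and unbiased figures differ only slightly, but the choice should be stated when comparing results across libraries.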