# Descriptive Statistics in Data Science


Descriptive statistics is one of the foundational tools in data science: it condenses a dataset into a handful of numbers and plots that can be read at a glance. This notebook demonstrates how to use descriptive statistics to summarize data, identify outliers, and prepare for further analysis.



## Why is Descriptive Statistics Important?

1. **Summarizing Data:** Measures of central tendency such as the mean, median, and mode describe the typical value of a variable.
2. **Identifying Outliers:** Box plots and simple rules based on the range or the standard deviation help us spot abnormal or extreme data points.
3. **Understanding Data Distribution:** Standard deviation and variance measure the spread of the data, while skewness and kurtosis describe the shape of its distribution.
4. **Data Preparation:** Before moving on to complex analysis or modeling, basic statistics guide data cleaning and feature selection.
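
As a quick illustration of the first point, the sketch below computes the three measures of central tendency on the same values used for `Feature1` in the sample data further down. The variable names are illustrative assumptions. A single extreme value (150) pulls the mean far away from the median, which is why comparing the two is a common first check.

```python
import pandas as pd

# Same values as Feature1 in the sample data: nine small numbers and one extreme value
values = pd.Series([12, 15, 14, 10, 11, 13, 150, 12, 14, 10])

mean_value = values.mean()      # sensitive to the extreme value
median_value = values.median()  # robust to the extreme value
mode_values = values.mode()     # a Series, because ties produce several modes

print(f"Mean:    {mean_value}")
print(f"Median:  {median_value}")
print(f"Mode(s): {mode_values.tolist()}")
```

Note that `mode()` returns every value tied for the highest count, so the result can contain more than one entry.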


In [None]:

import pandas as pd
import numpy as np

# Sample data
data = {
    'Feature1': [12, 15, 14, 10, 11, 13, 150, 12, 14, 10],
    'Feature2': [10, 15, 12, 12, 14, 13, 11, 13, 15, 14]
}

# Create DataFrame
df = pd.DataFrame(data)

# Displaying the data
df


In [None]:

# Descriptive statistics: count, mean, std, min, quartiles, and max for each numeric column
descriptive_stats = df.describe()
descriptive_stats


In [None]:

import matplotlib.pyplot as plt

# Histogram of Feature1: the single value 150 sits far to the right of the other observations
plt.figure(figsize=(12, 6))
df['Feature1'].hist(bins=10, alpha=0.7)
plt.title('Histogram of Feature1')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()


In [None]:

import seaborn as sns

# Box plots of both features side by side; points beyond the whiskers are drawn individually
plt.figure(figsize=(8, 6))
sns.boxplot(data=df)
plt.title('Boxplot of Features')
plt.show()


In [None]:

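# Skewness and kurtosis for each column
# pandas reports excess kurtosis (Fisher's definition), so a normal distribution scores 0
skewness = df.skew()
kurtosis = df.kurtosis()

# Show both measures side by side, one row per feature
pd.DataFrame({'skewness': skewness, 'kurtosis': kurtosis})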
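
To see how strongly one extreme value drives these shape measures, the following sketch recomputes them for the `Feature1` values with and without the value 150. It is only an illustration: dropping a point here is a way to show its influence, not a recommendation to delete it, and the names `with_extreme`, `without_extreme`, and `comparison` are assumptions introduced for this example.

```python
import pandas as pd

# Feature1 values from the sample data, with and without the extreme value 150
with_extreme = pd.Series([12, 15, 14, 10, 11, 13, 150, 12, 14, 10])
without_extreme = with_extreme[with_extreme != 150]

# Series.skew() and Series.kurt() are the per-column versions of the DataFrame methods
comparison = pd.DataFrame(
    {
        'skewness': [with_extreme.skew(), without_extreme.skew()],
        'excess_kurtosis': [with_extreme.kurt(), without_extreme.kurt()],
    },
    index=['with 150', 'without 150'],
)

print(comparison)
```

With the extreme value included, both measures are strongly positive. Without it, the skewness is close to zero.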


In [None]:

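# Identifying outliers in Feature1: values more than 2 standard deviations from the mean
# (checked on both sides; note that an outlier inflates the mean and the std it is compared against)
feature1 = df['Feature1']
outliers = df[(feature1 - feature1.mean()).abs() > 2 * feature1.std()]
outliers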
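
The mean and standard deviation used in the threshold above are themselves pulled upward by the extreme value. A common alternative, sketched below, is the interquartile range (IQR) rule that box plots conventionally use for their whiskers. This is a swapped-in technique and not the method used in the cell above. The 1.5 multiplier is the conventional choice, and the variable names are assumptions for this example.

```python
import pandas as pd

# Feature1 values from the sample data
feature1 = pd.Series([12, 15, 14, 10, 11, 13, 150, 12, 14, 10], name='Feature1')

# Quartiles and interquartile range
q1 = feature1.quantile(0.25)
q3 = feature1.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles (the conventional multiplier)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Keep only the values that fall outside the fences
iqr_outliers = feature1[(feature1 < lower_fence) | (feature1 > upper_fence)]
print(iqr_outliers)
```

Because quartiles barely move when one value is extreme, the fences stay close to the bulk of the data, and only the value 150 is flagged here.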



## Conclusion

Descriptive statistics give us a fast, reliable first look at our data and guide the analysis and decisions that follow. From measuring central tendency and spread to identifying outliers and describing the shape of a distribution, descriptive statistics is an essential tool in the data scientist's toolkit.
