# **Introduction to Exploratory Data Analysis (EDA)**

# ***What is EDA?***
#### Exploratory Data Analysis (EDA) is the process of investigating datasets to uncover patterns, anomalies, and other insights. It's a critical step in data science and machine learning, where you use various tools and techniques to understand the data's structure and characteristics before applying any modeling techniques.

# **Key Objectives of EDA**

### Summarize Main Characteristics: Use descriptive statistics to summarize data.
### Identify Patterns: Detect trends and relationships between variables.
### Spot Anomalies: Find outliers and unusual observations.
### Test Hypotheses: Use visualizations and summary statistics to check assumptions about the data, such as normality or linearity.
### Prepare Data for Modeling: Clean and preprocess data for machine learning algorithms.

## **Common EDA Techniques**
### Descriptive Statistics: Measures of central tendency (mean, median, mode) and measures of dispersion (range, variance, standard deviation).
### Data Visualization: Charts and graphs such as histograms, box plots, scatter plots, and heatmaps.
### Data Cleaning: Handling missing values and outliers, and correcting data types.
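The descriptive statistics and outlier handling listed above can be sketched with pandas. This is a minimal illustration, not a prescribed workflow: the `values` series is made-up sample data, and the 1.5 × IQR rule is one common convention for flagging outliers, chosen here as an assumption.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
values = pd.Series([12, 15, 14, 13, 15, 16, 14, 95], name="score")

# Measures of central tendency
mean = values.mean()
median = values.median()
mode = values.mode().tolist()  # a series can have more than one mode

# Measures of dispersion
value_range = values.max() - values.min()
variance = values.var()  # sample variance (pandas defaults to ddof=1)
std_dev = values.std()   # sample standard deviation

# Flag outliers with the 1.5 * IQR rule (an assumed convention)
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
outliers = values[is_outlier]

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, variance={variance:.2f}, std={std_dev:.2f}")
print("outliers:", outliers.tolist())
```

Here the single extreme value pulls the mean well above the median, which is the kind of signal that often prompts a closer look at outliers.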

## **Libraries and Modules Used for EDA**
### ***Pandas***
#### Pandas is a Python library for data manipulation and analysis. It provides two core data structures, DataFrame and Series, which are well suited to handling and analyzing structured (tabular) data.

### ***Common Functions***
### Read Data: pd.read_csv('file.csv'), pd.read_excel('file.xlsx')
### Data Inspection: df.head(), df.tail(), df.info(), df.describe()
### Handling Missing Values: df.isnull(), df.fillna(), df.dropna()
### Data Grouping: df.groupby('column')
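As a rough sketch of how these functions fit together, the example below builds a small DataFrame in memory instead of reading a file, so it runs without any external data. The column names `region` and `temp` and their values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data; in practice this would come from pd.read_csv(...)
df = pd.DataFrame({
    "region": ["north", "south", "north", "south", "east"],
    "temp": [31.0, np.nan, 29.0, 35.0, 33.0],
})

# Data inspection
print(df.head())      # first rows
df.info()             # column dtypes and non-null counts
print(df.describe())  # summary statistics for numeric columns

# Handling missing values
missing_per_column = df.isnull().sum()            # count of NaN per column
filled = df.fillna({"temp": df["temp"].mean()})   # impute with the column mean
dropped = df.dropna()                             # or drop incomplete rows

# Data grouping: mean temperature per region
mean_by_region = filled.groupby("region")["temp"].mean()
print(mean_by_region)
```

Whether to fill or drop missing values depends on the dataset; mean imputation is used here only because it is simple to show.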

## ***NumPy***
### NumPy provides support for arrays and matrices along with a collection of mathematical functions to operate on these data structures.

### ***Common Functions***
1. Array Creation: np.array([1, 2, 3])
2. Basic Operations: np.mean(array), np.median(array), np.std(array)
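A short sketch of these calls on a made-up array follows. One detail worth noting: `np.std` computes the population standard deviation by default (`ddof=0`), whereas pandas' `Series.std` defaults to the sample version (`ddof=1`), so the two can disagree on the same data.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
array = np.array([1, 2, 3, 4, 10])

mean = np.mean(array)
median = np.median(array)
population_std = np.std(array)      # ddof=0 is NumPy's default
sample_std = np.std(array, ddof=1)  # matches the pandas default

print(mean, median, population_std, sample_std)
```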

## ***Matplotlib***
### Matplotlib is a plotting library used for creating static, animated, and interactive visualizations.

### ***Common Functions***
### Plotting: plt.plot(), plt.scatter(), plt.hist(), plt.bar(), plt.boxplot()
### Customization: plt.title(), plt.xlabel(), plt.ylabel(), plt.legend()
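The plotting and customization calls above might be combined as in the following sketch. The data is randomly generated for illustration, the output file name `eda_matplotlib.png` is arbitrary, and the `Agg` backend is selected only so the script runs without a display; in a notebook you would typically omit that line and call `plt.show()` instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: y depends roughly linearly on x, plus noise
rng = np.random.default_rng(seed=0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)

fig = plt.figure(figsize=(10, 4))

# Left panel: distribution of a single variable
plt.subplot(1, 2, 1)
plt.hist(x, bins=20)
plt.title("Distribution of x")
plt.xlabel("x")
plt.ylabel("count")

# Right panel: relationship between two variables
plt.subplot(1, 2, 2)
plt.scatter(x, y, s=10, label="samples")
plt.title("y against x")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()

plt.tight_layout()
fig.savefig("eda_matplotlib.png")
```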

## ***Seaborn***

### Seaborn is built on top of Matplotlib and provides a high-level interface for drawing attractive statistical graphics.

### ***Common Functions***

### Plotting: sns.heatmap(), sns.pairplot(), sns.histplot() (replaces sns.distplot(), which has been deprecated since seaborn 0.11), sns.countplot()

## ***SciPy***

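### SciPy is a library for scientific and technical computing. For EDA, its scipy.stats module provides descriptive statistics, probability distributions, and statistical tests.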

### ***Common Functions***

### Statistics: scipy.stats.describe(data), scipy.stats.linregress(x, y)
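The two functions above can be tried on a small made-up dataset, as sketched here. `describe` returns the number of observations, min/max, mean, variance, skewness, and kurtosis; `linregress` fits a simple least-squares line and reports its slope, intercept, and correlation coefficient.

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Descriptive summary of a single variable
summary = stats.describe(y)
print(summary.nobs, summary.minmax, summary.mean, summary.variance)

# Simple linear regression of y on x
fit = stats.linregress(x, y)
print(f"slope={fit.slope:.2f}, intercept={fit.intercept:.2f}, r={fit.rvalue:.3f}")
```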

## ***Plotly***

### Plotly is a graphing library for building interactive plots that support hovering, zooming, and panning in a browser or notebook.

### ***Common Functions***

### Plotting: plotly.express.scatter(), plotly.express.line(), plotly.express.bar(), plotly.express.imshow() and plotly.express.density_heatmap() for heatmaps, plotly.express.box()