# [Covariance](#id1) And [Correlation](#id2)

[Jump straight to the practical session](#id5)

<a id="id1"></a>
## Covariance:


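* **Definition:** Covariance measures how two variables change together. Its sign gives the direction of the relationship; its magnitude depends on the units of the variables, so it is not a standardized measure of strength.
* **Formula:** Cov(X, Y) = Σ[(X_i - μX) * (Y_i - μY)] / (n - 1), where X_i and Y_i are the i-th observations of the variables X and Y, μX and μY are their means, and n is the number of data points. This is the sample covariance.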
* **Example:** If the covariance between stock prices and interest rates is positive, it suggests that when stock prices rise, interest rates tend to rise as well.
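As a minimal sketch of the covariance formula above, the snippet below computes it by hand and compares the result with NumPy's `np.cov`. The `x` and `y` values are made up for illustration and are not taken from any dataset in this document.

```python
import numpy as np

# Illustrative values (assumed, not from the healthexp dataset)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sum of products of deviations from the means, divided by (n - 1)
cov_manual = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# np.cov returns a 2x2 covariance matrix; the off-diagonal entry is Cov(x, y)
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual)  # 1.5
print(cov_numpy)
```

The positive result says that `x` and `y` tend to rise together; the value itself would change if either variable were rescaled.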

![correlation](correlation.jpg)


<a id="id3"></a>

### Pearson Correlation:

* **Definition:** Pearson correlation measures the linear relationship between two continuous variables.


* **Best for:** When both variables have a linear relationship and are normally distributed.


* **Example:** Using Pearson correlation to determine the relationship between a person's age and their cholesterol levels.

<a id="id4"></a>

### Spearman Correlation:

* **Definition:** Spearman correlation assesses the strength and direction of monotonic relationships (not necessarily linear) between two variables.

* **Best for:** When the relationship may not be linear or when there are outliers.

* **Example:** Assessing the correlation between students' rank in a class and their test scores.

### Difference between Pearson and Spearman Correlation:

- Pearson measures linear relationships, while Spearman captures monotonic relationships (linear or not).
- Spearman is less sensitive to outliers and works better with ordinal or non-normally distributed data.
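To make the linear versus monotonic distinction concrete, here is a small sketch on made-up data where `y` is the cube of `x`. The relationship is perfectly monotonic but not linear, so Spearman should come out at 1 while Pearson stays below 1. The sketch also computes Spearman "by hand" as the Pearson correlation of the ranks.

```python
import pandas as pd

# A monotonic but non-linear relationship: y = x cubed (illustrative values)
df = pd.DataFrame({"x": range(1, 11)})
df["y"] = df["x"] ** 3

# Series.corr uses Pearson by default
pearson = df["x"].corr(df["y"])

# Spearman is Pearson applied to the ranks of the values
spearman_by_hand = df["x"].rank().corr(df["y"].rank())
spearman_pandas = df.corr(method="spearman").loc["x", "y"]

print(round(pearson, 3))
print(round(spearman_by_hand, 3), round(spearman_pandas, 3))
```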

### In Which Scenario to Use Each:

- Use Pearson correlation when you expect a linear relationship and the data is normally distributed.
- Use Spearman correlation when the relationship may not be linear or when you have ordinal or non-normally distributed data.
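The effect of an outlier on each coefficient can be sketched as follows. The data is invented: a perfectly linear series, and a copy in which one observation is replaced by an extreme value.

```python
import pandas as pd

# Perfectly linear data, then the same data with one extreme outlier (illustrative values)
clean = pd.DataFrame({"x": range(1, 11), "y": range(1, 11)})
noisy = clean.copy()
noisy.loc[4, "y"] = 100  # corrupt a single observation

pearson_clean = clean.corr(method="pearson").loc["x", "y"]
spearman_clean = clean.corr(method="spearman").loc["x", "y"]
pearson_noisy = noisy.corr(method="pearson").loc["x", "y"]
spearman_noisy = noisy.corr(method="spearman").loc["x", "y"]

print(f"clean:        pearson={pearson_clean:.3f}, spearman={spearman_clean:.3f}")
print(f"with outlier: pearson={pearson_noisy:.3f}, spearman={spearman_noisy:.3f}")
```

On this toy data a single bad point pulls Pearson close to zero, while Spearman drops much less because it only sees the change in rank order.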

### Variance:
* **Definition:** Variance measures the spread or dispersion of a set of data points.
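* **Formula:** Variance (σ^2) = Σ[(X_i - μ)^2] / n, where X_i is a data point, μ is the mean, and n is the number of data points. This is the population variance; the sample variance divides by (n - 1) instead, which matches the covariance formula above.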
* **Difference:** Variance is a measure of the spread within a single variable, while covariance and correlation quantify the relationship between two variables.
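The two denominators (n for a population, n - 1 for a sample) can be compared with NumPy's `ddof` argument. This is a small sketch on made-up values; note that pandas' `var()` and `cov()` use the n - 1 convention by default, while `np.var` uses n.

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # illustrative values

pop_var = np.var(x)             # divides by n (ddof=0, NumPy's default)
sample_var = np.var(x, ddof=1)  # divides by n - 1

print(pop_var)     # 4.0
print(round(sample_var, 4))
```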

### Difference between Variance and Covariance:

- Variance is a measure of the spread within one variable, while covariance measures the joint variability of two variables.
- Variance is always non-negative, while covariance can be positive (variables move in the same direction), negative (variables move in opposite directions), or zero (no linear relationship).

#### What you have learned:

- Covariance and correlation measure the relationship between two variables. Pearson is suitable for linear relationships, and Spearman for monotonic (possibly non-linear) relationships or ordinal data. Variance, on the other hand, quantifies the spread of a single variable.


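* ***Additional Tip***:

    If you replace Y with X in the covariance formula, it becomes the variance formula (with the same n - 1 denominator), i.e. Cov(X, X) = Var(X).

    If you don't believe it, have a look at the diagonal of the covariance matrix in the practical session below.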
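A quick way to check that the covariance of a variable with itself is its variance is to compare the diagonal of a covariance matrix with the column variances. The sketch below uses a tiny made-up DataFrame, not the dataset from the practical session.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]})  # illustrative values

cov_matrix = df.cov()  # sample covariance matrix (divides by n - 1)
variances = df.var()   # sample variance of each column (also n - 1)

# The diagonal entries of the covariance matrix are Cov(x, x) and Cov(y, y)
cov_xx = cov_matrix.loc["x", "x"]
cov_yy = cov_matrix.loc["y", "y"]

print(cov_xx, variances["x"])
print(cov_yy, variances["y"])
```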

<a id="id5"></a>
# Practical Session On Covariance And Correlation:

In [8]:
import pandas as pd
import numpy as np

data = pd.read_csv(r"D:\AI\DATASETS\healthexp.csv")  # raw string, so the backslashes are not treated as escape sequences

data.head()

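|   | Year | Country | Spending_USD | Life_Expectancy |
|---|------|---------|--------------|-----------------|
| 0 | 1970 | Germany | 252.311 | 70.6 |
| 1 | 1970 | France | 192.143 | 72.2 |
| 2 | 1970 | Great Britain | 123.993 | 71.9 |
| 3 | 1970 | Japan | 150.437 | 72.0 |
| 4 | 1970 | USA | 326.961 | 70.9 |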


<a id="id2"></a>
## Correlation:

* **Definition:** Correlation is a standardized measure of the strength and direction of the linear relationship between two variables.

    - Types of correlations are:
        1. [Pearson Correlation](#id3)
        2. [Spearman Correlation](#id4)


* **Formula:** Correlation (ρ) = Cov(X, Y) / (σX * σY), where ρ is the correlation coefficient, Cov(X, Y) is the covariance, and σX and σY are the standard deviations of X and Y.

* **Key Difference:** Correlation is dimensionless and ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation.
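The correlation formula above can be checked numerically. This is a minimal sketch on made-up values: it divides the sample covariance by the product of the sample standard deviations and compares the result with `np.corrcoef`.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])  # illustrative values
y = np.array([2, 4, 5, 4, 5])

cov_xy = np.cov(x, y)[0, 1]  # sample covariance (n - 1)
std_x = np.std(x, ddof=1)    # sample standard deviations, same (n - 1) convention
std_y = np.std(y, ddof=1)

rho_manual = cov_xy / (std_x * std_y)
rho_numpy = np.corrcoef(x, y)[0, 1]

print(round(rho_manual, 4), round(rho_numpy, 4))
```

Because the same denominator appears in the covariance and in both standard deviations, it cancels out, so the n versus n - 1 choice does not affect the correlation as long as it is applied consistently.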

In [21]:
# Finding the covariance using `DataFrame.cov()` method

data.cov()



|   | Year | Spending_USD | Life_Expectancy |
|---|------|--------------|-----------------|
| Year | 201.098848 | 25718.83 | 41.915454 |
| Spending_USD | 25718.827373 | 4817761.0 | 4166.800912 |
| Life_Expectancy | 41.915454 | 4166.801 | 10.733902 |


In [20]:
# The warning from the previous cell indicates that, in a future pandas version, only numeric columns may be passed to this method.
num_cols = data.columns[data.dtypes != "object"]  # collect all numeric column names dynamically
data[num_cols].cov()  # no warning now, and this will keep working in future versions

|   | Year | Spending_USD | Life_Expectancy |
|---|------|--------------|-----------------|
| Year | 201.098848 | 25718.83 | 41.915454 |
| Spending_USD | 25718.827373 | 4817761.0 | 4166.800912 |
| Life_Expectancy | 41.915454 | 4166.801 | 10.733902 |
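Another way to keep only numeric columns is `DataFrame.select_dtypes`. The sketch below rebuilds the five rows shown by `data.head()` in a separate frame named `sample` (a name chosen here for illustration) so that it runs on its own; its covariance values will differ from the full dataset. Recent pandas versions also accept a `numeric_only=True` argument to `cov()` and `corr()`, which should have the same effect.

```python
import pandas as pd

# The five rows shown by data.head(), rebuilt inline so the sketch is self-contained
sample = pd.DataFrame({
    "Year": [1970, 1970, 1970, 1970, 1970],
    "Country": ["Germany", "France", "Great Britain", "Japan", "USA"],
    "Spending_USD": [252.311, 192.143, 123.993, 150.437, 326.961],
    "Life_Expectancy": [70.6, 72.2, 71.9, 72.0, 70.9],
})

numeric = sample.select_dtypes(include="number")  # drops the text column Country
cov_matrix = numeric.cov()

print(cov_matrix.columns.tolist())  # ['Year', 'Spending_USD', 'Life_Expectancy']
```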


In [22]:
data.corr()



|   | Year | Spending_USD | Life_Expectancy |
|---|------|--------------|-----------------|
| Year | 1.0 | 0.826273 | 0.902175 |
| Spending_USD | 0.826273 | 1.0 | 0.57943 |
| Life_Expectancy | 0.902175 | 0.57943 | 1.0 |


In [23]:
# Select only the numeric columns to avoid the warning
# corr() computes the Pearson correlation by default
data[num_cols].corr()

|   | Year | Spending_USD | Life_Expectancy |
|---|------|--------------|-----------------|
| Year | 1.0 | 0.826273 | 0.902175 |
| Spending_USD | 0.826273 | 1.0 | 0.57943 |
| Life_Expectancy | 0.902175 | 0.57943 | 1.0 |


In [26]:
data[num_cols].corr(method='spearman')

|   | Year | Spending_USD | Life_Expectancy |
|---|------|--------------|-----------------|
| Year | 1.0 | 0.931598 | 0.896117 |
| Spending_USD | 0.931598 | 1.0 | 0.747407 |
| Life_Expectancy | 0.896117 | 0.747407 | 1.0 |


Thank you for studying till the end.

[My GitHub][]

[My GitHub]: https://github.com/shaikmaaheed "Have a look at my GitHub"

Written with love by,

[Shaik Maaheed][]

[Shaik Maaheed]: https://www.linkedin.com//in//shaikmaaheed// "Follow me on LinkedIn"
