## Covariance And Correlation

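```python
import seaborn as sns

# Load seaborn's "healthexp" example dataset (fetched from the seaborn-data repository on first use, then cached)
df = sns.load_dataset('healthexp')
df.head()
```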

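|   | Year | Country | Spending_USD | Life_Expectancy |
|---|------|---------|--------------|-----------------|
| 0 | 1970 | Germany | 252.311 | 70.6 |
| 1 | 1970 | France | 192.143 | 72.2 |
| 2 | 1970 | Great Britain | 123.993 | 71.9 |
| 3 | 1970 | Japan | 150.437 | 72.0 |
| 4 | 1970 | USA | 326.961 | 70.9 |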


### Covariance

```python
# Keep only the numeric columns; Country is text and has no covariance
df1 = df[['Year', 'Spending_USD', 'Life_Expectancy']]
df1.cov()
```

|   | Year | Spending_USD | Life_Expectancy |
|---|------|--------------|-----------------|
| Year | 201.098848 | 25718.83 | 41.915454 |
| Spending_USD | 25718.827373 | 4817761.0 | 4166.800912 |
| Life_Expectancy | 41.915454 | 4166.801 | 10.733902 |
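Covariance is expressed in the product of the two variables' units, so its size depends on scale: the `Spending_USD` variance of about 4.8 million dwarfs every other entry, which makes the raw numbers hard to compare. As a minimal sketch of what `cov()` computes, the snippet below applies the sample-covariance formula (sum of the products of deviations from the mean, divided by n - 1, matching pandas' default `ddof=1`) to a small made-up DataFrame named `toy`, used here only so the example runs without downloading healthexp:

```python
import pandas as pd

# Hypothetical toy data standing in for df1 (not taken from healthexp)
toy = pd.DataFrame({
    'x': [1.0, 2.0, 3.0, 4.0, 5.0],
    'y': [2.0, 4.1, 5.9, 8.2, 9.8],
})

def sample_cov(a: pd.Series, b: pd.Series) -> float:
    """Sample covariance: sum((a - mean_a) * (b - mean_b)) / (n - 1)."""
    n = len(a)
    return float(((a - a.mean()) * (b - b.mean())).sum() / (n - 1))

manual = sample_cov(toy['x'], toy['y'])
# The hand-rolled value should agree with pandas' off-diagonal entry
print(manual, toy.cov().loc['x', 'y'])
```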


### Pearson correlation coefficient

When you call `corr()` without specifying `method`, pandas computes the Pearson correlation coefficient by default; passing `method='pearson'` below simply makes that explicit.

```python
df1.corr(method='pearson')
```

|   | Year | Spending_USD | Life_Expectancy |
|---|------|--------------|-----------------|
| Year | 1.0 | 0.826273 | 0.902175 |
| Spending_USD | 0.826273 | 1.0 | 0.57943 |
| Life_Expectancy | 0.902175 | 0.57943 | 1.0 |
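Pearson's r is the covariance divided by the product of the two standard deviations, which removes the units and bounds the result between -1 and 1. As a sanity check, the sketch below copies the numbers from the `df1.cov()` output above into a hard-coded matrix (so agreement holds only to about the displayed precision) and rescales it; it should reproduce the Pearson table:

```python
import numpy as np
import pandas as pd

cols = ['Year', 'Spending_USD', 'Life_Expectancy']
# Covariance matrix copied by hand from the df1.cov() output above
cov = pd.DataFrame(
    [[201.098848, 25718.827373, 41.915454],
     [25718.827373, 4817761.0, 4166.800912],
     [41.915454, 4166.800912, 10.733902]],
    index=cols, columns=cols,
)

# r_xy = cov_xy / (std_x * std_y); the standard deviations are sqrt of the diagonal
std = np.sqrt(np.diag(cov.to_numpy()))
pearson = cov / np.outer(std, std)
print(pearson.round(6))
```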


### Spearman rank correlation

```python
df1.corr(method='spearman')
```

|   | Year | Spending_USD | Life_Expectancy |
|---|------|--------------|-----------------|
| Year | 1.0 | 0.931598 | 0.896117 |
| Spending_USD | 0.931598 | 1.0 | 0.747407 |
| Life_Expectancy | 0.896117 | 0.747407 | 1.0 |
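Spearman's coefficient is Pearson's r computed on the ranks of the values rather than on the values themselves, so any strictly increasing relationship scores exactly 1 even when it is curved. A minimal sketch on hypothetical toy data where y = x³ shows both that effect and the rank-then-Pearson equivalence. The same effect is one plausible reason Year vs Spending_USD scores higher under Spearman (0.93) than Pearson (0.83) in the tables above, though the tables alone do not establish it.

```python
import pandas as pd

# Hypothetical toy data: y = x**3 is monotonic but not linear
toy = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6]})
toy['y'] = toy['x'] ** 3

pearson = toy.corr(method='pearson').loc['x', 'y']
spearman = toy.corr(method='spearman').loc['x', 'y']
# Spearman is Pearson applied to ranks (rank() gives tied values their average rank)
spearman_by_hand = toy.rank().corr(method='pearson').loc['x', 'y']

print(round(pearson, 4), spearman, spearman_by_hand)
```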


Pearson and Spearman correlations are both methods for assessing the relationship between two variables, but they are suited for different types of data and situations. Here's a comparison to help you decide which one to use:

### **Pearson Correlation**

**Description**: 
- **Pearson correlation coefficient** measures the linear relationship between two continuous variables.
- It assesses how well the data fit a straight line (linear relationship).

**When to Use**:
- **Data Type**: Use Pearson correlation when both variables are continuous and measured on an interval or ratio scale.
- **Linear Relationship**: Use Pearson if you are interested in assessing a linear relationship between the variables.
- **Normality**: Computing r does not itself require normality, but the usual significance test and confidence interval for r assume the data are approximately (bivariate) normal.
- **Outliers**: Pearson correlation is sensitive to outliers; a single extreme point can shift r substantially, so inspect a scatter plot before relying on it.
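To see how far a single point can move r, the sketch below uses hypothetical toy data: a perfect straight line, then the same line with its last y value replaced by an extreme 100. Because the outlier keeps the same rank order in this particular construction, Spearman stays at 1.0, while Pearson falls to about 0.59. The helper name `pearson_and_spearman` is introduced here for illustration only.

```python
import pandas as pd

# Hypothetical toy data: a perfect straight line, y = x
clean = pd.DataFrame({'x': range(1, 11), 'y': range(1, 11)}, dtype=float)

# Same data with one extreme y value; the rank order of y is unchanged
with_outlier = clean.copy()
with_outlier.loc[9, 'y'] = 100.0

def pearson_and_spearman(data: pd.DataFrame) -> tuple[float, float]:
    """Return (Pearson r, Spearman rho) between columns 'x' and 'y'."""
    return (
        data.corr(method='pearson').loc['x', 'y'],
        data.corr(method='spearman').loc['x', 'y'],
    )

for name, data in [('clean', clean), ('with outlier', with_outlier)]:
    p, s = pearson_and_spearman(data)
    print(f'{name:>12}: pearson={p:.3f}  spearman={s:.3f}')
```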


### **Spearman Rank Correlation**

**Description**:
- **Spearman rank correlation coefficient** measures the strength and direction of the monotonic relationship between two variables.
- It assesses how well the relationship between the variables can be described by a monotonic function (not necessarily linear).

**When to Use**:
- **Data Type**: Use Spearman correlation when at least one of the variables is ordinal, or when the variables are continuous but not normally distributed. It does not assume that the data are on an interval or ratio scale.
- **Monotonic Relationship**: Use Spearman if you suspect a monotonic relationship (as one variable increases, the other consistently increases, or consistently decreases) rather than a linear one.
- **Non-Normal Data**: It is more appropriate than Pearson correlation when the data are not normally distributed or when dealing with ranks or ordinal data.
- **Outliers**: Spearman correlation is less sensitive to outliers because it is based on ranks rather than the actual values.
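For an ordinal variable, only the order of the levels carries meaning. One way to handle it in pandas (a choice made for this sketch, not the only option) is to store the levels as an ordered `Categorical` and correlate its integer codes with Spearman, which depends only on order. The column names and values below are hypothetical; repeated levels are ties, which pandas assigns average ranks.

```python
import pandas as pd

levels = ['low', 'medium', 'high']
survey = pd.DataFrame({
    # Hypothetical ordinal responses and a numeric score for the same respondents
    'income_band': pd.Categorical(
        ['low', 'medium', 'low', 'high', 'medium', 'high', 'low', 'high'],
        categories=levels, ordered=True,
    ),
    'score': [3, 5, 2, 9, 6, 8, 4, 7],
})

# Ordered categories map to codes 0 < 1 < 2 that preserve the level order
survey['income_code'] = survey['income_band'].cat.codes

rho = survey[['income_code', 'score']].corr(method='spearman').loc['income_code', 'score']
print(round(rho, 3))
```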


### **Summary of When to Use Each**

- **Pearson Correlation**:
  - **Continuous** data.
  - **Linear** relationship.
  - Data roughly **normally distributed** (mainly needed for significance tests and confidence intervals).
  - Sensitive to **outliers**.

- **Spearman Rank Correlation**:
  - **Ordinal** data or continuous data that does not meet the assumptions for Pearson.
  - **Monotonic** relationship (not necessarily linear).
  - Data may be **non-normal**.
  - Less sensitive to **outliers**.

Choosing between Pearson and Spearman depends on the nature of your data and the relationship you are investigating. If your data are continuous and you are interested in linear relationships, Pearson is appropriate. For ordinal data, or when the relationship is monotonic but not linear, Spearman is a better choice.
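Spearman handles curves that only rise or only fall, but neither coefficient is a general measure of dependence: a relationship that rises and then falls can score near zero on both, even when y is fully determined by x. A minimal sketch with hypothetical data y = x² on a range symmetric around zero:

```python
import numpy as np
import pandas as pd

x = np.arange(-5, 6, dtype=float)              # -5, -4, ..., 5 (symmetric around 0)
u_shape = pd.DataFrame({'x': x, 'y': x ** 2})  # y depends on x exactly, but not monotonically

pearson = u_shape.corr(method='pearson').loc['x', 'y']
spearman = u_shape.corr(method='spearman').loc['x', 'y']
print(f'pearson={pearson:.3f}  spearman={spearman:.3f}')
```

In cases like this, a scatter plot reveals the pattern that both coefficients miss.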