# Example: Calculating Correlation with NumPy

## Import NumPy:



In [None]:
import numpy as np


## Define the Datasets:



In [None]:
hours_studied = np.array([1, 2, 3, 4, 5])
test_scores = np.array([50, 55, 65, 70, 80])


## Calculate the Correlation Coefficient:



In [None]:
correlation_coefficient = np.corrcoef(hours_studied, test_scores)[0, 1]


Here’s how `np.corrcoef` works:

-   It computes the Pearson correlation coefficient matrix of the input arrays.
-   `[0, 1]` selects the correlation coefficient between `hours_studied` and `test_scores`.
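
To make the indexing concrete, here is a minimal sketch that prints the full matrix returned by `np.corrcoef` before selecting one entry. The variable name `corr_matrix` is introduced here for illustration only.

```python
import numpy as np

hours_studied = np.array([1, 2, 3, 4, 5])
test_scores = np.array([50, 55, 65, 70, 80])

# With two 1-D inputs, np.corrcoef returns a 2x2 symmetric matrix:
# row/column 0 is hours_studied, row/column 1 is test_scores.
corr_matrix = np.corrcoef(hours_studied, test_scores)
print(corr_matrix.shape)  # (2, 2)
print(corr_matrix)

# The diagonal holds each variable's correlation with itself (1.0).
# The two off-diagonal entries are equal, so [0, 1] and [1, 0] are interchangeable.
r = corr_matrix[0, 1]
print(f"{r:.4f}")  # 0.9934
```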

## Print the Result:



In [None]:
print(f"Correlation Coefficient: {correlation_coefficient:.2f}")


Correlation Coefficient: 0.99


#### **Explanation**

-   The correlation coefficient ranges from -1 to 1.
    
    -   **1** indicates a perfect positive correlation.
    -   **-1** indicates a perfect negative correlation.
    -   **0** indicates no linear correlation.
-   In our example, a correlation coefficient of 0.99 suggests a very strong positive relationship between hours studied and test scores.
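
As a rough check on the interpretation above, the Pearson coefficient can also be computed directly from its textbook definition (the sum of products of deviations from the mean, divided by the square root of the product of the summed squared deviations). The helper `pearson_r` below is a hypothetical name used only for this sketch, not part of NumPy.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation from the definition (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

hours_studied = np.array([1, 2, 3, 4, 5])
test_scores = np.array([50, 55, 65, 70, 80])

manual_r = pearson_r(hours_studied, test_scores)
numpy_r = np.corrcoef(hours_studied, test_scores)[0, 1]
print(f"{manual_r:.4f} {numpy_r:.4f}")

# Boundary cases: an exact increasing line and an exact decreasing line.
perfect_positive = pearson_r(hours_studied, 2 * hours_studied + 1)
perfect_negative = pearson_r(hours_studied, -hours_studied)
print(f"{perfect_positive:.2f} {perfect_negative:.2f}")
```

The manual value should agree with `np.corrcoef` up to floating-point rounding, and the two boundary cases land at the ends of the range.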

# Example: Calculating Correlation with SciPy



Let’s use the same datasets as before: the number of hours studied and the corresponding test scores.

#### **Step-by-Step Calculation**

1. Reuse the `hours_studied` and `test_scores` arrays defined above, and import the `stats` module from SciPy:

In [None]:
from scipy import stats

In [None]:
pearson_corr, _ = stats.pearsonr(hours_studied, test_scores)


Here’s how `stats.pearsonr` works:

-   It computes the Pearson correlation coefficient and the p-value for testing non-correlation.
-   The first value returned is the correlation coefficient, and the second is the p-value.
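
The p-value is discarded in the cell above, but it can be kept and inspected. The sketch below assumes a conventional significance level of 0.05; that threshold (`alpha`) is an assumption of this example, not something prescribed by SciPy.

```python
import numpy as np
from scipy import stats

hours_studied = np.array([1, 2, 3, 4, 5])
test_scores = np.array([50, 55, 65, 70, 80])

# Keep both return values instead of discarding the second one.
r, p_value = stats.pearsonr(hours_studied, test_scores)

alpha = 0.05  # assumed significance level for this illustration
print(f"r = {r:.2f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("The correlation is statistically significant at the chosen level.")
else:
    print("The data do not rule out zero correlation at the chosen level.")
```

With only five observations, the p-value should be read with caution even when it falls below the threshold.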

In [None]:
print(f"Pearson Correlation Coefficient: {pearson_corr:.2f}")


Pearson Correlation Coefficient: 0.99



**Calculate Other Types of Correlations**

For example, Spearman’s rank correlation can be calculated as follows:

In [None]:
spearman_corr, _ = stats.spearmanr(hours_studied, test_scores)
print(f"Spearman Rank Correlation Coefficient: {spearman_corr:.2f}")


Spearman Rank Correlation Coefficient: 1.00


In [None]:
# Calculate Kendall’s Tau correlation coefficient
kendall_tau, _ = stats.kendalltau(hours_studied, test_scores)

In [None]:
# Print the result
print(f"Kendall’s Tau Correlation Coefficient: {kendall_tau:.2f}")

Kendall’s Tau Correlation Coefficient: 1.00



#### **Explanation**

-   **Pearson Correlation Coefficient:** Measures the linear relationship between two datasets. It is sensitive to outliers.
-   **Spearman Rank Correlation Coefficient:** Measures the monotonic relationship between two datasets. It is less sensitive to outliers than Pearson's coefficient and can capture non-linear relationships, as long as they are monotonic.
-   **Kendall’s Tau Correlation Coefficient:** Measures the ordinal association between two variables. It evaluates the strength of the monotonic relationship between the variables.
    -   **Value Range:** The coefficient ranges from -1 to 1.
        -   **1** indicates a perfect positive association.
        -   **-1** indicates a perfect negative association.
        -   **0** indicates no association.

Kendall's tau is often preferred over Pearson's or Spearman's coefficient when dealing with ordinal data or when you want a measure that is less sensitive to outliers.
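
The difference in outlier sensitivity can be illustrated with a small made-up dataset. The arrays `x` and `y` below are hypothetical and chosen only for this sketch: `y` increases with `x`, but its last value is an extreme outlier.

```python
import numpy as np
from scipy import stats

# Hypothetical data: y equals x except for one extreme final value.
x = np.arange(1, 11)
y = x.astype(float)
y[-1] = 1000.0  # outlier that keeps the ordering intact

pearson_out, _ = stats.pearsonr(x, y)
spearman_out, _ = stats.spearmanr(x, y)
kendall_out, _ = stats.kendalltau(x, y)

print(f"Pearson:  {pearson_out:.2f}")
print(f"Spearman: {spearman_out:.2f}")
print(f"Kendall:  {kendall_out:.2f}")
```

Because the outlier does not change the rank order, the two rank-based coefficients stay at their maximum, while the Pearson coefficient drops noticeably.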

This example illustrates how to use SciPy to compute different types of correlation coefficients and provides insight into the relationship between datasets.


# Example: Calculating Correlation with Pandas

Let’s use the same datasets: hours studied and test scores.

#### **Step-by-Step Calculation**

1.  **Import the Necessary Libraries:**

In [None]:
import pandas as pd


## Create a DataFrame with the Datasets:



In [None]:
data = {
    'Hours_Studied': [1, 2, 3, 4, 5],
    'Test_Scores': [50, 55, 65, 70, 80]
}
df = pd.DataFrame(data)


## Calculate the Pearson Correlation Coefficient:



In [None]:
pearson_corr = df.corr(method='pearson')
# This computes the Pearson correlation coefficient for all pairs of columns in the DataFrame.



## Print the Pearson Correlation Matrix:



In [None]:
print("Pearson Correlation Matrix:")
print(pearson_corr)


Pearson Correlation Matrix:
               Hours_Studied  Test_Scores
Hours_Studied       1.000000     0.993399
Test_Scores         0.993399     1.000000
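
When only one coefficient is needed rather than the whole matrix, it can be pulled out by label or computed directly between two columns. This is a small sketch; the variable names `r_from_matrix` and `r_direct` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Hours_Studied': [1, 2, 3, 4, 5],
    'Test_Scores': [50, 55, 65, 70, 80]
})

# Option 1: select a single cell of the correlation matrix by row and column label.
r_from_matrix = df.corr(method='pearson').loc['Hours_Studied', 'Test_Scores']

# Option 2: compute the correlation between two columns without building the matrix.
r_direct = df['Hours_Studied'].corr(df['Test_Scores'])

print(f"{r_from_matrix:.4f} {r_direct:.4f}")
```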


## Calculate Other Types of Correlation (Spearman and Kendall):



In [None]:
spearman_corr = df.corr(method='spearman')
kendall_corr = df.corr(method='kendall')

print("\nSpearman Correlation Matrix:")
print(spearman_corr)

print("\nKendall Correlation Matrix:")
print(kendall_corr)



Spearman Correlation Matrix:
               Hours_Studied  Test_Scores
Hours_Studied            1.0          1.0
Test_Scores              1.0          1.0

Kendall Correlation Matrix:
               Hours_Studied  Test_Scores
Hours_Studied            1.0          1.0
Test_Scores              1.0          1.0


#### **Explanation**

-   **Pearson Correlation Coefficient:** Measures linear relationships.
-   **Spearman Rank Correlation Coefficient:** Measures monotonic relationships.
-   **Kendall’s Tau Correlation Coefficient:** Measures ordinal associations.

Pandas makes it easy to compute these correlations and see how the variables in your dataset relate to one another.
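
As a closing sanity check, the three libraries used in these examples should report the same Pearson coefficient for the same data. The sketch below recomputes it with each one; the variable names are illustrative.

```python
import numpy as np
import pandas as pd
from scipy import stats

hours = [1, 2, 3, 4, 5]
scores = [50, 55, 65, 70, 80]

r_numpy = np.corrcoef(hours, scores)[0, 1]
r_scipy, _ = stats.pearsonr(hours, scores)
r_pandas = pd.Series(hours).corr(pd.Series(scores))

# All three values should match up to floating-point rounding.
print(f"NumPy:  {r_numpy:.6f}")
print(f"SciPy:  {r_scipy:.6f}")
print(f"pandas: {r_pandas:.6f}")
```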