3.7 Feature importance: Correlation

Slides

Notes

The correlation coefficient measures the degree of linear dependency between two variables. It is negative when one variable grows while the other decreases, and positive when both variables grow together. Depending on its absolute value, the dependency between the two variables can be low, moderate, or strong. This makes it a way to measure the importance of numerical variables.

If r is the correlation coefficient, then the correlation between two variables is:

  • LOW when r is in (-0.2, 0] or [0, 0.2)
  • MEDIUM when r is in (-0.5, -0.2] or [0.2, 0.5)
  • STRONG when r is in [-1.0, -0.5] or [0.5, 1.0]
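The thresholds above depend only on the absolute value of r, so they can be sketched as a small helper. The function name `correlation_strength` is hypothetical and not part of pandas or the course code; the cut-offs are the ones listed in these notes.

```python
def correlation_strength(r: float) -> str:
    """Label a correlation coefficient using the thresholds from these notes."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")

    magnitude = abs(r)  # the sign gives the direction, not the strength
    if magnitude < 0.2:
        return "LOW"
    if magnitude < 0.5:
        return "MEDIUM"
    return "STRONG"


labels = [correlation_strength(r) for r in (0.05, -0.3, 0.2, -0.5, 1.0)]
```

Values exactly on a boundary (0.2 or 0.5 in absolute value) fall into the stronger category, matching the intervals above.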

Positive Correlation vs. Negative Correlation

  • When r is positive, y tends to increase as x increases.
  • When r is negative, y tends to decrease as x increases.
  • When r is 0, there is no linear relationship between x and y.
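A small illustration of the three cases, using `Series.corr` from pandas (Pearson correlation by default). The data is made up for this sketch and is not from the course dataset.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])

y_up = 2 * x + 1                    # grows with x
y_down = 10 - 3 * x                 # shrinks as x grows
y_flat = pd.Series([2, 1, 0, 1, 2]) # symmetric around the middle of x

r_up = x.corr(y_up)      # close to +1: perfect positive linear relationship
r_down = x.corr(y_down)  # close to -1: perfect negative linear relationship
r_flat = x.corr(y_flat)  # close to 0: no linear relationship
```

Note that `y_flat` still depends on x (it is the distance from 3), so r close to 0 rules out only a linear relationship, not every kind of dependency.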

Functions and methods:

  • df[x].corrwith(y) - returns the correlation between each column of df[x] and the series y, where x is a list of column names. This is a pandas DataFrame method.
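A minimal sketch of how `corrwith` can be used to rank numerical features against a target. The DataFrame, its column names, and its values are invented for illustration and are assumptions, not the data from the course notebook.

```python
import pandas as pd

# Hypothetical toy data: two numerical features and a binary target.
df = pd.DataFrame({
    "tenure": [1, 5, 12, 24, 36, 48, 60, 72],
    "monthlycharges": [70, 85, 60, 90, 55, 80, 65, 50],
    "churn": [1, 1, 1, 0, 1, 0, 0, 0],
})

numerical = ["tenure", "monthlycharges"]

# One correlation coefficient per numerical column, indexed by column name.
corr = df[numerical].corrwith(df["churn"])

# Rank features by strength of correlation, ignoring the sign.
ranked = corr.abs().sort_values(ascending=False)
```

Sorting by absolute value puts strongly negative and strongly positive features on the same scale, since either direction makes a feature informative.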

The entire code of this project is available in this Jupyter notebook.

⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.
