# Cross-Entropy (for Machine Learning)

Cross-Entropy in Machine Learning is a type of Loss Function. Remember that a Loss Function is a method of evaluating how well a machine learning algorithm models a dataset.

There is a great explanation on [Towards Data Science](https://towardsdatascience.com/what-is-cross-entropy-3bdb04c13616), but essentially cross-entropy tracks events and the probabilities a model assigns to them, and then asks how likely the events that actually occurred were under those probabilities. I've skipped a lot of the details, but if the observed events were very likely under the model's probabilities then the cross-entropy is _small_, and if they were not very likely then the cross-entropy is _high_.

### Brief rundown of the Math behind Cross-Entropy

Normally, to determine the likelihood of something occurring, you'd look at its probability. If you want the probability of two specific events both occurring, you multiply the two probabilities together, as long as the events are independent. So if I were to consider the event of getting heads on a coin and then rolling a six on a die, the probability of the two happening together (assuming the flip and the roll don't affect each other) would be `1/2 * 1/6 = 1/12`.

Now in Machine Learning, we might have hundreds of probabilities, and multiplying them all together gives a number so tiny that it is hard to work with (in floating-point arithmetic it can even underflow to zero). For this reason, we turn to logarithms, which turn the product into a sum. If you're unfamiliar with logarithms, [you can find more about them here.](https://www.rapidtables.com/math/algebra/Logarithm.html)
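
To make the problem with multiplying many probabilities concrete, here is a minimal sketch. The numbers are made up for illustration (1,000 events, each with probability 0.01); the point is only that the direct product collapses to zero while the sum of logs stays usable.

```python
import math

# Hypothetical data: 1,000 independent events, each with probability 0.01.
probabilities = [0.01] * 1000

# Multiplying directly: the true value (1e-2000) is too small for a float,
# so the result underflows to 0.0.
product = math.prod(probabilities)

# Adding natural logs instead keeps the result representable.
log_sum = sum(math.log(p) for p in probabilities)

print(product)
print(log_sum)
```

The product prints as `0.0`, which has lost all information, while the log sum is an ordinary negative number that can still be compared between models.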

So if we consider our above example, instead of multiplying the probabilities we take their natural logs and add them: `ln(1/2) + ln(1/6)`, which works out to about `-2.485`. The negative values are a little harder to understand though, so we end up multiplying everything by `-1`. So, we have `-ln(1/2) - ln(1/6) ≈ 2.485`.
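
The coin-and-die arithmetic can be checked in a few lines of Python. This is just a sketch of the calculation above; the variable names are my own.

```python
import math

p_heads = 1 / 2  # probability of heads on a fair coin
p_six = 1 / 6    # probability of a six on a fair die

# Probability of both happening, by multiplying.
joint = p_heads * p_six

# The same information as a sum of negated natural logs.
neg_log = -math.log(p_heads) - math.log(p_six)

print(joint)
print(neg_log)

# Adding the logs is equivalent to taking the log of the product.
print(math.isclose(neg_log, -math.log(joint)))
```

The last line prints `True`: adding the logs and taking the log of the product are the same thing, which is why the swap from multiplication to addition is safe.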

In the case of machine learning, the _lower_ this number is, the better the model, and the _higher_ the number is, the worse the model is. A lower number means the model gave higher probabilities to the outcomes that actually happened.
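
As a rough illustration of "lower is better", the sketch below scores two made-up models. Each list holds the probability a hypothetical model assigned to the outcome that actually occurred; the function name and the numbers are assumptions for this example only.

```python
import math

def negative_log_likelihood(probs_of_true_outcomes):
    """Sum of -ln(p) over the probabilities given to what actually happened."""
    return -sum(math.log(p) for p in probs_of_true_outcomes)

# Hypothetical models: the first gave high probability to the true outcomes,
# the second gave them low probability.
confident_model = [0.9, 0.8, 0.95]
unsure_model = [0.4, 0.3, 0.5]

good = negative_log_likelihood(confident_model)
bad = negative_log_likelihood(unsure_model)

print(good, bad)
print(good < bad)
```

The model that put more probability on what really happened ends up with the smaller score.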

Again, for a better look at the logarithmic formulas behind Cross-Entropy, I strongly recommend [checking this article out.](https://towardsdatascience.com/what-is-cross-entropy-3bdb04c13616)

### Binary Cross-Entropy

What is binary cross entropy? 

[From what it looks like,](https://www.analyticsvidhya.com/blog/2021/03/binary-cross-entropy-log-loss-for-binary-classification/#:~:text=What%20is%20Binary%20Cross%20Entropy,far%20from%20the%20actual%20value.) Binary Cross-Entropy is just Cross-Entropy where the actual values can only be 0 or 1. That tracks, since binary means 1 or 0.

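[This article] does a good job of going in-depth on this, but basically this is like regular Cross-Entropy where every actual label is either a 1 or a 0: a 1 means the event happened (100% probability) and a 0 means it did not (zero percent probability). The loss then measures how far the model's predicted probability is from that actual value.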
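
Here is a minimal sketch of binary cross-entropy in plain Python, using the commonly quoted form `-(y * ln(p) + (1 - y) * ln(1 - p))` averaged over the examples. The function name, the averaging, and the small clipping value `eps` (used to avoid `ln(0)`) are choices made for this example, not something taken from the linked articles.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average binary cross-entropy for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip the prediction so that log(0) is never evaluated (assumed safeguard).
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels and predictions for illustration.
labels = [1, 0, 1, 1]
good_preds = [0.9, 0.1, 0.8, 0.7]  # close to the labels
bad_preds = [0.2, 0.9, 0.3, 0.4]   # far from the labels

good_loss = binary_cross_entropy(labels, good_preds)
bad_loss = binary_cross_entropy(labels, bad_preds)

print(good_loss, bad_loss)
```

Predictions close to the actual 0/1 labels give a loss near zero, and predictions far from them give a larger loss. In practice you would likely reach for a library implementation rather than this hand-rolled loop.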

### Additional Resources:

- [Machine Learning Mastery's notes on Cross-Entropy](https://machinelearningmastery.com/cross-entropy-for-machine-learning/) (This also has examples on how to implement this in Python)
- [Towards Data Science's notes on Cross-Entropy](https://towardsdatascience.com/what-is-cross-entropy-3bdb04c13616)
- [Analytics India Magazine's notes on Cross-Entropy](https://analyticsindiamag.com/a-beginners-guide-to-cross-entropy-in-machine-learning/)
- [Neptune AI's notes on Cross-Entropy](https://neptune.ai/blog/cross-entropy-loss-and-its-applications-in-deep-learning) (Very clear and easy to understand examples)