# A brief overview of Power law distributions
For more details, see the very good explanation in Easley and Kleinberg, _Networks, Crowds, and Markets_.

## So what's the big deal with Power law distributions?

Power law and normal distributions each describe a relationship between some quantity $k$ and a second quantity $y=f(k)$ that depends on it.

> <span style="color:gray"> For example, we could let $k =$ _number of downloads_ and $f(k) =$ _number of songs with that many downloads_, and plot the relationship.</span>

A power law distribution is characterized by the following relationship:
$$ y = \frac{a}{k^c}$$
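To make the formula concrete, here is a minimal sketch that evaluates it for a few values of $k$. The function name `power_law` and the values of `a` and `c` are hypothetical, chosen only for illustration.

```python
def power_law(k, a, c):
    """Return y = a / k**c for a positive k."""
    return a / k ** c

# Hypothetical parameters, not taken from any real data set.
a, c = 1000.0, 2.0

# With c = 2, every doubling of k divides y by 4.
for k in [1, 2, 4, 8]:
    print(k, power_law(k, a, c))
```

Note that at $k = 1$ the function simply returns $a$, and that $y$ shrinks by a constant factor each time $k$ is multiplied by a constant factor.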

## How are they different from normal?

If we compare power law (orange) to normal distribution (blue), we can see that:
- **POWER LAW:** This kind of distribution would imply that a large number of songs have very few downloads, while a small number of songs have a great many downloads.


- **NORMAL DIST:** This kind of distribution would imply that most songs have close to the average number of downloads; a few have many downloads, and a few have very few downloads.


<img src="fig/power_vs_normal.png" style="width: 400px;"/>

Intuitively, a normal distribution generates an allocation that we might typically see as "fair", "averaged", or "balanced", whereas a power law generates something that looks "skewed" or "unequal". Rich-get-richer is one way in which such a skewed distribution might occur, but it's not the only way.
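One rough way to see the "balanced" versus "skewed" contrast is to simulate both cases and ask what share of all downloads goes to the top 10% of songs. The sketch below is only illustrative. It assumes normally distributed downloads with a made-up mean of 100 and standard deviation of 15, and uses a Pareto distribution (shape 1.5) as a stand-in for a power-law-like, heavy-tailed case. The helper `top_share` and all parameter values are hypothetical.

```python
import random

def top_share(values, fraction=0.1):
    """Fraction of the total held by the largest `fraction` of values."""
    ordered = sorted(values, reverse=True)
    n_top = max(1, int(len(ordered) * fraction))
    return sum(ordered[:n_top]) / sum(ordered)

random.seed(1)
n_songs = 10_000

# Assumption: "balanced" downloads, normally distributed around 100.
normal_downloads = [max(0.0, random.gauss(100, 15)) for _ in range(n_songs)]

# Assumption: "skewed" downloads, drawn from a heavy-tailed Pareto distribution.
power_downloads = [random.paretovariate(1.5) for _ in range(n_songs)]

print("top 10% share, normal:   ", round(top_share(normal_downloads), 3))
print("top 10% share, power law:", round(top_share(power_downloads), 3))
```

In the normal case the top 10% of songs hold only slightly more than 10% of the downloads, while in the heavy-tailed case they hold a much larger share.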

## So what's the meaning of $c$ and $a$?

Below is an example of what a power-law distribution looks like for different values of $c$ and $a$.   
First you can see it in linear scale (left), then in log-log scale (center, right), where both axes are logarithmic and the curve becomes a straight line.

<img src="fig/power_params.png" style="width: 1000px;"/>


In linear scale:
- As $c$ gets bigger, the bend is more "exaggerated" because $f(k)$ falls faster with $k$ (left). 

In log-log scale, we see:
- $c$ determines the slope (center): a larger $c$ gives a steeper downward line
- $a$ determines the intercept (right)
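The reason the curve becomes a line can be seen by taking the logarithm of both sides of $y = \frac{a}{k^c}$:

$$ \log y = \log a - c \log k $$

This is the equation of a straight line in the variables $\log k$ and $\log y$, with slope $-c$ and intercept $\log a$.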

## How does this work with real data?
Using the songs data as an example: if you have a sample of points that looks like the purple dots on the bottom left, you would first log-transform both axes. Then you could estimate the slope (center) and intercept (right) of the resulting line to recover the parameters of the distribution.

- Changing the value of $a$ essentially changes the estimated number of songs at the lowest download count, $k = 1$ (where the line hits the y axis, since $\log 1 = 0$).
- Changing the value of $c$ essentially changes how fast the estimated number of songs falls as the download count increases (the slope of the line).


<img src="fig/power_samplefit.png" style="width: 1000px;"/>
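The procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive fitting method. The data are synthetic, generated from a power law with made-up values `true_a = 5000` and `true_c = 1.5` plus some noise, and the helper `fit_power_law` is a hypothetical name for an ordinary least-squares line fit on the log-transformed points.

```python
import math
import random

def fit_power_law(ks, ys):
    """Fit a least-squares line through (log k, log y); return (a_hat, c_hat)."""
    xs = [math.log(k) for k in ks]
    zs = [math.log(y) for y in ys]
    n = len(xs)
    x_mean = sum(xs) / n
    z_mean = sum(zs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxz = sum((x - x_mean) * (z - z_mean) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = z_mean - slope * x_mean
    # In log-log space the slope is -c and the intercept is log(a).
    return math.exp(intercept), -slope

# Synthetic stand-in for the songs data (all values are assumptions).
random.seed(0)
true_a, true_c = 5000.0, 1.5
ks = list(range(1, 101))  # download counts
ys = [true_a / k ** true_c * math.exp(random.gauss(0.0, 0.1)) for k in ks]

a_hat, c_hat = fit_power_law(ks, ys)
print(f"estimated a = {a_hat:.0f}, estimated c = {c_hat:.2f}")
```

The estimates should land close to the values used to generate the data. With real data the points are usually noisier, especially in the tail where few songs have very high download counts, so a simple line fit like this is best treated as a first approximation.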