Novelty and collective attention
Note: these are reading notes on the paper "Novelty and collective attention" [1].
The observations can be described by a dynamical model characterized by a single novelty factor. Our measurements indicate that novelty within groups decays with a stretched-exponential law, suggesting the existence of a natural time scale over which attention fades.
We measured the histogram of the final diggs of all 29,864 popular stories in the year 2006. As can be seen from Fig. 1, the distribution appears to be quite skewed, with the normal Q–Q plot of <math>\log(N_\infty)</math> a straight line. A Kolmogorov–Smirnov normality test of <math>\log(N_{\infty})</math> with mean 6.546 and standard deviation 0.6626 yields a P value of 0.0939, suggesting that <math>N_{\infty}</math> follows a log-normal distribution.
Next consider <math>N_t</math>, the number of diggs of a popular story after a finite time t. The distribution of <math>\log(N_t)</math> again obeys a bell-shaped curve. As an example, a Kolmogorov–Smirnov normality test of <math>\log(N_{2h})</math> with mean 5.925 and standard deviation 0.5451 yields a P value as high as 0.5605, supporting the hypothesis that <math>N_t</math> also follows a log-normal distribution.
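The normality check on the logarithm can be illustrated on synthetic data. The sketch below is not the authors' analysis: it draws hypothetical digg counts from a log-normal distribution (reusing the mean and standard deviation quoted above as assumed parameters) and computes only the Kolmogorov–Smirnov statistic D, the largest gap between the empirical and the normal CDF, without turning it into a P value. The helper names `normal_cdf` and `ks_statistic` are made up for this example.

```python
import math
import random

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(sample, mean, sd):
    """Largest gap between the empirical CDF and the normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = normal_cdf(x, mean, sd)
        # The empirical CDF jumps from i/n to (i+1)/n at x
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

rng = random.Random(0)  # fixed seed; the data below are synthetic
# Hypothetical digg counts: exp of a normal variable (assumed parameters)
diggs = [math.exp(rng.normalvariate(6.546, 0.6626)) for _ in range(5000)]
log_diggs = [math.log(v) for v in diggs]

# Test log(N) against the normal distribution it was generated from
d_log = ks_statistic(log_diggs, 6.546, 0.6626)

# For contrast, test the raw counts against a normal with their own moments
m = sum(diggs) / len(diggs)
s = math.sqrt(sum((v - m) ** 2 for v in diggs) / (len(diggs) - 1))
d_raw = ks_statistic(diggs, m, s)

print(f"KS statistic for log(N): {d_log:.4f}; for raw N: {d_raw:.4f}")
```

A small D for the logarithm together with a much larger D for the raw counts is what a log-normal sample is expected to show.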
- <math>N_t</math> represents the number of people who know the story at time t, and a fraction <math>\mu</math> of those people will further spread the story to some of their friends.
- Mathematically, this assumption can be expressed as <math>N_t = (1 + X_t)N_{t-1}</math>, where <math>X_1, X_2, \ldots</math> are positive, independent, and identically distributed random variables with mean <math>\mu</math> and variance <math>\sigma^2</math>.
- This growth in time is eventually curtailed by a decay in novelty, which we parameterize by a time-dependent factor <math>r_t</math>, consisting of a series of decreasing positive numbers with the property that <math>r_1 = 1</math> and <math>r_t \rightarrow 0</math> , as <math> t \rightarrow \infty</math>.
- With this additional parameter, the full stochastic dynamics of story propagation is governed by <math> N_t = (1 + r_t X_t)N_{t-1} </math>, where the factor <math>r_t X_t</math> acts as a discounted random multiplicative factor.
- Put together, we have <math>N_t = \prod_{s = 1}^{t}(1 + r_s X_s)N_0</math>
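- When <math>X_t</math> is small (which is the case for small time steps), we have the following approximate solution:
<math>N_t \approx \prod_{s = 1}^{t} e^{r_s X_s} N_0 = e^{\sum_{s = 1}^{t} r_s X_s} N_0</math> [1]
This holds because <math>1 + x \approx e^x</math> when x is small.
- Taking the logarithm of both sides, we obtain
<math>\log N_t - \log N_0 = \sum_{s = 1}^{t} r_s X_s</math> [2]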
- Taking the mean and variance of both sides of Eq. 2:
Question: how is Eq. 3 derived?
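If two variables X and Y are independent, the variance of their product is given by [2]
- <math>\operatorname{Var}(XY) = [E(X)]^2 \operatorname{Var}(Y) + [E(Y)]^2 \operatorname{Var}(X) + \operatorname{Var}(X)\operatorname{Var}(Y)</math>
Equivalently, using the basic properties of expectation, it is given by
- <math>\operatorname{Var}(XY) = E(X^2)E(Y^2) - [E(X)]^2 [E(Y)]^2</math>
Now if X and Y are independent, then by definition j(x,y) = f(x)g(y), where j is the joint PDF and f and g are the marginal PDFs of X and Y. Then
- <math>\begin{align}
E(XY) &= \iint xy \, j(x,y) \, dx \, dy \\
&= \iint xy \, f(x) g(y) \, dx \, dy \\
&= \left[\int x f(x) \, dx\right] \left[\int y g(y) \, dy\right] \\
&= E(X)E(Y)
\end{align}</math>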
and Cov(X, Y) = 0.
Observe that independence of X and Y is required only to write j(x, y) = f(x)g(y), and this is required to establish the second equality above. The third equality follows from a basic application of the Fubini-Tonelli theorem.
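The product rule for independent variables can be checked exactly on a small example. The sketch below uses two hypothetical discrete distributions (the values and probabilities are arbitrary choices, not from the paper), builds the distribution of XY by multiplying the marginal probabilities, and compares the directly computed variance with the closed-form expression Var(XY) = E(X)²Var(Y) + E(Y)²Var(X) + Var(X)Var(Y).

```python
from itertools import product

# Two small, hypothetical discrete distributions: value -> probability
px = {1: 0.2, 2: 0.5, 3: 0.3}
py = {0: 0.4, 4: 0.6}

def mean(p):
    """Expectation of a discrete distribution."""
    return sum(v * w for v, w in p.items())

def var(p):
    """Variance of a discrete distribution."""
    m = mean(p)
    return sum(w * (v - m) ** 2 for v, w in p.items())

# Independence: each joint probability is the product of the marginals
pxy = {}
for (x, wx), (y, wy) in product(px.items(), py.items()):
    pxy[x * y] = pxy.get(x * y, 0.0) + wx * wy

direct = var(pxy)  # variance computed from the distribution of XY itself
formula = (mean(px) ** 2 * var(py)
           + mean(py) ** 2 * var(px)
           + var(px) * var(py))

print(f"E(XY) = {mean(pxy):.4f}, E(X)E(Y) = {mean(px) * mean(py):.4f}")
print(f"Var(XY) direct = {direct:.4f}, from the formula = {formula:.4f}")
```

Because the enumeration is exact, the two variances agree up to floating-point rounding, and E(XY) equals E(X)E(Y), so the covariance is zero.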
If the model is correct, a plot of the sample mean of <math>\log N_t - \log N_0</math> versus its sample variance for each time t should yield a straight line passing through the origin with slope <math>\frac{\mu}{\sigma^2}</math>, as shown in Fig. 2.
The decay factor <math>r_t</math> can now be computed explicitly from <math>N_t</math> up to a constant scale. By taking expectation values of Eq. 2 and normalizing <math>r_1</math> to 1, we have
<math>r_t = \frac{E(log N_t) - E(log N_{t-1})}{E(log N_1) - E(log N_0)}</math> [4]
Question: how is Eq. 4 derived?
From Eq. 2 we obtain:
<math>\log N_t - \log N_{t-1} = r_t X_t</math> [5]
<math>\log N_1 - \log N_0 = r_1 X_1</math> [6]
Because <math>r_t</math> is a deterministic factor, taking expectations of Eq. 5 gives <math>E(\log N_t - \log N_{t-1}) = r_t E(X_t) = r_t \mu</math> [7]
Likewise, with <math>r_1 = 1</math>, Eq. 6 gives <math>E(\log N_1 - \log N_0) = r_1 E(X_1) = \mu</math> [8]
Dividing Eq. 7 by Eq. 8 gives:
<math>r_t = \frac{E(\log N_t) - E(\log N_{t-1})}{E(\log N_1) - E(\log N_0)}</math>
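The estimator in Eq. 4 can be tried on simulated stories where the decay factor is known in advance. This is a minimal sketch under stated assumptions: the decay r_t = 1/t, the parameters `MU` and `SIGMA`, and the sample sizes are all hypothetical choices made so that X_t stays small, and none of them come from the paper.

```python
import math
import random

rng = random.Random(42)     # fixed seed so the run is repeatable
MU, SIGMA = 0.05, 0.01      # assumed mean and sd of X_t (small, so log(1+x) ~ x)
STORIES, STEPS = 2000, 10   # assumed number of stories and time steps

def true_r(t):
    """Hypothetical decay factor with r_1 = 1, decreasing toward 0."""
    return 1.0 / t

# log_n[i][t] holds log N_t for story i, starting from N_0 = 1
log_n = []
for _ in range(STORIES):
    path = [0.0]
    for t in range(1, STEPS + 1):
        x_t = rng.normalvariate(MU, SIGMA)
        path.append(path[-1] + math.log(1.0 + true_r(t) * x_t))
    log_n.append(path)

def mean_log(t):
    """Sample mean of log N_t across all simulated stories."""
    return sum(path[t] for path in log_n) / STORIES

# Eq. 4: ratio of successive increments of E(log N_t)
base = mean_log(1) - mean_log(0)
r_hat = {t: (mean_log(t) - mean_log(t - 1)) / base for t in range(1, STEPS + 1)}

for t in range(1, STEPS + 1):
    print(f"t={t:2d}  true r_t={true_r(t):.3f}  estimated={r_hat[t]:.3f}")
```

The estimate is normalized so that it equals 1 at t = 1 by construction. For later steps it carries a small bias from the approximation log(1 + x) ≈ x, plus sampling noise that shrinks as the number of stories grows.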
- The curve of <math>r_t</math> estimated from the 1,110 stories in January 2007 is shown in Fig. 3a. As can be seen, <math>r_t</math> decays very fast in the first 2–3 hours, and its value becomes 0.03 after 3 hours.
- Fig. 3b and c show that <math>r_t</math> decays more slowly than an exponential and faster than a power law.
- Fig. 3d shows that <math>r_t</math> can be fit empirically to a stretched exponential relaxation, or Kohlrausch–Williams–Watts law [3]:
<math>r_t \sim e^{-0.4 t^{0.4}}</math>
The half-life <math>\tau</math> of <math>r_t</math> can then be determined by solving the equation
<math>\int_{0}^{\tau} e^{-0.4 t^{0.4}} \, dt = \frac{1}{2} \int_{0}^{\infty} e^{-0.4 t^{0.4}} \, dt</math>
A numerical calculation gives <math>\tau</math> = 69 minutes, or about 1 hour. This characteristic time is consistent with the fact that a story usually lives on the front page for a period between 1 and 2 hours.
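The half-life can be reproduced numerically. The sketch below assumes the fitted form r_t ∝ exp(−0.4 t^0.4) with t measured in minutes (the unit is an assumption inferred from the quoted result). It uses the substitution u = 0.4 t^0.4, which turns the integral into an incomplete gamma integral, evaluates that with Simpson's rule, and finds τ by bisection. The function names are illustrative only.

```python
import math

A, BETA = 0.4, 0.4  # assumed fit constants in r_t ~ exp(-A * t**BETA), t in minutes

def lower_gamma(shape, x, steps=2000):
    """Simpson's rule for the integral of u**(shape-1) * exp(-u) over [0, x]."""
    if x <= 0:
        return 0.0
    h = x / steps

    def f(u):
        return u ** (shape - 1) * math.exp(-u)

    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def decay_integral(tau):
    """Integral of exp(-A * t**BETA) from 0 to tau, via u = A * t**BETA."""
    scale = A ** (-1.0 / BETA) / BETA
    return scale * lower_gamma(1.0 / BETA, A * tau ** BETA)

# Integral over [0, infinity) in closed form, using the gamma function
total = A ** (-1.0 / BETA) / BETA * math.gamma(1.0 / BETA)

# Bisection: find tau where the partial integral reaches half of the total
lo, hi = 0.0, 10000.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if decay_integral(mid) < 0.5 * total:
        lo = mid
    else:
        hi = mid
tau = 0.5 * (lo + hi)

print(f"half-life: {tau:.1f} minutes")
```

Under these assumptions the result lands close to the 69 minutes quoted above.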
from random import normalvariate
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
# Draw 500 samples of the growth rate X_t from a normal distribution
x = [normalvariate(0.5, 0.1) for i in range(500)]
plt.hist(x)
plt.show()
The histogram is bell-shaped; adjusting the mean and the standard deviation changes the range of values.
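def random_model(mean, sd):
    """Multiplicative growth without decay: N_t = (1 + X_t) * N_{t-1}."""
    Nt = {}
    Nt[0] = 1  # start from a single digg
    for t in range(1, 100):
        xt = normalvariate(mean, sd)  # random growth rate X_t
        Nt[t] = (1 + xt) * Nt[t - 1]
    return Nt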
fig = plt.figure(figsize=(12, 4), facecolor='white')
cmap = plt.get_cmap('rainbow_r', 10)
# One trajectory per mean growth rate, all with sd = 0.1
for mean in np.linspace(0.1, 0.9, 10):
    Nt = random_model(mean, 0.1)
    plt.plot(list(Nt.keys()), list(Nt.values()), color=cmap(mean),
             linestyle='-', marker='.', label=str(np.round(mean, 2)))
plt.yscale('log', base=10)  # `base` replaces the older `basey` keyword
plt.ylabel('Nt (log scale)'); plt.xlabel('t')
plt.legend(loc=2, fontsize=8)
plt.show()
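Clearly the growth here is exponential: the diggs of a story increase too fast. A new parameter is therefore needed, a decay factor that slows the growth down more and more over time. One choice is a stretched exponential relaxation.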
def random_model_with_decay(mean, sd, decay_parameter):
    """Growth with a decay factor: N_t = (1 + r_t * X_t) * N_{t-1}."""
    Nt = {}
    Nt[0] = 1
    for t in range(1, 100):
        xt = normalvariate(mean, sd)
        rt = np.e**(-(t**decay_parameter))  # make it simpler here
        Nt[t] = (1 + rt * xt) * Nt[t - 1]
    return Nt

fig = plt.figure(figsize=(12, 4), facecolor='white')
cmap = plt.get_cmap('rainbow_r', 10)
for mean in np.linspace(0.5, 0.9, 1):  # a single mean value, 0.5
    for dp in np.linspace(0.1, 0.5, 5):
        Nt = random_model_with_decay(mean, 0.1, dp)
        plt.plot(list(Nt.keys()), list(Nt.values()),
                 color=cmap(mean * dp), linestyle='-', marker='.',
                 label='Mean = ' + str(np.round(mean, 2)) + ' & Decay parameter = ' + str(np.round(dp, 2)))
plt.yscale('log', base=10)  # `base` replaces the older `basey` keyword
plt.ylabel('Nt (log scale)'); plt.xlabel('t')
plt.legend(loc=2, fontsize=8)
plt.show()
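Here we can clearly see that choosing a different decay parameter significantly changes the decay factor, and with it the growth curve. The aim is a decay that is slower than exponential (which leads to a normal distribution and fades too quickly) and faster than a power law (which leads to a long tail and fades too slowly).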
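Euler's number e is approximately 2.71828, but its origin matters more. In 1748 Euler published Introductio in analysin infinitorum, which established the mathematical standing of e.
Proof that <math>1 + x \approx e^x</math> for small x:
Since <math>e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots</math> (see [4]),
it follows that, as x approaches 0, <math>e^x \approx 1 + \frac{x}{1!} = 1 + x</math>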
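A quick numerical check shows how fast the approximation improves as x shrinks. The sample values of x below are arbitrary; the gap between e^x and 1 + x should be close to the first dropped term of the series, x²/2.

```python
import math

xs = (0.5, 0.1, 0.01, 0.001)          # arbitrary sample points
gaps = [math.exp(x) - (1.0 + x) for x in xs]

for x, gap in zip(xs, gaps):
    # Compare the gap with the first neglected series term, x**2 / 2
    print(f"x={x:<6} e^x={math.exp(x):.6f}  1+x={1.0 + x:.6f}  "
          f"gap={gap:.2e}  x^2/2={x * x / 2:.2e}")
```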