In [1]:
# load packages
%matplotlib inline
import pandas as pd
import numpy as np
import codecs
import os
import re
import matplotlib.pyplot as plt
import seaborn as sns

# Citation success index: mathematical formula

##### For most journals, the distribution of citations to their publications is strongly skewed: a relatively small number of highly cited papers form a long tail. The impact factor (IF) gives a rough picture of a journal, but describing the full set of its publications requires more than the IF alone (Larivière et al., 2016; Milojević et al., 2017).
##### Milojević et al. (2017) proposed the citation success index (CSI), defined as the probability that a randomly chosen paper in journal A (denoted t below) has more citations than a randomly chosen paper in journal B (denoted r below), with ties counted as one half. The comparison uses the full sets of publications in both journals. The equation is given below.  
$CSI=\sum_{c=0}^{\infty}\left[P_t(>c)+\frac{1}{2}P_t(c)\right]P_r(c)$
==

#### where $P_t(>c)$ is the fraction of papers in journal t with more than c citations, $P_t(c)$ is the fraction of papers in journal t with exactly c citations, and $P_r(c)$ is the fraction of papers in journal r with exactly c citations.
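
To make the sum concrete, here is a minimal sketch that evaluates the formula term by term on two tiny, made-up citation lists. The helper name `csi_from_formula` and the sample data are illustrative assumptions, not taken from the paper. Exact fractions are used so the arithmetic can be checked by hand.

```python
from collections import Counter
from fractions import Fraction


def csi_from_formula(t_citations, r_citations):
    """Evaluate the CSI sum term by term for two non-empty lists (illustrative helper)."""
    n_t, n_r = len(t_citations), len(r_citations)
    t_counts = Counter(t_citations)
    r_counts = Counter(r_citations)
    total = Fraction(0)
    # Only values of c with P_r(c) > 0 contribute to the sum.
    for c, r_n in r_counts.items():
        p_r = Fraction(r_n, n_r)                                   # P_r(c)
        p_t_greater = Fraction(
            sum(n for k, n in t_counts.items() if k > c), n_t
        )                                                          # P_t(>c)
        p_t_equal = Fraction(t_counts[c], n_t)                     # P_t(c)
        total += (p_t_greater + p_t_equal / 2) * p_r
    return total


t = [0, 1, 1, 3]  # made-up citation counts for journal t
r = [0, 0, 2]     # made-up citation counts for journal r
print(csi_from_formula(t, r))  # 2/3
```

Only c = 0 and c = 2 have non-zero $P_r(c)$ here, giving (3/4 + 1/8)(2/3) + (1/4)(1/3) = 2/3. Because ties are split equally, swapping the two journals gives the complement, so the two values sum to 1.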

###### References
 [1] Larivière, V., Kiermer, V., MacCallum, C. J., McNutt, M., Patterson, M., Pulverer, B., et al. (2016). A simple proposal for the publication of journal citation distributions. bioRxiv.  
 [2] Milojević, S., Radicchi, F., & Bar-Ilan, J. (2017). Citation success index – An intuitive pair-wise journal comparison metric. Journal of Informetrics, 11, 223–231.


In [26]:
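## CSI function implementation
from collections import Counter

def CSI(c1, c2):
    """Probability that a random paper in c1 has more citations than one in c2 (ties count 0.5)."""
    # Degenerate cases: one or both citation lists are empty.
    if len(c1) > 0 and len(c2) == 0:
        return 1
    if len(c1) == 0 and len(c2) > 0:
        return 0
    if len(c1) == 0 and len(c2) == 0:
        return 0.5
    count = 0
    n1 = len(c1)
    n2 = len(c2)
    # Collapse each list to {citation count: number of papers}.
    c1 = Counter(c1)
    c2 = Counter(c2)
    for i in c1:
        for j in c2:
            if i > j:
                # every paper with i citations beats every paper with j citations
                count += c1[i] * c2[j]
            elif i == j:
                # ties contribute half a win
                count += 0.5 * c1[i] * c2[j]
            # pairs with i < j contribute nothing
    return 1.0 * count / n1 / n2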

## Just load your data
- c1: a list of citation counts, one per publication in journal t.
- c2: a list of citation counts, one per publication in journal r.
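
As a rough sketch of how the two lists might be prepared, the block below reads a small CSV with hypothetical `journal` and `citations` columns and splits it into `c1` and `c2`. The column names, journal labels, and sample rows are all assumptions for illustration. So that the block runs on its own, it uses a stand-in `csi_sorted` based on binary search; for non-empty lists it should give the same value as `CSI(c1, c2)` above.

```python
import csv
import io
from bisect import bisect_left, bisect_right

# Hypothetical CSV content; a real file would be read with open(path, newline="").
raw = io.StringIO(
    "journal,citations\n"
    "A,0\nA,1\nA,1\nA,3\n"
    "B,0\nB,0\nB,2\n"
)

# Group citation counts by journal name.
by_journal = {}
for row in csv.DictReader(raw):
    by_journal.setdefault(row["journal"], []).append(int(row["citations"]))

c1 = by_journal["A"]  # journal t
c2 = by_journal["B"]  # journal r


def csi_sorted(c1, c2):
    """Stand-in for CSI(c1, c2) on two non-empty lists, using binary search."""
    ref = sorted(c2)
    wins = 0.0
    for c in c1:
        below = bisect_left(ref, c)          # papers in c2 with fewer citations
        ties = bisect_right(ref, c) - below  # papers in c2 with the same count
        wins += below + 0.5 * ties
    return wins / (len(c1) * len(c2))


print(round(csi_sorted(c1, c2), 4))  # 0.6667
```

Sorting one list and searching it costs roughly O((n1 + n2) log n2), which may help when both journals have many distinct citation counts. For small inputs the `Counter`-based loop above is just as suitable.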