# h-Index

In academia, the h-index is a metric used to measure the impact of a researcher's papers. It is defined as follows: a researcher has index h if at least h of their N papers have at least h citations each. If multiple values of h satisfy this condition, the largest is chosen.

For example, suppose N = 5, and the respective citations of each paper are [4, 3, 0, 1, 5]. Then the h-index would be 3, since the researcher has 3 papers with at least 3 citations.

Given a list of paper citations of a researcher, calculate their h-index.
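
As a baseline, the definition can be translated almost word for word into code. The sketch below is a brute-force check, not an efficient solution: it tries every candidate h from 0 to N and keeps the largest one that satisfies the condition. The name `h_index_by_definition` is made up for this illustration.

```python
def h_index_by_definition(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    n = len(citations)
    best = 0
    for h in range(n + 1):
        # count the papers that have at least h citations
        papers_with_h = sum(1 for c in citations if c >= h)
        if papers_with_h >= h:
            best = h
    return best


print(h_index_by_definition([4, 3, 0, 1, 5]))  # 3
```

This runs in O(N^2) time, so it is mainly useful as a reference to test faster versions against.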

In [7]:
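def h_index(citations):
    N = len(citations)

    # reverse=True orders the citations from largest to smallest;
    # sorted() returns a new list, so the caller's list is left unchanged
    ranked = sorted(citations, reverse=True)

    # ranked[h] >= h + 1 means at least h + 1 papers have h + 1 or more citations
    h = 0
    while h < N and ranked[h] >= h + 1:
        h += 1
    return h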

In [17]:
citations_t = [4,3,0,1,5]
h_index(citations_t) 

3

In [18]:
citations_t = [4,3,0,1,5]
citations_t.sort(reverse=True)
citations_t

[5, 4, 3, 1, 0]

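### There's a faster way
- Sorting takes O(N log N) time.
- The h-index can never exceed N, so every paper with N or more citations can share a single bucket. That allows a counting approach, similar to counting sort, that runs in O(N) time with O(N) extra space and needs no comparison sort.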

In [24]:
def h_index_fast(citations):
    n = len(citations)
    counts = [0 for _ in range(n + 1)]

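    # bucketing: counts[k] is the number of papers with exactly k citations;
    # papers with n or more citations share the last bucket, since h cannot exceed n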
    for citation in citations:
        if citation >= n:
            counts[n] += 1
        else:
            counts[citation] += 1

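    # counting: walk down from the largest possible h, so that total is the number
    # of papers with at least i citations; the first i with total >= i is the h-index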
    total = 0
    for i in range(n, -1, -1):
        total += counts[i]
        if total >= i:
            return i
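
To see why the counting pass works, it may help to look at the intermediate arrays for the example input. The sketch below rebuilds the buckets and then computes, for every k, how many papers have at least k citations. The names `at_least` and `h` are introduced only for this illustration.

```python
from itertools import accumulate

citations = [4, 3, 0, 1, 5]
n = len(citations)

# counts[k] = number of papers with exactly k citations (values above n are capped at n)
counts = [0] * (n + 1)
for c in citations:
    counts[min(c, n)] += 1

# at_least[k] = number of papers with at least k citations (a suffix sum of counts)
at_least = list(accumulate(reversed(counts)))[::-1]

# the h-index is the largest k for which at least k papers have k or more citations
h = max(k for k in range(n + 1) if at_least[k] >= k)

print(counts)    # [1, 1, 0, 1, 1, 1]
print(at_least)  # [5, 4, 3, 3, 2, 1]
print(h)         # 3
```

The function above does the same thing without storing the suffix sums: its running `total` equals `at_least[i]` at each step of the downward loop.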

In [23]:
# a list comprehension is a compact way to build a list of zeros
[0 for _ in range(5)]

[0, 0, 0, 0, 0]