
In [4]:
import pandas as pd

In [8]:
"""
how to get an overview of the data found in a
series. For the following examples we will use two series. The songs_66
series:

"""
songs_66 = pd.Series([3, None , 11, 9],
                      index=['George', 'Ringo', 'John', 'Paul'],
                      name='Counts')
songs_66


George     3.0
Ringo      NaN
John      11.0
Paul       9.0
Name: Counts, dtype: float64

In [9]:
# And the scores2 series:
scores2 = pd.Series([67.3, 100, 96.7, None, 100],
                    index=['Ringo', 'Paul', 'George', 'Peter', 'Billy'],
                    name='test2')
scores2

Ringo      67.3
Paul      100.0
George     96.7
Peter       NaN
Billy     100.0
Name: test2, dtype: float64

In [10]:
""" Given a series, the
.count method returns the number of non-null items. The scores2 series
has 5 entries but one of them is None, so .count only returns 4:
"""
scores2.count() # important to note


4
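
As a small sketch of how .count relates to the total length, the cell below compares it with len and with a count of the missing entries. The variable names total, non_null, and missing are illustrative only and are not part of the original notebook.

```python
import pandas as pd

scores2 = pd.Series([67.3, 100, 96.7, None, 100],
                    index=['Ringo', 'Paul', 'George', 'Peter', 'Billy'],
                    name='test2')

total = len(scores2)            # every entry, including missing ones
non_null = scores2.count()      # non-null entries only
missing = scores2.isna().sum()  # missing entries only

print(total, non_null, missing)
```

The three numbers should always satisfy total == non_null + missing, which is a quick sanity check when auditing a column for gaps.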

In [11]:
"""
.value_counts returns a mapping of those values to their counts, ordered
by frequency:

"""
scores2.value_counts()

100.0    2
67.3     1
96.7     1
Name: test2, dtype: int64
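
By default the missing value is left out of the tally. The sketch below uses the dropna and normalize parameters of .value_counts to include NaN in the counts and to report relative frequencies instead of raw counts. The names with_nan and shares are illustrative only.

```python
import pandas as pd

scores2 = pd.Series([67.3, 100, 96.7, None, 100],
                    index=['Ringo', 'Paul', 'George', 'Peter', 'Billy'],
                    name='test2')

# Include NaN as its own category in the tally
with_nan = scores2.value_counts(dropna=False)

# Fractions of the non-null entries instead of raw counts
shares = scores2.value_counts(normalize=True)

print(with_nan)
print(shares)
```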

In [12]:
"""
To get the unique values or the count of non-NaN items use the .unique
and .nunique methods respectively. Note that .unique includes the nan
value, but .nunique does not count it:

"""
scores2.unique()

array([ 67.3, 100. ,  96.7,   nan])

In [13]:
scores2.nunique()

3
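
The two results can be made to agree in either direction. The sketch below counts NaN as a distinct value with nunique(dropna=False), and removes NaN before calling .unique. The variable names are illustrative only.

```python
import pandas as pd

scores2 = pd.Series([67.3, 100, 96.7, None, 100],
                    index=['Ringo', 'Paul', 'George', 'Peter', 'Billy'],
                    name='test2')

# Count NaN as one more distinct value
distinct_with_nan = scores2.nunique(dropna=False)

# Drop NaN first, then collect the distinct values
distinct_non_null = scores2.dropna().unique()

print(distinct_with_nan)
print(distinct_non_null)
```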

In [15]:
"""
Dealing with duplicate values is another feature of pandas. To drop
duplicate values use the .drop_duplicates method. Since Billy has the
same score as Paul, he will get dropped:

"""
scores2.drop_duplicates()

Ringo      67.3
Paul      100.0
George     96.7
Peter       NaN
Name: test2, dtype: float64
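
Which occurrence survives is controlled by the keep parameter. As a brief sketch, keep='last' retains the final occurrence of each value and keep=False discards every value that appears more than once. The names keep_last and no_dups are illustrative only.

```python
import pandas as pd

scores2 = pd.Series([67.3, 100, 96.7, None, 100],
                    index=['Ringo', 'Paul', 'George', 'Peter', 'Billy'],
                    name='test2')

# Keep the last occurrence: Paul is dropped, Billy stays
keep_last = scores2.drop_duplicates(keep='last')

# Drop every repeated value: both Paul and Billy go
no_dups = scores2.drop_duplicates(keep=False)

print(keep_last)
print(no_dups)
```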

In [16]:
"""
To retrieve a series with boolean values indicating whether its value was
repeated, use the .duplicated method:

"""
scores2.duplicated()

Ringo     False
Paul      False
George    False
Peter     False
Billy      True
Name: test2, dtype: bool
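
The boolean series is most useful as a mask. The sketch below passes keep=False so that every member of a duplicated group is flagged, then uses the mask to pull out the rows that share a score. The names all_dups and repeated are illustrative only.

```python
import pandas as pd

scores2 = pd.Series([67.3, 100, 96.7, None, 100],
                    index=['Ringo', 'Paul', 'George', 'Peter', 'Billy'],
                    name='test2')

# Flag every entry whose value occurs more than once
all_dups = scores2.duplicated(keep=False)

# Use the mask to select only the repeated entries
repeated = scores2[all_dups]

print(repeated)
```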

In [22]:
"""
To drop duplicate index entries requires a little more effort. Lets create a
series, scores3, that has 'Paul' in the index twice. If we use the .groupby
method, and group by the index, we can then take the first or last item
from the values for each index label:

"""
scores3 = pd.Series([67.3, 100, 96.7, None, 100, 79],
                    index=['Ringo', 'Paul', 'George', 'Peter', 'Billy',
                    'Paul'])
scores3


Ringo      67.3
Paul      100.0
George     96.7
Peter       NaN
Billy     100.0
Paul       79.0
dtype: float64

In [24]:
scores3.groupby(scores3.index).first()

Billy     100.0
George     96.7
Paul      100.0
Peter       NaN
Ringo      67.3
dtype: float64

In [25]:
scores3.groupby(scores3.index).last()

Billy     100.0
George     96.7
Paul       79.0
Peter       NaN
Ringo      67.3
dtype: float64
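
An alternative sketch, not used in the notebook above, filters on the index directly with Index.duplicated. Unlike the groupby approach it keeps the original row order, and it keeps whatever value sits at the retained position, whereas groupby .first and .last skip missing values within a group by default. The names first_kept and last_kept are illustrative only.

```python
import pandas as pd

scores3 = pd.Series([67.3, 100, 96.7, None, 100, 79],
                    index=['Ringo', 'Paul', 'George', 'Peter', 'Billy',
                           'Paul'])

# Keep the first row for each index label, preserving original order
first_kept = scores3[~scores3.index.duplicated(keep='first')]

# Keep the last row for each index label instead
last_kept = scores3[~scores3.index.duplicated(keep='last')]

print(first_kept)
print(last_kept)
```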