### Pandas - Series
* A pandas Series is a one-dimensional labelled array.
* The axis labels are collectively called the index.
* A Series is like a column in an Excel sheet or a database table.
* Labels need not be unique, but they must be hashable.
* It supports both integer (position-based) and label-based (for example, string) indexing.
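
As a small illustration of the points above, the sketch below builds a Series with a repeated label and contrasts label-based access (`.loc`) with position-based access (`.iloc`). The data and the variable names are made up for this example.

```python
import pandas as pd

# Duplicate labels are allowed; 'a' appears twice here.
s = pd.Series([10, 20, 30], index=['a', 'b', 'a'])

dupes = s.loc['a']    # label-based: returns every entry labelled 'a'
first = s.iloc[0]     # position-based: always a single element

print(dupes.tolist())
print(first)
print(s.index.is_unique)
```

Because a label lookup can return several rows when labels repeat, checking `index.is_unique` first is often worthwhile.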

### In this Notebook I have covered 
* Create a Series
* Accessing elements of Series
* Create Series with index as labels
* Combine the Series
* map(), apply(), transform(), agg(), groupby(), value_counts()
* nlargest(), nsmallest(), between(), pct_change(), quantile()
* align(), drop(), drop_duplicates(), isin(), reindex(), reset_index()
* where(), mask(), fillna(), sort_values(), sort_index(), notna()

In [1]:
import pandas as pd

In [2]:
pd.__version__

'1.0.3'

In [3]:
import numpy as np

### Create a Series

In [2]:
rainbow = pd.Series(data = ['violet', 'indigo', 'blue', 'green', 'yellow', 'orange', 'red'])
rainbow

0    violet
1    indigo
2      blue
3     green
4    yellow
5    orange
6       red
dtype: object

In [3]:
# Check type of object
type(rainbow)

pandas.core.series.Series

In [4]:
# Index of rainbow series
rainbow.index

RangeIndex(start=0, stop=7, step=1)

In [5]:
# Are elements of series unique
rainbow.is_unique

True

### Accessing elements

In [6]:
# Display the first 3 elements, i.e. positions 0 to 2

rainbow[0:3]

0    violet
1    indigo
2      blue
dtype: object

In [7]:
# Display the last 3 colors of the rainbow Series

rainbow[-3:]

4    yellow
5    orange
6       red
dtype: object

In [8]:
# display colors of rainbow in reverse order

rainbow[::-1]

6       red
5    orange
4    yellow
3     green
2      blue
1    indigo
0    violet
dtype: object

In [9]:
### Create Series with index as labels

rainbow = pd.Series(data = ['violet', 'indigo', 'blue', 'green', 'yellow', 'orange', 'red'],
                   index = ['v','i','b', 'g','y','o', 'r'])
print(rainbow)

v    violet
i    indigo
b      blue
g     green
y    yellow
o    orange
r       red
dtype: object


In [10]:
# Check index of Series
rainbow.index

Index(['v', 'i', 'b', 'g', 'y', 'o', 'r'], dtype='object')

In [11]:
# access elements based on index label

rainbow['v']

'violet'

In [12]:
# Slice by a range of labels (unlike positional slices, both endpoints are included)

rainbow['v':'g']

v    violet
i    indigo
b      blue
g     green
dtype: object
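
The difference between label slices and positional slices is easy to miss, so here is a minimal sketch of it. It uses the explicit `.loc` and `.iloc` accessors on a shortened copy of the rainbow data.

```python
import pandas as pd

rainbow = pd.Series(['violet', 'indigo', 'blue', 'green'],
                    index=['v', 'i', 'b', 'g'])

by_label = rainbow.loc['v':'b']    # label slice: the end label 'b' is included
by_position = rainbow.iloc[0:2]    # positional slice: position 2 is excluded

print(by_label.tolist())
print(by_position.tolist())
```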

In [13]:
# access series in reverse order by label names

rainbow['r':'v':-1]

r       red
o    orange
y    yellow
g     green
b      blue
i    indigo
v    violet
dtype: object

In [14]:
# shape of series
rainbow.shape

(7,)

In [15]:
# Series to array

rainbow.to_numpy()

array(['violet', 'indigo', 'blue', 'green', 'yellow', 'orange', 'red'],
      dtype=object)

In [16]:
# Check datatype of elements of Series
rainbow.dtype

dtype('O')

In [17]:
# Bytes consumed by the underlying array (for object dtype this counts only the 8-byte references, not the strings themselves)
rainbow.nbytes

56
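
The value 56 is 7 references of 8 bytes each. To also count the string objects, `memory_usage(deep=True)` can be used. The sketch below assumes an object-dtype Series, which is set explicitly so the result does not depend on the default string dtype of the installed pandas version.

```python
import pandas as pd

# dtype=object is forced here as an assumption of this example.
colors = pd.Series(['violet', 'indigo', 'blue'], dtype=object)

pointer_bytes = colors.nbytes                               # reference array only
deep_bytes = colors.memory_usage(index=False, deep=True)   # references plus the string objects

print(pointer_bytes, deep_bytes)
```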

In [18]:
# copy() : with deep=True (the default), a new object with its own data and index is created
colors = rainbow.copy()
colors

v    violet
i    indigo
b      blue
g     green
y    yellow
o    orange
r       red
dtype: object

In [19]:
id(colors) == id(rainbow)

False
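
Comparing `id()` values shows that the objects differ. A slightly stronger check, sketched below with made-up data, is to modify the copy and confirm that the original is unchanged.

```python
import pandas as pd

original = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

alias = original              # plain assignment: same object, nothing is copied
duplicate = original.copy()   # independent copy (deep=True is the default)

duplicate.loc['a'] = 100      # change only the copy

print(alias is original)
print(original.loc['a'], duplicate.loc['a'])
```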

In [20]:
# combine() : Merge two Series element-wise, using func to choose or compute each resulting value

Sales_2020 = pd.Series(data = [20, 4 ,6 ,8], index=['A','B','C','D'])
Sales_2019 = pd.Series(data = [21, 6, 7, 9], index = ['A', 'B', 'C', 'D'])

In [21]:
Sales_2019

A    21
B     6
C     7
D     9
dtype: int64

In [22]:
Sales_2020

A    20
B     4
C     6
D     8
dtype: int64

In [23]:
Sales_2020.combine(Sales_2019, func=max)

A    21
B     6
C     7
D     9
dtype: int64
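
In the cell above both Series share the same index. When they do not, `combine()` works on the union of the labels, and `fill_value` supplies the missing side. The following is a sketch with hypothetical sales figures.

```python
import pandas as pd

sales_2020 = pd.Series([20, 4], index=['A', 'B'])
sales_2019 = pd.Series([21, 9], index=['A', 'D'])

# Labels present in only one Series are paired with fill_value (0 here).
best = sales_2020.combine(sales_2019, func=max, fill_value=0)

print(best)
```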

In [24]:
### map() : Map each value of the Series to another value using a dict, a Series, or a function

ratings = {5: 'Excellent', 4:'Good', 3:'Average', 2:'Poor', 1:'Bad'}

brand_ratings = pd.Series(data = [1, 3, 5, 4], index = ['A', 'B', 'C', 'D'])

brand_ratings


A    1
B    3
C    5
D    4
dtype: int64

In [25]:
brand_ratings.map(ratings)

A          Bad
B      Average
C    Excellent
D         Good
dtype: object
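
When a value has no matching key in the dictionary, `map()` yields NaN for it. A short sketch with a deliberately incomplete mapping is shown here; the 'Unrated' placeholder is an arbitrary choice for this example.

```python
import pandas as pd

ratings = {5: 'Excellent', 4: 'Good'}   # no entry for 2
scores = pd.Series([5, 4, 2])

labels = scores.map(ratings)                 # the unmatched value becomes NaN
labels_filled = labels.fillna('Unrated')     # replace the gap with a placeholder

print(labels_filled.tolist())
```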

In [26]:
### apply() : Invoke function on values of Series.

pd.Series([1, 2, 3, 4]).apply(lambda x : 2 * x)

0    2
1    4
2    6
3    8
dtype: int64

In [27]:
### transform() : Passing a list of functions returns a DataFrame with one column per function

double = lambda x : 2 *x
thrice = lambda x : 3 * x

pd.Series([1,2,3,4]).transform([double, thrice])

   <lambda>  <lambda>.1
0         2           3
1         4           6
2         6           9
3         8          12
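
Lambdas all share the name `<lambda>`, which makes the resulting columns hard to tell apart. One possible alternative, sketched below, is to use named functions so that each column takes its function's name.

```python
import pandas as pd

def double(x):
    return 2 * x

def thrice(x):
    return 3 * x

# Each column of the result is labelled with the function's __name__.
result = pd.Series([1, 2, 3, 4]).transform([double, thrice])

print(result)
```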


In [28]:
### agg() : Aggregate using one or more operations

pd.Series([10, 12, 13, 15]).agg(['min', 'max'])

min    10
max    15
dtype: int64

In [29]:
### keys() : Alias for index; returns the index labels
S = pd.Series(data = [1,2,3,4,5], index=['A','B','C','D','E'])
S

A    1
B    2
C    3
D    4
E    5
dtype: int64

In [30]:
S.keys()

Index(['A', 'B', 'C', 'D', 'E'], dtype='object')

In [31]:
### groupby()

S = pd.Series([10, 11, 12, 13, 14, 15, 16], index=['A','A','B','B','C','C','C'], name='Brand')
S

A    10
A    11
B    12
B    13
C    14
C    15
C    16
Name: Brand, dtype: int64

In [32]:
S.groupby(by=S.index).mean()

A    10.5
B    12.5
C    15.0
Name: Brand, dtype: float64

In [33]:
S.groupby(level=0).mean()

A    10.5
B    12.5
C    15.0
Name: Brand, dtype: float64

In [34]:
## value_counts() : Count the occurrences of each unique value

fav_pets = pd.Series(['Dog', 'Cat', 'Dog', 'Parrot', 'Pigeon', 'Cat', 'Parrot', 'Cat'])
fav_pets.value_counts()

Cat       3
Parrot    2
Dog       2
Pigeon    1
dtype: int64

In [35]:
# to get relative frequency

fav_pets.value_counts(normalize=True)

Cat       0.375
Parrot    0.250
Dog       0.250
Pigeon    0.125
dtype: float64

In [37]:
# Pass dropna=False so that value_counts() also counts NaN values
import numpy as np

S = pd.Series([10, 11, np.nan, 9, 10, np.nan])
S.value_counts(dropna=False)

NaN     2
10.0    2
9.0     1
11.0    1
dtype: int64

In [38]:
# Remove NaN

S.dropna(inplace=True)
S

0    10.0
1    11.0
3     9.0
4    10.0
dtype: float64

In [39]:
S.value_counts()

10.0    2
9.0     1
11.0    1
dtype: int64

In [40]:
## rolling() : Compute a statistic over a sliding window (here, the sum of every 3 consecutive values)

S = pd.Series([10, 11, 12, 8, 7,6,10])

S.rolling(window=3).sum().dropna()

2    33.0
3    31.0
4    27.0
5    21.0
6    23.0
dtype: float64
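
The first two positions produce NaN because a full window of 3 values is not yet available, which is why `dropna()` is applied above. As a sketch of an alternative, `min_periods` lets partial windows produce a result.

```python
import pandas as pd

s = pd.Series([10, 11, 12, 8])

full = s.rolling(window=3).sum()                    # NaN until 3 values are available
partial = s.rolling(window=3, min_periods=1).sum()  # sums whatever the window holds so far

print(full.tolist())
print(partial.tolist())
```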

In [41]:
## nlargest() Return the largest n elements.

brands = {'A': 5000, 'B':4500, 'C':3000, 'D':2800, 'E':1500, 'F':500}

S = pd.Series(brands, name='bags')
S

A    5000
B    4500
C    3000
D    2800
E    1500
F     500
Name: bags, dtype: int64

In [42]:
# The 3 most expensive brands

S.nlargest(n=3)

A    5000
B    4500
C    3000
Name: bags, dtype: int64

In [43]:
# The 3 cheapest brands

S.nsmallest(n=3)

F     500
E    1500
D    2800
Name: bags, dtype: int64

In [44]:
# between() : Find brands within a price range (both bounds are inclusive by default)

S[S.between(1000, 3000)]

C    3000
D    2800
E    1500
Name: bags, dtype: int64

In [45]:
# pct_change() : Fractional change between each element and the previous one (0.5 means +50%)

Sales = pd.Series([400, 600, 800, 900, 750, 1000], name='Sales')

Sales.pct_change()

0         NaN
1    0.500000
2    0.333333
3    0.125000
4   -0.166667
5    0.333333
Name: Sales, dtype: float64
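
To make the calculation explicit, the sketch below reproduces `pct_change()` by hand with `shift()`. The two results are compared with a small tolerance because the floating-point arithmetic may differ in the last digit.

```python
import pandas as pd

sales = pd.Series([400, 600, 800, 900, 750, 1000], name='Sales')

previous = sales.shift(1)                 # each value moved down one position
manual = (sales - previous) / previous    # (current - previous) / previous
builtin = sales.pct_change()

# The first element has no predecessor, so it is NaN in both results.
max_difference = (manual - builtin).abs().max()
print(max_difference < 1e-12)
```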

In [46]:
# quantile() : Return the value at the given quantile (0.50 is the median)

Sales.quantile(0.50)

775.0
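
The result 775.0 is not one of the data values: with the default linear interpolation, the quantile falls halfway between 750 and 800 in the sorted data. A hand computation, written as a sketch of that default behaviour, looks like this.

```python
import math
import pandas as pd

values = sorted([400, 600, 800, 900, 750, 1000])
q = 0.5

pos = (len(values) - 1) * q                   # fractional position in the sorted data
lower, upper = math.floor(pos), math.ceil(pos)

# Interpolate linearly between the two neighbouring values.
manual = values[lower] + (values[upper] - values[lower]) * (pos - lower)

print(manual, pd.Series(values).quantile(q))
```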

In [47]:
# align()  Align two objects on their axes with the specified join method.

X = pd.Series([1,2,3,4], index=['A', 'B', 'C', 'D'])
Y = pd.Series([10,20,30,40], index=['A','C','D','E'])



In [48]:
s1, s2 = X.align(Y, join='inner', level=0)


In [49]:
s1

A    1
C    3
D    4
dtype: int64

In [50]:
s2

A    10
C    20
D    30
dtype: int64
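
With `join='inner'` only the shared labels survive. For comparison, here is a sketch of `join='outer'`, which keeps the union of the labels and fills the gaps with NaN.

```python
import pandas as pd

X = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])
Y = pd.Series([10, 20, 30, 40], index=['A', 'C', 'D', 'E'])

# Both results share the index A..E; X has no 'E' and Y has no 'B'.
left, right = X.align(Y, join='outer')

print(left)
print(right)
```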

In [51]:
## drop() Return Series with specified index labels removed.

x = pd.Series([1,2,3], ['A', 'B', 'C'])
x

A    1
B    2
C    3
dtype: int64

In [52]:
x.drop('A') # remove label A

B    2
C    3
dtype: int64

In [53]:
## drop_duplicates() : Remove repeated values, keeping the first occurrence by default

A = pd.Series([10, 15, 10, 20, 30, 10])
A.drop_duplicates()

0    10
1    15
3    20
4    30
dtype: int64
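
The `keep` parameter controls which occurrence survives. A brief sketch on the same data follows.

```python
import pandas as pd

A = pd.Series([10, 15, 10, 20, 30, 10])

keep_last = A.drop_duplicates(keep='last')   # keep the final occurrence of each value
keep_none = A.drop_duplicates(keep=False)    # drop every value that is repeated

print(list(keep_last.index))
print(list(keep_none.index))
```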

In [54]:
## head() to select first 5 elements
A.head()

0    10
1    15
2    10
3    20
4    30
dtype: int64

In [55]:
# Index labels of the maximum and minimum elements (first occurrence if repeated)

A.idxmax() , A.idxmin()

(4, 0)

In [56]:
## isin() : Check whether each element is contained in the given values

cities = pd.Series(['Delhi', 'Mumbai', 'Chennai'])

cities.isin(['Pune', 'Mumbai'])

0    False
1     True
2    False
dtype: bool

In [57]:
## reindex() : Conform the Series to a new index; labels missing from the original get NaN

S = pd.Series([1,2,3,4], index=['A', 'B', 'C', 'D'])
S

A    1
B    2
C    3
D    4
dtype: int64

In [58]:
S_new = S.reindex(index=['B','C','D','A','Z'])
S_new

B    2.0
C    3.0
D    4.0
A    1.0
Z    NaN
dtype: float64
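
The new label 'Z' receives NaN, which also turns the values into floats. If a default value makes sense for the data, `fill_value` can be passed instead, as in this sketch (0 is an arbitrary choice here).

```python
import pandas as pd

S = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])

# 'Z' is not in the original index, so it receives fill_value instead of NaN.
filled = S.reindex(index=['B', 'C', 'D', 'A', 'Z'], fill_value=0)

print(filled)
```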

In [59]:
## reset_index() : Move the index into a regular column; returns a DataFrame with a default integer index
S2= S_new.reset_index(drop=False, name='Value')
S2

  index  Value
0     B    2.0
1     C    3.0
2     D    4.0
3     A    1.0
4     Z    NaN


In [60]:
S2.index

RangeIndex(start=0, stop=5, step=1)

In [62]:
## where() : Keep values where the condition is True and replace the rest with `other`

S = pd.Series([1, 3, -5, 6, 7, -9, 10])

# Replace all negative values with zero
S.where(S > 0, other=0)

0     1
1     3
2     0
3     6
4     7
5     0
6    10
dtype: int64

In [63]:
## mask() : Replace values where the condition is True (the inverse of where())
S = pd.Series([4, 5, np.nan, 6, 7])

S.mask(S.isna(), other=0)


0    4.0
1    5.0
2    0.0
3    6.0
4    7.0
dtype: float64
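
Since `where()` and `mask()` are inverses of each other, the same replacement can be written either way by negating the condition. A minimal sketch:

```python
import pandas as pd

s = pd.Series([1, 3, -5, 6, -9])

kept = s.where(s > 0, other=0)      # keep positives, replace everything else
masked = s.mask(s <= 0, other=0)    # replace non-positives, keep everything else

print(kept.tolist())
print(kept.equals(masked))
```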

In [64]:
## fillna() : Replace NaN values with the given value
S.fillna(value=0)

0    4.0
1    5.0
2    0.0
3    6.0
4    7.0
dtype: float64

In [65]:
## sort_values() : Sort by values (NaN is placed last by default)
S.sort_values(ascending=False)

4    7.0
3    6.0
1    5.0
0    4.0
2    NaN
dtype: float64
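
NaN ends up last regardless of the sort direction. The `na_position` parameter changes that, as this short sketch shows.

```python
import numpy as np
import pandas as pd

S = pd.Series([4, 5, np.nan, 6, 7])

# Same descending sort, but with the missing value moved to the front.
nan_first = S.sort_values(ascending=False, na_position='first')

print(list(nan_first.index))
```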

In [66]:
## sort_index() : Sort by index labels
S.sort_index(ascending=False)

4    7.0
3    6.0
2    NaN
1    5.0
0    4.0
dtype: float64

In [67]:
# notna() : Boolean mask that is True for non-missing values; used here to filter out NaN
S[S.notna()]

0    4.0
1    5.0
3    6.0
4    7.0
dtype: float64