# Pandas Series 
Understanding Series makes it easier to understand DataFrames.<br>
A Series is like a DataFrame with an index and only one data column, so it helps to study Series before moving on to more complex DataFrames. In the pandas library, a Series is a one-dimensional labeled array that can hold any data type: integers, floats, strings, or even arbitrary Python objects.

🔹**Basic Structure**

**A Series has two parts:**

- Values: the actual data (like numbers or strings)
- Index: labels that identify each value

Think of it like a column in a spreadsheet or a single column of a DataFrame.
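
As a small illustration of the two parts described above, the sketch below builds a Series and pulls the values and the index apart. The city names and temperatures are made-up sample data, not taken from this notebook.

```python
import pandas as pd

# Hypothetical sample data: three temperatures labelled by city.
temps = pd.Series([31.5, 28.0, 35.2], index=["Pune", "Shimla", "Jaipur"])

values_part = temps.to_numpy()   # the data, as a NumPy array
index_part = list(temps.index)   # the labels, one per value

print(values_part)
print(index_part)
print(temps["Shimla"])           # look up a single value by its label
```

Each label in the index points to exactly one position in the values, which is what makes label-based lookup possible.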

In [1]:
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

The constructor for the Series data structure is `pandas.Series`:<br>
**pd.Series(data=[10, 20, 30], index=['a', 'b', 'c'])**
### `pd.Series()`

**When you create a pandas Series without specifying an index, pandas automatically assigns a default integer index that starts at 0 and increases in steps of 1 (0, 1, 2, ...).**

In [2]:
# Series created using a list
series_1 = pd.Series([100, 200, 300, 400, 500])
print(series_1)

0    100
1    200
2    300
3    400
4    500
dtype: int64


In [3]:
# Series created using a list of mixed types
series_2 = pd.Series([10.1, 20, 'python', 40.4])

print(series_2)

0      10.1
1        20
2    python
3      40.4
dtype: object


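The Series above has the 'object' dtype because the list mixes floats, an integer, and a string. pandas cannot fit these into a single numeric type, so it stores each element as a generic Python object.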
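
To make the dtype behaviour a little more concrete, here is a minimal sketch comparing a mixed list with a purely numeric one. The variable names `mixed`, `numeric`, and `element_types` are illustrative only.

```python
import pandas as pd

mixed = pd.Series([10.1, 20, "python", 40.4])   # floats, an int and a string
numeric = pd.Series([10.1, 20, 40.4])           # numbers only

print(mixed.dtype)     # object
print(numeric.dtype)   # float64

# In an object Series each element keeps its original Python type.
element_types = [type(v).__name__ for v in mixed]
print(element_types)
```

Removing the string lets pandas upcast the integer to a float and store everything as `float64`, which is generally faster for numeric work than `object`.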

In [4]:
# Create a Series of 5 states and their capitals
state_capitals = pd.Series(
    ['Mumbai', 'Chennai', 'Kolkata', 'Bengaluru', 'Jaipur'],
    index=['Maharashtra', 'Tamil Nadu', 'West Bengal', 'Karnataka', 'Rajasthan']
)
print(state_capitals)

Maharashtra       Mumbai
Tamil Nadu       Chennai
West Bengal      Kolkata
Karnataka      Bengaluru
Rajasthan         Jaipur
dtype: object


In [5]:
# define the index
stocks_set1 = ['Reliance', 'TCS', 'Infosys', 'HDFC Bank']

# Price list
S1 = pd.Series([2500, 3600, 1550, 1700], index=stocks_set1)

print("Series S1:")
print(S1)

Series S1:
Reliance     2500
TCS          3600
Infosys      1550
HDFC Bank    1700
dtype: int64


In [6]:
# define the index of Series S2
stocks_set2 = ['Reliance', 'TCS', 'Infosys', 'HDFC Bank']

# Second price list
S2 = pd.Series([2550, 3700, 1600, 1725], index=stocks_set2)

print("Series S2:")
print(S2)

Series S2:
Reliance     2550
TCS          3700
Infosys      1600
HDFC Bank    1725
dtype: int64


In [7]:
# Add both Series
print("Sum of S1 and S2:")
print(S1 + S2)

Sum of S1 and S2:
Reliance     5050
TCS          7300
Infosys      3150
HDFC Bank    3425
dtype: int64


Adding two Series that have different indexes produces 'NaN' for every label that appears in only one of them.<br>
'NaN' is short for 'Not a Number'. It is the placeholder pandas uses for missing or undefined data.
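
The sketch below shows this alignment on two small made-up Series whose labels only partly overlap. It also uses `Series.add` with `fill_value=0`, one possible way to treat a missing label as zero instead of getting 'NaN'; whether that is appropriate depends on the data.

```python
import pandas as pd

# Hypothetical data: labels "y" and "z" are shared, "x" and "w" are not.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

plain_sum = a + b                     # NaN where a label exists on one side only
filled_sum = a.add(b, fill_value=0)   # treat the missing side as 0

print(plain_sum)
print(filled_sum)
```

In `plain_sum` only "y" and "z" get a numeric result, while `filled_sum` keeps the one-sided values for "x" and "w".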

## Methods or functions
A few important attributes and methods that can be used on a Series.

### <span style="color:black">Series.index</span>
It returns the index labels of the Series. This is useful for checking the range of the index when the Series is large.

In [8]:
My_Series = pd.Series([10, 20, 30, 40, 50])
print(My_Series.index)

RangeIndex(start=0, stop=5, step=1)


In [9]:
My_Series

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [10]:
My_Series2 = pd.Series([100, 1, 1000, -5], index=['a', 'b', 'c', 'd'])
print(My_Series2.index)

Index(['a', 'b', 'c', 'd'], dtype='object')


In [11]:
My_Series2

a     100
b       1
c    1000
d      -5
dtype: int64

### <span style="color:black">Series.values</span>
It returns the values of the Series as a NumPy array.

In [12]:
print(My_Series.values)

[10 20 30 40 50]


In [13]:
My_Series.values 

array([10, 20, 30, 40, 50], dtype=int64)

### <span style="color:black">Series.isnull()</span>
We can check for missing values with this method.

In [14]:
print(S1 + S2)

Reliance     5050
TCS          7300
Infosys      3150
HDFC Bank    3425
dtype: int64


In [15]:
# Returns whether each value is null. 'True' means the value at that index is 'NaN'.

(S1 + S2).isnull()

Reliance     False
TCS          False
Infosys      False
HDFC Bank    False
dtype: bool

In [16]:
stocks_set1 = ['Reliance', 'TCS', 'Infosys', 'HDFC Bank', 'ICICI Bank', 'Bharti Airtel']

# Create first Series
S1 = pd.Series([2500, 3650, 1550, 1720, 980, 950], index=stocks_set1)

stocks_set2 = ['Reliance', 'TCS', 'Infosys', 'HDFC Bank', 'ICICI Bank', 'Adani Ports']

# Create second Series
S2 = pd.Series([2550, 3725, 1600, 1740, 1000, 890], index=stocks_set2)

# Add both Series
print("Sum of S1 and S2:")
print(S1 + S2)

Sum of S1 and S2:
Adani Ports         NaN
Bharti Airtel       NaN
HDFC Bank        3460.0
ICICI Bank       1980.0
Infosys          3150.0
Reliance         5050.0
TCS              7375.0
dtype: float64


In [17]:
(S1 + S2).isnull()

Adani Ports       True
Bharti Airtel     True
HDFC Bank        False
ICICI Bank       False
Infosys          False
Reliance         False
TCS              False
dtype: bool
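
Because `isnull()` returns a boolean Series, it can also be used to count or list the missing entries. The following is a minimal sketch on a small hand-built Series that mimics the sum above; the names `n_missing` and `missing_labels` are illustrative.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for a sum with two unmatched labels.
total = pd.Series(
    [np.nan, np.nan, 3460.0, 5050.0],
    index=["Adani Ports", "Bharti Airtel", "HDFC Bank", "Reliance"],
)

mask = total.isnull()                     # True where the value is NaN
n_missing = int(mask.sum())               # True counts as 1
missing_labels = list(total[mask].index)  # labels of the missing entries

print(n_missing)
print(missing_labels)
```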

### <span style="color:black">Series.dropna()</span>
One way to deal with the 'NaN' values is to drop them completely from the series. This method filters out missing data.

In [18]:
print(S1 + S2)

Adani Ports         NaN
Bharti Airtel       NaN
HDFC Bank        3460.0
ICICI Bank       1980.0
Infosys          3150.0
Reliance         5050.0
TCS              7375.0
dtype: float64


In [19]:
print((S1 + S2).dropna())

HDFC Bank     3460.0
ICICI Bank    1980.0
Infosys       3150.0
Reliance      5050.0
TCS           7375.0
dtype: float64


In [20]:
(S1 + S2).isnull()

Adani Ports       True
Bharti Airtel     True
HDFC Bank        False
ICICI Bank       False
Infosys          False
Reliance         False
TCS              False
dtype: bool

<b>dropna() doesn't change the original Series.
It just returns a new Series that excludes the missing values.
To keep the cleaned result, assign it to a variable.</b>

In [21]:
S3 = (S1 + S2).dropna()
print(S3)

HDFC Bank     3460.0
ICICI Bank    1980.0
Infosys       3150.0
Reliance      5050.0
TCS           7375.0
dtype: float64


In [22]:
S3.isnull()

HDFC Bank     False
ICICI Bank    False
Infosys       False
Reliance      False
TCS           False
dtype: bool
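
The point that `dropna()` returns a new Series can be checked directly by keeping the original in a variable. This is a small sketch with made-up values; `prices` and `cleaned` are illustrative names.

```python
import numpy as np
import pandas as pd

prices = pd.Series(
    [np.nan, 1980.0, 3150.0],
    index=["Adani Ports", "ICICI Bank", "Infosys"],
)

cleaned = prices.dropna()   # new Series without the NaN entry

print(len(prices))    # the original still has 3 entries
print(len(cleaned))   # the cleaned copy has 2

# Reassigning the name is one simple way to "update" the Series.
prices = prices.dropna()
print(len(prices))
```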

### <span style="color:black">Series.fillna()</span>
Another way to deal with the 'NaN' values is to fill them with a custom value of your choice. Here, we are filling the 'NaN' values with the value '1000'. 

In [23]:
print((S1 + S2).fillna(1000))  # Check the filled output

Adani Ports      1000.0
Bharti Airtel    1000.0
HDFC Bank        3460.0
ICICI Bank       1980.0
Infosys          3150.0
Reliance         5050.0
TCS              7375.0
dtype: float64
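
The fill value does not have to be a fixed constant. As a sketch, the example below fills missing entries with 0 and, alternatively, with the mean of the values that are present. The data is made up, and using the mean is only one possible choice that may not suit every dataset.

```python
import numpy as np
import pandas as pd

total = pd.Series(
    [np.nan, 3460.0, 1980.0],
    index=["Adani Ports", "HDFC Bank", "ICICI Bank"],
)

filled_zero = total.fillna(0)             # constant fill value
filled_mean = total.fillna(total.mean())  # mean() skips NaN by default

print(filled_zero)
print(filled_mean)
```

Like `dropna()`, `fillna()` returns a new Series here, so `total` itself still contains the 'NaN' value.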


### pd.Series.apply()

To apply a function to every value in a Series, for example to compute the sine of each value, use the `apply` method.
<br>
<b>Series.apply(func)</b>
<br>
func = a Python function that will be applied to every single value of the Series.

In [24]:
import numpy as np

In [25]:
math_Series = pd.Series([10, 20, 36, 40, 50, 64])

In [26]:
math_Series.apply(np.sin)  # Find 'sine' of each value in the series

0   -0.544021
1    0.912945
2   -0.991779
3    0.745113
4   -0.262375
5    0.920026
dtype: float64

In [27]:
math_Series.apply(np.sqrt)

0    3.162278
1    4.472136
2    6.000000
3    6.324555
4    7.071068
5    8.000000
dtype: float64

In [28]:
print(math_Series)

0    10
1    20
2    36
3    40
4    50
5    64
dtype: int64


In [29]:
# Overwriting existing data
math_Series = math_Series.apply(np.sqrt)
print(math_Series)

0    3.162278
1    4.472136
2    6.000000
3    6.324555
4    7.071068
5    8.000000
dtype: float64
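
`apply` is not limited to NumPy functions. The sketch below passes a user-defined function and a lambda instead. The helper `price_band` and its thresholds are hypothetical and exist only for illustration.

```python
import pandas as pd

def price_band(price):
    """Hypothetical helper: bucket a price into a coarse label."""
    if price >= 3000:
        return "high"
    if price >= 1600:
        return "mid"
    return "low"

prices = pd.Series(
    [2500, 3600, 1550, 1700],
    index=["Reliance", "TCS", "Infosys", "HDFC Bank"],
)

bands = prices.apply(price_band)               # named function
hundreds = prices.apply(lambda p: p // 100)    # anonymous function

print(bands)
print(hundreds)
```

The result keeps the original index, so each label in `bands` still lines up with the price it was computed from.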


In [30]:
#Create a random Pandas Series of float numbers

s = pd.Series(np.random.randn(10))
print(s)

0   -0.153177
1    0.511693
2    0.061868
3    0.215273
4   -0.152491
5    0.654514
6    0.413588
7    0.175266
8   -1.984394
9   -0.141943
dtype: float64


In [31]:
#Check the default index
s.index

RangeIndex(start=0, stop=10, step=1)

In [32]:
#Create a new series specifying the index
k = pd.Series(np.random.randn(5),  index = ['a', 'b', 'c', 'd', 'e'])
print(k)

a    0.159386
b   -0.858283
c   -1.218989
d   -0.164702
e    0.942183
dtype: float64


In [33]:
#Create a pandas series using dictionary
dictionary = {'a':1000, 'b':2000, 'c':3000, 'd':4000, 'e':5000}
w = pd.Series(dictionary)
print(w)

a    1000
b    2000
c    3000
d    4000
e    5000
dtype: int64
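
When a dictionary is combined with an explicit `index` argument, pandas looks up each requested label among the dictionary keys. The sketch below, with made-up labels, shows that this can reorder or select entries and that a label missing from the dictionary becomes 'NaN'.

```python
import pandas as pd

data = {"a": 1000, "b": 2000, "c": 3000}

# "z" is not a key in the dictionary, so its value is missing.
subset = pd.Series(data, index=["c", "a", "z"])
print(subset)
```

Because 'NaN' is a float, the resulting Series is stored as `float64` rather than `int64`.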


In [34]:
#Create a pandas series using numpy array
arr = np.array([1, 2, 3, 4, 5])
arr = pd.Series(arr)
print(arr)

0    1
1    2
2    3
3    4
4    5
dtype: int32


In [35]:
#Performing operations similar to Numpy Arrays
arr[0]

1

In [36]:
w['a']

1000

In [37]:
#Vectorized operation
w = w + 2

In [38]:
print(w)

a    1002
b    2002
c    3002
d    4002
e    5002
dtype: int64


In [39]:
w = w ** 2
print(w)

a     1004004
b     4008004
c     9012004
d    16016004
e    25020004
dtype: int64
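
For simple arithmetic, a vectorized expression gives the same values as calling `apply` with an equivalent function, and comparisons are vectorized in the same way. The following is a minimal sketch under those assumptions, using a smaller made-up Series named `w_small`.

```python
import pandas as pd

w_small = pd.Series({"a": 1000, "b": 2000, "c": 3000})

shifted = w_small + 2                          # vectorized arithmetic
via_apply = w_small.apply(lambda v: v + 2)     # same values, element by element

is_large = w_small > 1500                      # vectorized comparison
large_only = w_small[is_large]                 # keep entries where the mask is True

print(shifted)
print(large_only)
```

The vectorized form is usually shorter and faster than `apply`, so it tends to be the first choice when an operator or NumPy function already exists for the task.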
