
---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<img align="right" width="400" height="400"  src="..\..\..\data_science\exercise\Section-3-Python-for-Data-Scientists/images/pandas-apps.png"  >

## _Overview of Pandas Series Data Structure.ipynb_

#### Read about Pandas Data: https://pandas.pydata.org/docs/user_guide

## Learning agenda of this notebook

1. Overview of Python Pandas library and its data structures
2. Creating a Series
    - From Python List
    - From NumPy Arrays
    - From Python Dictionary
    - From a scalar value
3. Attributes of a Pandas Series
4. Understanding Index in a Series and its usage
    - Identification
    - Selection/Filtering/Subsetting
    - Alignment

In [2]:
import pandas as pd
pd.__version__, pd.__path__

('2.0.3', ['C:\\Users\\FashN\\anaconda3\\Lib\\site-packages\\pandas'])

<img align="right" width="500" height="600"  src="..\..\..\data_science\exercise\Section-3-Python-for-Data-Scientists/images/series-anatomy.png"  >

## 2. Creating a Series
> **A Series is a one-dimensional labeled array that can hold a sequence of values of any data type (integers, floating-point numbers, strings, Python objects, etc.). By default, the values get integer labels starting from zero. You can think of a Pandas Series as a single column of a spreadsheet or of a Pandas DataFrame.**
- To create a Series object, call the `pd.Series()` constructor:

**```pd.Series(data, index, dtype, name)```**
- Where,
   - `data`: can be a Python list, a Python dictionary, a NumPy array, or a scalar value.
   - `index`: If you do not pass the `index` argument, it defaults to the integer labels `0` through `n-1` (equivalent to `np.arange(n)`). Index labels must be hashable (for example numbers or strings) and, for list or array data, there must be exactly as many labels as values in `data`. Non-unique index values are allowed. The index is used for three purposes:
       - Identification.
       - Selection.
       - Alignment.
   - `dtype`: Optionally, you can assign any valid NumPy data type to the Series object (for example `np.int64` or `np.float32`; NumPy 1.x lists them in `np.sctypes`). If not specified, the data type is inferred from `data`.
   - `name`: Optionally, you can assign a name to a Series, which is stored in its `name` attribute. It also becomes the column name if the Series is later used to create a DataFrame.
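
The four parameters described above can be combined in a single call. The following is a minimal sketch; the student labels, the marks, and the name `marks` are made-up values used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three marks labeled with illustrative student IDs.
marks = pd.Series(
    data=[71, 85, 90],              # a Python list of values
    index=["MS01", "MS02", "MS03"], # one label per value
    dtype=np.float64,               # force floats instead of the inferred integers
    name="marks",                   # stored in the .name attribute
)
print(marks)

# Without an index argument, pandas generates the labels 0, 1, 2, ...
default_index = pd.Series(data=[71, 85, 90]).index
print(default_index)
```

Note that the default index is a `RangeIndex`, a memory-efficient equivalent of the integers `0` through `n-1`.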

### a. Creating a Series from Python List

In [3]:
import pandas as pd
import numpy as np

list1 = ['Godwin', 'Kelvin', 'Mike', ' ', 'Levin']
# When index is not provided, it creates an index for the data starting from zero and with a step size of one.

s = pd.Series(data=list1)
print(f'list = {list1}\nseries = \n{s}\ntype = {type(s)}')

list = ['Godwin', 'Kelvin', 'Mike', ' ', 'Levin']
series = 
0    Godwin
1    Kelvin
2      Mike
3          
4     Levin
dtype: object
type = <class 'pandas.core.series.Series'>


>Observe that the output is shown in two columns: the index on the left and the data values on the right. If we do not explicitly specify an index while creating a Series, the indices default to 0 through N – 1, where N is the number of data elements.

**You can explicitly specify the index for a Series object. The labels can be integers, floats, or strings, and there must be exactly as many labels as values in the Series. Otherwise, Pandas raises a ValueError.**
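
The length requirement can be checked with a small experiment. This is only a sketch: the names `values` and `labels` are illustrative, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

values = ["Godwin", "Kelvin", "Mike"]
labels = ["MS01", "MS02"]  # deliberately one label short

try:
    pd.Series(data=values, index=labels)
    raised = False
except ValueError as err:
    # pandas refuses to build a Series whose index and data lengths differ
    raised = True
    print(f"ValueError: {err}")
```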

In [4]:
list1 = ['Godwin', 'Kelvin', 'Mike', ' ', 'Levin']
indices = ['MS01', 'MS02', '', 'MS02', 'MS03'] # non-unique index values are allowed and you can have empty string as index

s=pd.Series(data=list1, index=indices)
print(f'list = {list1}\nseries = \n{s}\ntype = {type(s)}')

list = ['Godwin', 'Kelvin', 'Mike', ' ', 'Levin']
series = 
MS01    Godwin
MS02    Kelvin
          Mike
MS02          
MS03     Levin
dtype: object
type = <class 'pandas.core.series.Series'>


In [5]:
s['MS01']

'Godwin'

In [8]:
s['MS02']

MS02    Kelvin
MS02          
dtype: object

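>Also note that non-unique indices are allowed: selecting the duplicated label `'MS02'` returns a Series containing every matching value, whereas the unique label `'MS01'` returns a single value.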
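
To find out in advance whether a label lookup may return several values, you can inspect the index. The sketch below uses a small made-up Series named `s_dup`.

```python
import pandas as pd

# Illustrative Series in which the label 'MS02' appears twice.
s_dup = pd.Series(["Godwin", "Kelvin", "Mike"], index=["MS01", "MS02", "MS02"])

print(s_dup.index.is_unique)     # False, because 'MS02' is repeated
print(type(s_dup["MS01"]))       # a single value for a label that occurs once
print(type(s_dup["MS02"]))       # a Series for a label that occurs more than once
print(s_dup.index.duplicated())  # boolean mask marking repeated labels
```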

In [9]:
list1 = ['Godwin', 'Kelvin', 'Mike', ' ', 'Levin']
indices = [1.01, 1.02, 1.03, 1.03, 1.04] # float index labels are allowed, and they may be non-unique

s=pd.Series(data=list1, index=indices)
print(f'list = {list1}\nseries = \n{s}\ntype = {type(s)}')

list = ['Godwin', 'Kelvin', 'Mike', ' ', 'Levin']
series = 
1.01    Godwin
1.02    Kelvin
1.03      Mike
1.03          
1.04     Levin
dtype: object
type = <class 'pandas.core.series.Series'>


In [10]:
s[1.03]

1.03    Mike
1.03        
dtype: object

**You can create a Series with NaN values using `np.nan`, the IEEE 754 floating-point representation of Not a Number. NaN acts as a placeholder for missing numerical values in the array.**

In [12]:
list1 = [1.01, 1.02, 1.03, 1.03, 1.04]
indices = [1.01, 1.02, 1.03, 1.03, 1.04] # float index labels, defined here so that this cell runs on its own

s=pd.Series(data=list1, index=indices)
print(f'list = {list1}\nseries = \n{s}\ntype = {type(s)}')

list = [1.01, 1.02, 1.03, 1.03, 1.04]
series = 
1.01    1.01
1.02    1.02
1.03    1.03
1.03    1.03
1.04    1.04
dtype: float64
type = <class 'pandas.core.series.Series'>
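
A Series that actually contains `np.nan` might look like the following sketch. The name `readings` and its values are illustrative assumptions, not data from this notebook.

```python
import numpy as np
import pandas as pd

# Two of the five readings are missing and are marked with np.nan.
readings = pd.Series([1.01, np.nan, 1.03, np.nan, 1.04])
print(readings)

print(readings.isna().sum())  # number of missing values
print(readings.hasnans)       # True when at least one value is NaN
print(readings.mean())        # NaN values are skipped by default

# Because NaN is a float, integers mixed with np.nan are stored as float64.
mixed = pd.Series([1, np.nan, 3])
print(mixed.dtype)
```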


**You can use the `dtype` argument to specify the data type of the Series object.**

In [18]:
list1 = [101, 102, 103, 103, 104]

s=pd.Series(data=list1,dtype=np.uint8)
print(f'list = {list1}\nseries = \n{s}\ntype = {type(s)}')

list = [101, 102, 103, 103, 104]
series = 
0    101
1    102
2    103
3    103
4    104
dtype: uint8
type = <class 'pandas.core.series.Series'>


**Optionally, you can assign a name to a Series, which is stored in its `name` attribute. It also becomes the column name if the Series is later used to create a DataFrame.**

In [19]:
list1 = ['Godwin', 'Kelvin', 'Mike', ' ', 'Levin']
indices = ['MS01', 'MS02', '', 'MS02', 'MS03']

s=pd.Series(data=list1, index=indices, name='Friends')
print(f'list = {list1}\nseries = \n{s}\ntype = {type(s)}')

list = ['Godwin', 'Kelvin', 'Mike', ' ', 'Levin']
series = 
MS01    Godwin
MS02    Kelvin
          Mike
MS02          
MS03     Levin
Name: Friends, dtype: object
type = <class 'pandas.core.series.Series'>
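
The effect of the name on a DataFrame can be seen with `to_frame()`. This is a minimal sketch that uses a shortened version of the data above; the variable names `friends` and `df` are illustrative.

```python
import pandas as pd

friends = pd.Series(
    ["Godwin", "Kelvin", "Levin"],
    index=["MS01", "MS02", "MS03"],
    name="Friends",
)

# Converting the Series to a DataFrame reuses its name as the column label.
df = friends.to_frame()
print(df)
print(df.columns.tolist())
```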


### b. Creating a Series from NumPy Array

In [20]:
s = pd.Series(data=np.arange(4))
print(f'series = \n{s}\ntype = {type(s)}')

series = 
0    0
1    1
2    2
3    3
dtype: int32
type = <class 'pandas.core.series.Series'>


In [22]:
arr1=np.array([22.3, 33.6,98, 44])

s=pd.Series(data=arr1,dtype='float64')
print(f'series = \n{s}\ntype = {type(s)}')

series = 
0    22.3
1    33.6
2    98.0
3    44.0
dtype: float64
type = <class 'pandas.core.series.Series'>


### c. Creating a Series from Python Dictionary

In [23]:
my_dict = {
    'name':"Arif", 
    'gender':"Male", 
    'Role':"Teacher", 
    'subject':"Data Science"}

s=pd.Series(data=my_dict)
print(f'series = \n{s}\ntype = {type(s)}')

series = 
name               Arif
gender             Male
Role            Teacher
subject    Data Science
dtype: object
type = <class 'pandas.core.series.Series'>


**When you create a Series from a dictionary, the keys automatically become the index and the values become the data.**
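
If you also pass an `index` together with a dictionary, pandas looks up each requested label among the keys. The sketch below assumes a hypothetical label `'city'` that is not a key of the dictionary, to show what happens to unmatched labels.

```python
import pandas as pd

my_dict = {"name": "Arif", "gender": "Male", "Role": "Teacher"}

# Only the requested labels are kept, in the requested order.
# 'city' is not a key of my_dict, so its value is filled with NaN.
s_sel = pd.Series(data=my_dict, index=["Role", "name", "city"])
print(s_sel)
```

This is an early example of alignment: values are matched to labels, not to positions.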

### d. Creating a Series from Scalar value

In [24]:
s = pd.Series(data=25)
print(f'series = \n{s}\ntype = {type(s)}')

series = 
0    25
dtype: int64
type = <class 'pandas.core.series.Series'>
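
When a scalar is combined with an explicit index, the value is repeated once per label. A minimal sketch, with illustrative labels `'a'`, `'b'`, and `'c'`:

```python
import pandas as pd

# The scalar 25 is repeated for every label in the index.
s_const = pd.Series(data=25, index=["a", "b", "c"])
print(s_const)
```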


### e. Creating an Empty Series

In [25]:
s = pd.Series()
print(f'series = \n{s}\ntype = {type(s)}')

series = 
Series([], dtype: object)
type = <class 'pandas.core.series.Series'>


## 3. Attributes of a Pandas Series
- A Series exposes properties called attributes, which we access by writing the attribute name after the Series name using dot `.` notation

In [29]:
my_dict = {0: "Godwin", 1:"Simon", 2:"Blessing", 3:"Gloria", 4:"Baby"}
s = pd.Series(my_dict, name='family')
print(f'series = \n{s}\ntype = {type(s)}')

series = 
0      Godwin
1       Simon
2    Blessing
3      Gloria
4        Baby
Name: family, dtype: object
type = <class 'pandas.core.series.Series'>


In [32]:
print(f's.name = {s.name}\ns.index = {s.index}\ns.values = {s.values}\ns.hasnans = {s.hasnans}')

s.name = family
s.index = Index([0, 1, 2, 3, 4], dtype='int64')
s.values = ['Godwin' 'Simon' 'Blessing' 'Gloria' 'Baby']
s.hasnans = False
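
A few more attributes describe the size and shape of a Series. The sketch below rebuilds the same `family` Series so that it runs on its own; the printed `dtype` may depend on your pandas version.

```python
import pandas as pd

my_dict = {0: "Godwin", 1: "Simon", 2: "Blessing", 3: "Gloria", 4: "Baby"}
s = pd.Series(my_dict, name="family")

print(s.shape)      # a one-element tuple holding the number of values
print(s.size)       # number of values
print(s.ndim)       # a Series always has one dimension
print(s.dtype)      # data type of the values
print(s.is_unique)  # True when no value is repeated
```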


In [33]:
s.iloc[[2,4]]

2    Blessing
4        Baby
Name: family, dtype: object

<img align="right" width="300" height="300"  src="..\..\..\data_science\exercise\Section-3-Python-for-Data-Scientists/images/series-anatomy.png"  >

### b. First use of Index (Identification)
- Every data value in a Series has an associated index label (integer or string), so we can use this label to identify or access data value(s)
- There are three ways to access elements of a Series:
    - Using the `s[]` operator and specifying the index label (integer/string)
    - Using the `s.loc[]` indexer and specifying the index label (integer/string)
    - Using the `s.iloc[]` indexer and specifying the position (an integer from 0 to length-1). It also supports negative indexing: the last element can be accessed with position -1
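
The three access styles listed above can be compared side by side. This sketch uses a small made-up Series with string labels, so that labels and positions cannot be confused.

```python
import pandas as pd

# Illustrative Series: string labels 'a', 'b', 'c' at positions 0, 1, 2.
s = pd.Series(["Godwin", "Simon", "Blessing"], index=["a", "b", "c"], name="family")

by_brackets = s["b"]      # label lookup with the [] operator
by_loc = s.loc["b"]       # label lookup with .loc
by_iloc = s.iloc[1]       # position lookup with .iloc
last_item = s.iloc[-1]    # negative positions count from the end

print(by_brackets, by_loc, by_iloc, last_item)
```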

**Identification using Integer Indices or by Position**