# Pandas Series
Pandas is an open-source Python library designed for working with relational or labeled data easily and intuitively. It provides data structures and operations for manipulating numerical data and time series. The library is built on top of NumPy and offers high performance and productivity for its users.

#### History of Pandas Library
Pandas was initially developed by Wes McKinney in 2008 while he was working at AQR Capital Management. He convinced AQR to allow him to open-source the library. Another AQR employee, Chang She, joined as the second major contributor to the library in 2012. Many versions of pandas have been released since then.

#### Why Use Pandas?
- Fast and efficient for manipulating and analyzing data.
- Data from different file objects can be easily loaded.
- Flexible reshaping and pivoting of data sets.
- Provides time-series functionality.

#### What can you do using Pandas?
Pandas is widely used in data science, largely because it works well in conjunction with other data science libraries. It is built on top of NumPy, so many NumPy structures are used or replicated in Pandas. Data produced by Pandas is often used as input for plotting functions in Matplotlib, statistical analysis in SciPy, and machine learning algorithms in Scikit-learn. Here is a list of things you can do with Pandas:
- Data set cleaning, merging, and joining.
- Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data.
- Columns can be inserted and deleted from DataFrame and higher dimensional objects.
- Powerful group by functionality for performing split-apply-combine operations on data sets.
- Data visualization.

## Getting Started

#### Install and Import Pandas

In [1]:
# Install Pandas
%pip install pandas

# Import pandas
import pandas as pd

Note: you may need to restart the kernel to use updated packages.





### Series
A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.
A Pandas Series is comparable to a single column in an Excel sheet. Labels need not be unique but must be of a hashable type. The object supports both integer-based and label-based indexing and provides a host of methods for performing operations involving the index.

![Alt text](image-1.png)
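As a small illustration of the point that labels need not be unique, here is a minimal sketch. The values and labels are made up for the example; `.loc` is the explicit label-based accessor.

```python
import pandas as pd

# Index labels may repeat; here "a" appears twice (illustrative data).
s = pd.Series([10, 20, 30], index=["a", "b", "a"])

# A repeated label returns every matching row as a Series.
dup = s.loc["a"]

# A unique label returns a single scalar value.
single = s.loc["b"]

print(s.index.is_unique)
print(dup)
print(single)
```

Because a lookup on a repeated label returns a Series rather than a scalar, it is usually worth checking `s.index.is_unique` before relying on single-value lookups.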

#### Creating a Series
In practice, a Pandas Series is usually created by loading data from existing storage, such as a SQL database, a CSV file, or an Excel file. A Series can also be created from lists, dictionaries, scalar values, and so on.

In [9]:
# import pandas
import pandas as pd
 
# a simple list of integers
data = [1, 2, 3, 4]
 
# build a Series; a default integer index (0..3) is assigned
ser = pd.Series(data)
print(ser)

0    1
1    2
2    3
3    4
dtype: int64


In [6]:
import pandas as pd 
import numpy as np
  
# Creating an empty Series (explicit dtype avoids a warning in older pandas)
ser = pd.Series(dtype=object) 
print("Pandas Series: ", ser) 
  
# simple array 
data = np.array(['g', 'e', 'e', 'k', 's']) 
    
ser = pd.Series(data) 
print("Pandas Series:")
print(ser)

Pandas Series:  Series([], dtype: object)
Pandas Series:
0    g
1    e
2    e
3    k
4    s
dtype: object


In [10]:
import pandas as pd
 
# a simple list (named letters so the built-in list is not shadowed)
letters = ['g', 'e', 'e', 'k', 's']
  
# create a Series from a list
ser = pd.Series(letters)
print(ser)

0    g
1    e
2    e
3    k
4    s
dtype: object
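The cells above build a Series from lists and arrays. Creation from a dictionary or a scalar value could look like the following minimal sketch; the variable names and values are chosen only for illustration.

```python
import pandas as pd

# A scalar is repeated for every label in the supplied index.
scalar_ser = pd.Series(7, index=["a", "b", "c"])

# Dictionary keys become the index labels, values become the data.
dict_ser = pd.Series({"x": 1.5, "y": 2.5})

print(scalar_ser)
print(dict_ser)
```

When a scalar is passed, an index should be supplied so pandas knows how many elements to create.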


### Accessing Elements of a Series
There are two ways to access the elements of a Series:
- Accessing Element from Series with Position
- Accessing Element Using Label (index)

#### Accessing Element from Series with Position 
To access a Series element by position, pass its integer position to the index operator [ ]. With the default index, positions and labels coincide. To access multiple elements, use a slice operation.

In [11]:
# import pandas and numpy 
import pandas as pd
import numpy as np
 
# creating simple array
data = np.array(['g','e','e','k','s','f', 'o','r','g','e','e','k','s'])
ser = pd.Series(data)
  
  
# retrieve the first five elements
print(ser[:5])

0    g
1    e
2    e
3    k
4    s
dtype: object
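Positional access can also be written explicitly with `.iloc`, which always interprets its argument as a position regardless of the index labels. A minimal sketch, using a shortened version of the data above:

```python
import pandas as pd

ser = pd.Series(["g", "e", "e", "k", "s"])

first = ser.iloc[0]      # element at position 0
last = ser.iloc[-1]      # negative positions count from the end
middle = ser.iloc[1:4]   # positions 1, 2, 3 (the stop position is excluded)

print(first, last)
print(middle)
```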


#### Accessing Element Using Label (index)
To access an element by label, pass its index label to the index operator. A Series is like a fixed-size dictionary in that you can get and set values by index label.
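The dictionary analogy can be sketched as follows. The `prices` Series and its labels are hypothetical and serve only to show membership tests, assignment by label, and `get()` with a default.

```python
import pandas as pd

prices = pd.Series({"apple": 1.2, "pear": 0.8})

# Membership tests look at the index labels, like dictionary keys.
has_pear = "pear" in prices

# Set the value stored under an existing label.
prices["pear"] = 0.9

# get() returns the given default when the label is absent.
missing = prices.get("mango", 0.0)

print(has_pear, prices["pear"], missing)
```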

Accessing a single element using index label

In [18]:
# import pandas and numpy 
import pandas as pd
import numpy as np
 
# creating simple array
data = np.array(['g','e','e','k','s','f', 'o','r','g','e','e','k','s'])
ser = pd.Series(data,index=[10,11,12,13,14,15,16,17,18,19,20,21,22])
  
  
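# accessing an element using its index label
print("Single Element:")
print(ser[16])
print("\n")
# an integer slice with [ ] is positional, so 3:6 selects positions 3, 4, 5
print("A slice of Series:")
print(ser[3:6])

# the same positional slice, written explicitly with iloc
print("Another way of slice of Series:")
print(ser.iloc[3:6])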

Single Element:
o


A slice of Series:
13    k
14    s
15    f
dtype: object
Another way of slice of Series:
13    k
14    s
15    f
dtype: object
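When the index itself holds integers, it is easy to confuse labels with positions. The following sketch, using a shortened version of the data above, contrasts `.loc` (labels, stop label included) with `.iloc` (positions, stop position excluded).

```python
import pandas as pd

ser = pd.Series(["g", "e", "e", "k", "s", "f"],
                index=[10, 11, 12, 13, 14, 15])

# Label-based slice: labels 12, 13 and 14 are all included.
by_label = ser.loc[12:14]

# Position-based slice: positions 2 and 3 only.
by_position = ser.iloc[2:4]

print(by_label)
print(by_position)
```

Using `.loc` and `.iloc` explicitly keeps the intent clear whenever the labels are integers.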


#### Assign a new index to series

In [8]:
# Create a Series from a list, first with the default index, then with custom labels
from pandas import Series, DataFrame
import pandas as pd
series = Series([4, 7, -5, 3])
print("series = ", series)
print("series.index = ", list(series.index))
print("series.values = ", series.values)

series = Series([4, 7, -5, 3], index= ['a', 'b', 'c', 'd'])
print("series with new indexes = ", series)
print("series[[a , c]] = ", series[['a' , 'c']])

series =  0    4
1    7
2   -5
3    3
dtype: int64
series.index =  [0, 1, 2, 3]
series.values =  [ 4  7 -5  3]
series with new indexes =  a    4
b    7
c   -5
d    3
dtype: int64
series[[a , c]] =  a    4
c   -5
dtype: int64
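An index can also be replaced on an existing Series by assigning to its `index` attribute. This is a minimal sketch; the new index must have the same length as the Series.

```python
import pandas as pd

series = pd.Series([4, 7, -5, 3])

# Replace the default 0..3 index with string labels in place.
series.index = ["a", "b", "c", "d"]

print(series)
print(series["c"])
```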


#### Convert dictionary to series

In [14]:
from pandas import Series, DataFrame
import pandas as pd
data = {'Ohio':35000, 'Texas': 71000, 'Oregon': 16000, 'Utah' : 5000}
series = Series(data)
print("series = ", series)
states = ['California', 'Ohio', 'Oregon', 'Texas']
slices = Series(data, index = states)
print("slices = ", slices)
print("pd.isnull(slices):", pd.isnull(slices))
print("pd.notnull(slices):", pd.notnull(slices))
print("series + slices = ", series + slices)


series =  Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
slices =  California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
pd.isnull(slices): California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
pd.notnull(slices): California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool
series + slices =  California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64
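Labels that have no matching data show up as NaN, as the output above shows. One possible way to count, fill, or drop those entries is sketched below with a reduced version of the same data.

```python
import pandas as pd

data = {"Ohio": 35000, "Texas": 71000}
s = pd.Series(data, index=["California", "Ohio", "Texas"])

# Count the missing entries ("California" has no data).
missing_count = int(s.isna().sum())

# Replace missing values with 0, or drop them entirely.
filled = s.fillna(0)
dropped = s.dropna()

print(missing_count)
print(filled)
print(dropped)
```

Whether to fill or drop depends on the analysis; filling with 0 is only an example default here.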


In [2]:
from pandas import Series, DataFrame
import pandas as pd
data = {'Ohio':35000, 'Texas': 71000, 'Oregon': 16000, 'Utah' : 5000}
series = Series(data)
series.name = 'population'
series.index.name = 'state'
print("series = ", series)

series =  state
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
Name: population, dtype: int64
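The Series name and index name become useful when the Series is turned into a table. As a sketch, `reset_index()` converts a named Series into a DataFrame whose column headers come from those two names.

```python
import pandas as pd

population = pd.Series({"Ohio": 35000, "Texas": 71000}, name="population")
population.index.name = "state"

# The index becomes a "state" column and the values a "population" column.
frame = population.reset_index()

print(frame)
```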


#### Operations on Series
Binary operation and aggregation methods on Series:

| FUNCTION | DESCRIPTION |
| -- | -- |
| add() | Adds a Series or list-like object of the same length to the caller Series |
| sub() | Subtracts a Series or list-like object of the same length from the caller Series |
| mul() | Multiplies the caller Series by a Series or list-like object of the same length |
| div() | Divides the caller Series by a Series or list-like object of the same length |
| sum() | Returns the sum of the values for the requested axis |
| prod() | Returns the product of the values for the requested axis |
| mean() | Returns the mean of the values for the requested axis |
| pow() | Raises each element of the caller Series to the power given by the corresponding element of the passed Series |
| abs() | Returns the absolute numeric value of each element in a Series/DataFrame |
| cov() | Computes the covariance of two Series |
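A few of these methods in action, as a minimal sketch with made-up values. Both Series share the same index, so every element has a partner.

```python
import pandas as pd

a = pd.Series([2.0, 4.0, 6.0], index=["x", "y", "z"])
b = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])

product = a.mul(b)    # element-wise a * b
quotient = a.div(b)   # element-wise a / b (caller divided by argument)
squared = a.pow(2)    # each element of a raised to the power 2
total = a.sum()       # aggregation: a single number
average = a.mean()

print(product.tolist(), quotient.tolist(), squared.tolist())
print(total, average)
```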
 
#### Pandas Series methods
| FUNCTION | DESCRIPTION |
| -- | -- |
| Series() | Constructor that creates a pandas Series. It accepts a variety of inputs |
| combine_first() | Combines two Series into one, filling null values in the caller with values from the passed Series |
| count() | Returns the number of non-NA/null observations in the Series |
| size | Attribute (not a method) that returns the number of elements in the underlying data |
| name | Attribute (not a method) that gets or sets the name of a Series object, i.e. of the column |
| is_unique | Attribute (not a method) that is True if the values in the object are unique |
| idxmax() | Returns the index label of the highest value in a Series |
| idxmin() | Returns the index label of the lowest value in a Series |
| sort_values() | Sorts the values of a Series in ascending or descending order |
| sort_index() | Sorts a Series by its index instead of its values |
| head() | Returns a specified number of rows from the beginning of a Series, as a new Series |
| tail() | Returns a specified number of rows from the end of a Series, as a new Series |
| le() | Compares every element of the caller Series with the passed Series. Returns True for every element that is less than or equal to the element in the passed Series |
| ne() | Compares every element of the caller Series with the passed Series. Returns True for every element that is not equal to the element in the passed Series |
| ge() | Compares every element of the caller Series with the passed Series. Returns True for every element that is greater than or equal to the element in the passed Series |
| eq() | Compares every element of the caller Series with the passed Series. Returns True for every element that is equal to the element in the passed Series |
| gt() | Compares every element of the caller Series with the passed Series. Returns True for every element that is greater than the element in the passed Series |
| lt() | Compares every element of the caller Series with the passed Series. Returns True for every element that is less than the element in the passed Series |
| clip() | Clips values below and above the passed lower and upper bounds |
| clip_lower() | Clipped values below a passed lower bound in older pandas. Removed in pandas 1.0; use clip(lower=...) instead |
| clip_upper() | Clipped values above a passed upper bound in older pandas. Removed in pandas 1.0; use clip(upper=...) instead |
| astype() | Changes the data type of a Series |
| tolist() | Converts a Series to a list |
| get() | Extracts a value from a Series by key, returning a default if the key is missing. This is an alternative to the traditional bracket syntax |
| unique() | Returns the unique values in a Series |
| nunique() | Returns the count of unique values |
| value_counts() | Counts the number of times each unique value occurs in a Series |
| factorize() | Encodes the values as integer codes by identifying distinct values |
| map() | Maps each value to another value using a dictionary, Series, or function |
| between() | Checks which values lie between the first and second argument |
| apply() | Calls a Python function, passed as an argument, on every Series value. This is helpful for custom operations that are not included in pandas or NumPy |
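The following sketch exercises a handful of the entries listed in this table on a small made-up Series. It is meant as a quick tour, not a complete reference.

```python
import pandas as pd

s = pd.Series([3, 1, 4, 1, 5], index=["a", "b", "c", "d", "e"])

n_elements = s.size                      # number of elements
top_label = s.idxmax()                   # label of the largest value
n_unique = s.nunique()                   # number of distinct values
ones = s.value_counts().loc[1]           # how often the value 1 occurs
ordered = s.sort_values()                # ascending by value
clipped = s.clip(lower=2, upper=4)       # limit values to the range 2..4
in_range = s.between(2, 4)               # True where 2 <= value <= 4
scaled = s.apply(lambda v: v * 10)       # custom function on each value

print(n_elements, top_label, n_unique, ones)
print(ordered.tolist(), clipped.tolist())
print(in_range.tolist(), scaled.tolist())
```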


In [25]:
# importing pandas module  
import pandas as pd  
 
# creating a series
data1 = pd.Series([5, 2, 3,7], index=['a', 'b', 'c', 'd'])
 
# creating a series
data2 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])
 
print(data1)
print(data2)

# data1 + data2
data = data1.add(data2, fill_value=0)
print(data)

# data1 - data2
data = data1.sub(data2, fill_value=0)
print(data)


a    5
b    2
c    3
d    7
dtype: int64
a    1
b    6
d    4
e    9
dtype: int64
a     6.0
b     8.0
c     3.0
d    11.0
e     9.0
dtype: float64
a    4.0
b   -4.0
c    3.0
d    3.0
e   -9.0
dtype: float64
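To see why `fill_value` matters, the sketch below compares the plain `+` operator with `add(fill_value=0)` on the same two Series. With `+`, any label present in only one Series yields NaN.

```python
import pandas as pd

data1 = pd.Series([5, 2, 3, 7], index=["a", "b", "c", "d"])
data2 = pd.Series([1, 6, 4, 9], index=["a", "b", "d", "e"])

# Plain operator: labels "c" and "e" exist in only one Series, so they become NaN.
plain = data1 + data2

# add() with fill_value treats the missing side as 0 instead.
filled = data1.add(data2, fill_value=0)

print(plain)
print(filled)
```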


#### Reindexing

In [None]:
# reindex series
from pandas import Series, DataFrame
import pandas as pd
series = Series([4.5, 7.2, -5.3, 3.6], index=['d','b', 'a', 'c'])
print("Before reindex")
print("series.index = ", list(series.index))
print("series.values = ", series.values)
series = series.reindex(['a', 'b', 'c', 'd', 'e'])
print("After reindex")
print("series.index = ", list(series.index))
print("series.values = ", series.values)
print("series.drop('e')")
series = series.drop('e')
print("series.index = ", list(series.index))
print("series.values = ", series.values)

Before reindex
series.index =  ['d', 'b', 'a', 'c']
series.values =  [ 4.5  7.2 -5.3  3.6]
After reindex
series.index =  ['a', 'b', 'c', 'd', 'e']
series.values =  [-5.3  7.2  3.6  4.5  nan]
series.drop('e')
series.index =  ['a', 'b', 'c', 'd']
series.values =  [-5.3  7.2  3.6  4.5]
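By default, `reindex` inserts NaN for labels that did not exist before. Two common alternatives are sketched below: supplying a `fill_value`, and forward-filling with `method="ffill"` on an ordered index. The `colors` Series is a made-up example.

```python
import pandas as pd

series = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

# New label "e" receives 0.0 instead of NaN.
with_default = series.reindex(["a", "b", "c", "d", "e"], fill_value=0.0)

# Forward fill: each new label takes the last known value before it.
colors = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])
forward_filled = colors.reindex(range(6), method="ffill")

print(with_default)
print(forward_filled)
```

Forward filling requires the original index to be sorted, which is why the second example uses an ordered integer index.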
