### Pandas

Pandas is an open-source Python library built on top of NumPy. It offers data structures and operations for manipulating numerical tables and time series, and it is popular mainly because it makes importing and analyzing data much easier. Pandas is fast and well suited to productive, high-performance data work.

Table of Contents

        Introduction
        Creating Objects
        Viewing Data
        Selection
        Manipulating Data
        Grouping Data
        Merging, Joining and Concatenating
        Working with Date and Time
        Working With Text Data
        Working with CSV and Excel files
        Operations
        Visualization
        Applications and Projects
        Miscellaneous 



#### 1. Introduction

Advantages

    Fast and efficient for manipulating and analyzing data.
    Data from different file objects can be loaded.
    Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
    Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
    Data set merging and joining.
    Flexible reshaping and pivoting of data sets
    Provides time-series functionality.
    Powerful group by functionality for performing split-apply-combine operations on data sets. 

Pandas provides two main data structures for manipulating data:

    Series
    DataFrame 

#### Series

A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index. A Series is comparable to a single column in an Excel sheet. Labels need not be unique but must be of a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index.

#### Creating a Series

In practice, a Pandas Series is usually created by loading data from existing storage such as a SQL database, a CSV file, or an Excel file. A Series can also be created from a list, a dictionary, a scalar value, and so on.

In [2]:

import pandas as pd  
import numpy as np 
  
# Creating an empty series; the explicit dtype keeps the output the same across pandas versions  
ser = pd.Series(dtype="float64")  
    
print(ser)  
  
# simple array  
data = np.array(['s', 'a', 'd', 'i', 'q'])  
    
ser = pd.Series(data)  
print(ser) 


Series([], dtype: float64)
0    s
1    a
2    d
3    i
4    q
dtype: object
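
As a small illustration of the labeling rules described above (labels must be hashable but need not be unique), the following sketch builds a Series with a repeated label. The values and labels are made up for illustration.

```python
import pandas as pd

# Hypothetical values and labels; note that the label 'a' appears twice
ser = pd.Series([10, 20, 30], index=['a', 'b', 'a'])

# Selecting a repeated label returns every matching element as a Series
repeated = ser.loc['a']
print(repeated)

# Selecting a unique label returns a single scalar value
single = ser.loc['b']
print(single)
```

Because a repeated label yields a Series rather than a scalar, code that expects a single value may want to check `ser.index.is_unique` first.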


#### DataFrame

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns); its data is aligned in a tabular fashion in rows and columns. A DataFrame consists of three principal components: the data, the rows, and the columns.

#### Creating a DataFrame

In practice, a Pandas DataFrame is usually created by loading data from existing storage such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from a list, a dictionary, a list of dictionaries, and so on.

In [3]:

import pandas as pd  
    
# Calling DataFrame constructor  
df = pd.DataFrame()  
print(df) 
  
# list of strings  
lst = ['Geeks', 'For', 'Geeks', 'is',   
            'portal', 'for', 'Geeks']  
    
# Calling DataFrame constructor on list  
df = pd.DataFrame(lst)  
print(df)  


Empty DataFrame
Columns: []
Index: []
        0
0   Geeks
1     For
2   Geeks
3      is
4  portal
5     for
6   Geeks
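
The text above also mentions building a DataFrame from a list of dictionaries. A minimal sketch of that case follows; the records are hypothetical and only meant to show how keys map to columns.

```python
import pandas as pd

# Hypothetical records: each dictionary becomes one row,
# and the dictionary keys become the column labels
records = [
    {'name': 'Tom', 'age': 20},
    {'name': 'Nick', 'age': 21},
    {'name': 'Krish'},  # no 'age' key, so this cell becomes NaN
]

df = pd.DataFrame(records)
print(df)
```

Keys that are missing from a record are filled with NaN rather than raising an error.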


#### 2. Creating Objects

##### Creating a dataframe using List:
DataFrame can be created using a single list or a list of lists.

In [4]:
# import pandas as pd
import pandas as pd
 
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 
            'portal', 'for', 'Geeks']
 
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)

        0
0   Geeks
1     For
2   Geeks
3      is
4  portal
5     for
6   Geeks
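
The single-list case is shown above; the list-of-lists case can be sketched as follows. The rows and the column names passed through `columns` are assumptions chosen for illustration.

```python
import pandas as pd

# Each inner list is one row (hypothetical data)
rows = [['Tom', 20], ['Nick', 21], ['Krish', 19]]

# Without `columns`, the columns would be labeled 0 and 1
df = pd.DataFrame(rows, columns=['Name', 'Age'])
print(df)
```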


##### Creating DataFrame from dict of ndarray/lists: 

To create a DataFrame from a dict of ndarrays/lists, all the ndarrays must be of the same length. If an index is passed, its length must equal the length of the arrays. If no index is passed, the index defaults to range(n), where n is the array length.

In [5]:
# Python code demonstrating how to create a
# DataFrame from a dict of ndarrays / lists
# with the default index.
 
import pandas as pd
 
# initialise data as a dict of lists.
data = {'Name':['Tom', 'nick', 'krish', 'jack'],
        'Age':[20, 21, 19, 18]}
 
# Create DataFrame
df = pd.DataFrame(data)
 
# Print the output.
print(df)

    Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18
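
The index rule described above can be sketched as follows, reusing the same data with an explicit index. The labels `rank1` to `rank4` are hypothetical.

```python
import pandas as pd

data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18]}

# Hypothetical row labels; there must be one label per row
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)

# An index whose length does not match the data raises a ValueError
try:
    pd.DataFrame(data, index=['only', 'three', 'labels'])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)
```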


A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. We can perform basic operations on rows/columns like selecting, deleting, adding, and renaming.

#### Column Selection: 
To select a column in a Pandas DataFrame, access it by its column name. Passing a list of names selects several columns at once.

In [8]:
# Import pandas package
import pandas as pd
 
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
 
# Convert the dictionary into DataFrame 
df = pd.DataFrame(data)
print(df)
print()
# select two columns
print(df[['Name', 'Qualification']])

     Name  Age    Address Qualification
0     Jai   27      Delhi           Msc
1  Princi   24     Kanpur            MA
2  Gaurav   22  Allahabad           MCA
3    Anuj   32    Kannauj           Phd

     Name Qualification
0     Jai           Msc
1  Princi            MA
2  Gaurav           MCA
3    Anuj           Phd
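
One detail worth seeing in code is the difference between selecting with a single name and selecting with a list of names. The sketch below uses a shortened, hypothetical version of the employee data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi'], 'Age': [27, 24]})

# A single column name returns a Series
one_column = df['Name']

# A list of names returns a DataFrame, even if the list has one entry
sub_frame = df[['Name']]

print(type(one_column).__name__)
print(type(sub_frame).__name__)
```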


#### Row Selection: 
Pandas provides dedicated indexers for retrieving rows from a DataFrame. DataFrame.loc[] retrieves rows by index label, and DataFrame.iloc[] retrieves rows by integer position.

In [12]:
# importing pandas package
import pandas as pd
 
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")

print(data.columns)
print()
print(data.head())
print()
# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]

print(first, "\n\n\n", second)

Index(['Team', 'Number', 'Position', 'Age', 'Height', 'Weight', 'College',
       'Salary'],
      dtype='object')

                         Team  Number Position   Age Height  Weight  \
Name                                                                  
Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0   
Jae Crowder    Boston Celtics    99.0       SF  25.0    6-6   235.0   
John Holland   Boston Celtics    30.0       SG  27.0    6-5   205.0   
R.J. Hunter    Boston Celtics    28.0       SG  22.0    6-5   185.0   
Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   

                         College     Salary  
Name                                         
Avery Bradley              Texas  7730337.0  
Jae Crowder            Marquette  6796117.0  
John Holland   Boston University        NaN  
R.J. Hunter        Georgia State  1148640.0  
Jonas Jerebko                NaN  5000000.0  

Team        Boston Celtics
Number                   0
Position 

#### Indexing and Selecting Data

Indexing in pandas means simply selecting particular rows and columns of data from a DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each of the rows and columns. Indexing can also be known as Subset Selection.

Indexing a DataFrame using the indexing operator [] :
The indexing operator refers to the square brackets following an object. The .loc and .iloc indexers also use square brackets to make selections; in this section, the indexing operator means df[].

Selecting a single column

To select a single column, simply put the name of the column between the brackets.

In [13]:
# importing pandas package
import pandas as pd
 
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
 
# retrieving columns by indexing operator
first = data["Age"]
  
print(first)

Name
Avery Bradley              25.0
Jae Crowder                25.0
John Holland               27.0
R.J. Hunter                22.0
Jonas Jerebko              29.0
Amir Johnson               29.0
Jordan Mickey              21.0
Kelly Olynyk               25.0
Terry Rozier               22.0
Marcus Smart               22.0
Jared Sullinger            24.0
Isaiah Thomas              27.0
Evan Turner                27.0
James Young                20.0
Tyler Zeller               26.0
Bojan Bogdanovic           27.0
Markel Brown               24.0
Wayne Ellington            28.0
Rondae Hollis-Jefferson    21.0
Jarrett Jack               32.0
Sergey Karasev             22.0
Sean Kilpatrick            26.0
Shane Larkin               23.0
Brook Lopez                28.0
Chris McCullough           21.0
Willie Reed                26.0
Thomas Robinson            25.0
Henry Sims                 26.0
Donald Sloan               28.0
Thaddeus Young             27.0
                           ... 
Al-

#### Indexing a DataFrame using .loc[ ] :
The .loc[] indexer selects data by the labels of the rows and columns. It selects data in a different way than the plain indexing operator: it can select subsets of rows or columns, and it can also select subsets of rows and columns simultaneously.

In [14]:
# importing pandas package
import pandas as pd
 
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
 
# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
 
print(first, "\n\n\n", second)

Team        Boston Celtics
Number                   0
Position                PG
Age                     25
Height                 6-2
Weight                 180
College              Texas
Salary         7.73034e+06
Name: Avery Bradley, dtype: object 


 Team        Boston Celtics
Number                  28
Position                SG
Age                     22
Height                 6-5
Weight                 185
College      Georgia State
Salary         1.14864e+06
Name: R.J. Hunter, dtype: object
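
Selecting rows and columns at the same time with .loc[] can be sketched as follows. The small DataFrame is a made-up stand-in for nba.csv; its labels and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical player data indexed by a short label
df = pd.DataFrame(
    {'Team': ['A', 'A', 'B'],
     'Age': [25, 27, 22],
     'Salary': [100.0, 200.0, 150.0]},
    index=['p1', 'p2', 'p3'],
)

# Rows by a list of labels, columns by a list of labels
subset = df.loc[['p1', 'p3'], ['Team', 'Salary']]

# A single cell: one row label and one column label
cell = df.loc['p2', 'Age']

# A Boolean mask selects rows; the second argument selects the column
older = df.loc[df['Age'] > 24, 'Team']

print(subset)
print(cell)
print(older)
```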


#### Indexing a DataFrame using .iloc[ ] :

The .iloc[] indexer retrieves rows and columns by position. To use it, specify the positions of the rows and, optionally, the positions of the columns that you want. The df.iloc indexer is very similar to df.loc but uses only integer locations to make its selections.

In [15]:
import pandas as pd
 
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
 
#retrieving rows by iloc method 
row2 = data.iloc[3] 
 
print(row2)

Team        Boston Celtics
Number                  28
Position                SG
Age                     22
Height                 6-5
Weight                 185
College      Georgia State
Salary         1.14864e+06
Name: R.J. Hunter, dtype: object
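
The same idea with row and column positions can be sketched as follows, again on a small made-up DataFrame rather than nba.csv.

```python
import pandas as pd

# Hypothetical data; the labels are irrelevant to iloc, only positions matter
df = pd.DataFrame(
    {'Team': ['A', 'A', 'B'],
     'Age': [25, 27, 22],
     'Salary': [100.0, 200.0, 150.0]},
    index=['p1', 'p2', 'p3'],
)

# First two rows (positions 0 and 1), first and third columns
block = df.iloc[0:2, [0, 2]]

# Negative positions count from the end, as with Python lists
last_row = df.iloc[-1]

print(block)
print(last_row)
```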


#### Working with Missing Data

Missing data occurs when no information is provided for one or more items or for a whole unit. It is a very common problem in real-life scenarios. In pandas, missing values are also referred to as NA (Not Available) values.



#### Checking for missing values using isnull() and notnull() :
To check for missing values in a Pandas DataFrame, we use the isnull() and notnull() functions. Both help in checking whether a value is NaN or not. These functions can also be used on a Pandas Series to find null values.

In [17]:
# importing pandas as pd
import pandas as pd
 
# importing numpy as np
import numpy as np
 
# dictionary of lists
data = {'First Score':[100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score':[np.nan, 40, 80, 98]}
 
# creating a dataframe from the dictionary
# (the name "data" avoids shadowing the built-in dict)
df = pd.DataFrame(data)
print(df)
# using isnull() function  
df.isnull()

   First Score  Second Score  Third Score
0        100.0          30.0          NaN
1         90.0          45.0         40.0
2          NaN          56.0         80.0
3         95.0           NaN         98.0


   First Score  Second Score  Third Score
0        False         False         True
1        False         False        False
2         True         False        False
3        False          True        False
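
The notnull() counterpart, and two common ways of using these masks, can be sketched as follows on the same scores data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# True where a value is present, False where it is NaN
present = df.notnull()

# Summing the Boolean mask counts the missing values per column
missing_per_column = df.isnull().sum()

# A Boolean Series can filter rows: keep rows that have a first score
complete_first = df[df['First Score'].notnull()]

print(present)
print(missing_per_column)
print(complete_first)
```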


#### Filling missing values using fillna(), replace() and interpolate() :

To fill null values in a dataset, we use the fillna(), replace(), and interpolate() functions, which replace NaN values with values of our choosing. The interpolate() function also fills NA values in the DataFrame, but it uses an interpolation technique to estimate the missing values rather than hard-coding a value.

In [18]:
# importing pandas as pd
import pandas as pd
 
# importing numpy as np
import numpy as np
 
# dictionary of lists
dict = {'First Score':[100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score':[np.nan, 40, 80, 98]}
 
# creating a dataframe from dictionary
df = pd.DataFrame(dict)
 
# filling missing value using fillna()  
df.fillna(0)


   First Score  Second Score  Third Score
0        100.0          30.0          0.0
1         90.0          45.0         40.0
2          0.0          56.0         80.0
3         95.0           0.0         98.0
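
The cell above shows only fillna(). A minimal sketch of replace() and interpolate() on the same data follows; the replacement value -1 and the choice of linear interpolation are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# replace() swaps one value for another; here NaN becomes -1
replaced = df.replace(np.nan, -1)

# interpolate() estimates each gap from its neighbours in the same column.
# With the default forward direction, a leading NaN has no earlier value
# to interpolate from, so it stays NaN.
interpolated = df.interpolate(method='linear')

print(replaced)
print(interpolated)
```

In the first column, the gap between 90 and 95 is filled with their midpoint, 92.5.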


#### Dropping missing values using dropna() :

To drop null values from a DataFrame, we use the dropna() function, which drops rows or columns containing null values in different ways.

In [21]:
# importing pandas as pd
import pandas as pd
 
# importing numpy as np
import numpy as np
 
# dictionary of lists
dict = {'First Score':[100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, 40, 80, 98],
        'Fourth Score':[np.nan, np.nan, np.nan, 65]}
 
# creating a dataframe from dictionary
df = pd.DataFrame(dict)
   
df

   First Score  Second Score  Third Score  Fourth Score
0        100.0          30.0           52           NaN
1         90.0           NaN           40           NaN
2          NaN          45.0           80           NaN
3         95.0          56.0           98          65.0


Now we drop rows with at least one NaN (null) value.

In [22]:
# importing pandas as pd
import pandas as pd
 
# importing numpy as np
import numpy as np
 
# dictionary of lists
dict = {'First Score':[100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score':[52, 40, 80, 98],
        'Fourth Score':[np.nan, np.nan, np.nan, 65]}
 
# creating a dataframe from dictionary
df = pd.DataFrame(dict)
 
# using dropna() function  
df.dropna()

   First Score  Second Score  Third Score  Fourth Score
3         95.0          56.0           98          65.0
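
The "different ways" of dropping mentioned above are controlled by the `axis`, `how`, and `thresh` parameters of dropna(). A short sketch on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, np.nan, 45, 56],
                   'Third Score': [52, 40, 80, 98],
                   'Fourth Score': [np.nan, np.nan, np.nan, 65]})

# axis=1 drops columns (instead of rows) that contain any NaN
cols_dropped = df.dropna(axis=1)

# how='all' drops a row only if every value in it is NaN
rows_all = df.dropna(how='all')

# thresh=3 keeps only rows with at least 3 non-null values
rows_thresh = df.dropna(thresh=3)

print(cols_dropped)
print(rows_all)
print(rows_thresh)
```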


#### Iterating over rows and columns

#### Iterating over rows :
To iterate over a DataFrame, we can use three functions: iterrows() and itertuples() iterate over rows, while items() (called iteritems() before pandas 2.0) iterates over columns.

In [23]:
# importing pandas as pd
import pandas as pd
  
# dictionary of lists
dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
 
# creating a dataframe from a dictionary 
df = pd.DataFrame(dict)
 
print(df)

     name  degree  score
0  aparna     MBA     90
1  pankaj     BCA     40
2  sudhir  M.Tech     80
3   Geeku     MBA     98


Now we apply the iterrows() function to get each row as an (index, Series) pair.

In [24]:
# importing pandas as pd
import pandas as pd
  
# dictionary of lists
dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
 
# creating a dataframe from a dictionary 
df = pd.DataFrame(dict)
 
# iterating over rows using iterrows() function 
for i, j in df.iterrows():
    print(i, j)
    print()

0 name      aparna
degree       MBA
score         90
Name: 0, dtype: object

1 name      pankaj
degree       BCA
score         40
Name: 1, dtype: object

2 name      sudhir
degree    M.Tech
score         80
Name: 2, dtype: object

3 name      Geeku
degree      MBA
score        98
Name: 3, dtype: object



In [25]:
# iterating over columns using items()
# (iteritems() was removed in pandas 2.0)
for i, j in df.items():
    print(i, j)
    print()

name 0    aparna
1    pankaj
2    sudhir
3     Geeku
Name: name, dtype: object

degree 0       MBA
1       BCA
2    M.Tech
3       MBA
Name: degree, dtype: object

score 0    90
1    40
2    80
3    98
Name: score, dtype: int64



In [27]:
# iterating over rows using itertuples() function
for i in df.itertuples():
    print(i)
    print()

Pandas(Index=0, name='aparna', degree='MBA', score=90)

Pandas(Index=1, name='pankaj', degree='BCA', score=40)

Pandas(Index=2, name='sudhir', degree='M.Tech', score=80)

Pandas(Index=3, name='Geeku', degree='MBA', score=98)



#### DataFrame Methods:

| Function | Description |
|---|---|
| index | Attribute that returns the index (row labels) of the DataFrame |
| insert() | Inserts a column into a DataFrame |
| add() | Returns the element-wise addition of the DataFrame and other (binary operator add) |
| sub() | Returns the element-wise subtraction of the DataFrame and other (binary operator sub) |
| mul() | Returns the element-wise multiplication of the DataFrame and other (binary operator mul) |
| div() | Returns the element-wise floating division of the DataFrame and other (binary operator truediv) |
| unique() | Series method that extracts the unique values in a column |
| nunique() | Returns the count of unique values in each column of the DataFrame |
| value_counts() | Counts the number of times each unique value occurs within the Series |
| columns | Attribute that returns the column labels of the DataFrame |
| axes | Attribute that returns a list representing the axes of the DataFrame |
| isnull() | Creates a Boolean mask that can be used to extract rows with null values |
| notnull() | Creates a Boolean mask that can be used to extract rows with non-null values |
| between() | Series method used to extract rows where a column value falls within a predefined range |
| isin() | Used to extract rows from a DataFrame where a column value exists in a predefined collection |
| dtypes | Attribute that returns a Series with the data type of each column. The result's index is the original DataFrame's columns |
| astype() | Converts the data types in a Series or DataFrame |
| values | Attribute that returns a NumPy representation of the DataFrame, i.e. only the values are returned and the axis labels are removed |
| sort_values() | Sorts a DataFrame in ascending or descending order of the passed column |
| sort_index() | Sorts a DataFrame by its index labels instead of its values; this is useful, for example, after a DataFrame has been assembled from two or more DataFrames |
| loc[] | Indexer that retrieves rows based on index label |
| iloc[] | Indexer that retrieves rows based on index position |
| ix[] | Indexer that retrieved rows based on either index label or index position. It was removed in pandas 1.0; use .loc[] or .iloc[] instead |
| rename() | Called on a DataFrame to change the names of the index labels or column names |
| columns | Assigning to this attribute is an alternative way to change the column names |
| drop() | Deletes rows or columns from a DataFrame |
| pop() | Removes a column from a DataFrame and returns it as a Series |
| sample() | Pulls out a random sample of rows or columns from a DataFrame |
| nsmallest() | Pulls out the rows with the smallest values in a column |
| nlargest() | Pulls out the rows with the largest values in a column |
| shape | Attribute that returns a tuple representing the dimensionality of the DataFrame |
| ndim | Attribute that returns an int representing the number of axes / array dimensions: 1 for a Series, 2 for a DataFrame |
| dropna() | Allows the user to analyze and drop rows/columns with null values in different ways |
| fillna() | Lets the user replace NaN values with some value of their own |
| rank() | Ranks the values in a Series in order |
| query() | Alternate string-based syntax for extracting a subset from a DataFrame |
| copy() | Creates an independent copy of a pandas object |
| duplicated() | Creates a Boolean Series that can be used to extract rows that have duplicate values |
| drop_duplicates() | Alternative option for identifying duplicate rows and removing them through filtering |
| set_index() | Sets the DataFrame index (row labels) using one or more existing columns |
| reset_index() | Resets the index of a DataFrame to a range of integers from 0 to the length of the data |
| where() | Checks a DataFrame for one or more conditions and returns the result accordingly. By default, the entries not satisfying the condition are filled with NaN |
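
A few of the entries listed above can be sketched together as follows. The DataFrame reuses the student data from the iteration examples; the particular thresholds (top two, scores between 50 and 95) are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['aparna', 'pankaj', 'sudhir', 'Geeku'],
                   'degree': ['MBA', 'BCA', 'M.Tech', 'MBA'],
                   'score': [90, 40, 80, 98]})

# shape is an attribute, not a method call
print(df.shape)

# Sort rows by score, highest first
by_score = df.sort_values('score', ascending=False)

# The two rows with the largest scores
top_two = df.nlargest(2, 'score')

# String-based filtering with query()
mba = df.query("degree == 'MBA'")

# between() builds a Boolean Series (inclusive on both ends by default)
in_range = df[df['score'].between(50, 95)]

print(by_score)
print(top_two)
print(mba)
print(in_range)
```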

#### Creating a Pandas Series

#### Creating a series from Dictionary:

To create a Series from a dictionary, first create the dictionary and then pass it to the Series constructor. The dictionary keys are used to construct the index.

In [44]:

import pandas as pd 
   
# a simple dictionary 
# (the name "data" avoids shadowing the built-in dict)
data = {'Geeks' : 10, 
        'for' : 20, 
        'geeks' : 30} 
   
# create series from dictionary 
ser = pd.Series(data) 
   
print(ser) 


Geeks    10
for      20
geeks    30
dtype: int64
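
When an explicit index is passed together with a dictionary, the index decides which keys are used and in what order. A minimal sketch, where the label 'portal' is a hypothetical key that is not in the dictionary:

```python
import pandas as pd

data = {'Geeks': 10, 'for': 20, 'geeks': 30}

# Only the listed labels are kept, in the listed order;
# a label with no matching key gets NaN
ser = pd.Series(data, index=['geeks', 'Geeks', 'portal'])

print(ser)
```

Because NaN is a floating-point value, the resulting Series has a float dtype even though the dictionary values are integers.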


#### Creating a series from Scalar value:

In order to create a series from scalar value, an index must be provided. The scalar value will be repeated to match the length of index.

In [45]:

import pandas as pd 
  
import numpy as np 
  
# giving a scalar value with index 
ser = pd.Series(10, index =[0, 1, 2, 3, 4, 5]) 
  
print(ser) 


0    10
1    10
2    10
3    10
4    10
5    10
dtype: int64


#### Creating a series using NumPy functions :

To create a Series using NumPy functions, we can use functions such as numpy.linspace() and numpy.random.randn().

In [46]:

# import pandas and numpy  
import pandas as pd  
import numpy as np  
    
# series with numpy linspace()   
ser1 = pd.Series(np.linspace(3, 33, 3))  
print(ser1)  
    
# series with numpy linspace()  
ser2 = pd.Series(np.linspace(1, 100, 10))  
print("\n", ser2)  


0     3.0
1    18.0
2    33.0
dtype: float64

 0      1.0
1     12.0
2     23.0
3     34.0
4     45.0
5     56.0
6     67.0
7     78.0
8     89.0
9    100.0
dtype: float64
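
The cell above covers numpy.linspace(); the numpy.random.randn() case can be sketched as follows. The length of 5 is an arbitrary choice, and the values differ on every run because they are random.

```python
import numpy as np
import pandas as pd

# Five samples drawn from the standard normal distribution
random_values = np.random.randn(5)

ser = pd.Series(random_values)
print(ser)
```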


#### Creating a series from array: 

To create a Series from an array, import the numpy module and use its array() function.

In [28]:
# import pandas as pd
import pandas as pd
 
# import numpy as np
import numpy as np
 
# simple array
data = np.array(['g','e','e','k','s'])
 
ser = pd.Series(data)
print(ser)

0    g
1    e
2    e
3    k
4    s
dtype: object


#### Creating a series from Lists:

To create a Series from a list, first create the list and then pass it to the Series constructor.

In [29]:
import pandas as pd
 
# a simple list
# (the name "letters" avoids shadowing the built-in list)
letters = ['g', 'e', 'e', 'k', 's']
  
# create series from a list
ser = pd.Series(letters)
print(ser)

0    g
1    e
2    e
3    k
4    s
dtype: object


#### Accessing element of Series

There are two ways to access the elements of a Series:

    Accessing Element from Series with Position
    Accessing Element Using Label (index)


#### Accessing Element from Series with Position :

To access a Series element by position, refer to its position number with the index operator [ ]; the position must be an integer. To access multiple elements from a Series, use a slice operation.

In [30]:
# import pandas and numpy 
import pandas as pd
import numpy as np
 
# creating simple array
data = np.array(['g','e','e','k','s','f', 'o','r','g','e','e','k','s'])
ser = pd.Series(data)
  
#retrieve all elements, then the first five elements
print(ser[:])
print(ser[:5])

0     g
1     e
2     e
3     k
4     s
5     f
6     o
7     r
8     g
9     e
10    e
11    k
12    s
dtype: object
0    g
1    e
2    e
3    k
4    s
dtype: object


#### Accessing Element Using Label (index) :
    
To access an element by label, refer to its index label. A Series is like a fixed-size dictionary in that you can get and set values by index label.

Accessing a single element using index label 

In [35]:
# import pandas and numpy 
import pandas as pd
import numpy as np
 
# creating simple array
data = np.array(['g','e','e','k','s','f', 'o','r','g','e','e','k','s'])
ser = pd.Series(data,index=[10,11,12,13,14,15,16,17,18,19,20,21,22])
  
#accessing elements using index labels
print(ser[16])
print(ser[10])
#integer slices with [] select by position, not by label
print(ser[2:5])
print(ser[5:2:-1])

o
g
12    e
13    k
14    s
dtype: object
15    f
14    s
13    k
dtype: object
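
With integer labels, it is easy to confuse labels and positions, as the mixed output above shows. One way to make the intent explicit is to use .loc[] for labels and .iloc[] for positions. The sketch below uses a short, made-up Series whose labels start at 1 so that label and position differ.

```python
import pandas as pd

# Hypothetical Series: labels 1..5, positions 0..4
ser = pd.Series(['p', 'a', 'n', 'd', 's'], index=[1, 2, 3, 4, 5])

by_label = ser.loc[1]        # element whose label is 1
by_position = ser.iloc[1]    # element at position 1 (the second one)

# Label slices include both endpoints; position slices exclude the end
label_slice = ser.loc[2:4]
position_slice = ser.iloc[2:4]

print(by_label, by_position)
print(label_slice)
print(position_slice)
```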


#### Indexing and Selecting Data in Series

#### Indexing a Series using indexing operator [] :

The indexing operator refers to the square brackets following an object. The .loc and .iloc indexers also use square brackets to make selections; in this section, the indexing operator means the plain brackets, as in ser[ ].

In [36]:
# importing pandas module  
import pandas as pd  
     
# making data frame  
df = pd.read_csv("nba.csv")  
   
ser = pd.Series(df['Name']) 
data = ser.head(10)
data 

0    Avery Bradley
1      Jae Crowder
2     John Holland
3      R.J. Hunter
4    Jonas Jerebko
5     Amir Johnson
6    Jordan Mickey
7     Kelly Olynyk
8     Terry Rozier
9     Marcus Smart
Name: Name, dtype: object

#### Indexing a Series using .loc[ ] :

The .loc[] indexer selects data by referring to the explicit index labels. It selects data in a different way than the plain indexing operator and can select subsets of data. Note that a label slice such as 3:6 includes both endpoints.

In [37]:
# importing pandas module  
import pandas as pd  
     
# making data frame  
df = pd.read_csv("nba.csv")  
   
ser = pd.Series(df['Name']) 
data = ser.head(10)


# using .loc[] function
data.loc[3:6]

3      R.J. Hunter
4    Jonas Jerebko
5     Amir Johnson
6    Jordan Mickey
Name: Name, dtype: object

#### Indexing a Series using .iloc[ ] :
The .iloc[] indexer retrieves data by position. To use it, specify the positions of the data that you want. The .iloc indexer is very similar to .loc but uses only integer locations to make its selections, and a position slice such as 3:6 excludes the end position.

In [38]:
# importing pandas module  
import pandas as pd  
     
# making data frame  
df = pd.read_csv("nba.csv")  
   
ser = pd.Series(df['Name']) 
data = ser.head(10)


# using .iloc[] function
data.iloc[3:6]

3      R.J. Hunter
4    Jonas Jerebko
5     Amir Johnson
Name: Name, dtype: object

#### Binary Operation on Series

We can perform binary operations on Series, such as addition, subtraction, and many others. To do so, we use methods like .add() and .sub().

In [39]:
# importing pandas module  
import pandas as pd  
 
# creating a series
data = pd.Series([5, 2, 3,7], index=['a', 'b', 'c', 'd'])
 
# creating a series
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])
 
print(data, "\n\n", data1)

a    5
b    2
c    3
d    7
dtype: int64 

 a    1
b    6
d    4
e    9
dtype: int64


Now we add the two Series using the .add() method. The fill_value=0 argument substitutes 0 for a label that is missing from one of the Series.

In [40]:
data.add(data1, fill_value=0)

a     6.0
b     8.0
c     3.0
d    11.0
e     9.0
dtype: float64

In [41]:
data.sub(data1, fill_value=0)

a    4.0
b   -4.0
c    3.0
d    3.0
e   -9.0
dtype: float64
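
To see why fill_value matters, the sketch below compares the plain + operator, which leaves NaN wherever a label exists in only one Series, with .mul() using a fill value of 1. The choice of 1 as the neutral value for multiplication is an assumption for illustration.

```python
import pandas as pd

data = pd.Series([5, 2, 3, 7], index=['a', 'b', 'c', 'd'])
data1 = pd.Series([1, 6, 4, 9], index=['a', 'b', 'd', 'e'])

# Labels are aligned first; 'c' and 'e' exist in only one Series,
# so their sums are NaN
plain_sum = data + data1

# With fill_value=1, a missing operand is treated as 1
product = data.mul(data1, fill_value=1)

print(plain_sum)
print(product)
```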

#### Conversion Operation on Series

Conversion operations include changing the data type of a Series, converting a Series to a list, and so on. Pandas provides methods such as .astype() and .tolist() for these conversions.

In [42]:
# Python program using astype
# to convert a datatype of series
 
# importing pandas module  
import pandas as pd 
   
# reading csv file  
data = pd.read_csv("nba.csv") 
    
# dropping rows with null values to avoid errors 
data.dropna(inplace = True) 
   
# storing dtype before converting 
before = data.dtypes 
   
# converting dtypes using astype 
data["Salary"]= data["Salary"].astype(int) 
data["Number"]= data["Number"].astype(str) 
   
# storing dtype after converting 
after = data.dtypes 
   
# printing to compare 
print("BEFORE CONVERSION\n", before, "\n") 
print("AFTER CONVERSION\n", after, "\n") 

BEFORE CONVERSION
 Name         object
Team         object
Number      float64
Position     object
Age         float64
Height       object
Weight      float64
College      object
Salary      float64
dtype: object 

AFTER CONVERSION
 Name         object
Team         object
Number       object
Position     object
Age         float64
Height       object
Weight      float64
College      object
Salary        int32
dtype: object 
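
The same conversions can be tried without nba.csv. The sketch below uses three salary values taken from the output above. It requests 'int64' explicitly as an assumption about the desired width, since plain int can map to a platform-dependent size such as the int32 shown above.

```python
import pandas as pd

ser = pd.Series([7730337.0, 6796117.0, 1148640.0])

# Explicit width, independent of the platform default
as_int = ser.astype('int64')

# Every element becomes a Python string
as_str = ser.astype(str)

# tolist() returns plain Python values
as_list = as_int.tolist()

print(as_int.dtype)
print(as_str.iloc[0])
print(as_list)
```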



In [43]:
# Python program converting
# a series into list
 
# importing pandas module  
import pandas as pd  
     
# making data frame  
data = pd.read_csv("nba.csv")  
     
# removing null values to avoid errors  
data.dropna(inplace = True)  
   
# storing dtype before operation 
dtype_before = type(data["Salary"]) 
   
# converting to list 
salary_list = data["Salary"].tolist() 
   
# storing dtype after operation 
dtype_after = type(salary_list) 
   
# printing dtype 
print("Data type before converting = {}\nData type after converting = {}"
      .format(dtype_before, dtype_after)) 
   
# displaying list 
salary_list 

Data type before converting = <class 'pandas.core.series.Series'>
Data type after converting = <class 'list'>


[7730337.0,
 6796117.0,
 1148640.0,
 1170960.0,
 2165160.0,
 1824360.0,
 3431040.0,
 2569260.0,
 6912869.0,
 3425510.0,
 1749840.0,
 2616975.0,
 845059.0,
 1500000.0,
 1335480.0,
 6300000.0,
 134215.0,
 1500000.0,
 19689000.0,
 1140240.0,
 947276.0,
 981348.0,
 947276.0,
 947276.0,
 11235955.0,
 8000000.0,
 1635476.0,
 22875000.0,
 845059.0,
 845059.0,
 1572360.0,
 12650000.0,
 3750000.0,
 1636842.0,
 4000000.0,
 167406.0,
 947276.0,
 1000000.0,
 4626960.0,
 845059.0,
 1074169.0,
 6500000.0,
 2144772.0,
 525093.0,
 3457800.0,
 4582680.0,
 947276.0,
 2869440.0,
 947276.0,
 525093.0,
 13600000.0,
 10050000.0,
 2500000.0,
 7000000.0,
 12000000.0,
 6268675.0,
 650000.0,
 3553917.0,
 245177.0,
 1509360.0,
 3873398.0,
 13800000.0,
 947276.0,
 11370786.0,
 2008748.0,
 14260870.0,
 11710456.0,
 1131960.0,
 845059.0,
 1270964.0,
 3815000.0,
 15501000.0,
 1100602.0,
 111444.0,
 5675000.0,
 525093.0,
 9650000.0,
 18907726.0,
 1100602.0,
 19689000.0,
 947276.0,
 21468695.0,
 3376000.0,
 7085000.0,

#### Binary operation methods on series:

| Function | Description |
|---|---|
| add() | Adds a Series or list-like object of the same length to the caller Series |
| sub() | Subtracts a Series or list-like object of the same length from the caller Series |
| mul() | Multiplies the caller Series by a Series or list-like object of the same length |
| div() | Divides the caller Series by a Series or list-like object of the same length |
| sum() | Returns the sum of the values for the requested axis |
| prod() | Returns the product of the values for the requested axis |
| mean() | Returns the mean of the values for the requested axis |
| pow() | Raises each element of the caller Series to the power of the corresponding element of the passed Series and returns the results |
| abs() | Returns the absolute numeric value of each element in a Series/DataFrame |
| cov() | Computes the covariance of two Series |
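
A few of these methods can be sketched on two small Series. The numbers are made up so that the results are easy to check by hand.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4])
b = pd.Series([2, 4, 6, 8])

total = a.sum()        # 1 + 2 + 3 + 4
product = a.prod()     # 1 * 2 * 3 * 4
average = a.mean()

# pow() also accepts a scalar exponent
squared = a.pow(2)

absolute = pd.Series([-1, 2, -3]).abs()

# Sample covariance (divides by n - 1)
covariance = a.cov(b)

print(total, product, average)
print(squared.tolist())
print(absolute.tolist())
print(covariance)
```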

#### Pandas series method:

| Function | Description |
|---|---|
| Series() | A pandas Series can be created with the Series() constructor, which accepts a variety of inputs |
| combine_first() | Combines two Series into one by filling null values in the caller with values from the passed Series |
| count() | Returns the number of non-NA/null observations in the Series |
| size | Attribute that returns the number of elements in the underlying data |
| name | Attribute that gives a name to a Series object, i.e. to the column |
| is_unique | Attribute that returns a Boolean indicating whether the values in the object are unique |
| idxmax() | Returns the index label of the highest value in a Series |
| idxmin() | Returns the index label of the lowest value in a Series |
| sort_values() | Called on a Series to sort the values in ascending or descending order |
| sort_index() | Called on a pandas Series to sort it by the index instead of its values |
| head() | Returns a specified number of rows from the beginning of a Series. The method returns a brand new Series |
| tail() | Returns a specified number of rows from the end of a Series. The method returns a brand new Series |
| le() | Compares every element of the caller Series with the passed Series. It returns True for every element which is less than or equal to the element in the passed Series |
| ne() | Compares every element of the caller Series with the passed Series. It returns True for every element which is not equal to the element in the passed Series |
| ge() | Compares every element of the caller Series with the passed Series. It returns True for every element which is greater than or equal to the element in the passed Series |
| eq() | Compares every element of the caller Series with the passed Series. It returns True for every element which is equal to the element in the passed Series |
| gt() | Compares every element of the caller Series with the passed Series. It returns True for every element which is greater than the element in the passed Series |
| lt() | Compares every element of the caller Series with the passed Series. It returns True for every element which is less than the element in the passed Series |
| clip() | Clips values below and above the passed lower and upper bounds |
| clip_lower() | Clipped values below a passed minimum value. It was removed in pandas 1.0; use clip(lower=...) instead |
| clip_upper() | Clipped values above a passed maximum value. It was removed in pandas 1.0; use clip(upper=...) instead |
| astype() | Changes the data type of a Series |
| tolist() | Converts a Series to a list |
| get() | Called on a Series to extract values from it. This is an alternative syntax to the traditional bracket syntax |
| unique() | Returns the unique values in a particular column |
| nunique() | Returns a count of unique values |
| value_counts() | Counts the number of times each unique value occurs in a Series |
| factorize() | Gets the numeric representation of an array by identifying distinct values |
| map() | Ties together the values from one object to another |
| between() | Used on a Series to check which values lie between the first and second argument |
| apply() | Called with a Python function as an argument to apply that function to every Series value. This method is helpful for executing custom operations that are not included in pandas or NumPy |
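
Several of the Series methods listed above can be sketched together as follows. The scores, the clipping bounds, the pass mark of 50, and the degree mapping are all hypothetical values chosen for illustration.

```python
import pandas as pd

scores = pd.Series([90, 40, 80, 98, 40], index=['a', 'b', 'c', 'd', 'e'])

# clip() with keyword bounds covers both the lower and the upper limit
clipped = scores.clip(lower=50, upper=95)

# How often each distinct value occurs
counts = scores.value_counts()

# apply() runs a custom function on every value
grades = scores.apply(lambda s: 'pass' if s >= 50 else 'fail')

# idxmax() returns the label of the largest value
best = scores.idxmax()

# map() translates values through a dictionary
degrees = pd.Series(['MBA', 'BCA']).map({'MBA': 'Master', 'BCA': 'Bachelor'})

print(clipped.tolist())
print(counts)
print(grades.tolist())
print(best)
print(degrees.tolist())
```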