Pandas is an open-source Python library for high-performance data manipulation and analysis. This tutorial is designed for both beginners and professionals.

It is used for data analysis in Python. Wes McKinney began developing it in 2008.

Data analysis requires a lot of processing, such as restructuring, cleaning, and merging. Several tools are available for fast data processing, including NumPy, SciPy, Cython, and Pandas. We prefer Pandas because working with it is fast, simple, and more expressive than with the other tools.

It supports the five significant steps required to process and analyze data, regardless of where the data comes from: load, manipulate, prepare, model, and analyze.


Python Pandas Data Structure
Pandas provides two data structures for processing data, Series and DataFrame, which are discussed below:

1) Series

It is a one-dimensional labeled array that can store values of any data type. The row labels of a Series are called the index. A list, tuple, or dictionary can easily be converted into a Series with the pd.Series() constructor. A Series cannot contain multiple columns. Its main parameter is:

data: It can be any list, dictionary, or scalar value.
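
As a minimal sketch of the data parameter, the following builds a Series from a list, a tuple, a dictionary, and a scalar. The variable names are illustrative only. Note that a scalar needs an explicit index so that pandas knows how many times to repeat it.

```python
import pandas as pd

from_list = pd.Series([10, 20, 30])                 # default integer index 0..2
from_tuple = pd.Series((10, 20, 30))                # a tuple behaves like a list
from_dict = pd.Series({'a': 10, 'b': 20})           # dictionary keys become the index
from_scalar = pd.Series(5, index=['x', 'y', 'z'])   # scalar repeated for every label

print(from_dict)
print(from_scalar)
```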

Creating Series from Array:

Before creating a Series from an array, we first import the numpy module and then build the array with its array() function.

In [5]:
import numpy as np
import pandas as pd
info= np.array(['P','A','N','D','A','S'])
a=pd.Series(info)

In [6]:
a

0    P
1    A
2    N
3    D
4    A
5    S
dtype: object

Python Pandas DataFrame
A DataFrame is a widely used pandas data structure: a two-dimensional array with labeled axes (rows and columns). It is a standard way to store tabular data and has two indexes, a row index and a column index. It has the following properties:

The columns can be of heterogeneous types such as int, bool, and so on.
It can be seen as a dictionary of Series in which both the rows and the columns are indexed. The column labels are exposed as "columns" and the row labels as "index".
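
These properties can be checked directly. The sketch below uses a small made-up table (the name people and its values are assumptions for illustration) to show per-column types, the two sets of labels, and that a single column comes back as a Series.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different type.
people = pd.DataFrame({
    'name': ['kajal', 'shushma', 'snehal'],
    'age': [25, 31, 28],
    'active': [True, False, True],
})

print(people.dtypes)    # one dtype per column
print(people.columns)   # column labels
print(people.index)     # row labels (a default RangeIndex here)

# Selecting one column returns a Series that shares the row index.
ages = people['age']
print(type(ages).__name__)
```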

In [8]:
import pandas as pd
x=['kajal','shushma','snehal']
output1= pd.DataFrame(x)
output1

         0
0    kajal
1  shushma
2   snehal


A pandas Series is a one-dimensional array, that is, it contains only one column and multiple rows. The row labels of a Series are called the index. The constructor accepts the following parameters:

data: It can be any list, dictionary, or scalar value.

index: The index values must be hashable and the index must have the same length as data. Unique labels are usual, but duplicates are allowed. If we do not pass any index, a default integer index equivalent to np.arange(n) is used.

dtype: It refers to the data type of series.

copy: If True, the input data is copied instead of being shared with the new Series.
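
The parameters above can be combined as in the following sketch. The variable names are illustrative. Although unique labels are usually what you want, pandas itself accepts repeated labels, and the second example shows what a lookup then returns.

```python
import numpy as np
import pandas as pd

# Custom labels and an explicit dtype.
prices = pd.Series([11, 18, 22], index=['a', 'b', 'c'], dtype='float64')
print(prices)

# Index labels may repeat; a lookup then returns every match.
repeated = pd.Series([1, 2, 3], index=['a', 'a', 'b'])
matches = repeated['a']
print(matches)

# copy=True gives the Series its own data, so the source array stays untouched.
source = np.array([1, 2, 3])
copied = pd.Series(source, copy=True)
copied.iloc[0] = 99
print(source)
```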

Creating a Series:


We can create a Series in two ways:

1. Create an empty Series
2. Create a Series using inputs



In [9]:
# create an empty Series (an explicit dtype avoids the warning raised when none is given)
import pandas as pd
x=pd.Series(dtype='float64')
x

Series([], dtype: float64)


In [11]:
# create a Series from input data
import pandas as pd
x=['kajal',25,'Navi mumbai']
df=pd.Series(x)
df

0          kajal
1             25
2    Navi mumbai
dtype: object

In [12]:
print(df[2])
# access a value in the Series by its index label

Navi mumbai


Series.index: The index (row labels) of the Series.

Series.shape: A tuple giving the shape of the data.

Series.dtype: The data type of the data.

Series.size: The number of elements in the data.

Series.empty: True if the Series object is empty, otherwise False.

Series.hasnans: True if there are any NaN values, otherwise False.

Series.nbytes: The number of bytes in the data.

Series.ndim: The number of dimensions of the data, which is always 1 for a Series.

Series.itemsize: The size in bytes of the data type of one item. This attribute was removed in pandas 1.0; use Series.dtype.itemsize instead.

In [22]:
import numpy as np   
import pandas as pd   
x=pd.Series(data=[2,4,6,8])   
y=pd.Series(data=[11.2,18.6,22.5], index=['a','b','c'])   
print(x.index)   
print(x.values)   
print(y.index)   
print(y.values)  
print(x.size)
print(x.dtype)
print(x.shape)
print(x.ndim)
print(x.nbytes)
print(x.hasnans)

RangeIndex(start=0, stop=4, step=1)
[2 4 6 8]
Index(['a', 'b', 'c'], dtype='object')
[11.2 18.6 22.5]
4
int64
(4,)
1
32
False
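
The cell above does not exercise empty, a Series that actually contains NaN, or the item size. A small sketch covering those cases follows. The variable names are assumptions, and dtype.itemsize is used because Series.itemsize is not available in current pandas versions.

```python
import numpy as np
import pandas as pd

with_gap = pd.Series([1.5, np.nan, 3.0])
nothing = pd.Series([], dtype='float64')

print(with_gap.hasnans)          # True: one value is NaN
print(nothing.empty)             # True: the Series has no elements
print(with_gap.dtype.itemsize)   # 8 bytes per float64 element
print(with_gap.nbytes)           # 24 = 3 elements * 8 bytes
```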


Series Functions
Some commonly used Series functions are as follows:

1. Pandas Series.map(): Substitutes each value in the Series using a mapping (a dictionary or another Series) or a function.

2. Pandas Series.std(): Calculates the sample standard deviation of the values in the Series.

3. Pandas Series.to_frame(): Converts the Series object to a DataFrame.

4. Pandas Series.value_counts(): Returns a Series that contains the counts of unique values.
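
The four functions listed above can be tried on a small example. This is a sketch with made-up data; the names sizes, as_numbers, spread, frame, and counts are assumptions for illustration.

```python
import pandas as pd

sizes = pd.Series(['S', 'M', 'M', 'L'])

# map(): replace each value using a dictionary (a function also works).
as_numbers = sizes.map({'S': 1, 'M': 2, 'L': 3})
print(as_numbers.tolist())   # [1, 2, 2, 3]

# std(): sample standard deviation (ddof=1 by default).
spread = pd.Series([2, 4, 6, 8]).std()
print(spread)

# to_frame(): wrap the Series in a one-column DataFrame.
frame = sizes.to_frame(name='size')
print(frame.shape)           # (4, 1)

# value_counts(): how often each distinct value occurs.
counts = sizes.value_counts()
print(counts)
```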

Series

In [24]:
import pandas as pd
from pandas import Series

In [26]:
a=pd.Series([1,2,3,4,5])
print(a) # one dimensional array

0    1
1    2
2    3
3    4
4    5
dtype: int64


In [29]:
a.values # to know the value inside the array

array([1, 2, 3, 4, 5])

In [30]:
a.index  # to know index of an array

RangeIndex(start=0, stop=5, step=1)

In [31]:
# Set our own index; by default the index starts from 0
s1= pd.Series([1,3,4,5],index=['a','b','c','d'])
print(s1)

a    1
b    3
c    4
d    5
dtype: int64


Selecting by label

In [32]:
s1['c']

4

In [33]:
s1[['d','a']] # shows more than one value.

d    5
a    1
dtype: int64
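
Ranges of values can be selected as well. The sketch below rebuilds s1 so that it runs on its own and contrasts a label slice with a position slice. The names by_label and by_position are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 3, 4, 5], index=['a', 'b', 'c', 'd'])

by_label = s1.loc['b':'d']     # label slice: the end label 'd' is included
by_position = s1.iloc[1:3]     # position slice: the stop position 3 is excluded

print(by_label.tolist())       # [3, 4, 5]
print(by_position.tolist())    # [3, 4]
```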

Membership test (checks the index labels)

In [35]:
'b' in s1

True

In [37]:
's' in s1

False
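
Note that the in operator looks at the index labels of a Series, not at its values. A short sketch of the difference, rebuilding s1 so that it runs on its own (the result names are illustrative):

```python
import pandas as pd

s1 = pd.Series([1, 3, 4, 5], index=['a', 'b', 'c', 'd'])

label_found = 'b' in s1           # True: 'b' is an index label
value_as_label = 4 in s1          # False: 4 is a value, not a label
value_found = 4 in s1.values      # True: search the underlying array
mask = s1.isin([4, 5])            # element-wise test against several values

print(label_found, value_as_label, value_found)
print(mask.tolist())              # [False, False, True, True]
```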

Series from a dictionary

In [38]:
ds={'oil':300,'gas':100,'fan':600,'light':700}
k=pd.Series(ds)
print(k) # shows keys as index and value as value

oil      300
gas      100
fan      600
light    700
dtype: int64


Pandas DataFrame
A DataFrame is a data structure that works as a two-dimensional array with labeled axes (rows and columns).
We can create a DataFrame from any of the following:
1. dict
2. Lists
3. NumPy ndarrays
4. Series
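
Of the four sources listed, the NumPy ndarray case can be sketched as follows. The array contents and the row and column labels are made up for the example.

```python
import numpy as np
import pandas as pd

# A 2 x 3 array; each inner list becomes one row of the DataFrame.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
from_array = pd.DataFrame(grid, columns=['a', 'b', 'c'], index=['r1', 'r2'])
print(from_array)
```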

In [39]:
# create empty data frames
import pandas as pd
df= pd.DataFrame()
print(df)


Empty DataFrame
Columns: []
Index: []


In [40]:
# create DataFrame using lists
import pandas as pd
x=[1,2,3,4,'abc']
df=pd.DataFrame(x)
print(df)

     0
0    1
1    2
2    3
3    4
4  abc


In [41]:
# create a DataFrame from a dict of lists
import pandas as pd
info={'ID':[100,200,300],
      'Department':['Bsc','Bteh','Mtech']}
df=pd.DataFrame(info)
print(df)
# keys become the column names and the lists become the column values

    ID Department
0  100        Bsc
1  200       Bteh
2  300      Mtech


In [42]:
#create DataFrame from dict of series
import pandas as pd
info={'one':pd.Series([1,2,3],index=['a','b','c']),
      'two':pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])}
df=pd.DataFrame(info)
print(df)

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
e  NaN    5


In [43]:
#column selection
print(df['one'])

a    1.0
b    2.0
c    3.0
d    NaN
e    NaN
Name: one, dtype: float64


In [45]:
#add new column by passing series
df['three']=pd.Series([20,40,60],index=['a','b','c'])
print(df)

   one  two  three
a  1.0    1   20.0
b  2.0    2   40.0
c  3.0    3   60.0
d  NaN    4    NaN
e  NaN    5    NaN


In [46]:
#add new column using existing DataFrame column
df['four']=df['one']+df['three']
print(df)

   one  two  three  four
a  1.0    1   20.0  21.0
b  2.0    2   40.0  42.0
c  3.0    3   60.0  63.0
d  NaN    4    NaN   NaN
e  NaN    5    NaN   NaN


In [47]:
#delete a column using the del statement
del df['one']
print(df)

   two  three  four
a    1   20.0  21.0
b    2   40.0  42.0
c    3   60.0  63.0
d    4    NaN   NaN
e    5    NaN   NaN


In [48]:
#delete a column using the pop method, which also returns the removed column
df.pop('two')
print(df)

   three  four
a   20.0  21.0
b   40.0  42.0
c   60.0  63.0
d    NaN   NaN
e    NaN   NaN


In [49]:
#select any row by passing the row label to the loc indexer
print(df.loc['b'])


three    40.0
four     42.0
Name: b, dtype: float64


In [50]:
#selection by integer location
print(df.iloc[3])


three   NaN
four    NaN
Name: d, dtype: float64
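
Both loc and iloc also accept a column selector and slices. The sketch below uses a small stand-in frame (the name small and its contents are assumptions that mirror the non-missing rows shown above) so that it runs on its own.

```python
import pandas as pd

small = pd.DataFrame({'three': [20.0, 40.0, 60.0],
                      'four': [21.0, 42.0, 63.0]},
                     index=['a', 'b', 'c'])

cell_by_label = small.loc['b', 'four']   # row label, column label
cell_by_position = small.iloc[1, 1]      # row position, column position
rows_by_label = small.loc['a':'b']       # label slice includes 'b'
rows_by_position = small.iloc[0:2]       # position slice excludes position 2

print(cell_by_label, cell_by_position)
print(rows_by_label)
```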


In [51]:
#slice rows
print(df[2:5])

   three  four
c   60.0  63.0
d    NaN   NaN
e    NaN   NaN


In [52]:
#addition of rows using pd.concat (DataFrame.append was removed in pandas 2.0)
import pandas as pd
d=pd.DataFrame([[7,8],[9,10]],
               columns=['x','y'])
d2=pd.DataFrame([[11,12],[13,14]],
                columns=['x','y'])
d=pd.concat([d,d2])
print(d)

    x   y
0   7   8
1   9  10
0  11  12
1  13  14
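
The combined frame above keeps each input's own row labels, so 0 and 1 each appear twice. If a fresh 0..n-1 index is preferred, one option is pd.concat with ignore_index=True, sketched here with the same two frames (the name combined is illustrative):

```python
import pandas as pd

d = pd.DataFrame([[7, 8], [9, 10]], columns=['x', 'y'])
d2 = pd.DataFrame([[11, 12], [13, 14]], columns=['x', 'y'])

# ignore_index=True discards the original row labels and renumbers the rows.
combined = pd.concat([d, d2], ignore_index=True)
print(combined.index.tolist())   # [0, 1, 2, 3]
```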
