# Pandas

Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools.

It can be installed with **pip**: **pip3 install pandas**

Or with **Anaconda**: **conda install pandas**


**Introduction**

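Pandas has historically provided three data structures:

1. Series (1 Dimensional)
2. DataFrame (2 Dimensional)
3. Panel (3 Dimensional), which was deprecated in pandas 0.20 and removed in pandas 0.25

This notebook covers Series and DataFrame.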



## Imports

In [15]:
import pandas as pd
import numpy as np

## Series

The Series constructor accepts several arguments. The four used most often are data, index, dtype, and copy.
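
As a minimal sketch of how these arguments fit together, the cell below passes all four by keyword. The variable names and the values are made up for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# data:  the values to store
# index: the labels attached to those values
# dtype: the element type of the resulting Series
# copy:  whether to copy the input instead of sharing its memory
values = np.array([1, 2, 3])  # hypothetical example data
s = pd.Series(data=values, index=['x', 'y', 'z'], dtype='float64', copy=True)

print(s)
```

Passing the arguments by keyword makes the call easier to read than relying on their position.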

In [16]:
s = pd.Series()

print(s)

Series([], dtype: float64)


In [17]:
data = np.array(['a','b','c','d'])  # A simple numpy array

s = pd.Series(data)
print(s)

0    a
1    b
2    c
3    d
dtype: object


In [45]:
# We did not pass an index, so pandas assigns the default integer index 0 to 3.

Now let us pass an index explicitly.

In [18]:
data = np.array(['a','b','c','d'])  # A simple numpy array
index = [10,20,30,40]

s = pd.Series(data, index)  # equivalently: pd.Series(data, index=[10, 20, 30, 40])
print(s)

10    a
20    b
30    c
40    d
dtype: object


Instead of creating two separate arrays and passing them in, we can use a Python **dictionary**.

In [19]:
data = {'Stuart': 0, 'Jerry': '1', 'Ratatouille': 2}

a = pd.Series(data)
print(a)

Jerry          1
Ratatouille    2
Stuart         0
dtype: object


In [46]:
# Here the dictionary keys are used as the index.

If a scalar is used as the data, an index must be provided. The value is repeated for every index label.

In [20]:
a = pd.Series('Hi', index = [10, 20, 30, 40])

print(a)

10    Hi
20    Hi
30    Hi
40    Hi
dtype: object


In [21]:
data = [10,20,30,40]
index = ['a','b','c','d'] 

s = pd.Series(data, index) 

print(s.iloc[0])      # Access by position: the first element is at position 0, and so on (s[0] on a labeled index is deprecated in recent pandas)

10


In [22]:
data = [10,20,30,40]
index = ['a','b','c','d'] 

s = pd.Series(data, index) 

print(s['a'])   # Access by index label: the first element has the label 'a'

10


In [23]:
print(s[['a', 'c', 'd']]) # Accessing multiple elements. 

a    10
c    30
d    40
dtype: int64
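
The two access styles above can be made explicit with `.loc` (labels) and `.iloc` (positions). The following is a small sketch using the same data; the result variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

first_by_position = s.iloc[0]   # position 0
first_by_label = s.loc['a']     # label 'a'

# Label slices include both endpoints; position slices exclude the stop.
label_slice = s.loc['b':'c']
position_slice = s.iloc[1:3]

print(first_by_position, first_by_label)
print(label_slice)
print(position_slice)
```

Using `.loc` and `.iloc` avoids ambiguity when the index itself contains integers.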


## DataFrame

The DataFrame constructor takes 5 arguments: data, index, columns, dtype, and copy.

A pandas DataFrame can be created from lists, dictionaries, Series, NumPy arrays, or another DataFrame.

In [24]:
df = pd.DataFrame()

print(df)

Empty DataFrame
Columns: []
Index: []


In [25]:
data = ['a', 'b', 'c', 'd']

df = pd.DataFrame(data)
print(df)

   0
0  a
1  b
2  c
3  d


We did not pass an index, so by default the rows are labeled 0 to 3. The column heading also defaults to 0.
Now let us pass the column names.

In [26]:
data = [['Stuart', 5], ['Jerry', 7], ['Rataouille', 9]]

df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

         Name  Age
0      Stuart    5
1       Jerry    7
2  Rataouille    9


So we made the columns with the specific heading that we wanted.

In [27]:
data = [['Stuart', 5], ['Jerry', 7], ['Rataouille', 9]]

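# Convert only the numeric column; dtype=float in the constructor also tries to convert the names, which raises an error in recent pandas versions
df = pd.DataFrame(data, columns=['Name', 'Age']).astype({'Age': float})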
print(df)

         Name  Age
0      Stuart  5.0
1       Jerry  7.0
2  Rataouille  9.0


As with Series, we can build a DataFrame from a Python **dictionary**, where each key becomes a column.

We can also pass an **index** for the rows.

In [32]:
data = {'Age':[10, 12, 7], 'Name':['Stuart', 'Jerry', 'Rataouille']}

df = pd.DataFrame(data, index = ['a', 'b', 'c'])
print(df)

   Age        Name
a   10      Stuart
b   12       Jerry
c    7  Rataouille


Another way of making a DataFrame is from a list of dictionaries.

In [33]:
data = [{'x':10, 'y':20}, {'x': 200, 'y': 350, 'z': 400}]

df = pd.DataFrame(data)
print(df)

     x    y      z
0   10   20    NaN
1  200  350  400.0


An important thing to notice here is that whenever pandas does not find data for a cell, it fills that cell with NaN (Not a Number).
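
As a rough illustration of working with those missing values, the sketch below uses `isna` to locate them and `fillna` to replace them. Filling with 0 is an arbitrary choice made for this example, not a recommendation.

```python
import pandas as pd

data = [{'x': 10, 'y': 20}, {'x': 200, 'y': 350, 'z': 400}]
df = pd.DataFrame(data)

missing_mask = df.isna()               # True where a value is missing
missing_per_column = df.isna().sum()   # number of missing values per column
filled = df.fillna(0)                  # replace missing values with 0 (arbitrary)

print(missing_per_column)
print(filled)
```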

We can also build a DataFrame from several Series.

In [37]:
data = {'First': pd.Series([1,2,3], index = ['a', 'b', 'c']),
       'Second': pd.Series([1,2,3,4], index = ['a','b','c', 'd'])}

df = pd.DataFrame(data)
print(df)

   First  Second
a    1.0       1
b    2.0       2
c    3.0       3
d    NaN       4
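
The list at the start of this section also mentions NumPy arrays and another DataFrame as sources. A minimal sketch of both follows; the array values and column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# From a 2-D NumPy array: each row of the array becomes a row of the DataFrame.
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(arr, columns=['x', 'y'])

# From another DataFrame: copy=True gives an independent copy,
# so changing the copy leaves the original untouched.
df_copy = pd.DataFrame(df_from_array, copy=True)
df_copy.loc[0, 'x'] = 100

print(df_from_array)
print(df_copy)
```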


### Accessing elements and performing operations on a DataFrame

Column Selection

In [39]:
data = [['Stuart', 5], ['Jerry', 7], ['Rataouille', 9]]

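# Convert only the numeric column; dtype=float in the constructor raises an error on the names in recent pandas versions
df = pd.DataFrame(data, columns=['Name', 'Age']).astype({'Age': float})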

print(df['Name'])

0        Stuart
1         Jerry
2    Rataouille
Name: Name, dtype: object


Adding Another Column in the DataFrame

In [41]:
data = [['Stuart', 5], ['Jerry', 7], ['Rataouille', 9]]

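# Convert only the numeric column; dtype=float in the constructor raises an error on the names in recent pandas versions
df = pd.DataFrame(data, columns=['Name', 'Age']).astype({'Age': float})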
df['Size'] = ['Small', 'Small', 'Small']

print(df)

         Name  Age   Size
0      Stuart  5.0  Small
1       Jerry  7.0  Small
2  Rataouille  9.0  Small


In [43]:
# If two or more columns have compatible data types (for example, both numeric), they can be added together.
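
As a small sketch of adding columns together, the cell below sums two numeric columns element by element and stores the result in a new column. The column names `Math`, `Science`, and `Total` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Math': [80, 65, 90], 'Science': [70, 85, 60]})

# The addition is performed row by row, aligned on the index.
df['Total'] = df['Math'] + df['Science']

print(df)
```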

Deleting a column

In [53]:
data = [['Stuart', 5, 'Small'], ['Jerry', 7, 'Small'], ['Rataouille', 9, 'Small']]

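# Convert only the numeric column; dtype=float in the constructor raises an error on the text columns in recent pandas versions
df = pd.DataFrame(data, columns=['Name', 'Age', 'Size']).astype({'Age': float})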

print(df)
print(" ")

# There are two ways to delete a column: the del statement and the pop method

del df['Name']
print(df)

print(" ")

df.pop('Size')
print(df)

         Name  Age   Size
0      Stuart  5.0  Small
1       Jerry  7.0  Small
2  Rataouille  9.0  Small
 
   Age   Size
0  5.0  Small
1  7.0  Small
2  9.0  Small
 
   Age
0  5.0
1  7.0
2  9.0


A row can be selected in three ways: 'loc' (by label), 'iloc' (by position), and slicing.

In [74]:
data = [['Stuart', 5, 'Small'], ['Jerry', 7, 'Small'], ['Rataouille', 9, 'Small']]

df = pd.DataFrame(data, columns=['Name', 'Age', 'Size'],index=['a', 'b', 'c'])

print(df)
print(" ")

#using loc
print(df.loc['a'])
print(" ")

#using iloc
print(df.iloc[1])
print(" ")

#using slicing (by position; a stop past the last row is clipped)
print(df[1:4])

         Name  Age   Size
a      Stuart    5  Small
b       Jerry    7  Small
c  Rataouille    9  Small
 
Name    Stuart
Age          5
Size     Small
Name: a, dtype: object
 
Name    Jerry
Age         7
Size    Small
Name: b, dtype: object
 
         Name  Age   Size
b       Jerry    7  Small
c  Rataouille    9  Small
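
The three approaches can be compared side by side. The sketch below reuses the same data and assumes nothing beyond `loc` and `iloc`; the result variable names are illustrative.

```python
import pandas as pd

data = [['Stuart', 5, 'Small'], ['Jerry', 7, 'Small'], ['Rataouille', 9, 'Small']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Size'], index=['a', 'b', 'c'])

by_position = df.iloc[1:4]     # positions 1 to 3; the stop past the end is clipped
by_label = df.loc['b':'c']     # labels 'b' to 'c', both included
one_cell = df.loc['b', 'Age']  # a single value: row 'b', column 'Age'

print(by_position)
print(by_label)
print(one_cell)
```

Here the positional and label-based slices select the same two rows.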


In [75]:
# Adding rows
df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

df = pd.concat([df, df2])  # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
print(df)
print(" ")

# Deleting rows: drop removes every row with the given label, so both rows labeled 0 are removed
df = df.drop(0)
print(df)



   a  b
0  1  2
1  3  4
0  5  6
1  7  8
 
   a  b
1  3  4
1  7  8
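
The repeated labels in the output above come from each frame keeping its own index. One possible way to avoid them, sketched here, is to pass `ignore_index=True` to `pd.concat` so the combined rows are renumbered; `drop(0)` then removes a single row.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the original labels and assigns 0..n-1.
combined = pd.concat([df, df2], ignore_index=True)

# With unique labels, drop(0) removes exactly one row.
trimmed = combined.drop(0)

print(combined)
print(trimmed)
```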
