# **Pandas**

### Name derived from "panel data"
### Read, Edit, and Manipulate Data
### Third-Party Library for Python





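### Pandas handles:

* 1D --> Series
* 2D --> DataFrame
* 3D --> Panel (deprecated and later removed from pandas; current versions represent higher-dimensional data with a MultiIndex DataFrame)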

### **Data is divided into 2 types**
### 1. **Structured Data:** Excel, CSV, databases, etc.
### 2. **Unstructured Data:** images, audio, video, text corpora, etc.


## **1. Install pandas library**

In [None]:
!pip install pandas



## **2. Import pandas**

In [6]:
import pandas as pd

## **3. Print vs Non-Print:**
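* In a notebook, a Series or DataFrame written on the last line of a cell is displayed automatically, so `print` is not required.
* `print` always produces the plain-text form. Depending on the notebook front end, automatic display may render a formatted table instead. Both forms show the index values.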

In [None]:
# Using print
S = pd.Series([1,2,3])
print(S)

0    1
1    2
2    3
dtype: int64


In [None]:
# Not using print
S = pd.Series([1,2,3])
S

|   | 0 |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |


## **1D Series**

### A Series is a one-dimensional, array-like object containing a sequence of values and an associated array of data labels, called its index.
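
As a minimal sketch of the two parts named in this definition, the cell below builds a small Series and reads its values and its index separately. The labels `'x'`, `'y'`, `'z'` and the numbers are made up for illustration.

```python
import pandas as pd

# Illustrative data: three values with hand-picked labels.
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

values = s.to_numpy()   # the sequence of values, as a NumPy array
labels = list(s.index)  # the associated data labels (the index)
picked = s['y']         # look up one value by its label

print(values)
print(labels)
print(picked)
```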

### **Various methods for creating a Series**

### **1. Creating a Series from List:**

In [None]:
import pandas as pd
data = [10,20,30,40,50]
series = pd.Series(data)  # note the capital S: pd.Series, not pd.series
series

|   | 0 |
|---|---|
| 0 | 10 |
| 1 | 20 |
| 2 | 30 |
| 3 | 40 |
| 4 | 50 |


### **2. Creating a Series with Dictionary**

In [None]:
import pandas as pd
data = {'a':1,'b':2,'c':3}
series = pd.Series(data)
series

|   | 0 |
|---|---|
| a | 1 |
| b | 2 |
| c | 3 |
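
When a dictionary is combined with an explicit `index`, pandas selects and orders the dictionary keys by that index. The sketch below assumes an index containing a label (`'d'`) that is not in the dictionary, to show what happens to missing keys.

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}

# The index picks keys 'c' and 'a' in that order.
# 'd' is not a key in the dictionary, so its value becomes NaN.
series = pd.Series(data, index=['c', 'a', 'd'])
print(series)
```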


### **3. Creating a Series with custom index**

In [None]:
values = [100,200,300]
idx = ['A','B','C']
series = pd.Series(values, index = idx)
series

|   | 0 |
|---|---|
| A | 100 |
| B | 200 |
| C | 300 |


### **4. Creating a Series with Scalar value**

In [None]:
scalar = 5
idy = ['P','Q','R']
series = pd.Series(scalar, index = idy)
series

|   | 0 |
|---|---|
| P | 5 |
| Q | 5 |
| R | 5 |


### **5. Creating a Series with different data types**

In [None]:
data = [1,2.2,'cat',False]
series = pd.Series(data)
series

|   | 0 |
|---|---|
| 0 | 1 |
| 1 | 2.2 |
| 2 | cat |
| 3 | False |
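
A Series holding values of different types cannot use a single numeric dtype, so pandas falls back to the generic `object` dtype. The short check below compares it with an all-integer Series. The variable names are illustrative.

```python
import pandas as pd

mixed = pd.Series([1, 2.2, 'cat', False])
numbers = pd.Series([1, 2, 3])

print(mixed.dtype)    # object: the elements are stored as generic Python objects
print(numbers.dtype)  # an integer dtype, since every element is an int
```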


### **6. Creating a Series with Date-Time index**

In [None]:
dates = pd.date_range('2024-08-01',periods = 5)
data = [100,200,300,400,500]
series = pd.Series(data,index = dates)   # the number of values must equal the number of periods
series

|   | 0 |
|---|---|
| 2024-08-01 | 100 |
| 2024-08-02 | 200 |
| 2024-08-03 | 300 |
| 2024-08-04 | 400 |
| 2024-08-05 | 500 |
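
One practical benefit of a date-time index is that rows can be selected with date strings. The following is a small sketch using `.loc`. The chosen dates are arbitrary.

```python
import pandas as pd

dates = pd.date_range('2024-08-01', periods=5)
series = pd.Series([100, 200, 300, 400, 500], index=dates)

one_day = series.loc['2024-08-03']              # a single value, looked up by date
window = series.loc['2024-08-02':'2024-08-04']  # label slices include both endpoints

print(one_day)
print(window)
```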


### **7. Creating a Series with a complex-number data type**

In [None]:
data = [complex(1,2),complex(3,4),complex(5,6)]
series = pd.Series(data)
series

|   | 0 |
|---|---|
| 0 | 1.0+2.0j |
| 1 | 3.0+4.0j |
| 2 | 5.0+6.0j |


## **2D DataFrame**


### **Various methods for creating a DataFrame**

### **1. Creating a DataFrame from Dictionary**

In [None]:
import pandas as pd
data = {
         'Name': ['Bhargav','Praveen','Sai Kiran'],
         'Age' : [22,21,22],
         'City' : ['Hyd','Hyd','Vizag']
       }

df = pd.DataFrame(data)    # note the capitals: pd.DataFrame, not pd.dataframe
df

|   | Name | Age | City |
|---|---|---|---|
| 0 | Bhargav | 22 | Hyd |
| 1 | Praveen | 21 | Hyd |
| 2 | Sai Kiran | 22 | Vizag |


### **2. Creating a DataFrame from List of Lists**

In [None]:
lst = [
        ['ajay',22,'NYC'],          # ages as integers, so the Age column is numeric
        ['bhadra',32,'Seoul'],
        ['charan',42,'Hyderabad']
      ]
col = ['Name','Age','City']
df = pd.DataFrame(lst, columns = col)
df

|   | Name | Age | City |
|---|---|---|---|
| 0 | ajay | 22 | NYC |
| 1 | bhadra | 32 | Seoul |
| 2 | charan | 42 | Hyderabad |


### **3. Creating a DataFrame from a Numpy Array**

In [16]:
import pandas as pd
import numpy as np
data = np.array([
                 ['a1',1,'A run'],
                 ['b2',2,'B run'],
                 ['c3',3,'C run']
                ])

run = ['id','age','run type']
df = pd.DataFrame(data, columns = run)
df

|   | id | age | run type |
|---|---|---|---|
| 0 | a1 | 1 | A run |
| 1 | b2 | 2 | B run |
| 2 | c3 | 3 | C run |


In [17]:
import pandas as pd
import numpy as np
data = np.array([
                 ['a1',1,'A run'],
                 ['b2',2,'B run'],
                 ['c3',3,'C run']
                ])

run = ['id','age','run type']
idx = ['i1','i2','i3']
df = pd.DataFrame(data, index = idx, columns = run)
df

|    | id | age | run type |
|---|---|---|---|
| i1 | a1 | 1 | A run |
| i2 | b2 | 2 | B run |
| i3 | c3 | 3 | C run |
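
One caveat with this approach: a NumPy array has a single dtype, so mixing strings and integers in `np.array` converts every element to a string. The sketch below checks this on a shortened copy of the data and converts the `age` column back to integers. The variable name `ages_as_text` is illustrative.

```python
import numpy as np
import pandas as pd

data = np.array([['a1', 1, 'A run'],
                 ['b2', 2, 'B run']])
df = pd.DataFrame(data, columns=['id', 'age', 'run type'])

print(data.dtype)                   # a Unicode string dtype: the integers became text
ages_as_text = df['age'].tolist()   # the ages are strings at this point
print(ages_as_text)

# Convert the column to integers when numeric behaviour is needed.
df['age'] = df['age'].astype(int)
print(df['age'].sum())
```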


### **4. Creating a DataFrame from Multiple Series**

In [9]:
s1 = pd.Series([1,2,3])
s2 = pd.Series([4,5,6])
s3 = pd.Series([7,8,9])

df = pd.concat([s1,s2,s3],axis = 1)
df

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1 | 4 | 7 |
| 1 | 2 | 5 | 8 |
| 2 | 3 | 6 | 9 |
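
Because the Series are unnamed, the resulting columns get the default labels 0, 1, 2. As a possible refinement, the `keys` argument of `pd.concat` can supply column labels. The names `'x'`, `'y'`, `'z'` below are placeholders.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([4, 5, 6])
s3 = pd.Series([7, 8, 9])

# keys= provides one label per Series; with axis=1 they become the column names.
df = pd.concat([s1, s2, s3], axis=1, keys=['x', 'y', 'z'])
print(df)
```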


### **5. Creating a DataFrame with default index**

In [None]:
data = {
           'Name' : ['Jai','Mahesh','Bob'],
           'Age'  : [9,8,24]
       }

df = pd.DataFrame(data)
df

|   | Name | Age |
|---|---|---|
| 0 | Jai | 9 |
| 1 | Mahesh | 8 |
| 2 | Bob | 24 |


### **6. Creating an Empty DataFrame**

In [None]:
df = pd.DataFrame()
df

In [None]:
df = pd.DataFrame(columns = ['Name','Age'])
df

|   | Name | Age |
|---|---|---|


### **7. Creating a DataFrame by concatenating an Empty DataFrame and a Filled DataFrame**

In [15]:
df1 = pd.DataFrame(columns = ['Name','Age'])
df2 = pd.DataFrame(
                    [{'Name':'BSB','Age':22},
                     {'Name':'CSC','Age':21}]
                  )
df = pd.concat([df1,df2])
df


|   | Name | Age |
|---|---|---|
| 0 | BSB | 22 |
| 1 | CSC | 21 |


### **8. Creating a DataFrame with MultiIndex**

In [None]:
index = pd.MultiIndex.from_tuples(
                                  [
                                   ('A','First'),
                                   ('A','Second'),
                                   ('B','First'),
                                   ('B','Second'),
                                  ],
                                  names = ['Category','Subcategory']   # names (plural) is the documented parameter
                                 )

data = {'value': [1,2,3,4]}
df = pd.DataFrame(data,index = index)
df

| Category | Subcategory | value |
|---|---|---|
| A | First | 1 |
| A | Second | 2 |
| B | First | 3 |
| B | Second | 4 |
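
A MultiIndex is mostly useful for selecting rows by level. The sketch below rebuilds the same frame and shows three common lookups. The variable names `rows_a`, `single`, and `firsts` are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 'First'), ('A', 'Second'), ('B', 'First'), ('B', 'Second')],
    names=['Category', 'Subcategory'],
)
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)

rows_a = df.loc['A']                          # all rows in category A
single = df.loc[('B', 'Second'), 'value']     # one cell, addressed by the full key
firsts = df.xs('First', level='Subcategory')  # every 'First' row across categories

print(rows_a)
print(single)
print(firsts)
```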


### **9. Creating a DataFrame with Date-Time Index**

In [None]:
dates = pd.date_range('2024-08-01',periods = 3)
data = {'value': [10,20,30],'cost':[100,200,300]}

df = pd.DataFrame(data,index = dates)
df

|   | value | cost |
|---|---|---|
| 2024-08-01 | 10 | 100 |
| 2024-08-02 | 20 | 200 |
| 2024-08-03 | 30 | 300 |


### **10. Creating a DataFrame from Dictionary of Series**

In [7]:
s1 = pd.Series([608,610])
s2 = pd.Series(['BSB','CSC'])
s3 = pd.Series([22,21])

data = {'ID':s1,'Name':s2,'Age':s3}
df = pd.DataFrame(data)
df

|   | ID | Name | Age |
|---|---|---|---|
| 0 | 608 | BSB | 22 |
| 1 | 610 | CSC | 21 |
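
When the Series in the dictionary do not share the same index, pandas aligns them by label and fills the gaps with NaN. The sketch below assumes two Series that overlap only at label 1. The data is made up.

```python
import pandas as pd

# Illustrative data: the two Series share only the label 1.
ids = pd.Series([608, 610], index=[0, 1])
ages = pd.Series([21, 30], index=[1, 2])

# The resulting index is the union of both indexes; missing cells become NaN.
df = pd.DataFrame({'ID': ids, 'Age': ages})
print(df)
```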


### **11. Creating a DataFrame from Random data**

In [21]:
import numpy as np
data = np.random.rand(3,4)
col = ['a','b','c','d']

df = pd.DataFrame(data,columns = col)
df

|   | a | b | c | d |
|---|---|---|---|---|
| 0 | 0.935491 | 0.348684 | 0.077313 | 0.197936 |
| 1 | 0.144819 | 0.736977 | 0.536088 | 0.173722 |
| 2 | 0.229187 | 0.734953 | 0.916312 | 0.07086 |


### **12. Creating a DataFrame with specified data type**

In [25]:
data = {
        'num': [1,2,3],
        'flt': [0.5,0.6,0.7],
        'strng': ['a','b','c']
       }

df = pd.DataFrame(data)

df = df.astype({
                 'num':int,
                 'flt':float,
                 'strng': str
})

print(df.dtypes)
df

num        int64
flt      float64
strng     object
dtype: object


|   | num | flt | strng |
|---|---|---|---|
| 0 | 1 | 0.5 | a |
| 1 | 2 | 0.6 | b |
| 2 | 3 | 0.7 | c |
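
In the cell above the columns already had the requested types, so `astype` changes nothing visible. A case where it matters is numbers stored as text. The sketch below assumes such input. The column names are reused for illustration.

```python
import pandas as pd

# Illustrative input: numeric values that arrived as strings.
df = pd.DataFrame({'num': ['1', '2', '3'], 'flt': ['0.5', '0.6', '0.7']})

# astype returns a new DataFrame with each listed column converted.
converted = df.astype({'num': int, 'flt': float})
total = converted['num'].sum()

print(converted.dtypes)
print(total)
```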


### **13. Creating a DataFrame from a SQL Query**

In [28]:
import pandas as pd
import sqlite3

con = sqlite3.connect(':memory:')

con.execute('''
            Create table people(
             ID integer primary key,
             Name text,
             Age integer
            )
            '''
           )

con.execute('''
            Insert into people(name,age)
            values('ramu',21),('remo',22),('aparichitudu',23)
            '''
           )

query = 'select * from people'

df = pd.read_sql_query(query,con)
df

|   | ID | Name | Age |
|---|---|---|---|
| 0 | 1 | ramu | 21 |
| 1 | 2 | remo | 22 |
| 2 | 3 | aparichitudu | 23 |
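
Queries often need a filter value that comes from a variable. A minimal sketch, assuming the same `people` table and a hypothetical `min_age` threshold, passes the value through the `params` argument of `pd.read_sql_query` rather than building the SQL string by hand.

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER)')
con.executemany(
    'INSERT INTO people (Name, Age) VALUES (?, ?)',
    [('ramu', 21), ('remo', 22), ('aparichitudu', 23)],
)

min_age = 22  # hypothetical filter value

# The ? placeholder is filled in by the database driver from params.
df = pd.read_sql_query(
    'SELECT Name, Age FROM people WHERE Age >= ? ORDER BY Age',
    con,
    params=(min_age,),
)
con.close()
print(df)
```

Letting the driver substitute the value avoids quoting mistakes and SQL injection when the value comes from user input.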
