## Pandas Library in Python

#### Pandas: DataFrame and Series
Pandas is a Python library for data manipulation, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional labeled array, while a DataFrame is a two-dimensional, size-mutable tabular structure with labeled rows and columns, whose columns may hold different data types.

In [2]:
import pandas as pd
import numpy as np

In [3]:
data =[1,2,3,4,5]
series=pd.Series(data)
print("Series \n",series)

Series 
 0    1
1    2
2    3
3    4
4    5
dtype: int64


In [4]:
# create a series from dictionary
data={'a':1,'b':2,'c':3}
series_dict=pd.Series(data)
print(series_dict)

a    1
b    2
c    3
dtype: int64


In [5]:
data=[1,2,3,4,5]
index=['a','b','c','d','e']
print(pd.Series(data,index=index))

a    1
b    2
c    3
d    4
e    5
dtype: int64
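
A custom index means values can be looked up by label as well as by position. The following is a minimal sketch that reuses the data and index from the cell above; the variable names (`s`, `by_label`, and so on) are only illustrative.

```python
import pandas as pd

# Same values and labels as the cell above
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

by_label = s.loc['c']          # label-based lookup
by_position = s.iloc[2]        # position-based lookup (third element)
label_slice = s.loc['b':'d']   # label slices include both endpoints

print(by_label, by_position)
print(label_slice)
```

Note that a label slice such as `'b':'d'` includes the end label, unlike a positional slice.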


In [6]:
## DataFrames
## create a DataFrame from a dictionary of lists
data = {
    'Name':['krish','aftab','tanweer'],
    'age':[25,22,24],
    'city':['newyork','araria','purnia']

}
df=pd.DataFrame(data)
print(df)
print(type(df))

      Name  age     city
0    krish   25  newyork
1    aftab   22   araria
2  tanweer   24   purnia
<class 'pandas.core.frame.DataFrame'>
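
A DataFrame can also be built row by row from a list of dictionaries. The sketch below reuses the names from the cell above; the missing `city` in the last row is an assumption added here to show how pandas fills gaps, and `rows` and `df_rows` are illustrative names.

```python
import pandas as pd

# Each dictionary is one row; the keys become column names
rows = [
    {'Name': 'krish', 'age': 25, 'city': 'newyork'},
    {'Name': 'aftab', 'age': 22, 'city': 'araria'},
    {'Name': 'tanweer', 'age': 24},  # 'city' left out on purpose
]
df_rows = pd.DataFrame(rows)

print(df_rows)
print(df_rows.dtypes)
```

Keys that are missing from a row show up as `NaN` in the resulting frame.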


In [12]:
df=pd.read_csv('example.csv')
df

      name  age
0    Krish   32
1    aftab   21
2  tanweer   24


In [16]:
## loc and iloc 
df.loc[1]

name    aftab
age        21
Name: 1, dtype: object

In [19]:
df.iloc[1]

name    aftab
age        21
Name: 1, dtype: object

In [21]:
df.at[1,'age']

21

In [22]:
df.at[2,'name']

'tanweer'

In [25]:
df.iat[1,1]

21
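
With the default integer index, `loc` and `iloc` (and `at` and `iat`) happen to return the same rows, which hides the difference between them. The sketch below assumes a hypothetical frame `people` with string row labels to make the distinction visible: `loc` and `at` work with labels, `iloc` and `iat` with integer positions.

```python
import pandas as pd

# Hypothetical frame: same names and ages as above, but with string row labels
people = pd.DataFrame(
    {'name': ['Krish', 'aftab', 'tanweer'], 'age': [32, 21, 24]},
    index=['x', 'y', 'z'],
)

row_by_label = people.loc['y']         # row whose label is 'y'
row_by_position = people.iloc[1]       # second row, whatever its label
age_by_label = people.at['y', 'age']   # single value by row and column label
age_by_position = people.iat[1, 1]     # single value by row and column position

print(row_by_label.equals(row_by_position))
print(age_by_label, age_by_position)
```

`at` and `iat` are the scalar counterparts of `loc` and `iloc`, intended for reading or writing one cell at a time.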

In [26]:
## Data manipulation with data frames
df

      name  age
0    Krish   32
1    aftab   21
2  tanweer   24


In [28]:
## add a city column to the data
df['city']=['araria','purnia','katihar']
df

      name  age     city
0    Krish   32   araria
1    aftab   21   purnia
2  tanweer   24  katihar


In [29]:
df['city']

0     araria
1     purnia
2    katihar
Name: city, dtype: object
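
Selecting with a single column name returns a Series, as in the output above. Passing a list of names returns a DataFrame instead. A small sketch, rebuilding the same data under the illustrative name `df_demo`:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'name': ['Krish', 'aftab', 'tanweer'],
    'age': [32, 21, 24],
    'city': ['araria', 'purnia', 'katihar'],
})

one_column = df_demo['city']             # Series
as_frame = df_demo[['city']]             # DataFrame with a single column
two_columns = df_demo[['name', 'city']]  # DataFrame with two columns

print(type(one_column).__name__, type(as_frame).__name__)
print(two_columns)
```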

In [39]:
## remove a column
## drop() returns a new DataFrame and leaves df unchanged unless inplace=True is passed
df.drop('age',axis=1,inplace=True)

In [37]:
df['age'] ## raises KeyError because the in-place drop already removed the column

KeyError: 'age'

In [40]:
## 'age' is now permanently deleted from df
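
The behaviour of `drop` is easier to see on a fresh frame. The sketch below assumes a rebuilt copy of the data named `df_demo`. It shows that `drop` returns a new DataFrame by default, that dropping a column that is already gone raises `KeyError` unless `errors='ignore'` is passed, and that reassignment is an alternative to `inplace=True`.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'name': ['Krish', 'aftab', 'tanweer'],
    'age': [32, 21, 24],
    'city': ['araria', 'purnia', 'katihar'],
})

# By default drop() returns a new DataFrame; df_demo itself is unchanged
without_age = df_demo.drop('age', axis=1)
print('age' in df_demo.columns, 'age' in without_age.columns)

# Dropping a column that no longer exists raises KeyError ...
try:
    without_age.drop('age', axis=1)
    raised = False
except KeyError:
    raised = True
print(raised)

# ... unless errors='ignore' is passed
still_fine = without_age.drop('age', axis=1, errors='ignore')

# Make the change permanent by reassigning the result
df_demo = df_demo.drop(columns='age')
print(list(df_demo.columns))
```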

In [None]:
## increment every value in a column
## (this needs the 'age' column, so run it before 'age' is dropped)
df['age']=df['age']+1

## Reading data from various data sources using pandas

In [1]:
import pandas as pd
from io import StringIO

In [2]:
Data = '{"employee_name": "James", "email": "james@gmail.com", "job_profile": [{"title1":"Team Lead", "title2":"Sr. Developer"}]}'
df=pd.read_json(StringIO(Data))

In [3]:
df.head()

  employee_name            email                                        job_profile
0         James  james@gmail.com  {'title1': 'Team Lead', 'title2': 'Sr. Develop...


In [4]:
df.to_json()

'{"employee_name":{"0":"James"},"email":{"0":"james@gmail.com"},"job_profile":{"0":{"title1":"Team Lead","title2":"Sr. Developer"}}}'
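
In the frame above, the nested `job_profile` entry stays packed inside a single cell. One possible way to spread it into columns is `pd.json_normalize`. This is a sketch rather than the only approach; `record`, `flat` and `records_json` are illustrative names.

```python
import json
import pandas as pd

Data = '{"employee_name": "James", "email": "james@gmail.com", "job_profile": [{"title1":"Team Lead", "title2":"Sr. Developer"}]}'
record = json.loads(Data)

# One row per entry in job_profile; the top-level fields are carried along as metadata
flat = pd.json_normalize(record, record_path='job_profile', meta=['employee_name', 'email'])
print(flat)

# orient='records' writes one JSON object per row instead of one per column
records_json = flat.to_json(orient='records')
print(records_json)
```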

In [6]:
df=pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data',header=None)

In [7]:
df.head()

   0      1     2     3     4    5     6     7     8     9    10    11    12    13
0  1  14.23  1.71  2.43  15.6  127   2.8  3.06  0.28  2.29  5.64  1.04  3.92  1065
1  1   13.2  1.78  2.14  11.2  100  2.65  2.76  0.26  1.28  4.38  1.05   3.4  1050
2  1  13.16  2.36  2.67  18.6  101   2.8  3.24   0.3  2.81  5.68  1.03  3.17  1185
3  1  14.37  1.95   2.5  16.8  113  3.85  3.49  0.24  2.18   7.8  0.86  3.45  1480
4  1  13.24  2.59  2.87  21.0  118   2.8  2.69  0.39  1.82  4.32  1.04  2.93   735


In [8]:
df.to_csv("wine.csv")
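
By default `to_csv` also writes the row index as a first, unnamed column, which comes back as `Unnamed: 0` when the file is read again. The sketch below uses a small hypothetical frame and in-memory strings instead of files to show the effect of `index=False`.

```python
from io import StringIO
import pandas as pd

small = pd.DataFrame({'name': ['Krish', 'aftab'], 'age': [32, 21]})

# With no path, to_csv returns the CSV text as a string
with_index = small.to_csv()                # index written as an unnamed first column
without_index = small.to_csv(index=False)  # index left out

reread_with = pd.read_csv(StringIO(with_index))
reread_without = pd.read_csv(StringIO(without_index))

print(list(reread_with.columns))
print(list(reread_without.columns))
```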

## Reading an HTML file

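In [9]:
## read_html returns a list of DataFrames, one for each table found on the page
url='https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/'
df=pd.read_html(url)

In [11]:
## the first table on the page
df[0]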

                             Bank NameBank       CityCity  StateSt  CertCert              Acquiring InstitutionAI  Closing DateClosing  FundFund
  0  Republic First Bank dba Republic Bank   Philadelphia       PA     27332    Fulton Bank, National Association       April 26, 2024     10546
  1                          Citizens Bank       Sac City       IA      8758            Iowa Trust & Savings Bank     November 3, 2023     10545
  2               Heartland Tri-State Bank        Elkhart       KS     25851               Dream First Bank, N.A.        July 28, 2023     10544
  3                    First Republic Bank  San Francisco       CA     59017            JPMorgan Chase Bank, N.A.          May 1, 2023     10543
  4                         Signature Bank       New York       NY     57053                  Flagstar Bank, N.A.       March 12, 2023     10540
...                                    ...            ...      ...       ...                                  ...                  ...       ...
564                     Superior Bank, FSB       Hinsdale       IL     32646                Superior Federal, FSB        July 27, 2001      6004
565                    Malta National Bank          Malta       OH      6629                    North Valley Bank          May 3, 2001      4648
566        First Alliance Bank & Trust Co.     Manchester       NH     34264  Southern New Hampshire Bank & Trust     February 2, 2001      4647
567      National State Bank of Metropolis     Metropolis       IL      3815              Banterra Bank of Marion    December 14, 2000      4646
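
The closing dates in this table arrive as plain strings. A possible next step is to convert them with `pd.to_datetime` so that the rows can be sorted or filtered by date. The sketch below copies three rows from the output above into a small frame named `banks`; the new column name `closing_date` is an assumption made for illustration.

```python
import pandas as pd

# Three rows copied from the output above, using the column names pandas returned
banks = pd.DataFrame({
    'Bank NameBank': [
        'Republic First Bank dba Republic Bank',
        'Citizens Bank',
        'Heartland Tri-State Bank',
    ],
    'Closing DateClosing': ['April 26, 2024', 'November 3, 2023', 'July 28, 2023'],
})

# Parse strings such as "April 26, 2024" into datetime values
banks['closing_date'] = pd.to_datetime(banks['Closing DateClosing'], format='%B %d, %Y')

print(banks.dtypes)
print(banks.sort_values('closing_date')['Bank NameBank'].tolist())
```

The `%B` directive matches full month names, so this format assumes English month names as shown in the table.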


In [20]:
df_xlsx=pd.read_excel('data.xlsx')


In [21]:
df_xlsx

   Name  Age
0  jack   23
1   ali   25
2  rahi   43
3  jyan   54
