## Pandas: DataFrame and Series
Pandas is a powerful data manipulation library in Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [1]:
import pandas as pd

In [3]:
## Series 
## A Pandas Series is a one-dimensional array-like object that can hold any data type. It is similar to a column in a table.
data=[1,2,3,4,5]
series=pd.Series(data)
print("Series:\n",series)
print(type(series))

Series:
 0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>
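A Series pairs its values with an index. The short sketch below (assuming only pandas is installed) rebuilds the Series from above and inspects its parts: the default integer index, the underlying NumPy values, and the inferred dtype.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])

# The default index is a RangeIndex 0..n-1, matching the labels printed above
print(series.index)
# to_numpy() exposes the stored values as a NumPy array
print(series.to_numpy())
# The dtype pandas inferred from the Python integers
print(series.dtype)
```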


In [4]:
## Create a Series from a dictionary
data={'a':1,'b':2,'c':3}
series_dict=pd.Series(data)
print(series_dict)

a    1
b    2
c    3
dtype: int64
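When a Series is built from a dictionary, the keys become the index. If you also pass `index=`, pandas looks up each requested label in the dictionary; this sketch shows that a label missing from the dictionary (the illustrative `'z'`) becomes NaN, which turns the dtype into float.

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}

# Labels are taken in the order given; 'z' is not a key, so its value is NaN
s = pd.Series(data, index=['c', 'a', 'z'])
print(s)
```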


In [5]:
## Create a Series with a custom index
data=[10,20,30]
index=['a','b','c']
pd.Series(data,index=index)

a    10
b    20
c    30
dtype: int64
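With custom labels in place, an element can be fetched by label or by position. A minimal sketch using the same Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = s['b']          # label-based lookup
by_label_loc = s.loc['b']  # explicit label-based accessor
by_position = s.iloc[1]    # position-based accessor (second element)
subset = s[['a', 'c']]     # a list of labels returns a smaller Series
print(by_label, by_label_loc, by_position)
print(subset)
```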

In [6]:
## DataFrame
## Create a DataFrame from a dictionary of lists
data={
    'Name':['krish','John','sudeep'],
    'Age':[25,30,45],
    'City':['Bangalore','New York','Florida']
}
df=pd.DataFrame(data)
print(df)
print(type(df))

     Name  Age       City
0   krish   25  Bangalore
1    John   30   New York
2  sudeep   45    Florida
<class 'pandas.core.frame.DataFrame'>
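Two details of the dictionary-of-lists constructor are worth seeing in code: `columns=` picks and orders which keys become columns, and every list must have the same length. The mismatched two-row example in this sketch is illustrative only.

```python
import pandas as pd

data = {
    'Name': ['krish', 'John', 'sudeep'],
    'Age': [25, 30, 45],
    'City': ['Bangalore', 'New York', 'Florida'],
}

# columns= selects and orders the dictionary keys to use
df_subset = pd.DataFrame(data, columns=['City', 'Name'])
print(df_subset)

# Lists of different lengths cannot form a rectangular table
try:
    pd.DataFrame({'Name': ['krish', 'John'], 'Age': [25]})
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```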


In [7]:
## Convert the DataFrame to a NumPy array (mixed column types give dtype=object)
import numpy as np
np.array(df)

array([['krish', 25, 'Bangalore'],
       ['John', 30, 'New York'],
       ['sudeep', 45, 'Florida']], dtype=object)
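`DataFrame.to_numpy()` is the method pandas provides for the same conversion. As the sketch below shows, the resulting dtype depends on which columns you convert: mixed text and numbers fall back to `object`, while a purely numeric selection keeps an integer dtype.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['krish', 'John', 'sudeep'],
                   'Age': [25, 30, 45],
                   'City': ['Bangalore', 'New York', 'Florida']})

mixed = df.to_numpy()          # strings and ints together -> dtype=object
ages = df[['Age']].to_numpy()  # one numeric column -> integer dtype, shape (3, 1)
print(mixed.dtype, ages.dtype, ages.shape)
```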

In [9]:
## Create a DataFrame from a list of dictionaries
data=[
    {'Name':'Krish','Age':32,'City':'Bangalore'},
    {'Name':'Sudeep','Age':25,'City':'Florida'},
    {'Name':'John','Age':45,'City':'New York'}
]
df=pd.DataFrame(data)
print(df)
print(type(df))

     Name  Age       City
0   Krish   32  Bangalore
1  Sudeep   25    Florida
2    John   45   New York
<class 'pandas.core.frame.DataFrame'>
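With a list of dictionaries, each dictionary becomes one row, and the records do not all need the same keys. In this sketch the second record deliberately omits `'City'`, so pandas fills that cell with NaN.

```python
import pandas as pd

records = [
    {'Name': 'Krish', 'Age': 32, 'City': 'Bangalore'},
    {'Name': 'Sudeep', 'Age': 25},  # no 'City' key in this record
]
df = pd.DataFrame(records)
print(df)
```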


In [10]:
df=pd.read_csv('sales_data.csv')
df.head()

,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal
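`sales_data.csv` is not included here, so this sketch feeds `pd.read_csv` a two-row stand-in (copied from the output above) through `io.StringIO`. The `parse_dates=['Date']` argument is an optional addition, not part of the original cell; it converts the Date column from strings to datetime values.

```python
import io
import pandas as pd

# Two rows copied from the sales output above, standing in for sales_data.csv
csv_text = """Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
"""

# parse_dates turns the Date column into datetime64 values instead of strings
sales = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])
print(sales.head(1))  # head(n) returns the first n rows (n defaults to 5)
print(sales.dtypes)
```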


In [11]:
df.tail(5)

,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
235,10236,2024-08-23,Home Appliances,Nespresso Vertuo Next Coffee and Espresso Maker,1,159.99,159.99,Europe,PayPal
236,10237,2024-08-24,Clothing,Nike Air Force 1 Sneakers,3,90.0,270.0,Asia,Debit Card
237,10238,2024-08-25,Books,The Handmaid's Tale by Margaret Atwood,3,10.99,32.97,North America,Credit Card
238,10239,2024-08-26,Beauty Products,Sunday Riley Luna Sleeping Night Oil,1,55.0,55.0,Europe,PayPal
239,10240,2024-08-27,Sports,Yeti Rambler 20 oz Tumbler,2,29.99,59.98,Asia,Credit Card
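`tail` keeps the original row labels, which is why the output above shows labels 235 to 239. The sketch below uses a hypothetical 240-row frame (only a Transaction ID column) in place of the real sales data to show this, along with `shape` for checking the size.

```python
import pandas as pd

# Hypothetical stand-in: 240 transaction IDs, like the sales data above
df = pd.DataFrame({'Transaction ID': range(10001, 10241)})

print(df.shape)        # (number of rows, number of columns)
last = df.tail(5)      # the last 5 rows, original index labels preserved
print(last.index.tolist())
```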


In [14]:
### Accessing Data from a DataFrame
data={
    'Name':['krish','John','sudeep'],
    'Age':[25,30,45],
    'City':['Bangalore','New York','Florida']
}
df=pd.DataFrame(data)

In [15]:
df

     Name  Age       City
0   krish   25  Bangalore
1    John   30   New York
2  sudeep   45    Florida


In [17]:
type(df['Name'])

pandas.core.series.Series
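Selecting with a single column name returns a Series; selecting with a list of names returns a DataFrame, even if the list has only one name. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['krish', 'John', 'sudeep'],
                   'Age': [25, 30, 45],
                   'City': ['Bangalore', 'New York', 'Florida']})

col = df['Name']            # single label -> Series
sub = df[['Name', 'City']]  # list of labels -> DataFrame with just those columns
one = df[['Name']]          # a one-item list still gives a DataFrame
print(type(col).__name__, type(sub).__name__, sub.shape)
```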

In [24]:
## Select rows by label, from label 0 through the last row
df.loc[0:]

     Name  Age       City
0   krish   25  Bangalore
1    John   30   New York
2  sudeep   45    Florida
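Slicing behaves differently under `loc` and `iloc`: a `loc` slice includes both end labels, while an `iloc` slice excludes the end position, like a Python list. A minimal sketch on the same three-row frame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['krish', 'John', 'sudeep'],
                   'Age': [25, 30, 45],
                   'City': ['Bangalore', 'New York', 'Florida']})

by_label = df.loc[0:1]                # labels 0 AND 1 (end label included)
by_position = df.iloc[0:1]            # position 0 only (end position excluded)
cells = df.loc[0:1, ['Name', 'Age']]  # rows and columns selected by label together
print(by_label, by_position, cells, sep="\n\n")
```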


In [23]:
## Access an element by position: row 0, column 2
df.iloc[0, 2]

'Bangalore'
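Passing the row and column to one indexer call selects a single cell directly; `iloc` takes positions and `loc` takes labels. The sketch also shows `:` selecting every row of one column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['krish', 'John', 'sudeep'],
                   'Age': [25, 30, 45],
                   'City': ['Bangalore', 'New York', 'Florida']})

pos = df.iloc[0, 2]        # row position 0, column position 2
lab = df.loc[0, 'City']    # row label 0, column label 'City'
first_col = df.iloc[:, 0]  # every row of the first column (Name)
print(pos, lab)
print(first_col)
```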

In [29]:
## Access a single element by row label and column name using at
df.at[2,'Name']

'sudeep'

In [30]:
## Access a single element by row and column position using iat
df.iat[2,2]

'Florida'
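`at` and `iat` can also write a single cell in place. In this sketch the new values (`'Miami'` and `26`) are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['krish', 'John', 'sudeep'],
                   'Age': [25, 30, 45],
                   'City': ['Bangalore', 'New York', 'Florida']})

df.at[2, 'City'] = 'Miami'  # row label 2, column label 'City' (example value)
df.iat[0, 1] = 26           # row position 0, column position 1 ('Age')
print(df)
```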

In [31]:
df

     Name  Age       City
0   krish   25  Bangalore
1    John   30   New York
2  sudeep   45    Florida


In [32]:
### Data Manipulation with DataFrame
df

     Name  Age       City
0   krish   25  Bangalore
1    John   30   New York
2  sudeep   45    Florida


In [35]:
## Adding a column
df['Salary']=[50000,60000,70000]

In [36]:
df

     Name  Age       City  Salary
0   krish   25  Bangalore   50000
1    John   30   New York   60000
2  sudeep   45    Florida   70000
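A new column can also be computed from existing columns, for every row at once. The sketch contrasts direct assignment, which changes `df`, with `assign()`, which returns a new DataFrame. The `Bonus` and `Senior` columns are hypothetical examples, not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['krish', 'John', 'sudeep'],
                   'Age': [25, 30, 45],
                   'Salary': [50000, 60000, 70000]})

# Derived column, computed element-wise from Salary (hypothetical name)
df['Bonus'] = df['Salary'] // 10

# assign() returns a NEW DataFrame with the extra column; df is unchanged
flagged = df.assign(Senior=df['Age'] >= 40)
print(flagged)
```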


In [37]:
## Remove a column (returns a new DataFrame; df itself is not modified)
df.drop('Salary',axis=1)

     Name  Age       City
0   krish   25  Bangalore
1    John   30   New York
2  sudeep   45    Florida


In [38]:
df

     Name  Age       City  Salary
0   krish   25  Bangalore   50000
1    John   30   New York   60000
2  sudeep   45    Florida   70000


In [39]:
## inplace=True modifies df directly (and returns None)
df.drop('Salary',axis=1,inplace=True)

In [40]:
df

     Name  Age       City
0   krish   25  Bangalore
1    John   30   New York
2  sudeep   45    Florida
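An alternative to `inplace=True` is to reassign the result. The sketch below also uses the `columns=` keyword in place of `axis=1`, and `errors='ignore'` to skip names that are no longer present instead of raising a `KeyError`.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['krish', 'John', 'sudeep'],
                   'Age': [25, 30, 45],
                   'City': ['Bangalore', 'New York', 'Florida'],
                   'Salary': [50000, 60000, 70000]})

# Equivalent to df.drop('Salary', axis=1), with the result reassigned to df
df = df.drop(columns=['Salary'])

# Dropping it again would normally raise KeyError; errors='ignore' makes it a no-op
df = df.drop(columns=['Salary'], errors='ignore')
print(df.columns.tolist())
```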


In [41]:
## Add 1 to every value in the Age column
df['Age']=df['Age']+1

In [42]:
df

     Name  Age       City
0   krish   26  Bangalore
1    John   31   New York
2  sudeep   46    Florida
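Column arithmetic is vectorized, and `+=` is a shorter spelling of the same update. When two Series are combined, pandas matches rows by index label rather than by position. In the sketch, the `bonus_years` Series is hypothetical: it is labeled 1 to 3, so labels 0 and 3, which appear in only one operand, produce NaN.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['krish', 'John', 'sudeep'], 'Age': [26, 31, 46]})

# Same effect as df['Age'] = df['Age'] + 1
df['Age'] += 1

# Hypothetical Series whose labels only partly overlap df's index (0, 1, 2)
bonus_years = pd.Series([10, 10, 10], index=[1, 2, 3])
combined = df['Age'] + bonus_years  # aligned by label; unmatched labels -> NaN
print(combined)
```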


In [43]:
## Drop the row with index label 0 (returns a new DataFrame; df is unchanged)
df.drop(0)

     Name  Age      City
1    John   31  New York
2  sudeep   46   Florida


In [44]:
df

     Name  Age       City
0   krish   26  Bangalore
1    John   31   New York
2  sudeep   46    Florida


In [45]:
df.drop(0,inplace=True)
df

     Name  Age      City
1    John   31  New York
2  sudeep   46   Florida
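After a row is dropped, the remaining rows keep their old labels (1 and 2 above). This sketch drops several rows at once with `index=` and shows `reset_index(drop=True)`, which renumbers the rows from 0 and discards the old labels instead of keeping them as a column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['krish', 'John', 'sudeep'],
                   'Age': [26, 31, 46]})

# Drop several rows at once by passing a list of index labels
only_john = df.drop(index=[0, 2])

# Drop row 0, then renumber the remaining rows 0..n-1
renumbered = df.drop(0).reset_index(drop=True)
print(renumbered)
```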


In [47]:
# Display the data types of each column 
print("Data types:\n",df.dtypes)

# Describe the DataFrame 
print('Statistical summary:\n',df.describe())



Data types:
 Name    object
Age      int64
City    object
dtype: object
Statistical summary:
              Age
count   2.000000
mean   38.500000
std    10.606602
min    31.000000
25%    34.750000
50%    38.500000
75%    42.250000
max    46.000000


In [48]:
df.describe()

             Age
count   2.000000
mean   38.500000
std    10.606602
min    31.000000
25%    34.750000
50%    38.500000
75%    42.250000
max    46.000000
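By default `describe()` summarizes only numeric columns, which is why only Age appears above. Passing `include='all'` adds count, unique, top, and freq rows for the text columns as well. A sketch that rebuilds the two-row frame from this point in the notebook:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'sudeep'],
                   'Age': [31, 46],
                   'City': ['New York', 'Florida']},
                  index=[1, 2])

numeric = df.describe()                  # numeric columns only (Age)
everything = df.describe(include='all')  # text columns get unique/top/freq
print(everything)
```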
