# Pandas: DataFrame And Series
Pandas is a powerful data manipulation library in Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [2]:
%pip install pandas 

Collecting pandas
  Downloading pandas-2.2.3-cp310-cp310-macosx_11_0_arm64.whl.metadata (89 kB)
Collecting pytz>=2020.1 (from pandas)
  Using cached pytz-2024.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas)
  Using cached tzdata-2024.2-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pandas-2.2.3-cp310-cp310-macosx_11_0_arm64.whl (11.3 MB)
Using cached pytz-2024.2-py2.py3-none-any.whl (508 kB)
Using cached tzdata-2024.2-py2.py3-none-any.whl (346 kB)
Installing collected packages: pytz, tzdata, pandas
Successfully installed pandas-2.2.3 pytz-2024.2 tzdata-2024.2
Note: you may need to restart the kernel to use updated packages.


### Series
- A Pandas Series is a one-dimensional array-like object that can hold any data type. 
- It is similar to a column in a table.


In [None]:
import pandas as pd # type: ignore

data = [1,2,3,4,5]
series = pd.Series(data)
print("Series \n",series)
print(type(series))

Series 
 0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>


##### Create a Series from Dictionary

In [None]:
data={'a':1,'b':2,'c':3} # keys become the index labels, values become the data
series_dict=pd.Series(data) # pd.Series converts the dictionary into a Series
print(series_dict)

a    1
b    2
c    3
dtype: int64


In [6]:
# Create a Series with custom index labels
data=[10,20,30]
index=['a','b','c']
pd.Series(data,index=index)

a    10
b    20
c    30
dtype: int64
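
Once a Series has index labels, its values can be read either by label or by position. The following is a minimal sketch of both styles; the variable name `prices` is only illustrative and does not come from the notebook.

```python
import pandas as pd

# Same data and labels as the cell above; `prices` is a hypothetical name
prices = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = prices['b']        # look up a value by its index label
by_position = prices.iloc[1]  # look up a value by its integer position
subset = prices[['a', 'c']]   # select several labels at once
doubled = prices * 2          # arithmetic is applied element by element

print(by_label, by_position)
print(subset)
print(doubled)
```

Label-based and position-based access return the same value here because label `'b'` sits at position 1.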

### DataFrame
A DataFrame is a two-dimensional, size-mutable table with labeled rows and columns. Each column is a Series, and different columns can hold different data types.

In [45]:
# Create a DataFrame from a dictionary of lists
data={
    'Name':['Krish','John','Jack'],
    'Age':[25,30,45],
    'City':['Bangalore','New York','Boston']
}
df=pd.DataFrame(data)
print(df)
print(type(df))

    Name  Age       City
0  Krish   25  Bangalore
1   John   30   New York
2   Jack   45     Boston
<class 'pandas.core.frame.DataFrame'>
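
To see the "labeled axes" and "heterogeneous" parts of the definition in practice, it can help to inspect a DataFrame's shape, column labels, and per-column data types. This is a small sketch that rebuilds the same table under the illustrative name `people`.

```python
import pandas as pd

# Same table as above, under a hypothetical name
people = pd.DataFrame({
    'Name': ['Krish', 'John', 'Jack'],
    'Age': [25, 30, 45],
    'City': ['Bangalore', 'New York', 'Boston'],
})

print(people.shape)          # (number of rows, number of columns)
print(list(people.columns))  # column labels
print(people.dtypes)         # one data type per column

age_column = people['Age']   # selecting one column returns a Series
print(type(age_column))
```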


In [None]:
# Create a DataFrame from a list of dictionaries

data=[
    {'Name':'Krish','Age':32,'City':'Bangalore'},
    {'Name':'John','Age':34,'City':'Bangalore'},
    {'Name':'Harry','Age':32,'City':'Bangalore'},
    {'Name':'Jack','Age':32,'City':'Bangalore'}
]
df=pd.DataFrame(data) # pd.DataFrame turns each dictionary into one row
print(df)
print(type(df))

    Name  Age       City
0  Krish   32  Bangalore
1   John   34  Bangalore
2  Harry   32  Bangalore
3   Jack   32  Bangalore
<class 'pandas.core.frame.DataFrame'>


In [None]:
df=pd.read_csv('sales_data.csv')
df.head(5) # Display first 5 rows of the dataframe

|   | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Region | Payment Method |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 2024-01-01 | Electronics | iPhone 14 Pro | 2 | 999.99 | 1999.98 | North America | Credit Card |
| 1 | 10002 | 2024-01-02 | Home Appliances | Dyson V11 Vacuum | 1 | 499.99 | 499.99 | Europe | PayPal |
| 2 | 10003 | 2024-01-03 | Clothing | Levi's 501 Jeans | 3 | 69.99 | 209.97 | Asia | Debit Card |
| 3 | 10004 | 2024-01-04 | Books | The Da Vinci Code | 4 | 15.99 | 63.96 | North America | Credit Card |
| 4 | 10005 | 2024-01-05 | Beauty Products | Neutrogena Skincare Set | 1 | 89.99 | 89.99 | Europe | PayPal |
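
The `read_csv` call above needs `sales_data.csv` on disk. To try the same call without the file, one option is to read CSV text from memory with `io.StringIO`. The sketch below assumes the file's header matches the column names shown in the output; the two data rows are copied from that output, and `csv_text` and `sales` are illustrative names.

```python
import io
import pandas as pd

# Stand-in for sales_data.csv: header assumed from the displayed columns,
# rows copied from the output above
csv_text = """Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
"""

# read_csv accepts a file path or any file-like object
sales = pd.read_csv(io.StringIO(csv_text))

print(sales.head(1))  # first row only
print(sales.shape)    # (rows, columns)
```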


In [None]:
df.tail(5) # Display last 5 rows of the dataframe

|   | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Region | Payment Method |
|---|---|---|---|---|---|---|---|---|---|
| 235 | 10236 | 2024-08-23 | Home Appliances | Nespresso Vertuo Next Coffee and Espresso Maker | 1 | 159.99 | 159.99 | Europe | PayPal |
| 236 | 10237 | 2024-08-24 | Clothing | Nike Air Force 1 Sneakers | 3 | 90.0 | 270.0 | Asia | Debit Card |
| 237 | 10238 | 2024-08-25 | Books | The Handmaid's Tale by Margaret Atwood | 3 | 10.99 | 32.97 | North America | Credit Card |
| 238 | 10239 | 2024-08-26 | Beauty Products | Sunday Riley Luna Sleeping Night Oil | 1 | 55.0 | 55.0 | Europe | PayPal |
| 239 | 10240 | 2024-08-27 | Sports | Yeti Rambler 20 oz Tumbler | 2 | 29.99 | 59.98 | Asia | Credit Card |


##### Accessing Data From a DataFrame

In [None]:
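# df was overwritten by the sales data above, so rebuild the small example first
df=pd.DataFrame({'Name':['Krish','John','Jack'],'Age':[25,30,45],'City':['Bangalore','New York','Boston']})
df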

|   | Name | Age | City |
|---|---|---|---|
| 0 | Krish | 25 | Bangalore |
| 1 | John | 30 | New York |
| 2 | Jack | 45 | Boston |


In [47]:
df['Name'] # Access a column

0    Krish
1     John
2     Jack
Name: Name, dtype: object

In [None]:
df.loc[0] # Access a row by its index label

Name        Krish
Age            25
City    Bangalore
Name: 0, dtype: object

In [26]:
df.iloc[0,2] # Access a single value by row position and column position

'Bangalore'

##### Accessing a specified element using at

In [48]:
df.at[0,'Name'] # Access a cell

'Krish'

##### Accessing a specified element using iat

In [None]:
df.iat[2,2] # 0: Name, 1: Age, 2: City

'Boston'
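
The accessors above fall into two families: `loc` and `at` take labels, while `iloc` and `iat` take integer positions. With the default 0, 1, 2 index the two look identical, so the sketch below uses names as row labels to make the difference visible. Using names as the index is an assumption made only for this illustration, and `people` is a hypothetical variable.

```python
import pandas as pd

# Names as row labels (an assumption for this sketch, not the notebook's df)
people = pd.DataFrame(
    {'Age': [25, 30, 45], 'City': ['Bangalore', 'New York', 'Boston']},
    index=['Krish', 'John', 'Jack'],
)

row_by_label = people.loc['John']          # row whose label is 'John'
row_by_position = people.iloc[1]           # second row, whatever its label
cell_by_label = people.at['Jack', 'City']  # single cell by row and column label
cell_by_position = people.iat[2, 1]        # single cell by row and column position

print(row_by_label)
print(cell_by_label, cell_by_position)
```

`at` and `iat` only ever return a single cell, which makes them a good fit when exactly one value is needed.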

In [28]:
df

|   | Name | Age | City |
|---|---|---|---|
| 0 | Krish | 25 | Bangalore |
| 1 | John | 30 | New York |
| 2 | Jack | 45 | Boston |


### Data Manipulation with DataFrame

In [36]:
# Adding a column
df['Salary']=[50000,60000,70000]
df

|   | Name | Age | City | Salary |
|---|---|---|---|---|
| 0 | Krish | 25 | Bangalore | 50000 |
| 1 | John | 30 | New York | 60000 |
| 2 | Jack | 45 | Boston | 70000 |


In [None]:
# Remove a column
df.drop('Salary',axis=1,inplace=True) 

# axis=0 refers to rows (the default)
# axis=1 refers to columns
# inplace=True modifies the original DataFrame and returns None
# inplace=False (the default) leaves the original unchanged and returns a new DataFrame without the column
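
To illustrate the `inplace=False` behaviour described in the comments, here is a small sketch that drops a column without touching the original. The names `people` and `without_salary` are illustrative only.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Krish', 'John', 'Jack'],
    'Age': [25, 30, 45],
    'Salary': [50000, 60000, 70000],
})

# drop() without inplace=True returns a new DataFrame;
# columns='Salary' is equivalent to drop('Salary', axis=1)
without_salary = people.drop(columns='Salary')

print(list(without_salary.columns))  # Salary is gone in the copy
print(list(people.columns))          # the original still has Salary
```

Assigning the result to a new name keeps the original available, which can be convenient when re-running notebook cells.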

In [34]:
df

|   | Name | Age | City |
|---|---|---|---|
| 0 | Krish | 25 | Bangalore |
| 1 | John | 30 | New York |
| 2 | Jack | 45 | Boston |


In [38]:
# Increase every value in the Age column by 1
df['Age']=df['Age']+1
df

|   | Name | Age | City |
|---|---|---|---|
| 0 | Krish | 26 | Bangalore |
| 1 | John | 31 | New York |
| 2 | Jack | 46 | Boston |


In [None]:
# Remove the row whose index label is 0
df.drop(0,inplace=True)

In [40]:
df

|   | Name | Age | City |
|---|---|---|---|
| 1 | John | 31 | New York |
| 2 | Jack | 46 | Boston |


In [41]:
df=pd.read_csv('sales_data.csv')
df.head(5)

|   | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Region | Payment Method |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 2024-01-01 | Electronics | iPhone 14 Pro | 2 | 999.99 | 1999.98 | North America | Credit Card |
| 1 | 10002 | 2024-01-02 | Home Appliances | Dyson V11 Vacuum | 1 | 499.99 | 499.99 | Europe | PayPal |
| 2 | 10003 | 2024-01-03 | Clothing | Levi's 501 Jeans | 3 | 69.99 | 209.97 | Asia | Debit Card |
| 3 | 10004 | 2024-01-04 | Books | The Da Vinci Code | 4 | 15.99 | 63.96 | North America | Credit Card |
| 4 | 10005 | 2024-01-05 | Beauty Products | Neutrogena Skincare Set | 1 | 89.99 | 89.99 | Europe | PayPal |


In [42]:
# Display the data types of each column
print("Data types:\n", df.dtypes)

# Describe the DataFrame
print("Statistical summary:\n", df.describe())

Data types:
 Transaction ID        int64
Date                 object
Product Category     object
Product Name         object
Units Sold            int64
Unit Price          float64
Total Revenue       float64
Region               object
Payment Method       object
dtype: object
Statistical summary:
        Transaction ID  Units Sold   Unit Price  Total Revenue
count       240.00000  240.000000   240.000000     240.000000
mean      10120.50000    2.158333   236.395583     335.699375
std          69.42622    1.322454   429.446695     485.804469
min       10001.00000    1.000000     6.500000       6.500000
25%       10060.75000    1.000000    29.500000      62.965000
50%       10120.50000    2.000000    89.990000     179.970000
75%       10180.25000    3.000000   249.990000     399.225000
max       10240.00000   10.000000  3899.990000    3899.990000


In [43]:
df.describe()

|   | Transaction ID | Units Sold | Unit Price | Total Revenue |
|---|---|---|---|---|
| count | 240.0 | 240.0 | 240.0 | 240.0 |
| mean | 10120.5 | 2.158333 | 236.395583 | 335.699375 |
| std | 69.42622 | 1.322454 | 429.446695 | 485.804469 |
| min | 10001.0 | 1.0 | 6.5 | 6.5 |
| 25% | 10060.75 | 1.0 | 29.5 | 62.965 |
| 50% | 10120.5 | 2.0 | 89.99 | 179.97 |
| 75% | 10180.25 | 3.0 | 249.99 | 399.225 |
| max | 10240.0 | 10.0 | 3899.99 | 3899.99 |
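
Two things stand out in the output above: `describe()` summarizes only the numeric columns by default, and `Date` is stored as `object` (plain text) rather than as a date. The sketch below shows one way to handle both, using a tiny made-up frame in place of the sales file; `sales`, its three rows, and `summary` are assumptions for illustration only.

```python
import pandas as pd

# Tiny stand-in for the sales data (illustrative values, not the real file)
sales = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'Region': ['North America', 'Europe', 'Asia'],
    'Units Sold': [2, 1, 3],
})

# include='all' adds non-numeric columns (count, unique, top, freq) to the summary
summary = sales[['Region', 'Units Sold']].describe(include='all')
print(summary)

# Convert the text dates to real datetimes so date parts can be extracted
sales['Date'] = pd.to_datetime(sales['Date'])
print(sales.dtypes)
print(sales['Date'].dt.day.tolist())
```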
