# Pandas: DataFrame and Series
Pandas is a powerful data manipulation library in Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [2]:
import pandas as pd

#### Series
- A Pandas Series is a one-dimensional array-like object that can hold any data type. It is similar to a column in a table.

In [5]:
# Create a series from List
data = [1,2,3,4,5]
series = pd.Series(data)
print(series)
print(type(series))

0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>
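Because a Series can hold any data type, pandas infers a single `dtype` for it from the values you pass in. A minimal sketch, with values chosen only for illustration:

```python
import pandas as pd

# Integers only -> int64, as in the output above
ints = pd.Series([1, 2, 3])
# Mixing ints and floats upcasts every element to float64
floats = pd.Series([1, 2.5, 3])
# Mixing numbers and strings falls back to the generic object dtype
mixed = pd.Series([1, 'a', 3.0])

print(ints.dtype, floats.dtype, mixed.dtype)  # int64 float64 object
print(list(ints.index))                       # [0, 1, 2] (default index)
```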


In [7]:
# Create a Series from Dictionary
data = {'a': 1, 'b': 2, 'c': 3}
series2 = pd.Series(data)
print(series2)
print(type(series2))

a    1
b    2
c    3
dtype: int64
<class 'pandas.core.series.Series'>
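When a Series is built from a dictionary, the keys become the index labels, so values can be looked up by label. A short sketch reusing the same dictionary; the extra label `'z'` is hypothetical and shows what happens when a requested label is not a key:

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}
series2 = pd.Series(data)

print(list(series2.index))  # ['a', 'b', 'c'] (keys became labels)
print(series2['b'])         # 2 (lookup by label)

# Passing index= selects and orders the keys; a label missing from the dict becomes NaN
subset = pd.Series(data, index=['c', 'a', 'z'])
print(subset)
```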


In [8]:
# Create a Series with custom index labels
data = [10,20,30]
index = ['a','b','c']
series3 = pd.Series(data, index=index)
print(series3)

a    10
b    20
c    30
dtype: int64
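With a custom index, each element has both a label and an integer position. As a sketch, `.loc` looks values up by label and `.iloc` by position; note that label slices include their end point while position slices do not:

```python
import pandas as pd

series3 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = series3.loc['b']      # label 'b'
by_position = series3.iloc[1]    # second element (position 1)
print(by_label, by_position)     # 20 20

# Label slice includes 'b'; position slice 0:1 stops before position 1
print(series3.loc['a':'b'].tolist())  # [10, 20]
print(series3.iloc[0:1].tolist())     # [10]
```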


#### DataFrame

In [18]:
# Create a DataFrame from a dictionary of lists
data = {
    'Name': ['Jeet', 'Yash', 'John'],
    'Age': [30, 33, 38],
    'City': ['Surat', 'Mumbai', 'Pune']
}
df = pd.DataFrame(data)
print(df)
print(type(df))

   Name  Age    City
0  Jeet   30   Surat
1  Yash   33  Mumbai
2  John   38    Pune
<class 'pandas.core.frame.DataFrame'>
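In the dictionary-of-lists form, each key becomes a column name and each list becomes that column's values, so the lists must all have the same length. A quick sketch for inspecting the result and seeing what happens when the lengths differ:

```python
import pandas as pd

data = {
    'Name': ['Jeet', 'Yash', 'John'],
    'Age': [30, 33, 38],
    'City': ['Surat', 'Mumbai', 'Pune'],
}
df = pd.DataFrame(data)

print(df.shape)          # (3, 3) -> (rows, columns)
print(list(df.columns))  # ['Name', 'Age', 'City']

# Lists of unequal length are rejected with a ValueError
try:
    pd.DataFrame({'Name': ['Jeet', 'Yash'], 'Age': [30]})
except ValueError as err:
    print('ValueError:', err)
```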


In [20]:
# Create a DataFrame from a list of dictionaries

data = [
    {'Name': 'Jeet', 'Age': 30, 'City': 'Surat'},
    {'Name': 'Yash', 'Age': 33, 'City': 'Mumbai'},
    {'Name': 'John', 'Age': 38, 'City': 'Pune'}
]

df2 = pd.DataFrame(data)
print(df2)
print(type(df2))

   Name  Age    City
0  Jeet   30   Surat
1  Yash   33  Mumbai
2  John   38    Pune
<class 'pandas.core.frame.DataFrame'>
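With a list of dictionaries, each dictionary becomes one row and the union of all keys becomes the columns. If a dictionary lacks a key, pandas fills that cell with `NaN`. In this sketch the third record deliberately omits `'City'` for illustration:

```python
import pandas as pd

records = [
    {'Name': 'Jeet', 'Age': 30, 'City': 'Surat'},
    {'Name': 'Yash', 'Age': 33, 'City': 'Mumbai'},
    {'Name': 'John', 'Age': 38},  # no 'City' key in this record
]
df2 = pd.DataFrame(records)
print(df2)
print(df2['City'].isna().tolist())  # [False, False, True]
```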


In [21]:
df = pd.read_csv('sales_data.csv')
df.head()                   # first five rows (default n=5)

|   | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Region | Payment Method |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 2024-01-01 | Electronics | iPhone 14 Pro | 2 | 999.99 | 1999.98 | North America | Credit Card |
| 1 | 10002 | 2024-01-02 | Home Appliances | Dyson V11 Vacuum | 1 | 499.99 | 499.99 | Europe | PayPal |
| 2 | 10003 | 2024-01-03 | Clothing | Levi's 501 Jeans | 3 | 69.99 | 209.97 | Asia | Debit Card |
| 3 | 10004 | 2024-01-04 | Books | The Da Vinci Code | 4 | 15.99 | 63.96 | North America | Credit Card |
| 4 | 10005 | 2024-01-05 | Beauty Products | Neutrogena Skincare Set | 1 | 89.99 | 89.99 | Europe | PayPal |
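`sales_data.csv` itself is not reproduced here, so this self-contained sketch feeds `pd.read_csv` the first three rows shown above through `io.StringIO`, which behaves like an open text file. The inline CSV text is an assumption standing in for the real file:

```python
import io

import pandas as pd

# Three rows copied from the output above, standing in for sales_data.csv
csv_text = """Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)                # (3, 9)
print(sample['Units Sold'].sum())  # 6
```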


In [23]:
df.tail()                   # last five rows

|   | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Region | Payment Method |
|---|---|---|---|---|---|---|---|---|---|
| 235 | 10236 | 2024-08-23 | Home Appliances | Nespresso Vertuo Next Coffee and Espresso Maker | 1 | 159.99 | 159.99 | Europe | PayPal |
| 236 | 10237 | 2024-08-24 | Clothing | Nike Air Force 1 Sneakers | 3 | 90.0 | 270.0 | Asia | Debit Card |
| 237 | 10238 | 2024-08-25 | Books | The Handmaid's Tale by Margaret Atwood | 3 | 10.99 | 32.97 | North America | Credit Card |
| 238 | 10239 | 2024-08-26 | Beauty Products | Sunday Riley Luna Sleeping Night Oil | 1 | 55.0 | 55.0 | Europe | PayPal |
| 239 | 10240 | 2024-08-27 | Sports | Yeti Rambler 20 oz Tumbler | 2 | 29.99 | 59.98 | Asia | Credit Card |
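`head()` and `tail()` return five rows by default, and both accept a row count `n`. A sketch on a hypothetical ten-row frame:

```python
import pandas as pd

# Hypothetical ten-row frame for illustration
df = pd.DataFrame({'Transaction ID': range(10001, 10011)})

first_three = df.head(3)
last_two = df.tail(2)
print(first_three['Transaction ID'].tolist())  # [10001, 10002, 10003]
print(last_two['Transaction ID'].tolist())     # [10009, 10010]
print(len(df.head()))                          # 5 (default n=5)
```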


In [22]:
df.describe()               # summary statistics for the numeric columns

|       | Transaction ID | Units Sold | Unit Price | Total Revenue |
|---|---|---|---|---|
| count | 240.0 | 240.0 | 240.0 | 240.0 |
| mean | 10120.5 | 2.158333 | 236.395583 | 335.699375 |
| std | 69.42622 | 1.322454 | 429.446695 | 485.804469 |
| min | 10001.0 | 1.0 | 6.5 | 6.5 |
| 25% | 10060.75 | 1.0 | 29.5 | 62.965 |
| 50% | 10120.5 | 2.0 | 89.99 | 179.97 |
| 75% | 10180.25 | 3.0 | 249.99 | 399.225 |
| max | 10240.0 | 10.0 | 3899.99 | 3899.99 |
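By default `describe()` summarizes only numeric columns, which is why `Date`, `Region` and the other text columns are absent above, while `Transaction ID` (an identifier, but stored as integers) is included. Calling `describe()` on a text column gives count, unique, top and freq instead. A sketch on a small made-up frame:

```python
import pandas as pd

# Small hypothetical frame with one numeric and one text column
df = pd.DataFrame({
    'Units Sold': [2, 1, 3],
    'Region': ['North America', 'Europe', 'North America'],
})

numeric_summary = df.describe()           # numeric columns only
region_summary = df['Region'].describe()  # count, unique, top, freq

print(list(numeric_summary.columns))  # ['Units Sold']
print(region_summary['top'])          # North America
print(region_summary['freq'])         # 2
```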


In [60]:
# Display the data types of each column
print("Data types:\n", df.dtypes)

Data types:
 Transaction ID        int64
Date                 object
Product Category     object
Product Name         object
Units Sold            int64
Unit Price          float64
Total Revenue       float64
Region               object
Payment Method       object
dtype: object


In [24]:
df.info()                   # index range, columns, non-null counts, dtypes, memory usage

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Transaction ID    240 non-null    int64  
 1   Date              240 non-null    object 
 2   Product Category  240 non-null    object 
 3   Product Name      240 non-null    object 
 4   Units Sold        240 non-null    int64  
 5   Unit Price        240 non-null    float64
 6   Total Revenue     240 non-null    float64
 7   Region            240 non-null    object 
 8   Payment Method    240 non-null    object 
dtypes: float64(2), int64(2), object(5)
memory usage: 17.0+ KB


In [27]:
data = {
    'Name': ['Jeet', 'Yash', 'John'],
    'Age': [30, 33, 38],
    'City': ['Surat', 'Mumbai', 'Pune']
}
df3 = pd.DataFrame(data)
df3

|   | Name | Age | City |
|---|---|---|---|
| 0 | Jeet | 30 | Surat |
| 1 | Yash | 33 | Mumbai |
| 2 | John | 38 | Pune |


#### Notes
- A Series is like a single column.
- A DataFrame has multiple rows and columns; each column of a DataFrame is a Series.
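The note above can be checked directly: a single column pulled out with one pair of brackets is a Series that shares the DataFrame's row index, while a list of column names returns a (smaller) DataFrame. A minimal sketch:

```python
import pandas as pd

df3 = pd.DataFrame({
    'Name': ['Jeet', 'Yash', 'John'],
    'Age': [30, 33, 38],
    'City': ['Surat', 'Mumbai', 'Pune'],
})

col = df3['Name']    # single brackets -> one column as a Series
sub = df3[['Name']]  # a list of column names -> a DataFrame
print(type(col).__name__, type(sub).__name__)  # Series DataFrame
print(col.name)                                # Name

# The column Series carries the same row index as the DataFrame
print(col.index.equals(df3.index))             # True
```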

In [31]:
df3['Name']                 # selecting one column returns a Series

0    Jeet
1    Yash
2    John
Name: Name, dtype: object

In [38]:
# To select a row by its index label, use 'loc'.
df3.loc[0]

Name     Jeet
Age        30
City    Surat
Name: 0, dtype: object

In [41]:
# Select one value by row label and column label
df3.loc[0, 'Name']

'Jeet'

In [40]:
# To select a row by its integer position, use 'iloc'. (Here position 0 and label 0 are the same row.)
df3.iloc[0]

Name     Jeet
Age        30
City    Surat
Name: 0, dtype: object
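`.loc` selects by index *label* and `.iloc` by integer *position*. In `df3` the labels happen to equal the positions, so both calls above return the same row. The difference shows up once the labels no longer start at 0, for example after a row is dropped. `iloc` can also select a column by position. A sketch:

```python
import pandas as pd

df3 = pd.DataFrame({
    'Name': ['Jeet', 'Yash', 'John'],
    'Age': [30, 33, 38],
    'City': ['Surat', 'Mumbai', 'Pune'],
})
trimmed = df3.drop(0)  # remaining labels: 1, 2

print(trimmed.loc[1, 'Name'])  # Yash (row label 1)
print(trimmed.iloc[0, 0])      # Yash (first remaining row, first column)

try:
    trimmed.loc[0]             # label 0 no longer exists
except KeyError:
    print('KeyError: label 0 was dropped')

# All rows, column position 2 ('City')
print(df3.iloc[:, 2].tolist())  # ['Surat', 'Mumbai', 'Pune']
```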

- Accessing a specific element using at[ ]

In [42]:
df3

|   | Name | Age | City |
|---|---|---|---|
| 0 | Jeet | 30 | Surat |
| 1 | Yash | 33 | Mumbai |
| 2 | John | 38 | Pune |


In [43]:
df3.at[2,'City']            # at[rowLabel, columnLabel]

'Pune'

- Accessing a specific element by position using iat[ ]

In [45]:
df3.iat[0,2]                # iat[rowPosition, columnPosition]

'Surat'
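`at` and `iat` are the single-cell counterparts of `loc` and `iloc`, and both can write as well as read. A sketch that writes one value with `at` and reads it back with `iat`; the new age is made up for illustration:

```python
import pandas as pd

df3 = pd.DataFrame({
    'Name': ['Jeet', 'Yash', 'John'],
    'Age': [30, 33, 38],
    'City': ['Surat', 'Mumbai', 'Pune'],
})

# Label (2, 'City') and position (2, 2) address the same cell here
print(df3.at[2, 'City'] == df3.iat[2, 2])  # True

df3.at[1, 'Age'] = 34  # write a single cell by label
print(df3.iat[1, 1])   # 34 (row position 1, column position 1 = Age)
```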

- Data manipulation with a DataFrame

In [46]:
df3

|   | Name | Age | City |
|---|---|---|---|
| 0 | Jeet | 30 | Surat |
| 1 | Yash | 33 | Mumbai |
| 2 | John | 38 | Pune |


In [47]:
# Add a column (the list must have one value per row)
df3['Salary'] = [90000, 70000, 60000]
df3

|   | Name | Age | City | Salary |
|---|---|---|---|---|
| 0 | Jeet | 30 | Surat | 90000 |
| 1 | Yash | 33 | Mumbai | 70000 |
| 2 | John | 38 | Pune | 60000 |
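A new column can also be computed from an existing one, and `DataFrame.assign` does the same while returning a new DataFrame instead of modifying the original. The `Monthly` and `Bonus` columns below are hypothetical examples:

```python
import pandas as pd

df3 = pd.DataFrame({
    'Name': ['Jeet', 'Yash', 'John'],
    'Salary': [90000, 70000, 60000],
})

# Derived column, computed element-wise from Salary
df3['Monthly'] = df3['Salary'] / 12
print(df3.loc[0, 'Monthly'])  # 7500.0

# assign() returns a new DataFrame; df3 itself is left unchanged
with_bonus = df3.assign(Bonus=df3['Salary'] // 10)
print('Bonus' in df3.columns, 'Bonus' in with_bonus.columns)  # False True
```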


In [53]:
# Remove a column (axis=1 means columns)
df3.drop('Salary', axis=1, inplace=True)       # inplace=True modifies df3 itself instead of returning a new DataFrame

In [54]:
df3

|   | Name | Age | City |
|---|---|---|---|
| 0 | Jeet | 30 | Surat |
| 1 | Yash | 33 | Mumbai |
| 2 | John | 38 | Pune |
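Without `inplace=True`, `drop` leaves the original untouched and returns a new DataFrame, which is usually assigned back to a variable. The `columns=` keyword is an equivalent alternative to `axis=1`. A sketch:

```python
import pandas as pd

df3 = pd.DataFrame({
    'Name': ['Jeet', 'Yash', 'John'],
    'Age': [30, 33, 38],
    'Salary': [90000, 70000, 60000],
})

without_salary = df3.drop(columns='Salary')  # same as df3.drop('Salary', axis=1)
print(list(df3.columns))             # ['Name', 'Age', 'Salary'] (unchanged)
print(list(without_salary.columns))  # ['Name', 'Age']

# Reassigning the result is the common alternative to inplace=True
df3 = df3.drop(columns='Salary')
print(list(df3.columns))             # ['Name', 'Age']
```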


In [55]:
# Add 1 to every value in the Age column
df3['Age'] = df3['Age'] + 1

In [56]:
df3

|   | Name | Age | City |
|---|---|---|---|
| 0 | Jeet | 31 | Surat |
| 1 | Yash | 34 | Mumbai |
| 2 | John | 39 | Pune |
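`df3['Age'] + 1` is a vectorized operation: pandas adds 1 to every element at once and returns a new Series, and only the assignment writes it back into the DataFrame. A sketch of that distinction:

```python
import pandas as pd

df3 = pd.DataFrame({'Name': ['Jeet', 'Yash', 'John'], 'Age': [30, 33, 38]})

older = df3['Age'] + 1      # new Series; df3 is not modified yet
print(df3['Age'].tolist())  # [30, 33, 38]
print(older.tolist())       # [31, 34, 39]

# Same values as an explicit Python loop, without writing the loop
print(older.tolist() == [age + 1 for age in df3['Age']])  # True

df3['Age'] = older          # the assignment writes the new values back
print(df3['Age'].tolist())  # [31, 34, 39]
```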


In [58]:
df3.drop(0, inplace=True)       # remove the row with index label 0 (axis=0, rows, is the default)

In [59]:
df3

|   | Name | Age | City |
|---|---|---|---|
| 1 | Yash | 34 | Mumbai |
| 2 | John | 39 | Pune |
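After dropping row 0 the remaining labels are 1 and 2, so the index no longer matches row positions. As a sketch, `reset_index(drop=True)` renumbers the rows from 0 and discards the old labels, and `drop` also accepts a list of labels to remove several rows at once:

```python
import pandas as pd

df3 = pd.DataFrame({
    'Name': ['Jeet', 'Yash', 'John'],
    'Age': [31, 34, 39],
    'City': ['Surat', 'Mumbai', 'Pune'],
})

df3 = df3.drop(0)                 # labels left: 1, 2
print(list(df3.index))            # [1, 2]

df3 = df3.reset_index(drop=True)  # renumber as 0..n-1, discard old labels
print(list(df3.index))            # [0, 1]
print(df3.loc[0, 'Name'])         # Yash

fewer = df3.drop([0, 1])          # a list of labels removes several rows
print(len(fewer))                 # 0
```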
