# DataFrame Methods
---

#### In this notebook we have:
* DataFrame Methods:
    1. [head()](#1.-head())
    2. [tail()](#2-tail)
    3. [info()](#3.-info())
    4. [count()](#4-count)
    5. [describe()](#5-describe)
    6. [nunique()](#6-nunique)
* [Accessing columns from DataFrame (single col : Series)](#accessing-single-column-from-the-dataframe)
* [Rearranging the col of DataFrame](#Rearranging-columns-in-DataFrame)
* [Renaming Columns](#Renaming-Columns)
    1. using rename(p) method for single col change
    2. using rename(p) method for multiple col change
    3. using columns attribute
* Renaming Index
    1. using rename(p) method
    2. using index attribute
* Bonus!!!
---


## Brief Intro:
* DataFrame is a class, so it comes with predefined methods.
* These methods are called on a DataFrame object.
* Each method performs an operation on the DataFrame and returns the result.

---
## 1. head()
* Returns the first 5 rows of the DataFrame by default.
* To preview a different number of rows from the start, pass that number as an argument, e.g. head(10).

In [2]:
import pandas as pd

path = "../datasets/sales1.csv"
df = pd.read_csv(path)

df.head()

|   | Order ID | Customer Name | Product | Quantity |
|---|---|---|---|---|
| 0 | 166837 | Veeru | 34in Ultrawide Monitor | 2 |
| 1 | 166838 | Tarun | Samsung m10 | 3 |
| 2 | 166839 | Kedar | 20in Monitor | 1 |
| 3 | 166840 | Lavanya | iPhone 11 | 3 |
| 4 | 166841 | Venu | Macbook Pro Laptop | 2 |


In [2]:
df.head(10) # Returns first 10 records of the DataFrame

|   | Order ID | Customer Name | Product | Quantity |
|---|---|---|---|---|
| 0 | 166837 | Veeru | 34in Ultrawide Monitor | 2 |
| 1 | 166838 | Tarun | Samsung m10 | 3 |
| 2 | 166839 | Kedar | 20in Monitor | 1 |
| 3 | 166840 | Lavanya | iPhone 11 | 3 |
| 4 | 166841 | Venu | Macbook Pro Laptop | 2 |
| 5 | 166842 | Venu | Bose SoundSport Headphones | 2 |
| 6 | 166843 | Harsha | 27in 4K Gaming Monitor | 2 |
| 7 | 166844 | Harsha | Samsung m10 | 1 |
| 8 | 166845 | Siddhu | 34in Ultrawide Monitor | 2 |
| 9 | 166846 | Siddhu | iPhone 11 | 1 |
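
As a small illustration of the argument described above, the sketch below uses a made-up in-memory DataFrame (the name `demo` and its values are assumptions for this example, not the contents of sales1.csv). It also shows that head() accepts a negative number, which returns every row except the last n.

```python
import pandas as pd

# Illustrative stand-in for the sales data (hypothetical values)
demo = pd.DataFrame({
    "Order ID": [166837, 166838, 166839, 166840],
    "Quantity": [2, 3, 1, 3],
})

first_two = demo.head(2)       # first 2 rows
all_but_last = demo.head(-1)   # every row except the last 1

print(first_two)
print(all_but_last)
```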


---
## 2. tail()
* Similar to the head() method, but returns the last 5 rows by default.

In [3]:
df.tail()

|   | Order ID | Customer Name | Product | Quantity |
|---|---|---|---|---|
| 595 | 167403 | Balaji | Macbook Pro Laptop | 1 |
| 596 | 167404 | Lavanya | ThinkPad Laptop | 1 |
| 597 | 167405 | Venu | Flatscreen TV | 1 |
| 598 | 167406 | Siddhu | Samsung m20 | 2 |
| 599 | 167407 | Tarun | LG Washing Machine | 1 |


In [4]:
df.tail(10)

|   | Order ID | Customer Name | Product | Quantity |
|---|---|---|---|---|
| 590 | 167398 | Mallikarjun | iPhone 7s | 1 |
| 591 | 167399 | Siddhu | Samsung m10 | 1 |
| 592 | 167400 | Chaithanya | 27in 4K Gaming Monitor | 1 |
| 593 | 167401 | Mallikarjun | iPhone 11 | 1 |
| 594 | 167402 | Kedar | Samsung m10 | 1 |
| 595 | 167403 | Balaji | Macbook Pro Laptop | 1 |
| 596 | 167404 | Lavanya | ThinkPad Laptop | 1 |
| 597 | 167405 | Venu | Flatscreen TV | 1 |
| 598 | 167406 | Siddhu | Samsung m20 | 2 |
| 599 | 167407 | Tarun | LG Washing Machine | 1 |
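
One detail worth seeing in isolation is that tail() keeps the original index labels rather than renumbering from 0. The sketch below assumes a small made-up DataFrame named `demo` (not the real dataset) and also shows a negative argument, which returns every row except the first n.

```python
import pandas as pd

# Illustrative stand-in for the sales data (hypothetical values)
demo = pd.DataFrame({
    "Order ID": [166837, 166838, 166839, 166840],
    "Quantity": [2, 3, 1, 3],
})

last_two = demo.tail(2)      # last 2 rows, index labels 2 and 3 are preserved
skip_first = demo.tail(-1)   # every row except the first 1

print(last_two.index.tolist())
print(skip_first.index.tolist())
```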


---
## 3. info()
This method prints the following summary of the DataFrame:
* Type of the object
* Type and range of the index
* Number of rows (entries)
* Number of columns
* Name, non-null count, and data type of each column
* Number of columns per data type
* Approximate memory usage

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 600 entries, 0 to 599
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Order ID       600 non-null    int64 
 1   Customer Name  600 non-null    object
 2   Product        600 non-null    object
 3   Quantity       600 non-null    int64 
dtypes: int64(2), object(2)
memory usage: 18.9+ KB
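
Because info() prints its summary instead of returning it, capturing the text takes one extra step. The sketch below is one possible way to do that, assuming a made-up DataFrame named `demo`; it relies on the `buf` and `memory_usage` parameters of info().

```python
import io

import pandas as pd

# Illustrative data with one missing customer name (hypothetical values)
demo = pd.DataFrame({
    "Customer Name": ["Veeru", "Tarun", None],
    "Quantity": [2, 3, 1],
})

result = demo.info()  # prints the summary; the return value is None

# Write the summary into a text buffer instead of printing it
buffer = io.StringIO()
demo.info(buf=buffer, memory_usage="deep")  # "deep" also measures string contents
summary = buffer.getvalue()
```

The `summary` string can then be logged or written to a file.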


---
## 4. count()
* Returns the number of non-null values of each column

In [6]:
df.count()

Order ID         600
Customer Name    600
Product          600
Quantity         600
dtype: int64
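
In the output above every column has 600 non-null values, so count() matches the number of rows. The sketch below, which assumes a small made-up DataFrame named `demo`, shows how the two differ once a value is missing.

```python
import pandas as pd

# Illustrative data with one missing customer name (hypothetical values)
demo = pd.DataFrame({
    "Customer Name": ["Veeru", "Tarun", None],
    "Quantity": [2, 3, 1],
})

non_null = demo.count()          # non-null values per column
total_rows = len(demo)           # total number of rows, missing or not
missing = total_rows - non_null  # missing values per column

print(missing)
```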

---
## 5. describe()
* By default, returns the following summary statistics for numeric columns only:
    * count
    * mean 
    * std
    * min
    * 25%
    * 50%
    * 75%
    * max

In [7]:
df.describe()

|   | Order ID | Quantity |
|---|---|---|
| count | 600.0 | 600.0 |
| mean | 167122.751667 | 1.481667 |
| std | 164.948568 | 0.683454 |
| min | 166837.0 | 1.0 |
| 25% | 166980.75 | 1.0 |
| 50% | 167120.5 | 1.0 |
| 75% | 167266.25 | 2.0 |
| max | 167409.0 | 3.0 |
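
Text columns such as Product are left out of the default output. As a hedged sketch, passing `include="all"` asks describe() to summarise every column; the DataFrame `demo` below is made up for illustration.

```python
import pandas as pd

# Illustrative data (hypothetical values)
demo = pd.DataFrame({
    "Product": ["iPhone 11", "Samsung m10", "iPhone 11"],
    "Quantity": [3, 1, 2],
})

# Adds the rows unique, top and freq for non-numeric columns;
# statistics that do not apply to a column are shown as NaN
summary_all = demo.describe(include="all")
print(summary_all)
```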


---
## 6. nunique() 
* Returns the number of unique values of each column of the DataFrame.

In [8]:
df.nunique()

Order ID         573
Customer Name     23
Product           21
Quantity           3
dtype: int64

In [9]:
print(df.nunique().iloc[0]) 

573


In [10]:
print(df.nunique()['Customer Name'])

23
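
By default nunique() ignores missing values. The sketch below, assuming a small made-up DataFrame named `demo`, shows the `dropna` parameter, which counts a missing value as one more distinct value when set to False.

```python
import pandas as pd

# Illustrative data with one missing customer name (hypothetical values)
demo = pd.DataFrame({
    "Customer Name": ["Veeru", "Tarun", None, "Veeru"],
})

without_missing = demo.nunique()             # missing value is ignored
with_missing = demo.nunique(dropna=False)    # missing value counts as a distinct value

print(without_missing["Customer Name"], with_missing["Customer Name"])
```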


---
## Accessing single column from the DataFrame
* Accessing a single column of a DataFrame returns a Series.
* Accessing two or more columns (by passing a list of column names) returns a DataFrame.

In [11]:
# Accessing a single column

print(type(df.Quantity))
print()
print(type(df['Quantity']))


<class 'pandas.core.series.Series'>

<class 'pandas.core.series.Series'>


In [12]:
# Accessing 2 or more columns

print(df[['Order ID', 'Quantity']])

     Order ID  Quantity
0      166837         2
1      166838         3
2      166839         1
3      166840         3
4      166841         2
..        ...       ...
595    167403         1
596    167404         1
597    167405         1
598    167406         2
599    167407         1

[600 rows x 2 columns]
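
What decides the return type is the kind of key, not the number of columns: a plain column name gives a Series, while a list of names gives a DataFrame even when the list holds a single name. The sketch below assumes a made-up DataFrame named `demo`. Note also that attribute-style access (df.Quantity) only works when the column name is a valid Python identifier, so a name containing a space, such as 'Order ID', needs brackets.

```python
import pandas as pd

# Illustrative data (hypothetical values)
demo = pd.DataFrame({
    "Order ID": [166837, 166838, 166839],
    "Quantity": [2, 3, 1],
})

as_series = demo["Quantity"]     # plain name -> Series
as_frame = demo[["Quantity"]]    # list with one name -> one-column DataFrame
order_ids = demo["Order ID"]     # name with a space: brackets are required

print(type(as_series))
print(type(as_frame))
```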


---
## Rearranging columns in DataFrame

In [13]:
print(df.columns)
df = df[["Product", "Customer Name", "Quantity", "Order ID"]]
print()
print(df.columns)

Index(['Order ID', 'Customer Name', 'Product', 'Quantity'], dtype='object')

Index(['Product', 'Customer Name', 'Quantity', 'Order ID'], dtype='object')
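
Typing out every column name gets tedious for wide DataFrames. One possible shortcut, sketched below on a made-up DataFrame named `demo`, builds the new order from the existing column list, here moving a single column to the front.

```python
import pandas as pd

# Illustrative data (hypothetical values)
demo = pd.DataFrame({
    "Order ID": [166837, 166838],
    "Customer Name": ["Veeru", "Tarun"],
    "Quantity": [2, 3],
})

# Put "Quantity" first and keep the remaining columns in their current order
front = "Quantity"
new_order = [front] + [col for col in demo.columns if col != front]
reordered = demo[new_order]

print(reordered.columns.tolist())
```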


---


## Renaming Columns

### rename(p) method - for changing single col name

In [3]:
df = pd.read_csv("../datasets/sales3.csv")

In [4]:
df.head()

Unnamed: 0,ord id,cust name,cust id,prod name,prod cost
0,192837,Veeru,3,LG Mobile,65999
1,192838,Neelima,19,Apple iPad 10.2-inch,63000
2,192839,Balaji,12,34in Ultrawide Monitor,75999
3,192840,Shahid,20,iPhone 11,60000
4,192841,Vinay,10,Bose SoundSport Headphones,69999


In [5]:
df.columns

Index(['ord id', 'cust name', 'cust id', 'prod name', 'prod cost'], dtype='object')

In [6]:
d = {
 "ord id": "Order Id"
}
df1 = df.rename(columns=d)

In [8]:
df1.columns

Index(['Order Id', 'cust name', 'cust id', 'prod name', 'prod cost'], dtype='object')

Here, the column 'ord id' has been renamed to 'Order Id' in df1. The original df is unchanged, because rename() returns a new DataFrame.

---

### rename(p) method - Changing multiple column names

In [9]:
d = {
 'ord id': 'Order Id', 
 'cust name': 'Customer Name', 
 'cust id': 'Customer Id', 
 'prod name': 'Product Name', 
 'prod cost': 'Product Cost'
}
df2 = df.rename(columns=d)

In [10]:
df2.columns

Index(['Order Id', 'Customer Name', 'Customer Id', 'Product Name',
       'Product Cost'],
      dtype='object')

Here, all the column names have been changed in df2.
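
A pitfall with dictionary-based renaming is that keys which match no existing column are silently ignored by default. The sketch below, using a made-up DataFrame named `demo` and a deliberately misspelled key, shows the `errors="raise"` option of rename(), which turns such a typo into a KeyError.

```python
import pandas as pd

# Illustrative data (hypothetical values)
demo = pd.DataFrame({
    "ord id": [192837, 192838],
    "cust name": ["Veeru", "Neelima"],
})

typo = {"order id": "Order Id"}  # key does not match the real name "ord id"

# Default behaviour: nothing is renamed and no error is reported
unchanged = demo.rename(columns=typo)

# With errors="raise", the unmatched key is reported
try:
    demo.rename(columns=typo, errors="raise")
    raised = False
except KeyError:
    raised = True

print(unchanged.columns.tolist(), raised)
```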

---
### columns attribute - Changing multiple column names
* We can change the column names directly by assigning a list of new names to the columns attribute.

In [11]:
df.columns = [
 "order_id", 
 "customer_name", 
 "customer_id", 
 "product_name", 
 "product_cost"
]

In [12]:
df.columns

Index(['order_id', 'customer_name', 'customer_id', 'product_name',
       'product_cost'],
      dtype='object')

* The new names are assigned by position, so list them in the same order as the existing columns.
* Otherwise, columns end up with the wrong names.
* The number of new names must also equal the number of existing columns; otherwise pandas raises a ValueError.
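
The two rules above can be checked on a small example. The sketch below assumes a made-up two-column DataFrame named `demo`: assigning one name to two columns fails, and assigning names in the wrong order succeeds but labels the data incorrectly.

```python
import pandas as pd

# Illustrative data (hypothetical values)
demo = pd.DataFrame({
    "ord id": [192837, 192838],
    "cust name": ["Veeru", "Neelima"],
})

# Too few names: pandas refuses the assignment
message = None
try:
    demo.columns = ["order_id"]
except ValueError as exc:
    message = str(exc)

# Right count, wrong order: accepted, but the labels no longer describe the data
demo.columns = ["customer_name", "order_id"]
mislabeled = demo["customer_name"].tolist()  # actually holds the order ids

print(message)
print(mislabeled)
```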

---
## Renaming Index

### rename(p) - Change the Index values in DataFrame

In [15]:
df.head()

|   | order_id | customer_name | customer_id | product_name | product_cost |
|---|---|---|---|---|---|
| 0 | 192837 | Veeru | 3 | LG Mobile | 65999 |
| 1 | 192838 | Neelima | 19 | Apple iPad 10.2-inch | 63000 |
| 2 | 192839 | Balaji | 12 | 34in Ultrawide Monitor | 75999 |
| 3 | 192840 | Shahid | 20 | iPhone 11 | 60000 |
| 4 | 192841 | Vinay | 10 | Bose SoundSport Headphones | 69999 |


In [21]:
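# Map each existing index label (0, 1, 2, ...) to a new label starting at 77
index = {i: i + 77 for i in range(len(df))}

# rename() returns a new DataFrame; df keeps its original index
df1 = df.rename(index=index)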

In [22]:
df1.head()

|   | order_id | customer_name | customer_id | product_name | product_cost |
|---|---|---|---|---|---|
| 77 | 192837 | Veeru | 3 | LG Mobile | 65999 |
| 78 | 192838 | Neelima | 19 | Apple iPad 10.2-inch | 63000 |
| 79 | 192839 | Balaji | 12 | 34in Ultrawide Monitor | 75999 |
| 80 | 192840 | Shahid | 20 | iPhone 11 | 60000 |
| 81 | 192841 | Vinay | 10 | Bose SoundSport Headphones | 69999 |
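
Besides a dictionary, rename() also accepts a function that is applied to every index label. The sketch below is a possible alternative to building the mapping by hand, shown on a made-up DataFrame named `demo`.

```python
import pandas as pd

# Illustrative data (hypothetical values)
demo = pd.DataFrame({
    "order_id": [192837, 192838, 192839],
    "customer_name": ["Veeru", "Neelima", "Balaji"],
})

# Shift every index label by 77; the original DataFrame is left untouched
shifted = demo.rename(index=lambda label: label + 77)

print(shifted.index.tolist())
```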


---
### index attribute - Change the index of the DataFrame records

Using a list comprehension:

In [29]:
df.index = [i for i in range(77, 77 + len(df))]

In [30]:
df.head()

|   | order_id | customer_name | customer_id | product_name | product_cost |
|---|---|---|---|---|---|
| 77 | 192837 | Veeru | 3 | LG Mobile | 65999 |
| 78 | 192838 | Neelima | 19 | Apple iPad 10.2-inch | 63000 |
| 79 | 192839 | Balaji | 12 | 34in Ultrawide Monitor | 75999 |
| 80 | 192840 | Shahid | 20 | iPhone 11 | 60000 |
| 81 | 192841 | Vinay | 10 | Bose SoundSport Headphones | 69999 |


Using the range() function:

In [31]:
df.index = range(77, 77 + len(df))

In [32]:
df.head()

|   | order_id | customer_name | customer_id | product_name | product_cost |
|---|---|---|---|---|---|
| 77 | 192837 | Veeru | 3 | LG Mobile | 65999 |
| 78 | 192838 | Neelima | 19 | Apple iPad 10.2-inch | 63000 |
| 79 | 192839 | Balaji | 12 | 34in Ultrawide Monitor | 75999 |
| 80 | 192840 | Shahid | 20 | iPhone 11 | 60000 |
| 81 | 192841 | Vinay | 10 | Bose SoundSport Headphones | 69999 |
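
Unlike rename(), assigning to the index attribute changes the DataFrame in place. If the default 0-based index is needed again, one option is reset_index(drop=True), sketched below on a made-up DataFrame named `demo`.

```python
import pandas as pd

# Illustrative data (hypothetical values)
demo = pd.DataFrame({
    "order_id": [192837, 192838, 192839],
})

demo.index = range(77, 77 + len(demo))   # in-place change: labels 77, 78, 79
custom_labels = demo.index.tolist()

# drop=True discards the old labels instead of keeping them as a new column
restored = demo.reset_index(drop=True)

print(custom_labels, restored.index.tolist())
```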


---

## Bonus

### Convert column names to upper/lower case in DataFrame

In [40]:
df.columns

Index(['order_id', 'customer_name', 'customer_id', 'product_name',
       'product_cost'],
      dtype='object')

In [41]:
df.columns.str.upper()

Index(['ORDER_ID', 'CUSTOMER_NAME', 'CUSTOMER_ID', 'PRODUCT_NAME',
       'PRODUCT_COST'],
      dtype='object')
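
Note that df.columns.str.upper() only returns a new Index; the DataFrame keeps its original names until the result is assigned back. A minimal sketch, assuming a made-up DataFrame named `demo`:

```python
import pandas as pd

# Illustrative data (hypothetical values)
demo = pd.DataFrame({
    "order_id": [192837],
    "customer_name": ["Veeru"],
})

upper_names = demo.columns.str.upper()   # new Index, demo is not modified yet
before = demo.columns.tolist()

demo.columns = upper_names               # assign back to apply the change
after = demo.columns.tolist()

print(before, after)
```

The same pattern works with str.lower() for lower-case names.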

---

Thank You!! \
Happy Learning :)

---