# 1. Basic DataFrame Practice
1.   Install the pandas and gdown packages
*   **"Pandas"** is a Python library for data manipulation and analysis. Its central data structure, the **DataFrame**, is a table of rows and columns, similar to an Excel spreadsheet, which simplifies the handling of tabular data. Pandas can read data from various sources, clean and preprocess it, perform statistical analysis, and visualize it.
*   **"gdown"** is a Python package for downloading files hosted on **Google Drive** directly from Python code.

In [1]:
!pip3 install gdown
!pip3 install pandas



In [3]:
import gdown  ## Import gdown to use gdown.download

## Download the file at the given URL and save it as ns_book6.csv
gdown.download('https://bit.ly/3736JW1', 'ns_book6.csv', quiet=False)

Downloading...
From: https://bit.ly/3736JW1
To: /content/ns_book6.csv
100%|██████████| 55.0M/55.0M [00:01<00:00, 28.7MB/s]


'ns_book6.csv'

In [82]:
import pandas as pd  ## Import pandas under the conventional alias pd

## A small example DataFrame with a two-level row index (MultiIndex) for the exercises
df = pd.DataFrame(
    {"a" : [4,5,6],
     "b" : [7,8,9],
     "c" : [10,11,12]},
    index = pd.MultiIndex.from_tuples(
        [('d',1),('d',2),('e',2)], names = ['n','v']))

df

n,v,a,b,c
d,1,4,7,10
d,2,5,8,11
e,2,6,9,12


1. **pandas.melt**
2. Usage: pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)
3. Description: Unpivot a DataFrame from wide to long format, optionally leaving identifiers set. This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are "unpivoted" to the row axis, leaving just two non-identifier columns, 'variable' and 'value'.

In [7]:
pd.melt(df)  ## No id_vars given, so every column is unpivoted; the original index is dropped

Unnamed: 0,variable,value
0,a,4
1,a,5
2,a,6
3,b,7
4,b,8
5,b,9
6,c,10
7,c,11
8,c,12
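
The call above discards the `n` and `v` index levels because `ignore_index=True` is the default. As a minimal sketch, assuming we want to keep those levels as identifiers, the index can first be moved into columns with `reset_index()` and then passed as `id_vars`. The names `long_df`, `column`, and `reading` are illustrative choices, not part of the original exercise.

```python
import pandas as pd

# Rebuild the exercise DataFrame so this cell runs on its own.
df = pd.DataFrame(
    {"a": [4, 5, 6],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=pd.MultiIndex.from_tuples(
        [("d", 1), ("d", 2), ("e", 2)], names=["n", "v"]))

# Move the index levels n and v into ordinary columns, then keep them
# as identifier variables while a, b, and c are unpivoted into rows.
long_df = pd.melt(
    df.reset_index(),
    id_vars=["n", "v"],
    var_name="column",      # illustrative name for the 'variable' column
    value_name="reading",   # illustrative name for the 'value' column
)
print(long_df)
```

Each of the 3 original rows now appears once per measured column, giving 9 rows that still carry their `n` and `v` identifiers.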


1. **pandas.concat**
2. Usage: pandas.concat(objs, *, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=None)
3. Description: Concatenate pandas objects along a particular axis. Allows optional set logic along the other axes. Can also add a layer of hierarchical indexing on the concatenation axis, which may be useful if the labels are the same (or overlapping) on the passed axis number.

In [11]:
df2 = pd.DataFrame(
    {"a" : ["a0","a1","a2"],
     "b" : ["b0","b1","b2"]
     })

df3 = pd.DataFrame(
    {"a" : ["a3","a4","a5"],
     "b" : ["b0","b1","b2"]
     })

In [12]:
df2

Unnamed: 0,a,b
0,a0,b0
1,a1,b1
2,a2,b2


In [13]:
df3

Unnamed: 0,a,b
0,a3,b0
1,a4,b1
2,a5,b2


In [15]:
pd.concat([df2,df3])

Unnamed: 0,a,b
0,a0,b0
1,a1,b1
2,a2,b2
0,a3,b0
1,a4,b1
2,a5,b2


In [16]:
pd.concat([df2, df3], axis=1)  ## axis=1 places the frames side by side, aligned on the index

Unnamed: 0,a,b,a,b
0,a0,b0,a3,b0
1,a1,b1,a4,b1
2,a2,b2,a5,b2
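
The row-wise result of `pd.concat([df2, df3])` repeats the index labels 0, 1, 2. The sketch below shows two documented options for handling that: `ignore_index=True` to renumber the rows, and `keys` to add the outer hierarchical level mentioned in the description. The key labels `"first"` and `"second"` are arbitrary names chosen for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({"a": ["a0", "a1", "a2"], "b": ["b0", "b1", "b2"]})
df3 = pd.DataFrame({"a": ["a3", "a4", "a5"], "b": ["b0", "b1", "b2"]})

# ignore_index=True discards the original labels and renumbers the rows 0..5.
renumbered = pd.concat([df2, df3], ignore_index=True)

# keys adds an outer index level recording which input each row came from.
labeled = pd.concat([df2, df3], keys=["first", "second"])

print(renumbered.index.tolist())
print(labeled.loc["second"])  # only the rows that came from df3
```

With `keys`, the original labels are preserved under each key, so a row can be addressed as `labeled.loc[("first", 0)]`.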


In [47]:
## New DataFrames (note that df3 is redefined here)
df3 = pd.DataFrame(
    {"a" : ["a0","a1","a2","a3","a4"],
     "b" : ["b0","b1","b2","b3","b4"],
     "c" : ["c0","c1","c2","c3","c4"],
     "d" : ["d0","d1","d2","d3","d4"]
     })

df4 = pd.DataFrame(
    {"a" : ["a0","a1","a2","a3","a4"],
     "b" : ["b0","b4","b5","b6","b7"],
     "c" : ["c0","c2","c3","c4","c5"],
     "d" : ["d0","d1","d2","d3","d4"]
     })

df5 = pd.concat([df3,df4])
df5

Unnamed: 0,a,b,c,d
0,a0,b0,c0,d0
1,a1,b1,c1,d1
2,a2,b2,c2,d2
3,a3,b3,c3,d3
4,a4,b4,c4,d4
0,a0,b0,c0,d0
1,a1,b4,c2,d1
2,a2,b5,c3,d2
3,a3,b6,c4,d3
4,a4,b7,c5,d4


In [48]:
row_count_df5 = len(df5)  ## len() of a DataFrame is its number of rows
row_count_df5

10

In [49]:
df5[0:7]  ## Slice the first 7 rows (positions 0 through 6)

Unnamed: 0,a,b,c,d
0,a0,b0,c0,d0
1,a1,b1,c1,d1
2,a2,b2,c2,d2
3,a3,b3,c3,d3
4,a4,b4,c4,d4
0,a0,b0,c0,d0
1,a1,b4,c2,d1


In [52]:
df6 = df5.drop_duplicates()
## Remove fully duplicated rows, keeping the first occurrence
## The second [a0, b0, c0, d0] row is dropped
df6

Unnamed: 0,a,b,c,d
0,a0,b0,c0,d0
1,a1,b1,c1,d1
2,a2,b2,c2,d2
3,a3,b3,c3,d3
4,a4,b4,c4,d4
1,a1,b4,c2,d1
2,a2,b5,c3,d2
3,a3,b6,c4,d3
4,a4,b7,c5,d4
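
By default `drop_duplicates()` compares all columns and keeps the first occurrence. As a rough sketch, the `subset` and `keep` parameters change both choices, and `duplicated()` exposes the boolean mask behind the operation. The small `frame` below is made-up data for illustration only.

```python
import pandas as pd

# Made-up data: rows 0 and 2 are identical in every column.
frame = pd.DataFrame({
    "a": ["a0", "a1", "a0", "a1"],
    "b": ["b0", "b1", "b0", "b9"],
})

# Default behaviour: compare all columns, keep the first occurrence.
all_cols = frame.drop_duplicates()

# Compare only column "a" and keep the last occurrence of each value.
by_a_last = frame.drop_duplicates(subset=["a"], keep="last")

# duplicated() marks the rows that drop_duplicates() would remove.
mask = frame.duplicated()

print(all_cols.index.tolist())
print(by_a_last.index.tolist())
print(mask.tolist())
```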


In [53]:
len(df6)

9

In [54]:
df6.head(2)

Unnamed: 0,a,b,c,d
0,a0,b0,c0,d0
1,a1,b1,c1,d1


In [56]:
df6.tail(1)

Unnamed: 0,a,b,c,d
4,a4,b7,c5,d4


1. **pandas.DataFrame.sample**
2. Usage: DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)
3. Description: Return a random sample of items from an axis of the object.

In [57]:
df6.sample(frac=0.5)  ## Randomly sample 50% of the rows

Unnamed: 0,a,b,c,d
2,a2,b2,c2,d2
1,a1,b1,c1,d1
0,a0,b0,c0,d0
1,a1,b4,c2,d1


In [59]:
df6.sample(n=2)  ## Randomly sample exactly 2 rows

Unnamed: 0,a,b,c,d
1,a1,b4,c2,d1
1,a1,b1,c1,d1
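
Because `sample` is random, the rows shown above will differ from run to run. A minimal sketch of two other parameters from the signature: `random_state` makes the draw repeatable, and `replace=True` allows drawing more rows than the frame contains. The frame and the seed value 0 are arbitrary choices for illustration.

```python
import pandas as pd

frame = pd.DataFrame({"a": range(10)})

# The same seed yields the same sample on every call.
first = frame.sample(n=3, random_state=0)
second = frame.sample(n=3, random_state=0)
print(first.equals(second))

# Sampling with replacement can return more rows than the frame has,
# so some rows necessarily appear more than once.
boot = frame.sample(n=15, replace=True, random_state=0)
print(len(boot))
```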


In [61]:
df6.iloc[1:4]  ## iloc selects rows by integer position; 1:4 returns positions 1, 2, and 3

Unnamed: 0,a,b,c,d
1,a1,b1,c1,d1
2,a2,b2,c2,d2
3,a3,b3,c3,d3
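
Since `df6` came from a concatenation, several of its index labels occur twice, which is where position-based and label-based selection diverge. The sketch below contrasts `iloc` with `loc` on a small made-up frame that has a repeated label.

```python
import pandas as pd

# Made-up frame in which the label 1 appears twice.
frame = pd.DataFrame({"a": ["a0", "a1", "a2", "b1"]}, index=[0, 1, 2, 1])

# iloc counts positions from 0 and excludes the end of the slice.
by_position = frame.iloc[1:3]

# loc matches labels, so it returns every row labelled 1.
by_label = frame.loc[1]

print(by_position["a"].tolist())
print(by_label["a"].tolist())
```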


In [67]:
df6[0:6]

Unnamed: 0,a,b,c,d
0,a0,b0,c0,d0
1,a1,b1,c1,d1
2,a2,b2,c2,d2
3,a3,b3,c3,d3
4,a4,b4,c4,d4
1,a1,b4,c2,d1


In [78]:
col = ['col1','col2','col3']
row = ['row3','row5','row1','row4','row2']
data = [[ 1, 21, 7],
        [ 2, 33, 3],
        [ 2,  7,97],
        [ 4, 56,31],
        [ 5, 18, 5]]
df7 = pd.DataFrame(data=data, index=row, columns=col)
df7

Unnamed: 0,col1,col2,col3
row3,1,21,7
row5,2,33,3
row1,2,7,97
row4,4,56,31
row2,5,18,5


In [79]:
df7.nlargest(n=3, columns='col1', keep='first')  ## 3 rows with the largest col1; ties prioritize the first occurrence

Unnamed: 0,col1,col2,col3
row2,5,18,5
row4,4,56,31
row5,2,33,3


In [81]:
df7.nlargest(n=3, columns='col1', keep='last')  ## Same, but ties prioritize the last occurrence

Unnamed: 0,col1,col2,col3
row2,5,18,5
row4,4,56,31
row1,2,7,97


In [85]:
df7.nsmallest(n=3, columns='col1', keep='first')  ## 3 rows with the smallest col1; ties prioritize the first occurrence

Unnamed: 0,col1,col2,col3
row3,1,21,7
row5,2,33,3
row1,2,7,97


In [86]:
df7.nsmallest(n=3, columns='col1', keep='last')  ## Same, but ties prioritize the last occurrence

Unnamed: 0,col1,col2,col3
row3,1,21,7
row1,2,7,97
row5,2,33,3
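
The `keep` argument only matters because `row5` and `row1` tie at `col1 == 2`. As a sketch of two further documented options, `keep='all'` returns every tied row even if that exceeds `n`, and passing a list of columns uses the later columns to break ties. The variable names `top_all` and `top_tiebreak` are illustrative.

```python
import pandas as pd

# Rebuild df7 so this cell runs on its own.
df7 = pd.DataFrame(
    [[1, 21, 7], [2, 33, 3], [2, 7, 97], [4, 56, 31], [5, 18, 5]],
    index=["row3", "row5", "row1", "row4", "row2"],
    columns=["col1", "col2", "col3"],
)

# keep='all' keeps both rows tied at col1 == 2, so 4 rows come back for n=3.
top_all = df7.nlargest(n=3, columns="col1", keep="all")

# With a list of columns, col2 breaks the tie: row5 (33) beats row1 (7).
top_tiebreak = df7.nlargest(n=3, columns=["col1", "col2"])

print(top_all)
print(top_tiebreak)
```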


In [90]:
df7.sort_values('col1')

Unnamed: 0,col1,col2,col3
row3,1,21,7
row5,2,33,3
row1,2,7,97
row4,4,56,31
row2,5,18,5


In [91]:
df7.sort_values('col2', ascending = False)

Unnamed: 0,col1,col2,col3
row4,4,56,31
row5,2,33,3
row3,1,21,7
row2,5,18,5
row1,2,7,97
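
`sort_values` also accepts a list of columns, in which case later columns order the rows that tie on earlier ones. A minimal sketch using the same data as `df7`, where `col2` decides the order of the two rows with `col1 == 2`:

```python
import pandas as pd

# Rebuild df7 so this cell runs on its own.
df7 = pd.DataFrame(
    [[1, 21, 7], [2, 33, 3], [2, 7, 97], [4, 56, 31], [5, 18, 5]],
    index=["row3", "row5", "row1", "row4", "row2"],
    columns=["col1", "col2", "col3"],
)

# Sort by col1 first; rows with equal col1 are then ordered by col2.
by_two = df7.sort_values(["col1", "col2"])

# ascending can be given per column: col1 descending, col2 ascending.
mixed = df7.sort_values(["col1", "col2"], ascending=[False, True])

print(by_two)
print(mixed)
```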


In [92]:
df7.rename(columns={'col1': 'scol1'})  ## Returns a renamed copy; df7 itself is unchanged

Unnamed: 0,scol1,col2,col3
row3,1,21,7
row5,2,33,3
row1,2,7,97
row4,4,56,31
row2,5,18,5
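
Like most of the methods in this exercise, `rename` returns a new DataFrame rather than modifying `df7`, so the result has to be assigned to a name to be kept. The short sketch below illustrates this and also renames a row label through the `index` argument. The new labels `scol1` and `r3` are arbitrary.

```python
import pandas as pd

# Rebuild df7 so this cell runs on its own.
df7 = pd.DataFrame(
    [[1, 21, 7], [2, 33, 3], [2, 7, 97], [4, 56, 31], [5, 18, 5]],
    index=["row3", "row5", "row1", "row4", "row2"],
    columns=["col1", "col2", "col3"],
)

# Assign the result; the original frame keeps its old column name.
renamed = df7.rename(columns={"col1": "scol1"})

# The index argument renames row labels in the same way.
relabeled = df7.rename(index={"row3": "r3"})

print(list(renamed.columns))
print(list(df7.columns))
print(relabeled.index.tolist())
```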


In [93]:
df7.sort_index()

Unnamed: 0,col1,col2,col3
row1,2,7,97
row2,5,18,5
row3,1,21,7
row4,4,56,31
row5,2,33,3


In [94]:
df7.reset_index()  ## Move the row labels into a column named 'index' and number the rows from 0

Unnamed: 0,index,col1,col2,col3
0,row3,1,21,7
1,row5,2,33,3
2,row1,2,7,97
3,row4,4,56,31
4,row2,5,18,5
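
Two common variations on `reset_index()`, shown as a sketch: `drop=True` discards the old labels instead of keeping them as a column, and naming the index first with `rename_axis` gives the new column a more descriptive header than `index`. The name `row_label` is a hypothetical choice.

```python
import pandas as pd

# Rebuild df7 so this cell runs on its own.
df7 = pd.DataFrame(
    [[1, 21, 7], [2, 33, 3], [2, 7, 97], [4, 56, 31], [5, 18, 5]],
    index=["row3", "row5", "row1", "row4", "row2"],
    columns=["col1", "col2", "col3"],
)

# drop=True throws the old labels away and keeps only the data columns.
flat = df7.reset_index(drop=True)

# Name the index first so the resulting column is called row_label.
named = df7.rename_axis("row_label").reset_index()

print(flat)
print(named)
```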
