# Some useful methods in pandas
[Documentation](https://pandas.pydata.org/pandas-docs/stable/reference/index.html)


* [apply() method](#apply_method)
* [apply() with a function](#apply_function)
* [apply() on multiple columns](#apply_multiple)
* [describe()](#describe)
* [sort_values()](#sort_values)


<a id='apply_method'></a>
## .apply() method
This method lets us apply a custom function to every value of a DataFrame column (a pandas Series) and collect the results in a new Series.


In [3]:
import pandas as pd
import numpy as np

In [14]:
ls "..\\..\\data-science-with-python\\0.datasets\\tips.csv"

 Volume in drive P is Practice
 Volume Serial Number is F82D-6E56

 Directory of P:\Projects\data-science-with-python\0.datasets

04-12-2024  03:01 PM            18,752 tips.csv
               1 File(s)         18,752 bytes
               0 Dir(s)  159,582,044,160 bytes free


In [4]:
df = pd.read_csv("..\\..\\data-science-with-python\\0.datasets\\tips.csv")


In [17]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


<a id='apply_function'></a>
### apply() with a function
We can write our own function and apply it to each value of a pandas Series in `df`.

In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   total_bill        244 non-null    float64
 1   tip               244 non-null    float64
 2   sex               244 non-null    object 
 3   smoker            244 non-null    object 
 4   day               244 non-null    object 
 5   time              244 non-null    object 
 6   size              244 non-null    int64  
 7   price_per_person  244 non-null    float64
 8   Payer Name        244 non-null    object 
 9   CC Number         244 non-null    int64  
 10  Payment ID        244 non-null    object 
dtypes: float64(3), int64(2), object(6)
memory usage: 21.1+ KB


'3454'

In [20]:
def last_four(num):
    # 'CC Number' is stored as int64, so convert to str before slicing
    return str(num)[-4:]

In [30]:
# Fetch the last four digits of the credit card number
df['last_four'] = df['CC Number'].apply(last_four)
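
For simple string slicing like this, the `.str` accessor can do the same job without a custom function. A minimal sketch, using two made-up card numbers (not rows from tips.csv):

```python
import pandas as pd

# Hypothetical card numbers, invented for illustration only
demo = pd.DataFrame({"CC Number": [1234567890123456, 9876543210]})

def last_four(num):
    return str(num)[-4:]

# Element-wise: the Python function is called once per value
via_apply = demo["CC Number"].apply(last_four)

# Same result with the vectorised string accessor
via_str = demo["CC Number"].astype(str).str[-4:]

print(via_apply.tolist())
print(via_str.tolist())
```

Both give strings, so leading zeros in the last four digits are preserved.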

In [34]:
# Create a column that rates customers based on their total bill
#can decorate these functions
def rating(bill):
    """Assign '$' to customers with a total bill below 10,
    '$$' to customers with a total bill from 10 to 30 (inclusive),
    and '$$$' to customers with a total bill above 30.
    """
    if bill < 10:
        return '$'
    elif bill <= 30:  # bill >= 10 is already guaranteed here
        return '$$'
    else:
        return '$$$'

In [35]:
df['rating'] =  df['total_bill'].apply(rating)
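
The same tiers can also be computed on the whole column at once with `np.select`, which picks the first condition that matches. This is only a sketch on a few illustrative bill amounts; the `bills` Series is an assumption standing in for `df['total_bill']`:

```python
import numpy as np
import pandas as pd

# A few illustrative bill amounts, including both boundary values
bills = pd.Series([8.77, 10.0, 24.59, 30.0, 50.81], name="total_bill")

def rating(bill):
    if bill < 10:
        return "$"
    elif bill <= 30:
        return "$$"
    return "$$$"

# Element-wise version
via_apply = bills.apply(rating)

# Column-wise version: conditions are checked in order, first match wins
conditions = [bills < 10, bills <= 30]
choices = ["$", "$$"]
via_select = np.select(conditions, choices, default="$$$")

print(via_apply.tolist())
print(list(via_select))
```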

In [33]:
df[df['rating'] == '$']

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,rating,last_four
6,8.77,2.0,Male,No,Sun,Dinner,2,4.38,Kristopher Johnson,2223727524230344,Sun5985,$,344
30,9.55,1.45,Male,No,Sat,Dinner,2,4.78,Grant Hall,30196517521548,Sat4099,$,1548
43,9.68,1.32,Male,No,Sun,Dinner,2,4.84,Christopher Spears,4387671121369212,Sun3279,$,9212
53,9.94,1.56,Male,No,Sun,Dinner,2,4.97,Curtis Morgan,4628628020417301,Sun4561,$,7301
67,3.07,1.0,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455,$,5267
92,5.75,1.0,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,Fri3780,$,6392
111,7.25,1.0,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801,$,6887
126,8.52,1.48,Male,No,Thur,Lunch,2,4.26,Mario Bradshaw,4524404353861811,Thur6719,$,1811
135,8.51,1.25,Female,No,Thur,Lunch,2,4.26,Rebecca Harris,4320272020376174,Thur6600,$,6174
145,8.35,1.5,Female,No,Thur,Lunch,2,4.18,Amy Young,4285454264477,Thur9331,$,4477


<a id = 'apply_multiple'></a>
### apply() on multiple columns

In [41]:
def rate_tip(tip, total_bill):
    """Rate how generous a customer's tip is relative to the total bill."""
    if tip/total_bill > .25:
        return "Generous"
    return "low tip"
   

In [44]:
rate_tip(10,50)

'low tip'

In [45]:
# axis=1 passes each row to the lambda; 'row' avoids shadowing the outer df
df['tips_rate'] = df[['tip','total_bill']].apply(lambda row: rate_tip(row['tip'], row['total_bill']), axis=1)

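### vectorize()  
`np.vectorize` wraps **a plain Python function so that it accepts NumPy arrays** and calls it once per element.  
`rate_tip()` is a basic Python function that works on scalars, so it cannot take whole columns directly.
Since a pandas Series is backed by a NumPy array, wrapping `rate_tip()` with `np.vectorize` lets us pass the `tip` and `total_bill` columns in a single call. This is usually faster than `apply(..., axis=1)`, but the NumPy documentation notes that `vectorize` is essentially a loop provided for convenience, not performance.
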

In [47]:
df['tips_rate'] = np.vectorize(rate_tip)(df['tip'],df['total_bill'])
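
For a simple threshold like this one, the comparison can also be written directly on whole columns with `np.where`, so no Python function is called per row. A minimal sketch on made-up values, comparing the three approaches:

```python
import numpy as np
import pandas as pd

# Made-up tips and bills for illustration (ratios 0.3, 0.1, 0.2)
demo = pd.DataFrame({"tip": [3.0, 1.0, 10.0],
                     "total_bill": [10.0, 10.0, 50.0]})

def rate_tip(tip, total_bill):
    if tip / total_bill > 0.25:
        return "Generous"
    return "low tip"

# 1) Row by row with apply()
row_wise = demo.apply(lambda row: rate_tip(row["tip"], row["total_bill"]), axis=1)

# 2) Element by element through np.vectorize
vectorized = np.vectorize(rate_tip)(demo["tip"], demo["total_bill"])

# 3) Whole columns at once: the ratio and comparison run inside NumPy
column_wise = np.where(demo["tip"] / demo["total_bill"] > 0.25, "Generous", "low tip")

print(row_wise.tolist())
print(list(vectorized))
print(list(column_wise))
```

The third form only works when the logic can be expressed with array operations; `apply()` and `np.vectorize` remain useful for functions that cannot.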

In [49]:
df['tips_rate']

0      low tip
1      low tip
2      low tip
3      low tip
4      low tip
        ...   
239    low tip
240    low tip
241    low tip
242    low tip
243    low tip
Name: tips_rate, Length: 244, dtype: object

In [50]:
df

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,rating,last_four,tips_rate
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,$$,3410,low tip
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,$$,9230,low tip
2,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458,$$,1322,low tip
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,$$,5994,low tip
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,$$,7221,low tip
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657,$$,2842,low tip
240,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766,$$,5404,low tip
241,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880,$$,7196,low tip
242,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,Sat17,$$,0950,low tip


<a id = 'describe'></a>
### describe()
This method returns summary statistics (count, mean, standard deviation, min, quartiles and max) for the numeric columns of the DataFrame.

In [5]:
df.describe()

Unnamed: 0,total_bill,tip,size,price_per_person,CC Number
count,244.0,244.0,244.0,244.0,244.0
mean,19.785943,2.998279,2.569672,7.888197,2563496000000000.0
std,8.902412,1.383638,0.9511,2.914234,2369340000000000.0
min,3.07,1.0,1.0,2.88,60406790000.0
25%,13.3475,2.0,2.0,5.8,30407310000000.0
50%,17.795,2.9,2.0,7.255,3525318000000000.0
75%,24.1275,3.5625,3.0,9.39,4553675000000000.0
max,50.81,10.0,6.0,20.27,6596454000000000.0
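
By default only numeric columns are summarised. Passing `include="all"` also reports `unique`, `top` and `freq` for text columns. A small sketch, using the `total_bill` and `sex` values of the first four rows shown by `df.head()` above:

```python
import pandas as pd

# First four rows of total_bill and sex, as shown by df.head()
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "sex": ["Female", "Male", "Male", "Male"],
})

# Default: numeric columns only
numeric_summary = demo.describe()

# All columns: statistics that do not apply to a column are NaN
full_summary = demo.describe(include="all")

print(numeric_summary)
print(full_summary)
```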


<a id = 'sort_values'></a>
### sort_values()
Sorts the DataFrame by the values of the given column(s). When a list of columns is passed, later columns break ties in the earlier ones.

In [6]:
df.sort_values('tip')

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
67,3.07,1.00,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455
236,12.60,1.00,Male,Yes,Sat,Dinner,2,6.30,Matthew Myers,3543676378973965,Sat5032
92,5.75,1.00,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,Fri3780
111,7.25,1.00,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
...,...,...,...,...,...,...,...,...,...,...,...
141,34.30,6.70,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,Thur1025
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139
23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,Sat239
212,48.33,9.00,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590


In [7]:
df.sort_values(['tip','total_bill'])

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
67,3.07,1.00,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455
92,5.75,1.00,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,Fri3780
111,7.25,1.00,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801
236,12.60,1.00,Male,Yes,Sat,Dinner,2,6.30,Matthew Myers,3543676378973965,Sat5032
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
...,...,...,...,...,...,...,...,...,...,...,...
141,34.30,6.70,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,Thur1025
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139
23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,Sat239
212,48.33,9.00,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590
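
The sort direction can be set per column by passing a list to `ascending`. A minimal sketch on a four-row frame (values picked for illustration): largest tips first, and among equal tips the smallest bill first.

```python
import pandas as pd

# Small illustrative frame with a tie on 'tip'
demo = pd.DataFrame({"tip": [1.0, 1.0, 9.0, 6.7],
                     "total_bill": [12.60, 3.07, 48.33, 34.30]})

# tip descending, then total_bill ascending to break ties
ordered = demo.sort_values(["tip", "total_bill"], ascending=[False, True])

print(ordered)
```

`sort_values()` returns a new DataFrame; `demo` itself keeps its original order.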


In [8]:
df['tip'].max()

10.0

In [9]:
# value_counts() -- gives the number of occurrences of each value in the column
df['sex'].value_counts()

Male      157
Female     87
Name: sex, dtype: int64

In [10]:
df['smoker'].value_counts()

No     151
Yes     93
Name: smoker, dtype: int64

In [11]:
# unique() -- returns the unique values in the series/column
df['day'].unique()

array(['Sun', 'Sat', 'Thur', 'Fri'], dtype=object)

In [12]:
# nunique() -- returns the number of unique values in the series/column
df['day'].nunique()

4

In [13]:
df['sex'].replace('Female','F')

0         F
1      Male
2      Male
3      Male
4         F
       ... 
239    Male
240       F
241    Male
242    Male
243       F
Name: sex, Length: 244, dtype: object

In [14]:
df

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251
...,...,...,...,...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657
240,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766
241,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880
242,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,Sat17


In [15]:
df['sex'].replace(['Female','Male'],['F','M'])

0      F
1      M
2      M
3      M
4      F
      ..
239    M
240    F
241    M
242    M
243    F
Name: sex, Length: 244, dtype: object

In [16]:
my_map = { 'Female': 'F' , 'Male': 'M'}
df['sex'].map(my_map)

0      F
1      M
2      M
3      M
4      F
      ..
239    M
240    F
241    M
242    M
243    F
Name: sex, Length: 244, dtype: object
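
`replace()` and `map()` give the same result here because every value has an entry in the dictionary. They differ when a value is missing from it: `replace()` leaves the value unchanged, while `map()` returns `NaN`. A sketch with a hypothetical extra value `'Other'` that does not occur in tips.csv:

```python
import pandas as pd

# 'Other' is a hypothetical value added to show the difference
sex = pd.Series(["Female", "Male", "Other"])
my_map = {"Female": "F", "Male": "M"}

replaced = sex.replace(my_map)  # unmapped values are kept as-is
mapped = sex.map(my_map)        # unmapped values become NaN

print(replaced.tolist())
print(mapped.tolist())
```

Like the calls above, both return a new Series; `sex` itself is unchanged unless the result is assigned back.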

### duplicated()
Returns True for every value/record in the Series/column or DataFrame that duplicates an earlier one.
The first occurrence of a value is marked False.


In [19]:
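df.duplicated()         # compares whole rows (not displayed: only the last expression is shown)
df['sex'].duplicated()  # compares values within a single column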

0      False
1      False
2       True
3       True
4       True
       ...  
239     True
240     True
241     True
242     True
243     True
Name: sex, Length: 244, dtype: bool

In [20]:
df['sex'].drop_duplicates()

0    Female
1      Male
Name: sex, dtype: object
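
Which occurrence counts as the "original" is controlled by the `keep` parameter, which both `duplicated()` and `drop_duplicates()` accept. A minimal sketch on a made-up Series:

```python
import pandas as pd

# Made-up values for illustration
days = pd.Series(["Sun", "Sat", "Sun", "Sun"])

first_kept = days.duplicated()            # default keep='first'
last_kept = days.duplicated(keep="last")  # the last occurrence is the original
all_marked = days.duplicated(keep=False)  # every repeated value is marked

# drop_duplicates() follows the same rule
deduped_last = days.drop_duplicates(keep="last")

print(first_kept.tolist())
print(last_kept.tolist())
print(all_marked.tolist())
print(deduped_last)
```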

In [21]:
# inclusive=True is deprecated since pandas 1.3 and rejected in 2.0; 'both' includes both bounds
df['total_bill'].between(10, 20, inclusive='both')

0       True
1       True
2      False
3      False
4      False
       ...  
239    False
240    False
241    False
242     True
243     True
Name: total_bill, Length: 244, dtype: bool
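
In pandas 1.3 and later, `inclusive` takes one of the strings `'both'`, `'neither'`, `'left'` or `'right'`. A short sketch on three made-up values that sit on and between the bounds:

```python
import pandas as pd

# Made-up values: lower bound, middle, upper bound
bills = pd.Series([10.0, 15.0, 20.0])

both = bills.between(10, 20, inclusive="both")        # 10 <= x <= 20
neither = bills.between(10, 20, inclusive="neither")  # 10 <  x <  20
left = bills.between(10, 20, inclusive="left")        # 10 <= x <  20
right = bills.between(10, 20, inclusive="right")      # 10 <  x <= 20

print(both.tolist(), neither.tolist(), left.tolist(), right.tolist())
```

The boolean result can be used directly as a filter, for example `bills[both]`.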

In [23]:
# nlargest() -- returns the top n rows, ordered by the given column/series
df.nlargest(10,'tip')

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
170,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954
212,48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590
23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,Sat239
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139
141,34.3,6.7,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,Thur1025
183,23.17,6.5,Male,Yes,Sun,Dinner,4,5.79,Dr. Michael James,4718501859162,Sun6059
214,28.17,6.5,Female,Yes,Sat,Dinner,3,9.39,Marissa Jackson,4922302538691962,Sat3374
47,32.4,6.0,Male,No,Sun,Dinner,4,8.1,James Barnes,3552002592874186,Sun9677
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657
88,24.71,5.85,Male,No,Thur,Lunch,2,12.36,Roger Taylor,4410248629955,Thur9003
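
`nlargest(n, col)` gives the same rows as sorting in descending order and taking the first n, and `nsmallest()` is its counterpart for the lowest values. A sketch on a four-row frame without ties (values picked for illustration):

```python
import pandas as pd

# Illustrative values, no ties in 'tip'
demo = pd.DataFrame({"tip": [10.0, 9.0, 7.58, 6.73],
                     "total_bill": [50.81, 48.33, 39.42, 48.27]})

top2 = demo.nlargest(2, "tip")
same_as_top2 = demo.sort_values("tip", ascending=False).head(2)
bottom2 = demo.nsmallest(2, "tip")

print(top2)
print(bottom2)
```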


In [24]:
df.sample(5)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
12,15.42,1.57,Male,No,Sun,Dinner,2,7.71,Chad Harrington,577040572932,Sun1300
89,21.16,3.0,Male,No,Thur,Lunch,2,10.58,Keith Lewis,4356005144080422,Thur6273
17,16.29,3.71,Male,No,Sun,Dinner,3,5.43,John Pittman,6521340257218708,Sun2998
9,14.78,3.23,Male,No,Sun,Dinner,2,7.39,Jerome Abbott,3532124519049786,Sun3775
243,18.78,3.0,Female,No,Thur,Dinner,2,9.39,Michelle Hardin,3511451626698139,Thur672
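
`sample()` returns different rows on every run. Passing `random_state` makes the draw repeatable, and `frac` samples a fraction of the rows instead of a fixed number. A minimal sketch on a made-up frame:

```python
import pandas as pd

# Made-up frame with ten rows
demo = pd.DataFrame({"x": range(10)})

# Same seed, same rows
first_draw = demo.sample(3, random_state=42)
second_draw = demo.sample(3, random_state=42)

# Half of the rows, chosen at random
half = demo.sample(frac=0.5, random_state=0)

print(first_draw)
print(len(half))
```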
