## Useful Methods

A tour of commonly used methods and functions built into pandas.
The [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/index.html) is a good place to explore more of them.

- [apply() method](#apply_method)
- [apply() with a function](#apply_function)
- [apply() with a lambda expression](#apply_lambda)
- [apply() on multiple columns](#apply_multiple)
- [describe()](#describe)
- [sort_values()](#sort)
- [corr()](#corr)
- [idxmin and idxmax](#idx)
- [value_counts](#v_c)
- [replace](#replace)
- [unique and nunique](#uni)
- [map](#map)
- [duplicated and drop_duplicates](#dup)
- [between](#bet)
- [sample](#sample)
- [nlargest and nsmallest](#n)

In [1]:
import pandas as pd
import numpy as np


df = pd.read_csv('tips.csv')
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


<a id='apply_method'></a>

### The .apply() method

`.apply()` runs a custom function on every element of a DataFrame column (a Series) and returns the results as a new Series.

<a id='apply_function'></a>
#### apply with a function

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   total_bill        244 non-null    float64
 1   tip               244 non-null    float64
 2   sex               244 non-null    object 
 3   smoker            244 non-null    object 
 4   day               244 non-null    object 
 5   time              244 non-null    object 
 6   size              244 non-null    int64  
 7   price_per_person  244 non-null    float64
 8   Payer Name        244 non-null    object 
 9   CC Number         244 non-null    int64  
 10  Payment ID        244 non-null    object 
dtypes: float64(3), int64(2), object(6)
memory usage: 21.1+ KB


In [3]:
def last_four(num):
    return str(num)[-4:]

In [4]:
df['CC Number'][0]

3560325168603410

In [5]:
last_four(3560325168603410)

'3410'

In [7]:
df['last_four'] = df['CC Number'].apply(last_four)
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,3410
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,9230
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,1322
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,5994
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,7221


#### Using .apply() with more complex functions

In [8]:
df['total_bill'].mean()

19.78594262295082

In [9]:
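def how_expensive(price):
    # Tiers: below 10 is '$', 10 up to (but not including) 30 is '$$', 30 and above is '$$$'
    expensive = '$'

    if price >= 10 and price < 30:
        expensive = '$$'

    # >= rather than >, so a bill of exactly 30 does not fall back to '$'
    if price >= 30:
        expensive = '$$$'
    
    return expensive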

df['expensive'] = df['total_bill'].apply(how_expensive)
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,expensive
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,3410,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,9230,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,1322,$$
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,5994,$$
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,7221,$$
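
The same kind of tiering can also be expressed without a hand-written function. The sketch below uses `pd.cut` on a small, made-up Series (the values are illustrative, not taken from `tips.csv`) and assumes that a bill of exactly 30 belongs in the top tier.

```python
import numpy as np
import pandas as pd

# Hypothetical bill amounts chosen to sit on and around the tier boundaries
bills = pd.Series([8.77, 10.0, 29.99, 30.0, 50.81])

# right=False makes every bin closed on the left: [-inf, 10), [10, 30), [30, inf)
tiers = pd.cut(
    bills,
    bins=[-np.inf, 10, 30, np.inf],
    labels=['$', '$$', '$$$'],
    right=False,
)
print(tiers.tolist())
```

`pd.cut` returns a categorical Series, which also records the order of the tiers.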


<a id='apply_lambda'></a>
#### apply with lambda

In [10]:
df['total_bill'].apply(lambda bill: bill * 0.1).head()

0    1.699
1    1.034
2    2.101
3    2.368
4    2.459
Name: total_bill, dtype: float64

<a id='apply_multiple'></a>
#### apply that uses multiple columns

See this [Stack Overflow post](https://stackoverflow.com/questions/19914937/applying-function-with-multiple-arguments-to-create-a-new-pandas-column) for more approaches.

In [11]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,expensive
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,3410,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,9230,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,1322,$$
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,5994,$$
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,7221,$$


In [17]:
def tip_quality(total_bill, tip):
    tip_given = 'normal'
    if tip / total_bill > 0.2:
        tip_given = 'Generous'
    
    return tip_given


# axis=1 passes each row to the lambda as a Series; naming it `row` avoids shadowing `df`
df['Tip Quality'] = df[['total_bill', 'tip']].apply(lambda row: tip_quality(row['total_bill'], row['tip']), axis=1)
df[['total_bill', 'tip', 'expensive', 'Tip Quality']].head(20)

Unnamed: 0,total_bill,tip,expensive,Tip Quality
0,16.99,1.01,$$,normal
1,10.34,1.66,$$,normal
2,21.01,3.5,$$,normal
3,23.68,3.31,$$,normal
4,24.59,3.61,$$,normal
5,25.29,4.71,$$,normal
6,8.77,2.0,$,Generous
7,26.88,3.12,$$,normal
8,15.04,1.96,$$,normal
9,14.78,3.23,$$,Generous


In [18]:
# Alternative: wrap the function with np.vectorize so it accepts whole columns
df['Same Tip Quality'] = np.vectorize(tip_quality)(df['total_bill'], df['tip'])
df[['total_bill', 'tip', 'expensive', 'Tip Quality', 'Same Tip Quality']].head(20)

Unnamed: 0,total_bill,tip,expensive,Tip Quality,Same Tip Quality
0,16.99,1.01,$$,normal,normal
1,10.34,1.66,$$,normal,normal
2,21.01,3.5,$$,normal,normal
3,23.68,3.31,$$,normal,normal
4,24.59,3.61,$$,normal,normal
5,25.29,4.71,$$,normal,normal
6,8.77,2.0,$,Generous,Generous
7,26.88,3.12,$$,normal,normal
8,15.04,1.96,$$,normal,normal
9,14.78,3.23,$$,Generous,Generous


---
Which approach is faster? `timeit` can measure both.

In [19]:
import timeit 
  
# code snippet to be executed only once 
setup = '''
import numpy as np
import pandas as pd
df = pd.read_csv('tips.csv')
def tip_quality(total_bill, tip):
    tip_given = 'normal'
    if tip / total_bill > 0.2:
        tip_given = 'Generous'
    
    return tip_given
'''
  
# Execution time to be measured 
stmt_one = ''' 
df['Tip Quality'] = df[['total_bill', 'tip']].apply(lambda row: tip_quality(row['total_bill'], row['tip']), axis=1)
'''

stmt_two = '''
df['Tip Quality'] = np.vectorize(tip_quality)(df['total_bill'], df['tip'])
'''
  

In [24]:
apply_with_lambda = timeit.timeit(setup=setup, stmt=stmt_one, number=1000)
numpy_vectorization = timeit.timeit(setup=setup, stmt=stmt_two, number=1000)
print(f'Apply with Lambda:{apply_with_lambda} Secs, Numpy Vectorization:{numpy_vectorization} Secs')

Apply with Lambda:3.420898356999942 Secs, Numpy Vectorization:0.24681303600004867 Secs


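`np.vectorize` is much faster than row-wise `apply` here, roughly 14 times in this run. Note that the NumPy documentation describes `np.vectorize` as a convenience wrapper implemented essentially as a for loop, so the gain comes from avoiding the per-row overhead of `apply(axis=1)` rather than from compiled array operations. [Full Details](https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html)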
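
Because `np.vectorize` still calls the Python function once per element, a further option is to express the rule directly with array operations. The following is a minimal sketch using `np.where` on a few made-up rows (the `tips` frame is a hypothetical stand-in for `tips.csv`). It was not part of the timing run above, so no speed figure is claimed for it.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for tips.csv
tips = pd.DataFrame({
    'total_bill': [16.99, 8.77, 14.78],
    'tip': [1.01, 2.00, 3.23],
})

# The division and comparison run on whole columns at once;
# np.where then picks a label for each element of the boolean result
tips['Tip Quality'] = np.where(
    tips['tip'] / tips['total_bill'] > 0.2, 'Generous', 'normal'
)
print(tips)
```

This only works when the logic can be written as column-wise expressions. For rules that cannot, `apply` or `np.vectorize` remain the fallback.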

<a id='describe'></a>
#### df.describe() for statistical summaries

In [27]:
df.describe()

Unnamed: 0,total_bill,tip,size,price_per_person,CC Number
count,244.0,244.0,244.0,244.0,244.0
mean,19.785943,2.998279,2.569672,7.888197,2563496000000000.0
std,8.902412,1.383638,0.9511,2.914234,2369340000000000.0
min,3.07,1.0,1.0,2.88,60406790000.0
25%,13.3475,2.0,2.0,5.8,30407310000000.0
50%,17.795,2.9,2.0,7.255,3525318000000000.0
75%,24.1275,3.5625,3.0,9.39,4553675000000000.0
max,50.81,10.0,6.0,20.27,6596454000000000.0


In [28]:
df.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
total_bill,244.0,19.78594,8.902412,3.07,13.3475,17.795,24.1275,50.81
tip,244.0,2.998279,1.383638,1.0,2.0,2.9,3.5625,10.0
size,244.0,2.569672,0.9510998,1.0,2.0,2.0,3.0,6.0
price_per_person,244.0,7.888197,2.914234,2.88,5.8,7.255,9.39,20.27
CC Number,244.0,2563496000000000.0,2369340000000000.0,60406790000.0,30407310000000.0,3525318000000000.0,4553675000000000.0,6596454000000000.0


<a id='sort'></a>
#### sort_values()

In [29]:
df.sort_values('tip').head(20)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,expensive,Tip Quality,Same Tip Quality
67,3.07,1.0,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455,5267,$,Generous,Generous
236,12.6,1.0,Male,Yes,Sat,Dinner,2,6.3,Matthew Myers,3543676378973965,Sat5032,3965,$$,normal,normal
92,5.75,1.0,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,Fri3780,6392,$,normal,normal
111,7.25,1.0,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801,6887,$,normal,normal
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,3410,$$,normal,normal
215,12.9,1.1,Female,Yes,Sat,Dinner,2,6.45,Jessica Owen,4726904879471,Sat6983,9471,$$,normal,normal
237,32.83,1.17,Male,Yes,Sat,Dinner,2,16.42,Thomas Brown,4284722681265508,Sat2929,5508,$$$,normal,normal
235,10.07,1.25,Male,No,Sat,Dinner,2,5.04,Sean Gonzalez,3534021246117605,Sat4615,7605,$$,normal,normal
75,10.51,1.25,Male,No,Sat,Dinner,2,5.26,Kenneth Hayes,213142079731108,Sat5056,1108,$$,normal,normal
135,8.51,1.25,Female,No,Thur,Lunch,2,4.26,Rebecca Harris,4320272020376174,Thur6600,6174,$,normal,normal


In [32]:
df.sort_values(['tip', 'size', 'sex']).head(20)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,expensive,Tip Quality,Same Tip Quality
67,3.07,1.0,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455,5267,$,Generous,Generous
111,7.25,1.0,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801,6887,$,normal,normal
92,5.75,1.0,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,Fri3780,6392,$,normal,normal
236,12.6,1.0,Male,Yes,Sat,Dinner,2,6.3,Matthew Myers,3543676378973965,Sat5032,3965,$$,normal,normal
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,3410,$$,normal,normal
215,12.9,1.1,Female,Yes,Sat,Dinner,2,6.45,Jessica Owen,4726904879471,Sat6983,9471,$$,normal,normal
237,32.83,1.17,Male,Yes,Sat,Dinner,2,16.42,Thomas Brown,4284722681265508,Sat2929,5508,$$$,normal,normal
135,8.51,1.25,Female,No,Thur,Lunch,2,4.26,Rebecca Harris,4320272020376174,Thur6600,6174,$,normal,normal
75,10.51,1.25,Male,No,Sat,Dinner,2,5.26,Kenneth Hayes,213142079731108,Sat5056,1108,$$,normal,normal
235,10.07,1.25,Male,No,Sat,Dinner,2,5.04,Sean Gonzalez,3534021246117605,Sat4615,7605,$$,normal,normal
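
`sort_values` sorts in ascending order by default. The sketch below, on a small made-up frame (`orders` is hypothetical), shows how the `ascending` parameter can take one flag per sort key to mix directions.

```python
import pandas as pd

# Hypothetical data, not taken from tips.csv
orders = pd.DataFrame({
    'tip': [1.0, 3.5, 1.0, 3.5],
    'size': [2, 4, 1, 2],
})

# Largest tips first; ties on tip are broken by the smallest party size
ranked = orders.sort_values(['tip', 'size'], ascending=[False, True])
print(ranked)
```

The original index labels travel with the rows, which is why the sorted outputs above show indices such as 67 and 236 at the top.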


<a id='corr'></a>
#### df.corr() Correlation checks

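`df.corr()` computes the pairwise [correlation](https://en.wikipedia.org/wiki/Correlation_and_dependence) between numeric columns. In pandas 2.0 and later, pass `numeric_only=True` when the DataFrame also contains text columns, as this one does.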

In [33]:
df.corr()

Unnamed: 0,total_bill,tip,size,price_per_person,CC Number
total_bill,1.0,0.675734,0.598315,0.647554,0.104576
tip,0.675734,1.0,0.489299,0.347405,0.110857
size,0.598315,0.489299,1.0,-0.175359,-0.030239
price_per_person,0.647554,0.347405,-0.175359,1.0,0.13524
CC Number,0.104576,0.110857,-0.030239,0.13524,1.0


In [34]:
df[['total_bill', 'tip']].corr()

Unnamed: 0,total_bill,tip
total_bill,1.0,0.675734
tip,0.675734,1.0
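
Correlation is only defined for numeric data, and how pandas treats text columns in `corr()` has changed between releases. One version-independent sketch is to select the numeric columns explicitly first. The `frame` below is made up for illustration.

```python
import pandas as pd

# Hypothetical frame mixing numeric and text columns
frame = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0, 40.0],
    'tip': [1.0, 2.0, 3.0, 4.0],
    'day': ['Sun', 'Sat', 'Sun', 'Fri'],
})

# Keep only numeric columns, then compute pairwise Pearson correlations
numeric = frame.select_dtypes(include='number')
corr = numeric.corr()
print(corr)
```

Here `tip` is an exact multiple of `total_bill`, so their correlation comes out as 1 (up to floating-point rounding).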


<a id='idx'></a>
#### idxmin and idxmax

In [35]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,expensive,Tip Quality,Same Tip Quality
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,3410,$$,normal,normal
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,9230,$$,normal,normal
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,1322,$$,normal,normal
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,5994,$$,normal,normal
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,7221,$$,normal,normal


In [38]:
print(f"The Max bill:${df['total_bill'].max()} on Index:{df['total_bill'].idxmax()}")
print(f"The Min bill:${df['total_bill'].min()} on Index:{df['total_bill'].idxmin()}")

The Max bill:$50.81 on Index:170
The Min bill:$3.07 on Index:67


In [39]:
# idxmin() returns an index label, so .loc is the matching accessor
df.loc[67]

total_bill                      3.07
tip                              1.0
sex                           Female
smoker                           Yes
day                              Sat
time                          Dinner
size                               1
price_per_person                3.07
Payer Name             Tiffany Brock
CC Number           4359488526995267
Payment ID                   Sat3455
last_four                       5267
expensive                          $
Tip Quality                 Generous
Same Tip Quality            Generous
Name: 67, dtype: object

In [40]:
# idxmax() returns an index label, so .loc is the matching accessor
df.loc[170]

total_bill                     50.81
tip                             10.0
sex                             Male
smoker                           Yes
day                              Sat
time                          Dinner
size                               3
price_per_person               16.94
Payer Name             Gregory Clark
CC Number           5473850968388236
Payment ID                   Sat1954
last_four                       8236
expensive                        $$$
Tip Quality                   normal
Same Tip Quality              normal
Name: 170, dtype: object
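
`idxmin` and `idxmax` return index labels, not integer positions. With the default `RangeIndex` used in this notebook the two coincide, but with any other index they do not. The sketch below uses a made-up Series indexed by payment ID to show the difference.

```python
import pandas as pd

# Hypothetical bills indexed by payment ID instead of 0..n-1
bills = pd.Series([16.99, 50.81, 3.07], index=['Sun2959', 'Sat1954', 'Sat3455'])

label = bills.idxmax()      # index label of the largest value
position = bills.argmax()   # integer position of the largest value

print(label, bills.loc[label])          # look up by label with .loc
print(position, bills.iloc[position])   # look up by position with .iloc
```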

<a id='v_c'></a>
#### value_counts

`value_counts()` returns the number of rows for each distinct value, sorted from most to least frequent. It is most useful on categorical columns.

In [41]:
df['sex'].value_counts()

Male      157
Female     87
Name: sex, dtype: int64
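
When proportions are more useful than raw counts, `value_counts` accepts `normalize=True`. A minimal sketch on a made-up Series:

```python
import pandas as pd

# Hypothetical categorical column
sex = pd.Series(['Male', 'Female', 'Male', 'Male'])

counts = sex.value_counts()                 # absolute counts per category
shares = sex.value_counts(normalize=True)   # fractions that sum to 1
print(counts)
print(shares)
```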

<a id='replace'></a>

#### replace

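Replaces every occurrence of a value with another. `replace` returns a new Series, so assign the result back to the column to keep the change.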

In [42]:
df['Tip Quality'].replace(to_replace='normal',value='Ok').head()

0    Ok
1    Ok
2    Ok
3    Ok
4    Ok
Name: Tip Quality, dtype: object

In [43]:
df['Tip Quality'] = df['Tip Quality'].replace(to_replace='normal', value='Ok')
df.head(20)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,expensive,Tip Quality,Same Tip Quality
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,3410,$$,Ok,normal
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,9230,$$,Ok,normal
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,1322,$$,Ok,normal
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,5994,$$,Ok,normal
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,7221,$$,Ok,normal
5,25.29,4.71,Male,No,Sun,Dinner,4,6.32,Erik Smith,213140353657882,Sun9679,7882,$$,Ok,normal
6,8.77,2.0,Male,No,Sun,Dinner,2,4.38,Kristopher Johnson,2223727524230344,Sun5985,344,$,Generous,Generous
7,26.88,3.12,Male,No,Sun,Dinner,4,6.72,Robert Buck,3514785077705092,Sun8157,5092,$$,Ok,normal
8,15.04,1.96,Male,No,Sun,Dinner,2,7.52,Joseph Mcdonald,3522866365840377,Sun6820,377,$$,Ok,normal
9,14.78,3.23,Male,No,Sun,Dinner,2,7.39,Jerome Abbott,3532124519049786,Sun3775,9786,$$,Generous,Generous


<a id='uni'></a>
#### unique and nunique

In [44]:
df['size'].unique()

array([2, 3, 4, 1, 6, 5])

In [45]:
df['size'].nunique()

6

In [46]:
df['time'].unique()

array(['Dinner', 'Lunch'], dtype=object)

<a id='map'></a>
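#### map

`map` translates each value through a dictionary (or a function) and returns a new Series. The original column is left unchanged.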

In [50]:
my_map = {'Dinner': 'D', 'Lunch': 'L'}
df['time'].map(my_map).head(20)

0     D
1     D
2     D
3     D
4     D
5     D
6     D
7     D
8     D
9     D
10    D
11    D
12    D
13    D
14    D
15    D
16    D
17    D
18    D
19    D
Name: time, dtype: object

In [51]:
df['time'].head(20)

0     Dinner
1     Dinner
2     Dinner
3     Dinner
4     Dinner
5     Dinner
6     Dinner
7     Dinner
8     Dinner
9     Dinner
10    Dinner
11    Dinner
12    Dinner
13    Dinner
14    Dinner
15    Dinner
16    Dinner
17    Dinner
18    Dinner
19    Dinner
Name: time, dtype: object
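
One difference between `map` and `replace` with a dictionary shows up when a value has no entry in the dictionary. The sketch below uses a made-up Series with an extra `'Brunch'` value that does not occur in `tips.csv`.

```python
import pandas as pd

# Hypothetical column containing a value missing from the mapping
times = pd.Series(['Dinner', 'Lunch', 'Brunch'])
codes = {'Dinner': 'D', 'Lunch': 'L'}

mapped = times.map(codes)        # 'Brunch' has no key, so it becomes NaN
replaced = times.replace(codes)  # 'Brunch' is left unchanged
print(mapped)
print(replaced)
```

Use `map` when every value is expected to be covered and gaps should be visible, and `replace` when only some values need changing.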

<a id='dup'></a>
### Duplicates

`.duplicated()` and `.drop_duplicates()`

In [52]:
# Returns True for every repeat of a row after its first occurrence (the first occurrence itself is False)
df.duplicated().head(20)

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
dtype: bool

In [53]:
simple_df = pd.DataFrame([1, 2, 2], ['a', 'b', 'c'])
simple_df

Unnamed: 0,0
a,1
b,2
c,2


In [54]:
simple_df.duplicated()

a    False
b    False
c     True
dtype: bool

In [55]:
simple_df.drop_duplicates()

Unnamed: 0,0
a,1
b,2
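
Both methods accept `keep` and `subset` parameters that control which copies count as duplicates and which columns are compared. A minimal sketch on a made-up frame (`payments` is hypothetical):

```python
import pandas as pd

# Hypothetical data: rows 0 and 3 are identical
payments = pd.DataFrame({
    'payer': ['Ann', 'Bob', 'Ann', 'Ann'],
    'day': ['Sat', 'Sat', 'Sun', 'Sat'],
})

# keep=False flags every copy of a duplicated row, including the first
all_copies = payments.duplicated(keep=False)

# subset compares only the listed columns; keep='last' retains the final occurrence
by_payer = payments.drop_duplicates(subset='payer', keep='last')

print(all_copies)
print(by_payer)
```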


<a id='bet'></a>
#### between

- `left`: A scalar value that defines the left boundary.
- `right`: A scalar value that defines the right boundary.
- `inclusive`: A string, `'both'` by default, which includes both boundaries. `'neither'` excludes both, while `'left'` and `'right'` include only that boundary.

In [59]:
df['total_bill'].between(10, 20, inclusive='both')

0       True
1       True
2      False
3      False
4      False
       ...  
239    False
240    False
241    False
242     True
243     True
Name: total_bill, Length: 244, dtype: bool

In [61]:
df[df['total_bill'].between(10, 20, inclusive='both')].head(20)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,expensive,Tip Quality,Same Tip Quality
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,3410,$$,Ok,normal
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,9230,$$,Ok,normal
8,15.04,1.96,Male,No,Sun,Dinner,2,7.52,Joseph Mcdonald,3522866365840377,Sun6820,377,$$,Ok,normal
9,14.78,3.23,Male,No,Sun,Dinner,2,7.39,Jerome Abbott,3532124519049786,Sun3775,9786,$$,Generous,Generous
10,10.27,1.71,Male,No,Sun,Dinner,2,5.14,William Riley,566287581219,Sun2546,1219,$$,Ok,normal
12,15.42,1.57,Male,No,Sun,Dinner,2,7.71,Chad Harrington,577040572932,Sun1300,2932,$$,Ok,normal
13,18.43,3.0,Male,No,Sun,Dinner,4,4.61,Joshua Jones,6011163105616890,Sun2971,6890,$$,Ok,normal
14,14.83,3.02,Female,No,Sun,Dinner,2,7.42,Vanessa Jones,30016702287574,Sun3848,7574,$$,Generous,Generous
16,10.33,1.67,Female,No,Sun,Dinner,3,3.44,Elizabeth Foster,4240025044626033,Sun9715,6033,$$,Ok,normal
17,16.29,3.71,Male,No,Sun,Dinner,3,5.43,John Pittman,6521340257218708,Sun2998,8708,$$,Generous,Generous
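
The effect of `inclusive` is easiest to see on values that sit exactly on the boundaries. The sketch below uses a made-up Series and assumes a pandas version that accepts string values for `inclusive`, as the cells above do.

```python
import pandas as pd

# Hypothetical bills placed on and between the boundaries 10 and 20
bills = pd.Series([10.0, 15.0, 20.0])

both = bills.between(10, 20, inclusive='both')        # 10 <= x <= 20
neither = bills.between(10, 20, inclusive='neither')  # 10 <  x <  20
left = bills.between(10, 20, inclusive='left')        # 10 <= x <  20

print(both.tolist(), neither.tolist(), left.tolist())
```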


<a id='sample'></a>
#### sample

In [62]:
df.sample(5)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,expensive,Tip Quality,Same Tip Quality
126,8.52,1.48,Male,No,Thur,Lunch,2,4.26,Mario Bradshaw,4524404353861811,Thur6719,1811,$,Ok,normal
10,10.27,1.71,Male,No,Sun,Dinner,2,5.14,William Riley,566287581219,Sun2546,1219,$$,Ok,normal
130,19.08,1.5,Male,No,Thur,Lunch,2,9.54,Seth Sexton,213113680829581,Thur1446,9581,$$,Ok,normal
124,12.48,2.52,Female,No,Thur,Lunch,2,6.24,Jordan Diaz,4472778228206399,Thur208,6399,$$,Generous,Generous
37,16.93,3.07,Female,No,Sat,Dinner,3,5.64,Erin Lewis,5161695527390786,Sat6406,786,$$,Ok,normal


In [63]:
df.sample(frac=0.1)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,expensive,Tip Quality,Same Tip Quality
47,32.4,6.0,Male,No,Sun,Dinner,4,8.1,James Barnes,3552002592874186,Sun9677,4186,$$$,Ok,normal
19,20.65,3.35,Male,No,Sat,Dinner,3,6.88,Timothy Oneal,6568069240986485,Sat9213,6485,$$,Ok,normal
104,20.92,4.08,Female,No,Sat,Dinner,2,10.46,Gabrielle Frederick,4013010878990106,Sat3194,106,$$,Ok,normal
79,17.29,2.71,Male,No,Thur,Lunch,2,8.64,Brian Diaz,4759290988169738,Thur9501,9738,$$,Ok,normal
134,18.26,3.25,Female,No,Thur,Lunch,2,9.13,Karen Rodriguez,4952604748911,Thur75,8911,$$,Ok,normal
7,26.88,3.12,Male,No,Sun,Dinner,4,6.72,Robert Buck,3514785077705092,Sun8157,5092,$$,Ok,normal
217,11.59,1.5,Male,Yes,Sat,Dinner,2,5.8,Gary Orr,30324521283406,Sat8489,3406,$$,Ok,normal
214,28.17,6.5,Female,Yes,Sat,Dinner,3,9.39,Marissa Jackson,4922302538691962,Sat3374,1962,$$,Generous,Generous
68,20.23,2.01,Male,No,Sat,Dinner,2,10.12,Mr. Travis Bailey Jr.,60406789937,Sat561,9937,$$,Ok,normal
215,12.9,1.1,Female,Yes,Sat,Dinner,2,6.45,Jessica Owen,4726904879471,Sat6983,9471,$$,Ok,normal
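
`sample` draws different rows on every call, which is why rerunning the cells above gives different output. When a repeatable draw is needed, the `random_state` parameter fixes the seed. A minimal sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical frame with 100 rows
frame = pd.DataFrame({'value': range(100)})

# The same seed produces the same rows on every run
first = frame.sample(5, random_state=42)
second = frame.sample(5, random_state=42)
print(first.equals(second))

# frac draws a proportion of the rows instead of a fixed count
tenth = frame.sample(frac=0.1, random_state=0)
print(len(tenth))
```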


<a id='n'></a>
#### nlargest and nsmallest

In [64]:
df.nlargest(10, 'tip')

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,expensive,Tip Quality,Same Tip Quality
170,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,8236,$$$,Ok,normal
212,48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,5212,$$$,Ok,normal
23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,Sat239,9808,$$$,Ok,normal
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,595,$$$,Ok,normal
141,34.3,6.7,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,Thur1025,8508,$$$,Ok,normal
183,23.17,6.5,Male,Yes,Sun,Dinner,4,5.79,Dr. Michael James,4718501859162,Sun6059,9162,$$,Generous,Generous
214,28.17,6.5,Female,Yes,Sat,Dinner,3,9.39,Marissa Jackson,4922302538691962,Sat3374,1962,$$,Generous,Generous
47,32.4,6.0,Male,No,Sun,Dinner,4,8.1,James Barnes,3552002592874186,Sun9677,4186,$$$,Ok,normal
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657,2842,$$,Generous,Generous
88,24.71,5.85,Male,No,Thur,Lunch,2,12.36,Roger Taylor,4410248629955,Thur9003,9955,$$,Generous,Generous
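
`nsmallest` works the same way in the other direction. The sketch below uses a made-up frame (`orders` is hypothetical) and also shows that, when there are no ties, the result matches sorting by the column and taking the first rows.

```python
import pandas as pd

# Hypothetical data, not taken from tips.csv
orders = pd.DataFrame({'tip': [3.5, 1.0, 10.0, 2.0], 'size': [3, 1, 3, 2]})

lowest = orders.nsmallest(2, 'tip')    # two rows with the smallest tips
highest = orders.nlargest(2, 'tip')    # two rows with the largest tips

# Same rows as sorting and slicing, written more directly
same_as_sort = orders.sort_values('tip').head(2)

print(lowest)
print(highest)
```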


---