# Commonly Used Functions and Methods in Pandas

In this section we'll learn some of the built-in functions and methods available in pandas. What we cover here is, of course, only the tip of the iceberg: there are many more functions and methods that I can't show here. To see all of them, you can browse the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/index.html).
We'll see more functions and methods in later stages. For now, the ones we'll cover are:

* [apply() method](#apply_method)
* [apply() with a function](#apply_function)
* [apply() with a lambda expression](#apply_lambda)
* [apply() on multiple columns](#apply_multiple)
* [describe()](#describe)
* [sort_values()](#sort)
* [corr()](#corr)
* [idxmin and idxmax](#idx)
* [value_counts()](#v_c)
* [replace()](#replace)
* [unique and nunique()](#uni)
* [map()](#map)
* [duplicated and drop_duplicates()](#dup)
* [between()](#bet)
* [sample()](#sample)
* [nlargest()](#n)

In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv('tips.csv')
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


# The .apply() Method

In [None]:
# The .apply() method applies a function to every value of a Series, or to the columns of a DataFrame, in a single call.
# Behind the scenes a for loop runs, so it is not very fast. In terms of performance, vectorized operations are the better choice.
# Vectorized operations run quite fast and minimize the use of for loops.
# On the other hand, .apply() is very readable. It is easy to start with and works in every situation, whereas a vectorized operation may not be possible in some cases.

In [None]:
# Goal: take the last 4 digits of the values in the CC Number column and update the column with them.

## Approach 1

In [None]:
#def son4(x):
#  return str(x)[-4:]

In [None]:
#df['CC Number'] = df['CC Number'].apply(son4)
#df.head()

## Approach 2

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


In [None]:
#df['CC Number'] = df['CC Number'].apply( lambda x : str(x)[-4:] )

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


## Approach 3 (Vectorized) (This Is the Fastest)

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


In [None]:
df['CC Number'].astype(str).str[-4:] # Vectorized operation

0      3410
1      9230
2      1322
3      5994
4      7221
       ... 
239    2842
240    5404
241    7196
242    0950
243    8139
Name: CC Number, Length: 244, dtype: object
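
The three approaches above can be compared side by side on a tiny Series. This is only a sketch: the `cards` Series and the `last4` helper are made up for illustration and stand in for the `CC Number` column, so the snippet runs without `tips.csv`.

```python
import pandas as pd

# Hypothetical card numbers standing in for the 'CC Number' column.
cards = pd.Series([3560325168603410, 4478071379779230, 630415546013], name="CC Number")

def last4(x):
    """Return the last four characters of x as a string."""
    return str(x)[-4:]

via_function = cards.apply(last4)                  # Approach 1: named function
via_lambda = cards.apply(lambda x: str(x)[-4:])    # Approach 2: lambda expression
via_vectorized = cards.astype(str).str[-4:]        # Approach 3: string accessor

print(via_vectorized.tolist())  # ['3410', '9230', '6013']
```

All three produce the same strings; the result is text, not a number, which is why leading zeros such as `0950` survive.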

In [None]:
# Goal: create a column named Fiyat Kategorisi (price category) that holds $ if the bill is under 10 dollars,
# $$ if it is from 10 up to (but not including) 30 dollars, and $$$ if it is 30 dollars or more.

## Approach 1

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


In [None]:
def fiyat_kategori(x):
  if x < 10:
    return '$'
  elif 10 <= x < 30:
    return '$$'
  else :
    return '$$$'

In [None]:
df['fiyat_kategori'] = df['total_bill'].apply(fiyat_kategori)
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategori
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,$$
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,$$
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,$$


## Approach 2

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategori
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,$$
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,$$
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,$$


In [None]:
df['fiyat_kategori'] = df['total_bill'].apply(lambda x: '$' if x < 10 else ('$$' if x < 30 else '$$$'))
df.sample(10)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategori
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,$$
128,11.38,2.0,Female,No,Thur,Lunch,2,5.69,Christine Perkins,3548391118913991,Thur8551,$$
31,18.35,2.5,Male,No,Sat,Dinner,4,4.59,Danny Santiago,630415546013,Sat4947,$$
166,20.76,2.24,Male,No,Sun,Dinner,2,10.38,Gordon Lane,4110599849536479,Sun6738,$$
8,15.04,1.96,Male,No,Sun,Dinner,2,7.52,Joseph Mcdonald,3522866365840377,Sun6820,$$
164,17.51,3.0,Female,Yes,Sun,Dinner,2,8.76,Audrey Griffin,3500853929693258,Sun444,$$
11,35.26,5.0,Female,No,Sun,Dinner,4,8.82,Diane Macias,4577817359320969,Sun6686,$$$
134,18.26,3.25,Female,No,Thur,Lunch,2,9.13,Karen Rodriguez,4952604748911,Thur75,$$
109,14.31,4.0,Female,Yes,Sat,Dinner,2,7.16,Amanda Anderson,375638820334211,Sat2614,$$
34,17.78,3.27,Male,No,Sat,Dinner,2,8.89,Jacob Castillo,3551492000704805,Sat8124,$$


## Approach 3 (Vectorized, the Fastest)

In [None]:
df.drop('fiyat_kategori', axis = 1, inplace = True)

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


In [None]:
def fiyat_kategori_vectorized(bill):
  categories = np.full_like(bill , '$', dtype = 'U3')
  
  categories[(bill >= 10) & (bill < 30)] = '$$'

  categories[bill >= 30] = '$$$'

  return categories

In [None]:
df['fiyat_kategorisi'] = fiyat_kategori_vectorized(df['total_bill'].values)

In [None]:
df.sample(10)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
179,34.63,3.55,Male,Yes,Sun,Dinner,2,17.32,Brian Bailey,346656312114848,Sun9851,$$$
194,16.58,4.0,Male,Yes,Thur,Lunch,2,8.29,Benjamin Weber,676210011505,Thur9318,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,$$
177,14.48,2.0,Male,Yes,Sun,Dinner,2,7.24,John Dudley,4565183162071073,Sun6203,$$
228,13.28,2.72,Male,No,Sat,Dinner,2,6.64,Glenn Jones,502061651712,Sat2937,$$
218,7.74,1.44,Male,Yes,Sat,Dinner,2,3.87,Nicholas Archer,340517153733524,Sat4772,$
49,18.04,3.0,Male,No,Sun,Dinner,2,9.02,William Roth,6573923967142503,Sun9774,$$
101,15.38,3.0,Female,Yes,Fri,Dinner,2,7.69,Tiffany Colon,6011012799432041,Fri8382,$$
99,12.46,1.5,Male,No,Fri,Dinner,2,6.23,Edward Carter,347435564751626,Fri5575,$$
63,18.29,3.76,Male,Yes,Sat,Dinner,4,4.57,Chad Hart,580171498976,Sat4178,$$
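
As an alternative to hand-written masks, the same binning can probably be expressed with `pd.cut`. This is a swapped-in technique, not something used in the cells above, and the sample `bills` values are made up so that the boundaries at 10 and 30 are exercised.

```python
import numpy as np
import pandas as pd

# Made-up bills chosen to hit the boundaries 10 and 30 exactly.
bills = pd.Series([3.07, 10.0, 16.99, 29.99, 30.0, 50.81], name="total_bill")

def fiyat_kategori(x):
    """Same rule as the apply() version: <10 -> $, [10, 30) -> $$, >=30 -> $$$."""
    if x < 10:
        return '$'
    elif x < 30:
        return '$$'
    return '$$$'

via_apply = bills.apply(fiyat_kategori)

# right=False makes every bin closed on the left: [-inf, 10), [10, 30), [30, inf)
via_cut = pd.cut(bills, bins=[-np.inf, 10, 30, np.inf],
                 labels=['$', '$$', '$$$'], right=False)

print(via_cut.astype(str).tolist())  # ['$', '$$', '$$', '$$', '$$$', '$$$']
```

Note that `pd.cut` returns a categorical column; `.astype(str)` turns it back into plain strings if that is what you need.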


# The .apply() method can be applied to multiple columns at once

## Approach 1

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,$$
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,$$
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,$$


In [None]:
def kareal(x):
  return x**2

In [None]:
#df[['total_bill','tip']] = df[['total_bill','tip']].apply(kareal)
df[['total_bill','tip']].apply(kareal)

## Approach 2

In [None]:
df[['total_bill','tip']].apply(lambda x: x**2)

Unnamed: 0,total_bill,tip
0,288.6601,1.0201
1,106.9156,2.7556
2,441.4201,12.2500
3,560.7424,10.9561
4,604.6681,13.0321
...,...,...
239,842.7409,35.0464
240,738.7524,4.0000
241,513.9289,4.0000
242,317.5524,3.0625


## Approach 3 (Vectorized, the Fastest)

In [None]:
np.square(df[['total_bill', 'tip']])

Unnamed: 0,total_bill,tip
0,288.6601,1.0201
1,106.9156,2.7556
2,441.4201,12.2500
3,560.7424,10.9561
4,604.6681,13.0321
...,...,...
239,842.7409,35.0464
240,738.7524,4.0000
241,513.9289,4.0000
242,317.5524,3.0625
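
The examples above apply one function to each column separately. When a function needs several columns of the same row, `axis=1` can be used instead. The sketch below assumes a small made-up frame and a hypothetical `tip_ratio` helper; neither comes from the notebook.

```python
import pandas as pd

# Made-up data with the same column names as the tips dataset.
df_demo = pd.DataFrame({"total_bill": [20.0, 10.0, 40.0], "tip": [3.0, 2.0, 4.0]})

def tip_ratio(row):
    """Hypothetical helper: tip as a percentage of the bill for one row."""
    return 100 * row["tip"] / row["total_bill"]

# axis=1 passes each row (as a Series) to the function.
via_apply = df_demo.apply(tip_ratio, axis=1)

# Same result with column arithmetic, without a Python-level loop over rows.
via_vectorized = 100 * df_demo["tip"] / df_demo["total_bill"]

print(via_apply.tolist())  # [15.0, 20.0, 10.0]
```

As with the single-column case, the column arithmetic is usually the faster choice when it is available.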


In [None]:
df.describe()

Unnamed: 0,total_bill,tip,size,price_per_person,CC Number
count,244.0,244.0,244.0,244.0,244.0
mean,19.785943,2.998279,2.569672,7.888197,2563496000000000.0
std,8.902412,1.383638,0.9511,2.914234,2369340000000000.0
min,3.07,1.0,1.0,2.88,60406790000.0
25%,13.3475,2.0,2.0,5.8,30407310000000.0
50%,17.795,2.9,2.0,7.255,3525318000000000.0
75%,24.1275,3.5625,3.0,9.39,4553675000000000.0
max,50.81,10.0,6.0,20.27,6596454000000000.0
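
`describe()` on a DataFrame summarizes only the numeric columns by default, which is why `sex`, `day` and the other text columns are missing above. A minimal sketch on made-up data shows what a text column looks like when it is described on its own:

```python
import pandas as pd

# Made-up frame with one numeric and one text column.
df_demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "day": ["Sun", "Sun", "Sat", "Sun"],
})

numeric_summary = df_demo.describe()      # count, mean, std, min, quartiles, max
text_summary = df_demo["day"].describe()  # count, unique, top, freq

print(list(numeric_summary.columns))  # ['total_bill']
print(text_summary["top"], text_summary["freq"])
```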


In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,$$
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,$$
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,$$


In [None]:
df.sort_values(by = 'total_bill') # ascending = True by default, so the rows are sorted from smallest to largest.

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
67,3.07,1.00,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455,$
92,5.75,1.00,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,Fri3780,$
111,7.25,1.00,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801,$
172,7.25,5.15,Male,Yes,Sun,Dinner,2,3.62,Larry White,30432617123103,Sun9209,$
149,7.51,2.00,Male,No,Thur,Lunch,2,3.76,Daniel Robbins,4823139288341889,Thur6321,$
...,...,...,...,...,...,...,...,...,...,...,...,...
182,45.35,3.50,Male,Yes,Sun,Dinner,3,15.12,Jose Parsons,4112207559459910,Sun2337,$$$
156,48.17,5.00,Male,No,Sun,Dinner,6,8.03,Ryan Gonzales,3523151482063321,Sun7518,$$$
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,$$$
212,48.33,9.00,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,$$$


In [None]:
df.sort_values(by = 'total_bill', ascending = False, inplace = True)

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
170,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,$$$
212,48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,$$$
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,$$$
156,48.17,5.0,Male,No,Sun,Dinner,6,8.03,Ryan Gonzales,3523151482063321,Sun7518,$$$
182,45.35,3.5,Male,Yes,Sun,Dinner,3,15.12,Jose Parsons,4112207559459910,Sun2337,$$$


In [None]:
df.reset_index(inplace = True)

In [None]:
df.head()

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
0,170,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,$$$
1,212,48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,$$$
2,59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,$$$
3,156,48.17,5.0,Male,No,Sun,Dinner,6,8.03,Ryan Gonzales,3523151482063321,Sun7518,$$$
4,182,45.35,3.5,Male,Yes,Sun,Dinner,3,15.12,Jose Parsons,4112207559459910,Sun2337,$$$


In [None]:
df.sort_values([ 'tip', 'total_bill'], ascending = False)

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
0,170,50.81,10.00,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,$$$
1,212,48.33,9.00,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,$$$
10,23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,Sat239,$$$
2,59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,$$$
20,141,34.30,6.70,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,Thur1025,$$$
...,...,...,...,...,...,...,...,...,...,...,...,...,...
131,0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,$$
195,236,12.60,1.00,Male,Yes,Sat,Dinner,2,6.30,Matthew Myers,3543676378973965,Sat5032,$$
240,111,7.25,1.00,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801,$
242,92,5.75,1.00,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,Fri3780,$
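
When sorting by several columns, `ascending` also accepts one flag per column, and `reset_index(drop=True)` avoids the extra `index` column seen above. The frame below is made up for illustration:

```python
import pandas as pd

# Made-up rows: three equal tips and one large tip.
df_demo = pd.DataFrame({
    "total_bill": [7.25, 12.60, 5.75, 48.33],
    "tip": [1.00, 1.00, 1.00, 9.00],
})

# Highest tip first; among equal tips, the smallest bill first.
ordered = df_demo.sort_values(["tip", "total_bill"], ascending=[False, True])

# drop=True discards the old index instead of keeping it as an 'index' column.
ordered = ordered.reset_index(drop=True)

print(ordered["total_bill"].tolist())  # [48.33, 5.75, 7.25, 12.6]
```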


In [None]:
df.head(10)

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
0,170,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,$$$
1,212,48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,$$$
2,59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,$$$
3,156,48.17,5.0,Male,No,Sun,Dinner,6,8.03,Ryan Gonzales,3523151482063321,Sun7518,$$$
4,182,45.35,3.5,Male,Yes,Sun,Dinner,3,15.12,Jose Parsons,4112207559459910,Sun2337,$$$
5,102,44.3,2.5,Female,Yes,Sat,Dinner,3,14.77,Heather Cohen,379771118886604,Sat6240,$$$
6,197,43.11,5.0,Female,Yes,Thur,Lunch,4,10.78,Brooke Soto,5544902205760175,Thur9313,$$$
7,142,41.19,5.0,Male,No,Thur,Lunch,5,8.24,Eric Andrews,4356531761046453,Thur3621,$$$
8,184,40.55,3.0,Male,Yes,Sun,Dinner,2,20.27,Stephen Cox,3547798222044029,Sun5140,$$$
9,95,40.17,4.73,Male,Yes,Fri,Dinner,4,10.04,Aaron Bentley,180026611638690,Fri9628,$$$


In [None]:
df['price_per_person'].max()

20.27

In [None]:
df['price_per_person'].argmax()

8
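
`argmax()` returns the integer position of the maximum, while `idxmax()` and `idxmin()` (listed in the outline above) return the index label. The two only coincide when the index is 0, 1, 2, ... as it is after `reset_index`. A small sketch with a made-up, non-default index:

```python
import pandas as pd

# Made-up values with labels that differ from their positions.
s = pd.Series([8.49, 20.27, 3.45], index=[170, 184, 95], name="price_per_person")

position = s.argmax()  # integer position of the maximum -> 1
label = s.idxmax()     # index label of the maximum -> 184
lowest = s.idxmin()    # index label of the minimum -> 95

print(s.iloc[position], s.loc[label])  # 20.27 20.27
```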

In [None]:
df.head()

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
0,170,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,$$$
1,212,48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,$$$
2,59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,$$$
3,156,48.17,5.0,Male,No,Sun,Dinner,6,8.03,Ryan Gonzales,3523151482063321,Sun7518,$$$
4,182,45.35,3.5,Male,Yes,Sun,Dinner,3,15.12,Jose Parsons,4112207559459910,Sun2337,$$$


In [None]:
df.corr() # vitally important when building machine learning models.
          # shows the linear relationships between the numeric columns.
          # note: pandas 2.0+ needs df.corr(numeric_only = True) when the frame has non-numeric columns.

Unnamed: 0,index,total_bill,tip,size,price_per_person,CC Number
index,1.0,0.044526,-0.026709,0.008061,0.042043,-0.117043
total_bill,0.044526,1.0,0.675734,0.598315,0.647554,0.104576
tip,-0.026709,0.675734,1.0,0.489299,0.347405,0.110857
size,0.008061,0.598315,0.489299,1.0,-0.175359,-0.030239
price_per_person,0.042043,0.647554,0.347405,-0.175359,1.0,0.13524
CC Number,-0.117043,0.104576,0.110857,-0.030239,0.13524,1.0
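
One way to keep `corr()` working regardless of the pandas version is to select the numeric columns explicitly first. This is a sketch on made-up data, not the notebook's own cell:

```python
import pandas as pd

# Made-up frame: 'tip' is exactly 10% of 'total_bill', 'day' is text.
df_demo = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.0, 3.0, 4.0],
    "day": ["Sun", "Sat", "Sun", "Fri"],
})

# Drop the text column before computing Pearson correlations.
corr = df_demo.select_dtypes(include="number").corr()

print(corr.loc["total_bill", "tip"])  # perfectly linear, so approximately 1.0
```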


In [None]:
df['sex'].value_counts()

Male      157
Female     87
Name: sex, dtype: int64

In [None]:
df.day.value_counts()

Sat     87
Sun     76
Thur    62
Fri     19
Name: day, dtype: int64

In [None]:
df.day.unique()

array(['Sat', 'Sun', 'Thur', 'Fri'], dtype=object)

In [None]:
df.day.nunique()

4
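
`value_counts()` can also report shares instead of raw counts via `normalize=True`. The `days` Series below is made up for illustration:

```python
import pandas as pd

days = pd.Series(["Sat", "Sun", "Sat", "Fri", "Sat", "Sun"], name="day")

counts = days.value_counts()                # absolute frequencies, most frequent first
shares = days.value_counts(normalize=True)  # relative frequencies that sum to 1

print(counts["Sat"], shares["Sat"])  # 3 0.5
print(days.nunique() == len(days.unique()) == len(counts))  # True
```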

In [None]:
# len(df.day.unique())

In [None]:
# Goal: turn 'Male' into 'M' and 'Female' into 'F' in the sex column

## Approach 1 (with the .apply() method)

In [None]:
df.head()

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
0,170,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,$$$
1,212,48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,$$$
2,59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,$$$
3,156,48.17,5.0,Male,No,Sun,Dinner,6,8.03,Ryan Gonzales,3523151482063321,Sun7518,$$$
4,182,45.35,3.5,Male,Yes,Sun,Dinner,3,15.12,Jose Parsons,4112207559459910,Sun2337,$$$


In [None]:
df['sex'] = df['sex'].apply(lambda x : 'M' if x == 'Male' else 'F')

In [None]:
df.head()

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
0,170,50.81,10.0,M,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,$$$
1,212,48.33,9.0,M,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,$$$
2,59,48.27,6.73,M,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,$$$
3,156,48.17,5.0,M,No,Sun,Dinner,6,8.03,Ryan Gonzales,3523151482063321,Sun7518,$$$
4,182,45.35,3.5,M,Yes,Sun,Dinner,3,15.12,Jose Parsons,4112207559459910,Sun2337,$$$


## Approach 2 (with the .replace() method)

In [None]:
df.head()

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
0,170,50.81,10.0,M,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,$$$
1,212,48.33,9.0,M,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,$$$
2,59,48.27,6.73,M,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,$$$
3,156,48.17,5.0,M,No,Sun,Dinner,6,8.03,Ryan Gonzales,3523151482063321,Sun7518,$$$
4,182,45.35,3.5,M,Yes,Sun,Dinner,3,15.12,Jose Parsons,4112207559459910,Sun2337,$$$


In [None]:
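# Assign the result back: replace(..., inplace = True) on a selected column may not update df in recent pandas versions.
df['sex'] = df['sex'].replace('Female', 'F')
df['sex'] = df['sex'].replace('Male', 'M')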

In [None]:
df.head()

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
0,170,50.81,10.0,M,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,$$$
1,212,48.33,9.0,M,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,$$$
2,59,48.27,6.73,M,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,$$$
3,156,48.17,5.0,M,No,Sun,Dinner,6,8.03,Ryan Gonzales,3523151482063321,Sun7518,$$$
4,182,45.35,3.5,M,Yes,Sun,Dinner,3,15.12,Jose Parsons,4112207559459910,Sun2337,$$$


In [None]:
df['sex'].replace(['Female' , 'Male'], ['F' , 'M'])

0      M
1      M
2      M
3      M
4      M
      ..
239    M
240    F
241    M
242    F
243    F
Name: sex, Length: 244, dtype: object

## Approach 3 (with the .map() method)

In [None]:
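# 'sex' already holds 'M'/'F' at this point, so no key matches and .map() returns NaN for every row.
df['sex'].map({'Female' : 'F',
                'Male' : 'M'})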

0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
      ... 
239    NaN
240    NaN
241    NaN
242    NaN
243    NaN
Name: sex, Length: 244, dtype: object
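
The all-NaN result above comes from how `.map()` treats values that are missing from the dictionary. A minimal sketch on a made-up Series contrasts it with `.replace()`:

```python
import pandas as pd

# Made-up values: two that are in the mapping and one ('M') that is not.
sex = pd.Series(["Male", "Female", "M"], name="sex")
mapping = {"Female": "F", "Male": "M"}

mapped = sex.map(mapping)        # 'M' is not a key -> NaN
replaced = sex.replace(mapping)  # 'M' is not a key -> left unchanged

print(mapped.isna().tolist())  # [False, False, True]
print(replaced.tolist())       # ['M', 'F', 'M']
```

So `.map()` is the stricter of the two: anything not listed in the dictionary is lost, which can be useful for spotting unexpected categories.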

In [None]:
# PREVIOUS LESSON: pandas3

In [None]:
simple_df = pd.DataFrame( data = [[1,2,2,2] , [1,2,3,4] , [1,2,2,2] , [1,2,2,2] ], index = ['a','b','c','d'] , columns = ['col1','col2','col3','col4'])
simple_df

Unnamed: 0,col1,col2,col3,col4
a,1,2,2,2
b,1,2,3,4
c,1,2,2,2
d,1,2,2,2


In [None]:
simple_df.duplicated()

a    False
b    False
c     True
d     True
dtype: bool

In [None]:
simple_df.drop_duplicates(keep = False)

Unnamed: 0,col1,col2,col3,col4
b,1,2,3,4
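
The `keep` argument decides which of the repeated rows survive, and `subset` limits the comparison to chosen columns. The sketch below reuses the same `simple_df` definition so that it runs on its own:

```python
import pandas as pd

simple_df = pd.DataFrame(
    data=[[1, 2, 2, 2], [1, 2, 3, 4], [1, 2, 2, 2], [1, 2, 2, 2]],
    index=["a", "b", "c", "d"],
    columns=["col1", "col2", "col3", "col4"],
)

first = simple_df.drop_duplicates()             # keep='first' (default): rows a, b
last = simple_df.drop_duplicates(keep="last")   # keeps the last occurrence: rows b, d
none = simple_df.drop_duplicates(keep=False)    # drops every repeated row: row b only
by_subset = simple_df.drop_duplicates(subset=["col1", "col2"])  # compares two columns: row a only

print(first.index.tolist(), last.index.tolist(), none.index.tolist(), by_subset.index.tolist())
```

None of these calls modifies `simple_df` itself; each returns a new DataFrame.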


In [None]:
simple_df

Unnamed: 0,col1,col2,col3,col4
a,1,2,2,2
b,1,2,3,4
c,1,2,2,2
d,1,2,2,2


In [None]:
df.head()

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
0,170,50.81,10.0,M,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,$$$
1,212,48.33,9.0,M,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,$$$
2,59,48.27,6.73,M,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,$$$
3,156,48.17,5.0,M,No,Sun,Dinner,6,8.03,Ryan Gonzales,3523151482063321,Sun7518,$$$
4,182,45.35,3.5,M,Yes,Sun,Dinner,3,15.12,Jose Parsons,4112207559459910,Sun2337,$$$


In [None]:
# Goal: get the rows whose total_bill value is between 10 and 20.

In [None]:
df[10 < df['total_bill'] < 20] # a chained comparison like this does not work on a Series; use the .between() method instead.

ValueError: ignored

In [None]:
df[df['total_bill'].between(10,20)] # by default inclusive = 'both', i.e. both endpoints are included.

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
97,24,19.82,3.18,M,No,Sat,Dinner,2,9.91,Christopher Ross,36739148167928,Sat6236,$$
98,191,19.81,4.19,F,Yes,Thur,Lunch,2,9.90,Kristy Boyd,4317015327600068,Thur967,$$
99,154,19.77,2.00,M,No,Sun,Dinner,4,4.94,James Smith,213169731428229,Sun5814,$$
100,29,19.65,3.00,F,No,Sat,Dinner,2,9.82,Melinda Murphy,5489272944576051,Sat2467,$$
101,55,19.49,3.51,M,No,Sun,Dinner,2,9.74,Michael Hamilton,6502227786581768,Sun1118,$$
...,...,...,...,...,...,...,...,...,...,...,...,...,...
222,51,10.29,2.60,F,No,Sun,Dinner,2,5.14,Jessica Ibarra,4999759463713,Sun4474,$$
223,10,10.27,1.71,M,No,Sun,Dinner,2,5.14,William Riley,566287581219,Sun2546,$$
224,226,10.09,2.00,F,Yes,Fri,Lunch,2,5.04,Ruth Weiss,5268689490381635,Fri6359,$$
225,82,10.07,1.83,F,No,Thur,Lunch,1,10.07,Julie Moody,630413282843,Thur4909,$$


In [None]:
df[df['total_bill'].between(10,20, inclusive = 'left')]

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
97,24,19.82,3.18,M,No,Sat,Dinner,2,9.91,Christopher Ross,36739148167928,Sat6236,$$
98,191,19.81,4.19,F,Yes,Thur,Lunch,2,9.90,Kristy Boyd,4317015327600068,Thur967,$$
99,154,19.77,2.00,M,No,Sun,Dinner,4,4.94,James Smith,213169731428229,Sun5814,$$
100,29,19.65,3.00,F,No,Sat,Dinner,2,9.82,Melinda Murphy,5489272944576051,Sat2467,$$
101,55,19.49,3.51,M,No,Sun,Dinner,2,9.74,Michael Hamilton,6502227786581768,Sun1118,$$
...,...,...,...,...,...,...,...,...,...,...,...,...,...
222,51,10.29,2.60,F,No,Sun,Dinner,2,5.14,Jessica Ibarra,4999759463713,Sun4474,$$
223,10,10.27,1.71,M,No,Sun,Dinner,2,5.14,William Riley,566287581219,Sun2546,$$
224,226,10.09,2.00,F,Yes,Fri,Lunch,2,5.04,Ruth Weiss,5268689490381635,Fri6359,$$
225,82,10.07,1.83,F,No,Thur,Lunch,1,10.07,Julie Moody,630413282843,Thur4909,$$
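
The range filter can also be written with two element-wise comparisons joined by `&`, which is what `.between()` is shorthand for. A sketch on made-up values that sit exactly on the boundaries shows how the `inclusive` options differ:

```python
import pandas as pd

bills = pd.Series([9.99, 10.0, 15.0, 20.0, 20.01], name="total_bill")

# Element-wise comparisons combined with &; the parentheses are required.
mask_manual = (bills >= 10) & (bills <= 20)

mask_both = bills.between(10, 20)                          # 10 <= x <= 20
mask_left = bills.between(10, 20, inclusive="left")        # 10 <= x < 20
mask_neither = bills.between(10, 20, inclusive="neither")  # 10 < x < 20

print(bills[mask_left].tolist())  # [10.0, 15.0]
```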


In [None]:
df.nlargest(5, 'tip')

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
0,170,50.81,10.0,M,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,$$$
1,212,48.33,9.0,M,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,$$$
10,23,39.42,7.58,M,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,Sat239,$$$
2,59,48.27,6.73,M,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,$$$
20,141,34.3,6.7,M,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,Thur1025,$$$


In [None]:
df.sort_values('tip', ascending = False).head(5) # ascending = False sorts from largest to smallest

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
0,170,50.81,10.0,M,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954,$$$
1,212,48.33,9.0,M,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590,$$$
10,23,39.42,7.58,M,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,Sat239,$$$
2,59,48.27,6.73,M,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,$$$
20,141,34.3,6.7,M,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,Thur1025,$$$


In [None]:
df.nsmallest(columns = ['tip', 'total_bill'], n = 3)

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
243,67,3.07,1.0,F,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455,$
242,92,5.75,1.0,F,Yes,Fri,Dinner,2,2.88,Leah Ramirez,3508911676966392,Fri3780,$
240,111,7.25,1.0,F,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801,$
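
A short sketch on made-up data shows that `nlargest` matches sorting plus `head`, how a second column breaks ties in `nsmallest`, and what `keep='all'` does with tied rows:

```python
import pandas as pd

# Made-up rows: three tips tied at 1.0 with different bills.
df_demo = pd.DataFrame({
    "total_bill": [50.81, 7.25, 5.75, 3.07, 48.33],
    "tip": [10.0, 1.0, 1.0, 1.0, 9.0],
})

top2 = df_demo.nlargest(2, "tip")
top2_sorted = df_demo.sort_values("tip", ascending=False).head(2)

# Ties on 'tip' are broken by 'total_bill'.
bottom2 = df_demo.nsmallest(2, ["tip", "total_bill"])

# keep='all' returns every row tied with the last selected value.
bottom_all = df_demo.nsmallest(1, "tip", keep="all")

print(top2.index.tolist(), bottom2.index.tolist(), len(bottom_all))  # [0, 4] [3, 2] 3
```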


In [None]:
df.sample(10)

Unnamed: 0,index,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,fiyat_kategorisi
68,189,23.1,4.0,M,Yes,Sun,Dinner,3,7.7,Richard Stevens,3560193117506187,Sun1821,$$
188,202,13.0,2.0,F,Yes,Thur,Lunch,2,6.5,Ashley Shaw,180088043008041,Thur1301,$$
5,102,44.3,2.5,F,Yes,Sat,Dinner,3,14.77,Heather Cohen,379771118886604,Sat6240,$$$
130,71,17.07,3.0,F,No,Sat,Dinner,3,5.69,Teresa Fisher,5442222963796367,Sat3469,$$
116,49,18.04,3.0,M,No,Sun,Dinner,2,9.02,William Roth,6573923967142503,Sun9774,$$
166,74,14.73,2.2,F,No,Sat,Dinner,2,7.36,Ashley Harris,501828723483,Sat6548,$$
61,119,24.08,2.92,F,No,Thur,Lunch,4,6.02,Melanie Jordan,676212062720,Thur8063,$$
132,18,16.97,3.5,F,No,Sun,Dinner,3,5.66,Laura Martinez,30422275171379,Sun2789,$$
104,243,18.78,3.0,F,No,Thur,Dinner,2,9.39,Michelle Hardin,3511451626698139,Thur672,$$
42,77,27.2,4.0,M,No,Thur,Lunch,4,6.8,John Davis,30344778738589,Thur4924,$$
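
`sample()` returns different rows on every run unless a seed is fixed. The sketch below uses a made-up ten-row frame to show `random_state` and `frac`:

```python
import pandas as pd

df_demo = pd.DataFrame({"total_bill": range(10, 110, 10)})  # 10 made-up rows

# random_state fixes the seed so the same rows come back on every run.
a = df_demo.sample(3, random_state=42)
b = df_demo.sample(3, random_state=42)

# frac draws a fraction of the rows; frac=1 shuffles the whole frame.
shuffled = df_demo.sample(frac=1, random_state=0)

print(a.equals(b), len(shuffled))  # True 10
```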


In [None]:
# Done.