# Commonly Used Functions and Methods in Pandas

In this section we will learn some of the built-in functions and methods in pandas. What we cover here is only the tip of the iceberg; there are many more functions and methods than can be shown here. To see all of them, consult the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/index.html).
We will meet more functions and methods in later stages. For now, the ones we will cover are:

* [apply() method](#apply_method)
* [apply() with a function](#apply_function)
* [apply() with a lambda expression](#apply_lambda)
* [apply() on multiple columns](#apply_multiple)
* [describe()](#describe)
* [sort_values()](#sort)
* [corr()](#corr)
* [idxmin and idxmax](#idx)
* [value_counts()](#v_c)
* [replace()](#replace)
* [unique and nunique()](#uni)
* [map()](#map)
* [duplicated and drop_duplicates()](#dup)
* [between()](#bet)
* [sample()](#sample)
* [nlargest()](#n)


In [None]:
import numpy as np
import pandas as pd

In [None]:
df = pd.read_csv('tips.csv')
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


## The .apply() Method

In [None]:
# The .apply() method applies a function to every value of a Series, or to each column of a DataFrame, in a single call.
# Under the hood it runs a Python-level loop, so it is not very fast. For performance, vectorized operations are the better choice.
# Vectorized operations run considerably faster and minimize the use of explicit for loops.
# On the other hand, .apply() is very readable, easy to start with, and works in every situation, whereas a vectorized form is not always available.
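As a rough illustration of the trade-off described above, the sketch below performs the same transformation twice on a tiny Series: once with `.apply()` and once with the `.str` accessor. The three numbers are copied from the first rows of the dataset; the helper name `last4` is an assumption made for this example only.

```python
import pandas as pd

# Card numbers copied from the first three rows of the dataset
numbers = pd.Series([3560325168603410, 4478071379779230, 6011812112971322])

def last4(x):
    # Convert to string and keep the last four characters
    return str(x)[-4:]

via_apply = numbers.apply(last4)                # one Python function call per element
via_vectorized = numbers.astype(str).str[-4:]   # column-wise string slicing

print(via_apply.tolist())
print(via_apply.tolist() == via_vectorized.tolist())
```

Both variants give the same values; the difference only becomes noticeable in run time as the column grows.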

In [None]:
# Goal: take the last 4 digits of the values in the CC Number column and update the column accordingly.

## Method 1 (apply)

In [None]:
def son4(x):
  return str(x)[-4:]

In [None]:
df['CC Number'] = df['CC Number'].apply(son4)

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,9230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,1322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,5994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,7221,Sun2251


## Method 2 (lambda)

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,9230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,1322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,5994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,7221,Sun2251


In [None]:
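# Note: the column already holds the last 4 digits from Method 1, so this keeps the last 3 of those.
df['CC Number'] = df['CC Number'].apply(lambda x : str(x)[-3:])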

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,221,Sun2251


## Method 3 (Vectorized, the fastest)

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,221,Sun2251


In [None]:
df['CC Number'].astype(str).str[-2:] # Vectorized operation (the result is only displayed, not assigned back)

Unnamed: 0,CC Number
0,10
1,30
2,22
3,94
4,21
...,...
239,42
240,04
241,96
242,50


In [None]:
## Goal: create a column named Fiyat_Kategorisi (price category) that contains $ if the bill is under 10 dollars,
# $$ if it is at least 10 and under 30 dollars, and $$$ if it is 30 dollars or more.

## Method 1

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,221,Sun2251


In [None]:
def fiyat_kategori(x):
  if x < 10 :
    return '$'
  elif 10 <= x < 30 :
    return '$$'
  else :
    return '$$$'

In [None]:
df['Fiyat_Kategorisi'] = df['total_bill'].apply(fiyat_kategori)

In [None]:
df['total_bill'].apply(fiyat_kategori).value_counts()

Unnamed: 0_level_0,count
total_bill,Unnamed: 1_level_1
$$,195
$$$,32
$,17


In [None]:
df.head(13)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,410,Sun2959,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,230,Sun4608,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,322,Sun4458,$$
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,994,Sun5260,$$
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,221,Sun2251,$$
5,25.29,4.71,Male,No,Sun,Dinner,4,6.32,Erik Smith,882,Sun9679,$$
6,8.77,2.0,Male,No,Sun,Dinner,2,4.38,Kristopher Johnson,344,Sun5985,$
7,26.88,3.12,Male,No,Sun,Dinner,4,6.72,Robert Buck,92,Sun8157,$$
8,15.04,1.96,Male,No,Sun,Dinner,2,7.52,Joseph Mcdonald,377,Sun6820,$$
9,14.78,3.23,Male,No,Sun,Dinner,2,7.39,Jerome Abbott,786,Sun3775,$$


## Method 2

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,410,Sun2959,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,230,Sun4608,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,322,Sun4458,$$
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,994,Sun5260,$$
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,221,Sun2251,$$


In [None]:
df['Fiyat_Kategorisi'] = df['total_bill'].apply(lambda x : '$' if x < 10 else ('$$' if 10 <= x < 30 else '$$$'))

In [None]:
df.sample(10)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
169,10.63,2.0,Female,Yes,Sat,Dinner,2,5.32,Amy Hill,19,Sat1788,$$
162,16.21,2.0,Female,No,Sun,Dinner,3,5.4,Jennifer Baird,693,Sun5521,$$
199,13.51,2.0,Male,Yes,Thur,Lunch,2,6.76,Joseph Murphy MD,275,Thur2428,$$
144,16.43,2.3,Female,No,Thur,Lunch,2,8.22,Linda Jones,658,Thur9002,$$
35,24.06,3.6,Male,No,Sat,Dinner,3,8.02,Joseph Mullins,299,Sat632,$$
123,15.95,2.0,Male,No,Thur,Lunch,2,7.98,Christopher Lang,319,Thur1992,$$
61,13.81,2.0,Male,Yes,Sat,Dinner,2,6.9,Ryan Hernandez,806,Sat3030,$$
148,9.78,1.73,Male,No,Thur,Lunch,2,4.89,David Stewart,399,Thur7276,$
233,10.77,1.47,Male,No,Sat,Dinner,2,5.38,Paul Novak,858,Sat1467,$$
227,20.45,3.0,Male,No,Sat,Dinner,4,5.11,Robert Bradley,910,Sat4319,$$


## Method 3: Vectorized (the fastest)

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,410,Sun2959,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,230,Sun4608,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,322,Sun4458,$$
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,994,Sun5260,$$
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,221,Sun2251,$$


In [None]:
df.drop('Fiyat_Kategorisi' , axis = 1 , inplace = True)   # Drop Fiyat_Kategorisi so it can be rebuilt below, this time without calling the function from Method 1

In [None]:
df['total_bill'].values

array([16.99, 10.34, 21.01, 23.68, 24.59, 25.29,  8.77, 26.88, 15.04,
       14.78, 10.27, 35.26, 15.42, 18.43, 14.83, 21.58, 10.33, 16.29,
       16.97, 20.65, 17.92, 20.29, 15.77, 39.42, 19.82, 17.81, 13.37,
       12.69, 21.7 , 19.65,  9.55, 18.35, 15.06, 20.69, 17.78, 24.06,
       16.31, 16.93, 18.69, 31.27, 16.04, 17.46, 13.94,  9.68, 30.4 ,
       18.29, 22.23, 32.4 , 28.55, 18.04, 12.54, 10.29, 34.81,  9.94,
       25.56, 19.49, 38.01, 26.41, 11.24, 48.27, 20.29, 13.81, 11.02,
       18.29, 17.59, 20.08, 16.45,  3.07, 20.23, 15.01, 12.02, 17.07,
       26.86, 25.28, 14.73, 10.51, 17.92, 27.2 , 22.76, 17.29, 19.44,
       16.66, 10.07, 32.68, 15.98, 34.83, 13.03, 18.28, 24.71, 21.16,
       28.97, 22.49,  5.75, 16.32, 22.75, 40.17, 27.28, 12.03, 21.01,
       12.46, 11.35, 15.38, 44.3 , 22.42, 20.92, 15.36, 20.49, 25.21,
       18.24, 14.31, 14.  ,  7.25, 38.07, 23.95, 25.71, 17.31, 29.93,
       10.65, 12.43, 24.08, 11.69, 13.42, 14.26, 15.95, 12.48, 29.8 ,
        8.52, 14.52,

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,221,Sun2251


In [None]:
def fiyat_kategori_vectorized(bill):
  categories = np.full_like(bill , '$' , dtype= 'U3')    # Create an array with the same shape as bill, filled with '$' (full_like: "fill everything like this one")

  categories[(bill >= 10) & (bill < 30)] = '$$'   # Overwrite with '$$' where the bill is in [10, 30)

  categories[bill >= 30] = '$$$'    # Overwrite with '$$$' where the bill is 30 or more

  return categories   # Return the full array of categories

In [None]:
df['Fiyat_Kategorisi'] = fiyat_kategori_vectorized(df['total_bill'])
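An alternative vectorized sketch, not used in the notebook itself: `np.select` picks a label per row from an ordered list of conditions. The sample bills below are made up to hit each boundary, and the variable names are assumptions for this illustration.

```python
import numpy as np
import pandas as pd

bills = pd.Series([8.77, 10.0, 16.99, 29.99, 30.0, 48.27])  # made-up sample values

# Conditions are checked in order; the first one that is True wins
conditions = [bills < 10, bills < 30]
labels = ['$', '$$']

# Rows matching no condition (bill >= 30) receive the default label
categories = np.select(conditions, labels, default='$$$')
print(categories.tolist())
```

Because the conditions are evaluated in order, the second one does not need to repeat the lower bound.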

In [None]:
df.sample(10)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
200,18.71,4.0,Male,Yes,Thur,Lunch,3,6.24,Jason Conrad,487,Thur6048,$$
107,25.21,4.29,Male,Yes,Sat,Dinner,2,12.6,Jason Mullen,868,Sat5196,$$
36,16.31,2.0,Male,No,Sat,Dinner,3,5.44,William Ford,398,Sat9139,$$
181,23.33,5.65,Male,Yes,Sun,Dinner,2,11.66,Jason Cox,223,Sun3402,$$
193,15.48,2.02,Male,Yes,Thur,Lunch,2,7.74,Raymond Sullivan,315,Thur606,$$
100,11.35,2.5,Female,Yes,Fri,Dinner,2,5.68,Lori Lynch,492,Fri4106,$$
152,17.26,2.74,Male,No,Sun,Dinner,3,5.75,Gregory Smith,741,Sun5205,$$
179,34.63,3.55,Male,Yes,Sun,Dinner,2,17.32,Brian Bailey,848,Sun9851,$$$
48,28.55,2.05,Male,No,Sun,Dinner,3,9.52,Austin Fisher,587,Sun4142,$$
129,22.82,2.18,Male,No,Thur,Lunch,3,7.61,Raymond Torres,24,Thur9424,$$


# The .apply() method can be applied to multiple columns at once

## Method 1

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,410,Sun2959,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,230,Sun4608,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,322,Sun4458,$$
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,994,Sun5260,$$
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,221,Sun2251,$$


In [None]:
def kareAl(x):
  return x**2

In [None]:
# df[['total_bill' , 'tip']] = df[['total_bill' , 'tip']].apply(kareAl)
df[['total_bill' , 'tip']].apply(kareAl)

Unnamed: 0,total_bill,tip
0,288.6601,1.0201
1,106.9156,2.7556
2,441.4201,12.2500
3,560.7424,10.9561
4,604.6681,13.0321
...,...,...
239,842.7409,35.0464
240,738.7524,4.0000
241,513.9289,4.0000
242,317.5524,3.0625


## Method 2

In [None]:
# df[['total_bill' , 'tip']] = df[['total_bill' , 'tip']].apply(lambda x : x**2)
df[['total_bill' , 'tip']].apply(lambda x : x**2)

Unnamed: 0,total_bill,tip
0,288.6601,1.0201
1,106.9156,2.7556
2,441.4201,12.2500
3,560.7424,10.9561
4,604.6681,13.0321
...,...,...
239,842.7409,35.0464
240,738.7524,4.0000
241,513.9289,4.0000
242,317.5524,3.0625


## Method 3: Vectorized (the fastest)

In [None]:
# df[['total_bill' , 'tip']] = np.square(df[['total_bill' , 'tip']])
np.square(df[['total_bill' , 'tip']])

Unnamed: 0,total_bill,tip
0,288.6601,1.0201
1,106.9156,2.7556
2,441.4201,12.2500
3,560.7424,10.9561
4,604.6681,13.0321
...,...,...
239,842.7409,35.0464
240,738.7524,4.0000
241,513.9289,4.0000
242,317.5524,3.0625


In [None]:
df.describe()   # Prints summary statistics for the numeric columns of the DataFrame

Unnamed: 0,total_bill,tip,size,price_per_person
count,244.0,244.0,244.0,244.0
mean,19.785943,2.998279,2.569672,7.888197
std,8.902412,1.383638,0.9511,2.914234
min,3.07,1.0,1.0,2.88
25%,13.3475,2.0,2.0,5.8
50%,17.795,2.9,2.0,7.255
75%,24.1275,3.5625,3.0,9.39
max,50.81,10.0,6.0,20.27


In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,410,Sun2959,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,230,Sun4608,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,322,Sun4458,$$
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,994,Sun5260,$$
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,221,Sun2251,$$


In [None]:
df.sort_values(by= 'total_bill', ascending=True) # Sorted from smallest to largest because ascending = True.

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
67,3.07,1.00,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,267,Sat3455,$
92,5.75,1.00,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,392,Fri3780,$
111,7.25,1.00,Female,No,Sat,Dinner,1,7.25,Terri Jones,887,Sat4801,$
172,7.25,5.15,Male,Yes,Sun,Dinner,2,3.62,Larry White,103,Sun9209,$
149,7.51,2.00,Male,No,Thur,Lunch,2,3.76,Daniel Robbins,889,Thur6321,$
...,...,...,...,...,...,...,...,...,...,...,...,...
182,45.35,3.50,Male,Yes,Sun,Dinner,3,15.12,Jose Parsons,910,Sun2337,$$$
156,48.17,5.00,Male,No,Sun,Dinner,6,8.03,Ryan Gonzales,321,Sun7518,$$$
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,595,Sat8139,$$$
212,48.33,9.00,Male,No,Sat,Dinner,4,12.08,Alex Williamson,212,Sat4590,$$$


In [None]:
df.sort_values('total_bill' , ascending = False , inplace = True)   # Sorts from largest to smallest and makes the change permanent

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
170,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,236,Sat1954,$$$
212,48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,212,Sat4590,$$$
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,595,Sat8139,$$$
156,48.17,5.0,Male,No,Sun,Dinner,6,8.03,Ryan Gonzales,321,Sun7518,$$$
182,45.35,3.5,Male,Yes,Sun,Dinner,3,15.12,Jose Parsons,910,Sun2337,$$$


In [None]:
df.reset_index(inplace = True , drop = True)    # Resets the index to 0..n-1 in the current row order and discards the old index

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,236,Sat1954,$$$
1,48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,212,Sat4590,$$$
2,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,595,Sat8139,$$$
3,48.17,5.0,Male,No,Sun,Dinner,6,8.03,Ryan Gonzales,321,Sun7518,$$$
4,45.35,3.5,Male,Yes,Sun,Dinner,3,15.12,Jose Parsons,910,Sun2337,$$$


In [None]:
df.sort_values(['tip' , 'total_bill'] , ascending = False)   # Sort by tip first, then by total_bill to break ties. Both are descending because ascending = False

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,50.81,10.00,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,236,Sat1954,$$$
1,48.33,9.00,Male,No,Sat,Dinner,4,12.08,Alex Williamson,212,Sat4590,$$$
10,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,808,Sat239,$$$
2,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,595,Sat8139,$$$
20,34.30,6.70,Male,No,Thur,Lunch,6,5.72,Steven Carlson,508,Thur1025,$$$
...,...,...,...,...,...,...,...,...,...,...,...,...
131,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,410,Sun2959,$$
195,12.60,1.00,Male,Yes,Sat,Dinner,2,6.30,Matthew Myers,965,Sat5032,$$
240,7.25,1.00,Female,No,Sat,Dinner,1,7.25,Terri Jones,887,Sat4801,$
242,5.75,1.00,Female,Yes,Fri,Dinner,2,2.88,Leah Ramirez,392,Fri3780,$
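`ascending` also accepts a list, one flag per sort column, which allows mixing directions. A minimal sketch on made-up data (the frame `toy` and its values are assumptions for this example):

```python
import pandas as pd

# Made-up rows with repeated tip values so the tie-breaking column matters
toy = pd.DataFrame({'tip': [1.0, 5.0, 1.0, 5.0],
                    'total_bill': [7.25, 41.19, 12.60, 43.11]})

# tip descending; among equal tips, total_bill ascending
ordered = toy.sort_values(['tip', 'total_bill'], ascending=[False, True])
print(ordered)
```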


In [None]:
df.head(10)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,236,Sat1954,$$$
1,48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,212,Sat4590,$$$
2,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,595,Sat8139,$$$
3,48.17,5.0,Male,No,Sun,Dinner,6,8.03,Ryan Gonzales,321,Sun7518,$$$
4,45.35,3.5,Male,Yes,Sun,Dinner,3,15.12,Jose Parsons,910,Sun2337,$$$
5,44.3,2.5,Female,Yes,Sat,Dinner,3,14.77,Heather Cohen,604,Sat6240,$$$
6,43.11,5.0,Female,Yes,Thur,Lunch,4,10.78,Brooke Soto,175,Thur9313,$$$
7,41.19,5.0,Male,No,Thur,Lunch,5,8.24,Eric Andrews,453,Thur3621,$$$
8,40.55,3.0,Male,Yes,Sun,Dinner,2,20.27,Stephen Cox,29,Sun5140,$$$
9,40.17,4.73,Male,Yes,Fri,Dinner,4,10.04,Aaron Bentley,690,Fri9628,$$$


In [None]:
df['price_per_person'].max()   # the maximum value

20.27

In [None]:
df['price_per_person'].argmax()   # integer position of the maximum value

np.int64(8)

In [None]:
df['price_per_person'].idxmax()   # index label of the maximum value

8
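The two calls above return the same number only because the index was just reset to 0..n-1. A small sketch with a made-up, non-numeric index shows the difference between position and label:

```python
import pandas as pd

# Made-up values with string labels instead of the default integer index
prices = pd.Series([8.49, 20.27, 3.45], index=['a', 'b', 'c'])

print(prices.argmax())   # integer position of the maximum
print(prices.idxmax())   # index label of the maximum
```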

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,236,Sat1954,$$$
1,48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,212,Sat4590,$$$
2,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,595,Sat8139,$$$
3,48.17,5.0,Male,No,Sun,Dinner,6,8.03,Ryan Gonzales,321,Sun7518,$$$
4,45.35,3.5,Male,Yes,Sun,Dinner,3,15.12,Jose Parsons,910,Sun2337,$$$


In [None]:
df.corr(numeric_only=True) # Vitally important when building machine learning models.
                           # Shows the linear relationships (Pearson correlation by default) between the numeric columns.

Unnamed: 0,total_bill,tip,size,price_per_person
total_bill,1.0,0.675734,0.598315,0.647554
tip,0.675734,1.0,0.489299,0.347405
size,0.598315,0.489299,1.0,-0.175359
price_per_person,0.647554,0.347405,-0.175359,1.0
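To make the meaning of the coefficients concrete, here is a minimal sketch on made-up columns (the frame `toy` is an assumption for this example): a column that rises exactly in step with another has correlation 1, and one that falls exactly in step has correlation -1.

```python
import pandas as pd

toy = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0],
                    'y': [2.0, 4.0, 6.0, 8.0],    # y = 2x, rises with x
                    'z': [4.0, 3.0, 2.0, 1.0]})   # z falls as x rises

corr = toy.corr(numeric_only=True)
print(corr.round(2))
```

Values between these extremes, such as those in the table above, indicate a weaker linear relationship.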


In [None]:
df['sex'].value_counts()

Unnamed: 0_level_0,count
sex,Unnamed: 1_level_1
Male,157
Female,87


In [None]:
df.day.value_counts()

Unnamed: 0_level_0,count
day,Unnamed: 1_level_1
Sat,87
Sun,76
Thur,62
Fri,19


In [None]:
df.day.unique()   # the day column contains 4 distinct values

array(['Sat', 'Sun', 'Thur', 'Fri'], dtype=object)

In [None]:
df.day.nunique() # number of distinct values

4

In [None]:
len(df.day.unique())

4

In [None]:
# Goal: in the sex column, turn Male into 'M' and Female into 'F'.

## Method 1 (with the .apply() method)

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,50.81,10.0,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,236,Sat1954,$$$
1,48.33,9.0,Male,No,Sat,Dinner,4,12.08,Alex Williamson,212,Sat4590,$$$
2,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,595,Sat8139,$$$
3,48.17,5.0,Male,No,Sun,Dinner,6,8.03,Ryan Gonzales,321,Sun7518,$$$
4,45.35,3.5,Male,Yes,Sun,Dinner,3,15.12,Jose Parsons,910,Sun2337,$$$


In [None]:
df['sex'] = df['sex'].apply(lambda x : 'M' if x == 'Male' else 'F')

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,50.81,10.0,M,Yes,Sat,Dinner,3,16.94,Gregory Clark,236,Sat1954,$$$
1,48.33,9.0,M,No,Sat,Dinner,4,12.08,Alex Williamson,212,Sat4590,$$$
2,48.27,6.73,M,No,Sat,Dinner,4,12.07,Brian Ortiz,595,Sat8139,$$$
3,48.17,5.0,M,No,Sun,Dinner,6,8.03,Ryan Gonzales,321,Sun7518,$$$
4,45.35,3.5,M,Yes,Sun,Dinner,3,15.12,Jose Parsons,910,Sun2337,$$$


## Method 2 (with the .replace() method)

In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,50.81,10.0,M,Yes,Sat,Dinner,3,16.94,Gregory Clark,236,Sat1954,$$$
1,48.33,9.0,M,No,Sat,Dinner,4,12.08,Alex Williamson,212,Sat4590,$$$
2,48.27,6.73,M,No,Sat,Dinner,4,12.07,Brian Ortiz,595,Sat8139,$$$
3,48.17,5.0,M,No,Sun,Dinner,6,8.03,Ryan Gonzales,321,Sun7518,$$$
4,45.35,3.5,M,Yes,Sun,Dinner,3,15.12,Jose Parsons,910,Sun2337,$$$


In [None]:
# Calling df['sex'].replace(..., inplace = True) operates on an intermediate copy and raises a
# FutureWarning (the behavior changes in pandas 3.0), so assign the result back instead.
df['sex'] = df['sex'].replace('Female' , 'F')
df['sex'] = df['sex'].replace('Male' , 'M')


In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,50.81,10.0,M,Yes,Sat,Dinner,3,16.94,Gregory Clark,236,Sat1954,$$$
1,48.33,9.0,M,No,Sat,Dinner,4,12.08,Alex Williamson,212,Sat4590,$$$
2,48.27,6.73,M,No,Sat,Dinner,4,12.07,Brian Ortiz,595,Sat8139,$$$
3,48.17,5.0,M,No,Sun,Dinner,6,8.03,Ryan Gonzales,321,Sun7518,$$$
4,45.35,3.5,M,Yes,Sun,Dinner,3,15.12,Jose Parsons,910,Sun2337,$$$


In [None]:
df['sex'].replace(['Female' , 'Male'] , ['F' , 'M'])   # the same replacement as above, done in a single call

Unnamed: 0,sex
0,M
1,M
2,M
3,M
4,M
...,...
239,M
240,F
241,M
242,F
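`replace()` also accepts a dictionary of old-to-new values. The sketch below uses made-up rows (the frame `toy` is an assumption for this example) and assigns the result back to the column rather than relying on `inplace`.

```python
import pandas as pd

toy = pd.DataFrame({'sex': ['Male', 'Female', 'Male']})  # made-up rows

# Dictionary form: keys are the old values, values are the replacements
toy['sex'] = toy['sex'].replace({'Female': 'F', 'Male': 'M'})
print(toy['sex'].tolist())
```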


## Method 3 (with the .map() method)

In [None]:
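# The column already holds 'M'/'F' at this point, so no key matches and .map() returns NaN for every row.
df['sex'].map({'Female' : 'F' ,
               'Male' : 'M'})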

Unnamed: 0,sex
0,
1,
2,
3,
4,
...,...
239,
240,
241,
242,
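The empty output above comes from how `.map()` treats values that are missing from the dictionary: they become NaN. A minimal sketch on made-up values (the names `sex`, `mapped` and `filled` are assumptions for this example), including one way to keep the original value when there is no match:

```python
import pandas as pd

sex = pd.Series(['Male', 'Female', 'M'])  # made-up values; 'M' is not a key below

mapped = sex.map({'Female': 'F', 'Male': 'M'})
print(mapped.tolist())   # the unmatched entry comes back as NaN

# Fall back to the original value wherever the mapping had no entry
filled = mapped.fillna(sex)
print(filled.tolist())
```

This is the main practical difference from `.replace()`, which leaves unmatched values untouched.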



In [None]:
simple_df = pd.DataFrame( data = [[1,2,2,2] , [1,2,3,4] , [1,2,2,2] , [1,2,2,2] ], index = ['a','b','c','d'] , columns = ['col1','col2','col3','col4'])
simple_df

Unnamed: 0,col1,col2,col3,col4
a,1,2,2,2
b,1,2,3,4
c,1,2,2,2
d,1,2,2,2


In [None]:
simple_df.duplicated()   # Flags duplicate rows. Duplicate data usually comes from an error.
# Row a: no identical row appears before it, so False. Row b: no earlier copy either, so False.
# Row c: an identical row (a) appears earlier, so True. Row d: an earlier copy exists as well, so True.

Unnamed: 0,0
a,False
b,False
c,True
d,True


In [None]:
simple_df.drop_duplicates()  # Drops the duplicated rows, i.e. c and d. The change is not permanent here (inplace = True was not used).
# c and d are dropped as duplicates of a because of the default keep='first'. With the keep parameter we can keep a different occurrence (keep='last') or drop every duplicated row (keep=False).

Unnamed: 0,col1,col2,col3,col4
a,1,2,2,2
b,1,2,3,4


In [None]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,50.81,10.0,M,Yes,Sat,Dinner,3,16.94,Gregory Clark,236,Sat1954,$$$
1,48.33,9.0,M,No,Sat,Dinner,4,12.08,Alex Williamson,212,Sat4590,$$$
2,48.27,6.73,M,No,Sat,Dinner,4,12.07,Brian Ortiz,595,Sat8139,$$$
3,48.17,5.0,M,No,Sun,Dinner,6,8.03,Ryan Gonzales,321,Sun7518,$$$
4,45.35,3.5,M,Yes,Sun,Dinner,3,15.12,Jose Parsons,910,Sun2337,$$$


In [None]:
# Goal: select the rows whose total_bill value is between 10 and 20.

In [None]:
df[10 < df['total_bill'] < 20] # A chained comparison like this does not work on a Series; the .between() method exists for this.

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
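The error arises because Python evaluates `a < s < b` as `(a < s) and (s < b)`, and `and` needs a single True or False rather than a whole Series. A minimal sketch on made-up values shows two element-wise forms that do work (the names `bills`, `mask_ops` and `mask_between` are assumptions for this example):

```python
import pandas as pd

bills = pd.Series([9.5, 10.0, 15.0, 20.0, 25.0])  # made-up values

# Element-wise AND of two comparisons; the parentheses are required
mask_ops = (bills >= 10) & (bills <= 20)
# The same test with .between(), which includes both endpoints by default
mask_between = bills.between(10, 20)

print(bills[mask_ops].tolist())
print(mask_ops.equals(mask_between))
```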

In [None]:
df[df['total_bill'].between(10,20)] # By default inclusive = 'both', i.e. both endpoints are included.

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
97,19.82,3.18,M,No,Sat,Dinner,2,9.91,Christopher Ross,928,Sat6236,$$
98,19.81,4.19,F,Yes,Thur,Lunch,2,9.90,Kristy Boyd,068,Thur967,$$
99,19.77,2.00,M,No,Sun,Dinner,4,4.94,James Smith,229,Sun5814,$$
100,19.65,3.00,F,No,Sat,Dinner,2,9.82,Melinda Murphy,051,Sat2467,$$
101,19.49,3.51,M,No,Sun,Dinner,2,9.74,Michael Hamilton,768,Sun1118,$$
...,...,...,...,...,...,...,...,...,...,...,...,...
222,10.29,2.60,F,No,Sun,Dinner,2,5.14,Jessica Ibarra,713,Sun4474,$$
223,10.27,1.71,M,No,Sun,Dinner,2,5.14,William Riley,219,Sun2546,$$
224,10.09,2.00,F,Yes,Fri,Lunch,2,5.04,Ruth Weiss,635,Fri6359,$$
225,10.07,1.25,M,No,Sat,Dinner,2,5.04,Sean Gonzalez,605,Sat4615,$$


In [None]:
df[df['total_bill'].between(10,20 , inclusive = 'left')]   # Left endpoint included, i.e. [10,20); with 'right' it would be (10,20].

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
97,19.82,3.18,M,No,Sat,Dinner,2,9.91,Christopher Ross,928,Sat6236,$$
98,19.81,4.19,F,Yes,Thur,Lunch,2,9.90,Kristy Boyd,068,Thur967,$$
99,19.77,2.00,M,No,Sun,Dinner,4,4.94,James Smith,229,Sun5814,$$
100,19.65,3.00,F,No,Sat,Dinner,2,9.82,Melinda Murphy,051,Sat2467,$$
101,19.49,3.51,M,No,Sun,Dinner,2,9.74,Michael Hamilton,768,Sun1118,$$
...,...,...,...,...,...,...,...,...,...,...,...,...
222,10.29,2.60,F,No,Sun,Dinner,2,5.14,Jessica Ibarra,713,Sun4474,$$
223,10.27,1.71,M,No,Sun,Dinner,2,5.14,William Riley,219,Sun2546,$$
224,10.09,2.00,F,Yes,Fri,Lunch,2,5.04,Ruth Weiss,635,Fri6359,$$
225,10.07,1.25,M,No,Sat,Dinner,2,5.04,Sean Gonzalez,605,Sat4615,$$


In [None]:
df.nlargest(5 , 'tip')     # nlargest returns the rows with the largest values in the given column.

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,50.81,10.0,M,Yes,Sat,Dinner,3,16.94,Gregory Clark,236,Sat1954,$$$
1,48.33,9.0,M,No,Sat,Dinner,4,12.08,Alex Williamson,212,Sat4590,$$$
10,39.42,7.58,M,No,Sat,Dinner,4,9.86,Lance Peterson,808,Sat239,$$$
2,48.27,6.73,M,No,Sat,Dinner,4,12.07,Brian Ortiz,595,Sat8139,$$$
20,34.3,6.7,M,No,Thur,Lunch,6,5.72,Steven Carlson,508,Thur1025,$$$


In [None]:
df.sort_values('tip' , ascending= False).head(5)  # The same result without nlargest: sort df by tip in descending order (ascending = False), then take the first 5 rows with head

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
0,50.81,10.0,M,Yes,Sat,Dinner,3,16.94,Gregory Clark,236,Sat1954,$$$
1,48.33,9.0,M,No,Sat,Dinner,4,12.08,Alex Williamson,212,Sat4590,$$$
10,39.42,7.58,M,No,Sat,Dinner,4,9.86,Lance Peterson,808,Sat239,$$$
2,48.27,6.73,M,No,Sat,Dinner,4,12.07,Brian Ortiz,595,Sat8139,$$$
20,34.3,6.7,M,No,Thur,Lunch,6,5.72,Steven Carlson,508,Thur1025,$$$


In [None]:
df.nsmallest(columns= ['tip','total_bill'] , n = 3)   # The 3 rows with the smallest tip; ties in tip are broken by total_bill

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,Fiyat_Kategorisi
243,3.07,1.0,F,Yes,Sat,Dinner,1,3.07,Tiffany Brock,267,Sat3455,$
242,5.75,1.0,F,Yes,Fri,Dinner,2,2.88,Leah Ramirez,392,Fri3780,$
240,7.25,1.0,F,No,Sat,Dinner,1,7.25,Terri Jones,887,Sat4801,$
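With several columns, `nsmallest` orders by the first column and uses the following ones only to break ties. The sketch below, on made-up rows (the frame `toy` is an assumption for this example), compares it with sorting on both columns and taking the head.

```python
import pandas as pd

# Made-up rows: three of them share the smallest tip
toy = pd.DataFrame({'tip': [1.0, 1.0, 2.0, 1.0],
                    'total_bill': [7.25, 3.07, 9.00, 5.75]})

smallest = toy.nsmallest(n=2, columns=['tip', 'total_bill'])
by_sort = toy.sort_values(['tip', 'total_bill']).head(2)

print(smallest)
print(smallest.index.tolist() == by_sort.index.tolist())
```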


In [None]:
#Yasin