# Hello

These are the notes and examples I collected while working with the Python pandas library.


Unlike NumPy, pandas lets us work with structured datasets in a more flexible way. Within pandas we can also work with a wider variety of data types.

Happy browsing :)

#### How to import the pandas library

In [1]:
import pandas as pd 

#### Creating a pandas Series

In [2]:
pd.Series([1,2,3,4,5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

When we check the object with the `type` function, we can see that it is a pandas Series.

In [3]:
a = pd.Series([1,2,3,4,5])
type(a)

pandas.core.series.Series

* **axes**: gives access to the index information of the Series
* **ndim**: number of dimensions
* **shape**: shape (length along each dimension)
* **size**: total number of elements
* **dtype**: data type of the underlying array
* **values**: used to access the values on their own

In [4]:
a.axes

[RangeIndex(start=0, stop=5, step=1)]

In [5]:
a.ndim

1

In [6]:
a.shape

(5,)

In [7]:
a.size

5

In [8]:
a.dtype

dtype('int64')

In [9]:
a.values

array([1, 2, 3, 4, 5], dtype=int64)

In pandas, the `head` method is used to access elements from the start.

In [10]:
a.head(3)

0    1
1    2
2    3
dtype: int64

In pandas, the `tail` method is used to access elements from the end.

In [11]:
a.tail(2)

3    4
4    5
dtype: int64

With the `index` argument we can state which label each value sits under, so we can replace the default index values.

In [12]:
pd.Series([24,987,12,34,188,1], index = [1,3,5,7,9,11])

1      24
3     987
5      12
7      34
9     188
11      1
dtype: int64

In [13]:
seri_1 = pd.Series([24,987,12,34,188,1], index = ["bir","iki","uc","dort","bes","alti"])
seri_1

bir      24
iki     987
uc       12
dort     34
bes     188
alti      1
dtype: int64

In [14]:
seri_1["iki"]

987

In [15]:
seri_1["bir":"dort"]

bir      24
iki     987
uc       12
dort     34
dtype: int64
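
The label slice above includes its end label, unlike the usual Python slice. A minimal sketch of the difference, using made-up data shaped like `seri_1` and the explicit `.loc` / `.iloc` accessors (my choice here, not something the cells above use):

```python
import pandas as pd

# Hypothetical data, shaped like seri_1 above.
s = pd.Series([24, 987, 12, 34], index=["bir", "iki", "uc", "dort"])

# Label-based slice: both end labels are included.
by_label = s.loc["bir":"uc"]

# Position-based slice: the end position is excluded.
by_position = s.iloc[0:2]

print(list(by_label.index))     # ['bir', 'iki', 'uc']
print(list(by_position.index))  # ['bir', 'iki']
```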

Creating a Series from a dictionary

In [16]:
sozluk = pd.Series({"on":10, "yirmi":20, "otuz":30})
sozluk

on       10
yirmi    20
otuz     30
dtype: int64

In [17]:
seri_2 = {"on":10, "yirmi":20, "otuz":30}
sozluk = pd.Series(seri_2,)
sozluk

on       10
yirmi    20
otuz     30
dtype: int64

To combine two Series we use the `concat` function.

In [18]:
pd.concat([seri_1,sozluk])

bir       24
iki      987
uc        12
dort      34
bes      188
alti       1
on        10
yirmi     20
otuz      30
dtype: int64

#### Element Operations

In [19]:
import numpy as np
a = np.array([1,2,3,4,5])
seri_1 = pd.Series(a)
seri_1

0    1
1    2
2    3
3    4
4    5
dtype: int32

In [20]:
seri_1[0]

1

In [21]:
seri_1[1:3]

1    2
2    3
dtype: int32

In [22]:
seri_2 = pd.Series([23,33,53,43,13], index = ["a","b","c","d","e"])
seri_2

a    23
b    33
c    53
d    43
e    13
dtype: int64

In [23]:
seri_2.keys()

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [24]:
list(seri_2.items())

[('a', 23), ('b', 33), ('c', 53), ('d', 43), ('e', 13)]

In [25]:
seri_2.values

array([23, 33, 53, 43, 13], dtype=int64)

Querying elements

In [26]:
"d" in seri_2

True

In [27]:
"f" in seri_2

False

In [28]:
seri_2["c"]

53
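
One thing worth keeping in mind: the `in` operator on a Series checks the index labels, not the values. A small sketch with made-up data (the variable names are only for illustration):

```python
import pandas as pd

# Hypothetical data, similar to seri_2 above.
s = pd.Series([23, 33, 53], index=["a", "b", "c"])

label_found = "a" in s        # True: "a" is an index label
value_as_label = 23 in s      # False: 23 is a value, not a label
value_found = 23 in s.values  # True: checks the underlying array

print(label_found, value_as_label, value_found)
```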

Selecting elements with fancy indexing

In [29]:
seri_2[["b","d"]]

b    33
d    43
dtype: int64

In [30]:
seri_2["a"] = 63
seri_2["a"]

63

In [31]:
seri_2["a":"c"]

a    63
b    33
c    53
dtype: int64

#### Creating a pandas DataFrame

In [32]:
import pandas as pd
a = [1,4,9,16,12,13]
a

[1, 4, 9, 16, 12, 13]

In [33]:
pd.DataFrame(a, columns = ["Günlük Çalışma Saatleri"])

Unnamed: 0,Günlük Çalışma Saatleri
0,1
1,4
2,9
3,16
4,12
5,13


In [34]:
import numpy as np
b = np.arange(1,10).reshape(3,3)
b

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [35]:
df = pd.DataFrame(b, columns = ["a", "b", "c"])
df

Unnamed: 0,a,b,c
0,1,2,3
1,4,5,6
2,7,8,9


If we want to change the column names, we can assign new ones to the `columns` attribute.

In [36]:
df.columns = ("deger1", "deger2", "deger3")
df

Unnamed: 0,deger1,deger2,deger3
0,1,2,3
1,4,5,6
2,7,8,9
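
Assigning to `columns` replaces every name at once. If only some columns need a new name, `rename` with a mapping is an alternative; a minimal sketch, assuming we only want to rename `a`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3), columns=["a", "b", "c"])

# rename returns a new DataFrame; df itself keeps its old names.
renamed = df.rename(columns={"a": "deger1"})

print(list(renamed.columns))  # ['deger1', 'b', 'c']
print(list(df.columns))       # ['a', 'b', 'c']
```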


In [37]:
type(df)

pandas.core.frame.DataFrame

In [38]:
df.ndim

2

In [39]:
df.shape

(3, 3)

In [40]:
df.size

9

In [41]:
df.axes

[RangeIndex(start=0, stop=3, step=1),
 Index(['deger1', 'deger2', 'deger3'], dtype='object')]

In [42]:
df.values

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [43]:
type(df.values)

numpy.ndarray

In [44]:
df.head(1)

Unnamed: 0,deger1,deger2,deger3
0,1,2,3


In [45]:
df.tail(1)

Unnamed: 0,deger1,deger2,deger3
2,7,8,9


In [46]:
c = np.array([1,2,3,4,5])
pd.DataFrame(c, columns = ["degerler"])

Unnamed: 0,degerler
0,1
1,2
2,3
3,4
4,5


In [47]:
d = np.arange(0,64).reshape(8,8)
d

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])

In [48]:
pd.DataFrame(d, columns = ["s", "a", "d", "u", "l","l","a","h"])

Unnamed: 0,s,a,d,u,l,l,a,h
0,0,1,2,3,4,5,6,7
1,8,9,10,11,12,13,14,15
2,16,17,18,19,20,21,22,23
3,24,25,26,27,28,29,30,31
4,32,33,34,35,36,37,38,39
5,40,41,42,43,44,45,46,47
6,48,49,50,51,52,53,54,55
7,56,57,58,59,60,61,62,63


#### DataFrame Element Operations

In [49]:
import numpy as np
a = np.random.randint(10, size = 5)
b = np.random.randint(10, size = 5)
c = np.random.randint(10, size = 5)
d = np.random.randint(10, size = 5)
e = np.random.randint(10, size = 5)

In [50]:
sozluk = {"deger_1": a, "deger_2":b, "deger_3":c, "deger_4":d, "deger_5":e}
sozluk

{'deger_1': array([1, 4, 0, 5, 7]),
 'deger_2': array([9, 3, 3, 1, 5]),
 'deger_3': array([1, 9, 2, 8, 1]),
 'deger_4': array([4, 1, 2, 6, 5]),
 'deger_5': array([2, 4, 4, 6, 5])}

In [51]:
df = pd.DataFrame(sozluk)
df

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
0,1,9,1,4,2
1,4,3,9,1,4
2,0,3,2,2,4
3,5,1,8,6,6
4,7,5,1,5,5


In [52]:
df[0:2]

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
0,1,9,1,4,2
1,4,3,9,1,4


In [53]:
df.index = ["a", "b", "c", "d", "e"]
df

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
a,1,9,1,4,2
b,4,3,9,1,4
c,0,3,2,2,4
d,5,1,8,6,6
e,7,5,1,5,5


In [54]:
df["b":"d"]

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
b,4,3,9,1,4
c,0,3,2,2,4
d,5,1,8,6,6


With the `drop` method we can delete a row or a column; here `axis = 0` removes the row labeled "a". The original DataFrame is not changed.

In [55]:
df.drop("a", axis = 0)

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
b,4,3,9,1,4
c,0,3,2,2,4
d,5,1,8,6,6
e,7,5,1,5,5


In [56]:
df

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
a,1,9,1,4,2
b,4,3,9,1,4
c,0,3,2,2,4
d,5,1,8,6,6
e,7,5,1,5,5


If we want the original DataFrame itself to change as well, we use `inplace = True`.

In [58]:
df.drop("a", axis = 0, inplace = True)
df

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
b,4,3,9,1,4
c,0,3,2,2,4
d,5,1,8,6,6
e,7,5,1,5,5
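
As an alternative to `inplace`, the result of `drop` can simply be assigned to a name. The sketch below uses made-up data and the `index=` / `columns=` keywords instead of `axis`; it is only one way to write it:

```python
import pandas as pd

# Hypothetical small frame for illustration.
df = pd.DataFrame({"deger_1": [1, 4, 0], "deger_2": [9, 3, 3]},
                  index=["a", "b", "c"])

# drop returns a new DataFrame with row "a" and column "deger_2" removed.
smaller = df.drop(index="a", columns="deger_2")

print(smaller.shape)  # (2, 1)
print(df.shape)       # (3, 2): the original is untouched
```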


In [59]:
df.drop("deger_2", axis = 1)

Unnamed: 0,deger_1,deger_3,deger_4,deger_5
b,4,9,1,4
c,0,2,2,4
d,5,8,6,6
e,7,1,5,5


In [60]:
df

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
b,4,3,9,1,4
c,0,3,2,2,4
d,5,1,8,6,6
e,7,5,1,5,5


In [61]:
silinicek_satirlar = ["c", "e"]

In [62]:
df.drop(silinicek_satirlar, axis = 0)

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
b,4,3,9,1,4
d,5,1,8,6,6


In [63]:
df

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
b,4,3,9,1,4
c,0,3,2,2,4
d,5,1,8,6,6
e,7,5,1,5,5


In [64]:
sorgu = ["deger_1","deger_8","deger_5","deger_7"]
for i in sorgu:
    print(i in df)

True
False
True
False


In [65]:
df

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
b,4,3,9,1,4
c,0,3,2,2,4
d,5,1,8,6,6
e,7,5,1,5,5


In [66]:
df["deger_6"] = df["deger_1"] / df["deger_2"]
df

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5,deger_6
b,4,3,9,1,4,1.333333
c,0,3,2,2,4,0.0
d,5,1,8,6,6,5.0
e,7,5,1,5,5,1.4


In [67]:
df.drop("deger_6", axis = 1, inplace = True)
df

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
b,4,3,9,1,4
c,0,3,2,2,4
d,5,1,8,6,6
e,7,5,1,5,5


In [68]:
silme = ["deger_2", "deger_3"]
df.drop(silme, axis = 1)

Unnamed: 0,deger_1,deger_4,deger_5
b,4,1,4
c,0,2,4
d,5,6,6
e,7,5,5


In [69]:
df

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
b,4,3,9,1,4
c,0,3,2,2,4
d,5,1,8,6,6
e,7,5,1,5,5


#### Observation and Variable Selection

In [70]:
import numpy as np
import pandas as pd
seri = np.random.randint(1,50, size = (10,5))
df = pd.DataFrame(seri , columns = ["deger_1", "deger_2", "deger_3", "deger_4", "deger_5"])
df

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
0,23,3,48,14,27
1,14,7,20,39,24
2,12,11,15,27,29
3,12,24,20,25,21
4,10,23,44,40,43
5,36,8,15,44,40
6,14,39,47,10,34
7,22,11,27,31,35
8,7,18,6,5,44
9,33,37,5,43,5


* **loc**: selects by the index labels as they are defined; the end of a slice is included.
* **iloc**: selects by integer position with the indexing logic we are used to; the end of a slice is excluded.
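
The difference is easier to see when the index labels do not start at 0. A minimal sketch with a made-up index (the labels 2 to 5 are an assumption for illustration):

```python
import pandas as pd

# Hypothetical frame whose index labels are 2, 3, 4, 5.
df = pd.DataFrame({"deger_1": [23, 14, 12, 12]}, index=[2, 3, 4, 5])

by_label = df.loc[:3]      # labels up to and including 3
by_position = df.iloc[:3]  # the first three rows, whatever their labels

print(list(by_label.index))     # [2, 3]
print(list(by_position.index))  # [2, 3, 4]
```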

In [71]:
df.loc[:3]

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
0,23,3,48,14,27
1,14,7,20,39,24
2,12,11,15,27,29
3,12,24,20,25,21


In [72]:
df.iloc[:3]

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
0,23,3,48,14,27
1,14,7,20,39,24
2,12,11,15,27,29


In [73]:
df.iloc[0,0]

23

In [74]:
df.iloc[:3,:3]

Unnamed: 0,deger_1,deger_2,deger_3
0,23,3,48
1,14,7,20
2,12,11,15


In [75]:
df.loc[0:3, "deger_2"]

0     3
1     7
2    11
3    24
Name: deger_2, dtype: int32

In [76]:
df.iloc[0:3]["deger_2"]

0     3
1     7
2    11
Name: deger_2, dtype: int32

In [77]:
df.iloc[6:,1:3]

Unnamed: 0,deger_2,deger_3
6,39,47
7,11,27
8,18,6
9,37,5


#### Conditional Element Operations

In [78]:
import numpy as np
import pandas as pd
seri = np.random.randint(1,50, size = (10,5))
df = pd.DataFrame(seri , columns = ["deger_1", "deger_2", "deger_3", "deger_4", "deger_5"])
df

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
0,7,40,25,41,37
1,46,35,30,17,35
2,14,6,29,15,42
3,49,43,20,33,37
4,39,48,5,9,45
5,46,42,32,24,8
6,13,27,16,26,49
7,29,12,6,1,25
8,11,35,40,2,28
9,48,33,43,48,48


In [79]:
df["deger_1"][0:3]

0     7
1    46
2    14
Name: deger_1, dtype: int32

In [80]:
df[0:3][["deger_2", "deger_3"]]

Unnamed: 0,deger_2,deger_3
0,40,25
1,35,30
2,6,29


In [81]:
df[df.deger_1 > 20]

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
1,46,35,30,17,35
3,49,43,20,33,37
4,39,48,5,9,45
5,46,42,32,24,8
7,29,12,6,1,25
9,48,33,43,48,48


In [82]:
df[df.deger_1 > 20]["deger_1"]

1    46
3    49
4    39
5    46
7    29
9    48
Name: deger_1, dtype: int32

In [83]:
df[(df.deger_1 > 10) & (df.deger_3 < 15)]

Unnamed: 0,deger_1,deger_2,deger_3,deger_4,deger_5
4,39,48,5,9,45
7,29,12,6,1,25


In [84]:
df.loc[(df.deger_1 > 10) & (df.deger_3 < 15), ["deger_1", "deger_3"]]

Unnamed: 0,deger_1,deger_3
4,39,5
7,29,6


In [85]:
df[(df.deger_1 > 10) & (df.deger_3 < 15)][["deger_1", "deger_3"]]

Unnamed: 0,deger_1,deger_3
4,39,5
7,29,6
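
Conditions can also be combined with `|` (or) and negated with `~`; each condition needs its own parentheses. A small sketch on made-up values:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"deger_1": [7, 46, 14, 39],
                   "deger_3": [25, 30, 29, 5]})

# Rows where either condition holds.
either = df[(df.deger_1 > 40) | (df.deger_3 < 10)]

# Rows where the condition does NOT hold.
negated = df[~(df.deger_1 > 10)]

print(list(either.index))   # [1, 3]
print(list(negated.index))  # [0]
```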


#### Concatenation (Join) Operations

In [86]:
import numpy as np
import pandas as pd
seri = np.random.randint(1,25, size = (5,3))
df = pd.DataFrame(seri , columns = ["deger_1", "deger_2", "deger_3"])
df

Unnamed: 0,deger_1,deger_2,deger_3
0,14,1,3
1,15,11,22
2,3,16,18
3,15,19,10
4,21,17,23


In [87]:
df_2 = df + 11
df_2

Unnamed: 0,deger_1,deger_2,deger_3
0,25,12,14
1,26,22,33
2,14,27,29
3,26,30,21
4,32,28,34


In [88]:
pd.concat([df,df_2])

Unnamed: 0,deger_1,deger_2,deger_3
0,14,1,3
1,15,11,22
2,3,16,18
3,15,19,10
4,21,17,23
0,25,12,14
1,26,22,33
2,14,27,29
3,26,30,21
4,32,28,34


With `ignore_index=True` we can fix the repeated, out-of-order index that results from concatenation.

In [89]:
pd.concat([df,df_2], ignore_index=True)

Unnamed: 0,deger_1,deger_2,deger_3
0,14,1,3
1,15,11,22
2,3,16,18
3,15,19,10
4,21,17,23
5,25,12,14
6,26,22,33
7,14,27,29
8,26,30,21
9,32,28,34


In [90]:
df_2.columns

Index(['deger_1', 'deger_2', 'deger_3'], dtype='object')

In [91]:
df_2.columns = ["deger_1", "degisken_2", "deger_3"]
df_2

Unnamed: 0,deger_1,degisken_2,deger_3
0,25,12,14
1,26,22,33
2,14,27,29
3,26,30,21
4,32,28,34


In [92]:
df

Unnamed: 0,deger_1,deger_2,deger_3
0,14,1,3
1,15,11,22
2,3,16,18
3,15,19,10
4,21,17,23


Because the column names now differ, some values become NaN after concatenation, as the next cell shows. In such cases we can use `join = "inner"` to keep only the columns that the two DataFrames have in common.

In [93]:
pd.concat([df,df_2])

Unnamed: 0,deger_1,deger_2,deger_3,degisken_2
0,14,1.0,3,
1,15,11.0,22,
2,3,16.0,18,
3,15,19.0,10,
4,21,17.0,23,
0,25,,14,12.0
1,26,,33,22.0
2,14,,29,27.0
3,26,,21,30.0
4,32,,34,28.0


In [94]:
pd.concat([df,df_2], join = "inner", ignore_index=True)

Unnamed: 0,deger_1,deger_3
0,14,3
1,15,22
2,3,18
3,15,10
4,21,23
5,25,14
6,26,33
7,14,29
8,26,21
9,32,34


#### Advanced Join Operations

In [95]:
import pandas as pd
df1 = pd.DataFrame ({"calisanlar": ["Ali","Veli","Ahmet", "Mehmet"],
                    "calisma_alani" : ["Muhasebe", "Muhendis","Muhendis","Cayci"]})
df1

Unnamed: 0,calisanlar,calisma_alani
0,Ali,Muhasebe
1,Veli,Muhendis
2,Ahmet,Muhendis
3,Mehmet,Cayci


In [96]:
df2 = pd.DataFrame({"calisanlar": ["Ali","Veli","Ahmet", "Mehmet"],
                   "giris_yili": [2016,2020,2012,1998]})
df2

Unnamed: 0,calisanlar,giris_yili
0,Ali,2016
1,Veli,2020
2,Ahmet,2012
3,Mehmet,1998


In [97]:
pd.merge(df1,df2)

Unnamed: 0,calisanlar,calisma_alani,giris_yili
0,Ali,Muhasebe,2016
1,Veli,Muhendis,2020
2,Ahmet,Muhendis,2012
3,Mehmet,Cayci,1998


In [98]:
pd.merge(df1,df2, on = "calisanlar")

Unnamed: 0,calisanlar,calisma_alani,giris_yili
0,Ali,Muhasebe,2016
1,Veli,Muhendis,2020
2,Ahmet,Muhendis,2012
3,Mehmet,Cayci,1998


In [99]:
df3 = pd.merge(df1,df2)
df3

Unnamed: 0,calisanlar,calisma_alani,giris_yili
0,Ali,Muhasebe,2016
1,Veli,Muhendis,2020
2,Ahmet,Muhendis,2012
3,Mehmet,Cayci,1998


In [100]:
df4 = pd.DataFrame({"calisma_alani": ["Muhasebe","Muhendis","Cayci"],
                   "mudur" : ["Yavuz", "Sadullah", "Yasin"]})
df4

Unnamed: 0,calisma_alani,mudur
0,Muhasebe,Yavuz
1,Muhendis,Sadullah
2,Cayci,Yasin


In [101]:
pd.merge(df3,df4)

Unnamed: 0,calisanlar,calisma_alani,giris_yili,mudur
0,Ali,Muhasebe,2016,Yavuz
1,Veli,Muhendis,2020,Sadullah
2,Ahmet,Muhendis,2012,Sadullah
3,Mehmet,Cayci,1998,Yasin


In [102]:
df5 = pd.DataFrame({"calisma_alani": ["Muhasebe","Muhasebe","Muhendis","Muhendis","Muhendis","Cayci"],
                   "yetenekler": ["Excel","Matematik", "Python", "PHP","R","Çay demleme"]})
df5             

Unnamed: 0,calisma_alani,yetenekler
0,Muhasebe,Excel
1,Muhasebe,Matematik
2,Muhendis,Python
3,Muhendis,PHP
4,Muhendis,R
5,Cayci,Çay demleme


In [103]:
pd.merge(df4,df5)

Unnamed: 0,calisma_alani,mudur,yetenekler
0,Muhasebe,Yavuz,Excel
1,Muhasebe,Yavuz,Matematik
2,Muhendis,Sadullah,Python
3,Muhendis,Sadullah,PHP
4,Muhendis,Sadullah,R
5,Cayci,Yasin,Çay demleme


In [104]:
pd.merge(df1,df5)

Unnamed: 0,calisanlar,calisma_alani,yetenekler
0,Ali,Muhasebe,Excel
1,Ali,Muhasebe,Matematik
2,Veli,Muhendis,Python
3,Veli,Muhendis,PHP
4,Veli,Muhendis,R
5,Ahmet,Muhendis,Python
6,Ahmet,Muhendis,PHP
7,Ahmet,Muhendis,R
8,Mehmet,Cayci,Çay demleme
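
In the last merge each employee is repeated once for every skill listed for their department, which is why the result has more rows than `df1`. A reduced sketch of the same effect with made-up frames and an explicit `on` key:

```python
import pandas as pd

# Hypothetical reduced versions of df1 and df5.
employees = pd.DataFrame({"calisanlar": ["Ali", "Veli"],
                          "calisma_alani": ["Muhasebe", "Muhendis"]})
skills = pd.DataFrame({"calisma_alani": ["Muhasebe", "Muhasebe", "Muhendis"],
                       "yetenekler": ["Excel", "Matematik", "Python"]})

# Each employee row is matched with every skill row of the same department.
merged = pd.merge(employees, skills, on="calisma_alani")

print(len(merged))                   # 3
print(merged["calisanlar"].tolist())  # ['Ali', 'Ali', 'Veli']
```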


#### Aggregation and Grouping

**Aggregation Functions**
* count()
* first()
* last()
* mean()
* median()
* min()
* max()
* std()
* var()
* sum()
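
A quick sketch of a few of these on a made-up Series. Note that missing values (NaN) are skipped, so `count()` can be smaller than `size`:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
s = pd.Series([1.0, 2.0, np.nan, 4.0])

print(s.size)     # 4 elements in total
print(s.count())  # 3 non-missing elements
print(s.sum())    # 7.0
print(s.min(), s.max())  # 1.0 4.0
```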

Importing the seaborn library

In [105]:
import seaborn as sns

In [106]:
df = sns.load_dataset("planets")
df

Unnamed: 0,method,number,orbital_period,mass,distance,year
0,Radial Velocity,1,269.300000,7.10,77.40,2006
1,Radial Velocity,1,874.774000,2.21,56.95,2008
2,Radial Velocity,1,763.000000,2.60,19.84,2011
3,Radial Velocity,1,326.030000,19.40,110.62,2007
4,Radial Velocity,1,516.220000,10.50,119.47,2009
...,...,...,...,...,...,...
1030,Transit,1,3.941507,,172.00,2006
1031,Transit,1,2.615864,,148.00,2007
1032,Transit,1,3.191524,,174.00,2007
1033,Transit,1,4.125083,,293.00,2008


In [107]:
df.shape

(1035, 6)

In [108]:
df.ndim

2

In [109]:
df.size

6210

In [110]:
df.mean(numeric_only=True)

number               1.785507
orbital_period    2002.917596
mass                 2.638161
distance           264.069282
year              2009.070531
dtype: float64

In [111]:
df["orbital_period"].mean()

2002.9175960947584

In [112]:
df["orbital_period"].count()

992

In [113]:
df["orbital_period"].min()

0.09070629

In [114]:
df["orbital_period"].max()

730000.0

In [115]:
df["orbital_period"].sum()

1986894.255326

In [116]:
df["orbital_period"].std()

26014.72830406252

In [117]:
df["orbital_period"].var()

676766088.7341915

In [118]:
df.describe()

Unnamed: 0,number,orbital_period,mass,distance,year
count,1035.0,992.0,513.0,808.0,1035.0
mean,1.785507,2002.917596,2.638161,264.069282,2009.070531
std,1.240976,26014.728304,3.818617,733.116493,3.972567
min,1.0,0.090706,0.0036,1.35,1989.0
25%,1.0,5.44254,0.229,32.56,2007.0
50%,1.0,39.9795,1.26,55.25,2010.0
75%,2.0,526.005,3.04,178.5,2012.0
max,7.0,730000.0,25.0,8500.0,2014.0


In [119]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
number,1035.0,1.785507,1.240976,1.0,1.0,1.0,2.0,7.0
orbital_period,992.0,2002.917596,26014.728304,0.090706,5.44254,39.9795,526.005,730000.0
mass,513.0,2.638161,3.818617,0.0036,0.229,1.26,3.04,25.0
distance,808.0,264.069282,733.116493,1.35,32.56,55.25,178.5,8500.0
year,1035.0,2009.070531,3.972567,1989.0,2007.0,2010.0,2012.0,2014.0


In [120]:
df.dropna().describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
number,498.0,1.73494,1.17572,1.0,1.0,1.0,2.0,6.0
orbital_period,498.0,835.778671,1469.128259,1.3283,38.27225,357.0,999.6,17337.5
mass,498.0,2.50932,3.636274,0.0036,0.2125,1.245,2.8675,25.0
distance,498.0,52.068213,46.596041,1.35,24.4975,39.94,59.3325,354.0
year,498.0,2007.37751,4.167284,1989.0,2005.0,2009.0,2011.0,2014.0


In [121]:
import pandas as pd
df = pd.DataFrame({"gruplar": ["A","B","C","A","B","C"],
                  "veri":[10,15,53,42,12,55]}, columns=["gruplar", "veri"])
df

Unnamed: 0,gruplar,veri
0,A,10
1,B,15
2,C,53
3,A,42
4,B,12
5,C,55


In [122]:
df.groupby("gruplar")

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001B6686E1B80>

In [123]:
df.groupby("gruplar").mean()

gruplar,veri
A,26.0
B,13.5
C,54.0


In [124]:
df.groupby("gruplar").sum()

gruplar,veri
A,52
B,27
C,108
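
On its own, `groupby` only returns a grouping object; the numbers appear once an aggregation is applied. To see what is inside, the groups can be iterated. A minimal sketch on the same small data:

```python
import pandas as pd

df = pd.DataFrame({"gruplar": ["A", "B", "C", "A", "B", "C"],
                   "veri": [10, 15, 53, 42, 12, 55]})

grouped = df.groupby("gruplar")

# Iterating yields (group name, sub-DataFrame) pairs.
sizes = {name: len(group) for name, group in grouped}
means = grouped["veri"].mean()

print(sizes)       # {'A': 2, 'B': 2, 'C': 2}
print(means["A"])  # 26.0
```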


In [125]:
df = sns.load_dataset("planets")
df

Unnamed: 0,method,number,orbital_period,mass,distance,year
0,Radial Velocity,1,269.300000,7.10,77.40,2006
1,Radial Velocity,1,874.774000,2.21,56.95,2008
2,Radial Velocity,1,763.000000,2.60,19.84,2011
3,Radial Velocity,1,326.030000,19.40,110.62,2007
4,Radial Velocity,1,516.220000,10.50,119.47,2009
...,...,...,...,...,...,...
1030,Transit,1,3.941507,,172.00,2006
1031,Transit,1,2.615864,,148.00,2007
1032,Transit,1,3.191524,,174.00,2007
1033,Transit,1,4.125083,,293.00,2008


In [126]:
df.groupby("method")

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001B6686E1EB0>

In [127]:
df.groupby("method")["orbital_period"].mean()

method
Astrometry                          631.180000
Eclipse Timing Variations          4751.644444
Imaging                          118247.737500
Microlensing                       3153.571429
Orbital Brightness Modulation         0.709307
Pulsar Timing                      7343.021201
Pulsation Timing Variations        1170.000000
Radial Velocity                     823.354680
Transit                              21.102073
Transit Timing Variations            79.783500
Name: orbital_period, dtype: float64

In [128]:
df.groupby("method")["orbital_period"].sum()

method
Astrometry                       1.262360e+03
Eclipse Timing Variations        4.276480e+04
Imaging                          1.418973e+06
Microlensing                     2.207500e+04
Orbital Brightness Modulation    2.127920e+00
Pulsar Timing                    3.671511e+04
Pulsation Timing Variations      1.170000e+03
Radial Velocity                  4.553151e+05
Transit                          8.377523e+03
Transit Timing Variations        2.393505e+02
Name: orbital_period, dtype: float64

In [129]:
df.groupby("method")["orbital_period"].describe().T

method,Astrometry,Eclipse Timing Variations,Imaging,Microlensing,Orbital Brightness Modulation,Pulsar Timing,Pulsation Timing Variations,Radial Velocity,Transit,Transit Timing Variations
count,2.0,9.0,12.0,7.0,3.0,5.0,1.0,553.0,397.0,3.0
mean,631.18,4751.644444,118247.7375,3153.571429,0.709307,7343.021201,1170.0,823.35468,21.102073,79.7835
std,544.217663,2499.130945,213978.177277,1113.166333,0.725493,16313.265573,,1454.92621,46.185893,71.599884
min,246.36,1916.25,4639.15,1825.0,0.240104,0.090706,1170.0,0.73654,0.355,22.3395
25%,438.77,2900.0,8343.9,2375.0,0.291496,25.262,1170.0,38.021,3.16063,39.67525
50%,631.18,4343.5,27500.0,3300.0,0.342887,66.5419,1170.0,360.2,5.714932,57.011
75%,823.59,5767.0,94250.0,3550.0,0.943908,98.2114,1170.0,982.0,16.1457,108.5055
max,1016.0,10220.0,730000.0,5100.0,1.544929,36525.0,1170.0,17337.5,331.60059,160.0


#### Advanced Aggregation Operations (Aggregate, Filter, Transform, Apply)

In [130]:
import pandas as pd
df = pd.DataFrame({"gruplar": ["A", "B", "C", "A", "B", "C"],
                  "degisken1": [10,25,99,53,35,43],
                  "degisken2": [24,48,64,192,255,888]}, columns = ["gruplar","degisken1","degisken2"])
df

Unnamed: 0,gruplar,degisken1,degisken2
0,A,10,24
1,B,25,48
2,C,99,64
3,A,53,192
4,B,35,255
5,C,43,888


In [131]:
df.groupby("gruplar").mean()

gruplar,degisken1,degisken2
A,31.5,108.0
B,30.0,151.5
C,71.0,476.0


In [132]:
df.groupby("gruplar").aggregate(["min", "median", "max"])

,degisken1,degisken1,degisken1,degisken2,degisken2,degisken2
gruplar,min,median,max,min,median,max
A,10,31.5,53,24,108.0,192
B,25,30.0,35,48,151.5,255
C,43,71.0,99,64,476.0,888


In [133]:
df.groupby("gruplar").aggregate({"degisken1": ["min", "max"], "degisken2": "median"})

,degisken1,degisken1,degisken2
gruplar,min,max,median
A,10,53,108.0
B,25,35,151.5
C,43,99,476.0


We want to show the groups whose standard deviation of `degisken1` is greater than 9; for this we need a filter operation.

In [134]:
def filter_func(filtrele):
    return filtrele["degisken1"].std() > 9

df.groupby("gruplar").filter(filter_func)

Unnamed: 0,gruplar,degisken1,degisken2
0,A,10,24
2,C,99,64
3,A,53,192
5,C,43,888


Let's also look at the standard deviations of the groups themselves.

In [135]:
df.groupby("gruplar").std()

gruplar,degisken1,degisken2
A,30.405592,118.793939
B,7.071068,146.371104
C,39.59798,582.655988


In [136]:
df_1 = df.iloc[:,1:3]

In [137]:
df_1.transform(lambda x: (x-x.mean()) / x.std())

Unnamed: 0,degisken1,degisken2
0,-1.113821,-0.675011
1,-0.624827,-0.601762
2,1.787547,-0.552929
3,0.287964,-0.162267
4,-0.29883,0.030012
5,-0.038033,1.961958
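
The cell above standardizes each column over the whole table. `transform` can also be used after `groupby`, so that every value is compared with its own group. A minimal sketch that subtracts the group mean (this group-wise variant is my addition, not part of the original cells):

```python
import pandas as pd

df = pd.DataFrame({"gruplar": ["A", "B", "C", "A", "B", "C"],
                   "degisken1": [10, 25, 99, 53, 35, 43]})

# For each row, subtract the mean of that row's group.
centered = df.groupby("gruplar")["degisken1"].transform(lambda x: x - x.mean())

print(centered.tolist())  # [-21.5, -5.0, 28.0, 21.5, 5.0, -28.0]
```

The result keeps the original row order and length, which is the difference from an aggregation such as `mean()`.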


In [138]:
import pandas as pd
df = pd.DataFrame({"degisken1": [10,25,99,53,35,43],
                  "degisken2": [24,48,64,192,255,888]}, 
                  columns = ["degisken1","degisken2"])
df

Unnamed: 0,degisken1,degisken2
0,10,24
1,25,48
2,99,64
3,53,192
4,35,255
5,43,888


In [139]:
df.apply(np.sum)

degisken1     265
degisken2    1471
dtype: int64
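
`apply` works column by column by default; with `axis=1` the function receives each row instead. A small sketch with a custom function (the range calculation is just an illustrative choice):

```python
import pandas as pd

# Hypothetical two-row frame for illustration.
df = pd.DataFrame({"degisken1": [10, 25], "degisken2": [24, 48]})

# Range (max - min) of each column.
col_range = df.apply(lambda col: col.max() - col.min())

# Range (max - min) of each row.
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)

print(col_range.tolist())  # [15, 24]
print(row_range.tolist())  # [14, 23]
```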

#### Pivot Tables

In [140]:
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset("titanic")
titanic

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.2500,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.9250,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1000,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.0500,S,Third,man,True,,Southampton,no,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,Second,man,True,,Southampton,no,True
887,1,1,female,19.0,0,0,30.0000,S,First,woman,False,B,Southampton,yes,True
888,0,3,female,,1,2,23.4500,S,Third,woman,False,,Southampton,no,False
889,1,1,male,26.0,0,0,30.0000,C,First,man,True,C,Cherbourg,yes,True


In [141]:
titanic.groupby("sex")[["survived"]].mean()

sex,survived
female,0.742038
male,0.188908


In [142]:
titanic.groupby(["sex", "class"])[["survived"]].aggregate("mean").unstack()

,survived,survived,survived
class,First,Second,Third
sex,,,
female,0.968085,0.921053,0.5
male,0.368852,0.157407,0.135447


In [143]:
titanic.pivot_table("survived", index = "sex", columns = "class")

class,First,Second,Third
sex,,,
female,0.968085,0.921053,0.5
male,0.368852,0.157407,0.135447
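
`pivot_table` takes the mean of the values by default, which is why it matches the `groupby` result above. A different aggregation can be requested with `aggfunc`. A minimal sketch on a tiny made-up table (not the real titanic data):

```python
import pandas as pd

# Hypothetical mini dataset with the same column names.
df = pd.DataFrame({"sex": ["female", "female", "male", "male"],
                   "class": ["First", "Third", "First", "First"],
                   "survived": [1, 0, 0, 1]})

# Default aggregation: mean, i.e. the survival rate.
rates = df.pivot_table(values="survived", index="sex", columns="class")

# Sum instead of mean: the number of survivors.
counts = df.pivot_table(values="survived", index="sex", columns="class",
                        aggfunc="sum")

print(rates.loc["male", "First"])   # 0.5
print(counts.loc["male", "First"])  # 1
```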


In [144]:
titanic.age.head(10)

0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
5     NaN
6    54.0
7     2.0
8    27.0
9    14.0
Name: age, dtype: float64

In [145]:
age = pd.cut(titanic["age"], [0, 18, 90])
age.head(10)

0    (18.0, 90.0]
1    (18.0, 90.0]
2    (18.0, 90.0]
3    (18.0, 90.0]
4    (18.0, 90.0]
5             NaN
6    (18.0, 90.0]
7     (0.0, 18.0]
8    (18.0, 90.0]
9     (0.0, 18.0]
Name: age, dtype: category
Categories (2, interval[int64]): [(0, 18] < (18, 90]]
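
The intervals produced by `pd.cut` are open on the left and closed on the right, so an age of exactly 18 falls into `(0, 18]`. Readable names can be attached with the `labels` argument; the names "child" and "adult" below are my own choice for illustration:

```python
import pandas as pd

# Hypothetical ages.
ages = pd.Series([2, 18, 19, 54])

# Same bin edges as above, but with named categories.
groups = pd.cut(ages, [0, 18, 90], labels=["child", "adult"])

print(groups.tolist())  # ['child', 'child', 'adult', 'adult']
```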

In [146]:
titanic.pivot_table("survived", ["sex", age], "class")

Unnamed: 0_level_0,class,First,Second,Third
sex,age,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
female,"(0, 18]",0.909091,1.0,0.511628
female,"(18, 90]",0.972973,0.9,0.423729
male,"(0, 18]",0.8,0.6,0.215686
male,"(18, 90]",0.375,0.071429,0.133663


#### Reading Data from External Sources

In [147]:
import pandas as pd
pd.read_csv("reading_data/ornekcsv.csv", sep = ";")

Unnamed: 0,a,b,c
0,78,12,1.0
1,78,12,2.0
2,78,324,3.0
3,7,2,4.0
4,88,23,5.0
5,6,2,
6,56,11,6.0
7,7,12,7.0
8,56,21,7.0
9,346,2,8.0


In [148]:
pd.read_csv("reading_data/duz_metin.txt")

Unnamed: 0,1 2
0,2 2
1,3 2
2,4 2
3,5 2
4,6 2
5,7 2
6,8 2
7,9 2
8,10 2


In [149]:
df = pd.read_excel("reading_data/ornekx.xlsx")

In [150]:
type(df)

pandas.core.frame.DataFrame

In [151]:
df.columns = ("A", "B", "C")
df

Unnamed: 0,A,B,C
0,78,12,1.0
1,78,12,2.0
2,78,324,3.0
3,7,2,4.0
4,88,23,5.0
5,6,2,
6,56,11,6.0
7,7,12,7.0
8,56,21,7.0
9,346,2,8.0


In [152]:
tips = pd.read_csv("reading_data/tips.txt")
tips.head(10)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4
5,25.29,4.71,Male,No,Sun,Dinner,4
6,8.77,2.0,Male,No,Sun,Dinner,2
7,26.88,3.12,Male,No,Sun,Dinner,4
8,15.04,1.96,Male,No,Sun,Dinner,2
9,14.78,3.23,Male,No,Sun,Dinner,2
