
# <font color='green'> <b>Importing Libraries </b><font color='black'>

In [2]:
import numpy as np
import pandas as pd

# <font color='green'> <b>DataFrames </b><font color='black'>

## <font color='blue'> <b>Creating a DataFrame</b><font color='black'>

A DataFrame is a two-dimensional collection of data.

It is a data structure that stores data in tabular form.

Data is arranged in rows and columns, and a single DataFrame can hold several sets of data.

We can think of a DataFrame as a group of Series objects combined so that they share the same index.
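The idea of a DataFrame as several Series sharing one index can be sketched as follows. This is a minimal illustration; the names `ages`, `scores`, and `people` and their values are hypothetical and do not come from the notebook.

```python
import pandas as pd

# Two Series built on the same index labels (hypothetical example data).
ages = pd.Series([35, 30, 25], index=["a", "b", "c"], name="age")
scores = pd.Series([88.5, 92.0, 79.5], index=["a", "b", "c"], name="score")

# Combining them column-wise gives a DataFrame that uses the shared index.
people = pd.concat([ages, scores], axis=1)
print(people)

# Each column of the DataFrame is again a Series on that same index.
print(type(people["age"]))
print(people["age"].index.equals(people.index))
```

Selecting a single column returns a Series, which is why many Series methods can also be applied column by column.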

We can perform various operations on a DataFrame, such as selecting columns or rows and adding new ones.

DataFrames can also be imported from external storage, such as a SQL database, a CSV file, or an Excel file.
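As a rough sketch of importing from external storage, the snippet below reads CSV text with `pd.read_csv`. To keep it self-contained, the CSV content is a hypothetical in-memory string wrapped in `io.StringIO` rather than a file on disk. `pd.read_excel` and `pd.read_sql` play the same role for Excel files and SQL databases, but they need an extra engine or a database connection and are not shown here.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO lets read_csv treat the string as a file.
csv_text = """name,age,job
Hakan,35,data_analyst
Batuhan,30,data_scientist
Yunus,25,business_analyst
"""

people = pd.read_csv(io.StringIO(csv_text))
print(people.shape)   # (rows, columns)
print(people.dtypes)  # column types inferred from the text
```

With a real file, the path or URL is passed directly in place of the `StringIO` object.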

[SOURCE01](https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm), 
[SOURCE02](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), 
[SOURCE03](https://morioh.com/p/2528ac775b1b), 
[SOURCE04](https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python), 
[SOURCE05](https://www.guru99.com/python-pandas-tutorial.html), 
[SOURCE06](https://realpython.com/pandas-dataframe/) &
[SOURCE07](https://towardsdatascience.com/a-simple-guide-to-pandas-dataframes-b125f64e1453)<br>
[VIDEO SOURCE01](https://www.youtube.com/watch?v=zmdjNSmRXF4), 
[VIDEO SOURCE02](https://www.youtube.com/watch?v=F6kmIpWWEdU) &
[VIDEO SOURCE03](https://towardsdatascience.com/pandas-dataframe-basics-3c16eb35c4f3)<br>

### <font color='blue'> <b>Creating a DataFrame Using the Lists of Data & Columns</b><font color='black'>

In [3]:
liste_1 = [[1,2,3],[4,5,6],[7,8,9]]

In [5]:
np.array(liste_1)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [4]:
pd.DataFrame(data = liste_1)

|  | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
| 2 | 7 | 8 | 9 |


In [6]:
pd.DataFrame(data = liste_1, columns= ["A", "B", "C"])

|  | A | B | C |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
| 2 | 7 | 8 | 9 |


In [7]:
pd.DataFrame(data = liste_1, columns= ["A", "B"])

ValueError: 2 columns passed, passed data had 3 columns

In [8]:
pd.DataFrame(data = liste_1, columns= ["A", "B", "C", "D"])

ValueError: 4 columns passed, passed data had 3 columns

In [10]:
pd.DataFrame(data = liste_1, columns= ["A", "B", "C"], index= ["sıfır", "bir", "iki"])

|  | A | B | C |
|---|---|---|---|
| sıfır | 1 | 2 | 3 |
| bir | 4 | 5 | 6 |
| iki | 7 | 8 | 9 |


### <font color='blue'> <b>Creating a DataFrame Using NumPy Arrays</b><font color='black'>

In [11]:
np.random.seed(25)
arr1 = np.random.randint(1,100, size= (3,4))
arr1

array([[ 5, 63, 91, 16],
       [62, 24, 45, 51],
       [ 9, 29,  5, 90]])

In [13]:
arr1.shape

(3, 4)

In [14]:
arr1.ndim

2

In [16]:
pd.DataFrame(arr1, columns= ["sütun1", "sütun2", "sütun3", "sütun4"] )

|  | sütun1 | sütun2 | sütun3 | sütun4 |
|---|---|---|---|---|
| 0 | 5 | 63 | 91 | 16 |
| 1 | 62 | 24 | 45 | 51 |
| 2 | 9 | 29 | 5 | 90 |


In [17]:
df = pd.DataFrame(arr1, columns= ["sütun1", "sütun2", "sütun3", "sütun4"] )
df

|  | sütun1 | sütun2 | sütun3 | sütun4 |
|---|---|---|---|---|
| 0 | 5 | 63 | 91 | 16 |
| 1 | 62 | 24 | 45 | 51 |
| 2 | 9 | 29 | 5 | 90 |


In [18]:
df1 = pd.DataFrame(arr1, columns= ["sütun1", "sütun2", "sütun3", "sütun4"], index= ["A", "B", "C"] )
df1

|  | sütun1 | sütun2 | sütun3 | sütun4 |
|---|---|---|---|---|
| A | 5 | 63 | 91 | 16 |
| B | 62 | 24 | 45 | 51 |
| C | 9 | 29 | 5 | 90 |


### <font color='blue'> <b>Creating a DataFrame Using a Dictionary</b><font color='black'>

In [19]:
dict_1 = {"name": ["Hakan", "Batuhan", "Yunus"], "age": [35, 30, 25], "job": ["data_analyst", "data_scientist", "business_analyst"]}
dict_1

{'name': ['Hakan', 'Batuhan', 'Yunus'],
 'age': [35, 30, 25],
 'job': ['data_analyst', 'data_scientist', 'business_analyst']}

In [20]:
pd.Series(dict_1)

name                             [Hakan, Batuhan, Yunus]
age                                         [35, 30, 25]
job     [data_analyst, data_scientist, business_analyst]
dtype: object

In [22]:
pd.DataFrame(dict_1)

|  | name | age | job |
|---|---|---|---|
| 0 | Hakan | 35 | data_analyst |
| 1 | Batuhan | 30 | data_scientist |
| 2 | Yunus | 25 | business_analyst |


In [23]:
pd.DataFrame(dict_1, columns= ["isim", "age", "meslek"])

|  | isim | age | meslek |
|---|---|---|---|
| 0 | NaN | 35 | NaN |
| 1 | NaN | 30 | NaN |
| 2 | NaN | 25 | NaN |


## <font color='blue'> <b>Basic Attributes & Methods of DataFrames</b><font color='black'>

In [24]:
df

|  | sütun1 | sütun2 | sütun3 | sütun4 |
|---|---|---|---|---|
| 0 | 5 | 63 | 91 | 16 |
| 1 | 62 | 24 | 45 | 51 |
| 2 | 9 | 29 | 5 | 90 |


In [26]:
df.head(1)

|  | sütun1 | sütun2 | sütun3 | sütun4 |
|---|---|---|---|---|
| 0 | 5 | 63 | 91 | 16 |


In [27]:
df.tail(1)

|  | sütun1 | sütun2 | sütun3 | sütun4 |
|---|---|---|---|---|
| 2 | 9 | 29 | 5 | 90 |


In [50]:
df.sample(2)

|  | sütun1 | sütun2 | sütun3 | sütun4 |
|---|---|---|---|---|
| 2 | 9 | 29 | 5 | 90 |
| 0 | 5 | 63 | 91 | 16 |


In [51]:
df.columns

Index(['sütun1', 'sütun2', 'sütun3', 'sütun4'], dtype='object')

In [52]:
df.columns[0]

'sütun1'

In [54]:
df

|  | sütun1 | sütun2 | sütun3 | sütun4 |
|---|---|---|---|---|
| 0 | 5 | 63 | 91 | 16 |
| 1 | 62 | 24 | 45 | 51 |
| 2 | 9 | 29 | 5 | 90 |


In [53]:
df.index

RangeIndex(start=0, stop=3, step=1)

In [55]:
df.describe()

|  | sütun1 | sütun2 | sütun3 | sütun4 |
|---|---|---|---|---|
| count | 3.0 | 3.0 | 3.0 | 3.0 |
| mean | 25.333333 | 38.666667 | 47.0 | 52.333333 |
| std | 31.817186 | 21.221059 | 43.03487 | 37.018014 |
| min | 5.0 | 24.0 | 5.0 | 16.0 |
| 25% | 7.0 | 26.5 | 25.0 | 33.5 |
| 50% | 9.0 | 29.0 | 45.0 | 51.0 |
| 75% | 35.5 | 46.0 | 68.0 | 70.5 |
| max | 62.0 | 63.0 | 91.0 | 90.0 |


In [56]:
df.describe().T

|  | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| sütun1 | 3.0 | 25.333333 | 31.817186 | 5.0 | 7.0 | 9.0 | 35.5 | 62.0 |
| sütun2 | 3.0 | 38.666667 | 21.221059 | 24.0 | 26.5 | 29.0 | 46.0 | 63.0 |
| sütun3 | 3.0 | 47.0 | 43.03487 | 5.0 | 25.0 | 45.0 | 68.0 | 91.0 |
| sütun4 | 3.0 | 52.333333 | 37.018014 | 16.0 | 33.5 | 51.0 | 70.5 | 90.0 |


In [57]:
df2 = pd.DataFrame(dict_1)
df2

|  | name | age | job |
|---|---|---|---|
| 0 | Hakan | 35 | data_analyst |
| 1 | Batuhan | 30 | data_scientist |
| 2 | Yunus | 25 | business_analyst |


In [59]:
df2.describe().T

|  | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| age | 3.0 | 30.0 | 5.0 | 25.0 | 27.5 | 30.0 | 32.5 | 35.0 |


In [60]:
df2.describe(include= "object").T

|  | count | unique | top | freq |
|---|---|---|---|---|
| name | 3 | 3 | Hakan | 1 |
| job | 3 | 3 | data_analyst | 1 |


In [61]:
np.nan == np.nan

False

In [63]:
df = pd.read_csv("adult_eda.csv")
df

|  | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13.0 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13.0 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9.0 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7.0 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13.0 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 27 | Private | 257302 | Assoc-acdm | 12.0 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States | <=50K |
| 32557 | 40 | Private | 154374 | HS-grad | 9.0 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 32558 | 58 | Private | 151910 | HS-grad | 9.0 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32559 | 22 | Private | 201490 | HS-grad | 9.0 | Never-married | Adm-clerical | NaN | White | Male | 0 | 0 | 20 | United-States | <=50K |


In [64]:
df1 = pd.read_csv("C:/Users/a/Desktop/adult_eda.csv")
df1

|  | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13.0 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13.0 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9.0 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7.0 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13.0 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 27 | Private | 257302 | Assoc-acdm | 12.0 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States | <=50K |
| 32557 | 40 | Private | 154374 | HS-grad | 9.0 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 32558 | 58 | Private | 151910 | HS-grad | 9.0 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32559 | 22 | Private | 201490 | HS-grad | 9.0 | Never-married | Adm-clerical | NaN | White | Male | 0 | 0 | 20 | United-States | <=50K |


In [65]:
df2 = df.copy()

In [66]:
df.head()

|  | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13.0 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13.0 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9.0 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7.0 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13.0 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |


In [67]:
df.sample(20)

|  | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27937 | 21 | State-gov | 478457 | Some-college | 10.0 | Never-married | Other-service | NaN | Black | Female | 0 | 0 | 12 | United-States | <=50K |
| 7192 | 44 | Private | 239876 | Bachelors | 13.0 | Divorced | Prof-specialty | Unmarried | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 7981 | 24 | Private | 249957 | Some-college | 10.0 | Never-married | Exec-managerial | NaN | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 20147 | 45 | Private | 203653 | HS-grad | 9.0 | Married-civ-spouse | Adm-clerical | Husband | Black | Male | 7298 | 0 | 40 | United-States | >50K |
| 25531 | 26 | Private | 360252 | Assoc-acdm | 12.0 | Married-civ-spouse | Adm-clerical | Wife | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 23821 | 31 | Private | 174201 | HS-grad | 9.0 | Married-civ-spouse | Transport-moving | Husband | White | Male | 0 | 0 | 65 | United-States | <=50K |
| 6032 | 34 | Private | 96483 | Assoc-acdm | 12.0 | Never-married | Adm-clerical | Not-in-family | Asian-Pac-Islander | Female | 8614 | 0 | 60 | United-States | >50K |
| 10894 | 71 | Private | 124959 | HS-grad | 9.0 | Widowed | Adm-clerical | Not-in-family | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 27416 | 22 | Private | 65225 | 10th | 6.0 | Never-married | Other-service | NaN | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 23555 | 21 | Private | 180052 | Some-college | 10.0 | Never-married | Transport-moving | NaN | White | Male | 0 | 0 | 30 | United-States | <=50K |


In [69]:
df.tail(10)

|  | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32551 | 32 | Private | 34066 | 10th | 6.0 | Married-civ-spouse | Handlers-cleaners | Husband | Amer-Indian-Eskimo | Male | 0 | 0 | 40 | United-States | <=50K |
| 32552 | 43 | Private | 84661 | Assoc-voc | 11.0 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 0 | 45 | United-States | <=50K |
| 32553 | 32 | Private | 116138 | Masters | 14.0 | Never-married | Tech-support | Not-in-family | Asian-Pac-Islander | Male | 0 | 0 | 11 | Taiwan | <=50K |
| 32554 | 53 | Private | 321865 | Masters | 14.0 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 32555 | 22 | Private | 310152 | Some-college | 10.0 | Never-married | Protective-serv | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 32556 | 27 | Private | 257302 | Assoc-acdm | 12.0 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States | <=50K |
| 32557 | 40 | Private | 154374 | HS-grad | 9.0 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 32558 | 58 | Private | 151910 | HS-grad | 9.0 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32559 | 22 | Private | 201490 | HS-grad | 9.0 | Never-married | Adm-clerical | NaN | White | Male | 0 | 0 | 20 | United-States | <=50K |
| 32560 | 52 | Self-emp-inc | 287927 | HS-grad | 9.0 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 15024 | 0 | 40 | United-States | >50K |


In [70]:
df.shape

(32561, 15)

In [71]:
df.size

488415

In [72]:
df.ndim

2

In [73]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   age             32561 non-null  int64  
 1   workclass       32561 non-null  object 
 2   fnlwgt          32561 non-null  int64  
 3   education       32561 non-null  object 
 4   education-num   31759 non-null  float64
 5   marital-status  32561 non-null  object 
 6   occupation      32561 non-null  object 
 7   relationship    27493 non-null  object 
 8   race            32561 non-null  object 
 9   sex             32561 non-null  object 
 10  capital-gain    32561 non-null  int64  
 11  capital-loss    32561 non-null  int64  
 12  hours-per-week  32561 non-null  int64  
 13  native-country  32561 non-null  object 
 14  salary          32561 non-null  object 
dtypes: float64(1), int64(5), object(9)
memory usage: 3.7+ MB


In [74]:
df.describe().T

|  | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| age | 32561.0 | 38.581647 | 13.640433 | 17.0 | 28.0 | 37.0 | 48.0 | 90.0 |
| fnlwgt | 32561.0 | 189778.366512 | 105549.977697 | 12285.0 | 117827.0 | 178356.0 | 237051.0 | 1484705.0 |
| education-num | 31759.0 | 10.082843 | 2.576172 | 1.0 | 9.0 | 10.0 | 12.0 | 16.0 |
| capital-gain | 32561.0 | 1077.648844 | 7385.292085 | 0.0 | 0.0 | 0.0 | 0.0 | 99999.0 |
| capital-loss | 32561.0 | 87.30383 | 402.960219 | 0.0 | 0.0 | 0.0 | 0.0 | 4356.0 |
| hours-per-week | 32561.0 | 40.437456 | 12.347429 | 1.0 | 40.0 | 40.0 | 45.0 | 99.0 |


In [75]:
df.describe(include="object").T

|  | count | unique | top | freq |
|---|---|---|---|---|
| workclass | 32561 | 9 | Private | 22696 |
| education | 32561 | 16 | HS-grad | 10501 |
| marital-status | 32561 | 7 | Married-civ-spouse | 14976 |
| occupation | 32561 | 15 | Prof-specialty | 4140 |
| relationship | 27493 | 5 | Husband | 13193 |
| race | 32561 | 5 | White | 27816 |
| sex | 32561 | 2 | Male | 21790 |
| native-country | 32561 | 42 | United-States | 29170 |
| salary | 32561 | 2 | <=50K | 24720 |


In [76]:
df.describe(include="all").T

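|  | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| age | 32561.0 | NaN | NaN | NaN | 38.581647 | 13.640433 | 17.0 | 28.0 | 37.0 | 48.0 | 90.0 |
| workclass | 32561.0 | 9.0 | Private | 22696.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| fnlwgt | 32561.0 | NaN | NaN | NaN | 189778.366512 | 105549.977697 | 12285.0 | 117827.0 | 178356.0 | 237051.0 | 1484705.0 |
| education | 32561.0 | 16.0 | HS-grad | 10501.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| education-num | 31759.0 | NaN | NaN | NaN | 10.082843 | 2.576172 | 1.0 | 9.0 | 10.0 | 12.0 | 16.0 |
| marital-status | 32561.0 | 7.0 | Married-civ-spouse | 14976.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| occupation | 32561.0 | 15.0 | Prof-specialty | 4140.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| relationship | 27493.0 | 5.0 | Husband | 13193.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| race | 32561.0 | 5.0 | White | 27816.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| sex | 32561.0 | 2.0 | Male | 21790.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |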


In [77]:
df.isnull()

|  | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 32557 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 32558 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 32559 | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False |


In [78]:
df.isnull().sum()

age                  0
workclass            0
fnlwgt               0
education            0
education-num      802
marital-status       0
occupation           0
relationship      5068
race                 0
sex                  0
capital-gain         0
capital-loss         0
hours-per-week       0
native-country       0
salary               0
dtype: int64

In [79]:
df.columns

Index(['age', 'workclass', 'fnlwgt', 'education', 'education-num',
       'marital-status', 'occupation', 'relationship', 'race', 'sex',
       'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
       'salary'],
      dtype='object')

In [80]:
df.rename(columns= {'education-num': 'education_num', 'marital-status': 'marital_status'})

|  | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13.0 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13.0 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9.0 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7.0 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13.0 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 27 | Private | 257302 | Assoc-acdm | 12.0 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States | <=50K |
| 32557 | 40 | Private | 154374 | HS-grad | 9.0 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 32558 | 58 | Private | 151910 | HS-grad | 9.0 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32559 | 22 | Private | 201490 | HS-grad | 9.0 | Never-married | Adm-clerical | NaN | White | Male | 0 | 0 | 20 | United-States | <=50K |


In [81]:
df.columns

Index(['age', 'workclass', 'fnlwgt', 'education', 'education-num',
       'marital-status', 'occupation', 'relationship', 'race', 'sex',
       'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
       'salary'],
      dtype='object')

In [82]:
df.rename(columns= {'education-num': 'education_num', 'marital-status': 'marital_status'}, inplace=True)

In [83]:
df.columns

Index(['age', 'workclass', 'fnlwgt', 'education', 'education_num',
       'marital_status', 'occupation', 'relationship', 'race', 'sex',
       'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
       'salary'],
      dtype='object')

## <font color='blue'> <b>Indexing, Slicing & Selection</b><font color='black'>

[Source01](https://pandas.pydata.org/docs/user_guide/indexing.html),
[Source02](https://www.geeksforgeeks.org/slicing-indexing-manipulating-and-cleaning-pandas-dataframe/),
[Source03](https://www.tutorialspoint.com/python_pandas/python_pandas_indexing_and_selecting_data.htm),
[Source04](https://www.dataquest.io/blog/tutorial-indexing-dataframes-in-pandas/)

## <font color='blue'> <b>Creating a New Column</b><font color='black'>

## <font color='blue'> <b>Removing Columns & Rows</b><font color='black'>

The drop() method removes the specified rows or columns from a pandas DataFrame.

To remove a column, pass axis='columns' (or axis=1) together with the column label.

Similarly, to remove a row, pass axis='index' (or axis=0, the default) together with the row label.

This method is frequently used to remove unwanted data from a DataFrame during data manipulation.
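The following is a minimal sketch of drop() on a small frame. The frame `frame` and its labels `col1`..`col3` and `A`..`C` are hypothetical stand-ins chosen for this example, not data from the notebook.

```python
import pandas as pd

# Small hypothetical frame with labelled rows and columns.
frame = pd.DataFrame(
    [[5, 63, 91], [62, 24, 45], [9, 29, 5]],
    columns=["col1", "col2", "col3"],
    index=["A", "B", "C"],
)

# Remove a column: drop() returns a new DataFrame and leaves `frame` untouched.
without_col2 = frame.drop("col2", axis="columns")

# Remove a row by its index label.
without_row_b = frame.drop("B", axis="index")

# The columns= / index= keywords are an equivalent spelling of the same operation.
trimmed = frame.drop(columns=["col1"], index=["A", "C"])

print(without_col2.columns.tolist())
print(without_row_b.index.tolist())
print(trimmed)
print(frame.shape)  # the original keeps all rows and columns
```

Because drop() returns a copy by default, the result has to be assigned to a name (or `inplace=True` passed, as with rename() earlier) for the change to persist.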