# PANDAS (Python Data Analysis Library)

## Library for data analysis
## Extremely powerful table (DataFrame) system built on top of NumPy
## https://pandas.pydata.org/docs/

# What can we do with Pandas?

## reading/writing data between many formats
## grab data based on indexing, logic, subsetting, etc.
## handle missing data
## adjust and restructure data

# Pandas Section Overview
## Series and DataFrames
## Missing Data
## Group By Operations
## Combining DataFrames
## Text Methods and Time Methods
## Inputs and Outputs

# What kind of data does pandas handle?

![image.png](attachment:image.png)

## A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the data.frame in R.

## Each column in a DataFrame is a Series

In [None]:
# Each column in a DataFrame is called a feature.
# Each row is a sample (an observation) that holds
# the feature values and characteristics of one record.
# In other words, the rows are the samples of the data.

# This layout is the usual starting point for machine learning algorithms.

# SERIES

## A series is a data structure in Pandas that holds an array of information along with a named index.

## The named index differentiates this from a simple NumPy array.

## Formal Definition: One-dimensional ndarray with axis labels.

In [None]:
# Unlike NumPy arrays, pandas Series are one-dimensional arrays that can be indexed with labels such as strings.
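
The difference from a plain NumPy array can be seen in a minimal sketch. The values and the labels `a`, `b`, `c` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, not taken from the notebook's data.
values = np.array([10, 20, 30])

# Wrapping the array in a Series attaches a label to every element.
labeled = pd.Series(values, index=["a", "b", "c"])

print(labeled.index)       # the labels attached to the values
print(labeled.to_numpy())  # the underlying values as a NumPy array
print(labeled.loc["b"])    # lookup by label, which a bare array cannot do
```

The Series keeps the array values and adds the index on top of them.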

# Creating a Pandas Series

In [3]:
import numpy as np

In [4]:
import pandas as pd

In [5]:
# help(pd.Series)

In [6]:
myindex = [ "USA", "Canada", "Mexico"]

In [31]:
mydata = [1776, 1867, 1821]

## Creating a pandas Series with pd.Series(data, index)

In [32]:
pd.Series()  # press Shift+Tab twice to open the docstring and inspect the parameters in detail.

In [None]:
myser = pd.Series(data = mydata)  # When only data is passed, a default numeric index is created.

In [11]:
myser

0    1776
1    1867
2    1821
dtype: int64

In [12]:
# If an index is passed together with the data, a Series with a labeled index is created.

myser = pd.Series(data = mydata, index = myindex)

In [13]:
myser

USA       1776
Canada    1867
Mexico    1821
dtype: int64

In [17]:
myser = pd.Series(index = myindex, data = mydata)  # With keyword arguments, the order of index and data does not matter.
myser

USA       1776
Canada    1867
Mexico    1821
dtype: int64

In [20]:
myser = pd.Series(mydata, myindex)  # With positional arguments the order matters: data first, then index.
myser

USA       1776
Canada    1867
Mexico    1821
dtype: int64

In [14]:
myser[0]  # select a value by position; recent pandas versions deprecate this on a labeled index, so prefer myser.iloc[0]

1776

In [15]:
myser["USA"]  # select a value by its index label

1776
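
The two lookups above can also be written with the explicit `.loc` (label) and `.iloc` (position) indexers, which avoid any ambiguity between labels and positions. This is a minimal sketch; the name `demo_ser` is hypothetical and the Series is rebuilt here only so the snippet is self-contained.

```python
import pandas as pd

# Same values and labels as the cells above, rebuilt for a standalone example.
demo_ser = pd.Series([1776, 1867, 1821], index=["USA", "Canada", "Mexico"])

by_label = demo_ser.loc["Canada"]         # label-based lookup
by_position = demo_ser.iloc[1]            # position-based lookup (0-indexed)
subset = demo_ser.loc[["USA", "Mexico"]]  # a list of labels returns a Series

print(by_label, by_position)
print(subset)
```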

In [5]:
# A pandas Series can also be created from a list that contains NaN values.
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

## Creating a pandas Series from a dictionary

In [22]:
ages = {"Sam" : 5, "Frank" : 10, "Spike" : 7}

In [23]:
pd.Series(ages)  # to create a Series from a dictionary, pass the dictionary directly.

# The dictionary keys become the labeled index.

Sam       5
Frank    10
Spike     7
dtype: int64
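
When a dictionary is combined with an explicit `index`, pandas looks up each requested label in the dictionary. The sketch below assumes a hypothetical extra label, `"Alex"`, that is not a key in `ages`, to show what happens to a missing entry.

```python
import pandas as pd

ages = {"Sam": 5, "Frank": 10, "Spike": 7}

# The index picks and orders the labels; "Alex" is a hypothetical label
# that has no matching key, so its value is missing (NaN).
selected = pd.Series(ages, index=["Spike", "Sam", "Alex"])

print(selected)
print(selected.dtype)  # float64, because NaN is a floating point value
```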

## Operations with Pandas Series

In [9]:
# Imaginary Sales Data for 1st and 2nd Quarters for Global Company

q1 = {'Japan': 80, 'China': 450, 'India': 200, 'USA': 250}
q2 = {'Brazil': 100,'China': 500, 'India': 210,'USA': 260}

In [10]:
sales_q1 = pd.Series(q1)

In [11]:
sales_q2 = pd.Series(q2)

In [12]:
type(sales_q1)

pandas.core.series.Series

In [13]:
sales_q1

Japan     80
China    450
India    200
USA      250
dtype: int64

In [14]:
sales_q2

Brazil    100
China     500
India     210
USA       260
dtype: int64

In [15]:
sales_q1["Japan"]  # Labels are case sensitive, so the key must be spelled exactly.

80

In [16]:
sales_q1[0]

80

In [17]:
sales_q1.keys()  # shows the index labels in order.

Index(['Japan', 'China', 'India', 'USA'], dtype='object')

## Because pandas Series are built on NumPy arrays, operations that work on arrays also work on Series.

In [18]:
sales_q1 * 2  # multiplying a Series by a number multiplies every element by that number.

Japan    160
China    900
India    400
USA      500
dtype: int64

In [19]:
sales_q1 / 10  # dividing a Series by a number divides every element by that number.

Japan     8.0
China    45.0
India    20.0
USA      25.0
dtype: float64

### Operations between two Series are aligned on their index labels (keys).

In [21]:
sales_q1 + sales_q2  # when two Series are added, labels that are not present in both Series get NaN by default.

Brazil      NaN
China     950.0
India     410.0
Japan       NaN
USA       510.0
dtype: float64

In [24]:
# When two Series are combined with the add method and fill_value = 0, a label missing from one Series
# is treated as 0, so the result holds the value from the other Series.

sales_q1.add(sales_q2, fill_value = 0)

Brazil    100.0
China     950.0
India     410.0
Japan      80.0
USA       510.0
dtype: float64

In [25]:
sales_q1.dtype

dtype('int64')

In [28]:
first_half = sales_q1.add(sales_q2, fill_value = 0)
first_half

Brazil    100.0
China     950.0
India     410.0
Japan      80.0
USA       510.0
dtype: float64

In [29]:
first_half.dtype  # float64 here, because aligning the two Series introduces missing values (NaN is a float); operations with no missing labels, such as sales_q1 * 2, keep int64.

dtype('float64')
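
The dtype behaviour can be checked on a smaller pair of Series. This is a sketch with shortened, hypothetical versions of the quarterly data; the final `astype` call is an optional step that is not part of the original notebook.

```python
import pandas as pd

# Shortened, hypothetical versions of the quarterly sales data.
q1 = pd.Series({"Japan": 80, "China": 450})
q2 = pd.Series({"Brazil": 100, "China": 500})

scaled = q1 * 2                        # no missing labels: dtype stays int64
plain_sum = q1 + q2                    # unmatched labels become NaN: float64
filled_sum = q1.add(q2, fill_value=0)  # the missing side is treated as 0

# Optional: once no NaN remains, the result can be cast back to integers.
filled_int = filled_sum.astype("int64")

print(scaled.dtype, plain_sum.dtype, filled_sum.dtype, filled_int.dtype)
```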

# DataFrames

## Pandas DataFrame is two-dimensional size-mutable,
## potentially heterogeneous tabular data structure with labeled axes (rows and columns).

## A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

## Pandas DataFrame consists of three principal components, the data, rows, and columns.

## DataFrames are an extremely powerful tool and a natural extension of the Pandas Series. 

## By definition, a DataFrame is simply this:

## A Pandas DataFrame consists of multiple Pandas Series that share index values.

## In the real world, a Pandas DataFrame is usually created by loading a dataset from existing storage such as a SQL database, a CSV file, or an Excel file. A Pandas DataFrame can also be created from lists, a dictionary, a list of dictionaries, etc.
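
The idea that a DataFrame is a set of Series sharing one index can be sketched as follows. The names `jan`, `feb`, and `frame` and the small values are hypothetical and used only for illustration.

```python
import pandas as pd

# Two Series that share the same index labels.
jan = pd.Series({"CA": 95, "NY": 70})
feb = pd.Series({"CA": 11, "NY": 63})

# Each Series becomes one column; the shared labels become the row index.
frame = pd.DataFrame({"Jan": jan, "Feb": feb})

# Selecting a single column gives back a Series with that same index.
column = frame["Jan"]

print(frame)
print(type(column))
```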

## Creating DataFrame

In [None]:
#  pd.DataFrame()  # Inspect its docstring!

In [38]:
# Creating DataFrame using List

lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']

In [39]:
df = pd.DataFrame(lst)  # no index was specified, so pandas assigned a default one.

In [40]:
df

Unnamed: 0,0
0,Geeks
1,For
2,Geeks
3,is
4,portal
5,for
6,Geeks


In [61]:
type(df)

pandas.core.frame.DataFrame

In [None]:
# Creating DataFrame using Dictionary

# To create a DataFrame from a dict of ndarrays/lists, all the ndarrays must be of the same length.
# If an index is passed, its length must equal the length of the arrays.
# If no index is passed, the index defaults to range(n), where n is the array length.

In [46]:
data = {'Name':['Tom', 'nick', 'krish', 'jack'],
        'Age':[20, 21, 19, 18]}

In [47]:
dt = pd.DataFrame(data)

In [48]:
dt

Unnamed: 0,Name,Age
0,Tom,20
1,nick,21
2,krish,19
3,jack,18
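
The length rules for a dict of lists can be checked with a short sketch. The row labels `r1` and `r2` are hypothetical, and the second call is deliberately malformed to show the failure.

```python
import pandas as pd

# Hypothetical two-row version of the dictionary above.
small = {"Name": ["Tom", "nick"], "Age": [20, 21]}

# An explicit index must have the same length as the lists.
with_index = pd.DataFrame(small, index=["r1", "r2"])

# Lists of different lengths are rejected with a ValueError.
try:
    pd.DataFrame({"Name": ["Tom", "nick"], "Age": [20]})
    length_error = None
except ValueError as exc:
    length_error = str(exc)

print(with_index)
print("error:", length_error)
```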


In [49]:
# Creating empty DataFrame

emt = pd.DataFrame()
print(emt)

Empty DataFrame
Columns: []
Index: []


In [58]:
# https://www.geeksforgeeks.org/python-pandas-dataframe/

In [65]:
# Creating a DataFrame by adding a label index

lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']

lblindx = ['L1', 'L2', 'L3', 'L4', 'L5', 'L6', 'L7']

In [66]:
df2 = pd.DataFrame(lst, lblindx)

In [67]:
df2

Unnamed: 0,0
L1,Geeks
L2,For
L3,Geeks
L4,is
L5,portal
L6,for
L7,Geeks


In [None]:
# Creating a DataFrame from Python Objects

In [77]:
# help(pd.DataFrame)

In [78]:
np.random.seed(101)
mydata = np.random.randint(0,101,(4,3))

In [79]:
mydata

array([[95, 11, 81],
       [70, 63, 87],
       [75,  9, 77],
       [40,  4, 63]])

In [80]:
myindex = ['CA','NY','AZ','TX']

In [81]:
mycolumns = ['Jan','Feb','Mar']

In [82]:
df = pd.DataFrame(data=mydata)  # when index and columns are not specified, default integer positions are used as the labels.
df

Unnamed: 0,0,1,2
0,95,11,81
1,70,63,87
2,75,9,77
3,40,4,63


In [83]:
df = pd.DataFrame(data=mydata,index=myindex)
df

Unnamed: 0,0,1,2
CA,95,11,81
NY,70,63,87
AZ,75,9,77
TX,40,4,63


In [85]:
df = pd.DataFrame(data=mydata,index=myindex,columns=mycolumns)
df

Unnamed: 0,Jan,Feb,Mar
CA,95,11,81
NY,70,63,87
AZ,75,9,77
TX,40,4,63


In [86]:
df.info()  # to inspect the properties of the DataFrame

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, CA to TX
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Jan     4 non-null      int32
 1   Feb     4 non-null      int32
 2   Mar     4 non-null      int32
dtypes: int32(3)
memory usage: 80.0+ bytes


In [12]:
# Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

dates = pd.date_range("20130101", periods=6)
print(dates)

df = pd.DataFrame(np.random.randn(6, 4), index = dates, columns = list("ABCD"))
print(df)

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
                   A         B         C         D
2013-01-01  0.730815 -0.928100 -0.035101 -0.062241
2013-01-02  1.001680  0.015805  1.350519  0.546987
2013-01-03 -0.806899  1.911718 -1.396898  0.111166
2013-01-04 -1.031674  0.733916 -0.812865  1.183733
2013-01-05  1.234111  0.689209  0.222210  1.592662
2013-01-06  0.792277 -1.009187 -0.536521  0.652225


In [13]:
df.dtypes  # to check the data type of each column in the DataFrame

A    float64
B    float64
C    float64
D    float64
dtype: object

## Reading a .csv file for a DataFrame

## NOTE: We will go over all kinds of data inputs and outputs (.html, .csv, .xlsx, etc.) later on in the course! For now we just need to read in a simple .csv file.

### Understanding File Paths
You have two options when reading a file with pandas:

1. If your .py file or .ipynb notebook is located in the exact same folder location as the .csv file you want to read, simply pass in the file name as a string, for example:

 df = pd.read_csv('some_file.csv')
 

2. Pass in the entire file path if you are located in a different directory. The file path must be 100% correct in order for this to work. For example:

 df = pd.read_csv("C:\\Users\\myself\\files\\some_file.csv") 
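
Both options can be tried without touching real files. The sketch below is an illustration only: it writes a tiny CSV into a temporary folder (an assumption made so the example is self-contained) and builds the full path with `pathlib`, which avoids hand-written backslashes.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Build the full path from its parts; works on Windows, macOS and Linux.
    csv_path = Path(tmp) / "some_file.csv"

    # Hypothetical two-row file, written only so there is something to read.
    csv_path.write_text("total_bill,tip\n16.99,1.01\n10.34,1.66\n")

    print(Path.cwd())               # the folder that relative file names resolve against
    loaded = pd.read_csv(csv_path)  # read_csv accepts a string or a Path

print(loaded)
```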

### Print your current directory

In [17]:
pwd

'C:\\Users\\user\\PANDAS'

### List the files in your current directory with ls

In [27]:
ls

 Volume in drive C is OS
 Volume Serial Number is F683-A709

 Directory of C:\Users\user\PANDAS

22.04.2022  18:59    <DIR>          .
22.04.2022  18:59    <DIR>          ..
20.04.2022  15:14    <DIR>          .ipynb_checkpoints
20.04.2022  15:30           565.402 00-Series.ipynb
21.04.2022  22:03           208.969 01-DataFrames.ipynb
20.04.2022  15:14           194.591 02-Conditional-Filtering.ipynb
20.04.2022  15:14           196.385 03-Useful-Methods.ipynb
20.04.2022  15:14            64.227 04-Missing-Data.ipynb
20.04.2022  15:14           219.627 05-Groupby-Operations-and-MultiIndex.ipynb
20.04.2022  15:14            62.966 06-Combining-DataFrames.ipynb
20.04.2022  15:14            32.972 07-Text-Methods.ipynb
20.04.2022  15:14            93.392 08-Time-Methods.ipynb
20.04.2022  15:14            65.234 09-Inputs-and-Outputs.ipynb
20.04.2022  15:14           101.081 10-Pivot-Tables.ipynb
20.04.2022  15:14            47.135 11-Pandas-Project-Exercise .ipynb
20.04.2022  15:14        

In [5]:
df = pd.read_csv("tips.csv")

In [6]:
df

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251
...,...,...,...,...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657
240,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766
241,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880
242,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,Sat17


## Obtaining basic information about DataFrame

In [30]:
df.columns

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size',
       'price_per_person', 'Payer Name', 'CC Number', 'Payment ID'],
      dtype='object')

In [31]:
df.index

RangeIndex(start=0, stop=244, step=1)

In [32]:
df.head()  # returns the first 5 rows unless specified otherwise

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


In [33]:
df.tail()  # returns the last 5 rows unless specified otherwise

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657
240,27.18,2.0,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766
241,22.67,2.0,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880
242,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,Sat17
243,18.78,3.0,Female,No,Thur,Dinner,2,9.39,Michelle Hardin,3511451626698139,Thur672


In [34]:
df.head(8)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251
5,25.29,4.71,Male,No,Sun,Dinner,4,6.32,Erik Smith,213140353657882,Sun9679
6,8.77,2.0,Male,No,Sun,Dinner,2,4.38,Kristopher Johnson,2223727524230344,Sun5985
7,26.88,3.12,Male,No,Sun,Dinner,4,6.72,Robert Buck,3514785077705092,Sun8157


In [35]:
df.tail(3)

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
241,22.67,2.0,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880
242,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,Sat17
243,18.78,3.0,Female,No,Thur,Dinner,2,9.39,Michelle Hardin,3511451626698139,Thur672


In [37]:
df.info()  # prints a concise summary of the DataFrame named df.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   total_bill        244 non-null    float64
 1   tip               244 non-null    float64
 2   sex               244 non-null    object 
 3   smoker            244 non-null    object 
 4   day               244 non-null    object 
 5   time              244 non-null    object 
 6   size              244 non-null    int64  
 7   price_per_person  244 non-null    float64
 8   Payer Name        244 non-null    object 
 9   CC Number         244 non-null    int64  
 10  Payment ID        244 non-null    object 
dtypes: float64(3), int64(2), object(6)
memory usage: 21.1+ KB


In [38]:
len(df)

244

In [9]:
df.dtypes  # to check the data type of each Series (column) in the DataFrame

total_bill          float64
tip                 float64
sex                  object
smoker               object
day                  object
time                 object
size                  int64
price_per_person    float64
Payer Name           object
CC Number             int64
Payment ID           object
dtype: object

In [39]:
df.describe()  # describe() shows a quick statistic summary of your data

Unnamed: 0,total_bill,tip,size,price_per_person,CC Number
count,244.0,244.0,244.0,244.0,244.0
mean,19.785943,2.998279,2.569672,7.888197,2563496000000000.0
std,8.902412,1.383638,0.9511,2.914234,2369340000000000.0
min,3.07,1.0,1.0,2.88,60406790000.0
25%,13.3475,2.0,2.0,5.8,30407310000000.0
50%,17.795,2.9,2.0,7.255,3525318000000000.0
75%,24.1275,3.5625,3.0,9.39,4553675000000000.0
max,50.81,10.0,6.0,20.27,6596454000000000.0


In [41]:
# Transposing your data
df.describe().T  # df.describe().transpose() gives the same result.

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
total_bill,244.0,19.78594,8.902412,3.07,13.3475,17.795,24.1275,50.81
tip,244.0,2.998279,1.383638,1.0,2.0,2.9,3.5625,10.0
size,244.0,2.569672,0.9510998,1.0,2.0,2.0,3.0,6.0
price_per_person,244.0,7.888197,2.914234,2.88,5.8,7.255,9.39,20.27
CC Number,244.0,2563496000000000.0,2369340000000000.0,60406790000.0,30407310000000.0,3525318000000000.0,4553675000000000.0,6596454000000000.0


In [12]:
# Sorting by an axis:

df.sort_index(axis=1, ascending=False)

Unnamed: 0,total_bill,tip,time,smoker,size,sex,price_per_person,day,Payment ID,Payer Name,CC Number
0,16.99,1.01,Dinner,No,2,Female,8.49,Sun,Sun2959,Christy Cunningham,3560325168603410
1,10.34,1.66,Dinner,No,3,Male,3.45,Sun,Sun4608,Douglas Tucker,4478071379779230
2,21.01,3.50,Dinner,No,3,Male,7.00,Sun,Sun4458,Travis Walters,6011812112971322
3,23.68,3.31,Dinner,No,2,Male,11.84,Sun,Sun5260,Nathaniel Harris,4676137647685994
4,24.59,3.61,Dinner,No,4,Female,6.15,Sun,Sun2251,Tonya Carter,4832732618637221
...,...,...,...,...,...,...,...,...,...,...,...
239,29.03,5.92,Dinner,No,3,Male,9.68,Sat,Sat2657,Michael Avila,5296068606052842
240,27.18,2.00,Dinner,Yes,2,Female,13.59,Sat,Sat1766,Monica Sanders,3506806155565404
241,22.67,2.00,Dinner,Yes,2,Male,11.34,Sat,Sat3880,Keith Wong,6011891618747196
242,17.82,1.75,Dinner,No,2,Male,8.91,Sat,Sat17,Dennis Dixon,4375220550950


In [14]:
df.sort_values(by=["tip"], ascending=False)  # sorts the rows of the DataFrame by the values of a column

# The default order is ascending (smallest to largest). With ascending=False the order is descending (largest to smallest).

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
170,50.81,10.00,Male,Yes,Sat,Dinner,3,16.94,Gregory Clark,5473850968388236,Sat1954
212,48.33,9.00,Male,No,Sat,Dinner,4,12.08,Alex Williamson,676218815212,Sat4590
23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,Sat239
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139
141,34.30,6.70,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,Thur1025
...,...,...,...,...,...,...,...,...,...,...,...
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
236,12.60,1.00,Male,Yes,Sat,Dinner,2,6.30,Matthew Myers,3543676378973965,Sat5032
111,7.25,1.00,Female,No,Sat,Dinner,1,7.25,Terri Jones,3559221007826887,Sat4801
67,3.07,1.00,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455
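
Sorting by more than one column follows the same pattern. This is a small sketch on a hypothetical four-row frame named `bills`, not the tips data itself.

```python
import pandas as pd

# Hypothetical miniature of the tips table.
bills = pd.DataFrame({
    "day": ["Sat", "Sun", "Sat", "Sun"],
    "tip": [2.0, 3.5, 5.0, 1.0],
})

# Default: ascending order on a single column.
by_tip = bills.sort_values(by="tip")

# Several columns: sort by day (ascending), then by tip (descending) within each day.
by_day_then_tip = bills.sort_values(by=["day", "tip"], ascending=[True, False])

print(by_tip)
print(by_day_then_tip)
```

`sort_values` returns a new DataFrame; `bills` itself keeps its original row order.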


## Selection and Indexing

## Columns

In [27]:
# Column selection

employee = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

In [28]:
df3 = pd.DataFrame(employee)

In [17]:
df3

Unnamed: 0,Name,Age,Address,Qualification
0,Jai,27,Delhi,Msc
1,Princi,24,Kanpur,MA
2,Gaurav,22,Allahabad,MCA
3,Anuj,32,Kannauj,Phd


In [19]:
# grab a single column

df3["Name"]

0       Jai
1    Princi
2    Gaurav
3      Anuj
Name: Name, dtype: object

In [26]:
type(df3["Name"])

pandas.core.series.Series

In [None]:
colname = ["Name", "Age"]
df3[colname]

In [75]:
# grab more than one column.
# IMPORTANT: the column names must be wrapped in a second pair of brackets.
# To select multiple columns, use a list of column names within the selection brackets [].

df3[["Name", "Age", "Address", "Qualification"]]

Unnamed: 0,Name,Age,Address,Qualification
0,Jai,27,Delhi,Msc
1,Princi,24,Kanpur,MA
2,Gaurav,22,Allahabad,MCA
3,Anuj,32,Kannauj,Phd


In [83]:
# df3[1:3]  slicing like this selects rows only.

In [93]:
# Single row selection (with the DataFrame.loc indexer)

df3.loc[2]

Name                Gaurav
Age                     22
Address          Allahabad
Qualification          MCA
Name: 2, dtype: object

In [24]:
# More than one row selection
# Note that the index labels must be wrapped in a second pair of brackets

df3.loc[[0,2]]

Unnamed: 0,Name,Age,Address,Qualification
0,Jai,27,Delhi,Msc
2,Gaurav,22,Allahabad,MCA


In [25]:
# to filter specific rows from a DataFrame
df

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251
...,...,...,...,...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657
240,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766
241,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880
242,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,Sat17


In [51]:
# to filter specific rows from a DataFrame

df[df["sex"] == "Male"]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percentage
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,16.054159
2,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458,16.658734
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,13.978041
5,25.29,4.71,Male,No,Sun,Dinner,4,6.32,Erik Smith,213140353657882,Sun9679,18.623962
6,8.77,2.00,Male,No,Sun,Dinner,2,4.38,Kristopher Johnson,2223727524230344,Sun5985,22.805017
...,...,...,...,...,...,...,...,...,...,...,...,...
236,12.60,1.00,Male,Yes,Sat,Dinner,2,6.30,Matthew Myers,3543676378973965,Sat5032,7.936508
237,32.83,1.17,Male,Yes,Sat,Dinner,2,16.42,Thomas Brown,4284722681265508,Sat2929,3.563814
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657,20.392697
241,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880,8.822232


In [60]:
df[df["tip"] > 5]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percentage
23,39.42,7.58,Male,No,Sat,Dinner,4,9.86,Lance Peterson,3542584061609808,Sat239,19.228818
44,30.4,5.6,Male,No,Sun,Dinner,4,7.6,Todd Cooper,503846761263,Sun2274,18.421053
47,32.4,6.0,Male,No,Sun,Dinner,4,8.1,James Barnes,3552002592874186,Sun9677,18.518519
52,34.81,5.2,Female,No,Sun,Dinner,4,8.7,Emily Daniel,4291280793094374,Sun6165,14.938236
59,48.27,6.73,Male,No,Sat,Dinner,4,12.07,Brian Ortiz,6596453823950595,Sat8139,13.942407
85,34.83,5.17,Female,No,Thur,Lunch,4,8.71,Shawna Cook,6011787464177340,Thur7972,14.843526
88,24.71,5.85,Male,No,Thur,Lunch,2,12.36,Roger Taylor,4410248629955,Thur9003,23.674626
116,29.93,5.07,Male,No,Sun,Dinner,4,7.48,Shawn Blake,4689079711213722,Sun22,16.939526
141,34.3,6.7,Male,No,Thur,Lunch,6,5.72,Steven Carlson,3526515703718508,Thur1025,19.533528
155,29.85,5.14,Female,No,Sun,Dinner,5,5.97,Madison Wilson,4210875236164664,Sun9176,17.21943


In [61]:
df[df["day"].isin(["Sun", "Sat"])]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percentage
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,5.944673
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,16.054159
2,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458,16.658734
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,13.978041
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,14.680765
...,...,...,...,...,...,...,...,...,...,...,...,...
238,35.83,4.67,Female,No,Sat,Dinner,3,11.94,Kimberly Crane,676184013727,Sat9777,13.033771
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657,20.392697
240,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766,7.358352
241,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880,8.822232


### When combining multiple conditional statements, each condition must be surrounded by parentheses (). Moreover, you cannot use Python's or/and keywords; use the | operator for or and the & operator for and.

In [63]:
df[(df["day"] == "Sun") | (df["day"] == "Thur")]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percentage
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,5.944673
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,16.054159
2,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458,16.658734
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,13.978041
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,14.680765
...,...,...,...,...,...,...,...,...,...,...,...,...
202,13.00,2.00,Female,Yes,Thur,Lunch,2,6.50,Ashley Shaw,180088043008041,Thur1301,15.384615
203,16.40,2.50,Female,Yes,Thur,Lunch,2,8.20,Toni Brooks,3582289985920239,Thur7770,15.243902
204,20.53,4.00,Male,Yes,Thur,Lunch,4,5.13,Scott Kim,3570611756827620,Thur2160,19.483682
205,16.47,3.23,Female,Yes,Thur,Lunch,3,5.49,Carly Reyes,4787787236486,Thur8084,19.611415
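
The rule about parentheses and the `|` / `&` operators can be tried on a small frame. The sketch below uses a hypothetical four-row table named `bills`; the `~` (not) operator is an addition that the text above does not mention.

```python
import pandas as pd

# Hypothetical miniature of the tips table.
bills = pd.DataFrame({
    "day": ["Sun", "Thur", "Sat", "Sun"],
    "tip": [1.0, 4.0, 6.0, 5.5],
})

# and: both conditions must hold, each wrapped in its own parentheses.
sunday_big = bills[(bills["day"] == "Sun") & (bills["tip"] > 5)]

# or: at least one condition must hold.
sun_or_thur = bills[(bills["day"] == "Sun") | (bills["day"] == "Thur")]

# not: ~ inverts a boolean mask.
not_sunday = bills[~(bills["day"] == "Sun")]

# The plain Python keyword `and` cannot combine two boolean Series.
try:
    bills[(bills["day"] == "Sun") and (bills["tip"] > 5)]
    keyword_error = None
except ValueError as exc:
    keyword_error = type(exc).__name__

print(sunday_big)
print(keyword_error)
```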


In [64]:
# The notna() method returns True for each row whose value is not null.

df[df["Payment ID"].notna()]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percentage
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,5.944673
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,16.054159
2,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458,16.658734
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,13.978041
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,14.680765
...,...,...,...,...,...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657,20.392697
240,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766,7.358352
241,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,Sat3880,8.822232
242,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,Sat17,9.820426
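
The tips data has no missing values, so `notna()` keeps every row there. Its effect is easier to see on a frame that does contain gaps; the sketch below uses a hypothetical three-row table named `gaps`.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with one missing Payment ID and one missing tip.
gaps = pd.DataFrame({
    "Payment ID": ["Sun2959", None, "Sat17"],
    "tip": [1.01, 2.0, np.nan],
})

has_id = gaps[gaps["Payment ID"].notna()]  # rows where Payment ID is present
missing_tip = gaps[gaps["tip"].isna()]     # isna() is the opposite test

print(has_id)
print(missing_tip)
```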


In [29]:
df_index = ["Row1", "Row2", "Row3", "Row4"]
df4 = pd.DataFrame(data=employee, index=df_index)

In [30]:
df4

Unnamed: 0,Name,Age,Address,Qualification
Row1,Jai,27,Delhi,Msc
Row2,Princi,24,Kanpur,MA
Row3,Gaurav,22,Allahabad,MCA
Row4,Anuj,32,Kannauj,Phd


In [110]:
df4.loc["Row4"]

Name                Anuj
Age                   32
Address          Kannauj
Qualification        Phd
Name: Row4, dtype: object

## How do I select specific rows and columns from a DataFrame

In [None]:
# The loc/iloc indexers are used to select specific rows and columns.
# loc is used when selecting by column names, row labels, or a condition expression.
# iloc is used when selecting rows and/or columns by their position.
# When using loc/iloc, the part before the comma is the rows you want, and the part after the comma is the columns you want to select.

# Ex: titanic.loc[titanic["Age"] > 35, "Name"]
# Ex: titanic.iloc[9:25, 2:5]
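
Since no `titanic` DataFrame is loaded in this notebook, the same patterns can be sketched on the employee data used earlier. The name `people` is hypothetical, and the frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

people = pd.DataFrame(
    {
        "Name": ["Jai", "Princi", "Gaurav", "Anuj"],
        "Age": [27, 24, 22, 32],
        "Address": ["Delhi", "Kanpur", "Allahabad", "Kannauj"],
    },
    index=["Row1", "Row2", "Row3", "Row4"],
)

# loc with a condition for the rows and a column name for the columns.
older_names = people.loc[people["Age"] > 25, "Name"]

# loc with label slices: the end label is included.
label_slice = people.loc["Row1":"Row3", ["Name", "Age"]]

# iloc with position slices: the end position is excluded, as in plain Python.
position_slice = people.iloc[0:2, 0:2]

print(older_names)
print(label_slice)
print(position_slice)
```

Note the asymmetry: the `loc` slice returns three rows, while the `iloc` slice `0:2` returns two.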

In [65]:
df.loc[df["total_bill"] > 20]

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percentage
2,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,Sun4458,16.658734
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,13.978041
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,14.680765
5,25.29,4.71,Male,No,Sun,Dinner,4,6.32,Erik Smith,213140353657882,Sun9679,18.623962
7,26.88,3.12,Male,No,Sun,Dinner,4,6.72,Robert Buck,3514785077705092,Sun8157,11.607143
...,...,...,...,...,...,...,...,...,...,...,...,...
237,32.83,1.17,Male,Yes,Sat,Dinner,2,16.42,Thomas Brown,4284722681265508,Sat2929,3.563814
238,35.83,4.67,Female,No,Sat,Dinner,3,11.94,Kimberly Crane,676184013727,Sat9777,13.033771
239,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,Sat2657,20.392697
240,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,Sat1766,7.358352


In [66]:
df.iloc[5:10, 6:]

Unnamed: 0,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percentage
5,4,6.32,Erik Smith,213140353657882,Sun9679,18.623962
6,2,4.38,Kristopher Johnson,2223727524230344,Sun5985,22.805017
7,4,6.72,Robert Buck,3514785077705092,Sun8157,11.607143
8,2,7.52,Joseph Mcdonald,3522866365840377,Sun6820,13.031915
9,2,7.39,Jerome Abbott,3532124519049786,Sun3775,21.853857


In [67]:
# df[5:10, 6:]  NumPy-style slicing does not work directly on a DataFrame; it raises an error.

TypeError: '(slice(5, 10, None), slice(6, None, None))' is an invalid key

In [68]:
df.iloc[8:12, 6]=1

In [69]:
df.iloc[8:12, 6:]

Unnamed: 0,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percentage
8,1,7.52,Joseph Mcdonald,3522866365840377,Sun6820,13.031915
9,1,7.39,Jerome Abbott,3532124519049786,Sun3775,21.853857
10,1,5.14,William Riley,566287581219,Sun2546,16.650438
11,1,8.82,Diane Macias,4577817359320969,Sun6686,14.180374


### Index Basics

In [70]:
df.index

RangeIndex(start=0, stop=244, step=1)

In [74]:
df.set_index('Payment ID')  # use .set_index() to replace the index with another column of the DataFrame

# To keep this change, assign the result to a new variable.

Unnamed: 0_level_0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,tip_percentage
Payment ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,5.944673
Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,16.054159
Sun4458,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,16.658734
Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,13.978041
Sun2251,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,14.680765
...,...,...,...,...,...,...,...,...,...,...,...
Sat2657,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,20.392697
Sat1766,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,7.358352
Sat3880,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,8.822232
Sat17,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,9.820426
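
That `set_index` returns a new DataFrame and leaves the original untouched can be verified on a small example. The frame `payments` below is a hypothetical two-row cut of the tips data.

```python
import pandas as pd

# Hypothetical two-row cut of the tips table.
payments = pd.DataFrame({
    "Payment ID": ["Sun2959", "Sun4608"],
    "tip": [1.01, 1.66],
})

indexed = payments.set_index("Payment ID")  # new DataFrame; payments is unchanged
row = indexed.loc["Sun4608"]                # rows can now be looked up by Payment ID
restored = indexed.reset_index()            # moves the index back into a column

print(payments.index)  # still the default RangeIndex
print(indexed.index)
print(restored)
```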


In [None]:
df_paymentID_index = df.set_index('Payment ID') 

In [75]:
df_paymentID_index 

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,tip_percentage
Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,5.944673
Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,16.054159
Sun4458,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,16.658734
Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,13.978041
Sun2251,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,14.680765
...,...,...,...,...,...,...,...,...,...,...,...
Sat2657,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,20.392697
Sat1766,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,7.358352
Sat3880,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,8.822232
Sat17,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,9.820426
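
The cells above rely on `set_index()` returning a new DataFrame rather than modifying `df`. A minimal sketch of that behavior, assuming a toy frame named `tips` (a hypothetical name) built from three rows of the output above:

```python
import pandas as pd

# Toy frame: three rows copied from the output above; `tips` is a hypothetical name.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "Payment ID": ["Sun2959", "Sun4608", "Sun4458"],
})

# set_index() returns a new DataFrame; `tips` itself is left unchanged.
indexed = tips.set_index("Payment ID")

print(tips.index)                        # still the default RangeIndex
print(indexed.index)                     # labels taken from the Payment ID column
print("Payment ID" in indexed.columns)   # False: the column moved into the index
```

Assigning the result to a new name, as `df_paymentID_index` does above, keeps both versions available.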


In [72]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percentage
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,5.944673
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,16.054159
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,16.658734
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,13.978041
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,14.680765


In [77]:
df_new = df_paymentID_index.reset_index()  # reset index to the previous values
df_new

# DataFrame.set_index : Opposite of reset_index.
# DataFrame.reindex : Change to new indices or expand indices.
# DataFrame.reindex_like : Change to same indices as other DataFrame.


Unnamed: 0,Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,tip_percentage
0,Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,5.944673
1,Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,16.054159
2,Sun4458,21.01,3.50,Male,No,Sun,Dinner,3,7.00,Travis Walters,6011812112971322,16.658734
3,Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,13.978041
4,Sun2251,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,14.680765
...,...,...,...,...,...,...,...,...,...,...,...,...
239,Sat2657,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,20.392697
240,Sat1766,27.18,2.00,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,7.358352
241,Sat3880,22.67,2.00,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,8.822232
242,Sat17,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,9.820426
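
`reset_index()` moves the current index back into a regular column and restores the default integer index. The sketch below, on a hypothetical two-row frame, also shows the `drop=True` option, which discards the old index instead of keeping it as a column:

```python
import pandas as pd

# Hypothetical two-row frame indexed by Payment ID (values copied from the output above).
indexed = pd.DataFrame(
    {"total_bill": [16.99, 10.34], "tip": [1.01, 1.66]},
    index=pd.Index(["Sun2959", "Sun4608"], name="Payment ID"),
)

restored = indexed.reset_index()             # index becomes the first column
discarded = indexed.reset_index(drop=True)   # index is thrown away instead

print(list(restored.columns))    # ['Payment ID', 'total_bill', 'tip']
print(list(discarded.columns))   # ['total_bill', 'tip']
```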


### Create new columns

In [35]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251


In [46]:
df['tip_percentage'] = 100 * df["tip"]/df["total_bill"]

In [47]:
type(df['tip_percentage'])

pandas.core.series.Series

In [48]:
df['tip_percentage'].shape

(244,)

In [37]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percentage
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,5.944673
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,16.054159
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,16.658734
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,13.978041
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,14.680765


In [38]:
df['price_per_person'] = df['total_bill'] / df['size']

In [None]:
df.head()
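
Both new columns above are computed with element-wise arithmetic between existing columns. A small self-contained sketch of the same idea, assuming a toy frame `tips` (hypothetical name) with three rows from the output above:

```python
import pandas as pd

# Toy frame with three rows copied from the output above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "size": [2, 3, 3],
})

# Arithmetic between Series is element-wise and aligned on the index.
tips["tip_percentage"] = 100 * tips["tip"] / tips["total_bill"]

# Use bracket access for "size": the attribute tips.size is the element count,
# not the column.
tips["price_per_person"] = tips["total_bill"] / tips["size"]

print(tips)
```

Assigning to a column name that does not exist yet creates the column; assigning to an existing name overwrites it.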

### Adjust existing columns

In [40]:
# E.g. np.round() can be applied to a Series (a DataFrame column)

df["price_per_person"] = np.round(df["price_per_person"], 2)  # np.round(array/column, decimals=number)

In [41]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percentage
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,5.944673
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,16.054159
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,16.658734
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,13.978041
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,14.680765
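
As a side note, a Series also has its own `round()` method, which should give the same result as `np.round()` here. A short sketch with made-up values (not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Illustrative values only; they are not rows of the tips dataset.
price = pd.Series([8.4951, 3.4466666, 7.0033333], name="price_per_person")

rounded_np = np.round(price, 2)   # NumPy function applied to a Series
rounded_pd = price.round(2)       # equivalent Series method

print(rounded_pd)
print(rounded_np.equals(rounded_pd))   # True
```

Either form returns a new Series, so the result has to be assigned back to the column to take effect.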


### Remove columns

In [None]:
# df.drop(labels, axis=number) removes a column from a DataFrame.
# However, the result is temporary: drop() returns a new DataFrame and leaves the original untouched. There are two ways to make the change permanent.
# 1. Pass inplace=True: df.drop("tip_percentage", axis=1, inplace=True)
# 2. Assign the result back to the variable. This is the most commonly preferred and recommended approach.

In [43]:
df = df.drop("tip_percentage", axis = 1)

In [44]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251
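
To make the "temporary" behavior of `drop()` concrete, here is a minimal sketch on a hypothetical two-row frame. It also uses the `columns=` keyword, which is an alternative spelling of `axis=1`:

```python
import pandas as pd

# Hypothetical two-row frame with values copied from the output above.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "tip": [1.01, 1.66],
    "tip_percentage": [5.944673, 16.054159],
})

dropped = tips.drop("tip_percentage", axis=1)    # returns a new DataFrame
same = tips.drop(columns="tip_percentage")       # equivalent keyword form

print("tip_percentage" in tips.columns)      # True: the original is unchanged
print("tip_percentage" in dropped.columns)   # False
```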


In [None]:
df.shape  # returns the number of rows and columns as a tuple

In [None]:
df.shape[0]  # number of rows

In [None]:
df.shape[1]  # number of columns
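
Since `shape` is a plain tuple, it can also be unpacked in one step. A brief sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical frame: three rows, two columns.
df_small = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
})

n_rows, n_cols = df_small.shape   # tuple unpacking

print(n_rows, n_cols)             # 3 2
print(len(df_small))              # 3: len() of a DataFrame is its row count
```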

## Rows

In [None]:
# The index column should act as a primary key: an identifier column that holds a unique value for every row.
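
One way to check this before choosing an index is `Series.is_unique`. pandas itself does not require a unique index, but `set_index()` accepts `verify_integrity=True` to reject duplicates. A sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical frame: Payment ID is unique per row, day is not.
tips = pd.DataFrame({
    "Payment ID": ["Sun2959", "Sun4608", "Sun4458"],
    "day": ["Sun", "Sun", "Sun"],
})

print(tips["Payment ID"].is_unique)   # True: a good index candidate
print(tips["day"].is_unique)          # False: repeated values

rejected = False
try:
    # verify_integrity=True makes set_index() raise on duplicate labels.
    tips.set_index("day", verify_integrity=True)
except ValueError as err:
    rejected = True
    print("rejected:", err)
```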

In [86]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,tip_percentage
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,5.944673
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,16.054159
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,16.658734
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,13.978041
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,14.680765


In [87]:
# Calling df.set_index('Payment ID') without assigning the result only makes a temporary change.

# To make the change permanent, assign the result to a variable (here, back to the DataFrame itself).

df = df.set_index('Payment ID')

In [88]:
df.head()

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,tip_percentage
Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,5.944673
Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,16.054159
Sun4458,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,16.658734
Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,13.978041
Sun2251,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,14.680765


In [89]:
# Grab a single row(integer based)

df.iloc[0]

total_bill                       16.99
tip                               1.01
sex                             Female
smoker                              No
day                                Sun
time                            Dinner
size                                 2
price_per_person                  8.49
Payer Name          Christy Cunningham
CC Number             3560325168603410
tip_percentage                5.944673
Name: Sun2959, dtype: object

In [90]:
# Grab a single row(name based)

df.loc["Sun2959"]

total_bill                       16.99
tip                               1.01
sex                             Female
smoker                              No
day                                Sun
time                            Dinner
size                                 2
price_per_person                  8.49
Payer Name          Christy Cunningham
CC Number             3560325168603410
tip_percentage                5.944673
Name: Sun2959, dtype: object

In [92]:
# Grab multiple rows(integer based)

df.iloc[0:4]

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,tip_percentage
Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,5.944673
Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,16.054159
Sun4458,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,16.658734
Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,13.978041


In [None]:
# Grab multiple rows(name based)

df.loc[['Sun2959','Sun5260']]

In [None]:
# Remove row

In [94]:
df.head()

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,tip_percentage
Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,5.944673
Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,16.054159
Sun4458,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,16.658734
Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,13.978041
Sun2251,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,14.680765


In [97]:
df.drop('Sun2959',axis=0).head()

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,tip_percentage
Sun4608,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,16.054159
Sun4458,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,16.658734
Sun5260,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,13.978041
Sun2251,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,14.680765
Sun9679,25.29,4.71,Male,No,Sun,Dinner,4,6.32,Erik Smith,213140353657882,18.623962


In [99]:
# Raises a KeyError when the index is labeled: drop() looks up labels, not positions, and 0 is not a label here.
# df.drop(0,axis=0).head()
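
A sketch of that failure and one possible workaround, assuming a hypothetical three-row frame: look up the label at a position with `df.index[...]` and pass that label to `drop()`.

```python
import pandas as pd

# Hypothetical frame with a labeled (string) index.
df_small = pd.DataFrame(
    {"total_bill": [16.99, 10.34, 21.01]},
    index=pd.Index(["Sun2959", "Sun4608", "Sun4458"], name="Payment ID"),
)

failed = False
try:
    df_small.drop(0, axis=0)   # 0 is a position, not one of the labels
except KeyError as err:
    failed = True
    print("KeyError:", err)

# Workaround: translate the position into its label first.
without_first = df_small.drop(df_small.index[0], axis=0)
print(without_first.index.tolist())   # ['Sun4608', 'Sun4458']
```

Slicing by position, for example `df_small.iloc[1:]`, is another way to skip rows without knowing their labels.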

In [100]:
# insert/append a row

one_row = df.iloc[0]

In [None]:
one_row

In [101]:
type(one_row)

pandas.core.series.Series

In [102]:
df.tail()

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,tip_percentage
Sat2657,29.03,5.92,Male,No,Sat,Dinner,3,9.68,Michael Avila,5296068606052842,20.392697
Sat1766,27.18,2.0,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,7.358352
Sat3880,22.67,2.0,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,8.822232
Sat17,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,9.820426
Thur672,18.78,3.0,Female,No,Thur,Dinner,2,9.39,Michelle Hardin,3511451626698139,15.974441


In [103]:
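# Appends a new row to the end of the DataFrame. Columns of the new row that are missing from the DataFrame are added as well.
# Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; on newer versions use pd.concat([df, one_row.to_frame().T]) instead.
df.append(one_row).tail()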

Payment ID,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,tip_percentage
Sat1766,27.18,2.0,Female,Yes,Sat,Dinner,2,13.59,Monica Sanders,3506806155565404,7.358352
Sat3880,22.67,2.0,Male,Yes,Sat,Dinner,2,11.34,Keith Wong,6011891618747196,8.822232
Sat17,17.82,1.75,Male,No,Sat,Dinner,2,8.91,Dennis Dixon,4375220550950,9.820426
Thur672,18.78,3.0,Female,No,Thur,Dinner,2,9.39,Michelle Hardin,3511451626698139,15.974441
Sun2959,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,5.944673
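
On pandas 2.0 and later, where `DataFrame.append` no longer exists, the same row can be added with `pd.concat`. This is a minimal sketch on a hypothetical two-row frame, not the notebook's original cell: the row Series is turned into a one-row DataFrame with `to_frame().T` and then concatenated.

```python
import pandas as pd

# Hypothetical frame indexed by Payment ID (values copied from the output above).
df_small = pd.DataFrame(
    {"total_bill": [16.99, 10.34], "tip": [1.01, 1.66]},
    index=pd.Index(["Sun2959", "Sun4608"], name="Payment ID"),
)

one_row = df_small.iloc[0]    # a Series whose name is the row label

# to_frame() gives a single column; .T transposes it into a single row.
appended = pd.concat([df_small, one_row.to_frame().T])

print(appended.shape)            # (3, 2)
print(appended.index.tolist())   # ['Sun2959', 'Sun4608', 'Sun2959']
```

As with `append`, the original frame is not modified, and the result can contain duplicate index labels.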


In [None]:
df.shape

In [None]:
df.size

In [None]:
df.ndim
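
The three attributes above are related: `size` is the total number of cells (rows times columns) and `ndim` is the number of axes. A sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical frame: three rows, two columns.
df_small = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
})

print(df_small.shape)         # (3, 2)
print(df_small.size)          # 6, i.e. 3 rows * 2 columns
print(df_small.ndim)          # 2: a DataFrame has rows and columns
print(df_small["tip"].ndim)   # 1: a single column is a Series
```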