___

<p style="text-align: center;"><img src="https://docs.google.com/uc?id=1lY0Uj5R04yMY3-ZppPWxqCr5pvBLYPnV" class="img-fluid" 
alt="CLRSWY"></p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#9d4f8c; font-size:100%; text-align:center; border-radius:10px 10px;">WAY TO REINVENT YOURSELF</p>

<img src=https://i.ibb.co/6gCsHd6/1200px-Pandas-logo-svg.png width="700" height="200">

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#060108; font-size:200%; text-align:center; border-radius:10px 10px;">Data Analysis with Python</p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#060108; font-size:150%; text-align:center; border-radius:10px 10px;">Session - 04</p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#4d77cf; font-size:200%; text-align:center; border-radius:10px 10px;">Pandas DataFrames</p>

<a id="toc"></a>

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Content</p>

* [IMPORTING LIBRARIES NEEDED IN THIS NOTEBOOK](#0)
* [DATA FRAMES](#1)
* [CREATING A DATA FRAME](#2)
    * [Creating a DataFrame Using the Lists of Data & Columns](#2.1)
    * [Creating a DataFrame Using a NumPy Array](#2.2)
    * [Creating a DataFrame Using a Dictionary](#2.3)
    * [The Examination of Some Attributes on Data](#2.4)
* [INDEXING, SLICING & SELECTION](#3)    
* [CREATING A NEW COLUMN](#4)    
* [REMOVING COLUMNS](#5)
* [REMOVING ROWS](#6)
* [SELECTING ROWS & COLUMNS USING .loc[ ] & .iloc[ ] ](#7)
* [CONDITIONAL SELECTION](#8)
    * [One Conditional Statement](#8.1)
    * [Two or More Conditional Statements](#8.2)
    * [Conditional Selection Using .loc[ ]](#8.3)
* [reset_index() & set_index()](#9)
* [Multi-Index & Index Hierarchy](#10)
* [Some Other Useful Methods with Iris Dataset](#11)
* [THE END OF THE SESSION-04](#12)

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Importing Libraries Needed in This Notebook</p>

<a id="0"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Once you've installed NumPy and Pandas, you can import them as libraries:

In [1]:
import numpy as np
import pandas as pd

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Data Frames</p>

<a id="1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

A DataFrame is a two-dimensional data container, similar to a Matrix, but which can contain heterogeneous data, and for which symbolic names may be associated with the rows and columns. ``DataFrames`` are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same index. 
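The idea of "a bunch of Series sharing the same index" can be sketched as follows. This is a minimal illustration; the names `height`, `weight`, and `people` and all values are made up for the example.

```python
import pandas as pd

# Two Series that share the same index labels (illustrative values)
height = pd.Series([170, 182, 165], index=["ann", "bob", "cem"])
weight = pd.Series([65.5, 80.0, 58.2], index=["ann", "bob", "cem"])

# Putting them side by side gives a DataFrame with one column per Series
people = pd.DataFrame({"height": height, "weight": weight})
print(people)

# Each column keeps its own dtype (heterogeneous data),
# and selecting a single column gives back a Series
print(people.dtypes)
print(type(people["height"]))
```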

### Why use Pandas?

Data scientists use Pandas in Python for the **following advantages**:

- It easily handles missing data
- It uses Series as its one-dimensional data structure and DataFrame as its two-dimensional data structure
- It provides an efficient way to slice the data
- It provides a flexible way to merge, concatenate or reshape the data
- It includes powerful tools for working with time series

In a nutshell, Pandas is a useful library for data manipulation and analysis. It provides powerful, easy-to-use data structures, as well as the means to quickly perform operations on these structures.

[SOURCE01](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html), 
[SOURCE02](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), 
[SOURCE03](https://morioh.com/p/2528ac775b1b), 
[SOURCE04](https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python), 
[SOURCE05](https://www.guru99.com/python-pandas-tutorial.html), 
[SOURCE06](https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm), 
[SOURCE07](https://realpython.com/pandas-dataframe/) &
[SOURCE08](https://towardsdatascience.com/a-simple-guide-to-pandas-dataframes-b125f64e1453)<br>
[VIDEO SOURCE01](https://www.youtube.com/watch?v=zmdjNSmRXF4), 
[VIDEO SOURCE02](https://www.youtube.com/watch?v=F6kmIpWWEdU) &
[VIDEO SOURCE03](https://towardsdatascience.com/pandas-dataframe-basics-3c16eb35c4f3)<br>

**Now let's use pandas to explore this topic!**

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Creating a DataFrame</p>

<a id="2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

A **``DataFrame``** is a **two-dimensional collection of data**, a data structure that stores data in **tabular form**, arranged in rows and columns. A single DataFrame can hold several variables side by side, one per column. We can perform various operations on it, such as selecting, adding, and removing rows and columns.

We can load a DataFrame from external storage such as an SQL database, a CSV file, or an Excel file. We can also build one from Python objects such as lists, dictionaries, or a list of dictionaries.
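As a small sketch of two of these sources, the example below builds one DataFrame from a list of dictionaries and another from CSV text. The in-memory string `csv_text` stands in for a real file on disk, and the names and values are assumptions made for illustration.

```python
import io
import pandas as pd

# A list of dictionaries: each dictionary becomes one row
records = [
    {"name": "ann", "score": 90},
    {"name": "bob", "score": 75},
]
from_records = pd.DataFrame(records)
print(from_records)

# CSV text held in memory stands in for a file on disk;
# with a real file you would pass its path to pd.read_csv instead
csv_text = "name,score\nann,90\nbob,75\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
print(from_csv)
```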

In this session, we will learn to create the DataFrame in multiple ways. Let's understand these different ways.

**``pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)``**
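To make the signature above more concrete, here is a minimal sketch that passes `data`, `index`, `columns`, and `dtype` explicitly. The labels `r1`..`r3`, `left`, and `right` are arbitrary names chosen for the example.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4], [5, 6]])

frame = pd.DataFrame(
    data=values,                # the 2-D data itself
    index=["r1", "r2", "r3"],   # row labels
    columns=["left", "right"],  # column labels
    dtype=float,                # force every column to float
)
print(frame)
print(frame.dtypes)
```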

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Creating a DataFrame Using the Lists of Data & Columns</p>

<a id="2.1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [2]:
data = [1, 3, 5, 7, 9]

In [3]:
pd.DataFrame(data)

   0
0  1
1  3
2  5
3  7
4  9


In [4]:
pd.Series(data)

0    1
1    3
2    5
3    7
4    9
dtype: int64

In [5]:
pd.DataFrame(data=data, columns=["column1"])

   column1
0        1
1        3
2        5
3        7
4        9


In [6]:
np.array(data)

array([1, 3, 5, 7, 9])

In [7]:
np.array([data])

array([[1, 3, 5, 7, 9]])

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Creating a DataFrame Using a NumPy Array</p>

<a id="2.2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [8]:
data = np.arange(1, 24, 2).reshape(3, 4)
data

array([[ 1,  3,  5,  7],
       [ 9, 11, 13, 15],
       [17, 19, 21, 23]])

In [9]:
df = pd.DataFrame(data=data, columns=["var1","var2","var3","var4"])
df

   var1  var2  var3  var4
0     1     3     5     7
1     9    11    13    15
2    17    19    21    23


### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Creating a DataFrame Using a Dictionary</p>

<a id="2.3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [10]:
s1 = np.random.randint(2, 10, size = 4)
s2 = np.random.randint(3, 10, size = 4)
s3 = np.random.randint(4, 15, size = 4)
s1, s2, s3

(array([8, 2, 9, 2]), array([7, 4, 9, 7]), array([11, 11, 10, 12]))

In [11]:
mydict = {"var1": s1, "var2": s2, "var3": s3}
mydict

{'var1': array([8, 2, 9, 2]),
 'var2': array([7, 4, 9, 7]),
 'var3': array([11, 11, 10, 12])}

In [12]:
df=pd.DataFrame(mydict)
df

   var1  var2  var3
0     8     7    11
1     2     4    11
2     9     9    10
3     2     7    12


### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">The Examination of Some Attributes on Data</p>

<a id="2.4"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [13]:
df

   var1  var2  var3
0     8     7    11
1     2     4    11
2     9     9    10
3     2     7    12


In [14]:
df.head(2)

   var1  var2  var3
0     8     7    11
1     2     4    11


In [15]:
df.tail(2)

   var1  var2  var3
2     9     9    10
3     2     7    12


In [16]:
df.sample(2)

   var1  var2  var3
2     9     9    10
0     8     7    11


In [17]:
df.columns

Index(['var1', 'var2', 'var3'], dtype='object')

In [18]:
df.mean() # call the method with parentheses to compute the column means

var1     5.25
var2     6.75
var3    11.00
dtype: float64

In [19]:
df.index

RangeIndex(start=0, stop=4, step=1)

In [20]:
list(df.index)

[0, 1, 2, 3]

In [21]:
df.columns = ["new1","new2","new3"]
df

   new1  new2  new3
0     8     7    11
1     2     4    11
2     9     9    10
3     2     7    12


In [22]:
df.index = ["a","b","c", "d"]
df

   new1  new2  new3
a     8     7    11
b     2     4    11
c     9     9    10
d     2     7    12


In [23]:
df.rename(columns = {"new1": "aaa", "new2":"bbb"}) 
# the change is not permanent
# to make it permanent, pass inplace=True or assign the result back to df

   aaa  bbb  new3
a    8    7    11
b    2    4    11
c    9    9    10
d    2    7    12


In [24]:
df

   new1  new2  new3
a     8     7    11
b     2     4    11
c     9     9    10
d     2     7    12
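Because `rename` returns a new DataFrame, `df` above still shows the old column names. The sketch below rebuilds the same small frame by hand and shows the two usual ways to keep the change: reassigning the result, or passing `inplace=True`.

```python
import pandas as pd

df = pd.DataFrame(
    {"new1": [8, 2, 9, 2], "new2": [7, 4, 9, 7], "new3": [11, 11, 10, 12]},
    index=["a", "b", "c", "d"],
)

# rename returns a new DataFrame; df keeps its old column names
renamed = df.rename(columns={"new1": "aaa", "new2": "bbb"})
print(list(df.columns))       # ['new1', 'new2', 'new3']
print(list(renamed.columns))  # ['aaa', 'bbb', 'new3']

# Option 1: assign the result back to the same name
df = df.rename(columns={"new1": "aaa"})

# Option 2: modify the existing object with inplace=True
df.rename(columns={"new2": "bbb"}, inplace=True)
print(list(df.columns))       # ['aaa', 'bbb', 'new3']
```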


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Indexing, Slicing & Selection</p>

<a id="3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Let's look at several ways to inspect a DataFrame and select data from it.

In [25]:
df.shape

(4, 3)

In [26]:
df.shape[1]

3

In [27]:
df.size # number of elements: 4 x 3

12

In [28]:
df.ndim # number of dimensions

2

In [29]:
df.values

array([[ 8,  7, 11],
       [ 2,  4, 11],
       [ 9,  9, 10],
       [ 2,  7, 12]])

In [30]:
df.count()

new1    4
new2    4
new3    4
dtype: int64

In [31]:
type(df["new1"])

pandas.core.series.Series

In [32]:
"new2" in df

True

In [33]:
"new5" in df

False

In [34]:
from numpy.random import randn

In [35]:
np.random.seed(101)
df = pd.DataFrame(randn(5, 4), index = 'A B C D E'.split(), columns = 'W X Y Z'.split())
df

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509


In [36]:
'A B C D E'.split()

['A', 'B', 'C', 'D', 'E']

In [37]:
df["Y"] # returns the column as a Series

A    0.907969
B   -0.848077
C    0.528813
D   -0.933237
E    2.605967
Name: Y, dtype: float64

In [38]:
df.Y # attribute access also works, but it is less common and fails for names with spaces or names that clash with DataFrame methods

A    0.907969
B   -0.848077
C    0.528813
D   -0.933237
E    2.605967
Name: Y, dtype: float64

In [39]:
df[["Y"]] # returns the column as a DataFrame

          Y
A  0.907969
B -0.848077
C  0.528813
D -0.933237
E  2.605967


In [40]:
df["X", "Y"] # raises a KeyError: ("X", "Y") is looked up as a single column label

KeyError: ('X', 'Y')

In [None]:
df[["X", "Y"]] # pass the column names as a list (double brackets)

In [None]:
df["W":"Y"] # a slice inside [] works on rows; W and Y are not row labels, so the result is empty

In [None]:
df["A":"C"] # returns rows A through C (the stop label is included)

In [None]:
df["A", "B"] # raises a KeyError; to select rows by label use df.loc[["A", "B"]]

In [None]:
df["A": "C"][["Y", "W"]] # rows A through C, columns Y and W

In [None]:
df[["Y", "W"]]["A": "C"]
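The two chained selections above can also be written as a single `.loc` call. The following sketch, which recreates the seeded frame used in this section, checks that both forms return the same data.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4), index=list("ABCDE"), columns=list("WXYZ"))

# Chained: first slice the rows, then pick the columns
chained = df["A":"C"][["Y", "W"]]

# Single step: rows and columns in one .loc call
direct = df.loc["A":"C", ["Y", "W"]]

print(chained.equals(direct))  # True
```

The single-step form is generally preferred when assigning values, since chained selection may operate on a temporary copy.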

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Creating a New Column</p>

<a id="4"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
df

In [None]:
df["new1"] = df["W"] * df["X"] 
# adds a new column; the values must line up with the existing rows
df

In [None]:
df["new2"] = np.arange(5)
df
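A short sketch of what "the same number of rows" means in practice, using a small hand-made frame (the column names `product`, `flag`, and `bad` are illustrative): element-wise results and scalars are accepted, while an array of the wrong length is rejected.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"W": [1.0, 2.0, 3.0], "X": [10.0, 20.0, 30.0]}, index=["A", "B", "C"])

# Element-wise product of two existing columns
df["product"] = df["W"] * df["X"]

# A scalar is broadcast to every row
df["flag"] = 0

# An array whose length differs from the number of rows raises ValueError
try:
    df["bad"] = np.arange(5)
except ValueError as err:
    print("ValueError:", err)

print(df)
```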

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Removing Columns</p>

<a id="5"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
df.drop("new2", axis=1) # not permanent; df itself is unchanged

In [None]:
df

In [None]:
df.drop(["new1", "new2"], axis=1)

In [None]:
df.drop(["new1", "new2"]) # raises a KeyError: without axis=1, drop looks for these labels among the rows

In [None]:
df.drop(columns=["new1", "new2"])

In [None]:
df

In [None]:
df.drop(columns=["new1", "new2"])

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Removing Rows</p>

<a id="6"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [None]:
df.drop("C", axis=0)

In [None]:
df.drop(index="B")

In [None]:
df.drop("D")
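None of the row-dropping calls above changes `df` itself. As a minimal sketch on a small hand-made frame, this is how a row removal can be kept, and how a missing label can be skipped with `errors="ignore"`.

```python
import pandas as pd

df = pd.DataFrame({"W": [1, 2, 3, 4], "X": [5, 6, 7, 8]}, index=["A", "B", "C", "D"])

# drop returns a new DataFrame; the original still has all four rows
without_c = df.drop(index="C")
print(list(df.index))         # ['A', 'B', 'C', 'D']
print(list(without_c.index))  # ['A', 'B', 'D']

# Reassign to keep the result; a list drops several rows at once
df = df.drop(index=["A", "B"])

# A label that does not exist raises KeyError unless errors="ignore" is passed
df = df.drop(index=["Z"], errors="ignore")
print(list(df.index))         # ['C', 'D']
```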

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Selecting Rows and Columns using .loc[ ] and .iloc[ ]</p>

<a id="7"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

#### `.loc[]` → selects data by the **labels** (names) of rows (index) & columns

#### `.iloc[]` → selects data by the **integer positions** of rows & columns, like ordinary Python list indexing

In [None]:
data = np.random.randint(1, 40, size=(8, 4))
df = pd.DataFrame(data, columns = ["var1", "var2", "var3", 'var4'])
df

In [None]:
df.loc[4] # returns the row labeled 4; the general form is loc[row, column]

In [None]:
df.loc[[4]]

In [None]:
df.loc[2:5] # note: loc includes the stop label

In [None]:
df.iloc[2:5] # note: iloc excludes the stop position

In [None]:
df.index = 'a b c d e f g h'.split()
df

In [None]:
df.iloc[1:4] 
# iloc ignores the visible index labels and uses the underlying integer positions 0, 1, 2, ...

In [None]:
df.loc[1:4] 
# raises an error because loc looks up labels, and 1 and 4 are not among the index labels


In [None]:
df

In [None]:
df.loc["d", "var3"] # returns the single value at row "d", column "var3"

In [None]:
df.iloc[3, 2] # iloc takes the integer positions of the row and the column

In [None]:
df.loc["d":"g", "var2"]

In [None]:
df.loc["d":"g"]["var2"]

In [None]:
df.loc["d":"g"][["var2"]]

In [None]:
df.loc["d":"g"][["var2", "var3"]]

In [None]:
df.iloc[2:5]["var2"] # works: iloc returns a DataFrame, then [] selects the column as a Series

In [None]:
df.iloc[2:5][["var2"]]
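Rows and columns can also be selected together in one call instead of chaining. The sketch below uses a frame filled with `np.arange` (an assumption made so that the values are predictable) to compare the label-based and position-based forms.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(32).reshape(8, 4),
    index=list("abcdefgh"),
    columns=["var1", "var2", "var3", "var4"],
)

# Label-based: rows d through g (stop included), two named columns
by_label = df.loc["d":"g", ["var2", "var3"]]

# Position-based: rows 3..6 and columns 1..2 (stops excluded)
by_position = df.iloc[3:7, 1:3]

print(by_label.equals(by_position))         # True
print(df.loc["d", "var3"], df.iloc[3, 2])   # 14 14
```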

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Conditional Selection</p>

<a id="8"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

An important feature of pandas is conditional selection using bracket notation, very similar to NumPy:

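In [44]:
# recreate the seeded example frame, since df was reassigned in the previous section
np.random.seed(101)
df = pd.DataFrame(randn(5, 4), index = 'A B C D E'.split(), columns = 'W X Y Z'.split())
df

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509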


### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">One Conditional Statement</p>

<a id="8.1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>
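A minimal sketch of selection with a single condition, using the seeded frame shown above. A comparison produces a boolean mask, and indexing with that mask keeps only the matching rows; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4), index=list("ABCDE"), columns=list("WXYZ"))

# A comparison on one column gives a boolean Series, one value per row
mask = df["W"] > 0

# Indexing with the mask keeps the rows where it is True
positive_w = df[mask]
print(list(positive_w.index))  # ['A', 'B', 'D', 'E']

# Comparing the whole frame keeps its shape and puts NaN where the test fails
masked = df[df > 0]
print(masked)
```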

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Two or More Conditional Statements</p>

<a id="8.2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

**For two or more conditions, combine them with `|` (or) and `&` (and), wrapping each condition in parentheses:**
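As a small sketch on the same seeded frame, the parentheses matter because `&` and `|` bind more tightly than the comparison operators. The variable names `both` and `either` are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4), index=list("ABCDE"), columns=list("WXYZ"))

# and: both conditions must hold for a row to be kept
both = df[(df["W"] > 0) & (df["Y"] > 0)]
print(list(both.index))    # ['A', 'E']

# or: at least one condition must hold
either = df[(df["W"] < 0) | (df["X"] < 0)]
print(list(either.index))  # ['B', 'C', 'D']
```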

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Conditional Selection Using .loc[ ] and .iloc[ ]</p>

<a id="8.3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>
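A boolean mask can also be used as the row selector of `.loc[]` or `.iloc[]`, which lets us pick columns in the same step. This is a sketch on the seeded frame; note that `.iloc[]` expects a plain boolean array rather than a labeled Series, hence the `to_numpy()` call.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4), index=list("ABCDE"), columns=list("WXYZ"))

mask = df["W"] > 0

# .loc takes the boolean Series for the rows and labels for the columns
by_loc = df.loc[mask, ["X", "Y"]]

# .iloc needs a plain boolean array and integer column positions
by_iloc = df.iloc[mask.to_numpy(), [1, 2]]

print(list(by_loc.index))      # ['A', 'B', 'D', 'E']
print(by_loc.equals(by_iloc))  # True
```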

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">reset_index() & set_index()</p>

<a id="9"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Let's discuss some more features of indexing, including resetting the index or setting it to something else. We'll also talk about index hierarchy!
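A minimal sketch of both methods on a small hand-made frame; the `state` column and its values are made up for the example. Like `rename` and `drop`, both return a new DataFrame unless the result is assigned back.

```python
import pandas as pd

df = pd.DataFrame({"W": [1.0, 2.0, 3.0], "X": [4.0, 5.0, 6.0]}, index=["A", "B", "C"])

# reset_index moves the old index into a column named "index"
# and replaces it with the default 0, 1, 2, ...
flat = df.reset_index()
print(list(flat.columns))    # ['index', 'W', 'X']

# drop=True discards the old index instead of keeping it as a column
dropped = df.reset_index(drop=True)
print(list(dropped.index))   # [0, 1, 2]

# set_index turns an existing column into the index
flat["state"] = ["CA", "NY", "WY"]
by_state = flat.set_index("state")
print(list(by_state.index))  # ['CA', 'NY', 'WY']
```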

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Multi-Index & Index Hierarchy</p>

<a id="10"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Let us go over how to work with a Multi-Index. First we'll create a quick example of what a Multi-Indexed DataFrame looks like:
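One possible way to build such a frame is sketched below. The level values `G1`/`G2` and `1..3`, and the use of `np.arange` for the data, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Outer and inner labels are paired up to form a two-level index
outside = ["G1", "G1", "G1", "G2", "G2", "G2"]
inside = [1, 2, 3, 1, 2, 3]
hier_index = pd.MultiIndex.from_tuples(list(zip(outside, inside)))

df = pd.DataFrame(np.arange(12).reshape(6, 2), index=hier_index, columns=["A", "B"])
print(df)
print(df.index.nlevels)  # 2
```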

**``Note``** that all of the MultiIndex constructors accept a names argument which stores string names for the levels themselves. If no names are provided, None will be assigned:
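The effect of the `names` argument can be seen in this short sketch; the level names `Group` and `Num` are arbitrary choices for the example.

```python
import pandas as pd

pairs = [("G1", 1), ("G1", 2), ("G2", 1), ("G2", 2)]

# Without names, each level is named None
unnamed = pd.MultiIndex.from_tuples(pairs)
print(list(unnamed.names))  # [None, None]

# With names, the levels can later be referred to by name
named = pd.MultiIndex.from_tuples(pairs, names=["Group", "Num"])
print(list(named.names))    # ['Group', 'Num']
```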

For more information on indexing and selecting data, visit the [**Pandas Official Documentation**](https://pandas.pydata.org/pandas-docs/version/0.13.0/indexing.html)

Now let's show how to index this! For an index hierarchy on the rows we use ``df.loc[]``; if the hierarchy were on the columns axis, we would use normal bracket notation ``df[]``. Selecting one label from the outer level returns the sub-DataFrame:
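A sketch of these lookups on a small two-level frame. The frame is built here with `MultiIndex.from_product` and `np.arange` so that the example is self-contained; the level names and values are assumptions.

```python
import numpy as np
import pandas as pd

hier_index = pd.MultiIndex.from_product([["G1", "G2"], [1, 2, 3]], names=["Group", "Num"])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=hier_index, columns=["A", "B"])

# One outer label returns the sub-DataFrame, indexed by the inner level
g1 = df.loc["G1"]
print(list(g1.index))          # [1, 2, 3]

# Chaining a second .loc reaches a single row as a Series
row = df.loc["G1"].loc[1]

# A tuple addresses both levels at once; add a column label for one value
cell = df.loc[("G2", 3), "B"]
print(cell)                    # 11

# xs takes a cross-section on an inner level across all outer labels
first_of_each = df.xs(1, level="Num")
print(list(first_of_each["A"]))  # [0, 6]
```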

For more information on MultiIndex and advanced indexing, visit the [**Pandas Official Documentation**](https://pandas.pydata.org/docs/user_guide/advanced.html)

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Some Other Useful Methods with Iris Dataset</p>

<a id="11"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

### Let's apply the functions, attributes, and methods we have learned to the iris dataset
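The sketch below applies a few of the tools from this session to a handful of rows typed in by hand with iris-style columns. This tiny sample is only a stand-in: a real session would load the full dataset from a file or a library, and the column names used here are an assumption about how that data is labeled.

```python
import pandas as pd

# Hand-typed sample with iris-style columns (stand-in for the full dataset)
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# Basic attributes and a quick look at the data
print(iris.shape)    # (6, 5)
print(iris.columns)
print(iris.head(2))

# Conditional selection with one condition
long_petals = iris[iris["petal_length"] > 4.5]

# A new column computed from two existing ones
iris["petal_area"] = iris["petal_length"] * iris["petal_width"]

# Two conditions plus column selection in a single .loc call
wide_setosa = iris.loc[
    (iris["species"] == "setosa") & (iris["sepal_width"] > 3.2),
    ["sepal_length", "sepal_width"],
]
print(wide_setosa)
```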

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#9d4f8c; font-size:150%; text-align:center; border-radius:10px 10px;">The End of The Session - 04</p>

<a id="12"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>


________