### Manipulating Data Frames with Pandas
* Indexing data frames
* Slicing data frames
* Filtering data frames
* Transforming data frames
* Index objects and labeled data
* Hierarchical indexing
* Pivoting data frames
* Stacking and unstacking data frames
* Melting data frames
* Categoricals and groupby

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

In [None]:
data = pd.read_csv('../input/2016.csv')
data.head()

### INDEXING DATA FRAMES
* Indexing using square brackets
* Using column attribute and row label
* Using loc accessor
* Selecting only some columns

In [None]:
data = data.set_index('Happiness Rank')
data.head()

Indexing using square brackets: select a column by name, then a row by its index label (here, the happiness rank).

In [None]:
data['Happiness Score'][1]

Alternatively, access the column as an attribute and then select a row by its label.

In [None]:
data.Region[10]

Using the `loc` accessor, which selects by row label and column label:

In [None]:
data.loc[1,['Region']]
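
`loc` selects by label, while `iloc` (not used in the cells here) selects by integer position. Because the index here is `Happiness Rank`, which starts at 1, the two can disagree. The sketch below uses a small made-up frame, not the 2016 data, to show the difference:

```python
import pandas as pd

# Made-up values; the index starts at 1 like 'Happiness Rank'
toy = pd.DataFrame(
    {"Region": ["A", "B", "C"], "Score": [7.5, 7.4, 7.3]},
    index=pd.Index([1, 2, 3], name="Happiness Rank"),
)

# loc selects by label: label 1 is the first row
by_label = toy.loc[1, "Region"]
# iloc selects by position: position 1 is the second row
by_position = toy.iloc[1, 0]
# A scalar row label with a list of columns returns a Series, as in data.loc[1, ['Region']]
row_as_series = toy.loc[1, ["Region"]]

print(by_label, by_position, type(row_as_series).__name__)  # A B Series
```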

To focus on only some columns, pass a list of column names:

In [None]:
data[['Region', 'Happiness Score']]

### SLICING DATA FRAMES
* Difference between selecting columns
    * Series and data frames
* Slicing and indexing series
* Reverse slicing 
* From a given column to the end

Difference between selecting columns: single brackets return a Series, while double brackets (a list of names) return a DataFrame.

In [None]:
print(type(data["Region"]))     # series
print(type(data[["Region"]]))   # data frames
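
`ndim` and `shape` show the same difference: a Series is one-dimensional, while a DataFrame stays two-dimensional even when it holds a single column. A sketch on a tiny made-up frame:

```python
import pandas as pd

# Made-up values
toy = pd.DataFrame({"Region": ["A", "B"], "Freedom": [0.6, 0.4]})

single = toy["Region"]      # one label -> Series (1-D)
double = toy[["Region"]]    # list of labels -> DataFrame (2-D), even with one column

print(single.ndim, single.shape)   # 1 (2,)
print(double.ndim, double.shape)   # 2 (2, 1)
```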

Slicing with `loc`: label-based slices include both the start and the end label.

In [None]:
data.loc[1:10,"Region":"Freedom"] 

Reverse slicing uses a step of -1:

In [None]:
data.loc[10:1:-1,"Region":"Freedom"] 

In [None]:
data.loc[1:10,"Trust (Government Corruption)":] #From a column to end
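
Unlike positional slicing, `loc` slicing works on labels and includes the end label. The sketch below contrasts it with `iloc`, which slices by position and excludes the stop. It uses a made-up five-row frame indexed from 1 like `Happiness Rank`. The column names are borrowed from the dataset, but the values are invented:

```python
import pandas as pd

# Made-up values; the index starts at 1 like 'Happiness Rank'
toy = pd.DataFrame(
    {"Region": list("ABCDE"),
     "Family": [1.1, 1.0, 0.9, 0.8, 0.7],
     "Freedom": [0.6, 0.5, 0.4, 0.3, 0.2]},
    index=pd.Index([1, 2, 3, 4, 5], name="Happiness Rank"),
)

# loc slices by label and includes BOTH endpoints: rows 1..3, columns Region..Freedom
inclusive = toy.loc[1:3, "Region":"Freedom"]

# iloc slices by position and excludes the stop: positions 0..2 and columns 0..1
exclusive = toy.iloc[0:3, 0:2]

# A negative step walks the labels backwards: rows 3, 2, 1
reversed_rows = toy.loc[3:1:-1, "Region"]

print(inclusive.shape, exclusive.shape, list(reversed_rows.index))  # (3, 3) (3, 2) [3, 2, 1]
```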

### FILTERING DATA FRAMES
* Creating boolean series
* Combining filters
* Filtering a column based on other columns

In [None]:
boolean = data['Happiness Score'] > 7.000
data[boolean]

Combining two different filters with `&`:

In [None]:
boolean1 = data['Region'] == 'Western Europe'
boolean2 = data['Freedom'] < 0.5
data[boolean1 & boolean2]
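
When the conditions are written inline instead of being stored in variables first, each one needs its own parentheses, because `&` binds more tightly than comparison operators such as `==` and `<`. Python's `and`/`or` keywords do not work element-wise here. They raise a `ValueError` about the ambiguous truth value of a Series. A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows
toy = pd.DataFrame({
    "Region": ["Western Europe", "Western Europe", "Eastern Asia"],
    "Freedom": [0.4, 0.6, 0.3],
})

# & (and), | (or) and ~ (not) combine boolean Series element by element
both = toy[(toy["Region"] == "Western Europe") & (toy["Freedom"] < 0.5)]
either = toy[(toy["Region"] == "Eastern Asia") | (toy["Freedom"] > 0.5)]
negated = toy[~(toy["Region"] == "Western Europe")]

# The keyword form tries to turn a whole Series into one bool and fails
try:
    toy[(toy["Region"] == "Western Europe") and (toy["Freedom"] < 0.5)]
    keyword_failed = False
except ValueError:
    keyword_failed = True

print(list(both.index), list(either.index), list(negated.index), keyword_failed)
```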

Filtering one column based on another column.

In [None]:
data.Region[data.Generosity < 0.1]
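
The attribute-style expression above can also be written as a single `loc` call, with a boolean mask for the rows and a label for the column. The single-call form is the one to prefer when assigning, since chained indexing such as `data.Region[mask] = ...` may update a temporary copy instead of the frame itself. A sketch on made-up values:

```python
import pandas as pd

# Made-up values
toy = pd.DataFrame({"Region": ["A", "B", "C"], "Generosity": [0.05, 0.3, 0.08]})
mask = toy["Generosity"] < 0.1

# Attribute style, as in the cell above: take the column, then filter it
chained = toy.Region[toy.Generosity < 0.1]
# One loc call: rows by boolean mask, column by label
with_loc = toy.loc[mask, "Region"]

# Assignment through loc writes into toy itself
toy.loc[mask, "Region"] = "low generosity"
print(list(with_loc), list(toy["Region"]))
```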

### TRANSFORMING DATA FRAMES
* Plain Python functions
* Lambda functions: apply an arbitrary Python function to every element
* Defining a column using other columns

In [None]:
# Plain python functions
def div(n):
    return n/2
data['Happiness Score'].apply(div)

In [None]:
#lambda function
data['Generosity'].apply(lambda n : n/2)
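
`apply` calls the Python function once per element. For simple arithmetic like halving, plain vectorized operators give the same values and are usually faster. A sketch with made-up scores:

```python
import pandas as pd

# Made-up scores
scores = pd.Series([7.5, 7.4, 5.0], name="Happiness Score")

def div(n):
    return n / 2

via_function = scores.apply(div)             # calls div once per element
via_lambda = scores.apply(lambda n: n / 2)   # same, with an anonymous function
vectorized = scores / 2                      # one array operation, no per-element Python call

print(vectorized.tolist())
```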

In [None]:
# Defining column using other columns
data["Confidence"] = (data['Lower Confidence Interval'] + data['Upper Confidence Interval']) / 2
data.head()
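
The cell above adds the column in place. `DataFrame.assign` is an alternative that returns a new frame and leaves the original untouched, which makes it easy to chain with other steps. A sketch with made-up interval bounds under the dataset's column names:

```python
import pandas as pd

# Made-up interval bounds
toy = pd.DataFrame({
    "Lower Confidence Interval": [7.4, 7.2],
    "Upper Confidence Interval": [7.6, 7.4],
})

# assign returns a NEW frame with the extra column and leaves toy unchanged;
# the lambda receives the frame, so the new column can be built from existing ones
with_mid = toy.assign(
    Confidence=lambda d: (d["Lower Confidence Interval"] + d["Upper Confidence Interval"]) / 2
)
print(with_mid)
```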

### INDEX OBJECTS AND LABELED DATA
An index is a sequence of labels.

In [None]:
# our index name is this:
print(data.index.name)
# let's change it
data.index.name = "index_name"
data.head()

In [None]:
# Overwrite index
# If we want to modify the index, we need to replace all of its labels at once.
# First copy our data to data1, then change its index
data1 = data.copy()
# Let's make the index start from 100. It is not a meaningful change, just an example.
# Deriving the stop value from the row count gives exactly one label per row.
data1.index = range(100, 100 + len(data1))
data1.head()
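
Assigning a new index only works when it has exactly one label per row, which is why the stop value is best derived from the data rather than hard-coded. A sketch on a made-up three-row frame:

```python
import pandas as pd

# Made-up values
toy = pd.DataFrame({"Region": ["A", "B", "C"]})
renumbered = toy.copy()   # copy, so toy keeps its original 0, 1, 2 index

# One label per row, starting at 100
renumbered.index = range(100, 100 + len(renumbered))

# An index of the wrong length is rejected with a ValueError
try:
    renumbered.index = range(100, 102)
    length_error = False
except ValueError:
    length_error = True

print(list(renumbered.index), length_error)
```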

### HIERARCHICAL INDEXING
* Setting a multi-level index

In [None]:
# let's read the data frame one more time to start from the beginning
data = pd.read_csv('../input/2016.csv')
data.head()
# As you can see, the default index is 0, 1, 2, ... However, we want to set one or more columns as the index

In [None]:
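# Setting index: the first column in the list becomes the outer level, the second the inner level
data1 = data.set_index(["Happiness Score","Family"]) 
data1.head(100)
# data1.loc[outer_label] selects every row with that outer label; data1.loc[(outer_label, inner_label)] narrows it to both levels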
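
Selecting from a two-level index: `loc` with an outer label returns all rows under it, a tuple of (outer, inner) labels narrows the selection to one row, and `xs` selects on the inner level by name. The level names and values below are hypothetical placeholders, not taken from the dataset:

```python
import pandas as pd

# Hypothetical labels and values; the index is already sorted
toy = pd.DataFrame({
    "Region": ["R1", "R1", "R2", "R2"],
    "Country": ["a", "b", "c", "d"],
    "Score": [1.0, 2.0, 3.0, 4.0],
})
multi = toy.set_index(["Region", "Country"])  # outer level: Region, inner level: Country

# Outer label only: every row under R2, indexed by the remaining inner level
r2_rows = multi.loc["R2"]
# Tuple of (outer, inner) labels plus a column label: a single value
c_score = multi.loc[("R2", "c"), "Score"]
# xs selects by a label on a named inner level
b_rows = multi.xs("b", level="Country")

print(list(r2_rows.index), c_score, list(b_rows.index))  # ['c', 'd'] 3.0 ['R1']
```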


### PIVOTING DATA FRAMES
* Pivoting: a reshape tool that turns the unique values of one column into new columns

In [None]:
dic = {"cure":["A","A","B","B"],"gender":["F","M","F","M"],"response to cure":[10,45,5,9],"age":[23,18,53,49]}
df = pd.DataFrame(dic)
df

In [None]:
# pivoting
df.pivot(index="cure",columns = "gender",values="response to cure")
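
`pivot` needs each (index, columns) pair to appear only once. When a pair repeats, `pivot_table` can aggregate the duplicates instead. The extra row in this sketch is hypothetical and added only to create a duplicate:

```python
import pandas as pd

dic = {"cure": ["A", "A", "B", "B"], "gender": ["F", "M", "F", "M"],
       "response to cure": [10, 45, 5, 9], "age": [23, 18, 53, 49]}
df = pd.DataFrame(dic)

# Each (cure, gender) pair is unique, so pivot can place every value in one cell
wide = df.pivot(index="cure", columns="gender", values="response to cure")

# Hypothetical extra row that repeats the (A, F) pair with a different response
extra = pd.DataFrame({"cure": ["A"], "gender": ["F"], "response to cure": [20], "age": [30]})
dup = pd.concat([df, extra], ignore_index=True)

# pivot cannot decide which value goes into the (A, F) cell and raises ValueError
try:
    dup.pivot(index="cure", columns="gender", values="response to cure")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead: (10 + 20) / 2 = 15.0
agg = dup.pivot_table(index="cure", columns="gender", values="response to cure", aggfunc="mean")
print(wide.loc["A", "M"], pivot_failed, agg.loc["A", "F"])
```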

### STACKING AND UNSTACKING DATA FRAMES
* Deal with multi-level indexes
* level: position of the index level to unstack
* swaplevel: swap the inner and outer index levels

In [None]:
df1 = df.set_index(["cure","gender"])
df1
# let's unstack it

In [None]:
df1.unstack(level=0)

In [None]:
df1.unstack(level=1)

In [None]:
# change inner and outer level index position
df2 = df1.swaplevel(0,1)
df2
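
`swaplevel` only changes the order of the levels, so the rows are not grouped by the new outer level until `sort_index` is called. The sketch below also shows `stack`, the inverse of `unstack`, which the cells above do not use. Recent pandas versions may print a FutureWarning about the `stack` implementation. That warning does not change the values looked up here:

```python
import pandas as pd

dic = {"cure": ["A", "A", "B", "B"], "gender": ["F", "M", "F", "M"],
       "response to cure": [10, 45, 5, 9], "age": [23, 18, 53, 49]}
df1 = pd.DataFrame(dic).set_index(["cure", "gender"])

# swaplevel reorders the levels; sort_index then groups rows by the new outer level
swapped = df1.swaplevel(0, 1).sort_index()

# unstack moves the inner level (gender) into the columns; stack moves it back
wide = df1.unstack(level=1)
back = wide.stack()

print(list(swapped.index))
print(back.loc[("A", "M"), "age"])
```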

### MELTING DATA FRAMES
* Reverse of pivoting

In [None]:
df

In [None]:
# melt keeps "cure" as the identifier and turns "age" and "response to cure" into variable/value rows
pd.melt(df,id_vars="cure",value_vars=["age","response to cure"])
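
`melt` turns columns into rows, and `pivot` can undo it as long as each row of the long frame is still uniquely identified. In this sketch, `gender` is added to `id_vars` (a change from the cell above) because `cure` alone does not identify a row. Passing a list to `pivot(index=...)` assumes pandas 1.1 or later:

```python
import pandas as pd

dic = {"cure": ["A", "A", "B", "B"], "gender": ["F", "M", "F", "M"],
       "response to cure": [10, 45, 5, 9], "age": [23, 18, 53, 49]}
df = pd.DataFrame(dic)

# Long format: one row per (original row, melted column), so 4 rows x 2 columns = 8 rows
long = pd.melt(df, id_vars=["cure", "gender"], value_vars=["age", "response to cure"])
print(long.columns.tolist(), len(long))

# Pivoting the long frame back restores one row per (cure, gender) pair
restored = long.pivot(index=["cure", "gender"], columns="variable", values="value")
print(restored.loc[("A", "M"), "age"])
```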

<a id="42"></a> <br>
### CATEGORICALS AND GROUPBY

In [None]:
df

In [None]:
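# group by cure and take the mean of the other numeric features
# numeric_only=True skips the non-numeric 'gender' column; pandas 2.x raises an error without it
df.groupby("cure").mean(numeric_only=True)   # mean is an aggregation / reduction method
# there are other methods like sum, std, max or min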

We can also select a single feature.

In [None]:
df.groupby("cure").age.max() 

Or we can choose multiple features.

In [None]:
df.groupby("cure")[["age","response to cure"]].min() 
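
Several reductions can also be computed in one call with `agg`. The result has a two-level column index of (feature, reduction name). A sketch on the same small frame:

```python
import pandas as pd

dic = {"cure": ["A", "A", "B", "B"], "gender": ["F", "M", "F", "M"],
       "response to cure": [10, 45, 5, 9], "age": [23, 18, 53, 49]}
df = pd.DataFrame(dic)

# min, max and mean of both features for each cure
summary = df.groupby("cure")[["age", "response to cure"]].agg(["min", "max", "mean"])
print(summary)
```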

In [None]:
df.info()
# as you can see, gender and cure are stored as object (string) columns
# Columns with few distinct values, like these, can be converted to categorical data,
# which usually uses less memory and can speed up operations like groupby
df["gender"] = df["gender"].astype("category")
df["cure"] = df["cure"].astype("category")
df.info()
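
On a four-row frame the memory saving is hard to see. This sketch uses a larger made-up Series with only two distinct values and measures both versions with `memory_usage(deep=True)`:

```python
import pandas as pd

# 100,000 made-up entries with only two distinct values
genders = pd.Series(["F", "M"] * 50_000)
as_category = genders.astype("category")

# deep=True also counts the string data, not just the references to it
object_bytes = genders.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(list(as_category.cat.categories))  # ['F', 'M']
print(object_bytes, category_bytes)
```

A categorical stores each distinct value once, plus a small integer code per row, so the savings grow with the number of rows.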