The process of cleaning messy data is called **data munging or data wrangling**.

# 1. Dataframe Basics

<div class="alert alert-block alert-info">
<b>Dataframe:</b> The main object in Pandas. It is used to represent data
                  with rows and columns (tabular data, like an Excel spreadsheet).
</div>

**1.1 Creating a dataframe**

In [1]:
import pandas as pd
# Read data from a CSV file
df = pd.read_csv('weather_data.csv')
df
# Alternatively, create the dataframe from a dictionary:
#weather_data = {
#    'day': ['1/1/2017','1/2/2017','1/3/2017','1/4/2017','1/5/2017','1/6/2017'],
#    'temperature': [32,35,28,24,32,31],
#    'windspeed': [6,7,2,7,4,2],
#    'event': ['Rain', 'Sunny', 'Snow','Snow','Rain', 'Sunny']
#}
# pd.DataFrame(weather_data)


Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny
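
If `weather_data.csv` is not at hand, the same dataframe can be rebuilt from the dictionary shown in the commented-out lines of the cell above. This is a minimal sketch: the values are copied from the output shown, and nothing else is assumed.

```python
import pandas as pd

# Same six rows as the output above, typed in by hand
weather_data = {
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 31],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
}
df = pd.DataFrame(weather_data)

print(df.shape)          # (6, 4)
print(list(df.columns))  # ['day', 'temperature', 'windspeed', 'event']
```

Each dictionary key becomes a column name and each list becomes that column's values, so all lists must have the same length.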


**1.2 Dealing with rows and columns**

In [4]:
df.shape
# shape gives the dimensions of the dataframe: (6, 4) means 6 rows and 4 columns

(6, 4)

In [8]:
df.head(3)
# Print the first 3 rows; the default is 5

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow


In [7]:
df.tail(3)
# Print the last 3 rows of the dataframe; the default is 5

Unnamed: 0,day,temperature,windspeed,event
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


In [9]:
df[2:5]

Unnamed: 0,day,temperature,windspeed,event
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
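
`df[2:5]` slices rows by position, so it returns rows 2, 3 and 4 and excludes row 5, just like a Python list slice. The sketch below compares it with the more explicit `iloc` form on a hand-built copy of the temperature column.

```python
import pandas as pd

df = pd.DataFrame({'temperature': [32, 35, 28, 24, 32, 31]})

by_slice = df[2:5]       # positional slice, end excluded
by_iloc = df.iloc[2:5]   # same rows, explicit positional indexer

print(by_slice.equals(by_iloc))  # True
print(list(by_iloc.index))       # [2, 3, 4]
```

Using `iloc` makes it obvious to a reader that the numbers are positions rather than index labels.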


In [10]:
df.columns

Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')

In [11]:
df.temperature

0    32
1    35
2    28
3    24
4    32
5    31
Name: temperature, dtype: int64

In [12]:
df['event']

0     Rain
1    Sunny
2     Snow
3     Snow
4     Rain
5    Sunny
Name: event, dtype: object

In [13]:
type(df.event)
# Each column of a dataframe is a pandas Series

pandas.core.series.Series

In [15]:
df[['event','day','temperature']]

Unnamed: 0,event,day,temperature
0,Rain,1/1/2017,32
1,Sunny,1/2/2017,35
2,Snow,1/3/2017,28
3,Snow,1/4/2017,24
4,Rain,1/5/2017,32
5,Sunny,1/6/2017,31
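
The cells above use three ways of selecting columns: attribute access (`df.temperature`), single brackets (`df['event']`) and a list of names (`df[[...]]`). The sketch below shows what each form returns. The column name `wind speed` (with a space) is a hypothetical example, added only to show a case where attribute access cannot be used.

```python
import pandas as pd

df = pd.DataFrame({
    'temperature': [32, 35, 28],
    'wind speed': [6, 7, 2],   # hypothetical column name containing a space
})

col = df['temperature']    # single brackets -> Series
sub = df[['temperature']]  # list of names -> DataFrame

print(type(col).__name__)  # Series
print(type(sub).__name__)  # DataFrame

# Bracket access works for any column name; attribute access needs a valid identifier
print(df['wind speed'].tolist())  # [6, 7, 2]
```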


**1.3 Operations**

In [17]:
df.temperature

0    32
1    35
2    28
3    24
4    32
5    31
Name: temperature, dtype: int64

In [18]:
df['temperature'].max()

35

In [19]:
df['temperature'].mean()

30.333333333333332

In [20]:
df['temperature'].std()
# 'std' means standard deviation

3.8297084310253524

In [21]:
df.describe()

Unnamed: 0,temperature,windspeed
count,6.0,6.0
mean,30.333333,4.666667
std,3.829708,2.33809
min,24.0,2.0
25%,28.75,2.5
50%,31.5,5.0
75%,32.0,6.75
max,35.0,7.0
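
Note that `std()` in pandas computes the sample standard deviation (dividing by n - 1) by default, which is why it matches the `std` row of `describe()`. The sketch below reproduces the value on a hand-built copy of the temperature column and compares it with the population version.

```python
import pandas as pd

temperature = pd.Series([32, 35, 28, 24, 32, 31], name='temperature')

sample_std = temperature.std()            # ddof=1 by default (divides by n - 1)
population_std = temperature.std(ddof=0)  # divides by n

print(round(sample_std, 4))      # 3.8297
print(round(population_std, 4))  # 3.496

# Several statistics in one call
print(temperature.agg(['min', 'max', 'mean']))
```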


**1.4 Conditional Selection**

In [26]:
df[df.temperature >=32]

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
4,1/5/2017,32,4,Rain


In [27]:
df[df.event == 'Rain']

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
4,1/5/2017,32,4,Rain


In [31]:
df[['day','temperature']][df.event == 'Rain']

Unnamed: 0,day,temperature
0,1/1/2017,32
4,1/5/2017,32
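
The last cell filters in two steps (select columns, then apply the mask). A common alternative is a single `.loc` call that takes the row condition and the column list together. The sketch below shows that form, plus combining two conditions, on a hand-built copy of the data.

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017'],
    'temperature': [32, 35, 28, 24, 32, 31],
    'windspeed': [6, 7, 2, 7, 4, 2],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny'],
})

# Row condition and column list in one .loc call
rainy = df.loc[df.event == 'Rain', ['day', 'temperature']]
print(list(rainy.index))  # [0, 4]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
warm_and_calm = df[(df.temperature >= 32) & (df.windspeed < 7)]
print(list(warm_and_calm.index))  # [0, 4]
```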


**1.5 Set Index**

In [3]:
df.index

RangeIndex(start=0, stop=6, step=1)

In [4]:
df.set_index('day', inplace=True)
df

# inplace=True modifies df directly; without it, set_index returns a new dataframe and leaves df unchanged

day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
1/4/2017,24,7,Snow
1/5/2017,32,4,Rain
1/6/2017,31,2,Sunny
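
Without `inplace=True`, `set_index` leaves the original dataframe untouched and returns a new one, so the result has to be assigned to a variable. A minimal sketch on a three-row copy of the data (the name `indexed` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017'],
    'temperature': [32, 35, 28],
})

indexed = df.set_index('day')  # new dataframe; df itself is unchanged

print(list(df.columns))       # ['day', 'temperature']
print(list(indexed.columns))  # ['temperature']
print(indexed.index.name)     # day
```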


In [5]:
df.loc['1/4/2017']

temperature      24
windspeed         7
event          Snow
Name: 1/4/2017, dtype: object

In [6]:
df.reset_index(inplace = True)
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


In [7]:
df.set_index('event', inplace= True)
df

event,day,temperature,windspeed
Rain,1/1/2017,32,6
Sunny,1/2/2017,35,7
Snow,1/3/2017,28,2
Snow,1/4/2017,24,7
Rain,1/5/2017,32,4
Sunny,1/6/2017,31,2


In [8]:
df.loc['Snow']

event,day,temperature,windspeed
Snow,1/3/2017,28,2
Snow,1/4/2017,24,7
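
The two `.loc` lookups above return different kinds of objects: `day` is unique, so one label gives a single row (a Series), while `event` contains repeated labels, so `'Snow'` gives every matching row (a DataFrame). The sketch below illustrates this on a hand-built copy of the data; the names `by_day` and `by_event` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017'],
    'temperature': [32, 35, 28, 24],
    'event': ['Rain', 'Sunny', 'Snow', 'Snow'],
})

by_day = df.set_index('day')      # unique labels
by_event = df.set_index('event')  # 'Snow' appears twice

print(by_day.index.is_unique)     # True
print(by_event.index.is_unique)   # False

one_row = by_day.loc['1/3/2017']  # single match -> Series
snow_rows = by_event.loc['Snow']  # two matches -> DataFrame
print(type(one_row).__name__)     # Series
print(len(snow_rows))             # 2
```

An index does not have to be unique, but lookups on a unique index are easier to reason about because they always return exactly one row.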


<div class="alert alert-block alert-warning">
<b>Success Dataframe Basics</b>
</div>